The goal wasn't to catch malware. These are well-known repos with strong communities. The goal was to answer a harder question: how do you tell the difference between "popular" and "trustworthy"?

Stars and downloads don't measure security posture, author credibility, or whether the maintainer will respond to your bug report. Trust is multidimensional. So we built a multidimensional score.

The Results

Repo                                  Score  Tier         Mode
anthropics/anthropic-sdk-typescript   8.6    Verified     Standard
supabase/supabase-js                  8.5    Verified     Standard
prisma/prisma                         8.5    Verified     Standard
stripe/stripe-node                    8.4    Verified     Standard
drizzle-team/drizzle-orm              8.3    Verified     Standard
langchain-ai/langchainjs              8.2    Verified     Standard
nextauthjs/next-auth                  7.4    Verified     Standard
vercel/ai                             7.3    Verified     Skills Mode
modelcontextprotocol/typescript-sdk   7.2    Verified     Skills Mode
modelcontextprotocol/servers          7.0    Established  Skills Mode

9 of 10 repos earned Verified status (a score of 7.0+ with strength across all 4 dimensions). Only MCP Servers landed in Established: a solid tool pulled down by low spec compliance in the Usable dimension under the Skills Mode evaluation.

Key Findings

1. Stars don't predict trust
The MCP Servers repo has 80,000+ stars, more than any other repo scanned, but scored 7.0 and was the only repo that didn't reach Verified. High star count masked low spec compliance in the Usable dimension. Conversely, Anthropic's SDK has far fewer stars but scored 8.6 because every dimension was strong.

2. Skills Mode matters
Three repos triggered Skills Mode (Vercel AI SDK, MCP TypeScript SDK, MCP Servers). This switches from 12 to 14 signals, prioritizes security (37% of total weight on the Solid dimension), and adds tool safety and supply chain safety scans. All three passed clean: no prompt injection, no credential access, no obfuscated payloads. But the weight shift changed their scores relative to what Standard Mode would have produced.

3. Usability is the weak spot
Even among top repos, the Usable dimension (README quality, spec compliance, license clarity) was consistently the lowest scorer. The MCP TypeScript SDK scored 3.8/10 on Usability despite being the official SDK. Developer-experience documentation is an afterthought even for the best-maintained projects.

4. Org-backed repos score higher on Legit
Every repo backed by a verified organization (Anthropic, Vercel, Stripe, Supabase) scored 8.0+ on the Legit dimension. Solo-developer repos face a credibility disadvantage that only community adoption can overcome.

5. Auth.js shows the maintenance risk
Auth.js (NextAuth) is widely used across thousands of Next.js projects but scored 7.4, just barely clearing the Verified threshold. The Alive dimension flags slowing commit velocity. A tool can be popular and still be trending toward abandonment. This is the signal most developers miss.

What the 4 Dimensions Measure

Every score breaks down into 4 dimensions, each answering a different trust question:

Alive: Is anyone home?
Commit recency, release cadence, issue response time. A repo that hasn't shipped in 6 months scores low here regardless of star count.

Legit: Who made this?
Author credibility, org verification, community adoption with star/fork gaming detection.

Solid: Is it secure?
OpenSSF Scorecard data, dependency health, and, for AI skills, 5 safety scans for prompt injection, shell execution, and exfiltration.

Usable: Can I work with this?
README quality, code examples, spec compliance, license clarity. Great code with terrible docs is still a bad dependency.
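To make the rollup concrete, here is a minimal TypeScript sketch of how four dimension scores could combine into a single 0-10 trust score and a tier. The equal weights and the two-tier mapping are illustrative assumptions; only the 0-10 dimension scale and the 7.0 Verified threshold come from the results above.

```typescript
// Minimal sketch of dimension aggregation. The equal weights are an
// illustrative assumption; only the 7.0 Verified threshold and the
// 0-10 dimension scale come from the article.
interface DimensionScores {
  alive: number;  // 0-10: commit recency, release cadence, responsiveness
  legit: number;  // 0-10: author credibility, org verification, adoption
  solid: number;  // 0-10: OpenSSF Scorecard, dependency health, safety scans
  usable: number; // 0-10: README quality, spec compliance, license clarity
}

const STANDARD_WEIGHTS: Record<keyof DimensionScores, number> = {
  alive: 0.25,
  legit: 0.25,
  solid: 0.25,
  usable: 0.25,
};

function trustScore(
  d: DimensionScores,
  w: Record<keyof DimensionScores, number> = STANDARD_WEIGHTS
): number {
  const raw =
    d.alive * w.alive +
    d.legit * w.legit +
    d.solid * w.solid +
    d.usable * w.usable;
  return Math.round(raw * 10) / 10; // one decimal, as in the results table
}

// Mapping everything below 7.0 to "Established" is a simplification;
// the article only shows these two tiers.
function tierFor(score: number): "Verified" | "Established" {
  return score >= 7.0 ? "Verified" : "Established";
}
```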

Skills Mode: Enhanced scanning for AI skills

For AI skills and MCP servers, the algorithm switches to Skills Mode, expanding from 12 to 14 signals. Two new signals activate (tool_safety at 14% weight and supply_chain_safety at 5%), and the Solid dimension grows to 37% of the total weight. When a skill has access to your terminal, security deserves more weight than when you're importing a utility library.
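A rough TypeScript sketch of that mode switch follows. Only tool_safety (14%), supply_chain_safety (5%), and the 37% Solid total are stated above; the base signal names and weights are placeholders, not the production signal set.

```typescript
// Sketch of the Skills Mode signal switch. Only tool_safety (14%),
// supply_chain_safety (5%), and the 37% Solid total come from the
// algorithm description; the base signals below are placeholders.
type Mode = "standard" | "skills";
type Dimension = "alive" | "legit" | "solid" | "usable";

interface SignalWeight {
  signal: string;
  dimension: Dimension;
  weight: number; // fraction of the total score
}

function activeSignals(mode: Mode): SignalWeight[] {
  // Placeholder subset standing in for the 12 Standard Mode signals.
  const base: SignalWeight[] = [
    { signal: "commit_recency", dimension: "alive", weight: 0.1 },
    { signal: "openssf_scorecard", dimension: "solid", weight: 0.12 },
    { signal: "dependency_health", dimension: "solid", weight: 0.06 },
    // ...plus the other Standard Mode signals
  ];
  if (mode === "standard") return base;

  // Skills Mode activates two extra signals; together with the existing
  // security signals this pushes Solid to 37% of total weight, so the
  // remaining weights are renormalized (renormalization not shown).
  return [
    ...base,
    { signal: "tool_safety", dimension: "solid", weight: 0.14 },
    { signal: "supply_chain_safety", dimension: "solid", weight: 0.05 },
  ];
}
```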

Why This Matters

In January 2026, security researchers discovered 1,184 malicious AI skills on a major agent marketplace (ClawHavoc). One trojan skill was downloaded 7,700 times before detection. It stole API keys, crypto wallets, and browser credentials.

A separate study, ToxicSkills, found that 36% of skills on one major registry contained detectable prompt injection patterns.
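To give a sense of what a "detectable prompt injection pattern" looks like in practice, here is a toy scan. The regexes are illustrative examples only; they are not the detection rules ToxicSkills or mcpskills.io actually use.

```typescript
// Toy prompt-injection pattern scan over a skill's instruction files.
// These regexes are illustrative; they are not the detection rules
// used by ToxicSkills or mcpskills.io.
const INJECTION_PATTERNS: RegExp[] = [
  /ignore (all )?(previous|prior) instructions/i,
  /do not (tell|inform|alert) the user/i,
  /\b(curl|wget)\b.*\|\s*(sh|bash)\b/i, // pipe-to-shell payloads
  /\b(api[_-]?key|\.env|credentials)\b.*\b(post|send|upload)\b/i,
];

function scanForInjection(text: string): string[] {
  return INJECTION_PATTERNS.filter((p) => p.test(text)).map((p) => p.source);
}
```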

These attacks worked because there was no trust layer. No score. No signal. Just a README and a star count. Developers couldn't distinguish a safe skill from a malicious one because the ecosystem provided no framework for trust assessment.

That's the gap mcpskills.io fills. Not malware scanning (binary safe/unsafe), but trust scoring — a multi-dimensional assessment that tells you whether a tool is maintained, credible, secure, and usable before you install it.

Methodology

All scores were generated by mcpskills.io's trust algorithm on March 8, 2026. The algorithm uses data from the GitHub API and OpenSSF Scorecard — no manual input or subjective judgment. The same algorithm runs on every scan, whether you're scoring Prisma or an unknown MCP server with 3 stars.
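Both data sources are publicly queryable, so anyone can approximate the raw inputs. Here is a minimal fetch sketch against the public GitHub REST API and the OpenSSF Scorecard API; how mcpskills.io weighs these inputs into its 14 signals is the algorithm itself and is not shown.

```typescript
// Fetch the public raw inputs for one repo: GitHub metadata plus its
// OpenSSF Scorecard result. How mcpskills.io weighs these into 14
// signals is the trust algorithm itself and is not reproduced here.
async function fetchRawInputs(owner: string, repo: string) {
  const [gh, sc] = await Promise.all([
    fetch(`https://api.github.com/repos/${owner}/${repo}`),
    fetch(`https://api.securityscorecards.dev/projects/github.com/${owner}/${repo}`),
  ]);
  if (!gh.ok) throw new Error(`GitHub API returned ${gh.status}`);

  const meta = await gh.json();
  return {
    stars: meta.stargazers_count,
    pushedAt: meta.pushed_at,                   // feeds Alive
    license: meta.license?.spdx_id ?? null,     // feeds Usable
    isOrg: meta.owner?.type === "Organization", // rough Legit input
    // Scorecard data exists only for repos its pipeline has scanned.
    scorecard: sc.ok ? await sc.json() : null,
  };
}

// Example: fetchRawInputs("prisma", "prisma").then(console.log);
```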

Scores are not permanent. They reflect the state of the repo at scan time. A repo that ships a security patch or improves its README will score differently on the next scan.

Score your stack

Paste any GitHub repo and get a trust score in seconds.

Scan Now — Free
Full reports with all 14 signals for $9 · 10-pack for $29