The goal wasn't to catch malware. These are well-known repos with strong communities. The goal was to answer a harder question: how do you tell the difference between "popular" and "trustworthy"?
Stars and downloads don't measure security posture, author credibility, or whether the maintainer will respond to your bug report. Trust is multidimensional. So we built a multidimensional score.
The Results
| Repo | Score | Tier | Mode |
|---|---|---|---|
| anthropics/anthropic-sdk-typescript | 8.6 |
Verified | Standard |
| supabase/supabase-js | 8.5 |
Verified | Standard |
| prisma/prisma | 8.5 |
Verified | Standard |
| stripe/stripe-node | 8.4 |
Verified | Standard |
| drizzle-team/drizzle-orm | 8.3 |
Verified | Standard |
| langchain-ai/langchainjs | 8.2 |
Verified | Standard |
| nextauthjs/next-auth | 7.4 |
Verified | Standard |
| vercel/ai | 7.3 |
Verified | Skills Mode |
| modelcontextprotocol/typescript-sdk | 7.2 |
Verified | Skills Mode |
| modelcontextprotocol/servers | 7.0 |
Established | Skills Mode |
9 of 10 repos earned Verified status (score 7.0+ with strong dimensions across all 4 areas). Only MCP Servers landed in Established — a solid tool pulled down by specific weakness areas in the Skills Mode evaluation.
Key Findings
What the 4 Dimensions Measure
Every score breaks down into 4 dimensions, each answering a different trust question:
Commit recency, release cadence, issue response time. A repo that hasn't shipped in 6 months scores low here regardless of star count.
Author credibility, org verification, community adoption with star/fork gaming detection.
OpenSSF Scorecard data, dependency health, and for AI skills: 5 safety scans for prompt injection, shell execution, and exfiltration.
README quality, code examples, spec compliance, license clarity. Great code with terrible docs is still a bad dependency.
Skills Mode: Enhanced scanning for AI skills
For AI skills and MCP servers, the algorithm switches to Skills Mode — expanding from 12 to 14 signals. Two new signals activate (tool_safety at 14% weight and supply_chain_safety at 5%), and the Solid dimension grows to 37% total weight. When a skill has access to your terminal, security weight should be higher than when you're importing a utility library.
Why This Matters
In January 2026, security researchers discovered 1,184 malicious AI skills on a major agent marketplace (ClawHavoc). One trojan skill was downloaded 7,700 times before detection. It stole API keys, crypto wallets, and browser credentials.
A separate study, ToxicSkills, found that 36% of skills on one major registry contained detectable prompt injection patterns.
These attacks worked because there was no trust layer. No score. No signal. Just a README and a star count. Developers couldn't distinguish a safe skill from a malicious one because the ecosystem provided no framework for trust assessment.
That's the gap mcpskills.io fills. Not malware scanning (binary safe/unsafe), but trust scoring — a multi-dimensional assessment that tells you whether a tool is maintained, credible, secure, and usable before you install it.
Methodology
All scores were generated by mcpskills.io's trust algorithm on March 8, 2026. The algorithm uses data from the GitHub API and OpenSSF Scorecard — no manual input or subjective judgment. The same algorithm runs on every scan, whether you're scoring Prisma or an unknown MCP server with 3 stars.
Scores are not permanent. They reflect the state of the repo at scan time. A repo that ships a security patch or improves its README will score differently on the next scan.
Score your stack
Paste any GitHub repo and get a trust score in seconds.
Scan Now — Free