State of AI Skill Security — March 2026

The goal wasn't to catch malware. These are well-known repos with strong communities. The goal was to answer a harder question: how do you tell the difference between "popular" and "trustworthy"?

Stars and downloads don't measure security posture, author credibility, or whether the maintainer will respond to your bug report. Trust is multidimensional. So we built a multidimensional score.

The Results

Repo	Score	Tier	Mode
anthropics/anthropic-sdk-typescript	8.6	Verified	Standard
supabase/supabase-js	8.5	Verified	Standard
prisma/prisma	8.5	Verified	Standard
stripe/stripe-node	8.4	Verified	Standard
drizzle-team/drizzle-orm	8.3	Verified	Standard
langchain-ai/langchainjs	8.2	Verified	Standard
nextauthjs/next-auth	7.4	Verified	Standard
vercel/ai	7.3	Verified	Skills Mode
modelcontextprotocol/typescript-sdk	7.2	Verified	Skills Mode
modelcontextprotocol/servers	7.0	Established	Skills Mode

anthropics/anthropic-sdk-ts8.6

VerifiedStandard

supabase/supabase-js8.5

VerifiedStandard

prisma/prisma8.5

VerifiedStandard

stripe/stripe-node8.4

VerifiedStandard

drizzle-team/drizzle-orm8.3

VerifiedStandard

langchain-ai/langchainjs8.2

VerifiedStandard

nextauthjs/next-auth7.4

VerifiedStandard

vercel/ai7.3

VerifiedSkills Mode

modelcontextprotocol/ts-sdk7.2

VerifiedSkills Mode

modelcontextprotocol/servers7.0

EstablishedSkills Mode

9 of 10 repos earned Verified status (score 7.0+ with strong dimensions across all 4 areas). Only MCP Servers landed in Established — a solid tool pulled down by specific weakness areas in the Skills Mode evaluation.

Key Findings

Stars don't predict trust

The MCP Servers repo has 80,000+ stars — more than any other repo scanned — but scored 7.0 and was the only repo that didn't reach Verified. High star count masked low spec compliance in the Usable dimension. Conversely, Anthropic's SDK has far fewer stars but scored 8.6 because every dimension was strong.

Skills Mode matters

Three repos triggered Skills Mode (Vercel AI SDK, MCP TypeScript SDK, MCP Servers). This switches from 12 to 15 signals, prioritizes security (37% weight across Solid), and adds tool safety and supply chain safety scans. All three passed clean — no prompt injection, no credential access, no obfuscated payloads. But the weight shift changed their scores vs. what Standard Mode would have produced.

Usability is the weak spot

Even among top repos, the Usable dimension (README quality, spec compliance, license clarity) was consistently the lowest scorer. The MCP TypeScript SDK scored 3.8/10 on Usability despite being the official SDK. Developer experience documentation is an afterthought even for the best-maintained projects.

Org-backed repos score higher on Legit

Every repo backed by a verified organization (Anthropic, Vercel, Stripe, Supabase) scored 8.0+ on the Legit dimension. Solo-developer repos face a credibility disadvantage that only community adoption can overcome.

Auth.js shows the maintenance risk

Auth.js (NextAuth) is widely used across thousands of Next.js projects but scored 7.4 — just barely clearing the Verified threshold. The Alive dimension flags slowing commit velocity. A tool can be popular and still be trending toward abandonment. This is the signal most developers miss.

What the 4 Dimensions Measure

Every score breaks down into 4 dimensions, each answering a different trust question:

Alive

Is anyone home?

Commit recency, release cadence, issue response time. A repo that hasn't shipped in 6 months scores low here regardless of star count.

Legit

Who made this?

Author credibility, org verification, community adoption with star/fork gaming detection.

Solid

Is it secure?

OpenSSF Scorecard data, dependency health, and for AI skills: 5 safety scans for prompt injection, shell execution, and exfiltration.

Usable

Can I work with this?

README quality, code examples, spec compliance, license clarity. Great code with terrible docs is still a bad dependency.

Skills Mode: Enhanced scanning for AI skills

For AI skills and MCP servers, the algorithm switches to Skills Mode — expanding from 12 to 15 signals. Two new signals activate (tool_safety at 14% weight and supply_chain_safety at 5%), and the Solid dimension grows to 37% total weight. When a skill has access to your terminal, security weight should be higher than when you're importing a utility library.

Why This Matters

In January 2026, security researchers discovered 1,184 malicious AI skills on a major agent marketplace (ClawHavoc). One trojan skill was downloaded 7,700 times before detection. It stole API keys, crypto wallets, and browser credentials.

A separate study, ToxicSkills, found that 36% of skills on one major registry contained detectable prompt injection patterns.

These attacks worked because there was no trust layer. No score. No signal. Just a README and a star count. Developers couldn't distinguish a safe skill from a malicious one because the ecosystem provided no framework for trust assessment.

That's the gap mcpskills.io fills. Not malware scanning (binary safe/unsafe), but trust scoring — a multi-dimensional assessment that tells you whether a tool is maintained, credible, secure, and usable before you install it.

Methodology

All scores were generated by mcpskills.io's trust algorithm on March 8, 2026. The algorithm uses data from the GitHub API and OpenSSF Scorecard — no manual input or subjective judgment. The same algorithm runs on every scan, whether you're scoring Prisma or an unknown MCP server with 3 stars.

Scores are not permanent. They reflect the state of the repo at scan time. A repo that ships a security patch or improves its README will score differently on the next scan.

Latest Research (April 2026)

CLAWHUB · 200 SKILLS

State of ClawHub Trust →

0% declared their security posture

MCP REGISTRY · 202 SERVERS

State of MCP Security →

83% carry a disqualifier flag

Score your stack

Paste any GitHub repo and get a trust score in seconds.

Scan Now — Free

Single full report (15 signals + vulnerability intel) for $2 · Pro $9/mo or $39/yr unlimited