ClawHavoc and the Missing Trust Layer
In January 2026, security researchers discovered 1,184 malicious AI skills on a major agent marketplace. One trojan skill — disguised as a weather tool — was downloaded 7,700 times before anyone noticed.
It stole API keys. Crypto wallets. Browser credentials. All from developers who thought they were installing something harmless.
The attack was called ClawHavoc. And it worked because there's no trust layer for AI skills.
The Gap Nobody Noticed
Think about how you evaluate dependencies in the rest of the software stack:
npm package
Downloads/week visible
Stars and forks on GitHub
Last publish date
Known vulnerability alerts
License clearly displayed
AI skill / MCP server
A README you skimmed for 10 seconds
Maybe a star count
No security assessment
No quality score
No maintenance signal
When you install an npm package, you can check weekly downloads, open issues, last update, and known CVEs. When you install an AI skill into Claude Code or Cursor, you get a README and a promise.
And you're giving that skill access to your terminal, your code, your environment variables — permissions that npm packages rarely need.
The Attack Surface Is Wider Than You Think
A separate study, ToxicSkills (arXiv:2504.08623), found that 36% of skills on one major registry contained detectable prompt injection patterns. Not sophisticated zero-days — patterns that static analysis could catch.
The five attack vectors researchers identified:
- Prompt injection — hidden instructions in SKILL.md or README that override the AI agent's behavior
- Shell execution — piped commands (curl | sh) or eval() with network-fetched content
- Credential theft — reading SSH keys, AWS credentials, browser storage
- Network exfiltration — sending stolen data to Discord webhooks, Telegram bots, or raw IP addresses
- Obfuscated payloads — base64-encoded strings, hex sequences, or String.fromCharCode chains hiding malicious intent
These aren't theoretical. They're the exact patterns used in ClawHavoc. And they're detectable — if anyone is looking.
What Exists Today
The ecosystem has registries (Anthropic's MCP Registry, Smithery, mcp.so) — they tell you what exists. Some have malware scanning — binary safe/unsafe checks.
But nobody built the trust layer. Nobody answers the multi-dimensional question: "Is this skill actively maintained? Is the author credible? Does it follow the spec? Is the README actually helpful? Are there hidden prompt injection patterns?"
That gap is why ClawHavoc worked. Developers couldn't tell good skills from bad ones because there was no scoring system that combined security, maintenance, credibility, and usability into a single assessment.
What We Built
mcpskills.io scores AI skills across 12 signals, grouped into 4 dimensions:
- Alive — Is it maintained? (commit recency, release cadence, issue response time)
- Legit — Who made it? (author credibility, community adoption)
- Solid — Is it safe? (OpenSSF Scorecard data, dependency health, 5 safety scans)
- Usable — Can I work with it? (README quality, spec compliance, license clarity)
It assigns trust tiers — Verified, Established, New — so you know at a glance. For AI skills and MCP servers, it activates Skills Mode: security weight increases to 34% and 5 safety scans run against the exact attack patterns from ClawHavoc and ToxicSkills.
The data comes from the GitHub API and OpenSSF Scorecard. No manual judgment. The same algorithm scores every repo identically.
The MCP Registry was designed for downstream curators to add quality signals. This is that quality signal.
What Comes Next
We're building this in public. Next steps:
- An MCP server that provides trust scores directly inside Claude Code and Cursor — so you never have to leave your IDE to check
- Curated skill packages — pre-vetted stacks organized by use case (full-stack dev, data research, DevOps)
- A public API for CI/CD integration — block untrusted skills before they reach production
If you care about AI agent security, I'd love to hear what you think. What signals matter most to you when evaluating a skill?