ClawHavoc and the Missing Trust Layer

By Michael Browne · March 8, 2026 · 5 min read

In January 2026, security researchers discovered 1,184 malicious AI skills on a major agent marketplace. One trojan skill — disguised as a weather tool — was downloaded 7,700 times before anyone noticed.

It stole API keys. Crypto wallets. Browser credentials. All from developers who thought they were installing something harmless.

The attack was called ClawHavoc. And it worked because there's no trust layer for AI skills.

The Gap Nobody Noticed

Think about how you evaluate dependencies in the rest of the software stack:

npm package:

- Downloads/week visible
- Stars and forks on GitHub
- Last publish date
- Known vulnerability alerts
- License clearly displayed

AI skill / MCP server:

- A README you skimmed for 10 seconds
- Maybe a star count
- No security assessment
- No quality score
- No maintenance signal

When you install an npm package, you can check weekly downloads, open issues, last update, and known CVEs. When you install an AI skill into Claude Code or Cursor, you get a README and a promise.

And you're giving that skill access to your terminal, your code, your environment variables — permissions that npm packages rarely need.
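The npm side of that comparison is queryable today. As a minimal sketch: the public npm download-counts endpoint at api.npmjs.org returns last week's downloads as JSON. The helper names here are my own; only the endpoint itself is real.

```python
import json
from urllib.request import urlopen

# Public npm download-counts API (no auth required).
DOWNLOADS_API = "https://api.npmjs.org/downloads/point/last-week/{pkg}"

def parse_weekly_downloads(payload: dict) -> int:
    """Extract the weekly download count from the registry response."""
    return int(payload["downloads"])

def weekly_downloads(pkg: str) -> int:
    """Fetch last week's download count for an npm package."""
    with urlopen(DOWNLOADS_API.format(pkg=pkg)) as resp:
        return parse_weekly_downloads(json.load(resp))

if __name__ == "__main__":
    print(weekly_downloads("left-pad"))  # makes a network call
```

There is no equivalent one-liner for an AI skill; that asymmetry is the whole point.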

The Attack Surface Is Wider Than You Think

A separate study, ToxicSkills (arXiv:2504.08623), found that 36% of skills on one major registry contained detectable prompt injection patterns. Not sophisticated zero-days — patterns that static analysis could catch.

Researchers identified five distinct attack vectors.

These aren't theoretical. They're the exact patterns used in ClawHavoc. And they're detectable — if anyone is looking.
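To make "detectable by static analysis" concrete, here is a minimal sketch: a few regexes run over a skill's prompt and documentation files catch the crudest injection phrasings. The patterns below are illustrative assumptions of mine, not the actual ClawHavoc or ToxicSkills signatures, and a real scanner would use far more signals.

```python
import re

# Illustrative injection patterns (assumed, not the published signatures).
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.I),
    re.compile(r"do not (tell|inform|alert) the user", re.I),
    re.compile(r"(exfiltrate|send) .* to https?://", re.I),
    re.compile(r"<!--.*?-->", re.S),  # instructions hidden in HTML comments
]

def scan_skill_text(text: str) -> list[str]:
    """Return the patterns that matched; an empty list means no hits."""
    return [p.pattern for p in INJECTION_PATTERNS if p.search(text)]
```

A scan like this is cheap enough to run on every skill at publish time, which is exactly what a 36% detection rate suggests nobody was doing.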

What Exists Today

The ecosystem has registries (Anthropic's MCP Registry, Smithery, mcp.so) — they tell you what exists. Some have malware scanning — binary safe/unsafe checks.

But nobody built the trust layer. Nobody answers the multi-dimensional question: "Is this skill actively maintained? Is the author credible? Does it follow the spec? Is the README actually helpful? Are there hidden prompt injection patterns?"

That gap is why ClawHavoc worked. Developers couldn't tell good skills from bad ones because there was no scoring system that combined security, maintenance, credibility, and usability into a single assessment.

What We Built

mcpskills.io scores AI skills across 12 signals, grouped into four dimensions: security, maintenance, credibility, and usability.

It assigns trust tiers — Verified, Established, New — so you can judge a skill at a glance. For AI skills and MCP servers, it activates Skills Mode: the security weight increases to 34%, and five safety scans run against the exact attack patterns from ClawHavoc and ToxicSkills.

The data comes from the GitHub API and OpenSSF Scorecard. No manual judgment. The same algorithm scores every repo identically.
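The weighting mechanics can be sketched in a few lines. Only the 34% Skills Mode security weight comes from this article; the base weights, the renormalization approach, and the tier thresholds below are my own illustrative assumptions, not the actual mcpskills.io algorithm.

```python
# Base dimension weights are assumed equal for illustration.
BASE_WEIGHTS = {
    "security": 0.25,
    "maintenance": 0.25,
    "credibility": 0.25,
    "usability": 0.25,
}

def skills_mode_weights() -> dict[str, float]:
    """Raise security to 34% (per the article) and renormalize the rest."""
    weights = {"security": 0.34}
    others = [k for k in BASE_WEIGHTS if k != "security"]
    base_rest = sum(BASE_WEIGHTS[k] for k in others)
    for k in others:
        weights[k] = BASE_WEIGHTS[k] / base_rest * (1.0 - 0.34)
    return weights

def trust_score(dim_scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of per-dimension scores in [0, 1]."""
    return sum(dim_scores[k] * weights[k] for k in weights)

def tier(score: float) -> str:
    """Map a score to a trust tier; thresholds are assumed."""
    if score >= 0.8:
        return "Verified"
    if score >= 0.5:
        return "Established"
    return "New"
```

The point of a fixed weighted sum is the last sentence above: no manual judgment, identical scoring for every repo.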

The MCP Registry was designed for downstream curators to add quality signals. This is that quality signal.

What Comes Next

We're building this in public.

If you care about AI agent security, I'd love to hear what you think. What signals matter most to you when evaluating a skill?

Score your AI skills

Paste any GitHub repo URL and get a trust score in seconds.

Scan Now — Free