ClawHavoc and the Missing Trust Layer

By Michael Browne · March 8, 2026 · 5 min read

In January 2026, security researchers discovered 1,184 malicious AI skills on a major agent marketplace. One trojan skill — disguised as a weather tool — was downloaded 7,700 times before anyone noticed.

It stole API keys. Crypto wallets. Browser credentials. All from developers who thought they were installing something harmless.

The attack was called ClawHavoc. And it worked because there's no trust layer for AI skills.

The Gap Nobody Noticed

Think about how you evaluate dependencies in the rest of the software stack:

npm package:

- Downloads/week visible
- Stars and forks on GitHub
- Last publish date
- Known vulnerability alerts
- License clearly displayed

AI skill / MCP server:

- A README you skimmed for 10 seconds
- Maybe a star count
- No security assessment
- No quality score
- No maintenance signal

When you install an npm package, you can check weekly downloads, open issues, last update, and known CVEs. When you install an AI skill into Claude Code or Cursor, you get a README and a promise.

And you're giving that skill access to your terminal, your code, your environment variables — permissions that npm packages rarely need.
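The npm side of that comparison is queryable today. As a minimal sketch: the public npm download-counts endpoint at api.npmjs.org returns last week's downloads as JSON. The helper names here are my own; only the endpoint itself is real.

```python
import json
from urllib.request import urlopen

# Public npm download-counts API (no auth required).
DOWNLOADS_API = "https://api.npmjs.org/downloads/point/last-week/{pkg}"

def parse_weekly_downloads(payload: dict) -> int:
    """Extract the weekly download count from the registry response."""
    return int(payload["downloads"])

def weekly_downloads(pkg: str) -> int:
    """Fetch last week's download count for an npm package."""
    with urlopen(DOWNLOADS_API.format(pkg=pkg)) as resp:
        return parse_weekly_downloads(json.load(resp))

if __name__ == "__main__":
    print(weekly_downloads("left-pad"))  # makes a network call
```

There is no equivalent one-liner for an AI skill; that asymmetry is the whole point.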

The Attack Surface Is Wider Than You Think

A separate study, ToxicSkills (arXiv:2504.08623), found that 36% of skills on one major registry contained detectable prompt injection patterns. Not sophisticated zero-days — patterns that static analysis could catch.

Researchers identified five distinct attack vectors.

These aren't theoretical. They're the exact patterns used in ClawHavoc. And they're detectable — if anyone is looking.
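To make "detectable by static analysis" concrete, here is a minimal sketch: a few regexes run over a skill's prompt and documentation files catch the crudest injection phrasings. The patterns below are illustrative assumptions of mine, not the actual ClawHavoc or ToxicSkills signatures, and a real scanner would use far more signals.

```python
import re

# Illustrative injection patterns (assumed, not the published signatures).
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.I),
    re.compile(r"do not (tell|inform|alert) the user", re.I),
    re.compile(r"(exfiltrate|send) .* to https?://", re.I),
    re.compile(r"<!--.*?-->", re.S),  # instructions hidden in HTML comments
]

def scan_skill_text(text: str) -> list[str]:
    """Return the patterns that matched; an empty list means no hits."""
    return [p.pattern for p in INJECTION_PATTERNS if p.search(text)]
```

A scan like this is cheap enough to run on every skill at publish time, which is exactly what a 36% detection rate suggests nobody was doing.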

What Exists Today

The ecosystem has registries (Anthropic's MCP Registry, Smithery, mcp.so) — they tell you what exists. Some have malware scanning — binary safe/unsafe checks.

But nobody built the trust layer. Nobody answers the multi-dimensional question: "Is this skill actively maintained? Is the author credible? Does it follow the spec? Is the README actually helpful? Are there hidden prompt injection patterns?"

That gap is why ClawHavoc worked. Developers couldn't tell good skills from bad ones because there was no scoring system that combined security, maintenance, credibility, and usability into a single assessment.

What We Built

mcpskills.io scores AI skills across 12 signals, grouped into four dimensions: security, maintenance, credibility, and usability.

It assigns trust tiers — Verified, Established, New — so you can judge a skill at a glance. For AI skills and MCP servers, it activates Skills Mode: the security weight increases to 34%, and five safety scans run against the exact attack patterns from ClawHavoc and ToxicSkills.

The data comes from the GitHub API and OpenSSF Scorecard. No manual judgment. The same algorithm scores every repo identically.
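The weighting mechanics can be sketched in a few lines. Only the 34% Skills Mode security weight comes from this article; the base weights, the renormalization approach, and the tier thresholds below are my own illustrative assumptions, not the actual mcpskills.io algorithm.

```python
# Base dimension weights are assumed equal for illustration.
BASE_WEIGHTS = {
    "security": 0.25,
    "maintenance": 0.25,
    "credibility": 0.25,
    "usability": 0.25,
}

def skills_mode_weights() -> dict[str, float]:
    """Raise security to 34% (per the article) and renormalize the rest."""
    weights = {"security": 0.34}
    others = [k for k in BASE_WEIGHTS if k != "security"]
    base_rest = sum(BASE_WEIGHTS[k] for k in others)
    for k in others:
        weights[k] = BASE_WEIGHTS[k] / base_rest * (1.0 - 0.34)
    return weights

def trust_score(dim_scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of per-dimension scores in [0, 1]."""
    return sum(dim_scores[k] * weights[k] for k in weights)

def tier(score: float) -> str:
    """Map a score to a trust tier; thresholds are assumed."""
    if score >= 0.8:
        return "Verified"
    if score >= 0.5:
        return "Established"
    return "New"
```

The point of a fixed weighted sum is the last sentence above: no manual judgment, identical scoring for every repo.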

The MCP Registry was designed for downstream curators to add quality signals. This is that quality signal.

What Comes Next

We're building this in public.

If you care about AI agent security, I'd love to hear what you think. What signals matter most to you when evaluating a skill?

Score your AI skills

Paste any GitHub repo URL and get a trust score in seconds.

Scan Now — Free