Trail of Bits Is Right About Skill Scanners

By Michael Browne · June 4, 2026 · 7 min read

Trail of Bits bypassed five AI skill scanners, and three of its four attacks took less than an hour to conceive and build. Their conclusion is correct: no single content scan can reliably catch a malicious skill.

That does not break the case for vetting skills. It clarifies it. A scan is the floor — it removes the obvious threats. The trust decision rests on something an attacker cannot fabricate in an afternoon: a real maintenance history, real adoption, a credible publisher, and no known vulnerabilities. That is the layer we build.

On June 3, 2026, Trail of Bits published “The Sorry State of Skill Distribution.” In it, they bypassed five separate skill scanners across three platforms — and most of the attacks took less than an hour to build.

If you sell trust in the AI skill ecosystem, the honest move is to read that research carefully and ask whether it indicts what you do. I did. It does not refute the trust-layer model. It is the clearest argument for it I have seen this year.

What Trail of Bits actually did

These are serious tools built by serious teams. The point of the research is not that any one vendor is careless. It is that the entire category of “scan the contents and decide if it is malicious” has a structural ceiling. Trail of Bits demonstrated it with four techniques against five scanners:

Technique	Target	Why it worked
Newline padding	ClawHub	Prepending roughly 100,000 newlines before the malicious code forced the scanner to truncate the file and never reach the payload.
Binary archive hiding	skills.sh, Cisco	A malicious shell script was embedded inside a binary archive (a ZIP-backed document), so pattern matching saw a benign-looking file.
Bytecode poisoning	skills.sh, Cisco	Precompiled Python bytecode behaved differently from its readable source — the xz-utils pattern, where the artifact is not the code you reviewed.
Plain-language prompt injection	All scanners	Politely worded text in the skill manifest convinced LLM analyzers that a malicious configuration (redirecting a package registry) was ordinary corporate setup.

The scanners bypassed were ClawHub’s malicious-skill detector (which integrates VirusTotal), Cisco’s open-source skill-scanner, and all three scanners wired into the skills.sh platform: Gen Agent Trust Hub, Socket, and Snyk. In Trail of Bits’ words, “it took us less than an hour to conceive and implement three of the four malicious skills.”
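
The newline-padding row is the easiest of the four to picture in code. The sketch below is a toy, not any vendor's implementation: `naive_scan`, the 64 KiB limit, and the pattern list are all assumptions made up for illustration. It shows only the general failure mode, where a scanner that inspects a fixed-size prefix never reaches a payload pushed past that prefix.

```python
# Toy illustration only. The limit and patterns are hypothetical,
# not the real values used by ClawHub or any other scanner.
MAX_SCAN_BYTES = 64 * 1024
SUSPICIOUS_PATTERNS = (b"os.system(", b"| sh", b"curl ")


def naive_scan(content: bytes, limit: int = MAX_SCAN_BYTES) -> bool:
    """Return True if a suspicious pattern appears in the first `limit` bytes."""
    window = content[:limit]
    return any(pattern in window for pattern in SUSPICIOUS_PATTERNS)


def full_scan(content: bytes) -> bool:
    """Return True if a suspicious pattern appears anywhere in the content."""
    return any(pattern in content for pattern in SUSPICIOUS_PATTERNS)


def was_truncated(content: bytes, limit: int = MAX_SCAN_BYTES) -> bool:
    """Return True if the prefix scan did not cover the whole file."""
    return len(content) > limit


# A harmless stand-in for a payload, then the same bytes behind 100,000 newlines.
payload = b"import os\nos.system('echo pretend-this-is-malicious')\n"
padded = b"\n" * 100_000 + payload

print(naive_scan(payload))    # True: the pattern sits inside the scanned prefix
print(naive_scan(padded))     # False: the prefix contains only newlines
print(full_scan(padded))      # True: the payload is still there
print(was_truncated(padded))  # True: the scan result should say it was partial
```

The design point is the last line: a scanner that truncates should report "partially inspected" rather than "clean", so the gap is at least visible to whoever reads the result.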

Why their conclusion is structurally true

Trail of Bits states it plainly:

“No amount of scanning or LLM analysis can reliably detect malicious content in agent skills.”

That is not pessimism. It follows from three properties of skills that no scanner can engineer away:

The attack surface is enormous. A skill is code plus natural language plus binary assets plus dependencies. A malicious payload can hide in any of them, or in the seam between them.
Maliciousness is context-dependent. “Point npm at this registry” is legitimate inside one company and an attack in another. No static rule separates them, and an LLM asked to judge can be talked out of it.
The attacker moves second. Scanners are public. An attacker can iterate against them in a tight loop until something passes. Defense that the adversary can test for free is defense on a clock.

So the right reading is not “scanning is worthless.” Scanning removes the lazy and the obvious, which is real value. The reading that fails is “a passed scan means safe.” That equation is what Trail of Bits broke.

This validates the trust-layer thesis

Read to the end of the post and Trail of Bits makes a recommendation. Stop outsourcing the judgment to a single automated tool. Prefer curated distribution, version pinning, and a human-controlled approval gate.

That recommendation is the trust layer. The whole reason MCP Skills scores a project across four dimensions — not just its file contents — is that contents are the part an attacker controls, and provenance is the part they do not.

Dimension	What it measures	Can an attacker fake it in an hour?
Alive	Commit recency, release cadence, issue responsiveness.	No. A real maintenance history takes time to accrue.
Legit	Author credibility, community adoption, contributor diversity, downloads.	No. Stars, dependents, and a credible identity are not minted on demand.
Solid	Security posture, dependency health, known vulnerabilities, tool safety, supply-chain safety.	Partly. Tool safety is a content scan, with the limits above — which is why it is one signal, not the verdict.
Usable	README quality, spec compliance, license clarity.	Easy to fake, low weight — useful as a tie-breaker, never as proof.

A perfectly obfuscated malicious skill — newline-padded, bytecode-poisoned, politely worded — still arrives as a brand-new repository from an account with no history, no adoption, and no credible author. It lands in the lowest tier. Not because we detected the payload. Because it has no trust to show.
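
A small sketch makes the arithmetic concrete. Everything numeric here is an assumption: the article does not publish the real weights or formula, so `WEIGHTS`, `trust_score`, and the two example projects are hypothetical and exist only to show why a clean scan cannot carry a project with no provenance.

```python
# Hypothetical weights for illustration; the real MCP Skills weights
# and formula are not stated in this article.
WEIGHTS = {"alive": 0.30, "legit": 0.35, "solid": 0.25, "usable": 0.10}


def trust_score(dimensions: dict[str, float]) -> float:
    """Combine per-dimension scores in [0, 1] into a 0-100 weighted score."""
    total = sum(WEIGHTS[name] * dimensions[name] for name in WEIGHTS)
    return round(100 * total, 1)


# A brand-new repository: passes the content scan and has a tidy README,
# but has almost no maintenance history and no adoption.
throwaway = {"alive": 0.05, "legit": 0.0, "solid": 0.9, "usable": 1.0}

# An established project with an ordinary security posture and a so-so README.
established = {"alive": 0.9, "legit": 0.85, "solid": 0.7, "usable": 0.6}

print(trust_score(throwaway))
print(trust_score(established))
print(trust_score(throwaway) < trust_score(established))  # True
```

Under these assumed weights, the throwaway project cannot climb out of the bottom of the range however well its payload is hidden, because the dimensions it can fake quickly carry the least weight.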

Where this leaves our own scanner

I would rather say this directly than have a reader infer it. MCP Skills runs a content-safety check, the tool_safety signal, and Trail of Bits’ critique applies to it as much as to any scanner on their list. A pattern-based check that inspects source files will not see a payload hidden in a binary archive, and a regex does not read a politely worded instruction as hostile.

That is a known limitation, and it is the reason tool safety was designed as one of fifteen signals carrying a fraction of the weight — never the whole decision. The product was built on the assumption Trail of Bits just proved: a scan can be bypassed, so a scan must never be the only thing standing between an agent and a malicious skill.
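
One way to keep that limitation visible, sketched here as an assumption rather than a description of how tool_safety works, is to list the files a source-level check cannot read so they are reported as "not inspected" instead of silently counted as clean. The helper names, the suffix list, and the magic-byte check are all hypothetical.

```python
from pathlib import Path

# ZIP local-file-header and empty-archive signatures. ZIP-backed documents
# begin with the same bytes as a plain .zip file.
ZIP_MAGIC = (b"PK\x03\x04", b"PK\x05\x06")

# Hypothetical, deliberately short list of suffixes a source scan cannot read.
OPAQUE_SUFFIXES = {".pyc", ".pyo", ".so", ".dll", ".dylib"}


def is_opaque(path: Path) -> bool:
    """Return True if a text-pattern scan cannot meaningfully inspect this file."""
    if path.suffix.lower() in OPAQUE_SUFFIXES:
        return True
    with path.open("rb") as handle:
        head = handle.read(4)
    return head in ZIP_MAGIC


def uninspected_files(skill_dir: Path) -> list[Path]:
    """List every file under skill_dir that should be reported as not inspected."""
    return sorted(p for p in skill_dir.rglob("*") if p.is_file() and is_opaque(p))
```

Calling `uninspected_files(Path("path/to/skill"))` returns the bytecode, native-library, and ZIP-backed files that a reviewer would need to open by hand or treat as a reason for extra scrutiny. It detects nothing malicious by itself; it only narrows what a "clean" result is allowed to mean.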

The honest version of trust is layered. Catch the obvious with a scan. Decide the rest with provenance, reputation, and a human-controlled gate. Anyone selling a single green checkmark as proof of safety is selling the thing Trail of Bits just disproved.

What actually holds when scanning fails

If you take one practical thing from the Trail of Bits research, make it this checklist. None of it depends on a scanner being perfect.

Weigh the publisher, not just the package

Who shipped it? A credible, established maintainer with a track record is a stronger signal than any single clean scan of the contents.

Require real adoption and maintenance

Genuine dependents, a live commit history, and responsive issues are expensive to fake. A skill that appeared yesterday with none of these deserves scrutiny no matter how clean it looks.

Check known vulnerabilities at the installed version

Query real feeds — OSV.dev, CISA KEV, EPSS — for the exact version you are about to install, not the repo in the abstract.
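
As a minimal sketch of the OSV.dev part of that check, the helper below posts a package name, ecosystem, and exact version to OSV's public query endpoint and returns the matching advisory IDs. The function names are made up for this example, the package in the usage comment is a placeholder, and the sketch ignores pagination and does not cover CISA KEV or EPSS.

```python
import json
import urllib.request

OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def build_osv_query(name: str, ecosystem: str, version: str) -> dict:
    """Build the request body for a version-specific OSV query."""
    return {"version": version, "package": {"name": name, "ecosystem": ecosystem}}


def known_vulnerability_ids(
    name: str, ecosystem: str, version: str, timeout: float = 10.0
) -> list[str]:
    """Return OSV advisory IDs that affect exactly this package version."""
    body = json.dumps(build_osv_query(name, ecosystem, version)).encode("utf-8")
    request = urllib.request.Request(
        OSV_QUERY_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        result = json.load(response)
    # OSV returns an empty object when nothing matches, so "vulns" may be absent.
    return [vuln["id"] for vuln in result.get("vulns", [])]


# Usage (needs network access; the package and version are placeholders):
# ids = known_vulnerability_ids("some-package", "PyPI", "1.2.3")
```

Querying by version rather than by repository is the point: an advisory fixed in a later release still applies to the version you are about to install.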

Pin, review, and sandbox

Pin the exact version, read the manifest and declared permissions, prefer curated and signed sources, and sandbox the first run if the tool touches files, shell, credentials, or the network.
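
Pinning is only as strong as the check behind it. A minimal sketch, assuming you recorded a SHA-256 digest when you first reviewed the artifact: recompute the digest before every install and refuse anything that does not match. The function names and sample bytes are illustrative, not part of any particular installer.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def matches_pin(artifact: bytes, pinned_sha256: str) -> bool:
    """Return True only if the artifact hashes to the digest recorded at review time."""
    return hmac.compare_digest(sha256_hex(artifact), pinned_sha256.lower())


# Stand-in bytes for the artifact you reviewed, and a later copy with one extra line.
reviewed = b"skill-v1.2.3 contents"
pin = sha256_hex(reviewed)
tampered = reviewed + b"\n# one extra line"

print(matches_pin(reviewed, pin))  # True
print(matches_pin(tampered, pin))  # False
```

A digest pin says nothing about whether the reviewed version was safe. It only guarantees that what runs is what you looked at, which is the gap the bytecode-poisoning attack relies on.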

The bigger shift

2026 keeps making the same point from different directions. NVIDIA shipped SkillSpector and paired it with skill cards and signing. Trail of Bits showed that the scanning half of that pipeline, on its own, is porous. Both arrive at the same conclusion: scanning is necessary, scanning is not sufficient, and the install decision needs more than artifact inspection.

That “more” is the trust-layer category: who made this, is it maintained, is it adopted, does it have known vulnerabilities, did a scan find anything, does the license make sense, and has the score moved since last week? MCP Skills exists to answer those as one repeatable decision, for a developer, an agent, or a CI gate, before a skill ever runs.

Bottom line: Trail of Bits did not break trust in AI skills. They broke the idea that a single scan equals trust. The defenses that survive their attacks are provenance, reputation, and a human in the loop — the layer above the scanner.

Frequently asked questions

What did Trail of Bits find about AI skill scanners?

They bypassed five scanners across three platforms — ClawHub, Cisco’s skill-scanner, and the three scanners in skills.sh — using four simple techniques: newline padding, binary-archive hiding, Python bytecode poisoning, and plain-language prompt injection. Their conclusion is that no single content scan can reliably detect a malicious skill.

Can a security scanner detect a malicious AI skill?

It can catch known dangerous patterns, but not a determined attacker. The attack surface spans code, language, and binary data, and the attacker iterates against a public scanner until something passes. Scanning is necessary and removes obvious threats. It is not sufficient on its own.

Does Trail of Bits’ research apply to MCP Skills?

Trail of Bits did not test MCP Skills. But the critique of content scanning applies to the tool-safety check inside our score as much as to any scanner, which is exactly why tool safety is one of fifteen signals rather than the whole decision. A scan that can be bypassed should never be the only safeguard.

If scanners can be bypassed, how do you decide whether a skill is safe to install?

Weigh provenance and reputation an attacker cannot fabricate quickly: publisher credibility, real maintenance history, genuine adoption, contributor diversity, and known vulnerabilities at the installed version. An obfuscated payload from a throwaway account still has no track record to show, so it scores low regardless of how well the payload is hidden.

What is a trust layer?

A trust layer scores whether a project is worth depending on before it reaches an agent, combining maintenance health, author credibility, adoption, security posture, known vulnerabilities, documentation, and license clarity into a single pre-install decision. It complements deep artifact scanners rather than replacing them.

What should I do before installing an AI skill or MCP server?

Check a trust score to decide whether the project is worth attention. Run a deep artifact scanner for high-privilege tools. Then pin the version, review the manifest and permissions, prefer curated and signed sources, and sandbox the first run if the server needs filesystem, shell, credential, or network access.
