The Trust Middle — State of MCP Server Security, June 2026

Two months ago I scored 202 servers from the official MCP Registry and found that 83% carried a disqualifier flag. That was a sample. This is the population: 2,233 MCP servers, AI skills, and supporting packages, scored through the same production engine, pulled from four registries at once — the MCP Registry, ClawHub, npm, and GitHub. Every score is in the public dataset at /data/latest.json (CC BY 4.0), so you can check my arithmetic.

The headline isn't a scandal. There's no new ClawHavoc in this data. The headline is a distribution: the MCP ecosystem has no top end. Out of 2,233 projects, not one scores above 9 out of 10 — and the best-resourced official SDKs on earth top out at 8.97. Below them, two-thirds of everything published clusters in a flat, undifferentiated 5-to-7 band. I've started calling it the trust middle, and once you see it you can't unsee it.

TL;DR

2,233 scored · 10.2% Verified · 0% scoring above 9 · 8.2% MCP-native Verified

Mean composite: 5.78/10. Median: 5.71. Tiers: 227 Verified (10.2%), 1,689 Established (75.6%), 253 New (11.3%), 64 Blocked (2.9%). The dataset skews GitHub-backed (96.1%) with a thin npm-only slice (3.9%). The single most important cut in the whole report is the one between general developer tools that happen to speak MCP and purpose-built MCP servers and skills — they behave like two different ecosystems.
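The tier percentages above can be rechecked from the four counts alone. This is a minimal sketch that uses only the numbers quoted in this section; the names `TIER_COUNTS` and `tier_shares` are illustrative and are not part of the mcpskills engine or the published dataset schema.

```python
# Tier counts exactly as quoted in the TL;DR above.
TIER_COUNTS = {
    "Verified": 227,
    "Established": 1689,
    "New": 253,
    "Blocked": 64,
}


def tier_shares(counts: dict) -> dict:
    """Return each tier's share of the total in percent, rounded to one decimal."""
    total = sum(counts.values())
    return {tier: round(100 * n / total, 1) for tier, n in counts.items()}


total = sum(TIER_COUNTS.values())
shares = tier_shares(TIER_COUNTS)
print(total, shares)
# 2233 {'Verified': 10.2, 'Established': 75.6, 'New': 11.3, 'Blocked': 2.9}
```

The four counts sum to 2,233, so the tiers partition the whole dataset with nothing left over.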

The Ceiling Nobody Passes

Start with the distribution, because it's the finding everything else hangs off. Here's where all 2,233 composite scores land, bucketed into five bands:

Band	Projects	Share
0–3	0	0%
3–5	494	22.1%
5–7	1,489	66.7%
7–9	250	11.2%
9–10	0	0%

Two empty edges and a giant hump. Nothing scores below 3, because the genuinely dangerous projects don't get a low grade — they get hard-gated to Blocked and removed from the curve entirely. And nothing scores above 9, which is the part that surprised me. I expected the top of the MCP world to have a few near-perfect entries. It doesn't.
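For anyone rebuilding the histogram from the public dataset, the bucketing step can be sketched as follows. The band edges come from the chart above. How a score that lands exactly on an edge is assigned is an assumption here (lower edge inclusive, so 5.0 counts as 5–7), and `band_for` and `histogram` are hypothetical helper names.

```python
from bisect import bisect_right
from collections import Counter

# Interior band edges and labels, taken from the chart above.
BAND_EDGES = [3.0, 5.0, 7.0, 9.0]
BAND_LABELS = ["0–3", "3–5", "5–7", "7–9", "9–10"]


def band_for(score: float) -> str:
    """Map a 0-10 composite score to its band (assumed: lower edge inclusive)."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"score out of range: {score}")
    # bisect_right counts how many edges are <= score, which is the band index.
    return BAND_LABELS[bisect_right(BAND_EDGES, score)]


def histogram(scores) -> Counter:
    """Count how many scores fall into each band."""
    return Counter(band_for(s) for s in scores)


# The eight ceiling scores from the table in this section.
ceiling = [8.97, 8.81, 8.75, 8.53, 8.48, 8.47, 8.22, 8.19]
print(histogram(ceiling))
```

All eight ceiling projects fall in the 7–9 band, which is the point of the section: the top of the dataset stops short of 9.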

Look at who sits at the actual ceiling:

Project	Score	Tier	Stars
openai/openai-node	8.97	Verified	10,852
auth0/nextjs-auth0	8.81	Verified	2,296
prisma/prisma	8.75	Verified	45,849
anthropics/anthropic-sdk-typescript	8.53	Verified	1,817
supabase/supabase	8.48	Verified	99,959
stripe/stripe-node	8.47	Verified	4,384
ollama/ollama	8.22	Verified	166,624
langchain-ai/langchainjs	8.19	Verified	17,411

These are the most mature, best-funded, most-audited projects that touch this ecosystem: OpenAI's official SDK, Stripe, Prisma, Supabase, Ollama at 166,000 stars. They are doing everything right, and they land between 8.19 and 8.97. The remaining gap to 9+ isn't laziness; it's the set of signals that even great open-source projects rarely max out all at once: published security policies, signed releases, OpenSSF Scorecard adoption, branch protection, deep contributor diversity, fast issue response, and a clean dependency tree. The 9–10 band isn't unreachable. It's just empty today, which tells you how much headroom the whole ecosystem still has.

The Fat Middle

1,489 of 2,233 projects — 66.7% — score between 5.0 and 7.0. This is the band where you cannot tell quality from a glance, a star count, or a registry badge. A 5.4 and a 6.8 look identical on a marketplace listing. Both install with one command. Neither is broken, neither is excellent, and the difference between them is exactly the thing a human shopper never checks: license clarity, bus factor, whether the CI workflow leaks secrets, whether a tool quietly shells out.

The middle is the actual problem the ecosystem has. Outright malicious skills are rare and, increasingly, caught. What's everywhere is the merely-mediocre — projects good enough to publish, popular enough to install, and unproven enough that you're taking on risk you can't see. The trust middle is where install decisions go to be guessed.

Two Ecosystems Wearing One Name

Purpose-built MCP servers and AI skills are markedly less trustworthy than the general developer tools that merely speak MCP. The mcpskills engine runs in two modes: Standard Mode for general repositories, and Skills Mode, which auto-activates for purpose-built MCP servers and AI skills (detected via server.json, SKILL.md, and MCP keywords) and turns on the safety scanner plus skill-spec compliance. Split the 2,233 projects by mode and the averages tell two completely different stories:

Mode	Projects (n)	Reach Verified	Blocked
General SDKs & tools	310	24.8%	0%
Purpose-built MCP skills	1,840	8.2%	3.5%

A general developer tool in this dataset is three times more likely to earn Verified than a purpose-built MCP server — and the MCP-native projects are the only ones tripping the Blocked tier at all (64 of them, 3.5%). The stuff written specifically to be plugged into your agent is, on average, the least trustworthy stuff in the catalog.
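The "three times" figure and the 3.5% Blocked share are plain ratios of the numbers quoted in this section. A small sketch of that arithmetic, with illustrative variable names:

```python
# Figures quoted in this section.
GENERAL_VERIFIED_PCT = 24.8   # general SDKs & tools reaching Verified
SKILLS_VERIFIED_PCT = 8.2     # purpose-built MCP skills reaching Verified
SKILLS_N = 1840               # purpose-built projects in the split
BLOCKED_N = 64                # Blocked projects, all of them purpose-built

# How many times more likely a general tool is to reach Verified.
verified_ratio = GENERAL_VERIFIED_PCT / SKILLS_VERIFIED_PCT

# Blocked share within the purpose-built group, in percent.
skills_blocked_pct = 100 * BLOCKED_N / SKILLS_N

print(round(verified_ratio, 2), round(skills_blocked_pct, 1))
# 3.02 3.5
```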

That's not a slur on MCP authors; it's the maturity curve. The Stripes and Prismas have a decade of governance baked in — multiple maintainers, security policies, release signing — and they grew MCP-compatible. A purpose-built MCP server is usually someone's recent project: one author, a handful of stars, no second contributor, a LICENSE file that may or may not have been added. The ecosystem that exists because of MCP is younger and thinner than the ecosystem that merely speaks MCP, and the trust scores draw that line cleanly.

The Floor: 64 Blocked, Nothing Below 3

Sixty-four projects (2.9%) are hard-gated to Blocked. A Blocked tier doesn't mean confirmed-malicious — the safety scanner flags patterns, not intent — but it means at least one disqualifier fired: static analysis caught a dangerous pattern (shell execution on tool arguments, credential-path reads), a CI workflow checked out untrusted PR code, a critical CVE sits unpatched at the installable version, or the project is a single author with effectively no adoption. Every one of the 64 is Skills Mode. None are general SDKs.
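The hard gate described above is an any-of rule: one fired disqualifier is enough, regardless of the composite score. A minimal sketch of that logic, assuming a hypothetical `Findings` record with one boolean per disqualifier named in the paragraph. The field names are illustrative and are not the engine's real schema.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Findings:
    """Hypothetical per-project findings, one flag per disqualifier above."""
    dangerous_static_pattern: bool = False        # e.g. shell exec on tool args
    ci_checks_out_untrusted_pr: bool = False      # workflow runs untrusted PR code
    critical_cve_at_installable_version: bool = False
    single_author_no_adoption: bool = False


def fired_disqualifiers(findings: Findings) -> list:
    """Return the names of every disqualifier that fired, in field order."""
    return [name for name, fired in asdict(findings).items() if fired]


def is_blocked(findings: Findings) -> bool:
    """A project is hard-gated to Blocked if at least one disqualifier fired."""
    return bool(fired_disqualifiers(findings))


example = Findings(ci_checks_out_untrusted_pr=True)
print(is_blocked(example), fired_disqualifiers(example))
# True ['ci_checks_out_untrusted_pr']
```

Because the gate sits outside the score, a Blocked project never shows up as a "low score", which is why the bottom of the distribution is empty.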

The reassuring half of this: the floor is clean. Zero projects scored below 3.0, because the disqualifier system pulls the dangerous ones out of the distribution rather than letting them sit as "low scores." The unreassuring half: a marketplace shows you none of this. All 64 install with the same one-liner as the 8.97.

Licensing Hasn't Moved

Roughly 490 of the 2,233 scored projects (21.9%) carry no clear license — a figure essentially unchanged from the 21% measured in April's smaller sample.

Those 490 projects either have no license at all or carry a NOASSERTION value, meaning GitHub couldn't resolve the license to a real SPDX identifier. MIT covers 54.2% of the field and Apache-2.0 another 16.7%, but the no-license fifth has held steady even as the dataset grew roughly tenfold. It's the most fixable disqualifier in the whole system (one file), and it remains the most common one. Publishing speed is still outrunning governance.
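The "no clear license" test can be sketched as a check on the SPDX identifier GitHub reports for a repository. The function name `has_clear_license` is hypothetical; the two failing cases (no license, or `NOASSERTION`) are the ones described above.

```python
from typing import Optional


def has_clear_license(spdx_id: Optional[str]) -> bool:
    """True only when the repo resolves to a real SPDX identifier."""
    if spdx_id is None:
        return False                      # no license detected at all
    cleaned = spdx_id.strip()
    # NOASSERTION means a license file exists but could not be identified.
    return cleaned != "" and cleaned.upper() != "NOASSERTION"


# Share of the dataset without a clear license, from the counts above.
no_license_pct = round(100 * 490 / 2233, 1)

print(has_clear_license("MIT"), has_clear_license("NOASSERTION"), no_license_pct)
# True False 21.9
```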

What This Means for You

If you install MCP servers:

Stop reading the registry badge as a trust signal and stop reading star counts as a quality signal — the top of this dataset proves popularity and trust are different axes. Two-thirds of what you'll install sits in the 5–7 middle where the difference is invisible to the eye. Run the pre-install audit, or just paste the repo into the scanner, before you wire it into an agent.

If you publish MCP servers:

You're competing in the most crowded, least-differentiated band in the catalog. Three moves lift you out of it: add a LICENSE (you'll clear the 21.9% trap), land a second contributor (you'll exit the bus-factor cohort), and add a security policy plus pinned CI. None is hard; together they're most of the distance from the middle to Verified.

If you operate a registry or marketplace:

A required-LICENSE check at publish time would close the no-license gap that currently affects roughly a fifth of the catalog. Surfacing a trust score next to the install button would do more for ecosystem hygiene than any post-install scanner, because the danger isn't the rare malicious entry; it's the 66.7% nobody can evaluate at a glance.

What This Data Doesn't Tell You

Honest limitations: This is a static, project-level snapshot. The engine scores the project around a server — repo health, author signals, security posture, dependency and vulnerability hygiene — not the runtime behavior of the server when an agent calls its tools. A project can score 8 and still leak a credential at runtime or honor a prompt-injected instruction. This report is also a single point in time: the version archive that will let me show change over months is young, so I'm deliberately not claiming any trend here — only a baseline. Treat the score as a strong prior, not a verdict.

The dataset is everything currently in the public score cache (2,233 entries as of June 9, 2026), assembled by the nightly crawler across the MCP Registry, GitHub topic and keyword search, npm, and prior ClawHub batch scans. It is not a clean random sample of a defined frame — it's the working catalog — so read the percentages as the shape of "what's discoverable and scoreable," which is also exactly the shape a developer browsing for an MCP server actually encounters.

The Baseline, and What Comes Next

This is edition one. The version archive is now watching all 2,233.

Every project in this dataset is now tracked for version-level change — new tools slipped into a server, install scripts that appear overnight, maintainer flips, same-version republishes, newly added network endpoints. This report is the baseline; future editions will track the deltas the moment they appear. That's the part no point-in-time scanner can do, and it ships weekly as The Trust Diff.

Methodology

Scoring: every project ran through the production mcpskills.io engine — the same 15-signal algorithm available at mcpskills.io, across four dimensions (Alive, Legit, Solid, Usable). Skills Mode auto-detects MCP servers and AI skills via server.json / SKILL.md / MCP keywords and adds the static safety scanner, skill-spec compliance, and tool-safety checks. The known_vulnerabilities signal queries OSV.dev (unified GHSA + npm + PyPA + Go + RustSec), CISA KEV (actively-exploited), and FIRST.org EPSS (exploit probability) at the currently-installable version.
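The mode switch can be pictured as a simple any-of check over the file tree and the project's keywords. This is a sketch under stated assumptions, not the engine's detection code: the keyword set is a guess at what "MCP keywords" covers, matching on the file's base name anywhere in the tree is an assumption, and `detect_mode` is a hypothetical name.

```python
from pathlib import PurePosixPath

# Marker files named in the methodology.
MARKER_FILES = {"server.json", "SKILL.md"}
# Assumed keyword list; the real engine's list is not given in this report.
MCP_KEYWORDS = {"mcp", "mcp-server", "model-context-protocol"}


def detect_mode(file_paths, keywords) -> str:
    """Return "skills" if any marker file or MCP keyword is present, else "standard"."""
    has_marker = any(PurePosixPath(p).name in MARKER_FILES for p in file_paths)
    has_keyword = any(k.strip().lower() in MCP_KEYWORDS for k in keywords)
    return "skills" if (has_marker or has_keyword) else "standard"


print(detect_mode(["README.md", "skills/search/SKILL.md"], []))
print(detect_mode(["src/index.ts", "package.json"], ["payments", "sdk"]))
```

Under these assumptions the first call lands in Skills Mode because of the `SKILL.md` file, and the second stays in Standard Mode.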

Dataset: 2,233 entries from the public score cache, generated 2026-06-09, published at /data/latest.json under CC BY 4.0 — also available as a Hugging Face dataset. Full algorithm: /methodology.

Companion reports: State of MCP Security — April 2026 · State of ClawHub Trust — April 2026.

Data sources

Every score in this report is reproducible from public data. The trust algorithm is an opinionated combination; the inputs are not.

MCP Registry API — official catalog of Model Context Protocol servers. registry.modelcontextprotocol.io
GitHub REST API — repository metadata, contributor graph, commit cadence, releases, issue responsiveness, license detection, file tree (for SKILL.md / server.json detection and source scanning). docs.github.com/en/rest
OpenSSF Scorecard — security-posture signals (branch protection, signed releases, dependency-update tooling, dangerous workflow patterns). scorecard.dev
OSV.dev — unified vulnerability database queried at the currently-installable version. osv.dev
CISA Known Exploited Vulnerabilities (KEV) — confirmed in-the-wild exploitation; any KEV CVE hard-gates the tier to blocked. cisa.gov
FIRST.org EPSS — 30-day exploit probability, used to weight non-KEV vulnerabilities. first.org/epss
npm Registry — package metadata, weekly downloads, maintainer graph; used for partial scoring of npm-published projects without GitHub source. docs.npmjs.com
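To make the vulnerability inputs concrete, here is a hedged sketch of a version-pinned OSV.dev lookup followed by the KEV hard gate. The request shape follows OSV's public `POST /v1/query` endpoint. Everything else is an assumption for illustration: the helper names, the idea of passing the KEV catalog in as a plain set of CVE IDs, and the placeholder IDs in the example. EPSS weighting is left out because the report does not give its formula.

```python
import json
import urllib.request

OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def osv_query_payload(name: str, version: str, ecosystem: str = "npm") -> dict:
    """Build the request body for a version-pinned OSV query."""
    return {"version": version, "package": {"name": name, "ecosystem": ecosystem}}


def query_osv(name: str, version: str, ecosystem: str = "npm", timeout: float = 10.0) -> list:
    """Query OSV.dev and return the list of vulnerability records (network call)."""
    body = json.dumps(osv_query_payload(name, version, ecosystem)).encode("utf-8")
    request = urllib.request.Request(
        OSV_QUERY_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response).get("vulns", [])


def cve_ids(vulns) -> set:
    """Collect CVE identifiers from each record's id and aliases."""
    found = set()
    for vuln in vulns:
        for candidate in [vuln.get("id", ""), *vuln.get("aliases", [])]:
            if candidate.startswith("CVE-"):
                found.add(candidate)
    return found


def kev_gate(vulns, kev_cves: set) -> bool:
    """True if any vulnerability is in the KEV set, which hard-gates to Blocked."""
    return bool(cve_ids(vulns) & kev_cves)


# Placeholder records with made-up IDs, standing in for a real OSV response.
sample_vulns = [{"id": "GHSA-xxxx-xxxx-xxxx", "aliases": ["CVE-2099-0001"]}]
print(kev_gate(sample_vulns, {"CVE-2099-0001"}))
# True
```

Querying at a specific version is what keeps the signal tied to what a user would actually install rather than to the project's full advisory history.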

Prior research that frames this work

Trail of Bits — "ClawHavoc" (Jan 2026): 1,184 malicious AI skills on a major marketplace; registry presence is not a trust signal. blog.trailofbits.com
OX Security (Apr 2026): 9 of 11 public MCP registries published a harmless proof-of-concept of a malicious server with no review. ox.security
Snyk — "ToxicSkills" (Apr 2025): 36.82% of sampled skills had at least one security flaw. arxiv.org/abs/2504.03767

