Trust Scoring Methodology

How we evaluate AI skills, MCP servers, and npm packages. Every score is computed from public data — no manual curation, no pay-to-play, no gaming the algorithm.

Philosophy

Stars don't mean safe. Downloads don't mean maintained. Brand recognition doesn't mean the code is audited. We built MCP Skills because the trust signals that developers actually need are scattered across half a dozen platforms and none of them talk to each other.

Our scoring engine pulls signals from multiple public data sources, combines them into a single composite score, and assigns a trust tier. Every repo gets the same treatment — we don't inflate scores for popular tools or penalize newcomers arbitrarily.

We deliberately do not publish the exact weights or formulas behind each signal. Transparent methodology doesn't require giving adversaries a blueprint for gaming it. What we do publish: every signal name, every dimension, every tier threshold, every disqualifier, and every safety scan pattern. You can see exactly what we evaluate and why — just not the precise math.

Where mcpskills sits in the stack

MCP Skills is the trust layer for AI skills and MCP servers — it scores publishers, source code, dependencies, vulnerabilities, and supply-chain risk so you can decide what to install before running anything locally. The decision happens upstream of configuration. Once you've decided to install something, runtime defenses (sandboxes, manifest scanners, agent-firewall proxies) take over.

If you also use a configured-server scanner like mcp-scan, the two complement each other. mcp-scan inspects what you've already configured (tool descriptions, manifest hashes, runtime requests through its proxy mode); mcpskills scores the package itself before configuration happens, so it tells you which servers are worth installing in the first place. Pick whichever scanner you prefer for runtime defense.

This is why the trust score is a strong prior, not a verdict. It's the input to your install decision. Sandboxing, manifest hashing, and version pinning remain separate controls — covered in our MCP Pre-Install Audit walkthrough.

Data sources

Scores are computed from publicly available data. We never access private repositories, paid APIs, or non-public metadata.

Two scoring modes

Standard Mode (12 signals)

Applied to general-purpose repositories — libraries, frameworks, utilities. Evaluates maintenance activity, author credibility, security hygiene, and documentation quality.

Skills Mode (15 signals)

Auto-detected when the repository contains indicators of an AI skill, MCP server, or agent tool. Skills Mode adds three signals covering tool safety, supply chain safety, and spec compliance, and shifts the scoring weight toward the Solid dimension because AI tools get access to sensitive resources: your terminal, environment variables, and network.

Detection is confidence-based: the engine looks for multiple independent indicators (manifest files, naming patterns, metadata keywords) and only activates Skills Mode when the confidence threshold is met. A weak indicator on its own, such as a keyword match, does not trigger it.

4 Dimensions

Every signal maps to one of four dimensions. Each dimension answers a specific question a developer should ask before installing anything.

Alive
Is anyone home?

Measures whether the project is actively maintained. A repo that hasn't been touched in 6 months, never cuts releases, or ignores open issues is a liability even if the code was good once.

Commit recency, Release cadence, Issue responsiveness

Legit
Can I trust who made this?

Evaluates the people behind the project. A single anonymous author with zero community adoption is higher-risk than an org-backed project with diverse contributors and real download volume.

Author credibility, Community adoption, Contributor diversity, Download adoption

Solid
Is it secure?

The heaviest dimension in Skills Mode. Checks security hygiene, dependency risk, CI/CD supply chain integrity, and — for AI skills — runs 5 pattern-based safety scans against actual source code.

Security posture, Dependency health, Tool safety *, Supply chain safety *

* Skills Mode only

Usable
Can I actually work with this?

Scores the developer experience. A brilliantly-engineered library with no README, no license, and no spec compliance is still unusable in practice.

README quality, Spec compliance *, License clarity

* Skills Mode only — checks for SKILL.md, server manifest, OpenClaw frontmatter

Safety scanning

When Skills Mode activates, the engine reads up to 20 source files from the repository and scans for 5 threat patterns derived from real-world attacks — including ClawHavoc (1,184 malicious skills on a major marketplace) and the ToxicSkills academic research.

Prompt injection
Hidden instructions that override the user's intent or manipulate tool-calling behavior
Shell execution
Unguarded exec/spawn calls that could run arbitrary commands on the host
Credential access
Patterns that read API keys, tokens, or secrets from the environment
Network exfiltration
Outbound requests that could leak sensitive data to external servers
Obfuscated payloads
Base64-encoded strings, eval() calls, or minified code blocks that hide intent

Safety scanning is static analysis — it checks source code patterns, not runtime behavior. A clean scan does not guarantee runtime safety in all contexts. We're transparent about this limitation in our safety guide.
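To make "pattern-based static analysis" concrete, here is a minimal sketch that checks source text against one regular expression per threat category. The regexes, the `THREAT_PATTERNS` table, and the `scan_source` helper are hypothetical stand-ins written for this illustration, not the engine's actual rules, which would need to be broader and tuned against false positives. The 20-file cap mirrors the limit described above.

```python
import re

# Illustrative patterns only: one deliberately simple regex per category.
# These are assumptions for the sketch, not the engine's real rules.
THREAT_PATTERNS = {
    "prompt_injection": re.compile(
        r"ignore (all )?(previous|prior) instructions", re.I),
    "shell_execution": re.compile(
        r"\b(child_process|execSync|subprocess\.(run|Popen)|os\.system)\b"),
    "credential_access": re.compile(
        r"(process\.env|os\.environ)\W{0,3}\w*(KEY|TOKEN|SECRET)", re.I),
    "network_exfiltration": re.compile(
        r"\b(fetch|axios\.post|requests\.post)\s*\(\s*['\"]https?://"),
    "obfuscated_payload": re.compile(
        r"\beval\s*\(|[A-Za-z0-9+/]{80,}={0,2}"),
}


def scan_source(files: dict[str, str], max_files: int = 20) -> dict[str, list[str]]:
    """Return {threat_name: [matching paths]} for at most max_files files."""
    findings: dict[str, list[str]] = {}
    for path, text in list(files.items())[:max_files]:
        for threat, pattern in THREAT_PATTERNS.items():
            if pattern.search(text):
                findings.setdefault(threat, []).append(path)
    return findings


sample = {
    "tool.py": 'import os\nkey = os.environ["API_KEY"]\nos.system("curl " + url)\n',
    "README.md": "A harmless weather tool.",
}
print(scan_source(sample))
```

A match here only says that a pattern appears in the text. Whether the call is guarded, reachable, or legitimate is exactly what this kind of scan cannot decide, which is why a clean result is not a runtime guarantee.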

Trust tiers

Every scored repo receives one of four trust tiers based on its composite score, dimension strength, signal coverage, and the presence of disqualifiers.

Verified
Strong scores across all dimensions. Sufficient signal coverage. No disqualifiers. Safe to build on with confidence.
Established
Solid overall but may have weaker areas. Check the dimension breakdown before depending on it.
New
Below threshold. Could be a new project with limited history, or a repo with real weaknesses. Proceed with awareness.
Blocked
One or more disqualifying conditions detected. We strongly recommend against installing repos in this tier until the underlying issue is resolved.

Disqualifying conditions

  • Archived — the repository is marked as archived by its owner
  • Critical CVE — known critical vulnerability in the dependency tree
  • No license — no license file detected (legal risk for adopters)
  • Dangerous workflow — CI/CD workflow with known exploitable patterns
  • Safety block — tool safety score critically low (Skills Mode only)
  • Supply chain risk — token exfiltration or PR target checkout in CI pipelines
  • Single author + low adoption — one contributor with minimal community validation

Verified program

Repos that clear the trust bar — strong composite score, solid security dimension, sufficient signal coverage, and no disqualifiers — are automatically awarded Verified status. Verified is algorithmic, not pay-to-play, and the same criteria apply to every repo.

Verified repos receive a gold badge that can be embedded in READMEs. Verified status is re-evaluated nightly and automatically revoked if the repo drops below requirements. There is no permanent Verified status — trust is earned continuously.

View the current Verified repos at /verified.

Partial scoring

Some packages don't have a public GitHub repository — they exist only on npm or another registry. For these, MCP Skills generates a partial score using 7 signals from registry metadata: publish recency, publish cadence, download volume, maintainer count, package age, dependency count, and license clarity.

Partial scores are clearly labeled and capped at Established tier. They can never reach Verified because without source code access, safety scanning and supply chain analysis can't run. If you're evaluating a package with a partial score, finding the source repository is the single most valuable step you can take.

What we don't do (yet)

Transparency means being honest about limitations. Here's what MCP Skills does not currently evaluate:

See our roadmap for planned improvements.

Frequently asked questions

What is a trust score?

A trust score is a composite 0–10 number computed across 15 signals grouped into 4 dimensions (Alive, Legit, Solid, Usable). It quantifies how trustworthy a GitHub repo, npm package, MCP server, or OpenClaw skill is — combining maintenance health, author credibility, security posture, and usability into a single comparable number. Scores are deterministic given the same repository state at the same point in time.

What's the difference between Verified, Established, New, and Blocked?

Verified means composite ≥ 7.0 with the Solid dimension ≥ 5.0 and at least 8 sufficient signals — safe to build on. Established means composite ≥ 5.0 — solid choice with caveats worth checking. New means composite < 5.0 — promising but unproven, use with awareness. Blocked applies whenever a hard disqualifier is present (archived repo, no license, safety pattern detected, supply-chain risk, critical/KEV-listed CVE) — do not install.
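The thresholds above can be read as a small decision function. The sketch below is one way to express them; the `ScanResult` type and its field names are hypothetical, and the `partial` flag reflects the rule from the partial scoring section that registry-only scores are capped at Established.

```python
from dataclasses import dataclass, field


@dataclass
class ScanResult:
    composite: float            # composite score on the 0-10 scale
    solid: float                # Solid dimension score
    sufficient_signals: int     # signals with enough data to count
    disqualifiers: list[str] = field(default_factory=list)
    partial: bool = False       # registry-only score with no source repo


def assign_tier(r: ScanResult) -> str:
    # Any hard disqualifier wins over the numeric score.
    if r.disqualifiers:
        return "Blocked"
    # Verified needs all three thresholds and a full (non-partial) scan.
    if (not r.partial and r.composite >= 7.0
            and r.solid >= 5.0 and r.sufficient_signals >= 8):
        return "Verified"
    if r.composite >= 5.0:
        return "Established"
    return "New"


print(assign_tier(ScanResult(8.2, 6.1, 12)))                # Verified
print(assign_tier(ScanResult(8.2, 4.0, 12)))                # Established
print(assign_tier(ScanResult(9.0, 9.0, 15, ["ARCHIVED"])))  # Blocked
```

Under this reading, a repo with a high composite but a Solid score below 5.0 lands in Established rather than Verified, since it still clears the 5.0 composite bar.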

How is Skills Mode detected?

Skills Mode detection is confidence-based. A SKILL.md file in the repo root scores confidence 3 (the highest). A server.json file or MCP-related metadata scores 2. MCP, OpenClaw, or ClawHub keywords in the name, description, or topics score 1. Skills Mode activates when total confidence is ≥ 2, or when two or more independent indicators are present. Once active, it enables the 5 safety scans, the tool_safety signal, skill_spec_compliance scoring, and YAML frontmatter parsing for OpenClaw transparency bonuses.
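A minimal sketch of that activation rule, using the confidence values quoted above. The indicator labels and the `is_skills_mode` function are hypothetical names chosen for this example, and the sketch assumes each indicator category is counted at most once.

```python
# Confidence per indicator category, as quoted in the answer above.
# The dictionary keys are made-up labels for this sketch.
INDICATOR_CONFIDENCE = {
    "skill_md_in_root": 3,
    "server_json_or_mcp_metadata": 2,
    "keyword_match": 1,
}


def is_skills_mode(indicators: set[str]) -> bool:
    """Activate when total confidence >= 2 or two or more indicators are present."""
    known = {name for name in indicators if name in INDICATOR_CONFIDENCE}
    total = sum(INDICATOR_CONFIDENCE[name] for name in known)
    return total >= 2 or len(known) >= 2


print(is_skills_mode({"keyword_match"}))     # False
print(is_skills_mode({"skill_md_in_root"}))  # True
```

Under this reading, a keyword match by itself stays below the threshold, while a SKILL.md file in the repo root clears it.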

What does the Solid dimension measure?

Solid measures security posture across five signals: security_posture (OpenSSF Scorecard adoption + branch protection), dependency_health (transitive dependency hygiene + count), tool_safety (static safety scanning), supply_chain_safety (CI workflow patterns), and known_vulnerabilities (OSV.dev + CISA KEV + FIRST.org EPSS lookup). It is the most heavily weighted dimension in Skills Mode.

What disqualifies a repo from earning a high trust score?

Six disqualifiers can hard-gate a repo to the Blocked tier:

  • ARCHIVED — repo is archived on GitHub
  • NO_LICENSE — no LICENSE file or unrecognized SPDX identifier
  • SAFETY_BLOCK — static analysis flagged dangerous patterns in source files
  • SUPPLY_CHAIN_RISK — CI workflow has token exfiltration patterns or PR-target checkout
  • SINGLE_AUTHOR_LOW_ADOPTION — one contributor + low community signals
  • CRITICAL_CVE — any unpatched critical vulnerability OR any CVE on the CISA KEV catalog

How does scoring work for npm packages without a source GitHub repo?

When a package has no resolvable source repository, MCP Skills falls back to partial scoring using npm metadata only. This produces a 7-signal score (versus 15 for full scoring) covering download adoption, maintainer credibility, dependency count, license clarity, README quality, known vulnerabilities (OSV/KEV), and lifecycle script detection. Partial scores are clearly labeled limited and capped at the Established tier — they cannot earn Verified.

How often are scores refreshed?

A nightly crawler runs at 02:00 UTC discovering new repos from MCP Registry, GitHub topics, GitHub keyword search, npm search, and Smithery, then refreshes stale cache entries. On-demand scans always hit live data; cached results are at most 24 hours old. Monitored repos on paid plans get a dedicated daily re-scan at 08:00 UTC with email alerts if the composite score moves by 0.3 or more or the tier changes.
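The alert condition for monitored repos can be sketched as a single comparison between two scans. The `should_alert` function is a hypothetical helper, and rounding the delta to two decimals is an assumption of this sketch (it presumes scores are reported with at most two decimal places) so that floating-point error does not hide a move of exactly 0.3.

```python
def should_alert(prev_score: float, new_score: float,
                 prev_tier: str, new_tier: str,
                 threshold: float = 0.3) -> bool:
    """Alert when the composite moves by the threshold or the tier changes."""
    # In binary floating point, 7.3 - 7.0 is slightly less than 0.3,
    # so round the delta before comparing it with the threshold.
    delta = round(abs(new_score - prev_score), 2)
    return delta >= threshold or new_tier != prev_tier


print(should_alert(7.0, 7.3, "Verified", "Verified"))     # True
print(should_alert(7.0, 7.2, "Verified", "Verified"))     # False
print(should_alert(7.0, 6.9, "Verified", "Established"))  # True
```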

What data sources are used for vulnerability intelligence?

The known_vulnerabilities signal queries three real-time sources for the currently installable version: OSV.dev (the unified advisory database covering GHSA, npm advisories, PyPA, Go vulndb, and RustSec), CISA Known Exploited Vulnerabilities (the authoritative U.S. federal list of vulnerabilities with confirmed in-the-wild exploitation; any CVE on KEV hard-gates the tier to Blocked), and FIRST.org EPSS (the Exploit Prediction Scoring System, which estimates 30-day exploit probability).
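As a rough sketch of the KEV hard gate, the code below cross-references already-fetched OSV results with a KEV catalog. It makes no network calls. It assumes the request shape of OSV's public v1 query endpoint and the `id`/`aliases` fields of OSV records, plus the `vulnerabilities`/`cveID` fields of the KEV JSON feed; the function names are hypothetical and the CVE and GHSA identifiers are placeholders, not real advisories.

```python
def build_osv_query(name: str, version: str, ecosystem: str = "npm") -> dict:
    """Body for a POST to OSV's v1 query endpoint (shape assumed from its public docs)."""
    return {"package": {"name": name, "ecosystem": ecosystem}, "version": version}


def kev_hits(osv_vulns: list[dict], kev_catalog: dict) -> list[str]:
    """Return CVE IDs found both in the OSV results and in the KEV catalog."""
    kev_ids = {entry["cveID"] for entry in kev_catalog.get("vulnerabilities", [])}
    hits = set()
    for vuln in osv_vulns:
        # An OSV record may carry the CVE as its own id or as an alias.
        ids = [vuln.get("id", "")] + vuln.get("aliases", [])
        hits.update(i for i in ids if i in kev_ids)
    return sorted(hits)


# Placeholder data for illustration only.
osv_vulns = [
    {"id": "GHSA-xxxx-xxxx-xxxx", "aliases": ["CVE-0000-00001"]},
    {"id": "CVE-0000-00002"},
]
kev_catalog = {"vulnerabilities": [{"cveID": "CVE-0000-00001"}]}

print(kev_hits(osv_vulns, kev_catalog))  # ['CVE-0000-00001']
```

Any non-empty result would correspond to the CRITICAL_CVE disqualifier described earlier; how EPSS probabilities feed into the signal is not covered by this sketch.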

How do I get the Verified badge for my repo?

Verified is awarded automatically when a repo passes all four requirements during a scan: composite ≥ 7.0, Solid dimension ≥ 5.0, no disqualifiers, and at least 8 sufficient signals. There is no application form — open your repo's score page at /score/owner/repo and click Claim your Verified badge, then confirm your maintainer email to activate the gold badge. The same criteria apply to every repo. Verified status is re-checked daily and auto-revoked if the repo no longer meets requirements (with email notification). See /verify for the full explainer.

Can I trust a "Verified" repo blindly?

No. Verified means strong on every measurable signal, but the engine scores the project around a skill — repo health, author credibility, security posture, dependency hygiene — not the runtime behavior of the skill when an agent invokes its tools. Static analysis catches obvious dangerous patterns in source files; it cannot model multi-step tool-call dynamics, runtime credential leaks, or downstream agent context contamination. Treat Verified as a strong prior, not a verdict. Sandboxed runtime monitoring is on the roadmap.

Reproducibility

Scores are deterministic given the same repository state at the same point in time. Two scans of the same repo within the same hour will produce identical results (barring concurrent commits or releases).

Scores change over time as repositories evolve — commits land, issues get responded to, dependencies update, new releases ship. Our nightly crawler re-scores repos to keep the data fresh. Historical scores are stored and available through the weekly digest and trending page.

The full dataset is available under CC BY 4.0 at /data/latest.json.
