Trust Scoring Methodology

How we evaluate AI skills, MCP servers, and npm packages. Every score is computed from public data — no manual curation, no pay-to-play, no gaming the algorithm.

Philosophy

Stars don't mean safe. Downloads don't mean maintained. Brand recognition doesn't mean the code is audited. We built MCP Skills because the trust signals that developers actually need are scattered across half a dozen platforms and none of them talk to each other.

Our scoring engine pulls signals from multiple public data sources, combines them into a single composite score, and assigns a trust tier. Every repo gets the same treatment — we don't inflate scores for popular tools or penalize newcomers arbitrarily.

We deliberately do not publish the exact weights or formulas behind each signal. Transparent methodology doesn't require giving adversaries a blueprint for gaming it. What we do publish: every signal name, every dimension, every tier threshold, every disqualifier, and every safety scan pattern. You can see exactly what we evaluate and why — just not the precise math.

Where mcpskills sits in the stack

MCP Skills is the trust layer for AI skills and MCP servers — it scores publishers, source code, dependencies, vulnerabilities, and supply-chain risk so you can decide what to install before running anything locally. The decision happens upstream of configuration. Once you've decided to install something, runtime defenses (sandboxes, manifest scanners, agent-firewall proxies) take over.

If you also use a configured-server scanner such as mcp-scan, the two complement each other. mcp-scan inspects what you have already configured (tool descriptions, manifest hashes, runtime requests through its proxy mode); mcpskills scores the package itself before configuration happens, so it tells you which servers are worth installing in the first place. Pick whichever scanner you prefer for runtime defense.

This is why the trust score is a strong prior, not a verdict. It's the input to your install decision. Sandboxing, manifest hashing, and version pinning remain separate controls — covered in our MCP Pre-Install Audit walkthrough.

Data sources

Scores are computed from publicly available data. We never access private repositories, paid APIs, or non-public metadata.

GitHub REST API — commit history, release tags, issue tracker, contributor list, CI/CD workflows, file contents (README, LICENSE, package.json, SKILL.md)
npm registry — download counts, publish history, maintainer count, dependency list, package age
OpenSSF Scorecard signals — security posture indicators derived from repository metadata
Source code analysis — pattern matching against known attack signatures in up to 20 source files per repository

Two scoring modes

Standard Mode (12 signals)

Applied to general-purpose repositories — libraries, frameworks, utilities. Evaluates maintenance activity, author credibility, security hygiene, and documentation quality.

Skills Mode (15 signals)

Auto-detected when the repository contains indicators of an AI skill, MCP server, or agent tool. Skills Mode adds three signals (tool safety, supply chain safety, and spec compliance) and shifts the scoring weight toward the Solid dimension, because AI tools get access to sensitive resources such as your terminal, environment variables, and network.

Detection is confidence-based: the engine looks for independent indicators (manifest files, naming patterns, metadata keywords) and activates Skills Mode only when the confidence threshold is met. A strong indicator such as a SKILL.md in the repo root can meet the threshold by itself; a weak one, such as a keyword match, cannot.

Four dimensions

Every signal maps to one of four dimensions. Each dimension answers a specific question a developer should ask before installing anything.

Alive

Is anyone home?

Measures whether the project is actively maintained. A repo that hasn't been touched in 6 months, never cuts releases, or ignores open issues is a liability even if the code was good once.

Commit recency
Release cadence
Issue responsiveness

Legit

Can I trust who made this?

Evaluates the people behind the project. A single anonymous author with zero community adoption is higher-risk than an org-backed project with diverse contributors and real download volume.

Author credibility
Community adoption
Contributor diversity
Download adoption

Solid

Is it secure?

The heaviest dimension in Skills Mode. Checks security hygiene, dependency risk, and CI/CD supply chain integrity, and, for AI skills, runs 7 pattern-based safety scans against actual source code.

Security posture
Dependency health
Known vulnerabilities
Tool safety *
Supply chain safety *

* Skills Mode only

Usable

Can I actually work with this?

Scores the developer experience. A brilliantly-engineered library with no README, no license, and no spec compliance is still unusable in practice.

README quality
Spec compliance *
License clarity

* Skills Mode only — checks for SKILL.md, server manifest, OpenClaw frontmatter

Safety scanning

When Skills Mode activates, the engine reads up to 20 source files from the repository and scans for 7 threat patterns derived from real-world attacks — including ClawHavoc (1,184 malicious skills on a major marketplace) and the ToxicSkills academic research.

Prompt injection

Hidden instructions that override the user's intent or manipulate tool-calling behavior

Shell execution

Unguarded exec/spawn calls that could run arbitrary commands on the host

Credential access

Patterns that read API keys, tokens, or secrets from the environment

Network exfiltration

Outbound requests that could leak sensitive data to external servers

Obfuscated payloads

Base64-encoded strings, eval() calls, or minified code blocks that hide intent

Unsafe binding

HTTP servers that bind to 0.0.0.0, exposing the host to DNS-rebinding and local-network attacks

Package script risk

npm lifecycle scripts (preinstall/install/postinstall) that pipe to a shell, spawn child processes, or run opaque local scripts

Each finding is classified by where it lives and how severe it is. Files fall into three tiers: the installed/runtime artifact (source code, entry points, npm-published files, and agent-facing SKILL.md, plus any file an npm lifecycle script executes automatically on install), infrastructure (Dockerfiles, CI workflows, devcontainer configs, deploy scripts), and documentation. Only findings in the installed/runtime artifact can reach the top severity level and trigger a hard SAFETY_BLOCK.

Findings resolve to one of three levels. Blocking — dangerous behavior in the installed/runtime artifact (a read of a known secret path, an exfiltration call to a public host, curl | sh from an unrecognized source) — counts toward the block decision. Review — a real concern that lowers the tool_safety score but never blocks on its own. Note — anything found in infrastructure or documentation, surfaced for transparency with zero effect on score or tier. Checks are behavior-based: they evaluate what a pattern actually does, not whether a keyword appears.
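
The two-step classification above (where a finding lives, then how severe it is) can be sketched in a few lines. This is an illustration only: the path rules, the `file_tier` and `finding_level` names, and the `dangerous` flag are assumptions made for the example, since the engine's actual file-tier rules and behavior checks are not published.

```python
from pathlib import PurePosixPath

# Hypothetical path rules, chosen only to illustrate the three file tiers.
INFRA_NAMES = {"dockerfile", "docker-compose.yml"}
INFRA_DIRS = {".github", ".devcontainer", "deploy"}
DOC_SUFFIXES = {".md", ".rst", ".txt"}


def file_tier(path: str) -> str:
    """Return 'artifact', 'infrastructure', or 'documentation' for a repo path."""
    p = PurePosixPath(path)
    if p.name == "SKILL.md":
        # Agent-facing, so it is treated as part of the installed/runtime artifact.
        return "artifact"
    if p.name.lower() in INFRA_NAMES or INFRA_DIRS & set(p.parts):
        return "infrastructure"
    if p.suffix.lower() in DOC_SUFFIXES:
        return "documentation"
    return "artifact"


def finding_level(path: str, dangerous: bool) -> str:
    """Map a finding to Blocking, Review, or Note.

    `dangerous` stands in for the outcome of a behavior-based check,
    which this sketch does not attempt to implement.
    """
    if file_tier(path) != "artifact":
        return "Note"  # infrastructure and documentation never affect score or tier
    return "Blocking" if dangerous else "Review"
```

Under these assumed rules, the same dangerous pattern is Blocking in `src/index.js` but only a Note in a `Dockerfile` or a CI workflow, which is the property the text describes.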

Safety scanning is static analysis — it checks source code patterns, not runtime behavior. A clean scan does not guarantee runtime safety in all contexts. We're transparent about this limitation in our safety guide.

Trust tiers

Every scored repo receives one of four trust tiers based on its composite score, dimension strength, signal coverage, and the presence of disqualifiers.

Verified

Strong scores across all dimensions. Sufficient signal coverage. No disqualifiers. Safe to build on with confidence.

Established

Solid overall but may have weaker areas. Check the dimension breakdown before depending on it.

New

Below threshold. Could be a new project with limited history, or a repo with real weaknesses. Proceed with awareness.

Blocked

One or more hard disqualifying conditions detected. We strongly recommend against installing repos in this tier until the underlying issue is resolved.

Hard disqualifiers — any one gates the tier to Blocked

Safety block — a Blocking-level finding in the installed/runtime artifact (source code, entry points, npm-published files, or agent-facing SKILL.md); infrastructure and documentation findings are informational notes and never trigger this (Skills Mode only)
Critical CVE — unpatched critical vulnerability in the latest version, or any CVE on the CISA KEV catalog
Supply chain risk — a CI workflow pipes secrets.GITHUB_TOKEN to an external network command (token exfiltration)
Archived — the repository is marked as archived by its owner
No license — no license file detected (legal risk for adopters)

Soft gates — never Blocked, but veto Verified

No scorecard — no OpenSSF Scorecard and no detectable security infrastructure (workflows, SECURITY.md, CodeQL, CODEOWNERS)
Single author + low adoption — one contributor with minimal community validation
CI pull_request_target checkout — a workflow checks out untrusted PR code under pull_request_target. A real CI-secrets risk worth surfacing, but it doesn't make the installed package unsafe, so it vetoes Verified rather than blocking

Verified program

Repos that clear the trust bar — strong composite score, solid security dimension, sufficient signal coverage, and no disqualifiers — are automatically awarded Verified status. Verified is algorithmic, not pay-to-play, and the same criteria apply to every repo.

Verified repos receive a gold badge that can be embedded in READMEs. Verified status is re-evaluated nightly and automatically revoked if the repo drops below requirements. There is no permanent Verified status — trust is earned continuously.

View the current Verified repos at /verified.

Partial scoring

Some packages don't have a public GitHub repository — they exist only on npm or another registry. For these, MCP Skills generates a partial score using 7 signals from registry metadata: publish recency, publish cadence, download volume, maintainer count, package age, dependency count, and license clarity.

Partial scores are clearly labeled and capped at Established tier. They can never reach Verified because without source code access, safety scanning and supply chain analysis can't run. If you're evaluating a package with a partial score, finding the source repository is the single most valuable step you can take.

What we don't do (yet)

Transparency means being honest about limitations. Here's what MCP Skills does not currently evaluate:

Runtime behavior — we analyze source code patterns, not live execution. A tool that behaves differently at runtime (e.g., fetching payloads dynamically) would not be caught by static analysis.
Multi-step agent chains — context leaks across chained tool calls are a known risk that static analysis can't fully model.
Closed-source or obfuscated packages — if the code isn't accessible, we can't scan it. Partial scores make this clear.
Sandboxed execution testing — we don't run tools in a sandbox to observe actual behavior. This is on the roadmap.

See our roadmap for planned improvements.

Frequently asked questions

What is a trust score?

A trust score is a composite 0–10 number computed across up to 15 signals (12 in Standard Mode, 15 in Skills Mode) grouped into 4 dimensions (Alive, Legit, Solid, Usable). It quantifies how trustworthy a GitHub repo, npm package, MCP server, or OpenClaw skill is by combining maintenance health, author credibility, security posture, and usability into a single comparable number. Scores are deterministic given the same repository state at the same point in time.

What's the difference between Verified, Established, New, and Blocked?

Verified means composite ≥ 7.0 with the Solid dimension ≥ 5.0, Alive ≥ 5.5, Legit ≥ 4.5, at least 8 sufficient signals, and zero disqualifiers: safe to build on. Established means composite ≥ 4.5: a solid choice with caveats worth checking. New means composite < 4.5: promising but unproven, use with awareness. Blocked applies whenever a hard disqualifier is present (archived repo, no license, Blocking-level safety finding, supply-chain risk, critical or KEV-listed CVE): do not install.
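
As a rough illustration, the tier rules in this answer can be written as a small decision function. The thresholds come from the text; the `ScanResult` type, its field names, and the order of checks are assumptions for the sketch, not the engine's actual code. Soft gates are treated as vetoing Verified only, as described elsewhere on this page.

```python
from dataclasses import dataclass, field


@dataclass
class ScanResult:
    # Hypothetical container for one scan; field names are illustrative.
    composite: float
    alive: float
    legit: float
    solid: float
    sufficient_signals: int
    hard_disqualifiers: set[str] = field(default_factory=set)
    soft_gates: set[str] = field(default_factory=set)


def assign_tier(r: ScanResult) -> str:
    """Apply the published tier thresholds to one scan result."""
    if r.hard_disqualifiers:
        return "Blocked"  # any hard disqualifier overrides the score
    meets_verified = (
        r.composite >= 7.0
        and r.solid >= 5.0
        and r.alive >= 5.5
        and r.legit >= 4.5
        and r.sufficient_signals >= 8
        and not r.soft_gates  # soft gates veto Verified but never block
    )
    if meets_verified:
        return "Verified"
    return "Established" if r.composite >= 4.5 else "New"
```

For example, a repo with a composite of 8.1 and strong dimensions that trips `NO_SCORECARD` would land in Established under this sketch, while the same repo with `ARCHIVED` set would be Blocked regardless of score.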

How is Skills Mode detected?

Skills Mode is a confidence-based detection. SKILL.md presence in the repo root scores confidence 3 (highest). server.json or MCP-related metadata scores 2. MCP / OpenClaw / ClawHub keywords in name, description, or topics score 1. Skills Mode activates when total confidence is ≥ 2, or two or more independent indicators are present. It enables 7 safety scans, the tool_safety signal, skill_spec_compliance scoring, and YAML frontmatter parsing for OpenClaw transparency bonuses.
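
A minimal sketch of that activation rule, using the confidence values stated above. The indicator keys and the function name are made up for the example, and the real engine may combine indicators differently.

```python
# Confidence values from the text; the dictionary keys are hypothetical labels.
INDICATOR_CONFIDENCE = {
    "skill_md_in_root": 3,
    "server_json_or_mcp_metadata": 2,
    "keyword_match": 1,
}


def skills_mode_active(indicators: set[str]) -> bool:
    """Return True if the detected indicators are enough to enable Skills Mode."""
    total = sum(INDICATOR_CONFIDENCE[name] for name in indicators)
    # Either rule is sufficient: enough total confidence, or two independent indicators.
    return total >= 2 or len(indicators) >= 2
```

With these three weights, any two indicators already sum to at least 3, so the second condition only matters if more low-confidence indicators are added later. A keyword match alone (confidence 1) does not activate Skills Mode.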

What does the Solid dimension measure?

Solid measures security posture across five signals: security_posture (OpenSSF Scorecard adoption + branch protection), dependency_health (transitive dependency hygiene + count), tool_safety (behavior-based safety scanning of the installed/runtime artifact — infrastructure and documentation findings are informational only), supply_chain_safety (CI workflow patterns), and known_vulnerabilities (OSV.dev + CISA KEV + FIRST.org EPSS lookup). It is the most heavily weighted dimension in Skills Mode.

What disqualifies a repo from earning a high trust score?

Five hard disqualifiers gate a repo to the Blocked tier:

SAFETY_BLOCK — a Blocking-level finding in the installed/runtime artifact (source code, entry points, npm-published files, or agent-facing SKILL.md); infrastructure (Dockerfiles, CI, devcontainer, deploy) and documentation findings are Note-level and never trigger this
CRITICAL_CVE — any unpatched critical vulnerability OR any CVE on the CISA KEV catalog
SUPPLY_CHAIN_RISK — a CI workflow pipes secrets.GITHUB_TOKEN to an external network command (token exfiltration)
ARCHIVED — repo is archived on GitHub
NO_LICENSE — no LICENSE file or unrecognized SPDX identifier

Three soft gates never block, but veto Verified:

NO_SCORECARD — no OpenSSF Scorecard and no detectable security infrastructure
SINGLE_AUTHOR_LOW_ADOPTION — one contributor + low community signals
CI_PR_TARGET_RISK — pull_request_target checkout of untrusted PR code (a CI-secrets risk, not an installed-artifact risk)

How does scoring work for npm packages without a source GitHub repo?

When a package has no resolvable source repository, MCP Skills falls back to partial scoring using npm metadata only. This produces a 7-signal score (versus 15 for full scoring) covering download adoption, maintainer credibility, dependency count, license clarity, README quality, known vulnerabilities (OSV/KEV), and lifecycle script detection. Partial scores are clearly labeled "limited" and capped at the Established tier; they cannot earn Verified.

How often are scores refreshed?

A nightly crawler runs at 02:00 UTC discovering new repos from MCP Registry, GitHub topics, GitHub keyword search, npm search, and Smithery, then refreshes stale cache entries. On-demand scans always hit live data; cached results are at most 24 hours old. Monitored repos on paid plans get a dedicated daily re-scan at 08:00 UTC with email alerts if the composite score moves by 0.3 or more or the tier changes.
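
The alert condition for monitored repos can be sketched as follows. The function name and signature are assumptions; the rounding step is also an assumption, added because a naive floating-point comparison would miss a move of exactly 0.3 (for instance, `7.3 - 7.0` evaluates to slightly less than `0.3`).

```python
def should_alert(old_score: float, new_score: float,
                 old_tier: str, new_tier: str,
                 threshold: float = 0.3) -> bool:
    """Return True if a re-scan should trigger an email alert."""
    # Round to two decimals so binary floating-point error does not
    # hide a change that is exactly at the threshold.
    moved = round(abs(new_score - old_score), 2)
    return moved >= threshold or old_tier != new_tier
```

A tier change triggers an alert even when the score barely moves, for example when a new hard disqualifier appears.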

What data sources are used for vulnerability intelligence?

The known_vulnerabilities signal queries three real-time sources for the currently-installable version: OSV.dev (the unified advisory database, combining GHSA, npm advisories, PyPA, Go vulndb, and RustSec), CISA Known Exploited Vulnerabilities (the authoritative federal list of vulnerabilities with confirmed in-the-wild exploitation; any CVE on KEV hard-gates the tier to Blocked), and FIRST.org EPSS (Exploit Prediction Scoring System, which estimates 30-day exploit probability).
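
To make the hard gate concrete, here is a sketch of the CRITICAL_CVE decision once advisory data has been fetched. The `Advisory` record is a simplified, hypothetical shape (real OSV.dev responses carry more fields), the CVE identifiers in the usage lines are placeholders, and how EPSS feeds into the score is not modeled because it is not described here.

```python
from dataclasses import dataclass


@dataclass
class Advisory:
    # Hypothetical, simplified record for one advisory affecting a package.
    cve_id: str
    severity: str   # e.g. "CRITICAL", "HIGH", "MEDIUM"
    patched: bool   # True if the currently-installable version includes the fix


def critical_cve_gate(advisories: list[Advisory], kev_ids: set[str]) -> bool:
    """Return True if the CRITICAL_CVE hard disqualifier should fire."""
    return any(
        a.cve_id in kev_ids                          # listed on CISA KEV
        or (a.severity == "CRITICAL" and not a.patched)  # unpatched critical
        for a in advisories
    )


# Usage with placeholder identifiers (not real CVEs):
kev = {"CVE-0000-0001"}
found = [Advisory("CVE-0000-0002", "HIGH", patched=False)]
blocked = critical_cve_gate(found, kev)
```

In this sketch a KEV-listed CVE fires the gate regardless of its severity label or patch state, following the "any CVE on KEV" wording above.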

How do I get the Verified badge for my repo?

Verified is awarded automatically when a repo passes all four requirements during a scan: composite ≥ 7.0, Solid dimension ≥ 5.0, no disqualifiers, and at least 8 sufficient signals. There is no application form — open your repo's score page at /score/owner/repo and click Claim your Verified badge, then confirm your maintainer email to activate the gold badge. The same criteria apply to every repo. Verified status is re-checked daily and auto-revoked if the repo no longer meets requirements (with email notification). See /verify for the full explainer.

Can I trust a "Verified" repo blindly?

No. Verified means strong on every measurable signal, but the engine scores the project around a skill — repo health, author credibility, security posture, dependency hygiene — not the runtime behavior of the skill when an agent invokes its tools. Static analysis catches obvious dangerous patterns in source files; it cannot model multi-step tool-call dynamics, runtime credential leaks, or downstream agent context contamination. Treat Verified as a strong prior, not a verdict. Sandboxed runtime monitoring is on the roadmap.

Reproducibility

Scores are deterministic given the same repository state at the same point in time. Two scans of the same repo within the same hour will produce identical results (barring concurrent commits or releases).

Scores change over time as repositories evolve — commits land, issues get responded to, dependencies update, new releases ship. Our nightly crawler re-scores repos to keep the data fresh. Historical scores are stored and available through the weekly digest and trending page.

The full dataset is available under CC BY 4.0 at /data/latest.json.
