How we evaluate AI skills, MCP servers, and npm packages. Every score is computed from public data — no manual curation, no pay-to-play, no gaming the algorithm.
Stars don't mean safe. Downloads don't mean maintained. Brand recognition doesn't mean the code is audited. We built MCP Skills because the trust signals that developers actually need are scattered across half a dozen platforms and none of them talk to each other.
Our scoring engine pulls signals from multiple public data sources, combines them into a single composite score, and assigns a trust tier. Every repo gets the same treatment — we don't inflate scores for popular tools or penalize newcomers arbitrarily.
We deliberately do not publish the exact weights or formulas behind each signal. Transparent methodology doesn't require giving adversaries a blueprint for gaming it. What we do publish: every signal name, every dimension, every tier threshold, every disqualifier, and every safety scan pattern. You can see exactly what we evaluate and why — just not the precise math.
MCP Skills is the trust layer for AI skills and MCP servers — it scores publishers, source code, dependencies, vulnerabilities, and supply-chain risk so you can decide what to install before running anything locally. The decision happens upstream of configuration. Once you've decided to install something, runtime defenses (sandboxes, manifest scanners, agent-firewall proxies) take over.
If you also use a configured-server scanner like mcp-scan, the two complement each other. mcp-scan inspects what you've already configured (tool descriptions, manifest hashes, runtime requests through its proxy mode); MCP Skills scores the package itself before configuration happens, so it tells you which servers are worth installing in the first place. Pick whichever scanner you prefer for runtime defense.
This is why the trust score is a strong prior, not a verdict. It's the input to your install decision. Sandboxing, manifest hashing, and version pinning remain separate controls — covered in our MCP Pre-Install Audit walkthrough.
Scores are computed from publicly available data. We never access private repositories, paid APIs, or non-public metadata.
Applied to general-purpose repositories — libraries, frameworks, utilities. Evaluates maintenance activity, author credibility, security hygiene, and documentation quality.
Auto-detected when the repository contains indicators of an AI skill, MCP server, or agent tool. Skills Mode adds two signals focused on tool-specific safety and spec compliance, and shifts the scoring weight toward the Solid dimension because AI tools get access to sensitive resources — your terminal, environment variables, and network.
Detection is confidence-based: the engine looks for multiple independent indicators (manifest files, naming patterns, metadata keywords) and only activates Skills Mode when the confidence threshold is met. No single indicator triggers it alone.
Every signal maps to one of four dimensions. Each dimension answers a specific question a developer should ask before installing anything.
Measures whether the project is actively maintained. A repo that hasn't been touched in 6 months, never cuts releases, or ignores open issues is a liability even if the code was good once.
Evaluates the people behind the project. A single anonymous author with zero community adoption is higher-risk than an org-backed project with diverse contributors and real download volume.
The heaviest dimension in Skills Mode. Checks security hygiene, dependency risk, CI/CD supply chain integrity, and — for AI skills — runs 5 pattern-based safety scans against actual source code.
* Skills Mode only
Scores the developer experience. A brilliantly-engineered library with no README, no license, and no spec compliance is still unusable in practice.
* Skills Mode only — checks for SKILL.md, server manifest, OpenClaw frontmatter
When Skills Mode activates, the engine reads up to 20 source files from the repository and scans for 5 threat patterns derived from real-world attacks — including ClawHavoc (1,184 malicious skills on a major marketplace) and the ToxicSkills academic research.
Safety scanning is static analysis — it checks source code patterns, not runtime behavior. A clean scan does not guarantee runtime safety in all contexts. We're transparent about this limitation in our safety guide.
Every scored repo receives one of four trust tiers based on its composite score, dimension strength, signal coverage, and the presence of disqualifiers.
Repos that clear the trust bar — strong composite score, solid security dimension, sufficient signal coverage, and no disqualifiers — are automatically awarded Verified status. Verified is algorithmic, not pay-to-play, and the same criteria apply to every repo.
Verified repos receive a gold badge that can be embedded in READMEs. Verified status is re-evaluated nightly and automatically revoked if the repo drops below requirements. There is no permanent Verified status — trust is earned continuously.
View the current Verified repos at /verified.
Some packages don't have a public GitHub repository — they exist only on npm or another registry. For these, MCP Skills generates a partial score using 7 signals from registry metadata: publish recency, publish cadence, download volume, maintainer count, package age, dependency count, and license clarity.
Partial scores are clearly labeled and capped at Established tier. They can never reach Verified because without source code access, safety scanning and supply chain analysis can't run. If you're evaluating a package with a partial score, finding the source repository is the single most valuable step you can take.
Transparency means being honest about limitations. Here's what MCP Skills does not currently evaluate:
See our roadmap for planned improvements.
A trust score is a composite 0–10 number computed across 15 signals grouped into 4 dimensions (Alive, Legit, Solid, Usable). It quantifies how trustworthy a GitHub repo, npm package, MCP server, or OpenClaw skill is — combining maintenance health, author credibility, security posture, and usability into a single comparable number. Scores are deterministic given the same repository state at the same point in time.
Each tier has a fixed threshold:
Verified: composite ≥ 7.0 with the Solid dimension ≥ 5.0 and at least 8 sufficient signals. Safe to build on.
Established: composite ≥ 5.0. A solid choice with caveats worth checking.
New: composite < 5.0. Promising but unproven; use with awareness.
Blocked: applies whenever a hard disqualifier is present (archived repo, no license, safety pattern detected, supply-chain risk, critical/KEV-listed CVE). Do not install.
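The tier thresholds can be expressed as a short decision function. This is a minimal sketch, not the engine's actual code: the `ScoreResult` type and `assign_tier` name are hypothetical, and it assumes a repo that reaches composite ≥ 7.0 but misses the Solid or signal-coverage requirement falls back to Established.

```python
from dataclasses import dataclass, field


@dataclass
class ScoreResult:
    """Hypothetical container for one scan's outputs."""
    composite: float          # composite score on the 0-10 scale
    solid: float              # Solid dimension score
    sufficient_signals: int   # number of signals with enough data
    disqualifiers: list[str] = field(default_factory=list)


def assign_tier(result: ScoreResult) -> str:
    # Any hard disqualifier overrides the numeric score.
    if result.disqualifiers:
        return "Blocked"
    # Verified needs all three numeric requirements at once.
    if (result.composite >= 7.0
            and result.solid >= 5.0
            and result.sufficient_signals >= 8):
        return "Verified"
    # Assumption: a high composite that misses the Solid or coverage
    # bar is treated as Established rather than New.
    if result.composite >= 5.0:
        return "Established"
    return "New"
```

Checking disqualifiers first mirrors the "hard gate" wording: a repo with a 9.0 composite and an `ARCHIVED` flag still lands in Blocked.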
Skills Mode is a confidence-based detection. SKILL.md presence in the repo root scores confidence 3 (highest). server.json or MCP-related metadata scores 2. MCP / OpenClaw / ClawHub keywords in name, description, or topics score 1. Skills Mode activates when total confidence is ≥ 2, or two or more independent indicators are present. It enables 5 safety scans, the tool_safety signal, skill_spec_compliance scoring, and YAML frontmatter parsing for OpenClaw transparency bonuses.
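The activation rule can be sketched as a small lookup plus a threshold check. This is an illustration under stated assumptions: the indicator keys and the `skills_mode_active` function are hypothetical names, each indicator kind is counted once (the text does not say whether several keyword hits add up), and the numeric rule is followed exactly as written.

```python
# Confidence contributed by each indicator kind, per the description above.
# The dictionary keys are hypothetical labels chosen for this sketch.
INDICATOR_CONFIDENCE = {
    "skill_md": 3,         # SKILL.md present in the repo root
    "server_metadata": 2,  # server.json or MCP-related metadata
    "keyword": 1,          # MCP / OpenClaw / ClawHub keyword match
}


def skills_mode_active(indicators: set[str]) -> bool:
    """Return True when the detected indicators clear the stated rule."""
    total = sum(INDICATOR_CONFIDENCE[name] for name in indicators)
    # Rule as stated: total confidence >= 2, or two or more indicators.
    return total >= 2 or len(indicators) >= 2
```

Under this reading, a keyword match by itself (confidence 1) does not activate Skills Mode, while a root SKILL.md or server metadata does.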
Solid measures security posture across five signals: security_posture (OpenSSF Scorecard adoption + branch protection), dependency_health (transitive dependency hygiene + count), tool_safety (static safety scanning), supply_chain_safety (CI workflow patterns), and known_vulnerabilities (OSV.dev + CISA KEV + FIRST.org EPSS lookup). It is the most heavily weighted dimension in Skills Mode.
Six disqualifiers can hard-gate a repo to the Blocked tier:
ARCHIVED: repo is archived on GitHub
NO_LICENSE: no LICENSE file or unrecognized SPDX identifier
SAFETY_BLOCK: static analysis flagged dangerous patterns in source files
SUPPLY_CHAIN_RISK: CI workflow has token exfiltration patterns or PR-target checkout
SINGLE_AUTHOR_LOW_ADOPTION: one contributor + low community signals
CRITICAL_CVE: any unpatched critical vulnerability OR any CVE on the CISA KEV catalog
When a package has no resolvable source repository, MCP Skills falls back to partial scoring using npm metadata only. This produces a 7-signal score (versus 15 for full scoring) covering download adoption, maintainer credibility, dependency count, license clarity, README quality, known vulnerabilities (OSV/KEV), and lifecycle script detection. Partial scores are clearly labeled limited and capped at the Established tier; they cannot earn Verified.
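The cap on partial scores amounts to clamping the tier against an ordered list. A minimal sketch, assuming the four tiers rank from Blocked up to Verified; `TIER_ORDER` and `cap_partial_tier` are hypothetical names, not part of any published API.

```python
# Tiers ranked from least to most trusted (ordering assumed for this sketch).
TIER_ORDER = ["Blocked", "New", "Established", "Verified"]


def cap_partial_tier(tier: str, partial: bool) -> str:
    """Clamp a tier to Established when the score is partial."""
    if not partial:
        return tier
    cap = TIER_ORDER.index("Established")
    # Lower tiers pass through unchanged; only Verified is pulled down.
    return TIER_ORDER[min(TIER_ORDER.index(tier), cap)]
```

For example, `cap_partial_tier("Verified", partial=True)` returns `"Established"`, while a Blocked partial score stays Blocked.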
A nightly crawler runs at 02:00 UTC discovering new repos from MCP Registry, GitHub topics, GitHub keyword search, npm search, and Smithery, then refreshes stale cache entries. On-demand scans always hit live data; cached results are at most 24 hours old. Monitored repos on paid plans get a dedicated daily re-scan at 08:00 UTC with email alerts if the composite score moves by 0.3 or more or the tier changes.
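The alert condition for monitored repos reduces to a two-part check. This is a sketch only: `should_alert` is a hypothetical helper, and rounding the difference to two decimals is an assumption made here so that floating-point noise (for example 7.3 - 7.0 evaluating to slightly under 0.3) does not suppress an alert.

```python
def should_alert(prev_score: float, new_score: float,
                 prev_tier: str, new_tier: str,
                 threshold: float = 0.3) -> bool:
    """Return True when a re-scan should trigger an email alert."""
    # Round the delta so binary floating-point error cannot push an
    # exact 0.3 move just below the threshold (assumed precision: 2 dp).
    delta = round(abs(new_score - prev_score), 2)
    return delta >= threshold or prev_tier != new_tier
```

A tier change triggers the alert even when the score barely moves, which matters near the 5.0 and 7.0 boundaries.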
The known_vulnerabilities signal queries three real-time sources for the currently-installable version: OSV.dev (the unified advisory database covering GHSA, npm advisories, PyPA, Go vulndb, and RustSec), CISA Known Exploited Vulnerabilities (the authoritative U.S. federal list of vulnerabilities with confirmed in-the-wild exploitation; any CVE on KEV hard-gates the tier to Blocked), and FIRST.org EPSS (the Exploit Prediction Scoring System, which estimates 30-day exploit probability).
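The lookup can be approximated offline in two steps: build an OSV.dev query for a specific package version, then test the returned advisories against a set of KEV-listed CVE IDs. This is a hedged sketch rather than the engine's implementation. The payload shape and the `id` / `aliases` fields follow the public OSV API and schema as we understand them; the helper names are hypothetical, no network call is made, and EPSS is left out.

```python
# Public OSV.dev query endpoint (POST with a JSON body).
OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def build_osv_query(name: str, version: str, ecosystem: str = "npm") -> dict:
    """Build the JSON body for an OSV query on one package version."""
    return {"version": version,
            "package": {"name": name, "ecosystem": ecosystem}}


def cve_ids(vuln: dict) -> set[str]:
    """Collect CVE identifiers from an OSV record's id and aliases."""
    candidates = [vuln.get("id", ""), *vuln.get("aliases", [])]
    return {c for c in candidates if c.startswith("CVE-")}


def kev_hard_gate(vulns: list[dict], kev_cves: set[str]) -> bool:
    """True if any advisory maps to a CVE in the supplied KEV set."""
    return any(cve_ids(v) & kev_cves for v in vulns)
```

In use, the body from `build_osv_query` would be posted to `OSV_QUERY_URL`, and the advisories in the response passed to `kev_hard_gate` together with CVE IDs loaded from the CISA KEV catalog.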
Verified is awarded automatically when a repo passes all four requirements during a scan: composite ≥ 7.0, Solid dimension ≥ 5.0, no disqualifiers, and at least 8 sufficient signals. There is no application form — open your repo's score page at /score/owner/repo and click Claim your Verified badge, then confirm your maintainer email to activate the gold badge. The same criteria apply to every repo. Verified status is re-checked daily and auto-revoked if the repo no longer meets requirements (with email notification). See /verify for the full explainer.
No. Verified means strong on every measurable signal, but the engine scores the project around a skill — repo health, author credibility, security posture, dependency hygiene — not the runtime behavior of the skill when an agent invokes its tools. Static analysis catches obvious dangerous patterns in source files; it cannot model multi-step tool-call dynamics, runtime credential leaks, or downstream agent context contamination. Treat Verified as a strong prior, not a verdict. Sandboxed runtime monitoring is on the roadmap.
Scores are deterministic given the same repository state at the same point in time. Two scans of the same repo within the same hour will produce identical results (barring concurrent commits or releases).
Scores change over time as repositories evolve — commits land, issues get responded to, dependencies update, new releases ship. Our nightly crawler re-scores repos to keep the data fresh. Historical scores are stored and available through the weekly digest and trending page.
The full dataset is available under CC BY 4.0 at /data/latest.json.
Paste a GitHub URL, npm package name, Smithery listing, or OpenClaw skill. Results in 10 seconds.