Methodology
How a risk score gets computed. A deterministic core — account reputation, cross-repo patterns, PR heuristics, and maintainer reports — with an LLM review layered on top. The score is advisory: we post nothing to your PRs and never auto-block.
Account heuristics & reputation
Account age, followers, owned-repo stars, total public contributions, and bio/handle patterns. Reputation is a dampener as much as a signal.
A fresh, follower-less account paired with other evidence is corroborated upward (young-account boost, gated on existing evidence). A long-lived, starred, prolific account is dampened downward (reputation penalty, capped at −28) — unless a maintainer has validated a report, in which case human judgment wins. Bio/handle bot-patterns (ai-helper-*, gpt-*, '*-bot-NN'), a machine-random handle, and a bot-like follow graph (follows many, followed by few) each add a small gated corroborator.
youngAccountBoost = evidence>0 && ageDays<30 ? +8..12 : 0
botPatternBoost = evidence>0 && handle/bio matches ? +10 : 0
reputationPenalty = validatedReport ? 0
: age + stars + contributions + followers // 0..-28
LLM review (on top of the core)
An LLM reads the PR title, body, patch, commit messages, the conversation/review comments, and the author's account context, and returns a verdict, a reason code, and a confidence.
Credential- and malicious-risk dimensions are clamped to 0 unless the actual patch contains matching tokens, so the model can't hallucinate a 'credential phishing' label onto a benign PR. The verdict adds weight on top of the deterministic core — it does not replace it.
confidence >= 90 → +65
confidence >= 80 → +50
confidence >= 65 → +30
otherwise → 0 (surfaced for review, no score)
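A minimal Python sketch of the confidence thresholds and the token clamp described above. The function names and the `tokens` list are hypothetical; the source does not say which tokens count as a match or how matching is done, so a plain substring check stands in here.

```python
def llm_weight(confidence: int) -> int:
    """Map an LLM verdict confidence (0-100) to the weight added to the score."""
    if confidence >= 90:
        return 65
    if confidence >= 80:
        return 50
    if confidence >= 65:
        return 30
    return 0  # surfaced for review, adds no score


def clamp_dimension(model_value: int, patch: str, tokens: list[str]) -> int:
    """Zero a credential/malicious dimension unless the patch contains a token.

    Assumption: substring matching against a hypothetical token list.
    """
    return model_value if any(t in patch for t in tokens) else 0
```

Under these assumptions, a model that reports a high credential risk on a README typo fix still contributes 0 on that dimension, because nothing in the patch matches.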
Cross-repo campaigns & velocity
One PR is a data point. The same title/patch across unrelated repos — or a burst of PRs scattered across many unrelated orgs in a week — is a fingerprint.
Duplicate campaigns require at least 3 matching-title PRs across at least 2 repositories (+55). Separately, PR velocity weighted by org-diversity flags scattershot bursts: a prolific maintainer working inside their own org scores low because diversity is low, while 15 PRs across 8 unrelated owners in 7 days scores high (up to +25).
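The two rules above can be sketched in Python as follows. This is an illustration under assumptions, not the production logic: titles are compared after lowercasing and trimming (the source does not say how titles are matched), and the velocity function returns only the full weight at the stated thresholds, since the source does not specify how partial weights below +25 are assigned.

```python
from collections import defaultdict


def campaign_weight(prs: list[tuple[str, str]]) -> int:
    """prs is a list of (title, repo) pairs for one author."""
    repos_by_title: dict[str, list[str]] = defaultdict(list)
    for title, repo in prs:
        # Assumed normalization: ignore case and surrounding whitespace.
        repos_by_title[title.strip().lower()].append(repo)
    for repos in repos_by_title.values():
        # At least 3 matching-title PRs spread over at least 2 repositories.
        if len(repos) >= 3 and len(set(repos)) >= 2:
            return 55
    return 0


def velocity_weight(prs_7d: int, distinct_owners: int) -> int:
    """Full weight only when both volume and owner diversity are high."""
    return 25 if prs_7d >= 15 and distinct_owners >= 8 else 0
```

With this sketch, `velocity_weight(40, 1)` is 0: a maintainer opening 40 PRs inside a single org has high volume but no owner diversity.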
campaign: matches(sameTitle) >= 3 && repos >= 2 → +55
velocity: prs_7d >= 15 && distinctOwners >= 8 → +25
(capped so velocity alone stays in 'watch')
Deterministic PR heuristics
Diff signature and commit-message voice, computed without any model so they're reproducible on appeal.
Diff signature flags scattershot/templated patches (many files each touched by one line, near-identical change sizes). Commit voice flags vacuous or duplicated messages ('update', 'fix', identical lines). These run on every PR and on the backfill.
diffSignature = scattershot + uniformity // 0..1
commitVoice = vacuousRatio + identicalRatio // 0..1
weight = max(diffSignature, commitVoice) → +10..25
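As a rough illustration of the commit-voice half, the Python sketch below computes the two ratios from a list of commit messages. Several details are assumptions: the `VACUOUS` set holds only the two examples named in the text, "identical" means a message that appears more than once after lowercasing and trimming, and the sum is clamped to 1.0 to stay within the documented 0..1 range. The mapping from this value to the +10..25 weight is not specified in the source and is left out.

```python
from collections import Counter

# Only the examples given in the text; the full list is not published here.
VACUOUS = {"update", "fix"}


def commit_voice(messages: list[str]) -> float:
    """Return a 0..1 score from the vacuous and duplicated share of messages."""
    if not messages:
        return 0.0
    normalized = [m.strip().lower() for m in messages]
    counts = Counter(normalized)
    vacuous_ratio = sum(m in VACUOUS for m in normalized) / len(normalized)
    # Share of messages whose text occurs more than once in the PR.
    identical_ratio = sum(c for c in counts.values() if c > 1) / len(normalized)
    # Assumption: clamp the sum so the result stays in 0..1.
    return min(1.0, vacuous_ratio + identical_ratio)
```

Because no model is involved, the same list of messages always yields the same value, which is what makes the result reproducible on appeal.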
Maintainer reports & corrections
A maintainer's report or correction command on a PR is the strongest human signal. Validated reports add weight; dismiss/allow corrections subtract it.
Reports are weighted by the reporter's historical accuracy (validated ÷ total, with a prior so new reporters trend neutral) and capped per reporter so a single account can't report-bomb a target. Corrections are idempotent against webhook re-delivery.
reportScore = Σ_reporter ( maxValidatedReport × trust )
trust = clamp(validated / max(total, 3), 0.2, 1)
dismiss = -30
confirm = +25
allow/reset = reset
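The trust formula is easiest to read with numbers plugged in. The Python sketch below follows the pseudocode directly; the shape of the `reports` dictionary and the per-report weight of 25 used in the usage note are assumptions for illustration.

```python
def trust(validated: int, total: int) -> float:
    """Historical accuracy with a prior of 3 reports, clamped to [0.2, 1]."""
    return min(1.0, max(0.2, validated / max(total, 3)))


def report_score(reports: dict[str, tuple[float, int, int]]) -> float:
    """reports maps reporter -> (max validated report weight, validated, total).

    Using one (maximum) weight per reporter is what caps a single account's
    contribution, however many reports it files.
    """
    return sum(weight * trust(v, t) for weight, v, t in reports.values())
```

A reporter with 9 validated reports out of 10 gets trust 0.9. A brand-new reporter gets 0 / 3 = 0, raised to the 0.2 floor, and a reporter with a single validated report gets 1 / 3 rather than 1.0, because the denominator never drops below 3. With an assumed weight of 25 each, the first two reporters together contribute 25 × 0.9 + 25 × 0.2 = 27.5.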
External imports & time-decay
Accounts imported from public OSS-abuse blocklists start elevated. All reports and signals lose weight as they age.
Imported accounts get a +48 base pending local verification. Every report and signal keeps full weight for 30 days, then decays linearly to a floor of 0.2 by one year — old context still counts, just less than fresh evidence.
importedBlocklist → +48 (review locally)
ageDecay = 1.0 (≤30d) → 0.2 (≥365d), linear between
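The decay curve can be written as a small piecewise function. This Python sketch assumes age is measured in days and that "one year" means 365 days, as the pseudocode suggests; the function name is illustrative.

```python
def age_decay(age_days: float) -> float:
    """Weight multiplier for a report or signal of the given age."""
    if age_days <= 30:
        return 1.0  # full weight for the first 30 days
    if age_days >= 365:
        return 0.2  # floor: old context still counts
    # Linear interpolation from 1.0 at day 30 down to 0.2 at day 365.
    return 1.0 - 0.8 * (age_days - 30) / (365 - 30)
```

Halfway through the decay window, at 197.5 days, the multiplier is 0.6.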
score = clamp0_100(
    maintainerReports            // trust-weighted, decayed
  + llmReview                    // verdict on top
  + duplicateCampaign
  + prHeuristics
  + activity (≤ +20)
  + youngAccount + botPattern    // gated on evidence
  + externalBlocklist
  − reputationDampener           // skipped if a report is validated
)
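A minimal Python sketch of how the components combine, assuming each one has already been computed and decayed. The dictionary keys are illustrative, and the dampener is passed as a positive magnitude (0 to 28) that is subtracted.

```python
def risk_score(components: dict[str, float], reputation_dampener: float,
               has_validated_report: bool) -> float:
    """Sum the additive components, apply the dampener, clamp to 0..100."""
    total = sum(components.values())
    if not has_validated_report:
        # A validated maintainer report overrides reputation entirely.
        total -= reputation_dampener
    return max(0.0, min(100.0, total))
```

For example, an LLM verdict worth +65 plus a duplicate campaign worth +55 on a well-established account (dampener 28) gives 65 + 55 − 28 = 92. The same inputs with a validated report skip the dampener and clamp at 100.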
The score routes a flag to the maintainer's dashboard. We never post a PR comment, status check, or auto-block — a human always decides what happens to a contributor.
Strong repeated or severe evidence. Maintainers may choose to block or require manual approval.
Multiple strong signals or one severe signal. Maintainers should inspect before merging.
Moderate signal. Needs maintainer judgment and should not be treated as a final verdict.
Low or early signal. Track for context, but do not act without more evidence.
- Private repository contents (unless a repo policy opts in)
- Profile photos (we don't fetch images)
- Private email addresses or real names
- Geographic location or IP
- A single signal — no one input flags an account on its own