What's in a risk score? A methodological primer

We get this question a lot: how do you turn raw blockchain data into a 0–100 number? Here's the honest answer.

Oliver Reeves

CTO, Okanewatch LTD

Feb 20, 2026, 10:00 AM UTC

9 min read

A recurring question from customers and regulators: what exactly goes into the AMLRegister risk score? We aim for transparency, so this post walks through the current methodology. It will change as we iterate, so treat the details here as a snapshot of our approach at the time of writing, not a contract.

The risk score is a number from 0 to 100. Zero means no detected risk; 100 means maximum detected risk. The score is the weighted maximum of ten category scores, not their sum. We chose the weighted maximum because severity in compliance is typically driven by the worst signal, not the average signal — a single sanctions hit should not be averaged away by nine clean categories.
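To make the "worst signal, not average signal" argument concrete, here is a minimal sketch comparing a plain average with a plain maximum over ten category scores. It deliberately leaves out the category weights (every category is treated as weight 1.0) so that it isolates only the max-versus-average effect. The category names other than `sanctions` are hypothetical placeholders.

```python
# Minimal sketch: a plain average vs. a plain maximum over ten category scores.
# Weights are omitted (all treated as 1.0) to isolate the max-vs-average effect.
from statistics import mean


def average_score(category_scores: dict[str, float]) -> float:
    """Mean of all category scores: one strong signal gets diluted."""
    return mean(category_scores.values())


def max_score(category_scores: dict[str, float]) -> float:
    """Maximum category score: the worst signal drives the result."""
    return max(category_scores.values())


# One strong sanctions signal plus nine clean, hypothetical categories.
scores = {"sanctions": 95.0, **{f"category_{i}": 0.0 for i in range(9)}}

print(average_score(scores))  # 9.5
print(max_score(scores))      # 95.0
```

Averaging turns a near-certain sanctions hit into a score of 9.5. The maximum keeps it at 95.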

Each category score is itself a 0–100 value representing the strength of evidence for the category. "Strength of evidence" combines three inputs: direct match (the address is directly attributed to a known entity in the category), proximity (the address has transacted with attributed entities in the category), and pattern (the address exhibits behavioural patterns consistent with the category).

Direct match is the strongest signal and produces category scores in the 70–100 range. An address listed on OFAC's Specially Designated Nationals (SDN) list for terrorism financing will produce a terrorism_financing score of 95+ and a sanctions score of 90+. Direct matches are rare but high-value.

Proximity is the workhorse. "One hop from a sanctioned cluster" is a common signal; we score it higher than "three hops away" but lower than "direct." We model proximity on a decay curve — exponential decay with a half-life of about 2.4 hops. Addresses that frequently touch a sanctioned cluster score higher than addresses that touched it once years ago.
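The decay curve can be sketched as follows, assuming the proximity contribution starts from a 0–100 base value and is multiplied by 0.5 raised to (hops / half-life). The half-life of 2.4 hops comes from the description above. The function name and the base value are illustrative assumptions, and the sketch models only hop distance, not how often an address touches the cluster.

```python
# Minimal sketch of exponential decay with a half-life of about 2.4 hops.
# Assumptions: a 0-100 base value and an integer hop distance; interaction
# frequency and recency are NOT modelled here.
HALF_LIFE_HOPS = 2.4


def proximity_score(base: float, hops: int) -> float:
    """Decay a base score by 0.5 ** (hops / HALF_LIFE_HOPS)."""
    if hops < 0:
        raise ValueError("hops must be non-negative")
    return base * 0.5 ** (hops / HALF_LIFE_HOPS)


for h in (1, 2, 3, 5):
    print(h, round(proximity_score(100.0, h), 1))
```

With this curve, one hop keeps roughly three quarters of the base value. Three hops keeps a little over 40 percent, which matches "one hop" scoring higher than "three hops away".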

Pattern analysis is the newest and most controversial component. We look for behavioural signatures associated with specific activity types, such as peeling chains and characteristic distributions of transaction timing (time of day), amounts, and counterparties. Pattern-only scores are capped lower than direct-match scores to reflect their probabilistic nature. False-positive management is the main design constraint.
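The post does not publish how the three evidence inputs combine into one category score. The sketch below is therefore hypothetical. It takes the maximum of the inputs and applies a cap to pattern-only evidence, using an assumed `PATTERN_CAP` of 60 chosen to sit below the 70–100 direct-match range. It also assumes the proximity and pattern inputs are already on a 0–100 scale.

```python
# Hypothetical sketch: combine direct, proximity and pattern evidence for one
# category. Combining by maximum and PATTERN_CAP = 60 are assumptions, not
# published parameters.
from typing import Optional

PATTERN_CAP = 60.0  # hypothetical: keeps pattern-only evidence below the 70+ direct range


def category_score(direct: Optional[float], proximity: float, pattern: float) -> float:
    """Return a 0-100 category score from the three evidence inputs."""
    capped_pattern = min(pattern, PATTERN_CAP)
    candidates = [proximity, capped_pattern]
    if direct is not None:  # direct attribution exists for this category
        candidates.append(direct)
    # Clamp to the 0-100 range used throughout the score.
    return max(0.0, min(100.0, max(candidates)))


print(category_score(direct=None, proximity=10.0, pattern=85.0))  # 60.0
print(category_score(direct=95.0, proximity=10.0, pattern=85.0))  # 95.0
```

A strong pattern match on its own tops out at the cap, and a direct attribution always dominates it.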

The weights applied when combining categories into the total score reflect our view of severity. Sanctions and terrorism financing carry a weight of 1.0; darknet, ransomware, and stolen funds 0.85–0.9; mixer 0.8; high-risk exchange 0.5; and gambling 0.3. Clean exchange does not contribute to the score at all, because it is a positive-signal-only category. Enterprise customers with specific risk-appetite frameworks can tune these weights.
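Putting the weights together, here is a minimal sketch of the weighted maximum. The weights follow the list above. Where the post gives a range (0.85–0.9), the individual values chosen for darknet, ransomware, and stolen funds are assumptions. Treating any unlisted category as weight 0.0 is also an assumption of this sketch, and the category keys are illustrative names.

```python
# Minimal sketch of the weighted maximum over category scores.
# Point values inside the 0.85-0.9 range, the key names, and the 0.0 default
# for unlisted categories are assumptions.
CATEGORY_WEIGHTS = {
    "sanctions": 1.0,
    "terrorism_financing": 1.0,
    "darknet": 0.9,          # assumed point within 0.85-0.9
    "ransomware": 0.9,       # assumed point within 0.85-0.9
    "stolen_funds": 0.85,    # assumed point within 0.85-0.9
    "mixer": 0.8,
    "high_risk_exchange": 0.5,
    "gambling": 0.3,
    "clean_exchange": 0.0,   # positive-signal-only: never raises the score
}


def total_score(category_scores: dict[str, float]) -> float:
    """Weighted maximum of 0-100 category scores; 0.0 if there are none."""
    weighted = (
        CATEGORY_WEIGHTS.get(name, 0.0) * score
        for name, score in category_scores.items()
    )
    return max(weighted, default=0.0)


print(total_score({"mixer": 80.0, "gambling": 90.0, "clean_exchange": 100.0}))
```

Here the mixer signal (0.8 × 80 = 64) outweighs the stronger but lower-severity gambling signal (0.3 × 90 = 27). The clean-exchange score contributes nothing.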

Calibration. Every month, we draw a statistically representative sample of recent scores and review them against ground truth (confirmed attributions, regulatory actions, community intelligence). Where we observe systematic over- or under-scoring, we tune the inputs. We publish the calibration report to enterprise customers each month.
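One simple way to look for systematic over- or under-scoring is a reliability-style table. This is a swapped-in illustrative technique, not a description of our internal process. It groups sampled scores into bands and reports, for each band, the share of addresses that ground truth confirmed as risky. The 20-point band width and the `(score, confirmed)` sample format are hypothetical.

```python
# Hypothetical calibration check: group sampled scores into bands and compute
# the fraction confirmed risky by ground truth in each band.
# Band width and the (score, confirmed) input format are assumptions.
from collections import defaultdict


def calibration_table(samples: list[tuple[float, bool]], band_width: int = 20) -> dict[int, float]:
    """Map each band's lower bound to the fraction of samples confirmed risky."""
    counts: dict[int, list[int]] = defaultdict(lambda: [0, 0])  # band -> [confirmed, total]
    for score, confirmed in samples:
        # Fold a score of exactly 100 into the top band.
        band = min(int(score) // band_width * band_width, 100 - band_width)
        counts[band][0] += int(confirmed)
        counts[band][1] += 1
    return {band: c / t for band, (c, t) in sorted(counts.items())}


sample = [(5.0, False), (12.0, False), (45.0, True), (55.0, False), (92.0, True), (98.0, True)]
print(calibration_table(sample))
```

If a high band's confirmed rate is consistently low, that would point to over-scoring. If a low band's rate is high, that would point to under-scoring.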

What the score is not. It is not a prediction of criminal activity. It is not a legal determination. It is not a substitute for human review on high-stakes decisions. It is a signal — a high-density, rapidly updated signal — that compliance professionals can use to triage and prioritise.

If you want a deeper technical description — including the exact formulas, weight tables, and calibration curves — we publish a methodology whitepaper twice a year. The April 2026 edition is available to enterprise customers from the research portal; a summary version is linked from the footer of this blog.
