Bitcoin clustering — the process of grouping addresses that are probably controlled by the same entity — has been a foundational technique in blockchain analysis for over a decade. In 2026, it is also getting meaningfully harder. This is a short guide to why, and what we're doing about it.

The original Bitcoin clustering heuristics were simple: co-spend, also known as the common-input-ownership heuristic (all inputs to one transaction are assumed to belong to the same owner), and change detection (identifying which output returns change to the sender). These techniques were devastating for casual users who did nothing to obscure ownership. They gave rise to a generation of analytics firms that could trace most Bitcoin activity back to a handful of exchanges.
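A minimal sketch of the two classic heuristics, assuming a toy data model in which each transaction is a pair of (input addresses, output addresses) processed in block order. Co-spend merges all inputs with a union-find structure; change detection uses a simplified "fresh address" rule (if a transaction has several outputs and exactly one output address has never been seen before, treat it as change). That rule, the `cluster` function and the data model are illustrative assumptions, not how any particular analytics product works.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest with path halving and union by size."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        # Register unseen addresses as singleton clusters.
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def cluster(transactions):
    """transactions: list of (input_addresses, output_addresses) in block order."""
    uf = UnionFind()
    seen = set()
    for inputs, outputs in transactions:
        for addr in list(inputs) + list(outputs):
            uf.find(addr)
        # Co-spend: assume every input of one transaction has the same owner.
        for addr in inputs[1:]:
            uf.union(inputs[0], addr)
        # Change detection (simplified): with several outputs, a single
        # never-before-seen output address is assumed to be the sender's change.
        fresh = [a for a in outputs if a not in seen]
        if inputs and len(outputs) > 1 and len(fresh) == 1:
            uf.union(inputs[0], fresh[0])
        seen.update(inputs)
        seen.update(outputs)
    groups = defaultdict(set)
    for addr in list(uf.parent):
        groups[uf.find(addr)].add(addr)
    return sorted(sorted(g) for g in groups.values())
```

For example, `cluster([(["A"], ["M"]), (["A", "C"], ["M", "D"]), (["E"], ["F"])])` groups A and C by co-spend and adds D as presumed change, leaving E, F and M as singletons.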

CoinJoin undercut both heuristics. In a CoinJoin, multiple parties contribute inputs to a single transaction that pays out equal-denomination outputs: co-spend no longer implies common ownership, and change detection becomes unreliable. Although CoinJoin adoption peaked in 2023 and has declined under enforcement pressure on Wasabi and Samourai, the absolute volume remains non-trivial, and the residual ambiguity contaminates any cluster that touched a CoinJoin at any point in its history.
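Because co-spend breaks on CoinJoins, a clustering pass needs to recognise them and skip the merge. A minimal sketch, assuming only the structural signal described above (several inputs plus a run of equal-value outputs, with values in satoshis); the thresholds and function names are hypothetical, and a production detector would likely also check coordinator-specific denominations and fee patterns. Batched exchange payouts with equal amounts can trip the same rule, so this is a filter, not a proof.

```python
from collections import Counter


def looks_like_coinjoin(input_count, output_values_sat, min_equal_outputs=5, min_inputs=2):
    """Flag transactions with several inputs and many equal-value outputs.

    Both thresholds are illustrative, not calibrated values.
    """
    if input_count < min_inputs or not output_values_sat:
        return False
    _, most_common_count = Counter(output_values_sat).most_common(1)[0]
    return most_common_count >= min_equal_outputs


def cospend_pairs(input_addresses, output_values_sat):
    """Return the address pairs co-spend would merge, or none for a likely CoinJoin."""
    if looks_like_coinjoin(len(input_addresses), output_values_sat):
        return []
    return [(input_addresses[0], a) for a in input_addresses[1:]]
```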

The Lightning Network further complicates the picture. Payments across Lightning do not produce on-chain transactions at all; only channel openings and closings are visible on-chain. For any Bitcoin wallet active on Lightning, observable on-chain activity is a dramatically reduced sample of its actual activity.

Taproot, activated in November 2021, has been a slow-burn disruptor. As adoption grows, outputs that would previously have been distinguishable (multisig vs single-sig, various script types) collapse into a single output type, and a key-path spend reveals only one Schnorr signature regardless of the policy behind it. Script-path spends still reveal the executed script leaf, so the collapse is not total. By 2027, we expect Taproot to be the dominant script type and conventional script-based fingerprinting to be largely unusable.
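To see what Taproot removes, consider output-level fingerprinting by scriptPubKey template. The byte patterns below are the standard templates for P2PKH, P2SH, P2WPKH, P2WSH and P2TR; the `script_type` function itself is a minimal illustrative sketch. A P2WSH output already tells an analyst "this is a script, often multisig", while every P2TR output is `OP_1` followed by a 32-byte key, whatever spending policy sits behind it.

```python
def script_type(spk_hex: str) -> str:
    """Classify a scriptPubKey (hex) by its standard template."""
    spk = bytes.fromhex(spk_hex)
    # OP_DUP OP_HASH160 <20-byte hash> OP_EQUALVERIFY OP_CHECKSIG
    if len(spk) == 25 and spk[:3] == b"\x76\xa9\x14" and spk[23:] == b"\x88\xac":
        return "p2pkh"
    # OP_HASH160 <20-byte hash> OP_EQUAL
    if len(spk) == 23 and spk[:2] == b"\xa9\x14" and spk[22:] == b"\x87":
        return "p2sh"
    # OP_0 <20-byte key hash>: single-key segwit v0
    if len(spk) == 22 and spk[:2] == b"\x00\x14":
        return "p2wpkh"
    # OP_0 <32-byte script hash>: segwit v0 script (often multisig)
    if len(spk) == 34 and spk[:2] == b"\x00\x20":
        return "p2wsh"
    # OP_1 <32-byte x-only key>: Taproot, same shape for any spending policy
    if len(spk) == 34 and spk[:2] == b"\x51\x20":
        return "p2tr"
    return "other"
```

Once most outputs classify as `p2tr`, this feature carries little information about the wallet behind them.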

What still works? Behavioural clustering. Instead of relying on transaction-structure heuristics, behavioural clustering looks at transaction timing, fee preferences, wallet software fingerprints (such as nVersion, nLockTime and input ordering), receiving-address reuse patterns, and off-chain intelligence signals. Behavioural clustering is probabilistic rather than deterministic, but its signal density is high: each wallet exposes many weak signals that add up.
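A minimal sketch of behavioural comparison, assuming each wallet's history is reduced to a list of (Unix timestamp, fee rate in sat/vB, whether the receiving address was reused) tuples. The feature set, its scaling and the use of cosine similarity are illustrative assumptions; a real system would use many more signals and calibrate the output into a probability rather than report a raw similarity.

```python
import math
from statistics import median


def behaviour_vector(txs):
    """txs: non-empty list of (unix_ts: int, fee_rate_sat_vb, reused_address) tuples."""
    total = len(txs)
    # Share of activity in each UTC hour of the day.
    hours = [0.0] * 24
    for ts, _, _ in txs:
        hours[(ts // 3600) % 24] += 1 / total
    # log1p damps very high fee rates; dividing by 10 is an arbitrary scale.
    fee = math.log1p(median([f for _, f, _ in txs])) / 10
    reuse = sum(1 for _, _, reused in txs if reused) / total
    return hours + [fee, reuse]


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

Two wallets that both transact around 09:00 UTC at similar low fee rates score close to 1; a wallet active at 21:00 UTC with high fees and heavy address reuse scores far lower against either.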

Layered attribution. We combine on-chain heuristics with off-chain intelligence: exchange deposit addresses disclosed to us, darknet market seizures with on-chain artefacts, academic research on known attacks. No single source is authoritative; the combination is more robust than any component.
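A noisy-OR is one standard rule for fusing independent probabilistic signals, sketched below to illustrate why a combination can be more robust than any single component; it is not necessarily the model any given attribution pipeline uses. The source names and reliability discounts are hypothetical, and the independence assumption matters: correlated sources (say, two feeds derived from the same seizure) would overstate confidence under this rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    source: str         # e.g. "cospend", "exchange_disclosure" (hypothetical labels)
    probability: float  # the source's own estimate that two addresses share an owner


# Hypothetical reliability discounts per source; real values would be calibrated.
SOURCE_RELIABILITY = {
    "cospend": 0.9,
    "behavioural": 0.6,
    "exchange_disclosure": 0.95,
    "seizure": 0.85,
}
DEFAULT_RELIABILITY = 0.5  # assumption for sources with no calibration yet


def combine(evidence):
    """Noisy-OR fusion: the link is missed only if every source misses it.

    Assumes the sources are independent of one another.
    """
    p_all_miss = 1.0
    for e in evidence:
        p = e.probability * SOURCE_RELIABILITY.get(e.source, DEFAULT_RELIABILITY)
        p_all_miss *= 1.0 - p
    return 1.0 - p_all_miss
```

Adding a moderately confident behavioural signal to a co-spend link raises the combined score from 0.9 to 0.93, and no additional source can lower it.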

Transaction graph embeddings. In the past eighteen months, we've invested in graph neural networks that learn embeddings from the transaction graph; we then cluster by distance in that embedding space. This approach captures subtle co-occurrence patterns that heuristic clustering misses, at the cost of interpretability.
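The clustering step that follows embedding can be sketched independently of the network that produces the vectors. The snippet below assumes embeddings are already available as a mapping from address to vector (how a graph neural network produces them is out of scope here) and links any pair whose cosine distance falls below a hypothetical `max_distance` threshold, which amounts to single-linkage clustering. This is one simple choice among several, not a description of a production model.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


def cluster_embeddings(embeddings, max_distance=0.1):
    """Single-linkage clustering: link any pair closer than max_distance."""
    parent = {k: k for k in embeddings}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # O(n^2) pairwise scan; fine for a sketch, not for a full chain.
    for a, b in combinations(embeddings, 2):
        if cosine_distance(embeddings[a], embeddings[b]) < max_distance:
            parent[find(a)] = find(b)

    groups = {}
    for k in embeddings:
        groups.setdefault(find(k), []).append(k)
    return sorted(sorted(g) for g in groups.values())
```

Single linkage is easy to reason about but can chain: if A is close to B and B is close to C, A and C end up together even when they are far apart, which is one reason a cluster membership alone says little without a confidence attached.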

For compliance teams, the implication is that cluster confidence scores matter more than cluster identity. A "this address is in the same cluster as X" finding should be accompanied by "we are 92% confident of that" or similar. Our API has returned these scores since Q3 2025, and we recommend that compliance workflows surface them in the analyst UI rather than treating attribution as a binary.
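A minimal sketch of what surfacing confidence might look like on the consuming side, assuming an attribution record with an address, a cluster label and a score between 0 and 1. The `Attribution` type, its field names and the band cut-offs are hypothetical illustrations, not the API's actual response schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribution:
    address: str
    cluster_label: str
    confidence: float  # 0.0 to 1.0

    def __post_init__(self):
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")


# Hypothetical bands, highest first; set these from your own risk policy.
BANDS = [(0.9, "high"), (0.7, "medium"), (0.0, "low")]


def describe(attr: Attribution) -> str:
    """Render an attribution with its score instead of as a yes/no fact."""
    band = next(name for floor, name in BANDS if attr.confidence >= floor)
    return (f"{attr.address} in cluster {attr.cluster_label}: "
            f"{attr.confidence:.0%} ({band} confidence)")
```

Showing the band next to the raw percentage lets analysts triage quickly without hiding the underlying score.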

Bitcoin will continue to be both the most studied and the most stubborn chain for attribution. The good news is that the dominant use of Bitcoin for value transfer has shifted toward Lightning and institutional custody, reducing the on-chain signal surface but also reducing the attribution pressure. The bad news is that what remains on-chain is increasingly the high-effort, privacy-conscious activity, which is also where compliance attention is needed most.