Structural variation in 1,019 diverse humans based on long-read sequencing | LongRead Sequencing wiki

Source

PDF: raw/sources/schloissnig_2025_structural_variation_in_1019_diverse.pdf
Status: deep ingested on 2026-04-07
Scope: globally diverse long-read population resource for SVs and VNTRs, built at intermediate coverage

Samples: 1,019 individuals from 26 populations linked to the 1000 Genomes Project
Platform: intermediate-coverage long-read sequencing with integrated linear- and graph-genome analyses
Aim: generate a diverse population SV resource and test whether it improves downstream variant prioritization

This paper is the most globally representative population resource in the corpus.
It shows that intermediate-coverage long-read sequencing can still produce a highly useful SV atlas when paired with strong graph-aware analysis.
Clinically, the key result is that filtering against the resource substantially shortens rare-disease candidate SV lists while retaining most known causal events.

The study reports over 100,000 sequence-resolved biallelic SVs and 300,000 multiallelic VNTRs.
Many insertions, especially mobile element insertions, were not represented in prior large resources such as gnomAD.
When used for rare-disease filtering, the resource cut candidate SV lists by roughly 55% while filtering out only 2 of 35 previously validated causal SVs (33 of 35, about 94%, retained) in the authors' evaluation.
The paper also highlights graph-based analysis as a major enabler for population-scale long-read interpretation.
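The filtering figures above can be made concrete with a minimal sketch. It is illustrative only and is not the authors' pipeline: the CandidateSV type, the resource_af field, the 1% max_af cutoff, and the toy candidates are all hypothetical, and this note does not record how the paper matches patient SVs to the resource or which frequency threshold it applies. The sketch only shows the general idea of dropping candidates that are common in a population resource, then measuring list reduction and causal-SV retention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateSV:
    """Hypothetical record for one candidate SV in a patient genome."""
    sv_id: str
    # Allele frequency of the matching SV in the population resource.
    # 0.0 means no match was found (assumed convention for this sketch).
    resource_af: float


def filter_candidates(candidates, max_af=0.01):
    """Keep candidates that are absent from, or rare in, the resource.

    max_af is an assumed cutoff, not a value taken from the paper.
    """
    return [sv for sv in candidates if sv.resource_af <= max_af]


def reduction_fraction(n_before, n_after):
    """Fraction of the candidate list removed by filtering."""
    if n_before <= 0:
        raise ValueError("n_before must be positive")
    return 1 - n_after / n_before


def retention_fraction(n_causal, n_filtered_out):
    """Fraction of known causal SVs that survive filtering."""
    if n_causal <= 0:
        raise ValueError("n_causal must be positive")
    return (n_causal - n_filtered_out) / n_causal


# Toy candidate list (made-up values, for illustration only).
candidates = [
    CandidateSV("sv1", 0.0),    # absent from the resource: kept
    CandidateSV("sv2", 0.25),   # common: removed
    CandidateSV("sv3", 0.004),  # rare: kept
    CandidateSV("sv4", 0.12),   # common: removed
]
kept = filter_candidates(candidates)

print([sv.sv_id for sv in kept])
print(reduction_fraction(len(candidates), len(kept)))
# Retention for the figures reported in the note: 2 of 35 causal SVs filtered out.
print(round(retention_fraction(35, 2), 3))
```

With the numbers from the note, retention_fraction(35, 2) evaluates to 33/35, so a list reduction of about 55% comes at the cost of losing roughly 6% of the validated causal SVs in that evaluation.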

Strong population diversity relative to many earlier long-read atlas papers.
Connects resource building to practical patient-genome filtering.
Clear discussion of where long-read graph-based approaches already work and where they still struggle.

Extremely rare SVs, large inversions, centromeres, high-identity segmental duplications, and multiallelic VNTRs remain difficult.
Intermediate coverage is a practical compromise, not a complete replacement for deeper multi-platform assemblies.
The study is strongest on SVs and related repetitive variation, not on full-spectrum clinical WGS.

This is the strongest paper in the corpus for the idea that long-read population resources can directly improve patient-level filtering.
It complements Otsuki and Gong by adding much broader ancestry coverage.

How much better do candidate-filtering resources become when intermediate-coverage long reads are paired with deeper assemblies in representative subsets?
What is the best way to combine graph-based population resources with clinical interpretation pipelines?