Long-read sequencing of 945 Han individuals identifies structural variants associated with phenotypic diversity and disease susceptibility | LongRead Sequencing wiki

Source

PDF: raw/sources/gong_2025_long-read_sequencing_of_945_han.pdf
Status: deep ingested on 2026-04-07
Scope: large Han Chinese long-read SV atlas with functional follow-up into phenotype and disease relevance

Samples: 945 Han Chinese genomes
Platform: long-read sequencing with downstream population, multi-omics, and functional analyses
Aim: build a large ancestry-specific SV resource and test whether specific SVs have causal phenotypic effects

This paper moves beyond atlas-building into trait interpretation.
It reports 111,288 SVs, with 24.56% not previously reported, and uses phenotypic, multi-omics, and mouse-model follow-up to argue that selected SVs are causal rather than merely associated.
It is especially notable for connecting population-scale long-read discovery to biomedical hypotheses such as cisplatin-induced acute kidney injury risk.
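As a quick sanity check on the headline numbers, the reported novel fraction can be turned into an approximate count. This is a minimal sketch that uses only the two figures quoted above (111,288 SVs and 24.56% not previously reported). The function name `approx_novel_count` is a hypothetical helper, not something from the paper. Because the percentage is rounded to two decimals, the result is an estimate and the paper's exact count may differ slightly.

```python
# Figures as quoted in this note; the percentage is a rounded value.
TOTAL_SVS = 111_288
NOVEL_FRACTION = 0.2456  # 24.56% not previously reported


def approx_novel_count(total: int, fraction: float) -> int:
    """Estimate the number of novel SVs by rounding total * fraction.

    Hypothetical helper for illustration; it only restates the arithmetic
    implied by the summary and is not the authors' method.
    """
    return round(total * fraction)


novel = approx_novel_count(TOTAL_SVS, NOVEL_FRACTION)
known = TOTAL_SVS - novel

print(f"approx. novel SVs: {novel:,}")
print(f"approx. previously reported SVs: {known:,}")
```

Under these assumptions, roughly 27,000 of the SVs would be novel and roughly 84,000 would overlap earlier resources.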

The cohort yielded 111,288 SVs, roughly a quarter of them not previously reported and many predicted to have functional relevance.
The study highlights a GSDMD SV linked to bone mineral density and a modern-human-specific WWP2 SV affecting multiple anthropometric, craniofacial, and immune phenotypes.
The authors propose the GSDMD SV as a potential biomarker for cisplatin-induced acute kidney injury risk.
Positive-selection signals and mouse-model experiments push the paper beyond cataloging into mechanistic interpretation.

Large cohort (945 genomes) for long-read population sequencing.
Functional follow-up distinguishes this resource paper from a pure discovery inventory.
Strong example of how ancestry-focused long-read datasets can surface variants underrepresented in broader short-read resources.

Findings are centered on one ancestry group, so direct transfer to other populations is incomplete.
The paper focuses on SV biology and selected functional exemplars rather than general clinical deployment questions.
Population associations and follow-up are compelling, but not every highlighted SV will generalize equally across contexts.

This is one of the main corpus papers for the argument that long-read WGS should help build population-specific SV resources rather than rely only on short-read-era databases.
It pairs well with Otsuki and Schloissnig as part of the corpus's atlas-and-filtering theme.

How many of these phenotype-linked SVs remain informative across non-Han populations?
What cohort size is needed before ancestry-specific SV resources become routine inputs for clinical filtering?