Source
- PDF: raw/sources/gong_2025_long-read_sequencing_of_945_han.pdf
- Status: deep ingested on 2026-04-07
- Scope: large Han Chinese long-read structural variant (SV) atlas with functional follow-up on phenotype and disease relevance
Study design
- Samples: 945 Han Chinese genomes
- Platform: long-read sequencing with downstream population, multi-omics, and functional analyses
- Aim: build a large ancestry-specific SV resource and test whether specific SVs have causal phenotypic effects
Summary
- This paper moves beyond atlas-building into trait interpretation.
- It identifies 111,288 SVs, 24.56% of which had not been previously reported. It then uses phenotypic, multi-omics, and mouse-model follow-up to argue that selected SVs are causal rather than merely associated.
- It is especially notable for connecting population-scale long-read discovery to biomedical hypotheses such as risk of cisplatin-induced acute kidney injury.
Key findings
- The cohort yielded more than 100,000 SVs. Roughly a quarter were novel, and many were predicted to be functionally relevant.
- The study highlights a GSDMD SV linked to bone mineral density and a modern-human-specific WWP2 SV affecting multiple anthropometric, craniofacial, and immune phenotypes.
- The authors propose the GSDMD SV as a potential biomarker for cisplatin-induced acute kidney injury risk.
- Positive-selection signals and mouse-model experiments push the paper beyond cataloging into mechanistic interpretation.
Strengths
- Large cohort (945 genomes) for a long-read population sequencing study.
- Functional follow-up distinguishes this from a pure discovery inventory.
- Strong example of how ancestry-focused long-read datasets can surface variants underrepresented in broader short-read resources.
Limitations and caveats
- Findings are centered on one ancestry group, so they may not transfer directly to other populations.
- The paper focuses on SV biology and selected functional exemplars rather than general clinical deployment questions.
- Population associations and follow-up are compelling, but not every highlighted SV will generalize equally across contexts.
Relevance to this corpus
- This is one of the main corpus papers for the argument that long-read WGS should help build population-specific SV resources rather than rely only on short-read-era databases.
- It pairs well with Otsuki and Schloissnig as part of the corpus's atlas-and-filtering theme.
Related concepts
Open questions
- How many of these phenotype-linked SVs remain informative across non-Han populations?
- What cohort size is needed before ancestry-specific SV resources become routine inputs for clinical filtering?