Metadata

  • title: Long-read sequencing of 945 Han individuals identifies structural variants associated with phenotypic diversity and disease susceptibility
  • kind: paper
  • status: ingested
  • added: 2026-04-07T09:46:12+09:00
  • raw source: raw/sources/gong_2025_long-read_sequencing_of_945_han.pdf
  • deep ingested: 2026-04-07

Source

Study design

  • Samples: 945 Han Chinese genomes
  • Platform: long-read whole-genome sequencing, followed by population-genetic, multi-omics, and functional analyses
  • Aim: build a large ancestry-specific SV resource and test whether specific SVs have causal phenotypic effects

Summary

  • This paper moves beyond atlas-building into trait interpretation.
  • It reports 111,288 SVs, with 24.56% not previously reported, and uses phenotypic, multi-omics, and mouse-model follow-up to argue that selected SVs are causal rather than merely associated.
  • It is especially notable for connecting population-scale long-read discovery to biomedical hypotheses such as the risk of cisplatin-induced acute kidney injury.

Key findings

  • The cohort yielded over one hundred thousand SVs, roughly a quarter of them novel and many predicted to be functionally relevant.
  • The study highlights a GSDMD SV linked to bone mineral density and a modern-human-specific WWP2 SV affecting multiple anthropometric, craniofacial, and immune phenotypes.
  • The authors propose the GSDMD SV as a potential biomarker for cisplatin-induced acute kidney injury risk.
  • Positive-selection signals and mouse-model experiments push the paper beyond cataloging into mechanistic interpretation.

Strengths

  • Large cohort size for long-read population sequencing.
  • Functional follow-up sets the paper apart from a pure discovery inventory.
  • Strong example of how ancestry-focused long-read datasets can surface variants underrepresented in broader short-read resources.

Limitations and caveats

  • Findings are centered on one ancestry group, so how well they transfer to other populations remains largely untested.
  • The paper focuses on SV biology and selected functional exemplars rather than general clinical deployment questions.
  • Population associations and follow-up are compelling, but not every highlighted SV will generalize equally across contexts.

Relevance to this corpus

  • This is one of the main corpus papers for the argument that long-read WGS should help build population-specific SV resources rather than rely only on short-read-era databases.
  • It pairs well with Otsuki and Schloissnig as part of the corpus's atlas-and-filtering theme.
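The atlas-and-filtering argument rests on one routine operation: checking whether a candidate SV from an individual matches an SV that is common in a population catalog, so that it can be deprioritized. The sketch below illustrates that step under stated assumptions; it is not the paper's pipeline. The 50% reciprocal-overlap rule and the 1% allele-frequency cutoff are illustrative choices, and `SV`, `reciprocal_overlap`, and `filter_common` are hypothetical names. It handles interval-type SVs (deletions, duplications, inversions) only; insertions would need breakpoint-distance matching instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    chrom: str
    start: int       # 0-based, inclusive
    end: int         # exclusive
    svtype: str      # e.g. "DEL", "DUP", "INV"
    af: float = 0.0  # population allele frequency (meaningful for catalog entries)


def reciprocal_overlap(a: SV, b: SV) -> float:
    """Overlap length divided by the longer SV length.

    RO >= f holds exactly when the overlap covers at least f of BOTH SVs.
    """
    if a.chrom != b.chrom:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return overlap / max(a.end - a.start, b.end - b.start)


def filter_common(candidates, catalog, min_ro=0.5, max_af=0.01):
    """Keep candidates with no same-type catalog match above max_af.

    Assumption: a linear scan for clarity; real tools index the catalog by interval.
    """
    kept = []
    for sv in candidates:
        is_common = any(
            c.svtype == sv.svtype
            and c.af > max_af
            and reciprocal_overlap(sv, c) >= min_ro
            for c in catalog
        )
        if not is_common:
            kept.append(sv)
    return kept


if __name__ == "__main__":
    # Toy catalog and candidates; coordinates and frequencies are made up.
    catalog = [
        SV("chr1", 1000, 2000, "DEL", af=0.12),   # common deletion
        SV("chr2", 5000, 5600, "DUP", af=0.002),  # rare duplication
    ]
    candidates = [
        SV("chr1", 1100, 2050, "DEL"),  # overlaps the common DEL -> filtered out
        SV("chr2", 5000, 5500, "DUP"),  # matches only a rare entry -> kept
        SV("chr3", 10, 500, "DEL"),     # no catalog match -> kept
    ]
    kept = filter_common(candidates, catalog)
    print([(s.chrom, s.start) for s in kept])
```

The point the corpus makes maps onto the `catalog` argument: if the catalog comes from a different ancestry, SVs that are common in Han individuals can be missing from it and would survive the filter as apparently rare.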

Open questions

  • How many of these phenotype-linked SVs remain informative across non-Han populations?
  • What cohort size is needed before ancestry-specific SV resources become routine inputs for clinical filtering?