Metadata

title: Long Read WGS Starter Corpus
status: active
created: 2026-04-07T09:48:00+09:00

Scope

This corpus contains 10 deeply ingested papers on long-read whole-genome sequencing (WGS), covering human structural variation (SV), clinical diagnostics, large-cohort benchmarking, rare disease resolution, and population-scale studies.

Coverage Map

Foundational SV and benchmarking

Population-scale SV resources

Clinical and rare disease resolution

Frontier application

Reading Order

  1. Chaisson 2019
  2. Sanford Kobayashi 2022
  3. Mahmoud 2024
  4. Kosugi 2024
  5. Showpnil 2024
  6. Sinha 2025
  7. Otsuki 2022
  8. Gong 2025
  9. Schloissnig 2025
  10. Hard 2023

This order builds core concepts first, then clinical utility, then larger cohort studies, and ends with the more specialized single-cell application.

Cross-paper Claims

  • Long-read WGS has its strongest and most stable advantage in structural variation, especially insertions, repeat-associated events, and rearrangement interpretation.
  • The clinical case for long reads is strongest in rare disease when short-read dead zones, hard-to-map genes, phasing, methylation, or complex SVs are plausible mechanisms.
  • Population-scale long-read SV resources are now useful for filtering patient genomes, not only for discovery papers.
  • PacBio HiFi currently has the strongest overall accuracy signal in this corpus, while Oxford Nanopore (ONT) is well represented in methylation-aware and scalable workflows.
  • Short-read WGS remains competitive for many SNVs and some deletions outside repetitive regions, so universal first-line use of long reads is still an open question rather than a settled conclusion.
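
The claim above about using population-scale SV resources to filter patient genomes can be illustrated with a minimal Python sketch. It is not taken from any paper in this corpus. The `SV` record, the `filter_rare` helper, the 1% allele-frequency cutoff, and the 50% reciprocal-overlap threshold are all hypothetical choices made for illustration. The sketch only handles interval-like events such as deletions; insertions have no reference span and would need position-plus-length matching instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    """Hypothetical minimal SV record (not a real VCF model)."""
    chrom: str
    start: int        # 0-based, inclusive
    end: int          # exclusive
    svtype: str       # e.g. "DEL", "DUP"
    af: float = 0.0   # population allele frequency; only meaningful for catalog entries


def reciprocal_overlap(a: SV, b: SV) -> float:
    """Overlap length divided by the longer interval (0.0 if disjoint)."""
    if a.chrom != b.chrom:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return overlap / max(a.end - a.start, b.end - b.start)


def filter_rare(patient, catalog, max_af=0.01, min_ro=0.5):
    """Keep patient SVs with no common same-type match in the catalog.

    max_af and min_ro are assumed thresholds, not values from the corpus.
    """
    kept = []
    for call in patient:
        is_common = any(
            ref.svtype == call.svtype
            and ref.af > max_af
            and reciprocal_overlap(call, ref) >= min_ro
            for ref in catalog
        )
        if not is_common:
            kept.append(call)
    return kept


# Toy data, invented for this example only.
catalog = [
    SV("chr1", 1000, 2000, "DEL", af=0.12),   # common deletion
    SV("chr2", 500, 900, "DEL", af=0.001),    # rare deletion
]
patient = [
    SV("chr1", 1050, 2010, "DEL"),  # matches the common deletion, so it is dropped
    SV("chr2", 500, 900, "DEL"),    # matches only a rare catalog entry, so it is kept
    SV("chr3", 100, 400, "DEL"),    # absent from the catalog, so it is kept
]
rare = filter_rare(patient, catalog)
```

The linear scan over the catalog keeps the sketch short; a real cohort-scale resource would likely need an interval index and ancestry-aware frequencies, which ties into the transferability question listed under Questions To Drive Next Work.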

Main Tensions

  • targeted long-read escalation versus one unified long-read clinical workflow
  • benchmark-style maximal sensitivity versus scalable intermediate-coverage cohorts
  • HiFi accuracy advantages versus ONT workflow flexibility and native methylation support

Concept Entry Points

Questions To Drive Next Work

  • Which variant classes benefit most from long-read WGS over short-read WGS?
  • Where do HiFi and ONT differ in accuracy, coverage, and clinical utility?
  • What evidence already exists for diagnostic uplift in rare disease cohorts?
  • How much incremental value comes from long-read data in medically relevant or hard-to-map regions?
  • What population-scale SV resources now exist, and how transferable are they across ancestries?