Long Read WGS Starter Corpus | LongRead Sequencing wiki

Scope

This corpus contains 10 deeply ingested papers on long-read whole-genome sequencing across human structural variation, clinical diagnostics, large-cohort benchmarking, rare disease resolution, and population-scale studies.

Coverage Map

Foundational SV and benchmarking

Multi-platform discovery of haplotype-resolved structural variation in human genomes
Approaches to long-read sequencing in a clinical setting to improve diagnostic rate
Utility of long-read sequencing for All of Us
Comparative evaluation of SNVs, indels, and structural variations detected with short- and long-read sequencing data

Population-scale SV resources

Clinical and rare disease resolution

Frontier application

Long-read whole-genome analysis of human single cells

Reading Order

Chaisson 2019
Sanford Kobayashi 2022
Mahmoud 2024
Kosugi 2024
Showpnil 2024
Sinha 2025
Otsuki 2022
Gong 2025
Schloissnig 2025
Hard 2023

This order builds core concepts first, then clinical utility, then larger cohort studies, and finally the more specialized single-cell application.

Cross-paper Claims

Long-read WGS has its strongest and most stable advantage in structural variation, especially insertions, repeat-associated events, and rearrangement interpretation.
The clinical case for long reads is strongest in rare disease when short-read dead zones, hard-to-map genes, phasing, methylation, or complex SVs are plausible mechanisms.
Population-scale long-read SV resources are now useful for filtering patient genomes, not only for discovery papers.
HiFi currently has the strongest overall accuracy signal in this corpus, while ONT has strong representation for methylation-aware and scalable workflows.
Short-read WGS remains competitive for many SNVs and some deletions outside repetitive regions, so universal first-line use of long reads is still an open decision rather than a settled conclusion.
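
The claim about using population-scale SV resources to filter patient genomes can be made concrete with a small sketch. This is an illustration, not a method taken from any paper in the corpus: the `SV` record, the function names, the 1% allele-frequency cutoff, and the 50% reciprocal-overlap threshold are all hypothetical choices. It only handles interval-type events such as deletions and duplications; insertions have no reference span in this representation and would need position-based matching instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    """Hypothetical minimal SV record (not a real library type)."""
    chrom: str
    start: int
    end: int
    svtype: str
    af: float = 0.0  # population allele frequency; left at 0.0 for patient calls


def reciprocal_overlap(a: SV, b: SV) -> float:
    """Shared span divided by the longer of the two intervals."""
    shared = min(a.end, b.end) - max(a.start, b.start)
    longest = max(a.end - a.start, b.end - b.start)
    return max(shared, 0) / longest if longest > 0 else 0.0


def is_common(call: SV, population: list[SV],
              min_af: float = 0.01, min_overlap: float = 0.5) -> bool:
    """True if a same-type population SV with AF >= min_af matches the call."""
    return any(
        p.chrom == call.chrom
        and p.svtype == call.svtype
        and p.af >= min_af
        and reciprocal_overlap(call, p) >= min_overlap
        for p in population
    )


def filter_rare(calls: list[SV], population: list[SV], **thresholds) -> list[SV]:
    """Keep only patient calls that do not match a common population SV."""
    return [c for c in calls if not is_common(c, population, **thresholds)]


# Toy data, invented for illustration only.
population = [
    SV("chr1", 1000, 2000, "DEL", af=0.12),   # common deletion
    SV("chr2", 5000, 5600, "DUP", af=0.001),  # rare duplication
]
patient = [
    SV("chr1", 1050, 1980, "DEL"),  # matches the common deletion
    SV("chr2", 5000, 5600, "DUP"),  # matches only a rare population SV
    SV("chr3", 100, 900, "DEL"),    # absent from the population resource
]
rare = filter_rare(patient, population)
```

With this toy data, the chr1 deletion is dropped as common, while the chr2 duplication and the chr3 deletion are retained for review. A linear scan is used for clarity; a real cohort-scale resource would call for an interval index.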

Main Tensions

targeted long-read escalation versus one unified long-read clinical workflow
benchmark-style maximal sensitivity versus scalable intermediate-coverage cohorts
HiFi accuracy advantages versus ONT workflow flexibility and native methylation support

Concept Entry Points

Questions To Drive Next Work

Which variant classes benefit most from long-read WGS over short-read WGS?
Where do HiFi and ONT differ in accuracy, coverage, and clinical utility?
What evidence already exists for diagnostic uplift in rare disease cohorts?
How much incremental value comes from long-read data in medically relevant or hard-to-map regions?
What population-scale SV resources now exist, and how transferable are they across ancestries?

Metadata