Utility of long-read sequencing for All of Us | LongRead Sequencing wiki

Source

PDF: raw/sources/mahmoud_2024_utility_of_long-read_sequencing_for.pdf
Status: deep ingested on 2026-04-07
Scope: pilot comparison of short-read and long-read sequencing strategies for a large national biobank context

Samples: HapMap samples and All of Us control samples spanning eight datasets
Platform: comparison across short-read and long-read technologies, including PacBio HiFi and ONT, with scalable cloud pipelines
Aim: determine which long-read approaches are most useful for medically relevant genes, small variants, and SVs in a large-cohort program

This paper asks a deployment question: if a program the size of All of Us adopts long reads, what does it gain?
The answer is that long reads materially improve coverage and variant recovery in medically relevant, technically challenging genes, and that HiFi gives the strongest overall variant-calling accuracy in this pilot.
The paper also contributes operational infrastructure by releasing a cloud-based long-read analysis pipeline.

HiFi produced the most accurate results in the study for both small and large variants.
Long reads improved sequencing and variant calling in challenging medically relevant genes, including complex loci where short reads remain weaker.
The authors show that machine-learning-based callers and merged calling strategies can improve long-read small-variant performance.
The paper frames low-coverage long-read sequencing as a potential strategy for scaling sample counts, but treats that as a tradeoff rather than a solved recipe.
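
As an illustration of the merged-calling finding above, here is a minimal sketch of combining small-variant call sets from two callers and scoring them against a truth set. This note does not describe the paper's actual merge procedure, so the support-count rule, the caller names (`caller_a`, `caller_b`), and all variants below are hypothetical toy data chosen only to show the precision/recall tradeoff between a union and a consensus merge.

```python
from collections import Counter

# A variant is keyed by (chrom, pos, ref, alt). All data below is hypothetical.
Variant = tuple[str, int, str, str]


def merge_calls(callsets: dict[str, set[Variant]], min_support: int = 1) -> set[Variant]:
    """Keep variants reported by at least `min_support` callers.

    min_support=1 gives the union; min_support=len(callsets) gives the intersection.
    """
    support = Counter(v for calls in callsets.values() for v in calls)
    return {v for v, n in support.items() if n >= min_support}


def precision_recall(calls: set[Variant], truth: set[Variant]) -> tuple[float, float]:
    """Exact-match precision and recall against a truth set."""
    true_positives = len(calls & truth)
    precision = true_positives / len(calls) if calls else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Toy truth set and two toy caller outputs (assumed, not from the paper).
truth = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "A")}
callsets = {
    "caller_a": {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr1", 300, "T", "C")},
    "caller_b": {("chr1", 100, "A", "G"), ("chr2", 50, "G", "A")},
}

union = merge_calls(callsets, min_support=1)
consensus = merge_calls(callsets, min_support=2)

for name, calls in (("union", union), ("consensus", consensus)):
    p, r = precision_recall(calls, truth)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

In this toy case the union recovers more true variants at the cost of a false positive, while the consensus is cleaner but misses calls. Real benchmarking would use normalized VCF records and a dedicated comparison tool rather than exact tuple matching.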

Strong translational relevance to biobank-scale decision-making.
Focuses on clinically important gene sets rather than only genome-wide aggregate metrics.
Provides practical pipeline infrastructure instead of only benchmark figures.

Pilot scale: the paper does not settle how long-read sequencing should be rolled out across a cohort of one million people.
Technology conclusions are sensitive to rapidly changing chemistry and caller performance.
The paper is richer on comparative performance than on end-to-end clinical outcomes.

This paper sharpens the corpus-wide platform question.
It supports broad value for long reads, but its strongest claims are in hard medically relevant regions rather than in every part of the genome equally.

At what coverage and cost point do long reads become justified for national-scale screening programs?
How durable is the HiFi-over-ONT advantage as ONT chemistry and duplex workflows improve?
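
The coverage-and-cost question can be framed with back-of-envelope arithmetic: for a fixed sequencing yield, the number of samples scales inversely with per-sample coverage. The sketch below is only that framing. The genome size is an approximation, and the yield and cost-per-gigabase figures are hypothetical placeholders, not values from the paper.

```python
# Approximate human genome size in gigabases (rounded; an assumption of this sketch).
GENOME_SIZE_GB = 3.1


def samples_per_yield(total_yield_gb: float, coverage: float,
                      genome_size_gb: float = GENOME_SIZE_GB) -> int:
    """Whole samples that fit in a fixed sequencing yield at a target mean coverage."""
    per_sample_gb = coverage * genome_size_gb
    return int(total_yield_gb // per_sample_gb)


def cost_per_sample(cost_per_gb: float, coverage: float,
                    genome_size_gb: float = GENOME_SIZE_GB) -> float:
    """Sequencing-only cost per sample; ignores library prep, compute, and storage."""
    return cost_per_gb * coverage * genome_size_gb


# Hypothetical inputs, for illustration only.
TOTAL_YIELD_GB = 93_000.0
COST_PER_GB = 10.0

for cov in (8, 15, 30):
    n = samples_per_yield(TOTAL_YIELD_GB, cov)
    c = cost_per_sample(COST_PER_GB, cov)
    print(f"{cov}x coverage: {n} samples, {c:.0f} cost units per sample")
```

This captures only the sample-count side of the tradeoff. The accuracy lost at lower coverage, which the paper treats as the other side, would have to come from benchmark data.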