Two decades after the Human Genome Project and Celera Genomics published the first draft sequences of the human genome, researchers have finished the roughly 8% that remained unresolved, thanks in large part to advances in HiFi long-read sequencing. The work comes from the Telomere-to-Telomere (T2T) Consortium, a global collaboration of over 30 institutions, which posted its findings as a bioRxiv preprint on May 27, 2021.
The Breakthrough: T2T-CHM13
The newly completed genome, known as T2T-CHM13, adds nearly 200 million base pairs of novel sequence relative to GRCh38, the reference genome released in 2013. This includes:
- 2,226 paralogous gene copies, 115 of which are predicted to be protein-coding.
- Complete sequences of centromeric satellite arrays.
- The short arms of all five acrocentric chromosomes.
This achievement unlocks previously inaccessible regions of the genome, enabling new studies on variation and function in these areas.
HiFi Sequencing: The Technology Behind the Discovery
The team used HiFi sequencing technology from PacBio (Pacific Biosciences, California) and a cell line, CHM13, derived from a complete hydatidiform mole. This unusual tissue forms when a sperm fertilizes an egg that lacks a nucleus; the paternal chromosomes then duplicate, so the cell's two chromosome copies are essentially identical and it behaves like a single (haploid) set. Because there are no distinct maternal and paternal copies to tell apart, genome assembly is much simpler.
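To make that benefit concrete, here is a toy sketch (not the consortium's pipeline) of why identical chromosome copies help. It counts the k-mers that a second chromosome copy contributes beyond the first; each one marks a spot where an assembler would have to decide which copy a read came from. The sequences, the `extra_kmers` helper, and the choice of k are all illustrative assumptions.

```python
def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def extra_kmers(hap_a: str, hap_b: str, k: int) -> set[str]:
    """k-mers present in hap_b but not in hap_a (hypothetical helper)."""
    return kmers(hap_b, k) - kmers(hap_a, k)


# Made-up sequences: hap_b differs from hap_a by a single base at index 8.
hap_a = "ATGGCGTACGTTAGCCGATA"
hap_b = "ATGGCGTATGTTAGCCGATA"

# Typical diploid sample: the two copies differ, so extra k-mers appear.
print(len(extra_kmers(hap_a, hap_b, k=5)))

# Mole-derived sample: both copies are the same sequence, so nothing extra.
print(len(extra_kmers(hap_a, hap_a, k=5)))
```

Even one differing base adds several competing k-mers in this toy case, while two identical copies add none, which is the simplification the mole-derived cell line provides.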
HiFi sequencing stands out because it combines high accuracy with long reads: reads long enough to span repetitive stretches and accurate enough to tell near-identical copies apart, both of which are essential for resolving complex genomic regions. This method was pivotal in producing a gap-free, haploid human genome assembly.
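A rough way to see why read length matters: inside a repetitive region, short reads look identical wherever they come from, while a read longer than the repeat carries unique flanking sequence. The sketch below is a toy model, assuming error-free reads and a made-up sequence; `unique_fraction` is a hypothetical helper, not part of any sequencing toolkit.

```python
from collections import Counter


def unique_fraction(genome: str, read_len: int) -> float:
    """Fraction of error-free reads of length read_len that occur
    exactly once in genome (one read taken at every start position)."""
    reads = [genome[i:i + read_len] for i in range(len(genome) - read_len + 1)]
    counts = Counter(reads)
    return sum(1 for r in reads if counts[r] == 1) / len(reads)


# Toy genome: a 6-base unit repeated three times in tandem (18 bases),
# sitting between two unique flanking sequences.
repeat_unit = "ACGTTG"
genome = "TTAGGCCA" + repeat_unit * 3 + "GATCCTAA"

for read_len in (4, 10, 20):
    print(read_len, round(unique_fraction(genome, read_len), 2))
```

In this toy sequence, reads of length 20 are longer than the 18-base repeat array, so every one of them can be placed at a single location, whereas many of the length-4 reads cannot.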
Implications of the T2T-CHM13 Genome
- Enhanced Reference Genome:
  - The T2T-CHM13 genome removes the gaps and corrects errors present in the previous GRCh38 reference. Roughly 8% of the genome had remained inaccessible to sequence-based studies for over 20 years.
  - The newly resolved sequence includes all centromeric regions and the short arms of the five acrocentric chromosomes, areas crucial for understanding structural variation and genome function.
- Advancing Genomic Research:
  - With the completed genome, researchers can study previously unexplored genomic regions comprehensively.
  - The data lay the groundwork for improved functional studies and genetic analyses.
- Future Challenges:
  - An estimated 0.3% of the genome may still contain errors, concentrated in a few hard-to-assemble chromosomal regions.
  - The T2T-CHM13 genome lacks a Y chromosome, which is critical for understanding male-specific development and genetic disorders.
Key Insights from the Study
- Innovative Approach: By using a cell line derived from a hydatidiform mole, researchers simplified genome assembly and were able to resolve complex genomic regions.
- Limitations: A few regions remain difficult to validate and may still contain errors, and the absence of a Y chromosome limits the genome’s applicability to male-specific studies.
- Future Prospects: Researchers aim to extend their work to include the Y chromosome and refine error-prone areas.
Conclusion
Completing the human genome is a monumental step forward for genomics, made possible by HiFi sequencing and the collaborative efforts of the T2T Consortium. While challenges remain, this achievement paves the way for more accurate reference genomes and deeper insights into the complexities of human biology. The T2T-CHM13 genome sets a new standard for genomic research, marking the beginning of an exciting new era in the field.
FAQ Section
1. What is HiFi sequencing?
HiFi sequencing is an advanced sequencing technology that produces highly accurate long DNA reads, making it ideal for resolving complex genomic regions.
2. What is the T2T-CHM13 genome?
T2T-CHM13 is the first gap-free, haploid human genome assembly, incorporating nearly 200 million base pairs of previously missing sequence.
3. Why was a hydatidiform mole used in the study?
The cells of a complete hydatidiform mole carry two essentially identical copies of a single paternal chromosome set, which simplifies genome assembly by eliminating the need to differentiate between maternal and paternal DNA.
4. What are the limitations of the T2T-CHM13 genome?
The genome does not include the Y chromosome, and certain regions still present challenges in quality control and error resolution.
5. What are the implications of this achievement?
The T2T-CHM13 genome enables comprehensive studies on previously inaccessible regions, advancing our understanding of genetic variation and function.
Publication reference: Nurk, S. et al. Preprint at bioRxiv https://doi.org/10.1101/2021.05.26.445798 (2021).