Advances in sequencing technologies have made metagenomic analyses of complex microbial samples routine and accessible. Mock communities of known composition are often run in parallel to allow accurate data evaluation and to facilitate cross-study and inter-lab comparisons, yet they lack the microbial diversity of real-world samples. The ZymoBIOMICS Fecal Reference with TruMatrix Technology (D6323) is a highly diverse pooled human gut microbiome standard that provides a truly complex alternative to mock communities. However, the microbial content of this standard is only partially characterized, and its species-level composition remains underexplored. Here, we explore the content of this sample using highly accurate long-read sequencing. We generated 11.9 million HiFi reads (88.3 gigabases) for this sample using PacBio HiFi sequencing on the Sequel IIe System. We performed taxonomic and functional profiling, as well as metagenome assembly, using analyses tailored to HiFi reads. With taxonomic profiling settings intended to achieve high precision and recall, we detected 155 species from 80 genera. With less stringent, unfiltered profiling settings, we detected as many as ~7,200 total species. We found that 92% of HiFi reads were assigned a functional annotation, with an average of 2–4 annotations per read. This resulted in over 66.9 million total functional annotations representing over 17,000 unique classes. We used hifiasm-meta to perform metagenome assembly and a PacBio binning pipeline to identify and characterize high-quality metagenome-assembled genomes (MAGs). This workflow produced ~2,600 genome bins and identified 199 high-quality MAGs (>70% complete, <10% contamination, <20 contigs). Of these, 102 MAGs were >95% complete and included 54 MAGs composed of a single, circular contig. Finally, we downsampled our data to simulate several multiplexing schemes and investigated the impact on these analyses.
Species detection and functional profiling results were largely robust across data levels, whereas MAG recovery decreased predictably as the amount of data decreased. The species-level relative abundance profiles and highly complete MAGs generated in our study help shed light on the diverse content of this novel metagenomic control and on the use of PacBio HiFi sequencing for generating high-quality metagenomic data. Overall, this newly developed reference sample should prove useful for assessing sequencing results and their consistency across sequencing platforms and wet-lab methods, for developing new bioinformatics approaches, and for improving methodological benchmarking studies.
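The high-quality MAG criteria stated above (>70% complete, <10% contamination, <20 contigs) can be illustrated with a minimal Python sketch. This is not the PacBio binning pipeline: the `Bin` record, the function names, and the example bins are hypothetical and exist only to show how the three thresholds combine. The sketch also assumes that the single-circular-contig MAGs are counted within the >95% complete subset, which is one reading of the text.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Bin:
    """Hypothetical per-bin summary; field names are illustrative only."""
    name: str
    completeness: float   # percent
    contamination: float  # percent
    n_contigs: int
    circular: bool = False


def is_high_quality(b: Bin,
                    min_completeness: float = 70.0,
                    max_contamination: float = 10.0,
                    max_contigs: int = 20) -> bool:
    """Apply the three thresholds from the text as strict inequalities."""
    return (b.completeness > min_completeness
            and b.contamination < max_contamination
            and b.n_contigs < max_contigs)


def summarize(bins: Iterable[Bin]) -> Dict[str, int]:
    """Count high-quality MAGs and the two nested subsets."""
    high_quality = [b for b in bins if is_high_quality(b)]
    over_95 = [b for b in high_quality if b.completeness > 95.0]
    # Assumption: single circular contigs are counted inside the >95% subset.
    single_circular = [b for b in over_95 if b.n_contigs == 1 and b.circular]
    return {
        "high_quality": len(high_quality),
        "over_95_complete": len(over_95),
        "single_circular_contig": len(single_circular),
    }


# Made-up bins for illustration; these are not values from the study.
example_bins = [
    Bin("bin.1", 99.1, 0.5, 1, circular=True),
    Bin("bin.2", 96.0, 2.0, 7),
    Bin("bin.3", 82.4, 4.2, 15),
    Bin("bin.4", 91.0, 12.5, 9),   # fails the contamination threshold
    Bin("bin.5", 75.0, 1.0, 48),   # fails the contig-count threshold
    Bin("bin.6", 40.3, 0.8, 3),    # fails the completeness threshold
]

summary = summarize(example_bins)
print(summary)
```

In practice, the completeness and contamination values would come from a genome quality assessment tool rather than being entered by hand, and the thresholds could be tightened or relaxed through the keyword arguments.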
Organization: PacBio
Year: 2022