A comparison of assemblers and strategies for complex, large-genome sequencing with PacBio long reads.

Author(s): Gu, Jenny and Hall, Richard and Heiner, Cheryl and Sogoloff, Brian and Meldrim, James and Connolly, Kristen and Shea, Terrance and Russ, Carsten and Cuomo, Christina and Szabo, Les

PacBio sequencing holds promise for addressing large-genome complexities, such as long, highly repetitive, low-complexity regions and duplication events that are difficult to resolve with short-read technologies. Several strategies, with varying outcomes, are available for de novo sequencing and assembly of large genomes. Using a diploid fungal genome, estimated at ~80 Mb, as the basis for comparison, we highlight assembly options when using PacBio sequencing alone or a combined strategy that leverages data sets from multiple sequencing technologies. Data generated by SMRT Sequencing were assembled with different large-genome assemblers, and comparisons of the results will be shown. These include results generated with HGAP, Celera Assembler, MIRA, PBJelly, and other assembly tools currently in development. Improvements observed in genome assemblies incorporating SMRT Sequencing data include a nearly 50% reduction in the number of contigs, coupled with at least a doubling of contig N50 size. We further show how incorporating long reads highlights new challenges, as well as insights missed by short-read assemblies, that arise from the heterozygosity inherent in multiploid genomes.
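
The two metrics cited above, contig count and contig N50, can be computed from a list of contig lengths. The following is a minimal sketch, assuming the common definition of N50 as the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. The function name `n50` and the contig lengths are hypothetical and chosen only for illustration; they are not data from the poster.

```python
def n50(contig_lengths):
    """Return the contig N50 for a list of contig lengths.

    Assumed definition: sort contigs from longest to shortest and
    return the length of the contig at which the running total first
    reaches at least half of the total assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Compare doubled running total to avoid fractional halves.
        if 2 * running >= total:
            return length
    return 0  # empty assembly


# Hypothetical contig lengths (arbitrary units), NOT data from the poster.
short_read_only = [40, 40, 30, 30, 30, 30, 20, 20, 20, 10, 10, 10]
with_long_reads = [80, 70, 50, 40, 30, 20]

for name, contigs in [("short-read only", short_read_only),
                      ("with long reads", with_long_reads)]:
    print(f"{name}: {len(contigs)} contigs, N50 = {n50(contigs)}")
```

In this toy example both assemblies have the same total length, but the second has half as many contigs and a more than doubled N50, which is the kind of change the abstract describes.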

Organization: PacBio
Year: 2014




