Next Generation Sequencing: An Overview
Cavan Reilly
November 13, 2017

Table of contents
  Next generation sequencing
  NGS and microarrays
  Study design
  Quality assessment
  Burrows Wheeler transform

Next generation sequencing

Over the last 10 years or so there has been rapid development of methods for next generation sequencing. Here is the process used by Illumina, one of the major producers of platforms for next generation sequencing.

The biological sample (e.g. a sample of mRNA molecules) is first randomly fragmented into short molecules. Then the ends of the fragments are adenylated and adaptor oligos are attached to the ends.

The fragments are then size selected, purified and put on a flow cell. An Illumina flow cell has 8 lanes and is covered with other oligonucleotides that bind to the adaptors that have been ligated to the fragments from the sample.

The bound fragments are then extended to make copies, and these copies bind to the surface of the flow cell. This is continued until there are many copies of each original fragment, resulting in a collection of hundreds of millions of clusters.

The reverse strands are then cleaved off and washed away, and sequencing primer is hybridized to the bound DNA. The individual clusters are then sequenced in parallel, base by base, by hybridizing fluorescently labeled nucleotides. After each round of extension a laser excites all of the clusters and a read is made of the base that was just added at each cluster.

If a very short sequence is bound to the flow cell it is possible that the machine will sequence the adaptor sequence; this is referred to as adaptor contamination. A measure of the quality of the read is also saved along with the read itself.
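As a concrete illustration of the quality measure just mentioned, here is a minimal Python sketch. It assumes the PHRED convention that the notes define next (score = −10 log10 p for an estimated error probability p). The function names are made up for this example and do not come from any sequencing library, and the offset of 33 used to decode quality characters is an assumption that should be checked against the file format actually in use.

```python
import math
from typing import List


def phred_from_error_prob(p: float) -> float:
    """PHRED-scaled score for an estimated error probability p, 0 < p <= 1."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return -10.0 * math.log10(p)


def error_prob_from_phred(q: float) -> float:
    """Invert the PHRED scale: the error probability implied by score q."""
    return 10.0 ** (-q / 10.0)


def decode_quality_string(qual: str, offset: int = 33) -> List[int]:
    """Decode a string of quality characters into integer scores.

    Assumption: each character encodes (score + offset) as its ASCII code.
    """
    return [ord(ch) - offset for ch in qual]
```

Under this convention a score of 30 corresponds to an estimated error probability of 1 in 1000, and every additional 10 points divides the error probability by 10.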
The quality measures saved with each read are on the PHRED scale, so if there is an estimated probability of an error of p, the PHRED-based score is −10 log10(p).

If we fragment someone’s DNA we can sequence the fragments, and if we can then put the fragments back together we get the sequence of that person’s genome or transcriptome. This would allow us to determine what alleles this subject has at every locus that displays variation among humans (e.g. SNPs).

There are a number of popular algorithms for putting all of the fragments back together: BWA, Maq, SOAP, ELAND and Bowtie. We’ll discuss Bowtie in some detail later (BWA uses the same ideas as Bowtie). ELAND is a proprietary algorithm from Illumina, while Maq and SOAP use hash tables and are considerably slower than Bowtie.

Applications

There are many applications of this basic idea:
1. resequencing (DNA-seq)
2. gene expression (RNA-seq)
3. miRNA discovery and quantitation
4. DNA methylation studies
5. ChIP-seq studies
6. metagenomics
7. ultra-deep sequencing of viral genomes

We will focus on resequencing (and SNP calling) and gene expression in this course.

NGS and microarrays

Microarrays are currently not used to study gene expression; almost every researcher would use sequencing-based techniques. Compared to microarrays, RNA-seq offers:
1. higher sensitivity
2. higher dynamic range
3. lower technical variation
4. no need for a sequenced genome (but it helps, a lot)
5. more information about the use of different exons (more than 90% of human genes with more than 1 exon have alternative isoforms)

In one head-to-head comparison, 30% more genes were found to be differentially expressed by sequencing at the same FDR (Marioni, Mason, Mane, et al. (2008)). These authors also found that the correlation between normalized intensity values from a microarray and the logarithm of the counts for the same transcript was 0.7-0.8 (this is similar to what others have reported).
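To make the kind of comparison behind such a correlation concrete, here is a small Python sketch that computes the Pearson correlation between microarray intensities and log read counts. The numbers are invented toy values, not data from Marioni et al., and the choice of log base 2 with a pseudocount of 1 (so that zero counts can be logged) is an assumption made for this example only.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy data: one entry per transcript.
read_counts = [0, 3, 15, 120, 900, 4000]
array_intensity = [6.8, 6.1, 7.9, 9.5, 11.2, 13.0]

# Log-transform the counts; the pseudocount of 1 is an assumption.
log_counts = [math.log2(c + 1) for c in read_counts]

r = pearson(log_counts, array_intensity)
print(f"correlation between log counts and intensities: {r:.2f}")
```

The counts are logged because normalized microarray intensities are usually already on a log scale, which puts the two measurements on comparable footing.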
The largest differences between the two measurements occurred when the read count was low and the microarray intensity was high, which probably reflects cross-hybridization in the microarray.

Others have found slightly higher correlations, as shown in a figure from a study by Su, Li, Chen, et al. (2011).

Study design

Before addressing the technical details, we will outline some considerations regarding study design specific to NGS technology. There are several major manufacturers of the technology necessary to generate short reads, and they all have different platforms with resulting differences in the data and processing methods. A few of the major vendors are:
1. Illumina, whose technology was formerly known as Solexa sequencing; they now have MiSeq and HiSeq (the Genome Analyzer is the old platform)
2. Roche, which owns 454 Life Sciences, which supports the GS FLX+ system
3. Life Technologies, which supports Ion Torrent
