• Document: CS 776 Spring 2016 Anthony Gitter


Measuring transcriptomes with RNA-Seq
BMI/CS 776
www.biostat.wisc.edu/bmi776/
Spring 2016
Anthony Gitter
gitter@biostat.wisc.edu

Overview
• RNA-Seq technology
• The RNA-Seq quantification problem
• Generative probabilistic models and Expectation-Maximization for the quantification task

Goals for lecture
• What is RNA-Seq?
• How is RNA-Seq used to measure the abundances of RNAs within cells?
• What probabilistic models and algorithms are used for analyzing RNA-Seq?

Measuring transcription the old way: microarrays
• Each spot has “probes” for a certain gene
• Probe: a DNA sequence complementary to a certain gene
• Relies on complementary hybridization
• Intensity/color of light from each spot is measurement of the number of transcripts for a certain gene in a sample
• Requires knowledge of gene sequences

Advantages of RNA-Seq over microarrays
• No reference sequence needed
  • With microarrays, limited to the probes on the chip
• Low background noise
• Large dynamic range
  • 10^5 compared to 10^2 for microarrays
• High technical reproducibility
• Identify novel transcripts and splicing events

RNA-Seq technology
• Leverages rapidly advancing sequencing technology (e.g., Illumina)
• Transcriptome analog to whole genome shotgun sequencing
• Two key differences from genome sequencing:
  1. Transcripts sequenced at different levels of coverage - expression levels
  2. Sequences already known (in many cases) - coverage is measurement

A generic RNA-Seq protocol
[Figure: Sample RNA → (fragmentation) → RNA fragments → (reverse transcription + amplification) → cDNA fragments → (sequencing machine) → reads]
Reads shown in the figure:
  CCTTCNCACTTCGTTTCCCAC
  TTTTTNCAGAGTTTTTTCTTG
  GAACANTCCAACGCTTGGTGA
  GGAAANAAGACCCTGTTGAGC
  CCCGGNGATCCGCTGGGACAA
  GCAGCATATTGATAGATAACT
  CTAGCTACGCGTACGCGATCG
  CATCTAGCATCGCGTTGCGTT
  CCCGCGCGCTTAGGCTACTCG
  TCACACATCTCTAGCTAGCAT
  CATGCTAGCTATGCCTATCTA

RNA-Seq data
[Figure: FASTQ records, with the lines of the first record labeled "name", "sequence read", and "qualities"]
  @HWUSI-EAS1789_0001:3:2:1708:1305#0/1
  CCTTCNCACTTCGTTTCCCACTTAGCGATAATTTG
  +HWUSI-EAS1789_0001:3:2:1708:1305#0/1
  VVULVBVYVYZZXZZ\ee[a^b`[a\a[\\a^^^\
  @HWUSI-EAS1789_0001:3:2:2062:1304#0/1
  TTTTTNCAGAGTTTTTTCTTGAACTGGAAATTTTT
  +HWUSI-EAS1789_0001:3:2:2062:1304#0/1
  a__[\Bbbb`edeeefd`cc`b]bffff`ffffff
  @HWUSI-EAS1789_0001:3:2:3194:1303#0/1
  GAACANTCCAACGCTTGGTGAATTCTGCTTCACAA
  +HWUSI-EAS1789_0001:3:2:3194:1303#0/1
  ZZ[[VBZZY][TWQQZ\ZS\[ZZXV__\OX`a[ZZ
  @HWUSI-EAS1789_0001:3:2:3716:1304#0/1
  GGAAANAAGACCCTGTTGAGCTTGACTCTAGTCTG
  +HWUSI-EAS1789_0001:3:2:3716:1304#0/1
  aaXWYBZVTXZX_]Xdccdfbb_\`a\aY_^]LZ^
  @HWUSI-EAS1789_0001:3:2:5000:1304#0/1
  CCCGGNGATCCGCTGGGACAAGCAGCATATTGATA
  +HWUSI-EAS1789_0001:3:2:5000:1304#0/1
  aaaaaBeeeeffffehhhhhhggdhhhhahhhadh
• Paired-end reads: read1, read2
• 1 Illumina HiSeq 2500 lane: ~150 million reads

RNA-Seq is a relative abundance measurement technology
[Figure labels: RNA sample, cDNA fragments, reads]
• RNA-Seq gives you reads from the ends of a random sample of fragments in your library
• Without additional data this only gives information about relative abundances
• Additional information, such as levels of “spike-in” transcripts, are needed for absolute measurements

Issues with relative abundance measures
Gene   Sample 1    Sample 1    Sample 2    Sample 2
       absolute    relative    absolute    relative
       abundance   abundance   abundance   abundance
1      20          10%         20          5%
2      20          10%         20          5%
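The issue in the table above can be checked with a little arithmetic. The following is a minimal sketch, not part of the original slides: genes 1 and 2 use the absolute abundances from the table, while the "rest" entry is a hypothetical lump standing in for all other transcripts, sized so that the sample totals (200 and 400) reproduce the 10% and 5% shown. The function name `relative_abundance` is likewise an assumption made for illustration.

```python
def relative_abundance(absolute):
    """Convert absolute abundances per gene into fractions of the sample total."""
    total = sum(absolute.values())
    return {gene: count / total for gene, count in absolute.items()}


# gene1 and gene2: absolute abundances taken from the table (20 in both samples).
# "rest": hypothetical value, chosen only so that the totals are 200 and 400.
sample1 = {"gene1": 20, "gene2": 20, "rest": 160}
sample2 = {"gene1": 20, "gene2": 20, "rest": 360}

rel1 = relative_abundance(sample1)
rel2 = relative_abundance(sample2)

for gene in ("gene1", "gene2"):
    print(f"{gene}: sample 1 = {rel1[gene]:.0%}, sample 2 = {rel2[gene]:.0%}")
```

Under these assumed totals, the relative abundance of genes 1 and 2 halves between the samples even though their absolute abundance is unchanged, because the relative measure depends on everything else in the sample.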

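The FASTQ records on the "RNA-Seq data" slide follow a four-line layout: a name line starting with "@", the sequence read, a separator line starting with "+", and a quality string with one character per base. As a rough sketch of how such records might be read (the function name `read_fastq` is hypothetical, and the parser assumes unwrapped four-line records, which is an assumption about the file rather than something the slides state):

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, qualities) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of input
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        qualities = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(qualities):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], sequence, qualities


# Second record from the slide; the quality line is a raw string because it
# contains a backslash.
example = "\n".join([
    "@HWUSI-EAS1789_0001:3:2:2062:1304#0/1",
    "TTTTTNCAGAGTTTTTTCTTGAACTGGAAATTTTT",
    "+HWUSI-EAS1789_0001:3:2:2062:1304#0/1",
    r"a__[\Bbbb`edeeefd`cc`b]bffff`ffffff",
]) + "\n"

records = list(read_fastq(io.StringIO(example)))
for name, sequence, qualities in records:
    print(name, len(sequence), sequence.count("N"))
```

The sketch leaves the quality characters undecoded, since the slides do not state which quality encoding the file uses.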