• Document: Measuring transcriptomes with RNA-Seq



Measuring transcriptomes with RNA-Seq
BMI/CS 776
www.biostat.wisc.edu/bmi776/
Spring 2017
Anthony Gitter
gitter@biostat.wisc.edu

These slides, excluding third-party material, are licensed under CC BY-NC 4.0 by Colin Dewey, Mark Craven, and Anthony Gitter

Overview
• RNA-Seq technology
• The RNA-Seq quantification problem
• Generative probabilistic models and Expectation-Maximization for the quantification task

Goals for lecture
• What is RNA-Seq?
• How is RNA-Seq used to measure the abundances of RNAs within cells?
• What probabilistic models and algorithms are used for analyzing RNA-Seq?

Measuring transcription the old way: microarrays
• Each spot has “probes” for a certain gene
  • Probe: a DNA sequence complementary to a certain gene
• Relies on complementary hybridization
• Intensity/color of light from each spot is measurement of the number of transcripts for a certain gene in a sample
• Requires knowledge of gene sequences

Advantages of RNA-Seq over microarrays
• No reference sequence needed
  • With microarrays, limited to the probes on the chip
• Low background noise
• Large dynamic range
  • 10^5 compared to 10^2 for microarrays
• High technical reproducibility
• Identify novel transcripts and splicing events

RNA-Seq technology
• Leverages rapidly advancing sequencing technology (e.g., Illumina)
• Transcriptome analog to whole genome shotgun sequencing
• Two key differences from genome sequencing:
  1. Transcripts sequenced at different levels of coverage - expression levels
  2. Sequences already known (in many cases) - coverage is measurement

A generic RNA-Seq protocol
[Figure: Sample RNA → fragmentation → RNA fragments → reverse transcription + amplification → cDNA fragments → sequencing machine → reads (e.g., CCTTCNCACTTCGTTTCCCAC, TTTTTNCAGAGTTTTTTCTTG)]

RNA-Seq data: FASTQ format
• Each read is a record with a name line, a sequence line, a "+" line, and a qualities line
• Paired-end reads: read1 and read2
• 1 Illumina HiSeq 2500 lane: ~150 million reads

    @HWUSI-EAS1789_0001:3:2:1708:1305#0/1
    CCTTCNCACTTCGTTTCCCACTTAGCGATAATTTG
    +HWUSI-EAS1789_0001:3:2:1708:1305#0/1
    VVULVBVYVYZZXZZ\ee[a^b`[a\a[\\a^^^\
    @HWUSI-EAS1789_0001:3:2:2062:1304#0/1
    TTTTTNCAGAGTTTTTTCTTGAACTGGAAATTTTT
    +HWUSI-EAS1789_0001:3:2:2062:1304#0/1
    a__[\Bbbb`edeeefd`cc`b]bffff`ffffff
    @HWUSI-EAS1789_0001:3:2:3194:1303#0/1
    GAACANTCCAACGCTTGGTGAATTCTGCTTCACAA
    +HWUSI-EAS1789_0001:3:2:3194:1303#0/1
    ZZ[[VBZZY][TWQQZ\ZS\[ZZXV__\OX`a[ZZ
    @HWUSI-EAS1789_0001:3:2:3716:1304#0/1
    GGAAANAAGACCCTGTTGAGCTTGACTCTAGTCTG
    +HWUSI-EAS1789_0001:3:2:3716:1304#0/1
    aaXWYBZVTXZX_]Xdccdfbb_\`a\aY_^]LZ^
    @HWUSI-EAS1789_0001:3:2:5000:1304#0/1
    CCCGGNGATCCGCTGGGACAAGCAGCATATTGATA
    +HWUSI-EAS1789_0001:3:2:5000:1304#0/1
    aaaaaBeeeeffffehhhhhhggdhhhhahhhadh

Tasks with RNA-Seq data
• Assembly:
  • Given: RNA-Seq reads (and possibly a genome sequence)
  • Do: Reconstruct full-length transcript sequences from the reads
• Quantification (our focus):
  • Given: RNA-Seq reads and transcript sequences
  • Do: Estimate the relative abundances of transcripts (“gene expression”)
• Differential expression:
  • Given: RNA-Seq reads from two different samples and transcript sequences
  • Do: Predict which transcripts have different abundances between two samples

RNA-Seq is a relative abundance measurement
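One way to see why the measurement is relative: the total number of reads is set by how deeply the sample is sequenced, so read counts can only show how reads are shared among transcripts. The sketch below is a minimal illustration, not the quantification model from the lecture. The transcript names, the counts, and the helper `read_fractions` are all hypothetical, and it deliberately ignores transcript length and ambiguously mapped reads.

```python
def read_fractions(counts):
    """Convert per-transcript read counts into fractions of all reads."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads assigned to any transcript")
    return {name: n / total for name, n in counts.items()}


# Hypothetical counts: the same mixture of transcripts, but sample_b
# was sequenced twice as deeply as sample_a.
sample_a = {"tx1": 100, "tx2": 300, "tx3": 600}
sample_b = {"tx1": 200, "tx2": 600, "tx3": 1200}

fractions_a = read_fractions(sample_a)
fractions_b = read_fractions(sample_b)

# The raw counts differ, but the fractions are the same.
for name in sample_a:
    print(name, sample_a[name], sample_b[name], fractions_a[name], fractions_b[name])
```

Doubling every count leaves the fractions unchanged, so reads alone cannot reveal the absolute number of RNA molecules in the cells.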
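The FASTQ records shown earlier can be read four lines at a time. The sketch below parses one record copied from the slide and converts its quality characters to integer scores. The function names `parse_fastq` and `decode_qualities` are hypothetical. The offset of 64 is an assumption: the slides do not state the quality encoding, and 64 is only a guess based on the range of characters in the example. Many files use an offset of 33 instead.

```python
import io


def parse_fastq(handle):
    """Yield (name, sequence, qualities) from four-line FASTQ records."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of input
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        qualities = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(qualities):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], sequence, qualities


def decode_qualities(qualities, offset=64):
    """Map quality characters to integers. The offset is an ASSUMPTION."""
    return [ord(char) - offset for char in qualities]


# One record copied from the slide. The quality line is a raw string
# because it contains a backslash.
example = io.StringIO(
    "@HWUSI-EAS1789_0001:3:2:2062:1304#0/1\n"
    "TTTTTNCAGAGTTTTTTCTTGAACTGGAAATTTTT\n"
    "+HWUSI-EAS1789_0001:3:2:2062:1304#0/1\n"
    r"a__[\Bbbb`edeeefd`cc`b]bffff`ffffff" "\n"
)

records = list(parse_fastq(example))
name, sequence, qualities = records[0]
scores = decode_qualities(qualities)
print(name, len(sequence), scores[:6])
```

Under the assumed offset, the uncalled base `N` at the sixth position receives the lowest score in the record, which fits its role as a base the machine could not call.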
