
RNA-Sequencing analysis
Markus Kreuz
25.04.2012
Institut für Medizinische Informatik, Statistik und Epidemiologie

Content:
  • Biological background
  • Overview transcriptomics
  • RNA-Seq
  • RNA-Seq technology
  • Challenges
  • Comparable technologies
  • Expression quantification
  • ReCount database

Biological background (I):
  • Structure of a protein coding mRNA
  • Non coding RNAs:

    Type                              Size        Function
    microRNA (miRNA)                  21-23 nt    regulation of gene expression
    small interfering RNA (siRNA)     19-23 nt    antiviral mechanisms
    piwi-interacting RNA (piRNA)      26-31 nt    interaction with piwi proteins/spermatogenesis
    small nuclear RNA (snRNA)         100-300 nt  RNA splicing
    small nucleolar RNA (snoRNA)      -           modification of other RNAs

Biological background (II):
  • Processing
  • Splicing / Alternative Splicing / Trans-Splicing
  • RNA editing
  • Secondary structures
  • Example hairpin structure:

RNA-Seq technology - Aims:
  • Catalogue all species of transcript including: mRNAs, non-coding RNAs and small RNAs
  • Determine the transcriptional structure of genes in terms of:
      • Start sites
      • 5′ and 3′ ends
      • Splicing patterns
      • Other post-transcriptional modifications
  • Quantification of expression levels and comparison (different conditions, tissues, etc.)
RNA-Seq analysis (I):
Long RNAs are first converted into a library of cDNA fragments through either RNA fragmentation or DNA fragmentation.

RNA-Seq analysis (II):
  • In contrast to small RNAs (such as piRNAs, miRNAs, siRNAs), larger RNAs must be fragmented
  • RNA fragmentation or cDNA fragmentation (different techniques)
  • The methods create different types of bias:
      • RNA fragmentation: depletion of transcript ends
      • cDNA fragmentation: biased towards the 3′ end

RNA-Seq analysis (III):
Sequencing adaptors (blue) are subsequently added to each cDNA fragment, and a short sequence is obtained from each cDNA using high-throughput sequencing technology (typical read length: 30-400 bp, depending on the technology).

RNA-Seq analysis (IV):
The resulting sequence reads are aligned with the reference genome or transcriptome and classified as three types: exonic reads, junction reads and poly(A) end-reads. (De novo assembly is also possible, which is attractive for non-model organisms.)

RNA-Seq analysis (V):
These three types are used to generate a base-resolution expression profile for each gene.
Example: A yeast ORF with one intron

RNA-Seq - Bioinformatic challenges (I):
  • Storing, retrieving and processing of large amounts of data
  • Base calling
  • Quality analysis for bases and reads => FastQ files
  • Mapping/aligning RNA-Seq reads (alternative: assemble contigs and align them to the genome)
      • Multiple alignments possible for some reads
      • Sequencing errors and polymorphisms
    => SAM/BAM files

RNA-Seq - Bioinformatic challenges (II):
Specific challenges for RNA-Seq:
  • Exon junctions and poly(A) ends
  • Identification of poly(A) -> long stretches of A or T at the end of reads
  • Splice sites:
      • Specific sequence context: GT-AG dinucleotides
      • Low expression for intronic regions
      • Known or predicted splice sites
      • Detection of new sites (e.g. via split read mapping)
  • Overlapping genes
  • RNA editing
  • Secondary structure of transcripts
  • Quantification of expression signals

Coverage, sequencing depth and costs:
  • Number of detected genes (coverage) and costs increase with sequencing depth (number of analyzed reads)
  • Calculation of coverage is less straightforward in transcriptome analysis (transcription activity varies)

RNA-Seq - Comparable technologies:
  • Tiling array analysis
  • Classical sequencing of cDNA or EST
  • Classical gene expression arrays

Transcriptome mapping using tiling arrays:
Chip design
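
Looking back at the poly(A) item under the bioinformatic challenges ("long stretches of A or T at the end of reads"), the heuristic can be illustrated with a minimal Python sketch. The function names and the `min_run` threshold of 10 bases are assumptions chosen for illustration, not values from the slides. Real pipelines would typically also tolerate sequencing errors inside the run.

```python
def end_run_length(read: str, base: str) -> int:
    """Length of the longest run of `base` that touches either end of the read."""
    read = read.upper()
    left = len(read) - len(read.lstrip(base))    # run at the start of the read
    right = len(read) - len(read.rstrip(base))   # run at the end of the read
    return max(left, right)


def is_polya_candidate(read: str, min_run: int = 10) -> bool:
    """Flag reads with a long A or T stretch at one end.

    A stretches appear when the read is in transcript orientation,
    T stretches when it comes from the complementary strand.
    `min_run` is a hypothetical threshold, not a published default.
    """
    longest = max(end_run_length(read, "A"), end_run_length(read, "T"))
    return longest >= min_run


if __name__ == "__main__":
    reads = [
        "ACGTGCA" + "A" * 12,   # A stretch at the end of the read
        "T" * 11 + "GCATCG",    # T stretch at the start of the read
        "ACGTACGTAAAC",         # no long terminal stretch
    ]
    for r in reads:
        print(r, is_polya_candidate(r))
```

An exact-match scan like this is only a first filter: a single miscalled base inside the tail shortens the detected run, so a production tool would usually allow a small number of mismatches.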