• Document: Post-assembly Data Analysis
  • Uploaded: 2019-03-14 13:19:30



Post-assembly Data Analysis

Starting from the assembled transcriptome:
• Quantification: the expression level of each gene in each sample
• DE genes: genes differentially expressed between samples
• Clustering/network analysis
• Identifying over-represented functional categories in DE genes
• Evaluation of the quality of the assembly

Part 1. Abundance estimation using RSEM

Different summarization strategies will result in the inclusion or exclusion of different sets of reads in the table of counts.

Map reads to the genome (TopHat) vs. map reads to the transcriptome (Bowtie)

Mapping reads to the transcriptome:
• What is easier? No issues with alignment across splicing junctions.
• What is more difficult? The same reads can align repeatedly to different splicing isoforms and to paralogous genes.

RSEM assigns ambiguous reads based on the unique reads mapped to the same transcript.

[Figure: Transcript 1 and Transcript 2. Red and yellow: unique regions. Blue: regions shared between the two transcripts. Haas et al., Nature Protocols 8:1494]

Trinity provides a script for calling Bowtie and RSEM: align_and_estimate_abundance.pl

    align_and_estimate_abundance.pl \
      --transcripts Trinity.fasta \
      --seqType fq \
      --left sequence_1.fastq.gz \
      --right sequence_2.fastq.gz \
      --SS_lib_type RF \
      --aln_method bowtie \
      --est_method RSEM \
      --thread_count 4 \
      --trinity_mode \
      --output_prefix tis1rep1

Parameters for align_and_estimate_abundance.pl
• --aln_method: alignment method. Default: "bowtie". Alignment files from other aligners might not be supported.
• --est_method: abundance estimation method. Default: RSEM, slightly more accurate. Optional: eXpress, faster and requires less RAM.
• --thread_count: number of threads.
• --trinity_mode: the input reference is from Trinity. A non-Trinity reference requires a gene-isoform mapping file (--gene_trans_map).
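The idea that ambiguous reads are assigned based on the unique reads of each transcript can be illustrated with a toy iteration. The sketch below is a deliberately simplified illustration and not RSEM's actual model: it ignores transcript length, read quality, and fragment-size distributions. The function name `apportion_shared_reads` and the read counts are hypothetical.

```python
def apportion_shared_reads(unique_counts, shared_count, iterations=50):
    """Toy EM-style split of reads shared between transcripts.

    unique_counts: dict of transcript name -> reads mapping only to it
    shared_count:  number of reads that map equally well to all transcripts
    Returns a dict of transcript name -> estimated total read count.
    """
    transcripts = list(unique_counts)
    if not transcripts:
        return {}
    # Start by splitting the shared reads evenly.
    share = {t: shared_count / len(transcripts) for t in transcripts}
    for _ in range(iterations):
        # Current abundance estimate: unique reads plus current share.
        totals = {t: unique_counts[t] + share[t] for t in transcripts}
        grand_total = sum(totals.values())
        if grand_total == 0:
            break  # no reads at all, nothing to apportion
        # Re-split the shared reads in proportion to the estimates.
        share = {t: shared_count * totals[t] / grand_total for t in transcripts}
    return {t: unique_counts[t] + share[t] for t in transcripts}


if __name__ == "__main__":
    # Hypothetical counts: 90 and 10 unique reads, 50 shared reads.
    estimate = apportion_shared_reads({"transcript1": 90, "transcript2": 10}, 50)
    for name, count in estimate.items():
        print(f"{name}: {count:.2f}")
```

In this simplified setting the iteration settles on a split of the shared reads that is proportional to the unique read counts, so the transcript with more unique evidence receives most of the ambiguous reads.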
Output files from RSEM (two files per sample)

*.isoforms.results table

    transcript_id   gene_id  length  effective_length  expected_count  TPM   FPKM  IsoPct
    gene1_isoform1  gene1    2169    2004.97           22.1            3.63  3.93  92.08
    gene1_isoform2  gene1    2170    2005.97           1.9             0.31  0.34  7.92
    …

• IsoPct: percentage of an isoform in a gene

*.genes.results table

    gene_id  transcript_id(s)               length  effective_length  expected_count  TPM   FPKM
    gene1    gene1_isoform1,gene1_isoform2  2169.1  2005.04           24              3.94  4.27
    …

• The gene-level expected_count, TPM, and FPKM are the sums of the isoform-level values.

Filtering transcriptome reference based on RSEM: filter_fasta_by_rsem_values.pl

    filter_fasta_by_rsem_values.pl \
      --rsem_output=s1.isoforms.results,s2.isoforms.results \
      --fasta=Trinity.fasta \
      --output=Trinity.filtered.fasta \
      --isopct_cutoff=5 \
      --fpkm_cutoff=10 \
      --tpm_cutoff=10

• The reference can be filtered by multiple RSEM files simultaneously (transcripts meeting the criteria of any one filter are filtered).

Part 2. Differentially Expressed Genes

… with 3 biological replicates in each species

[Figure: Distribution of Expressio… x-axis: Expression level (0 to 20). y-axis: % of replicates (0.00 to 0.25).]
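Looking back at the RSEM output tables in Part 1, the relationship between the isoform-level and gene-level values can be checked with a short sketch. It uses only the two example isoform rows shown in the *.isoforms.results table. The function names are hypothetical, and computing IsoPct from TPM is an assumption made for illustration, not a statement of how RSEM derives it internally.

```python
from collections import defaultdict

# Rows copied from the example *.isoforms.results table:
# (transcript_id, gene_id, expected_count, TPM, FPKM)
ISOFORM_ROWS = [
    ("gene1_isoform1", "gene1", 22.1, 3.63, 3.93),
    ("gene1_isoform2", "gene1", 1.9, 0.31, 0.34),
]


def summarize_genes(rows):
    """Sum isoform-level count, TPM and FPKM into one record per gene."""
    genes = defaultdict(lambda: {"expected_count": 0.0, "TPM": 0.0, "FPKM": 0.0})
    for _transcript_id, gene_id, count, tpm, fpkm in rows:
        genes[gene_id]["expected_count"] += count
        genes[gene_id]["TPM"] += tpm
        genes[gene_id]["FPKM"] += fpkm
    return dict(genes)


def isoform_percentages(rows):
    """Percentage of each isoform within its gene, based on TPM (assumption)."""
    gene_tpm = defaultdict(float)
    for _transcript_id, gene_id, _count, tpm, _fpkm in rows:
        gene_tpm[gene_id] += tpm
    percentages = {}
    for transcript_id, gene_id, _count, tpm, _fpkm in rows:
        total = gene_tpm[gene_id]
        percentages[transcript_id] = 100.0 * tpm / total if total > 0 else 0.0
    return percentages


if __name__ == "__main__":
    for gene_id, values in summarize_genes(ISOFORM_ROWS).items():
        print(gene_id, {key: round(value, 2) for key, value in values.items()})
    for transcript_id, pct in isoform_percentages(ISOFORM_ROWS).items():
        print(transcript_id, round(pct, 2))
```

The summed values reproduce the gene1 row of the *.genes.results table (24, 3.94, 4.27). The percentages computed from the rounded TPM values come out close to, but not exactly equal to, the IsoPct column, which is presumably due to rounding in the displayed table.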
