

RNASEQ WITHOUT A REFERENCE
Experimental Design
Assembly in Non-Model Organisms
And other (hopefully useful) Stuff

Meg Staton
mstaton1@utk.edu
University of Tennessee
Knoxville, TN

I. Project Design
Things you need to know BEFORE you begin
•  Cost
•  Read Count
•  Replicates
Pro Tip: Who is your resident statistician? Buy them a coffee and make friends.

Replicates – What?
•  Biological Replicates – independent biological samples, processed separately and barcoded
•  Technical Replicates – independent library construction or sequencing of the same biological sample
•  Technical reproducibility is very good for RNASeq
•  Biological variation is much greater!
•  Different genes have different variances and are potentially subject to different errors and biases.
“Thinking About RNA Seq Experimental Design for Measuring Differential Gene Expression: The Basics”
http://gkno2.tumblr.com/post/24629975632/thinking-about-rna-seq-experimental-design-for
Marioni, J.C., et al. (2008) RNA-seq: An assessment of technical reproducibility and

Replicates – How many?
•  Beyond a depth of 10 million reads, replicates provide more statistical power than depth for detecting differential gene expression
•  Many people say at least 3 – this enables the t-test
•  What if one fails?
•  (Fisher's exact test can be used with no replicates)

Replicates – Software?
•  Both edgeR and DESeq will calculate variance from replicates (but neither does a t-test)
•  From the horse’s mouth:
“to use something like a t test, you need enough replicates to estimate a variance for each gene. With two groups of five samples, you are already entering the regime where this should work well. For comparison, also try a tool that pools information from several genes to get better confidence in variance estimates, such as our DESeq or the Smyth group's edgeR. Of course, we like to claim that DESeq is better than edgeR, and for only two or three replicates, I do think so, but for five or more replicates, edgeR's "moderation" feature really pays off. So, even though I don't like admitting this, for your set-up [of 5 replicates per treatment], edgeR should work better than DESeq.”
-Simon Anders on SeqAnswers
http://seqanswers.com/forums/showthread.php?t=10410

Replicates – And Blocks?
•  Randomized Block Design
•  Randomize – assign individuals at random to treatments in an experiment
•  Blocking – experimental units are grouped into homogeneous clusters in an attempt to improve the comparison of treatments
•  Example – all organisms from the same location form a “block”, with multiple locations used
•  Example – each block is a cultivar, with individuals from that cultivar randomly assigned to a treatment

Read Count – How to Decide?
•  Standards, Guidelines and Best Practices for RNA-Seq
•  V1.0 (June 2011)
•  The ENCODE Consortium
•  What are you trying to do?
•  Compare two mRNA samples for differential expression (30M PE per sample)
•  Discover novel elements, perform more precise quantification, especially of lowly expressed transcripts (100-200M PE per sample)
•  What resources do you already have?
•  Well assembled and annotated genomes – single ends, shorter reads
•  De novo – longer reads, paired ends
•  What is being published in your community?
http://encodeproject.org/ENCODE/protocols/dataStandards/ENCODE_RNAseq_Standards_V1.0.pdf

Read Count – How to Decide? (cont.)
•  Blogosphere disagrees
•  Need half the coverage, double the replicates!
•  Current experiments indicate that we are NOT discovering significantly more transcripts with a HiSeq run vs a MiSeq run. (At least not transcripts that look like genes)
•  A deep biased view

Scotty – You need more power!
•  Scotty is a web service to plan RNA-Seq experiments that measure differential gene expression.
•  Prototype data required
•  Pilot data – at least two replicates of either control or treatment
•  Pre-loaded data
http://bioinformatics.bc.edu/marthlab/scotty/scotty.php

Scotty – up to $20k
User Inputs Used in the Analysis
Control columns in pilot data: 3
Test columns in pilot data: 3
Cost per replicate, control: $200
Cost per replicate, test: $200
Cost per million reads: $23
Alignment Rate: 90%
Maximum cost of experiment: $20000
Percentage of genes detected: 50
At p value cutoff: 0.01
For the following true fold change: 2
Maximum percentage of genes with l
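
To get a feel for how the cost inputs listed for Scotty trade replicates against depth, here is a rough Python sketch. It assumes the total cost is simply a fixed per-replicate cost plus a per-million-reads sequencing cost for every sample. That additive model is an assumption made for illustration and is not taken from Scotty's documentation. The function names and the 30M-reads-per-sample figure (borrowed from the ENCODE guideline quoted earlier) are likewise illustrative choices.

```python
def experiment_cost(reps_per_group, reads_millions_per_sample,
                    cost_per_replicate=200.0, cost_per_million_reads=23.0,
                    groups=2):
    """Estimated total cost in dollars under an assumed additive cost model.

    Defaults mirror the example inputs on the slide ($200 per replicate,
    $23 per million reads, two groups: control and test).
    """
    samples = reps_per_group * groups
    library_cost = samples * cost_per_replicate
    sequencing_cost = samples * reads_millions_per_sample * cost_per_million_reads
    return library_cost + sequencing_cost


def max_replicates(budget, reads_millions_per_sample, **cost_kwargs):
    """Largest number of replicates per group that still fits the budget."""
    reps = 0
    while experiment_cost(reps + 1, reads_millions_per_sample, **cost_kwargs) <= budget:
        reps += 1
    return reps


if __name__ == "__main__":
    # Three replicates per group at 30M reads per sample.
    print(experiment_cost(3, 30))
    # Replicates per group affordable within $20,000 at 30M and at 10M reads.
    print(max_replicates(20000, 30))
    print(max_replicates(20000, 10))
```

Under this simplified model, dropping the depth per sample frees budget for more replicates, which is the trade-off the "half the coverage, double the replicates" remark points at. Scotty itself additionally uses pilot data to estimate statistical power, which this sketch does not attempt.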

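
The randomized block design described under "Replicates – And Blocks?" can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed protocol: the function name, the cultivar labels, and the individual IDs are all hypothetical. Within each block (for example, a cultivar) the individuals are shuffled and then dealt out to treatments in turn, so every block contributes as evenly as possible to every treatment.

```python
import random


def randomized_block_assignment(blocks, treatments, seed=None):
    """Randomly assign individuals to treatments separately within each block.

    blocks: dict mapping a block name to a list of individual IDs
    treatments: list of treatment names
    Returns a dict mapping individual ID -> (block name, treatment).
    """
    rng = random.Random(seed)  # seeded so the assignment can be reproduced
    assignment = {}
    for block, individuals in blocks.items():
        shuffled = list(individuals)
        rng.shuffle(shuffled)  # randomize order within the block only
        for i, individual in enumerate(shuffled):
            # Deal shuffled individuals out to treatments round-robin.
            assignment[individual] = (block, treatments[i % len(treatments)])
    return assignment


if __name__ == "__main__":
    # Hypothetical example: two cultivars as blocks, four plants each.
    example_blocks = {
        "cultivar_A": ["A1", "A2", "A3", "A4"],
        "cultivar_B": ["B1", "B2", "B3", "B4"],
    }
    plan = randomized_block_assignment(example_blocks, ["control", "treated"], seed=42)
    for individual, (block, treatment) in sorted(plan.items()):
        print(individual, block, treatment)
```

Because the shuffle happens inside each block, block-to-block differences (location, cultivar) are balanced across treatments instead of being confounded with them.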