Next Generation Bioinformatics on the Cloud
http://www.easygenomics.com

Sifei He, Director of BGI Cloud, hesifei@genomics.cn
Xing Xu, Ph.D., Senior Product Manager, EasyGenomics | BGI, xuxing@genomics.cn
Contact us: info@easygenomics.com

Agenda
• Vision and Strategy
• Problems and Solutions
• Product Introduction
• LIVE Demo
• Future Roadmap
• Q&A

Trend of Volume and Cost
[Figure: axes labeled "$/Mb DNA Sequence" and "Human Genome Sequenced"]
Figures adapted from Sboner A, et al.: The real cost of sequencing: higher than you think! Genome Biology 2011, 12:125
Numbers and images from private research and the open Internet

Geographical side of the problem
Sequencing is a COMMODITY and happens EVERYWHERE.
[Figure: map of sequencing sites, plus BGI. Images from omicsmaps.com]

Interpretation is the KEY
• Analysis and interpretation are the KEY
• Application is the “Silver Bullet”

Difficulties of Analysis
• Stages: Primary Analysis, Secondary Analysis, Tertiary Analysis, Post-Tertiary Analysis
• Tasks: Base calling, Mapping, Variant calling, In-depth annotation
• Difficulties: Data throughput, Data storage, Computation intensive, Complicated algorithms, Lack of knowledge

Problems and Solutions
Problems:
• Big genomic data
• Geographical distribution
• Algorithm integration
• Computational demand
Solutions:
• Cloud
• High Speed Data Exchange
• Workflows
• Resource Management

EasyGenomics™
• EasyGenomics is the bioinformatics platform for research and applications on the cloud
• Computational resources
• Algorithms, workflows, database, reports
• Data management
• Web portal, simple UI
• High-speed connection

Bioinformatics Core
• Algorithms: carefully chosen, tested and optimized
• Workflows: whole genome resequencing, exome resequencing, RNA-Seq, small RNA, de novo assembly

Enabling Technology
Hadoop-based Flexible Computing
Best Practice Award for IT Infrastructure

Human Genome       SOAPdenovo     EasyGenomics™ (192 cores)
Genome Coverage    86%            86%
Assembly Time      70 h           55 h
No. of Servers     1              15
Memory Size        500 GB x 1     24 GB x 15
Mode               Centralized    Distributed

Data Management
[Diagram labels: Project I; Sample A, Sample B; Raw Data; Analysis I, Analysis II, Analysis X]
• “Sample”, “Analysis”, “Project”
• Mimics the real research procedure
• Automatic management of the underlying data structure

High Speed Data Exchange
• Aspera’s patented fasp™ high-speed file transfer technology
• 10 to 100 times faster than FTP

Resource Management
• Managed Workspace
• Managed Data Structure
• Multitenancy
• Task
• Safe Backup
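
Illustrative sketch (not from the slides): the “Sample”, “Analysis”, “Project” organization on the Data Management slide can be pictured as a small data model. The slides do not say how EasyGenomics stores these objects, so the class names, fields and the nesting (a project holds samples, a sample holds raw data and analyses) are assumptions made only for illustration, as are the file names.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Analysis:
    """One analysis run on a sample (hypothetical structure)."""
    name: str
    workflow: str  # one of the workflows named in the deck, e.g. "RNA-Seq"


@dataclass
class Sample:
    """A sample with its raw data files and the analyses run on it."""
    name: str
    raw_data: List[str] = field(default_factory=list)
    analyses: List[Analysis] = field(default_factory=list)

    def add_analysis(self, name: str, workflow: str) -> Analysis:
        analysis = Analysis(name, workflow)
        self.analyses.append(analysis)
        return analysis


@dataclass
class Project:
    """A project grouping several samples (assumed nesting)."""
    name: str
    samples: Dict[str, Sample] = field(default_factory=dict)

    def add_sample(self, name: str, raw_data: List[str]) -> Sample:
        sample = Sample(name, list(raw_data))
        self.samples[name] = sample
        return sample

    def all_analyses(self) -> List[Analysis]:
        # Flatten the analyses of every sample, in insertion order.
        return [a for s in self.samples.values() for a in s.analyses]


# Usage: rebuild the labels shown on the slide's diagram.
project = Project("Project I")
sample_a = project.add_sample("Sample A", ["sample_a_1.fq", "sample_a_2.fq"])
sample_b = project.add_sample("Sample B", ["sample_b_1.fq"])
sample_a.add_analysis("Analysis I", "exome resequencing")
sample_a.add_analysis("Analysis II", "RNA-Seq")
sample_b.add_analysis("Analysis X", "whole genome resequencing")

print([a.name for a in project.all_analyses()])
```

Keeping raw data attached to the sample and results attached to each analysis is one way to let a user think in research terms while the platform manages the underlying files.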

