Streamlined processing and analysis of 16S rRNA amplicon sequencing data with OCMS_16S and OCMSlooksy
Yen S., Johnson J., Ilott NE.
16S rRNA gene sequencing is a cost-effective method for profiling the bacterial component of a microbiome. Nevertheless, processing and analysis of the resulting sequencing data are often constrained by the availability of dedicated bioinformaticians, creating a bottleneck for biological interpretation. Multiple visualisation and analysis tools now exist for downstream analysis of 16S rRNA data. These tools are designed with biological scientists in mind and therefore provide a graphical user interface that operates on taxonomic counts tables to perform tasks such as alpha- and beta-diversity analysis and differential abundance testing. However, generating the input to these applications still relies on bioinformatics experience, creating a disconnect between data processing and data analysis. To bridge this gap, we have created two tools, OCMS_16S and OCMSlooksy, that perform data processing and data visualisation/analysis, respectively. OCMS_16S is a cgat-core-based pipeline that wraps DADA2 functionality to process raw sequence reads into tables of amplicon sequence variant (ASV) counts through a simple command line interface. OCMSlooksy is an RShiny application that takes an OCMS_16S-generated SQLite database as input for data exploration and analysis. Together, these tools provide a simple, user-friendly workflow for 16S rRNA gene amplicon sequencing data analysis, from raw reads to results.