EPPS, a metabarcoding bioinformatics pipeline using Nextflow

LI Yiyuan1 David C. Molik1 Michael E. Pfrender1

(1. Department of Biological Sciences, University of Notre Dame, Notre Dame, IN 46554, USA)

【Abstract】Metabarcoding enables rapid assessment of biodiversity. In this study, we discuss popular metabarcoding analytical tools and their parameter settings. Using the pipeline-building program Nextflow, we also developed a metabarcoding bioinformatics pipeline, EPPS, which processes data from quality control of raw reads through to biodiversity comparisons between samples. EPPS can summarize the time and memory cost of each process in the pipeline. We applied the pipeline to a test dataset and to a public dataset from a previous study. The results suggest that the pipeline can reliably analyze metabarcoding data and can facilitate the sharing of pipelines among metabarcoding studies.
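The idea of summarizing the time and memory cost of each process can be illustrated with a minimal Python sketch. This is not the EPPS implementation, which relies on Nextflow for this bookkeeping. The step functions `quality_filter` and `dereplicate`, the helper `run_step`, and the toy reads are hypothetical stand-ins, and `tracemalloc` measures only memory allocated by Python, not the footprint of an external tool.

```python
import time
import tracemalloc


def run_step(name, func, data):
    """Run one step and record its wall time and peak Python memory."""
    tracemalloc.start()
    start = time.perf_counter()
    result = func(data)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, {"step": name, "seconds": elapsed, "peak_bytes": peak}


def quality_filter(reads):
    # Hypothetical quality-control step: drop reads with ambiguous bases.
    return [read for read in reads if "N" not in read]


def dereplicate(reads):
    # Hypothetical step: collapse identical reads into sequence -> count.
    counts = {}
    for read in reads:
        counts[read] = counts.get(read, 0) + 1
    return counts


def run_pipeline(reads):
    """Chain the steps in order and collect one resource record per step."""
    steps = [("quality_filter", quality_filter), ("dereplicate", dereplicate)]
    report = []
    data = reads
    for name, func in steps:
        data, stats = run_step(name, func, data)
        report.append(stats)
    return data, report


if __name__ == "__main__":
    result, report = run_pipeline(["ACGT", "ACGT", "ACNT", "TTGA"])
    print(result)
    for row in report:
        print(row["step"], f'{row["seconds"]:.6f} s', row["peak_bytes"], "bytes")
```

Each step returns its output together with a small record, so the per-step summary is built as a side product of running the chain rather than by a separate profiling pass.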

【Keywords】 environmental DNA; USEARCH; Trimmomatic; principal component analysis


(Translated by QI RS)



This Article


CN: 11-3247/Q

Vol 27, No. 05, Pages 567-575

May 2019

