Automated DNA-based plant identification for large-scale biodiversity assessment

Papadopoulou, A., Chesters, D., Coronado, I., De la Cadena, G., Cardoso, A., Reyes, J. C., Maes, J.-M., Rueda, R. M. & Gómez-Zurita, J. 2015. Automated DNA-based plant identification for large-scale biodiversity assessment. Molecular Ecology Resources 15: 136-152.
22.12.2014

Imagine a world whose main wonder is its 10 million living species. Imagine that each species can be recognised by sequence differences in a short, universal DNA marker, and that a library links each species to its DNA tag. In this world, the important business of species identification becomes a trivial exercise in matching DNA sequences.

This world pretty much exists, as the recent development of DNA barcoding has shown. It is ours. However ideal it may sound, though, the scenario described above remains utopian for several reasons, two of which are particularly discouraging. First, with slightly over 1 million species described so far, we are still very far from knowing all our species. Second, of the small percentage we do know, only a minority have had species-specific DNA tags characterised, although new sequences are added to public databases almost daily.

In these circumstances, when a complete DNA dictionary for biodiversity seems out of reach, should we abandon the aim of objective DNA-based species identification? Obviously not. We can still take advantage of one important fact: all species share common ancestors, and these relationships have been used to develop a hierarchical phylogenetic system. Molecular phylogenies built from species-specific DNA tags therefore give robust species identifications when matching sequences are available in a reference library. Even when there are large gaps in species sampling, as is (and will remain) the case, they still allow useful taxonomic inferences at higher ranks.
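The idea of falling back to a higher rank can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in the paper: the function name `shared_taxonomy`, the three-rank list and the example lineages are all hypothetical, and the "closest relatives" are simply given rather than read from a real phylogenetic tree.

```python
from typing import Sequence

# Ranks ordered from highest to lowest (a simplification chosen for this sketch).
RANKS = ("family", "genus", "species")


def shared_taxonomy(lineages: Sequence[dict[str, str]]) -> dict[str, str]:
    """Return the ranks, from highest to lowest, on which all lineages agree."""
    agreed: dict[str, str] = {}
    if not lineages:
        return agreed
    for rank in RANKS:
        names = {lineage.get(rank) for lineage in lineages}
        # Stop at the first rank that is missing or on which the lineages disagree.
        if len(names) != 1 or None in names:
            break
        agreed[rank] = names.pop()
    return agreed


# Hypothetical closest relatives of an unidentified sample (illustrative names only).
neighbours = [
    {"family": "Fabaceae", "genus": "Inga", "species": "Inga vera"},
    {"family": "Fabaceae", "genus": "Inga", "species": "Inga edulis"},
]

result = shared_taxonomy(neighbours)
print(result)
```

Here the two relatives disagree at species level, so the sample would be reported only to genus and family, which is the kind of higher-rank inference that remains possible when the reference library has gaps.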

However, molecular phylogenetics is a highly specialised and time-consuming field, and the rate at which public sequence databases grow makes its routine use impractical. To encourage and facilitate phylogeny-based species identification, the researchers have developed an automated procedure and collection of scripts (BAGpipe; freely available at http://molevol.cmima.csic.es/gomez-zurita/software.html). It regularly mines GenBank for any marker of interest and assigns the source of homologous sequences to a taxon using standardised distance- and tree-based procedures. The use, strengths and caveats of this procedure are illustrated by applying one cpDNA marker to the flora of Nicaraguan seasonally dry forests.
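To give a feel for what a distance-based assignment involves, here is a minimal Python sketch. It is not BAGpipe's code and does not reproduce its criteria: the functions `p_distance` and `assign`, the cut-off values and the toy sequences are all assumptions made for illustration, and the GenBank mining and tree-based steps are left out entirely.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Sites with a gap ('-') or an ambiguous base ('N') in either sequence are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        differing += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def assign(query: str, reference: dict[str, str],
           species_cutoff: float = 0.01, genus_cutoff: float = 0.10):
    """Assign a query to the nearest reference, at a rank set by the distance.

    Both cut-offs are hypothetical values chosen for this sketch.
    """
    name, dist = min(
        ((n, p_distance(query, s)) for n, s in reference.items()),
        key=lambda pair: pair[1],
    )
    if dist <= species_cutoff:
        return ("species", name, dist)
    if dist <= genus_cutoff:
        # Keep only the genus part of the binomial name.
        return ("genus", name.split()[0], dist)
    return ("unassigned", None, dist)


# Toy aligned reference library (invented sequences, illustrative names).
library = {
    "Inga vera": "ACGTACGTACGTACGTACGT",
    "Cedrela odorata": "ACGTTCGAACCTACGAACGT",
}

exact = assign("ACGTACGTACGTACGTACGT", library)
close = assign("ACGTACGTACGTACGTACGA", library)
far = assign("TTTTTTTTTTTTTTTTTTTT", library)
print(exact, close, far)
```

An identical sequence is assigned to species, a sequence one substitution away is assigned only to genus, and a very divergent one is left unassigned. A real pipeline would align the sequences first and would calibrate any thresholds against the marker and taxa in question.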
