Is MetageNN a robust and memory-efficient neural network classifier for taxonomic analysis, even with sequencing errors and missing genomes?

Original title: MetageNN: a memory-efficient neural network taxonomic classifier robust to sequencing errors and missing genomes

Authors: Rafael Peres da Silva, Chayaporn Suphavilai, Niranjan Nagarajan

This article describes MetageNN, a new taxonomic classifier designed for long sequencing reads. Long-read sequencing technologies have become increasingly popular, but their higher error rates can make taxonomic classification challenging. Traditional alignment-based methods are often slower and may have lower sensitivity for strains and species that are missing from the reference database.

MetageNN is a neural network model that uses short k-mer profiles of sequences to reduce the impact of distribution shifts on error-prone long reads. The researchers benchmarked MetageNN against other machine learning approaches and found significant improvements in accuracy, particularly with long-read data. MetageNN also outperformed other tools in terms of sensitivity and speed, while requiring less database storage.
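To make the idea of a short k-mer profile concrete, here is a minimal sketch in Python. It is not MetageNN's implementation: the function name `kmer_profile`, the choice of k, the plain ACGT alphabet, and the frequency normalization are all assumptions made for illustration. It shows how a read of any length becomes a fixed-size vector, and why a single sequencing error only disturbs a few entries when k is small.

```python
from collections import Counter
from itertools import product
from typing import List


def kmer_profile(seq: str, k: int = 4) -> List[float]:
    """Return a normalized k-mer frequency vector for a DNA sequence.

    Hypothetical helper for illustration only; k and the normalization
    are assumptions, not values taken from the MetageNN paper.
    """
    seq = seq.upper()
    # All 4**k possible k-mers in lexicographic order define the vector layout.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Count every overlapping window of length k in the read.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Windows containing other characters (e.g. N) are ignored.
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


if __name__ == "__main__":
    clean = "ACGTACGTAC"
    noisy = "ACGTAGGTAC"  # one substitution (C -> G) at position 5
    p_clean = kmer_profile(clean, k=2)
    p_noisy = kmer_profile(noisy, k=2)
    changed = sum(1 for a, b in zip(p_clean, p_noisy) if a != b)
    print(f"vector length: {len(p_clean)}, entries changed by one error: {changed}")
```

A single substitution affects at most k overlapping windows, so with a short k most of the profile stays the same, which is the intuition behind using short k-mers for error-prone reads.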

This study demonstrates the potential of machine-learning-based methods for taxonomic classification of long reads and offers a memory-efficient alternative to existing classifiers. It also opens up new possibilities for classifying sequences that conventional methods leave unclassified, and suggests opportunities for further optimization.

Original article: https://www.biorxiv.org/content/10.1101/2023.12.01.569515v2