Original title: BEND: Benchmarking DNA Language Models on biologically meaningful tasks
Authors: Frederikke Isa Marin, Felix Teufel, Marc Horlacher, Dennis Madsen, Dennis Pultz, Ole Winther, Wouter Boomsma
In this paper, the authors address the difficulty of identifying the genome's functional elements. Although the number of available genomes has surged, annotating these elements experimentally remains challenging and expensive. This has motivated interest in unsupervised language models for genomic DNA, an approach that has already proven successful for protein data. Existing DNA language models, however, lack a unified evaluation that reflects the real challenges of genome annotation: the length, scale, and sparsity of the data. The authors introduce BEND, a benchmark for DNA language models whose tasks are biologically meaningful and defined on the human genome. Evaluating current models, they find that these show promise on some tasks but struggle to capture long-range genomic features. BEND is available online and is intended to guide the development of models that capture genome function more completely.
Original article: https://arxiv.org/abs/2311.12570