Effectiveness of Cross-linguistic Extraction of Genetic Information using Generative Large Language Models

2024-09-24Proceedings of the Iberian Languages Evaluation Forum (IberLEF) co-located with the Conference of the Spanish Society for Natural Language Processing (SEPLN) 2024Code Available0· sign in to hype

Milindi Kodikara, Karin Verspoor

arXiv PDF

Code Available — Be the first to reproduce this paper.

Reproduce

Code

github.com/Milindi-Kodikara/RMIT-READ-BioMed
In papernone★ 1

Abstract

This paper presents the RMIT University system (RMIT-READ-BioMed) developed for the GenoVarDis shared task at IberLEF 2024, focusing on the task of Named Entity Recognition (NER) of genes, genetic variants, and associated diseases from Spanish-language scientific literature texts. The approach involves exploration of a general generative Large Language Model (LLM), GPT-3.5, for NER. We explore the impact of providing English-language instructions with the Spanish-language target text (crosslinguistic setting) as compared to a within-language setting where the instruction language matches the language of the text. We further experiment with a range of instruction strategies, including zero-shot and few-shot prompting under these two settings. Results indicate that the optimal results could be obtained with Englishlanguage instructions under the few-shot learning paradigm, resulting in an F1-score of 0.5. While this approach does not match the top results achieved for the shared task, our experiments provide insight into limitations associated with simple prompting of LLMs in languages other than English.

Tasks

Cross-Lingual NER Few-Shot Learning Genetic IE Language Modeling Language Modelling Large Language Model named-entity-recognition Named Entity Recognition Named Entity Recognition (NER)NER

Effectiveness of Cross-linguistic Extraction of Genetic Information using Generative Large Language Models

Code

Abstract

Tasks

Reproductions