NeSy is alive and well: A LLM-driven symbolic approach for better code comment data generation and classification

2024-02-25

Hanna Abi Akl

Code

github.com/hannaabiakl/nesy-code-generation-workflow
Official, in paper, ★ 5

Abstract

We present a neuro-symbolic (NeSy) workflow combining a symbolic-based learning technique with a large language model (LLM) agent to generate synthetic data for code comment classification in the C programming language. We also show how generating controlled synthetic data using this workflow fixes some of the notable weaknesses of LLM-based generation and increases the performance of classical machine learning models on the code comment classification task. Our best model, a Neural Network, achieves a Macro-F1 score of 91.412% with an increase of 1.033% after data augmentation.
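The Macro-F1 figure reported above is the unweighted mean of the per-class F1 scores, so each comment class counts equally regardless of how many examples it has. The following is a minimal sketch of that metric, not the paper's evaluation code; the function name `macro_f1` and the label names are hypothetical and chosen only for illustration.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical labels, for illustration only (not taken from the paper's dataset)
y_true = ["useful", "useful", "not_useful", "not_useful"]
y_pred = ["useful", "not_useful", "not_useful", "not_useful"]
score = macro_f1(y_true, y_pred)
print(f"Macro-F1: {score:.3f}")
```

Because every class is weighted equally, a gain on a small class moves Macro-F1 as much as the same gain on a large class, which makes it a reasonable metric for judging augmentation on imbalanced data.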

Tasks

Classification, Data Augmentation, Language Modeling, Language Modelling, Large Language Model
