Assemble Foundation Models for Automatic Code Summarization

2022-01-13 · Code Available

Jian Gu, Pasquale Salza, Harald C. Gall

Abstract

Automatic code summarization benefits daily software development by reducing the effort of writing summaries by hand. Artificial intelligence is currently undergoing a paradigm shift: foundation models pretrained on massive data and finetuned for downstream tasks surpass models customized for a single purpose. This trend inspired us to reuse foundation models instead of learning from scratch. We therefore propose a flexible and robust approach to automatic code summarization based on neural models. We assemble available foundation models, such as CodeBERT and GPT-2, into a single neural model, which we name AdaMo. Moreover, we utilize Gaussian noise as a simulation of contextual information to optimize the latent representation. Furthermore, we introduce two adaptive schemes from the perspective of knowledge transfer, namely continuous pretraining and intermediate finetuning, and design intermediate-stage tasks for general sequence-to-sequence learning. Finally, we evaluate AdaMo against benchmark datasets for code summarization, comparing it with state-of-the-art models.
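
To make the assembly idea concrete, here is a minimal sketch of pairing a pretrained CodeBERT encoder with a GPT-2 decoder via Hugging Face's EncoderDecoderModel. The checkpoint names and generation settings are illustrative assumptions, not the authors' exact AdaMo configuration; the sketch shows the general pattern of reusing foundation models rather than training from scratch.

```python
import torch
from transformers import AutoTokenizer, EncoderDecoderModel

# Checkpoint names and generation settings are assumptions for
# illustration, not the authors' exact AdaMo setup.
enc_name, dec_name = "microsoft/codebert-base", "gpt2"
code_tok = AutoTokenizer.from_pretrained(enc_name)  # tokenizes source code
sum_tok = AutoTokenizer.from_pretrained(dec_name)   # tokenizes summaries
sum_tok.pad_token = sum_tok.eos_token  # GPT-2 ships without a pad token

# Wire the pretrained encoder and decoder together; the cross-attention
# weights are freshly initialized, so finetuning on code-summary pairs
# is required before the outputs become meaningful.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(enc_name, dec_name)
model.config.decoder_start_token_id = sum_tok.bos_token_id
model.config.pad_token_id = sum_tok.pad_token_id

code = "def add(a, b):\n    return a + b"
inputs = code_tok(code, return_tensors="pt")
with torch.no_grad():
    summary_ids = model.generate(**inputs, max_length=32)
print(sum_tok.decode(summary_ids[0], skip_special_tokens=True))
```

Finetuning this assembled model is where the paper's adaptive schemes, continuous pretraining and intermediate finetuning, would slot in. Likewise, a hedged sketch of the noise idea: perturbing the encoder's latent representation with Gaussian noise during training, as a stand-in for contextual information. The noise scale `sigma` is a hypothetical hyperparameter, not a value from the paper.

```python
import torch

def perturb_latent(hidden_states: torch.Tensor, sigma: float = 0.1,
                   training: bool = True) -> torch.Tensor:
    # Add zero-mean Gaussian noise to the encoder output during training
    # to simulate contextual variation; leave inference untouched.
    if not training:
        return hidden_states
    return hidden_states + sigma * torch.randn_like(hidden_states)
```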

Tasks

Code Summarization

Benchmark Results

Dataset                | Model       | Metric | Claimed | Verified | Status
CodeSearchNet - Python | AdaMo-basic | BLEU-4 | 16.46   |          | Unverified
DeepCom-Java           | AdaMo-noise | BLEU-4 | 45.35   |          | Unverified
DeepCom-Java           | AdaMo-basic | BLEU-4 | 45.3    |          | Unverified
Java scripts           | AdaMo-basic | BLEU-4 | 37.64   |          | Unverified
ParallelCorpus-Python  | AdaMo-noise | BLEU-4 | 34.05   |          | Unverified
ParallelCorpus-Python  | AdaMo-basic | BLEU-4 | 33.85   |          | Unverified
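
For reference, a minimal sketch of how a corpus-level BLEU-4 score like those in the table can be computed with NLTK: uniform weights over 1- to 4-grams with smoothing. The tokenization, smoothing choice, and example sentences are assumptions; the evaluation script behind the claimed numbers may differ.

```python
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# One hypothetical reference/hypothesis pair; real evaluation would
# iterate over the whole test split of the dataset.
references = [[["returns", "the", "sum", "of", "two", "numbers"]]]
hypotheses = [["return", "the", "sum", "of", "two", "numbers"]]

score = corpus_bleu(
    references,
    hypotheses,
    weights=(0.25, 0.25, 0.25, 0.25),  # BLEU-4: uniform n-gram weights
    smoothing_function=SmoothingFunction().method4,
)
print(f"BLEU-4: {100 * score:.2f}")
```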

Reproductions