Convolutional Neural Networks over Tree Structures for Programming Language Processing
Lili Mou, Ge Li, Lu Zhang, Tao Wang, Zhi Jin
Code
- github.com/crestonbunch/tbcnn (TensorFlow, ★ 154)
- github.com/rabbitjy/fuzztuning (PyTorch, ★ 24)
- github.com/handdl/btcnn (PyTorch, ★ 0)
- github.com/bdqnghi/bi-tbcnn (TensorFlow, ★ 0)
- github.com/RyanMarcus/TreeConvolution (PyTorch, ★ 0)
- github.com/jacobwwh/tbcnn-dgl (PyTorch, ★ 0)
- github.com/spcl/ncc (TensorFlow, ★ 0)
- github.com/bdqnghi/tbcnn.tensorflow (TensorFlow, ★ 0)
- github.com/bdqnghi/tbcnn-tensorflow (TensorFlow, ★ 0)
Abstract
Programming language processing (similar to natural language processing) is a hot research topic in the field of software engineering; it has also aroused growing interest in the artificial intelligence community. However, different from a natural language sentence, a program contains rich, explicit, and complicated structural information. Hence, traditional NLP models may be inappropriate for programs. In this paper, we propose a novel tree-based convolutional neural network (TBCNN) for programming language processing, in which a convolution kernel is designed over programs' abstract syntax trees to capture structural information. TBCNN is a generic architecture for programming language processing; our experiments show its effectiveness in two different program analysis tasks: classifying programs according to functionality, and detecting code snippets of certain patterns. TBCNN outperforms baseline methods, including several neural models for NLP.
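To make the idea of a convolution kernel over an abstract syntax tree concrete, the following is a minimal sketch, not the authors' implementation. It assumes a window that covers one node and its direct children, a single shared weight matrix for all children (a simplification chosen for brevity), fixed random vectors standing in for learned node embeddings, and max pooling over all windows. The names `embed`, `conv_at`, `encode`, `W_PARENT`, `W_CHILD`, and `DIM` are hypothetical and chosen only for illustration.

```python
import ast
import math
import random

DIM = 4  # embedding and feature size; an arbitrary choice for this sketch


def embed(node_type):
    """Hypothetical stand-in for a learned embedding: one fixed random vector per node type."""
    rng = random.Random(node_type)  # string seed keeps the vector reproducible
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]


def rand_matrix(seed):
    """Untrained DIM x DIM weight matrix; a real model would learn these values."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.5, 0.5) for _ in range(DIM)] for _ in range(DIM)]


W_PARENT = rand_matrix("parent")  # weights applied to the node at the top of the window
W_CHILD = rand_matrix("child")    # weights shared by all direct children (assumption)


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def conv_at(node):
    """Apply one convolution window: the node itself plus its direct children."""
    acc = matvec(W_PARENT, embed(type(node).__name__))
    children = list(ast.iter_child_nodes(node))
    for child in children:
        contrib = matvec(W_CHILD, embed(type(child).__name__))
        # Average the children so windows with many children stay on the same scale.
        acc = [a + c / len(children) for a, c in zip(acc, contrib)]
    return [math.tanh(a) for a in acc]


def encode(source):
    """Slide the window over every AST node, then max-pool to a fixed-size vector."""
    tree = ast.parse(source)
    features = [conv_at(node) for node in ast.walk(tree)]
    return [max(column) for column in zip(*features)]


if __name__ == "__main__":
    print(encode("def f(x):\n    return x + 1\n"))
```

The pooled vector has the same length regardless of program size, so it could be fed to an ordinary classifier for a task such as classifying programs by functionality. With untrained random weights the output carries no meaning; the sketch only shows how structural information enters the computation through parent-child windows.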