BLens: Contrastive Captioning of Binary Functions using Ensemble Embedding
Tristan Benoit, Yunru Wang, Moritz Dannehl, Johannes Kinder
Abstract
Function names can greatly aid human reverse engineers, which has spurred the development of machine learning-based approaches to predicting function names in stripped binaries. Much of the current work in this area uses transformers, applying a metaphor of machine translation from code to function names. Still, function naming models struggle to generalize to projects unrelated to the training set. In this paper, we take a completely new approach by transferring advances in automated image captioning to the domain of binary reverse engineering, such that different parts of a binary function can be associated with parts of its name. We propose BLens, which combines multiple binary function embeddings into a new ensemble representation, aligns it with the latent space of name representations via a contrastive learning approach, and generates function names with a transformer architecture tailored for function names. Our experiments demonstrate that BLens significantly outperforms the state of the art. In the usual setting of splitting per binary, we achieve an F1 score of 0.79 compared to 0.70. In the cross-project setting, which emphasizes generalizability, we achieve an F1 score of 0.46 compared to 0.29. Finally, in an experimental setting that reduces shared components across projects, we achieve an F1 score of 0.32 compared to 0.19.
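The contrastive alignment step can be illustrated with a minimal sketch. It assumes a CLIP-style symmetric InfoNCE objective, the standard loss from image-text alignment, which is not necessarily the exact objective BLens uses. The averaging used to form the ensemble, the helper names (`ensemble_embedding`, `contrastive_loss`) and the temperature of 0.07 are hypothetical choices made only for illustration. Within a batch, each function's ensemble embedding should be most similar to the embedding of its own name, so matched pairs receive a low loss and swapped pairs a high one.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def ensemble_embedding(encoder_outputs: List[Vector]) -> Vector:
    """HYPOTHETICAL ensemble: average the unit-normalized embeddings that several
    binary-function encoders produce for one function. Assumes all encoders share
    one dimension; BLens's actual combination method may differ."""
    normed = [l2_normalize(v) for v in encoder_outputs]
    dim = len(normed[0])
    mean = [sum(v[d] for v in normed) / len(normed) for d in range(dim)]
    return l2_normalize(mean)


def _cross_entropy(logits: List[float], target: int) -> float:
    """Numerically stable -log(softmax(logits)[target])."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def contrastive_loss(func_embs: List[Vector], name_embs: List[Vector],
                     temperature: float = 0.07) -> float:
    """Symmetric CLIP-style InfoNCE loss (an assumed stand-in for BLens's objective).
    Pair i is (func_embs[i], name_embs[i]); all other names in the batch are negatives."""
    if len(func_embs) != len(name_embs):
        raise ValueError("need one name embedding per function embedding")
    f = [l2_normalize(v) for v in func_embs]
    n = [l2_normalize(v) for v in name_embs]
    k = len(f)
    # sims[i][j]: temperature-scaled cosine similarity of function i and name j
    sims = [[sum(a * b for a, b in zip(f[i], n[j])) / temperature for j in range(k)]
            for i in range(k)]
    # Function -> name direction: each row should pick out its own name
    func_to_name = sum(_cross_entropy(sims[i], i) for i in range(k)) / k
    # Name -> function direction: each column should pick out its own function
    name_to_func = sum(_cross_entropy([sims[i][j] for i in range(k)], j)
                       for j in range(k)) / k
    return (func_to_name + name_to_func) / 2


if __name__ == "__main__":
    # Two toy functions, each embedded by two hypothetical encoders
    funcs = [ensemble_embedding([[1.0, 0.0], [0.9, 0.1]]),
             ensemble_embedding([[0.0, 1.0], [0.1, 0.9]])]
    matched = [[1.0, 0.0], [0.0, 1.0]]  # name i belongs to function i
    swapped = [[0.0, 1.0], [1.0, 0.0]]  # names assigned to the wrong functions
    print("matched loss:", contrastive_loss(funcs, matched))
    print("swapped loss:", contrastive_loss(funcs, swapped))
```

Minimizing such a loss pulls each function's ensemble embedding toward its own name's embedding and away from the other names in the batch. In a real model, both sides would come from trainable encoders and projection layers rather than fixed toy vectors.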