
TrafficGPT: An LLM Approach for Open-Set Encrypted Traffic Classification

2024-08-06 · AINTEC 2024 · Code Available

Yasod Ginige, Thilini Dahanayaka, Suranga Seneviratne


Abstract

Encrypted traffic is known to be vulnerable to traffic analysis attacks that exploit the statistical features of encrypted traffic flows, such as packet sizes, timing, and direction, to infer information about the underlying content, undermining the privacy guarantees of end-to-end encryption. While state-of-the-art attacks leverage deep learning models to achieve high accuracy, most operate under the less realistic closed-set assumption. Deploying such attacks in practice requires addressing the open-set scenario, in which models must filter out target content from other background traffic. Concurrently, Large Language Models (LLMs) are gaining traction due to their ability to adapt to diverse tasks outside NLP, especially in applications with sequential data. Inspired by this, our work introduces TrafficGPT, a novel traffic analysis attack that leverages GPT-2, a popular LLM, to enhance feature extraction and thereby improve the open-set performance of downstream classification. Using five existing encrypted traffic datasets, we show that feature extraction with GPT-2 improves the open-set performance of traffic analysis attacks over ET-BERT and CNN-based approaches by 12.7% and 13.7%, respectively.
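To illustrate the open-set setting the abstract refers to — accepting flows from known target classes while rejecting background traffic — here is a minimal confidence-thresholding sketch. This is not the paper's method; the threshold value and logits are hypothetical, and it only shows the generic idea of rejecting low-confidence predictions as "unknown":

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def open_set_predict(logits, threshold=0.7, unknown=-1):
    """Closed-set argmax, but reject low-confidence flows as 'unknown'.

    Any flow whose top softmax probability falls below `threshold`
    is treated as background traffic and labeled `unknown`.
    (Illustrative only; the paper's rejection mechanism may differ.)
    """
    probs = softmax(np.asarray(logits, dtype=float))
    top = probs.max(axis=-1)
    pred = probs.argmax(axis=-1)
    return np.where(top >= threshold, pred, unknown)

# Hypothetical per-flow logits from some feature extractor + linear head.
logits = np.array([
    [6.0, 1.0, 0.5],   # confident -> accepted as class 0
    [1.1, 1.0, 0.9],   # ambiguous -> rejected as background (-1)
])
print(open_set_predict(logits, threshold=0.7))
```

A closed-set classifier would force the second, ambiguous flow into one of the known classes; the threshold is what turns the same classifier into an open-set one.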
