Feature Engineering

Feature engineering is the process of taking a dataset and constructing explanatory variables, called features, that can be used to train a machine learning model for a prediction problem. Often, the data is spread across multiple tables and must be gathered into a single table with one row per observation and one column per feature.
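
As a minimal sketch of gathering data from several tables into one feature table, the pandas example below aggregates a child table to one row per observation and joins the result onto the parent table. The table and column names (customers, transactions, amount) are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical parent table: one row per observation (customer).
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "signup_year": [2019, 2021, 2020],
})

# Hypothetical child table: zero or more rows per customer.
transactions = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2],
    "amount": [10.0, 30.0, 5.0, 15.0, 40.0],
})

# Aggregate the child table down to one row per customer.
txn_features = (
    transactions.groupby("customer_id")["amount"]
    .agg(txn_count="count", txn_total="sum", txn_mean="mean")
    .reset_index()
)

# A left join keeps customers that have no transactions.
feature_table = customers.merge(txn_features, on="customer_id", how="left")

# For those customers, a count and total of 0 is meaningful; the mean stays missing.
count_total = ["txn_count", "txn_total"]
feature_table[count_total] = feature_table[count_total].fillna(0)

print(feature_table)
```

The left join is a deliberate choice here: an inner join would silently drop observations that have no related rows, which changes the population the model is trained on.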

The traditional approach is manual feature engineering: building features one at a time using domain knowledge, which is tedious, time-consuming, and error-prone. The code for manual feature engineering is problem-dependent and must be rewritten for each new dataset.
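
To make the manual approach concrete, the sketch below hand-builds a few features from a hypothetical loans table, one at a time. The column names and the choice of features are assumptions for illustration; each line encodes a piece of domain knowledge that would have to be rethought for a different dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical raw table; columns are illustrative only.
loans = pd.DataFrame({
    "income": [40000.0, 85000.0, 60000.0],
    "loan_amount": [10000.0, 17000.0, 30000.0],
    "issued": pd.to_datetime(["2024-01-15", "2024-06-30", "2024-11-02"]),
})

features = pd.DataFrame(index=loans.index)

# Ratio feature: loan size relative to the borrower's income.
features["loan_to_income"] = loans["loan_amount"] / loans["income"]

# Log transform to compress the long right tail typical of incomes.
features["log_income"] = np.log1p(loans["income"])

# Calendar features extracted from the issue date.
features["issue_month"] = loans["issued"].dt.month
features["issued_on_weekend"] = loans["issued"].dt.dayofweek >= 5

print(features)
```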

Papers

Title	Date	Tasks	Status
FailureSensorIQ: A Multi-Choice QA Dataset for Understanding Sensor Relationships and Failure Modes	Jun 3, 2025	Benchmarking, Feature Engineering	Code Available
Universal Reusability in Recommender Systems: The Case for Dataset- and Task-Independent Frameworks	Jun 3, 2025	Feature Engineering, Model Selection	Unverified
CNN-LSTM Hybrid Model for AI-Driven Prediction of COVID-19 Severity from Spike Sequences and Clinical Data	May 29, 2025	Feature Engineering, Robust classification	Code Available
Comparing the Effects of Persistence Barcodes Aggregation and Feature Concatenation on Medical Imaging	May 29, 2025	Feature Engineering, Medical Image Analysis	Code Available
Transforming Podcast Preview Generation: From Expert Models to LLM-Based Systems	May 29, 2025	Feature Engineering	Unverified
Machine Learning Algorithm for Noise Reduction and Disease-Causing Gene Feature Extraction in Gene Sequencing Data	May 26, 2025	Feature Engineering	Unverified
AssistedDS: Benchmarking How External Domain Knowledge Assists LLMs in Automated Data Science	May 25, 2025	Benchmarking, Feature Engineering	Unverified
Action is All You Need: Dual-Flow Generative Ranking Network for Recommendation	May 22, 2025	All, Attribute	Unverified
Scalable and Interpretable Contextual Bandits: A Literature Review and Retail Offer Prototype	May 22, 2025	Feature Engineering, Large Language Model	Unverified
Agentic Feature Augmentation: Unifying Selection and Generation with Teaming, Planning, and Memories	May 21, 2025	Decision Making, Feature Engineering	Unverified
Time to Embed: Unlocking Foundation Models for Time Series with Channel Descriptions	May 20, 2025	Feature Engineering, Representation Learning	Unverified
Enhancing Abstractive Summarization of Scientific Papers Using Structure Information	May 20, 2025	Abstractive Text Summarization, Feature Engineering	Code Available
Text embedding models can be great data engineers	May 20, 2025	Feature Engineering, Time Series	Unverified
GSDFuse: Capturing Cognitive Inconsistencies from Multi-Dimensional Weak Signals in Social Media Steganalysis	May 20, 2025	Data Augmentation, Feature Engineering	Code Available
Deep Learning-Based Forecasting of Boarding Patient Counts to Address ED Overcrowding	May 20, 2025	Feature Engineering	Unverified
A Hybrid Quantum Classical Pipeline for X Ray Based Fracture Diagnosis	May 19, 2025	Dimensionality Reduction, Feature Engineering	Unverified
Machine Learning-Based Prediction of Mortality in Geriatric Traumatic Brain Injury Patients	May 19, 2025	Decision Making, Feature Engineering	Unverified
Lightweight Spatio-Temporal Attention Network with Graph Embedding and Rotational Position Encoding for Traffic Forecasting	May 17, 2025	Feature Engineering, Graph Embedding	Unverified
IISE PG&E Energy Analytics Challenge 2025: Hourly-Binned Regression Models Beat Transformers in Load Forecasting	May 16, 2025	Computational Efficiency, Deep Learning	Unverified
NeurIPS 2024 Ariel Data Challenge: Characterisation of Exoplanetary Atmospheres Using a Data-Centric Approach	May 13, 2025	Feature Engineering	Unverified
Machine Learning-Based Detection of DDoS Attacks in VANETs for Emergency Vehicle Communication	May 12, 2025	Feature Engineering, Feature Importance	Unverified
Benchmarking Graph Neural Networks for Document Layout Analysis in Public Affairs	May 12, 2025	Benchmarking, Document Layout Analysis	Unverified
QoSBERT: An Uncertainty-Aware Approach based on Pre-trained Language Models for Service Quality Prediction	May 9, 2025	Feature Engineering, Prediction	Unverified
Latte: Transfering LLMs` Latent-level Knowledge for Few-shot Tabular Learning	May 8, 2025	Feature Engineering, General Knowledge	Unverified
Rethinking Multimodal Sentiment Analysis: A High-Accuracy, Simplified Fusion Architecture	May 5, 2025	Emotion Classification, Feature Engineering	Unverified
Wide & Deep Learning for Node Classification	May 4, 2025	Classification, Deep Learning	Code Available
MPEC: Manifold-Preserved EEG Classification via an Ensemble of Clustering-Based Classifiers	Apr 30, 2025	Classification, Clustering	Unverified
LLMpatronous: Harnessing the Power of LLMs For Vulnerability Detection	Apr 25, 2025	Feature Engineering, RAG	Unverified
FLARE: Feature-based Lightweight Aggregation for Robust Evaluation of IoT Intrusion Detection	Apr 21, 2025	Feature Engineering, Intrusion Detection	Unverified
Word Embedding Techniques for Classification of Star Ratings	Apr 18, 2025	Classification, Dimensionality Reduction	Unverified
HMPE:HeatMap Embedding for Efficient Transformer-Based Small Object Detection	Apr 18, 2025	Decoder, Feature Engineering	Unverified
Morphing-based Compression for Data-centric ML Pipelines	Apr 15, 2025	Feature Engineering	Unverified
Beyond Glucose-Only Assessment: Advancing Nocturnal Hypoglycemia Prediction in Children with Type 1 Diabetes	Apr 12, 2025	Decision Making, Feature Engineering	Unverified
Bringing Structure to Naturalness: On the Naturalness of ASTs	Apr 11, 2025	Feature Engineering, Language Modelling	Unverified
Boosting Relational Deep Learning with Pretrained Tabular Models	Apr 7, 2025	Deep Learning, Feature Engineering	Code Available
Feature Engineering on LMS Data to Optimize Student Performance Prediction	Apr 3, 2025	Feature Engineering, Management	Unverified
Unleashing the Power of Pre-trained Encoders for Universal Adversarial Attack Detection	Apr 1, 2025	Adversarial Attack, Adversarial Attack Detection	Unverified
SPIO: Ensemble and Selective Strategies via LLM-Based Multi-Agent Planning in Automated Data Science	Mar 30, 2025	Feature Engineering	Unverified
FeRG-LLM : Feature Engineering by Reason Generation Large Language Models	Mar 30, 2025	Feature Engineering, Large Language Model	Unverified
Embedding Domain-Specific Knowledge from LLMs into the Feature Engineering Pipeline	Mar 27, 2025	Feature Engineering, feature selection	Unverified
RocketPPA: Code-Level Power, Performance, and Area Prediction via LLM and Mixture of Experts	Mar 27, 2025	Code Repair, Feature Engineering	Unverified
Feature-Enhanced Machine Learning for All-Cause Mortality Prediction in Healthcare Data	Mar 27, 2025	All, Feature Engineering	Unverified
Asset price movement prediction using empirical mode decomposition and Gaussian mixture models	Mar 26, 2025	Ensemble Learning, Feature Engineering	Unverified
Machine Learning - Driven Materials Discovery: Unlocking Next-Generation Functional Materials -- A minireview	Mar 22, 2025	AutoML, Bayesian Optimization	Unverified
CoDet-M4: Detecting Machine-Generated Code in Multi-Lingual, Multi-Generator and Multi-Domain Settings	Mar 17, 2025	Code Generation, Ethics	Unverified
Applications of Large Language Model Reasoning in Feature Generation	Mar 15, 2025	Computational Efficiency, Domain Adaptation	Unverified
Exploring LLM Agents for Cleaning Tabular Machine Learning Datasets	Mar 9, 2025	Data Integration, Feature Engineering	Unverified
VORTEX: Challenging CNNs at Texture Recognition by using Vision Transformers with Orderless and Randomized Token Encodings	Mar 9, 2025	Computational Efficiency, Feature Engineering	Code Available
Bridging the Semantic Gap in Virtual Machine Introspection and Forensic Memory Analysis	Mar 7, 2025	Feature Engineering	Unverified
YARE-GAN: Yet Another Resting State EEG-GAN	Mar 4, 2025	EEG, Feature Engineering	Code Available

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	CNN	14 gestures accuracy	0.98	—	Unverified