Feature Engineering

Feature engineering is the process of taking a dataset and constructing explanatory variables (features) that can be used to train a machine learning model for a prediction problem. Often the data is spread across multiple tables and must be gathered into a single table, with one row per observation and one column per feature.
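
Here is a minimal sketch of that gathering step in Python. It assumes a hypothetical parent table of clients and a child table of loans; the table names, column names, and values are illustrative and do not come from any dataset on this page. The child rows are grouped by their foreign key and aggregated, so each client ends up as a single row of features.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical example data: one parent table (clients) and one child table (loans).
clients = [
    {"client_id": 1, "joined_year": 2015},
    {"client_id": 2, "joined_year": 2019},
]
loans = [
    {"client_id": 1, "amount": 1000.0},
    {"client_id": 1, "amount": 3000.0},
    {"client_id": 2, "amount": 500.0},
]


def build_feature_table(clients, loans):
    """Flatten a one-to-many relationship into one feature row per client."""
    # Group the child rows by the foreign key (client_id).
    amounts_by_client = defaultdict(list)
    for loan in loans:
        amounts_by_client[loan["client_id"]].append(loan["amount"])

    rows = []
    for client in clients:
        # Clients with no loans get an empty list (and zero-valued aggregates).
        amounts = amounts_by_client.get(client["client_id"], [])
        rows.append({
            "client_id": client["client_id"],
            "joined_year": client["joined_year"],
            # Each aggregation summarizes many child rows as one column value.
            "loan_count": len(amounts),
            "loan_total": sum(amounts),
            "loan_mean": mean(amounts) if amounts else 0.0,
        })
    return rows


for row in build_feature_table(clients, loans):
    print(row)
```

The count, sum, and mean aggregations are arbitrary choices; any summary of the child rows plugs in the same way. In practice a library such as pandas would usually do the grouping and joining with `groupby` and `merge`.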

The traditional approach to feature engineering is to build features one at a time using domain knowledge, a tedious, time-consuming, and error-prone process known as manual feature engineering. The code for manual feature engineering is problem-dependent and must be rewritten for each new dataset.
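
To show why manual features are problem-dependent, here is a hypothetical sketch for a single loan-application record. The field names `loan_amount`, `term_months`, and `annual_income`, and the ratios computed from them, are assumptions for illustration only. Each feature hard-codes both a column name and a piece of domain reasoning.

```python
# Hypothetical loan-application record; field names and values are assumptions.
application = {"loan_amount": 12000.0, "term_months": 24, "annual_income": 48000.0}


def manual_features(app):
    """Hand-written features that encode domain knowledge about loans.

    Every line depends on this dataset's column names and meaning, so the
    function has to be rewritten for a different dataset or problem.
    """
    # Simplified monthly payment that ignores interest.
    monthly_payment = app["loan_amount"] / app["term_months"]
    monthly_income = app["annual_income"] / 12
    return {
        "monthly_payment": monthly_payment,
        # Share of monthly income taken up by the simplified payment.
        "payment_to_income": monthly_payment / monthly_income,
        "loan_to_income": app["loan_amount"] / app["annual_income"],
    }


print(manual_features(application))
```

To reuse this function on a dataset with a different schema or prediction target, you would have to rewrite it line by line. That rewriting is the cost described in the paragraph above.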

Papers

Showing 1–50 of 1706 papers

Title	Date	Tasks	Status	Hype
EvoGP: A GPU-accelerated Framework for Tree-based Genetic Programming	Jan 21, 2025	Feature Engineering, GPU	Code Available	7
TabReD: Analyzing Pitfalls and Filling the Gaps in Tabular Deep Learning Benchmarks	Jun 27, 2024	Feature Engineering, Model Selection	Code Available	4
Baichuan 2: Open Large-scale Language Models	Sep 19, 2023	Feature Engineering, GSM8K	Code Available	4
Fairness Implications of Encoding Protected Categorical Attributes	Jan 27, 2022	Fairness, Feature Engineering	Code Available	4
NeuralFoil: An Airfoil Aerodynamics Analysis Tool Using Physics-Informed Machine Learning	Mar 20, 2025	Feature Engineering, Physics-informed machine learning	Code Available	3
The Tabular Foundation Model TabPFN Outperforms Specialized Time Series Forecasting Models Based on Simple Features	Jan 6, 2025	Feature Engineering, Time Series	Code Available	3
AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions	Oct 27, 2024	Feature Engineering	Code Available	3
RelBench: A Benchmark for Deep Learning on Relational Databases	Jul 29, 2024	Deep Learning, Feature Engineering	Code Available	3
Deep Learning and LLM-based Methods Applied to Stellar Lightcurve Classification	Apr 16, 2024	Feature Engineering, Language Modeling	Code Available	3
Universal Time-Series Representation Learning: A Survey	Jan 8, 2024	Feature Engineering, Representation Learning	Code Available	3
How Can Recommender Systems Benefit from Large Language Models: A Survey	Jun 9, 2023	Ethics, Feature Engineering	Code Available	3
LLM-FE: Automated Feature Engineering for Tabular Data with LLMs as Evolutionary Optimizers	Mar 18, 2025	Automated Feature Engineering, Feature Engineering	Code Available	2
MiniDrive: More Efficient Vision-Language Models with Multi-Level 2D Features as Text Tokens for Autonomous Driving	Sep 11, 2024	Autonomous Driving, Feature Engineering	Code Available	2
DeepMol: An Automated Machine and Deep Learning Framework for Computational Chemistry	Jun 1, 2024	Activity Prediction, AutoML	Code Available	2
DreamSampler: Unifying Diffusion Sampling and Score Distillation for Image Manipulation	Mar 18, 2024	Feature Engineering, Image Manipulation	Code Available	2
Fraud Dataset Benchmark and Applications	Aug 30, 2022	AutoML, Feature Engineering	Code Available	2
OmniXAI: A Library for Explainable AI	Jun 1, 2022	counterfactual, Counterfactual Explanation	Code Available	2
TSFEL: Time Series Feature Extraction Library	Mar 21, 2020	Feature Engineering, Time Series	Code Available	2
Context-Aware Deep Learning for Multi Modal Depression Detection	Dec 26, 2024	Data Augmentation, Deep Learning	Code Available	1
Graph Neural Networks for Quantifying Compatibility Mechanisms in Traditional Chinese Medicine	Nov 18, 2024	Drug Discovery, Feature Engineering	Code Available	1
Can Models Help Us Create Better Models? Evaluating LLMs as Data Scientists	Oct 30, 2024	Feature Engineering	Code Available	1
LML-DAP: Language Model Learning a Dataset for Data-Augmented Prediction	Sep 27, 2024	Classification, Feature Engineering	Code Available	1
Towards Autonomous Cybersecurity: An Intelligent AutoML Framework for Autonomous Intrusion Detection	Sep 5, 2024	AutoML, Bayesian Optimization	Code Available	1
Classification of Raw MEG/EEG Data with Detach-Rocket Ensemble: An Improved ROCKET Algorithm for Multivariate Time Series Analysis	Aug 5, 2024	Classification, EEG	Code Available	1
A Data-Centric Perspective on Evaluating Machine Learning Models for Tabular Data	Jul 2, 2024	Feature Engineering, Hyperparameter Optimization	Code Available	1
The Remarkable Robustness of LLMs: Stages of Inference?	Jun 27, 2024	Feature Engineering, Prediction	Code Available	1
Optimized Feature Generation for Tabular Data via LLMs with Decision Tree Reasoning	Jun 12, 2024	Automated Feature Engineering, Feature Engineering	Code Available	1
Network Analytics for Anti-Money Laundering -- A Systematic Literature Review and Experimental Evaluation	May 29, 2024	Feature Engineering, Fraud Detection	Code Available	1
Benchmarking Skeleton-based Motion Encoder Models for Clinical Applications: Estimating Parkinson's Disease Severity in Walking Sequences	May 28, 2024	Benchmarking, Feature Engineering	Code Available	1
VCR-Graphormer: A Mini-batch Graph Transformer via Virtual Connections	Mar 24, 2024	Feature Engineering, Graph Learning	Code Available	1
Retrieve, Merge, Predict: Augmenting Tables with Data Lakes	Feb 9, 2024	AutoML, Benchmarking	Code Available	1
SMUTF: Schema Matching Using Generative Tags and Hybrid Features	Jan 22, 2024	Feature Engineering, Humanitarian	Code Available	1
Dual Attention U-Net with Feature Infusion: Pushing the Boundaries of Multiclass Defect Segmentation	Dec 21, 2023	Edge Detection, Feature Engineering	Code Available	1
Relational Deep Learning: Graph Representation Learning on Relational Databases	Dec 7, 2023	Deep Learning, Feature Engineering	Code Available	1
netFound: Foundation Model for Network Security	Oct 25, 2023	Feature Engineering, feature selection	Code Available	1
Blending gradient boosted trees and neural networks for point and probabilistic forecasting of hierarchical time series	Oct 19, 2023	Diversity, Feature Engineering	Code Available	1
FASER: Binary Code Similarity Search through the use of Intermediate Representations	Oct 5, 2023	Feature Engineering	Code Available	1
Fine-Tuning Self-Supervised Learning Models for End-to-End Pronunciation Scoring	Sep 19, 2023	Feature Engineering, Phone-level pronunciation scoring	Code Available	1
SimTeG: A Frustratingly Simple Approach Improves Textual Graph Learning	Aug 3, 2023	Feature Engineering, Graph Learning	Code Available	1
Cognitive Evolutionary Search to Select Feature Interactions for Click-Through Rate Prediction	Aug 1, 2023	Click-Through Rate Prediction, Evolutionary Algorithms	Code Available	1
TimeTuner: Diagnosing Time Representations for Time-Series Forecasting with Counterfactual Explanations	Jul 19, 2023	counterfactual, Feature Engineering	Code Available	1
Benchmarks and Custom Package for Energy Forecasting	Jul 14, 2023	Feature Engineering, Load Forecasting	Code Available	1
Feature Programming for Multivariate Time Series Prediction	Jun 9, 2023	Automated Feature Engineering, Feature Engineering	Code Available	1
An End-to-End Reinforcement Learning Approach for Job-Shop Scheduling Problems Based on Constraint Programming	Jun 9, 2023	Combinatorial Optimization, Feature Engineering	Code Available	1
Large Language Models for Automated Data Science: Introducing CAAFE for Context-Aware Automated Feature Engineering	May 5, 2023	Automated Feature Engineering, AutoML	Code Available	1
Deep Dive into Hunting for LotLs Using Machine Learning and Feature Engineering	Apr 21, 2023	Feature Engineering	Code Available	1
SkillGPT: a RESTful API service for skill extraction and standardization using a Large Language Model	Apr 17, 2023	Feature Engineering, Language Modeling	Code Available	1
Bayesian Optimization of Catalysis With In-Context Learning	Apr 11, 2023	Bayesian Optimization, Feature Engineering	Code Available	1
DiverseVul: A New Vulnerable Source Code Dataset for Deep Learning Based Vulnerability Detection	Apr 1, 2023	Deep Learning, Feature Engineering	Code Available	1
DoE2Vec: Deep-learning Based Features for Exploratory Landscape Analysis	Mar 31, 2023	Deep Learning, Feature Engineering	Code Available	1

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	CNN	14 gestures accuracy	0.98	—	Unverified