Feature Engineering

Feature engineering is the process of taking a dataset and constructing explanatory variables, called features, that can be used to train a machine learning model for a prediction problem. Often the data is spread across multiple tables and must be gathered into a single table, with one observation per row and one feature per column.
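As a minimal sketch of gathering several tables into one, assume a hypothetical `customers` table (one row per observation) and a hypothetical `transactions` table linked to it by `customer_id`. The table names, column names, and values are invented for illustration. The child table is aggregated to one row per customer and joined back with pandas:

```python
import pandas as pd

# Hypothetical parent table: one row per observation (customer).
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "signup_year": [2019, 2020, 2021],
})

# Hypothetical child table: many rows per customer.
transactions = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2],
    "amount": [10.0, 30.0, 5.0, 15.0, 25.0],
})

# Aggregate the child table down to one row per customer.
txn_features = (
    transactions.groupby("customer_id")["amount"]
    .agg(txn_count="count", txn_total="sum", txn_mean="mean")
    .reset_index()
)

# Left join so that customers without transactions are kept.
feature_table = customers.merge(txn_features, on="customer_id", how="left")

# No transactions means a count and total of zero; the mean stays missing.
count_total = ["txn_count", "txn_total"]
feature_table[count_total] = feature_table[count_total].fillna(0)

print(feature_table)
```

The left join is a deliberate choice here: an inner join would silently drop observations that have no rows in the child table.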

The traditional approach, known as manual feature engineering, is to build features one at a time using domain knowledge, which is tedious, time-consuming, and error-prone. Because the code is problem-dependent, it must be rewritten for each new dataset.
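To illustrate what building features one at a time can look like, here is a small sketch on a hypothetical `loans` table. The columns, values, and the choice of features are assumptions made for the example and are not taken from any particular dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical raw data; column names and values are invented.
loans = pd.DataFrame({
    "loan_amount": [5000.0, 12000.0, 800.0],
    "annual_income": [40000.0, 60000.0, 16000.0],
    "issued": pd.to_datetime(["2021-01-15", "2021-06-30", "2021-12-01"]),
})

features = pd.DataFrame(index=loans.index)

# Each feature is a separate, hand-written decision based on domain knowledge.
features["loan_to_income"] = loans["loan_amount"] / loans["annual_income"]
features["log_loan_amount"] = np.log1p(loans["loan_amount"])  # compress a skewed scale
features["issued_month"] = loans["issued"].dt.month           # possible seasonality

print(features)
```

Every line refers to specific column names and encodes a specific assumption about the problem, which is why code of this kind rarely carries over to a new dataset unchanged.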

Papers

Showing 1–50 of 1706 papers

Title	Date	Tasks	Status	Hype	Score
EvoGP: A GPU-accelerated Framework for Tree-based Genetic Programming	Jan 21, 2025	Feature Engineering, GPU	Code Available	7	5
Baichuan 2: Open Large-scale Language Models	Sep 19, 2023	Feature Engineering, GSM8K	Code Available	4	5
TabReD: Analyzing Pitfalls and Filling the Gaps in Tabular Deep Learning Benchmarks	Jun 27, 2024	Feature Engineering, Model Selection	Code Available	4	5
Fairness Implications of Encoding Protected Categorical Attributes	Jan 27, 2022	Fairness, Feature Engineering	Code Available	4	5
Deep Learning and LLM-based Methods Applied to Stellar Lightcurve Classification	Apr 16, 2024	Feature Engineering, Language Modeling	Code Available	3	5
NeuralFoil: An Airfoil Aerodynamics Analysis Tool Using Physics-Informed Machine Learning	Mar 20, 2025	Feature Engineering, Physics-informed machine learning	Code Available	3	5
AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions	Oct 27, 2024	Feature Engineering	Code Available	3	5
The Tabular Foundation Model TabPFN Outperforms Specialized Time Series Forecasting Models Based on Simple Features	Jan 6, 2025	Feature Engineering, Time Series	Code Available	3	5
RelBench: A Benchmark for Deep Learning on Relational Databases	Jul 29, 2024	Deep Learning, Feature Engineering	Code Available	3	5
How Can Recommender Systems Benefit from Large Language Models: A Survey	Jun 9, 2023	Ethics, Feature Engineering	Code Available	3	5
Universal Time-Series Representation Learning: A Survey	Jan 8, 2024	Feature Engineering, Representation Learning	Code Available	3	5
DeepMol: An Automated Machine and Deep Learning Framework for Computational Chemistr	Jun 1, 2024	Activity Prediction, AutoML	Code Available	2	5
DreamSampler: Unifying Diffusion Sampling and Score Distillation for Image Manipulation	Mar 18, 2024	Feature Engineering, Image Manipulation	Code Available	2	5
TSFEL: Time Series Feature Extraction Library	Mar 21, 2020	Feature Engineering, Time Series	Code Available	2	5
LLM-FE: Automated Feature Engineering for Tabular Data with LLMs as Evolutionary Optimizers	Mar 18, 2025	Automated Feature Engineering, Feature Engineering	Code Available	2	5
Fraud Dataset Benchmark and Applications	Aug 30, 2022	AutoML, Feature Engineering	Code Available	2	5
OmniXAI: A Library for Explainable AI	Jun 1, 2022	counterfactual, Counterfactual Explanation	Code Available	2	5
MiniDrive: More Efficient Vision-Language Models with Multi-Level 2D Features as Text Tokens for Autonomous Driving	Sep 11, 2024	Autonomous Driving, Feature Engineering	Code Available	2	5
DriveML: An R Package for Driverless Machine Learning	May 1, 2020	AutoML, BIG-bench Machine Learning	Code Available	1	5
DiviK: Divisive intelligent K-Means for hands-free unsupervised clustering in big biological data	Sep 22, 2020	Clustering, Feature Engineering	Code Available	1	5
Dual Attention U-Net with Feature Infusion: Pushing the Boundaries of Multiclass Defect Segmentation	Dec 21, 2023	Edge Detection, Feature Engineering	Code Available	1	5
Dimensionality Reduction of Longitudinal 'Omics Data using Modern Tensor Factorization	Nov 28, 2021	Dimensionality Reduction, Feature Engineering	Code Available	1	5
DeepFM: A Factorization-Machine based Neural Network for CTR Prediction	Mar 13, 2017	Click-Through Rate Prediction, Feature Engineering	Code Available	1	5
DeepSurv: Personalized Treatment Recommender System Using A Cox Proportional Hazards Deep Neural Network	Jun 2, 2016	Feature Engineering, Predicting Patient Outcomes	Code Available	1	5
DeltaPy: A Framework for Tabular Data Augmentation in Python	May 22, 2020	BIG-bench Machine Learning, Data Augmentation	Code Available	1	5
DIFER: Differentiable Automated Feature Engineering	Oct 17, 2020	Automated Feature Engineering, BIG-bench Machine Learning	Code Available	1	5
A Survey of Information Cascade Analysis: Models, Predictions, and Recent Advances	May 22, 2020	Feature Engineering, Marketing	Code Available	1	5
DiverseVul: A New Vulnerable Source Code Dataset for Deep Learning Based Vulnerability Detection	Apr 1, 2023	Deep Learning, Feature Engineering	Code Available	1	5
DoE2Vec: Deep-learning Based Features for Exploratory Landscape Analysis	Mar 31, 2023	Deep Learning, Feature Engineering	Code Available	1	5
Disentangled Attribution Curves for Interpreting Random Forests and Boosted Trees	May 18, 2019	Feature Engineering, Feature Importance	Code Available	1	5
Efficient End-to-End AutoML via Scalable Search Space Decomposition	Jun 19, 2022	AutoML, Feature Engineering	Code Available	1	5
CodeCMR: Cross-Modal Retrieval For Function-Level Binary Source Code Matching	Dec 1, 2020	Computer Security, Cross-Modal Retrieval	Code Available	1	5
CheXbert: Combining Automatic Labelers and Expert Annotations for Accurate Radiology Report Labeling Using BERT	Apr 20, 2020	Feature Engineering	Code Available	1	5
Cognitive Evolutionary Search to Select Feature Interactions for Click-Through Rate Prediction	Aug 1, 2023	Click-Through Rate Prediction, Evolutionary Algorithms	Code Available	1	5
Can Q-Learning with Graph Networks Learn a Generalizable Branching Heuristic for a SAT Solver?	Dec 1, 2020	Feature Engineering, Q-Learning	Code Available	1	5
Classification of Raw MEG/EEG Data with Detach-Rocket Ensemble: An Improved ROCKET Algorithm for Multivariate Time Series Analysis	Aug 5, 2024	Classification, EEG	Code Available	1	5
Evaluation Toolkit For Robustness Testing Of Automatic Essay Scoring Systems	Jul 14, 2020	Automated Essay Scoring, Common Sense Reasoning	Code Available	1	5
Cardea: An Open Automated Machine Learning Framework for Electronic Health Records	Oct 1, 2020	Automated Feature Engineering, AutoML	Code Available	1	5
Compatible deep neural network framework with financial time series data, including data preprocessor, neural network model and trading strategy	May 11, 2022	Binary Classification, Feature Engineering	Code Available	1	5
BP-Net: Efficient Deep Learning for Continuous Arterial Blood Pressure Estimation using Photoplethysmogram	Nov 29, 2021	Blood pressure estimation, Feature Engineering	Code Available	1	5
Can Models Help Us Create Better Models? Evaluating LLMs as Data Scientists	Oct 30, 2024	Feature Engineering	Code Available	1	5
An End-to-End Reinforcement Learning Approach for Job-Shop Scheduling Problems Based on Constraint Programming	Jun 9, 2023	Combinatorial Optimization, Feature Engineering	Code Available	1	5
Anomaly Detection for Solder Joints Using β-VAE	Apr 24, 2021	Anomaly Detection, Feature Engineering	Code Available	1	5
CASPR: Customer Activity Sequence-based Prediction and Representation	Nov 16, 2022	Feature Engineering, Prediction	Code Available	1	5
Classification of Periodic Variable Stars with Novel Cyclic-Permutation Invariant Neural Networks	Nov 2, 2020	Astronomy, Feature Engineering	Code Available	1	5
Clinical Temporal Relation Extraction with Probabilistic Soft Logic Regularization and Global Inference	Dec 16, 2020	Feature Engineering, Medical Question Answering	Code Available	1	5
Deep & Cross Network for Ad Click Predictions	Aug 17, 2017	Click-Through Rate Prediction, Feature Engineering	Code Available	1	5
Deep Dive into Hunting for LotLs Using Machine Learning and Feature Engineering.	Apr 21, 2023	Feature Engineering	Code Available	1	5
A Data-Centric Perspective on Evaluating Machine Learning Models for Tabular Data	Jul 2, 2024	Feature Engineering, Hyperparameter Optimization	Code Available	1	5
A Hybrid Rule-Based and Neural Coreference Resolution System with an Evaluation on Dutch Literature	Nov 1, 2021	coreference-resolution, Coreference Resolution	Code Available	1	5

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	CNN	14 gestures accuracy	0.98	—	Unverified