SOTAVerified

Feature Engineering

Feature engineering is the process of taking a dataset and constructing explanatory variables, called features, that can be used to train a machine learning model for a prediction problem. Often, data is spread across multiple tables and must be gathered into a single table, with observations in the rows and features in the columns.

The traditional approach to feature engineering is to build features one at a time using domain knowledge, a tedious, time-consuming, and error-prone process known as manual feature engineering. The code for manual feature engineering is problem-dependent and must be rewritten for each new dataset.
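As a minimal sketch of what gathering several tables into one feature table can look like when done by hand, the Python below aggregates a child table into one row per observation. The `customers` and `orders` tables, their column names, and the `build_feature_table` helper are all hypothetical, invented only for illustration; they do not come from any paper listed on this page.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical parent table: one row per customer (the observations).
customers = [
    {"customer_id": 1, "signup_year": 2021},
    {"customer_id": 2, "signup_year": 2023},
    {"customer_id": 3, "signup_year": 2022},
]

# Hypothetical child table: zero or more rows per customer.
orders = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 40.0},
    {"customer_id": 2, "amount": 15.0},
]


def build_feature_table(customers, orders):
    """Return one row per customer with hand-written aggregate features."""
    # Group order amounts by the key that links the two tables.
    amounts_by_customer = defaultdict(list)
    for order in orders:
        amounts_by_customer[order["customer_id"]].append(order["amount"])

    rows = []
    for customer in customers:
        amounts = amounts_by_customer.get(customer["customer_id"], [])
        rows.append({
            "customer_id": customer["customer_id"],
            "signup_year": customer["signup_year"],
            "order_count": len(amounts),
            "order_total": sum(amounts),
            # Customers with no orders get 0.0 here; this is an arbitrary
            # choice for the sketch, not a recommended imputation strategy.
            "order_mean": mean(amounts) if amounts else 0.0,
        })
    return rows


feature_table = build_feature_table(customers, orders)
for row in feature_table:
    print(row)
```

Every feature in this sketch is tied to the hypothetical customers/orders schema, which is the sense in which manually written feature code is problem-dependent: a different dataset would need different keys, aggregations, and missing-value choices.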

Papers

Showing 1-25 of 1706 papers

| Title | Status | Hype |
|---|---|---|
| EvoGP: A GPU-accelerated Framework for Tree-based Genetic Programming | Code | 7 |
| TabReD: Analyzing Pitfalls and Filling the Gaps in Tabular Deep Learning Benchmarks | Code | 4 |
| Baichuan 2: Open Large-scale Language Models | Code | 4 |
| Fairness Implications of Encoding Protected Categorical Attributes | Code | 4 |
| NeuralFoil: An Airfoil Aerodynamics Analysis Tool Using Physics-Informed Machine Learning | Code | 3 |
| The Tabular Foundation Model TabPFN Outperforms Specialized Time Series Forecasting Models Based on Simple Features | Code | 3 |
| AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions | Code | 3 |
| RelBench: A Benchmark for Deep Learning on Relational Databases | Code | 3 |
| Deep Learning and LLM-based Methods Applied to Stellar Lightcurve Classification | Code | 3 |
| Universal Time-Series Representation Learning: A Survey | Code | 3 |
| How Can Recommender Systems Benefit from Large Language Models: A Survey | Code | 3 |
| LLM-FE: Automated Feature Engineering for Tabular Data with LLMs as Evolutionary Optimizers | Code | 2 |
| MiniDrive: More Efficient Vision-Language Models with Multi-Level 2D Features as Text Tokens for Autonomous Driving | Code | 2 |
| DeepMol: An Automated Machine and Deep Learning Framework for Computational Chemistry | Code | 2 |
| DreamSampler: Unifying Diffusion Sampling and Score Distillation for Image Manipulation | Code | 2 |
| Fraud Dataset Benchmark and Applications | Code | 2 |
| OmniXAI: A Library for Explainable AI | Code | 2 |
| TSFEL: Time Series Feature Extraction Library | Code | 2 |
| Context-Aware Deep Learning for Multi Modal Depression Detection | Code | 1 |
| Graph Neural Networks for Quantifying Compatibility Mechanisms in Traditional Chinese Medicine | Code | 1 |
| Can Models Help Us Create Better Models? Evaluating LLMs as Data Scientists | Code | 1 |
| LML-DAP: Language Model Learning a Dataset for Data-Augmented Prediction | Code | 1 |
| Towards Autonomous Cybersecurity: An Intelligent AutoML Framework for Autonomous Intrusion Detection | Code | 1 |
| Classification of Raw MEG/EEG Data with Detach-Rocket Ensemble: An Improved ROCKET Algorithm for Multivariate Time Series Analysis | Code | 1 |
| A Data-Centric Perspective on Evaluating Machine Learning Models for Tabular Data | Code | 1 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | CNN | 14 gestures accuracy | 0.98 | | Unverified |