SOTAVerified

Feature Engineering

Feature engineering is the process of taking a dataset and constructing explanatory variables, called features, that can be used to train a machine learning model for a prediction problem. Often, data is spread across multiple tables and must be gathered into a single table with one row per observation and one column per feature.
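As a rough illustration of gathering several tables into one feature table, the sketch below aggregates a child table into its parent so that each observation ends up as a single row. The `customers` and `orders` tables, their column names, and the `build_feature_table` helper are all hypothetical and chosen only for this example; real projects usually do this step with a dataframe or SQL library.

```python
from collections import defaultdict

# Hypothetical parent table: one row per observation (a customer).
customers = [
    {"customer_id": 1, "country": "DE"},
    {"customer_id": 2, "country": "FR"},
    {"customer_id": 3, "country": "DE"},
]

# Hypothetical child table: zero or more rows per customer.
orders = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 30.0},
    {"customer_id": 2, "amount": 15.0},
]


def build_feature_table(customers, orders):
    """Return one row per customer, with aggregated order features as columns."""
    # Group order amounts by the key shared with the parent table.
    amounts = defaultdict(list)
    for order in orders:
        amounts[order["customer_id"]].append(order["amount"])

    table = []
    for customer in customers:
        spent = amounts.get(customer["customer_id"], [])
        table.append({
            "customer_id": customer["customer_id"],
            "country": customer["country"],
            "order_count": len(spent),
            "total_amount": sum(spent),
            # Customers without orders get 0.0 rather than a division error.
            "mean_amount": sum(spent) / len(spent) if spent else 0.0,
        })
    return table


feature_table = build_feature_table(customers, orders)
for row in feature_table:
    print(row)
```

Every customer appears exactly once in `feature_table`, including the one with no orders, which is the shape most model-training code expects.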

The traditional approach to feature engineering is to build features one at a time using domain knowledge, a tedious, time-consuming, and error-prone process known as manual feature engineering. The code for manual feature engineering is problem-dependent and must be re-written for each new dataset.
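A minimal sketch of what manual feature engineering can look like, assuming a hypothetical loan-application record. The field names (`monthly_income`, `monthly_debt`, `account_opened`, `applied_on`) and the three hand-picked features are illustrative assumptions, not taken from any dataset mentioned here. Each feature encodes one piece of domain knowledge, and the hard-coded field names show why such code has to be rewritten for a different dataset.

```python
from datetime import date


def loan_features(row, today):
    """Hand-written features for one hypothetical loan-application record."""
    income = row["monthly_income"]
    return {
        # Domain knowledge: debt relative to income matters more than raw debt.
        "debt_to_income": row["monthly_debt"] / income if income > 0 else 0.0,
        # Domain knowledge: older accounts tend to behave differently.
        "account_age_days": (today - row["account_opened"]).days,
        # Domain knowledge: weekend applications may follow a different pattern.
        # date.weekday() returns 5 for Saturday and 6 for Sunday.
        "is_weekend_application": row["applied_on"].weekday() >= 5,
    }


row = {
    "monthly_income": 4000.0,
    "monthly_debt": 1000.0,
    "account_opened": date(2020, 1, 1),
    "applied_on": date(2024, 3, 2),
}
features = loan_features(row, today=date(2024, 3, 2))
print(features)
```

None of this transfers to, say, a sensor or text dataset: the field names, the guard against zero income, and the choice of features would all need to be redone by hand.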

Papers

Showing 1-25 of 1706 papers

Title | Status | Hype
EvoGP: A GPU-accelerated Framework for Tree-based Genetic Programming | Code | 7
Fairness Implications of Encoding Protected Categorical Attributes | Code | 4
Baichuan 2: Open Large-scale Language Models | Code | 4
TabReD: Analyzing Pitfalls and Filling the Gaps in Tabular Deep Learning Benchmarks | Code | 4
AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions | Code | 3
NeuralFoil: An Airfoil Aerodynamics Analysis Tool Using Physics-Informed Machine Learning | Code | 3
Universal Time-Series Representation Learning: A Survey | Code | 3
RelBench: A Benchmark for Deep Learning on Relational Databases | Code | 3
How Can Recommender Systems Benefit from Large Language Models: A Survey | Code | 3
Deep Learning and LLM-based Methods Applied to Stellar Lightcurve Classification | Code | 3
The Tabular Foundation Model TabPFN Outperforms Specialized Time Series Forecasting Models Based on Simple Features | Code | 3
MiniDrive: More Efficient Vision-Language Models with Multi-Level 2D Features as Text Tokens for Autonomous Driving | Code | 2
TSFEL: Time Series Feature Extraction Library | Code | 2
OmniXAI: A Library for Explainable AI | Code | 2
LLM-FE: Automated Feature Engineering for Tabular Data with LLMs as Evolutionary Optimizers | Code | 2
Fraud Dataset Benchmark and Applications | Code | 2
DeepMol: An Automated Machine and Deep Learning Framework for Computational Chemistry | Code | 2
DreamSampler: Unifying Diffusion Sampling and Score Distillation for Image Manipulation | Code | 2
Can Models Help Us Create Better Models? Evaluating LLMs as Data Scientists | Code | 1
Evaluation Toolkit For Robustness Testing Of Automatic Essay Scoring Systems | Code | 1
Can Q-Learning with Graph Networks Learn a Generalizable Branching Heuristic for a SAT Solver? | Code | 1
Blending gradient boosted trees and neural networks for point and probabilistic forecasting of hierarchical time series | Code | 1
Benchmarks and Custom Package for Energy Forecasting | Code | 1
Benchmarking Skeleton-based Motion Encoder Models for Clinical Applications: Estimating Parkinson's Disease Severity in Walking Sequences | Code | 1
Binary Black-box Evasion Attacks Against Deep Learning-based Static Malware Detectors with Adversarial Byte-Level Language Model | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | CNN | 14 gestures accuracy | 0.98 | | Unverified