
Feature Engineering

Feature engineering is the process of taking a dataset and constructing explanatory variables (features) that can be used to train a machine learning model for a prediction problem. Often the data is spread across multiple tables and must be gathered into a single table, with one observation per row and one feature per column.
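As a minimal sketch of gathering multi-table data into a single feature table, the Python below assumes a hypothetical `customers` parent table and a `transactions` child table (names, columns, and values are all invented for illustration). It collapses the one-to-many relationship by aggregating each customer's transaction amounts into count, total, and mean columns, so the result has one row per observation and one column per feature.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical parent table: one row per observation (customer).
customers = [
    {"customer_id": 1, "age": 34},
    {"customer_id": 2, "age": 51},
    {"customer_id": 3, "age": 27},
]

# Hypothetical child table: zero or more rows per customer.
transactions = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 35.0},
    {"customer_id": 2, "amount": 100.0},
]


def build_feature_table(parents, children, key="customer_id", value="amount"):
    """Collapse a one-to-many relationship into one row per parent.

    Every parent keeps its own columns and gains count/sum/mean
    aggregates of the child column `value`, so all observations share
    the same fixed set of feature columns.
    """
    # Group child values by the foreign key.
    grouped = defaultdict(list)
    for row in children:
        grouped[row[key]].append(row[value])

    table = []
    for parent in parents:
        values = grouped.get(parent[key], [])
        features = dict(parent)  # copy the parent's own columns
        features[f"n_{value}"] = len(values)
        features[f"total_{value}"] = float(sum(values))
        # Parents with no child rows get 0.0 instead of raising an error.
        features[f"mean_{value}"] = mean(values) if values else 0.0
        table.append(features)
    return table


if __name__ == "__main__":
    for row in build_feature_table(customers, transactions):
        print(row)
```

The aggregates chosen here (count, sum, mean) are only examples; any function that reduces a list of child values to a single number can fill the same role.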

The traditional approach to feature engineering is to build features one at a time using domain knowledge, a tedious, time-consuming, and error-prone process known as manual feature engineering. Manual feature-engineering code is problem-dependent and must be rewritten for each new dataset.
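To make the contrast concrete, here is a hedged sketch of what manual feature engineering can look like. Each feature is a hand-written function encoding one domain assumption about a hypothetical loan-application record; the field names `income`, `debt`, `loan_amount`, `application_date`, and `birth_date` are illustrative assumptions, not part of any real dataset.

```python
from datetime import date

# Each function below encodes one domain assumption about a hypothetical
# loan-application record; none of them transfers to a different schema.


def debt_to_income(app):
    """Existing debt relative to income; None when income is zero."""
    return app["debt"] / app["income"] if app["income"] else None


def loan_to_income(app):
    """Requested loan size relative to income; None when income is zero."""
    return app["loan_amount"] / app["income"] if app["income"] else None


def age_at_application(app):
    """Applicant's age in whole years on the application date."""
    d, b = app["application_date"], app["birth_date"]
    # Subtract one year if the birthday has not yet occurred that year.
    return d.year - b.year - int((d.month, d.day) < (b.month, b.day))


# The hand-picked feature set: adding a feature means writing a new function.
HAND_CRAFTED = {
    "debt_to_income": debt_to_income,
    "loan_to_income": loan_to_income,
    "age_at_application": age_at_application,
}


def engineer_features(app):
    """Apply every hand-crafted feature function to one record."""
    return {name: fn(app) for name, fn in HAND_CRAFTED.items()}


if __name__ == "__main__":
    application = {
        "income": 50000.0,
        "debt": 10000.0,
        "loan_amount": 25000.0,
        "application_date": date(2024, 3, 1),
        "birth_date": date(1990, 6, 15),
    }
    print(engineer_features(application))
```

Every entry in `HAND_CRAFTED` is tied to this one schema, so a dataset with different columns would need a fresh set of functions.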

Papers

Showing 1–50 of 1706 papers

Title | Status | Hype
--- | --- | ---
EvoGP: A GPU-accelerated Framework for Tree-based Genetic Programming | Code | 7
TabReD: Analyzing Pitfalls and Filling the Gaps in Tabular Deep Learning Benchmarks | Code | 4
Baichuan 2: Open Large-scale Language Models | Code | 4
Fairness Implications of Encoding Protected Categorical Attributes | Code | 4
Deep Learning and LLM-based Methods Applied to Stellar Lightcurve Classification | Code | 3
Universal Time-Series Representation Learning: A Survey | Code | 3
How Can Recommender Systems Benefit from Large Language Models: A Survey | Code | 3
The Tabular Foundation Model TabPFN Outperforms Specialized Time Series Forecasting Models Based on Simple Features | Code | 3
NeuralFoil: An Airfoil Aerodynamics Analysis Tool Using Physics-Informed Machine Learning | Code | 3
AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions | Code | 3
RelBench: A Benchmark for Deep Learning on Relational Databases | Code | 3
DreamSampler: Unifying Diffusion Sampling and Score Distillation for Image Manipulation | Code | 2
DeepMol: An Automated Machine and Deep Learning Framework for Computational Chemistry | Code | 2
TSFEL: Time Series Feature Extraction Library | Code | 2
LLM-FE: Automated Feature Engineering for Tabular Data with LLMs as Evolutionary Optimizers | Code | 2
MiniDrive: More Efficient Vision-Language Models with Multi-Level 2D Features as Text Tokens for Autonomous Driving | Code | 2
OmniXAI: A Library for Explainable AI | Code | 2
Fraud Dataset Benchmark and Applications | Code | 2
DriveML: An R Package for Driverless Machine Learning | Code | 1
DiviK: Divisive intelligent K-Means for hands-free unsupervised clustering in big biological data | Code | 1
Dual Attention U-Net with Feature Infusion: Pushing the Boundaries of Multiclass Defect Segmentation | Code | 1
DIFER: Differentiable Automated Feature Engineering | Code | 1
DeepFM: A Factorization-Machine based Neural Network for CTR Prediction | Code | 1
Context-Aware Deep Learning for Multi Modal Depression Detection | Code | 1
DeepSurv: Personalized Treatment Recommender System Using A Cox Proportional Hazards Deep Neural Network | Code | 1
DeltaPy: A Framework for Tabular Data Augmentation in Python | Code | 1
Disentangled Attribution Curves for Interpreting Random Forests and Boosted Trees | Code | 1
DiverseVul: A New Vulnerable Source Code Dataset for Deep Learning Based Vulnerability Detection | Code | 1
DoE2Vec: Deep-learning Based Features for Exploratory Landscape Analysis | Code | 1
Deep Dive into Hunting for LotLs Using Machine Learning and Feature Engineering. | Code | 1
Dimensionality Reduction of Longitudinal 'Omics Data using Modern Tensor Factorization | Code | 1
Efficient End-to-End AutoML via Scalable Search Space Decomposition | Code | 1
Clinical Temporal Relation Extraction with Probabilistic Soft Logic Regularization and Global Inference | Code | 1
CASPR: Customer Activity Sequence-based Prediction and Representation | Code | 1
CodeCMR: Cross-Modal Retrieval For Function-Level Binary Source Code Matching | Code | 1
Evaluation Toolkit For Robustness Testing Of Automatic Essay Scoring Systems | Code | 1
Benchmarks and Custom Package for Energy Forecasting | Code | 1
Classification of Raw MEG/EEG Data with Detach-Rocket Ensemble: An Improved ROCKET Algorithm for Multivariate Time Series Analysis | Code | 1
Binary Black-box Evasion Attacks Against Deep Learning-based Static Malware Detectors with Adversarial Byte-Level Language Model | Code | 1
Can Models Help Us Create Better Models? Evaluating LLMs as Data Scientists | Code | 1
Blending gradient boosted trees and neural networks for point and probabilistic forecasting of hierarchical time series | Code | 1
BP-Net: Efficient Deep Learning for Continuous Arterial Blood Pressure Estimation using Photoplethysmogram | Code | 1
Can Q-Learning with Graph Networks Learn a Generalizable Branching Heuristic for a SAT Solver? | Code | 1
Cardea: An Open Automated Machine Learning Framework for Electronic Health Records | Code | 1
CheXbert: Combining Automatic Labelers and Expert Annotations for Accurate Radiology Report Labeling Using BERT | Code | 1
Classification of Periodic Variable Stars with Novel Cyclic-Permutation Invariant Neural Networks | Code | 1
Cognitive Evolutionary Search to Select Feature Interactions for Click-Through Rate Prediction | Code | 1
Deep & Cross Network for Ad Click Predictions | Code | 1
A Data-Centric Perspective on Evaluating Machine Learning Models for Tabular Data | Code | 1
AutoSmart: An Efficient and Automatic Machine Learning framework for Temporal Relational Data | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
--- | --- | --- | --- | --- | ---
1 | CNN | 14 gestures accuracy | 0.98 | | Unverified