SOTAVerified

Feature Engineering

Feature engineering is the process of taking a dataset and constructing explanatory variables (features) that can be used to train a machine learning model for a prediction problem. Often, data is spread across multiple tables and must be gathered into a single table with observations in the rows and features in the columns.

The traditional approach to feature engineering is to build features one at a time using domain knowledge. This tedious, time-consuming, and error-prone process is known as manual feature engineering. The code for manual feature engineering is problem-dependent and must be rewritten for each new dataset.
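As a rough illustration of the manual approach described above, the following sketch uses pandas to aggregate a child table and merge it into a single feature table with one row per observation. The tables, column names, and chosen features are all hypothetical and exist only for this example. They do not come from any paper listed on this page.

```python
import pandas as pd

# Hypothetical toy data: one row per customer, several rows per transaction.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "signup_date": pd.to_datetime(["2023-01-10", "2023-03-05", "2023-06-20"]),
})
transactions = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2],
    "amount": [20.0, 35.0, 5.0, 12.5, 7.5],
})

# Hand-built features: aggregate the transaction table per customer.
agg = (
    transactions.groupby("customer_id")["amount"]
    .agg(txn_count="count", txn_total="sum", txn_mean="mean")
    .reset_index()
)

# Gather everything into one table: rows are observations, columns are features.
features = customers.merge(agg, on="customer_id", how="left")

# Customers without transactions get NaN from the left join;
# a count and a total of zero are the natural values for them.
features[["txn_count", "txn_total"]] = features[["txn_count", "txn_total"]].fillna(0)

# Another feature chosen by hand from domain knowledge: the signup month.
features["signup_month"] = features["signup_date"].dt.month

print(features)
```

Every feature here is tied to this particular schema, which is why code of this kind usually has to be rewritten when the dataset changes.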

Papers

Showing 1-50 of 1706 papers

Title | Status | Hype
EvoGP: A GPU-accelerated Framework for Tree-based Genetic Programming | Code | 7
Baichuan 2: Open Large-scale Language Models | Code | 4
TabReD: Analyzing Pitfalls and Filling the Gaps in Tabular Deep Learning Benchmarks | Code | 4
Fairness Implications of Encoding Protected Categorical Attributes | Code | 4
Deep Learning and LLM-based Methods Applied to Stellar Lightcurve Classification | Code | 3
NeuralFoil: An Airfoil Aerodynamics Analysis Tool Using Physics-Informed Machine Learning | Code | 3
AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions | Code | 3
The Tabular Foundation Model TabPFN Outperforms Specialized Time Series Forecasting Models Based on Simple Features | Code | 3
RelBench: A Benchmark for Deep Learning on Relational Databases | Code | 3
How Can Recommender Systems Benefit from Large Language Models: A Survey | Code | 3
Universal Time-Series Representation Learning: A Survey | Code | 3
DeepMol: An Automated Machine and Deep Learning Framework for Computational Chemistry | Code | 2
DreamSampler: Unifying Diffusion Sampling and Score Distillation for Image Manipulation | Code | 2
TSFEL: Time Series Feature Extraction Library | Code | 2
LLM-FE: Automated Feature Engineering for Tabular Data with LLMs as Evolutionary Optimizers | Code | 2
Fraud Dataset Benchmark and Applications | Code | 2
OmniXAI: A Library for Explainable AI | Code | 2
MiniDrive: More Efficient Vision-Language Models with Multi-Level 2D Features as Text Tokens for Autonomous Driving | Code | 2
DriveML: An R Package for Driverless Machine Learning | Code | 1
DiviK: Divisive intelligent K-Means for hands-free unsupervised clustering in big biological data | Code | 1
Dual Attention U-Net with Feature Infusion: Pushing the Boundaries of Multiclass Defect Segmentation | Code | 1
Dimensionality Reduction of Longitudinal 'Omics Data using Modern Tensor Factorization | Code | 1
DeepFM: A Factorization-Machine based Neural Network for CTR Prediction | Code | 1
DeepSurv: Personalized Treatment Recommender System Using A Cox Proportional Hazards Deep Neural Network | Code | 1
DeltaPy: A Framework for Tabular Data Augmentation in Python | Code | 1
DIFER: Differentiable Automated Feature Engineering | Code | 1
A Survey of Information Cascade Analysis: Models, Predictions, and Recent Advances | Code | 1
DiverseVul: A New Vulnerable Source Code Dataset for Deep Learning Based Vulnerability Detection | Code | 1
DoE2Vec: Deep-learning Based Features for Exploratory Landscape Analysis | Code | 1
Disentangled Attribution Curves for Interpreting Random Forests and Boosted Trees | Code | 1
Efficient End-to-End AutoML via Scalable Search Space Decomposition | Code | 1
CodeCMR: Cross-Modal Retrieval For Function-Level Binary Source Code Matching | Code | 1
CheXbert: Combining Automatic Labelers and Expert Annotations for Accurate Radiology Report Labeling Using BERT | Code | 1
Cognitive Evolutionary Search to Select Feature Interactions for Click-Through Rate Prediction | Code | 1
Can Q-Learning with Graph Networks Learn a Generalizable Branching Heuristic for a SAT Solver? | Code | 1
Classification of Raw MEG/EEG Data with Detach-Rocket Ensemble: An Improved ROCKET Algorithm for Multivariate Time Series Analysis | Code | 1
Evaluation Toolkit For Robustness Testing Of Automatic Essay Scoring Systems | Code | 1
Cardea: An Open Automated Machine Learning Framework for Electronic Health Records | Code | 1
Compatible deep neural network framework with financial time series data, including data preprocessor, neural network model and trading strategy | Code | 1
BP-Net: Efficient Deep Learning for Continuous Arterial Blood Pressure Estimation using Photoplethysmogram | Code | 1
Can Models Help Us Create Better Models? Evaluating LLMs as Data Scientists | Code | 1
An End-to-End Reinforcement Learning Approach for Job-Shop Scheduling Problems Based on Constraint Programming | Code | 1
Anomaly Detection for Solder Joints Using β-VAE | Code | 1
CASPR: Customer Activity Sequence-based Prediction and Representation | Code | 1
Classification of Periodic Variable Stars with Novel Cyclic-Permutation Invariant Neural Networks | Code | 1
Clinical Temporal Relation Extraction with Probabilistic Soft Logic Regularization and Global Inference | Code | 1
Deep & Cross Network for Ad Click Predictions | Code | 1
Deep Dive into Hunting for LotLs Using Machine Learning and Feature Engineering. | Code | 1
A Data-Centric Perspective on Evaluating Machine Learning Models for Tabular Data | Code | 1
A Hybrid Rule-Based and Neural Coreference Resolution System with an Evaluation on Dutch Literature | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | CNN | 14 gestures accuracy | 0.98 | - | Unverified