SOTAVerified

Feature Engineering

Feature engineering is the process of taking a dataset and constructing explanatory variables, called features, that can be used to train a machine learning model for a prediction problem. Often, data is spread across multiple tables and must be gathered into a single table whose rows are the observations and whose columns are the features.
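As a rough illustration of gathering several tables into one, the sketch below aggregates a child table into one row per observation. The tables, column names, and the `build_feature_table` helper are all hypothetical, and only the Python standard library is used to keep the example self-contained; a dataframe library would be the more usual choice in practice.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical parent table: one row per customer (the observations).
customers = [
    {"customer_id": 1, "joined_year": 2019},
    {"customer_id": 2, "joined_year": 2021},
    {"customer_id": 3, "joined_year": 2022},
]

# Hypothetical child table: zero or more transactions per customer.
transactions = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 40.0},
    {"customer_id": 2, "amount": 15.0},
]


def build_feature_table(customers, transactions):
    """Return one row per customer with aggregated transaction features."""
    # Group transaction amounts by the key shared with the parent table.
    amounts = defaultdict(list)
    for t in transactions:
        amounts[t["customer_id"]].append(t["amount"])

    rows = []
    for c in customers:
        spent = amounts.get(c["customer_id"], [])
        rows.append({
            "customer_id": c["customer_id"],
            "joined_year": c["joined_year"],
            "n_transactions": len(spent),
            "total_amount": sum(spent, 0.0),
            # Customers without transactions get 0.0 (an arbitrary choice).
            "mean_amount": mean(spent) if spent else 0.0,
        })
    return rows


feature_table = build_feature_table(customers, transactions)
for row in feature_table:
    print(row)
```

Each output row is one observation, and each key other than `customer_id` is a feature column. How to fill in customers with no transactions is a modeling decision; the zero used here is only a placeholder.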

The traditional approach to feature engineering is to build features one at a time using domain knowledge, a tedious, time-consuming, and error-prone process known as manual feature engineering. The code for manual feature engineering is problem-dependent and must be re-written for each new dataset.
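To make the manual approach concrete, here is a minimal sketch of hand-written features for an invented loan-application record. The field names and the three feature functions are assumptions chosen for illustration; each one encodes a piece of domain knowledge and would have to be rewritten for a dataset with different columns.

```python
from datetime import date

# Hypothetical raw record; the field names are illustrative only.
record = {
    "annual_income": 50000.0,
    "total_debt": 12500.0,
    "opened_on": date(2023, 1, 7),
}


def debt_to_income(rec):
    """Ratio of debt to income, or None when income is not positive."""
    if rec["annual_income"] <= 0:
        return None
    return rec["total_debt"] / rec["annual_income"]


def account_age_days(rec, today):
    """Number of days between account opening and a reference date."""
    return (today - rec["opened_on"]).days


def opened_on_weekend(rec):
    """True when the account was opened on a Saturday or Sunday."""
    return rec["opened_on"].weekday() >= 5


# Each feature is computed by its own hand-written function.
features = {
    "debt_to_income": debt_to_income(record),
    "account_age_days": account_age_days(record, date(2023, 1, 17)),
    "opened_on_weekend": opened_on_weekend(record),
}
print(features)
```

None of these functions transfers to a dataset without income, debt, or opening-date fields, which is the problem-dependence described above.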

Papers

Showing 1-50 of 1706 papers

Title | Status | Hype
EvoGP: A GPU-accelerated Framework for Tree-based Genetic Programming | Code | 7
TabReD: Analyzing Pitfalls and Filling the Gaps in Tabular Deep Learning Benchmarks | Code | 4
Baichuan 2: Open Large-scale Language Models | Code | 4
Fairness Implications of Encoding Protected Categorical Attributes | Code | 4
NeuralFoil: An Airfoil Aerodynamics Analysis Tool Using Physics-Informed Machine Learning | Code | 3
The Tabular Foundation Model TabPFN Outperforms Specialized Time Series Forecasting Models Based on Simple Features | Code | 3
AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions | Code | 3
RelBench: A Benchmark for Deep Learning on Relational Databases | Code | 3
Deep Learning and LLM-based Methods Applied to Stellar Lightcurve Classification | Code | 3
Universal Time-Series Representation Learning: A Survey | Code | 3
How Can Recommender Systems Benefit from Large Language Models: A Survey | Code | 3
LLM-FE: Automated Feature Engineering for Tabular Data with LLMs as Evolutionary Optimizers | Code | 2
MiniDrive: More Efficient Vision-Language Models with Multi-Level 2D Features as Text Tokens for Autonomous Driving | Code | 2
DeepMol: An Automated Machine and Deep Learning Framework for Computational Chemistry | Code | 2
DreamSampler: Unifying Diffusion Sampling and Score Distillation for Image Manipulation | Code | 2
Fraud Dataset Benchmark and Applications | Code | 2
OmniXAI: A Library for Explainable AI | Code | 2
TSFEL: Time Series Feature Extraction Library | Code | 2
Context-Aware Deep Learning for Multi Modal Depression Detection | Code | 1
Graph Neural Networks for Quantifying Compatibility Mechanisms in Traditional Chinese Medicine | Code | 1
Can Models Help Us Create Better Models? Evaluating LLMs as Data Scientists | Code | 1
LML-DAP: Language Model Learning a Dataset for Data-Augmented Prediction | Code | 1
Towards Autonomous Cybersecurity: An Intelligent AutoML Framework for Autonomous Intrusion Detection | Code | 1
Classification of Raw MEG/EEG Data with Detach-Rocket Ensemble: An Improved ROCKET Algorithm for Multivariate Time Series Analysis | Code | 1
A Data-Centric Perspective on Evaluating Machine Learning Models for Tabular Data | Code | 1
The Remarkable Robustness of LLMs: Stages of Inference? | Code | 1
Optimized Feature Generation for Tabular Data via LLMs with Decision Tree Reasoning | Code | 1
Network Analytics for Anti-Money Laundering -- A Systematic Literature Review and Experimental Evaluation | Code | 1
Benchmarking Skeleton-based Motion Encoder Models for Clinical Applications: Estimating Parkinson's Disease Severity in Walking Sequences | Code | 1
VCR-Graphormer: A Mini-batch Graph Transformer via Virtual Connections | Code | 1
Retrieve, Merge, Predict: Augmenting Tables with Data Lakes | Code | 1
SMUTF: Schema Matching Using Generative Tags and Hybrid Features | Code | 1
Dual Attention U-Net with Feature Infusion: Pushing the Boundaries of Multiclass Defect Segmentation | Code | 1
Relational Deep Learning: Graph Representation Learning on Relational Databases | Code | 1
netFound: Foundation Model for Network Security | Code | 1
Blending gradient boosted trees and neural networks for point and probabilistic forecasting of hierarchical time series | Code | 1
FASER: Binary Code Similarity Search through the use of Intermediate Representations | Code | 1
Fine-Tuning Self-Supervised Learning Models for End-to-End Pronunciation Scoring | Code | 1
SimTeG: A Frustratingly Simple Approach Improves Textual Graph Learning | Code | 1
Cognitive Evolutionary Search to Select Feature Interactions for Click-Through Rate Prediction | Code | 1
TimeTuner: Diagnosing Time Representations for Time-Series Forecasting with Counterfactual Explanations | Code | 1
Benchmarks and Custom Package for Energy Forecasting | Code | 1
Feature Programming for Multivariate Time Series Prediction | Code | 1
An End-to-End Reinforcement Learning Approach for Job-Shop Scheduling Problems Based on Constraint Programming | Code | 1
Large Language Models for Automated Data Science: Introducing CAAFE for Context-Aware Automated Feature Engineering | Code | 1
Deep Dive into Hunting for LotLs Using Machine Learning and Feature Engineering. | Code | 1
SkillGPT: a RESTful API service for skill extraction and standardization using a Large Language Model | Code | 1
Bayesian Optimization of Catalysis With In-Context Learning | Code | 1
DiverseVul: A New Vulnerable Source Code Dataset for Deep Learning Based Vulnerability Detection | Code | 1
DoE2Vec: Deep-learning Based Features for Exploratory Landscape Analysis | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | CNN | 14 gestures accuracy | 0.98 | | Unverified