SOTAVerified

Feature Engineering

Feature engineering is the process of taking a dataset and constructing explanatory variables, called features, that can be used to train a machine learning model for a prediction problem. Often, data is spread across multiple tables and must be gathered into a single table with observations in the rows and features in the columns.

The traditional approach is to build features one at a time using domain knowledge, a tedious, time-consuming, and error-prone process known as manual feature engineering. The code for manual feature engineering is problem-dependent and must be rewritten for each new dataset.
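As a rough illustration of gathering data from several tables into a single feature table, the sketch below builds a few hand-written aggregates in plain Python. The `customers` and `orders` tables, their column names, and the helper `build_feature_table` are hypothetical and chosen only for this example; they do not come from any particular dataset or library.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical parent table: one row per customer (the observations).
customers = [
    {"customer_id": 1, "signup_year": 2019},
    {"customer_id": 2, "signup_year": 2021},
    {"customer_id": 3, "signup_year": 2022},
]

# Hypothetical child table: zero or more rows per customer.
orders = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 40.0},
    {"customer_id": 2, "amount": 15.0},
]


def build_feature_table(customers, orders):
    """Return one row per customer with hand-written order aggregates."""
    # Group order amounts by the customer they belong to.
    amounts = defaultdict(list)
    for order in orders:
        amounts[order["customer_id"]].append(order["amount"])

    rows = []
    for customer in customers:
        spent = amounts.get(customer["customer_id"], [])
        rows.append({
            "customer_id": customer["customer_id"],
            "signup_year": customer["signup_year"],
            # Each feature below is a manual, problem-specific choice.
            "order_count": len(spent),
            "total_spent": sum(spent),
            # Assumption: customers with no orders get a mean of 0.0.
            "mean_order_amount": mean(spent) if spent else 0.0,
        })
    return rows


feature_table = build_feature_table(customers, orders)
for row in feature_table:
    print(row)
```

Every aggregate here is tied to this particular schema, which is why code of this kind usually has to be rewritten when the dataset changes. Filling in 0.0 for customers without orders is only one possible choice; a missing-value marker may suit some models better.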

Papers

Showing 26-50 of 1706 papers

Title | Status | Hype
DeltaPy: A Framework for Tabular Data Augmentation in Python | Code | 1
Disentangled Attribution Curves for Interpreting Random Forests and Boosted Trees | Code | 1
DiverseVul: A New Vulnerable Source Code Dataset for Deep Learning Based Vulnerability Detection | Code | 1
DoE2Vec: Deep-learning Based Features for Exploratory Landscape Analysis | Code | 1
Deep Dive into Hunting for LotLs Using Machine Learning and Feature Engineering. | Code | 1
Dimensionality Reduction of Longitudinal 'Omics Data using Modern Tensor Factorization | Code | 1
Efficient End-to-End AutoML via Scalable Search Space Decomposition | Code | 1
Clinical Temporal Relation Extraction with Probabilistic Soft Logic Regularization and Global Inference | Code | 1
CASPR: Customer Activity Sequence-based Prediction and Representation | Code | 1
CodeCMR: Cross-Modal Retrieval For Function-Level Binary Source Code Matching | Code | 1
Evaluation Toolkit For Robustness Testing Of Automatic Essay Scoring Systems | Code | 1
Benchmarks and Custom Package for Energy Forecasting | Code | 1
Classification of Raw MEG/EEG Data with Detach-Rocket Ensemble: An Improved ROCKET Algorithm for Multivariate Time Series Analysis | Code | 1
Binary Black-box Evasion Attacks Against Deep Learning-based Static Malware Detectors with Adversarial Byte-Level Language Model | Code | 1
Can Models Help Us Create Better Models? Evaluating LLMs as Data Scientists | Code | 1
Blending gradient boosted trees and neural networks for point and probabilistic forecasting of hierarchical time series | Code | 1
BP-Net: Efficient Deep Learning for Continuous Arterial Blood Pressure Estimation using Photoplethysmogram | Code | 1
Can Q-Learning with Graph Networks Learn a Generalizable Branching Heuristic for a SAT Solver? | Code | 1
Cardea: An Open Automated Machine Learning Framework for Electronic Health Records | Code | 1
CheXbert: Combining Automatic Labelers and Expert Annotations for Accurate Radiology Report Labeling Using BERT | Code | 1
Classification of Periodic Variable Stars with Novel Cyclic-Permutation Invariant Neural Networks | Code | 1
Cognitive Evolutionary Search to Select Feature Interactions for Click-Through Rate Prediction | Code | 1
Deep & Cross Network for Ad Click Predictions | Code | 1
A Data-Centric Perspective on Evaluating Machine Learning Models for Tabular Data | Code | 1
AutoSmart: An Efficient and Automatic Machine Learning framework for Temporal Relational Data | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | CNN | 14 gestures accuracy | 0.98 |  | Unverified