SOTAVerified

Feature Engineering

Feature engineering is the process of taking a dataset and constructing explanatory variables, called features, that can be used to train a machine learning model for a prediction problem. Often, data is spread across multiple tables and must be gathered into a single table whose rows are the observations and whose columns are the features.

The traditional approach to feature engineering is to build features one at a time using domain knowledge, a tedious, time-consuming, and error-prone process known as manual feature engineering. The code for manual feature engineering is problem-dependent and must be re-written for each new dataset.
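As a rough illustration of the manual approach described above, the sketch below gathers two small tables into a single feature table with one row per observation. The table contents, the column names (`customer_id`, `signup_year`, `amount`), and the helper `build_feature_table` are all hypothetical and chosen only for this example; the hand-picked aggregates (count, sum, mean) are the kind of problem-specific choice that would have to be rewritten for a different dataset.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical parent table: one row per observation (customer).
customers = [
    {"customer_id": 1, "signup_year": 2019},
    {"customer_id": 2, "signup_year": 2021},
    {"customer_id": 3, "signup_year": 2022},
]

# Hypothetical child table: zero or more rows per customer.
transactions = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 35.0},
    {"customer_id": 2, "amount": 5.0},
]


def build_feature_table(customers, transactions):
    """Aggregate child rows into one feature row per customer."""
    # Group transaction amounts by the customer they belong to.
    amounts = defaultdict(list)
    for row in transactions:
        amounts[row["customer_id"]].append(row["amount"])

    table = []
    for customer in customers:
        spent = amounts.get(customer["customer_id"], [])
        table.append({
            "customer_id": customer["customer_id"],
            "signup_year": customer["signup_year"],
            # Hand-written features built from the child table.
            "n_transactions": len(spent),
            "total_amount": sum(spent),
            "mean_amount": mean(spent) if spent else 0.0,
        })
    return table


feature_table = build_feature_table(customers, transactions)
for row in feature_table:
    print(row)
```

Customers with no transactions still get a row, with zero-valued features, so the resulting table keeps exactly one row per observation.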

Papers

Showing 51–100 of 1706 papers

Title | Status | Hype
Blending gradient boosted trees and neural networks for point and probabilistic forecasting of hierarchical time series | Code | 1
Can Models Help Us Create Better Models? Evaluating LLMs as Data Scientists | Code | 1
Fatigue Assessment using ECG and Actigraphy Sensors | Code | 1
Evaluation Toolkit For Robustness Testing Of Automatic Essay Scoring Systems | Code | 1
CASPR: Customer Activity Sequence-based Prediction and Representation | Code | 1
Cardea: An Open Automated Machine Learning Framework for Electronic Health Records | Code | 1
Optimized Feature Generation for Tabular Data via LLMs with Decision Tree Reasoning | Code | 1
PEER: A Comprehensive and Multi-Task Benchmark for Protein Sequence Understanding | Code | 1
Generative Pre-Training from Molecules | Code | 1
Relational Deep Learning: Graph Representation Learning on Relational Databases | Code | 1
CheXbert: Combining Automatic Labelers and Expert Annotations for Accurate Radiology Report Labeling Using BERT | Code | 1
Replay and Synthetic Speech Detection with Res2net Architecture | Code | 1
Enabling Collaborative Data Science Development with the Ballet Framework | Code | 1
Dual Attention U-Net with Feature Infusion: Pushing the Boundaries of Multiclass Defect Segmentation | Code | 1
End-to-end Deep Learning from Raw Sensor Data: Atrial Fibrillation Detection using Wearables | Code | 1
DiviK: Divisive intelligent K-Means for hands-free unsupervised clustering in big biological data | Code | 1
DeltaPy: A Framework for Tabular Data Augmentation in Python | Code | 1
Dimensionality Reduction of Longitudinal 'Omics Data using Modern Tensor Factorization | Code | 1
DoE2Vec: Deep-learning Based Features for Exploratory Landscape Analysis | Code | 1
End-to-End Optimized Arrhythmia Detection Pipeline using Machine Learning for Ultra-Edge Devices | Code | 1
DeepFM: A Factorization-Machine based Neural Network for CTR Prediction | Code | 1
DIFER: Differentiable Automated Feature Engineering | Code | 1
Disentangled Attribution Curves for Interpreting Random Forests and Boosted Trees | Code | 1
DiverseVul: A New Vulnerable Source Code Dataset for Deep Learning Based Vulnerability Detection | Code | 1
Deep Dive into Hunting for LotLs Using Machine Learning and Feature Engineering. | Code | 1
DriveML: An R Package for Driverless Machine Learning | Code | 1
A Data-Centric Perspective on Evaluating Machine Learning Models for Tabular Data | Code | 1
Efficient End-to-End AutoML via Scalable Search Space Decomposition | Code | 1
An End-to-End Reinforcement Learning Approach for Job-Shop Scheduling Problems Based on Constraint Programming | Code | 1
Context-Aware Deep Learning for Multi Modal Depression Detection | Code | 1
Deep & Cross Network for Ad Click Predictions | Code | 1
A Survey of Information Cascade Analysis: Models, Predictions, and Recent Advances | Code | 1
CodeCMR: Cross-Modal Retrieval For Function-Level Binary Source Code Matching | Code | 1
Attention-Based Deep Learning Framework for Human Activity Recognition with User Adaptation | Code | 1
Fine-Tuning Self-Supervised Learning Models for End-to-End Pronunciation Scoring | Code | 1
fseval: A Benchmarking Framework for Feature Selection and Feature Ranking Algorithms | Code | 1
Clinical Temporal Relation Extraction with Probabilistic Soft Logic Regularization and Global Inference | Code | 1
AutoGL: A Library for Automated Graph Learning | Code | 1
Cognitive Evolutionary Search to Select Feature Interactions for Click-Through Rate Prediction | Code | 1
Understanding the Dynamics of DNNs Using Graph Modularity | Code | 1
AutoML: A Survey of the State-of-the-Art | Code | 1
Can Q-Learning with Graph Networks Learn a Generalizable Branching Heuristic for a SAT Solver? | Code | 1
Itsy Bitsy SpiderNet: Fully Connected Residual Network for Fraud Detection | Code | 1
A Hybrid Rule-Based and Neural Coreference Resolution System with an Evaluation on Dutch Literature | Code | 1
Anomaly Detection for Solder Joints Using β-VAE | Code | 1
Can Q-Learning with Graph Networks Learn a Generalizable Branching Heuristic for a SAT Solver? | Code | 1
Discovering Neural Wirings | Code | 1
Bayesian Optimization of Catalysis With In-Context Learning | Code | 1
AutoSmart: An Efficient and Automatic Machine Learning framework for Temporal Relational Data | Code | 1
Compatible deep neural network framework with financial time series data, including data preprocessor, neural network model and trading strategy | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | CNN | 14 gestures accuracy | 0.98 | | Unverified