
Feature Engineering

Feature engineering is the process of taking a dataset and constructing explanatory variables, called features, that can be used to train a machine learning model for a prediction problem. Often the data is spread across multiple tables and must be gathered into a single table, with one row per observation and one column per feature.
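As a minimal sketch of gathering a multi-table dataset into a single feature table, the Python below aggregates a hypothetical child table of transactions onto a hypothetical parent table of customers. The table contents, the `customer_id` and `amount` columns, the helper name `build_feature_table`, and the choice of count, sum, mean, and max as aggregates are all assumptions for illustration, not a method from any of the papers listed here.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical parent table: one row per observation (customer).
customers = [
    {"customer_id": 1, "age": 34},
    {"customer_id": 2, "age": 51},
    {"customer_id": 3, "age": 27},
]

# Hypothetical child table: many rows per customer, linked by "customer_id".
transactions = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 35.0},
    {"customer_id": 2, "amount": 100.0},
    {"customer_id": 1, "amount": 5.0},
]


def build_feature_table(parents, children, key, value):
    """Aggregate child rows per parent key and attach the results to each parent.

    Returns one row per parent observation; each aggregate becomes a new column.
    """
    # Group the child values by the key that links the two tables.
    groups = defaultdict(list)
    for row in children:
        groups[row[key]].append(row[value])

    table = []
    for parent in parents:
        values = groups.get(parent[key], [])
        features = dict(parent)  # keep the parent's own columns
        features[f"count_{value}"] = len(values)
        features[f"sum_{value}"] = sum(values)
        # Parents with no child rows have no mean or max; leave them missing (None).
        features[f"mean_{value}"] = mean(values) if values else None
        features[f"max_{value}"] = max(values) if values else None
        table.append(features)
    return table


feature_table = build_feature_table(customers, transactions, "customer_id", "amount")
for row in feature_table:
    print(row)
```

Each printed row is one observation with its features in columns, the shape most training APIs expect. Which aggregates are meaningful, and how to treat observations with no related rows, still depends on the problem at hand.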

The traditional approach to feature engineering is to build features one at a time using domain knowledge, a tedious, time-consuming, and error-prone process known as manual feature engineering. Because the code for manual feature engineering is problem-dependent, it must be rewritten for each new dataset.

Papers

Showing 51-100 of 1706 papers

Title | Status | Hype
DriveML: An R Package for Driverless Machine Learning | Code | 1
LML-DAP: Language Model Learning a Dataset for Data-Augmented Prediction | Code | 1
fseval: A Benchmarking Framework for Feature Selection and Feature Ranking Algorithms | Code | 1
Mill.jl and JsonGrinder.jl: automated differentiable feature extraction for learning from raw JSON data | Code | 1
Network Analytics for Anti-Money Laundering -- A Systematic Literature Review and Experimental Evaluation | Code | 1
Cognitive Evolutionary Search to Select Feature Interactions for Click-Through Rate Prediction | Code | 1
Clinical Temporal Relation Extraction with Probabilistic Soft Logic Regularization and Global Inference | Code | 1
Optimized Feature Generation for Tabular Data via LLMs with Decision Tree Reasoning | Code | 1
PTRAIL -- A python package for parallel trajectory data preprocessing | Code | 1
Pushing the boundaries of molecular property prediction for drug discovery with multitask learning BERT enhanced by SMILES enumeration | Code | 1
Compatible deep neural network framework with financial time series data, including data preprocessor, neural network model and trading strategy | Code | 1
Replay and Synthetic Speech Detection with Res2net Architecture | Code | 1
Cardea: An Open Automated Machine Learning Framework for Electronic Health Records | Code | 1
Evaluation Toolkit For Robustness Testing Of Automatic Essay Scoring Systems | Code | 1
CASPR: Customer Activity Sequence-based Prediction and Representation | Code | 1
Context-Aware Deep Learning for Multi Modal Depression Detection | Code | 1
Bayesian Optimization of Catalysis With In-Context Learning | Code | 1
Benchmarking Skeleton-based Motion Encoder Models for Clinical Applications: Estimating Parkinson's Disease Severity in Walking Sequences | Code | 1
AutoML: A Survey of the State-of-the-Art | Code | 1
Automated Website Fingerprinting through Deep Learning | Code | 1
Blending gradient boosted trees and neural networks for point and probabilistic forecasting of hierarchical time series | Code | 1
BP-Net: Efficient Deep Learning for Continuous Arterial Blood Pressure Estimation using Photoplethysmogram | Code | 1
Can Models Help Us Create Better Models? Evaluating LLMs as Data Scientists | Code | 1
Can Q-Learning with Graph Networks Learn a Generalizable Branching Heuristic for a SAT Solver? | Code | 1
CheXbert: Combining Automatic Labelers and Expert Annotations for Accurate Radiology Report Labeling Using BERT | Code | 1
Classification of Periodic Variable Stars with Novel Cyclic-Permutation Invariant Neural Networks | Code | 1
A Data-Centric Perspective on Evaluating Machine Learning Models for Tabular Data | Code | 1
CodeCMR: Cross-Modal Retrieval For Function-Level Binary Source Code Matching | Code | 1
Classification of Raw MEG/EEG Data with Detach-Rocket Ensemble: An Improved ROCKET Algorithm for Multivariate Time Series Analysis | Code | 1
Deep Dive into Hunting for LotLs Using Machine Learning and Feature Engineering. | Code | 1
AutoSmart: An Efficient and Automatic Machine Learning framework for Temporal Relational Data | Code | 1
Benchmarks and Custom Package for Energy Forecasting | Code | 1
Attention-Based Deep Learning Framework for Human Activity Recognition with User Adaptation | Code | 1
Dimensionality Reduction of Longitudinal 'Omics Data using Modern Tensor Factorization | Code | 1
DiverseVul: A New Vulnerable Source Code Dataset for Deep Learning Based Vulnerability Detection | Code | 1
DiviK: Divisive intelligent K-Means for hands-free unsupervised clustering in big biological data | Code | 1
Dual Attention U-Net with Feature Infusion: Pushing the Boundaries of Multiclass Defect Segmentation | Code | 1
Efficient End-to-End AutoML via Scalable Search Space Decomposition | Code | 1
End-to-end Deep Learning from Raw Sensor Data: Atrial Fibrillation Detection using Wearables | Code | 1
End-to-End Optimized Arrhythmia Detection Pipeline using Machine Learning for Ultra-Edge Devices | Code | 1
A Survey of Information Cascade Analysis: Models, Predictions, and Recent Advances | Code | 1
AutoGL: A Library for Automated Graph Learning | Code | 1
An End-to-End Reinforcement Learning Approach for Job-Shop Scheduling Problems Based on Constraint Programming | Code | 1
A Hybrid Rule-Based and Neural Coreference Resolution System with an Evaluation on Dutch Literature | Code | 1
General-Purpose User Embeddings based on Mobile App Usage | Code | 1
Generative Pre-Training from Molecules | Code | 1
Understanding the Dynamics of DNNs Using Graph Modularity | Code | 1
Discovering Neural Wirings | Code | 1
Interpreting Machine Learning Models for Room Temperature Prediction in Non-domestic Buildings | Code | 1
Anomaly Detection for Solder Joints Using β-VAE | Code | 1
Page 2 of 35

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | CNN | 14 gestures accuracy | 0.98 | - | Unverified