
Feature Engineering

Feature engineering is the process of taking a dataset and constructing explanatory variables (features) that can be used to train a machine learning model for a prediction problem. Often, the data is spread across multiple tables and must be gathered into a single table with one row per observation and one column per feature.

The traditional approach, known as manual feature engineering, is to build features one at a time using domain knowledge, which is tedious, time-consuming, and error-prone. The code for manual feature engineering is problem-dependent and must be rewritten for each new dataset.
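As a rough illustration of the description above, the sketch below gathers two small tables into a single feature table with one row per observation. Everything in it is an assumption made for the example: the `customers` and `transactions` tables, their column names, the helper `build_feature_table`, and the three aggregate features are hypothetical and do not come from any paper listed on this page. It uses only the Python standard library; in practice a dataframe library would usually do the same job.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical parent table: one row per customer (the observations).
customers = [
    {"customer_id": 1, "age": 34},
    {"customer_id": 2, "age": 51},
    {"customer_id": 3, "age": 27},
]

# Hypothetical child table: zero or more transactions per customer.
transactions = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 35.0},
    {"customer_id": 2, "amount": 10.0},
]


def build_feature_table(customers, transactions):
    """Return one row per customer with hand-written aggregate features."""
    # Group transaction amounts by the key shared with the parent table.
    amounts = defaultdict(list)
    for t in transactions:
        amounts[t["customer_id"]].append(t["amount"])

    rows = []
    for c in customers:
        spent = amounts.get(c["customer_id"], [])
        rows.append({
            "customer_id": c["customer_id"],
            "age": c["age"],
            # Each feature below is written by hand, one at a time.
            "n_transactions": len(spent),
            "total_amount": sum(spent),
            "mean_amount": mean(spent) if spent else 0.0,
        })
    return rows


feature_table = build_feature_table(customers, transactions)
for row in feature_table:
    print(row)
```

Each aggregate here is tied to this particular pair of tables, which is why code of this kind tends to be rewritten whenever the dataset or the prediction problem changes.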

Papers

Showing 51–100 of 1706 papers

Title | Status | Hype
Online learning techniques for prediction of temporal tabular datasets with regime changes | Code | 1
Pushing the boundaries of molecular property prediction for drug discovery with multitask learning BERT enhanced by SMILES enumeration | Code | 1
fseval: A Benchmarking Framework for Feature Selection and Feature Ranking Algorithms | Code | 1
CASPR: Customer Activity Sequence-based Prediction and Representation | Code | 1
GenHPF: General Healthcare Predictive Framework with Multi-task Multi-source Learning | Code | 1
Efficient End-to-End AutoML via Scalable Search Space Decomposition | Code | 1
PEER: A Comprehensive and Multi-Task Benchmark for Protein Sequence Understanding | Code | 1
Compatible deep neural network framework with financial time series data, including data preprocessor, neural network model and trading strategy | Code | 1
Zero-shot hashtag segmentation for multilingual sentiment analysis | Code | 1
BP-Net: Efficient Deep Learning for Continuous Arterial Blood Pressure Estimation using Photoplethysmogram | Code | 1
Dimensionality Reduction of Longitudinal 'Omics Data using Modern Tensor Factorization | Code | 1
Understanding the Dynamics of DNNs Using Graph Modularity | Code | 1
End-to-End Optimized Arrhythmia Detection Pipeline using Machine Learning for Ultra-Edge Devices | Code | 1
Interpreting Machine Learning Models for Room Temperature Prediction in Non-domestic Buildings | Code | 1
A Hybrid Rule-Based and Neural Coreference Resolution System with an Evaluation on Dutch Literature | Code | 1
OMASGAN: Out-of-Distribution Minimum Anomaly Score GAN for Sample Generation on the Boundary | Code | 1
Structural Characterization for Dialogue Disentanglement | Code | 1
Synerise at RecSys 2021: Twitter user engagement prediction with a fast neural model | Code | 1
Generative Pre-Training from Molecules | Code | 1
AutoSmart: An Efficient and Automatic Machine Learning framework for Temporal Relational Data | Code | 1
Sequence-to-Sequence Learning with Latent Neural Grammars | Code | 1
PTRAIL -- A python package for parallel trajectory data preprocessing | Code | 1
Graph Contrastive Learning for Anomaly Detection | Code | 1
Establishing process-structure linkages using Generative Adversarial Networks | Code | 1
VolcanoML: Speeding up End-to-End AutoML via Scalable Search Space Decomposition | Code | 1
Short-term Renewable Energy Forecasting in Greece using Prophet Decomposition and Tree-based Ensembles | Code | 1
Enhancing the Analysis of Software Failures in Cloud Computing Systems with Deep Learning | Code | 1
Predicting crop yields with little ground truth: A simple statistical model for in-season forecasting | Code | 1
Mill.jl and JsonGrinder.jl: automated differentiable feature extraction for learning from raw JSON data | Code | 1
Itsy Bitsy SpiderNet: Fully Connected Residual Network for Fraud Detection | Code | 1
Anomaly Detection for Solder Joints Using β-VAE | Code | 1
XCrossNet: Feature Structure-Oriented Learning for Click-Through Rate Prediction | Code | 1
AutoGL: A Library for Automated Graph Learning | Code | 1
Self-supervised learning for tool wear monitoring with a disentangled-variational-autoencoder | Code | 1
Memory-based Deep Reinforcement Learning for POMDPs | Code | 1
Symbolic regression for scientific discovery: an application to wind speed forecasting | Code | 1
MalNet: A Large-Scale Image Database of Malicious Software | Code | 1
Knowledge-Preserving Incremental Social Event Detection via Heterogeneous GNNs | Code | 1
The Challenges of Persian User-generated Textual Content: A Machine Learning-Based Approach | Code | 1
Summaformers @ LaySumm 20, LongSumm 20 | Code | 1
Simplified DOM Trees for Transferable Attribute Extraction from the Web | Code | 1
Statistical learning for accurate and interpretable battery lifetime prediction | Code | 1
Clinical Temporal Relation Extraction with Probabilistic Soft Logic Regularization and Global Inference | Code | 1
Enabling Collaborative Data Science Development with the Ballet Framework | Code | 1
Binary Black-box Evasion Attacks Against Deep Learning-based Static Malware Detectors with Adversarial Byte-Level Language Model | Code | 1
Yelp Review Rating Prediction: Machine Learning and Deep Learning Models | Code | 1
CodeCMR: Cross-Modal Retrieval For Function-Level Binary Source Code Matching | Code | 1
Can Q-Learning with Graph Networks Learn a Generalizable Branching Heuristic for a SAT Solver? | Code | 1
Short-Term Load Forecasting using Bi-directional Sequential Models and Feature Engineering for Small Datasets | Code | 1
Classification of Periodic Variable Stars with Novel Cyclic-Permutation Invariant Neural Networks | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | CNN | 14 gestures accuracy | 0.98 | | Unverified