Feature Engineering

Feature engineering is the process of taking a dataset and constructing explanatory variables (features) that can be used to train a machine learning model for a prediction problem. Often the raw data is spread across multiple tables and must be consolidated into a single table, with one observation per row and one feature per column.
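
As a minimal sketch of that consolidation step, the Python below collapses a one-to-many relationship into a single feature table. Each customer (the observation) gets count, total and mean features computed from its rows in a transactions table. The table names, column names and choice of aggregations are illustrative assumptions, and plain lists of dictionaries stand in for whatever dataframe library a real pipeline would use.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical parent table: one row per customer (the observations).
customers = [
    {"customer_id": 1, "age": 34},
    {"customer_id": 2, "age": 51},
    {"customer_id": 3, "age": 27},
]

# Hypothetical child table: zero or more rows per customer.
transactions = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 35.0},
    {"customer_id": 2, "amount": 120.0},
    {"customer_id": 1, "amount": 5.0},
]


def build_feature_table(parents, children, key, value):
    """Return one row per parent: its own columns plus aggregates of its children."""
    # Group the child values by the key that links them to their parent.
    grouped = defaultdict(list)
    for row in children:
        grouped[row[key]].append(row[value])

    table = []
    for parent in parents:
        values = grouped.get(parent[key], [])
        table.append({
            **parent,  # the parent's own columns are kept as features
            f"{value}_count": len(values),
            f"{value}_total": sum(values),
            # Parents with no children get 0.0 instead of raising an error.
            f"{value}_mean": mean(values) if values else 0.0,
        })
    return table


for row in build_feature_table(customers, transactions, "customer_id", "amount"):
    print(row)
```

Customer 3 has no transactions, so the sketch fills its aggregates with zeros; whether zero, a missing value, or a separate indicator is the better choice depends on the model and the problem.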

The traditional approach to feature engineering is to build features one at a time using domain knowledge. This process, known as manual feature engineering, is tedious, time-consuming, and error-prone. Because the code for manual feature engineering is problem-dependent, it must be rewritten for each new dataset.
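
To make this concrete, here is a hypothetical example of manual feature engineering in Python. Each feature encodes a hand-picked intuition about spending behavior (recency, share of grocery spend, weekend activity, size of the largest purchase), and every line depends on the assumed `day`, `amount` and `category` fields. That dependence on one schema is why such code has to be rewritten for a new dataset.

```python
from datetime import date

# Hypothetical transaction log for a single customer; the field names are assumptions.
transactions = [
    {"day": date(2024, 3, 1), "amount": 40.0, "category": "groceries"},
    {"day": date(2024, 3, 2), "amount": 200.0, "category": "electronics"},
    {"day": date(2024, 3, 8), "amount": 60.0, "category": "groceries"},
]


def manual_features(txns, as_of):
    """Hand-written, schema-specific features for one customer.

    Assumes at least one transaction and a positive total amount.
    """
    amounts = [t["amount"] for t in txns]
    total = sum(amounts)
    return {
        # Recency: whole days between the latest purchase and the reference date.
        "days_since_last": (as_of - max(t["day"] for t in txns)).days,
        # Share of total spend that went to groceries.
        "grocery_share": sum(t["amount"] for t in txns if t["category"] == "groceries") / total,
        # Fraction of purchases made on a Saturday or Sunday (weekday() is 5 or 6).
        "weekend_fraction": sum(t["day"].weekday() >= 5 for t in txns) / len(txns),
        # Largest single purchase relative to the average purchase.
        "max_to_mean": max(amounts) / (total / len(amounts)),
    }


print(manual_features(transactions, as_of=date(2024, 3, 15)))
```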

Papers

Showing 51–100 of 1706 papers

Title	Date	Tasks	Status	Hype	Score
Blending gradient boosted trees and neural networks for point and probabilistic forecasting of hierarchical time series	Oct 19, 2023	Diversity, Feature Engineering	Code available	1	5
Can Models Help Us Create Better Models? Evaluating LLMs as Data Scientists	Oct 30, 2024	Feature Engineering	Code available	1	5
Fatigue Assessment using ECG and Actigraphy Sensors	Aug 6, 2020	Decision Making, Deep Learning	Code available	1	5
Evaluation Toolkit For Robustness Testing Of Automatic Essay Scoring Systems	Jul 14, 2020	Automated Essay Scoring, Common Sense Reasoning	Code available	1	5
CASPR: Customer Activity Sequence-based Prediction and Representation	Nov 16, 2022	Feature Engineering, Prediction	Code available	1	5
Cardea: An Open Automated Machine Learning Framework for Electronic Health Records	Oct 1, 2020	Automated Feature Engineering, AutoML	Code available	1	5
Optimized Feature Generation for Tabular Data via LLMs with Decision Tree Reasoning	Jun 12, 2024	Automated Feature Engineering, Feature Engineering	Code available	1	5
PEER: A Comprehensive and Multi-Task Benchmark for Protein Sequence Understanding	Jun 5, 2022	Feature Engineering, Multi-Task Learning	Code available	1	5
Generative Pre-Training from Molecules	Sep 16, 2021	Feature Engineering, General Knowledge	Code available	1	5
Relational Deep Learning: Graph Representation Learning on Relational Databases	Dec 7, 2023	Deep Learning, Feature Engineering	Code available	1	5
CheXbert: Combining Automatic Labelers and Expert Annotations for Accurate Radiology Report Labeling Using BERT	Apr 20, 2020	Feature Engineering	Code available	1	5
Replay and Synthetic Speech Detection with Res2net Architecture	Oct 28, 2020	Feature Engineering, Synthetic Speech Detection	Code available	1	5
Enabling Collaborative Data Science Development with the Ballet Framework	Dec 14, 2020	Feature Engineering	Code available	1	5
Dual Attention U-Net with Feature Infusion: Pushing the Boundaries of Multiclass Defect Segmentation	Dec 21, 2023	Edge Detection, Feature Engineering	Code available	1	5
End-to-end Deep Learning from Raw Sensor Data: Atrial Fibrillation Detection using Wearables	Jul 27, 2018	Atrial Fibrillation Detection, Feature Engineering	Code available	1	5
DiviK: Divisive intelligent K-Means for hands-free unsupervised clustering in big biological data	Sep 22, 2020	Clustering, Feature Engineering	Code available	1	5
DeltaPy: A Framework for Tabular Data Augmentation in Python	May 22, 2020	BIG-bench Machine Learning, Data Augmentation	Code available	1	5
Dimensionality Reduction of Longitudinal 'Omics Data using Modern Tensor Factorization	Nov 28, 2021	Dimensionality Reduction, Feature Engineering	Code available	1	5
DoE2Vec: Deep-learning Based Features for Exploratory Landscape Analysis	Mar 31, 2023	Deep Learning, Feature Engineering	Code available	1	5
End-to-End Optimized Arrhythmia Detection Pipeline using Machine Learning for Ultra-Edge Devices	Nov 23, 2021	Arrhythmia Detection, Atrial Fibrillation Detection	Code available	1	5
DeepFM: A Factorization-Machine based Neural Network for CTR Prediction	Mar 13, 2017	Click-Through Rate Prediction, Feature Engineering	Code available	1	5
DIFER: Differentiable Automated Feature Engineering	Oct 17, 2020	Automated Feature Engineering, BIG-bench Machine Learning	Code available	1	5
Disentangled Attribution Curves for Interpreting Random Forests and Boosted Trees	May 18, 2019	Feature Engineering, Feature Importance	Code available	1	5
DiverseVul: A New Vulnerable Source Code Dataset for Deep Learning Based Vulnerability Detection	Apr 1, 2023	Deep Learning, Feature Engineering	Code available	1	5
Deep Dive into Hunting for LotLs Using Machine Learning and Feature Engineering.	Apr 21, 2023	Feature Engineering	Code available	1	5
DriveML: An R Package for Driverless Machine Learning	May 1, 2020	AutoML, BIG-bench Machine Learning	Code available	1	5
A Data-Centric Perspective on Evaluating Machine Learning Models for Tabular Data	Jul 2, 2024	Feature Engineering, Hyperparameter Optimization	Code available	1	5
Efficient End-to-End AutoML via Scalable Search Space Decomposition	Jun 19, 2022	AutoML, Feature Engineering	Code available	1	5
An End-to-End Reinforcement Learning Approach for Job-Shop Scheduling Problems Based on Constraint Programming	Jun 9, 2023	Combinatorial Optimization, Feature Engineering	Code available	1	5
Context-Aware Deep Learning for Multi Modal Depression Detection	Dec 26, 2024	Data Augmentation, Deep Learning	Code available	1	5
Deep & Cross Network for Ad Click Predictions	Aug 17, 2017	Click-Through Rate Prediction, Feature Engineering	Code available	1	5
A Survey of Information Cascade Analysis: Models, Predictions, and Recent Advances	May 22, 2020	Feature Engineering, Marketing	Code available	1	5
CodeCMR: Cross-Modal Retrieval For Function-Level Binary Source Code Matching	Dec 1, 2020	Computer Security, Cross-Modal Retrieval	Code available	1	5
Attention-Based Deep Learning Framework for Human Activity Recognition with User Adaptation	Jun 6, 2020	Activity Recognition, Deep Learning	Code available	1	5
Fine-Tuning Self-Supervised Learning Models for End-to-End Pronunciation Scoring	Sep 19, 2023	Feature Engineering, Phone-level pronunciation scoring	Code available	1	5
fseval: A Benchmarking Framework for Feature Selection and Feature Ranking Algorithms	Nov 23, 2022	Automated Feature Engineering, Benchmarking	Code available	1	5
Clinical Temporal Relation Extraction with Probabilistic Soft Logic Regularization and Global Inference	Dec 16, 2020	Feature Engineering, Medical Question Answering	Code available	1	5
AutoGL: A Library for Automated Graph Learning	Apr 11, 2021	AutoML, BIG-bench Machine Learning	Code available	1	5
Cognitive Evolutionary Search to Select Feature Interactions for Click-Through Rate Prediction	Aug 1, 2023	Click-Through Rate Prediction, Evolutionary Algorithms	Code available	1	5
Understanding the Dynamics of DNNs Using Graph Modularity	Nov 24, 2021	Feature Engineering	Code available	1	5
AutoML: A Survey of the State-of-the-Art	Aug 2, 2019	AutoML, Feature Engineering	Code available	1	5
Can Q-Learning with Graph Networks Learn a Generalizable Branching Heuristic for a SAT Solver?	Sep 26, 2019	Feature Engineering, Q-Learning	Code available	1	5
Itsy Bitsy SpiderNet: Fully Connected Residual Network for Fraud Detection	May 17, 2021	Feature Engineering, Fraud Detection	Code available	1	5
A Hybrid Rule-Based and Neural Coreference Resolution System with an Evaluation on Dutch Literature	Nov 1, 2021	coreference-resolution, Coreference Resolution	Code available	1	5
Anomaly Detection for Solder Joints Using β-VAE	Apr 24, 2021	Anomaly Detection, Feature Engineering	Code available	1	5
Can Q-Learning with Graph Networks Learn a Generalizable Branching Heuristic for a SAT Solver?	Dec 1, 2020	Feature Engineering, Q-Learning	Code available	1	5
Discovering Neural Wirings	Jun 3, 2019	Feature Engineering, Network Pruning	Code available	1	5
Bayesian Optimization of Catalysis With In-Context Learning	Apr 11, 2023	Bayesian Optimization, Feature Engineering	Code available	1	5
AutoSmart: An Efficient and Automatic Machine Learning framework for Temporal Relational Data	Sep 9, 2021	AutoML, BIG-bench Machine Learning	Code available	1	5
Compatible deep neural network framework with financial time series data, including data preprocessor, neural network model and trading strategy	May 11, 2022	Binary Classification, Feature Engineering	Code available	1	5

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	CNN	14 gestures accuracy	0.98	—	Unverified