Feature Engineering

Feature engineering is the process of taking a dataset and constructing explanatory variables, called features, that can be used to train a machine learning model for a prediction problem. Often, data is spread across multiple tables and must be gathered into a single table with observations in the rows and features in the columns.
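As a minimal sketch of gathering data from several tables into one, assume two hypothetical tables: customers (one row per observation) and transactions (many rows per customer). The table names, columns, and values are illustrative assumptions, not taken from any paper listed here. The child table is aggregated per customer and joined onto the parent table, giving one row per observation and one column per feature.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical parent table: one row per observation (customer).
customers = [
    {"customer_id": 1, "age": 34},
    {"customer_id": 2, "age": 51},
    {"customer_id": 3, "age": 27},
]

# Hypothetical child table: many rows per customer.
transactions = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 35.0},
    {"customer_id": 2, "amount": 10.0},
]


def build_feature_table(customers, transactions):
    """Aggregate transactions per customer and join them onto the customer rows."""
    # Group transaction amounts by the key shared with the parent table.
    amounts = defaultdict(list)
    for t in transactions:
        amounts[t["customer_id"]].append(t["amount"])

    table = []
    for c in customers:
        spent = amounts.get(c["customer_id"], [])
        table.append({
            "customer_id": c["customer_id"],
            "age": c["age"],
            "n_transactions": len(spent),
            "total_amount": sum(spent),
            # Customers without transactions get None rather than a made-up mean.
            "mean_amount": mean(spent) if spent else None,
        })
    return table


feature_table = build_feature_table(customers, transactions)
for row in feature_table:
    print(row)
```

Each aggregate (count, sum, mean) becomes one feature column. In practice a dataframe library would usually do this with a group-by followed by a join; the plain-Python version is used here only to keep the sketch dependency-free.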

The traditional approach is manual feature engineering: building features one at a time using domain knowledge, which is tedious, time-consuming, and error-prone. The code for manual feature engineering is problem-dependent and must be rewritten for each new dataset.
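To illustrate what building features one at a time can look like, here is a small sketch for a hypothetical transaction record with a timestamp and an amount. The record layout, the feature names, and the thresholds are all assumptions chosen for illustration.

```python
from datetime import datetime


def manual_features(record):
    """Hand-written features for one hypothetical transaction record."""
    ts = datetime.fromisoformat(record["timestamp"])
    return {
        # Domain guess: behaviour differs at weekends (Saturday=5, Sunday=6).
        "is_weekend": ts.weekday() >= 5,
        # Domain guess: late-night activity may be informative.
        "is_night": ts.hour < 6 or ts.hour >= 22,
        # Domain guess: large amounts get their own flag; 100.0 is an arbitrary threshold.
        "is_large": record["amount"] > 100.0,
    }


example = {"timestamp": "2024-06-01T23:15:00", "amount": 250.0}
print(manual_features(example))
```

Every rule here is tied to the columns and conventions of this one made-up dataset, which is why code of this kind rarely carries over to a new problem unchanged.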

Papers

Showing 51–100 of 1706 papers

Title	Date	Tasks	Status	Hype
DriveML: An R Package for Driverless Machine Learning	May 1, 2020	AutoML, BIG-bench Machine Learning	Code Available	1
LML-DAP: Language Model Learning a Dataset for Data-Augmented Prediction	Sep 27, 2024	Classification, Feature Engineering	Code Available	1
fseval: A Benchmarking Framework for Feature Selection and Feature Ranking Algorithms	Nov 23, 2022	Automated Feature Engineering, Benchmarking	Code Available	1
Mill.jl and JsonGrinder.jl: automated differentiable feature extraction for learning from raw JSON data	May 19, 2021	BIG-bench Machine Learning, Feature Engineering	Code Available	1
Network Analytics for Anti-Money Laundering -- A Systematic Literature Review and Experimental Evaluation	May 29, 2024	Feature Engineering, Fraud Detection	Code Available	1
Cognitive Evolutionary Search to Select Feature Interactions for Click-Through Rate Prediction	Aug 1, 2023	Click-Through Rate Prediction, Evolutionary Algorithms	Code Available	1
Clinical Temporal Relation Extraction with Probabilistic Soft Logic Regularization and Global Inference	Dec 16, 2020	Feature Engineering, Medical Question Answering	Code Available	1
Optimized Feature Generation for Tabular Data via LLMs with Decision Tree Reasoning	Jun 12, 2024	Automated Feature Engineering, Feature Engineering	Code Available	1
PTRAIL -- A python package for parallel trajectory data preprocessing	Aug 26, 2021	Feature Engineering, Position	Code Available	1
Pushing the boundaries of molecular property prediction for drug discovery with multitask learning BERT enhanced by SMILES enumeration	Dec 15, 2022	Drug Discovery, Feature Engineering	Code Available	1
Compatible deep neural network framework with financial time series data, including data preprocessor, neural network model and trading strategy	May 11, 2022	Binary Classification, Feature Engineering	Code Available	1
Replay and Synthetic Speech Detection with Res2net Architecture	Oct 28, 2020	Feature Engineering, Synthetic Speech Detection	Code Available	1
Cardea: An Open Automated Machine Learning Framework for Electronic Health Records	Oct 1, 2020	Automated Feature Engineering, AutoML	Code Available	1
Evaluation Toolkit For Robustness Testing Of Automatic Essay Scoring Systems	Jul 14, 2020	Automated Essay Scoring, Common Sense Reasoning	Code Available	1
CASPR: Customer Activity Sequence-based Prediction and Representation	Nov 16, 2022	Feature Engineering, Prediction	Code Available	1
Context-Aware Deep Learning for Multi Modal Depression Detection	Dec 26, 2024	Data Augmentation, Deep Learning	Code Available	1
Bayesian Optimization of Catalysis With In-Context Learning	Apr 11, 2023	Bayesian Optimization, Feature Engineering	Code Available	1
Benchmarking Skeleton-based Motion Encoder Models for Clinical Applications: Estimating Parkinson's Disease Severity in Walking Sequences	May 28, 2024	Benchmarking, Feature Engineering	Code Available	1
AutoML: A Survey of the State-of-the-Art	Aug 2, 2019	AutoML, Feature Engineering	Code Available	1
Automated Website Fingerprinting through Deep Learning	Aug 21, 2017	Deep Learning, Feature Engineering	Code Available	1
Blending gradient boosted trees and neural networks for point and probabilistic forecasting of hierarchical time series	Oct 19, 2023	Diversity, Feature Engineering	Code Available	1
BP-Net: Efficient Deep Learning for Continuous Arterial Blood Pressure Estimation using Photoplethysmogram	Nov 29, 2021	Blood pressure estimation, Feature Engineering	Code Available	1
Can Models Help Us Create Better Models? Evaluating LLMs as Data Scientists	Oct 30, 2024	Feature Engineering	Code Available	1
Can Q-Learning with Graph Networks Learn a Generalizable Branching Heuristic for a SAT Solver?	Dec 1, 2020	Feature Engineering, Q-Learning	Code Available	1
CheXbert: Combining Automatic Labelers and Expert Annotations for Accurate Radiology Report Labeling Using BERT	Apr 20, 2020	Feature Engineering	Code Available	1
Classification of Periodic Variable Stars with Novel Cyclic-Permutation Invariant Neural Networks	Nov 2, 2020	Astronomy, Feature Engineering	Code Available	1
A Data-Centric Perspective on Evaluating Machine Learning Models for Tabular Data	Jul 2, 2024	Feature Engineering, Hyperparameter Optimization	Code Available	1
CodeCMR: Cross-Modal Retrieval For Function-Level Binary Source Code Matching	Dec 1, 2020	Computer Security, Cross-Modal Retrieval	Code Available	1
Classification of Raw MEG/EEG Data with Detach-Rocket Ensemble: An Improved ROCKET Algorithm for Multivariate Time Series Analysis	Aug 5, 2024	Classification, EEG	Code Available	1
Deep Dive into Hunting for LotLs Using Machine Learning and Feature Engineering.	Apr 21, 2023	Feature Engineering	Code Available	1
AutoSmart: An Efficient and Automatic Machine Learning framework for Temporal Relational Data	Sep 9, 2021	AutoML, BIG-bench Machine Learning	Code Available	1
Benchmarks and Custom Package for Energy Forecasting	Jul 14, 2023	Feature Engineering, Load Forecasting	Code Available	1
Attention-Based Deep Learning Framework for Human Activity Recognition with User Adaptation	Jun 6, 2020	Activity Recognition, Deep Learning	Code Available	1
Dimensionality Reduction of Longitudinal 'Omics Data using Modern Tensor Factorization	Nov 28, 2021	Dimensionality Reduction, Feature Engineering	Code Available	1
DiverseVul: A New Vulnerable Source Code Dataset for Deep Learning Based Vulnerability Detection	Apr 1, 2023	Deep Learning, Feature Engineering	Code Available	1
DiviK: Divisive intelligent K-Means for hands-free unsupervised clustering in big biological data	Sep 22, 2020	Clustering, Feature Engineering	Code Available	1
Dual Attention U-Net with Feature Infusion: Pushing the Boundaries of Multiclass Defect Segmentation	Dec 21, 2023	Edge Detection, Feature Engineering	Code Available	1
Efficient End-to-End AutoML via Scalable Search Space Decomposition	Jun 19, 2022	AutoML, Feature Engineering	Code Available	1
End-to-end Deep Learning from Raw Sensor Data: Atrial Fibrillation Detection using Wearables	Jul 27, 2018	Atrial Fibrillation Detection, Feature Engineering	Code Available	1
End-to-End Optimized Arrhythmia Detection Pipeline using Machine Learning for Ultra-Edge Devices	Nov 23, 2021	Arrhythmia Detection, Atrial Fibrillation Detection	Code Available	1
A Survey of Information Cascade Analysis: Models, Predictions, and Recent Advances	May 22, 2020	Feature Engineering, Marketing	Code Available	1
AutoGL: A Library for Automated Graph Learning	Apr 11, 2021	AutoML, BIG-bench Machine Learning	Code Available	1
An End-to-End Reinforcement Learning Approach for Job-Shop Scheduling Problems Based on Constraint Programming	Jun 9, 2023	Combinatorial Optimization, Feature Engineering	Code Available	1
A Hybrid Rule-Based and Neural Coreference Resolution System with an Evaluation on Dutch Literature	Nov 1, 2021	coreference-resolution, Coreference Resolution	Code Available	1
General-Purpose User Embeddings based on Mobile App Usage	May 27, 2020	Feature Engineering	Code Available	1
Generative Pre-Training from Molecules	Sep 16, 2021	Feature Engineering, General Knowledge	Code Available	1
Understanding the Dynamics of DNNs Using Graph Modularity	Nov 24, 2021	Feature Engineering	Code Available	1
Discovering Neural Wirings	Jun 3, 2019	Feature Engineering, Network Pruning	Code Available	1
Interpreting Machine Learning Models for Room Temperature Prediction in Non-domestic Buildings	Nov 23, 2021	BIG-bench Machine Learning, Decision Making	Code Available	1
Anomaly Detection for Solder Joints Using β-VAE	Apr 24, 2021	Anomaly Detection, Feature Engineering	Code Available	1

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	CNN	14 gestures accuracy	0.98	—	Unverified