SOTAVerified

Feature Engineering

Feature engineering is the process of constructing explanatory variables, called features, from a dataset so that they can be used to train a machine learning model for a prediction problem. Often the data is spread across multiple tables and must be gathered into a single table in which each row is an observation and each column is a feature.
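As a minimal sketch of gathering data into a single table, the example below aggregates a child table into one row per observation. The `customers` and `transactions` tables, their field names, and the `build_feature_table` helper are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical parent table: one row per observation (customer).
customers = [
    {"customer_id": 1, "age": 34},
    {"customer_id": 2, "age": 51},
    {"customer_id": 3, "age": 27},
]

# Hypothetical child table: zero or more rows per customer.
transactions = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 35.0},
    {"customer_id": 2, "amount": 10.0},
]


def build_feature_table(customers, transactions):
    """Return one row per customer with aggregated transaction features."""
    # Group transaction amounts by the key shared with the parent table.
    amounts = defaultdict(list)
    for t in transactions:
        amounts[t["customer_id"]].append(t["amount"])

    rows = []
    for c in customers:
        spent = amounts.get(c["customer_id"], [])
        rows.append({
            "customer_id": c["customer_id"],
            "age": c["age"],
            "n_transactions": len(spent),
            "total_amount": sum(spent),
            # Customers without transactions get 0.0 (an assumed default).
            "mean_amount": mean(spent) if spent else 0.0,
        })
    return rows


feature_table = build_feature_table(customers, transactions)
for row in feature_table:
    print(row)
```

Each row of `feature_table` is one observation and each key is one feature. Filling missing aggregates with 0.0 is only one possible choice; a real pipeline might prefer an explicit missing-value marker.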

The traditional approach to feature engineering is to build features one at a time using domain knowledge, a tedious, time-consuming, and error-prone process known as manual feature engineering. The code for manual feature engineering is problem-dependent and must be re-written for each new dataset.
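To illustrate why manual feature engineering is problem-dependent, here is a small sketch in which each feature is a hand-written function encoding one piece of domain knowledge. The loan-application record, its field names, and both feature functions are hypothetical.

```python
from datetime import date

# Hypothetical loan application record; field names are illustrative only.
application = {
    "income": 48000.0,
    "debt": 12000.0,
    "opened": date(2020, 3, 15),
    "applied": date(2023, 3, 15),
}


def debt_to_income(row):
    """Ratio of debt to income; 0.0 when income is missing or zero."""
    return row["debt"] / row["income"] if row["income"] else 0.0


def account_age_days(row):
    """Days between account opening and the application date."""
    return (row["applied"] - row["opened"]).days


# Each feature is built one at a time from a hand-written function.
features = {
    "debt_to_income": debt_to_income(application),
    "account_age_days": account_age_days(application),
}
print(features)
```

Neither function would carry over to a dataset with different columns, which is the sense in which such code must be rewritten for each new problem.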

Papers

Showing 26-50 of 1706 papers

Title | Status | Hype
The Remarkable Robustness of LLMs: Stages of Inference? | Code | 1
Optimized Feature Generation for Tabular Data via LLMs with Decision Tree Reasoning | Code | 1
Network Analytics for Anti-Money Laundering -- A Systematic Literature Review and Experimental Evaluation | Code | 1
Benchmarking Skeleton-based Motion Encoder Models for Clinical Applications: Estimating Parkinson's Disease Severity in Walking Sequences | Code | 1
VCR-Graphormer: A Mini-batch Graph Transformer via Virtual Connections | Code | 1
Retrieve, Merge, Predict: Augmenting Tables with Data Lakes | Code | 1
SMUTF: Schema Matching Using Generative Tags and Hybrid Features | Code | 1
Dual Attention U-Net with Feature Infusion: Pushing the Boundaries of Multiclass Defect Segmentation | Code | 1
Relational Deep Learning: Graph Representation Learning on Relational Databases | Code | 1
netFound: Foundation Model for Network Security | Code | 1
Blending gradient boosted trees and neural networks for point and probabilistic forecasting of hierarchical time series | Code | 1
FASER: Binary Code Similarity Search through the use of Intermediate Representations | Code | 1
Fine-Tuning Self-Supervised Learning Models for End-to-End Pronunciation Scoring | Code | 1
SimTeG: A Frustratingly Simple Approach Improves Textual Graph Learning | Code | 1
Cognitive Evolutionary Search to Select Feature Interactions for Click-Through Rate Prediction | Code | 1
TimeTuner: Diagnosing Time Representations for Time-Series Forecasting with Counterfactual Explanations | Code | 1
Benchmarks and Custom Package for Energy Forecasting | Code | 1
Feature Programming for Multivariate Time Series Prediction | Code | 1
An End-to-End Reinforcement Learning Approach for Job-Shop Scheduling Problems Based on Constraint Programming | Code | 1
Large Language Models for Automated Data Science: Introducing CAAFE for Context-Aware Automated Feature Engineering | Code | 1
Deep Dive into Hunting for LotLs Using Machine Learning and Feature Engineering. | Code | 1
SkillGPT: a RESTful API service for skill extraction and standardization using a Large Language Model | Code | 1
Bayesian Optimization of Catalysis With In-Context Learning | Code | 1
DiverseVul: A New Vulnerable Source Code Dataset for Deep Learning Based Vulnerability Detection | Code | 1
DoE2Vec: Deep-learning Based Features for Exploratory Landscape Analysis | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | CNN | 14 gestures accuracy | 0.98 | | Unverified