SOTAVerified

Data Integration

Data integration (also called information integration) is the process of consolidating data from a set of heterogeneous data sources into a single uniform data set (materialized integration) or view on the data (virtual integration). Data integration pipelines involve subtasks such as schema matching, table annotation, entity resolution, value normalization, data cleansing, and data fusion. Application domains of data integration include data warehousing, data lakes, and knowledge base consolidation.
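
To make the subtasks named above concrete, here is a minimal sketch of materialized integration over two tiny in-memory sources. Everything in it is an assumption made for illustration: the sample records, the hand-written attribute mapping `MAPPING_B`, the name-plus-city matching key, and the "first non-null value wins" fusion rule. Real pipelines typically learn or infer the schema mapping and use far more robust matching.

```python
from collections import defaultdict

# Hypothetical records from two sources with different schemas.
SOURCE_A = [
    {"name": "Acme Corp.", "city": "Berlin", "phone": "+49 30 1234"},
    {"name": "Globex", "city": "Paris", "phone": None},
]
SOURCE_B = [
    {"company": "ACME CORP", "location": "Berlin", "tel": None, "employees": 120},
    {"company": "Initech", "location": "Austin", "tel": "+1 512 5550", "employees": 45},
]

# Schema matching result, written by hand here: source attribute -> target attribute.
MAPPING_B = {"company": "name", "location": "city", "tel": "phone", "employees": "employees"}


def apply_mapping(record, mapping):
    """Rename a record's attributes into the target schema."""
    return {mapping[key]: value for key, value in record.items() if key in mapping}


def normalize_name(name):
    """Value normalization: lower-case, drop punctuation, collapse whitespace."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())


def integrate(*sources):
    """Entity resolution by exact match on a normalized key, then data fusion."""
    clusters = defaultdict(list)
    for source in sources:
        for record in source:
            key = (normalize_name(record["name"]), record["city"].lower())
            clusters[key].append(record)

    fused = []
    for records in clusters.values():
        merged = {}
        for record in records:
            for attribute, value in record.items():
                # Fusion rule (an assumption): keep the first non-null value seen.
                if merged.get(attribute) is None:
                    merged[attribute] = value
        fused.append(merged)
    return fused


unified = integrate(SOURCE_A, [apply_mapping(r, MAPPING_B) for r in SOURCE_B])
for row in unified:
    print(row)
```

The two Acme records collapse into one entity that keeps the phone number from the first source and gains the employee count from the second, while Globex and Initech pass through unchanged. Building `unified` as a stored list corresponds to materialized integration; a virtual approach would instead apply the mapping at query time.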

Papers

Showing 1-50 of 431 papers

Title | Status | Hype
EasySpider: A No-Code Visual System for Crawling the Web | Code | 7
TableGPT2: A Large Multimodal Model with Tabular Data Integration | Code | 4
Exploring the Feasibility of Automated Data Standardization using Large Language Models for Seamless Positioning | Code | 3
Docs2KG: Unified Knowledge Graph Construction from Heterogeneous Documents Assisted by Large Language Models | Code | 3
Intervention-Aware Forecasting: Breaking Historical Limits from a System Perspective | Code | 3
An RML-FNML module for Python user-defined functions in Morph-KGC | Code | 3
Declarative generation of RDF-star graphs from heterogeneous data | Code | 3
Adaptive Multi-Scale Decomposition Framework for Time Series Forecasting | Code | 2
CARTE: Pretraining and Transfer for Tabular Learning | Code | 2
Integrate Any Omics: Towards genome-wide data integration for patient stratification | Code | 2
Boosting Knowledge Graph Generation from Tabular Data with RML Views | Code | 2
Morph-KGC: Scalable knowledge graph materialization with mapping partitions | Code | 2
Graph Neural Networks for Multimodal Single-Cell Data Integration | Code | 2
scSSL-Bench: Benchmarking Self-Supervised Learning for Single-Cell Data | Code | 1
KramaBench: A Benchmark for AI Systems on Data-to-Insight Pipelines over Data Lakes | Code | 1
FusDreamer: Label-efficient Remote Sensing World Model for Multimodal Data Classification | Code | 1
RecKG: Knowledge Graph for Recommender Systems | Code | 1
Column Property Annotation using Large Language Models | Code | 1
Towards Unified Molecule-Enhanced Pathology Image Representation Learning via Integrating Spatial Transcriptomics | Code | 1
Fine-tuning Large Language Models for Entity Matching | Code | 1
AnyMatch -- Efficient Zero-Shot Entity Matching with a Small Language Model | Code | 1
Conformal Trajectory Prediction with Multi-View Data Integration in Cooperative Driving | Code | 1
MSMA: Multi-agent Trajectory Prediction in Connected and Autonomous Vehicle Environment with Multi-source Data Integration | Code | 1
LLM See, LLM Do: Guiding Data Generation to Target Non-Differentiable Objectives | Code | 1
WeatherQA: Can Multimodal Language Models Reason about Severe Weather? | Code | 1
iMD4GC: Incomplete Multimodal Data Integration to Advance Precise Treatment Response Prediction and Survival Analysis for Gastric Cancer | Code | 1
eipy: An Open-Source Python Package for Multi-modal Data Integration using Heterogeneous Ensembles | Code | 1
Cost-Effective In-Context Learning for Entity Resolution: A Design Space Exploration | Code | 1
Transformer-based Entity Legal Form Classification | Code | 1
Entity Matching using Large Language Models | Code | 1
Know2BIO: A Comprehensive Dual-View Benchmark for Evolving Biomedical Knowledge Graphs | Code | 1
MapperGPT: Large Language Models for Linking and Mapping Entities | Code | 1
Towards Lightweight Data Integration using Multi-workflow Provenance and Data Observability | Code | 1
Is your data alignable? Principled and interpretable alignability testing and integration of single-cell data | Code | 1
Column Type Annotation using ChatGPT | Code | 1
Using ChatGPT for Entity Matching | Code | 1
Unicorn: A Unified Multi-tasking Model for Supporting Matching Tasks in Data Integration | Code | 1
Web-Scale Academic Name Disambiguation: the WhoIsWho Benchmark, Leaderboard, and Toolkit | Code | 1
Unsupervised Entity Alignment for Temporal Knowledge Graphs | Code | 1
WDC Products: A Multi-Dimensional Entity Matching Benchmark | Code | 1
Integrating Multimodal Data for Joint Generative Modeling of Complex Dynamics | Code | 1
Sudowoodo: Contrastive Self-supervised Learning for Multi-purpose Data Integration and Preparation | Code | 1
Domain Adaptation for Deep Entity Resolution: A Design Space Exploration | Code | 1
Cross-Domain Data Integration for Named Entity Disambiguation in Biomedical Text | Code | 1
Data Integration for Toxic Comment Classification: Making More Than 40 Datasets Easily Accessible in One Unified Format | Code | 1
Contrastive Mixture of Posteriors for Counterfactual Inference, Data Integration and Fairness | Code | 1
Dual-Objective Fine-Tuning of BERT for Entity Matching | Code | 1
A Variational Information Bottleneck Approach to Multi-Omics Data Integration | Code | 1
COMO: A Pipeline for Multi-Omics Data Integration in Metabolic Modeling and Drug Discovery | Code | 1
GripNet: Graph Information Propagation on Supergraph for Heterogeneous Graphs | Code | 1

No leaderboard results yet.