SOTAVerified

Benchmarking

Papers

Showing 551–575 of 5548 papers

Title | Status | Hype
Benchmarking Image Retrieval for Visual Localization | Code | 1
A Benchmarking Study of Kolmogorov-Arnold Networks on Tabular Data | Code | 1
Cross-Modal Bidirectional Interaction Model for Referring Remote Sensing Image Segmentation | Code | 1
Curious Hierarchical Actor-Critic Reinforcement Learning | Code | 1
A Benchmarking Study of Embedding-based Entity Alignment for Knowledge Graphs | Code | 1
Coursera Corpus Mining and Multistage Fine-Tuning for Improving Lectures Translation | Code | 1
Benchmarking Graph Neural Networks for FMRI analysis | Code | 1
A Dataset for Answering Time-Sensitive Questions | Code | 1
Benchmarking Graph Neural Networks on Dynamic Link Prediction | Code | 1
CovDocker: Benchmarking Covalent Drug Design with Tasks, Datasets, and Solutions | Code | 1
CHOICE: Benchmarking the Remote Sensing Capabilities of Large Vision-Language Models | Code | 1
CosPGD: an efficient white-box adversarial attack for pixel-wise prediction tasks | Code | 1
Benchmarking Geospatial Question Answering Engines using the Dataset GeoQuestions1089 | Code | 1
Benchmarking Implicit Neural Representation and Geometric Rendering in Real-Time RGB-D SLAM | Code | 1
CounselBench: A Large-Scale Expert Evaluation and Adversarial Benchmark of Large Language Models in Mental Health Counseling | Code | 1
COVID-19 event extraction from Twitter via extractive question answering with continuous prompts | Code | 1
Benchmarking Generated Poses: How Rational is Structure-based Drug Design with Generative Models? | Code | 1
Contemporary Symbolic Regression Methods and their Relative Performance | Code | 1
Constellation Dataset: Benchmarking High-Altitude Object Detection for an Urban Intersection | Code | 1
Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization | Code | 1
ConsumerBench: Benchmarking Generative AI Applications on End-User Devices | Code | 1
Controlgym: Large-Scale Control Environments for Benchmarking Reinforcement Learning Algorithms | Code | 1
Benchmarking for Biomedical Natural Language Processing Tasks with a Domain Specific ALBERT | Code | 1
ComplexBench-Edit: Benchmarking Complex Instruction-Driven Image Editing via Compositional Dependencies | Code | 1
CompanyKG: A Large-Scale Heterogeneous Graph for Company Similarity Quantification | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified