SOTAVerified

Benchmarking

Papers

Showing 2701-2750 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| Benchmarking Multi-Domain Active Learning on Image Classification | | 0 |
| Benchmarking and Enhancing Disentanglement in Concept-Residual Models | | 0 |
| LucidDreaming: Controllable Object-Centric 3D Generation | | 0 |
| A Video is Worth 10,000 Words: Training and Benchmarking with Diverse Captions for Better Long Video Retrieval | | 0 |
| Event-based Continuous Color Video Decompression from Single Frames | | 0 |
| Enhancing Ligand Pose Sampling for Molecular Docking | Code | 1 |
| Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation | Code | 1 |
| Controlgym: Large-Scale Control Environments for Benchmarking Reinforcement Learning Algorithms | Code | 1 |
| Z_2 × Z_2 Equivariant Quantum Neural Networks: Benchmarking against Classical Neural Networks | Code | 0 |
| Seg2Reg: Differentiable 2D Segmentation to 1D Regression Rendering for 360 Room Layout Reconstruction | | 0 |
| AlignBench: Benchmarking Chinese Alignment of Large Language Models | Code | 2 |
| TaskBench: Benchmarking Large Language Models for Task Automation | Code | 6 |
| TransOpt: Transformer-based Representation Learning for Optimization Problem Classification | | 0 |
| Mixed-Precision Quantization for Federated Learning on Resource-Constrained Heterogeneous Devices | | 0 |
| ROBBIE: Robust Bias Evaluation of Large Generative Language Models | | 0 |
| Biomedical knowledge graph-optimized prompt generation for large language models | Code | 2 |
| SAIBench: A Structural Interpretation of AI for Science Through Benchmarks | | 0 |
| Enhancing Post-Hoc Explanation Benchmark Reliability for Image Classification | | 0 |
| Should we be going MAD? A Look at Multi-Agent Debate Strategies for LLMs | Code | 1 |
| SEED-Bench-2: Benchmarking Multimodal Large Language Models | Code | 2 |
| UniIR: Training and Benchmarking Universal Multimodal Information Retrievers | | 0 |
| PAWS-VMK: A Unified Approach To Semi-Supervised Learning And Out-of-Distribution Detection | | 0 |
| Riemannian Self-Attention Mechanism for SPD Networks | | 0 |
| FakeWatch ElectionShield: A Benchmarking Framework to Detect Fake News for Credible US Elections | | 0 |
| Comprehensive Benchmarking of Entropy and Margin Based Scoring Metrics for Data Selection | | 0 |
| Experimental Analysis of Large-scale Learnable Vector Storage Compression | Code | 0 |
| Lightly Weighted Automatic Audio Parameter Extraction for the Quality Assessment of Consensus Auditory-Perceptual Evaluation of Voice | | 0 |
| Syn3DWound: A Synthetic Dataset for 3D Wound Bed Analysis | | 0 |
| Benchmarking Large Language Model Volatility | | 0 |
| UHGEval: Benchmarking the Hallucination of Chinese Large Language Models via Unconstrained Generation | Code | 1 |
| ASI: Accuracy-Stability Index for Evaluating Deep Learning Models | | 0 |
| An Empirical Investigation into Benchmarking Model Multiplicity for Trustworthy Machine Learning: A Case Study on Image Classification | | 0 |
| Benchmarking Robustness of Text-Image Composed Retrieval | Code | 1 |
| Large Language Models as Automated Aligners for benchmarking Vision-Language Models | | 0 |
| Dialogue Quality and Emotion Annotations for Customer Support Conversations | Code | 0 |
| Creating and Leveraging a Synthetic Dataset of Cloud Optical Thickness Measures for Cloud Detection in MSI | Code | 0 |
| Automated 3D Tumor Segmentation using Temporal Cubic PatchGAN (TCuP-GAN) | | 0 |
| Learning Dynamic Selection and Pricing of Out-of-Home Deliveries | Code | 0 |
| Benchmarking Toxic Molecule Classification using Graph Neural Networks and Few Shot Learning | | 0 |
| PG-Video-LLaVA: Pixel Grounding Large Video-Language Models | Code | 2 |
| A projected nonlinear state-space model for forecasting time series signals | Code | 0 |
| Deep State-Space Model for Predicting Cryptocurrency Price | | 0 |
| IMGTB: A Framework for Machine-Generated Text Detection Benchmarking | Code | 1 |
| Benchmarking bias: Expanding clinical AI model card to incorporate bias reporting of social and non-social factors | | 0 |
| BEND: Benchmarking DNA Language Models on biologically meaningful tasks | Code | 1 |
| Towards a more inductive world for drug repurposing approaches | Code | 1 |
| Demonstrating Almost Linear Time Complexity of Bus Admittance Matrix-Based Distribution Network Power Flow: An Empirical Approach | | 0 |
| LogLead -- Fast and Integrated Log Loader, Enhancer, and Anomaly Detector | Code | 1 |
| Holistic Inverse Rendering of Complex Facade via Aerial 3D Scanning | | 0 |
| Segment Together: A Versatile Paradigm for Semi-Supervised Medical Image Segmentation | | 0 |
Page 55 of 111

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |