SOTAVerified

Benchmarking

Papers

Showing 561–570 of 5548 papers

Title | Status | Hype
Multi-Behavior Recommendation with Personalized Directed Acyclic Behavior Graphs | Code | 1
Does your model understand genes? A benchmark of gene properties for biological and text models | Code | 1
Grounding Descriptions in Images informs Zero-Shot Visual Recognition | Code | 1
Down with the Hierarchy: The 'H' in HNSW Stands for "Hubs" | Code | 1
Truth or Mirage? Towards End-to-End Factuality Evaluation with LLM-Oasis | Code | 1
Circumventing shortcuts in audio-visual deepfake detection datasets with unsupervised learning | Code | 1
CHOICE: Benchmarking the Remote Sensing Capabilities of Large Vision-Language Models | Code | 1
AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of Text-to-Video Generation with LMM | Code | 1
VidHal: Benchmarking Temporal Hallucinations in Vision LLMs | Code | 1
Machine Learning for the Digital Typhoon Dataset: Extensions to Multiple Basins and New Developments in Representations and Tasks | Code | 1
Page 57 of 555

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | – | Unverified