SOTAVerified

Benchmarking

Papers

Showing 3201–3250 of 5548 papers

Title | Status | Hype
Environment-aware UAV Communications: CKM Construction and Predictive Beamforming | - | 0
Neural Network Approach for Non-Markovian Dissipative Dynamics of Many-Body Open Quantum Systems | - | 0
Mapping Violence: Developing an Extensive Framework to Build a Bangla Sectarian Expression Dataset from Social Media Interactions | - | 0
Benchmarking changepoint detection algorithms on cardiac time series | - | 0
Iterated Invariant Extended Kalman Filter (IterIEKF) | - | 0
White Men Lead, Black Women Help? Benchmarking and Mitigating Language Agency Social Biases in LLMs | - | 0
Data Collection of Real-Life Knowledge Work in Context: The RLKWiC Dataset | - | 0
Neuromorphic Vision-based Motion Segmentation with Graph Transformer Neural Network | - | 0
MMInA: Benchmarking Multihop Multimodal Internet Agents | - | 0
A Universal Protocol to Benchmark Camera Calibration for Sports | - | 0
AMPCliff: quantitative definition and benchmarking of activity cliffs in antimicrobial peptides | Code | 0
LLM Evaluators Recognize and Favor Their Own Generations | - | 0
Feature selection in linear SVMs via a hard cardinality constraint: a scalable SDP decomposition approach | - | 0
A Recipe for CAC: Mosaic-based Generalized Loss for Improved Class-Agnostic Counting | Code | 0
A Large-Scale Evaluation of Speech Foundation Models | - | 0
From Bytes to Borsch: Fine-Tuning Gemma and Mistral for the Ukrainian Language Representation | Code | 0
Practical Guidelines for Cell Segmentation Models Under Optical Aberrations in Microscopy | - | 0
Exploring the Decentraland Economy: Multifaceted Parcel Attributes, Key Insights, and Benchmarking | - | 0
GoodDrag: Towards Good Practices for Drag Editing with Diffusion Models | - | 0
Certifying almost all quantum states with few single-qubit measurements | - | 0
DyKnow: Dynamically Verifying Time-Sensitive Factual Knowledge in LLMs | Code | 0
WebCode2M: A Real-World Dataset for Code Generation from Webpage Designs | - | 0
From Protoscience to Epistemic Monoculture: How Benchmarking Set the Stage for the Deep Learning Revolution | - | 0
Accel-NASBench: Sustainable Benchmarking for Accelerator-Aware NAS | Code | 0
MedExpQA: Multilingual Benchmarking of Large Language Models for Medical Question Answering | - | 0
Towards Objectively Benchmarking Social Intelligence for Language Agents at Action Level | Code | 0
HOEG: A New Approach for Object-Centric Predictive Process Monitoring | Code | 0
EFSA: Towards Event-Level Financial Sentiment Analysis | Code | 0
MLaKE: Multilingual Knowledge Editing Benchmark for Large Language Models | Code | 0
A Comparison of Cryptocurrency Volatility-benchmarking New and Mature Asset Classes | - | 0
Multicalibration for Confidence Scoring in LLMs | - | 0
PoLLMgraph: Unraveling Hallucinations in Large Language Models via State Transition Dynamics | Code | 0
SDFR: Synthetic Data for Face Recognition Competition | - | 0
Enhancing Video Summarization with Context Awareness | Code | 0
GNNBENCH: Fair and Productive Benchmarking for Single-GPU GNN System | - | 0
Who Evaluates the Evaluations? Objectively Scoring Text-to-Image Prompt Coherence Metrics with T2IScoreScore (TS2) | Code | 0
Dynamic Risk Assessment Methodology with an LDM-based System for Parking Scenarios | - | 0
Benchmarking and Improving Compositional Generalization of Multi-aspect Controllable Text Generation | Code | 0
Benchmarking ChatGPT on Algorithmic Reasoning | Code | 0
Schroedinger's Threshold: When the AUC doesn't predict Accuracy | Code | 0
Benchmarking Parameter Control Methods in Differential Evolution for Mixed-Integer Black-Box Optimization | Code | 0
DiffBody: Human Body Restoration by Imagining with Generative Diffusion Prior | - | 0
A Comparative Analysis of Word-Level Metric Differential Privacy: Benchmarking The Privacy-Utility Trade-off | Code | 0
NL2KQL: From Natural Language to Kusto Query | - | 0
PATCH! Psychometrics-AssisTed BenCHmarking of Large Language Models against Human Populations: A Case Study of Proficiency in 8th Grade Mathematics | Code | 0
On the reduction of Linear Parameter-Varying State-Space models | - | 0
Stereotype Detection in LLMs: A Multiclass, Explainable, and Benchmark-Driven Approach | - | 0
IsoBench: Benchmarking Multimodal Foundation Models on Isomorphic Representations | - | 0
Diffusion-Driven Domain Adaptation for Generating 3D Molecules | - | 0
SpiralMLP: A Lightweight Vision MLP Architecture | - | 0
Page 65 of 111

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified