SOTAVerified

Benchmarking

Papers

Showing 721–730 of 5548 papers

Title | Status | Hype
MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Interactions | Code | 1
Quantitative Certification of Bias in Large Language Models | Code | 1
Benchmarking Skeleton-based Motion Encoder Models for Clinical Applications: Estimating Parkinson's Disease Severity in Walking Sequences | Code | 1
DTR-Bench: An in silico Environment and Benchmark Platform for Reinforcement Learning Based Dynamic Treatment Regime | Code | 1
GCondenser: Benchmarking Graph Condensation | Code | 1
Analog or Digital In-memory Computing? Benchmarking through Quantitative Modeling | Code | 1
Benchmarking Fish Dataset and Evaluation Metric in Keypoint Detection -- Towards Precise Fish Morphological Assessment in Aquaculture Breeding | Code | 1
DocuMint: Docstring Generation for Python using Small Language Models | Code | 1
SciFIBench: Benchmarking Large Multimodal Models for Scientific Figure Interpretation | Code | 1
Benchmarking Classical and Learning-Based Multibeam Point Cloud Registration | Code | 1
Page 73 of 555

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified