SOTAVerified

Benchmarking

Papers

Showing 601-625 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| Trade-offs in Privacy-Preserving Eye Tracking through Iris Obfuscation: A Benchmarking Study | Code | 0 |
| LEMUR Neural Network Dataset: Towards Seamless AutoML | Code | 1 |
| NoTeS-Bank: Benchmarking Neural Transcription and Search for Scientific Notes Understanding | | 0 |
| SortBench: Benchmarking LLMs based on their ability to sort lists | | 0 |
| TP-RAG: Benchmarking Retrieval-Augmented Large Language Model Agents for Spatiotemporal-Aware Travel Planning | | 0 |
| LMM4LMM: Benchmarking and Evaluating Large-multimodal Image Generation with LMMs | Code | 1 |
| TorchFX: A modern approach to Audio DSP with PyTorch and GPU acceleration | Code | 2 |
| Adaptive Shrinkage Estimation For Personalized Deep Kernel Regression In Modeling Brain Trajectories | Code | 0 |
| Benchmarking Suite for Synthetic Aperture Radar Imagery Anomaly Detection (SARIAD) Algorithms | Code | 0 |
| Geological Inference from Textual Data using Word Embeddings | Code | 0 |
| NorEval: A Norwegian Language Understanding and Generation Evaluation Benchmark | Code | 0 |
| Benchmarking Multi-Organ Segmentation Tools for Multi-Parametric T1-weighted Abdominal MRI | | 0 |
| SydneyScapes: Image Segmentation for Australian Environments | | 0 |
| Benchmarking Image Embeddings for E-Commerce: Evaluating Off-the Shelf Foundation Models, Fine-Tuning Strategies and Practical Trade-offs | | 0 |
| Benchmarking Adversarial Robustness to Bias Elicitation in Large Language Models: Scalable Automated Assessment with LLM-as-a-Judge | Code | 0 |
| Benchmarking Multimodal CoT Reward Model Stepwise by Visual Program | Code | 0 |
| TabKAN: Advancing Tabular Data Analysis using Kolmogorov-Arnold Network | | 0 |
| A Roadmap for Improving Data Reliability and Sharing in Crosslinking Mass Spectrometry | | 0 |
| Evolutionary Generation of Random Surreal Numbers for Benchmarking | Code | 1 |
| RayFronts: Open-Set Semantic Ray Frontiers for Online Scene Understanding and Exploration | | 0 |
| Can Carbon-Aware Electric Load Shifting Reduce Emissions? An Equilibrium-Based Analysis | | 0 |
| Benchmarking Convolutional Neural Network and Graph Neural Network based Surrogate Models on a Real-World Car External Aerodynamics Dataset | | 0 |
| V-MAGE: A Game Evaluation Framework for Assessing Vision-Centric Capabilities in Multimodal Large Language Models | Code | 1 |
| An Empirical Study of GPT-4o Image Generation Capabilities | Code | 1 |
| Towards Visual Text Grounding of Multimodal Large Language Model | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |