SOTAVerified

Benchmarking

Papers

Showing 2601-2650 of 5548 papers

Title | Status | Hype
SoK: Systematization and Benchmarking of Deepfake Detectors in a Unified Framework | - | 0
Benchmark Analysis of Various Pre-trained Deep Learning Models on ASSIRA Cats and Dogs Dataset | - | 0
MST: Adaptive Multi-Scale Tokens Guided Interactive Segmentation | Code | 0
TransportationGames: Benchmarking Transportation Knowledge of (Multimodal) Large Language Models | - | 0
Chain of LoRA: Efficient Fine-tuning of Language Models via Residual Learning | - | 0
Attention versus Contrastive Learning of Tabular Data -- A Data-centric Benchmarking | - | 0
Global Prediction of COVID-19 Variant Emergence Using Dynamics-Informed Graph Neural Networks | Code | 0
Segment Anything Model for Medical Image Segmentation: Current Applications and Future Directions | Code | 5
NovelGym: A Flexible Ecosystem for Hybrid Planning and Learning Agents Designed for Open Worlds | - | 0
CAVIAR: Co-simulation of 6G Communications, 3D Scenarios and AI for Digital Twins | Code | 1
Using Multi-Temporal Sentinel-1 and Sentinel-2 data for water bodies mapping | - | 0
German Text Embedding Clustering Benchmark | Code | 1
Benchmarking PathCLIP for Pathology Image Analysis | - | 0
Enhancing 3D-Air Signature by Pen Tip Tail Trajectory Awareness: Dataset and Featuring by Novel Spatio-temporal CNN | Code | 0
Nodule detection and generation on chest X-rays: NODE21 Challenge | - | 0
AstroLLaMA-Chat: Scaling AstroLLaMA with Conversational and Diverse Datasets | - | 0
Benchmarking Audio Visual Segmentation for Long-Untrimmed Videos | - | 0
Hyperbolic Anomaly Detection | - | 0
AM-RADIO: Agglomerative Vision Foundation Model Reduce All Domains Into One | - | 0
FLHetBench: Benchmarking Device and State Heterogeneity in Federated Learning | - | 0
A Call to Reflect on Evaluation Practices for Age Estimation: Comparative Analysis of the State-of-the-Art and a Unified Benchmark | Code | 2
Sheared Backpropagation for Fine-tuning Foundation Models | - | 0
FISBe: A Real-World Benchmark Dataset for Instance Segmentation of Long-Range Thin Filamentous Structures | - | 0
SEED-Bench: Benchmarking Multimodal Large Language Models | Code | 3
FinDABench: Benchmarking Financial Data Analysis Ability of Large Language Models | Code | 1
Temporal Validity Change Prediction | - | 0
Benchmarking Large Language Models on Controllable Generation under Diversified Instructions | Code | 1
Pushing Boundaries: Exploring Zero Shot Object Classification with Large Multimodal Models | - | 0
Benchmarking Hebbian learning rules for associative memory | - | 0
Benchmarking the CoW with the TopCoW Challenge: Topology-Aware Anatomical Segmentation of the Circle of Willis for CTA and MRA | Code | 1
TSPP: A Unified Benchmarking Tool for Time-series Forecasting | Code | 0
FALCON: Feature-Label Constrained Graph Net Collapse for Memory Efficient GNNs | Code | 0
Knowledge Enhanced Conditional Imputation for Healthcare Time-series | Code | 0
Combining SNNs with Filtering for Efficient Neural Decoding in Implantable Brain-Machine Interfaces | - | 0
RDF-star2Vec: RDF-star Graph Embeddings for Data Mining | Code | 0
APTv2: Benchmarking Animal Pose Estimation and Tracking with a Large-scale Dataset and Beyond | Code | 1
Data needs and challenges for quantum dot devices automation | - | 0
Benchmarking Evolutionary Community Detection Algorithms in Dynamic Networks | - | 0
Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models | Code | 1
Incorporating Human Flexibility through Reward Preferences in Human-AI Teaming | - | 0
ARBiBench: Benchmarking Adversarial Robustness of Binarized Neural Networks | - | 0
RetailSynth: Synthetic Data Generation for Retail AI Systems Evaluation | Code | 1
AN ELIXIR FOR BLOCKCHAIN SCALABILITY WITH CHANNEL BASED CLUSTERED SHARDING | - | 0
Neural feels with neural fields: Visuo-tactile perception for in-hand manipulation | - | 0
Review and experimental benchmarking of machine learning algorithms for efficient optimization of cold atom experiments | - | 0
Comparing Machine Learning Algorithms by Union-Free Generic Depth | Code | 0
Benchmarking and Analyzing In-context Learning, Fine-tuning and Supervised Learning for Biomedical Knowledge Curation: a focused study on chemical entities of biological interest | - | 0
Perception Test 2023: A Summary of the First Challenge And Outcome | - | 0
FiFAR: A Fraud Detection Dataset for Learning to Defer | Code | 1
Scaling Compute Is Not All You Need for Adversarial Robustness | Code | 0
Page 53 of 111

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified