SOTAVerified

Benchmarking

Papers

Showing 2626-2650 of 5548 papers

Title | Status | Hype
Temporal Validity Change Prediction |  | 0
Benchmarking Large Language Models on Controllable Generation under Diversified Instructions | Code | 1
Pushing Boundaries: Exploring Zero Shot Object Classification with Large Multimodal Models |  | 0
Benchmarking Hebbian learning rules for associative memory |  | 0
Benchmarking the CoW with the TopCoW Challenge: Topology-Aware Anatomical Segmentation of the Circle of Willis for CTA and MRA | Code | 1
TSPP: A Unified Benchmarking Tool for Time-series Forecasting | Code | 0
FALCON: Feature-Label Constrained Graph Net Collapse for Memory Efficient GNNs | Code | 0
Knowledge Enhanced Conditional Imputation for Healthcare Time-series | Code | 0
Combining SNNs with Filtering for Efficient Neural Decoding in Implantable Brain-Machine Interfaces |  | 0
RDF-star2Vec: RDF-star Graph Embeddings for Data Mining | Code | 0
APTv2: Benchmarking Animal Pose Estimation and Tracking with a Large-scale Dataset and Beyond | Code | 1
Data needs and challenges for quantum dot devices automation |  | 0
Benchmarking Evolutionary Community Detection Algorithms in Dynamic Networks |  | 0
Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models | Code | 1
Incorporating Human Flexibility through Reward Preferences in Human-AI Teaming |  | 0
ARBiBench: Benchmarking Adversarial Robustness of Binarized Neural Networks |  | 0
RetailSynth: Synthetic Data Generation for Retail AI Systems Evaluation | Code | 1
AN ELIXIR FOR BLOCKCHAIN SCALABILITY WITH CHANNEL BASED CLUSTERED SHARDING |  | 0
Neural feels with neural fields: Visuo-tactile perception for in-hand manipulation |  | 0
Review and experimental benchmarking of machine learning algorithms for efficient optimization of cold atom experiments |  | 0
Comparing Machine Learning Algorithms by Union-Free Generic Depth | Code | 0
Benchmarking and Analyzing In-context Learning, Fine-tuning and Supervised Learning for Biomedical Knowledge Curation: a focused study on chemical entities of biological interest |  | 0
Perception Test 2023: A Summary of the First Challenge And Outcome |  | 0
FiFAR: A Fraud Detection Dataset for Learning to Defer | Code | 1
Scaling Compute Is Not All You Need for Adversarial Robustness | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 |  | Unverified