SOTAVerified

Benchmarking

Papers

Showing 1301-1350 of 5548 papers

Title | Status | Hype
Benchmarking Differential Privacy and Federated Learning for BERT Models | Code | 1
You are AllSet: A Multiset Function Framework for Hypergraph Neural Networks | Code | 1
Mutual-Information Based Few-Shot Classification | Code | 1
Synthetic Benchmarks for Scientific Research in Explainable Machine Learning | Code | 1
Underwater Image Restoration via Contrastive Learning and a Real-world Dataset | Code | 1
Intrinsic Image Harmonization | Code | 1
Perception Matters: Detecting Perception Failures of VQA Models Using Metamorphic Testing | Code | 1
Understanding and Evaluating Racial Biases in Image Captioning | Code | 1
Selection of Source Images Heavily Influences the Effectiveness of Adversarial Attacks | Code | 1
Online Learning with Optimism and Delay | Code | 1
Shades of BLEU, Flavours of Success: The Case of MultiWOZ | Code | 1
Signals to Spikes for Neuromorphic Regulated Reservoir Computing and EMG Hand Gesture Recognition | Code | 1
RobustNav: Towards Benchmarking Robustness in Embodied Navigation | Code | 1
Benchmarking Bias Mitigation Algorithms in Representation Learning through Fairness Metrics | Code | 1
EXPObench: Benchmarking Surrogate-based Optimisation Algorithms on Expensive Black-box Functions | Code | 1
The Medkit-Learn(ing) Environment: Medical Decision Modelling through Simulation | Code | 1
DFGC 2021: A DeepFake Game Competition | Code | 1
FedScale: Benchmarking Model and System Performance of Federated Learning at Scale | Code | 1
Benchmarking the Performance of Bayesian Optimization across Multiple Experimental Materials Science Domains | Code | 1
Anabranch Network for Camouflaged Object Segmentation | Code | 1
DACBench: A Benchmark Library for Dynamic Algorithm Configuration | Code | 1
Multimodal Fusion via Teacher-Student Network for Indoor Action Recognition | Code | 1
Best practices for constructing, preparing, and evaluating protein-ligand binding affinity benchmarks | Code | 1
A Reinforcement Learning Environment for Multi-Service UAV-enabled Wireless Systems | Code | 1
AnomalyHop: An SSL-based Image Anomaly Localization Method | Code | 1
D2S: Document-to-Slide Generation Via Query-Based Text Summarization | Code | 1
Open Radar Initiative: Large Scale Dataset for Benchmarking of micro-Doppler Recognition Algorithms | Code | 1
dEchorate: a Calibrated Room Impulse Response Database for Echo-aware Signal Processing | Code | 1
2.5D Visual Relationship Detection | Code | 1
Knodle: Modular Weakly Supervised Learning with PyTorch | Code | 1
Data Generating Process to Evaluate Causal Discovery Techniques for Time Series Data | Code | 1
Towards Standardising Reinforcement Learning Approaches for Production Scheduling Problems | Code | 1
Is Multi-Hop Reasoning Really Explainable? Towards Benchmarking Reasoning Interpretability | Code | 1
Safety-enhanced UAV Path Planning with Spherical Vector-based Particle Swarm Optimization | Code | 1
StylePTB: A Compositional Benchmark for Fine-grained Controllable Text Style Transfer | Code | 1
Robust Semantic Interpretability: Revisiting Concept Activation Vectors | Code | 1
CBench: Towards Better Evaluation of Question Answering Over Knowledge Graphs | Code | 1
Remote Sensing Image Classification with the SEN12MS Dataset | Code | 1
Simultaneous Navigation and Construction Benchmarking Environments | Code | 1
Benchmarks for Deep Off-Policy Evaluation | Code | 1
3D AffordanceNet: A Benchmark for Visual Object Affordance Understanding | Code | 1
SUTD-TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events | Code | 1
Marine Snow Removal Benchmarking Dataset | Code | 1
Learning to Optimize: A Primer and A Benchmark | Code | 1
Neural Multi-Hop Reasoning With Logical Rules on Biomedical Knowledge Graphs | Code | 1
SHARP: Environment and Person Independent Activity Recognition with Commodity IEEE 802.11 Access Points | Code | 1
A Large-Scale Dataset for Benchmarking Elevator Button Segmentation and Character Recognition | Code | 1
The Effect of Domain and Diacritics in Yorùbá-English Neural Machine Translation | Code | 1
Recent Advances on Neural Network Pruning at Initialization | Code | 1
A Computed Tomography Vertebral Segmentation Dataset with Anatomical Variations and Multi-Vendor Scanner Data | Code | 1
Page 27 of 111

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 |  | Unverified