SOTAVerified

Benchmarking

Papers

Showing 3251–3300 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| The ObjectFolder Benchmark: Multisensory Learning with Neural and Real Objects | | 0 |
| HySpecNet-11k: A Large-Scale Hyperspectral Dataset for Benchmarking Learning-Based Hyperspectral Image Compression Methods | | 0 |
| Dynamic Neighborhood Construction for Structured Large Discrete Action Spaces | Code | 0 |
| Accurate and Efficient Structural Ensemble Generation of Macrocyclic Peptides using Internal Coordinate Diffusion | Code | 1 |
| SheetCopilot: Bringing Software Productivity to the Next Level through Large Language Models | Code | 1 |
| ScoNe: Benchmarking Negation Reasoning in Language Models With Fine-Tuning and In-Context Learning | Code | 0 |
| Design and implementation of intelligent packet filtering in IoT microcontroller-based devices | Code | 0 |
| ShuffleMix: Improving Representations via Channel-Wise Shuffle of Interpolated Hidden States | Code | 0 |
| Large-scale Ridesharing DARP Instances Based on Real Travel Demand | Code | 0 |
| IDToolkit: A Toolkit for Benchmarking and Developing Inverse Design Algorithms in Nanophotonics | Code | 1 |
| Human Body Shape Classification Based on a Single Image | | 0 |
| Decoding the Underlying Meaning of Multimodal Hateful Memes | Code | 1 |
| InDL: A New Dataset and Benchmark for In-Diagram Logic Interpretation based on Visual Illusion | Code | 0 |
| Exploring the Practicality of Generative Retrieval on Dynamic Corpora | | 0 |
| BASED: Benchmarking, Analysis, and Structural Estimation of Deblurring | Code | 0 |
| Learning from Integral Losses in Physics Informed Neural Networks | Code | 0 |
| Benchmarking Diverse-Modal Entity Linking with Generative Models | | 0 |
| The Brain Tumor Segmentation (BraTS) Challenge 2023: Focus on Pediatrics (CBTN-CONNECT-DIPGR-ASNR-MICCAI BraTS-PEDs) | Code | 2 |
| Zero is Not Hero Yet: Benchmarking Zero-Shot Performance of LLMs for Financial Tasks | Code | 1 |
| Benchmarking state-of-the-art gradient boosting algorithms for classification | | 0 |
| Investigation of UAV Detection in Images with Complex Backgrounds and Rainy Artifacts | Code | 0 |
| CSS: A Large-scale Cross-schema Chinese Text-to-SQL Medical Dataset | Code | 0 |
| KeyPosS: Plug-and-Play Facial Landmark Detection through GPS-Inspired True-Range Multilateration | Code | 1 |
| Analysis of modular CMA-ES on strict box-constrained problems in the SBOX-COST benchmarking suite | | 0 |
| Barkour: Benchmarking Animal-level Agility with Quadruped Robots | | 0 |
| GPT4Graph: Can Large Language Models Understand Graph Structured Data ? An Empirical Evaluation and Benchmarking | Code | 0 |
| BUFFET: Benchmarking Large Language Models for Few-shot Cross-lingual Transfer | | 0 |
| LAraBench: Benchmarking Arabic AI with Large Language Models | | 0 |
| Domain-Expanded ASTE: Rethinking Generalization in Aspect Sentiment Triplet Extraction | Code | 0 |
| ReadMe++: Benchmarking Multilingual Language Models for Multi-Domain Readability Assessment | Code | 1 |
| Chakra: Advancing Performance Benchmarking and Co-design using Standardized Execution Traces | Code | 1 |
| R2H: Building Multimodal Navigation Helpers that Respond to Help Requests | | 0 |
| When the Music Stops: Tip-of-the-Tongue Retrieval for Music | Code | 0 |
| Benchmarking Machine Translation with Cultural Awareness | Code | 0 |
| Robust Model-Based Optimization for Challenging Fitness Landscapes | Code | 0 |
| Exploring Large Language Models for Classical Philology | Code | 1 |
| Multilingual Large Language Models Are Not (Yet) Code-Switchers | | 0 |
| How Fragile is Relation Extraction under Entity Replacements? | Code | 0 |
| Element-aware Summarization with Large Language Models: Expert-aligned Evaluation and Chain-of-Thought Method | Code | 1 |
| A Benchmark on Extremely Weakly Supervised Text Classification: Reconcile Seed Matching and Prompting Approaches | Code | 0 |
| Can ChatGPT Defend its Belief in Truth? Evaluating LLM Reasoning via Debate | | 0 |
| Towards Benchmarking and Assessing Visual Naturalness of Physical World Adversarial Attacks | Code | 1 |
| Value-at-Risk-Based Portfolio Insurance: Performance Evaluation and Benchmarking Against CPPI in a Markov-Modulated Regime-Switching Market | | 0 |
| Patterns of Convergence and Bound Constraint Violation in Differential Evolution on SBOX-COST Benchmarking Suite | | 0 |
| Visualizing Linguistic Diversity of Text Datasets Synthesized by Large Language Models | Code | 2 |
| Separating form and meaning: Using self-consistency to quantify task understanding across multiple senses | Code | 0 |
| TELeR: A General Taxonomy of LLM Prompts for Benchmarking Complex Tasks | | 0 |
| Ahead-of-Time P-Tuning | | 0 |
| Benchmarking Deep Learning Frameworks for Automated Diagnosis of Ocular Toxoplasmosis: A Comprehensive Approach to Classification and Segmentation | | 0 |
| Boost Vision Transformer with GPU-Friendly Sparsity and Quantization | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |