SOTAVerified

Benchmarking

Papers

Showing 4251-4300 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| Extensible Logging and Empirical Attainment Function for IOHexperimenter | | 0 |
| Context-guided Triple Matching for Multiple Choice Question Answering | | 0 |
| PASS: An ImageNet replacement for self-supervised pretraining without humans | Code | 1 |
| FewNLU: Benchmarking State-of-the-Art Methods for Few-Shot Natural Language Understanding | Code | 1 |
| Disentangled Feature Representation for Few-shot Image Classification | Code | 1 |
| MetaDrive: Composing Diverse Driving Scenarios for Generalizable Reinforcement Learning | Code | 2 |
| Curb Your Carbon Emissions: Benchmarking Carbon Emissions in Machine Translation | | 0 |
| Don't be Contradicted with Anything! CI-ToD: Towards Benchmarking Consistency for Task-oriented Dialogue System | Code | 1 |
| Benchmarking Lane-changing Decision-making for Deep Reinforcement Learning | | 0 |
| Benchmarking Augmentation Methods for Learning Robust Navigation Agents: the Winning Entry of the 2021 iGibson Challenge | | 0 |
| SubseasonalClimateUSA: A Dataset for Subseasonal Forecasting and Benchmarking | Code | 1 |
| Efficiently solving the thief orienteering problem with a max-min ant colony optimization approach | Code | 0 |
| A Novel Cluster Detection of COVID-19 Patients and Medical Disease Conditions Using Improved Evolutionary Clustering Algorithm Star | | 0 |
| Hybrid Transceiver Design for Tera-Hertz MIMO Systems Relying on Bayesian Learning Aided Sparse Channel Estimation | | 0 |
| AI Accelerator Survey and Trends | Code | 1 |
| Benchmarking the Combinatorial Generalizability of Complex Query Answering on Knowledge Graphs | Code | 1 |
| Benchmarking Answer Verification Methods for Question Answering-Based Summarization Evaluation Metrics | | 0 |
| DiS-ReX: A Multilingual Dataset for Distantly Supervised Relation Extraction | | 0 |
| WiSoSuper: Benchmarking Super-Resolution Methods on Wind and Solar Data | | 0 |
| Messing Up 3D Virtual Environments: Transferable Adversarial 3D Objects | Code | 0 |
| Benchmarking Feature-based Algorithm Selection Systems for Black-box Numerical Optimization | Code | 0 |
| A Survey on Temporal Sentence Grounding in Videos | | 0 |
| OPV2V: An Open Benchmark Dataset and Fusion Pipeline for Perception with Vehicle-to-Vehicle Communication | Code | 1 |
| Benchmarking Commonsense Knowledge Base Population with an Effective Evaluation Dataset | Code | 1 |
| Benchmarking the Spectrum of Agent Capabilities | Code | 1 |
| A Continuous Optimisation Benchmark Suite from Neural Network Regression | Code | 0 |
| RobustART: Benchmarking Robustness on Architecture Design and Training Techniques | Code | 1 |
| Benchmarking Processor Performance by Multi-Threaded Machine Learning Algorithms | | 0 |
| Application of DEA in International Market Selection for the export of products from Spain | | 0 |
| A framework for benchmarking uncertainty in deep regression | | 0 |
| Characterization of Constrained Continuous Multiobjective Optimization Problems: A Feature Space Perspective | | 0 |
| CrowdDriven: A New Challenging Dataset for Outdoor Visual Localization | | 0 |
| Towards Efficient Synchronous Federated Training: A Survey on System Optimization Strategies | Code | 0 |
| Resistive Neural Hardware Accelerators | | 0 |
| Panoptic nuScenes: A Large-Scale Benchmark for LiDAR Panoptic Segmentation and Tracking | Code | 2 |
| Fine-grained Hand Gesture Recognition in Multi-viewpoint Hand Hygiene | Code | 0 |
| Does BERT Learn as Humans Perceive? Understanding Linguistic Styles through Lexica | Code | 1 |
| Scikit-dimension: a Python package for intrinsic dimension estimation | Code | 1 |
| Biomedical Data-to-Text Generation via Fine-Tuning Transformers | Code | 1 |
| Benchmarking the Robustness of Instance Segmentation Models | | 0 |
| Towards Sentiment Analysis of Tobacco Products’ Usage in Social Media | | 0 |
| Benchmarking down-scaled (not so large) pre-trained language models | Code | 0 |
| ReMeDi: Resources for Multi-domain, Multi-service, Medical Dialogues | Code | 1 |
| Cross-Lingual Text Classification of Transliterated Hindi and Malayalam | Code | 0 |
| Europarl-ASR: A Large Corpus of Parliamentary Debates for Streaming ASR Benchmarking and Speech Data Filtering/Verbatimization | | 0 |
| Benchmarking the Accuracy and Robustness of Feedback Alignment Algorithms | | 0 |
| Semi-Supervised Exaggeration Detection of Health Science Press Releases | Code | 1 |
| Tune It or Don't Use It: Benchmarking Data-Efficient Image Classification | Code | 1 |
| BioFors: A Large Biomedical Image Forensics Dataset | Code | 0 |
| KO codes: Inventing Nonlinear Encoding and Decoding for Reliable Wireless Communication via Deep-learning | Code | 1 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |