SOTAVerified

Benchmarking

Papers

Showing 2001-2050 of 5548 papers

Title | Status | Hype
Benchmarking and Improving Generator-Validator Consistency of Language Models | | 0
Deciphering the Definition of Adversarial Robustness for post-hoc OOD Detectors | | 0
Applicability and Challenges of Deep Reinforcement Learning for Satellite Frequency Plan Design | | 0
Decoding Complexity: Intelligent Pattern Exploration with CHPDA (Context Aware Hybrid Pattern Detection Algorithm) | | 0
Decoding the Diversity: A Review of the Indic AI Research Landscape | | 0
Certifying almost all quantum states with few single-qubit measurements | | 0
Certified Adversarial Defenses Meet Out-of-Distribution Corruptions: Benchmarking Robustness and Simple Baselines | | 0
An efficient and perceptually motivated auditory neural encoding and decoding algorithm for spiking neural networks | | 0
CellCycleGAN: Spatiotemporal Microscopy Image Synthesis of Cell Populations using Statistical Shape Models and Conditional GANs | | 0
Benchmarking Foundation Speech and Language Models for Alzheimer's Disease and Related Dementia Detection from Spontaneous Speech | | 0
Establishing Reliability Metrics for Reward Models in Large Language Models | | 0
Deep Convolutional Generative Adversarial Network Based Food Recognition Using Partially Labeled Data | | 0
CDTB: A Color and Depth Visual Object Tracking Dataset and Benchmark | | 0
Deep Crowd Anomaly Detection: State-of-the-Art, Challenges, and Future Research Directions | | 0
An efficiency analysis of Spanish airports | | 0
Deep Diffusion Models and Unsupervised Hyperspectral Unmixing for Realistic Abundance Map Synthesis | | 0
Estimating Task Completion Times for Network Rollouts using Statistical Models within Partitioning-based Regression Methods | | 0
DeepEdgeBench: Benchmarking Deep Neural Networks on Edge Devices | | 0
Estimating the Effect of Crosstalk Error on Circuit Fidelity Using Noisy Intermediate-Scale Quantum Devices | | 0
Deeper Insights into the Robustness of ViTs towards Common Corruptions | | 0
DeepFake Doctor: Diagnosing and Treating Audio-Video Fake Detection | | 0
EuroCon: Benchmarking Parliament Deliberation for Political Consensus Finding | | 0
Evaluating Cultural and Social Awareness of LLM Web Agents | | 0
Evaluating the Performance of Large Language Models via Debates | | 0
Deep Generative Models for Physiological Signals: A Systematic Literature Review | | 0
Deep Hedging of Long-Term Financial Derivatives | | 0
Evolutionary Multimodal Optimization: A Short Survey | | 0
Deep Imputation of Missing Values in Time Series Health Data: A Review with Benchmarking | | 0
CayleyPy RL: Pathfinding and Reinforcement Learning on Cayley Graphs | | 0
Deep Learning and Knowledge-Based Methods for Computer Aided Molecular Design -- Toward a Unified Approach: State-of-the-Art and Future Directions | | 0
Benchmarking and Evaluation of AI Models in Biology: Outcomes and Recommendations from the CZI Virtual Cells Workshop | | 0
An EEG-based Stereoscopic Research to Reveal the Brain's Response to What Happens Before and After Watching 2D and 3D Movies | | 0
Deep learning for action spotting in association football videos | | 0
CausalRivers -- Scaling up benchmarking of causal discovery for real-world time-series | | 0
Deep learning for extracting protein-protein interactions from biomedical literature | | 0
Deep learning for molecular design - a review of the state of the art | | 0
Optimal Design of Volt/VAR Control Rules of Inverters using Deep Learning | | 0
Deep Learning for Virtual Screening: Five Reasons to Use ROC Cost Functions | | 0
Benchmarking and Error Diagnosis in Multi-Instance Pose Estimation | | 0
Causal Reasoning Meets Visual Representation Learning: A Prospective Study | | 0
Benchmarking and Enhancing Surgical Phase Recognition Models for Robotic-Assisted Esophagectomy | | 0
Deep Learning of Intrinsically Motivated Options in the Arcade Learning Environment | | 0
Deep Learning vs. Gradient Boosting: Benchmarking state-of-the-art machine learning algorithms for credit scoring | | 0
Deeply Supervised Depth Map Super-Resolution as Novel View Synthesis | | 0
Benchmarking Graph Learning for Drug-Drug Interaction Prediction | | 0
Deep Nets: What have they ever done for Vision? | | 0
An Early Warning Sign of Critical Transition in The Antarctic Ice Sheet -- A Data Driven Tool for Spatiotemporal Tipping Point | | 0
A Dataset for Movie Description | | 0
Benchmarking and Enhancing Disentanglement in Concept-Residual Models | | 0
EnzChemRED, a rich enzyme chemistry relation extraction dataset | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified