SOTAVerified

Benchmarking

Papers

Showing 611–620 of 5548 papers

Title | Status | Hype
Cross-Modal Bidirectional Interaction Model for Referring Remote Sensing Image Segmentation | Code | 1
Benchmarking Graph Neural Networks on Dynamic Link Prediction | Code | 1
MatTools: Benchmarking Large Language Models for Materials Science Tools | Code | 1
Benchmarking Knowledge Boundary for Large Language Models: A Different Perspective on Model Evaluation | Code | 1
CRoW: Benchmarking Commonsense Reasoning in Real-World Tasks | Code | 1
CySecBench: Generative AI-based CyberSecurity-focused Prompt Dataset for Benchmarking Large Language Models | Code | 1
Data Splits and Metrics for Method Benchmarking on Surgical Action Triplet Datasets | Code | 1
Benchmarking Geospatial Question Answering Engines using the Dataset GeoQuestions1089 | Code | 1
Amharic LLaMA and LLaVA: Multimodal LLMs for Low Resource Languages | Code | 1
Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization | Code | 1
Page 62 of 555

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified