SOTAVerified

16k

Papers

Showing 76–100 of 146 papers

Title | Status | Hype
Bimanual Dexterity for Complex Tasks | | 0
Piecing It All Together: Verifying Multi-Hop Multimodal Claims | | 0
Model Editing for LLMs4Code: How Far are We? | Code | 0
Retrieval or Global Context Understanding? On Many-Shot In-Context Learning for Long-Context Evaluation | Code | 0
Correlation-Aware Select and Merge Attention for Efficient Fine-Tuning and Context Length Extension | | 0
Extending Context Window of Large Language Models from a Distributional Perspective | Code | 0
Using Large Language Models for Generating Smart Contracts for Health Insurance from Textual Policies | | 0
LongIns: A Challenging Long-context Instruction-based Exam for LLMs | | 0
Inferring Pluggable Types with Machine Learning | | 0
GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models | | 0
Code-Switching Red-Teaming: LLM Evaluation for Safety and Multilingual Understanding | Code | 0
An Empirical Study of Mamba-based Language Models | Code | 0
Long Context Alignment with Short Instructions and Synthesized Positions | | 0
FPT: Feature Prompt Tuning for Few-shot Readability Assessment | Code | 0
RU22Fact: Optimizing Evidence for Multilingual Explainable Fact-Checking on Russia-Ukraine Conflict | Code | 0
An AI-Assisted Skincare Routine Recommendation System in XR | | 0
Transformers for Low-Resource Languages: Is Féidir Linn! | | 0
Human Evaluation of English–Irish Transformer-Based NMT | | 0
Divide-Conquer-and-Merge: Memory- and Time-Efficient Holographic Displays | | 0
Calpric: Inclusive and Fine-grain Labeling of Privacy Policies with Crowdsourcing and Active Learning | Code | 0
Detours for Navigating Instructional Videos | | 0
Compositional Zero-Shot Learning for Attribute-Based Object Reference in Human-Robot Interaction | | 0
Beyond Accuracy: Statistical Measures and Benchmark for Evaluation of Representation from Self-Supervised Learning | | 0
Improved prompting and process for writing user personas with LLMs, using qualitative interviews: Capturing behaviour and personality traits of users | | 0
Retrieval meets Long Context Large Language Models | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1Suprime21'"1Unverified