SOTAVerified

Large Language Model

Papers

Showing 251–300 of 6097 papers

Title | Status | Hype
Detecting hallucinations in large language models using semantic entropy | Code | 3
LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale | Code | 3
APPL: A Prompt Programming Language for Harmonious Integration of Programs and Large Language Model Prompts | Code | 3
Deep Learning and LLM-based Methods Applied to Stellar Lightcurve Classification | Code | 3
AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls | Code | 3
LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory | Code | 3
Evolution of Heuristics: Towards Efficient Automatic Algorithm Design Using Large Language Model | Code | 3
Chat-Edit-3D: Interactive 3D Scene Editing via Text Prompts | Code | 3
Llemma: An Open Language Model For Mathematics | Code | 3
DARWIN 1.5: Large Language Models as Materials Science Adapted Learners | Code | 3
An Actionable Framework for Assessing Bias and Fairness in Large Language Model Use Cases | Code | 3
M3D: Advancing 3D Medical Image Analysis with Multi-Modal Large Language Models | Code | 3
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling | Code | 3
Cross-Tokenizer Distillation via Approximate Likelihood Matching | Code | 2
Critique-out-Loud Reward Models | Code | 2
Libra: Building Decoupled Vision System on Large Language Models | Code | 2
Customization Assistant for Text-to-image Generation | Code | 2
CritiqueLLM: Towards an Informative Critique Generation Model for Evaluation of Large Language Model Generation | Code | 2
LHRS-Bot-Nova: Improved Multimodal Large Language Model for Remote Sensing Vision-Language Interpretation | Code | 2
LifeGPT: Topology-Agnostic Generative Pretrained Transformer Model for Cellular Automata | Code | 2
Learning What Reinforcement Learning Can't: Interleaved Online Fine-Tuning for Hardest Questions | Code | 2
CrackSQL: A Hybrid SQL Dialect Translation System Powered by Large Language Models | Code | 2
CVE-Bench: A Benchmark for AI Agents' Ability to Exploit Real-World Web Application Vulnerabilities | Code | 2
LifelongAgentBench: Evaluating LLM Agents as Lifelong Learners | Code | 2
L-AutoDA: Leveraging Large Language Models for Automated Decision-based Adversarial Attacks | Code | 2
Large Scale Transfer Learning for Tabular Data via Language Modeling | Code | 2
LaVy: Vietnamese Multimodal Large Language Model | Code | 2
Large Language Model with Region-guided Referring and Grounding for CT Report Generation | Code | 2
AgentSociety Challenge: Designing LLM Agents for User Modeling and Recommendation on Web Platforms | Code | 2
Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast | Code | 2
CPRet: A Dataset, Benchmark, and Model for Retrieval in Competitive Programming | Code | 2
AgentSims: An Open-Source Sandbox for Large Language Model Evaluation | Code | 2
Alphazero-like Tree-Search can Guide Large Language Model Decoding and Training | Code | 2
Large Language Models Play StarCraft II: Benchmarks and A Chain of Summarization Approach | Code | 2
Diff-eRank: A Novel Rank-Based Metric for Evaluating Large Language Models | Code | 2
Large Language Model Enhanced Recommender Systems: A Survey | Code | 2
Large Language Model Guided Tree-of-Thought | Code | 2
Control Industrial Automation System with Large Language Model Agents | Code | 2
AgentReview: Exploring Peer Review Dynamics with LLM Agents | Code | 2
LLMEmb: Large Language Model Can Be a Good Embedding Generator for Sequential Recommendation | Code | 2
Large Language Model Psychometrics: A Systematic Review of Evaluation, Validation, and Enhancement | Code | 2
Language Models can Solve Computer Tasks | Code | 2
Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions | Code | 2
Large Language Model Safety: A Holistic Survey | Code | 2
KoSBi: A Dataset for Mitigating Social Bias Risks Towards Safer Large Language Model Application | Code | 2
Confucius3-Math: A Lightweight High-Performance Reasoning LLM for Chinese K-12 Mathematics Learning | Code | 2
KnowCoder: Coding Structured Knowledge into LLMs for Universal Information Extraction | Code | 2
CyberGym: Evaluating AI Agents' Cybersecurity Capabilities with Real-World Vulnerabilities at Scale | Code | 2
KET-RAG: A Cost-Efficient Multi-Granular Indexing Framework for Graph-RAG | Code | 2
Compiler Optimization via LLM Reasoning for Efficient Model Serving | Code | 2