SOTAVerified

Large Language Model

Papers

Showing 251-300 of 6097 papers

Title | Status | Hype
Evolution of Heuristics: Towards Efficient Automatic Algorithm Design Using Large Language Model | Code | 3
MedRAG: Enhancing Retrieval-augmented Generation with Knowledge Graph-Elicited Reasoning for Healthcare Copilot | Code | 3
DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation | Code | 3
MeshXL: Neural Coordinate Field for Generative 3D Foundation Models | Code | 3
LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory | Code | 3
An Actionable Framework for Assessing Bias and Fairness in Large Language Model Use Cases | Code | 3
Detecting hallucinations in large language models using semantic entropy | Code | 3
LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale | Code | 3
Llemma: An Open Language Model For Mathematics | Code | 3
Deep Learning and LLM-based Methods Applied to Stellar Lightcurve Classification | Code | 3
AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls | Code | 3
Enhancing Decision Analysis with a Large Language Model: pyDecision a Comprehensive Library of MCDA Methods in Python | Code | 3
DARWIN 1.5: Large Language Models as Materials Science Adapted Learners | Code | 3
LHRS-Bot-Nova: Improved Multimodal Large Language Model for Remote Sensing Vision-Language Interpretation | Code | 2
Cross-Tokenizer Distillation via Approximate Likelihood Matching | Code | 2
Libra: Building Decoupled Vision System on Large Language Models | Code | 2
Customization Assistant for Text-to-image Generation | Code | 2
Can Large Language Model Agents Simulate Human Trust Behavior? | Code | 2
LifeGPT: Topology-Agnostic Generative Pretrained Transformer Model for Cellular Automata | Code | 2
CritiqueLLM: Towards an Informative Critique Generation Model for Evaluation of Large Language Model Generation | Code | 2
CVE-Bench: A Benchmark for AI Agents' Ability to Exploit Real-World Web Application Vulnerabilities | Code | 2
Critique-out-Loud Reward Models | Code | 2
Learning What Reinforcement Learning Can't: Interleaved Online Fine-Tuning for Hardest Questions | Code | 2
LifelongAgentBench: Evaluating LLM Agents as Lifelong Learners | Code | 2
CrackSQL: A Hybrid SQL Dialect Translation System Powered by Large Language Models | Code | 2
Large Scale Transfer Learning for Tabular Data via Language Modeling | Code | 2
L-AutoDA: Leveraging Large Language Models for Automated Decision-based Adversarial Attacks | Code | 2
LaVy: Vietnamese Multimodal Large Language Model | Code | 2
AgentSociety Challenge: Designing LLM Agents for User Modeling and Recommendation on Web Platforms | Code | 2
Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast | Code | 2
AgentSims: An Open-Source Sandbox for Large Language Model Evaluation | Code | 2
cadrille: Multi-modal CAD Reconstruction with Online Reinforcement Learning | Code | 2
Large Language Model with Region-guided Referring and Grounding for CT Report Generation | Code | 2
Breaking the Ceiling of the LLM Community by Treating Token Generation as a Classification for Ensembling | Code | 2
Large Language Model Guided Tree-of-Thought | Code | 2
Large Language Model Psychometrics: A Systematic Review of Evaluation, Validation, and Enhancement | Code | 2
Large Language Model Enhanced Recommender Systems: A Survey | Code | 2
AgentReview: Exploring Peer Review Dynamics with LLM Agents | Code | 2
Diff-eRank: A Novel Rank-Based Metric for Evaluating Large Language Models | Code | 2
Large Language Model Safety: A Holistic Survey | Code | 2
Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions | Code | 2
LLMEmb: Large Language Model Can Be a Good Embedding Generator for Sequential Recommendation | Code | 2
Large language models can be zero-shot anomaly detectors for time series? | Code | 2
Language Models Can Improve Event Prediction by Few-Shot Abductive Reasoning | Code | 2
KoSBi: A Dataset for Mitigating Social Bias Risks Towards Safer Large Language Model Application | Code | 2
Language Models can Solve Computer Tasks | Code | 2
CPRet: A Dataset, Benchmark, and Model for Retrieval in Competitive Programming | Code | 2
CyberGym: Evaluating AI Agents' Cybersecurity Capabilities with Real-World Vulnerabilities at Scale | Code | 2
biorecap: an R package for summarizing bioRxiv preprints with a local LLM | Code | 2
KnowCoder: Coding Structured Knowledge into LLMs for Universal Information Extraction | Code | 2

No leaderboard results yet.