
Math

Papers


Showing 1026–1050 of 1596 papers

Title	Date	Tasks	Status	Hype
ConvNLP: Image-based AI Text Detection	Jul 9, 2024	Domain Generalization, Math	Unverified	0
Who is better at math, Jenny or Jingzhen? Uncovering Stereotypes in Large Language Models	Jul 9, 2024	Math	Code Available	0
Solving for X and Beyond: Can Large Language Models Solve Complex Math Problems with More-Than-Two Unknowns?	Jul 6, 2024	Math	Code Available	0
Smart Vision-Language Reasoners	Jul 5, 2024	Math, Mathematical Reasoning	Code Available	0
Helpful assistant or fruitful facilitator? Investigating how personas affect language model behavior	Jul 2, 2024	Language Modeling, Language Modelling	Code Available	0
Advancing Process Verification for Large Language Models via Tree-Based Preference Learning	Jun 29, 2024	Binary Classification, GSM8K	Unverified	0
CMMaTH: A Chinese Multi-modal Math Skill Evaluation Benchmark for Foundation Models	Jun 28, 2024	Diversity, Math	Unverified	0
ScaleBiO: Scalable Bilevel Optimization for LLM Data Reweighting	Jun 28, 2024	Bilevel Optimization, Instruction Following	Unverified	0
DiVERT: Distractor Generation with Variational Errors Represented as Text for Math Multiple-choice Questions	Jun 27, 2024	Distractor Generation, Math	Code Available	0
Task Oriented In-Domain Data Augmentation	Jun 24, 2024	Data Augmentation, Math	Unverified	0
Generative AI for Enhancing Active Learning in Education: A Comparative Study of GPT-3.5 and GPT-4 in Crafting Customized Test Questions	Jun 20, 2024	Active Learning, Math	Unverified	0
Towards Infinite-Long Prefix in Transformer	Jun 20, 2024	Math, parameter-efficient fine-tuning	Code Available	0
Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning	Jun 20, 2024	GSM8K, Heuristic Search	Unverified	0
Can LLMs Reason in the Wild with Programs?	Jun 19, 2024	GSM8K, Math	Code Available	0
Knowledge Tagging System on Math Questions via LLMs with Flexible Demonstration Retriever	Jun 19, 2024	Math, Semantic Similarity	Unverified	0
Navigating the Labyrinth: Evaluating and Enhancing LLMs' Ability to Reason About Search Problems	Jun 18, 2024	In-Context Learning, Math	Unverified	0
GeoGPT4V: Towards Geometric Multi-modal Large Language Models with Geometric Image Generation	Jun 17, 2024	Image Generation, Math	Code Available	0
Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts	Jun 17, 2024	Math	Unverified	0
Program Synthesis Benchmark for Visual Programming in XLogoOnline Environment	Jun 17, 2024	Logical Reasoning, Math	Unverified	0
Exposing the Achilles' Heel: Evaluating LLMs Ability to Handle Mistakes in Mathematical Reasoning	Jun 16, 2024	Benchmarking, Math	Unverified	0
ReMI: A Dataset for Reasoning with Multiple Images	Jun 13, 2024	Chart Understanding, Math	Unverified	0
CLST: Cold-Start Mitigation in Knowledge Tracing by Aligning a Generative Language Model as a Students' Knowledge Tracer	Jun 13, 2024	Domain Generalization, Knowledge Tracing	Unverified	0
Can I understand what I create? Self-Knowledge Evaluation of Large Language Models	Jun 10, 2024	Math	Unverified	0
Human Learning about AI	Jun 8, 2024	Math	Unverified	0
A multi-core periphery perspective: Ranking via relative centrality	Jun 6, 2024	Math	Unverified	0



No leaderboard results yet.