| xVerify: Efficient Answer Verifier for Reasoning Model Evaluations | Apr 14, 2025 | | Code Available | 2 |
| MT-R1-Zero: Advancing LLM-based Machine Translation via R1-Zero-like Reinforcement Learning | Apr 14, 2025 | Machine Translation, Reinforcement Learning (RL) | Code Available | 2 |
| FLOWR: Flow Matching for Structure-Aware De Novo, Interaction- and Fragment-Based Ligand Generation | Apr 14, 2025 | | Code Available | 2 |
| Prior Does Matter: Visual Navigation via Denoising Diffusion Bridge Models | Apr 14, 2025 | Action Generation, Denoising | Code Available | 2 |
| SocioVerse: A World Model for Social Simulation Powered by LLM Agents and A Pool of 10 Million Real-World Users | Apr 14, 2025 | Diversity, Face Alignment | Code Available | 2 |
| NaviDiffusor: Cost-Guided Diffusion Model for Visual Navigation | Apr 14, 2025 | Visual Navigation | Code Available | 2 |
| LLaVA-ReID: Selective Multi-image Questioner for Interactive Person Re-Identification | Apr 14, 2025 | Person Re-Identification | Code Available | 2 |
| LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models | Apr 14, 2025 | Equation Discovery, Memorization | Code Available | 2 |
| Two Heads are Better Than One: Test-time Scaling of Multi-agent Collaborative Reasoning | Apr 14, 2025 | Mathematical Reasoning, mbpp | Code Available | 2 |
| FUSION: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding | Apr 14, 2025 | | Code Available | 2 |
| OctGPT: Octree-based Multiscale Autoregressive Models for 3D Shape Generation | Apr 14, 2025 | 3D Shape Generation | Code Available | 2 |
| Software package for simulations using the coarse-grained CALVADOS model | Apr 14, 2025 | | Code Available | 2 |
| The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer | Apr 14, 2025 | Language Modeling, Language Modelling | Code Available | 2 |
| ClinicalGPT-R1: Pushing reasoning capability of generalist disease diagnosis with large language model | Apr 13, 2025 | Diagnostic, Language Modeling | Code Available | 2 |
| Leveraging Reasoning Model Answers to Enhance Non-Reasoning Model Capability | Apr 13, 2025 | model | Code Available | 2 |
| TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning | Apr 13, 2025 | Question Answering, reinforcement-learning | Code Available | 2 |
| HM-RAG: Hierarchical Multi-Agent Multimodal Retrieval Augmented Generation | Apr 13, 2025 | Multimodal Reasoning, RAG | Code Available | 2 |
| Vision-Language Model for Object Detection and Segmentation: A Review and Evaluation | Apr 13, 2025 | Domain Adaptation, Language Modeling | Code Available | 2 |
| Tokenize Image Patches: Global Context Fusion for Effective Haze Removal in Large Images | Apr 13, 2025 | GPU | Code Available | 2 |
| SegEarth-R1: Geospatial Pixel Reasoning via Large Language Model | Apr 13, 2025 | Language Modeling, Language Modelling | Code Available | 2 |
| Learning Occlusion-Robust Vision Transformers for Real-Time UAV Tracking | Apr 12, 2025 | Knowledge Distillation | Code Available | 2 |
| BlockGaussian: Efficient Large-Scale Scene Novel View Synthesis via Adaptive Block-Based Gaussian Splatting | Apr 12, 2025 | 3DGS, Novel View Synthesis | Code Available | 2 |
| Flux Already Knows -- Activating Subject-Driven Image Generation without Training | Apr 12, 2025 | Image Generation, Virtual Try-on | Code Available | 2 |
| A Comprehensive Survey of Reward Models: Taxonomy, Applications, Challenges, and Future | Apr 12, 2025 | | Code Available | 2 |
| TorchFX: A modern approach to Audio DSP with PyTorch and GPU acceleration | Apr 11, 2025 | Audio Signal Processing, Benchmarking | Code Available | 2 |
| RealCam-Vid: High-resolution Video Dataset with Dynamic Scenes and Metric-scale Camera Movements | Apr 11, 2025 | Video Generation | Code Available | 2 |
| Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning | Apr 11, 2025 | | Code Available | 2 |
| DataMap: A Portable Application for Visualizing High-Dimensional Data | Apr 11, 2025 | | Code Available | 2 |
| Self-Prompting Analogical Reasoning for UAV Object Detection | Apr 11, 2025 | graph construction, object-detection | Code Available | 2 |
| PACT: Pruning and Clustering-Based Token Reduction for Faster Visual Language Models | Apr 11, 2025 | Clustering, Language Modeling | Code Available | 2 |
| SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning | Apr 10, 2025 | | Code Available | 2 |
| Why We Feel: Breaking Boundaries in Emotional Reasoning with Multimodal Large Language Models | Apr 10, 2025 | Emotion Interpretation, Emotion Recognition | Code Available | 2 |
| VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning | Apr 10, 2025 | Math, Multimodal Reasoning | Code Available | 2 |
| P2Object: Single Point Supervised Object Detection and Instance Segmentation | Apr 10, 2025 | Instance Segmentation, Multiple Instance Learning | Code Available | 2 |
| GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation | Apr 10, 2025 | Contrastive Learning, Language Modeling | Code Available | 2 |
| SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models | Apr 10, 2025 | Reinforcement Learning (RL), Visual Reasoning | Code Available | 2 |
| MM-IFEngine: Towards Multimodal Instruction Following | Apr 10, 2025 | Instruction Following | Code Available | 2 |
| Findings of the BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora | Apr 10, 2025 | | Code Available | 2 |
| LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation | Apr 10, 2025 | Code Generation, Continual Learning | Code Available | 2 |
| SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement | Apr 10, 2025 | Knowledge Distillation, Visual Reasoning | Code Available | 2 |
| Compositional Flows for 3D Molecule and Synthesis Pathway Co-design | Apr 10, 2025 | Drug Design | Code Available | 2 |
| LLM4Ranking: An Easy-to-use Framework of Utilizing Large Language Models for Document Reranking | Apr 10, 2025 | Reranking, Retrieval-augmented Generation | Code Available | 2 |
| OmniCaptioner: One Captioner to Rule Them All | Apr 9, 2025 | All, Image Captioning | Code Available | 2 |
| AssistanceZero: Scalably Solving Assistance Games | Apr 9, 2025 | Imitation Learning, Minecraft | Code Available | 2 |
| ColorizeDiffusion v2: Enhancing Reference-based Sketch Colorization Through Separating Utilities | Apr 9, 2025 | Colorization, Sketch Colorization | Code Available | 2 |
| Generalized Semantic Contrastive Learning via Embedding Side Information for Few-Shot Object Detection | Apr 9, 2025 | Contrastive Learning, counterfactual | Code Available | 2 |
| Objaverse++: Curated 3D Object Dataset with Quality Annotations | Apr 9, 2025 | 3D Generation, Attribute | Code Available | 2 |
| InteractRank: Personalized Web-Scale Search Pre-Ranking with Cross Interaction Features | Apr 9, 2025 | Computational Efficiency | Code Available | 2 |
| TASTE: Text-Aligned Speech Tokenization and Embedding for Spoken Language Modeling | Apr 9, 2025 | Language Modeling, Language Modelling | Code Available | 2 |
| Rethinking LayerNorm in Image Restoration Transformers | Apr 9, 2025 | Image Restoration | Code Available | 2 |