| Fino1: On the Transferability of Reasoning Enhanced LLMs to Finance | Feb 12, 2025 | Benchmarking, Long-Context Understanding | Code Available | 2 |
| ViLa-MIL: Dual-scale Vision-Language Multiple Instance Learning for Whole Slide Image Classification | Feb 12, 2025 | Decoder, Descriptive | Code Available | 2 |
| SARChat-Bench-2M: A Multi-Task Vision-Language Benchmark for SAR Image Interpretation | Feb 12, 2025 | Earth Observation, object-detection | Code Available | 2 |
| Human-Centric Foundation Models: Perception, Generation and Agentic Modeling | Feb 12, 2025 | Survey | Code Available | 2 |
| Cluster and Predict Latents Patches for Improved Masked Image Modeling | Feb 12, 2025 | Representation Learning | Code Available | 2 |
| Brain Latent Progression: Individual-based Spatiotemporal Disease Progression on 3D Brain MRIs via Latent Diffusion | Feb 12, 2025 | | Code Available | 2 |
| The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks | Feb 12, 2025 | | Code Available | 2 |
| LIR-LIVO: A Lightweight, Robust LiDAR/Vision/Inertial Odometry with Illumination-Resilient Deep Features | Feb 12, 2025 | Pose Estimation, Visual Odometry | Code Available | 2 |
| TLOB: A Novel Transformer Model with Dual Attention for Price Trend Prediction with Limit Order Book Data | Feb 12, 2025 | | Code Available | 2 |
| WorldGUI: An Interactive Benchmark for Desktop GUI Automation from Any Starting Point | Feb 12, 2025 | | Code Available | 2 |
| A Systematic Review on the Evaluation of Large Language Models in Theory of Mind Tasks | Feb 12, 2025 | | Code Available | 2 |
| mmE5: Improving Multimodal Multilingual Embeddings via High-quality Synthetic Data | Feb 12, 2025 | cross-modal alignment, Large Language Model | Code Available | 2 |
| TextAtlas5M: A Large-scale Dataset for Dense Text Image Generation | Feb 11, 2025 | Image Generation | Code Available | 2 |
| LASP-2: Rethinking Sequence Parallelism for Linear Attention and Its Hybrid | Feb 11, 2025 | | Code Available | 2 |
| Training Deep Learning Models with Norm-Constrained LMOs | Feb 11, 2025 | Deep Learning | Code Available | 2 |
| MeshSplats: Mesh-Based Rendering with Gaussian Splatting Initialization | Feb 11, 2025 | | Code Available | 2 |
| Semi-Supervised Vision-Centric 3D Occupancy World Model for Autonomous Driving | Feb 11, 2025 | Attribute, Autonomous Driving | Code Available | 2 |
| DPO-Shift: Shifting the Distribution of Direct Preference Optimization | Feb 11, 2025 | | Code Available | 2 |
| Less is More: Masking Elements in Image Condition Features Avoids Content Leakages in Style Transfer Diffusion Models | Feb 11, 2025 | Style Transfer | Code Available | 2 |
| Automated Capability Discovery via Model Self-Exploration | Feb 11, 2025 | model | Code Available | 2 |
| RoboBERT: An End-to-end Multimodal Robotic Manipulation Model | Feb 11, 2025 | Data Augmentation | Code Available | 2 |
| SAMRefiner: Taming Segment Anything Model for Universal Mask Refinement | Feb 10, 2025 | Semantic Segmentation | Code Available | 2 |
| KARMA: Leveraging Multi-Agent LLMs for Automated Knowledge Graph Enrichment | Feb 10, 2025 | Articles, Knowledge Graphs | Code Available | 2 |
| Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning | Feb 10, 2025 | Math, Mathematical Reasoning | Code Available | 2 |
| On the Emergence of Thinking in LLMs I: Searching for the Right Intuition | Feb 10, 2025 | Math | Code Available | 2 |
| TimeKAN: KAN-based Frequency Decomposition Learning Architecture for Long-term Time Series Forecasting | Feb 10, 2025 | Representation Learning, Time Series | Code Available | 2 |
| MaterialFusion: High-Quality, Zero-Shot, and Controllable Material Transfer with Diffusion Models | Feb 10, 2025 | | Code Available | 2 |
| Saving 77% of the Parameters in Large Language Models Technical Report | Feb 9, 2025 | GPU, Text Generation | Code Available | 2 |
| Skill Expansion and Composition in Parameter Space | Feb 9, 2025 | D4RL | Code Available | 2 |
| 3CAD: A Large-Scale Real-World 3C Product Dataset for Unsupervised Anomaly | Feb 9, 2025 | Anomaly Detection, Unsupervised Anomaly Detection | Code Available | 2 |
| Differentially Private Synthetic Data via APIs 3: Using Simulators Instead of Foundation Model | Feb 8, 2025 | Image Generation | Code Available | 2 |
| Event Stream-based Visual Object Tracking: HDETrack V2 and A High-Definition Benchmark | Feb 8, 2025 | Knowledge Distillation, Object Tracking | Code Available | 2 |
| CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging | Feb 8, 2025 | Code Generation, HumanEval | Code Available | 2 |
| Knowledge Graph-Guided Retrieval Augmented Generation | Feb 8, 2025 | Diversity, Hallucination | Code Available | 2 |
| Towards Trustworthy Retrieval Augmented Generation for Large Language Models: A Survey | Feb 8, 2025 | Fairness, RAG | Code Available | 2 |
| Adaptive Graph of Thoughts: Test-Time Adaptive Reasoning Unifying Chain, Tree, and Graph Structures | Feb 7, 2025 | Mathematical Problem-Solving, reinforcement-learning | Code Available | 2 |
| NoLiMa: Long-Context Evaluation Beyond Literal Matching | Feb 7, 2025 | | Code Available | 2 |
| GaussRender: Learning 3D Occupancy with Gaussian Rendering | Feb 7, 2025 | 3D geometry, Autonomous Vehicles | Code Available | 2 |
| QuEST: Stable Training of LLMs with 1-Bit Weights and Activations | Feb 7, 2025 | GPU, Quantization | Code Available | 2 |
| MHAF-YOLO: Multi-Branch Heterogeneous Auxiliary Fusion YOLO for accurate object detection | Feb 7, 2025 | object-detection, Object Detection | Code Available | 2 |
| GSM-Infinite: How Do Your LLMs Behave over Infinitely Increasing Context Length and Reasoning Complexity? | Feb 7, 2025 | 8k, Information Retrieval | Code Available | 2 |
| SiriuS: Self-improving Multi-agent Systems via Bootstrapped Reasoning | Feb 7, 2025 | | Code Available | 2 |
| Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion | Feb 6, 2025 | image-classification, Image Classification | Code Available | 2 |
| Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of Language Models | Feb 6, 2025 | | Code Available | 2 |
| Training Language Models to Reason Efficiently | Feb 6, 2025 | Reinforcement Learning (RL) | Code Available | 2 |
| SoK: Benchmarking Poisoning Attacks and Defenses in Federated Learning | Feb 6, 2025 | Benchmarking, Data Poisoning | Code Available | 2 |
| WaferLLM: Large Language Model Inference at Wafer Scale | Feb 6, 2025 | GPU, Language Modeling | Code Available | 2 |
| ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization | Feb 6, 2025 | Language Modeling, Language Modelling | Code Available | 2 |
| Sparse Autoencoders for Hypothesis Generation | Feb 5, 2025 | | Code Available | 2 |
| On-device Sora: Enabling Training-Free Diffusion-based Text-to-Video Generation for Mobile Devices | Feb 5, 2025 | Denoising, Model Optimization | Code Available | 2 |