| RuleKit 2: Faster and simpler rule learning | Apr 29, 2025 | Descriptive | Code Available | 2 | 5 |
| Segment Anything for Histopathology | Feb 1, 2025 | Image Segmentation, Instance Segmentation | Code Available | 2 | 5 |
| High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning | Jul 8, 2025 | MME, Reinforcement Learning (RL) | Code Available | 2 | 5 |
| Seeing World Dynamics in a Nutshell | Feb 5, 2025 | Video Reconstruction | Code Available | 2 | 5 |
| Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of Language Models | Feb 6, 2025 | | Code Available | 2 | 5 |
| KET-RAG: A Cost-Efficient Multi-Granular Indexing Framework for Graph-RAG | Feb 13, 2025 | Knowledge Graphs, Large Language Model | Code Available | 2 | 5 |
| Training Turn-by-Turn Verifiers for Dialogue Tutoring Agents: The Curious Case of LLMs as Your Coding Tutors | Feb 18, 2025 | Code Generation, Knowledge Tracing | Code Available | 2 | 5 |
| Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization | Feb 18, 2025 | Image Retrieval, Question Answering | Code Available | 2 | 5 |
| E2EC: An End-to-End Contour-based Method for High-Quality High-Speed Instance Segmentation | Mar 8, 2022 | GPU, Instance Segmentation | Code Available | 2 | 5 |
| A Survey on Data Contamination for Large Language Models | Feb 20, 2025 | Survey, Text Generation | Code Available | 2 | 5 |
| MaterialFusion: High-Quality, Zero-Shot, and Controllable Material Transfer with Diffusion Models | Feb 10, 2025 | | Code Available | 2 | 5 |
| PIP-KAG: Mitigating Knowledge Conflicts in Knowledge-Augmented Generation via Parametric Pruning | Feb 21, 2025 | Hallucination | Code Available | 2 | 5 |
| voc2vec: A Foundation Model for Non-Verbal Vocalization | Feb 22, 2025 | model | Code Available | 2 | 5 |
| WebGames: Challenging General-Purpose Web-Browsing AI Agents | Feb 25, 2025 | | Code Available | 2 | 5 |
| FlexVAR: Flexible Visual Autoregressive Modeling without Residual Prediction | Feb 27, 2025 | Image Generation, Prediction | Code Available | 2 | 5 |
| AgentSociety Challenge: Designing LLM Agents for User Modeling and Recommendation on Web Platforms | Feb 26, 2025 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| Automatic database description generation for Text-to-SQL | Feb 28, 2025 | Text to SQL, Text-To-SQL | Code Available | 2 | 5 |
| UL-UNAS: Ultra-Lightweight U-Nets for Real-Time Speech Enhancement via Network Architecture Search | Mar 1, 2025 | Neural Architecture Search, Speech Enhancement | Code Available | 2 | 5 |
| LongProLIP: A Probabilistic Vision-Language Model with Long Context Text | Mar 11, 2025 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| An Approach for Air Drawing Using Background Subtraction and Contour Extraction | Mar 3, 2025 | Hand Detection, Optical Character Recognition (OCR) | Code Available | 2 | 5 |
| Interactive Debugging and Steering of Multi-Agent AI Systems | Mar 3, 2025 | AI Agent | Code Available | 2 | 5 |
| MPO: Boosting LLM Agents with Meta Plan Optimization | Mar 4, 2025 | | Code Available | 2 | 5 |
| Text2LIVE: Text-Driven Layered Image and Video Editing | Apr 5, 2022 | Video Editing | Code Available | 2 | 5 |
| Similarity-Guided Layer-Adaptive Vision Transformer for UAV Tracking | Mar 9, 2025 | Visual Tracking | Code Available | 2 | 5 |
| GigaSLAM: Large-Scale Monocular SLAM with Hierarchical Gaussian Splats | Mar 11, 2025 | 3DGS, NeRF | Code Available | 2 | 5 |
| Is CLIP ideal? No. Can we fix it? Yes! | Mar 10, 2025 | Attribute, Negation | Code Available | 2 | 5 |
| Word2World: Generating Stories and Worlds through Large Language Models | May 6, 2024 | Game Design | Code Available | 2 | 5 |
| LLM-FP4: 4-Bit Floating-Point Quantized Transformers | Oct 25, 2023 | Common Sense Reasoning, Quantization | Code Available | 2 | 5 |
| OVTR: End-to-End Open-Vocabulary Multiple Object Tracking with Transformer | Mar 13, 2025 | Decoder, multimodal interaction | Code Available | 2 | 5 |
| RouterEval: A Comprehensive Benchmark for Routing LLMs to Explore Model-level Scaling Up in LLMs | Mar 8, 2025 | Instruction Following, Mathematical Reasoning | Code Available | 2 | 5 |
| A Comprehensive Survey on Knowledge Distillation | Mar 15, 2025 | Knowledge Distillation, Survey | Code Available | 2 | 5 |
| TimberTrek: Exploring and Curating Sparse Decision Trees with Interactive Visualization | Sep 19, 2022 | | Code Available | 2 | 5 |
| LeanVAE: An Ultra-Efficient Reconstruction VAE for Video Diffusion Models | Mar 18, 2025 | compressed sensing, Video Generation | Code Available | 2 | 5 |
| MambaIC: State Space Models for High-Performance Learned Image Compression | Mar 16, 2025 | Image Compression, State Space Models | Code Available | 2 | 5 |
| Single Image Iterative Subject-driven Generation and Editing | Mar 20, 2025 | Image Generation | Code Available | 2 | 5 |
| NuiScene: Exploring Efficient Generation of Unbounded Outdoor Scenes | Mar 20, 2025 | Scene Generation | Code Available | 2 | 5 |
| SaMam: Style-aware State Space Model for Arbitrary Image Style Transfer | Mar 20, 2025 | Decoder, Mamba | Code Available | 2 | 5 |
| Splat-LOAM: Gaussian Splatting LiDAR Odometry and Mapping | Mar 21, 2025 | GPU, Motion Estimation | Code Available | 2 | 5 |
| Correcting Deviations from Normality: A Reformulated Diffusion Model for Multi-Class Unsupervised Anomaly Detection | Mar 25, 2025 | Anomaly Detection, Unsupervised Anomaly Detection | Code Available | 2 | 5 |
| Datasets for Depression Modeling in Social Media: An Overview | Mar 27, 2025 | | Code Available | 2 | 5 |
| AutoEval: Autonomous Evaluation of Generalist Robot Manipulation Policies in the Real World | Mar 31, 2025 | Robot Manipulation, Scheduling | Code Available | 2 | 5 |
| On-device Sora: Enabling Training-Free Diffusion-based Text-to-Video Generation for Mobile Devices | Mar 31, 2025 | Denoising, Model Optimization | Code Available | 2 | 5 |
| Efficient Federated Learning Tiny Language Models for Mobile Network Feature Prediction | Apr 2, 2025 | Federated Learning | Code Available | 2 | 5 |
| An Illusion of Progress? Assessing the Current State of Web Agents | Apr 2, 2025 | | Code Available | 2 | 5 |
| Re-thinking Temporal Search for Long-Form Video Understanding | Apr 3, 2025 | Computational Efficiency, Form | Code Available | 2 | 5 |
| A Decade of Deep Learning for Remote Sensing Spatiotemporal Fusion: Advances, Challenges, and Opportunities | Apr 1, 2025 | | Code Available | 2 | 5 |
| Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting | Apr 7, 2025 | Boundary Detection, Object | Code Available | 2 | 5 |
| VocalNet: Speech LLM with Multi-Token Prediction for Faster and High-Quality Generation | Apr 5, 2025 | | Code Available | 2 | 5 |
| Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models | Mar 21, 2025 | GSM8K, Question Answering | Code Available | 2 | 5 |
| LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models | Apr 14, 2025 | Equation Discovery, Memorization | Code Available | 2 | 5 |