| LiveBench: A Challenging, Contamination-Limited LLM Benchmark | Jun 27, 2024 | Articles, Instruction Following | Code Available | 5 |
| OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding | Jun 27, 2024 | Decoder, Segmentation | Code Available | 5 |
| Mamba or RWKV: Exploring High-Quality and High-Efficiency Segment Anything Model | Jun 27, 2024 | Mamba, Segmentation | Code Available | 5 |
| ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation | Jun 26, 2024 | Text-to-Video Generation, Video Generation | Code Available | 5 |
| MedCare: Advancing Medical LLMs through Decoupling Clinical Alignment and Knowledge Aggregation | Jun 25, 2024 | Diversity, Natural Language Understanding | Code Available | 5 |
| MixTex: Unambiguous Recognition Should Not Rely Solely on Real Data | Jun 24, 2024 | Data Augmentation, Optical Character Recognition (OCR) | Code Available | 5 |
| Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs | Jun 24, 2024 | Representation Learning, Visual Grounding | Code Available | 5 |
| LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training | Jun 24, 2024 | Mixture-of-Experts | Code Available | 5 |
| ESC-Eval: Evaluating Emotion Support Conversations in Large Language Models | Jun 21, 2024 | | Code Available | 5 |
| Uni-Mol2: Exploring Molecular Pretraining Model at Scale | Jun 21, 2024 | model | Code Available | 5 |
| aeon: a Python toolkit for learning from time series | Jun 20, 2024 | Anomaly Detection, Model Selection | Code Available | 5 |
| EvTexture: Event-driven Texture Enhancement for Video Super-Resolution | Jun 19, 2024 | Event-based vision, Super-Resolution | Code Available | 5 |
| Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts | Jun 18, 2024 | Language Modeling, Language Modelling | Code Available | 5 |
| Improving Text-To-Audio Models with Synthetic Captions | Jun 18, 2024 | AudioCaps, Audio captioning | Code Available | 5 |
| Autoregressive Image Generation without Vector Quantization | Jun 17, 2024 | Image Generation, Quantization | Code Available | 5 |
| τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains | Jun 17, 2024 | | Code Available | 5 |
| From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline | Jun 17, 2024 | Chatbot | Code Available | 5 |
| PyramidMamba: Rethinking Pyramid Feature Fusion with Selective Space State Model for Semantic Segmentation of Remote Sensing Imagery | Jun 16, 2024 | Decoder, Earth Observation | Code Available | 5 |
| 4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities | Jun 13, 2024 | Instance Segmentation, multimodal generation | Code Available | 5 |
| EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts | Jun 13, 2024 | Conditional Image Generation, Image Generation | Code Available | 5 |
| VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks | Jun 12, 2024 | Image Generation, Language Modeling | Code Available | 5 |
| VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs | Jun 11, 2024 | Multiple-choice, Question Answering | Code Available | 5 |
| FLUX: Fast Software-based Communication Overlap On GPUs Through Kernel Fusion | Jun 11, 2024 | GPU | Code Available | 5 |
| Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B | Jun 11, 2024 | Decision Making, GSM8K | Code Available | 5 |
| Zero-shot Image Editing with Reference Imitation | Jun 11, 2024 | Semantic correspondence | Code Available | 5 |
| Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation | Jun 10, 2024 | Conditional Image Generation, Image Generation | Code Available | 5 |
| PatchRefiner: Leveraging Synthetic Data for Real-Domain High-Resolution Monocular Metric Depth Estimation | Jun 10, 2024 | 3D Reconstruction, Autonomous Driving | Code Available | 5 |
| The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models | Jun 9, 2024 | Instruction Following | Code Available | 5 |
| Matching Anything by Segmenting Anything | Jun 6, 2024 | Domain Generalization, Multiple Object Tracking | Code Available | 5 |
| ShareGPT4Video: Improving Video Understanding and Generation with Better Captions | Jun 6, 2024 | Video Captioning, Video Generation | Code Available | 5 |
| Text-to-Image Rectified Flow as Plug-and-Play Priors | Jun 5, 2024 | 3D Generation, Text to 3D | Code Available | 5 |
| Wings: Learning Multimodal LLMs without Text-only Forgetting | Jun 5, 2024 | Question Answering, Visual Question Answering | Code Available | 5 |
| StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning | Jun 5, 2024 | Automatic Speech Recognition (ASR), de-en | Code Available | 5 |
| Parrot: Multilingual Visual Instruction Tuning | Jun 4, 2024 | Mixture-of-Experts | Code Available | 5 |
| PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling | Jun 4, 2024 | | Code Available | 5 |
| AudioLCM: Text-to-Audio Generation with Latent Consistency Models | Jun 1, 2024 | Audio Generation, Audio Synthesis | Code Available | 5 |
| Ovis: Structural Embedding Alignment for Multimodal Large Language Model | May 31, 2024 | Language Modeling, Multimodal Large Language Model | Code Available | 5 |
| Enhancing Efficiency of Safe Reinforcement Learning via Sample Manipulation | May 31, 2024 | MuJoCo, reinforcement-learning | Code Available | 5 |
| Very Low Complexity Speech Synthesis Using Framewise Autoregressive GAN (FARGAN) with Pitch Prediction | May 31, 2024 | Speech Synthesis | Code Available | 5 |
| Xwin-LM: Strong and Scalable Alignment Practice for LLMs | May 30, 2024 | | Code Available | 5 |
| SpinQuant: LLM quantization with learned rotations | May 26, 2024 | Quantization | Code Available | 5 |
| CPsyCoun: A Report-based Multi-turn Dialogue Reconstruction and Evaluation Framework for Chinese Psychological Counseling | May 26, 2024 | | Code Available | 5 |
| DeTikZify: Synthesizing Graphics Programs for Scientific Figures and Sketches with TikZ | May 24, 2024 | Language Modeling, Language Modelling | Code Available | 5 |
| Focus Anywhere for Fine-grained Multi-page Document Understanding | May 23, 2024 | document understanding, Optical Character Recognition (OCR) | Code Available | 5 |
| TimeMixer: Decomposable Multiscale Mixing for Time Series Forecasting | May 23, 2024 | Future prediction, Time Series | Code Available | 5 |
| Improved Distribution Matching Distillation for Fast Image Synthesis | May 23, 2024 | Image Generation | Code Available | 5 |
| PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression | May 23, 2024 | Quantization | Code Available | 5 |
| Awesome Multi-modal Object Tracking | May 23, 2024 | Autonomous Driving, Knowledge Distillation | Code Available | 5 |
| Diffusion for World Modeling: Visual Details Matter in Atari | May 20, 2024 | Image Generation, reinforcement-learning | Code Available | 5 |
| Uni-Mol Docking V2: Towards Realistic and Accurate Binding Pose Prediction | May 20, 2024 | Drug Design, Molecular Docking | Code Available | 5 |