| Unitxt: Flexible, Shareable and Reusable Data Preparation and Evaluation for Generative AI | Jan 25, 2024 | | Code Available | 3 |
| pix2gestalt: Amodal Segmentation by Synthesizing Wholes | Jan 25, 2024 | 3D Reconstruction, Object Recognition | Code Available | 3 |
| Marabou 2.0: A Versatile Formal Analyzer of Neural Networks | Jan 25, 2024 | | Code Available | 3 |
| MoE-Infinity: Efficient MoE Inference on Personal Machines with Sparsity-Aware Expert Cache | Jan 25, 2024 | GPU, model | Code Available | 3 |
| An Extensible Framework for Open Heterogeneous Collaborative Perception | Jan 25, 2024 | | Code Available | 3 |
| AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents | Jan 24, 2024 | Benchmarking | Code Available | 3 |
| VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks | Jan 24, 2024 | | Code Available | 3 |
| Large Language Models are Superpositions of All Characters: Attaining Arbitrary Role-play via Self-Alignment | Jan 23, 2024 | All, Instruction Following | Code Available | 3 |
| Benchmarking LLMs via Uncertainty Quantification | Jan 23, 2024 | Benchmarking, Uncertainty Quantification | Code Available | 3 |
| Lumiere: A Space-Time Diffusion Model for Video Generation | Jan 23, 2024 | Super-Resolution, Text-to-Video Generation | Code Available | 3 |
| Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text | Jan 22, 2024 | | Code Available | 3 |
| In-Context Learning for Extreme Multi-Label Classification | Jan 22, 2024 | Classification, Extreme Multi-Label Classification | Code Available | 3 |
| A Vision-Language Foundation Model to Enhance Efficiency of Chest X-ray Interpretation | Jan 22, 2024 | Benchmarking, Diagnostic | Code Available | 3 |
| MVSFormer++: Revealing the Devil in Transformer's Details for Multi-View Stereo | Jan 22, 2024 | 3D Reconstruction, Depth Estimation | Code Available | 3 |
| Bridging Evolutionary Algorithms and Reinforcement Learning: A Comprehensive Survey on Hybrid Algorithms | Jan 22, 2024 | Evolutionary Algorithms, reinforcement-learning | Code Available | 3 |
| MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer | Jan 18, 2024 | | Code Available | 3 |
| The Manga Whisperer: Automatically Generating Transcriptions for Comics | Jan 18, 2024 | | Code Available | 3 |
| RAP-SAM: Towards Real-Time All-Purpose Segment Anything | Jan 18, 2024 | All, Decoder | Code Available | 3 |
| Remote Sensing ChatGPT: Solving Remote Sensing Tasks with ChatGPT and Visual Models | Jan 17, 2024 | Task Planning | Code Available | 3 |
| SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents | Jan 17, 2024 | Natural Language Visual Grounding | Code Available | 3 |
| GARField: Group Anything with Radiance Fields | Jan 17, 2024 | Scene Understanding | Code Available | 3 |
| Forging Vision Foundation Models for Autonomous Driving: Challenges, Methodologies, and Opportunities | Jan 16, 2024 | Autonomous Driving, NeRF | Code Available | 3 |
| RoHM: Robust Human Motion Reconstruction via Diffusion | Jan 16, 2024 | Denoising | Code Available | 3 |
| Inferflow: an Efficient and Highly Configurable Inference Engine for Large Language Models | Jan 16, 2024 | GPU, Quantization | Code Available | 3 |
| ModernTCN: A Modern Pure Convolution Structure for General Time Series Analysis | Jan 16, 2024 | Time Series, Time Series Analysis | Code Available | 3 |
| AesBench: An Expert Benchmark for Multimodal Large Language Models on Image Aesthetics Perception | Jan 16, 2024 | MLLM Evaluation: Aesthetics | Code Available | 3 |
| A Survey of Resource-efficient LLM and Multimodal Foundation Models | Jan 16, 2024 | Survey | Code Available | 3 |
| MARIO: MAth Reasoning with code Interpreter Output -- A Reproducible Pipeline | Jan 16, 2024 | GSM8K, Math | Code Available | 3 |
| Small LLMs Are Weak Tool Learners: A Multi-LLM Agent | Jan 14, 2024 | Language Modelling, Large Language Model | Code Available | 3 |
| How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs | Jan 12, 2024 | | Code Available | 3 |
| INTERS: Unlocking the Power of Large Language Models in Search with Instruction Tuning | Jan 12, 2024 | Diversity, document understanding | Code Available | 3 |
| GroundingGPT:Language Enhanced Multi-modal Grounding Model | Jan 11, 2024 | Language Modelling, Large Language Model | Code Available | 3 |
| Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs | Jan 11, 2024 | Representation Learning, Self-Supervised Learning | Code Available | 3 |
| AutoAct: Automatic Agent Learning from Scratch for QA via Self-Planning | Jan 10, 2024 | Question Answering | Code Available | 3 |
| Deep learning in motion deblurring: current status, benchmarks and future prospects | Jan 10, 2024 | Deblurring, Deep Learning | Code Available | 3 |
| Evaluating Language Model Agency through Negotiations | Jan 9, 2024 | Decision Making, Language Modeling | Code Available | 3 |
| Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models | Jan 9, 2024 | GPU | Code Available | 3 |
| RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation | Jan 9, 2024 | GPU, Math | Code Available | 3 |
| Universal Time-Series Representation Learning: A Survey | Jan 8, 2024 | Feature Engineering, Representation Learning | Code Available | 3 |
| GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation | Jan 8, 2024 | 3D Generation, Text to 3D | Code Available | 3 |
| MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts | Jan 8, 2024 | Mamba, Mixture-of-Experts | Code Available | 3 |
| Improved motif-scaffolding with SE(3) flow matching | Jan 8, 2024 | Data Augmentation, Diversity | Code Available | 3 |
| DiarizationLM: Speaker Diarization Post-Processing with Large Language Models | Jan 7, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 3 |
| EAT: Self-Supervised Pre-Training with Efficient Audio Transformer | Jan 7, 2024 | Audio Classification, Self-Supervised Learning | Code Available | 3 |
| Pheme: Efficient and Conversational Speech Generation | Jan 5, 2024 | | Code Available | 3 |
| The Rise of Diffusion Models in Time-Series Forecasting | Jan 5, 2024 | Time Series, Time Series Analysis | Code Available | 3 |
| Denoising Vision Transformers | Jan 5, 2024 | Denoising, Depth Estimation | Code Available | 3 |
| Evolution of Heuristics: Towards Efficient Automatic Algorithm Design Using Large Language Model | Jan 4, 2024 | Combinatorial Optimization, Language Modeling | Code Available | 3 |
| Spikformer V2: Join the High Accuracy Club on ImageNet with an SNN Ticket | Jan 4, 2024 | image-classification, Image Classification | Code Available | 3 |
| LLaVA-Phi: Efficient Multi-Modal Assistant with Small Language Model | Jan 4, 2024 | Language Modeling, Language Modelling | Code Available | 3 |