| GenAI Arena: An Open Evaluation Platform for Generative Models | Jun 6, 2024 | Image Generation, Instruction Following | Code Available | 2 |
| DIRECT-3D: Learning Direct Text-to-3D Generation on Massive Noisy 3D Data | Jun 6, 2024 | 3D Generation, Text to 3D | Code Available | 2 |
| Parameter-Inverted Image Pyramid Networks | Jun 6, 2024 | Computational Efficiency, image-classification | Code Available | 2 |
| Superpoint Gaussian Splatting for Real-Time High-Fidelity Dynamic Scene Reconstruction | Jun 6, 2024 | NeRF | Code Available | 2 |
| GLACE: Global Local Accelerated Coordinate Encoding | Jun 6, 2024 | Camera Pose Estimation, Pose Estimation | Code Available | 2 |
| VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling | Jun 6, 2024 | Diversity, Music Generation | Code Available | 2 |
| Latent Neural Operator for Solving Forward and Inverse PDE Problems | Jun 6, 2024 | Computational Efficiency, GPU | Code Available | 2 |
| PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM | Jun 5, 2024 | Language Modelling, Large Language Model | Code Available | 2 |
| Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion | Jun 5, 2024 | 3D Generation, 3D Reconstruction | Code Available | 2 |
| DiffCut: Catalyzing Zero-Shot Semantic Segmentation with Diffusion Features and Recursive Normalized Cut | Jun 5, 2024 | Image Segmentation, Segmentation | Code Available | 2 |
| Combinatorial Optimization with Automated Graph Neural Networks | Jun 5, 2024 | Combinatorial Optimization, Graph Embedding | Code Available | 2 |
| DriVLMe: Enhancing LLM-based Autonomous Driving Agents with Embodied and Social Experiences | Jun 5, 2024 | Autonomous Driving, Autonomous Vehicles | Code Available | 2 |
| L-PR: Exploiting LiDAR Fiducial Marker for Unordered Low Overlap Multiview Point Cloud Registration | Jun 5, 2024 | 3D geometry, Point Cloud Registration | Code Available | 2 |
| A-Bench: Are LMMs Masters at Evaluating AI-generated Images? | Jun 5, 2024 | | Code Available | 2 |
| GSGAN: Adversarial Learning for Hierarchical Generation of 3D Gaussian Splats | Jun 5, 2024 | 3D-Aware Image Synthesis, 3D Generation | Code Available | 2 |
| FedPylot: Navigating Federated Learning for Real-Time Object Detection in Internet of Vehicles | Jun 5, 2024 | Autonomous Driving, Autonomous Vehicles | Code Available | 2 |
| Enhancing 3D Lane Detection and Topology Reasoning with 2D Lane Priors | Jun 5, 2024 | 3D Lane Detection, Autonomous Driving | Code Available | 2 |
| Efficient Minimum Bayes Risk Decoding using Low-Rank Matrix Completion Algorithms | Jun 5, 2024 | Low-Rank Matrix Completion, Machine Translation | Code Available | 2 |
| Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for Large Language Models | Jun 5, 2024 | Diversity, Language Modeling | Code Available | 2 |
| When Spiking neural networks meet temporal attention image decoding and adaptive spiking neuron | Jun 5, 2024 | | Code Available | 2 |
| Audio Mamba: Bidirectional State Space Model for Audio Representation Learning | Jun 5, 2024 | Audio Classification, Classification | Code Available | 2 |
| Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models | Jun 5, 2024 | Few-Shot Learning, Language Modeling | Code Available | 2 |
| U-KAN Makes Strong Backbone for Medical Image Segmentation and Generation | Jun 5, 2024 | Image Segmentation, Kolmogorov-Arnold Networks | Code Available | 2 |
| ProGEO: Generating Prompts through Image-Text Contrastive Learning for Visual Geo-localization | Jun 4, 2024 | geo-localization, Visual Place Recognition | Code Available | 2 |
| Stable-Pose: Leveraging Transformers for Pose-Guided Text-to-Image Generation | Jun 4, 2024 | Image Generation, Text to Image Generation | Code Available | 2 |
| A Temporal Kolmogorov-Arnold Transformer for Time Series Forecasting | Jun 4, 2024 | Decoder, Kolmogorov-Arnold Networks | Code Available | 2 |
| Block Transformer: Global-to-Local Language Modeling for Fast Inference | Jun 4, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Multi-target stain normalization for histology slides | Jun 4, 2024 | | Code Available | 2 |
| XRec: Large Language Models for Explainable Recommendation | Jun 4, 2024 | Collaborative Filtering, Decision Making | Code Available | 2 |
| Generative Active Learning for Long-tailed Instance Segmentation | Jun 4, 2024 | Active Learning, Instance Segmentation | Code Available | 2 |
| Multi-Stage Speech Bandwidth Extension with Flexible Sampling Rate Control | Jun 4, 2024 | Bandwidth Extension, CPU | Code Available | 2 |
| Extended Mind Transformers | Jun 4, 2024 | Common Sense Reasoning, counterfactual | Code Available | 2 |
| From Redundancy to Relevance: Information Flow in LVLMs Across Reasoning Tasks | Jun 4, 2024 | Image Captioning, Language Modelling | Code Available | 2 |
| Demystifying the Compression of Mixture-of-Experts Through a Unified Framework | Jun 4, 2024 | Mixture-of-Experts | Code Available | 2 |
| GrootVL: Tree Topology is All You Need in State Space Model | Jun 4, 2024 | All, image-classification | Code Available | 2 |
| ReLU-KAN: New Kolmogorov-Arnold Networks that Only Need Matrix Addition, Dot Multiplication, and ReLU | Jun 4, 2024 | Kolmogorov-Arnold Networks | Code Available | 2 |
| SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices | Jun 4, 2024 | Text Generation | Code Available | 2 |
| MidiCaps: A large-scale MIDI dataset with text captions | Jun 4, 2024 | Information Retrieval, Music Information Retrieval | Code Available | 2 |
| GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer | Jun 3, 2024 | 3D Object Detection, Image-to-Image Translation | Code Available | 2 |
| Long and Short Guidance in Score identity Distillation for One-Step Text-to-Image Generation | Jun 3, 2024 | Image Generation, Text to Image Generation | Code Available | 2 |
| Demystifying AI Platform Design for Distributed Inference of Next-Generation LLM models | Jun 3, 2024 | Chunking, Mamba | Code Available | 2 |
| SUBLLM: A Novel Efficient Architecture with Token Sequence Subsampling for LLM | Jun 3, 2024 | Decoder, GPU | Code Available | 2 |
| A Diffusion Model Framework for Unsupervised Neural Combinatorial Optimization | Jun 3, 2024 | Combinatorial Optimization | Code Available | 2 |
| FreeTumor: Advance Tumor Segmentation via Large-Scale Tumor Synthesis | Jun 3, 2024 | Segmentation, Tumor Segmentation | Code Available | 2 |
| TCMBench: A Comprehensive Benchmark for Evaluating Large Language Models in Traditional Chinese Medicine | Jun 3, 2024 | Benchmarking, Question Answering | Code Available | 2 |
| EduNLP: Towards a Unified and Modularized Library for Educational Resources | Jun 3, 2024 | | Code Available | 2 |
| Boosting Vision-Language Models with Transduction | Jun 3, 2024 | Few-Shot Learning, Transductive Learning | Code Available | 2 |
| Two Tales of Persona in LLMs: A Survey of Role-Playing and Personalization | Jun 3, 2024 | Survey | Code Available | 2 |
| Tetrahedron Splatting for 3D Generation | Jun 3, 2024 | 3D Generation, 3DGS | Code Available | 2 |
| The Geometry of Categorical and Hierarchical Concepts in Large Language Models | Jun 3, 2024 | Language Modelling, Large Language Model | Code Available | 2 |