| Let's Fuse Step by Step: A Generative Fusion Decoding Algorithm with LLMs for Multi-modal Text Recognition | May 23, 2024 | Automatic Speech Recognition (ASR) | Code Available | 2 |
| Advancing Spiking Neural Networks for Sequential Modeling with Central Pattern Generators | May 23, 2024 | Image Classification | Code Available | 2 |
| Not All Language Model Features Are Linear | May 23, 2024 | All, Language Modeling | Code Available | 2 |
| DreamText: High Fidelity Scene Text Synthesis | May 23, 2024 | | Code Available | 2 |
| Flatten Anything: Unsupervised Neural Surface Parameterization | May 23, 2024 | | Code Available | 2 |
| Agent Planning with World Knowledge Model | May 23, 2024 | Model, World Knowledge | Code Available | 2 |
| RoPINN: Region Optimized Physics-Informed Neural Networks | May 23, 2024 | | Code Available | 2 |
| Mamba-R: Vision Mamba ALSO Needs Registers | May 23, 2024 | Mamba, Semantic Segmentation | Code Available | 2 |
| AnalogCoder: Analog Circuit Design via Training-Free Code Generation | May 23, 2024 | Code Generation | Code Available | 2 |
| Metric Flow Matching for Smooth Interpolations on the Data Manifold | May 23, 2024 | Trajectory Prediction | Code Available | 2 |
| Calibrated Self-Rewarding Vision Language Models | May 23, 2024 | Hallucination, Language Modelling | Code Available | 2 |
| Drones Help Drones: A Collaborative Framework for Multi-Drone Object Trajectory Prediction and Beyond | May 23, 2024 | 3D Object Detection, Object Detection | Code Available | 2 |
| Fast-DDPM: Fast Denoising Diffusion Probabilistic Models for Medical Image-to-Image Generation | May 23, 2024 | Denoising, Image Denoising | Code Available | 2 |
| TopoLogic: An Interpretable Pipeline for Lane Topology Reasoning on Driving Scenes | May 23, 2024 | Autonomous Driving, Lane Detection | Code Available | 2 |
| Improved Canonicalization for Model Agnostic Equivariance | May 23, 2024 | Contrastive Learning, Model | Code Available | 2 |
| RadarOcc: Robust 3D Occupancy Prediction with 4D Imaging Radar | May 22, 2024 | Autonomous Driving, Prediction | Code Available | 2 |
| Model Editing as a Robust and Denoised variant of DPO: A Case Study on Toxicity | May 22, 2024 | Language Modelling, Model Editing | Code Available | 2 |
| Generalizing Weather Forecast to Fine-grained Temporal Scales via Physics-AI Hybrid Modeling | May 22, 2024 | Weather Forecasting | Code Available | 2 |
| Dense Connector for MLLMs | May 22, 2024 | Video Understanding | Code Available | 2 |
| FedCache 2.0: Federated Edge Learning with Knowledge Caching and Dataset Distillation | May 22, 2024 | Dataset Distillation, Federated Learning | Code Available | 2 |
| Vikhr: Constructing a State-of-the-art Bilingual Open-Source Instruction-Following Large Language Model for Russian | May 22, 2024 | Instruction Following, Language Modeling | Code Available | 2 |
| xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token | May 22, 2024 | Language Modeling | Code Available | 2 |
| ChatScene: Knowledge-Enabled Safety-Critical Scenario Generation for Autonomous Vehicles | May 22, 2024 | Autonomous Driving, Autonomous Vehicles | Code Available | 2 |
| What is Your Data Worth to GPT? LLM-Scale Data Valuation with Influence Functions | May 22, 2024 | Data Valuation, GPU | Code Available | 2 |
| Fine-tuned In-Context Learning Transformers are Excellent Tabular Data Classifiers | May 22, 2024 | In-Context Learning | Code Available | 2 |
| VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding | May 22, 2024 | Dense Video Captioning, Highlight Detection | Code Available | 2 |
| BrainMorph: A Foundational Keypoint Model for Robust and Flexible Brain MRI Registration | May 22, 2024 | | Code Available | 2 |
| Automated Evaluation of Retrieval-Augmented Language Models with Task-Specific Exam Generation | May 22, 2024 | Informativeness, Language Modeling | Code Available | 2 |
| Context and Geometry Aware Voxel Transformer for Semantic Scene Completion | May 22, 2024 | 3D Semantic Scene Completion from a single RGB image | Code Available | 2 |
| Learning Diffusion Priors from Observations by Expectation Maximization | May 22, 2024 | | Code Available | 2 |
| A General Framework for Jersey Number Recognition in Sports Video | May 22, 2024 | Jersey Number Recognition, Scene Text Recognition | Code Available | 2 |
| I2I-Mamba: Multi-modal medical image synthesis via selective state space modeling | May 22, 2024 | Image Generation, Mamba | Code Available | 2 |
| FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition | May 22, 2024 | Image Generation | Code Available | 2 |
| LightningDrag: Lightning Fast and Accurate Drag-based Image Editing Emerging from Videos | May 22, 2024 | | Code Available | 2 |
| CViT: Continuous Vision Transformer for Operator Learning | May 22, 2024 | Operator Learning | Code Available | 2 |
| Large Language Models Meet NLP: A Survey | May 21, 2024 | Survey | Code Available | 2 |
| Mamba in Speech: Towards an Alternative to Self-Attention | May 21, 2024 | Mamba, Speech Enhancement | Code Available | 2 |
| KPConvX: Modernizing Kernel Point Convolution with Kernel Attention | May 21, 2024 | 3D Point Cloud Classification, Semantic Segmentation | Code Available | 2 |
| RaBitQ: Quantizing High-Dimensional Vectors with a Theoretical Error Bound for Approximate Nearest Neighbor Search | May 21, 2024 | Quantization | Code Available | 2 |
| ProtT3: Protein-to-Text Generation for Text-based Protein Understanding | May 21, 2024 | Property Prediction, Question Answering | Code Available | 2 |
| SirLLM: Streaming Infinite Retentive LLM | May 21, 2024 | Text Generation | Code Available | 2 |
| Reducing Transformer Key-Value Cache Size with Cross-Layer Attention | May 21, 2024 | | Code Available | 2 |
| The future of cosmological likelihood-based inference: accelerated high-dimensional parameter estimation and model comparison | May 21, 2024 | Bayesian Inference, CPU | Code Available | 2 |
| FAdam: Adam is a natural gradient optimizer using diagonal empirical Fisher information | May 21, 2024 | Speech Recognition | Code Available | 2 |
| Wav-KAN: Wavelet Kolmogorov-Arnold Networks | May 21, 2024 | Computational Efficiency, Kolmogorov-Arnold Networks | Code Available | 2 |
| LLM Processes: Numerical Predictive Distributions Conditioned on Natural Language | May 21, 2024 | Regression | Code Available | 2 |
| GarmentDreamer: 3DGS Guided Garment Synthesis with Diverse Geometry and Texture Details | May 20, 2024 | 3D Generation, 3D Geometry Prediction | Code Available | 2 |
| Large-Scale Multi-Center CT and MRI Segmentation of Pancreas with Deep Learning | May 20, 2024 | Benchmarking, MRI Segmentation | Code Available | 2 |
| Imp: Highly Capable Large Multimodal Models for Mobile Devices | May 20, 2024 | Quantization, Visual Question Answering | Code Available | 2 |
| DATR: Unsupervised Domain Adaptive Detection Transformer with Dataset-Level Adaptation and Prototypical Alignment | May 20, 2024 | Contrastive Learning, Domain Adaptation | Code Available | 2 |