| Enhancing Large Vision Language Models with Self-Training on Image Comprehension | May 30, 2024 | Image Comprehension, Visual Question Answering | Code Available | 2 |
| Easy Problems That LLMs Get Wrong | May 30, 2024 | Common Sense Reasoning, Logical Reasoning | Code Available | 2 |
| Group Robust Preference Optimization in Reward-free RLHF | May 30, 2024 | | Code Available | 2 |
| Promptus: Can Prompts Streaming Replace Video Streaming with Stable Diffusion | May 30, 2024 | Semantic Communication, Video Compression | Code Available | 2 |
| ANAH: Analytical Annotation of Hallucinations in Large Language Models | May 30, 2024 | Generative Question Answering, Hallucination | Code Available | 2 |
| Recurrent neural network wave functions for Rydberg atom arrays on kagome lattice | May 30, 2024 | | Code Available | 2 |
| N-Dimensional Gaussians for Fitting of High Dimensional Functions | May 30, 2024 | | Code Available | 2 |
| STHN: Deep Homography Estimation for UAV Thermal Geo-localization with Satellite Imagery | May 30, 2024 | Autonomous Navigation, geo-localization | Code Available | 2 |
| Fully-inductive Node Classification on Arbitrary Graphs | May 30, 2024 | Classification, Node Classification | Code Available | 2 |
| Improving the Training of Rectified Flows | May 30, 2024 | Image Generation, Knowledge Distillation | Code Available | 2 |
| LLaMEA: A Large Language Model Evolutionary Algorithm for Automatically Generating Metaheuristics | May 30, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| All-In-One Medical Image Restoration via Task-Adaptive Routing | May 30, 2024 | All, Denoising | Code Available | 2 |
| DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark | May 30, 2024 | DeepFake Detection, Mamba | Code Available | 2 |
| Open-Set Domain Adaptation for Semantic Segmentation | May 30, 2024 | Domain Adaptation, Semantic Segmentation | Code Available | 2 |
| OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving | May 30, 2024 | Autonomous Driving, Decision Making | Code Available | 2 |
| Bridging Model-Based Optimization and Generative Modeling via Conservative Fine-Tuning of Diffusion Models | May 30, 2024 | | Code Available | 2 |
| Self-Exploring Language Models: Active Preference Elicitation for Online Alignment | May 29, 2024 | Instruction Following | Code Available | 2 |
| NeRF On-the-go: Exploiting Uncertainty for Distractor-free NeRFs in the Wild | May 29, 2024 | NeRF | Code Available | 2 |
| CheXpert Plus: Augmenting a Large Chest X-ray Dataset with Text Radiology Reports, Patient Demographics and Additional Image Formats | May 29, 2024 | De-identification, Fairness | Code Available | 2 |
| SketchDeco: Decorating B&W Sketches with Colour | May 29, 2024 | Image Colorization, Image Generation | Code Available | 2 |
| Flow Priors for Linear Inverse Problems via Iterative Corrupted Trajectory Matching | May 29, 2024 | compressed sensing, Deblurring | Code Available | 2 |
| VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos | May 29, 2024 | EgoSchema, MME | Code Available | 2 |
| RNAFlow: RNA Structure & Sequence Design via Inverse Folding-Based Flow Matching | May 29, 2024 | Denoising, Protein Design | Code Available | 2 |
| Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models | May 29, 2024 | Instruction Following, Language Modeling | Code Available | 2 |
| CtrlA: Adaptive Retrieval-Augmented Generation via Inherent Control | May 29, 2024 | RAG, Response Generation | Code Available | 2 |
| Benchmarking and Improving Detail Image Caption | May 29, 2024 | Benchmarking, Image Captioning | Code Available | 2 |
| Compressing Large Language Models using Low Rank and Low Precision Decomposition | May 29, 2024 | Quantization | Code Available | 2 |
| Enhancing Zero-Shot Facial Expression Recognition by LLM Knowledge Transfer | May 29, 2024 | Facial Expression Recognition, Facial Expression Recognition (FER) | Code Available | 2 |
| Can Graph Learning Improve Planning in LLM-based Agents? | May 29, 2024 | Decision Making, Graph Learning | Code Available | 2 |
| Matryoshka Query Transformer for Large Vision-Language Models | May 29, 2024 | Language Modelling, Representation Learning | Code Available | 2 |
| ViG: Linear-complexity Visual Sequence Learning with Gated Linear Attention | May 28, 2024 | GPU, Representation Learning | Code Available | 2 |
| Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference | May 28, 2024 | GPU, Text Generation | Code Available | 2 |
| SleepFM: Multi-modal Representation Learning for Sleep Across Brain Activity, ECG and Respiratory Signals | May 28, 2024 | Contrastive Learning, Representation Learning | Code Available | 2 |
| FreeSplat: Generalizable 3D Gaussian Splatting Towards Free-View Synthesis of Indoor Scenes | May 28, 2024 | Novel View Synthesis, Triplet | Code Available | 2 |
| Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment | May 28, 2024 | cross-modal alignment | Code Available | 2 |
| XTrack: Multimodal Training Boosts RGB-X Video Object Trackers | May 28, 2024 | Inductive Bias, Mixture-of-Experts | Code Available | 2 |
| Safe Multi-Agent Reinforcement Learning with Bilevel Optimization in Autonomous Driving | May 28, 2024 | Autonomous Driving, Bilevel Optimization | Code Available | 2 |
| Long Context is Not Long at All: A Prospector of Long-Dependency Data for Large Language Models | May 28, 2024 | All, Computational Efficiency | Code Available | 2 |
| Frustratingly Easy Test-Time Adaptation of Vision-Language Models | May 28, 2024 | Test-time Adaptation | Code Available | 2 |
| Instruct-ReID++: Towards Universal Purpose Instruction-Guided Person Re-identification | May 28, 2024 | Person Re-Identification, Triplet | Code Available | 2 |
| Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations | May 28, 2024 | GPU | Code Available | 2 |
| Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment | May 28, 2024 | | Code Available | 2 |
| Color Shift Estimation-and-Correction for Image Enhancement | May 28, 2024 | Exposure Correction, Image Enhancement | Code Available | 2 |
| MMPareto: Boosting Multimodal Learning with Innocent Unimodal Assistance | May 28, 2024 | | Code Available | 2 |
| Deform3DGS: Flexible Deformation for Fast Surgical Scene Reconstruction with Gaussian Splatting | May 28, 2024 | | Code Available | 2 |
| FlashST: A Simple and Universal Prompt-Tuning Framework for Traffic Prediction | May 28, 2024 | In-Context Learning, Prediction | Code Available | 2 |
| SoundCTM: Unifying Score-based and Consistency Models for Full-band Text-to-Sound Generation | May 28, 2024 | AudioCaps, Audio Generation | Code Available | 2 |
| FASTopic: Pretrained Transformer is a Fast, Adaptive, Stable, and Transferable Topic Model | May 28, 2024 | Relation, Topic Models | Code Available | 2 |
| Dataset Regeneration for Sequential Recommendation | May 28, 2024 | Recommendation Systems, Sequential Recommendation | Code Available | 2 |
| Adapting Pre-Trained Vision Models for Novel Instance Detection and Segmentation | May 28, 2024 | Instance Segmentation, Object Proposal Generation | Code Available | 2 |