| An Image Grid Can Be Worth a Video: Zero-shot Video Question Answering Using a VLM | Mar 27, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| A Semi-supervised Nighttime Dehazing Baseline with Spatial-Frequency Aware and Realistic Brightness Constraint | Mar 27, 2024 | Image Dehazing, Pseudo Label | Code Available | 2 |
| IDGenRec: LLM-RecSys Alignment with Textual ID Learning | Mar 27, 2024 | Sequential Recommendation, Text Generation | Code Available | 2 |
| Attention Calibration for Disentangled Text-to-Image Personalization | Mar 27, 2024 | Image Generation, Novel Concepts | Code Available | 2 |
| Generative Medical Segmentation | Mar 27, 2024 | Decoder, Domain Generalization | Code Available | 2 |
| Physical 3D Adversarial Attacks against Monocular Depth Estimation in Autonomous Driving | Mar 26, 2024 | Adversarial Attack, Autonomous Driving | Code Available | 2 |
| Building Bridges across Spatial and Temporal Resolutions: Reference-Based Super-Resolution via Change Priors and Conditional Diffusion Model | Mar 26, 2024 | Denoising, Reference-based Super-Resolution | Code Available | 2 |
| Move as You Say, Interact as You Can: Language-guided Human Motion Generation with Scene Affordance | Mar 26, 2024 | Motion Generation, Motion Synthesis | Code Available | 2 |
| Multi-Task Dense Prediction via Mixture of Low-Rank Experts | Mar 26, 2024 | Decoder, Mixture-of-Experts | Code Available | 2 |
| EgoLifter: Open-world 3D Segmentation for Egocentric Perception | Mar 26, 2024 | 3D Reconstruction, Object | Code Available | 2 |
| A Survey on 3D Egocentric Human Pose Estimation | Mar 26, 2024 | 3D Human Pose Estimation, Egocentric Pose Estimation | Code Available | 2 |
| Fully-fused Multi-Layer Perceptrons on Intel Data Center GPUs | Mar 26, 2024 | GPU, Image Compression | Code Available | 2 |
| MIND Your Language: A Multilingual Dataset for Cross-lingual News Recommendation | Mar 26, 2024 | Cross-Lingual Transfer, Language Modelling | Code Available | 2 |
| OmniVid: A Generative Framework for Universal Video Understanding | Mar 26, 2024 | Action Recognition, Decoder | Code Available | 2 |
| BVR Gym: A Reinforcement Learning Environment for Beyond-Visual-Range Air Combat | Mar 26, 2024 | | Code Available | 2 |
| Mechanistic Design and Scaling of Hybrid Architectures | Mar 26, 2024 | Mamba | Code Available | 2 |
| Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models | Mar 26, 2024 | | Code Available | 2 |
| Unsupervised Learning for Joint Beamforming Design in RIS-aided ISAC Systems | Mar 26, 2024 | Integrated sensing and communication, ISAC | Code Available | 2 |
| Have Faith in Faithfulness: Going Beyond Circuit Overlap When Finding Model Mechanisms | Mar 26, 2024 | Language Modelling | Code Available | 2 |
| Efficient Video Object Segmentation via Modulated Cross-Attention Memory | Mar 26, 2024 | GPU, Object | Code Available | 2 |
| AID: Attention Interpolation of Text-to-Image Diffusion | Mar 26, 2024 | Spatial Interpolation | Code Available | 2 |
| Efficient Image Pre-Training with Siamese Cropped Masked Autoencoders | Mar 26, 2024 | Object, Self-Supervised Learning | Code Available | 2 |
| LaRE^2: Latent Reconstruction Error Based Method for Diffusion-Generated Image Detection | Mar 26, 2024 | Image Generation | Code Available | 2 |
| Continuous, Subject-Specific Attribute Control in T2I Models by Identifying Semantic Directions | Mar 25, 2024 | Attribute | Code Available | 2 |
| VMRNN: Integrating Vision Mamba and LSTM for Efficient and Accurate Spatiotemporal Forecasting | Mar 25, 2024 | Mamba | Code Available | 2 |
| An End-to-End Structure with Novel Position Mechanism and Improved EMD for Stock Forecasting | Mar 25, 2024 | Position, Time Series | Code Available | 2 |
| RepairAgent: An Autonomous, LLM-Based Agent for Program Repair | Mar 25, 2024 | Language Modelling, Large Language Model | Code Available | 2 |
| Animal Avatars: Reconstructing Animatable 3D Animals from Casual Videos | Mar 25, 2024 | 3D Reconstruction, Animal Pose Estimation | Code Available | 2 |
| AI-Generated Video Detection via Spatio-Temporal Anomaly Learning | Mar 25, 2024 | Optical Flow Estimation | Code Available | 2 |
| Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance | Mar 25, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Invertible Diffusion Models for Compressed Sensing | Mar 25, 2024 | compressed sensing, GPU | Code Available | 2 |
| Is Your LiDAR Placement Optimized for 3D Scene Understanding? | Mar 25, 2024 | 3D Object Detection, LIDAR Semantic Segmentation | Code Available | 2 |
| QKFormer: Hierarchical Spiking Transformer using Q-K Attention | Mar 25, 2024 | | Code Available | 2 |
| Visually Guided Generative Text-Layout Pre-training for Document Intelligence | Mar 25, 2024 | Document Classification, document understanding | Code Available | 2 |
| DeGCN: Deformable Graph Convolutional Networks for Skeleton-Based Action Recognition | Mar 25, 2024 | Action Recognition, Skeleton Based Action Recognition | Code Available | 2 |
| LSTTN: A Long-Short Term Transformer-based Spatio-temporal Neural Network for Traffic Flow Forecasting | Mar 25, 2024 | | Code Available | 2 |
| Understanding Long Videos with Multimodal Language Models | Mar 25, 2024 | Action Recognition, Fine-grained Action Recognition | Code Available | 2 |
| TwinLiteNetPlus: A Stronger Model for Real-time Drivable Area and Lane Segmentation | Mar 25, 2024 | Autonomous Driving, Drivable Area Detection | Code Available | 2 |
| Grappa -- A Machine Learned Molecular Mechanics Force Field | Mar 25, 2024 | Computational Efficiency | Code Available | 2 |
| Text-IF: Leveraging Semantic Text Guidance for Degradation-Aware and Interactive Image Fusion | Mar 25, 2024 | Decoder | Code Available | 2 |
| DreamLIP: Language-Image Pre-training with Long Captions | Mar 25, 2024 | Contrastive Learning, Image-text Retrieval | Code Available | 2 |
| Composed Video Retrieval via Enriched Context and Discriminative Embeddings | Mar 25, 2024 | Composed Video Retrieval (CoVR), Retrieval | Code Available | 2 |
| Calib3D: Calibrating Model Preferences for Reliable 3D Scene Understanding | Mar 25, 2024 | Data Augmentation, Scene Understanding | Code Available | 2 |
| Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation | Mar 25, 2024 | Denoising, Image Generation | Code Available | 2 |
| Elysium: Exploring Object-level Perception in Videos via MLLM | Mar 25, 2024 | Object, Object Tracking | Code Available | 2 |
| Few-Shot Bearing Fault Diagnosis Via Ensembling Transformer-Based Model With Mahalanobis Distance Metric Learning From Multiscale Features | Mar 25, 2024 | Classification, Fault Diagnosis | Code Available | 2 |
| CFAT: Unleashing TriangularWindows for Image Super-resolution | Mar 24, 2024 | Image Super-Resolution, Super-Resolution | Code Available | 2 |
| A Transformer approach for Electricity Price Forecasting | Mar 24, 2024 | | Code Available | 2 |
| CoverUp: Effective High Coverage Test Generation for Python | Mar 24, 2024 | software testing | Code Available | 2 |
| CG-SLAM: Efficient Dense RGB-D SLAM in a Consistent Uncertainty-aware 3D Gaussian Field | Mar 24, 2024 | NeRF, Novel View Synthesis | Code Available | 2 |