| A Comprehensive Survey on Self-Supervised Learning for Recommendation | Apr 4, 2024 | Contrastive Learning, Recommendation Systems | Code Available | 2 |
| Learning Transferable Negative Prompts for Out-of-Distribution Detection | Apr 4, 2024 | Out-of-Distribution Detection, Out of Distribution (OOD) Detection | Code Available | 2 |
| No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance | Apr 4, 2024 | Benchmarking, Image Generation | Code Available | 2 |
| AdaBM: On-the-Fly Adaptive Bit Mapping for Image Super-Resolution | Apr 4, 2024 | Image Super-Resolution, Quantization | Code Available | 2 |
| LeGrad: An Explainability Method for Vision Transformers via Feature Formation Sensitivity | Apr 4, 2024 | Sensitivity | Code Available | 2 |
| LongVLM: Efficient Long Video Understanding via Large Language Models | Apr 4, 2024 | Question Answering, Video Question Answering | Code Available | 2 |
| SC4D: Sparse-Controlled Video-to-4D Generation and Motion Transfer | Apr 4, 2024 | motion prediction, NeRF | Code Available | 2 |
| The More You See in 2D, the More You Perceive in 3D | Apr 4, 2024 | 3D Reconstruction, Image to 3D | Code Available | 2 |
| Gen3DSR: Generalizable 3D Scene Reconstruction via Divide and Conquer from a Single View | Apr 4, 2024 | 3D Reconstruction, 3D Scene Reconstruction | Code Available | 2 |
| OmniGS: Fast Radiance Field Reconstruction using Omnidirectional Gaussian Splatting | Apr 4, 2024 | GPU | Code Available | 2 |
| Decoupling Static and Hierarchical Motion Perception for Referring Video Segmentation | Apr 4, 2024 | Contrastive Learning, Referring Expression | Code Available | 2 |
| Is CLIP the main roadblock for fine-grained open-world perception? | Apr 4, 2024 | Autonomous Driving, Novel Concepts | Code Available | 2 |
| CodeEditorBench: Evaluating Code Editing Capability of Large Language Models | Apr 4, 2024 | Code Generation | Code Available | 2 |
| DiffDet4SAR: Diffusion-based Aircraft Target Detection Network for SAR Images | Apr 4, 2024 | Denoising | Code Available | 2 |
| CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching | Apr 4, 2024 | Attribute, Image Captioning | Code Available | 2 |
| Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning | Apr 4, 2024 | 3D Scene Reconstruction, Depth Estimation | Code Available | 2 |
| MonoCD: Monocular 3D Object Detection with Complementary Depths | Apr 4, 2024 | 3D Object Detection, Depth Estimation | Code Available | 2 |
| DQ-DETR: DETR with Dynamic Query for Tiny Object Detection | Apr 4, 2024 | Object, object-detection | Code Available | 2 |
| Min-K%++: Improved Baseline for Detecting Pre-Training Data from Large Language Models | Apr 3, 2024 | | Code Available | 2 |
| Tightly-Coupled LiDAR-IMU-Wheel Odometry with Online Calibration of a Kinematic Model for Skid-Steering Robots | Apr 3, 2024 | | Code Available | 2 |
| KnowHalu: Hallucination Detection via Multi-Form Knowledge Based Factual Checking | Apr 3, 2024 | Fact Checking, Form | Code Available | 2 |
| Skeleton Recall Loss for Connectivity Conserving and Resource Efficient Segmentation of Thin Tubular Structures | Apr 3, 2024 | CPU, GPU | Code Available | 2 |
| Effector: A Python package for regional explanations | Apr 3, 2024 | | Code Available | 2 |
| HENet: Hybrid Encoding for End-to-end Multi-task 3D Perception from Multi-view Cameras | Apr 3, 2024 | 3D Object Detection, Autonomous Driving | Code Available | 2 |
| DPFT: Dual Perspective Fusion Transformer for Camera-Radar-based Object Detection | Apr 3, 2024 | Autonomous Vehicles, object-detection | Code Available | 2 |
| LidarDM: Generative LiDAR Simulation in a Generated World | Apr 3, 2024 | Autonomous Driving, Point Cloud Generation | Code Available | 2 |
| JailBreakV: A Benchmark for Assessing the Robustness of MultiModal Large Language Models against Jailbreak Attacks | Apr 3, 2024 | LLM Jailbreak | Code Available | 2 |
| Efficient Multi-Vector Dense Retrieval Using Bit Vectors | Apr 3, 2024 | Quantization, Retrieval | Code Available | 2 |
| Scaling Laws for Galaxy Images | Apr 3, 2024 | Domain Adaptation | Code Available | 2 |
| ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline | Apr 3, 2024 | Math, Mathematical Problem-Solving | Code Available | 2 |
| Prompting for Numerical Sequences: A Case Study on Market Comment Generation | Apr 3, 2024 | Comment Generation, Data-to-Text Generation | Code Available | 2 |
| Linear Attention Sequence Parallelism | Apr 3, 2024 | 2k | Code Available | 2 |
| GenN2N: Generative NeRF2NeRF Translation | Apr 3, 2024 | Colorization, Contrastive Learning | Code Available | 2 |
| Conifer: Improving Complex Constrained Instruction-Following Ability of Large Language Models | Apr 3, 2024 | Instruction Following | Code Available | 2 |
| Generalizing 6-DoF Grasp Detection via Domain Prior Knowledge | Apr 2, 2024 | Robotic Grasping | Code Available | 2 |
| Scene Adaptive Sparse Transformer for Event-based Object Detection | Apr 2, 2024 | Object, object-detection | Code Available | 2 |
| EGTR: Extracting Graph from Transformer for Scene Graph Generation | Apr 2, 2024 | Graph Generation, Multi-Task Learning | Code Available | 2 |
| Beyond Image Super-Resolution for Image Recognition with Task-Driven Perceptual Loss | Apr 2, 2024 | image-classification, Image Classification | Code Available | 2 |
| SelfPose3d: Self-Supervised Multi-Person Multi-View 3d Pose Estimation | Apr 2, 2024 | 3D Pose Estimation, Pose Estimation | Code Available | 2 |
| Pre-trained Vision and Language Transformers Are Few-Shot Incremental Learners | Apr 2, 2024 | class-incremental learning, Class Incremental Learning | Code Available | 2 |
| EV2Gym: A Flexible V2G Simulator for EV Smart Charging Research and Benchmarking | Apr 2, 2024 | Benchmarking, Reinforcement Learning (RL) | Code Available | 2 |
| Dynamic Pre-training: Towards Efficient and Scalable All-in-One Image Restoration | Apr 2, 2024 | All, Decoder | Code Available | 2 |
| Lookahead Exploration with Neural Radiance Representation for Continuous Vision-Language Navigation | Apr 2, 2024 | Navigate, Vision and Language Navigation | Code Available | 2 |
| MotionChain: Conversational Motion Controllers via Multimodal Prompts | Apr 2, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Accelerating Transformer Pre-training with 2:4 Sparsity | Apr 2, 2024 | GPU | Code Available | 2 |
| Humanizing Machine-Generated Content: Evading AI-Text Detection through Adversarial Attack | Apr 2, 2024 | Adversarial Attack, Text Detection | Code Available | 2 |
| Samba: Semantic Segmentation of Remotely Sensed Images with State Space Model | Apr 2, 2024 | Decoder, Mamba | Code Available | 2 |
| BRAVEn: Improving Self-Supervised Pre-training for Visual and Auditory Speech Recognition | Apr 2, 2024 | speech-recognition, Speech Recognition | Code Available | 2 |
| Diffusion^2: Dynamic 3D Content Generation via Score Composition of Video and Multi-view Diffusion Models | Apr 2, 2024 | 3D Generation, 4D reconstruction | Code Available | 2 |
| Weakly-supervised Audio Separation via Bi-modal Semantic Similarity | Apr 2, 2024 | Semantic Similarity, Semantic Textual Similarity | Code Available | 2 |