| MoE-FFD: Mixture of Experts for Generalized and Parameter-Efficient Face Forgery Detection | Apr 12, 2024 | Mixture-of-Experts | Code Available | 2 |
| LlamaTouch: A Faithful and Scalable Testbed for Mobile UI Task Automation | Apr 12, 2024 | | Code Available | 2 |
| Joint Physical-Digital Facial Attack Detection Via Simulating Spoofing Clues | Apr 12, 2024 | Data Augmentation, Face Anti-Spoofing | Code Available | 2 |
| LaSagnA: Language-based Segmentation Assistant for Complex Queries | Apr 12, 2024 | Segmentation, Semantic Segmentation | Code Available | 2 |
| Learning representations of learning representations | Apr 12, 2024 | Sentence | Code Available | 2 |
| LLM-Seg: Bridging Image Segmentation and Large Language Model Reasoning | Apr 12, 2024 | Image Segmentation, Language Modeling | Code Available | 2 |
| Inheritune: Training Smaller Yet More Attentive Language Models | Apr 12, 2024 | Decoder, Language Modelling | Code Available | 2 |
| PnLCalib: Sports Field Registration via Points and Lines Optimization | Apr 12, 2024 | Camera Calibration, Homography Estimation | Code Available | 2 |
| Heron-Bench: A Benchmark for Evaluating Vision Language Models in Japanese | Apr 11, 2024 | | Code Available | 2 |
| AmpleGCG: Learning a Universal and Transferable Generative Model of Adversarial Suffixes for Jailbreaking Both Open and Closed LLMs | Apr 11, 2024 | Safety Alignment | Code Available | 2 |
| HGRN2: Gated Linear RNNs with State Expansion | Apr 11, 2024 | Image Classification, Language Modeling | Code Available | 2 |
| Can Vehicle Motion Planning Generalize to Realistic Long-tail Scenarios? | Apr 11, 2024 | Autonomous Driving, Motion Planning | Code Available | 2 |
| Multi-view Aggregation Network for Dichotomous Image Segmentation | Apr 11, 2024 | Decoder, Dichotomous Image Segmentation | Code Available | 2 |
| Do You Remember? Dense Video Captioning with Cross-Modal Memory Retrieval | Apr 11, 2024 | Decoder, Dense Video Captioning | Code Available | 2 |
| LLoCO: Learning Long Contexts Offline | Apr 11, 2024 | 4k, In-Context Learning | Code Available | 2 |
| SFSORT: Scene Features-based Simple Online Real-Time Tracker | Apr 11, 2024 | CPU, Multi-Object Tracking | Code Available | 2 |
| GoMVS: Geometrically Consistent Cost Aggregation for Multi-View Stereo | Apr 11, 2024 | 3D Reconstruction | Code Available | 2 |
| From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples | Apr 11, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| DesignQA: A Multimodal Benchmark for Evaluating Large Language Models' Understanding of Engineering Documentation | Apr 11, 2024 | | Code Available | 2 |
| Manipulating Large Language Models to Increase Product Visibility | Apr 11, 2024 | STS | Code Available | 2 |
| Any2Point: Empowering Any-modality Large Models for Efficient 3D Understanding | Apr 11, 2024 | 3D geometry, parameter-efficient fine-tuning | Code Available | 2 |
| PRAM: Place Recognition Anywhere Model for Efficient Visual Localization | Apr 11, 2024 | Autonomous Driving, Landmark Recognition | Code Available | 2 |
| QuasiSim: Parameterized Quasi-Physical Simulators for Dexterous Manipulations Transfer | Apr 11, 2024 | | Code Available | 2 |
| Rethinking Transformer-Based Blind-Spot Network for Self-Supervised Image Denoising | Apr 11, 2024 | Computational Efficiency, Denoising | Code Available | 2 |
| GoMAvatar: Efficient Animatable Human Modeling from Monocular Video Using Gaussians-on-Mesh | Apr 11, 2024 | Computational Efficiency | Code Available | 2 |
| Behavior Trees Enable Structured Programming of Language Model Agents | Apr 11, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| LaVy: Vietnamese Multimodal Large Language Model | Apr 11, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| ViM-UNet: Vision Mamba for Biomedical Segmentation | Apr 11, 2024 | Instance Segmentation, Mamba | Code Available | 2 |
| Lyapunov-stable Neural Control for State and Output Feedback: A Novel Formulation | Apr 11, 2024 | | Code Available | 2 |
| Differentiable All-pole Filters for Time-varying Audio Systems | Apr 11, 2024 | All, Audio Effects Modeling | Code Available | 2 |
| Self-supervised Dataset Distillation: A Good Compression Is All You Need | Apr 11, 2024 | All, Dataset Distillation | Code Available | 2 |
| FusionMamba: Efficient Remote Sensing Image Fusion with State Space Model | Apr 11, 2024 | Mamba | Code Available | 2 |
| Classifier-guided neural blind deconvolution: a physics-informed denoising module for bearing fault diagnosis under heavy noise | Apr 11, 2024 | Deep Learning, Denoising | Code Available | 2 |
| Latent Guard: a Safety Framework for Text-to-image Generation | Apr 11, 2024 | Contrastive Learning, Image Generation | Code Available | 2 |
| Content-Adaptive Non-Local Convolution for Remote Sensing Pansharpening | Apr 11, 2024 | Pansharpening | Code Available | 2 |
| MindBridge: A Cross-Subject Brain Decoding Framework | Apr 11, 2024 | Brain Decoding, Data Augmentation | Code Available | 2 |
| Deep learning-driven pulmonary artery and vein segmentation reveals demography-associated vasculature anatomical differences | Apr 11, 2024 | Anatomy, Segmentation | Code Available | 2 |
| Llama-VITS: Enhancing TTS Synthesis with Semantic Awareness | Apr 10, 2024 | Speech Synthesis, text-to-speech | Code Available | 2 |
| MoCha-Stereo: Motif Channel Attention Network for Stereo Matching | Apr 10, 2024 | Disparity Estimation, Stereo Depth Estimation | Code Available | 2 |
| Scaling Laws for Data Filtering -- Data Curation cannot be Compute Agnostic | Apr 10, 2024 | GPU | Code Available | 2 |
| CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations | Apr 10, 2024 | Dialogue Generation, text-to-speech | Code Available | 2 |
| Urban Architect: Steerable 3D Urban Scene Generation with Layout Prior | Apr 10, 2024 | 3D Generation, Model Optimization | Code Available | 2 |
| Scaling Multi-Camera 3D Object Detection through Weak-to-Strong Eliciting | Apr 10, 2024 | 3D Object Detection, Autonomous Driving | Code Available | 2 |
| The CAST package for training and assessment of spatial prediction models in R | Apr 10, 2024 | feature selection, Model Selection | Code Available | 2 |
| NeuroNet: A Novel Hybrid Self-Supervised Learning Framework for Sleep Stage Classification Using Single-Channel EEG | Apr 10, 2024 | Contrastive Learning, EEG | Code Available | 2 |
| Advancing Real-time Pandemic Forecasting Using Large Language Models: A COVID-19 Case Study | Apr 10, 2024 | Representation Learning, Time Series | Code Available | 2 |
| Exploring Concept Depth: How Large Language Models Acquire Knowledge at Different Layers? | Apr 10, 2024 | | Code Available | 2 |
| Sparse Global Matching for Video Frame Interpolation with Large Motion | Apr 10, 2024 | Video Frame Interpolation | Code Available | 2 |
| UMBRAE: Unified Multimodal Brain Decoding | Apr 10, 2024 | Brain Decoding, Language Modeling | Code Available | 2 |
| Superposition Prompting: Improving and Accelerating Retrieval-Augmented Generation | Apr 10, 2024 | Question Answering, RAG | Code Available | 2 |