| FairDiff: Fair Segmentation with Point-Image Diffusion | Jul 8, 2024 | Fairness, Image Generation | Code Available | 2 |
| Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision | Jul 8, 2024 | Action Quality Assessment, Descriptive | Code Available | 2 |
| LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages | Jul 8, 2024 | Data Augmentation, Translation | Code Available | 2 |
| SOLO: A Single Transformer for Scalable Vision-Language Modeling | Jul 8, 2024 | Language Modeling | Code Available | 2 |
| LGRNet: Local-Global Reciprocal Network for Uterine Fibroid Segmentation in Ultrasound Videos | Jul 8, 2024 | Segmentation, Video Polyp Segmentation | Code Available | 2 |
| InsightBench: Evaluating Business Analytics Agents Through Multi-Step Insight Generation | Jul 8, 2024 | | Code Available | 2 |
| MEEG and AT-DGNN: Improving EEG Emotion Recognition with Music Introducing and Graph-based Learning | Jul 8, 2024 | Arousal Estimation, EEG | Code Available | 2 |
| WSI-VQA: Interpreting Whole Slide Images by Generative Visual Question Answering | Jul 8, 2024 | Diagnostic, Generative Visual Question Answering | Code Available | 2 |
| PerLDiff: Controllable Street View Synthesis Using Perspective-Layout Diffusion Models | Jul 8, 2024 | Autonomous Driving, Image Generation | Code Available | 2 |
| Controllable and Reliable Knowledge-Intensive Task-Oriented Conversational Agents with Declarative Genie Worksheets | Jul 8, 2024 | Hallucination, Navigate | Code Available | 2 |
| PsycoLLM: Enhancing LLM for Psychological Understanding and Evaluation | Jul 8, 2024 | Ethics, Language Modeling | Code Available | 2 |
| Training-free CryoET Tomogram Segmentation | Jul 8, 2024 | Contrastive Learning, Cryogenic Electron Tomography | Code Available | 2 |
| BEVWorld: A Multimodal World Model for Autonomous Driving via Unified BEV Latent Space | Jul 8, 2024 | Autonomous Driving, Decoder | Code Available | 2 |
| 4D Contrastive Superflows are Dense 3D Representation Learners | Jul 8, 2024 | Autonomous Driving, Contrastive Learning | Code Available | 2 |
| iLLM-TSC: Integration reinforcement learning and large language model for traffic signal control policy improvement | Jul 8, 2024 | Language Modeling | Code Available | 2 |
| Multimodal Prompt Learning with Missing Modalities for Sentiment Analysis and Emotion Recognition | Jul 7, 2024 | Emotion Recognition, Multimodal Sentiment Analysis | Code Available | 2 |
| Language Representations Can be What Recommenders Need: Findings and Potentials | Jul 7, 2024 | Collaborative Filtering, Contrastive Learning | Code Available | 2 |
| See Further for Parameter Efficient Fine-tuning by Standing on the Shoulders of Decomposition | Jul 7, 2024 | parameter-efficient fine-tuning | Code Available | 2 |
| Just read twice: closing the recall gap for recurrent language models | Jul 7, 2024 | In-Context Learning, Language Modeling | Code Available | 2 |
| P2P: Part-to-Part Motion Cues Guide a Strong Tracking Framework for LiDAR Point Clouds | Jul 7, 2024 | 3D Single Object Tracking, GPU | Code Available | 2 |
| HiDe-PET: Continual Learning via Hierarchical Decomposition of Parameter-Efficient Tuning | Jul 7, 2024 | Continual Learning, Representation Learning | Code Available | 2 |
| Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models | Jul 7, 2024 | Class Incremental Learning | Code Available | 2 |
| MMSci: A Dataset for Graduate-Level Multi-Discipline Multimodal Scientific Understanding | Jul 6, 2024 | Articles, Instruction Following | Code Available | 2 |
| How do you know that? Teaching Generative Language Models to Reference Answers to Biomedical Questions | Jul 6, 2024 | Question Answering, RAG | Code Available | 2 |
| SCSA: Exploring the Synergistic Effects Between Spatial and Channel Attention | Jul 6, 2024 | Classification, object-detection | Code Available | 2 |
| Slice-Consistent 3D Volumetric Brain CT-to-MRI Translation with 2D Brownian Bridge Diffusion Model | Jul 6, 2024 | Image-to-Image Translation, Translation | Code Available | 2 |
| RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models | Jul 6, 2024 | Medical Diagnosis, RAG | Code Available | 2 |
| Associative Recurrent Memory Transformer | Jul 5, 2024 | Retrieval | Code Available | 2 |
| AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation | Jul 5, 2024 | Action Recognition, Few-Shot Image Classification | Code Available | 2 |
| PartCraft: Crafting Creative Objects by Parts | Jul 5, 2024 | | Code Available | 2 |
| Isomorphic Pruning for Vision Models | Jul 5, 2024 | | Code Available | 2 |
| Pretraining End-to-End Keyword Search with Automatically Discovered Acoustic Units | Jul 5, 2024 | Acoustic Unit Discovery, Automatic Speech Recognition | Code Available | 2 |
| RPN: Reconciled Polynomial Network Towards Unifying PGMs, Kernel SVMs, MLP and KAN | Jul 5, 2024 | | Code Available | 2 |
| Discovering symbolic expressions with parallelized tree search | Jul 5, 2024 | Equation Discovery, regression | Code Available | 2 |
| Multi-Branch Auxiliary Fusion YOLO with Re-parameterization Heterogeneous Convolutional for accurate object detection | Jul 5, 2024 | Novel Object Detection, object-detection | Code Available | 2 |
| ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models | Jul 5, 2024 | Hallucination, Long Form Question Answering | Code Available | 2 |
| SH17: A Dataset for Human Safety and Personal Protective Equipment Detection in Manufacturing Industry | Jul 5, 2024 | Benchmarking, object-detection | Code Available | 2 |
| RAM: Retrieval-Based Affordance Transfer for Generalizable Zero-Shot Robotic Manipulation | Jul 5, 2024 | Human-Object Interaction Detection, Retrieval | Code Available | 2 |
| AnySR: Realizing Image Super-Resolution as Any-Scale, Any-Resource | Jul 5, 2024 | Image Super-Resolution, Super-Resolution | Code Available | 2 |
| AriGraph: Learning Knowledge Graph World Models with Episodic Memory for LLM Agents | Jul 5, 2024 | Decision Making, Multi-hop Question Answering | Code Available | 2 |
| VoxAct-B: Voxel-Based Acting and Stabilizing Policy for Bimanual Manipulation | Jul 4, 2024 | | Code Available | 2 |
| ChartGemma: Visual Instruction-tuning for Chart Reasoning in the Wild | Jul 4, 2024 | Chart Understanding, Decision Making | Code Available | 2 |
| Occupancy as Set of Points | Jul 4, 2024 | | Code Available | 2 |
| MiniGPT-Med: Large Language Model as a General Interface for Radiology Diagnosis | Jul 4, 2024 | Diagnostic, Language Modeling | Code Available | 2 |
| TongGu: Mastering Classical Chinese Understanding with Knowledge-Grounded Large Language Models | Jul 4, 2024 | RAG, Retrieval-augmented Generation | Code Available | 2 |
| Craftium: An Extensible Framework for Creating Reinforcement Learning Environments | Jul 4, 2024 | Benchmarking, Minecraft | Code Available | 2 |
| Benchmarking Complex Instruction-Following with Multiple Constraints Composition | Jul 4, 2024 | Benchmarking, Instruction Following | Code Available | 2 |
| Mixture of A Million Experts | Jul 4, 2024 | Computational Efficiency, Language Modeling | Code Available | 2 |
| Unraveling Molecular Structure: A Multimodal Spectroscopic Dataset for Chemistry | Jul 4, 2024 | | Code Available | 2 |
| DGR-MIL: Exploring Diverse Global Representation in Multiple Instance Learning for Whole Slide Image Classification | Jul 4, 2024 | Descriptive, Diversity | Code Available | 2 |