| FakeBench: Probing Explainable Fake Image Detection via Large Multimodal Models | Apr 20, 2024 | Binary Classification, Fake Image Detection | Code Available | 2 | 5 |
| A General Framework for Jersey Number Recognition in Sports Video | May 22, 2024 | Jersey Number Recognition, Scene Text Recognition | Code Available | 2 | 5 |
| MobileQuant: Mobile-friendly Quantization for On-device Language Models | Aug 25, 2024 | Quantization | Code Available | 2 | 5 |
| STAEformer: Spatio-Temporal Adaptive Embedding Makes Vanilla Transformer SOTA for Traffic Forecasting | Aug 21, 2023 | Time Series, Traffic Prediction | Code Available | 2 | 5 |
| Unsupervised Misaligned Infrared and Visible Image Fusion via Cross-Modality Image Generation and Registration | May 24, 2022 | Image Generation, Infrared And Visible Image Fusion | Code Available | 2 | 5 |
| Wavelet Diffusion Neural Operator | Dec 6, 2024 | | Code Available | 2 | 5 |
| DC-Solver: Improving Predictor-Corrector Diffusion Sampler via Dynamic Compensation | Sep 5, 2024 | | Code Available | 2 | 5 |
| FedModule: A Modular Federated Learning Framework | Sep 7, 2024 | Federated Learning, Personalized Federated Learning | Code Available | 2 | 5 |
| Dawn of the transformer era in speech emotion recognition: closing the valence gap | Mar 14, 2022 | Cross-corpus, Emotion Recognition | Code Available | 2 | 5 |
| NetTrack: Tracking Highly Dynamic Objects with a Net | Mar 17, 2024 | Multi-Object Tracking, Object | Code Available | 2 | 5 |
| QuadTree Attention for Vision Transformers | Jan 8, 2022 | object-detection, Object Detection | Code Available | 2 | 5 |
| OmAgent: A Multi-modal Agent Framework for Complex Video Understanding with Task Divide-and-Conquer | Jun 24, 2024 | AI Agent, Large Language Model | Code Available | 2 | 5 |
| XSimGCL: Towards Extremely Simple Graph Contrastive Learning for Recommendation | Sep 6, 2022 | Contrastive Learning | Code Available | 2 | 5 |
| Overclocking LLM Reasoning: Monitoring and Controlling Thinking Path Lengths in LLMs | Jun 8, 2025 | | Code Available | 2 | 5 |
| fluke: Federated Learning Utility frameworK for Experimentation and research | Dec 20, 2024 | Federated Learning | Code Available | 2 | 5 |
| EgoLifter: Open-world 3D Segmentation for Egocentric Perception | Mar 26, 2024 | 3D Reconstruction, Object | Code Available | 2 | 5 |
| Are Large Kernels Better Teachers than Transformers for ConvNets? | May 30, 2023 | Knowledge Distillation | Code Available | 2 | 5 |
| Diffusion Models for Monocular Depth Estimation: Overcoming Challenging Conditions | Jul 23, 2024 | Depth Estimation, Depth Prediction | Code Available | 2 | 5 |
| Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving | Mar 12, 2024 | Autonomous Driving, Depth Estimation | Code Available | 2 | 5 |
| Visual Autoregressive Modeling for Image Super-Resolution | Jan 31, 2025 | Image Super-Resolution, Quantization | Code Available | 2 | 5 |
| MBQ: Modality-Balanced Quantization for Large Vision-Language Models | Dec 27, 2024 | GPU, Quantization | Code Available | 2 | 5 |
| Lite Pose: Efficient Architecture Design for 2D Human Pose Estimation | May 3, 2022 | 2D Human Pose Estimation, Multi-Person Pose Estimation | Code Available | 2 | 5 |
| Spherical Fourier Neural Operators: Learning Stable Dynamics on the Sphere | Jun 6, 2023 | Operator learning | Code Available | 2 | 5 |
| ChatterBox: Multi-round Multimodal Referring and Grounding | Jan 24, 2024 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| PHUDGE: Phi-3 as Scalable Judge | May 12, 2024 | Data Augmentation | Code Available | 2 | 5 |
| Towards Realistic Generative 3D Face Models | Apr 24, 2023 | 3D Face Reconstruction, Face Model | Code Available | 2 | 5 |
| SustainDC: Benchmarking for Sustainable Data Center Control | Aug 14, 2024 | Benchmarking, Management | Code Available | 2 | 5 |
| Platypus: Quick, Cheap, and Powerful Refinement of LLMs | Aug 14, 2023 | GPU | Code Available | 2 | 5 |
| MMLongBench: Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly | May 15, 2025 | 8k, Benchmarking | Code Available | 2 | 5 |
| TextSLAM: Visual SLAM with Semantic Planar Text Features | May 17, 2023 | Mixed Reality, Object SLAM | Code Available | 2 | 5 |
| Trust, but Verify: Cross-Modality Fusion for HD Map Change Detection | Dec 14, 2022 | Change Detection | Code Available | 2 | 5 |
| LAMP: Learn A Motion Pattern for Few-Shot-Based Video Generation | Oct 16, 2023 | GPU, Image Animation | Code Available | 2 | 5 |
| SemiCD-VL: Visual-Language Model Guidance Makes Better Semi-supervised Change Detector | May 8, 2024 | Change Detection, Language Modeling | Code Available | 2 | 5 |
| Gotta Hear Them All: Sound Source Aware Vision to Audio Generation | Nov 23, 2024 | All, Audio Generation | Code Available | 2 | 5 |
| Audio-Visual Segmentation with Semantics | Jan 30, 2023 | Segmentation, Semantic Segmentation | Code Available | 2 | 5 |
| A Simple and Effective Pruning Approach for Large Language Models | Jun 20, 2023 | Network Pruning | Code Available | 2 | 5 |
| S-Graphs+: Real-time Localization and Mapping leveraging Hierarchical Representations | Dec 22, 2022 | | Code Available | 2 | 5 |
| 4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models | Mar 13, 2025 | Large Language Model, Object | Code Available | 2 | 5 |
| MATCHA: Towards Matching Anything | Jan 1, 2025 | Point Tracking | Code Available | 2 | 5 |
| FlexCAD: Unified and Versatile Controllable CAD Generation with Fine-tuned Large Language Models | Nov 5, 2024 | | Code Available | 2 | 5 |
| SEGA: Instructing Text-to-Image Models using Semantic Guidance | Jan 28, 2023 | | Code Available | 2 | 5 |
| On the detection of synthetic images generated by diffusion models | Nov 1, 2022 | Image Compression | Code Available | 2 | 5 |
| Universal Neural Functionals | Feb 7, 2024 | | Code Available | 2 | 5 |
| Structure PLP-SLAM: Efficient Sparse Mapping and Localization using Point, Line and Plane for Monocular, RGB-D and Stereo Cameras | Jul 13, 2022 | Camera Localization, Pose Tracking | Code Available | 2 | 5 |
| Adaptive Bidirectional Displacement for Semi-Supervised Medical Image Segmentation | May 1, 2024 | Image Segmentation, Medical Image Segmentation | Code Available | 2 | 5 |
| LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | Apr 27, 2023 | Common Sense Reasoning, Coreference Resolution | Code Available | 2 | 5 |
| PLA: Language-Driven Open-Vocabulary 3D Scene Understanding | Nov 29, 2022 | 3D Open-Vocabulary Instance Segmentation, Contrastive Learning | Code Available | 2 | 5 |
| TIM: A Time Interval Machine for Audio-Visual Action Recognition | Apr 8, 2024 | Action Detection, Action Recognition | Code Available | 2 | 5 |
| Beyond MOT: Semantic Multi-Object Tracking | Mar 8, 2024 | Multi-Object Tracking, Object | Code Available | 2 | 5 |
| From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning | Aug 23, 2023 | Instruction Following | Code Available | 2 | 5 |