| PortaSpeech: Portable and High-Quality Generative Text-to-Speech | Sep 30, 2021 | text-to-speech, Text to Speech | Code Available | 2 |
| Perspective Fields for Single Image Camera Calibration | Dec 6, 2022 | Camera Calibration | Code Available | 2 |
| SGFormer: Simplifying and Empowering Transformers for Large-Graph Representations | Jun 19, 2023 | Node Property Prediction, Philosophy | Code Available | 2 |
| GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks | Nov 28, 2024 | Benchmarking, Object Counting | Code Available | 2 |
| DPText-DETR: Towards Better Scene Text Detection with Dynamic Points in Transformer | Jul 10, 2022 | Form, Inductive Bias | Code Available | 2 |
| What Makes a Scene ? Scene Graph-based Evaluation and Feedback for Controllable Generation | Nov 23, 2024 | Image Generation, Scene Generation | Code Available | 2 |
| A Noise-Robust Turn-Taking System for Real-World Dialogue Robots: A Field Experiment | Mar 8, 2025 | speech-recognition, Speech Recognition | Code Available | 2 |
| CoMoSVC: Consistency Model-based Singing Voice Conversion | Jan 3, 2024 | GPU, model | Code Available | 2 |
| Speech Model Pre-training for End-to-End Spoken Language Understanding | Apr 7, 2019 | Speech-to-Text, Spoken Language Understanding | Code Available | 2 |
| AMUSE: Emotional Speech-driven 3D Body Animation via Disentangled Latent Diffusion | Jun 1, 2024 | Gesture Generation, Rhythm | Code Available | 2 |
| Efficient Memory Management for Deep Neural Net Inference | Jan 10, 2020 | Management | Code Available | 2 |
| 2nd Place Solution for Waymo Open Dataset Challenge -- Real-time 2D Object Detection | Jun 16, 2021 | 2D Object Detection, Autonomous Driving | Code Available | 2 |
| Fiber: A Platform for Efficient Development and Distributed Training for Reinforcement Learning and Population-Based Methods | Mar 25, 2020 | Distributed Computing, Reinforcement Learning | Code Available | 2 |
| VidEgoThink: Assessing Egocentric Video Understanding Capabilities for Embodied AI | Oct 15, 2024 | Question Answering, Video Question Answering | Code Available | 2 |
| UniNet: A Contrastive Learning-guided Unified Framework with Feature Selection for Anomaly Detection | Feb 28, 2025 | Anomaly Detection, Image Classification | Code Available | 2 |
| UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer | Sep 22, 2022 | Action Classification, Action Recognition | Code Available | 2 |
| Out-of-sample scoring and automatic selection of causal estimators | Dec 20, 2022 | Causal Inference, Hyperparameter Optimization | Code Available | 2 |
| Linearly-evolved Transformer for Pan-sharpening | Apr 19, 2024 | | Code Available | 2 |
| Geometric Clifford Algebra Networks | Feb 13, 2023 | | Code Available | 2 |
| Diffusion Model Alignment Using Direct Preference Optimization | Nov 21, 2023 | model, Text-to-Image Generation | Code Available | 2 |
| ConceptNet at SemEval-2017 Task 2: Extending Word Embeddings with Multilingual Relational Knowledge | Apr 11, 2017 | General Knowledge, Multilingual Word Embeddings | Code Available | 2 |
| PandaGPT: One Model To Instruction-Follow Them All | May 25, 2023 | All, Image Description | Code Available | 2 |
| Side-channel analysis against ANSSI’s protected AES implementation on ARM: end-to-end attacks with multi-task learning | Mar 10, 2023 | Multi-Task Learning, Side Channel Analysis | Code Available | 2 |
| Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler | Aug 23, 2024 | | Code Available | 2 |
| GS-CPR: Efficient Camera Pose Refinement via 3D Gaussian Splatting | Jan 23, 2025 | 3DGS, NeRF | Code Available | 2 |
| PixelLM: Pixel Reasoning with Large Multimodal Model | Dec 4, 2023 | Decoder, model | Code Available | 2 |
| Description and Discussion on DCASE 2024 Challenge Task 2: First-Shot Unsupervised Anomalous Sound Detection for Machine Condition Monitoring | Jun 11, 2024 | Attribute, Domain Generalization | Code Available | 2 |
| Multi-Agent Large Language Models for Conversational Task-Solving | Oct 30, 2024 | Fairness, Question Answering | Code Available | 2 |
| KnowCoder: Coding Structured Knowledge into LLMs for Universal Information Extraction | Mar 12, 2024 | Code Generation, Language Modelling | Code Available | 2 |
| LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT | Oct 7, 2023 | Audio captioning, Automatic Speech Recognition | Code Available | 2 |
| Towards Robust and Generalizable Lensless Imaging with Modular Learned Reconstruction | Feb 3, 2025 | Transfer Learning | Code Available | 2 |
| CityNav: Language-Goal Aerial Navigation Dataset with Geographic Information | Jun 20, 2024 | Vision and Language Navigation | Code Available | 2 |
| Accelerating Image Super-Resolution Networks with Pixel-Level Classification | Jul 31, 2024 | | Code Available | 2 |
| QDTrack: Quasi-Dense Similarity Learning for Appearance-Only Multiple Object Tracking | Oct 12, 2022 | Contrastive Learning, Multiple Object Tracking | Code Available | 2 |
| F3D-Gaus: Feed-forward 3D-aware Generation on ImageNet with Cycle-Consistent Gaussian Splatting | Jan 12, 2025 | | Code Available | 2 |
| Make It Count: Text-to-Image Generation with an Accurate Number of Objects | Jun 14, 2024 | Denoising, Image Generation | Code Available | 2 |
| Learning 3D Human Pose Estimation from Dozens of Datasets using a Geometry-Aware Autoencoder to Bridge Between Skeleton Formats | Dec 29, 2022 | 3D Human Pose Estimation, Dimensionality Reduction | Code Available | 2 |
| 3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment | Aug 8, 2023 | 3D Question Answering (3D-QA), Dense Captioning | Code Available | 2 |
| FatesGS: Fast and Accurate Sparse-View Surface Reconstruction using Gaussian Splatting with Depth-Feature Consistency | Jan 8, 2025 | Novel View Synthesis, Surface Reconstruction | Code Available | 2 |
| Geolocation with Real Human Gameplay Data: A Large-Scale Dataset and Human-Like Reasoning Framework | Feb 19, 2025 | | Code Available | 2 |
| The dark side of the forces: assessing non-conservative force models for atomistic machine learning | Dec 16, 2024 | Computational chemistry, Computational Efficiency | Code Available | 2 |
| Evaluating Frontier Models for Dangerous Capabilities | Mar 20, 2024 | | Code Available | 2 |
| K-LITE: Learning Transferable Visual Models with External Knowledge | Apr 20, 2022 | Benchmarking, Descriptive | Code Available | 2 |
| TIPO: Text to Image with Text Presampling for Prompt Optimization | Nov 12, 2024 | Image Generation, Language Modeling | Code Available | 2 |
| Sequential Model-Based Optimization for General Algorithm Configuration | Jan 18, 2011 | Hyperparameter Optimization, model | Code Available | 2 |
| Wasserstein Distance Rivals Kullback-Leibler Divergence for Knowledge Distillation | Dec 11, 2024 | image-classification, Image Classification | Code Available | 2 |
| SSLAM: Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes | Jun 13, 2025 | Linear evaluation, Self-Supervised Learning | Code Available | 2 |
| M2I: From Factored Marginal Trajectory Prediction to Interactive Prediction | Feb 24, 2022 | Open-Ended Question Answering, Prediction | Code Available | 2 |
| A fully automatic AI system for tooth and alveolar bone segmentation from cone-beam CT images | Apr 19, 2022 | Segmentation | Code Available | 2 |
| GDGB: A Benchmark for Generative Dynamic Text-Attributed Graph Learning | Jul 4, 2025 | Benchmarking, Graph Generation | Code Available | 2 |