| ParlamentParla: A Speech Corpus of Catalan Parliamentary Sessions | Jun 1, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 | 0 |
| ParrotTTS: Text-to-Speech synthesis by exploiting self-supervised representations | Mar 1, 2023 | Self-Supervised Learning, Speech Synthesis | —Unverified | 0 | 0 |
| PauseSpeech: Natural Speech Synthesis via Pre-trained Language Model and Pause-based Prosody Modeling | Jun 13, 2023 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| Penambahan emosi menggunakan metode manipulasi prosodi untuk sistem text to speech bahasa Indonesia | Jun 29, 2016 | Sentence, text-to-speech | —Unverified | 0 | 0 |
| Learning to Maximize Speech Quality Directly Using MOS Prediction for Neural Text-to-Speech | Nov 2, 2020 | Knowledge Distillation, Speech Synthesis | —Unverified | 0 | 0 |
| Period VITS: Variational Inference with Explicit Pitch Modeling for End-to-end Emotional Speech Synthesis | Oct 28, 2022 | Decoder, Diversity | —Unverified | 0 | 0 |
| Phoneme Discretized Saliency Maps for Explainable Detection of AI-Generated Voice | Jun 14, 2024 | text-to-speech, Text to Speech | —Unverified | 0 | 0 |
| Phoneme-Level Feature Discrepancies: A Key to Detecting Sophisticated Speech Deepfakes | Dec 17, 2024 | DeepFake Detection, Face Swapping | —Unverified | 0 | 0 |
| Phonetic Enhanced Language Modeling for Text-to-Speech Synthesis | Jun 4, 2024 | In-Context Learning, Language Modeling | —Unverified | 0 | 0 |
| Phonikud: Hebrew Grapheme-to-Phoneme Conversion for Real-Time Text-to-Speech | Jun 14, 2025 | Grapheme-to-Phoneme Conversion, text-to-speech | —Unverified | 0 | 0 |
| Polyphone disambiguation and accent prediction using pre-trained language models in Japanese TTS front-end | Jan 24, 2022 | Morphological Analysis, Polyphone disambiguation | —Unverified | 0 | 0 |
| Polyphone Disambiguation for Mandarin Chinese Using Conditional Neural Network with Multi-level Embedding Features | Jul 3, 2019 | Polyphone disambiguation, Sentence | —Unverified | 0 | 0 |
| Positional Description for Numerical Normalization | Aug 22, 2024 | speech-recognition, Speech Recognition | —Unverified | 0 | 0 |
| Pre-Avatar: An Automatic Presentation Generation Framework Leveraging Talking Avatar | Oct 13, 2022 | text-to-speech, Text to Speech | —Unverified | 0 | 0 |
| PredGen: Accelerated Inference of Large Language Models through Input-Time Speculation for Real-Time Speech Interaction | Jun 18, 2025 | Sentence, text-to-speech | —Unverified | 0 | 0 |
| Predicting Expressive Speaking Style From Text In End-To-End Speech Synthesis | Aug 4, 2018 | Speech Synthesis, text-to-speech | —Unverified | 0 | 0 |
| Preference Alignment Improves Language Model-Based TTS | Sep 19, 2024 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| Prior-agnostic Multi-scale Contrastive Text-Audio Pre-training for Parallelized TTS Frontend Modeling | Apr 14, 2024 | Polyphone disambiguation, Text Normalization | —Unverified | 0 | 0 |
| Probing Deep Speaker Embeddings for Speaker-related Tasks | Dec 14, 2022 | Speaker Recognition, Speaker Verification | —Unverified | 0 | 0 |
| Probing Speaker-specific Features in Speaker Representations | Jan 9, 2025 | Self-Supervised Learning, Speaker Verification | —Unverified | 0 | 0 |
| PROEMO: Prompt-Driven Text-to-Speech Synthesis Based on Emotion and Intensity Control | Jan 10, 2025 | Speech Synthesis, text-to-speech | —Unverified | 0 | 0 |
| PSCodec: A Series of High-Fidelity Low-bitrate Neural Speech Codecs Leveraging Prompt Encoders | Apr 3, 2024 | Representation Learning, Speaker Verification | —Unverified | 0 | 0 |
| PromptTTS 2: Describing and Generating Voices with Text Prompt | Sep 5, 2023 | Language Modelling, Large Language Model | —Unverified | 0 | 0 |
| PromptTTS++: Controlling Speaker Identity in Prompt-Based Text-to-Speech Using Natural Language Descriptions | Sep 15, 2023 | text-to-speech, Text to Speech | —Unverified | 0 | 0 |
| Prompt-Unseen-Emotion: Zero-shot Expressive Speech Synthesis with Prompt-LLM Contextual Knowledge for Mixed Emotions | Jun 3, 2025 | Expressive Speech Synthesis, Prompt Learning | —Unverified | 0 | 0 |
| Prosodic Clustering for Phoneme-level Prosody Control in End-to-End Speech Synthesis | Nov 19, 2021 | Clustering, Decoder | —Unverified | 0 | 0 |
| Prosodic Representation Learning and Contextual Sampling for Neural Text-to-Speech | Nov 4, 2020 | Graph Attention, Representation Learning | —Unverified | 0 | 0 |
| Exact Prosody Cloning in Zero-Shot Multispeaker Text-to-Speech | Jun 24, 2022 | text-to-speech, Text to Speech | —Unverified | 0 | 0 |
| ProsodyFM: Unsupervised Phrasing and Intonation Control for Intelligible Speech Synthesis | Dec 16, 2024 | Speech Synthesis, text-to-speech | —Unverified | 0 | 0 |
| Prosody Transfer in Neural Text to Speech Using Global Pitch and Loudness Features | Nov 21, 2019 | text-to-speech, Text to Speech | —Unverified | 0 | 0 |
| Prosody-TTS: An end-to-end speech synthesis system with prosody control | Oct 6, 2021 | Rhythm, Speech Synthesis | —Unverified | 0 | 0 |
| ProsoSpeech: Enhancing Prosody With Quantized Vector Pre-training in Text-to-Speech | Feb 16, 2022 | text-to-speech, Text to Speech | —Unverified | 0 | 0 |
| Pruning Self-Attention for Zero-Shot Multi-Speaker Text-to-Speech | Aug 28, 2023 | Domain Generalization, text-to-speech | —Unverified | 0 | 0 |
| Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis | Apr 14, 2025 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| Punjabi Text-To-Speech Synthesis System | Dec 1, 2012 | Speech Synthesis, text-to-speech | —Unverified | 0 | 0 |
| 運用Python結合語音辨識及合成技術於自動化音文同步之實作 (A Python Implementation of Automatic Speech-text Synchronization Using Speech Recognition and Text-to-Speech Technology) [In Chinese] | Oct 1, 2015 | speech-recognition, Speech Recognition | —Unverified | 0 | 0 |
| QI-TTS: Questioning Intonation Control for Emotional Speech Synthesis | Mar 14, 2023 | Emotional Speech Synthesis, Sentence | —Unverified | 0 | 0 |
| RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis | Apr 4, 2024 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| Rapid Speaker Adaptation in Low Resource Text to Speech Systems using Synthetic Data and Transfer learning | Dec 2, 2023 | Decoder, text-to-speech | —Unverified | 0 | 0 |
| RASMALAI: Resources for Adaptive Speech Modeling in Indian Languages with Accents and Intonations | May 24, 2025 | Expressive Speech Synthesis, Speech Synthesis | —Unverified | 0 | 0 |
| RDSinger: Reference-based Diffusion Network for Singing Voice Synthesis | Oct 29, 2024 | Denoising, Singing Voice Synthesis | —Unverified | 0 | 0 |
| Reading Assistance through LARA, the Learning And Reading Assistant | Jun 1, 2022 | text-to-speech, Text to Speech | —Unverified | 0 | 0 |
| Real-Time Pill Identification for the Visually Impaired Using Deep Learning | May 8, 2024 | Deep Learning, Management | —Unverified | 0 | 0 |
| ReCAB-VAE: Gumbel-Softmax Variational Inference Based on Analytic Divergence | May 9, 2022 | Speech Synthesis, text-to-speech | —Unverified | 0 | 0 |
| Referee: Towards reference-free cross-speaker style transfer with low-quality data for expressive speech synthesis | Sep 8, 2021 | Expressive Speech Synthesis, Sentence | —Unverified | 0 | 0 |
| Refer-iTTS: A System for Referring in Spoken Installments to Objects in Real-World Images | Sep 1, 2017 | Referring Expression, Referring expression generation | —Unverified | 0 | 0 |
| Regotron: Regularizing the Tacotron2 architecture via monotonic alignment loss | Apr 28, 2022 | text-to-speech, Text to Speech | —Unverified | 0 | 0 |
| Reinforce-Aligner: Reinforcement Alignment Search for Robust End-to-End Text-to-Speech | Jun 5, 2021 | text-to-speech, Text to Speech | —Unverified | 0 | 0 |
| Reinforcement Learning for Emotional Text-to-Speech Synthesis with Improved Emotion Discriminability | Apr 3, 2021 | Emotion Recognition, reinforcement-learning | —Unverified | 0 | 0 |
| DLPO: Diffusion Model Loss-Guided Reinforcement Learning for Fine-Tuning Text-to-Speech Diffusion Models | May 23, 2024 | Image Generation, reinforcement-learning | —Unverified | 0 | 0 |