| ParlamentParla: A Speech Corpus of Catalan Parliamentary Sessions | Jun 1, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| ParrotTTS: Text-to-Speech synthesis by exploiting self-supervised representations | Mar 1, 2023 | Self-Supervised Learning, Speech Synthesis | —Unverified | 0 |
| PauseSpeech: Natural Speech Synthesis via Pre-trained Language Model and Pause-based Prosody Modeling | Jun 13, 2023 | Language Modeling, Language Modelling | —Unverified | 0 |
| Penambahan emosi menggunakan metode manipulasi prosodi untuk sistem text to speech bahasa Indonesia [Adding emotion using prosody manipulation for an Indonesian text-to-speech system] | Jun 29, 2016 | Sentence, text-to-speech | —Unverified | 0 |
| Learning to Maximize Speech Quality Directly Using MOS Prediction for Neural Text-to-Speech | Nov 2, 2020 | Knowledge Distillation, Speech Synthesis | —Unverified | 0 |
| Period VITS: Variational Inference with Explicit Pitch Modeling for End-to-end Emotional Speech Synthesis | Oct 28, 2022 | Decoder, Diversity | —Unverified | 0 |
| Phoneme Discretized Saliency Maps for Explainable Detection of AI-Generated Voice | Jun 14, 2024 | text-to-speech, Text to Speech | —Unverified | 0 |
| Phoneme-Level Feature Discrepancies: A Key to Detecting Sophisticated Speech Deepfakes | Dec 17, 2024 | DeepFake Detection, Face Swapping | —Unverified | 0 |
| Phonetic Enhanced Language Modeling for Text-to-Speech Synthesis | Jun 4, 2024 | In-Context Learning, Language Modeling | —Unverified | 0 |
| Phonikud: Hebrew Grapheme-to-Phoneme Conversion for Real-Time Text-to-Speech | Jun 14, 2025 | Grapheme-to-Phoneme Conversion, text-to-speech | —Unverified | 0 |
| Polyphone disambiguation and accent prediction using pre-trained language models in Japanese TTS front-end | Jan 24, 2022 | Morphological Analysis, Polyphone disambiguation | —Unverified | 0 |
| Polyphone Disambiguation for Mandarin Chinese Using Conditional Neural Network with Multi-level Embedding Features | Jul 3, 2019 | Polyphone disambiguation, Sentence | —Unverified | 0 |
| Positional Description for Numerical Normalization | Aug 22, 2024 | speech-recognition, Speech Recognition | —Unverified | 0 |
| Pre-Avatar: An Automatic Presentation Generation Framework Leveraging Talking Avatar | Oct 13, 2022 | text-to-speech, Text to Speech | —Unverified | 0 |
| PredGen: Accelerated Inference of Large Language Models through Input-Time Speculation for Real-Time Speech Interaction | Jun 18, 2025 | Sentence, text-to-speech | —Unverified | 0 |
| Predicting Expressive Speaking Style From Text In End-To-End Speech Synthesis | Aug 4, 2018 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| Preference Alignment Improves Language Model-Based TTS | Sep 19, 2024 | Language Modeling, Language Modelling | —Unverified | 0 |
| Prior-agnostic Multi-scale Contrastive Text-Audio Pre-training for Parallelized TTS Frontend Modeling | Apr 14, 2024 | Polyphone disambiguation, Text Normalization | —Unverified | 0 |
| Probing Deep Speaker Embeddings for Speaker-related Tasks | Dec 14, 2022 | Speaker Recognition, Speaker Verification | —Unverified | 0 |
| Probing Speaker-specific Features in Speaker Representations | Jan 9, 2025 | Self-Supervised Learning, Speaker Verification | —Unverified | 0 |
| PROEMO: Prompt-Driven Text-to-Speech Synthesis Based on Emotion and Intensity Control | Jan 10, 2025 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| PSCodec: A Series of High-Fidelity Low-bitrate Neural Speech Codecs Leveraging Prompt Encoders | Apr 3, 2024 | Representation Learning, Speaker Verification | —Unverified | 0 |
| PromptTTS 2: Describing and Generating Voices with Text Prompt | Sep 5, 2023 | Language Modelling, Large Language Model | —Unverified | 0 |
| PromptTTS++: Controlling Speaker Identity in Prompt-Based Text-to-Speech Using Natural Language Descriptions | Sep 15, 2023 | text-to-speech, Text to Speech | —Unverified | 0 |
| Prompt-Unseen-Emotion: Zero-shot Expressive Speech Synthesis with Prompt-LLM Contextual Knowledge for Mixed Emotions | Jun 3, 2025 | Expressive Speech Synthesis, Prompt Learning | —Unverified | 0 |
| Prosodic Clustering for Phoneme-level Prosody Control in End-to-End Speech Synthesis | Nov 19, 2021 | Clustering, Decoder | —Unverified | 0 |
| Prosodic Representation Learning and Contextual Sampling for Neural Text-to-Speech | Nov 4, 2020 | Graph Attention, Representation Learning | —Unverified | 0 |
| Exact Prosody Cloning in Zero-Shot Multispeaker Text-to-Speech | Jun 24, 2022 | text-to-speech, Text to Speech | —Unverified | 0 |
| ProsodyFM: Unsupervised Phrasing and Intonation Control for Intelligible Speech Synthesis | Dec 16, 2024 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| Prosody Transfer in Neural Text to Speech Using Global Pitch and Loudness Features | Nov 21, 2019 | text-to-speech, Text to Speech | —Unverified | 0 |
| Prosody-TTS: An end-to-end speech synthesis system with prosody control | Oct 6, 2021 | Rhythm, Speech Synthesis | —Unverified | 0 |
| ProsoSpeech: Enhancing Prosody With Quantized Vector Pre-training in Text-to-Speech | Feb 16, 2022 | text-to-speech, Text to Speech | —Unverified | 0 |
| The Zero Resource Speech Challenge 2019: TTS without T | Apr 25, 2019 | text-to-speech, Text to Speech | —Unverified | 0 |
| From Text to Sound: A Preliminary Study on Retrieving Sound Effects to Radio Stories | Aug 20, 2019 | Retrieval, TAG | —Unverified | 0 |
| On the Problem of Text-To-Speech Model Selection for Synthetic Data Generation in Automatic Speech Recognition | Jul 31, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| Handling Numeric Expressions in Automatic Speech Recognition | Jul 18, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| Bailing-TTS: Chinese Dialectal Speech Synthesis Towards Human-like Spontaneous Representation | Aug 1, 2024 | Representation Learning, Speech Synthesis | —Unverified | 0 |
| Enhancing Kurdish Text-to-Speech with Native Corpus Training: A High-Quality WaveGlow Vocoder Approach | Sep 10, 2024 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| UDDETTS: Unifying Discrete and Dimensional Emotions for Controllable Emotional Text-to-Speech | May 15, 2025 | Emotional Speech Synthesis, Language Modeling | —Unverified | 0 |
| Audio Turing Test: Benchmarking the Human-likeness of Large Language Model-based Text-to-Speech Systems in Chinese | May 16, 2025 | Benchmarking, Language Modeling | —Unverified | 0 |
| Voice Impression Control in Zero-Shot TTS | Jun 6, 2025 | Language Modeling, Language Modelling | —Unverified | 0 |
| Scheduled Interleaved Speech-Text Training for Speech-to-Speech Translation with LLMs | Jun 12, 2025 | Speech-to-Speech Translation, text-to-speech | —Unverified | 0 |
| AASIST3: KAN-Enhanced AASIST Speech Deepfake Detection using SSL Features and Additional Regularization for the ASVspoof 2024 Challenge | Aug 30, 2024 | DeepFake Detection, Face Swapping | —Unverified | 0 |
| A Bengali HMM Based Speech Synthesis System | Jun 16, 2014 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| Accelerating Flow-Matching-Based Text-to-Speech via Empirically Pruned Step Sampling | May 26, 2025 | GPU, text-to-speech | —Unverified | 0 |
| AccentBox: Towards High-Fidelity Zero-Shot Accent Generation | Sep 13, 2024 | text-to-speech, Text to Speech | —Unverified | 0 |
| Accent Conversion in Text-To-Speech Using Multi-Level VAE and Adversarial Training | Jun 3, 2024 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| Accent conversion using discrete units with parallel data synthesized from controllable accented TTS | Sep 30, 2024 | Data Augmentation, Speech Synthesis | —Unverified | 0 |
| Accented Text-to-Speech Synthesis with Limited Data | May 8, 2023 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| A Challenge Set and Methods for Noun-Verb Ambiguity | Oct 1, 2018 | Speech Synthesis, text-to-speech | —Unverified | 0 |