Pattern Recognition & Machine Learning Lab
Korea University
TranSentence: Speech-to-Speech Translation via Language-agnostic Sentence-level Speech Encoding without Language-parallel Data
MIDI-Voice: Expressive Zero-shot Singing Voice Synthesis via MIDI-driven Priors
DDDM-VC: Decoupled Denoising Diffusion Models with Disentangled Representation and Prior Mixup for Verified Robust Voice Conversion
HierVST: Hierarchical Adaptive Zero-shot Voice Style Transfer
Diff-HierVC: Diffusion-based Hierarchical Voice Conversion with Robust Pitch Generation and Masked Prior for Zero-shot Speaker Adaptation
PauseSpeech: Natural Speech Synthesis via Pre-trained Language Model and Pause-based Prosody Modeling
HierSpeech: Bridging the Gap between Text and Speech by Hierarchical Variational Inference using Self-supervised Representations for Speech Synthesis
Duration Controllable Voice Conversion via Phoneme-Based Information Bottleneck
StyleVC: Non-parallel Voice Conversion with Adversarial Style Generalization
EmoQ-TTS: Emotion Intensity Quantization for Fine-grained Controllable Emotional Text-to-Speech
Fre-GAN 2: Fast and Efficient Frequency-consistent Audio Synthesis
PVAE-TTS: High-Quality Adaptive Text-to-Speech via Progressive Variational Autoencoder
VoiceMixer: Adversarial Voice Style Mixup
Multi-SpectroGAN: High-Diversity and High-Fidelity Spectrogram Generation with Adversarial Style Recombination for Speech Synthesis
GC-TTS: Few-shot Speaker Adaptation with Geometric Constraints
Reinforce-Aligner: Reinforcement Alignment Search for Robust End-to-End Text-to-Speech
Fre-GAN: Adversarial Frequency-consistent Audio Synthesis