PRML Lab. Speech Team

Pattern Recognition & Machine Learning Lab

Korea University

2024

ICASSP, [Paper] [Demo]

TranSentence: Speech-to-Speech Translation via Language-agnostic Sentence-level Speech Encoding without Language-parallel Data


ICASSP, [Paper] [Demo]

MIDI-Voice: Expressive Zero-shot Singing Voice Synthesis via MIDI-driven Priors


AAAI, [Paper] [Demo]

DDDM-VC: Decoupled Denoising Diffusion Models with Disentangled Representation and Prior Mixup for Verified Robust Voice Conversion


2023

Interspeech, [Paper] [Demo]

HierVST: Hierarchical Adaptive Zero-shot Voice Style Transfer


Interspeech, [Paper] [Demo]

Diff-HierVC: Diffusion-based Hierarchical Voice Conversion with Robust Pitch Generation and Masked Prior for Zero-shot Speaker Adaptation


ACPR, [Paper] [Demo]

PauseSpeech: Natural Speech Synthesis via Pre-trained Language Model and Pause-based Prosody Modeling


2022

NeurIPS, [Paper] [Demo]

HierSpeech: Bridging the Gap between Text and Speech by Hierarchical Variational Inference using Self-supervised Representation for Speech Synthesis


TASLP, [Paper] [Demo]

Duration Controllable Voice Conversion via Phoneme-Based Information Bottleneck


ICPR, [Paper] [Demo]

StyleVC: Non-parallel Voice Conversion with Adversarial Style Generalization


ICASSP, [Paper] [Demo]

EmoQ-TTS: Emotion Intensity Quantization for Fine-grained Controllable Emotional Text-to-Speech


ICASSP, [Paper] [Demo]

Fre-GAN 2: Fast and Efficient Frequency-consistent Audio Synthesis


ICASSP, [Paper] [Demo]

PVAE-TTS: High-Quality Adaptive Text-to-Speech via Progressive Variational Autoencoder


2021

NeurIPS, [Paper] [Demo]

VoiceMixer: Adversarial Voice Style Mixup


AAAI, [Paper] [Demo]

Multi-SpectroGAN: High-Diversity and High-Fidelity Spectrogram Generation with Adversarial Style Recombination for Speech Synthesis


SMC, [Paper] [Demo]

GC-TTS: Few-shot Speaker Adaptation with Geometric Constraints


Interspeech, [Paper] [Demo]

Reinforce-Aligner: Reinforcement Alignment Search for Robust End-to-End Text-to-Speech


Interspeech, [Paper] [Demo]

Fre-GAN: Adversarial Frequency-consistent Audio Synthesis


2020

Interspeech, [Paper] [Demo]

Audio Dequantization for High Fidelity Audio Generation in Flow-based Neural Vocoder