Fre-GAN: Adversarial Frequency-consistent Audio Synthesis

Ji-Hoon Kim, Sang-Hoon Lee, Ji-Hyun Lee, Seong-Whan Lee

ABSTRACT

Although recent work on neural vocoders has improved the quality of synthesized audio, a gap remains between generated and ground-truth audio in frequency space. This difference leads to spectral artifacts such as hissing noise or reverberation, and thus degrades sample quality. In this paper, we propose Fre-GAN, which achieves frequency-consistent audio synthesis with highly improved generation quality. Specifically, we first present a resolution-connected generator and resolution-wise discriminators, which help learn various scales of spectral distributions over multiple frequency bands. Additionally, to reproduce high-frequency components accurately, we leverage the discrete wavelet transform in the discriminators. In our experiments, Fre-GAN achieves high-fidelity waveform generation with a gap of only 0.03 MOS compared to ground-truth audio while outperforming standard models in quality.
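As a rough illustration of why a discrete wavelet transform can help preserve high-frequency content when a signal is downsampled, the sketch below compares a one-level Haar DWT with plain average pooling on a rapidly alternating signal. The choice of the Haar wavelet, the comparison with average pooling, and all function names here are assumptions made for illustration; the abstract does not specify them, and this is not the model's actual implementation.

```python
import math

def haar_dwt(signal):
    """One-level Haar DWT: split an even-length signal into two half-length sub-bands."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = 1.0 / math.sqrt(2.0)
    # Low band: scaled pairwise sums (coarse approximation).
    low = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    # High band: scaled pairwise differences (fine detail).
    high = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return low, high

def haar_idwt(low, high):
    """Inverse of haar_dwt: interleave the reconstructed sample pairs."""
    s = 1.0 / math.sqrt(2.0)
    out = []
    for a, d in zip(low, high):
        out.append((a + d) * s)
        out.append((a - d) * s)
    return out

def average_pool(signal):
    """Downsample by 2 with pairwise averaging (hypothetical baseline)."""
    return [(signal[i] + signal[i + 1]) / 2.0 for i in range(0, len(signal), 2)]

# A signal that alternates every sample, i.e. the highest representable frequency.
x = [1.0, -1.0] * 4

low, high = haar_dwt(x)
pooled = average_pool(x)
recon = haar_idwt(low, high)

print("average pool:", pooled)   # the oscillation is averaged away
print("DWT high band:", high)    # the oscillation is kept in the high band
print("reconstruction error:", max(abs(a - b) for a, b in zip(x, recon)))
```

In this toy case average pooling maps the signal to zeros, whereas the two DWT sub-bands together still allow the original samples to be recovered, which is the property a discriminator operating on downsampled audio could exploit.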



Audio samples

Script : The forms of printed letters should be beautiful and that their arrangement on the page should be reasonable and a help to the shapeliness of the letters themselves.

Ground Truth

WaveNet

WaveGlow

HiFi-GAN V1

Fre-GAN V1 (Proposed)

HiFi-GAN V2

Fre-GAN V2 (Proposed)

Script : Printing, in the only sense with which we are at present concerned, differs from most if not from all the arts and crafts represented in the Exhibition.

Ground Truth

WaveNet

WaveGlow

HiFi-GAN V1

Fre-GAN V1 (Proposed)

HiFi-GAN V2

Fre-GAN V2 (Proposed)

Script : The squalor and uncleanness of the debtors' side was intensified by constant overcrowding.

Ground Truth

WaveNet

WaveGlow

HiFi-GAN V1

Fre-GAN V1 (Proposed)

HiFi-GAN V2

Fre-GAN V2 (Proposed)

Script : The association was organized under the most promising auspices.

Ground Truth

WaveNet

WaveGlow

HiFi-GAN V1

Fre-GAN V1 (Proposed)

HiFi-GAN V2

Fre-GAN V2 (Proposed)

Script : The Roman type of all these printers is similar in character.

Ground Truth

WaveNet

WaveGlow

HiFi-GAN V1

Fre-GAN V1 (Proposed)

HiFi-GAN V2

Fre-GAN V2 (Proposed)

Ablation studies

Script : The Roman letter was used side by side with the Gothic.

Ground Truth

Fre-GAN V2 (500k steps)

w/o RCG

w/o NN upsampler

w/o mel condition

w/o RPD & RSD

w/o DWT

HiFi-GAN V2 (500k steps)

Script : One very important matter in "setting up" for fine printing is the spacing. That is, the lateral distance of words from one another.

Ground Truth

Fre-GAN V2 (500k steps)

w/o RCG

w/o NN upsampler

w/o mel condition

w/o RPD & RSD

w/o DWT

HiFi-GAN V2 (500k steps)

Script : He seems to have taken the letter of the Elzevirs of the seventeenth century for his model.

Ground Truth

Fre-GAN V2 (500k steps)

w/o RCG

w/o NN upsampler

w/o mel condition

w/o RPD & RSD

w/o DWT

HiFi-GAN V2 (500k steps)

Script : The general solidity of a page is much to be sought for.

Ground Truth

Fre-GAN V2 (500k steps)

w/o RCG

w/o NN upsampler

w/o mel condition

w/o RPD & RSD

w/o DWT

HiFi-GAN V2 (500k steps)

Script : A further development of the Roman letter took place at Venice.

Ground Truth

Fre-GAN V2 (500k steps)

w/o RCG

w/o NN upsampler

w/o mel condition

w/o RPD & RSD

w/o DWT

HiFi-GAN V2 (500k steps)