Ji-Hoon Kim, Sang-Hoon Lee, Ji-Hyun Lee, Seong-Whan Lee
Although recent work on neural vocoders has improved the quality of synthesized audio, there is still a gap between generated and ground-truth audio in frequency space. This difference leads to spectral artifacts such as hissing noise or reverberation, and thus degrades sample quality. In this paper, we propose Fre-GAN, which achieves frequency-consistent audio synthesis with highly improved generation quality. Specifically, we first present a resolution-connected generator and resolution-wise discriminators, which help the model learn various scales of spectral distributions over multiple frequency bands. Additionally, to reproduce high-frequency components accurately, we leverage the discrete wavelet transform in the discriminators. In our experiments, Fre-GAN achieves high-fidelity waveform generation with a gap of only 0.03 MOS compared to ground-truth audio while outperforming standard models in quality.
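The abstract says the discriminators use the discrete wavelet transform to preserve high-frequency components. As a rough illustration only (a sketch under our own assumptions, not the authors' code), the snippet below compares a one-level Haar DWT with plain average pooling on a 1-D signal. Both halve the length, but the DWT keeps a separate high-frequency sub-band and is exactly invertible, while average pooling discards that information. The choice of the Haar wavelet and the helper names `haar_dwt`, `haar_idwt`, and `avg_pool2` are hypothetical.

```python
import math

# Orthonormal Haar scaling factor (assumed wavelet; hypothetical choice).
_S = 1 / math.sqrt(2)


def haar_dwt(x):
    """One-level 1-D Haar DWT: returns (low, high) sub-bands, each half as long as x."""
    if len(x) % 2:
        raise ValueError("signal length must be even")
    low = [(x[i] + x[i + 1]) * _S for i in range(0, len(x), 2)]   # approximation band
    high = [(x[i] - x[i + 1]) * _S for i in range(0, len(x), 2)]  # detail band
    return low, high


def haar_idwt(low, high):
    """Inverse Haar DWT: rebuilds each original sample pair from (low, high)."""
    out = []
    for a, d in zip(low, high):
        out.extend([(a + d) * _S, (a - d) * _S])
    return out


def avg_pool2(x):
    """Non-overlapping average pooling with window 2, for comparison."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]


if __name__ == "__main__":
    # A signal oscillating at the Nyquist frequency (pure high-frequency content).
    x = [1.0, -1.0, 1.0, -1.0]
    print(avg_pool2(x))        # [0.0, 0.0]: the oscillation vanishes entirely
    low, high = haar_dwt(x)
    print(low, high)           # the oscillation survives in the high band
    print(haar_idwt(low, high))  # approximately recovers x
```

How the sub-bands would then be combined (for example, stacked as channels) inside a discriminator is not shown here and is left open.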
Script: The forms of printed letters should be beautiful and that their arrangement on the page should be reasonable and a help to the shapeliness of the letters themselves.

| System |
|---|
| Ground Truth |
| WaveNet |
| WaveGlow |
| HiFi-GAN V1 |
| Fre-GAN V1 (Proposed) |
| HiFi-GAN V2 |
| Fre-GAN V2 (Proposed) |

Script: Printing, in the only sense with which we are at present concerned, differs from most if not from all the arts and crafts represented in the Exhibition.

| System |
|---|
| Ground Truth |
| WaveNet |
| WaveGlow |
| HiFi-GAN V1 |
| Fre-GAN V1 (Proposed) |
| HiFi-GAN V2 |
| Fre-GAN V2 (Proposed) |

Script: The squalor and uncleanness of the debtors' side was intensified by constant overcrowding.

| System |
|---|
| Ground Truth |
| WaveNet |
| WaveGlow |
| HiFi-GAN V1 |
| Fre-GAN V1 (Proposed) |
| HiFi-GAN V2 |
| Fre-GAN V2 (Proposed) |

Script: The association was organized under the most promising auspices.

| System |
|---|
| Ground Truth |
| WaveNet |
| WaveGlow |
| HiFi-GAN V1 |
| Fre-GAN V1 (Proposed) |
| HiFi-GAN V2 |
| Fre-GAN V2 (Proposed) |

Script: The Roman type of all these printers is similar in character.

| System |
|---|
| Ground Truth |
| WaveNet |
| WaveGlow |
| HiFi-GAN V1 |
| Fre-GAN V1 (Proposed) |
| HiFi-GAN V2 |
| Fre-GAN V2 (Proposed) |

Script: The Roman letter was used side by side with the Gothic.

| System |
|---|
| Ground Truth |
| Fre-GAN V2 (500k steps) |
| w/o RCG |
| w/o NN upsampler |
| w/o mel condition |
| w/o RPD & RSD |
| w/o DWT |
| HiFi-GAN V2 (500k steps) |

Script: One very important matter in "setting up" for fine printing is the spacing. That is, the lateral distance of words from one another.

| System |
|---|
| Ground Truth |
| Fre-GAN V2 (500k steps) |
| w/o RCG |
| w/o NN upsampler |
| w/o mel condition |
| w/o RPD & RSD |
| w/o DWT |
| HiFi-GAN V2 (500k steps) |

Script: He seems to have taken the letter of the Elzevirs of the seventeenth century for his model.

| System |
|---|
| Ground Truth |
| Fre-GAN V2 (500k steps) |
| w/o RCG |
| w/o NN upsampler |
| w/o mel condition |
| w/o RPD & RSD |
| w/o DWT |
| HiFi-GAN V2 (500k steps) |

Script: The general solidity of a page is much to be sought for.

| System |
|---|
| Ground Truth |
| Fre-GAN V2 (500k steps) |
| w/o RCG |
| w/o NN upsampler |
| w/o mel condition |
| w/o RPD & RSD |
| w/o DWT |
| HiFi-GAN V2 (500k steps) |

Script: A further development of the Roman letter took place at Venice.

| System |
|---|
| Ground Truth |
| Fre-GAN V2 (500k steps) |
| w/o RCG |
| w/o NN upsampler |
| w/o mel condition |
| w/o RPD & RSD |
| w/o DWT |
| HiFi-GAN V2 (500k steps) |