Fre-GAN 2: Fast and Efficient Frequency-consistent Audio Synthesis

Sang-Hoon Lee, Ji-Hoon Kim, Kang-Eun Lee, Seong-Whan Lee

ABSTRACT

Although recent advances in neural vocoders have brought significant improvements, most of these models face a trade-off between audio quality and computational complexity. Because large models are impractical on low-resource devices, a practical neural vocoder must synthesize high-quality audio while remaining efficient. In this paper, we present Fre-GAN 2, a fast and efficient high-quality audio synthesis model. For fast synthesis, Fre-GAN 2 synthesizes only the low- and high-frequency parts of the audio, and the generator uses the inverse discrete wavelet transform (iDWT) to reconstruct the audio at the target resolution. We also introduce adversarial periodic feature distillation (APFD), which enables the model to synthesize high-quality audio with only a small number of parameters. The experimental results show the superiority of Fre-GAN 2 in audio quality. Furthermore, compared with Fre-GAN, Fre-GAN 2 achieves a 10.91× generation speedup and a 21.23× reduction in parameters.
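The abstract describes a generator that predicts only the low- and high-frequency parts of the audio and recovers the full-resolution waveform with an iDWT. As a minimal sketch, assuming an orthonormal Haar wavelet (this page does not state which wavelet Fre-GAN 2 uses), the NumPy code below splits a signal into two half-rate bands and recombines them exactly. The names `haar_dwt` and `haar_idwt` are hypothetical illustrations, not the authors' code.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)


def haar_dwt(x):
    """Single-level orthonormal Haar DWT of an even-length 1-D signal."""
    x = np.asarray(x, dtype=np.float64)
    if x.ndim != 1 or x.shape[0] % 2 != 0:
        raise ValueError("expected an even-length 1-D signal")
    even, odd = x[0::2], x[1::2]
    low = (even + odd) / SQRT2   # approximation band: low-frequency part at half rate
    high = (even - odd) / SQRT2  # detail band: high-frequency part at half rate
    return low, high


def haar_idwt(low, high):
    """Inverse Haar DWT: merge the two half-rate bands into one full-rate signal."""
    low = np.asarray(low, dtype=np.float64)
    high = np.asarray(high, dtype=np.float64)
    x = np.empty(2 * low.shape[0])
    x[0::2] = (low + high) / SQRT2  # recovers the even-indexed samples
    x[1::2] = (low - high) / SQRT2  # recovers the odd-indexed samples
    return x


# Round trip: in a vocoder, the generator would predict `low` and `high` directly.
audio = np.random.default_rng(0).standard_normal(16)
low, high = haar_dwt(audio)
print(low.shape, np.allclose(haar_idwt(low, high), audio))  # (8,) True
```

Because each band runs at half the output sample rate, a generator that predicts the two bands only has to produce half as many time steps per band, and the iDWT itself is a cheap, parameter-free operation.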

Audio Samples

Script: Printing in the only sense with which we are at present concerned differs from most if not from all the arts and crafts represented in the Exhibition.

Ground Truth

WaveNet

HiFi-GAN V1

HiFi-GAN V2

Fre-GAN V1

Fre-GAN 2 V1 (Single-level iDWT)

Fre-GAN 2 V1 (Multi-level iDWT)

Fre-GAN V2

Fre-GAN 2 V2 (Single-level iDWT)

Fre-GAN 2 V2 (Multi-level iDWT)

Fre-GAN 2* V2 (Multi-level iDWT, APFD)

Script: This is best furthered by the avoidance of irrational swellings and spiky projections and by the using of careful purity of line.

Ground Truth

WaveNet

HiFi-GAN V1

HiFi-GAN V2

Fre-GAN V1

Fre-GAN 2 V1 (Single-level iDWT)

Fre-GAN 2 V1 (Multi-level iDWT)

Fre-GAN V2

Fre-GAN 2 V2 (Single-level iDWT)

Fre-GAN 2 V2 (Multi-level iDWT)

Fre-GAN 2* V2 (Multi-level iDWT, APFD)

Script: The supply of which was, however, limited, and there were not always enough to give bedding to all. The stock was diminished by theft.

Ground Truth

WaveNet

HiFi-GAN V1

HiFi-GAN V2

Fre-GAN V1

Fre-GAN 2 V1 (Single-level iDWT)

Fre-GAN 2 V1 (Multi-level iDWT)

Fre-GAN V2

Fre-GAN 2 V2 (Single-level iDWT)

Fre-GAN 2 V2 (Multi-level iDWT)

Fre-GAN 2* V2 (Multi-level iDWT, APFD)

Script: He slept in the same bed with a highwayman on one side and a man charged with murder on the other.

Ground Truth

WaveNet

HiFi-GAN V1

HiFi-GAN V2

Fre-GAN V1

Fre-GAN 2 V1 (Single-level iDWT)

Fre-GAN 2 V1 (Multi-level iDWT)

Fre-GAN V2

Fre-GAN 2 V2 (Single-level iDWT)

Fre-GAN 2 V2 (Multi-level iDWT)

Fre-GAN 2* V2 (Multi-level iDWT, APFD)

Script: The committee seems to have fully realized even at this early date eighteen fifteen.

Ground Truth

WaveNet

HiFi-GAN V1

HiFi-GAN V2

Fre-GAN V1

Fre-GAN 2 V1 (Single-level iDWT)

Fre-GAN 2 V1 (Multi-level iDWT)

Fre-GAN V2

Fre-GAN 2 V2 (Single-level iDWT)

Fre-GAN 2 V2 (Multi-level iDWT)

Fre-GAN 2* V2 (Multi-level iDWT, APFD)

Ablation

Knowledge distillation method

Script: Their type is on the lines of the German and French rather than of the Roman printers.

Fre-GAN 2 V2 (Multi-level iDWT, 500k)

APFD (500k)

L1 distance (500k)

AFD (500k)

Sub-audio modelling

Script: The committee seems to have fully realized even at this early date eighteen fifteen.

Fre-GAN 2 V2 (Multi-level iDWT, 500k)

PQMF (500k)
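The sample labels distinguish single-level from multi-level iDWT, and this ablation compares the multi-level iDWT generator with PQMF sub-band modelling. This page does not describe how the multi-level variant is arranged, so the following is only a hedged sketch: assuming the standard dyadic scheme, in which only the low band is decomposed again at each level, and assuming a Haar wavelet, it shows how a coarse low-frequency band plus one detail band per level can be recombined into full-rate audio by chaining single-level iDWTs. All function names here are hypothetical.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)


def haar_analysis(x):
    """One Haar DWT step: split x into half-rate low and high bands."""
    even, odd = x[0::2], x[1::2]
    return (even + odd) / SQRT2, (even - odd) / SQRT2


def haar_synthesis(low, high):
    """One Haar iDWT step: merge half-rate low and high bands."""
    x = np.empty(2 * len(low))
    x[0::2] = (low + high) / SQRT2
    x[1::2] = (low - high) / SQRT2
    return x


def multilevel_dwt(x, levels):
    """Dyadic decomposition: only the low band is split again at each level."""
    x = np.asarray(x, dtype=np.float64)
    if len(x) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    low, details = x, []
    for _ in range(levels):
        low, high = haar_analysis(low)
        details.append(high)
    return low, details[::-1]  # detail bands ordered coarsest -> finest


def multilevel_idwt(low, details):
    """Rebuild the waveform by applying single-level iDWTs, coarsest level first."""
    x = np.asarray(low, dtype=np.float64)
    for high in details:
        x = haar_synthesis(x, np.asarray(high, dtype=np.float64))
    return x


audio = np.random.default_rng(0).standard_normal(64)
low, details = multilevel_dwt(audio, levels=3)
print(len(low), [len(d) for d in details])  # 8 [8, 16, 32]
print(np.allclose(multilevel_idwt(low, details), audio))  # True
```

In this sketch the coarsest bands run at one eighth of the output rate, so most of the signal can be produced at a low temporal resolution before the parameter-free iDWT steps upsample it. PQMF, by contrast, splits audio into equal-width sub-bands with a learned-free cosine-modulated filter bank; that alternative is not sketched here.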

Source Code and Other Samples

The source code will be published after the paper is accepted. Until then, please contact me (sh_lee@korea.ac.kr), and we will provide a download link for the source code.

(2022-02-14) The source code has been released [link]

We have uploaded all of the samples used for the subjective evaluation [link]