|
9 | 9 |
|
10 | 10 | This repository is the official PyTorch implementation of our AAAI-2022 [paper](https://arxiv.org/abs/2105.02446), in which we propose DiffSinger (for Singing-Voice-Synthesis) and DiffSpeech (for Text-to-Speech). |
11 | 11 |
|
12 | | -<table style="width:100%"> |
13 | | - <tr> |
14 | | - <th>DiffSinger/DiffSpeech at training</th> |
15 | | - <th>DiffSinger/DiffSpeech at inference</th> |
16 | | - </tr> |
17 | | - <tr> |
18 | | - <td><img src="resources/model_a.png" alt="Training" height="300"></td> |
19 | | - <td><img src="resources/model_b.png" alt="Inference" height="300"></td> |
20 | | - </tr> |
21 | | -</table> |
22 | 12 |
|
23 | 13 | :tada: :tada: :tada: **Updates**: |
24 | 14 | - Sep.11, 2022: :electric_plug: [DiffSinger-PN](docs/README-SVS-opencpop-pndm.md). Added the [PNDM](https://arxiv.org/abs/2202.09778) plug-in (ICLR 2022, from our laboratory) to accelerate DiffSinger freely. |
@@ -48,6 +38,17 @@ or pip install -r requirements_3090.txt (GPU 3090, CUDA 11.4) |
48 | 38 | - [Run DiffSpeech (TTS version)](docs/README-TTS.md). |
49 | 39 | - [Run DiffSinger (SVS version)](docs/README-SVS.md). |
50 | 40 |
|
| 41 | +## Overview |
| 42 | +| Mel Pipeline | Dataset | Pitch Input | F0 Prediction | Acceleration Method | Vocoder | |
| 43 | +| ------------------------------------------------------------------------------------------- | ---------------------------------------------------------| ----------------- | ------------- | --------------------------- | ----------------------------- | |
| 44 | +| [DiffSpeech (Text->F0, Text+F0->Mel, Mel->Wav)](docs/README-TTS.md) | [LJSpeech](https://keithito.com/LJ-Speech-Dataset/) | None | Explicit | Shallow Diffusion | NSF-HiFiGAN | |
| 45 | +| [DiffSinger (Lyric+F0->Mel, Mel->Wav)](docs/README-SVS-popcs.md) | [PopCS](https://github.com/MoonInTheRiver/DiffSinger) | Ground-Truth F0 | None | Shallow Diffusion | NSF-HiFiGAN | |
| 46 | +| [DiffSinger (Lyric+MIDI->F0, Lyric+F0->Mel, Mel->Wav)](docs/README-SVS-opencpop-cascade.md) | [OpenCpop](https://wenet.org.cn/opencpop/) | MIDI | Explicit | Shallow Diffusion | NSF-HiFiGAN | |
| 47 | +| [FFT-Singer (Lyric+MIDI->F0, Lyric+F0->Mel, Mel->Wav)](docs/README-SVS-opencpop-cascade.md) | [OpenCpop](https://wenet.org.cn/opencpop/) | MIDI | Explicit | N/A | NSF-HiFiGAN | |
| 48 | +| [DiffSinger (Lyric+MIDI->Mel, Mel->Wav)](docs/README-SVS-opencpop-e2e.md) | [OpenCpop](https://wenet.org.cn/opencpop/) | MIDI | Implicit | None | Pitch-Extractor + NSF-HiFiGAN | |
| 49 | +| [DiffSinger+PNDM (Lyric+MIDI->Mel, Mel->Wav)](docs/README-SVS-opencpop-pndm.md) | [OpenCpop](https://wenet.org.cn/opencpop/) | MIDI | Implicit | PLMS | Pitch-Extractor + NSF-HiFiGAN | |
| 50 | + |
| 51 | + |
51 | 52 | ## Tensorboard |
52 | 53 | ```sh |
53 | 54 | tensorboard --logdir_spec exp_name |
|