You can freely control the number of inference steps by adding one of these arguments to your experiment scripts: `--hparams="pndm_speedup=5"` or `--hparams="pndm_speedup=10"`.
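As a hedged illustration, the hparam is simply appended to whatever inference command you already use; the script and config paths below are placeholders, not verified paths from this repository:

```
# Hypothetical inference invocation -- only the --hparams flag is from the text above;
# replace the script/config/exp names with your actual ones.
python tasks/run.py --config <your_config.yaml> --exp_name <your_exp_name> --infer \
    --hparams="pndm_speedup=10"
```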
Contributed by @luping-liu.
## DiffSinger (MIDI SVS | B version | +PNDM)
### 0. Data Acquirement
For the Opencpop dataset: please strictly follow the instructions of [Opencpop](https://wenet.org.cn/opencpop/). We are not authorized to grant you access to Opencpop.
The pipeline below is designed for the Opencpop dataset:
### 1. Preparation
#### Data Preparation
a) Download and extract Opencpop, then create a link to the dataset folder: `ln -s /xxx/opencpop data/raw/`
b) Run the following scripts to pack the dataset for training/inference.

```
# `data/binary/opencpop-midi-dp` will be generated.
```
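The packing command itself is missing from this excerpt; as a hedged sketch, the step typically invokes the repository's binarizer on a MIDI config (the script path and config name here are assumptions, not verified against this repository):

```
# Hypothetical packing step -- verify the script and config paths in your checkout.
export PYTHONPATH=.
python data_gen/tts/bin/binarize.py --config <your_midi_config.yaml>
# Expected result (from the text above): data/binary/opencpop-midi-dp is generated.
```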
#### Vocoder Preparation
We provide a pre-trained model of [HifiGAN-Singing](https://github.com/MoonInTheRiver/DiffSinger/releases/download/pretrain-model/0109_hifigan_bigpopcs_hop128.zip), which is specially designed for SVS with the NSF mechanism.
Also, please unzip the pre-trained vocoder and [this companion model for the vocoder](https://github.com/MoonInTheRiver/DiffSinger/releases/download/pretrain-model/0102_xiaoma_pe.zip) into `checkpoints` before training your acoustic model.
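A minimal sketch of that placement step, using the archive names from the release URLs above (the exact directory layout inside each archive is an assumption; check it after extraction):

```
# Hypothetical extraction commands -- archive names come from the release links,
# but the resulting folder structure under checkpoints/ should be verified.
mkdir -p checkpoints
unzip 0109_hifigan_bigpopcs_hop128.zip -d checkpoints/
unzip 0102_xiaoma_pe.zip -d checkpoints/
```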
(Update: you can also move [a ckpt with more training steps](https://github.com/MoonInTheRiver/DiffSinger/releases/download/pretrain-model/model_ckpt_steps_1512000.ckpt) into this vocoder directory.)
This singing vocoder is trained on ~70 hours of singing data, so it can be viewed as a universal vocoder.

a) HifiGAN-Singing is trained on our [vocoder dataset](https://dl.acm.org/doi/abs/10.1145/3474085.3475437) and the training set of [PopCS](https://arxiv.org/abs/2105.02446). Opencpop is an out-of-domain dataset (unseen speaker), which may degrade audio quality; we are considering fine-tuning this vocoder on the training set of Opencpop.
b) In this version of the code, we use the melody frontend ([lyric + MIDI] -> [ph_dur]) to predict phoneme durations. The F0 curve is predicted implicitly together with the mel-spectrogram.
c) Example [generated audio](https://github.com/MoonInTheRiver/DiffSinger/blob/master/resources/demos_0221/DS/).
More generated audio demos can be found in [DiffSinger](https://github.com/MoonInTheRiver/DiffSinger/releases/download/pretrain-model/0228_opencpop_ds100_rel.zip).