One can also use a hybrid approach: first generate the symbolic music, then render it to raw audio using a wavenet conditioned on piano rolls,[13][14] an autoencoder,[15] or a GAN,[16] or do music style transfer, to transfer styles between classical and jazz music,[17] generate chiptune music,[18] or disentangle musical style and content. For a deeper dive into raw audio modelling, we recommend this excellent overview.[19]

Generating music at the audio level is challenging since the sequences are very long.[20][21][22][23] A typical 4-minute song at CD quality (44 kHz, 16-bit) has over 10 million timesteps.[24] For comparison, GPT-2 had 1,000 timesteps and OpenAI Five took tens of thousands of timesteps per game. Thus, to learn the high-level semantics of music, a model would have to deal with extremely long-range dependencies.

One way of addressing the long input problem is to use an autoencoder that compresses raw audio to a lower-dimensional space by discarding some of the perceptually irrelevant bits of information.
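As a back-of-the-envelope check of the sequence lengths above, the sketch below computes the raw timestep count for a 4-minute song and how much shorter the sequence becomes after autoencoder-style downsampling. The hop lengths are illustrative assumptions, not taken from any specific model.

```python
# Sequence-length arithmetic for raw audio, using the figures quoted in the text.
SAMPLE_RATE = 44_000   # "CD quality (44 kHz, 16-bit)" as stated above
SONG_SECONDS = 4 * 60  # a typical 4-minute song

raw_timesteps = SAMPLE_RATE * SONG_SECONDS
print(raw_timesteps)  # 10560000 -> "over 10 million timesteps"

# An autoencoder that downsamples audio by a hop length h leaves roughly
# T / h latent codes to model. Hop lengths here are hypothetical examples.
for hop in (8, 32, 128):
    print(f"hop {hop:>3}: {raw_timesteps // hop:,} latent timesteps")
```

Even an aggressive 128x compression still leaves a sequence far longer than GPT-2's 1,000-timestep context, which is why long-range modelling remains hard after compression.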