How I Made One of the World’s First 100% AI Songs

This amazing piece of AI music was created by a company named Aiva, which uses cutting-edge machine learning technology to create tunes like that.

There are also AI bands releasing great music online using similar machine learning programs.

But what you may not realize is that the original version of the song does not sound anything like what you hear. These companies and bands usually use music-editing software to vastly improve on what the AI comes up with, as shown in the behind-the-scenes video.

The main output of the AI is a melody, usually in the form of a MIDI piano track. The problem comes when you try to have the computer automatically add other instruments to it. It does not have a good sense of the beat, or of how to mix various instruments together to follow the piano melody. And, making things worse, MIDI files usually have a very computerish, fake sound, like old video game music. You would never confuse it with a real song you might hear on the radio.


Because of all of this, AI music companies and AI bands generally use a mixture of human input and AI input (see some good examples in this Google blog posting). And what I have talked about so far does not even get into all the issues involved with having the AI write lyrics and sing the song.

Yes, it totally sucks. I didn’t expect a top 40 hit, though. What matters is that it is one of the few songs I know of that was 100% made using AI from start to finish.

The music was automatically generated with Google’s open-source Magenta. Specifically, I used the pre-trained “trio” model, as described in their MusicVAE blog posting.
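Sampling from the pre-trained trio model roughly follows the pattern in Magenta's own MusicVAE examples. Here is a sketch of what that step can look like; the checkpoint filename, sample length, and temperature are my assumptions, not the exact settings used:

```python
from magenta.models.music_vae import configs
from magenta.models.music_vae.trained_model import TrainedModel
import note_seq

# Load the pre-trained 16-bar trio model (melody, bass, drums).
# The checkpoint .tar file must be downloaded separately from Magenta.
config = configs.CONFIG_MAP['hierdec-trio_16bar']
model = TrainedModel(config, batch_size=1,
                     checkpoint_dir_or_path='hierdec-trio_16bar.tar')

# Sample one 16-bar trio sequence and write it out as a MIDI file.
samples = model.sample(n=1, length=256, temperature=0.5)
note_seq.sequence_proto_to_midi_file(samples[0], 'song.mid')
```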

The lyrics were written using the open-source GPT-2 Simple natural language program, which I trained on poems and song lyrics, as described in my article about how I created a lyrics generator for this lyrics site. And the song title was chosen based on various non-AI rules, mostly relating to which line is used the most in the lyrics.
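The title-picking rule can be sketched in a few lines. The function name and the sample lyrics below are my own illustration of the "most-repeated line" idea, not the actual code:

```python
from collections import Counter

def pick_title(lyrics: str) -> str:
    """Pick a song title: the non-empty line repeated most often."""
    lines = [line.strip() for line in lyrics.splitlines() if line.strip()]
    most_common_line, _count = Counter(lines).most_common(1)[0]
    return most_common_line.title()

lyrics = """the night is cold
I walk alone
the night is cold
the night is cold"""
print(pick_title(lyrics))  # -> The Night Is Cold
```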

The vocals were one of the hardest parts. The synthesis software has to detect the pitch of the melody in the music, convert text to voice, and adjust the voice to match the notes of the melody. Then, on top of that, it has to figure out exactly where the virtual singer should sing each word in order to stay on beat.
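Adjusting the voice to match the notes comes down to mapping each MIDI note number to a target frequency. This is the standard equal-temperament formula, shown here only to illustrate the idea, not as the actual pipeline code:

```python
def midi_to_hz(note: int) -> float:
    """Convert a MIDI note number to frequency in Hz (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2 ** ((note - 69) / 12)

# Middle C is MIDI note 60, roughly 261.63 Hz.
print(round(midi_to_hz(60), 2))  # -> 261.63
```

The vocal synthesizer then pitch-shifts each syllable toward the frequency of the melody note it falls on.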


There was also the issue that I couldn't simply feed the MIDI music file to the vocal-synthesis program directly. First, I had to convert the original MIDI file into a new MIDI with only one channel containing the piano part, because that is usually the melody the singer needs to follow. I used an open-source program called banana-split to get this done.

Next, I used an open-source virtual singer program named midi2voice to create the vocals (a WAV file), using the lyrics and music as input. The final step was to combine that voice file with the original MIDI music file to produce the song. I did this by converting the MIDI file to WAV, and then using ffmpeg’s amerge command like this:
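A typical form of that command, with placeholder filenames (the exact arguments are my assumption), is:

```shell
# Merge the backing track and the synthesized vocals into one song.
# amerge combines the two audio streams; -ac 2 downmixes back to stereo.
ffmpeg -i music.wav -i vocals.wav \
       -filter_complex amerge=inputs=2 -ac 2 song.mp3
```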

In the end, I accomplished my goal, which was to set up an automated framework for creating an AI song with no human intervention needed. Not just the music, like everyone else is focusing on, but lyrics and vocals too, making it into a real song. I knew ahead of time it was not going to sound very good, but this is just a start. Now that I have a demo version, I can work on improving it. Maybe someday I will even launch an AI rock star, with CDs, merch, and virtual concerts. But I have a long way to go.


For comparison, here are some related projects:

Dadabots — They make AI music using raw audio instead of MIDI, so the results sound much more real. But much of the output does not sound good, so they need to manually curate many short snippets of music into a song.

Neural Story Teller (see the bottom of the page) — Part of the Songs From Pi project. They do the same kind of thing I did, but using very different methods, and explain in an academic paper how they did it.

Adversarially Trained End-to-end Korean Singing Voice Synthesis System — It is crazy how real this sounds, but they did not release the code for it, so it is way too hard for me to replicate.

Courtesy: Towards Data Science
