Cobra Forum

Plesk Panel => Web Application => Topic started by: mahesh on Dec 27, 2023, 07:28 AM

Title: AI Music Generation on Vultr Cloud GPU
Post by: mahesh on Dec 27, 2023, 07:28 AM
(https://pix.cobrasoft.org/images/2023/12/27/yIXwpHn.png)
Introduction
AI music generation is an innovative synthesis of art and science. By analyzing vast databases of existing musical compositions, AI models, often based on deep learning techniques, have become adept at generating music across different genres and styles. With the advance of such models and algorithms, you can generate attractive music and sounds in a few minutes.

AudioCraft and Bark are two open-source text-to-audio tools. Used together, they generate a soundtrack and a lyrics audio track to go with it. In addition, a tool such as FFmpeg mixes the generated melody and lyrics to create a single final output file.

This article explains how to perform AI music generation tasks on a Vultr Cloud GPU server. You generate a soundtrack and lyrics, then mix them into a single output file that you can download and share under a personal license.

Prerequisites
Before you begin, make sure you:

Set Up the Server
1.Install FFmpeg

$ sudo apt install ffmpeg
2.Install the Python virtual environment package

$ sudo apt install python3.11-venv
3.Create a new Python Virtual environment

$ python3 -m venv myenv
4.Activate the environment

$ source myenv/bin/activate
5.Upgrade the Pip package manager

$ pip install --upgrade pip
6.Using pip, install the necessary dependency packages

$ pip install torch==2.0.1 audiocraft==0.0.2 bark==0.1.5 protobuf==4.24.2
The above command installs the following packages:

torch (PyTorch): A deep-learning Python library
audiocraft: A PyTorch library used for deep learning on audio generation
bark: A transformer-based text-to-audio model
protobuf: A package required for loading the AudioCraft model
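
To confirm that the virtual environment actually contains the pinned versions, a short check along the following lines may help. This is a minimal sketch: the `PINNED` table simply mirrors the pip command above, and `find_mismatches` is a hypothetical helper name, not part of any of the installed packages.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the pip install command in this article.
PINNED = {
    "torch": "2.0.1",
    "audiocraft": "0.0.2",
    "bark": "0.1.5",
    "protobuf": "4.24.2",
}


def find_mismatches(pinned):
    """Return {package: installed_version_or_None} for every package off its pin."""
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        # Ignore local build labels such as "+cu117" when comparing.
        if installed is None or installed.split("+")[0] != wanted:
            mismatches[name] = installed
    return mismatches


if __name__ == "__main__":
    for name, installed in find_mismatches(PINNED).items():
        print(f"{name}: expected {PINNED[name]}, found {installed or 'nothing'}")
```

Run it inside the activated environment. No output means every package matches its pin.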
Generate the Lyrics Audio
To create a full audio file, generate the lyrics audio first, then add a melody to it, as described in the steps below.

1.Access the Python Shell

$ python3
2.Import the bark library and necessary packages to your session

>>> from bark import SAMPLE_RATE, generate_audio, preload_models
>>> from scipy.io.wavfile import write as write_wav
3.Download and load all bark text-to-audio models

>>> preload_models()
The download process may take between 1 and 2 minutes to complete, and the total model size is more than 10 GB

4.Define your lyrics using the lyrics_text variable

>>> lyrics_text = '''
        In the realm of the digital, where clouds converge, Vultr's brilliance shines, a power to emerge.
        Bytes and data swirling in cosmic dance, Unveiling solutions, fate is given a chance.
     '''
Replace the above lyrics with your desired text

5.Call the Bark library generate_audio function to generate the lyrics audio as an array

>>> audio_array = generate_audio(lyrics_text)
6.Save the generated audio to a local file. Replace lyrics.wav with your desired filename

>>> write_wav('lyrics.wav', SAMPLE_RATE, audio_array)
7.Exit the Python Shell

>>> exit()
8.List files in your working directory

$ ls
Verify that your generated lyrics audio file is available
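
The interactive steps above can also be collected into a script. The sketch below is one possible arrangement, not the article's own method: `clean_lyrics` and `generate_lyrics_audio` are hypothetical helper names, and the Bark and SciPy imports are the same ones used in the session above. They are imported inside the function so the text helper can be used on its own.

```python
def clean_lyrics(text):
    """Collapse the newlines and indentation of a triple-quoted string into single spaces."""
    return " ".join(text.split())


def generate_lyrics_audio(lyrics, out_path="lyrics.wav"):
    """Generate speech for the lyrics with Bark and save it as a WAV file."""
    # Imported lazily so clean_lyrics works even where Bark is not installed.
    from bark import SAMPLE_RATE, generate_audio, preload_models
    from scipy.io.wavfile import write as write_wav

    preload_models()  # downloads the models on first use
    audio_array = generate_audio(clean_lyrics(lyrics))
    write_wav(out_path, SAMPLE_RATE, audio_array)
    return out_path
```

For example, `generate_lyrics_audio(lyrics_text)` would write lyrics.wav to the working directory, assuming the Bark models load successfully on the server.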

Generate the Sound Track
To generate a soundtrack that you can combine with your lyrics audio, choose the AudioCraft pre-trained model you want to use. As of September 2023, below are the available models:

This section uses the melody model to generate a soundtrack based on your text prompt as described below.

1.Access the Python Shell

$ python3
2.Import the audiocraft libraries

>>> from audiocraft.models import MusicGen
>>> from audiocraft.data.audio import audio_write
3.Load your target model. This article uses the melody model

>>> model = MusicGen.get_pretrained("melody")
4.Set the soundtrack length

>>> model.set_generation_params(duration=14)
It's recommended to generate a soundtrack with the same length as your lyrics audio. The above code generates a 14-second track to match the length of the lyrics audio

5.Define the soundtrack prompt with your desired text

>>> melody_prompt = 'modern and forward-looking, with a blend of electronic and acoustic elements'
6.Generate the soundtrack using the generate function from the AudioCraft library

>>> audio_array = model.generate([melody_prompt], progress=True)
7.Export the generated soundtrack to a file. Replace melody-track with your desired filename

>>> audio_write('melody-track', audio_array[0].cpu(), model.sample_rate)
8.Close the Python console

>>> exit()
9.List files in your working directory

$ ls
Verify that a new melody-track.wav file is available
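
Step 4 above recommends matching the soundtrack duration to the lyrics audio. One way to find that length is to read it from the lyrics WAV file. The sketch below parses the RIFF header directly instead of using Python's built-in wave module, because SciPy typically writes float arrays as 32-bit float WAV files, which the wave module may refuse to open. The function names `wav_duration_seconds` and `suggested_duration` are hypothetical.

```python
import math
import struct


def wav_duration_seconds(path):
    """Return the length of a RIFF/WAVE file in seconds, read from its header chunks."""
    with open(path, "rb") as f:
        riff, _size, wave_id = struct.unpack("<4sI4s", f.read(12))
        if riff != b"RIFF" or wave_id != b"WAVE":
            raise ValueError(f"{path} is not a RIFF/WAVE file")
        byte_rate = None
        while True:
            header = f.read(8)
            if len(header) < 8:
                raise ValueError("no data chunk found")
            chunk_id, chunk_size = struct.unpack("<4sI", header)
            if chunk_id == b"data":
                if not byte_rate:
                    raise ValueError("data chunk appears before a valid fmt chunk")
                return chunk_size / byte_rate
            if chunk_id == b"fmt ":
                fmt = f.read(chunk_size)
                # Bytes 8-11 of the fmt chunk hold the average bytes per second.
                byte_rate = struct.unpack_from("<I", fmt, 8)[0]
            else:
                f.seek(chunk_size, 1)  # skip chunks that are not needed here
            if chunk_size % 2:
                f.seek(1, 1)  # chunks are padded to an even length


def suggested_duration(path):
    """Round the lyrics length up to whole seconds for use as the soundtrack duration."""
    return math.ceil(wav_duration_seconds(path))
```

The result could then be passed as `model.set_generation_params(duration=suggested_duration('lyrics.wav'))` in place of the fixed value of 14.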

Mix the Generated Lyrics and the Sound Track
After you generate and export the necessary audio files to your directory, use ffmpeg to combine the lyrics with your soundtrack to create a single output file as described below.

1.Using ffmpeg, normalize the lyrics audio file to a standard volume to match your soundtrack

$ ffmpeg -i lyrics.wav -filter:a loudnorm lyrics_norm.wav
2.Normalize the soundtrack file volume

$ ffmpeg -i melody-track.wav -filter:a loudnorm melody_norm.wav
3.Mix the normalized audio inputs to create a single stereo output file

$ ffmpeg -i melody_norm.wav -i lyrics_norm.wav  -filter_complex "[0:a][1:a]amerge=inputs=2,pan=stereo|c0<c0+c1+c2+c3|c1<c0+c1+c2+c3[a]" -map "[a]" output.mp3
When successful, verify that a new output.mp3 file is available in your working directory

4.Deactivate the Python virtual environment

$ deactivate
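
If you repeat the mixing steps often, the three ffmpeg commands in this section can be wrapped in a small script. The following is a sketch under the assumption that ffmpeg is on the PATH and that the input files use the names from the earlier sections. The function names are hypothetical, and the filter strings are copied unchanged from the commands above.

```python
import subprocess


def loudnorm_cmd(src, dst):
    """Build the ffmpeg command that loudness-normalizes src into dst."""
    return ["ffmpeg", "-i", src, "-filter:a", "loudnorm", dst]


def mix_cmd(melody, lyrics, out):
    """Build the ffmpeg command that merges two inputs into one stereo file."""
    filter_graph = (
        "[0:a][1:a]amerge=inputs=2,"
        "pan=stereo|c0<c0+c1+c2+c3|c1<c0+c1+c2+c3[a]"
    )
    return ["ffmpeg", "-i", melody, "-i", lyrics,
            "-filter_complex", filter_graph, "-map", "[a]", out]


def mix_tracks(melody="melody-track.wav", lyrics="lyrics.wav", out="output.mp3"):
    """Normalize both tracks, then mix them, stopping at the first failure."""
    steps = [
        loudnorm_cmd(lyrics, "lyrics_norm.wav"),
        loudnorm_cmd(melody, "melody_norm.wav"),
        mix_cmd("melody_norm.wav", "lyrics_norm.wav", out),
    ]
    for cmd in steps:
        # check=True raises CalledProcessError if ffmpeg exits with an error.
        subprocess.run(cmd, check=True)
    return out
```

Passing each command as a list avoids shell quoting problems with the `|` and `<` characters in the pan filter.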
Download the Generated Music File
To download a copy of your generated music file to your computer, use a secure file transfer tool such as SFTP, Rsync over SSH, or SCP. In this section, use Secure Copy (SCP) to download the mixed music file to your computer as described below.

In a new terminal window, use scp to download the output.mp3 file from your user home directory to your computer's working directory

$ scp example-user@SERVER-IP:~/output.mp3 .
When the download is complete, find the file in your computer files, and open it using a media application such as VLC to listen to the generated music

Conclusion
In this article, you generated AI music on a Vultr Cloud GPU server. Depending on your use case, you can change the lyrics and soundtrack prompts to match your needs. The music generation process takes a few minutes to complete. For more information about the generation tools, visit the AudioCraft and Bark documentation pages.