Cobra Forum

Plesk Panel => Web Application => Topic started by: mahesh on Dec 27, 2023, 07:28 AM

Title: AI Music Generation on Vultr Cloud GPU
Post by: mahesh on Dec 27, 2023, 07:28 AM

(https://pix.cobrasoft.org/images/2023/12/27/yIXwpHn.png)
Introduction
AI music generation is an innovative synthesis of art and science. By analyzing vast databases of existing musical compositions, AI models, often based on deep learning techniques, have become adept at generating music across different genres and styles. As these models and algorithms advance, you can generate appealing music and sounds in a few minutes.

AudioCraft and Bark are two open-source text-to-audio tools. Used together, they generate a soundtrack and matching lyrics audio from text prompts. In addition, a tool such as FFmpeg mixes the generated melody and lyrics into a single final output file.

This article explains how to perform AI music generation tasks on a Vultr Cloud GPU server. You generate a soundtrack and lyrics audio, then mix them to create a single output file you can download and share with personal licenses.

Prerequisites
Before you begin, make sure you:

Deploy a Debian NVIDIA A100 Cloud GPU server on Vultr with at least:
    1/7 GPU
    10GB GPU RAM
    15GB memory
Using SSH, access the server
Create a non-root user and switch to the user account
Update the server

Set Up the Server
1. Install FFmpeg

 $ sudo apt install ffmpeg

2. Install the Python virtual environment package

 $ sudo apt install python3.11-venv

3. Create a new Python virtual environment

 $ python3 -m venv myenv

4. Activate the environment

 $ source myenv/bin/activate

5. Upgrade the Pip package manager

 $ pip install --upgrade pip

6. Using pip, install the necessary dependency packages

 $ pip install torch==2.0.1 audiocraft==0.0.2 bark==0.1.5 protobuf==4.24.2

The above command installs the following packages:

PyTorch (torch): A deep-learning Python library
audiocraft: A PyTorch library used for deep learning on audio generation
bark: A transformer-based text-to-audio model
protobuf: A package required to load the AudioCraft model
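
Before you continue, you may want to confirm that the setup steps above worked. The following is a minimal sketch, not part of the original guide, that uses only the Python standard library. The function name check_setup is hypothetical, and the module names (torch, audiocraft, bark) are assumed to match the packages installed in step 6. Run it inside the activated virtual environment.

```python
import importlib.util
import shutil


def check_setup(modules=("torch", "audiocraft", "bark"), tools=("ffmpeg",)):
    """Report whether each Python module and command-line tool can be found.

    The default names are assumptions based on the packages installed above.
    Nothing is imported or executed; this only looks the names up.
    """
    report = {}
    for name in modules:
        # find_spec returns None when a top-level module is not installed
        report[name] = importlib.util.find_spec(name) is not None
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH
        report[tool] = shutil.which(tool) is not None
    return report
```

For example, print(check_setup()) shows a dictionary of names mapped to True or False. Any False entry suggests the matching install step needs to be repeated.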

Generate the Lyrics Audio
To create a full audio file, generate the lyrics audio first, then add a melody, as described in the steps below.

1. Access the Python Shell

 $ python3

2. Import the bark library and necessary packages to your session

 >>> from bark import SAMPLE_RATE, generate_audio, preload_models
 >>> from scipy.io.wavfile import write as write_wav

3. Download and load all bark text-to-audio models

 >>> preload_models()

The download process may take between 1 and 2 minutes to complete, and the total model size is above 10GB

4. Define your lyrics using the lyrics_text variable

 >>> lyrics_text = '''
        In the realm of the digital, where clouds converge, Vultr's brilliance shines, a power to emerge. 
        Bytes and data swirling in cosmic dance, Unveiling solutions, fate is given a chance.
     '''

Replace the above lyrics with your desired text

5. Call the Bark generate_audio function to generate the lyrics audio as an array

 >>> audio_array = generate_audio(lyrics_text)

6. Save the generated audio to a local file. Replace lyrics.wav with your desired filename

 >>> write_wav('lyrics.wav', SAMPLE_RATE, audio_array)

7. Exit the Python Shell

 >>> exit()

8. List files in your working directory

 $ ls

Verify that your generated lyrics audio file is available
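
The short verse above fits in a single generate_audio call. For longer lyrics, Bark is generally described as working best with roughly 13 seconds of speech per call, so one possible approach is to split the text into sentences, generate each one separately, and join the results. The following is a hedged sketch under that assumption. The names split_lyrics and generate_long_audio are hypothetical helpers, not part of Bark, and generate_fn is expected to be bark.generate_audio.

```python
import re


def split_lyrics(lyrics_text):
    """Split lyrics into sentence-sized chunks, dropping empty pieces."""
    # Split on whitespace that follows sentence-ending punctuation
    pieces = re.split(r"(?<=[.!?])\s+", lyrics_text.strip())
    return [p.strip() for p in pieces if p.strip()]


def generate_long_audio(lyrics_text, generate_fn):
    """Generate audio chunk by chunk and concatenate the arrays.

    generate_fn is assumed to behave like bark.generate_audio:
    it takes a string and returns a one-dimensional NumPy array.
    """
    import numpy as np  # imported lazily so split_lyrics works without NumPy

    chunks = [generate_fn(chunk) for chunk in split_lyrics(lyrics_text)]
    return np.concatenate(chunks)
```

In the Python Shell session from the steps above, you could call audio_array = generate_long_audio(lyrics_text, generate_audio) in place of step 5 and then save the result with write_wav as before.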

Generate the Soundtrack
To generate a soundtrack you can combine with your lyrics audio, choose your desired audiocraft pre-trained model to apply. As of September 2023, below are the available models:

Text-to-music only models:
A small model with 300M parameters
A medium model with 1.5B parameters
A large model with 3.3B parameters
A melody model with 1.5B parameters that supports melody-guided music generation

This section uses the melody model to generate a soundtrack based on your text prompt as described below.

1. Access the Python Shell

 $ python3

2. Import the audiocraft libraries

>>> from audiocraft.models import MusicGen
 >>> from audiocraft.data.audio import audio_write

3. Load your target model. This article uses the melody model

 >>> model = MusicGen.get_pretrained("melody")

4. Set the soundtrack length in seconds

 >>> model.set_generation_params(duration=14)

It's recommended to generate a soundtrack with the same length as your lyrics audio. The above code generates a 14-second track that matches the lyrics audio length

5. Define the soundtrack prompt with your desired text

 >>> melody_prompt = 'modern and forward-looking, with a blend of electronic and acoustic elements'

6. Generate the soundtrack using the generate function from the AudioCraft library

 >>> audio_array = model.generate([melody_prompt], progress=True)

7. Export the generated soundtrack to a file. Replace melody-track with your desired filename. The .wav extension is added automatically

 >>> audio_write('melody-track', audio_array[0].cpu(), model.sample_rate)

8. Close the Python console

 >>> exit()

9. List files in your working directory

 $ ls

Verify that a new melody-track.wav file is available
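
Step 4 uses a fixed duration of 14 seconds. If your lyrics audio has a different length, you can derive the value from the audio array instead: the duration in seconds is the number of samples divided by the sample rate. The following is a minimal sketch. The function name soundtrack_duration is hypothetical, and the example numbers assume a 24000 Hz sample rate, which is what Bark's SAMPLE_RATE constant is believed to be.

```python
import math


def soundtrack_duration(num_samples, sample_rate):
    """Return the audio length in whole seconds, rounded up.

    num_samples: number of samples in the lyrics audio array
    sample_rate: samples per second of that array
    """
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    # Round up so the soundtrack is never shorter than the lyrics
    return math.ceil(num_samples / sample_rate)


# Example: 336000 samples at 24000 Hz is exactly 14 seconds
print(soundtrack_duration(336000, 24000))  # 14
```

If you generate the lyrics and the soundtrack in the same Python session, keep the lyrics array in its own variable (for example lyrics_array, a hypothetical name) and call model.set_generation_params(duration=soundtrack_duration(len(lyrics_array), SAMPLE_RATE)). In a new session, scipy.io.wavfile.read('lyrics.wav') returns the sample rate and the data array you need for the same calculation.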

Mix the Generated Lyrics and the Soundtrack
After you generate and export the necessary audio files to your directory, use ffmpeg to combine the lyrics with your soundtrack to create a single output file as described below.

1. Using ffmpeg, normalize the lyrics audio file to a standard volume to match your soundtrack

 $ ffmpeg -i lyrics.wav -filter:a loudnorm lyrics_norm.wav

2. Normalize the soundtrack file volume. Use the melody-track.wav file you exported earlier

 $ ffmpeg -i melody-track.wav -filter:a loudnorm melody_norm.wav

3. Mix the normalized audio inputs to create a single stereo output file

 $ ffmpeg -i melody_norm.wav -i lyrics_norm.wav  -filter_complex "[0:a][1:a]amerge=inputs=2,pan=stereo|c0<c0+c1+c2+c3|c1<c0+c1+c2+c3[a]" -map "[a]" output.mp3

When successful, verify that a new output.mp3 file is available in your working directory

4. Deactivate the Python virtual environment

 $ deactivate
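
If you repeat the mix often, the three ffmpeg steps above can be scripted. The following is a rough sketch that builds the same commands as argument lists and runs them in order. The function names build_mix_commands and run_mix are hypothetical, the filter string is copied from step 3, and the default input names assume the files written in the earlier sections (lyrics.wav and melody-track.wav). It only needs the Python standard library and the system ffmpeg binary, so it also works after you deactivate the virtual environment.

```python
import subprocess

# Filter graph copied from the mixing step above
MIX_FILTER = (
    "[0:a][1:a]amerge=inputs=2,"
    "pan=stereo|c0<c0+c1+c2+c3|c1<c0+c1+c2+c3[a]"
)


def build_mix_commands(lyrics="lyrics.wav", melody="melody-track.wav",
                       output="output.mp3"):
    """Return the three ffmpeg commands as argument lists, in run order."""
    lyrics_norm = "lyrics_norm.wav"
    melody_norm = "melody_norm.wav"
    return [
        # 1. Normalize the lyrics volume
        ["ffmpeg", "-i", lyrics, "-filter:a", "loudnorm", lyrics_norm],
        # 2. Normalize the soundtrack volume
        ["ffmpeg", "-i", melody, "-filter:a", "loudnorm", melody_norm],
        # 3. Mix both normalized files into one stereo output
        ["ffmpeg", "-i", melody_norm, "-i", lyrics_norm,
         "-filter_complex", MIX_FILTER, "-map", "[a]", output],
    ]


def run_mix(**kwargs):
    """Run each command and stop at the first failure."""
    for cmd in build_mix_commands(**kwargs):
        subprocess.run(cmd, check=True)
```

Calling run_mix() with no arguments uses the default filenames. Passing the arguments as a list avoids shell quoting problems with the filter string, which contains characters such as | and < that a shell would otherwise interpret.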

Download the Generated Music File
To download a copy of your generated music file to your computer, use a file transfer protocol such as SFTP, FTP, Rsync, or SCP, preferably an encrypted one. In this section, use Secure Copy (SCP) to download the mixed music file to your computer as described below.

In a new terminal window, use scp to download the output.mp3 file from your user home directory to your computer's working directory

$ scp example-user@SERVER-IP:~/output.mp3 .

When the download is complete, find the file in your computer files, and open it using a media application such as VLC to listen to the generated music

Conclusion
In this article, you generated AI music on a Vultr Cloud GPU server. Depending on your use case, you can change the lyrics and soundtrack prompts to match your needs. The music generation process takes a few minutes to complete. For more information about the generation tools, visit the following documentation pages.

AudioCraft Repository
Bark Model Page
FFmpeg documentation
