Cobra Forum


Title: How to Use Hugging Face Diffusion Models on Vultr Cloud GPU
Post by: mahesh on Jan 04, 2024, 01:22 AM
Introduction
Diffusers is a Hugging Face library that provides access to pre-trained diffusion models in the form of prepackaged pipelines. It offers tools for building and training diffusion models, and includes many of the core neural network models used as building blocks to create new pipelines.

This article explains how to use Hugging Face Diffusion models on a Vultr Cloud GPU server. You use a variety of models to generate image and audio results on the server.

Prerequisites
Before you begin:

Install Jupyter Notebook
Jupyter Notebook is an open-source application that offers a web-based development environment for creating documents with live code, visualizations, and equations. To run models interactively on your Vultr Cloud GPU server, install Jupyter Notebook as described in the steps below.

1.Install the pip package manager

$ sudo apt install python3-pip
2.Using pip, install the Notebook package

$ sudo pip install notebook
3.Open the Jupyter Notebook port 8888 through the firewall to allow access to the web interface

$ sudo ufw allow 8888
4.Start Jupyter Notebook

$ jupyter notebook --ip=0.0.0.0
The above command starts Jupyter Notebook and allows connections from all server interfaces as declared by 0.0.0.0. When successful, copy the generated access token displayed in your output:

 [I 2023-08-10 12:57:52.455 ServerApp] Jupyter Server 2.7.0 is running at:
 [I 2023-08-10 12:57:52.455 ServerApp] http://HOSTNAME:8888/tree?token=73631c92ba278d265aedeb3b199bd4d48e5ef5b2eed0ae06
 [I 2023-08-10 12:57:52.455 ServerApp]     http://127.0.0.1:8888/tree?token=73631c92ba278d265aedeb3b199bd4d48e5ef5b2eed0ae06
 [I 2023-08-10 12:57:52.455 ServerApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
If the command fails to run, close your SSH session and start it again so that the jupyter executable becomes available, then rerun the command

$ exit
5.In a web browser such as Chrome, access Jupyter Notebook using your access token. Replace the example IP address 192.0.2.100 with your actual server IP

http://192.0.2.100:8888/tree?token=YOUR_TOKEN
Using the Models
A pipeline is a high-level interface that packages the components required to perform different predefined tasks such as image generation, image-to-image generation, and audio generation. You can run a pipeline by specifying a task and letting it use the default settings for everything else. It's also possible to custom-build a pipeline by specifying the model, tokenizer, and other components, as shown in the sketch below.
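As a minimal sketch of the two approaches, assuming the diffusers package is installed and a CUDA GPU is available (the checkpoint runwayml/stable-diffusion-v1-5 is the same one used later in this article):

 from diffusers import DiffusionPipeline, StableDiffusionPipeline, EulerDiscreteScheduler
 import torch

 # Task-only approach: load a pipeline and keep all of its default components
 pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
 pipe = pipe.to("cuda")

 # Custom approach: override individual components, for example the scheduler
 custom_pipe = StableDiffusionPipeline.from_pretrained(
     "runwayml/stable-diffusion-v1-5",
     scheduler=EulerDiscreteScheduler.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="scheduler"),
     torch_dtype=torch.float16,
 ).to("cuda")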

The examples in this article are based on image and audio generation models and cover both pipeline approaches. Before loading new models in a Notebook session, it's recommended to close and restart the IPython kernel. This clears the old models from memory and frees up space for new models.

To run code in a Notebook session, add code to the code cell fields and press CTRL + ENTER, or click the Run button on the main toolbar.

Stable Diffusion V2.1 Model
The Stable Diffusion v2.1 model is fine-tuned from the stable-diffusion-2 checkpoint with 55 thousand additional steps on the same dataset, followed by another 155 thousand steps on 768x768 images. In this section, use the model as described in the steps below.

1.Open a new Jupyter Notebook file. Rename it to stablediffusion

2.Install the required global packages

!pip install scipy safetensors matplotlib
3.To use the model, import the following packages

 from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler
 import torch
The StableDiffusionPipeline class provides an interface to the Stable Diffusion v2.1 model for generating images. DPMSolverMultistepScheduler provides a fast scheduler that generates good outputs with around 20 steps, and torch enables support for GPU tensor computations.

4.Declare the model

model_id = "stabilityai/stable-diffusion-2-1"
 pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
 pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
 pipe = pipe.to("cuda")
The parameters passed to the from_pretrained() method are the model identifier model_id, which points to the stabilityai/stable-diffusion-2-1 repository, and torch_dtype=torch.float16, which loads the model weights in 16-bit floating-point precision to reduce memory usage.

In diffusion models, a scheduler adds noise to samples during training and iteratively de-noises them based on the model outputs during inference. It defines the update rule used to solve the underlying differential equation.
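As a short illustration, assuming the pipe object declared in the previous step, you can inspect the schedulers compatible with the pipeline and swap one in without reloading the model weights:

 from diffusers import EulerDiscreteScheduler

 # list the scheduler classes that are compatible with this pipeline
 print(pipe.scheduler.compatibles)

 # swap in a different scheduler; the model weights stay loaded
 pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config)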

5.Generate an image by providing a prompt as below. Replace An astronaut landing on planet with your desired prompt

 prompt = "An astronaut landing on planet"
 image = pipe(prompt).images
 image[0]
The above code declares the prompt and feeds it to the previously declared pipeline, then stores the result in the images attribute. A different image generates each time you run the cell. You can enhance the prompt with details such as the camera lens and environment, and include any other relevant information to refine your desired outcome.

Below are commonly used image generation parameters:

- prompt: the text description that guides the image generation
- negative_prompt: features to steer the generation away from
- num_inference_steps: the number of de-noising steps; more steps generally improve quality at the cost of speed
- guidance_scale: how closely the image follows the prompt
- height and width: the output image dimensions in pixels
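For example, a sketch that passes several of these parameters explicitly and saves the result to disk (the file name output.png is arbitrary):

 image = pipe(
     prompt="An astronaut landing on planet",
     negative_prompt="blurry, low quality",  # features to avoid
     num_inference_steps=25,                 # more steps, more detail, slower generation
     guidance_scale=7.5,                     # prompt adherence
     height=768,
     width=768,
 ).images[0]
 image.save("output.png")                    # write the generated image to disk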

The prompt An astronaut landing on planet generates an image like the one below:

(https://pix.cobrasoft.org/images/2024/01/04/KjWgwka.png)
AudioLDM Model
AudioLDM is a text-to-audio latent diffusion model (LDM) with 1.5 million training steps. The model incorporates over 700 CLAP audio dimensions and 400 million parameters. By taking a text prompt as input, it predicts the corresponding audio output, and generates realistic text-conditional sound effects, human speech, and music samples. Run the model to generate audio results as described in the steps below.

1.Open a new Jupyter Notebook file. Rename it to audioldm

2.In a new code cell, install the required packages

!pip install scipy
3.To use the model, import the necessary packages

 from diffusers import AudioLDMPipeline
 import torch
4.Declare the pipeline

model_id = "cvssp/audioldm-m-full"
 pipe = AudioLDMPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
 pipe = pipe.to("cuda")
In the above command, the AudioLDMPipeline instance uses the pre-trained model specified by model_id. torch_dtype=torch.float16 sets the data type to 16-bit floating-point which helps with memory efficiency and faster computations. The pipeline is then moved to the GPU using cuda for faster processing.

5.Generate audio by providing a prompt. Replace Piano and violin plays with your desired text prompt

 prompt = "Piano and violin plays"
 audio = pipe(prompt, num_inference_steps=10, audio_length_in_s=5.0).audios[0]
In the above command, the num_inference_steps parameter specifies the number of diffusion steps (iterations) used in the generation process, and audio_length_in_s sets the desired duration of the generated audio in seconds. The resulting audio outputs to the audio variable.

6.Display the generated audio

 from IPython.display import Audio
 Audio(audio, rate=16000)
The above code block allows you to play and listen to the generated audio using the Audio class from the IPython library. The rate=16000 argument sets the sampling rate of the audio to 16000 samples per second.

7.Save the audio to a file

 import scipy
 scipy.io.wavfile.write("file_name.wav", rate=16000, data=audio)
The above code saves the generated audio as a WAV file named file_name.wav using scipy.io.wavfile.write(). The rate=16000 argument ensures the file is written with the correct sampling rate.

When using the model, the following are the commonly used parameters:

- prompt: the text description of the desired audio
- negative_prompt: sounds to steer the generation away from
- num_inference_steps: the number of de-noising steps
- audio_length_in_s: the duration of the generated audio in seconds
- guidance_scale: how closely the audio follows the prompt
- num_waveforms_per_prompt: the number of audio clips to generate per prompt
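For example, a sketch that combines several of these parameters, assuming the pipe object declared earlier in this section:

 audio = pipe(
     prompt="Piano and violin plays",
     negative_prompt="low quality, distortion",  # sounds to avoid
     num_inference_steps=25,                     # more diffusion steps for a cleaner result
     audio_length_in_s=10.0,                     # clip duration in seconds
     guidance_scale=2.5,                         # prompt adherence
 ).audios[0]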

Other AudioLDM variants are available on Hugging Face, such as cvssp/audioldm-s-full, cvssp/audioldm-s-full-v2, and cvssp/audioldm-l-full, each trained with a different number of steps and parameter count.
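Switching variants only requires changing the model identifier. For example, a sketch that loads the large checkpoint instead:

 model_id = "cvssp/audioldm-l-full"  # the larger AudioLDM checkpoint
 pipe = AudioLDMPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
 pipe = pipe.to("cuda")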

Stable Diffusion ControlNet
ControlNet is a neural network structure that controls a pre-trained image diffusion model by adding extra conditions. The checkpoint used in this section corresponds to the ControlNet conditioned on human pose estimation. Its function is to accept a conditioning image that guides and constrains the image generation process.

It accepts scribbles, edge maps, pose key points, depth maps, segmentation maps, and normal maps as the condition input to guide the content of the generated image. In this section, apply the ControlNet model as described in the steps below.

1.Open a new Jupyter Notebook file. Rename it to sd-controlnet

2.Install the necessary packages

!pip install controlnet_aux matplotlib mediapipe
3.To use the model, import the required packages

 from PIL import Image
 from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, UniPCMultistepScheduler
 import torch
 from controlnet_aux import OpenposeDetector
 from diffusers.utils import load_image
4.Load an image. Replace https://example.com/image.png with your actual image source

 openpose = OpenposeDetector.from_pretrained('lllyasviel/ControlNet')
 image = load_image("https://example.com/image.png")
 image = openpose(image)
The above code block loads the pre-trained OpenposeDetector and processes the input image from the specified URL. The openpose object estimates the human pose in the image and returns the processed image with pose information. To read your image, make sure it has a supported file extension such as .png or .jpg.

5.Specify the model parameters with 16-bit weights

 controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16)
 pipe = StableDiffusionControlNetPipeline.from_pretrained(
     "runwayml/stable-diffusion-v1-5", controlnet=controlnet, safety_checker=None, torch_dtype=torch.float16)
 pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
The above code loads the ControlNetModel and the StableDiffusionControlNetPipeline. torch_dtype=torch.float16 sets the data type to 16-bit floating-point for improved memory efficiency and faster computations.

6.Input a text prompt to generate a new image using the model. Replace Chef in kitchen with your desired prompt

 pipe.enable_model_cpu_offload()
 image = pipe("Chef in kitchen", image, num_inference_steps=20).images
 image[0]
The above code block first enables model CPU offloading, which moves model components to the CPU when they are not in use to reduce GPU memory consumption. It then uses pipe to generate a new image based on the prompt Chef in kitchen, conditioned on the pose information extracted earlier. The num_inference_steps parameter sets the number of diffusion steps used in the generation process, and the generated image is stored in the image variable.

The following are the commonly used model parameters:

- prompt: the text description that guides the generation
- image: the conditioning image, in this case the pose image produced by the OpenposeDetector
- negative_prompt: features to steer the generation away from
- num_inference_steps: the number of de-noising steps
- guidance_scale: how closely the image follows the prompt
- controlnet_conditioning_scale: how strongly the conditioning image constrains the output
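For example, a sketch that passes several of these parameters, assuming the pipe and pose image from the previous steps (the file name chef.png is arbitrary):

 result = pipe(
     "Chef in kitchen",
     image,                              # the pose-conditioning image from the OpenposeDetector
     negative_prompt="lowres, bad anatomy",
     num_inference_steps=30,
     guidance_scale=7.5,                 # prompt adherence
     controlnet_conditioning_scale=1.0,  # strength of the pose condition
 ).images[0]
 result.save("chef.png")                 # write the generated image to disk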

The "Chef in kitchen" prompt generates an image like the one below:

(https://pix.cobrasoft.org/images/2024/01/04/LiYfctT.png)
Other ControlNet variants are available for the same base model, each conditioned on a different input type, such as lllyasviel/sd-controlnet-canny (edge maps), lllyasviel/sd-controlnet-depth (depth maps), lllyasviel/sd-controlnet-scribble (scribbles), lllyasviel/sd-controlnet-seg (segmentation maps), lllyasviel/sd-controlnet-mlsd (straight line detection), lllyasviel/sd-controlnet-normal (normal maps), and lllyasviel/sd-controlnet-hed (soft edge maps).
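As a minimal sketch of switching variants, reusing the imports from the earlier steps, you can pair the edge-map checkpoint with an edge detector from controlnet_aux (the image URL is the same placeholder used above):

 from controlnet_aux import CannyDetector

 # detect edges instead of human pose
 canny = CannyDetector()
 edge_image = canny(load_image("https://example.com/image.png"))

 # pair the matching ControlNet checkpoint; the rest of the pipeline setup is unchanged
 controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)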

Conclusion
In this article, you ran Hugging Face diffusion models on a Vultr Cloud GPU server and used them to generate image and audio results. To run other diffusion models, visit their respective model card pages for usage instructions. Additionally, studying a model's documentation provides valuable insights into its specific details and configuration options.