Cobra Forum

Plesk Panel => Web Application => Topic started by: mahesh on Jan 05, 2024, 05:28 AM

Title: AI Image Manipulation with Instruct Pix2Pix on Vultr Cloud GPU
Post by: mahesh on Jan 05, 2024, 05:28 AM

Introduction
Pix2Pix is a conditional Generative Adversarial Network (GAN) that maps input images to corresponding output images. InstructPix2Pix extends this idea: it is a Stable Diffusion-based model that edits an input image according to a written instruction or prompt, combining image-to-image translation with natural language understanding.

(https://pix.cobrasoft.org/images/2024/01/05/1-2.png)
This article explains how to carry out AI image manipulation with InstructPix2Pix on a Vultr Cloud GPU server. Using the InstructPix2Pix text-guided image manipulation model, you edit existing images with text prompts on your server as described in the steps below.

Prerequisites
Before you begin:

Deploy an Ubuntu A100 Cloud GPU server with at least 1/3 GPU, 20 GB VRAM, 3 vCPUs, and 30 GB memory.
Use SSH to access the server as a non-root user with sudo privileges.
Update the server.

Set Up the Server
In this section, set up the server to run the InstructPix2Pix model with the necessary dependency packages as described in the steps below.

1. In your home directory, create a new directory to store generated images.

 $ mkdir ~/images

2. Install PyTorch.

 $ pip3 install torch --index-url https://download.pytorch.org/whl/cu118

The above command installs PyTorch with pre-built CUDA 11.8 libraries.
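
Before moving on, it can help to confirm that the installed PyTorch build can see the GPU. The following is a minimal sketch rather than part of the original steps: the helper name describe_torch_cuda() is hypothetical, and it relies only on torch.cuda.is_available() and torch.cuda.get_device_name().

```python
import importlib.util


def describe_torch_cuda():
    """Return a short, human-readable summary of the PyTorch/CUDA setup."""
    # find_spec() checks whether the package exists without importing it.
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed"

    import torch

    if torch.cuda.is_available():
        # Device 0 is the first GPU visible to this process.
        return f"CUDA is available: {torch.cuda.get_device_name(0)}"
    return "PyTorch is installed, but no CUDA device is visible"


print(describe_torch_cuda())
```

Save the snippet to a file and run it with python3, or paste it into a notebook cell later. If it reports that no CUDA device is visible, check the GPU driver before loading the model.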

3. Install Jupyter Notebook.

 $ pip3 install notebook

4. UFW is active on Vultr servers by default. Configure the firewall to allow connections to the Jupyter Notebook port 8888.

 $ sudo ufw allow 8888

5. Reload the firewall to apply the changes.

 $ sudo ufw reload

6. Start Jupyter Notebook.

 $ jupyter notebook --ip=0.0.0.0

The above command starts Jupyter Notebook and listens for incoming connections on all server IP addresses.

When successful, your output should look like the one below:

[I 2023-07-30 13:06:10.312 ServerApp]     http://127.0.0.1:8888/tree?token=285dc780f9a3d7e483ca9df32015eff6c1b0cf2549271bc0
 [I 2023-07-30 13:06:10.312 ServerApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
 [W 2023-07-30 13:06:10.316 ServerApp] No web browser found: Error('could not locate runnable browser').
 [C 2023-07-30 13:06:10.316 ServerApp] 

     To access the server, open this file in a browser:
         file:///home/example-user/.local/share/jupyter/runtime/jpserver-11174-open.html
     Or copy and paste one of these URLs:
         http://HOSTNAME:8888/tree?token=285dc780f9a3d7e483ca9df32015eff6c1b0cf2549271bc0
         http://127.0.0.1:8888/tree?token=285dc780f9a3d7e483ca9df32015eff6c1b0cf2549271bc0

7. In a web browser such as Firefox, access Jupyter Notebook using the token generated in your command output. Replace SERVER-IP with your server IP address and TOKEN with the token value.

 http://SERVER-IP:8888/tree?token=TOKEN

(https://pix.cobrasoft.org/images/2024/01/05/UuPgRfj.png)
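
The URLs printed by Jupyter point to 127.0.0.1 or HOSTNAME, so the token has to be copied into a URL that uses the public server address. As a rough sketch of that step, the hypothetical helper jupyter_url() below extracts the token from a line of the log output and builds the address to open. The IP 192.0.2.10 is a placeholder from the documentation address range, not a real server.

```python
import re


def jupyter_url(log_text, server_ip, port=8888):
    """Build the browser URL from Jupyter's startup log and the server IP."""
    # The token is the hexadecimal string that follows "token=" in the log.
    match = re.search(r"[?&]token=([0-9a-f]+)", log_text)
    if match is None:
        raise ValueError("no token found in the log text")
    return f"http://{server_ip}:{port}/tree?token={match.group(1)}"


# Sample line taken from the output shown in this article.
sample = "http://127.0.0.1:8888/tree?token=285dc780f9a3d7e483ca9df32015eff6c1b0cf2549271bc0"
print(jupyter_url(sample, "192.0.2.10"))
```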

Import Libraries
In this section, use Jupyter Notebook to install the required libraries, create the model, and run it on the server as described in the steps below.

1. On the Jupyter Notebook interface, click New in the top right corner, and select Notebook from the list.

2. In the new Notebook window, enter each command in a cell, and click the play button to execute it.

3. Install the required model libraries.

 !pip3 install diffusers accelerate safetensors transformers pillow

The above command installs the following packages:

diffusers: Provides implementations of diffusion-based pipelines used for image generation, image manipulation, and other generative tasks.
accelerate: Helps PyTorch code run efficiently on different hardware setups, including single and multiple GPUs. It supports both training and inference.
safetensors: Provides a safe and fast file format for storing and loading model weights (tensors), as an alternative to pickle-based checkpoint files.
transformers: Provides a wide range of pre-trained models, architectures, and utilities, including the text encoder and tokenizer that the pipeline uses to interpret prompts.
Pillow: Provides image processing capabilities such as opening, converting, and saving images.

4. Click run, or press Control + Enter to install the libraries.
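
To confirm that the installation succeeded before importing anything, you can list the installed versions. This is a minimal sketch using the standard library importlib.metadata module; the helper name installed_versions() is hypothetical.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


required = ["diffusers", "accelerate", "safetensors", "transformers", "pillow"]
for name, version in installed_versions(required).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any package reported as NOT INSTALLED needs the pip3 command above to be run again.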

5. Import the required libraries.

 import PIL.Image
 import PIL.ImageOps
 import requests
 import torch
 from diffusers import StableDiffusionInstructPix2PixPipeline, EulerAncestralDiscreteScheduler

Below are the imported libraries:

PIL: The Python Imaging Library (PIL), provided by the Pillow package, handles image processing. Here, it opens the downloaded image and corrects its orientation and color mode.
requests: Downloads an image from a specified URL.
torch: The deep learning framework that runs the StableDiffusionInstructPix2PixPipeline model.
diffusers: Provides implementations of diffusion-based algorithms. The StableDiffusionInstructPix2PixPipeline class imported from it loads and runs the InstructPix2Pix model.
EulerAncestralDiscreteScheduler: A class from the diffusers library that implements the Euler ancestral sampling method used in diffusion models.

6. Upgrade Jupyter Notebook and the ipywidgets package.

 !pip3 install --upgrade jupyter ipywidgets

7. Click Kernel on the main Notebook bar, and select Restart Kernel from the dropdown list to use the updated packages.

(https://pix.cobrasoft.org/images/2024/01/05/6ppwHl6.png)
Process the Image
In this section, define how to download and prepare the input image before passing it to the model as described in the steps below.

1. Define the image URL.

 url="https://example.com/image.png"

Verify that your URL points directly to an image file, with a file extension such as .png or .jpg visible at the end of the URL path.
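
The extension check can also be done in code. The sketch below is an illustration under assumptions: the helper has_image_extension() and the list of accepted extensions are hypothetical choices, and the check only inspects the URL text without contacting the server.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumed set of common image extensions; adjust it to your needs.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp", ".bmp", ".gif"}


def has_image_extension(url):
    """Return True if the URL path ends with a known image extension."""
    # urlparse() separates the path from any query string or fragment.
    path = urlparse(url).path
    return PurePosixPath(path).suffix.lower() in IMAGE_EXTENSIONS


print(has_image_extension("https://example.com/image.png"))
```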

2. Define the download_image() function.

 def download_image(url):
     image = PIL.Image.open(requests.get(url, stream=True).raw)
     image = PIL.ImageOps.exif_transpose(image)
     image = image.convert("RGB")
     return image

The above code creates a new function named download_image() that takes the image URL as an argument. The function performs the following operations:

PIL.Image.open(requests.get(url, stream=True).raw): Sends an HTTP GET request to retrieve the image at the given url. requests.get() returns a response object, .raw gives access to the raw response stream, and PIL.Image.open() opens that stream as an image.
PIL.ImageOps.exif_transpose(): Rotates or flips the image according to its EXIF orientation tag, if present, so that the image displays the right way up.
image.convert("RGB"): Converts the downloaded image to the RGB (red, green, blue) color mode, a standard three-channel color representation that is compatible with the model.
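
The function above does not handle failed downloads. As a hedged alternative, the sketch below adds a timeout, an HTTP status check, and a URL scheme check. The name download_image_checked() and the 30 second default are assumptions for illustration, not part of the original article.

```python
from io import BytesIO
from urllib.parse import urlparse


def download_image_checked(url, timeout=30):
    """Download an image and return it as an upright RGB PIL image."""
    if urlparse(url).scheme not in ("http", "https"):
        raise ValueError(f"unsupported URL scheme: {url!r}")

    # Imported here so the function can be defined before the
    # libraries are installed in the notebook environment.
    import requests
    from PIL import Image, ImageOps

    response = requests.get(url, timeout=timeout)
    # Raise an exception for 4xx and 5xx responses instead of trying
    # to open an error page as an image.
    response.raise_for_status()

    image = Image.open(BytesIO(response.content))
    image = ImageOps.exif_transpose(image)
    return image.convert("RGB")
```

It can be used in place of download_image(url) in the later steps without any other changes.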

Set Up the Model
Define the Model Pipeline.

model_id = "timbrooks/instruct-pix2pix"
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(model_id, torch_dtype=torch.float16, safety_checker=None)
pipe.to("cuda")
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)

The above code configures an instance for the StableDiffusionInstructPix2PixPipeline class, including loading a pre-trained model, specifying the data type, enabling GPU acceleration, and configuring the scheduler for diffusion steps during the inference process.

The following are the pipeline parameters:

model_id: Stores the model identifier timbrooks/instruct-pix2pix.
pipe: Holds an instance of StableDiffusionInstructPix2PixPipeline.
from_pretrained(): A class method that initializes the pipeline with pre-trained model weights.
safety_checker: Disables the safety checker during inference when set to None.
torch_dtype=torch.float16: Loads the model weights in half-precision (16-bit) floating-point format. When omitted, the pipeline uses 32-bit full precision.
pipe.to("cuda"): Moves the pipeline to the CUDA device so that computations run on the GPU.
pipe.scheduler: Assigns a scheduler to the scheduler attribute of the pipeline. The EulerAncestralDiscreteScheduler class manages how the diffusion steps are scheduled during inference.
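
To see why the data type matters for VRAM, consider the space needed to hold the model weights alone. The sketch below is a back-of-the-envelope estimate: the parameter count is a hypothetical round number, not the measured size of InstructPix2Pix, and real GPU usage is higher because of activations and the CUDA context.

```python
# Bytes needed to store one value in each floating-point format.
BYTES_PER_VALUE = {"float16": 2, "float32": 4}


def weight_memory_gib(num_parameters, dtype):
    """Estimate the memory needed for the weights alone, in GiB."""
    return num_parameters * BYTES_PER_VALUE[dtype] / 1024**3


# Hypothetical parameter count, used only for illustration.
example_parameters = 1_000_000_000

for dtype in ("float16", "float32"):
    size = weight_memory_gib(example_parameters, dtype)
    print(f"{dtype}: {size:.2f} GiB")
```

Whatever the real parameter count is, half precision halves the space taken by the weights.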

Run the Model
Manipulate the image with a text prompt. Replace Hello World with an actual text prompt of your choice.

image = download_image(url)
prompt = "Hello World"
images = pipe(prompt, image=image, num_inference_steps=10, image_guidance_scale=1).images

The above code downloads the image, defines a prompt, and instructs the model to generate output based on the following parameters:

prompt: Describes the changes to apply to the image.
num_inference_steps: Specifies the number of diffusion (denoising) steps performed during inference. More steps generally improve quality but take longer.
image_guidance_scale: Determines how strongly the input image influences the generated output. Higher values keep the result closer to the original image.

The num_inference_steps and image_guidance_scale values above are balanced for this example, so changing them is not recommended unless the output does not meet your expectations.
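
If you do experiment with these values, checking them before a run avoids wasting GPU time on an invalid call. The sketch below is one possible way to do that: build_edit_kwargs() is a hypothetical helper, and the lower bound of 1 for image_guidance_scale is an assumption based on the diffusers documentation for this pipeline.

```python
def build_edit_kwargs(prompt, num_inference_steps=10, image_guidance_scale=1.0):
    """Validate the edit settings and return them as keyword arguments."""
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("prompt must be a non-empty string")
    if not isinstance(num_inference_steps, int) or num_inference_steps < 1:
        raise ValueError("num_inference_steps must be a positive integer")
    # Assumed lower bound; see the lead-in above.
    if image_guidance_scale < 1:
        raise ValueError("image_guidance_scale must be at least 1")
    return {
        "prompt": prompt,
        "num_inference_steps": num_inference_steps,
        "image_guidance_scale": image_guidance_scale,
    }


kwargs = build_edit_kwargs("make it snowy")
print(kwargs)
```

With the pipeline and image from the previous steps, the result can be passed as pipe(image=image, **kwargs).images.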

Display the Generated Images
1. Display the generated image.

 images[0]

The above command returns the first element of the images list. Because it is the last expression in the cell, Jupyter Notebook displays the image inline.

2. Save the generated image with your target name and destination. This article saves it to the images directory you created earlier. Replace example-user with your username and image.png with your desired filename.

 images[0].save("/home/example-user/images/image.png")

The above command uses the .save() method with the desired output path to save the generated image to disk.
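
Saving repeatedly to the same path overwrites earlier results. As a small sketch of one way around that, the hypothetical helper next_output_path() below picks the first unused filename in a directory. The demonstration runs in a temporary directory so that it does not touch your files.

```python
import tempfile
from pathlib import Path


def next_output_path(directory, stem="image", suffix=".png"):
    """Return the first path in the directory that does not exist yet."""
    directory = Path(directory).expanduser()
    directory.mkdir(parents=True, exist_ok=True)

    candidate = directory / f"{stem}{suffix}"
    counter = 1
    while candidate.exists():
        # Append a counter: image-1.png, image-2.png, and so on.
        candidate = directory / f"{stem}-{counter}{suffix}"
        counter += 1
    return candidate


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    first = next_output_path(tmp)
    first.touch()  # Pretend an image was saved here.
    second = next_output_path(tmp)
    print(first.name, second.name)
```

In the notebook, this could be used as images[0].save(next_output_path("~/images")).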

Download the Generated Images
In this section, use sftp to download generated images to your local computer as described below.

1. Open a new terminal session on your computer, and connect to your server using SFTP.

 $ sftp example-user@SERVER-IP

Replace example-user and SERVER-IP with your server details.

2. Switch to the images directory.

 sftp> cd images

3. Download the generated image from the directory. This article uses image.png as saved earlier.

 sftp> get image.png

The above command downloads image.png to the current working directory on your computer.

4. Open a file explorer window, then find and open the downloaded image.

Monitor GPU Resources
Check the GPU usage statistics.

!nvidia-smi

The above command displays the connected GPU devices along with their usage statistics, including the memory used by each running process.

Output:

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     10879      C   /usr/bin/python3                 6427MiB |
+-----------------------------------------------------------------------------+

As displayed in the above output, the InstructPix2Pix model loaded with torch.float16 uses about 6 GB of GPU memory.

For comparison, when torch_dtype is left at the default torch.float32:

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0    0    0      45337      C   /usr/bin/python3                16191MiB |
+-----------------------------------------------------------------------------+

As displayed in the output, the model with 32-bit precision takes up to 16 GB of VRAM, while the model with 16-bit precision takes about 6 GB, which makes it less precise but faster.
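
To compare runs without reading the table by eye, the process rows can be parsed in Python. This is a rough sketch that assumes the table layout shown above and process names without spaces; parse_process_rows() is a hypothetical helper, and other nvidia-smi versions may format the table differently.

```python
import re

# Matches one process row: GPU, GI, CI, PID, type, process name, memory.
PROCESS_ROW = re.compile(
    r"^\|\s+(?P<gpu>\d+)\s+\S+\s+\S+\s+(?P<pid>\d+)\s+(?P<type>\S+)\s+"
    r"(?P<name>\S+)\s+(?P<mib>\d+)MiB\s+\|$"
)


def parse_process_rows(text):
    """Extract the PID, process name, and memory use from the table."""
    rows = []
    for line in text.splitlines():
        match = PROCESS_ROW.match(line.strip())
        if match:
            rows.append({
                "pid": int(match["pid"]),
                "name": match["name"],
                "memory_mib": int(match["mib"]),
            })
    return rows


# Sample rows copied from the output in this article.
sample = """\
|=============================================================================|
|    0   N/A  N/A     10879      C   /usr/bin/python3                 6427MiB |
+-----------------------------------------------------------------------------+
"""

for row in parse_process_rows(sample):
    print(f"PID {row['pid']} ({row['name']}): {row['memory_mib']} MiB")
```

On the server, the text could come from subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout instead of the sample string.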

Conclusion
In this article, you carried out AI image manipulation with InstructPix2Pix on a Vultr Cloud GPU server. You set up the server, installed the necessary libraries, and edited images using text prompts. Running the InstructPix2Pix model on a Vultr Cloud GPU server speeds up inference.
