PyTorch Applications (1) - Video Face Mosaic Processing with YOLO
Introduction
YoloV11 (GitHub) offers a user-friendly API that makes it easy to perform tasks such as object detection, instance segmentation, and pose estimation.
In this article, we’ll demonstrate how to use YoloV11 for object detection to apply a face mosaic effect to a video. This is designed as a beginner-friendly tutorial to help you get started with YoloV11 for object detection.
Note: This article does not cover model training.
Adding Mosaic to Faces in Images
Before diving into video processing, let’s begin with processing static images. This will help you familiarize yourself with the Yolo API.
First, install the official Yolo dependency, ultralytics. This library provides a simple interface for model training, inference, and more, making it easy to work with Yolo.
```python
!pip install ultralytics
```
Next, import the libraries you’ll need:
```python
import cv2                    # For image processing
from ultralytics import YOLO  # For accessing the YOLO model
```
Next, let's load the Yolo Model. Here, we use a Yolo model trained specifically for face detection from the yolov8-face project. You’ll need to download the yolov8n-face.pt model first and place it in your working directory.
```python
model = YOLO("yolov8n-face.pt")
```
Let’s download a sample test image for face detection:
```python
!wget https://ultralytics.com/images/bus.jpg
```
After downloading it, read the image using OpenCV and display it:
```python
image = cv2.imread("bus.jpg")
cv2.imshow("image", image)  # In a notebook (e.g. Colab), use google.colab.patches.cv2_imshow(image) instead
cv2.waitKey(0)
```
After obtaining the image, we can perform inference using the loaded YOLO model as follows:
```python
# YOLO can process multiple images at once, so the results are returned as a list.
results = model(image)
```
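As the comment above notes, the model also accepts a list of images and returns one result per input. A minimal sketch, assuming a second image named second.jpg exists (the filename is only for illustration):

```python
# Batch inference: pass a list of images, get one Results object back per image.
batch_results = model([image, cv2.imread("second.jpg")])  # "second.jpg" is a hypothetical file
print(len(batch_results))  # 2 — one entry per input image
```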
For an object detection task, the detection results are stored in the boxes attribute of results:
```python
results[0].boxes  # Since we only process one image, we use results[0].
```
The output would look like this:
```
ultralytics.engine.results.Boxes object with attributes:
# Since two faces are detected, all the data below contains two entries.
cls: tensor([0., 0.])             # Object class. Since this is a face detection model, there is only one class: face (0).
conf: tensor([0.7887, 0.7506])    # Confidence scores (0~1)
...
xyxy: tensor([[114., 417., 154., 467.],
              [270., 421., 307., 470.]])  # Bounding box coordinates of the detected faces.
...
```
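Before applying the mosaic, it helps to see how these attributes are read in code. A minimal sketch based on the attributes shown above:

```python
boxes = results[0].boxes
print(boxes.cls.tolist())         # e.g. [0.0, 0.0] — class index of each detection
print(boxes.conf.tolist())        # e.g. [0.7887, 0.7506] — confidence of each detection
print(boxes.xyxy.int().tolist())  # e.g. [[114, 417, 154, 467], [270, 421, 307, 470]]
```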
With the bounding box coordinates for the detected faces, we can apply a mosaic effect to the face regions. The following code demonstrates how to achieve this:
```python
mosaic_scale = 10  # Define the level of mosaic blur. Larger values result in a stronger blur.
boxes = results[0].boxes.xyxy

for i in range(len(boxes)):  # Iterate through all detected faces
    x1, y1, x2, y2 = boxes[i].int()

    # Extract the region of interest (ROI), i.e., the face area
    # For example, if the face size is 49x37, the ROI shape will be (49, 37, 3)
    roi = image[y1:y2, x1:x2]

    # Get the height and width of the face region
    h, w = roi.shape[:2]

    # Downscale the face region by a factor of {mosaic_scale}, e.g., 10x smaller
    # (max(1, ...) guards against a zero-sized resize when the face is very small)
    small_roi = cv2.resize(roi, (max(1, w // mosaic_scale), max(1, h // mosaic_scale)),
                           interpolation=cv2.INTER_LINEAR)

    # Upscale the downscaled face back to its original size, creating the mosaic effect
    mosaic_roi = cv2.resize(small_roi, (w, h), interpolation=cv2.INTER_NEAREST)

    # Replace the original face in the image with the mosaic face
    image[y1:y2, x1:x2] = mosaic_roi
```
After processing the image and applying the mosaic effect to the detected faces, let's display the final result using OpenCV:
```python
cv2.imshow("image", image)  # Or cv2_imshow(image) in Colab
cv2.waitKey(0)
```
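If you'd rather keep the result as a file (for example, when displaying a window isn't convenient), you can write it out with OpenCV; the output filename below is just an assumption:

```python
# Save the mosaicked image to disk; "bus_mosaic.jpg" is an arbitrary example name.
cv2.imwrite("bus_mosaic.jpg", image)
```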
Adding Mosaic to Faces in a Video
Applying a mosaic to faces in a video involves processing each frame individually, applying the mosaic effect, and then reassembling the video. The process can be broken down into the following steps:
- Extract the audio from the video using ffmpeg.
- Read the video frames one by one and apply the mosaic effect to faces.
- Combine the processed frames back into a video (at this point, the video is silent).
- Merge the extracted audio with the silent video to produce the final output.
Let’s proceed with a code demonstration.
We start by importing the necessary libraries:
```python
import ffmpeg  # Requires ffmpeg-python to be installed
import cv2
from numpy import ndarray
from ultralytics import YOLO
from tqdm import tqdm
```
Next, we encapsulate the logic for applying a mosaic effect to an image into a reusable function. This will allow us to blur faces in video frames efficiently.
```python
def mosaic_image(model, image: ndarray, mosaic_scale=10) -> ndarray:
    # Run face detection on the frame (verbose=False silences per-frame logging)
    results = model(image, verbose=False)
    boxes = results[0].boxes.xyxy

    for i in range(len(boxes)):
        x1, y1, x2, y2 = boxes[i].int()
        roi = image[y1:y2, x1:x2]
        h, w = roi.shape[:2]
        # Downscale then upscale the face region to create the mosaic effect
        # (max(1, ...) guards against a zero-sized resize when the face is very small)
        small_roi = cv2.resize(roi, (max(1, w // mosaic_scale), max(1, h // mosaic_scale)),
                               interpolation=cv2.INTER_LINEAR)
        mosaic_roi = cv2.resize(small_roi, (w, h), interpolation=cv2.INTER_NEAREST)
        image[y1:y2, x1:x2] = mosaic_roi

    return image
```
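As a quick sanity check, you can run the function on the static image from the previous section before moving on to video (this sketch assumes bus.jpg is still in the working directory and the face model is already loaded as model):

```python
# One-off test of mosaic_image() on a single image.
test_image = cv2.imread("bus.jpg")
test_image = mosaic_image(model, test_image)
cv2.imwrite("bus_mosaic_test.jpg", test_image)  # "bus_mosaic_test.jpg" is an arbitrary example name
```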
Now that we have the mosaic function ready, we can move on to processing the video.
First, define the paths for the input video, the output video, and the intermediate files, and load the YOLO model for face detection.
```python
# You can download the video from the link:
# https://github.com/iioSnail/pytorch_deep_learning_examples/tree/main/asserts/mp4
input_video = "kunkun.mp4"
tmp_audio = "tmp.wav"
tmp_video = "tmp_kunkun.mp4"
output_video = "mosaic_kunkun.mp4"

model = YOLO("yolov8n-face.pt")
```
Next, use ffmpeg to extract the audio from the original video.
```python
ffmpeg.input(input_video).output(tmp_audio, format='wav').run(overwrite_output=True)
```
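If you want to confirm the extraction worked, ffmpeg-python can probe the resulting file; a minimal sketch:

```python
# Inspect the extracted audio file; probe() returns a dict of format/stream metadata.
info = ffmpeg.probe(tmp_audio)
print(info["format"]["duration"])  # Duration of the extracted audio in seconds
```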
The following Python code demonstrates how to process a video frame-by-frame, apply an effect (like a mosaic), and save it as a new video.
```python
# Open the input video file
cap = cv2.VideoCapture(input_video)
if not cap.isOpened():
    print("Error: Could not open video file.")
    exit(0)

# Retrieve video properties: width, height, frame rate, and total frame count
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
fps = cap.get(cv2.CAP_PROP_FPS)
# Total number of frames
n_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))

# Configure the video writer for saving the processed frames
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
out = cv2.VideoWriter(tmp_video, fourcc, fps, (width, height))

# Initialize a progress bar
pro_bar = tqdm(total=n_frames)

# Process the video frame-by-frame
while True:
    # Read the current frame
    # ret: True if a frame is read successfully, False otherwise.
    ret, frame = cap.read()
    if not ret:
        # If no more frames, exit the loop
        break

    # Apply mosaic processing to the current frame
    frame = mosaic_image(model, frame)

    # Write the processed frame to the output video
    out.write(frame)

    # Update the progress bar
    pro_bar.update(1)

# Release resources
cap.release()
out.release()
pro_bar.close()
```
After processing the frames, the resulting video is silent. To add back the original audio, the following code uses the ffmpeg library to merge the audio file with the newly created video:
```python
video_stream = ffmpeg.input(tmp_video)
audio_stream = ffmpeg.input(tmp_audio)
ffmpeg.output(video_stream, audio_stream, output_video, vcodec="copy", acodec='aac').run(overwrite_output=True)
```
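Once the final video has been written, the intermediate files are no longer needed. You can optionally remove them; this is a small housekeeping sketch, not part of the original pipeline:

```python
import os

# Delete the intermediate audio file and the silent video.
for tmp_file in (tmp_audio, tmp_video):
    if os.path.exists(tmp_file):
        os.remove(tmp_file)
```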
Final result: