PyTorch Beginner's Tutorial (2) - Using a BP Neural Network to Recognize MNIST Handwritten Digits



In this article, we’ll implement a handwritten digit recognition model for the MNIST dataset using a basic BP (backpropagation) neural network. Let's dive right in.

Import Required Packages

```python
import os
import numpy as np
import torch
import torchvision
import matplotlib.pyplot as plt
from time import time
from torchvision import datasets, transforms
from torch import nn, optim
```

Set Up Transformations

We define a transform pipeline that converts each image to a tensor and then normalizes pixel values from [0, 1] to [-1, 1] (using a mean and standard deviation of 0.5):

```python
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,)),])
```
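
To see what this normalization does: ToTensor() scales pixel values to [0, 1], and Normalize((0.5,), (0.5,)) then maps them to [-1, 1] via (x - 0.5) / 0.5. A quick sanity check (a minimal sketch, separate from the pipeline above):

```python
import torch
from torchvision import transforms

# A fake 1x2x2 single-channel "image" with values already in [0, 1]
pixels = torch.tensor([[[0.0, 0.25], [0.5, 1.0]]])

normalize = transforms.Normalize((0.5,), (0.5,))
print(normalize(pixels))  # tensor([[[-1.0000, -0.5000], [ 0.0000,  1.0000]]])
```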

Load Training Data

We’ll load and, if necessary, download the training dataset using PyTorch’s API:

```python
train_set = datasets.MNIST('train_set',  # save here
                          download=not os.path.exists('train_set'), # download if not exists
                          train=True, # use training set
                          transform=transform # apply transform
                         )
train_set
```
```
Dataset MNIST
    Number of datapoints: 60000
    Root location: train_set
    Split: Train
    StandardTransform
Transform: Compose(
               ToTensor()
               Normalize(mean=(0.5,), std=(0.5,))
           )
```

After downloading, we’ll see the training set contains 60,000 images. Next, we download the test dataset:

```python
test_set = datasets.MNIST('test_set', 
                        download=not os.path.exists('test_set'),
                        train=False, 
                        transform=transform
                       )
test_set
```
```
Dataset MNIST
    Number of datapoints: 10000
    Root location: test_set
    Split: Test
    StandardTransform
Transform: Compose(
               ToTensor()
               Normalize(mean=(0.5,), std=(0.5,))
           )
```

The test dataset has 10,000 images.
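
If you want to inspect an individual sample, indexing the dataset returns an (image, label) pair with the transform already applied. An optional check:

```python
image, label = train_set[0]
print(image.shape)  # torch.Size([1, 28, 28])
print(image.min().item(), image.max().item())  # close to -1.0 and 1.0 after normalization
print(label)  # the digit class for this sample
```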

Create Data Loaders

Next, we’ll use DataLoader to manage batching for both training and testing datasets:

```python
train_loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=64, shuffle=True)

dataiter = iter(train_loader)
images, labels = next(dataiter)  # dataiter.next() no longer works in recent PyTorch versions

print(images.shape)
print(labels.shape)
```
```
torch.Size([64, 1, 28, 28])
torch.Size([64])
```

The output shows that each batch contains 64 grayscale images, each sized 28x28 pixels. Let’s display one image:

```python
plt.imshow(images[0].numpy().squeeze(), cmap='gray_r');
```



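To get a broader look at a batch, torchvision's make_grid utility can tile several images into a single figure. An optional sketch, reusing the images batch from above:

```python
import torchvision

# Tile the first 16 images of the batch into a 4x4 grid
grid = torchvision.utils.make_grid(images[:16], nrow=4, normalize=True)
# make_grid returns (C, H, W); matplotlib expects (H, W, C)
plt.imshow(grid.permute(1, 2, 0))
plt.axis('off')
plt.show()
```
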
With this, our initial setup is done.

Define the Neural Network

```python
class NeuralNetwork(nn.Module):

    def __init__(self):
        super().__init__()

        """
        Define the first linear layer:
        Input: image (28x28 pixels)
        Output: input to the first hidden layer with 128 units
        """
        self.linear1 = nn.Linear(28 * 28, 128)
        # Apply ReLU activation in the first hidden layer
        self.relu1 = nn.ReLU()

        """
        Define the second linear layer:
        Input: output from the first hidden layer
        Output: input to the second hidden layer with 64 units
        """
        self.linear2 = nn.Linear(128, 64)
        # Apply ReLU activation in the second hidden layer
        self.relu2 = nn.ReLU()

        """
        Define the third linear layer:
        Input: output from the second hidden layer
        Output: output layer with 10 units
        """
        self.linear3 = nn.Linear(64, 10)
        # Apply log-softmax at the output layer (pairs with NLLLoss during training)
        self.softmax = nn.LogSoftmax(dim=1)

        # Alternatively, define the model using nn.Sequential:
        self.model = nn.Sequential(
            nn.Linear(28 * 28, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, 10),
            nn.LogSoftmax(dim=1)
        )

    def forward(self, x):
        """
        Define the forward pass of the neural network
        x: image data with shape (batch_size, 1, 28, 28)
        """
        # Flatten x to (batch_size, 784)
        x = x.view(x.shape[0], -1)

        # Forward propagation
        x = self.linear1(x)
        x = self.relu1(x)
        x = self.linear2(x)
        x = self.relu2(x)
        x = self.linear3(x)
        x = self.softmax(x)

        # Alternatively, this could be done using x = self.model(x)

        return x
```
```python
model = NeuralNetwork()
```
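
Printing the model is a quick way to confirm the layer structure, and a one-liner counts the parameters. Note that because the class above defines both the explicit layers and the nn.Sequential alternative, the count is doubled; in a real model you would keep only one of the two:

```python
print(model)

# Each path has 784*128 + 128 + 128*64 + 64 + 64*10 + 10 = 109,386 parameters;
# defining both the explicit layers and self.model doubles this to 218,772
num_params = sum(p.numel() for p in model.parameters())
print(num_params)
```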

After defining the neural network, we set up the loss function. Since the network's final layer is LogSoftmax, we use Negative Log Likelihood Loss (NLLLoss), a common choice for classification tasks.

```python
criterion = nn.NLLLoss()
```
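
As a quick illustration of why this pairing works: NLLLoss expects log-probabilities as input, so LogSoftmax followed by NLLLoss computes the same value as CrossEntropyLoss applied to raw logits. A minimal sketch with fake data:

```python
logits = torch.randn(4, 10)           # fake scores for a batch of 4
targets = torch.tensor([3, 0, 7, 1])  # fake class labels

log_probs = nn.LogSoftmax(dim=1)(logits)
print(nn.NLLLoss()(log_probs, targets))        # same value...
print(nn.CrossEntropyLoss()(logits, targets))  # ...as this
```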

Then, we define the optimizer: Stochastic Gradient Descent with a learning rate of 0.003 and a momentum of 0.9. (SGD's momentum defaults to 0; setting it to 0.9 smooths the updates and speeds up convergence.)

```python
optimizer = optim.SGD(model.parameters(), lr=0.003, momentum=0.9)
```
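
For intuition on what momentum does, here is a toy scalar version of the update rule that SGD applies to each parameter (a conceptual sketch, not the library's actual code):

```python
lr, momentum = 0.003, 0.9
p, v = 1.0, 0.0  # a toy parameter and its velocity

for grad in [0.5, 0.5, 0.5]:  # pretend we see the same gradient three times
    v = momentum * v + grad   # velocity accumulates past gradients
    p = p - lr * v            # step along the smoothed direction
    print(p)                  # steps grow as the velocity builds up
```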

With the setup complete, we start training the dataset:

```python
time0 = time() # Record the start time
epochs = 15  # Train for 15 epochs
for e in range(epochs):
    running_loss = 0  # Initialize the loss for the epoch
    for images, labels in train_loader:
        # Forward pass to get predictions
        output = model(images) 

        # Compute the loss
        loss = criterion(output, labels) 

        # Backward pass
        loss.backward()

        # Update weights
        optimizer.step()

        # Clear gradients
        optimizer.zero_grad() 

        # Accumulate the loss
        running_loss += loss.item()
    # Print the average loss after each epoch
    print("Epoch {} - Training loss: {}".format(e, running_loss/len(train_loader)))

# Print total training time
print("\nTraining Time (in minutes) =",(time()-time0)/60)
```
```
Epoch 0 - Training loss: 0.6462286284117937
Epoch 1 - Training loss: 0.27847810615418056
...
Epoch 13 - Training loss: 0.056689855163551565
Epoch 14 - Training loss: 0.05361823974547586

Training Time (in minutes) = 2.9436919848124186
```

On my machine, training took just under 3 minutes, with the loss decreasing steadily.
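
At this point it is worth saving the trained weights so the model doesn't have to be retrained later. A minimal sketch (the filename mnist_model.pt is arbitrary):

```python
# Save only the learned parameters (the recommended approach)
torch.save(model.state_dict(), 'mnist_model.pt')

# Later: recreate the architecture and load the weights back
model2 = NeuralNetwork()
model2.load_state_dict(torch.load('mnist_model.pt'))
model2.eval()
```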

Next, we’ll evaluate the model:

```python
correct_count, all_count = 0, 0
model.eval() # Set the model to evaluation mode

# Load images batch by batch from the test_loader
for images,labels in test_loader:
    # Loop through the batch to evaluate each image
    for i in range(len(labels)):
        logps = model(images[i])  # Forward pass; returns log-probabilities of shape (1, 10)
        probab = list(logps.detach().numpy()[0]) # Convert the log-probabilities to a plain list
        pred_label = probab.index(max(probab)) # The index of the highest score is the predicted label
        true_label = labels.numpy()[i]
        if(true_label == pred_label): # Check if the prediction is correct
            correct_count += 1
        all_count += 1

print("Number Of Images Tested =", all_count)
print("Model Accuracy =", (correct_count/all_count))
```
```
Number Of Images Tested = 10000
Model Accuracy = 0.9741
```

The model achieved an accuracy of 97.41% on the test dataset.
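
As an aside, looping image by image works but is slow. The same accuracy can be computed batch-wise with argmax, wrapped in torch.no_grad() to skip gradient tracking. An equivalent, faster sketch:

```python
correct, total = 0, 0
model.eval()
with torch.no_grad():
    for images, labels in test_loader:
        log_probs = model(images)        # (batch_size, 10) log-probabilities
        preds = log_probs.argmax(dim=1)  # predicted class for each image
        correct += (preds == labels).sum().item()
        total += labels.size(0)

print("Model Accuracy =", correct / total)
```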


References

Handwritten Digit Recognition Using PyTorch — Intro To Neural Networks: https://towardsdatascience.com/handwritten-digit-mnist-pytorch-977b5338e627
