PyTorch Beginner's Tutorial (2) - Using a BP Neural Network to Recognize MNIST Handwritten Digits
In this article, we’ll implement a handwritten digit recognition model for the MNIST dataset using a basic BP (backpropagation) neural network. Let's dive right in.
Import Required Packages
```python
import os
import numpy as np
import torch
import torchvision
import matplotlib.pyplot as plt
from time import time
from torchvision import datasets, transforms
from torch import nn, optim
```
Set Up Transformations
We define a transform object that converts each image to a tensor and normalizes its pixel values:
```python
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,)),
])
```
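To make the effect of this transform concrete, here is a minimal plain-Python sketch of the arithmetic involved. It assumes ToTensor has already scaled raw 0..255 intensities to the range 0.0..1.0, after which Normalize applies (x - mean) / std to each pixel. The helper name `normalize_pixel` is hypothetical and exists only for this illustration.

```python
def normalize_pixel(x, mean=0.5, std=0.5):
    """Apply the per-pixel formula used by Normalize: (x - mean) / std."""
    return (x - mean) / std

# ToTensor first scales raw 0..255 intensities to 0.0..1.0
raw_pixels = [0, 128, 255]
scaled = [p / 255 for p in raw_pixels]

# With mean=0.5 and std=0.5, the range 0.0..1.0 is mapped to -1.0..1.0
normalized = [normalize_pixel(s) for s in scaled]
print(normalized)
```

With these settings, black pixels end up at -1.0, white pixels at 1.0, and mid-gray close to 0.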
Load Training Data
We’ll load and, if necessary, download the training dataset using PyTorch’s API:
```python
train_set = datasets.MNIST(
    'train_set',                               # save here
    download=not os.path.exists('train_set'),  # download if not exists
    train=True,                                # use training set
    transform=transform                        # apply transform
)
train_set
```
```
Dataset MNIST
    Number of datapoints: 60000
    Root location: train_set
    Split: Train
    StandardTransform
Transform: Compose(
               ToTensor()
               Normalize(mean=(0.5,), std=(0.5,))
           )
```
After downloading, we’ll see the training set contains 60,000 images. Next, we download the test dataset:
```python
test_set = datasets.MNIST(
    'test_set',
    download=not os.path.exists('test_set'),
    train=False,
    transform=transform
)
test_set
```
```
Dataset MNIST
    Number of datapoints: 10000
    Root location: test_set
    Split: Test
    StandardTransform
Transform: Compose(
               ToTensor()
               Normalize(mean=(0.5,), std=(0.5,))
           )
```
The test dataset has 10,000 images.
Create Data Loaders
Next, we’ll use DataLoader to manage batching for both the training and test datasets:
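```python
train_loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=64, shuffle=True)

# Fetch a single batch to inspect its shape
dataiter = iter(train_loader)
images, labels = next(dataiter)

print(images.shape)
print(labels.shape)
```
```
torch.Size([64, 1, 28, 28])
torch.Size([64])
```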
The output shows that each batch contains 64 grayscale images, each sized 28x28 pixels. Let’s display one image:
```python
plt.imshow(images[0].numpy().squeeze(), cmap='gray_r');
```
With this, our initial setup is done.
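As a quick sanity check on the batching, the sketch below works out how many batches each loader yields per pass. It assumes the DataLoader default of keeping the final partial batch (drop_last=False); the helper name `batch_stats` is hypothetical.

```python
import math

def batch_stats(num_samples, batch_size):
    """Return (number of batches, size of the last batch),
    assuming the final partial batch is kept."""
    num_batches = math.ceil(num_samples / batch_size)
    last_batch = num_samples - (num_batches - 1) * batch_size
    return num_batches, last_batch

print(batch_stats(60000, 64))  # training set
print(batch_stats(10000, 64))  # test set
```

Under that assumption, the training loader produces 938 batches (the last one holding 32 images) and the test loader produces 157 batches (the last one holding 16), so not every batch has exactly 64 images.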
Define the Neural Network
```python
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()

        # First linear layer:
        #   input:  flattened image (28x28 = 784 pixels)
        #   output: first hidden layer with 128 units
        self.linear1 = nn.Linear(28 * 28, 128)
        # ReLU activation for the first hidden layer
        self.relu1 = nn.ReLU()

        # Second linear layer:
        #   input:  output of the first hidden layer
        #   output: second hidden layer with 64 units
        self.linear2 = nn.Linear(128, 64)
        # ReLU activation for the second hidden layer
        self.relu2 = nn.ReLU()

        # Third linear layer:
        #   input:  output of the second hidden layer
        #   output: output layer with 10 units (one per digit)
        self.linear3 = nn.Linear(64, 10)
        # Log-softmax over the class dimension (pairs with NLLLoss)
        self.softmax = nn.LogSoftmax(dim=1)

        # Alternatively, define the same stack with nn.Sequential.
        # Left commented out: defining both would create a second,
        # unused set of weights.
        # self.model = nn.Sequential(
        #     nn.Linear(28 * 28, 128),
        #     nn.ReLU(),
        #     nn.Linear(128, 64),
        #     nn.ReLU(),
        #     nn.Linear(64, 10),
        #     nn.LogSoftmax(dim=1)
        # )

    def forward(self, x):
        # x: image data with shape (batch_size, 1, 28, 28), e.g. (64, 1, 28, 28)
        # Flatten each image: (batch_size, 1, 28, 28) -> (batch_size, 784)
        x = x.view(x.shape[0], -1)

        # Forward propagation
        x = self.linear1(x)
        x = self.relu1(x)
        x = self.linear2(x)
        x = self.relu2(x)
        x = self.linear3(x)
        x = self.softmax(x)

        # With the nn.Sequential version, this would be x = self.model(x)
        return x
```
```python
model = NeuralNetwork()
```
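To get a feel for the size of this network, the following sketch counts the trainable parameters of the three fully connected layers (784 -> 128 -> 64 -> 10) used in the forward pass. It relies only on the fact that a linear layer with bias has n_in * n_out weights plus n_out biases; the helper name `linear_params` is hypothetical.

```python
# Layer widths: input, two hidden layers, output
layer_sizes = [28 * 28, 128, 64, 10]

def linear_params(n_in, n_out):
    """Parameters of one fully connected layer: weights plus biases."""
    return n_in * n_out + n_out

# Pair up consecutive widths: (784, 128), (128, 64), (64, 10)
per_layer = [linear_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)
print(per_layer, total)
```

This gives 100,480 + 8,256 + 650 = 109,386 parameters, most of them in the first layer.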
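After defining the neural network, we set up the loss function. We use Negative Log Likelihood Loss (NLLLoss), which expects log-probabilities as input and therefore pairs with the LogSoftmax output layer. Together they compute the same quantity as cross-entropy loss on raw scores, the standard choice for classification tasks.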
```python
criterion = nn.NLLLoss()
```
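The following plain-Python sketch shows, for a single sample, what the log-softmax output layer and the negative log likelihood loss compute. It is an illustration of the math under simplifying assumptions rather than PyTorch's implementation: the function names and example scores are made up, and the real NLLLoss averages over the batch by default.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of raw scores."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]

def nll_loss(log_probs, target):
    """Negative log likelihood of the target class for one sample."""
    return -log_probs[target]

# Hypothetical raw scores for a 3-class problem
logits = [2.0, 1.0, 0.1]
log_probs = log_softmax(logits)

loss_if_class_0 = nll_loss(log_probs, target=0)  # model favors class 0
loss_if_class_2 = nll_loss(log_probs, target=2)  # model disfavors class 2
print(loss_if_class_0, loss_if_class_2)
```

The loss is small when the model assigns a high probability to the true class and grows as that probability shrinks.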
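Then, we define the optimizer: stochastic gradient descent (SGD) with a learning rate of 0.003 and a momentum of 0.9. Note that momentum is not on by default (optim.SGD defaults to momentum=0), so it is set explicitly here. Its purpose is to speed up and smooth convergence by accumulating past gradients, not to reduce overfitting.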
```python
optimizer = optim.SGD(model.parameters(), lr=0.003, momentum=0.9)
```
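To illustrate what the momentum term does, here is a small sketch that minimizes the toy function f(w) = w**2 with and without momentum. It follows the update form velocity = momentum * velocity + grad, then w = w - lr * velocity; the function name `sgd_minimize` and the toy problem are assumptions made for this example, not part of the tutorial's model.

```python
def sgd_minimize(lr, momentum, steps=100, w0=1.0):
    """Minimize f(w) = w**2 with SGD-style updates and return the final w."""
    w, velocity = w0, 0.0
    for _ in range(steps):
        grad = 2.0 * w                         # derivative of w**2
        velocity = momentum * velocity + grad  # accumulate past gradients
        w = w - lr * velocity                  # parameter update
    return w

w_plain = sgd_minimize(lr=0.003, momentum=0.0)
w_momentum = sgd_minimize(lr=0.003, momentum=0.9)

# Distance from the minimum at w = 0 after the same number of steps
print(abs(w_plain), abs(w_momentum))
```

With the same learning rate and step count, the run with momentum ends up much closer to the minimum, which is why it helps with a learning rate as small as 0.003.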
With the setup complete, we start training the model:
```python
time0 = time()  # Record the start time
epochs = 15     # Train for 15 epochs

for e in range(epochs):
    running_loss = 0  # Accumulated loss for this epoch
    for images, labels in train_loader:
        # Clear gradients left over from the previous step
        optimizer.zero_grad()
        # Forward pass to get predictions
        output = model(images)
        # Compute the loss
        loss = criterion(output, labels)
        # Backward pass
        loss.backward()
        # Update weights
        optimizer.step()
        # Accumulate the loss
        running_loss += loss.item()
    # Print the mean batch loss after each epoch
    print("Epoch {} - Training loss: {}".format(e, running_loss / len(train_loader)))

# Print total training time
print("\nTraining Time (in minutes) =", (time() - time0) / 60)
```
```
Epoch 0 - Training loss: 0.6462286284117937
Epoch 1 - Training loss: 0.27847810615418056
...
Epoch 13 - Training loss: 0.056689855163551565
Epoch 14 - Training loss: 0.05361823974547586

Training Time (in minutes) = 2.9436919848124186
```
On my machine, training took just under 3 minutes, with the loss decreasing steadily.
Next, we’ll evaluate the model:
```python
correct_count, all_count = 0, 0
model.eval()  # Set the model to evaluation mode

with torch.no_grad():  # Gradients are not needed for evaluation
    # Load images batch by batch from the test_loader
    for images, labels in test_loader:
        # Loop through the batch to evaluate each image
        for i in range(len(labels)):
            # Forward pass: log-probabilities with shape (1, 10)
            logps = model(images[i])
            # Convert the prediction to a list of 10 log-probabilities
            probab = list(logps.numpy()[0])
            # The index of the largest value is the predicted label
            pred_label = probab.index(max(probab))
            true_label = labels.numpy()[i]
            if true_label == pred_label:  # Check if the prediction is correct
                correct_count += 1
            all_count += 1

print("Number Of Images Tested =", all_count)
print("Model Accuracy =", (correct_count / all_count))
```
```
Number Of Images Tested = 10000
Model Accuracy = 0.9741
```
The model achieved an accuracy of 97.41% on the test dataset.
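Because the network ends in a log-softmax layer, its outputs are log-probabilities rather than probabilities. The sketch below, using made-up values for a 3-class problem, shows that the predicted label is simply the index of the largest log-probability and that exponentiating recovers the probability itself. The helper name `predict` is hypothetical.

```python
import math

def predict(log_probs):
    """Return (predicted label, its probability) from one row of log-probabilities."""
    label = max(range(len(log_probs)), key=lambda k: log_probs[k])
    return label, math.exp(log_probs[label])

# Hypothetical log-probabilities for three classes
log_probs = [math.log(0.1), math.log(0.7), math.log(0.2)]
label, prob = predict(log_probs)
print(label, round(prob, 2))
```

Since the logarithm is monotonic, taking the argmax of the log-probabilities selects the same class as taking the argmax of the probabilities.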
Reference: Handwritten Digit Recognition Using PyTorch - Intro To Neural Networks