PyTorch Beginner's Tutorial (2) - Using a BP Neural Network to Recognize MNIST Handwritten Digits

In this article, we’ll implement a handwritten digit recognition model for the MNIST dataset using a basic BP (backpropagation) neural network. Let's dive right in.

Import Required Packages

import os
import numpy as np
import torch
import torchvision
import matplotlib.pyplot as plt
from time import time
from torchvision import datasets, transforms
from torch import nn, optim

Set Up Transformations

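We define a transform that converts each image to a tensor with pixel values in [0, 1] and then normalizes it with a mean of 0.5 and a standard deviation of 0.5, which maps the values to [-1, 1]: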

transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,)),])
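
To make the normalization step concrete, here is a minimal pure-Python sketch of the per-pixel arithmetic, (pixel - mean) / std. The helper name normalize is only for illustration and is not part of torchvision:

```python
def normalize(pixel, mean=0.5, std=0.5):
    """Mirror the per-pixel arithmetic of Normalize: (pixel - mean) / std."""
    return (pixel - mean) / std

# After ToTensor, pixel values lie in [0, 1].
darkest = normalize(0.0)    # black pixel
mid_gray = normalize(0.5)   # mid-gray pixel
brightest = normalize(1.0)  # white pixel
print(darkest, mid_gray, brightest)  # -1.0 0.0 1.0
```

With mean 0.5 and std 0.5, the input range [0, 1] is therefore centered on zero and stretched to [-1, 1].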

Load Training Data

We’ll load and, if necessary, download the training dataset using PyTorch’s API:

train_set = datasets.MNIST('train_set',  # root directory for the data
                          download=not os.path.exists('train_set'), # download only if the directory does not exist
                          train=True, # use the training split
                          transform=transform) # apply the transform defined above
print(train_set)
Dataset MNIST
    Number of datapoints: 60000
    Root location: train_set
    Split: Train
Transform: Compose(
               ToTensor()
               Normalize(mean=(0.5,), std=(0.5,))
           )

After downloading, we’ll see the training set contains 60,000 images. Next, we download the test dataset:

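test_set = datasets.MNIST('test_set',  # root directory for the data
                        download=not os.path.exists('test_set'), # download only if the directory does not exist
                        train=False, # use the test split
                        transform=transform) # apply the same transform
print(test_set)
Dataset MNIST
    Number of datapoints: 10000
    Root location: test_set
    Split: Test
Transform: Compose(
               ToTensor()
               Normalize(mean=(0.5,), std=(0.5,))
           )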

The test dataset has 10,000 images.

Create Data Loaders

Next, we’ll use DataLoader to manage batching for both training and testing datasets:

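train_loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=64, shuffle=True)

# Fetch one batch to inspect its shape
dataiter = iter(train_loader)
images, labels = next(dataiter)
print(images.shape)

torch.Size([64, 1, 28, 28])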

The output shows that each batch contains 64 grayscale images, each sized 28x28 pixels. Let’s display one image:

plt.imshow(images[0].numpy().squeeze(), cmap='gray_r');
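
A side note on the loaders: with a batch size of 64, the datasets do not divide evenly, so the last batch of each loader is smaller. The sketch below works through the arithmetic in plain Python, assuming DataLoader keeps the final partial batch (its default behavior when drop_last is not set):

```python
import math

num_train, num_test, batch_size = 60000, 10000, 64

# Number of batches per pass over each dataset (partial last batch included)
train_batches = math.ceil(num_train / batch_size)
test_batches = math.ceil(num_test / batch_size)

# Size of the final, partial batch in each loader
last_train_batch = num_train - (train_batches - 1) * batch_size
last_test_batch = num_test - (test_batches - 1) * batch_size

print(train_batches, last_train_batch)  # 938 32
print(test_batches, last_test_batch)    # 157 16
```

This is why code that processes a batch should read the batch size from the tensor itself rather than hard-coding 64.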

With this, our initial setup is done.

Define the Neural Network

class NeuralNetwork(nn.Module):

    def __init__(self):
        super().__init__()

        # First linear layer:
        # Input: flattened image (28x28 = 784 pixels)
        # Output: input to the first hidden layer with 128 units
        self.linear1 = nn.Linear(28 * 28, 128)
        # Apply ReLU activation in the first hidden layer
        self.relu1 = nn.ReLU()

        # Second linear layer:
        # Input: output from the first hidden layer
        # Output: input to the second hidden layer with 64 units
        self.linear2 = nn.Linear(128, 64)
        # Apply ReLU activation in the second hidden layer
        self.relu2 = nn.ReLU()

        # Third linear layer:
        # Input: output from the second hidden layer
        # Output: output layer with 10 units (one per digit)
        self.linear3 = nn.Linear(64, 10)
        # Apply log-softmax at the output layer to get log-probabilities
        self.softmax = nn.LogSoftmax(dim=1)

        # Alternatively, define the same stack using nn.Sequential
        # (not used in forward() below; in practice keep only one of the two):
        self.model = nn.Sequential(
            nn.Linear(28 * 28, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, 10),
            nn.LogSoftmax(dim=1),
        )

    def forward(self, x):
        # Forward pass of the neural network
        # x: image data with shape (batch_size, 1, 28, 28), e.g. (64, 1, 28, 28)
        # Flatten x to (batch_size, 784)
        x = x.view(x.shape[0], -1)

        # Forward propagation
        x = self.linear1(x)
        x = self.relu1(x)
        x = self.linear2(x)
        x = self.relu2(x)
        x = self.linear3(x)
        x = self.softmax(x)

        # Alternatively, this could be done using x = self.model(x)

        return x

model = NeuralNetwork()
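
To get a feel for the size of this network, the short sketch below counts the trainable parameters of one copy of the three linear layers by hand. Each nn.Linear layer with n_in inputs and n_out outputs holds an n_in x n_out weight matrix plus n_out biases; the variable names here are illustrative only:

```python
# (inputs, outputs) for the three linear layers described above
layer_sizes = [(28 * 28, 128), (128, 64), (64, 10)]

# Weights (n_in * n_out) plus biases (n_out) for each layer
params_per_layer = [n_in * n_out + n_out for n_in, n_out in layer_sizes]
total_params = sum(params_per_layer)

print(params_per_layer)  # [100480, 8256, 650]
print(total_params)      # 109386
```

Most of the roughly 109,000 parameters sit in the first layer, because it connects all 784 input pixels to the 128 hidden units.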

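After defining the neural network, we set up the loss function. We use Negative Log Likelihood Loss (NLLLoss), which expects log-probabilities as input and therefore pairs with the LogSoftmax output layer of the model. Together, the two are equivalent to the cross-entropy loss commonly used for classification.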

criterion = nn.NLLLoss()
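
As a rough illustration of what this loss computes for a single sample, here is a pure-Python sketch of log-softmax followed by negative log likelihood. The function names and the example scores are made up for this sketch; PyTorch's own implementation works on whole batches and, by default, averages the loss over the batch:

```python
import math

def log_softmax(logits):
    """Return log-probabilities; subtracting the max keeps exp() stable."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]

def nll_loss(log_probs, target):
    """Negative log-probability assigned to the correct class."""
    return -log_probs[target]

logits = [2.0, 1.0, 0.1]          # hypothetical raw scores for 3 classes
log_probs = log_softmax(logits)
loss = nll_loss(log_probs, target=0)

print(sum(math.exp(lp) for lp in log_probs))  # probabilities sum to ~1.0
print(round(loss, 4))                         # 0.417
```

The loss is small when the model puts high probability on the correct class and grows as that probability shrinks.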

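Then, we define the optimizer: Stochastic Gradient Descent with a learning rate of 0.003 and a momentum of 0.9. Note that momentum is 0 by default in optim.SGD, so it must be set explicitly. Its purpose is to speed up convergence and damp oscillations by accumulating past gradients; it is not a regularizer against overfitting.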

optimizer = optim.SGD(model.parameters(), lr=0.003, momentum=0.9)
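
The effect of momentum can be sketched in a few lines of plain Python. The update below follows the form used by PyTorch's SGD with zero dampening (velocity = momentum * velocity + grad, then param = param - lr * velocity); the function name and the constant gradient of 1.0 are assumptions chosen only to keep the example readable:

```python
def sgd_momentum_step(param, grad, velocity, lr=0.003, momentum=0.9):
    """One SGD-with-momentum update on a single scalar parameter."""
    velocity = momentum * velocity + grad
    param = param - lr * velocity
    return param, velocity

param, velocity = 1.0, 0.0
velocities = []
for _ in range(3):
    # Pretend the gradient is a constant 1.0 on every step
    param, velocity = sgd_momentum_step(param, grad=1.0, velocity=velocity)
    velocities.append(velocity)

print([round(v, 2) for v in velocities])  # [1.0, 1.9, 2.71]
print(round(param, 5))                    # 0.98317
```

When successive gradients point in the same direction, the velocity builds up, so the effective step grows beyond lr * grad; with a momentum of 0.9 it approaches ten times that size.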

With the setup complete, we start training the model:

time0 = time() # Record the start time
epochs = 15  # Train for 15 epochs
for e in range(epochs):
    running_loss = 0  # Initialize the loss for the epoch
    for images, labels in train_loader:
        # Forward pass to get predictions
        output = model(images)

        # Compute the loss
        loss = criterion(output, labels)

        # Backward pass
        loss.backward()

        # Update weights
        optimizer.step()

        # Clear gradients
        optimizer.zero_grad()

        # Accumulate the loss
        running_loss += loss.item()
    # Print the average loss after each epoch
    print("Epoch {} - Training loss: {}".format(e, running_loss/len(train_loader)))

# Print total training time
print("\nTraining Time (in minutes) =",(time()-time0)/60)
Epoch 0 - Training loss: 0.6462286284117937
Epoch 13 - Training loss: 0.056689855163551565
Epoch 14 - Training loss: 0.05361823974547586

Training Time (in minutes) = 2.9436919848124186

On my machine, the training took just under 3 minutes to complete, with the loss decreasing steadily.
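
The "clear gradients" step in the training loop matters because PyTorch adds new gradients to the existing ones on every backward pass instead of overwriting them. The following small sketch shows this on a single scalar tensor; the tensor w and the function 3 * w are arbitrary choices for illustration:

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

(3 * w).backward()
first = w.grad.item()        # gradient of 3*w with respect to w

(3 * w).backward()
accumulated = w.grad.item()  # second backward pass adds to the first

w.grad.zero_()               # what optimizer.zero_grad() achieves per parameter
cleared = w.grad.item()

print(first, accumulated, cleared)  # 3.0 6.0 0.0
```

Without the reset, each weight update would use the sum of the gradients from all previous batches rather than the gradient of the current one.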

Next, we’ll evaluate the model:

correct_count, all_count = 0, 0
model.eval() # Set the model to evaluation mode

# Load images batch by batch from the test_loader
with torch.no_grad(): # Gradients are not needed for evaluation
    for images, labels in test_loader:
        # Loop through the batch to evaluate each image
        for i in range(len(labels)):
            logps = model(images[i])  # Forward pass; output has shape (1, 10)
            logp = list(logps.numpy()[0]) # Convert the log-probabilities to a list
            pred_label = logp.index(max(logp)) # The index of the highest log-probability is the predicted label
            true_label = labels.numpy()[i]
            if true_label == pred_label: # Check if the prediction is correct
                correct_count += 1
            all_count += 1

print("Number Of Images Tested =", all_count)
print("Model Accuracy =", (correct_count/all_count))
Number Of Images Tested = 10000
Model Accuracy = 0.9741

The model achieved an accuracy of 97.41% on the test dataset.
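
The evaluation above runs the model on one image at a time. As a possible alternative, the sketch below scores a whole batch in a single forward pass and picks the predicted class with argmax. The function name evaluate, the one-layer stand-in model, and the random stand-in data are all assumptions made so that the example runs on its own; in the tutorial you would pass model and test_loader instead:

```python
import torch
from torch import nn

def evaluate(model, loader):
    """Count correct predictions over all batches produced by loader."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():  # no gradients needed for evaluation
        for images, labels in loader:
            preds = model(images).argmax(dim=1)  # highest-scoring class per image
            correct += (preds == labels).sum().item()
            total += labels.size(0)
    return correct, total

# Stand-in model and data (random, untrained) just to exercise the function
torch.manual_seed(0)
stand_in_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10), nn.LogSoftmax(dim=1))
fake_loader = [(torch.randn(64, 1, 28, 28), torch.randint(0, 10, (64,))) for _ in range(2)]

correct, total = evaluate(stand_in_model, fake_loader)
print("Accuracy on random data:", correct / total)
```

Because log-softmax preserves the ordering of the scores, taking the argmax of the log-probabilities gives the same predicted label as taking it over the probabilities.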


Handwritten Digit Recognition Using PyTorch — Intro To Neural Networks:

