WIP: A Quick PyTorch 2.0 Tutorial¶
In short:
If you have a new GPU (NVIDIA 40XX or A100, A10G etc), you can "compile" your models and often see speed ups.
Before PyTorch 2.0:
import torch
model = create_model()
### Train model ###
### Test model ###
After PyTorch 2.0:
import torch
model = create_model()
compiled_model = torch.compile(model) # <- new!
### Train model ### <- faster!
### Test model ### <- faster!
Things to note:
- TK - add where it doesn't work
TK - Resources to learn more¶
- PyTorch 2.0 launch blog post - https://pytorch.org/get-started/pytorch-2.0/
- PyTorch 2.0 release notes - https://pytorch.org/blog/pytorch-2.0-release/
- GitHub release notes - https://github.com/pytorch/pytorch/releases/tag/v2.0.0 (lots of info here!)
- PyTorch default device context manager - https://github.com/pytorch/tutorials/pull/2220/files
In [2]:
Copied!
import torch
# Check PyTorch version
pt_version = torch.__version__
print(f"[INFO] Current PyTorch version: {pt_version} (should be 2.x+)")
# Install PyTorch 2.0 if necessary
if pt_version.split(".")[0] == "1": # Check if PyTorch version begins with 1
!pip3 install -U torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
print("[INFO] PyTorch 2.x installed, if you're on Google Colab, you may need to restart your runtime.")
import torch
pt_version = torch.__version__
print(f"[INFO] Current PyTorch version: {pt_version} (should be 2.x+)")
else:
print("[INFO] PyTorch 2.x installed, you'll be able to use the new features.")
import torch
# Check PyTorch version
pt_version = torch.__version__
print(f"[INFO] Current PyTorch version: {pt_version} (should be 2.x+)")
# Install PyTorch 2.0 if necessary
if pt_version.split(".")[0] == "1": # Check if PyTorch version begins with 1
!pip3 install -U torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
print("[INFO] PyTorch 2.x installed, if you're on Google Colab, you may need to restart your runtime.")
import torch
pt_version = torch.__version__
print(f"[INFO] Current PyTorch version: {pt_version} (should be 2.x+)")
else:
print("[INFO] PyTorch 2.x installed, you'll be able to use the new features.")
[INFO] Current PyTorch version: 2.0.0+cu118 (should be 2.x+) [INFO] PyTorch 2.x installed, you'll be able to use the new features.
TK - New feature: globally set devices¶
In [3]:
Copied!
# See here: https://github.com/pytorch/tutorials/pull/2220/files
import torch
with torch.device('cuda'):
mod = torch.nn.Linear(20, 30)
print(mod.weight.device)
print(mod(torch.randn(128, 20)).device)
# See here: https://github.com/pytorch/tutorials/pull/2220/files
import torch
with torch.device('cuda'):
mod = torch.nn.Linear(20, 30)
print(mod.weight.device)
print(mod(torch.randn(128, 20)).device)
cuda:0 cuda:0
In [4]:
Copied!
import torch
print(torch.version.cuda)
print(torch.cuda.is_available())
print(torch.backends.cudnn.version())
import torch
print(torch.version.cuda)
print(torch.cuda.is_available())
print(torch.backends.cudnn.version())
11.8 True 8700
TODO:
- add in info about PyTorch 2.0
- a quick upgrade for speed ups
- a quick note on which GPU will be needed (works best on NVIDIA GPUs, not macOS)
TK - Check GPU¶
- Note: If you're running on Google Colab, you'll need to setup a GPU: runtime -> change runtime type -> hardware accelerator
- Best speedups are on newer NVIDIA/AMD GPUs (this is because PyTorch 2.0 leverages new GPU hardware)
In [5]:
Copied!
# Make sure we're using a NVIDIA GPU
if torch.cuda.is_available():
gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
print('Not connected to a GPU')
else:
print(f"GPU information:\n{gpu_info}")
# Get GPU name
gpu_name = !nvidia-smi --query-gpu=gpu_name --format=csv
gpu_name = gpu_name[1]
# Replace spaces with "_" (for naming files later on)
gpu_name = gpu_name.replace(" ", "_")
print(f'GPU name: {gpu_name}')
# Get GPU capability score
gpu_score = torch.cuda.get_device_capability()
print(f"GPU capability score: {gpu_score}")
# Make sure we're using a NVIDIA GPU
if torch.cuda.is_available():
gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
print('Not connected to a GPU')
else:
print(f"GPU information:\n{gpu_info}")
# Get GPU name
gpu_name = !nvidia-smi --query-gpu=gpu_name --format=csv
gpu_name = gpu_name[1]
# Replace spaces with "_" (for naming files later on)
gpu_name = gpu_name.replace(" ", "_")
print(f'GPU name: {gpu_name}')
# Get GPU capability score
gpu_score = torch.cuda.get_device_capability()
print(f"GPU capability score: {gpu_score}")
GPU information: Thu Mar 16 16:19:41 2023 +-----------------------------------------------------------------------------+ | NVIDIA-SMI 525.89.02 Driver Version: 525.89.02 CUDA Version: 12.0 | |-------------------------------+----------------------+----------------------+ | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | | | | MIG M. | |===============================+======================+======================| | 0 NVIDIA GeForce ... Off | 00000000:01:00.0 Off | N/A | | 0% 45C P2 35W / 320W | 371MiB / 16376MiB | 0% Default | | | | N/A | +-------------------------------+----------------------+----------------------+ +-----------------------------------------------------------------------------+ | Processes: | | GPU GI CI PID Type Process name GPU Memory | | ID ID Usage | |=============================================================================| | 0 N/A N/A 1001 G /usr/lib/xorg/Xorg 86MiB | | 0 N/A N/A 1223 G /usr/bin/gnome-shell 10MiB | | 0 N/A N/A 73068 C ...ch/env-nightly/bin/python 270MiB | +-----------------------------------------------------------------------------+ GPU name: NVIDIA_GeForce_RTX_4080 GPU capability score: (8, 9)
- TK - add a table for NVIDIA GPUs and architectures etc and which lead to speedups
TK - Simple training example¶
- CIFAR10
- ResNet50
In [8]:
Copied!
import torch
print(f"PyTorch version: {torch.__version__}")
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")
import torch
print(f"PyTorch version: {torch.__version__}")
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")
PyTorch version: 2.0.0+cu118 Using device: cuda
In [9]:
Copied!
import torchvision
print(f"TorchVision version: {torchvision.__version__}")
import torchvision
print(f"TorchVision version: {torchvision.__version__}")
TorchVision version: 0.15.1+cu118
Create model and transforms¶
In [10]:
Copied!
model_weights = torchvision.models.ResNet50_Weights.IMAGENET1K_V2
transforms = model_weights.transforms()
model = torchvision.models.resnet50(weights=model_weights)
total_params = sum(
param.numel() for param in model.parameters()
)
print(total_params)
model_weights = torchvision.models.ResNet50_Weights.IMAGENET1K_V2
transforms = model_weights.transforms()
model = torchvision.models.resnet50(weights=model_weights)
total_params = sum(
param.numel() for param in model.parameters()
)
print(total_params)
25557032
In [11]:
Copied!
print(transforms)
print(transforms)
ImageClassification( crop_size=[224] resize_size=[232] mean=[0.485, 0.456, 0.406] std=[0.229, 0.224, 0.225] interpolation=InterpolationMode.BILINEAR )
In [12]:
Copied!
# TODO: speedups on larger GPUs will likely be seen with larger amounts of data
# TK - also see here for using `torch.backends.cuda.matmul.allow_tf32` to enable TF32 on A100s/newer GPUS - https://github.com/pytorch/pytorch/blob/master/torch/_inductor/compile_fx.py#L86
if gpu_score >= (8, 0):
print(f"[INFO] Using GPU with score: {gpu_score}, enabling TensorFloat32 (TF32) computing (faster on new GPUs)")
# Set TF32 = True
IMAGE_SIZE = 224
torch.backends.cuda.matmul.allow_tf32 = True
transforms.crop_size = IMAGE_SIZE
transforms.resize_size = IMAGE_SIZE
else:
transforms.crop_size = 224
transforms.resize_size = 224 # Resize to 32x32, CIFAR10 is 32x32
# TODO: speedups on larger GPUs will likely be seen with larger amounts of data
# TK - also see here for using `torch.backends.cuda.matmul.allow_tf32` to enable TF32 on A100s/newer GPUS - https://github.com/pytorch/pytorch/blob/master/torch/_inductor/compile_fx.py#L86
if gpu_score >= (8, 0):
print(f"[INFO] Using GPU with score: {gpu_score}, enabling TensorFloat32 (TF32) computing (faster on new GPUs)")
# Set TF32 = True
IMAGE_SIZE = 224
torch.backends.cuda.matmul.allow_tf32 = True
transforms.crop_size = IMAGE_SIZE
transforms.resize_size = IMAGE_SIZE
else:
transforms.crop_size = 224
transforms.resize_size = 224 # Resize to 32x32, CIFAR10 is 32x32
[INFO] Using GPU with score: (8, 9), enabling TensorFloat32 (TF32) computing (faster on new GPUs)
In [13]:
Copied!
transforms
transforms
Out[13]:
ImageClassification( crop_size=224 resize_size=224 mean=[0.485, 0.456, 0.406] std=[0.229, 0.224, 0.225] interpolation=InterpolationMode.BILINEAR )
Make datasets¶
In [14]:
Copied!
train_dataset = torchvision.datasets.CIFAR10(root='.', train=True, download=True, transform=transforms)
test_dataset = torchvision.datasets.CIFAR10(root='.', train=False, download=True, transform=transforms)
# Get the lengths of the datasets
train_len = len(train_dataset)
test_len = len(test_dataset)
print(f"[INFO] Train dataset length: {train_len}")
print(f"[INFO] Test dataset length: {test_len}")
train_dataset = torchvision.datasets.CIFAR10(root='.', train=True, download=True, transform=transforms)
test_dataset = torchvision.datasets.CIFAR10(root='.', train=False, download=True, transform=transforms)
# Get the lengths of the datasets
train_len = len(train_dataset)
test_len = len(test_dataset)
print(f"[INFO] Train dataset length: {train_len}")
print(f"[INFO] Test dataset length: {test_len}")
Files already downloaded and verified Files already downloaded and verified [INFO] Train dataset length: 50000 [INFO] Test dataset length: 10000
Create DataLoaders¶
- Generally GPUs aren't the bottleneck of ML code
- Data loading is the main bottleneck
- E.g. you want to get your data to the GPU as fast as possible = more workers (though in my experience this generally caps at about ~4 workers per GPU, though don't trust me, better to do your own experiments)
- You want your GPUs to go brrrrr - https://horace.io/brrr_intro.html
- More here on crazy matmul improvements - https://twitter.com/cHHillee/status/1630274804795445248?s=20
In [15]:
Copied!
from torch.utils.data import DataLoader
# Create DataLoaders
import os
BATCH_SIZE = 128
NUM_WORKERS = os.cpu_count()
train_dataloader = DataLoader(dataset=train_dataset,
batch_size=BATCH_SIZE,
shuffle=True,
num_workers=NUM_WORKERS)
test_dataloader = DataLoader(dataset=test_dataset,
batch_size=BATCH_SIZE,
shuffle=False,
num_workers=NUM_WORKERS)
# Print details
print(f"Train dataloader length: {len(train_dataloader)} batches of size {BATCH_SIZE}")
print(f"Test dataloader length: {len(test_dataloader)} batches of size {BATCH_SIZE}")
print(f"Using number of workers: {NUM_WORKERS} (generally more workers means faster dataloading from CPU to GPU)")
from torch.utils.data import DataLoader
# Create DataLoaders
import os
BATCH_SIZE = 128
NUM_WORKERS = os.cpu_count()
train_dataloader = DataLoader(dataset=train_dataset,
batch_size=BATCH_SIZE,
shuffle=True,
num_workers=NUM_WORKERS)
test_dataloader = DataLoader(dataset=test_dataset,
batch_size=BATCH_SIZE,
shuffle=False,
num_workers=NUM_WORKERS)
# Print details
print(f"Train dataloader length: {len(train_dataloader)} batches of size {BATCH_SIZE}")
print(f"Test dataloader length: {len(test_dataloader)} batches of size {BATCH_SIZE}")
print(f"Using number of workers: {NUM_WORKERS} (generally more workers means faster dataloading from CPU to GPU)")
Train dataloader length: 391 batches of size 128 Test dataloader length: 79 batches of size 128 Using number of workers: 16 (generally more workers means faster dataloading from CPU to GPU)
In [16]:
Copied!
# Create filename to save the results
dataset_name = "CIFAR10"
model_name = "ResNet50"
# Create filename to save the results
dataset_name = "CIFAR10"
model_name = "ResNet50"
Create training loops¶
In [17]:
Copied!
import time
from tqdm.auto import tqdm
from typing import Dict, List, Tuple
def train_step(epoch: int,
model: torch.nn.Module,
dataloader: torch.utils.data.DataLoader,
loss_fn: torch.nn.Module,
optimizer: torch.optim.Optimizer,
device: torch.device,
disable_progress_bar: bool = False) -> Tuple[float, float]:
"""Trains a PyTorch model for a single epoch.
Turns a target PyTorch model to training mode and then
runs through all of the required training steps (forward
pass, loss calculation, optimizer step).
Args:
model: A PyTorch model to be trained.
dataloader: A DataLoader instance for the model to be trained on.
loss_fn: A PyTorch loss function to minimize.
optimizer: A PyTorch optimizer to help minimize the loss function.
device: A target device to compute on (e.g. "cuda" or "cpu").
Returns:
A tuple of training loss and training accuracy metrics.
In the form (train_loss, train_accuracy). For example:
(0.1112, 0.8743)
"""
# Put model in train mode
model.train()
# Setup train loss and train accuracy values
train_loss, train_acc = 0, 0
# Loop through data loader data batches
progress_bar = tqdm(
enumerate(dataloader),
desc=f"Training Epoch {epoch}",
total=len(dataloader),
disable=disable_progress_bar
)
for batch, (X, y) in progress_bar:
# Send data to target device
X, y = X.to(device), y.to(device)
# 1. Forward pass
y_pred = model(X)
# 2. Calculate and accumulate loss
loss = loss_fn(y_pred, y)
train_loss += loss.item()
# 3. Optimizer zero grad
optimizer.zero_grad()
# 4. Loss backward
loss.backward()
# 5. Optimizer step
optimizer.step()
# Calculate and accumulate accuracy metric across all batches
y_pred_class = torch.argmax(torch.softmax(y_pred, dim=1), dim=1)
train_acc += (y_pred_class == y).sum().item()/len(y_pred)
# Update progress bar
progress_bar.set_postfix(
{
"train_loss": train_loss / (batch + 1),
"train_acc": train_acc / (batch + 1),
}
)
# Adjust metrics to get average loss and accuracy per batch
train_loss = train_loss / len(dataloader)
train_acc = train_acc / len(dataloader)
return train_loss, train_acc
def test_step(epoch: int,
model: torch.nn.Module,
dataloader: torch.utils.data.DataLoader,
loss_fn: torch.nn.Module,
device: torch.device,
disable_progress_bar: bool = False) -> Tuple[float, float]:
"""Tests a PyTorch model for a single epoch.
Turns a target PyTorch model to "eval" mode and then performs
a forward pass on a testing dataset.
Args:
model: A PyTorch model to be tested.
dataloader: A DataLoader instance for the model to be tested on.
loss_fn: A PyTorch loss function to calculate loss on the test data.
device: A target device to compute on (e.g. "cuda" or "cpu").
Returns:
A tuple of testing loss and testing accuracy metrics.
In the form (test_loss, test_accuracy). For example:
(0.0223, 0.8985)
"""
# Put model in eval mode
model.eval()
# Setup test loss and test accuracy values
test_loss, test_acc = 0, 0
# Loop through data loader data batches
progress_bar = tqdm(
enumerate(dataloader),
desc=f"Testing Epoch {epoch}",
total=len(dataloader),
disable=disable_progress_bar
)
# Turn on inference context manager
with torch.no_grad(): # no_grad() required for PyTorch 2.0
# Loop through DataLoader batches
for batch, (X, y) in progress_bar:
# Send data to target device
X, y = X.to(device), y.to(device)
# 1. Forward pass
test_pred_logits = model(X)
# 2. Calculate and accumulate loss
loss = loss_fn(test_pred_logits, y)
test_loss += loss.item()
# Calculate and accumulate accuracy
test_pred_labels = test_pred_logits.argmax(dim=1)
test_acc += ((test_pred_labels == y).sum().item()/len(test_pred_labels))
# Update progress bar
progress_bar.set_postfix(
{
"test_loss": test_loss / (batch + 1),
"test_acc": test_acc / (batch + 1),
}
)
# Adjust metrics to get average loss and accuracy per batch
test_loss = test_loss / len(dataloader)
test_acc = test_acc / len(dataloader)
return test_loss, test_acc
def train(model: torch.nn.Module,
train_dataloader: torch.utils.data.DataLoader,
test_dataloader: torch.utils.data.DataLoader,
optimizer: torch.optim.Optimizer,
loss_fn: torch.nn.Module,
epochs: int,
device: torch.device,
disable_progress_bar: bool = False) -> Dict[str, List]:
"""Trains and tests a PyTorch model.
Passes a target PyTorch models through train_step() and test_step()
functions for a number of epochs, training and testing the model
in the same epoch loop.
Calculates, prints and stores evaluation metrics throughout.
Args:
model: A PyTorch model to be trained and tested.
train_dataloader: A DataLoader instance for the model to be trained on.
test_dataloader: A DataLoader instance for the model to be tested on.
optimizer: A PyTorch optimizer to help minimize the loss function.
loss_fn: A PyTorch loss function to calculate loss on both datasets.
epochs: An integer indicating how many epochs to train for.
device: A target device to compute on (e.g. "cuda" or "cpu").
Returns:
A dictionary of training and testing loss as well as training and
testing accuracy metrics. Each metric has a value in a list for
each epoch.
In the form: {train_loss: [...],
train_acc: [...],
test_loss: [...],
test_acc: [...]}
For example if training for epochs=2:
{train_loss: [2.0616, 1.0537],
train_acc: [0.3945, 0.3945],
test_loss: [1.2641, 1.5706],
test_acc: [0.3400, 0.2973]}
"""
# Create empty results dictionary
results = {"train_loss": [],
"train_acc": [],
"test_loss": [],
"test_acc": [],
"train_epoch_time": [],
"test_epoch_time": []
}
# Loop through training and testing steps for a number of epochs
for epoch in tqdm(range(epochs), disable=disable_progress_bar):
# Perform training step and time it
train_epoch_start_time = time.time()
train_loss, train_acc = train_step(epoch=epoch,
model=model,
dataloader=train_dataloader,
loss_fn=loss_fn,
optimizer=optimizer,
device=device,
disable_progress_bar=disable_progress_bar)
train_epoch_end_time = time.time()
train_epoch_time = train_epoch_end_time - train_epoch_start_time
# Perform testing step and time it
test_epoch_start_time = time.time()
test_loss, test_acc = test_step(epoch=epoch,
model=model,
dataloader=test_dataloader,
loss_fn=loss_fn,
device=device,
disable_progress_bar=disable_progress_bar)
test_epoch_end_time = time.time()
test_epoch_time = test_epoch_end_time - test_epoch_start_time
# Print out what's happening
print(
f"Epoch: {epoch+1} | "
f"train_loss: {train_loss:.4f} | "
f"train_acc: {train_acc:.4f} | "
f"test_loss: {test_loss:.4f} | "
f"test_acc: {test_acc:.4f} | "
f"train_epoch_time: {train_epoch_time:.4f} | "
f"test_epoch_time: {test_epoch_time:.4f}"
)
# Update results dictionary
results["train_loss"].append(train_loss)
results["train_acc"].append(train_acc)
results["test_loss"].append(test_loss)
results["test_acc"].append(test_acc)
results["train_epoch_time"].append(train_epoch_time)
results["test_epoch_time"].append(test_epoch_time)
# Return the filled results at the end of the epochs
return results
import time
from tqdm.auto import tqdm
from typing import Dict, List, Tuple
def train_step(epoch: int,
model: torch.nn.Module,
dataloader: torch.utils.data.DataLoader,
loss_fn: torch.nn.Module,
optimizer: torch.optim.Optimizer,
device: torch.device,
disable_progress_bar: bool = False) -> Tuple[float, float]:
"""Trains a PyTorch model for a single epoch.
Turns a target PyTorch model to training mode and then
runs through all of the required training steps (forward
pass, loss calculation, optimizer step).
Args:
model: A PyTorch model to be trained.
dataloader: A DataLoader instance for the model to be trained on.
loss_fn: A PyTorch loss function to minimize.
optimizer: A PyTorch optimizer to help minimize the loss function.
device: A target device to compute on (e.g. "cuda" or "cpu").
Returns:
A tuple of training loss and training accuracy metrics.
In the form (train_loss, train_accuracy). For example:
(0.1112, 0.8743)
"""
# Put model in train mode
model.train()
# Setup train loss and train accuracy values
train_loss, train_acc = 0, 0
# Loop through data loader data batches
progress_bar = tqdm(
enumerate(dataloader),
desc=f"Training Epoch {epoch}",
total=len(dataloader),
disable=disable_progress_bar
)
for batch, (X, y) in progress_bar:
# Send data to target device
X, y = X.to(device), y.to(device)
# 1. Forward pass
y_pred = model(X)
# 2. Calculate and accumulate loss
loss = loss_fn(y_pred, y)
train_loss += loss.item()
# 3. Optimizer zero grad
optimizer.zero_grad()
# 4. Loss backward
loss.backward()
# 5. Optimizer step
optimizer.step()
# Calculate and accumulate accuracy metric across all batches
y_pred_class = torch.argmax(torch.softmax(y_pred, dim=1), dim=1)
train_acc += (y_pred_class == y).sum().item()/len(y_pred)
# Update progress bar
progress_bar.set_postfix(
{
"train_loss": train_loss / (batch + 1),
"train_acc": train_acc / (batch + 1),
}
)
# Adjust metrics to get average loss and accuracy per batch
train_loss = train_loss / len(dataloader)
train_acc = train_acc / len(dataloader)
return train_loss, train_acc
def test_step(epoch: int,
model: torch.nn.Module,
dataloader: torch.utils.data.DataLoader,
loss_fn: torch.nn.Module,
device: torch.device,
disable_progress_bar: bool = False) -> Tuple[float, float]:
"""Tests a PyTorch model for a single epoch.
Turns a target PyTorch model to "eval" mode and then performs
a forward pass on a testing dataset.
Args:
model: A PyTorch model to be tested.
dataloader: A DataLoader instance for the model to be tested on.
loss_fn: A PyTorch loss function to calculate loss on the test data.
device: A target device to compute on (e.g. "cuda" or "cpu").
Returns:
A tuple of testing loss and testing accuracy metrics.
In the form (test_loss, test_accuracy). For example:
(0.0223, 0.8985)
"""
# Put model in eval mode
model.eval()
# Setup test loss and test accuracy values
test_loss, test_acc = 0, 0
# Loop through data loader data batches
progress_bar = tqdm(
enumerate(dataloader),
desc=f"Testing Epoch {epoch}",
total=len(dataloader),
disable=disable_progress_bar
)
# Turn on inference context manager
with torch.no_grad(): # no_grad() required for PyTorch 2.0
# Loop through DataLoader batches
for batch, (X, y) in progress_bar:
# Send data to target device
X, y = X.to(device), y.to(device)
# 1. Forward pass
test_pred_logits = model(X)
# 2. Calculate and accumulate loss
loss = loss_fn(test_pred_logits, y)
test_loss += loss.item()
# Calculate and accumulate accuracy
test_pred_labels = test_pred_logits.argmax(dim=1)
test_acc += ((test_pred_labels == y).sum().item()/len(test_pred_labels))
# Update progress bar
progress_bar.set_postfix(
{
"test_loss": test_loss / (batch + 1),
"test_acc": test_acc / (batch + 1),
}
)
# Adjust metrics to get average loss and accuracy per batch
test_loss = test_loss / len(dataloader)
test_acc = test_acc / len(dataloader)
return test_loss, test_acc
def train(model: torch.nn.Module,
train_dataloader: torch.utils.data.DataLoader,
test_dataloader: torch.utils.data.DataLoader,
optimizer: torch.optim.Optimizer,
loss_fn: torch.nn.Module,
epochs: int,
device: torch.device,
disable_progress_bar: bool = False) -> Dict[str, List]:
"""Trains and tests a PyTorch model.
Passes a target PyTorch models through train_step() and test_step()
functions for a number of epochs, training and testing the model
in the same epoch loop.
Calculates, prints and stores evaluation metrics throughout.
Args:
model: A PyTorch model to be trained and tested.
train_dataloader: A DataLoader instance for the model to be trained on.
test_dataloader: A DataLoader instance for the model to be tested on.
optimizer: A PyTorch optimizer to help minimize the loss function.
loss_fn: A PyTorch loss function to calculate loss on both datasets.
epochs: An integer indicating how many epochs to train for.
device: A target device to compute on (e.g. "cuda" or "cpu").
Returns:
A dictionary of training and testing loss as well as training and
testing accuracy metrics. Each metric has a value in a list for
each epoch.
In the form: {train_loss: [...],
train_acc: [...],
test_loss: [...],
test_acc: [...]}
For example if training for epochs=2:
{train_loss: [2.0616, 1.0537],
train_acc: [0.3945, 0.3945],
test_loss: [1.2641, 1.5706],
test_acc: [0.3400, 0.2973]}
"""
# Create empty results dictionary
results = {"train_loss": [],
"train_acc": [],
"test_loss": [],
"test_acc": [],
"train_epoch_time": [],
"test_epoch_time": []
}
# Loop through training and testing steps for a number of epochs
for epoch in tqdm(range(epochs), disable=disable_progress_bar):
# Perform training step and time it
train_epoch_start_time = time.time()
train_loss, train_acc = train_step(epoch=epoch,
model=model,
dataloader=train_dataloader,
loss_fn=loss_fn,
optimizer=optimizer,
device=device,
disable_progress_bar=disable_progress_bar)
train_epoch_end_time = time.time()
train_epoch_time = train_epoch_end_time - train_epoch_start_time
# Perform testing step and time it
test_epoch_start_time = time.time()
test_loss, test_acc = test_step(epoch=epoch,
model=model,
dataloader=test_dataloader,
loss_fn=loss_fn,
device=device,
disable_progress_bar=disable_progress_bar)
test_epoch_end_time = time.time()
test_epoch_time = test_epoch_end_time - test_epoch_start_time
# Print out what's happening
print(
f"Epoch: {epoch+1} | "
f"train_loss: {train_loss:.4f} | "
f"train_acc: {train_acc:.4f} | "
f"test_loss: {test_loss:.4f} | "
f"test_acc: {test_acc:.4f} | "
f"train_epoch_time: {train_epoch_time:.4f} | "
f"test_epoch_time: {test_epoch_time:.4f}"
)
# Update results dictionary
results["train_loss"].append(train_loss)
results["train_acc"].append(train_acc)
results["test_loss"].append(test_loss)
results["test_acc"].append(test_acc)
results["train_epoch_time"].append(train_epoch_time)
results["test_epoch_time"].append(test_epoch_time)
# Return the filled results at the end of the epochs
return results
In [18]:
Copied!
import torch
import torchvision
def create_model():
model_weights = torchvision.models.ResNet50_Weights.IMAGENET1K_V2
transforms = model_weights.transforms()
model = torchvision.models.resnet50(weights=model_weights)
# TK - adjust the output layer shape for CIFAR10
model.fc = torch.nn.Linear(2048, 10)
return model, transforms
model, transforms = create_model()
import torch
import torchvision
def create_model():
model_weights = torchvision.models.ResNet50_Weights.IMAGENET1K_V2
transforms = model_weights.transforms()
model = torchvision.models.resnet50(weights=model_weights)
# TK - adjust the output layer shape for CIFAR10
model.fc = torch.nn.Linear(2048, 10)
return model, transforms
model, transforms = create_model()
In [19]:
Copied!
model
model
Out[19]:
ResNet( (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False) (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False) (layer1): Sequential( (0): Bottleneck( (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (downsample): Sequential( (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (1): Bottleneck( (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) ) (2): Bottleneck( (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) ) ) (layer2): Sequential( (0): Bottleneck( (conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False) (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (downsample): Sequential( (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False) (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (1): Bottleneck( (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) ) (2): Bottleneck( (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) ) (3): Bottleneck( (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) ) ) (layer3): Sequential( (0): Bottleneck( (conv1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False) (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (downsample): Sequential( (0): Conv2d(512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False) (1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (1): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) ) (2): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) ) (3): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) ) (4): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) ) (5): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) ) ) (layer4): Sequential( (0): Bottleneck( (conv1): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False) (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (downsample): Sequential( (0): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False) (1): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (1): Bottleneck( (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) ) (2): Bottleneck( (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) ) ) (avgpool): AdaptiveAvgPool2d(output_size=(1, 1)) (fc): Linear(in_features=2048, out_features=10, bias=True) )
In [21]:
Copied!
NUM_EPOCHS = 5
NUM_EPOCHS = 5
In [22]:
Copied!
model, transforms = create_model()
model.to(device)
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(),
lr=0.003)
results = train(model=model,
train_dataloader=train_dataloader,
test_dataloader=test_dataloader,
loss_fn=loss_fn,
optimizer=optimizer,
epochs=NUM_EPOCHS,
device=device)
model, transforms = create_model()
model.to(device)
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(),
lr=0.003)
results = train(model=model,
train_dataloader=train_dataloader,
test_dataloader=test_dataloader,
loss_fn=loss_fn,
optimizer=optimizer,
epochs=NUM_EPOCHS,
device=device)
0%| | 0/5 [00:00<?, ?it/s]
Training Epoch 0: 0%| | 0/391 [00:00<?, ?it/s]
Testing Epoch 0: 0%| | 0/79 [00:00<?, ?it/s]
Epoch: 1 | train_loss: 0.7721 | train_acc: 0.7322 | test_loss: 0.6501 | test_acc: 0.7804 | train_epoch_time: 110.3998 | test_epoch_time: 9.1594
Training Epoch 1: 0%| | 0/391 [00:00<?, ?it/s]
Testing Epoch 1: 0%| | 0/79 [00:00<?, ?it/s]
Epoch: 2 | train_loss: 0.4383 | train_acc: 0.8483 | test_loss: 0.4706 | test_acc: 0.8444 | train_epoch_time: 109.8912 | test_epoch_time: 9.1496
Training Epoch 2: 0%| | 0/391 [00:00<?, ?it/s]
Testing Epoch 2: 0%| | 0/79 [00:00<?, ?it/s]
Epoch: 3 | train_loss: 0.3049 | train_acc: 0.8947 | test_loss: 0.4456 | test_acc: 0.8423 | train_epoch_time: 109.8841 | test_epoch_time: 9.2271
Training Epoch 3: 0%| | 0/391 [00:00<?, ?it/s]
Testing Epoch 3: 0%| | 0/79 [00:00<?, ?it/s]
Epoch: 4 | train_loss: 0.2309 | train_acc: 0.9204 | test_loss: 0.4313 | test_acc: 0.8591 | train_epoch_time: 109.9004 | test_epoch_time: 9.1894
Training Epoch 4: 0%| | 0/391 [00:00<?, ?it/s]
Testing Epoch 4: 0%| | 0/79 [00:00<?, ?it/s]
Epoch: 5 | train_loss: 0.1701 | train_acc: 0.9411 | test_loss: 0.3599 | test_acc: 0.8832 | train_epoch_time: 109.8170 | test_epoch_time: 9.2094
In [23]:
Copied!
model, transforms = create_model()
model.to(device)
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(),
lr=0.003)
compile_start_time = time.time()
### New in PyTorch 2.x ###
compiled_model = torch.compile(model)
##########################
compile_end_time = time.time()
compile_time = compile_end_time - compile_start_time
print(f"Time to compile: {compile_time} | Note: The first time you compile your model, the first few epochs will be slower than subsequent runs.")
compile_results = train(model=compiled_model,
train_dataloader=train_dataloader,
test_dataloader=test_dataloader,
loss_fn=loss_fn,
optimizer=optimizer,
epochs=NUM_EPOCHS,
device=device)
model, transforms = create_model()
model.to(device)
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(),
lr=0.003)
compile_start_time = time.time()
### New in PyTorch 2.x ###
compiled_model = torch.compile(model)
##########################
compile_end_time = time.time()
compile_time = compile_end_time - compile_start_time
print(f"Time to compile: {compile_time} | Note: The first time you compile your model, the first few epochs will be slower than subsequent runs.")
compile_results = train(model=compiled_model,
train_dataloader=train_dataloader,
test_dataloader=test_dataloader,
loss_fn=loss_fn,
optimizer=optimizer,
epochs=NUM_EPOCHS,
device=device)
Time to compile: 0.08393430709838867 | Note: The first time you compile your model, the first few epochs will be slower than subsequent runs.
0%| | 0/5 [00:00<?, ?it/s]
Training Epoch 0: 0%| | 0/391 [00:00<?, ?it/s]
Testing Epoch 0: 0%| | 0/79 [00:00<?, ?it/s]
Epoch: 1 | train_loss: 0.8047 | train_acc: 0.7209 | test_loss: 0.6493 | test_acc: 0.7758 | train_epoch_time: 123.4870 | test_epoch_time: 17.0246
Training Epoch 1: 0%| | 0/391 [00:00<?, ?it/s]
Testing Epoch 1: 0%| | 0/79 [00:00<?, ?it/s]
Epoch: 2 | train_loss: 0.4395 | train_acc: 0.8502 | test_loss: 0.5214 | test_acc: 0.8228 | train_epoch_time: 98.0134 | test_epoch_time: 7.5135
Training Epoch 2: 0%| | 0/391 [00:00<?, ?it/s]
Testing Epoch 2: 0%| | 0/79 [00:00<?, ?it/s]
Epoch: 3 | train_loss: 0.3184 | train_acc: 0.8894 | test_loss: 0.4231 | test_acc: 0.8561 | train_epoch_time: 98.0356 | test_epoch_time: 7.6120
Training Epoch 3: 0%| | 0/391 [00:00<?, ?it/s]
Testing Epoch 3: 0%| | 0/79 [00:00<?, ?it/s]
Epoch: 4 | train_loss: 0.2440 | train_acc: 0.9148 | test_loss: 0.3896 | test_acc: 0.8698 | train_epoch_time: 98.0249 | test_epoch_time: 7.4867
Training Epoch 4: 0%| | 0/391 [00:00<?, ?it/s]
Testing Epoch 4: 0%| | 0/79 [00:00<?, ?it/s]
Epoch: 5 | train_loss: 0.1794 | train_acc: 0.9378 | test_loss: 0.4060 | test_acc: 0.8698 | train_epoch_time: 98.0487 | test_epoch_time: 7.5143
In [ ]:
Copied!
# before tf32 (with compile)
# Epoch: 2 | train_loss: 0.4303 | train_acc: 0.8524 | test_loss: 0.5928 | test_acc: 0.7969 | train_epoch_time: 98.1773 | test_epoch_time: 7.5189
# before tf32 (with compile)
# Epoch: 2 | train_loss: 0.4303 | train_acc: 0.8524 | test_loss: 0.5928 | test_acc: 0.7969 | train_epoch_time: 98.1773 | test_epoch_time: 7.5189
In [30]:
Copied!
# Create the graphs of results and compiled_results
import pandas as pd
results_df = pd.DataFrame(results)
compile_results_df = pd.DataFrame(compile_results)
# Create the graphs of results and compiled_results
import pandas as pd
results_df = pd.DataFrame(results)
compile_results_df = pd.DataFrame(compile_results)
In [31]:
Copied!
results_df.head()
results_df.head()
Out[31]:
train_loss | train_acc | test_loss | test_acc | train_epoch_time | test_epoch_time | |
---|---|---|---|---|---|---|
0 | 0.772122 | 0.732217 | 0.650112 | 0.780360 | 110.399775 | 9.159359 |
1 | 0.438305 | 0.848274 | 0.470592 | 0.844442 | 109.891169 | 9.149639 |
2 | 0.304912 | 0.894689 | 0.445594 | 0.842267 | 109.884116 | 9.227098 |
3 | 0.230868 | 0.920440 | 0.431279 | 0.859078 | 109.900380 | 9.189433 |
4 | 0.170133 | 0.941148 | 0.359897 | 0.883208 | 109.816973 | 9.209387 |
TK - Make this more obvious that it's for a single run¶
In [50]:
Copied!
def plot_mean_epoch_times(non_compiled_results, compiled_results, multi_runs=False, num_runs=0, save=False, save_path=""):
mean_train_epoch_time = non_compiled_results.train_epoch_time.mean()
mean_test_epoch_time = non_compiled_results.test_epoch_time.mean()
mean_results = [mean_train_epoch_time, mean_test_epoch_time]
mean_compile_train_epoch_time = compiled_results.train_epoch_time.mean()
mean_compile_test_epoch_time = compiled_results.test_epoch_time.mean()
mean_compile_results = [mean_compile_train_epoch_time, mean_compile_test_epoch_time]
# Calculate the percentage difference between the mean compile and non-compile train epoch times
train_epoch_time_diff = mean_compile_train_epoch_time - mean_train_epoch_time
train_epoch_time_diff_percent = (train_epoch_time_diff / mean_train_epoch_time) * 100
# Calculate the percentage difference between the mean compile and non-compile test epoch times
test_epoch_time_diff = mean_compile_test_epoch_time - mean_test_epoch_time
test_epoch_time_diff_percent = (test_epoch_time_diff / mean_test_epoch_time) * 100
# Print the mean difference percentages
print(f"Mean train epoch time difference: {round(train_epoch_time_diff_percent, 3)}% (negative means faster)")
print(f"Mean test epoch time difference: {round(test_epoch_time_diff_percent, 3)}% (negative means faster)")
# Create a bar plot of the mean train and test epoch time for both results and compiled_results
# Make both bars appear on the same plot
import matplotlib.pyplot as plt
import numpy as np
# Create plot
plt.figure(figsize=(10, 7))
width = 0.3
x_indicies = np.arange(len(mean_results))
plt.bar(x=x_indicies, height=mean_results, width=width, label="non_compiled_results")
plt.bar(x=x_indicies + width, height=mean_compile_results, width=width, label="compiled_results")
plt.xticks(x_indicies + width / 2, ("Train Epoch", "Test Epoch"))
plt.ylabel("Mean epoch time (seconds, lower is better)")
# TK - make this title include dataset/model information for a better idea of what's happening
if multi_runs:
plt.title(f"GPU: {gpu_name} | Epochs: {NUM_EPOCHS} ({num_runs} runs) | Data: {dataset_name} | Model: {model_name} | Image size: {IMAGE_SIZE} | Batch size: {BATCH_SIZE}")
else:
plt.title(f"GPU: {gpu_name} | Epochs: {NUM_EPOCHS} | Data: {dataset_name} | Model: {model_name} | Image size: {IMAGE_SIZE} | Batch size: {BATCH_SIZE}")
plt.legend();
if save:
plt.savefig(save_path)
print(f"[INFO] Plot saved to {save_path}")
def plot_mean_epoch_times(non_compiled_results, compiled_results, multi_runs=False, num_runs=0, save=False, save_path=""):
mean_train_epoch_time = non_compiled_results.train_epoch_time.mean()
mean_test_epoch_time = non_compiled_results.test_epoch_time.mean()
mean_results = [mean_train_epoch_time, mean_test_epoch_time]
mean_compile_train_epoch_time = compiled_results.train_epoch_time.mean()
mean_compile_test_epoch_time = compiled_results.test_epoch_time.mean()
mean_compile_results = [mean_compile_train_epoch_time, mean_compile_test_epoch_time]
# Calculate the percentage difference between the mean compile and non-compile train epoch times
train_epoch_time_diff = mean_compile_train_epoch_time - mean_train_epoch_time
train_epoch_time_diff_percent = (train_epoch_time_diff / mean_train_epoch_time) * 100
# Calculate the percentage difference between the mean compile and non-compile test epoch times
test_epoch_time_diff = mean_compile_test_epoch_time - mean_test_epoch_time
test_epoch_time_diff_percent = (test_epoch_time_diff / mean_test_epoch_time) * 100
# Print the mean difference percentages
print(f"Mean train epoch time difference: {round(train_epoch_time_diff_percent, 3)}% (negative means faster)")
print(f"Mean test epoch time difference: {round(test_epoch_time_diff_percent, 3)}% (negative means faster)")
# Create a bar plot of the mean train and test epoch time for both results and compiled_results
# Make both bars appear on the same plot
import matplotlib.pyplot as plt
import numpy as np
# Create plot
plt.figure(figsize=(10, 7))
width = 0.3
x_indicies = np.arange(len(mean_results))
plt.bar(x=x_indicies, height=mean_results, width=width, label="non_compiled_results")
plt.bar(x=x_indicies + width, height=mean_compile_results, width=width, label="compiled_results")
plt.xticks(x_indicies + width / 2, ("Train Epoch", "Test Epoch"))
plt.ylabel("Mean epoch time (seconds, lower is better)")
# TK - make this title include dataset/model information for a better idea of what's happening
if multi_runs:
plt.title(f"GPU: {gpu_name} | Epochs: {NUM_EPOCHS} ({num_runs} runs) | Data: {dataset_name} | Model: {model_name} | Image size: {IMAGE_SIZE} | Batch size: {BATCH_SIZE}")
else:
plt.title(f"GPU: {gpu_name} | Epochs: {NUM_EPOCHS} | Data: {dataset_name} | Model: {model_name} | Image size: {IMAGE_SIZE} | Batch size: {BATCH_SIZE}")
plt.legend();
if save:
plt.savefig(save_path)
print(f"[INFO] Plot saved to {save_path}")
In [49]:
Copied!
os.makedirs("pytorch_2_results/figures", exist_ok=True)
save_path_multi_run = f"pytorch_2_results/figures/single_run_{gpu_name}_{model_name}_{dataset_name}_{IMAGE_SIZE}_train_epoch_time.png"
plot_mean_epoch_times(results_df, compile_results_df, multi_runs=False, save_path=save_path_multi_run, save=True)
os.makedirs("pytorch_2_results/figures", exist_ok=True)
save_path_multi_run = f"pytorch_2_results/figures/single_run_{gpu_name}_{model_name}_{dataset_name}_{IMAGE_SIZE}_train_epoch_time.png"
plot_mean_epoch_times(results_df, compile_results_df, multi_runs=False, save_path=save_path_multi_run, save=True)
Mean train epoch time difference: -6.234% (negative means faster) Mean test epoch time difference: 2.648% (negative means faster) [INFO] Plot saved to pytorch_2_results/figures/single_run_NVIDIA GeForce RTX 4080_ResNet50_CIFAR10_224_train_epoch_time.png
In [27]:
Copied!
mean_test_epoch_time, mean_compile_test_epoch_time
mean_test_epoch_time, mean_compile_test_epoch_time
Out[27]:
(9.186983013153077, 9.430223417282104)
TK - Save results to file with GPU details¶
TODO:
- Save the results to file with GPU name and other details (run on multiple machines)
- Run for multiple passes (e.g. 5x runs to average the time over each run)
In [28]:
Copied!
save_name_for_non_compiled_results = f"single_run_non_compiled_results_{dataset_name}_{model_name}_{gpu_name.replace(' ', '_')}.csv"
save_name_for_compiled_results = f"single_run_compiled_results_{dataset_name}_{model_name}_{gpu_name.replace(' ', '_')}.csv"
save_name_for_non_compiled_results, save_name_for_compiled_results
save_name_for_non_compiled_results = f"single_run_non_compiled_results_{dataset_name}_{model_name}_{gpu_name.replace(' ', '_')}.csv"
save_name_for_compiled_results = f"single_run_compiled_results_{dataset_name}_{model_name}_{gpu_name.replace(' ', '_')}.csv"
save_name_for_non_compiled_results, save_name_for_compiled_results
Out[28]:
('single_run_non_compiled_results_CIFAR10_ResNet50_NVIDIA_GeForce_RTX_4080.csv', 'single_run_compiled_results_CIFAR10_ResNet50_NVIDIA_GeForce_RTX_4080.csv')
In [29]:
Copied!
# Make a directory for single_run results
import os
pytorch_2_results_dir = "pytorch_2_results"
pytorch_2_single_run_results_dir = f"{pytorch_2_results_dir}/single_run_results"
os.makedirs(pytorch_2_single_run_results_dir, exist_ok=True)
# Save the results
results_df.to_csv(f"{pytorch_2_single_run_results_dir}/{save_name_for_non_compiled_results}")
compile_results_df.to_csv(f"{pytorch_2_single_run_results_dir}/{save_name_for_compiled_results}")
# Make a directory for single_run results
import os
pytorch_2_results_dir = "pytorch_2_results"
pytorch_2_single_run_results_dir = f"{pytorch_2_results_dir}/single_run_results"
os.makedirs(pytorch_2_single_run_results_dir, exist_ok=True)
# Save the results
results_df.to_csv(f"{pytorch_2_single_run_results_dir}/{save_name_for_non_compiled_results}")
compile_results_df.to_csv(f"{pytorch_2_single_run_results_dir}/{save_name_for_compiled_results}")
TK - Try for multiple runs¶
In [34]:
Copied!
def create_and_train_non_compiled_model(epochs=NUM_EPOCHS, disable_progress_bar=False):
model, transforms = create_model()
model.to(device)
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(),
lr=0.003)
results = train(model=model,
train_dataloader=train_dataloader,
test_dataloader=test_dataloader,
loss_fn=loss_fn,
optimizer=optimizer,
epochs=epochs,
device=device,
disable_progress_bar=disable_progress_bar)
return results
# TK - change this to only compile a model once and then run the training loop multiple times
# TK - the first time you compile a model, the first few epochs will be slower than subsequent runs
# TK - consider the first few epochs of training to be a "warmup" period
# def create_and_train_compiled_model(epochs=NUM_EPOCHS, disable_progress_bar=False):
# model, transforms = create_model()
# model.to(device)
# loss_fn = torch.nn.CrossEntropyLoss()
# optimizer = torch.optim.Adam(model.parameters(),
# lr=0.003)
# compile_start_time = time.time()
# ### New in PyTorch 2.x ###
# compiled_model = torch.compile(model)
# ##########################
# compile_end_time = time.time()
# compile_time = compile_end_time - compile_start_time
# print(f"Time to compile: {compile_time} | Note: The first time you compile your model, the first few epochs will be slower than subsequent runs.")
# compile_results = train(model=compiled_model,
# train_dataloader=train_dataloader,
# test_dataloader=test_dataloader,
# loss_fn=loss_fn,
# optimizer=optimizer,
# epochs=NUM_EPOCHS,
# device=device,
# disable_progress_bar=disable_progress_bar)
# return compile_results
def create_compiled_model():
model, _ = create_model()
model.to(device)
compile_start_time = time.time()
### New in PyTorch 2.x ###
compiled_model = torch.compile(model)
##########################
compile_end_time = time.time()
compile_time = compile_end_time - compile_start_time
print(f"Time to compile: {compile_time} | Note: The first time you compile your model, the first few epochs will be slower than subsequent runs.")
return compiled_model
def train_compiled_model(model=compiled_model, epochs=NUM_EPOCHS, disable_progress_bar=False):
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(compiled_model.parameters(),
lr=0.003)
compile_results = train(model=model,
train_dataloader=train_dataloader,
test_dataloader=test_dataloader,
loss_fn=loss_fn,
optimizer=optimizer,
epochs=epochs,
device=device,
disable_progress_bar=disable_progress_bar)
return compile_results
def create_and_train_non_compiled_model(epochs=NUM_EPOCHS, disable_progress_bar=False):
model, transforms = create_model()
model.to(device)
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(),
lr=0.003)
results = train(model=model,
train_dataloader=train_dataloader,
test_dataloader=test_dataloader,
loss_fn=loss_fn,
optimizer=optimizer,
epochs=epochs,
device=device,
disable_progress_bar=disable_progress_bar)
return results
# TK - change this to only compile a model once and then run the training loop multiple times
# TK - the first time you compile a model, the first few epochs will be slower than subsequent runs
# TK - consider the first few epochs of training to be a "warmup" period
# def create_and_train_compiled_model(epochs=NUM_EPOCHS, disable_progress_bar=False):
# model, transforms = create_model()
# model.to(device)
# loss_fn = torch.nn.CrossEntropyLoss()
# optimizer = torch.optim.Adam(model.parameters(),
# lr=0.003)
# compile_start_time = time.time()
# ### New in PyTorch 2.x ###
# compiled_model = torch.compile(model)
# ##########################
# compile_end_time = time.time()
# compile_time = compile_end_time - compile_start_time
# print(f"Time to compile: {compile_time} | Note: The first time you compile your model, the first few epochs will be slower than subsequent runs.")
# compile_results = train(model=compiled_model,
# train_dataloader=train_dataloader,
# test_dataloader=test_dataloader,
# loss_fn=loss_fn,
# optimizer=optimizer,
# epochs=NUM_EPOCHS,
# device=device,
# disable_progress_bar=disable_progress_bar)
# return compile_results
def create_compiled_model():
model, _ = create_model()
model.to(device)
compile_start_time = time.time()
### New in PyTorch 2.x ###
compiled_model = torch.compile(model)
##########################
compile_end_time = time.time()
compile_time = compile_end_time - compile_start_time
print(f"Time to compile: {compile_time} | Note: The first time you compile your model, the first few epochs will be slower than subsequent runs.")
return compiled_model
def train_compiled_model(model=compiled_model, epochs=NUM_EPOCHS, disable_progress_bar=False):
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(compiled_model.parameters(),
lr=0.003)
compile_results = train(model=model,
train_dataloader=train_dataloader,
test_dataloader=test_dataloader,
loss_fn=loss_fn,
optimizer=optimizer,
epochs=epochs,
device=device,
disable_progress_bar=disable_progress_bar)
return compile_results
In [36]:
Copied!
# Run non-compiled model for multiple runs
NUM_RUNS = 3
NUM_EPOCHS = 5
non_compile_results_multiple_runs = []
for i in tqdm(range(NUM_RUNS)):
print(f"[INFO] Run {i+1} of {NUM_RUNS} for non-compiled model")
results = create_and_train_non_compiled_model(epochs=NUM_EPOCHS, disable_progress_bar=True)
non_compile_results_multiple_runs.append(results)
# Run non-compiled model for multiple runs
NUM_RUNS = 3
NUM_EPOCHS = 5
non_compile_results_multiple_runs = []
for i in tqdm(range(NUM_RUNS)):
print(f"[INFO] Run {i+1} of {NUM_RUNS} for non-compiled model")
results = create_and_train_non_compiled_model(epochs=NUM_EPOCHS, disable_progress_bar=True)
non_compile_results_multiple_runs.append(results)
0%| | 0/3 [00:00<?, ?it/s]
[INFO] Run 1 of 3 for non-compiled model Epoch: 1 | train_loss: 0.8378 | train_acc: 0.7089 | test_loss: 1.0247 | test_acc: 0.6870 | train_epoch_time: 109.5564 | test_epoch_time: 9.2328 Epoch: 2 | train_loss: 0.4580 | train_acc: 0.8411 | test_loss: 0.5060 | test_acc: 0.8253 | train_epoch_time: 109.5725 | test_epoch_time: 9.2088 Epoch: 3 | train_loss: 0.3323 | train_acc: 0.8846 | test_loss: 0.4765 | test_acc: 0.8399 | train_epoch_time: 109.5022 | test_epoch_time: 9.1758 Epoch: 4 | train_loss: 0.2509 | train_acc: 0.9146 | test_loss: 0.4185 | test_acc: 0.8649 | train_epoch_time: 109.5430 | test_epoch_time: 9.1935 Epoch: 5 | train_loss: 0.1788 | train_acc: 0.9376 | test_loss: 0.4003 | test_acc: 0.8716 | train_epoch_time: 109.5906 | test_epoch_time: 9.1582 [INFO] Run 2 of 3 for non-compiled model Epoch: 1 | train_loss: 0.8725 | train_acc: 0.6948 | test_loss: 0.6809 | test_acc: 0.7659 | train_epoch_time: 109.5877 | test_epoch_time: 9.2452 Epoch: 2 | train_loss: 0.4765 | train_acc: 0.8357 | test_loss: 0.5766 | test_acc: 0.8007 | train_epoch_time: 109.4739 | test_epoch_time: 9.2453 Epoch: 3 | train_loss: 0.3369 | train_acc: 0.8838 | test_loss: 0.3794 | test_acc: 0.8704 | train_epoch_time: 109.5909 | test_epoch_time: 9.2295 Epoch: 4 | train_loss: 0.2466 | train_acc: 0.9138 | test_loss: 0.5049 | test_acc: 0.8333 | train_epoch_time: 109.5572 | test_epoch_time: 9.2202 Epoch: 5 | train_loss: 0.1788 | train_acc: 0.9372 | test_loss: 0.4328 | test_acc: 0.8614 | train_epoch_time: 109.5400 | test_epoch_time: 9.2429 [INFO] Run 3 of 3 for non-compiled model Epoch: 1 | train_loss: 0.7792 | train_acc: 0.7307 | test_loss: 0.6573 | test_acc: 0.7762 | train_epoch_time: 109.6908 | test_epoch_time: 9.2283 Epoch: 2 | train_loss: 0.4296 | train_acc: 0.8538 | test_loss: 0.6004 | test_acc: 0.8050 | train_epoch_time: 109.5275 | test_epoch_time: 9.2120 Epoch: 3 | train_loss: 0.3101 | train_acc: 0.8926 | test_loss: 0.4222 | test_acc: 0.8589 | train_epoch_time: 109.6433 | test_epoch_time: 9.1418 Epoch: 4 | train_loss: 0.2354 | train_acc: 0.9186 | test_loss: 0.3736 | test_acc: 0.8787 | train_epoch_time: 109.5805 | test_epoch_time: 9.2719 Epoch: 5 | train_loss: 0.1736 | train_acc: 0.9386 | test_loss: 0.3786 | test_acc: 0.8789 | train_epoch_time: 109.6146 | test_epoch_time: 9.2244
In [37]:
Copied!
# Go through non_compile_results_multiple_runs and create a dataframe for each run then concatenate them together
non_compile_results_dfs = []
for result in non_compile_results_multiple_runs:
result_df = pd.DataFrame(result)
non_compile_results_dfs.append(result_df)
non_compile_results_multiple_runs_df = pd.concat(non_compile_results_dfs)
# Get the averages across the multiple runs
non_compile_results_multiple_runs_df = non_compile_results_multiple_runs_df.groupby(non_compile_results_multiple_runs_df.index).mean()
non_compile_results_multiple_runs_df
# Go through non_compile_results_multiple_runs and create a dataframe for each run then concatenate them together
non_compile_results_dfs = []
for result in non_compile_results_multiple_runs:
result_df = pd.DataFrame(result)
non_compile_results_dfs.append(result_df)
non_compile_results_multiple_runs_df = pd.concat(non_compile_results_dfs)
# Get the averages across the multiple runs
non_compile_results_multiple_runs_df = non_compile_results_multiple_runs_df.groupby(non_compile_results_multiple_runs_df.index).mean()
non_compile_results_multiple_runs_df
Out[37]:
train_loss | train_acc | test_loss | test_acc | train_epoch_time | test_epoch_time | |
---|---|---|---|---|---|---|
0 | 0.829813 | 0.711462 | 0.787613 | 0.743045 | 109.611609 | 9.235445 |
1 | 0.454694 | 0.843546 | 0.560969 | 0.810324 | 109.524597 | 9.222032 |
2 | 0.326421 | 0.887027 | 0.426016 | 0.856375 | 109.578773 | 9.182378 |
3 | 0.244295 | 0.915669 | 0.432327 | 0.858946 | 109.560207 | 9.228536 |
4 | 0.177081 | 0.937800 | 0.403896 | 0.870616 | 109.581744 | 9.208528 |
In [38]:
Copied!
# TK - change this to only compile a model once and then run the training loop multiple times
# Create compiled model
compiled_model = create_compiled_model()
compiled_results_multiple_runs = []
for i in tqdm(range(NUM_RUNS)):
print(f"[INFO] Run {i+1} of {NUM_RUNS} for compiled model")
results = train_compiled_model(model=compiled_model, epochs=NUM_EPOCHS, disable_progress_bar=True)
compiled_results_multiple_runs.append(results)
# TK - change this to only compile a model once and then run the training loop multiple times
# Create compiled model
compiled_model = create_compiled_model()
compiled_results_multiple_runs = []
for i in tqdm(range(NUM_RUNS)):
print(f"[INFO] Run {i+1} of {NUM_RUNS} for compiled model")
results = train_compiled_model(model=compiled_model, epochs=NUM_EPOCHS, disable_progress_bar=True)
compiled_results_multiple_runs.append(results)
Time to compile: 0.001680135726928711 | Note: The first time you compile your model, the first few epochs will be slower than subsequent runs.
0%| | 0/3 [00:00<?, ?it/s]
[INFO] Run 1 of 3 for compiled model Epoch: 1 | train_loss: 0.7646 | train_acc: 0.7342 | test_loss: 0.7037 | test_acc: 0.7672 | train_epoch_time: 122.2336 | test_epoch_time: 16.7382 Epoch: 2 | train_loss: 0.4172 | train_acc: 0.8569 | test_loss: 0.4448 | test_acc: 0.8516 | train_epoch_time: 97.6691 | test_epoch_time: 7.5259 Epoch: 3 | train_loss: 0.3056 | train_acc: 0.8939 | test_loss: 0.4070 | test_acc: 0.8654 | train_epoch_time: 97.6748 | test_epoch_time: 7.4222 Epoch: 4 | train_loss: 0.2275 | train_acc: 0.9221 | test_loss: 0.4287 | test_acc: 0.8588 | train_epoch_time: 97.6403 | test_epoch_time: 7.4536 Epoch: 5 | train_loss: 0.1673 | train_acc: 0.9409 | test_loss: 0.3591 | test_acc: 0.8873 | train_epoch_time: 97.6284 | test_epoch_time: 7.4502 [INFO] Run 2 of 3 for compiled model Epoch: 1 | train_loss: 0.1867 | train_acc: 0.9353 | test_loss: 0.3929 | test_acc: 0.8778 | train_epoch_time: 97.7071 | test_epoch_time: 7.4878 Epoch: 2 | train_loss: 0.1211 | train_acc: 0.9578 | test_loss: 0.3603 | test_acc: 0.8928 | train_epoch_time: 97.5929 | test_epoch_time: 7.4256 Epoch: 3 | train_loss: 0.0917 | train_acc: 0.9682 | test_loss: 0.4029 | test_acc: 0.8845 | train_epoch_time: 97.7499 | test_epoch_time: 7.4274 Epoch: 4 | train_loss: 0.0708 | train_acc: 0.9749 | test_loss: 0.4205 | test_acc: 0.8841 | train_epoch_time: 97.6956 | test_epoch_time: 7.4811 Epoch: 5 | train_loss: 0.0560 | train_acc: 0.9807 | test_loss: 0.4884 | test_acc: 0.8682 | train_epoch_time: 97.8197 | test_epoch_time: 7.4636 [INFO] Run 3 of 3 for compiled model Epoch: 1 | train_loss: 0.0706 | train_acc: 0.9757 | test_loss: 0.4214 | test_acc: 0.8836 | train_epoch_time: 97.8444 | test_epoch_time: 7.5171 Epoch: 2 | train_loss: 0.0509 | train_acc: 0.9825 | test_loss: 0.4852 | test_acc: 0.8805 | train_epoch_time: 97.6679 | test_epoch_time: 7.5124 Epoch: 3 | train_loss: 0.0425 | train_acc: 0.9851 | test_loss: 0.4100 | test_acc: 0.8985 | train_epoch_time: 97.6812 | test_epoch_time: 7.4985 Epoch: 4 | train_loss: 0.0410 | train_acc: 0.9859 | test_loss: 0.4030 | test_acc: 0.9010 | train_epoch_time: 97.7047 | test_epoch_time: 7.4856 Epoch: 5 | train_loss: 0.0394 | train_acc: 0.9864 | test_loss: 0.4396 | test_acc: 0.8923 | train_epoch_time: 97.6329 | test_epoch_time: 7.5252
In [39]:
Copied!
# Go through compile_results_multiple_runs and create a dataframe for each run then concatenate them together
compile_results_dfs = []
for result in compiled_results_multiple_runs:
result_df = pd.DataFrame(result)
compile_results_dfs.append(result_df)
compile_results_multiple_runs_df = pd.concat(compile_results_dfs)
# Get the averages across the multiple runs
compile_results_multiple_runs_df = compile_results_multiple_runs_df.groupby(compile_results_multiple_runs_df.index).mean()
compile_results_multiple_runs_df
# Go through compile_results_multiple_runs and create a dataframe for each run then concatenate them together
compile_results_dfs = []
for result in compiled_results_multiple_runs:
result_df = pd.DataFrame(result)
compile_results_dfs.append(result_df)
compile_results_multiple_runs_df = pd.concat(compile_results_dfs)
# Get the averages across the multiple runs
compile_results_multiple_runs_df = compile_results_multiple_runs_df.groupby(compile_results_multiple_runs_df.index).mean()
compile_results_multiple_runs_df
Out[39]:
train_loss | train_acc | test_loss | test_acc | train_epoch_time | test_epoch_time | |
---|---|---|---|---|---|---|
0 | 0.340599 | 0.881720 | 0.505985 | 0.842860 | 105.928343 | 10.581059 |
1 | 0.196404 | 0.932397 | 0.430130 | 0.874967 | 97.643334 | 7.487950 |
2 | 0.146588 | 0.949077 | 0.406647 | 0.882812 | 97.701943 | 7.449384 |
3 | 0.113085 | 0.960947 | 0.417372 | 0.881296 | 97.680187 | 7.473454 |
4 | 0.087548 | 0.969324 | 0.429059 | 0.882582 | 97.693671 | 7.479675 |
In [47]:
Copied!
def plot_mean_epoch_times(non_compiled_results, compiled_results, multi_runs=False, num_runs=0, save=False, save_path=""):
mean_train_epoch_time = non_compiled_results.train_epoch_time.mean()
mean_test_epoch_time = non_compiled_results.test_epoch_time.mean()
mean_results = [mean_train_epoch_time, mean_test_epoch_time]
mean_compile_train_epoch_time = compiled_results.train_epoch_time.mean()
mean_compile_test_epoch_time = compiled_results.test_epoch_time.mean()
mean_compile_results = [mean_compile_train_epoch_time, mean_compile_test_epoch_time]
# Calculate the percentage difference between the mean compile and non-compile train epoch times
train_epoch_time_diff = mean_compile_train_epoch_time - mean_train_epoch_time
train_epoch_time_diff_percent = (train_epoch_time_diff / mean_train_epoch_time) * 100
# Calculate the percentage difference between the mean compile and non-compile test epoch times
test_epoch_time_diff = mean_compile_test_epoch_time - mean_test_epoch_time
test_epoch_time_diff_percent = (test_epoch_time_diff / mean_test_epoch_time) * 100
# Print the mean difference percentages
print(f"Mean train epoch time difference: {round(train_epoch_time_diff_percent, 3)}% (negative means faster)")
print(f"Mean test epoch time difference: {round(test_epoch_time_diff_percent, 3)}% (negative means faster)")
# Create a bar plot of the mean train and test epoch time for both results and compiled_results
# Make both bars appear on the same plot
import matplotlib.pyplot as plt
import numpy as np
# Create plot
plt.figure(figsize=(10, 7))
width = 0.3
x_indicies = np.arange(len(mean_results))
plt.bar(x=x_indicies, height=mean_results, width=width, label="non_compiled_results")
plt.bar(x=x_indicies + width, height=mean_compile_results, width=width, label="compiled_results")
plt.xticks(x_indicies + width / 2, ("Train Epoch", "Test Epoch"))
plt.ylabel("Mean epoch time (seconds, lower is better)")
# TK - make this title include dataset/model information for a better idea of what's happening
if multi_runs:
plt.title(f"GPU: {gpu_name} | Epochs: {NUM_EPOCHS} ({NUM_RUNS} runs) | Data: {dataset_name} | Model: {model_name} | Image size: {IMAGE_SIZE} | Batch size: {BATCH_SIZE}")
else:
plt.title(f"GPU: {gpu_name} | Epochs: {NUM_EPOCHS} | Data: {dataset_name} | Model: {model_name} | Image size: {IMAGE_SIZE} | Batch size: {BATCH_SIZE}")
plt.legend();
if save:
plt.savefig(save_path)
print(f"[INFO] Plot saved to {save_path}")
def plot_mean_epoch_times(non_compiled_results, compiled_results, multi_runs=False, num_runs=0, save=False, save_path=""):
mean_train_epoch_time = non_compiled_results.train_epoch_time.mean()
mean_test_epoch_time = non_compiled_results.test_epoch_time.mean()
mean_results = [mean_train_epoch_time, mean_test_epoch_time]
mean_compile_train_epoch_time = compiled_results.train_epoch_time.mean()
mean_compile_test_epoch_time = compiled_results.test_epoch_time.mean()
mean_compile_results = [mean_compile_train_epoch_time, mean_compile_test_epoch_time]
# Calculate the percentage difference between the mean compile and non-compile train epoch times
train_epoch_time_diff = mean_compile_train_epoch_time - mean_train_epoch_time
train_epoch_time_diff_percent = (train_epoch_time_diff / mean_train_epoch_time) * 100
# Calculate the percentage difference between the mean compile and non-compile test epoch times
test_epoch_time_diff = mean_compile_test_epoch_time - mean_test_epoch_time
test_epoch_time_diff_percent = (test_epoch_time_diff / mean_test_epoch_time) * 100
# Print the mean difference percentages
print(f"Mean train epoch time difference: {round(train_epoch_time_diff_percent, 3)}% (negative means faster)")
print(f"Mean test epoch time difference: {round(test_epoch_time_diff_percent, 3)}% (negative means faster)")
# Create a bar plot of the mean train and test epoch time for both results and compiled_results
# Make both bars appear on the same plot
import matplotlib.pyplot as plt
import numpy as np
# Create plot
plt.figure(figsize=(10, 7))
width = 0.3
x_indicies = np.arange(len(mean_results))
plt.bar(x=x_indicies, height=mean_results, width=width, label="non_compiled_results")
plt.bar(x=x_indicies + width, height=mean_compile_results, width=width, label="compiled_results")
plt.xticks(x_indicies + width / 2, ("Train Epoch", "Test Epoch"))
plt.ylabel("Mean epoch time (seconds, lower is better)")
# TK - make this title include dataset/model information for a better idea of what's happening
if multi_runs:
plt.title(f"GPU: {gpu_name} | Epochs: {NUM_EPOCHS} ({NUM_RUNS} runs) | Data: {dataset_name} | Model: {model_name} | Image size: {IMAGE_SIZE} | Batch size: {BATCH_SIZE}")
else:
plt.title(f"GPU: {gpu_name} | Epochs: {NUM_EPOCHS} | Data: {dataset_name} | Model: {model_name} | Image size: {IMAGE_SIZE} | Batch size: {BATCH_SIZE}")
plt.legend();
if save:
plt.savefig(save_path)
print(f"[INFO] Plot saved to {save_path}")
In [48]:
Copied!
os.makedirs("pytorch_2_results/figures", exist_ok=True)
save_path_multi_run = f"pytorch_2_results/figures/multi_run_{gpu_name}_{model_name}_{dataset_name}_{IMAGE_SIZE}_train_epoch_time.png"
plot_mean_epoch_times(non_compile_results_multiple_runs_df, compile_results_multiple_runs_df, multi_runs=True, num_runs=NUM_RUNS, save_path=save_path_multi_run, save=True)
os.makedirs("pytorch_2_results/figures", exist_ok=True)
save_path_multi_run = f"pytorch_2_results/figures/multi_run_{gpu_name}_{model_name}_{dataset_name}_{IMAGE_SIZE}_train_epoch_time.png"
plot_mean_epoch_times(non_compile_results_multiple_runs_df, compile_results_multiple_runs_df, multi_runs=True, num_runs=NUM_RUNS, save_path=save_path_multi_run, save=True)
Mean train epoch time difference: -9.347% (negative means faster) Mean test epoch time difference: -12.165% (negative means faster) [INFO] Plot saved to pytorch_2_results/figures/multi_run_NVIDIA GeForce RTX 4080_ResNet50_CIFAR10_224_train_epoch_time.png
In [42]:
Copied!
save_name_for_multi_run_non_compiled_results = f"multi_run_non_compiled_results_{NUM_RUNS}_runs_{dataset_name}_{model_name}_{gpu_name.replace(' ', '_')}.csv"
save_name_for_multi_run_compiled_results = f"multi_run_compiled_results_{NUM_RUNS}_runs_{dataset_name}_{model_name}_{gpu_name.replace(' ', '_')}.csv"
print(save_name_for_multi_run_non_compiled_results)
print(save_name_for_multi_run_compiled_results)
# Make a directory for multi_run results
import os
pytorch_2_results_dir = "pytorch_2_results"
pytorch_2_multi_run_results_dir = f"{pytorch_2_results_dir}/multi_run_results"
os.makedirs(pytorch_2_multi_run_results_dir, exist_ok=True)
# Save the results
results_df.to_csv(f"{pytorch_2_multi_run_results_dir}/{save_name_for_multi_run_non_compiled_results}")
compile_results_df.to_csv(f"{pytorch_2_multi_run_results_dir}/{save_name_for_multi_run_compiled_results}")
save_name_for_multi_run_non_compiled_results = f"multi_run_non_compiled_results_{NUM_RUNS}_runs_{dataset_name}_{model_name}_{gpu_name.replace(' ', '_')}.csv"
save_name_for_multi_run_compiled_results = f"multi_run_compiled_results_{NUM_RUNS}_runs_{dataset_name}_{model_name}_{gpu_name.replace(' ', '_')}.csv"
print(save_name_for_multi_run_non_compiled_results)
print(save_name_for_multi_run_compiled_results)
# Make a directory for multi_run results
import os
pytorch_2_results_dir = "pytorch_2_results"
pytorch_2_multi_run_results_dir = f"{pytorch_2_results_dir}/multi_run_results"
os.makedirs(pytorch_2_multi_run_results_dir, exist_ok=True)
# Save the results
results_df.to_csv(f"{pytorch_2_multi_run_results_dir}/{save_name_for_multi_run_non_compiled_results}")
compile_results_df.to_csv(f"{pytorch_2_multi_run_results_dir}/{save_name_for_multi_run_compiled_results}")
multi_run_non_compiled_results_3_runs_CIFAR10_ResNet50_NVIDIA_GeForce_RTX_4080.csv multi_run_compiled_results_3_runs_CIFAR10_ResNet50_NVIDIA_GeForce_RTX_4080.csv
TK - Possible improvements/extensions¶
- TK - use mixed precision training - https://pytorch.org/docs/stable/notes/amp_examples.html#amp-examples (more speedups)
- Transformer based models may see better speedups than conv models (due to PyTorch 2.0) - https://pytorch.org/blog/pytorch-2.0-release/#stable-accelerated-pytorch-2-transformers