diff --git a/recognition/47049358/.gitignore b/recognition/47049358/.gitignore new file mode 100644 index 000000000..410d378be --- /dev/null +++ b/recognition/47049358/.gitignore @@ -0,0 +1,20 @@ +semantic_labels_anon +semantic_MRs_anon + +**__pycache__** + +*.ipynb + +*.pkl + +Images + +rangpur_outputs + +*.sh + +*.out + +*.pdf + +BriefDataDescription.txt \ No newline at end of file diff --git a/recognition/47049358/README.md b/recognition/47049358/README.md new file mode 100644 index 000000000..83ac23de6 --- /dev/null +++ b/recognition/47049358/README.md @@ -0,0 +1,201 @@ +--- +title: COMP3710 Report +author: "Ryuto Hisamoto" +date: "2024-10-25" +--- + +# Table of Contents + +- [Table of Contents](#table-of-contents) +- [Improved 3D UNet](#improved-3d-unet) + - [Problem](#problem) + - [Model](#model) +- [Loading Data](#loading-data) +- [Training](#training) + - [Loss Function](#loss-function) + - [Optimiser](#optimiser) +- [Testing](#testing) +- [Result](#result) +- [Discussion](#discussion) +- [Conclusion](#conclusion) +- [References](#references) +- [Dependencies](#dependencies) + +# Improved 3D UNet + +Improved 3D UNet is capable of producing segmentations for medical images. The report covers the architecture of model,its parameters and relevant components, and its performance on 3D prostate data. + +## Problem + +Segmentation is a task that requires a machine learning models to divide image components into meaningful parts. +In other words, the model is required to classify components of an image correctly into corresponding labels. + +## Model + +[`modules.py`](modules.py) + +
+
+
+ Figure 1: Improved 3D UNet Architecture
+
+ +UNet is an architecture for convolutional neural networks specifically for segmentation tasks (Gupta, 2021). +The model takes advantage of skip connections and tensor concatenations to preserve input details and its structure while learning appropriate segmentations. +The basic structure of UNet involves the downsampling and upsampling of original images with skip connections in between corresponding pair of downsampling and upsampling layers. +Skip connection is a technique used to (1) preserve features of the image and (2) prevent diminishing gradients over deep layers of network preventing the learning of parameters (PATHAK, 2024). +The authors of "Brain Tumor Segmentation and Radiomics Survival Prediction: Contribution to the BRATS 2017 Challenge" proposes the improvement on the architecture by the integration of segmentation layers at different levels. The resulting architecture is improved 3D UNet which is capable of performing complex segmentation tasks with appropriate parameters and components. + +In the context pathway (encoding part), 3 x 3 x 3 convolution with a stride and padding of 1 is applied in each convolutional layer. Then, instance normalisation is applied and its output groes through leaky ReLU with a negative slope of $10 ^ {-2}$ as an activation function. We refer to this module as a 'standard module', and 3 x 3 x 3 stride 2 convolution is the same as a standard module except its stride being 2 to reduce the resolution of input.Context modules are composed of two standard modules with a drop out layer in-between with a dropout probability of 30%. This helps in reducing computational cost and memory requirements. Lastly, output from context modules are combined with its input passed from a standard module with element-wise sum. From the 2nd level, the depth of layers is doubled, and the process is repeated throughout each level of the context pathway. 
+ +The localisation pathway (decoding part) utilises a 4 x 4 x 4 transposed convolution with a stride of 2 and padding of 1 to increase the resolution while reducing the feature maps. As the input goes up layers, they are concatenated with the output from a context module on the same layer to preserve features which are potentially lost as they go through the network. Then, localisation modules combines the features together while reducing the number of feature maps to reduce memory consumption. Its output is handed over to the following upsampling module, and the process is repeated until it reaches back to the original level of the architecture. When the input reaches to the original level, it goes through another standard module before handed over to a segmentation layer and is summed with the previsous outputs of segmentation layers. + +From the third localisation layer, segmentation layers which apply 1 x 1 x 1 convolution with a stride of 1 take outputs from localisation modules and map them to the corresponding segmentation labels and are summed element-wise after upscaled to match the size. Finally the output is applied a softmax to turn its predictions of labels into probabilities for later calculation of loss. It is to be noted that, argmax has to be applied to produce proper masks from the output of the model. Otherwise, the model produces its predictions from the architecture and processed discussed. + +# Loading Data + +[`dataset.py`](dataset.py) + +The authors of "Brain Tumor Segmentation and Radiomics Survival Prediction: Contribution to the BRATS 2017 Challenge" seem to have used the following augmentation methods: + +- Random rotation +- Random scaling +- Elastic transformation +- Gamma correction +- Mirroring (assumably horizontal flip given their problem space) + +However, some augmentations methods are altered to limit the complexity of solution. 
For instance, use of elastic transformation was avoided as it could alter the image significantly, causing it to deviate from the actual images the model may find. Moreover, the tuning of such complex method could decrease the maintainability of solutin. Therefore, the project preserved basic augmentation techniques to process the training data. More precisely, techniques used are limited to: + +- Random Rotation ($[-0.5, 0.5]$ for all x, y, and z coordinates) +- Random Vertical Flip +- Gaussian Noise ($\mu = 0, \sigma = 0.5$) +- Resizing (down to (128 x 128 x 64)) + +Resizing is an optional transformation as it is meant to be done to save the memory consumption and increase the speed of training. **However, resizing must be applied with limited memory resources to run the model**. The past attempts have shown that the implementation cannot process image size of (256 x 256 x 128) regardless of batch size. In addition, all images are normalised as they are loaded to eliminate difference in intensity scales if there are any. Finally, all voxel values are loaded as `torch.float32` but `torch.uint8` is used for labels to save memory consumption. The labels in the dataset are indexed according to the table below, and are assigned to corresponding layers as binary masks as they are on-hot encoded. For example, segments corresponding to the first type of label appears as 1s in the 0th layer while other parts appear as 0s, and so on. + + |Labels|Segment| +| - | - | +|0| Background | +|1| Body | +|2| Bones | +|3| Bladder | +|4| Rectum | +|5| Prostate | + + +
+
+
+ Figure 2: Example of labels layered on top of images.
+
+ +# Training + +[`train.py`](train.py) + +- Batch Size: 2 +- Number of Epochs: 300 +- Learning Rate: $5e ^ {-4}$ +- Initial Learning Rate (for lr_scheduler): 0.985 +- Weight Decay: $1e ^ {-5}$ + +The model takes in an raw image as its input, and its goal is to learn the best feature map which ends up being a multi-channel segmentation of the original image. + +## Loss Function + +The model utilises dice loss as its loss function. Moreover, it is capable of using deviations of dice loss such as a sum of dice loss and cross-entropy loss, or focal loss. A vanilla dice score has formula: $$D(y_{true}, y_{pred}) = 2 \times \frac{\Sigma(y_{true} \cdot y_{pred})}{\Sigma y_{true} + \Sigma y_{pred}}$$ + +in which $y_{true}$ is the ground truth probability and $y_{pred}$ is the predicted probability. Hence dice loss is provided by: + +$$L_{Dice} = 1 - D(y_{true}, y_{pred})$$ + +The loss function mitigates the problem with other loss functions such as a cross-entropy loss which tend to be biased toward a dominant class. The design of dice loss provides more accurate representation of the model's performance in segmentation. In addition `monai` provides an option to exclude background from the calculation of loss, and the model makes use of this option when calculating the loss (background is included when testing). + +It is recommended to use the sum of dice loss and a weighted cross-entropy loss (Yang et al., 2022) for the problem as it seems to optimise the performance the most. Cross-entropy loss is calculating by: + +$$L_{CE} = \frac{1}{N} \Sigma_i - [y_i \times \ln (p_i) + (1 - y_i) \times \ln (1 - p_i)]$$ + +where $y_i$ is the lebel of sample $i$ and $p_i$ represents the probability of sample $i$ predicted to be positive, and $N$ represents the number of samples. 
Hence the its wegithed sum with a dice loss can be shown as + +$$L_{loss} = L_{Dice} + \alpha L_{CE}$$ + +"Multi-task thyroid tumor segmentation based on the joint loss function" recommends to set $\alpha = 0.2$, so the report strictly follows it to calculate the weighted loss. + +## Optimiser + +**Adam (Adaptive Moment Estimation)** is an optimisation algorithm that boosts the speed of convergence of gradient descent. The optimiser utilises an exponential average of gradients, which allows its efficient and fast pace of convergence. Moreover, the optimiser applies a **$L_2$ regularisation** (aka Tikhonov regularisation) to penalise for the complexity of model. Complexity can be defined as the number of parameters learned from the data, and high complexity is likely to be an indication of overfitting to the training samples. Hence, regularisation is necessary to prevent the model from learning high values of parameters by penalising the model for its complexity, and $L_2$ regularisation is one of the explicit regularisation methods which adds an extra penalty term to the cost function. The parameters learned with such technique can be denoted as + +$$\hat{\theta} = \arg \min_\theta \frac{1}{n} ||X\theta - y||^ 2_ 2 + \lambda ||\theta|| ^ 2 _ 2$$ + +In addition, the model utilises a learning rate scheduler based on the number of epochs, which dynamically changes the learning rate over epochs. This allows the model to start from a large learning rate which evntually settles to a small learning rate for easier convergence. In the implementation, the learnign rate is reduced by $1e ^ {-5}$ over each epoch. + +It is to be noted that mixed precision and gradient accumulation are used to reduce the memory consumption during the training. 
**Mixed precision** reduces the memory consumption by replaceing value types with `torch.float16` where it can to reduce the space required to perform necessary operations including loss and gradient calculations necessary to train the model. **Gradient accumulation** accumulates the gradients and updates the weights after some training loop. + +# Testing + +[`predict.py`](predict.py) + +The model is tested by measuring its dice scores on the segmentations it produces for unseen images. Although the model outputs softmax values for its predicted segmentations, they are one-hot encoded during the test to maximise the contribution of correct predictions. Dice scores for each label is calculated independently to obtain the accurate performance to analyse the model's weakness and strengths in predicting particular segments for all labels. Then, their averages are taken and are summarised in the bar chart. Moreover, the visualisation of first 9 labels are produced with the actual segmentations for comparison. + +# Result + +
+
+
+ Figure 3: The Training Progress with Dice Loss + 0.2CE
+
+ +
+
+
+ Figure 4: Example of Ground Truth Labels used for Testing
+
+ +
+
+
+ Figure 5: Example of Predicted Labels produced by the model
+
+ +
+
+
+ Figure 6: The Final Dice Scores achieved by the Model for Each Label
+
+
+The outcome shows the significant impact of the choice of loss function on the performance of the model. It was found that with other loss functions, the model performs poorly on assigning correct labels to small segments. Specifically, segment label 4 (rectum) often suffered from poor performance as it was often ignored by the model in optimising the segmentation of the corresponding label. However, the addition of a weighted cross-entropy loss seems to force the model to classify segments correctly, which appears to cause a tremendous improvement in performance. The final model produces segment predictions with dice scores greater than 0.8 each, which is an astonishing performance from where it started off.
+
+# Discussion
+
+Firstly, there had to be a compromise in maintaining the original resolution of the image given the limitation in resources. The model seems to perform well on downsized images, but without testing it on images with the original resolution, its performance on original images can only be estimated. Moreover, the optimality of the architecture remains a question, as the model could potentially be simplified to perform the same task without facing issues in its large consumption of computer memory.
+
+Secondly, the project did not incorporate the idea of patient-level predictions. Despite the model's strong performance, its true robustness to scans taken from new patients must be explored to test its true ability to produce segmentations. In the future, the model has to be tested for its capability by training it based on patient-level images.
+
+Finally, although the report strictly followed the implementation of the architectures and loss functions from the published papers with different problem space, there could be more optimal or efficient adjustments that could improve the model's performance in terms of accuracy and time and/or memory savings. Therefore, future research
+could focus on improving the current model by differentiating it from the architectures and components already described by researchers, enabling new discoveries.
+
+# Conclusion
+
+Improved 3D UNet is a powerful architecture which makes complex image-processing tasks possible. However, its performance is truly maximised through the observation of its behaviour and performance under different settings, tunings, and/or parameter selections. In the given problem of segmenting 3D prostate images, adjusting the loss function from a vanilla dice loss to the sum of dice loss and weighted cross-entropy loss improved the performance dramatically. The model could be explored in depth with regard to its relationship with its components for improved performance, which could potentially lead to a discovery of new and more generalised architectures that could function in wider contexts.
+
+# References
+
+1. Gupta, P. (2021, December 17). Understanding Skip Connections in Convolutional Neural Networks using U-Net Architecture. Medium. https://medium.com/@preeti.gupta02.pg/understanding-skip-connections-in-convolutional-neural-networks-using-u-net-architecture-b31d90f9670a
+
+2. Isensee, F., Kickingereder, P., Wick, W., Bendszus, M., & Maier-Hein, K. (2018). Brain Tumor Segmentation and Radiomics Survival Prediction: Contribution to the BRATS 2017 Challenge. In arXiv. https://arxiv.org/pdf/1802.10508v1
+
+3. PATHAK, H. (2024, July 21). How do skip connections impact the training process of neural networks? Medium. https://medium.com/@harshnpathak/how-do-skip-connections-impact-the-training-process-of-neural-networks-bccca6efb2eb
+
+4. Yang, D., Li, Y., & Yu, J. (2022). Multi-task thyroid tumor segmentation based on the joint loss function. Biomedical Signal Processing and Control, 79(2). https://doi.org/10.1016/j.bspc.2022.104249
+
+# Dependencies
+
+- matplotlib=3.9.2
+- monai=1.4.0
+- nibabel=5.3.2=pypi_0
+- pytorch=2.5.0
+- scikit-learn=1.5.2=pypi_0
+- torchaudio=2.5.0
+- torchvision=0.20.0
+
+_\*For more details, please refer to the [`requirements.txt`](requirements.txt)._
\ No newline at end of file
diff --git a/recognition/47049358/dataset.py b/recognition/47049358/dataset.py
new file mode 100644
index 000000000..7a0ae82d9
--- /dev/null
+++ b/recognition/47049358/dataset.py
@@ -0,0 +1,89 @@
+#!/usr/bin/env python
+""" Initialises monai transformations and loads paths to image and label nifti files.
+
+dataset.py loads the files to perform image segmentation. It loads the paths to images and corresponding
+labels, but none of them are processed in the file. Moreover, transformation on training set and test set
+are defined in the file, but they are to be exported with corresponding dictionary files and used with
+monai.data.Dataset and Dataloader.
+
+"""
+
+# ==========================
+# Imports
+# ==========================
+import os
+from sklearn.model_selection import train_test_split
+from monai.transforms import (LoadImaged, EnsureChannelFirstd, NormalizeIntensityd,
+ SpatialCropd, RandFlipd, RandRotated, AsDiscreted,
+ RandGaussianNoised, Compose, CastToTyped, Resized)
+import torch
+
+__author__ = "Ryuto Hisamoto"
+
+__license__ = "Apache"
+__version__ = "1.0.0"
+__maintainer__ = "Ryuto Hisamoto"
+__email__ = "s4704935@student.uq.edu.au"
+__status__ = "Committed"
+
+# ==========================
+# Constants
+# ==========================
+
+IMAGE_FILE_NAME = '/home/groups/comp3710/HipMRI_Study_open/semantic_MRs' # on rangpur
+LABEL_FILE_NAME = '/home/groups/comp3710/HipMRI_Study_open/semantic_labels_only' # on rangpur
+
+# IMAGE_FILE_NAME = os.path.join(os.getcwd(), 'semantic_MRs_anon') # assuming folders are in the cwd
+# LABEL_FILE_NAME = os.path.join(os.getcwd(), 'semantic_labels_anon')
+
+rawImageNames = sorted(os.listdir(IMAGE_FILE_NAME))
+rawLabelNames = sorted(os.listdir(LABEL_FILE_NAME))
+
+# Split the set into train, validation, and test set (80 : 20 for train:test)
+train_images, test_images, train_labels, test_labels = train_test_split(rawImageNames, rawLabelNames, train_size=0.8) # Split the data in training and test set
+
+"""
+A transformation is performed for consistent dimensions across all images and labels, and random augmentation
+of files to prevent the model's overfitting to the training set. They are performed in the order of: loading, cropping (to remove extra dimensions),
+normalisation of voxel values, random vertical flip (spatial_axis = 2), random rotation (of small degrees), and
+addition of random noise. For labels, an extra step to change encodings is applied.
+"""
+
+train_transforms = Compose(
+ [
+ LoadImaged(keys=["image", "label"]),
+ EnsureChannelFirstd(keys=["image", "label"]),
+ SpatialCropd(keys=["image", "label"], roi_slices=[slice(None), slice(None), slice(0, 128)]), # Crop to depth 128
+ NormalizeIntensityd(keys=["image"]),
+ Resized(keys=["image", "label"], spatial_size=(128, 128, 64)),
+ RandFlipd(keys=["image", "label"], spatial_axis=2, prob=0.5),
+ RandRotated(keys=["image", "label"], range_x=0.5, range_y=0.5, range_z=0.5, mode='nearest', prob=0.5),
+ RandGaussianNoised(keys=["image"], prob=0.5, mean=0, std=0.5),
+ AsDiscreted(keys=["label"], to_onehot=6),
+ CastToTyped(keys=["label"], dtype=torch.uint8),
+ ]
+)
+
+"""
+A transformation on the test set involves the loading of images and labels, cropping for consistent dimensions,
+normalisation of voxel values and encoding of labels.
+"""
+
+test_transforms = Compose(
+ [
+ LoadImaged(keys=["image", "label"]),
+ EnsureChannelFirstd(keys=["image", "label"]),
+ SpatialCropd(keys=["image", "label"], roi_slices=[slice(None), slice(None), slice(0, 128)]),
+ NormalizeIntensityd(keys=["image"]),
+ Resized(keys=["image", "label"], spatial_size=(128, 128, 64)),
+ AsDiscreted(keys=["label"], to_onehot=6),
+ CastToTyped(keys=["label"], dtype=torch.uint8),
+ ]
+)
+
+# Loads paths to images and labels, but does not process them yet
+
+train_dict = [{"image": os.path.join(IMAGE_FILE_NAME, image), "label": os.path.join(LABEL_FILE_NAME, label)}
+ for image, label in zip(train_images, train_labels)]
+test_dict = [{"image": os.path.join(IMAGE_FILE_NAME, image), "label": os.path.join(LABEL_FILE_NAME, label)}
+ for image, label in zip(test_images, test_labels)]
\ No newline at end of file
diff --git a/recognition/47049358/documentation/dice_coefs_test_dice_ce_loss.png b/recognition/47049358/documentation/dice_coefs_test_dice_ce_loss.png
new file mode 100644
index 000000000..98c4671d5
Binary files /dev/null and b/recognition/47049358/documentation/dice_coefs_test_dice_ce_loss.png differ
diff --git a/recognition/47049358/documentation/example_labels_and_images.png b/recognition/47049358/documentation/example_labels_and_images.png
new file mode 100644
index 000000000..7fc154c95
Binary files /dev/null and b/recognition/47049358/documentation/example_labels_and_images.png differ
diff --git a/recognition/47049358/documentation/ground_truths_dice_ce_loss.png b/recognition/47049358/documentation/ground_truths_dice_ce_loss.png
new file mode 100644
index 000000000..d73fea031
Binary files /dev/null and b/recognition/47049358/documentation/ground_truths_dice_ce_loss.png differ
diff --git a/recognition/47049358/documentation/model_architecture.png b/recognition/47049358/documentation/model_architecture.png
new file mode 100644
index 000000000..df02206be
Binary files /dev/null and b/recognition/47049358/documentation/model_architecture.png differ
diff --git a/recognition/47049358/documentation/predictions_dice_ce_loss.png b/recognition/47049358/documentation/predictions_dice_ce_loss.png
new file mode 100644
index 000000000..437e3a75d
Binary files /dev/null and b/recognition/47049358/documentation/predictions_dice_ce_loss.png differ
diff --git a/recognition/47049358/documentation/unet_dice_coefs_over_epochs_dice_ce_loss.png b/recognition/47049358/documentation/unet_dice_coefs_over_epochs_dice_ce_loss.png
new file mode 100644
index 000000000..aec1db5a7
Binary files /dev/null and b/recognition/47049358/documentation/unet_dice_coefs_over_epochs_dice_ce_loss.png differ
diff --git a/recognition/47049358/modules.py b/recognition/47049358/modules.py
new file mode 100644
index 000000000..0edacf759
--- /dev/null
+++ b/recognition/47049358/modules.py
@@ -0,0 +1,281 @@
+#!/usr/bin/env python
+"""
+The model and its building blocks of 3d Improved UNet
+"""
+import torch
+import torch.nn as nn
+
+__author__ = "Ryuto Hisamoto"
+
+__license__ = "Apache"
+__version__ = "1.0.0"
+__maintainer__ = "Ryuto Hisamoto"
+__email__ = "s4704935@student.uq.edu.au"
+__status__ = "Committed"
+
+NEGATIVE_SLOPE = 10 ** -2
+DROP_PROB = 0.3
+NUM_SEGMENTS = 6
+
+""" The most standard module which contains the 3 x 3 x 3 convolutional operation as with the normalisation
+ of the values and activations with Leaky ReLU. Instance normalisation is affine-enabled.
+
+ Parameters:
+ - in_channels (int): Number of input channels.
+ - out_channels (int): Number of output channels.
+ - kernel_size (int, optional): Size of the convolutional kernel (default is 3).
+ - stride (int, optional): Stride of the convolution operation (default is 1).
+ - padding (int, optional): Padding size for the convolution (default is 1).
+ - inplace (bool, optional): Whether to perform operations in place.
+"""
+class StandardModule(nn.Module):
+ def __init__(self, in_channels, out_channels,
+ kernel_size = 3, stride = 1, padding = 1, inplace = False):
+ super(StandardModule, self).__init__()
+ self.conv = nn.Conv3d(in_channels = in_channels, out_channels = out_channels,
+ kernel_size = kernel_size, stride = stride, padding = padding)
+ self.instance_norm = nn.InstanceNorm3d(out_channels, affine=True)
+ self.l_relu = nn.LeakyReLU(negative_slope=NEGATIVE_SLOPE, inplace = inplace)
+
+ def forward(self, x):
+ x = self.conv(x)
+ x = self.instance_norm(x)
+ x = self.l_relu(x)
+ return x
+
+""" Context module which functions as a pre-activation residual block with 2 StandardModules
+
+ Parameters:
+ - in_channels (int): Number of input channels.
+ - out_channels (int): Number of output channels.
+"""
+class ContextModule(nn.Module):
+ def __init__(self, in_channels, out_channels):
+ super(ContextModule, self).__init__()
+ self.block1 = StandardModule(in_channels = in_channels, out_channels = out_channels, inplace = True)
+ self.dropout = nn.Dropout(DROP_PROB)
+ self.block2 = StandardModule(in_channels = in_channels, out_channels = out_channels, inplace = True)
+
+ def forward(self, x):
+ x = self.block1(x)
+ x = self.dropout(x)
+ x = self.block2(x)
+ return x
+
+""" A module that applies 3 x 3 x 3 convolution and following operations excpet with the stride of 2. All encoding layers
+utilise this after the first one.
+
+ Parameters:
+ - in_channels (int): Number of input channels.
+ - out_channels (int): Number of output channels.
+ - kernel_size (int, optional): Size of the convolutional kernel (default is 3).
+ - stride (int, optional): Stride of the convolution operation (default is 2).
+ - padding (int, optional): Padding size for the convolution (default is 1).
+ - inplace (bool, optional): Whether to perform operations in place.
+"""
+class Stride2Module(nn.Module):
+ def __init__(self, in_channels, out_channels,
+ kernel_size=3, stride=2, padding=1, inplace = False):
+ super(Stride2Module, self).__init__()
+ self.conv = nn.Conv3d(in_channels = in_channels, out_channels = out_channels,
+ kernel_size = kernel_size, stride = stride, padding = padding)
+ self.instance_norm = nn.InstanceNorm3d(out_channels, affine=True)
+ self.l_relu = nn.LeakyReLU(negative_slope=NEGATIVE_SLOPE, inplace = inplace)
+
+ def forward(self, x):
+ x = self.conv(x)
+ x = self.instance_norm(x)
+ x = self.l_relu(x)
+ return x
+
+""" A module that upsamples (decodes) from the bottom-most layer using a convolutional transpose.
+The module is used throughout the localisation pathway to take features from lower levels of the network that encode
+contextual information at low spatial resolution and transfer that information to a higher spatial resolution.
+
+ Parameters:
+ - in_channels (int): Number of input channels.
+ - out_channels (int): Number of output channels.
+ - kernel_size (int, optional): Size of the convolutional kernel (default is 4).
+ - stride (int, optional): Stride of the convolution operation (default is 2).
+ - padding (int, optional): Padding size for the convolution (default is 1).
+ - inplace (bool, optional): Whether to perform operations in place.
+"""
+class UpsamplingModule(nn.Module):
+ def __init__(self, in_channels, out_channels):
+ super(UpsamplingModule, self).__init__()
+ self.conv_transpose = nn.ConvTranspose3d(in_channels = in_channels, out_channels = out_channels,
+ kernel_size = 4, stride = 2, padding = 1)
+ self.block = StandardModule(in_channels = out_channels, out_channels = out_channels, inplace = True)
+
+ def forward(self, x):
+ x = self.conv_transpose(x)
+ x = self.block(x)
+ return x
+
+""" Localisation modules that consists of a 3 x 3 x 3 convolution followed by a 1 x 1 x 1 convolution that halves the
+number of feature maps. It accepts the concatenated features from the skip connection and recombines them together.
+
+ Parameters:
+ - in_channels (int): Number of input channels.
+ - out_channels (int): Number of output channels.
+"""
+class LocalisationModule(nn.Module):
+ def __init__(self, in_channels, out_channels):
+ super(LocalisationModule, self).__init__()
+
+ self.block1 = StandardModule(in_channels = in_channels, out_channels = out_channels)
+
+ self.block2 = StandardModule(in_channels = out_channels, out_channels = out_channels, kernel_size = 1, padding = 0)
+
+ def forward(self, x):
+ x = self.block1(x)
+ x = self.block2(x)
+ return x
+
+""" A segmentation layer that is integrated at different levels of the network, which are combined via elementwise summation
+to form the final network output.
+
+ Parameters:
+ - in_channels (int): Number of input channels.
+ - out_channels (int): Number of output channels.
+"""
+class SegmentationLayer(nn.Module):
+ def __init__(self, in_channels, out_channels):
+ super(SegmentationLayer, self).__init__()
+ self.seg = nn.Conv3d(in_channels = in_channels, out_channels = out_channels,
+ kernel_size = 1, stride = 1, padding = 0)
+
+ def forward(self, x):
+ return self.seg(x)
+
+""" A module that upscales the input for 2 times. The module is to be used to match the scale of feature maps
+of segmentation layers from different levels of the network.
+
+ Parameters:
+ - in_channels (int): Number of input channels.
+ - out_channels (int): Number of output channels.
+"""
+
+class UpScaleModule(nn.Module):
+ def __init__(self, in_channels, out_channels):
+ super(UpScaleModule, self).__init__()
+ self.upscale = nn.ConvTranspose3d(in_channels = in_channels, out_channels = out_channels,
+ kernel_size = 4, stride = 2, padding = 1)
+
+ def forward(self, x):
+ return self.upscale(x)
+
+""" 3D imporoved UNet that produces segmentations by first aggregating high level information by
+context pathway and localising precisely in the localisation pathway.
+"""
+class ImprovedUnet(nn.Module):
+ def __init__(self):
+ super(ImprovedUnet, self).__init__()
+        self.block1 = StandardModule(1, 16) # Grayscale thus requires 1 input channel
+ self.context1 = ContextModule(16, 16)
+
+ self.block2 = Stride2Module(16, 32)
+ self.context2 = ContextModule(32, 32)
+
+ self.block3 = Stride2Module(32, 64)
+ self.context3 = ContextModule(64, 64)
+
+ self.block4 = Stride2Module(64, 128)
+ self.context4 = ContextModule(128, 128)
+
+ self.block5 = Stride2Module(128, 256)
+ self.context5 = ContextModule(256, 256)
+
+ self.upsample1 = UpsamplingModule(256, 128)
+
+ self.localise1 = LocalisationModule(256, 128)
+ self.upsample2 = UpsamplingModule(128, 64)
+
+ self.localise2 = LocalisationModule(128, 64)
+ self.upsample3 = UpsamplingModule(64, 32)
+
+ self.localise3 = LocalisationModule(64, 32)
+ self.upsample4 = UpsamplingModule(32, 16)
+
+ self.conv_output = StandardModule(32, 32)
+
+ # first segmentation layer
+ self.segmentation1 = SegmentationLayer(64, NUM_SEGMENTS)
+
+ # second segmentation layer
+ self.segmentation2 = SegmentationLayer(32, NUM_SEGMENTS)
+
+ # third segmentation layer
+ self.segmentation3 = SegmentationLayer(32, NUM_SEGMENTS)
+
+ # upscaling layers
+ self.upscale_1 = UpScaleModule(NUM_SEGMENTS, NUM_SEGMENTS)
+ self.upscale_2 = UpScaleModule(NUM_SEGMENTS, NUM_SEGMENTS)
+
+
+ def forward(self, x):
+
+ # Level 1 context pathway
+ conv_out_1 = self.block1(x)
+ context_out_1 = self.context1(conv_out_1)
+ element_sum_1 = conv_out_1 + context_out_1
+
+ # Level 2 context pathway
+ conv_out_2 = self.block2(element_sum_1)
+ context_out_2 = self.context2(conv_out_2)
+ element_sum_2 = conv_out_2 + context_out_2
+
+ # Level 3 context pathway
+ conv_out_3 = self.block3(element_sum_2)
+ context_out_3 = self.context3(conv_out_3)
+ element_sum_3 = conv_out_3 + context_out_3
+
+ # Level 4 context pathway
+ conv_out_4 = self.block4(element_sum_3)
+ context_out_4 = self.context4(conv_out_4)
+ element_sum_4 = conv_out_4 + context_out_4
+
+ # Level 5 context pathway
+ conv_out_5 = self.block5(element_sum_4)
+ context_out_5 = self.context5(conv_out_5)
+ element_sum_5 = conv_out_5 + context_out_5
+
+ # Level 0 localisation pathway
+ upsample_out_1 = self.upsample1(element_sum_5)
+
+ # Level 1 localisation pathway
+ concat_1 = torch.cat((element_sum_4, upsample_out_1), dim = 1)
+ localisation_out_1 = self.localise1(concat_1)
+ upsample_out_2 = self.upsample2(localisation_out_1)
+
+ # Level 2 localisation pathway
+ concat_2 = torch.cat((element_sum_3, upsample_out_2), dim = 1)
+ localisation_out_2 = self.localise2(concat_2)
+ upsample_out_3 = self.upsample3(localisation_out_2)
+
+ # Level 3 localisation pathway
+ concat_3 = torch.cat((element_sum_2, upsample_out_3), dim = 1)
+ localisation_out_3 = self.localise3(concat_3)
+ upsample_out_4 = self.upsample4(localisation_out_3)
+
+ # Level 4 localisation pathway
+ concat_4 = torch.cat((element_sum_1, upsample_out_4), dim = 1)
+ convoutput_out = self.conv_output(concat_4)
+
+ # 1st Segmentation Layer
+ segment_out_1 = self.segmentation1(localisation_out_2)
+ upscale_out_1 = self.upscale_1(segment_out_1)
+
+ # 2nd Segmentation Layer
+ segment_out_2 = self.segmentation2(localisation_out_3)
+ seg_sum_1 = upscale_out_1 + segment_out_2
+
+ # 3rd Segmentation Layer
+ upscale_out_2 = self.upscale_2(seg_sum_1)
+ segment_out_3 = self.segmentation3(convoutput_out)
+
+ final_sum = upscale_out_2 + segment_out_3
+
+ output = torch.softmax(final_sum, dim = 1)
+
+ return output
\ No newline at end of file
diff --git a/recognition/47049358/predict.py b/recognition/47049358/predict.py
new file mode 100644
index 000000000..d39a71bae
--- /dev/null
+++ b/recognition/47049358/predict.py
@@ -0,0 +1,261 @@
+"""
+The file contains a method to visualise and/or measure the performance of the trained model
+on unseen data.
+"""
+# libraries
+import torch
+import torch.nn as nn
+import numpy as np
+import matplotlib.pyplot as plt
+from time import time
+from monai.losses import DiceLoss
+from monai.data import DataLoader, Dataset
+from monai.transforms import (AsDiscrete, Compose, CastToType)
+
+# import from local files
+from train import trained_model, CRITERION, compute_dice_segments, DEVICE, CRITERION_NAME
+from dataset import test_dict, test_transforms
+
+__author__ = "Ryuto Hisamoto"
+
+__license__ = "Apache"
+__version__ = "1.0.0"
+__maintainer__ = "Ryuto Hisamoto"
+__email__ = "s4704935@student.uq.edu.au"
+__status__ = "Committed"
+
+def visualise_ground_truths(images: list, ground_truths: list, criterion_name: str):
+ """ Visualises the ground truths and their images by overlaying them on the same 3 x 3 plot.
+
+ Args:
+ images (list): Images to overlay labels on.
+ ground_truths (list): Labels to overlay on top of images.
+ criterion_name (str): Name of the loss function used during the training to name the plot.
+
+ Returns:
+ None: The function only plots, so it does not return any value.
+ """
+
+ # Create a 3x3 grid of subplots
+ fig, axes = plt.subplots(3, 3, figsize=(15, 15))
+
+ # Plot the images
+ for i in range(3):
+ for j in range(3):
+
+ idx = i * 3 + j
+
+ # Original image
+
+ image = images[idx]
+
+ axes[i, j].imshow(image, cmap='gray')
+ axes[i, j].axis('off')
+ axes[i, j].set_title(f'Image {idx+1}')
+
+ # Ground truth mask
+
+ ground_truth = ground_truths[idx]
+ num_masks = ground_truth.shape[0]
+
+ mask_gt = np.zeros((ground_truth.shape[1], ground_truth.shape[2]), dtype = np.uint8)
+
+ for k in range(num_masks):
+ mask_gt += (k + 1) * ground_truth[k, : , : ]
+ axes[i, j].imshow(mask_gt, cmap='jet', alpha=0.3)
+
+ # Show the plot
+ plt.tight_layout()
+ plt.savefig(f'ground_truths_{str(CRITERION_NAME)}.png')
+ plt.close()
+
+def visualise_predictions(images: list, predictions: list, criterion_name : str):
+ """Visualises the predictions and their images by overlaying them on the same 3 x 3 plot.
+
+ Args:
+ images (list): A list of images to lay predicted labels on
+ predictions (list): A list of predicted labels proeuced by the model
+ criterion (str): The name of loss function used during the training to name the plot.
+
+ Returns:
+ None: The function only plots, so it does not return any value.
+ """
+
+ # Create a 3x3 grid of subplots
+ fig, axes = plt.subplots(3, 3, figsize=(15, 15))
+
+ # Plot the images
+ for i in range(3):
+ for j in range(3):
+
+ idx = i * 3 + j
+
+ # Original image
+
+ image = images[idx]
+
+ axes[i, j].imshow(image, cmap='gray')
+ axes[i, j].axis('off')
+ axes[i, j].set_title(f'Image {idx+1}')
+
+ mask_pred = predictions[idx]
+
+ axes[i, j].imshow(mask_pred, cmap='jet', alpha=0.3)
+
+ # Show the plot
+ plt.tight_layout()
+ plt.savefig(f'predictions_{str(CRITERION_NAME)}.png')
+ plt.close()
+
+def test(model: nn.Module, test_loader: DataLoader, device: torch.device | str):
+    """The function which tests the model on unseen data stored in a DataLoader.
+
+    For each test sample it computes the overall Dice score (1 - DiceLoss) and the
+    per-segment Dice coefficients, collects up to nine image/label/prediction slices
+    for visualisation, then saves ground-truth and prediction overlay figures.
+
+    Args:
+        model (nn.Module): A trained model that is to be tested.
+        test_loader (DataLoader): DataLoader instance which contains image data and their labels for the model
+            to compare its performance against. Assumes batch_size of 1 — the code
+            indexes batch element 0 only; TODO confirm with caller.
+        device (torch.device | str): A device the training is based on.
+
+    Returns:
+        tuple: A tuple containing:
+            - np.array: An array of overall dice score for each test image and labels
+            - np.array: An array of segment 0 dice score for each test image and labels
+            - np.array: An array of segment 1 dice score for each test image and labels
+            - np.array: An array of segment 2 dice score for each test image and labels
+            - np.array: An array of segment 3 dice score for each test image and labels
+            - np.array: An array of segment 4 dice score for each test image and labels
+            - np.array: An array of segment 5 dice score for each test image and labels
+    """
+
+    model.to(device)
+    model.eval() # Set the model to evaluation mode
+
+    # Dice over the whole batch (batch=True) — note this is a fresh criterion,
+    # independent of the training CRITERION imported from train.
+    criterion = DiceLoss(batch = True)
+
+    test_dice_coefs = np.array([]) # stores dice scores.
+    # One accumulator per segmentation class (6 classes total).
+    seg_0_dice_coef = np.array([])
+    seg_1_dice_coef = np.array([])
+    seg_2_dice_coef = np.array([])
+    seg_3_dice_coef = np.array([])
+    seg_4_dice_coef = np.array([])
+    seg_5_dice_coef = np.array([])
+
+    # Slices collected for the 3x3 visualisation grids (first 9 samples).
+    images = []
+    ground_truths = []
+    predictions = []
+
+    # argmax output -> one-hot (6 channels) -> uint8, so predictions are
+    # comparable with the one-hot labels in DiceLoss.
+    output_transform = Compose(
+        [
+            AsDiscrete(to_onehot=6),
+            CastToType(dtype=torch.uint8),
+        ]
+)
+
+    with torch.no_grad():
+
+        for i, batch_data in enumerate(test_loader):
+            inputs, labels = (
+                batch_data["image"].to(device),
+                batch_data["label"].to(device),
+            )
+            outputs = model(inputs)
+            # argmax drops the batch dim for batch_size 1; np.newaxis restores it
+            # so downstream code sees (1, C, H, W, D) — TODO confirm shapes.
+            outputs = output_transform(torch.argmax(outputs, dim=1))[np.newaxis, : , : , : , :]
+            segment_coefs = compute_dice_segments(outputs, labels, device)
+            dice_loss = criterion(outputs, labels).item()
+
+            # Dice score is the complement of Dice loss.
+            test_dice = 1 - dice_loss
+
+            # Keep the first 9 samples' middle-ish slice (index 50 on the last
+            # axis) for the overlay figures.
+            if len(images) < 9:
+                image = inputs[0, 0 , : , : , 50].cpu().numpy()
+                images.append(image)
+                mask = labels[0, : , : , : , 50].cpu().numpy().astype(np.uint8)
+                ground_truths.append(mask)
+                prediction = torch.argmax(outputs[0, : , : , : , 50 ], dim = 0).cpu().numpy().astype(np.uint8)
+                predictions.append(prediction)
+
+            # Record per-segment scores for this sample.
+            seg_0_dice_coef = np.append(seg_0_dice_coef, segment_coefs[0].item())
+            seg_1_dice_coef = np.append(seg_1_dice_coef, segment_coefs[1].item())
+            seg_2_dice_coef = np.append(seg_2_dice_coef, segment_coefs[2].item())
+            seg_3_dice_coef = np.append(seg_3_dice_coef, segment_coefs[3].item())
+            seg_4_dice_coef = np.append(seg_4_dice_coef, segment_coefs[4].item())
+            seg_5_dice_coef = np.append(seg_5_dice_coef, segment_coefs[5].item())
+
+            print(f'Test No.{i} - Overall Dice Coefficient: {test_dice}')
+
+            test_dice_coefs = np.append(test_dice_coefs, test_dice)
+
+    # Save the 3x3 overlay figures for ground truths and predictions.
+    visualise_ground_truths(images, ground_truths, CRITERION_NAME)
+    visualise_predictions(images, predictions, CRITERION_NAME)
+
+    return test_dice_coefs, seg_0_dice_coef, seg_1_dice_coef, seg_2_dice_coef, seg_3_dice_coef, seg_4_dice_coef, seg_5_dice_coef
+
+def plot_dice(criterion_name : str, segment_coefs: np.array):
+ """ A method that plots a bar chart to visualise the performance of model on unseen data
+ for each label. It is meant to demonstrated how accurately the model produces segmentations
+ for each lebel.
+
+ Args:
+ criterion (str): The name of loss function used during the training to name the plot.
+ segment_coefs (np.array): an array containing dice scores for each segment at corresponding indices.
+ """
+
+ x_values = np.arange(len(segment_coefs)) # Generate x-values as indices
+
+ # Plot overall dice scores
+ plt.bar(x_values, segment_coefs)
+
+ plt.xlabel("Segment No.")
+ plt.ylabel("Dice Score")
+ plt.title("Dice Score for Each Segment")
+ plt.legend()
+ plt.grid(True)
+ plt.savefig(f'dice_coefs_test_{str(criterion_name)}.png')
+ plt.close()
+
+
+if __name__ == "__main__":
+ # connect to gpu
+
+ test_set = Dataset(test_dict, test_transforms)
+ test_loader = DataLoader(dataset = test_set, batch_size = 1)
+
+ print('> Start Testing')
+
+ start = time()
+
+ # perform predictions
+ dice_coefs, s0, s1, s2, s3, s4, s5 = test(model = trained_model, test_loader = test_loader,
+ device = DEVICE)
+
+ end = time()
+
+ elapsed_time = end - start
+
+ print(f"> Test completed in {elapsed_time:.2f} seconds")
+
+ average_dice = np.mean(dice_coefs)
+ print(f"Average Dice Coefficient: {average_dice:.4f}")
+
+ average_s0 = np.mean(s0)
+ print(f"Segment 0 Dice Coefficient: {average_s0:.4f}")
+
+ average_s1 = np.mean(s1)
+ print(f"Segment 1 Dice Coefficient: {average_s1:.4f}")
+
+ average_s2 = np.mean(s2)
+ print(f"Segment 2 Dice Coefficient: {average_s2:.4f}")
+
+ average_s3 = np.mean(s3)
+ print(f"Segment 3 Dice Coefficient: {average_s3:.4f}")
+
+ average_s4 = np.mean(s4)
+ print(f"Segment 4 Dice Coefficient: {average_s4:.4f}")
+
+ average_s5 = np.mean(s5)
+ print(f"Segment 5 Dice Coefficient: {average_s5:.4f}")
+
+ segment_coefs = np.array([average_s0, average_s1, average_s2, average_s3,
+ average_s4, average_s5])
+
+ # plot dice scores across the dataset.
+ plot_dice(CRITERION_NAME, segment_coefs)
\ No newline at end of file
diff --git a/recognition/47049358/requirements.txt b/recognition/47049358/requirements.txt
new file mode 100644
index 000000000..f0f59e104
--- /dev/null
+++ b/recognition/47049358/requirements.txt
@@ -0,0 +1,113 @@
+# This file may be used to create an environment using:
+# $ conda create --name