Assignment 2 MPI MSA
Instructions:
Objective:
The goal of this task is to familiarize students with SIMD (Single Instruction, Multiple Data)
programming by optimizing a computationally intensive task using AVX intrinsics (for x86
processors) or NEON intrinsics (for Apple Silicon). Students will compare the performance of
their SIMD-optimized implementation against a scalar (non-SIMD) implementation.
Problem Statement
You are given a computationally intensive task: matrix transposition with element-wise
multiplication. Your task is to implement this operation in two ways:
1. Scalar Implementation: A straightforward, non-SIMD implementation. You may try two
storage options: a 2D array and a flattened 1D array.
2. SIMD-Optimized Implementation: An optimized version using AVX intrinsics (for x86) or
NEON intrinsics (for Apple Silicon).
Requirements
1. Scalar Implementation:
- Implement the matrix transposition and element-wise multiplication using scalar operations
(no SIMD).
- Ensure the implementation is correct and works for any square matrix of size `N x N`.
2. SIMD-Optimized Implementation:
- Use AVX intrinsics (for x86) or NEON intrinsics (for Apple Silicon) to optimize the
computation.
- Focus on optimizing both the matrix transposition and the element-wise multiplication.
- Ensure the implementation is correct and works for any square matrix of size `N x N`.
3. Performance Comparison:
- Measure the execution time of both implementations for different matrix sizes
(e.g., `N = 256, 512, 1024`).
- Compare the performance of the SIMD-optimized implementation against the scalar
implementation.
4. Report:
- Provide a brief report explaining your approach to SIMD optimization.
- Include performance results (e.g., execution time, speedup achieved).
- Discuss any challenges you faced and how you addressed them.
Implementation Details
- Use single-precision floating-point numbers (`float`) for matrix elements.
- Assume `N` is a multiple of 8 (for AVX) or 4 (for NEON) to simplify alignment and
vectorization.
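A scalar baseline and its timing could be sketched as follows. The handout does not pin down the exact operation, so this sketch assumes `c[i][j] = a[j][i] * b[i][j]` on flat row-major 1D arrays; the function names `transpose_mul_scalar` and `time_scalar` are hypothetical, and the fill values are arbitrary.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Assumed operation (not fixed by the handout): c[i][j] = a[j][i] * b[i][j].
   All matrices are n x n, stored row-major in flat 1D arrays. */
void transpose_mul_scalar(const float *a, const float *b, float *c, int n) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            c[(size_t)i * n + j] = a[(size_t)j * n + i] * b[(size_t)i * n + j];
}

/* Time one run of the scalar kernel on an n x n problem.
   Returns CPU seconds, or -1.0 if allocation fails. */
double time_scalar(int n) {
    assert(n > 0);
    size_t count = (size_t)n * (size_t)n;
    float *a = malloc(count * sizeof *a);
    float *b = malloc(count * sizeof *b);
    float *c = malloc(count * sizeof *c);
    double seconds = -1.0;
    if (a && b && c) {
        for (size_t k = 0; k < count; k++) {
            a[k] = (float)(k % 7);   /* arbitrary test data */
            b[k] = 2.0f;
        }
        clock_t start = clock();
        transpose_mul_scalar(a, b, c, n);
        clock_t stop = clock();
        /* Read one result so the compiler cannot discard the kernel call. */
        volatile float sink = c[count - 1];
        (void)sink;
        seconds = (double)(stop - start) / CLOCKS_PER_SEC;
    }
    free(a);
    free(b);
    free(c);
    return seconds;
}
```

Calling `time_scalar` for `N = 256, 512, 1024` and dividing each result by the matching SIMD time gives the speedup to report. A single run is noisy, so averaging several repetitions is usually worthwhile.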
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <immintrin.h> // AVX intrinsics (x86)
// #include <arm_neon.h> // NEON intrinsics (Apple Silicon)
#define N 256
int main(void) {
    // Scalar and SIMD implementations and their timing go here.
    return 0;
}
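One possible shape for the AVX version is to work on 8x8 tiles: transpose each tile of the first matrix in registers, then multiply it with the matching rows of the second. The sketch below again assumes the operation `c[i][j] = a[j][i] * b[i][j]` on row-major 1D arrays; `transpose_mul_simd` is a hypothetical name, and the code falls back to a scalar loop when the compiler is not targeting AVX (for example, when built without `-mavx` on GCC or Clang). It does not cover NEON.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__AVX__)
#include <immintrin.h>
#endif

/* Assumed operation: c[i][j] = a[j][i] * b[i][j], n a multiple of 8.
   c must not overlap a or b. */
void transpose_mul_simd(const float *a, const float *b, float *c, int n) {
    assert(n > 0 && n % 8 == 0);
#if defined(__AVX__)
    for (int i = 0; i < n; i += 8) {
        for (int j = 0; j < n; j += 8) {
            __m256 r[8], t[8], u[8];
            /* Load the tile a[j..j+7][i..i+7]; its transpose lands in c[i..i+7][j..j+7]. */
            for (int k = 0; k < 8; k++)
                r[k] = _mm256_loadu_ps(a + (size_t)(j + k) * n + i);
            /* Stage 1: interleave pairs of rows. */
            for (int k = 0; k < 4; k++) {
                t[2 * k]     = _mm256_unpacklo_ps(r[2 * k], r[2 * k + 1]);
                t[2 * k + 1] = _mm256_unpackhi_ps(r[2 * k], r[2 * k + 1]);
            }
            /* Stage 2: gather four column elements inside each 128-bit lane. */
            for (int h = 0; h < 8; h += 4) {
                u[h]     = _mm256_shuffle_ps(t[h],     t[h + 2], _MM_SHUFFLE(1, 0, 1, 0));
                u[h + 1] = _mm256_shuffle_ps(t[h],     t[h + 2], _MM_SHUFFLE(3, 2, 3, 2));
                u[h + 2] = _mm256_shuffle_ps(t[h + 1], t[h + 3], _MM_SHUFFLE(1, 0, 1, 0));
                u[h + 3] = _mm256_shuffle_ps(t[h + 1], t[h + 3], _MM_SHUFFLE(3, 2, 3, 2));
            }
            /* Stage 3: combine lanes so r[k] holds column k of the loaded tile. */
            for (int k = 0; k < 4; k++) {
                r[k]     = _mm256_permute2f128_ps(u[k], u[k + 4], 0x20);
                r[k + 4] = _mm256_permute2f128_ps(u[k], u[k + 4], 0x31);
            }
            /* Multiply element-wise with the matching rows of b and store. */
            for (int k = 0; k < 8; k++) {
                size_t off = (size_t)(i + k) * n + j;
                _mm256_storeu_ps(c + off, _mm256_mul_ps(r[k], _mm256_loadu_ps(b + off)));
            }
        }
    }
#else
    /* Scalar fallback when AVX is not enabled at compile time. */
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            c[(size_t)i * n + j] = a[(size_t)j * n + i] * b[(size_t)i * n + j];
#endif
}
```

The unaligned load and store intrinsics are used so the sketch works with any `malloc`-returned buffer; aligned allocation and `_mm256_load_ps` are a possible refinement to measure, not a requirement.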
Question 2
Objective:
Design and implement a parallel algorithm using the Message Passing Interface (MPI) to
perform large-scale multiple sequence alignment (MSA), including the distance matrix that
MSA requires. Distance matrices are used by MSA algorithms such as ClustalW, Clustal Omega,
and MAFFT. The focus should be on performance optimization, scalability,
and addressing the challenges of distributed computing in bioinformatics.
Problem Description:
Given a large collection of DNA sequences, perform multiple sequence alignment (MSA) to
identify regions of similarity that may indicate functional, structural, or evolutionary
relationships. Implement the progressive alignment method, which is widely used for MSA
due to its efficiency and scalability. A detailed explanation of sequence alignment and an
example using MAFFT is provided below.
Serial Algorithm Overview:
Multiple sequence alignment can be solved using various approaches. The assignment
requires implementing MAFFT, which follows these steps:
Steps Involved:
Sequence alignment is the process of arranging biological sequences (DNA, RNA, or protein)
to identify regions of similarity. The goal is to determine functional, structural, or evolutionary
relationships between the sequences.
Seq1: MKTLLILTCLVAVALARPKAQQL
Seq2: MKTVLILTCLVALAKPKAQQL
Seq3: MKTLLILACLVALARKAQQL
Seq4: MKTLLILTCLVALAKPQQL
● Each amino acid or nucleotide is converted into a numerical vector (e.g., using
physicochemical properties or binary encoding).
● Example encoding (simplified for demonstration):
○ A = (1,0,0)
○ C = (0,1,0)
○ G = (0,0,1)
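The example encoding above can be sketched in C as follows. Only A, C, and G are given in the example, so mapping every other character (including T) to (0,0,0) is an assumption of this sketch, and the names `encode_base` and `encode_sequence` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Encode one nucleotide as the 3-component vector from the example.
   Assumption: any character other than A, C, G becomes (0,0,0). */
void encode_base(char base, float out[3]) {
    out[0] = out[1] = out[2] = 0.0f;
    switch (base) {
        case 'A': out[0] = 1.0f; break;
        case 'C': out[1] = 1.0f; break;
        case 'G': out[2] = 1.0f; break;
        default: break;
    }
}

/* Encode a whole sequence into out, which must hold at least 3 * strlen(seq)
   floats (capacity is given in floats). Returns the number of bases encoded. */
size_t encode_sequence(const char *seq, float *out, size_t capacity) {
    assert(seq != NULL && out != NULL);
    size_t len = strlen(seq);
    assert(capacity >= 3 * len);
    for (size_t i = 0; i < len; i++)
        encode_base(seq[i], out + 3 * i);
    return len;
}
```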
For parallel execution, each process (MPI rank) can handle a subset of sequences.
Step 2: Apply FFT to Sequences (Parallelizable)
● (Seq1, Seq2)
● (Seq1, Seq3)
● (Seq1, Seq4)
● (Seq2, Seq3)
● (Seq2, Seq4)
● (Seq3, Seq4)
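The six pairs listed above are all combinations (i, j) with i < j of four sequences. A minimal sketch of how such pairs might be numbered and handed out to MPI ranks is shown below; the helper names and the round-robin ownership rule are illustrative assumptions, not part of the handout. Indices are 0-based, so (Seq1, Seq2) is the pair (0, 1).

```c
#include <assert.h>

/* Number of unordered pairs among n sequences. */
long pair_count(int n) {
    return (long)n * (n - 1) / 2;
}

/* Map a linear pair index k (0-based) to (i, j) with i < j, in the order
   (0,1), (0,2), ..., (0,n-1), (1,2), ..., (n-2,n-1). */
void pair_from_index(long k, int n, int *i, int *j) {
    assert(k >= 0 && k < pair_count(n));
    int row = 0;
    long row_len = n - 1;   /* pairs whose first element is `row` */
    while (k >= row_len) {
        k -= row_len;
        row++;
        row_len--;
    }
    *i = row;
    *j = row + 1 + (int)k;
}

/* Round-robin ownership: pair k is handled by rank k mod num_ranks. */
int pair_owner(long k, int num_ranks) {
    assert(k >= 0 && num_ranks > 0);
    return (int)(k % num_ranks);
}
```

In an MPI program, each rank could loop over all pair indices and process only those for which `pair_owner` equals its own rank, so no communication is needed to agree on the assignment.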
Steps Involved:
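1. Initialization:
● Create a scoring matrix $H$ with dimensions (n+1)×(m+1), where n and m are the
lengths of the two sequences.
● Initialize the first row and first column with gap penalties:
$$H_{i,0} = i \cdot W_g \quad (0 \le i \le n), \qquad H_{0,j} = j \cdot W_g \quad (0 \le j \le m)$$
2. Scoring:
● Fill in the scoring matrix using the following recurrence relation:
$$H_{i,j} = \max\{\, H_{i-1,j-1} + s(a_i, b_j),\; H_{i-1,j} + W_g,\; H_{i,j-1} + W_g \,\}$$
where:
○ $s(a_i, b_j)$ is the substitution score for aligning characters $a_i$ and $b_j$.
○ $W_g$ is the gap penalty (taken here as a negative value that is added to the score).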
3. Traceback:
● Start from the bottom-right of the matrix and backtrack to reconstruct the optimal
alignment.
● Move according to the highest scoring path:
○ Diagonal move → Match/Mismatch
○ Up move → Insertion (gap in sequence 2)
○ Left move → Deletion (gap in sequence 1)
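The three steps above (initialization, scoring, traceback) can be sketched as a small global-alignment routine. This is a minimal illustration under assumed scores: match +1, mismatch -1, and a gap penalty of -2 that is added to the score. The name `nw_align` and these values are hypothetical; a real submission would likely use a substitution matrix.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MATCH 1        /* assumed substitution score for equal characters */
#define MISMATCH (-1)  /* assumed substitution score for different characters */
#define GAP (-2)       /* assumed gap penalty, added to the score */

static int max3(int x, int y, int z) {
    int m = x > y ? x : y;
    return m > z ? m : z;
}

static void reverse(char *s, int len) {
    for (int l = 0, r = len - 1; l < r; l++, r--) {
        char tmp = s[l];
        s[l] = s[r];
        s[r] = tmp;
    }
}

/* Globally align a and b. The gapped strings are written to out_a and out_b,
   each of which needs room for strlen(a) + strlen(b) + 1 bytes.
   Returns the optimal alignment score. */
int nw_align(const char *a, const char *b, char *out_a, char *out_b) {
    int n = (int)strlen(a), m = (int)strlen(b);
    int w = m + 1;                                  /* row width of the matrix */
    int *h = malloc((size_t)(n + 1) * (size_t)w * sizeof *h);
    assert(h != NULL);

    /* 1. Initialization: first column and first row hold accumulated gaps. */
    for (int i = 0; i <= n; i++) h[i * w] = i * GAP;
    for (int j = 0; j <= m; j++) h[j] = j * GAP;

    /* 2. Scoring: fill the matrix with the recurrence. */
    for (int i = 1; i <= n; i++) {
        for (int j = 1; j <= m; j++) {
            int s = (a[i - 1] == b[j - 1]) ? MATCH : MISMATCH;
            h[i * w + j] = max3(h[(i - 1) * w + (j - 1)] + s,
                                h[(i - 1) * w + j] + GAP,
                                h[i * w + (j - 1)] + GAP);
        }
    }
    int score = h[n * w + m];

    /* 3. Traceback from the bottom-right corner, building reversed strings. */
    int i = n, j = m, len = 0;
    while (i > 0 || j > 0) {
        int s = (i > 0 && j > 0 && a[i - 1] == b[j - 1]) ? MATCH : MISMATCH;
        if (i > 0 && j > 0 && h[i * w + j] == h[(i - 1) * w + (j - 1)] + s) {
            out_a[len] = a[i - 1]; out_b[len] = b[j - 1]; i--; j--;  /* diagonal */
        } else if (i > 0 && h[i * w + j] == h[(i - 1) * w + j] + GAP) {
            out_a[len] = a[i - 1]; out_b[len] = '-'; i--;            /* up */
        } else {
            out_a[len] = '-'; out_b[len] = b[j - 1]; j--;            /* left */
        }
        len++;
    }
    out_a[len] = out_b[len] = '\0';
    reverse(out_a, len);
    reverse(out_b, len);
    free(h);
    return score;
}
```

When several paths tie, this sketch prefers the diagonal move, then up, then left; other tie-breaking orders give equally optimal alignments.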
---- Seq1
----|
| ---- Seq2
----|
| ---- Seq4
----------|
---- Seq3
Seq1: MKTLLILTCLVAVALARPKAQQL
Seq2: MKTVLILTCLVALA--KPKAQQL
Seq3: MKTLLILACLVALA--RKAQQL
Seq4: MKTLLILTCLVALA--KPQQL
Dataset to be used:
The following link contains a few benchmark datasets for this problem. The repository also
contains a SIMD-based solution that you may explore for learning purposes.
https://github.com/TimoLassmann/kalign/blob/main/scripts/benchmark.org
You may deviate from this approach if you feel you have a better idea.
Data Partitioning:
● Distribute the sequences among MPI processes for pairwise sequence alignment
computations.
● Construct the guide tree in a distributed manner to reduce memory overhead.
● Perform progressive alignment in a hierarchical manner across MPI ranks.
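One common way to distribute work items (sequences or sequence pairs) is a contiguous block partition, where every rank receives either `total / num_ranks` items or one more. The sketch below is an illustration under that assumption; `block_range` is a hypothetical helper, and in an MPI program `rank` and `num_ranks` would come from `MPI_Comm_rank` and `MPI_Comm_size`.

```c
#include <assert.h>

/* Compute the half-open range [*begin, *end) of work items owned by `rank`
   when `total` items are split into `num_ranks` nearly equal contiguous blocks.
   The first (total % num_ranks) ranks receive one extra item. */
void block_range(long total, int num_ranks, int rank, long *begin, long *end) {
    assert(total >= 0 && num_ranks > 0 && rank >= 0 && rank < num_ranks);
    long base = total / num_ranks;
    long extra = total % num_ranks;
    long shift = rank < extra ? rank : extra;   /* extra items given to lower ranks */
    *begin = (long)rank * base + shift;
    *end = *begin + base + (rank < extra ? 1 : 0);
}
```

Because every rank can compute its own range from `total`, `num_ranks`, and `rank` alone, the partition itself needs no messages; only the sequence data has to be made available to each rank.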
Communication:
Synchronization:
● Ensure all processes complete pairwise alignment before constructing the guide tree.
● Implement efficient synchronization during progressive alignment.
Computation:
● Each process calculates the alignment for its assigned sequence pairs.
● Construct and store sub-alignments progressively.
Result Aggregation:
● Gather and merge the partial alignments from all MPI processes into a single process
for final output.
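For the distance-matrix stage, aggregation on the root process might look like the sketch below. It assumes the per-pair distances have already been collected on one process in global pair order (for example with `MPI_Gatherv`) and only shows how to expand them into a full symmetric matrix. The name `fill_distance_matrix` and the pair ordering are assumptions of this sketch; merging partial alignments is a separate problem that it does not address.

```c
#include <assert.h>
#include <stddef.h>

/* Fill the n x n symmetric matrix dist (row-major) from pair_values, which
   holds one distance per pair (i, j) with i < j, in the order
   (0,1), (0,2), ..., (0,n-1), (1,2), ..., (n-2,n-1).
   The diagonal is set to zero. */
void fill_distance_matrix(const double *pair_values, int n, double *dist) {
    assert(n > 0 && dist != NULL);
    size_t k = 0;
    for (int i = 0; i < n; i++) {
        dist[(size_t)i * n + i] = 0.0;
        for (int j = i + 1; j < n; j++) {
            dist[(size_t)i * n + j] = pair_values[k];
            dist[(size_t)j * n + i] = pair_values[k];
            k++;
        }
    }
}
```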
Implementation Requirements:
Report Requirements:
Evaluation Criteria:
Constraints: