Problem 1
Your task is to implement an algorithm for pairwise alignment that computes, for two given protein sequences X and Y, a highest- scoring alignment between X and any substring of Y. The scoring model is given by a substitution matrix s (we use the BLOSUM50 matrix) and linear gap penalties with parameter d (we use d=8). You can use the implementation of the Needleman-Wunsch algorithm provided below as a starting point for your implementation. Your specific tasks are:
Implement an algorithm that computes an alignment of highest score among all alignments between X and any substring of Y, using the substitution matrix BLOSUM50 and linear gap penalties with parameter d.
Modify the program so that, in addition to a highest-scoring alignment between X and any substring of Y, it outputs the number of different such alignments. The program should output "There are x optimal alignments." where x is the number of different optimal alignments between X and a substring of Y.
Submit your solution as a java file called Alignment.java. (You can submit additional files if necessary.) You can combine the solutions to both question parts into a single method that computes an optimal alignment and outputs the additional information required by (b), but you can also write separate methods if you prefer.
As a starting point, you are free to use the following implementation of the Needleman-Wunsch algorithm with linear gap penalties in Java:
• Alignment.java (Needleman-Wunsch algorithm for global alignment)
This file Alignment.java contains a main method and several other useful methods. The main method defines two sequences X and Y. It then creates an Alignment object that stores these two sequences as Strings X and Y. Then it calls the computeAlignment method on this object; the Needleman-Wunsch algorithm is implemented in that method. It stores the computed alignment in the member variables Xa and Ya of the Alignment object. In the end, the main method displays the alignment and outputs its score.
Some auxiliary methods and variables that are implemented in Alignment.java are:
• int s(char x, char y): computes the BLOSUM50 score for aligning x with y. Note that the parameters x and y must be valid characters from the BLOSUM50 matrix (A, R, N, etc.; see slide 39 of part 1 of the slides).
• final static int d = 8; gap penalty parameter for linear gap penalties
• static int gamma(int g): calculate gap penalty for gap of length g (using linear gap penalties with parameter d)
• int scoreAlignment(): scores an alignment using the BLOSUM50 matrix and the gap penalty function gamma
Problem 2 Sorting by Reversals - Unsigned Model
Implement an algorithm for Sorting by Reversals (in the unsigned model of unichromosomal genomes) in a Java program. (If you want to use another programming language, please discuss it with the module convenor.) For any given permutation p of the numbers from 1 to n (where n is an arbitrary positive integer), the program should find a sequence of reversals that sorts p. You can use ideas from the algorithms discussed in the lectures (the greedy algorithm, the 4-approximation algorithm, or the 2-approximation algorithm) or design your own method. Your solution will be mainly marked based on the quality of the solutions produced (i.e. how many reversals does your program need to sort a given permutation), but also based on the general quality of your implementation (e.g. does your program work correctly in all cases, is the time that your program needs to find the solution appropriate for the quality of the solution, and is your implementation well structured and is your code clear and easy to read).
If you implement two or more algorithms, you can submit all of them (and you will receive at least the marks that the best of your algorithms would receive by itself).
Your program should output the original permutation as well as the permutation obtained after each individual reversal step. (This means that if your algorithm uses k steps to sort the permutation, it will output k+1 permutations: the initial permutation, and a further permutation after each of the k steps.) A permutation should be output in a single line, with genes separated by spaces. At the end, the algorithm should output "Finished after x reversals.", where x is to be replaced by the number of reversals used.