Bioinformatics & Molecular Modelling Assignment -
Assessment 1 -
For all questions, illustrate your answers fully, describing what you did at every step and including the output you obtained.
Q1. Adrenaline is a hormone released by the adrenal glands as part of a classical "flight or fight" response to a sudden change in physiological demand, such as sprinting. It acts on a family of receptor proteins present on tissues throughout the body and evokes a variety of cellular responses, often involving increased metabolic activity.
a) Use the NCBI or EBI portals to retrieve one file for each of the several different forms of adrenergic receptor found in humans. Each file should contain the complete mRNA sequence (it is not necessary to print the sequence out).
b) Compile a table, similar to the one from tutorial 1 (page 6), comparing the sequence elements of the mRNA of each different adrenergic receptor that you find. Include extra columns indicating the length of the protein and the receptor sub-type. Comment on your findings.
c) Retrieve files for three genes of the adrenergic receptor types you retrieved in part (a) and compare the structure of the genes (exon / intron profile). Comment on your findings.
For all parts describe how you obtained your data by stating the bioinformatics portal used and the search strategy. Accession numbers of all sequence files must be given. Any references used should be cited in your answer. Expect to retrieve about a dozen files in total.
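For part (c), one possible way to tabulate an exon / intron profile is to derive intron lengths from the exon coordinates listed in a gene record. The following is a minimal Python sketch, assuming the exon coordinates have been copied by hand from the feature table of a retrieved file. The function name and the coordinates are invented for illustration and do not describe any real adrenergic receptor gene.

```python
def gene_structure(exons):
    """Summarise exon and intron lengths.

    `exons` is a list of (start, end) pairs, 1-based and inclusive,
    as they would appear in a sequence record's feature table.
    """
    ordered = sorted(exons)
    exon_lengths = [end - start + 1 for start, end in ordered]
    # An intron spans the bases strictly between two consecutive exons.
    intron_lengths = [
        next_start - prev_end - 1
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]
    return {
        "exon_count": len(ordered),
        "exon_lengths": exon_lengths,
        "intron_lengths": intron_lengths,
    }


# Hypothetical coordinates, for illustration only.
example_exons = [(1, 150), (401, 650), (1001, 1300)]
profile = gene_structure(example_exons)
print(profile)
```

A gene with a single exon gives an empty list of intron lengths, which makes intronless genes easy to spot when comparing profiles.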
Q2. The protein glyceraldehyde-3-phosphate dehydrogenase (GAPDH) is an enzyme that plays a major role in glycolysis and in nuclear functions. Here we aim to learn about the relationship between the human protein sequence and the sequences from a number of other species, and to show that the conserved parts of the protein sequence are important for function.
(i) Using UniProt, locate the full-length human GAPDH sequence. Then run a BLAST search for this sequence at the NCBI site (not from UniProt) against the Swiss-Prot protein database and identify GAPDH sequences from 7 other species with close similarity to the human sequence (make sure they are full-length sequences).
(ii) Give the Accession number for each protein sequence identified, together with the species. Give the percentage identity for each of the 7 sequences with that of the human sequence. State E values and the length of each sequence.
(iii) For all 8 sequences, run a multiple sequence alignment using the program Clustal Omega and show the alignment generated.
(iv) Discuss the conserved regions that you observe within the 8 aligned sequences and relate these regions within the protein with their functional role. Make reference to the source of your information.
(v) Display both the cladogram and phylogram trees obtained for the aligned sequences. Briefly discuss the evolutionary relationship between the 8 species as indicated by the cladogram and phylogram. Which species is most closely related to humans?
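The percentage identity requested in part (ii) is the number of identical aligned positions divided by the alignment length, expressed as a percentage. BLAST reports identities over the alignment length, but other tools may use a different denominator, so state which one you use. The following is a minimal Python sketch, assuming two rows taken from a pairwise or multiple alignment with "-" as the gap character. The function name and the two short sequences are invented for illustration and are not real GAPDH sequences.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of identical positions between two aligned sequences.

    Columns where both rows hold a gap (possible when the rows come from
    a multiple alignment) are ignored. The denominator is the number of
    remaining alignment columns, gaps included.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b)
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Invented toy rows: one substitution and one gap in 10 columns.
row_a = "MGKVKVGVNG"
row_b = "MGKIKV-VNG"
identity = percent_identity(row_a, row_b)
print(f"{identity:.1f}% identity")
```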
Q3. Detecting remote homologs with BLAST and PSI-BLAST.
The NCBI website gives the option to run both BLAST and PSI-BLAST for a query protein sequence. For this question you need to use the NCBI website to run both BLAST and PSI-BLAST.
3-oxoacyl-[acyl-carrier-protein] synthase 1 is a 406 amino acid protein (UniProt accession number P0A953) and 3-ketoacyl-CoA thiolase is a 386 amino acid protein (UniProt accession number Q15ZF4). These two proteins have similar 3-dimensional structures and both belong to the Thiolase-like SCOP superfamily. However, they are remote homologs with a low percentage sequence identity.
Perform a protein-protein BLAST search at the NCBI website using the sequence for 3-oxoacyl-[acyl-carrier-protein] synthase 1 (UniProt accession number P0A953), searching against the Swiss-Prot database. Search the results for 3-ketoacyl-CoA thiolase (Swiss-Prot accession number Q15ZF4). Now repeat using PSI-BLAST and compare your results with those obtained from protein-protein BLAST.
Discuss what you observe from the BLAST and PSI-BLAST searches. Discuss which of the two search methods proved more effective and why. Include output generated as appropriate to illustrate your answer, including the pairwise alignment(s) obtained between the two sequences. Comment on the sequence similarities shown in the alignments generated.
Assessment 2 - Data Analysis
For all questions, illustrate your answers fully, describing what you did at every step and including the input you used and the output you obtained.
Q1. Raloxifene is a drug used to lower the risk of breast cancer in women who are at a high or moderate risk of developing it and who have been through the menopause. Raloxifene can also be used to prevent and treat bone thinning (osteoporosis) in post-menopausal women. Raloxifene is a selective oestrogen receptor modulator (SERM) and acts like oestrogen by targeting the oestrogen receptor. In breast cancer cells the drug binds to the oestrogen receptor, preventing the binding of oestrogen, which would ordinarily stimulate the cell to divide and grow.
- Locate within the Protein Data Bank (PDB) the 3-D structure of a complex between Raloxifene and the oestrogen receptor. State which entry you have chosen, and download the coordinates so that you can investigate the structure with RasMol.
- Using the program RasMol or Swiss-PdbViewer, produce an image of the protein that you think clearly illustrates the major structural features of the receptor. State the commands used within the selected program to obtain your image.
- Using tools within the PDB investigate the interactions between the drug and its receptor. Illustrate with images the different types of interactions that exist and give details of these interactions in your discussion.
- Discuss which types of bioinformatics tools and programs could be used to design new, potentially improved inhibitors for the oestrogen receptor.
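As a cross-check on the interaction tools within the PDB, residues close to the bound drug can also be listed directly from the downloaded coordinate file. The following is a minimal Python sketch, assuming the legacy fixed-column PDB format, a placeholder ligand residue name "LIG" and a 4.0 angstrom distance cutoff, all of which are illustrative choices rather than values from the assignment. The coordinates are invented toy data, built with a hypothetical helper so that the columns line up.

```python
import math


def format_atom(record, serial, name, resname, chain, resseq, x, y, z):
    """Build one fixed-column PDB coordinate line (helper for toy data)."""
    return (f"{record:<6}{serial:>5} {name:<4} {resname:>3} {chain}{resseq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


def parse_atoms(pdb_lines):
    """Read ATOM and HETATM records using the fixed PDB column positions."""
    atoms = []
    for line in pdb_lines:
        record = line[0:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue
        atoms.append({
            "record": record,
            "resname": line[17:20].strip(),
            "chain": line[21],
            "resseq": int(line[22:26]),
            "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        })
    return atoms


def ligand_contacts(atoms, ligand_name, cutoff=4.0):
    """Map each protein residue within `cutoff` of the ligand to its
    shortest atom-to-atom distance from the ligand."""
    ligand = [a for a in atoms
              if a["record"] == "HETATM" and a["resname"] == ligand_name]
    if not ligand:
        return {}
    contacts = {}
    for atom in (a for a in atoms if a["record"] == "ATOM"):
        distance = min(math.dist(atom["xyz"], lig["xyz"]) for lig in ligand)
        if distance <= cutoff:
            key = (atom["resname"], atom["chain"], atom["resseq"])
            contacts[key] = min(distance, contacts.get(key, distance))
    return contacts


# Invented toy structure: two protein atoms and a two-atom ligand.
toy_lines = [
    format_atom("ATOM", 1, "CB", "ALA", "A", 10, 3.5, 0.0, 0.0),
    format_atom("ATOM", 2, "CD1", "LEU", "A", 20, 10.0, 0.0, 0.0),
    format_atom("HETATM", 3, "C1", "LIG", "A", 1, 0.0, 0.0, 0.0),
    format_atom("HETATM", 4, "O1", "LIG", "A", 1, 0.0, 1.4, 0.0),
]
contacts = ligand_contacts(parse_atoms(toy_lines), "LIG")
print(contacts)
```

A distance list of this kind only shows which residues are nearby. Classifying each contact as a hydrogen bond, hydrophobic contact or other interaction still needs the atom types and geometry shown by the PDB tools.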
Q2. Cytochrome P450s are a family of proteins involved in phase I drug metabolism reactions. They are highly expressed in the liver, in the endoplasmic reticulum membrane. In this question you will explore the use of protein-protein interaction databases to find out what other proteins P450s interact with and whether the potential partnerships could have biological significance.
- Use the UniProt file for human cytochrome P450 2E1 as your starting point. Summarise the key structural features of P450 2E1 including how it is able to bind to the ER membrane, and structural features of the active site.
- Use a range of PPI databases to identify possible protein partners. Summarise your findings.
- From your searches select three proteins with different activities that interact with P450 2E1, describe the evidence for the interaction and discuss whether these interactions could be relevant to P450's ability to metabolise drugs, in particular ethanol and paracetamol (acetaminophen). Wherever possible select proteins for which there is experimental evidence for the interaction.
Q3. Using the human sequence for rhodopsin from UniProt, determine the domain(s) present within this protein sequence using the Pfam domain database. State the domain(s) and the amino acid range for each domain.
Run homology modelling for this sequence using SWISS-MODEL to obtain a 3-dimensional structure for this sequence.
DISCUSS, in detail, the results of the modelling that you obtain, including an in-depth discussion of the model obtained, the template used by the program, and all of the key features of the model and its quality from the output generated.
Download the coordinates of the model obtained as a Protein Data Bank (*.pdb) file, and create an image of your modelled structure using RasMol or Swiss-PdbViewer that clearly shows the main features of the model.
Q4. MicroRNAs (miRNAs) are known to target selected mRNAs as part of their mode of action. In this question you will explore the interaction between an miRNA and the mRNAs of beta-site APP cleaving enzyme (BACE) and amyloid precursor protein (APP).
- Retrieve the sequences of human, mouse, dog and cow BACE1 and APP mRNAs. Align the 3' UTRs of each set of four mRNAs and identify on your output conserved sequence elements.
- Retrieve the file containing the sequence of human miR-15a from miRBase. Run the complete sequence of the miRNA on UNAFold and show the predicted structure of the RNA. Calculate the folding energy per base and comment on your findings.
- Using the mature sequence of miR-15a (this is given in the miRBase file), identify potential binding sites on the 3' UTRs of human APP and BACE1 mRNAs. Assume that the miRNAs will bind to complementary sequences in the mRNAs, but not necessarily with complete complementarity. You will have to use alignment software to map complementary regions. Describe the procedure you followed and discuss the output with reference to a diagram of the alignment.
- On the basis of your models, how would you expect miR-15a to affect expression of the amyloid precursor protein (APP)?
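Two of the calculations in this question can be sketched in a few lines of Python. The folding energy per base is the predicted free energy divided by the length of the folded sequence. A first-pass screen for candidate binding sites can look for exact matches to the reverse complement of the miRNA seed, here assumed to be bases 2 to 8. Exact seed matching is a simplification that ignores G:U pairs and mismatches and does not replace the alignment software the question asks for. The function names, the miRNA, the UTR and the energy value below are all invented for illustration and are not the miR-15a, APP or BACE1 data.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def to_rna(sequence):
    """Upper-case a sequence and convert DNA notation (T) to RNA (U)."""
    return sequence.upper().replace("T", "U")


def energy_per_base(delta_g, sequence_length):
    """Folding free energy (kcal/mol) divided by sequence length (nt)."""
    return delta_g / sequence_length


def seed_target(mirna, start=2, end=8):
    """Reverse complement of the miRNA seed (1-based, inclusive bounds).

    This is the sequence expected on the mRNA if the seed pairs perfectly.
    """
    seed = to_rna(mirna)[start - 1:end]
    return "".join(COMPLEMENT[base] for base in reversed(seed))


def find_seed_sites(utr, mirna):
    """1-based start positions of exact seed-match sites in a 3' UTR."""
    utr = to_rna(utr)
    site = seed_target(mirna)
    positions = []
    index = utr.find(site)
    while index != -1:
        positions.append(index + 1)
        index = utr.find(site, index + 1)
    return positions


# Invented toy data, for illustration only.
toy_mirna = "UACGGAUCCAGUUAGCAUCGAA"
toy_utr = "AAAGAUCCGUAAACCCGAUCCGUGG"

print(seed_target(toy_mirna))
print(find_seed_sites(toy_utr, toy_mirna))
print(energy_per_base(-30.0, 80))
```

Because sequence files usually give mRNAs in DNA notation, the sketch converts T to U before comparing, so a UTR pasted straight from a retrieved file can be screened without editing.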