Mass Spectrometry Facility

Chem 638 – March 12, 2009 – Database Exercise – Chemistry, room B225

A. C. Harms and M. M. Vestling


Goal:  To learn how to use several public protein databases to answer biological questions.


One way to identify a protein:

a) Digest the protein with trypsin.
b) Measure using mass spectrometry the masses of the peptides produced.
c) Compare the masses of the peptides produced experimentally to those in a database of the masses of the peptides expected from trypsin digestions of proteins.
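
Steps a) through c) can be sketched in a few lines of Python. This is only a minimal illustration, not how MS-Fit or Mascot score their matches. The function names (tryptic_digest, peptide_mass, mh_plus, match_masses), the toy sequence, and the "measured" masses are all hypothetical. The sketch assumes monoisotopic masses, cleavage after K or R except when the next residue is P, no missed cleavages, and that the measured peaks are singly protonated ions, as is typical for MALDI.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide (free N- and C-termini)
PROTON = 1.00728   # added for a singly protonated ion


def tryptic_digest(sequence):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of the neutral, unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def mh_plus(peptide):
    """Theoretical m/z of the singly protonated peptide."""
    return peptide_mass(peptide) + PROTON


def match_masses(measured, sequence, tol=0.2):
    """Return (measured mass, peptide) pairs that agree within tol Da."""
    hits = []
    for peptide in tryptic_digest(sequence):
        for m in measured:
            if abs(m - mh_plus(peptide)) <= tol:
                hits.append((m, peptide))
    return hits


# Hypothetical toy protein and peak list, for illustration only.
toy_protein = "MKRPLGKAAR"
toy_peaks = [mh_plus("AAR") + 0.05, 5000.0]
hits = match_masses(toy_peaks, toy_protein, tol=0.2)
```

A real search engine repeats this comparison for every protein in the database and ranks the proteins by how many, and how well, their theoretical peptides match the measured list.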

The protein sequences found in databases are just strings of letters and, by themselves, do not reflect the post-translational modifications (PTMs) found in most proteins.  This means that only unmodified peptides will match the theoretical masses.  Since most of a protein's sequence is not modified, matching experiment with theory generally works.


1.)  Find the peptide mass lists for 5 proteins:  1maldi, 2maldi, 3maldi, 4maldi, 5maldi.  Use both of the following computer programs to identify the proteins.  Make sure you use each program at least once.  Set the accuracy to 0.2 Da, and try using SwissProt as the database.  With larger databases, you may want to limit your searches to mammals.

http://prospector.ucsf.edu [MS-Fit]

http://www.matrixscience.com [Mascot then Peptide Mass Fingerprint]


2.a)  Find the largest unmodified tryptic peptide (no missed cleavages) in the protein human histone H2B.  (Hint: in the display options, you can change the format from the default GenPept to FASTA.  Be careful not to choose a fragment or a histone-like protein.)

http://ncbi.nlm.nih.gov

http://prospector.ucsf.edu [MS-Digest]
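
As a rough cross-check on what MS-Digest reports, the same question can be worked by hand in Python: read a FASTA record, digest it, and pick the heaviest peptide. The sketch below is an assumption-laden illustration, not the MS-Digest algorithm. The function names (read_fasta, tryptic_digest, largest_peptide) are hypothetical, and the FASTA record is a made-up toy sequence, not histone H2B; paste in the real record you retrieve from NCBI. "Largest" is taken here to mean highest monoisotopic mass.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def read_fasta(text):
    """Return (header, sequence) for the first record in FASTA-formatted text."""
    header, chunks = None, []
    for line in text.strip().splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                break  # stop at the start of a second record
            header = line[1:]
        elif line:
            chunks.append(line)
    return header, "".join(chunks)


def tryptic_digest(sequence):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def largest_peptide(sequence):
    """Return (peptide, mass) for the heaviest unmodified tryptic peptide."""
    masses = {p: sum(RESIDUE_MASS[aa] for aa in p) + WATER
              for p in tryptic_digest(sequence)}
    best = max(masses, key=masses.get)
    return best, masses[best]


# Hypothetical toy record, NOT a real protein; replace with the downloaded FASTA.
TOY_FASTA = """>toy|hypothetical example record
MAGSLKTEWRVNDFY
HKPGGCRLLQIK
"""
header, sequence = read_fasta(TOY_FASTA)
peptide, mass = largest_peptide(sequence)
```

Note that in the toy sequence the K followed by P is not cleaved, which is why the heaviest peptide spans that position.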

2.b)  Find the known post-translational modifications for human histone H2B.  The following database is at your fingertips:

http://www.expasy.uniprot.org


3.)  Protein kinase A is itself phosphorylated.  One phosphorylated sequence found in its tryptic digest is KGSEQESVK.  Use BLAST to see if other proteins contain this sequence.

http://www.ncbi.nlm.nih.gov/BLAST

[M. J. Chalmers, K. Hakansson, R. Johnson, R. Smith, J. Shen, M. R. Emmett, A. G. Marshall, Protein kinase A phosphorylation characterized by tandem Fourier transform ion cyclotron resonance mass spectrometry, Proteomics 4: 970-981 (2004).]
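
To see why a phosphorylated peptide such as KGSEQESVK would not be found by matching against unmodified database masses, it helps to compute the mass shift. The sketch below assumes monoisotopic masses and a shift of 79.96633 Da (HPO3) per phosphate group; the function names and the 0.2 Da tolerance are illustrative choices, and the calculation does not depend on which residue carries the phosphate.

```python
# Monoisotopic residue masses (Da); only the residues in this peptide are listed.
RESIDUE_MASS = {
    "K": 128.09496, "G": 57.02146, "S": 87.03203,
    "E": 129.04259, "Q": 128.05858, "V": 99.06841,
}
WATER = 18.01056
PHOSPHO = 79.96633  # HPO3 added per phosphorylation (assumed monoisotopic value)


def peptide_mass(peptide, n_phospho=0):
    """Monoisotopic neutral mass, optionally with n_phospho phosphate groups."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + n_phospho * PHOSPHO


def within_tolerance(measured, theoretical, tol=0.2):
    """True if the two masses agree within tol Da."""
    return abs(measured - theoretical) <= tol


peptide = "KGSEQESVK"
unmodified = peptide_mass(peptide)
phosphorylated = peptide_mass(peptide, n_phospho=1)

# A singly phosphorylated peptide lies far outside a 0.2 Da window
# around the unmodified database mass.
matches_database = within_tolerance(phosphorylated, unmodified, tol=0.2)
```

Search engines handle this by letting the user declare variable modifications, so that the shifted masses are also considered.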


4.)  Each of the files below contains MS/MS data for a single peptide.  The proteins were digested using trypsin, and the cysteines are unmodified.  Use the following search engines to identify the protein each one came from.

http://www.matrixscience.com [Mascot then MS/MS Ion Search]

Mascot uses files that contain mass, intensity, and charge state information.  Use the files with .pkl or .mgf extensions for correct formatting. 
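
For orientation, a peak list in Mascot generic format (.mgf) wraps each spectrum in BEGIN IONS and END IONS lines, gives the precursor as PEPMASS and CHARGE parameters, and lists the fragments as m/z and intensity pairs. The reader below is a minimal sketch for inspecting such a file, not a complete parser; the function names are hypothetical and the sample spectrum is invented for illustration, not taken from the exercise files.

```python
def parse_mgf(text):
    """Parse MGF text into a list of {'params': {...}, 'peaks': [(mz, intensity)]}."""
    spectra, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is not None and line:
            if "=" in line:
                # Parameter line such as PEPMASS=... or CHARGE=...
                key, value = line.split("=", 1)
                current["params"][key] = value
            else:
                # Fragment line: m/z followed by an optional intensity
                fields = line.split()
                mz = float(fields[0])
                intensity = float(fields[1]) if len(fields) > 1 else None
                current["peaks"].append((mz, intensity))
    return spectra


def precursor_mz(spectrum):
    """Precursor m/z from PEPMASS (an intensity may follow it on the same line)."""
    return float(spectrum["params"]["PEPMASS"].split()[0])


# Invented sample spectrum, for illustration only.
SAMPLE_MGF = """BEGIN IONS
TITLE=hypothetical example spectrum
PEPMASS=600.30
CHARGE=2+
175.12 1500
288.20 820
401.29 2300
END IONS
"""
spectra = parse_mgf(SAMPLE_MGF)
first = spectra[0]
```

Opening one of the exercise files in a text editor and comparing it with this layout is a quick way to confirm that it is formatted as the search form expects.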

You will need a file that you can browse into MS/MS Ion Search.

http://prospector.ucsf.edu [MS-Tag]

Prospector uses lists that contain mass and charge state information.  Use the files with .txt extensions, and cut and paste the lists into the application.  You will need to relax the Max% unmatched ions value to get good hits.


QTOF data from mouse

Suggested tolerances:  Parent/Peptide 0.2 Da.  MS/MS Fragment 0.2 Da.

mouse1.pkl or mouse1.txt  (your ability to identify this sequence might depend on your choice of database)

mouse2.pkl or mouse2.txt


Ion trap data from Arabidopsis

Suggested tolerances:  Parent/Peptide 2.0 Da.  MS/MS Fragment 0.8 Da.

traparab.mgf or traparab.txt


MALDI TOF/TOF data from E. coli

Suggested tolerances:  Parent/Peptide 0.1 Da.  MS/MS Fragment 0.3 Da.

toftofecoli.pkl  or  toftofecoli.txt