Mass Spectrometry Facility

Introduction to Proteomics – Aug 6, 2010 – Database Exercise

A. C. Harms and M. M. Vestling


Goal:  To learn how to use several public protein databases to answer biological questions.


One way to identify a protein:

a) Digest the protein with trypsin.
b) Measure using mass spectrometry the masses of the peptides produced.
c) Compare the masses of the peptides produced experimentally to those in a database of the masses of the peptides expected from trypsin digestions of proteins.
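
Steps a) through c) can be sketched in a few lines of Python. This is only an illustration of the idea, not how MS-Fit or Mascot are implemented: it assumes complete cleavage after K or R (except before P), no modifications, and singly protonated peptides. The sequence, the peak list, and the function names are made up for the example.

```python
# Monoisotopic residue masses in Da (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O added to the residue sum to give the intact peptide
PROTON = 1.00728   # mass of the proton that makes the [M+H]+ ion


def tryptic_digest(sequence):
    """Cut after every K or R unless the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide
    return peptides


def mh_plus(peptide):
    """Theoretical monoisotopic [M+H]+ of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def match_peaks(observed, sequence, tol_da=0.2):
    """Pair each theoretical peptide with observed peaks within tol_da."""
    hits = []
    for peptide in tryptic_digest(sequence):
        theoretical = mh_plus(peptide)
        for mz in observed:
            if abs(mz - theoretical) <= tol_da:
                hits.append((mz, peptide, round(theoretical, 4)))
    return hits


# Made-up sequence and peak list, for illustration only.
toy_sequence = "AAKGGRPLKCC"
toy_peaks = [289.19, 627.40, 1000.00]
print(match_peaks(toy_peaks, toy_sequence))
```

A real search engine repeats this comparison for every protein in the database and scores the matches; it typically also allows for missed cleavages and selected modifications, which this sketch leaves out.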

The protein sequences found in databases are just strings of letters and by themselves do not reflect the post-translational modifications (PTMs) found in most proteins.  This means that only unmodified peptides will match the theoretical masses.  Since most of a protein's sequence is not modified, this matching of experiment with theory generally works.


1.)  Find the peptide mass lists for 5 proteins:  1maldi, 2maldi, 3maldi, 4maldi, 5maldi.  Use both of the following computer programs to identify the proteins.  Make sure you use each program at least once.  Set the accuracy to 0.2 Da, and try using SwissProt as the database.  With larger databases, you may want to limit your searches to mammals.

http://prospector.ucsf.edu [MS-Fit]

http://www.matrixscience.com [Mascot then Peptide Mass Fingerprint]


2.)  Find the elemental formula for the following peptide: KGSEQESVK

http://prospector.ucsf.edu [MS-Product]

Mass Spectrometry deals with numbers, so it is important to have a way to translate letters to numbers at your computer.  Most instruments have such translators, but one is not always at one's instrument.
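
If no such translator is at hand, a small script can do the letter-to-number conversion. The sketch below assumes an unmodified, neutral peptide (residues plus one water); the table holds the standard residue compositions, and the names `RESIDUE_ATOMS` and `peptide_formula` are made up for this example. Use it to check the answer you get from MS-Product, keeping in mind that MS-Product may report the formula of an ion rather than of the neutral peptide.

```python
# Atoms per residue (amino acid minus H2O), in the order C, H, N, O, S.
RESIDUE_ATOMS = {
    "G": (2, 3, 1, 1, 0), "A": (3, 5, 1, 1, 0), "S": (3, 5, 1, 2, 0),
    "P": (5, 7, 1, 1, 0), "V": (5, 9, 1, 1, 0), "T": (4, 7, 1, 2, 0),
    "C": (3, 5, 1, 1, 1), "L": (6, 11, 1, 1, 0), "I": (6, 11, 1, 1, 0),
    "N": (4, 6, 2, 2, 0), "D": (4, 5, 1, 3, 0), "Q": (5, 8, 2, 2, 0),
    "K": (6, 12, 2, 1, 0), "E": (5, 7, 1, 3, 0), "M": (5, 9, 1, 1, 1),
    "H": (6, 7, 3, 1, 0), "F": (9, 9, 1, 1, 0), "R": (6, 12, 4, 1, 0),
    "Y": (9, 9, 1, 2, 0), "W": (11, 10, 2, 1, 0),
}
ELEMENTS = ("C", "H", "N", "O", "S")


def peptide_formula(peptide):
    """Elemental formula of a neutral, unmodified peptide."""
    totals = [0] * len(ELEMENTS)
    for aa in peptide:
        for i, count in enumerate(RESIDUE_ATOMS[aa]):
            totals[i] += count
    totals[1] += 2  # two H from the terminal water
    totals[3] += 1  # one O from the terminal water
    # Skip absent elements and omit the count when it is 1.
    return "".join(
        f"{element}{count if count > 1 else ''}"
        for element, count in zip(ELEMENTS, totals)
        if count
    )


print(peptide_formula("KGSEQESVK"))
```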


3.)  Each of the files below contains MS/MS data for a single peptide.  The proteins were digested using trypsin, and the cysteines are unmodified.  Use the following search engines to identify the protein each one came from.

http://www.matrixscience.com [Mascot then MS/MS Ion Search]

Mascot uses files that contain mass, intensity, and charge state information.  Use the files with .pkl or .mgf extensions for correct formatting. 

You will need a file that you can browse into MS/MS Ion Search.

http://prospector.ucsf.edu [MS-Tag]

Prospector uses lists that contain mass and charge state information.  Use the files with .txt extensions, and cut and paste the lists into the application.  You will need to relax the Max% unmatched ions value to get good hits.


Ion trap data from Arabidopsis

Suggested tolerances:  Parent/Peptide 2.0 Da.  MS/MS Fragment 0.8 Da.

traparab.mgf or traparab.txt


MALDI TOF/TOF data from E. coli

Suggested tolerances:  Parent/Peptide 0.1 Da.  MS/MS Fragment 0.3 Da.

toftofecoli.pkl  or  toftofecoli.txt

 

4.)  The MALDI TOF/TOF data you received from your in-gel digest lab were collected by first acquiring a MALDI mass spectrum and using it to generate a list of peptide masses.  The instrument then collected MS/MS data on peptides in this list.  The data were combined into a single file formatted for searching with Mascot.

The PDF file shows the MS data of the MALDI spot.  Did you get a signal intensity in the e4 (10^4) range?  A lower intensity might not be sufficient.

Look at your text file (Mascot generic format).  How many peptides were selected for MS/MS? What masses were they?
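
You can answer both questions by reading the file in a text editor, or with a short script. The sketch below relies only on the basic layout of Mascot generic format: each MS/MS spectrum sits between a BEGIN IONS line and an END IONS line, and its precursor m/z is given on a PEPMASS= line (optionally followed by an intensity). The function name and the example text are made up; real files usually carry more header lines.

```python
def read_mgf_precursors(lines):
    """Return the precursor m/z of every BEGIN IONS ... END IONS block."""
    precursors = []
    inside_block = False
    for raw in lines:
        line = raw.strip()
        if line == "BEGIN IONS":
            inside_block = True
        elif line == "END IONS":
            inside_block = False
        elif inside_block and line.startswith("PEPMASS="):
            # PEPMASS may be "m/z" or "m/z intensity"; keep the m/z only.
            value = line.split("=", 1)[1].split()[0]
            precursors.append(float(value))
    return precursors


# Made-up two-spectrum example, for illustration only.
example = """BEGIN IONS
TITLE=spectrum 1
PEPMASS=1045.56 2300
CHARGE=1+
175.12 120
288.20 85
END IONS
BEGIN IONS
TITLE=spectrum 2
PEPMASS=1479.80
CHARGE=1+
147.11 60
END IONS
"""

masses = read_mgf_precursors(example.splitlines())
print(len(masses), "peptides selected for MS/MS:", masses)
```

For your own data, replace `example.splitlines()` with an open file handle, for example `open("yourfile.mgf")`, where the file name is a placeholder.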

Use Mascot to search your data, entering the enzyme you used and the cysteine modification you expect based on the procedure you followed.  The charge state is 1+ for MALDI data.  Suggested tolerances:  0.2 Da for both precursor and product ions.  Try SwissProt with your organism first for the fastest searching.
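
The charge state matters because the instrument measures m/z, not mass. As a minimal sketch, assuming the ions are charged by added protons only, the neutral peptide mass follows from M = z × (m/z) − z × 1.00728. The function names below are made up for this example.

```python
PROTON = 1.007276  # mass of a proton in Da


def neutral_mass(mz, charge):
    """Neutral monoisotopic mass from an observed m/z and charge state."""
    return charge * mz - charge * PROTON


def within_tolerance(observed, theoretical, tol_da=0.2):
    """True if two masses agree within an absolute tolerance in Da."""
    return abs(observed - theoretical) <= tol_da


# A 1+ MALDI ion and a 2+ ion of the same peptide give the same neutral mass.
print(round(neutral_mass(991.5055, 1), 3))
print(round(neutral_mass(496.2564, 2), 3))
```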