Thank you for participating in this year’s Proteome Informatics Research
Group (iPRG) study. This letter provides the instructions needed to access
the data files, complete your analysis, and submit your results. The deadline has
been extended: results returned by Monday, January 16, 2017, will be included in
the iPRG presentation at the next Annual ABRF Meeting, March 25-28, 2017.
News (April 2018): The answer key to this study is now available (see bottom of this page). For reference, the other pages are kept as they were when the study was conducted.
This study of bottom-up proteomics LC-MS/MS data analysis focuses on the identification and false-discovery rate (FDR) estimation of proteins, or more specifically, proteoforms (Smith et al. 2013). In this study, we have acquired data from four samples prepared by spiking different combinations of partially overlapping oligopeptides recombinantly expressed in the bacterium Escherichia coli (Figure 1 in the instruction file) into a common background. These oligos are here referred to as Protein Epitope Signature Tags (PrESTs, Figure 2 in the instruction file) and mimic protein homologs for the purpose of this study. Three technical replicate runs of each sample were acquired in random order. The goal of the study is to compare methods for inferring proteoform assignments in each of the samples and estimating their confidence. The participants are free to use any peptide and protein identification software, or a combination of several search engines. The participants are also free to use MS1 or MS2 data, or both.
Raw data is provided along with a FASTA sequence database that should be used without modification. The database contains 5,592 background proteins. However, only results on the PrESTs should be reported. These are the sequences with names beginning with 'HPRR' followed by a unique number.
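As an illustration of the reporting rule above, the following Python sketch picks out the PrEST entries from a FASTA file by name. It is only a minimal example under stated assumptions: it assumes the sequence name starts immediately after the '>' character and consists of 'HPRR' followed by digits, and the example headers and sequences below are invented for demonstration and are not taken from the study database. The function names (read_fasta, is_prest, prest_entries) are hypothetical helpers, not part of the study package.

```python
import re
from typing import Iterable, Iterator, List, Tuple

# Assumption: PrEST names start with "HPRR" followed by one or more digits.
PREST_NAME = re.compile(r"^HPRR\d+")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def is_prest(header: str) -> bool:
    """Return True if the header names a PrEST (assumed 'HPRR' + number)."""
    return PREST_NAME.match(header) is not None


def prest_entries(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Keep only the entries that should be reported (the PrESTs)."""
    return [(h, s) for h, s in read_fasta(lines) if is_prest(h)]


# Invented example records; real headers in the study FASTA may look different.
example = """>HPRR0000001 hypothetical PrEST entry
MKTAYIAKQR
QISFVKSHFS
>background_protein_1 hypothetical background entry
MAIDENKQKA
>HPRR0000002
GSSHHHHHH
""".splitlines()

for name, seq in prest_entries(example):
    print(name.split()[0], len(seq))
```

To apply the same filter to the provided database, the lines of the opened FASTA file could be passed to prest_entries in place of the example list. The FASTA file itself should still be used without modification for the search; the filter only applies to what is reported.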
To evaluate the submissions in this and future studies, and to enable the participants themselves to compare their methods and results, an alternative, open notebook-style submission and evaluation system is being introduced in this study. As this is a novelty for 2016, we will also allow uploading of results in the form of a plain data table, as in previous studies. As part of the study data package, we also provide templates for R Markdown and an IPython notebook, defining the starting point and output data matrix to ensure all participants start from the same data and report using the same format. The participants are then free to insert their database search results, along with R, Java or Python scripts, to further analyze the data and visualize the results. Submissions will be validated on upload to ensure they conform to the submission template. We hope that this new submission format will be more transparent and facilitate sharing of methods for analysis and visualization, thereby extending the life of the study.
The study package can be downloaded from here. Unboxing
the study package, you will find:
1 copy of these instructions
12 raw Q Exactive LC-MS/MS datasets
1 FASTA file to be used in this study
1 R Markdown containing one example solution
1 IPython notebook containing another example solution
1 example tab-separated data table containing results
1 Allen key
Please send questions
here. All identifying
information will be removed prior to forwarding the question to the iPRG
group members. For details, please refer to this PDF.
We thank you for your support of the ABRF and look forward to receiving
your results for the study.
Magnus Palmblad (Chair) - Leiden University Medical Center, Netherlands
Henry Lam (Co-Chair) - Hong Kong University of Science and Technology
Michael Hoopmann - Institute for Systems Biology, Seattle, WA
Susan T. Weintraub - University of Texas Health Science Center at San Antonio, TX
Hyungwon Choi - National University of Singapore
Samuel Payne - Pacific Northwest National Laboratory, Richland, WA
Lukas Käll - KTH - Royal Institute of Technology, Stockholm, Sweden
Darryl Davis - Janssen Pharmaceuticals, Horsham, PA
Yasset Perez-Riverol - European Bioinformatics Institute, Hinxton, UK
Christopher Colangelo (EB Liaison) - Primary Ion, Old Lyme, CT
Answer key (revealed April 10, 2018): PrEST pool A
(192 sequences) PrEST pool B