PyProphet [1] is a reimplementation of the mProphet [2] algorithm for targeted proteomics. It is particularly optimized for the analysis of large-scale data sets generated by OpenSWATH or DIANA.

This description covers the new SQLite-based workflow, which is currently in development. This version includes IPF [3] and the optimizations for large-scale data sets [4]. Alternatively, you can follow the instructions for the PyProphet Legacy Workflow.


As the new workflow is still in development, ensure that all data is processed with the latest versions of OpenMS (develop branch) and PyProphet.

Contact and Support

We provide support for PyProphet on the GitHub repository.

You can contact the authors Uwe Schmitt, Johan Teleman, Hannes Röst and George Rosenberger.



Generate OSW output files as described in the section Integrated OpenSWATH Workflow. PyProphet is then applied to one or several of these SQLite-based reports. Several commands can be run consecutively to perform the analysis:

pyprophet --help
pyprophet merge --help

The first command provides an overview of all available commands for manipulating OSW input files. Further instructions are available for each individual command via --help, as shown for merge.

pyprophet merge --out=merged.osw \
--subsample_ratio=1 *.osw

In most scenarios, more than one DIA/SWATH-MS run has been acquired, and the samples should be compared qualitatively and/or quantitatively with the OpenSWATH workflow. After each run has been processed individually with OpenSWATH using the identical spectral library, the files can be merged by PyProphet.

This command merges and optionally subsamples multiple files. Note that the experiment-wide context at the peptide-query level is applied to merged files, whereas the run-specific context is used with separate OSW files [4].

If semi-supervised learning is too slow, or if the run-specific context is required, create an additional merged file with a smaller --subsample_ratio. The model learned on this file is stored in the output and can then be applied to the full file(s).
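The exact selection logic behind --subsample_ratio is not described here. The following Python sketch assumes that subsampling keeps a random fraction of precursors (peptide queries) per run together with all of their peak groups, so that each retained query still contains its complete set of candidates. The function name and the record keys ("run", "precursor", "feature") are hypothetical and do not correspond to PyProphet internals.

```python
import random
from collections import defaultdict


def subsample_peak_groups(features, ratio, seed=None):
    """Keep a random fraction of precursors per run, with all their peak groups.

    features: list of dicts with the hypothetical keys 'run', 'precursor'
    and 'feature'. Assumption: subsampling operates on whole precursors.
    """
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    rng = random.Random(seed)

    # Collect the distinct precursors observed in each run.
    precursors_by_run = defaultdict(set)
    for f in features:
        precursors_by_run[f["run"]].add(f["precursor"])

    # Draw the same fraction of precursors independently for every run.
    keep = set()
    for run, precursors in precursors_by_run.items():
        n_keep = round(ratio * len(precursors))
        # sorted() makes the draw reproducible for a fixed seed
        for p in rng.sample(sorted(precursors), n_keep):
            keep.add((run, p))

    # Retain every peak group that belongs to a selected precursor.
    return [f for f in features if (f["run"], f["precursor"]) in keep]


# Toy input: one run, four precursors with three peak groups each.
features = [
    {"run": "r1", "precursor": p, "feature": f"{p}_{i}"}
    for p in ("A", "B", "C", "D")
    for i in range(3)
]
half = subsample_peak_groups(features, 0.5, seed=1)
```

Keeping whole precursors rather than individual peak groups is the design choice here: dropping single candidates would distort the "best peak group per query" selection that the scoring relies on.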


pyprophet score --in=merged.osw --level=ms2

The main command conducts semi-supervised learning and error-rate estimation in a fully automated fashion. Use --help to see the full selection of parameters for adjusting the process. The default parameters are recommended for SCIEX TripleTOF 5600/6600 instrument data but can be adjusted for other scenarios.
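To illustrate what error-rate estimation means in this setting, the following simplified Python sketch computes q-values from a target-decoy competition by counting decoys above each score threshold. It is not PyProphet's estimator: it omits the semi-supervised learning of the discriminant score and the π0 correction, and ties between scores are not treated specially. The function name is hypothetical.

```python
def target_decoy_qvalues(scores, is_decoy):
    """Estimate q-values for target entries by simple decoy counting.

    FDR(t) = (#decoys with score >= t) / (#targets with score >= t);
    the q-value of a target is the minimum FDR over all thresholds
    at which it is still accepted. Higher scores are better.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    # Walk from the best score downwards and record the FDR at each target.
    fdrs = {}
    n_target = n_decoy = 0
    for i in order:
        if is_decoy[i]:
            n_decoy += 1
        else:
            n_target += 1
            fdrs[i] = n_decoy / n_target

    # Enforce monotonicity: walk upwards, carrying the running minimum.
    qvalues = {}
    running_min = float("inf")
    for i in reversed(order):
        if i in fdrs:
            running_min = min(running_min, fdrs[i])
            qvalues[i] = running_min
    return qvalues


scores = [9.0, 8.5, 8.0, 7.0, 6.5, 6.0, 5.0]
is_decoy = [False, False, True, False, False, True, True]
q = target_decoy_qvalues(scores, is_decoy)
accepted = sorted(i for i, v in q.items() if v <= 0.01)  # → [0, 1]
```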

When using the IPF extension, the parameter --level can be set to ms2, ms1 or transition. If MS1 and transition-level data should be scored as well, the command is executed three times; the calls can be chained, e.g.:

pyprophet score --in=merged.osw --level=ms1 \
score --in=merged.osw --level=ms2 \
score --in=merged.osw --level=transition

Scoring at the MS1 and transition level depends in part on the MS2 peak group signals. The parameter --ipf_max_peakgroup_rank specifies how many peak group candidates should be assessed in IPF. For example, if this parameter is set to 1, only the top-scoring peak group is investigated. In some scenarios, a set of peptide query parameters might detect several peak groups of different peptidoforms that should be identified independently; if the parameter is set to 3, the top 3 peak groups are investigated. For higher values (or very generic applications), it might be better to disable the PyProphet assumption of a single best peak group per peptide query. This is done by setting --group_id to feature_id, which changes the assumption so that all high-scoring peak groups are considered potential peptide signals.
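The effect of --ipf_max_peakgroup_rank and --group_id can be pictured with the following Python sketch. It ranks peak groups by score within each group and keeps the top max_rank of them. Grouping by a per-peak-group key (here group_key="feature_id") makes every peak group its own group, so all of them are retained. This mirrors the behavior described in the paragraph above, but the function and record keys are hypothetical and this is not PyProphet code.

```python
from collections import defaultdict


def top_ranked_peak_groups(peak_groups, max_rank, group_key="group_id"):
    """Keep peak groups whose score rank within their group is <= max_rank.

    peak_groups: list of dicts with the hypothetical keys 'group_id',
    'feature_id' and 'score' (higher is better).
    """
    by_group = defaultdict(list)
    for pg in peak_groups:
        by_group[pg[group_key]].append(pg)

    kept = []
    for members in by_group.values():
        # Rank candidates of one peptide query by descending score.
        ranked = sorted(members, key=lambda pg: pg["score"], reverse=True)
        kept.extend(ranked[:max_rank])
    return kept


peak_groups = [
    {"group_id": "pepA", "feature_id": 1, "score": 5.2},
    {"group_id": "pepA", "feature_id": 2, "score": 3.1},
    {"group_id": "pepA", "feature_id": 3, "score": 0.4},
    {"group_id": "pepB", "feature_id": 4, "score": 2.0},
]
top1 = top_ranked_peak_groups(peak_groups, 1)
top3 = top_ranked_peak_groups(peak_groups, 3)
per_feature = top_ranked_peak_groups(peak_groups, 1, group_key="feature_id")
```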

Importantly, PyProphet stores all results in the input OSW files. A different output file can be specified with --out; however, since all steps are non-destructive, this is not necessary.
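Because the results are written into the OSW file, which is an SQLite database, they can also be inspected with Python's built-in sqlite3 module. The sketch below assumes a table SCORE_MS2 with the columns FEATURE_ID, SCORE and QVALUE; these names are assumptions, so check the schema of your own file first (e.g. with .schema in the sqlite3 command-line shell). An in-memory table stands in for a real file so that the example runs on its own.

```python
import sqlite3


def ms2_peak_groups_below(con, max_qvalue):
    """Return (feature_id, score, qvalue) rows with QVALUE <= max_qvalue.

    Assumes a SCORE_MS2 table with FEATURE_ID, SCORE and QVALUE columns;
    verify this against the actual schema of your OSW file.
    """
    cur = con.execute(
        "SELECT FEATURE_ID, SCORE, QVALUE FROM SCORE_MS2 "
        "WHERE QVALUE <= ? ORDER BY SCORE DESC",
        (max_qvalue,),
    )
    return cur.fetchall()


# For a real file, open it read-only so the results cannot be modified:
# con = sqlite3.connect("file:merged.osw?mode=ro", uri=True)
# Here a toy in-memory table stands in for the assumed schema.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE SCORE_MS2 (FEATURE_ID INTEGER, SCORE REAL, QVALUE REAL)")
con.executemany(
    "INSERT INTO SCORE_MS2 VALUES (?, ?, ?)",
    [(1, 4.2, 0.001), (2, 1.3, 0.05), (3, 3.0, 0.008)],
)
rows = ms2_peak_groups_below(con, 0.01)
```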


If IPF should be applied after scoring, the following command can be used:

pyprophet ipf --in=merged.osw

To adjust the IPF-specific parameters, please consult pyprophet ipf --help. If MS1 or MS2 precursor data should not be used, e.g. due to poor instrument performance, this can be disabled by setting --no-ipf_ms1_scoring or --no-ipf_ms2_scoring, respectively. The experimental setting --ipf_grouped_fdr can be used in the case of extremely heterogeneous spectral libraries, e.g. libraries containing mostly unmodified peptides, which are mainly detectable, and peptidoforms with various potential site localizations, which are mostly not detectable. This parameter estimates the FDR independently for groups defined by the number of site localizations.

Several thresholds (--ipf_max_precursor_pep, --ipf_max_peakgroup_pep, --ipf_max_precursor_peakgroup_pep, --ipf_max_transition_pep) are defined for IPF to exclude very poor signals. When they are disabled, the error model still works, but sensitivity is reduced. These parameters should only be tweaked with a reference data set.
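The idea behind --ipf_grouped_fdr, estimating the FDR separately per group, can be sketched in Python as follows. The sketch assumes a model-based estimate in which the FDR of all entries up to a given posterior error probability (PEP) is the mean PEP of those entries; whether IPF uses exactly this estimator is not stated here. The grouping key stands for the number of potential site localizations, and all names are hypothetical.

```python
from collections import defaultdict


def model_fdr(peps):
    """FDR at each entry = mean PEP of all entries ranked at or before it."""
    order = sorted(range(len(peps)), key=lambda i: peps[i])
    fdr = [0.0] * len(peps)
    total = 0.0
    for k, i in enumerate(order, start=1):
        total += peps[i]
        fdr[i] = total / k
    return fdr


def grouped_model_fdr(entries):
    """Estimate the FDR separately within each group.

    entries: list of (group, pep) tuples; 'group' could be the number of
    potential site localizations (hypothetical grouping key).
    """
    by_group = defaultdict(list)
    for idx, (group, _pep) in enumerate(entries):
        by_group[group].append(idx)

    fdr = [0.0] * len(entries)
    for indices in by_group.values():
        group_fdr = model_fdr([entries[i][1] for i in indices])
        for i, value in zip(indices, group_fdr):
            fdr[i] = value
    return fdr


entries = [(1, 0.01), (1, 0.03), (3, 0.2), (3, 0.4)]
grouped = grouped_model_fdr(entries)  # approx. [0.01, 0.02, 0.2, 0.3]
pooled = model_fdr([pep for _, pep in entries])  # approx. [0.01, 0.02, 0.08, 0.16]
```

When all entries are pooled, the poorly detectable group (three site localizations) inherits a lower FDR from the easily detected group; estimating per group keeps the two apart.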

Contexts & FDR

To conduct peptide inference in run-specific, experiment-wide and global contexts, the following command can be applied:

pyprophet peptide --in=merged.osw --context=run-specific \
peptide --in=merged.osw --context=experiment-wide \
peptide --in=merged.osw --context=global

This will generate individual PDF reports and store the scores in a non-redundant fashion in the OSW file.
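The three contexts differ in which entries are pooled for error-rate estimation. Based on our reading of [4], the following Python sketch shows only this pooling step. The run-specific context keeps each run separate. The experiment-wide context pools the entries of all runs. The global context keeps one entry per peptide (or protein) with its best score across runs. The input tuples and the function name are hypothetical, and the error-rate estimation itself is left out.

```python
from collections import defaultdict


def context_sets(observations):
    """Group (run, peptide, score, is_decoy) tuples by inference context.

    Scores are assumed to be the best peak group score per peptide and run
    (higher is better). Returns the entries each context would pool.
    """
    run_specific = defaultdict(list)
    experiment_wide = []
    best_global = {}
    for run, peptide, score, is_decoy in observations:
        # Run-specific: one separate set per run.
        run_specific[run].append((peptide, score, is_decoy))
        # Experiment-wide: every (run, peptide) entry in one pooled set.
        experiment_wide.append((peptide, score, is_decoy))
        # Global: one entry per peptide, its best score across runs.
        if peptide not in best_global or score > best_global[peptide][0]:
            best_global[peptide] = (score, is_decoy)

    global_ = [(p, s, d) for p, (s, d) in best_global.items()]
    return {
        "run-specific": dict(run_specific),
        "experiment-wide": experiment_wide,
        "global": global_,
    }


obs = [
    ("r1", "PEPA", 3.0, False),
    ("r2", "PEPA", 5.0, False),
    ("r1", "DECOY_X", 1.0, True),
    ("r2", "DECOY_X", 0.5, True),
]
ctx = context_sets(obs)
```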

Protein inference can be conducted analogously:

pyprophet protein --in=merged.osw --context=run-specific \
protein --in=merged.osw --context=experiment-wide \
protein --in=merged.osw --context=global


Finally, the results can be exported to a legacy OpenSWATH TSV report:

pyprophet export --in=merged.osw --out=legacy.tsv

By default, both peptide- and transition-level quantification are reported, which is necessary for requantification or SWATH2stats. If peptide and protein inference in the global context was conducted, the results are filtered to 1% FDR by default. Further details are available via pyprophet export --help.
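If a stricter cutoff is needed after export, the TSV can also be filtered in Python with the standard csv module. The sketch assumes that the q-value is stored in a column named m_score; the column name and the function are assumptions, so check the header of your own legacy.tsv before use.

```python
import csv
import io


def filter_by_qvalue(lines, column="m_score", max_qvalue=0.01):
    """Return TSV rows (as dicts) whose q-value column is <= max_qvalue.

    'lines' is any iterable of text lines, e.g. an open file handle.
    The default column name is an assumption about the export format.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    return [row for row in reader if float(row[column]) <= max_qvalue]


# With a real export:
# with open("legacy.tsv", newline="") as handle:
#     rows = filter_by_qvalue(handle, max_qvalue=0.001)
example = "transition_group_id\tm_score\nA\t0.0005\nB\t0.2\nC\t0.001\n"
rows = filter_by_qvalue(io.StringIO(example), max_qvalue=0.001)
```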


By default, IPF results are used if available. This can be disabled by setting --no-ipf. The IPF results require different settings for TRIC. Please make sure that you want to analyze the results in the context of IPF; otherwise, use the --no-ipf setting.


[1] Teleman J, Röst HL, Rosenberger G, Schmitt U, Malmström L, Malmström J, Levander F. DIANA–algorithmic improvements for analysis of data-independent acquisition MS data. Bioinformatics. 2015 Feb 15;31(4):555-62. doi: 10.1093/bioinformatics/btu686. Epub 2014 Oct 27. PMID: 25348213
[2] Reiter L, Rinner O, Picotti P, Hüttenhain R, Beck M, Brusniak MY, Hengartner MO, Aebersold R. mProphet: automated data processing and statistical validation for large-scale SRM experiments. Nat Methods. 2011 May;8(5):430-5. doi: 10.1038/nmeth.1584. Epub 2011 Mar 20. PMID: 21423193
[3] Rosenberger G, Liu Y, Röst HL, Ludwig C, Buil A, Bensimon A, Soste M, Spector TD, Dermitzakis ET, Collins BC, Malmström L, Aebersold R. Inference and quantification of peptidoforms in large sample cohorts by SWATH-MS. Nat Biotechnol. 2017 Aug;35(8):781-788. doi: 10.1038/nbt.3908. Epub 2017 Jun 12. PMID: 28604659
[4] Rosenberger G, Bludau I, Schmitt U, Heusel M, Hunter CL, Liu Y, MacCoss MJ, MacLean BX, Nesvizhskii AI, Pedrioli PGA, Reiter L, Röst HL, Tate S, Ting YS, Collins BC, Aebersold R. Statistical control of peptide and protein error rates in large-scale targeted data-independent acquisition analyses. Nat Methods. 2017 Sep;14(9):921-927. doi: 10.1038/nmeth.4398. Epub 2017 Aug 21. PMID: 28825704