PyProphet Legacy Workflow

Overview

PyProphet [1] is a reimplementation of the mProphet [2] algorithm for targeted proteomics. It is optimized for the analysis of large-scale data sets generated by OpenSWATH or DIANA.

This description represents the legacy workflow using TSV file formats.

Contact and Support

We provide support for PyProphet on the GitHub repository.

You can contact the authors Uwe Schmitt, Johan Teleman, Hannes Röst and George Rosenberger.

Installation

The PyProphet legacy workflow is distributed as two modules:

PyProphet

PyProphet is the main Python package.

Currently, PyProphet requires Python 2.7 and several dependencies. Windows users should install Anaconda; Mac and Linux users should be able to install PyProphet directly from the GitHub repository using pip:

pip install git+https://github.com/PyProphet/pyprophet.git@legacy

PyProphet-cli aka Jumbo-PyProphet

To deal with larger data sets and to provide error rate control on the level of peptide sequences and proteins for different contexts (run-specific, experiment-wide and global), an extension of PyProphet is in development [4]. It is optimized to analyse hundreds of runs simultaneously and builds on the IBM LSF or OpenLava workflow managers, but the steps can also be executed independently. It can be installed from GitHub using pip:

pip install git+https://github.com/PyProphet/pyprophet.git@legacy
pip install git+https://github.com/PyProphet/pyprophet-cli.git@legacy
pip install git+https://github.com/PyProphet/pyprophet-brutus-driver.git

PyProphet-cli can be adapted to other workflow managers by developing lightweight modules that replace pyprophet-brutus-driver.

Tutorial

PyProphet

An extended tutorial describing a complete OpenSWATH analysis workflow including PyProphet was recently published [3] and is also available from bioRxiv.

PyProphet-cli aka Jumbo-PyProphet

If the three modules have been properly configured, PyProphet jobs can be submitted using the following command:

pyprophet-cli run_on_brutus \
--data-folder="/tmp/openswath_results/" \
--data-filename-pattern="openswath_output_*.tsv" --sample-factor=0.1 --job-count=10 \
--extra-args-prepare --extra-group-column=ProteinName \
--extra-args-score --lambda=0.8

The example works as follows:

  • --data-folder: /tmp/openswath_results/ contains 10 files, openswath_output_0.tsv to openswath_output_9.tsv.

  • --data-filename-pattern: This file name pattern, with * as a wildcard, selects the input files.

  • --sample-factor: Any value from 0 to 1. We recommend using 1/(number of runs), here 1/10 = 0.1.

  • --job-count: The number of parallel jobs to submit.

  • --extra-args-prepare --extra-group-column=ProteinName: Also compute protein-level q-values.

  • --extra-args-score --lambda=0.8: Set lambda to 0.8 for q-value estimation.
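
The recommended sample factor can be derived from the number of input files. The following is a minimal shell sketch, assuming each run is a single TSV file located directly inside the data folder; count_runs and sample_factor are hypothetical helper names and not part of pyprophet-cli.

```shell
# Hypothetical helpers for illustration only; they are not part of pyprophet-cli.

# count_runs FOLDER PATTERN: number of files directly inside FOLDER whose
# name matches the wildcard PATTERN (assumption: one file per run).
count_runs() {
    find "$1" -maxdepth 1 -type f -name "$2" | wc -l | tr -d ' '
}

# sample_factor N: print 1/N with four decimals; fails if N is smaller than 1.
sample_factor() {
    LC_ALL=C awk -v n="$1" 'BEGIN { if (n < 1) exit 1; printf "%.4f\n", 1 / n }'
}
```

With the ten files from the example, sample_factor "$(count_runs /tmp/openswath_results 'openswath_output_*.tsv')" prints 0.1000, which can then be passed as --sample-factor.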

Further parameters can be set; for details, please refer to:

pyprophet-cli --help

Alternatively, if pyprophet-brutus-driver is not available or for integration with other workflow managers, it is also possible to execute all steps independently. In the following example, 3 example runs are used:

  1. Prepare data

pyprophet-cli prepare --data-folder="/tmp/openswath_results/" --data-filename-pattern="*.tsv" \
--work-folder=/tmp/pyprophet_work/ --separator="tab" --extra-group-column="ProteinName"

  2. Subsample

pyprophet-cli subsample --data-folder="/tmp/openswath_results/" --data-filename-pattern="*.tsv" \
--work-folder="/tmp/pyprophet_work/" --separator="tab" --job-number 1 --job-count 3 --sample-factor=0.4 &
pyprophet-cli subsample --data-folder="/tmp/openswath_results/" --data-filename-pattern="*.tsv" \
--work-folder="/tmp/pyprophet_work/" --separator="tab" --job-number 2 --job-count 3 --sample-factor=0.4 &
pyprophet-cli subsample --data-folder="/tmp/openswath_results/" --data-filename-pattern="*.tsv" \
--work-folder="/tmp/pyprophet_work/" --separator="tab" --job-number 3 --job-count 3 --sample-factor=0.4 &

  3. Semi-supervised learning

pyprophet-cli learn --work-folder="/tmp/pyprophet_work/" --separator="tab" --ignore-invalid-scores

  4. Scoring

pyprophet-cli apply_weights --data-folder="/tmp/openswath_results/" --data-filename-pattern="*.tsv" \
--work-folder="/tmp/pyprophet_work/" --separator="tab" --job-number 1 --job-count 3 &
pyprophet-cli apply_weights --data-folder="/tmp/openswath_results/" --data-filename-pattern="*.tsv" \
--work-folder="/tmp/pyprophet_work/" --separator="tab" --job-number 2 --job-count 3 &
pyprophet-cli apply_weights --data-folder="/tmp/openswath_results/" --data-filename-pattern="*.tsv" \
--work-folder="/tmp/pyprophet_work/" --separator="tab" --job-number 3 --job-count 3 &

  5. Statistical validation

  • Run-specific context

pyprophet-cli score --data-folder="/tmp/openswath_results/" --data-filename-pattern="*.tsv" \
--work-folder="/tmp/pyprophet_work/" --result-folder="/tmp/pyprophet_result_run_specific" --separator="tab" \
--job-number 1 --job-count 3 --lambda=0.4 --statistics-mode=run-specific --overwrite-results &
pyprophet-cli score --data-folder="/tmp/openswath_results/" --data-filename-pattern="*.tsv" \
--work-folder="/tmp/pyprophet_work/" --result-folder="/tmp/pyprophet_result_run_specific" --separator="tab" \
--job-number 2 --job-count 3 --lambda=0.4 --statistics-mode=run-specific --overwrite-results &
pyprophet-cli score --data-folder="/tmp/openswath_results/" --data-filename-pattern="*.tsv" \
--work-folder="/tmp/pyprophet_work/" --result-folder="/tmp/pyprophet_result_run_specific" --separator="tab" \
--job-number 3 --job-count 3 --lambda=0.4 --statistics-mode=run-specific --overwrite-results &
  • Experiment-wide context

pyprophet-cli score --data-folder="/tmp/openswath_results/" --data-filename-pattern="*.tsv" \
--work-folder="/tmp/pyprophet_work/" --result-folder="/tmp/pyprophet_result_experiment_wide" --separator="tab" \
--job-number 1 --job-count 3 --lambda=0.4 --statistics-mode=experiment-wide &
pyprophet-cli score --data-folder="/tmp/openswath_results/" --data-filename-pattern="*.tsv" \
--work-folder="/tmp/pyprophet_work/" --result-folder="/tmp/pyprophet_result_experiment_wide" --separator="tab" \
--job-number 2 --job-count 3 --lambda=0.4 --statistics-mode=experiment-wide &
pyprophet-cli score --data-folder="/tmp/openswath_results/" --data-filename-pattern="*.tsv" \
--work-folder="/tmp/pyprophet_work/" --result-folder="/tmp/pyprophet_result_experiment_wide" --separator="tab" \
--job-number 3 --job-count 3 --lambda=0.4 --statistics-mode=experiment-wide &
  • Global context

pyprophet-cli score --data-folder="/tmp/openswath_results/" --data-filename-pattern="*.tsv" \
--work-folder="/tmp/pyprophet_work/" --result-folder="/tmp/pyprophet_result_global" --separator="tab" \
--job-number 1 --job-count 3 --lambda=0.4 --statistics-mode=global --overwrite-results &
pyprophet-cli score --data-folder="/tmp/openswath_results/" --data-filename-pattern="*.tsv" \
--work-folder="/tmp/pyprophet_work/" --result-folder="/tmp/pyprophet_result_global" --separator="tab" \
--job-number 2 --job-count 3 --lambda=0.4 --statistics-mode=global --overwrite-results &
pyprophet-cli score --data-folder="/tmp/openswath_results/" --data-filename-pattern="*.tsv" \
--work-folder="/tmp/pyprophet_work/" --result-folder="/tmp/pyprophet_result_global" --separator="tab" \
--job-number 3 --job-count 3 --lambda=0.4 --statistics-mode=global --overwrite-results &
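
The commands above start each job in the background with &, so a subsequent step should presumably begin only after all jobs of the previous step have finished. The following is a minimal sketch of one way to express this in a shell script; run_all_jobs and the PYPROPHET_CLI variable are hypothetical names introduced for this example, and the sketch assumes that the order of the command-line options does not matter.

```shell
# Sketch only: run one pyprophet-cli subcommand once per job number and
# block until all background jobs have finished.
# PYPROPHET_CLI and run_all_jobs are names made up for this example.
PYPROPHET_CLI="${PYPROPHET_CLI:-pyprophet-cli}"

# run_all_jobs JOB_COUNT SUBCOMMAND [OPTIONS...]
run_all_jobs() {
    job_count="$1"
    subcommand="$2"
    shift 2
    job_number=1
    while [ "$job_number" -le "$job_count" ]; do
        # Same options for every job; only --job-number changes.
        "$PYPROPHET_CLI" "$subcommand" "$@" \
            --job-number "$job_number" --job-count "$job_count" &
        job_number=$((job_number + 1))
    done
    # Waits for every background job of this shell; exit codes of the
    # individual jobs are not checked here.
    wait
}

# Example with the paths from the tutorial:
#   run_all_jobs 3 subsample --data-folder="/tmp/openswath_results/" \
#       --data-filename-pattern="*.tsv" --work-folder="/tmp/pyprophet_work/" \
#       --separator="tab" --sample-factor=0.4
#   pyprophet-cli learn --work-folder="/tmp/pyprophet_work/" --separator="tab" \
#       --ignore-invalid-scores
```

Because the function returns only after wait completes, the learn step in the example is not started before all three subsample jobs are done.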

References