PyProphet Legacy Workflow
Overview
PyProphet [1] is a reimplementation of the mProphet [2] algorithm for targeted proteomics. It is optimized in particular for the analysis of large-scale data sets generated by OpenSWATH or DIANA.
This description represents the legacy workflow using TSV file formats.
Contact and Support
We provide support for PyProphet on the GitHub repository.
You can contact the authors Uwe Schmitt, Johan Teleman, Hannes Röst and George Rosenberger.
Installation
The PyProphet legacy workflow is distributed as two modules:
PyProphet
PyProphet is the main Python package.
Currently PyProphet requires Python 2.7 and several dependencies. Windows users should install Anaconda; Mac and Linux users should be able to install the legacy branch of PyProphet directly from GitHub using pip:
pip install git+https://github.com/PyProphet/pyprophet.git@legacy
PyProphet-cli aka Jumbo-PyProphet
To deal with larger data sets and to provide error rate control on the level of peptide sequences and proteins for different contexts (run-specific, experiment-wide and global), an extension of PyProphet is in development [4]. It is optimized to analyse hundreds of runs simultaneously and builds on the IBM LSF or OpenLava workflow managers, but the steps can also be executed independently. It can be installed from GitHub using pip:
pip install git+https://github.com/PyProphet/pyprophet.git@legacy
pip install git+https://github.com/PyProphet/pyprophet-cli.git@legacy
pip install git+https://github.com/PyProphet/pyprophet-brutus-driver.git
PyProphet-cli can be adapted to other workflow managers by development of lightweight modules replacing pyprophet-brutus.
Tutorial
PyProphet
An extended tutorial describing a complete OpenSWATH analysis workflow including PyProphet was recently published [3] and is also available from bioRxiv.
PyProphet-cli aka Jumbo-PyProphet
If the three modules have been properly configured, PyProphet jobs can be submitted using the following command:
pyprophet-cli run_on_brutus \
--data-folder="/tmp/openswath_results/" \
--data-filename-pattern="openswath_output_*.tsv" --sample-factor=0.1 --job-count=10 \
--extra-args-prepare --extra-group-column=ProteinName \
--extra-args-score --lambda=0.8
The example works as follows:
--data-folder: /tmp/openswath_results/ contains 10 files, openswath_output_0.tsv to openswath_output_9.tsv.
--data-filename-pattern: This wildcard pattern is used to select the correct files in the data folder.
--sample-factor: This value can be anything between 0 and 1. We recommend using 1/(#runs), here 1/10 = 0.1.
--job-count: Specifies the number of parallel jobs to submit.
--extra-args-prepare --extra-group-column=ProteinName: Also computes protein-level q-values.
--extra-args-score --lambda=0.8: Sets lambda to 0.8 for q-value estimation.
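The recommended value for --sample-factor, 1/(#runs), can be derived from the number of input files. The following is a minimal sketch, assuming that every file matching the pattern corresponds to exactly one run; the function name sample_factor is hypothetical and not part of pyprophet-cli.

```shell
# Hypothetical helper (not part of pyprophet-cli): print 1/(#runs),
# assuming one input file per run.
sample_factor() {
    data_folder=$1       # e.g. /tmp/openswath_results/
    pattern=$2           # shell wildcard, e.g. "openswath_output_*.tsv"
    # Count the regular files directly inside the folder that match the pattern.
    run_count=$(find "$data_folder" -maxdepth 1 -type f -name "$pattern" \
        | wc -l | tr -d '[:space:]')
    if [ "$run_count" -eq 0 ]; then
        echo "no files matching '$pattern' in '$data_folder'" >&2
        return 1
    fi
    # Print the reciprocal with up to four significant digits.
    awk -v n="$run_count" 'BEGIN { printf "%.4g\n", 1 / n }'
}

# Usage (prints 0.1 for a folder with 10 matching files):
# sample_factor /tmp/openswath_results/ "openswath_output_*.tsv"
```

The printed value could then be passed on as --sample-factor="$(sample_factor ...)" instead of being typed by hand.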
Further parameters can be set; for details, please refer to:
pyprophet-cli --help
Alternatively, if pyprophet-brutus-driver is not available, or for integration with other workflow managers, all steps can also be executed independently. The following example uses 3 runs:
Prepare data
pyprophet-cli prepare --data-folder="/tmp/openswath_results/" --data-filename-pattern="*.tsv" \
--work-folder=/tmp/pyprophet_work/ --separator="tab" --extra-group-column="ProteinName"
Subsample
pyprophet-cli subsample --data-folder="/tmp/openswath_results/" --data-filename-pattern="*.tsv" \
--work-folder="/tmp/pyprophet_work/" --separator="tab" --job-number 1 --job-count 3 --sample-factor=0.4 &
pyprophet-cli subsample --data-folder="/tmp/openswath_results/" --data-filename-pattern="*.tsv" \
--work-folder="/tmp/pyprophet_work/" --separator="tab" --job-number 2 --job-count 3 --sample-factor=0.4 &
pyprophet-cli subsample --data-folder="/tmp/openswath_results/" --data-filename-pattern="*.tsv" \
--work-folder="/tmp/pyprophet_work/" --separator="tab" --job-number 3 --job-count 3 --sample-factor=0.4 &
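The trailing & starts each job in the background, so the next step should only be launched once all jobs of the current step have finished. The repeated commands can be sketched as a loop that appends --job-number and --job-count and then waits for every job. This is a minimal sketch; the function name run_jobs is hypothetical and not part of pyprophet-cli.

```shell
# Hypothetical helper (not part of pyprophet-cli): start one background job
# per job number, then block until all of them have finished.
run_jobs() {
    job_count=$1
    shift                      # the remaining arguments form the command
    pids=""
    job_number=1
    while [ "$job_number" -le "$job_count" ]; do
        "$@" --job-number "$job_number" --job-count "$job_count" &
        pids="$pids $!"
        job_number=$((job_number + 1))
    done
    status=0
    for pid in $pids; do
        # Report failure if any single job exits with a non-zero status.
        wait "$pid" || status=1
    done
    return "$status"
}

# Usage, equivalent to the three subsample commands above:
# run_jobs 3 pyprophet-cli subsample --data-folder="/tmp/openswath_results/" \
#     --data-filename-pattern="*.tsv" --work-folder="/tmp/pyprophet_work/" \
#     --separator="tab" --sample-factor=0.4
```

Because the function returns only after every job has exited, the subsequent step can follow directly in a script.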
Semi-supervised learning
pyprophet-cli learn --work-folder="/tmp/pyprophet_work/" --separator="tab" --ignore-invalid-scores
Scoring
pyprophet-cli apply_weights --data-folder="/tmp/openswath_results/" --data-filename-pattern="*.tsv" \
--work-folder="/tmp/pyprophet_work/" --separator="tab" --job-number 1 --job-count 3 &
pyprophet-cli apply_weights --data-folder="/tmp/openswath_results/" --data-filename-pattern="*.tsv" \
--work-folder="/tmp/pyprophet_work/" --separator="tab" --job-number 2 --job-count 3 &
pyprophet-cli apply_weights --data-folder="/tmp/openswath_results/" --data-filename-pattern="*.tsv" \
--work-folder="/tmp/pyprophet_work/" --separator="tab" --job-number 3 --job-count 3 &
Statistical validation
Run-specific context
pyprophet-cli score --data-folder="/tmp/openswath_results/" --data-filename-pattern="*.tsv" \
--work-folder="/tmp/pyprophet_work/" --result-folder="/tmp/pyprophet_result_run_specific" --separator="tab" \
--job-number 1 --job-count 3 --lambda=0.4 --statistics-mode=run-specific --overwrite-results &
pyprophet-cli score --data-folder="/tmp/openswath_results/" --data-filename-pattern="*.tsv" \
--work-folder="/tmp/pyprophet_work/" --result-folder="/tmp/pyprophet_result_run_specific" --separator="tab" \
--job-number 2 --job-count 3 --lambda=0.4 --statistics-mode=run-specific --overwrite-results &
pyprophet-cli score --data-folder="/tmp/openswath_results/" --data-filename-pattern="*.tsv" \
--work-folder="/tmp/pyprophet_work/" --result-folder="/tmp/pyprophet_result_run_specific" --separator="tab" \
--job-number 3 --job-count 3 --lambda=0.4 --statistics-mode=run-specific --overwrite-results &
Experiment-wide context
pyprophet-cli score --data-folder="/tmp/openswath_results/" --data-filename-pattern="*.tsv" \
--work-folder="/tmp/pyprophet_work/" --result-folder="/tmp/pyprophet_result_experiment_wide" --separator="tab" \
--job-number 1 --job-count 3 --lambda=0.4 --statistics-mode=experiment-wide &
pyprophet-cli score --data-folder="/tmp/openswath_results/" --data-filename-pattern="*.tsv" \
--work-folder="/tmp/pyprophet_work/" --result-folder="/tmp/pyprophet_result_experiment_wide" --separator="tab" \
--job-number 2 --job-count 3 --lambda=0.4 --statistics-mode=experiment-wide &
pyprophet-cli score --data-folder="/tmp/openswath_results/" --data-filename-pattern="*.tsv" \
--work-folder="/tmp/pyprophet_work/" --result-folder="/tmp/pyprophet_result_experiment_wide" --separator="tab" \
--job-number 3 --job-count 3 --lambda=0.4 --statistics-mode=experiment-wide &
Global context
pyprophet-cli score --data-folder="/tmp/openswath_results/" --data-filename-pattern="*.tsv" \
--work-folder="/tmp/pyprophet_work/" --result-folder="/tmp/pyprophet_result_global" --separator="tab" \
--job-number 1 --job-count 3 --lambda=0.4 --statistics-mode=global &
pyprophet-cli score --data-folder="/tmp/openswath_results/" --data-filename-pattern="*.tsv" \
--work-folder="/tmp/pyprophet_work/" --result-folder="/tmp/pyprophet_result_global" --separator="tab" \
--job-number 2 --job-count 3 --lambda=0.4 --statistics-mode=global --overwrite-results &
pyprophet-cli score --data-folder="/tmp/openswath_results/" --data-filename-pattern="*.tsv" \
--work-folder="/tmp/pyprophet_work/" --result-folder="/tmp/pyprophet_result_global" --separator="tab" \
--job-number 3 --job-count 3 --lambda=0.4 --statistics-mode=global --overwrite-results &
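The three contexts differ only in the value of --statistics-mode and in the result folder. As a dry-run sketch, the commands for one context can be generated from the mode name. The function name print_score_commands is hypothetical, the paths and --lambda=0.4 are taken from the example above, and the commands are only printed, not executed.

```shell
# Hypothetical helper (not part of pyprophet-cli): print the score commands
# for one context without running them.
print_score_commands() {
    mode=$1          # run-specific, experiment-wide or global
    job_count=$2
    # Result folders in the example use underscores instead of hyphens.
    result_folder="/tmp/pyprophet_result_$(printf '%s' "$mode" | tr '-' '_')"
    job_number=1
    while [ "$job_number" -le "$job_count" ]; do
        printf 'pyprophet-cli score --data-folder="/tmp/openswath_results/" --data-filename-pattern="*.tsv"'
        printf ' --work-folder="/tmp/pyprophet_work/" --result-folder="%s" --separator="tab"' "$result_folder"
        printf ' --job-number %s --job-count %s --lambda=0.4 --statistics-mode=%s\n' \
            "$job_number" "$job_count" "$mode"
        job_number=$((job_number + 1))
    done
}

# Usage: list the commands for all three contexts with 3 jobs each.
# for mode in run-specific experiment-wide global; do
#     print_score_commands "$mode" 3
# done
```

Printing the commands first makes it easy to check the result folders before anything is submitted.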