SQLite-based Workflow

Overview

With the increasing size and number of runs acquired by data-independent acquisition (DIA)-based methods, data analysis algorithms like OpenSWATH [1] are challenged by the size and formats of the input data. Additionally, recent extensions like TRIC [2], IPF [3] and error-rate control on different levels and in different contexts [4] produce additional layers of data that the original data structures cannot represent well.

For this reason, we are currently adapting the tools to new SQLite-based [5] data formats that represent the data in a non-redundant fashion and require less storage and processing time. The new data formats have been implemented in OpenSWATH and PyProphet, and the final results can be exported to the legacy TSV report format.

PQP (peptide query parameter) files represent the data stored in TraML files. OSW files copy the exact data structure of the PQP files and append the feature tables generated by OpenSWATH. Finally, PyProphet appends score tables linked to the feature tables. OpenSWATH stores the results of one run in a single OSW file, but PyProphet can merge several OSW files in a non-redundant, non-destructive fashion.
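To make this layering concrete, the following Python sketch builds a toy in-memory SQLite database with three linked tables and joins them back together. The table and column names (PRECURSOR, FEATURE, SCORE_MS2, QVALUE and so on) are simplified assumptions chosen for illustration; the actual PQP and OSW schemas contain many more tables and columns.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a toy in-memory database with three linked layers (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Layer 1 (PQP-like): one row per precursor, shared by all runs.
        CREATE TABLE PRECURSOR (ID INTEGER PRIMARY KEY, SEQUENCE TEXT, DECOY INTEGER);
        -- Layer 2 (OSW-like): features found by OpenSWATH, linked to precursors.
        CREATE TABLE FEATURE (ID INTEGER PRIMARY KEY, RUN_ID INTEGER,
                              PRECURSOR_ID INTEGER REFERENCES PRECURSOR(ID), RT REAL);
        -- Layer 3 (PyProphet-like): scores linked to features.
        CREATE TABLE SCORE_MS2 (FEATURE_ID INTEGER REFERENCES FEATURE(ID),
                                SCORE REAL, QVALUE REAL);

        INSERT INTO PRECURSOR VALUES (1, 'PEPTIDEK', 0), (2, 'KEDITPEP', 1);
        INSERT INTO FEATURE VALUES (10, 1, 1, 1200.5), (11, 1, 2, 1340.0),
                                   (12, 2, 1, 1201.1);
        INSERT INTO SCORE_MS2 VALUES (10, 5.2, 0.001), (11, -1.3, 0.8),
                                     (12, 4.7, 0.002);
        """
    )
    return conn


def target_hits(conn: sqlite3.Connection, max_q: float) -> list:
    """Join the layers back together: (run, sequence, score) for confident targets."""
    return conn.execute(
        """
        SELECT f.RUN_ID, p.SEQUENCE, s.SCORE
        FROM SCORE_MS2 s
        JOIN FEATURE f ON f.ID = s.FEATURE_ID
        JOIN PRECURSOR p ON p.ID = f.PRECURSOR_ID
        WHERE p.DECOY = 0 AND s.QVALUE <= ?
        ORDER BY f.RUN_ID
        """,
        (max_q,),
    ).fetchall()


conn = build_demo_db()
print(target_hits(conn, 0.01))
# → [(1, 'PEPTIDEK', 5.2), (2, 'PEPTIDEK', 4.7)]
```

In this toy version, each layer only stores the IDs of the layer below, so the precursor definition is written once and shared by the features of both runs and by their scores.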

Contact and Support

The new data formats are currently under development and must NOT be used in production environments. We would, however, be very grateful if you tested the new workflows and reported any problems and bugs.

You can contact the author George Rosenberger.

Installation

To use the new data formats, please use the following versions of our tools:

OpenMS

Full support for PQP and OSW files is provided in OpenMS/develop, with limited support available since OpenMS 2.2. Please follow the instructions in the OpenSWATH tutorial to install OpenMS.

PyProphet

We have developed a new, substantially changed version of PyProphet that integrates IPF and can conduct error-rate control on different levels and in different contexts. If Python and pip are configured correctly, the following command installs the development version:

pip install git+https://github.com/grosenberger/pyprophet.git@feature/refactoring

TRIC

TRIC should be installed according to the TRIC installation instructions. The SQLite format is presently not supported; however, exporting to the legacy format provides interim compatibility.

Tutorial

The general workflow is very similar to the original OpenSWATH workflow with a few minor changes:

1. Peptide Query Parameter Generation

Peptide query parameters should be generated as described previously [6], including appended decoys. Optionally, OpenSwathAssayGenerator can append peptide query parameters for IPF. The final TraML file should then be converted to a peptide query parameter (PQP) file using TargetedFileConverter from OpenMS:

TargetedFileConverter -in assays_ipf_decoys.TraML -out assays_ipf_decoys.pqp
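For scripted pipelines, the conversion can be wrapped in Python. This is a minimal sketch that uses only the -in and -out options shown above; the helper name convert_to_pqp and its default naming rule (replace the file extension with .pqp) are hypothetical, and with dry_run=True it only prints the command.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def convert_to_pqp(traml: str, pqp: str | None = None, dry_run: bool = True) -> list[str]:
    """Convert a TraML file to PQP with TargetedFileConverter (hypothetical helper)."""
    # Assumed default: same file name with the extension replaced by .pqp.
    pqp = pqp or str(Path(traml).with_suffix(".pqp"))
    cmd = ["TargetedFileConverter", "-in", traml, "-out", pqp]
    if dry_run:
        print(" ".join(cmd))
    else:
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} is not on PATH; is OpenMS installed?")
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd


convert_to_pqp("assays_ipf_decoys.TraML")
# prints: TargetedFileConverter -in assays_ipf_decoys.TraML -out assays_ipf_decoys.pqp
```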

2. Targeted data extraction using OpenSWATH

Targeted data extraction is then conducted with OpenSwathWorkflow:

OpenSwathWorkflow \
-in MSDATA.mzXML.gz \
-tr assays_ipf_decoys.pqp \
-out_osw MSDATA_RESULTS.osw \
[OTHER PARAMETERS]

The workflow is executed as before; the only changes are that the PQP file is passed with -tr assays_ipf_decoys.pqp and that an OSW file is written with -out_osw MSDATA_RESULTS.osw.
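Since OpenSWATH writes one OSW file per run, processing several runs amounts to one OpenSwathWorkflow call per input file. The sketch below only builds the argument lists, using the -in, -tr and -out_osw options from above; the file names, the helper names and the run-to-OSW naming rule are hypothetical, and extra stands in for [OTHER PARAMETERS].

```python
from __future__ import annotations

from pathlib import Path


def osw_name(mzxml: str) -> str:
    """Hypothetical naming rule: data/run1.mzXML.gz -> run1.osw (in the working directory)."""
    name = Path(mzxml).name
    for suffix in (".gz", ".mzXML", ".mzML"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
    return name + ".osw"


def openswath_commands(runs: list[str], pqp: str,
                       extra: list[str] | None = None) -> list[list[str]]:
    """Build one OpenSwathWorkflow call per run (one OSW file per run)."""
    extra = extra or []  # placeholder for [OTHER PARAMETERS]
    return [
        ["OpenSwathWorkflow", "-in", run, "-tr", pqp, "-out_osw", osw_name(run), *extra]
        for run in runs
    ]


for cmd in openswath_commands(["run1.mzXML.gz", "run2.mzXML.gz"], "assays_ipf_decoys.pqp"):
    print(" ".join(cmd))
```

Each list can be passed to subprocess.run(cmd, check=True) to execute the calls one after another; the resulting OSW files are the input for the PyProphet merge step.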

3. Statistical validation using PyProphet

PyProphet is then applied to the OSW files. Importantly, the updated version has changed substantially, both internally and in its command-line interface. Several commands can be run consecutively to conduct the analysis:

pyprophet merge --out=merged.osw \
--subsample_ratio=1 *.osw

This command merges and optionally subsamples multiple files. If a set of runs should be analyzed in an experiment-wide fashion, we recommend conducting this step. If semi-supervised learning is too slow, create an additional merged file with a smaller subsample_ratio. The model learned on the subsampled file is stored in its output and can then be applied to the full file.
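To illustrate what a subsample ratio means, the Python sketch below keeps a random fraction of precursors together with all of their candidate peak groups. It assumes that sampling happens per precursor and uses hypothetical names (subsample_precursors, filter_features, precursor_id); it is not PyProphet's merge implementation.

```python
import random


def subsample_precursors(precursor_ids, ratio: float, seed: int = 0) -> list:
    """Randomly keep a fraction `ratio` of the distinct precursor IDs (illustrative)."""
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    ids = sorted(set(precursor_ids))
    if not ids:
        return []
    k = max(1, round(len(ids) * ratio))  # keep at least one precursor
    rng = random.Random(seed)  # fixed seed for a reproducible subset
    return sorted(rng.sample(ids, k))


def filter_features(features, kept_ids) -> list:
    """Keep every candidate peak group whose precursor was sampled."""
    kept = set(kept_ids)
    return [f for f in features if f["precursor_id"] in kept]


features = [{"precursor_id": p, "rank": r} for p in range(1000) for r in (1, 2)]
kept = subsample_precursors(range(1000), ratio=0.1)
print(len(kept), len(filter_features(features, kept)))  # 100 200
```

Sampling whole precursors (rather than individual rows) keeps all competing peak groups of a precursor together, which matters for a classifier that compares candidates within the same precursor.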

pyprophet score --in=merged.osw --level=ms2

This main command conducts semi-supervised learning and error-rate estimation in a fully automated fashion. pyprophet score --help shows the full selection of parameters to adjust the process. The default parameters are recommended for SCIEX TripleTOF 5600/6600 instrument data, but can be adjusted for other scenarios. The parameter --level can be set to ms2, ms1 or transition. If MS1- or transition-level data should be scored as well, the command is executed once per level, e.g.:

pyprophet score --in=merged.osw --level=ms1 \
score --in=merged.osw --level=ms2 \
score --in=merged.osw --level=transition
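The error-rate estimation relies on decoys. As a rough illustration, the following sketch computes target-decoy q-values under strong simplifying assumptions: the FDR at a score cutoff is estimated as the number of decoys divided by the number of targets at or above it (i.e. pi0 is taken as 1), and ties are not treated specially. PyProphet's own estimator is more elaborate (for example, it estimates pi0), so this only sketches the idea.

```python
def target_decoy_qvalues(scores, is_decoy) -> dict:
    """Return {index: q-value} for the target entries of `scores` (simplified)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    fdr = {}
    targets = decoys = 0
    for i in order:
        if is_decoy[i]:
            decoys += 1
        else:
            targets += 1
            fdr[i] = decoys / targets  # estimated FDR if the cutoff is scores[i]
    # q-value: the smallest FDR at which the entry is still accepted, so take
    # the running minimum from the lowest-scoring target upwards.
    qvalues = {}
    running_min = float("inf")
    for i in reversed([i for i in order if not is_decoy[i]]):
        running_min = min(running_min, fdr[i])
        qvalues[i] = running_min
    return qvalues


q = target_decoy_qvalues([9, 8, 7, 6, 5], [False, True, False, False, False])
print(sorted(q.items()))
# → [(0, 0.0), (2, 0.25), (3, 0.25), (4, 0.25)]
```

Entry 2 has a raw FDR estimate of 0.5, but its q-value is 0.25 because a lower cutoff that still accepts it reaches a lower estimated FDR.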

Importantly, PyProphet will store all results in the input OSW files. This can be changed by specifying --out. However, since all steps are non-destructive, this is not necessary.
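Because OSW files are ordinary SQLite databases, their contents can be inspected directly. This minimal sketch opens a file read-only with Python's sqlite3 module and lists its tables; the assumption that PyProphet's score tables have names starting with SCORE_ is labeled in the code and should be checked against your own files.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def list_tables(path: str) -> list:
    """Return the table names of an SQLite file (e.g. an OSW file), opened read-only."""
    uri = Path(path).resolve().as_uri() + "?mode=ro"  # read-only: nothing is modified
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


def score_tables(path: str) -> list:
    """Assumption: tables appended by PyProphet have names starting with SCORE_."""
    return [t for t in list_tables(path) if t.startswith("SCORE_")]
```

For example, calling list_tables("merged.osw") before and after pyprophet score shows which tables were added while the existing ones were left in place.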

If IPF should be applied after scoring, the following command can be used:

pyprophet ipf --in=merged.osw

To adjust the IPF-specific parameters, please consult pyprophet ipf --help.

To conduct peptide inference in run-specific, experiment-wide and global contexts, the following command can be applied:

pyprophet peptide --in=merged.osw --context=run-specific \
peptide --in=merged.osw --context=experiment-wide \
peptide --in=merged.osw --context=global

This will generate individual PDF reports and store the scores in a non-redundant fashion in the OSW file.

Analogously, this can be conducted on protein-level as well:

pyprophet protein --in=merged.osw --context=run-specific \
protein --in=merged.osw --context=experiment-wide \
protein --in=merged.osw --context=global
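For both peptide and protein inference, the three contexts differ in how lower-level evidence is grouped before error rates are estimated [4]. The sketch below shows a simplified reading of that distinction: run-specific keeps one set of entries per run, experiment-wide pools the per-run entries of all runs, and global keeps only the best entry per peptide or protein across all runs. The input format, the function name and the use of the maximum score are assumptions for illustration, not PyProphet's code.

```python
from collections import defaultdict


def context_entries(scores, context: str) -> dict:
    """Collapse (run, entity, score) tuples into {group: {key: best_score}}.

    Each group would receive its own error-rate estimate; `entity` is a
    peptide or protein identifier (hypothetical input format).
    """
    best = defaultdict(dict)
    for run, entity, score in scores:
        if context == "run-specific":
            group, key = run, entity            # separate estimate per run
        elif context == "experiment-wide":
            group, key = "all", (run, entity)   # pooled, one entry per run and entity
        elif context == "global":
            group, key = "all", entity          # pooled, best entry across all runs
        else:
            raise ValueError(f"unknown context: {context}")
        if score > best[group].get(key, float("-inf")):
            best[group][key] = score
    return {group: dict(entries) for group, entries in best.items()}


data = [("r1", "PEPA", 2.0), ("r1", "PEPA", 3.0), ("r2", "PEPA", 1.0), ("r2", "PEPB", 4.0)]
print(context_entries(data, "global"))
# → {'all': {'PEPA': 3.0, 'PEPB': 4.0}}
```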

Finally, the results can be exported to the legacy OpenSWATH TSV report format:

pyprophet export --in=merged.osw --out=legacy.tsv

By default, IPF results will be used. This can be disabled by setting --no-ipf.
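Downstream, the legacy report can be read with standard tools. This minimal Python sketch keeps target rows below an m_score cutoff; it assumes the exported header contains columns named decoy (0/1), m_score and FullPeptideName, which should be verified against your own export before relying on it.

```python
import csv
import io


def confident_targets(handle, max_m_score: float = 0.01):
    """Yield target rows with m_score <= max_m_score from a legacy-style TSV.

    Assumes a header with 'decoy' (0/1) and 'm_score' columns.
    """
    for row in csv.DictReader(handle, delimiter="\t"):
        if int(row["decoy"]) == 0 and float(row["m_score"]) <= max_m_score:
            yield row


example = io.StringIO(
    "FullPeptideName\tdecoy\tm_score\n"
    "PEPTIDEK\t0\t0.001\n"
    "KEDITPEP\t1\t0.0005\n"
    "LOWSCORK\t0\t0.2\n"
)
print([row["FullPeptideName"] for row in confident_targets(example)])
# → ['PEPTIDEK']
```

For a real file, open it with open("legacy.tsv", newline="") and pass the handle to confident_targets.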

References

[1] Röst HL, Rosenberger G, Navarro P, Gillet L, Miladinović SM, Schubert OT, Wolski W, Collins BC, Malmström J, Malmström L, Aebersold R. OpenSWATH enables automated, targeted analysis of data-independent acquisition MS data. Nat Biotechnol. 2014 Mar 10;32(3):219-23. doi: 10.1038/nbt.2841. PMID: 24727770
[2] Röst HL, Liu Y, D’Agostino G, Zanella M, Navarro P, Rosenberger G, Collins BC, Gillet L, Testa G, Malmström L, Aebersold R. TRIC: an automated alignment strategy for reproducible protein quantification in targeted proteomics. Nat Methods. 2016 Sep;13(9):777-83. doi: 10.1038/nmeth.3954. Epub 2016 Aug 1. PMID: 27479329
[3] Rosenberger G, Liu Y, Röst HL, Ludwig C, Buil A, Bensimon A, Soste M, Spector TD, Dermitzakis ET, Collins BC, Malmström L, Aebersold R. Inference and quantification of peptidoforms in large sample cohorts by SWATH-MS. Nat Biotechnol. 2017 Aug;35(8):781-788. doi: 10.1038/nbt.3908. Epub 2017 Jun 12. PMID: 28604659
[4] Rosenberger G, Bludau I, Schmitt U, Heusel M, Hunter CL, Liu Y, MacCoss MJ, MacLean BX, Nesvizhskii AI, Pedrioli PGA, Reiter L, Röst HL, Tate S, Ting YS, Collins BC, Aebersold R. Statistical control of peptide and protein error rates in large-scale targeted data-independent acquisition analyses. Nat Methods. 2017 Sep;14(9):921-927. doi: 10.1038/nmeth.4398. Epub 2017 Aug 21. PMID: 28825704
[5] SQLite. http://sqlite.org/
[6] Schubert OT, Gillet LC, Collins BC, Navarro P, Rosenberger G, Wolski WE, Lam H, Amodei D, Mallick P, MacLean B, Aebersold R. Building high-quality assay libraries for targeted analysis of SWATH MS data. Nat Protoc. 2015 Mar;10(3):426-41. doi: 10.1038/nprot.2015.015. Epub 2015 Feb 12. PMID: 25675208