Execute the STAVER workflow
The STAVER algorithm is implemented in the staver_pipeline module, a proteomics data analysis tool that covers the workflow from raw data preprocessing to the final result output. This tutorial shows how to run the STAVER workflow through its Command-Line Interface (CLI). For more details about the STAVER algorithm, please refer to the STAVER documentation.
[2]:
import pandas as pd
import numpy as np
import staver as st
import warnings
warnings.filterwarnings('ignore')
List all the optional command arguments
[3]:
%run ~/STAVER/staver/staver_pipeline.py -h
usage: staver_pipeline.py [-h] -n NUMBER_THRESHODS -i DIA_PATH
[-ref REFERENCE_STANDARD_DATASET] -o
DIA_PEP_DATA_OUTPATH -op DIA_PROTEIN_DATA_OUTPATH
[-fdr FDR_THRESHOLD] [-c COUNT_CUTOFF_SAME_LIBS]
[-d COUNT_CUTOFF_DIFF_LIBS]
[-pep_cv PEPTIDES_CV_THRESH]
[-pro_cv PROTEINS_CV_THRESH]
[-na_thresh NA_THRESHOLD] [-top TOP_PRECURSOR_IONS]
[-norm NORMALIZATION_METHOD] [-suffix FILE_SUFFIX]
[-sample SAMPLE_TYPE] [-ver VERBOSE] [-v]
STAVER: A Standardized Dataset-Based Algorithm for Efficient Variation
Reduction in Large-Scale DIA MS Data
optional arguments:
-h, --help show this help message and exit
-n NUMBER_THRESHODS, --thread_numbers NUMBER_THRESHODS
The number of thresholds for computer operations
-i DIA_PATH, --input DIA_PATH
The DIA input data path
-ref REFERENCE_STANDARD_DATASET, --reference_dataset_path REFERENCE_STANDARD_DATASET
The DIA standarde reference directory
-o DIA_PEP_DATA_OUTPATH, --output_peptide DIA_PEP_DATA_OUTPATH
The processed DIA proteomics of peptide data output
path
-op DIA_PROTEIN_DATA_OUTPATH, --output_protein DIA_PROTEIN_DATA_OUTPATH
The processed DIA proteomics protein data output path
-fdr FDR_THRESHOLD, --fdr_threshold FDR_THRESHOLD
Setting the FDR threshold (default: 0.01)
-c COUNT_CUTOFF_SAME_LIBS, --count_cutoff_same_libs COUNT_CUTOFF_SAME_LIBS
Setting the count cutoff of same files (default: 1)
-d COUNT_CUTOFF_DIFF_LIBS, --count_cutoff_diff_libs COUNT_CUTOFF_DIFF_LIBS
Setting the count cutoff of different files (default:
2)
-pep_cv PEPTIDES_CV_THRESH, --peptides_cv_thresh PEPTIDES_CV_THRESH
Setting coefficient of variation threshold for the
peptides (default: 0.3)
-pro_cv PROTEINS_CV_THRESH, --proteins_cv_thresh PROTEINS_CV_THRESH
Setting coefficient of variation threshold for the
proteins (default: 0.3)
-na_thresh NA_THRESHOLD, --na_threshold NA_THRESHOLD
Setting the minimum threshold for NUll peptides
(default: 0.3)
-top TOP_PRECURSOR_IONS, --top_precursor_ions TOP_PRECURSOR_IONS
Setting the top high confidence interval precursor
ions (default: 6)
-norm NORMALIZATION_METHOD, --normalization_method NORMALIZATION_METHOD
Specify data normalization method
-suffix FILE_SUFFIX, --file_suffix FILE_SUFFIX
Set the suffix for folder specific identification
-sample SAMPLE_TYPE, --sample_type SAMPLE_TYPE
Description of the sample type
-ver VERBOSE, --verbose VERBOSE
Set the verbose mode for the output information
-v, --version show program's version number and exit
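The help text above has the layout that Python's argparse produces. As a reading aid, the following minimal sketch declares a subset of those options with argparse. It is a hypothetical re-creation based only on the flags and defaults printed above, not the actual staver_pipeline source; the function name build_parser and the example paths are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical re-creation of part of the CLI shown in the help text."""
    parser = argparse.ArgumentParser(
        prog="staver_pipeline.py",
        description="Illustrative subset of the STAVER command-line options",
    )
    # Shown without square brackets in the usage line, so treated as required.
    parser.add_argument("-n", "--thread_numbers", type=int, required=True)
    parser.add_argument("-i", "--input", required=True)
    parser.add_argument("-o", "--output_peptide", required=True)
    parser.add_argument("-op", "--output_protein", required=True)
    # Optional flags, using the defaults quoted in the help text.
    parser.add_argument("-ref", "--reference_dataset_path", default=None)
    parser.add_argument("-fdr", "--fdr_threshold", type=float, default=0.01)
    parser.add_argument("-pep_cv", "--peptides_cv_thresh", type=float, default=0.3)
    parser.add_argument("-pro_cv", "--proteins_cv_thresh", type=float, default=0.3)
    parser.add_argument("-na_thresh", "--na_threshold", type=float, default=0.3)
    parser.add_argument("-top", "--top_precursor_ions", type=int, default=6)
    return parser


# Placeholder paths; only the four required options are supplied here.
args = build_parser().parse_args(
    ["-n", "4", "-i", "data/", "-o", "out/peptides/", "-op", "out/proteins/"]
)
print(args.thread_numbers, args.fdr_threshold, args.top_precursor_ions)
```

Any option that is omitted falls back to its default, which is why a run only needs the thread count, the input path, and the two output paths.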
Run the staver_pipeline
(Estimated time: ~5 min for 20 samples)
To begin with, the Environment and the DIA dataset should be prepared:
Preparing the Environment:
Ensure that Python is installed on your system.
Download or clone the STAVER repository to your local machine or HPC.
Install the required packages by running pip install -r requirements.txt in the STAVER directory.
Setting Up the Parameters:
Use the -n flag to set the number of threads for computation.
The -i flag should point to your input DIA data path.
If you have a reference dataset, use the -ref flag to provide its path; otherwise, the default dataset will be used.
Define the output paths for peptide data with -o and protein data with -op.
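Outside a notebook, the same setup can be scripted. The sketch below assembles the argument list from the long option names shown in the help text; the helper build_staver_command and the placeholder paths are assumptions made for illustration and are not part of STAVER.

```python
import shlex
import sys
from pathlib import Path


def build_staver_command(script, threads, input_dir, peptide_out, protein_out,
                         reference_dir=None, **options):
    """Assemble an argument list for staver_pipeline.py (hypothetical helper)."""
    cmd = [
        sys.executable, str(Path(script).expanduser()),
        "--thread_numbers", str(threads),
        "--input", str(input_dir),
        "--output_peptide", str(peptide_out),
        "--output_protein", str(protein_out),
    ]
    # The reference dataset is optional; leave the flag out when none is given.
    if reference_dir is not None:
        cmd += ["--reference_dataset_path", str(reference_dir)]
    # Remaining keyword arguments become "--name value" pairs.
    for name, value in options.items():
        cmd += [f"--{name}", str(value)]
    return cmd


cmd = build_staver_command(
    "~/STAVER/staver/staver_pipeline.py",
    threads=16,
    input_dir="data/dia_input/",      # placeholder path
    peptide_out="results/peptides/",  # placeholder path
    protein_out="results/proteins/",  # placeholder path
    fdr_threshold=0.01,
    top_precursor_ions=6,
)
print(shlex.join(cmd))
# To launch the run: subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.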
Users can configure further parameters, including the false discovery rate (FDR) threshold (-fdr), the coefficient of variation thresholds for peptides and proteins (-pep_cv, -pro_cv), intensity and frequency thresholds, and the number of top precursor ions (-top), among others. These settings let you tailor the STAVER processing workflow to specific experimental requirements.
For comprehensive information, please refer to the Tutorials section in the STAVER documentation, specifically under the subsection “The Detailed Description of the Parameters”. The detailed information about “data preparation and format requirements” can be found in the Data Ingestion subsection of the Tutorials.
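To make the coefficient of variation (CV) and missing-value thresholds mentioned above more concrete, here is a minimal sketch of how such filters can be applied to one peptide's intensities across replicates. It is an illustration under stated assumptions, not STAVER's implementation: CV is taken as the sample standard deviation divided by the mean, the missing-value threshold is read as the maximum tolerated fraction of missing values (the help text is ambiguous on this point), and the function names and intensity values are made up.

```python
import math
import statistics


def _is_missing(value):
    """Treat None and NaN as missing measurements."""
    return value is None or math.isnan(value)


def coefficient_of_variation(values):
    """CV = sample standard deviation / mean over the observed values."""
    observed = [v for v in values if not _is_missing(v)]
    if len(observed) < 2:
        return math.nan  # CV is undefined with fewer than two observations
    mean = statistics.mean(observed)
    return statistics.stdev(observed) / mean if mean else math.nan


def missing_fraction(values):
    """Fraction of replicates in which the peptide was not measured."""
    return sum(1 for v in values if _is_missing(v)) / len(values)


def keep_peptide(values, cv_thresh=0.3, na_thresh=0.3):
    """Assumed rule: keep when both the missing fraction and the CV are low enough."""
    cv = coefficient_of_variation(values)
    # A NaN CV compares as False, so peptides with undefined CV are dropped.
    return missing_fraction(values) <= na_thresh and cv <= cv_thresh


# Made-up intensities for three peptides across four replicates.
peptides = {
    "stable": [100.0, 110.0, 95.0, 105.0],
    "noisy": [100.0, 300.0, 50.0, 220.0],
    "sparse": [100.0, None, None, 102.0],
}
for name, intensities in peptides.items():
    print(name, keep_peptide(intensities))
```

With the default thresholds of 0.3, only the "stable" peptide passes: "noisy" exceeds the CV threshold and "sparse" has half of its values missing.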
[5]:
## run staver_pipeline
%run ~/STAVER/staver/staver_pipeline.py \
--thread_numbers 16 \
--input /Volumes/T7_Shield/staver/data/likai-diann-raw-20/ \
--reference_dataset_path /Volumes/T7_Shield/staver/data/likai-diann-raw \
--output_peptide /Volumes/T7_Shield/staver/results/DIA_repeat20_2023010/peptides/ \
--output_protein /Volumes/T7_Shield/staver/results/DIA_repeat20_2023010/proteins/ \
--count_cutoff_same_libs 1 \
--count_cutoff_diff_libs 2 \
--fdr_threshold 0.01 \
--peptides_cv_thresh 0.3 \
--proteins_cv_thresh 0.3 \
--na_threshold 0.3 \
--top_precursor_ions 6 \
--file_suffix _F1_R1
All parsed arguments:
number_threshods: 16
dia_path: /Volumes/T7_Shield/staver/data/likai-diann-raw-20/
reference_standard_dataset: /Volumes/T7_Shield/staver/data/likai-diann-raw
dia_pep_data_outpath: /Volumes/T7_Shield/staver/results/DIA_repeat20_2023010/peptides/
dia_protein_data_outpath: /Volumes/T7_Shield/staver/results/DIA_repeat20_2023010/proteins/
fdr_threshold: 0.01
count_cutoff_same_libs: 1
count_cutoff_diff_libs: 2
peptides_cv_thresh: 0.3
proteins_cv_thresh: 0.3
na_threshold: 0.3
top_precursor_ions: 6
normalization_method: median
file_suffix: _F1_R1
sample_type: None
verbose: False
===================== 'run_staver' function begins running... ======================
====================== 'load_data' function begins running... ======================
/Volumes/T7_Shield/staver/results/DIA_repeat20_2023010/peptides/