Overview
This workflow was designed by the Genomics & Data Science team (GDS) at 54gene and is used to analyze data from the Olink Explore platform. The pipeline is designed to be deployed per batch of Olink Explore data. The workflow supports reproducible bioinformatics and is written in Snakemake to be platform-agnostic; all dependencies are installed by the pipeline as needed using conda. Development and testing have been performed predominantly on AWS ParallelCluster running Amazon Linux, with Snakemake version 7.16.0.
Features:
Filters excessive Assay Warnings
Applies outlier filtering using PCA, IQR, and Median differentiation
Integrates metadata to assess correlation between proteome and metadata variables
Generates interactive HTML QC reports
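For intuition on the IQR-based outlier filtering, a minimal sketch is below. The 1.5 multiplier and the use of per-sample median NPX values are generic illustrative defaults, not necessarily the exact parameters this pipeline uses:

```python
# Sketch of IQR-based outlier flagging. The multiplier (k=1.5) and the
# per-sample median NPX input are illustrative assumptions, not the
# pipeline's confirmed settings.
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return indices of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < lo or v > hi]

npx_medians = [1.0, 1.2, 0.9, 1.1, 8.5, 1.05]  # one extreme sample
print(iqr_outliers(npx_medians))  # → [4]
```

Samples flagged this way (here, index 4) would be candidates for removal before downstream analysis.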
To install the latest release, type:
git clone https://gitlab.com/data-analysis5/proteomics/54gene-olink-qc.git
Inputs
The pipeline requires the following inputs:
A CSV in “tall” format from Olink.
A file with the SHA256 hash of the CSV file from Olink.
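Verifying the CSV against the supplied SHA256 hash before a run can catch a corrupted transfer early. A minimal sketch, assuming the `.sha256` file follows the common `<digest>  <filename>` layout (the file names below are placeholders):

```python
# Verify a data file against a SHA256 checksum file. The checksum file is
# assumed to hold the hex digest as its first whitespace-separated token.
import hashlib
import os
import tempfile

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file and return its SHA256 hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(csv_path, sha_path):
    """Compare the CSV's digest to the digest recorded in the .sha256 file."""
    with open(sha_path) as fh:
        expected = fh.read().split()[0]
    return sha256_of(csv_path) == expected

# Demo on a throwaway file; substitute your Olink CSV and .sha256 paths.
with tempfile.TemporaryDirectory() as d:
    csv_path = os.path.join(d, "batch.csv")
    sha_path = csv_path + ".sha256"
    with open(csv_path, "w") as fh:
        fh.write("SampleID,Assay,NPX\nS1,IL6,1.2\n")
    with open(sha_path, "w") as fh:
        fh.write(sha256_of(csv_path) + "  batch.csv\n")
    print(verify(csv_path, sha_path))  # → True
```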
A config file with other pipeline parameters configured (see the default config provided in config/config.yaml).
A tab-delimited metadata.tsv file with sample-linked metadata (NOTE: the sample IDs must match those in the Olink dataset). One column must contain the sample identifiers and be noted in the configuration.
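Because the metadata sample IDs must match the Olink dataset, a quick pre-flight comparison can save a failed run. A sketch using toy data; the column name `SampleID` is a hypothetical placeholder for whatever identifier column your configuration names:

```python
# Cross-check sample IDs between the Olink "tall" CSV and metadata.tsv.
# Data and the "SampleID" column name are illustrative placeholders.
import csv
import io

def ids_from(text, id_col, delimiter):
    """Collect the set of sample identifiers from a delimited table."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return {row[id_col] for row in reader}

olink_csv = "SampleID,Assay,NPX\nS1,IL6,1.2\nS2,IL6,0.8\n"
metadata_tsv = "SampleID\tage\tsex\nS1\t43\tF\nS3\t51\tM\n"

olink_ids = ids_from(olink_csv, "SampleID", ",")
meta_ids = ids_from(metadata_tsv, "SampleID", "\t")
print(sorted(olink_ids - meta_ids))  # samples lacking metadata → ['S2']
```

In practice you would read both files from disk and fail fast if the difference is non-empty.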
Outputs
Following a pipeline run, you will be able to generate:
A post-QC set of Olink proteomic data and metadata, and a postqc.yaml file that can be passed along to further analyses.
Two HTML-based reports: one describing the structure of the pre-QC data, and one describing the structure of the post-QC data.
Principal components and assorted covariates for carrying forward in downstream analyses.
See the Installation, Execution, and Configuration sections for details on setting up and running the pipeline.