Overview

This workflow was designed by the Genomics & Data Science team (GDS) at 54gene and is used to analyze data from the Olink Explore platform. The pipeline is intended to be deployed per batch of Olink Explore data. The workflow is written in Snakemake to be platform-agnostic and to support reproducible bioinformatics; all dependencies are installed by the pipeline as needed using conda. Development and testing have been performed predominantly on AWS ParallelCluster running Amazon Linux, with Snakemake version 7.16.0.
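Because the pipeline installs its own dependencies with conda at runtime, the main prerequisite is a working Snakemake installation. One common way to obtain the tested version is through conda; the environment name and channels below are illustrative rather than prescribed by the pipeline (see Installation for the supported setup):

conda create -n olink-qc -c conda-forge -c bioconda snakemake=7.16.0
conda activate olink-qc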

Features:

  • Filters excessive Assay Warnings

  • Applies outlier filtering using PCA, IQR, and median differentiation

  • Integrates metadata to assess correlations between the proteome and metadata variables

  • Generates interactive HTML QC reports

To install the latest release, type:

git clone https://gitlab.com/data-analysis5/proteomics/54gene-olink-qc.git
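Once the repository is cloned and the configuration is filled in, a run can be launched with a standard Snakemake invocation. The flags below are a minimal sketch (the core count is illustrative); see the Execution section for the supported command line:

cd 54gene-olink-qc
snakemake --use-conda --cores 4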

Inputs

The pipeline requires the following inputs:

  • A CSV in “tall” format from Olink.

  • A file with the SHA256 hash of the CSV file from Olink (see the example after this list).

  • A config file specifying the remaining pipeline parameters (see the default config provided in config/config.yaml).

  • A tab-delimited metadata.tsv file with sample-linked metadata (NOTE: the sample IDs must match those in the Olink dataset). One column must contain the sample identifiers, and that column must be noted in the configuration.
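If a hash file was not supplied alongside the delivery, it can be generated and verified with the standard sha256sum utility. The file names below are placeholders for the actual Olink CSV:

sha256sum olink_explore_tall.csv > olink_explore_tall.csv.sha256
sha256sum -c olink_explore_tall.csv.sha256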

Outputs

Following a pipeline run, you will be able to generate:

  • A post-QC set of Olink proteomic data and metadata, along with a postqc.yaml file, that can be passed along to further analyses.

  • Two HTML-based reports:
      - a report describing the structure of the pre-QC data
      - a report describing the structure of the post-QC data

  • Principal components and assorted covariates for carrying forward in downstream analyses.

See the Installation, Execution, and Configuration sections for details on setting up and running the pipeline.