Configuration

The workflow is configured for the quality control analyses through a set of files that are referenced in config/config.yaml. Each section below describes one of these files and the configuration options that go with it.

Configuration file

The configuration file for this pipeline is config/config.yaml. It specifies the inputs and options used to generate the report from the pipeline.

Config parameters

Below are descriptions and usage options for the various config parameters specified in config.yaml.

Parameter                  Required  Description
name                       Y         Name for the project (prepended to results files)
olink_data/data            Y         CSV file with Olink proteomics data
olink_data/checksum        Y         File with SHA256 checksum for the Olink CSV file
olink_data/ignoreFile      N         File with sample IDs to ignore from analysis (default: "")
olink_data/ignoreRegex     N         File with regex pattern to ignore samples (default: "")
olink_data/sdIQR           Y         Standard deviations for IQR-based outlier filter
olink_data/sdMedian        Y         Standard deviations for median-based outlier filter
olink_data/sdPCA           Y         Standard deviations for PC1- and PC2-based outlier filter
olink_data/assayWarnProp   Y         Proportion of assay warnings required to remove an assay
metadata/filename          Y         Metadata filename (see below)
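As a pre-flight check before launching the workflow, the required parameters and the checksum can be verified with a short script. The following is a minimal sketch, not part of the workflow itself. It assumes that a parameter written as olink_data/data is a data key nested under olink_data in config.yaml, that the config has already been loaded into a Python dict, and that the checksum file starts with the hex digest (as in sha256sum output). The function names are hypothetical.

```python
import hashlib

# Required parameters from the table above, written as "section/key" paths.
# Assumption: "olink_data/data" means a `data` key nested under `olink_data`.
REQUIRED = [
    "name",
    "olink_data/data",
    "olink_data/checksum",
    "olink_data/sdIQR",
    "olink_data/sdMedian",
    "olink_data/sdPCA",
    "olink_data/assayWarnProp",
    "metadata/filename",
]


def missing_required(config):
    """Return the required "section/key" paths that are absent from config."""
    missing = []
    for path in REQUIRED:
        node = config
        for part in path.split("/"):
            if not isinstance(node, dict) or part not in node:
                missing.append(path)
                break
            node = node[part]
    return missing


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(data_path, checksum_path):
    """Compare a file's digest with the first token of the checksum file."""
    # Assumption: the checksum file begins with the hex digest, optionally
    # followed by a filename (the layout written by `sha256sum`).
    with open(checksum_path) as handle:
        tokens = handle.read().split()
    if not tokens:
        return False
    return sha256_of(data_path) == tokens[0].lower()
```

Calling missing_required on the loaded config returns an empty list when every required parameter is present, and checksum_matches returns True when the Olink CSV file agrees with its recorded digest.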

Metadata file

The metadata file can specify both quantitative and qualitative covariates to check for downstream association with axes of proteomic variation (e.g. Age, Sex, BMI).

The file can be structured as a TSV or CSV, as below:

SampleID    COV1        COV2
A1          0.0334699   0.329964
A10         0.690636    0.422487
A100        0.206265    0.250128
A101        0.636559    0.863622
A102        0.301656    0.0249239
A103        0.364993    0.765381

For a more complete example, explore our example data. Note that the workflow checks that every individual has accompanying metadata. If you have no metadata for an individual, fill in that individual's row with empty or NA values.
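The coverage requirement above can be checked ahead of time. The sketch below is one possible way to do so and is not the workflow's own check. It assumes the ID column is named SampleID as in the example, that the delimiter can be inferred from the file extension, and that the Olink sample IDs are already available as a Python list. The function name is hypothetical.

```python
import csv


def samples_missing_metadata(metadata_path, sample_ids, id_column="SampleID"):
    """Return the sample IDs (sorted) that have no row in the metadata file."""
    # Assumption: a ".csv" extension means comma-separated; anything else
    # is read as tab-separated.
    delimiter = "," if metadata_path.lower().endswith(".csv") else "\t"
    with open(metadata_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        present = {row[id_column] for row in reader}
    return sorted(set(sample_ids) - present)
```

An empty result means every sample has a metadata row. Rows whose covariates are empty or NA still count as present, which matches the advice to fill in missing individuals with empty or NA values.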

Sample linker file

Sample ID nomenclature can shift between the metadata and the Olink proteomic data, which makes merging the two and running analyses difficult. In such cases you can provide a simple two-column file that the workflow uses to rename samples in downstream files, for example:

curID newID
A1 B1
A2 B2

This file is mainly needed when the IDs sent to Olink differ from those in your in-house metadata/phenotypes. Retain the column headers (curID and newID) as shown.
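To illustrate what the renaming amounts to, here is a minimal sketch of reading a linker file and applying it to a list of sample IDs. It is not the workflow's implementation. It assumes the file is whitespace-delimited as in the example, and that IDs absent from the linker are left unchanged. Both function names are hypothetical.

```python
def read_linker(path):
    """Read a two-column curID/newID file into a {curID: newID} dict."""
    with open(path) as handle:
        header = handle.readline().split()
        if header != ["curID", "newID"]:
            raise ValueError(f"expected header 'curID newID', got {header}")
        mapping = {}
        for line in handle:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            if len(fields) != 2:
                raise ValueError(f"expected two columns, got: {line!r}")
            cur_id, new_id = fields
            mapping[cur_id] = new_id
    return mapping


def rename_samples(sample_ids, mapping):
    """Rename IDs found in the mapping; leave all other IDs unchanged."""
    # Assumption: samples without a linker entry keep their current ID.
    return [mapping.get(sample_id, sample_id) for sample_id in sample_ids]
```

With the example file above, rename_samples(["A1", "A2", "A3"], read_linker(path)) would give ["B1", "B2", "A3"] under these assumptions.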