Analysis Details

This is a QC report based on the following data:

  • Olink Data: /home/ec2-user/pipelines/proteomics/54gene-olink-qc/results/olink_data_postqc/example_54gene.tsv.gz
  • Metadata/Technical Variables: /home/ec2-user/pipelines/proteomics/54gene-olink-qc/results/dataset_metadata/example_54gene.metadata.tsv.gz

Overall structure across the dataset

Here we visualize the overall PCs based on just the data matrix of probes by samples and the NPX values contained. We have colored each point arbitrarily by the first metadata variable that is available (to limit having hard requirements on the metadata). For a more detailed breakdown of technical metadata that correlate with proteomic variation, please see the last section of this report for a statistical test of significant covariates.

fig.pca <- plot_ly(merged.meta.df,
  x = ~PC1, y = ~PC2, text = ~SampleID,
  type ="scatter",
  mode = "markers",
  #name = as.formula(paste("~", meta.variables[1])),
  color = as.formula(paste("~", meta.variables[1])),
  marker = list(size = 10)
  ) %>% layout(
    title = "PCA Overview of Olink Dataset",
    legend = list(title = list(text = paste("<b>", meta.variables[1], "</b>"))),
    xaxis = list(title = "PC1"),
    yaxis = list(title = "PC2")
)

fig.pca

Missingness across samples

Missingness across samples are measured by the proportion of assays that fail Olink’s internal QC measures. Full details on how to interpret their QC metrics and warnings can be found here: https://www.olink.com/faq/how-is-quality-control-of-the-data-performed/.

QC Warnings from Olink are provided based on internal controls and detection controls with respect to a given 96-well plate. The rule for determining if a QC Warning is emitted is if any of these control values deviate more than 0.3 NPX from the plate median. As a general rule, if more than 1/6th of the samples fail the QC then the run is deemed to be unreliable and this should be noted and sent as immediate feedback to the Olink team.

olink.merged.df <- fread(snakemake@input[["merged_olink_data"]], sep = "\t")
qc.warning.samples.df <- olink.merged.df %>%
  group_by(SampleID) %>%
  summarise(QC_Warning = sum(QC_Warning == "WARN"), N = n()) %>%
  mutate(Fraction_QC_Warning = QC_Warning / N)

reactable(qc.warning.samples.df)

Per-Panel Sample Outliers

Outliers (either in terms of missingness or measurement) can appear on a per-panel basis and we may want to filter specific individuals per-panel. In this setting we are recreating the olink_qc_plot function from the OlinkAnalyze package but for a more interactive view per-panel. Here we set the threshold for outlier detection as being outside:

  • outside 3 standard deviations on either the sample median NPX values
  • outside 3 standard deviations outside the inter-quartile range

Dynamic-Range Outliers

# Defining outliers based on medians and assay IQR
olink.outliers.df <- olink.merged.df %>%
  inner_join(merged.meta.df, by = "SampleID") %>%
  group_by(Panel, SampleID, Index) %>%
  mutate(
    iqr = IQR(NPX, na.rm = TRUE),
    sample_median = median(NPX, na.rm = TRUE),
    pc1_mean = mean(PC1, na.rm = TRUE),
    pc2_mean = mean(PC2, na.rm = TRUE)
  ) %>%
  ungroup() %>%
  select(SampleID, Index, Panel, iqr, sample_median, pc1_mean, pc2_mean, PC1, PC2) %>%
  distinct() %>%
  group_by(Panel) %>%
  mutate(
    pc1_outlier_low = mean(pc1_mean, na.rm = TRUE) - pc.sd * sd(pc1_mean, na.rm = TRUE),
    pc1_outlier_high = mean(pc1_mean, na.rm = TRUE) + pc.sd * sd(pc1_mean, na.rm = TRUE),
    pc2_outlier_low = mean(pc2_mean, na.rm = TRUE) - pc.sd * sd(pc2_mean, na.rm = TRUE),
    pc2_outlier_high = mean(pc2_mean, na.rm = TRUE) + pc.sd * sd(pc2_mean, na.rm = TRUE),
    median_low = mean(sample_median, na.rm = TRUE) - median.sd * sd(sample_median, na.rm = TRUE),
    median_high = mean(sample_median, na.rm = TRUE) + median.sd * sd(sample_median, na.rm = TRUE),
    iqr_low = mean(iqr, na.rm = TRUE) - iqr.sd * sd(iqr, na.rm = TRUE),
    iqr_high = mean(iqr, na.rm = TRUE) + iqr.sd * sd(iqr, na.rm = TRUE),
  ) %>%
  ungroup() %>%
  mutate(valid_outliers = remove.outliers) %>%
  mutate(outlier_range = ifelse(sample_median > median_low & sample_median < median_high & iqr > iqr_low & iqr < iqr_high & valid_outliers, FALSE, TRUE)) %>%
  mutate(pc_outlier = ifelse(pc1_mean > pc1_outlier_low & pc1_mean < pc1_outlier_high & pc2_mean > pc2_outlier_low & pc2_mean < pc2_outlier_high & valid_outliers, FALSE, TRUE)) %>%
  mutate(outlier = (outlier_range | pc_outlier) & valid_outliers)

uniq.panels <- unique(olink.outliers.df$Panel)

fig1 <- plot_ly(olink.outliers.df %>% filter(Panel == "Neurology"),
  x = ~sample_median,
  y = ~iqr,
  type = "scatter",
  mode = "markers",
  text = ~SampleID,
  color = ~outlier,
  marker = list(size = 10)
) %>%
  layout(
    yaxis = list(title = "IQR"),
    xaxis = list(title = "Sample Median"),
    legend = list(title = list(text = "<b> Outlier </b>")),
    title = "Neurology"
  )

fig2 <- plot_ly(olink.outliers.df %>% filter(Panel == "Cardiometabolic"),
  x = ~sample_median,
  y = ~iqr,
  type = "scatter",
  mode = "markers",
  text = ~SampleID,
  color = ~outlier,
  marker = list(size = 10)
) %>%
  layout(
    yaxis = list(title = "IQR"),
    xaxis = list(title = "Sample Median"),
    legend = list(title = list(text = "<b> Outlier </b>")),
    title = "Cardiometabolic"
  )

fig3 <- plot_ly(olink.outliers.df %>% filter(Panel == "Oncology"),
  x = ~sample_median,
  y = ~iqr,
  type = "scatter",
  mode = "markers",
  text = ~SampleID,
  color = ~outlier,
  marker = list(size = 10)
) %>%
  layout(
    yaxis = list(title = "IQR"),
    xaxis = list(title = "Sample Median"),
    legend = list(title = list(text = "<b> Outlier </b>")),
    title = "Oncology"
  )

fig4 <- plot_ly(olink.outliers.df %>% filter(Panel == "Inflammation"),
  x = ~sample_median,
  y = ~iqr,
  type = "scatter",
  mode = "markers",
  text = ~SampleID,
  color = ~outlier,
  marker = list(size = 10)
) %>%
  layout(
    yaxis = list(title = "IQR"),
    xaxis = list(title = "Sample Median"),
    legend = list(title = list(text = "<b> Outlier </b>")),
    title = "Inflammation"
  )

fig1
fig2
fig3
fig4

PCA-Based Outliers

The PCA-based outliers are also visualized in the following plot. The rule to determine outliers is based on a sample being outside 3 outside standard deviations from the means of PC1 and PC2

fig <- plot_ly(olink.outliers.df,
  x = ~PC1,
  y = ~PC2,
  type = "scatter",
  mode = "markers",
  text = ~SampleID,
  color = ~pc_outlier,
  marker = list(size = 10)
) %>%
  layout(
    yaxis = list(title = "PC2"),
    xaxis = list(title = "PC1"),
    legend = list(title = list(text = "<b> Outlier </b>")),
    title = "PCA-based Outliers"
  )

fig

Covariation of Technical and Latent Factors

To evaluate if protein expression co-varies with technical covariates, we also take a look at the association between provided technical covariates and PCs computed using the Olink data matrix. Principal components are denoted as “PC” within the table.

# Evaluating quality control via an anova of technical factors ...
anova.df <- fread(snakemake@input[["anova_table"]], sep = "\t")

reactable(anova.df)

FAQ

Some commonly answered questions regarding processing and quality control of Olink data. We suggest the reader to read these to gain a little more insight into the data generated and recommendations for filtering.

Session Information


The following summarizes the loaded R configuration for the run that created this report.

sessionInfo()
## R version 4.2.2 (2022-10-31)
## Platform: x86_64-conda-linux-gnu (64-bit)
## Running under: Amazon Linux 2
##
## Matrix products: default
## BLAS/LAPACK: /home/ec2-user/pipelines/proteomics/54gene-olink-qc/.snakemake/conda/dce01d8475ee1861579de54f6c914635_/lib/libopenblasp-r0.3.21.so
##
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base
##
## other attached packages:
##  [1] reactable_0.3.0    plotly_4.10.1      data.table_1.14.6  RColorBrewer_1.1-3
##  [5] forcats_0.5.2      stringr_1.4.1      dplyr_1.0.10       purrr_0.3.5
##  [9] readr_2.1.3        tidyr_1.2.1        tibble_3.1.8       ggplot2_3.4.0
## [13] tidyverse_1.3.2    knitr_1.41
##
## loaded via a namespace (and not attached):
##  [1] httr_1.4.4          sass_0.4.4          jsonlite_1.8.3
##  [4] viridisLite_0.4.1   R.utils_2.12.2      modelr_0.1.10
##  [7] bslib_0.4.1         assertthat_0.2.1    googlesheets4_1.0.1
## [10] cellranger_1.1.0    yaml_2.3.6          pillar_1.8.1
## [13] backports_1.4.1     glue_1.6.2          digest_0.6.30
## [16] rvest_1.0.3         colorspace_2.0-3    htmltools_0.5.3
## [19] R.oo_1.25.0         plyr_1.8.8          reactR_0.4.4
## [22] pkgconfig_2.0.3     broom_1.0.1         haven_2.5.1
## [25] scales_1.2.1        tzdb_0.3.0          googledrive_2.0.0
## [28] generics_0.1.3      farver_2.1.1        ellipsis_0.3.2
## [31] cachem_1.0.6        withr_2.5.0         lazyeval_0.2.2
## [34] cli_3.4.1           magrittr_2.0.3      crayon_1.5.2
## [37] readxl_1.4.1        evaluate_0.18       R.methodsS3_1.8.2
## [40] fs_1.5.2            fansi_1.0.3         xml2_1.3.3
## [43] tools_4.2.2         hms_1.1.2           gargle_1.2.1
## [46] lifecycle_1.0.3     munsell_0.5.0       reprex_2.0.2
## [49] compiler_4.2.2      jquerylib_0.1.4     rlang_1.0.6
## [52] grid_4.2.2          htmlwidgets_1.5.4   crosstalk_1.2.0
## [55] rmarkdown_2.18      gtable_0.3.1        DBI_1.1.3
## [58] reshape2_1.4.4      R6_2.5.1            lubridate_1.8.0
## [61] fastmap_1.1.0       utf8_1.2.2          stringi_1.7.8
## [64] Rcpp_1.0.9          vctrs_0.5.1         dbplyr_2.2.1
## [67] tidyselect_1.2.0    xfun_0.35