NanoString Technologies Inc.

10/03/2024 | Press release | Distributed by Public on 10/03/2024 22:32

Tips when performing CosMx™ data analysis with AtoMx SIP

Single-cell spatial biology opens doors to unprecedented scientific questions requiring the development of new computational workflows and approaches to analyze and interpret data. Whether you are new to single-cell spatial transcriptomics data analysis or are an experienced computational biologist, the AtoMx platform provides avenues to explore, compute, iterate, and export CosMx SMI data.

In this post, we walk through a standard single-cell spatial transcriptomic cell typing pipeline in AtoMx (Figure 1), which is typically a prerequisite for downstream spatial analyses.

NOTE: Recommendations in this guide are based on AtoMx v1.3.2.

Tip 1: Run Quality Control (QC)

The QC module in AtoMx is currently intended for flagging potentially lower-quality cells so that they can be removed after data export. Cell QC and FOV QC are both recommended for every dataset; additional methods can be explored as needed.

Cell QC - flags cells with specific characteristics.

  • Minimum counts per cell: 20 (for 1K RNA) and 50 (for 6K RNA); minimum counts per cell will be higher for higher plex assays and ultimately varies by dataset.
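As a rough illustration of what this flag does, the sketch below marks cells whose total counts fall below the panel's minimum. The function name, the panel keys, and the choice to treat a cell sitting exactly at the minimum as passing are assumptions made for this example; this is not AtoMx's internal implementation.

```python
# Minimum counts per cell from this guide; real cutoffs vary by dataset.
# The "1K" / "6K" keys are hypothetical labels for this example.
MIN_COUNTS_BY_PANEL = {"1K": 20, "6K": 50}


def flag_low_count_cells(counts_per_cell, panel):
    """Return {cell_id: True} for cells below the panel's minimum counts.

    Assumption: a cell exactly at the minimum is NOT flagged.
    """
    cutoff = MIN_COUNTS_BY_PANEL[panel]
    return {cell_id: total < cutoff for cell_id, total in counts_per_cell.items()}


# Toy data: total transcript counts per cell (hypothetical cell IDs).
cells = {"c1": 12, "c2": 20, "c3": 340}
flags = flag_low_count_cells(cells, "1K")

# Flags only mark cells; dropping them is a separate, later step.
kept = [cell_id for cell_id, flagged in flags.items() if not flagged]
```

Keeping flagging and removal as separate steps mirrors the behavior described in this guide, where flagged cells stay in the data until they are filtered after export.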

FOV QC - flags outlier FOVs that have overall low average expression or inconsistent clustering compared to neighboring FOVs.

  • Method: mean
  • FOV Count Cutoff: 100 (varies based on assay and dataset); the cutoff will be higher for higher-plex assays. Outlier FOVs may also need to be detected through downstream analyses such as UMAP.

Note: other methods/parameters can be explored as shown in Figure 2, but they are optional and dependent on the dataset.

Signal (counts per cell) is the result of multiple factors, including tissue, image quality, and cell segmentation (see CosMx 1000-plex RNA Assays: Considerations When Generating Single-Cell Spatial Data). For FOV QC, we flag outliers that stand out with low average expression or disparate cell typing/spatial clustering compared to neighboring FOVs of similar biology.
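The low-expression part of FOV QC can be pictured with the minimal sketch below, which assumes the "mean" method compares the average total counts per cell in each FOV against the count cutoff. That interpretation, the function name, and the input layout are assumptions for illustration; the sketch does not cover the comparison of clustering against neighboring FOVs.

```python
from collections import defaultdict
from statistics import mean


def flag_low_signal_fovs(cells, cutoff=100):
    """Flag FOVs whose mean total counts per cell fall below `cutoff`.

    cells: iterable of (fov_id, total_counts) pairs, one pair per cell.
    Returns {fov_id: flagged}. Hypothetical helper, not an AtoMx API.
    """
    per_fov = defaultdict(list)
    for fov_id, total in cells:
        per_fov[fov_id].append(total)
    return {fov_id: mean(totals) < cutoff for fov_id, totals in per_fov.items()}


# Toy data: FOV 2 has a much lower average signal than FOVs 1 and 3.
toy_cells = [(1, 250), (1, 310), (2, 40), (2, 90), (3, 120), (3, 100)]
fov_flags = flag_low_signal_fovs(toy_cells)
```

In this toy example only FOV 2 (mean of 65 counts per cell) is flagged; a real cutoff should be tuned to the assay and tissue.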

Note: the quality control (QC) module in AtoMx flags data but does not remove it; the flags are preserved in the data, and flagged cells or FOVs can be removed after data export.

What is the purpose of negative probes?

Every CosMx RNA panel contains a series of negative probes that target no known RNA transcripts. The average of the negative probe count helps assess background noise. Compare total transcripts (counts) per cell against the average of negative probes to evaluate relative sample performance across tissues in a study.
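One way to make this comparison concrete is sketched below: for each sample, compute the mean total counts per cell and the mean per-probe negative count per cell, then look at the two side by side. The field names and the input layout are hypothetical and chosen only for this example.

```python
from statistics import mean


def summarize_sample(cells):
    """Summarize signal and background for one sample.

    cells: list of dicts with (hypothetical) keys
      'total_counts'    - total transcript counts in the cell
      'negprobe_counts' - list of counts, one per negative probe
    """
    mean_total = mean(c["total_counts"] for c in cells)
    # Average over probes within each cell, then over cells.
    mean_neg = mean(mean(c["negprobe_counts"]) for c in cells)
    return {"mean_total": mean_total, "mean_negprobe": mean_neg}


# Toy sample with two cells and four negative probes.
tissue_a = [
    {"total_counts": 200, "negprobe_counts": [0, 1, 0, 1]},
    {"total_counts": 400, "negprobe_counts": [1, 1, 0, 0]},
]
summary = summarize_sample(tissue_a)
```

Running the same summary on every tissue in a study gives a simple table for comparing relative sample performance.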

Tip 2: Apply the appropriate normalization

CosMx RNA normalization adjusts for cell-specific total transcript abundance and the distribution of counts (which may vary between FOVs and between samples) to minimize their influence on downstream visualization and data analyses.

AtoMx includes three normalization methods for RNA assays:

  • Total Counts normalization (recommended) - a global-scaling normalization method that normalizes gene counts for each cell by the total expression of each cell.
  • Seurat normalization - total counts normalization that is multiplied by a scale factor (10,000) and natural-log transformed after adding 1.
  • Pearson Residuals normalization - based on the estimated mean and variance: (raw gene count in a cell - mean gene count in the cell) / (SD of gene counts in the cell).

A reasonable assumption is that a cell's detection efficiency can be estimated by its total counts; one normalization approach is therefore to scale each cell's expression profile by its total counts.
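The first two methods listed above can be written in a few lines. The sketch below follows the descriptions in this post (divide by the cell's total; then optionally multiply by 10,000 and take the natural log after adding 1). The function names and gene symbols are illustrative only, and AtoMx's own implementation may differ in details such as how empty cells are handled.

```python
import math


def total_counts_normalize(cell_counts):
    """Divide each gene count by the cell's total counts."""
    total = sum(cell_counts.values())
    if total == 0:
        # Assumption: an empty cell maps to all zeros rather than an error.
        return {gene: 0.0 for gene in cell_counts}
    return {gene: count / total for gene, count in cell_counts.items()}


def seurat_style_normalize(cell_counts, scale_factor=10_000):
    """Total counts normalization, scaled, then ln(1 + x)."""
    scaled = total_counts_normalize(cell_counts)
    return {gene: math.log1p(value * scale_factor) for gene, value in scaled.items()}


# Toy cell with three genes (illustrative gene symbols).
cell = {"CD3E": 3, "MS4A1": 0, "EPCAM": 7}
tc = total_counts_normalize(cell)
ln = seurat_style_normalize(cell)
```

After total counts normalization each cell's values sum to 1, so cells with different detection efficiency become comparable; the log transform additionally compresses the range of high-count genes.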

When is Seurat or Pearson Residuals normalization preferred over Total Counts normalization?

Seurat normalization can be used to optimize the UMAP or run Leiden clustering. Seurat normalization may offer more flexible visualization capabilities, especially for studies with high-count genes (e.g., high-expressing housekeeping genes).

When working with smaller datasets (≤ 500,000 total cells per study), Pearson Residuals normalization can outperform other normalization methods, especially when identifying biologically variable genes. In larger datasets (> 500,000 total cells per study), the Pearson residuals transformation is not as efficient computationally; in these cases, we typically recommend a total counts normalization.
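This post does not spell out how AtoMx estimates the mean and variance for Pearson residuals. As an illustration only, the sketch below uses one widely used formulation (analytic Pearson residuals under a negative binomial model), in which the expected count for a gene in a cell is derived from the cell total and the gene total. The overdispersion value theta = 100 and the clipping at the square root of the number of cells are assumptions borrowed from common practice, not confirmed AtoMx settings.

```python
import math


def pearson_residuals(counts, theta=100.0):
    """Analytic Pearson residuals for a cells x genes count matrix.

    counts: list of rows (one per cell), each a list of gene counts.
    theta:  assumed negative binomial overdispersion parameter.
    """
    n_cells = len(counts)
    cell_totals = [sum(row) for row in counts]
    gene_totals = [sum(col) for col in zip(*counts)]
    grand_total = sum(cell_totals)
    clip = math.sqrt(n_cells)  # assumed clipping bound

    residuals = []
    for row, n_c in zip(counts, cell_totals):
        out = []
        for x, g in zip(row, gene_totals):
            mu = n_c * g / grand_total           # expected count
            sd = math.sqrt(mu + mu * mu / theta)  # model standard deviation
            r = (x - mu) / sd if sd > 0 else 0.0
            out.append(max(-clip, min(clip, r)))
        residuals.append(out)
    return residuals


# Two cells with opposite expression patterns, and a uniform control.
res = pearson_residuals([[10, 0], [0, 10]])
flat = pearson_residuals([[5, 5], [5, 5]])
```

Each residual needs the cell total, the gene total, and a dense cells-by-genes output, which gives some intuition for why this transformation scales less gracefully than a simple per-cell division on very large studies.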

Note: For differential expression and many advanced analysis modules, unnormalized data is utilized; in these cases, however, total counts is still an input variable in the algorithms.