ToppCell: A Hierarchical Modular Single Cell Gene Expression Analysis System

Introduction

ToppCell (https://toppcell.cchmc.org/) is a web portal that lets biologists and bioinformaticians explore single-cell RNA-seq data quickly. It uses a modular approach to curate gene signatures of cells by lineage, cell class, subclass, clinical information and other annotations. This enables the generation of multi-layered cell atlases that can be used to investigate the roles of ensembles of cell types, genes, pathways and procedures.

ToppCell does not introduce new computational algorithms for clustering or trajectory inference. Instead, it is an interactive toolkit that lets researchers view and explore gene signatures in a hierarchically organized way. Seamless connections to ToppGene and ToppCluster enable quick pathway enrichment and comparative analysis of single-cell data. Using ToppCell, we can quickly build an atlas of signatures for a single-cell study.

Scope

We have currently curated more than 45 public single-cell studies covering various tissues from human and mouse, as well as in vitro cell lines. They are organized into the following atlases:

  • COVID-19 Atlas
  • Lung Map
  • Immune Map
  • Brain Map
  • Mouse Atlas
  • OncoMap
  • Cardiovascular Atlas
  • GI MAP

[Figure: Icon Field]

We are still adding data to build a comprehensive signature database for human cells. Please contact us if you are interested in contributing.

Tutorial

Upload data and generate gene modules

Gene module generation requires three uploaded files: an expression table, a cell annotation table and a shred structure.

Expression tables can be text files (csv, tsv or txt) containing raw counts or normalized expression levels. Raw counts are normalized first (we usually use Log2(CPM+1) normalization) and then used for gene module generation.
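To make the Log2(CPM+1) step concrete, here is a minimal sketch in plain Python. It assumes a genes-by-cells table held as a dictionary; the function name `log2_cpm` and the toy counts are illustrative only and are not part of ToppCell.

```python
import math

def log2_cpm(counts):
    """Normalize raw counts to log2(CPM + 1).

    `counts` maps a gene symbol to a list of raw counts, one entry per cell.
    """
    n_cells = len(next(iter(counts.values())))
    # Library size: total counts observed in each cell (column sums).
    totals = [sum(row[j] for row in counts.values()) for j in range(n_cells)]
    normalized = {}
    for gene, row in counts.items():
        normalized[gene] = [
            # Counts per million, then log2 with a pseudocount of 1.
            math.log2(c / t * 1e6 + 1) if t > 0 else 0.0
            for c, t in zip(row, totals)
        ]
    return normalized

# Toy table with made-up counts: three genes, two cells.
raw = {"ACE2": [0, 2], "CD14": [10, 0], "ACTB": [90, 98]}
norm = log2_cpm(raw)
```

A zero count stays at zero after normalization, because the pseudocount makes log2(0 + 1) = 0.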

The cell annotation table should contain the metadata of each barcode, such as cluster, cell class, subclass and disease condition. Scanpy AnnData and Seurat objects will be supported as input in the near future.
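As an illustration of what such a table might look like, the sketch below reads a small csv annotation file and checks it before use. The column names (`barcode`, `cluster`, `cell_class`, `subclass`, `condition`) and the example rows are assumptions made for this example; ToppCell may expect different headers.

```python
import csv
import io

# Hypothetical column names; adjust to the headers actually used in your study.
REQUIRED = ["barcode", "cluster", "cell_class", "subclass", "condition"]

def load_annotation(handle, required=REQUIRED):
    """Read a csv annotation table into a dict keyed by barcode."""
    reader = csv.DictReader(handle)
    missing = [c for c in required if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    table = {}
    for row in reader:
        # Each barcode must be annotated exactly once.
        if row["barcode"] in table:
            raise ValueError(f"duplicate barcode: {row['barcode']}")
        table[row["barcode"]] = row
    return table

# Made-up rows, for illustration only.
example = io.StringIO(
    "barcode,cluster,cell_class,subclass,condition\n"
    "AAAC-1,3,Myeloid,CD14 Monocyte,COVID-19\n"
    "AAAG-1,7,Lymphoid,CD4 T,Healthy\n"
)
annotation = load_annotation(example)
```

Checking for missing columns and duplicate barcodes up front catches the most common mismatches between the annotation and expression tables.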

The shred structure defines how cells are compared and within what scope. The example below shows how we get gene modules of cell types in COVID-19 patients and healthy donors.

[Figure: workflow]
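One way to think about a shred structure is as a pair of choices: which annotation fields define the groups of cells, and which fields define the scope in which each group is compared with the remaining cells. The sketch below is a hypothetical representation of that idea, not ToppCell's actual file format; `plan_comparisons`, `group_by`, `within` and the toy cells are all assumptions.

```python
from collections import defaultdict

def plan_comparisons(cells, group_by, within=()):
    """For each group, list its cells and the background cells it is compared against.

    `group_by` names the fields that define a group; `within` names the fields
    that restrict the background to cells sharing the same scope.
    """
    scopes = defaultdict(list)
    for cell in cells:
        scopes[tuple(cell[f] for f in within)].append(cell)
    plan = {}
    for scope_key, members in scopes.items():
        groups = defaultdict(list)
        for cell in members:
            groups[tuple(cell[f] for f in group_by)].append(cell["barcode"])
        for group_key, barcodes in groups.items():
            in_group = set(barcodes)
            # Background: every other cell inside the same scope.
            background = [c["barcode"] for c in members if c["barcode"] not in in_group]
            plan[scope_key + group_key] = (barcodes, background)
    return plan

# Made-up cells, for illustration only.
cells = [
    {"barcode": "c1", "condition": "COVID-19", "cell_type": "Monocyte"},
    {"barcode": "c2", "condition": "COVID-19", "cell_type": "T cell"},
    {"barcode": "c3", "condition": "Healthy", "cell_type": "Monocyte"},
    {"barcode": "c4", "condition": "Healthy", "cell_type": "T cell"},
]

# Cell types compared within each condition.
local_plan = plan_comparisons(cells, group_by=("cell_type",), within=("condition",))
# Condition-specific cell types compared against all other cells.
global_plan = plan_comparisons(cells, group_by=("condition", "cell_type"))
```

The two plans differ only in scope: in `local_plan` COVID-19 monocytes are compared with other COVID-19 cells, while in `global_plan` they are compared with every other cell in the study.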

Currently, uploads can only be performed by the ToppCell team. We are working on an interactive upload interface, and Python and R packages are planned for the near future.

Exploring gene modules of a single study

Once the gene modules have been generated, we can start to explore the information in our single-cell study. Here we show the gene modules of COVID-19 PBMC single-cell data (Wilk et al., 2020). At the top of the page, the data title and study metadata give a brief description of the single-cell study. Below them is the shred structure, which defines the comparison strategy. Five downloadable files are listed in the middle: the original, binned and superbinned expression tables, the cell annotation file and the gene module report file. Further down are the hierarchically organized gene modules for all cell types, with their preview heatmaps. Pre-calculated ToppGene enrichment results are shown on the right.

[Figure: Annotated Shred]

Functional comparative analysis

Apart from ToppGene enrichment of individual gene modules, we can also run a functional comparative analysis on a set of gene modules using ToppCluster enrichment. In the figure below, we choose the gene modules for monocytes in ventilated COVID-19 patients and in healthy donors, and compare their functional enrichments side by side in ToppCluster.

[Figure: comparative analysis]

Cell-cell interaction inference

Cell interaction networks are an important part of cell atlases. They can also be inferred in ToppCluster, using the same procedure as above. The only difference is that we choose “Interaction Network” instead of “Functional Enrichment” on the ToppCluster page.

[Figure: interaction network in ToppCluster]

Gene query

ToppCell provides a single-gene query function. We can enter a gene symbol and get the gene modules in which the queried gene is highly ranked. For example, we query for ACE2, the SARS-CoV-2 entry receptor, in the search box. The search results from the current curation are shown below.

[Figures: ACE2 gene query results]
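Conceptually, a gene query of this kind scans every gene module and reports those in which the gene sits near the top of the ranking. The sketch below illustrates that idea under simple assumptions: each module is a list of genes ordered from highest to lowest rank, and "highly ranked" means within the first `top_n` entries. The function `query_gene`, the `top_n` cutoff and the placeholder modules are hypothetical and do not describe ToppCell's internal implementation.

```python
def query_gene(modules, gene, top_n=50):
    """Return (module name, rank) pairs for modules where `gene` is in the top `top_n`.

    `modules` maps a module name to its genes, ordered from highest to lowest rank.
    """
    hits = []
    for name, ranked_genes in modules.items():
        if gene in ranked_genes:
            rank = ranked_genes.index(gene) + 1  # 1-based rank
            if rank <= top_n:
                hits.append((name, rank))
    # Best-ranked modules first.
    return sorted(hits, key=lambda hit: hit[1])

# Placeholder modules with made-up contents, for illustration only.
modules = {
    "module_A": ["GENE1", "ACE2", "GENE3"],
    "module_B": ["ACE2", "GENE4"],
    "module_C": ["GENE5", "GENE6"],
    "module_D": ["GENE1", "GENE3", "GENE4", "ACE2"],
}
hits = query_gene(modules, "ACE2", top_n=3)
```

With a cutoff of 3, `module_D` is left out because ACE2 is only ranked fourth there, and `module_C` is left out because it does not contain the gene.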

Integrate multiple studies

We recommend integrating multiple single-cell studies when preparing the input expression tables. Alternatively, multiple superbinned expression files can be merged in Morpheus, and the gene modules from those studies appended afterwards.
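For readers who want to see what such a merge amounts to, here is a minimal sketch of combining gene-by-bin tables from several studies outside Morpheus. It assumes each table maps a gene symbol to its per-bin values and that only genes shared by all studies are kept. Both the data layout and the shared-genes rule are assumptions made for this example, not a description of what Morpheus does.

```python
def merge_superbinned(studies):
    """Merge per-study gene-by-bin tables column-wise on shared gene symbols.

    `studies` maps a study name to a table of the form {gene: {bin_name: value}}.
    Bin names are prefixed with the study name to keep columns unique.
    """
    # Assumption: keep only genes present in every study.
    shared = set.intersection(*(set(table) for table in studies.values()))
    merged = {gene: {} for gene in sorted(shared)}
    for study, table in studies.items():
        for gene in merged:
            for bin_name, value in table[gene].items():
                merged[gene][f"{study}:{bin_name}"] = value
    return merged

# Made-up values, for illustration only.
studies = {
    "study_1": {"ACE2": {"bin_a": 0.5}, "CD14": {"bin_a": 3.2}},
    "study_2": {"ACE2": {"bin_x": 1.1}, "ACTB": {"bin_x": 9.0}},
}
merged = merge_superbinned(studies)
```

Keeping only shared genes avoids missing values in the merged table; keeping all genes and filling the gaps is an equally reasonable choice, depending on the downstream analysis.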

Contact Us

If you want to contribute to ToppCell, upload your data or ask a question, feel free to contact us: Kang.Jin@cchmc.org; Eric.Bardes@cchmc.org