# C. elegans single cell gene expression

The wormcells-de app allows you to perform differential expression analysis on C. elegans single cell RNA sequencing (scRNAseq) data. In total, 21 experiments from 3 different studies were integrated and can be compared. That means you can select cells from two different experiments and perform differential expression on them!

Just select the cell types and experiments to compare, optionally choose some genes to highlight in your volcano plot, and enter your email address. You will receive an interactive volcano plot, a CSV file with the results, and a CSV file with the selected groups and genes. The linked example files show the results of the example submission below.

Results should arrive in less than 15 minutes. If they take more than an hour, something broke, so please let me know by writing to eduardo@wormbase.org. Also feel free to write to me if you have any feedback.

The single cell gene count matrices were processed using a machine learning framework called Single-cell Variational Inference (scVI). The scVI framework enables integrating data from different sources (different experiments, batches and technologies), clustering and label transfer, and performing differential expression between clusters. The code, data and a tutorial are available at the bottom of this page.

The wormcells-de app is still in development, and feedback on this tool will inform how WormBase may incorporate and display single cell data in the future.

### 3) Optional: choose genes to highlight

If you would like to highlight genes on the resulting volcano plot, add one gene per line.
Gene names (e.g. bus-1) and WormBase IDs (e.g. WBGene00018223) are both accepted.
Partial matches are also highlighted: for example, entering just bus highlights genes such as bus-1 and bus-18.
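
The partial matching described above could work roughly like the minimal sketch below. It assumes a case-insensitive substring match against both gene names and WormBase IDs; the function name `genes_to_highlight` and the example gene lists are hypothetical, and the app's actual matching code may behave differently.

```python
def genes_to_highlight(user_text, gene_names, gene_ids):
    """Return the indices of genes matching any query line (hypothetical helper).

    user_text:  the raw text typed in the form, one gene query per line
    gene_names: gene names such as "bus-1"
    gene_ids:   WormBase IDs, in the same order as gene_names
    """
    # One query per line; ignore blank lines and surrounding whitespace
    queries = [line.strip().lower() for line in user_text.splitlines() if line.strip()]
    hits = []
    for i, (name, wbid) in enumerate(zip(gene_names, gene_ids)):
        name, wbid = name.lower(), wbid.lower()
        # Partial match: the query only needs to be a substring of the name or ID
        if any(q in name or q in wbid for q in queries):
            hits.append(i)
    return hits


# Example with made-up placeholder IDs (not real WormBase IDs)
names = ["bus-1", "bus-18", "unc-47", "eat-4"]
ids = ["ID_A", "ID_B", "ID_C", "ID_D"]
print(genes_to_highlight("bus\n", names, ids))  # [0, 1]
```

With plain substring matching, a query of bus-1 would also highlight bus-18, which is the trade-off of accepting partial names.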

## Select cells for group 1

Hold Shift to select a range of cells.

## Select cells for group 2

Hold Shift to select a range of cells.

The 21 experiments come from 3 different studies and are described below. To perform a comparison, you must choose two groups. Each group can have any number of pairs (cell_type, experiment) to be compared.
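
As a minimal sketch of what a group means here, assume each cell carries a `cell_type` and an `experiment` label (the field names, the helper `select_group`, and the toy labels below are hypothetical, not the app's actual data model). A group is then the union of all cells matching any of its pairs:

```python
def select_group(cells, pairs):
    """Return indices of cells whose (cell_type, experiment) pair is in `pairs`."""
    wanted = set(pairs)  # set lookup; duplicate pairs are harmless
    return [
        i
        for i, cell in enumerate(cells)
        if (cell["cell_type"], cell["experiment"]) in wanted
    ]


# Toy cells with hypothetical cell type labels
cells = [
    {"cell_type": "type_A", "experiment": "eat-4"},
    {"cell_type": "type_A", "experiment": "Pan"},
    {"cell_type": "type_B", "experiment": "Pan"},
    {"cell_type": "type_B", "experiment": "L2_experiment_1"},
]
group_1 = select_group(cells, [("type_A", "eat-4"), ("type_A", "Pan")])
group_2 = select_group(cells, [("type_B", "Pan"), ("type_B", "L2_experiment_1")])
print(group_1, group_2)  # [0, 1] [2, 3]
```

Because a group is a union over pairs, one cell type can be pooled across several experiments (as in `group_1`), or cells from different studies can be pooled into the same group (as in `group_2`).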

A full table of how many cells of each type were seen in each experiment can be accessed here. Note that experiments sometimes used slightly different labels for the same cell type, which is why some cell types may appear more than once. The table uses the original study annotations; in the future we might re-annotate the data with harmonized labels.
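
A count table like the one linked above can be produced by tallying cells per (cell type, experiment) pair. The sketch below uses only the Python standard library and assumes the same hypothetical `cell_type`/`experiment` fields; the labels are made up to show why slightly different spellings of one cell type end up as separate rows.

```python
from collections import Counter


def count_table(cells):
    """Nested dict {cell_type: {experiment: n_cells}} from per-cell labels."""
    pair_counts = Counter((c["cell_type"], c["experiment"]) for c in cells)
    table = {}
    for (cell_type, experiment), n in sorted(pair_counts.items()):
        table.setdefault(cell_type, {})[experiment] = n
    return table


# Hypothetical labels: "Type A" and "type_A" stand for the same cell type
# annotated differently by two studies, so they are counted separately.
cells = [
    {"cell_type": "type_A", "experiment": "Pan"},
    {"cell_type": "type_A", "experiment": "Pan"},
    {"cell_type": "Type A", "experiment": "L2_experiment_1"},
    {"cell_type": "type_B", "experiment": "Pan"},
]
print(count_table(cells))
```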

## Cao 2017 L2 Larva Dataset (reprocessed)

Cao and friends. Comprehensive single-cell transcriptional profiling of a multicellular organism. Science 2017.

This was the first C. elegans scRNAseq dataset to be published. The single cell matrices used here are a newer version, reprocessed and re-annotated, kindly provided by Robert Waterston and colleagues.

The technology used was sci-RNA-seq. Two experiments were performed on C. elegans at the L2 larval stage; their labels and cell counts are listed below. Their annotations define 117 cell types.

| Experiment | Cells |
|---|---|
| L2_experiment_1 | 35480 |
| L2_experiment_2 | 507 |


## Packer 2019 Embryogenesis Dataset

Packer and friends. A lineage-resolved molecular atlas of C. elegans embryogenesis at single-cell resolution. Science 2019.

The paper's supplementary materials describe how the age of the embryos was defined:

Around 250,000 L1 larvae were plated onto four 100 mm petri plates seeded with NA22 bacteria and allowed to develop at 20 °C. As the worms reached the young adult stage, the population was closely monitored. When about 20-30% of the adults had a single embryo in either arm of the gonad, worms were subjected to hypochlorite treatment. The time hypochlorite was added to the worms was considered t = 0.

They used 10x Genomics v2 chemistry to profile 89,701 cells, split as below. Their annotations define 183 cell types.

| Experiment | Cells | Description |
|---|---|---|
| Waterston_400_minutes | 25875 | 400 minute synchronized embryos |
| Waterston_300_minutes | 17168 | 300 minute synchronized embryos |
| Murray_b01 | 12129 | Mixed time points; the paper's description of this sample is unclear |
| Waterston_500_minutes_batch_2 | 11589 | 500 minute synchronized embryos, replicate 1 |
| Waterston_500_minutes_batch_1 | 10532 | 500 minute synchronized embryos, replicate 2 |
| Murray_r17 | 9363 | Mixed time points; the paper's description of this sample is unclear |
| Murray_b02 | 3045 | Mixed time points; the paper's description of this sample is unclear |


## Taylor 2019 Neuron Dataset

Taylor and friends. Expression profiling of the mature C. elegans nervous system by single-cell RNA-Sequencing. bioRxiv 2019.

This is the first data release of the C. elegans Neuronal Gene Expression Map & Network (CeNGEN). The aim of the project, as described in its announcement publication, is to establish a comprehensive gene expression atlas of an entire nervous system at single-neuron resolution. Their website is cengen.org.

As described on their website: We are performing 10x single-cell RNA-Seq on FACS-isolated neurons. Using 52,412 sequenced cells, 109/118 neuronal classes have been identified and computationally assigned to a cluster based on their gene expression fingerprint (93 confidently, 16 tentatively), and 9/118 classes have not been annotated yet.

They used 10x Genomics v2 chemistry to profile 65,450 cells from L4 stage larvae, split as below. Their annotations define 133 cell types.

| Experiment | Cells | Description |
|---|---|---|
| eat-4 | 12743 | Pan-glutamatergic neurons from strain OH9625 |
| acr-2 | 11719 | Cholinergic motor neurons from strain CZ631 |
| Pan | 9216 | Neurons from pan-neural marker strain otIs355 |
| unc-3 | 6165 | Strain OH11746 |
| tph-1_ceh-10 | 4810 | Strain NC3580 and NC3580 |
| ift-20 | 4056 | Pan-sensory neurons from strain OH11157 |
| cho-1_1 | 3849 | Pan-cholinergic 1 from strain OH13470 |
| cho-1_2 | 3471 | Pan-cholinergic 2 from strain OH13470 |
| unc-47_2 | 3123 | Pan-GABA 2 from strains EG1285 and NC3582 |
| ceh-34 | 2648 | Strains RW10754 and NC3583 |
| nmr-1 | 2389 | Strains VM484 and NC3572 |
| unc-47_1 | 1261 | Pan-GABA 1 from strains EG1285 and NC3582 |


# Data and code availability

To perform differential expression we use scVI v0.6.3, with the "change" option introduced in scVI v0.6.0 and described in Boyeau et al., bioRxiv 2019. This method estimates an effect size random variable (here, the log2 fold-change) and performs Bayesian hypothesis testing on this variable.
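
The idea of Bayesian testing on the log2 fold-change can be illustrated with a simplified Monte Carlo sketch for a single gene. This is not scVI's API or implementation: it assumes you already have paired posterior draws of normalized expression for the gene in each group, and the helper name `lfc_bayes_test` and the threshold `delta=0.5` are illustrative assumptions. The returned log posterior odds equal the log Bayes factor only when the prior odds of the two hypotheses are 1.

```python
import math
import random


def lfc_bayes_test(h1_draws, h2_draws, delta=0.5, eps=1e-8):
    """Sketch of LFC-based Bayesian DE testing for one gene (hypothetical helper).

    h1_draws, h2_draws: paired posterior draws of normalized expression (> 0)
                        for the gene in group 1 and group 2
    delta:              minimum |log2 fold-change| treated as a real change
                        (0.5 is an assumed value)
    Returns (mean_lfc, proba_change, ln_odds).
    """
    if not h1_draws or len(h1_draws) != len(h2_draws):
        raise ValueError("need two non-empty lists of paired draws")
    # Effect size random variable: one log2 fold-change per paired draw
    lfc = [math.log2(a) - math.log2(b) for a, b in zip(h1_draws, h2_draws)]
    # Posterior probability that the effect size exceeds the threshold
    proba_change = sum(abs(x) >= delta for x in lfc) / len(lfc)
    # Natural-log posterior odds of "changed" vs "not changed"
    ln_odds = math.log(proba_change + eps) - math.log(1 - proba_change + eps)
    return sum(lfc) / len(lfc), proba_change, ln_odds


# Simulated draws: the gene is ~4x higher in group 1 (true log2 fold-change = 2)
rng = random.Random(0)
h1 = [rng.lognormvariate(math.log(4.0), 0.2) for _ in range(1000)]
h2 = [rng.lognormvariate(math.log(1.0), 0.2) for _ in range(1000)]
mean_lfc, p, ln_bf = lfc_bayes_test(h1, h2)
print(round(mean_lfc, 2), p, round(ln_bf, 1))
```

In scVI the draws come from the trained model's posterior rather than from a simulation; the sketch only conveys the averaging and thresholding of the effect size.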

A Python tutorial using Jupyter Notebooks on how to reproduce this analysis using the Packer 2019 dataset is available in the official scVI documentation. The code used to run the wormcells-de app is available in the wormcells-de GitHub repository. The data for Packer 2019, Taylor 2019 and Cao 2017 are available on GitHub as a single AnnData file (1 GB).

### Interpreting Bayes factors

In addition to p-values, which are commonly shown in volcano plots, scVI also provides Bayes factors.

To learn more about Bayes factors vs. p-values, see the review On p-Values and Bayes Factors by Leonhard Held and Manuela Ott.

For a shorter overview, see this blog post. A common interpretation table is copied below.
In our notation (for gene $$g$$), $$BF_{10}$$ is $$BF^g_{12}$$, $$H_0$$ is $$M^g_1$$, and $$H_1$$ is $$M^g_2$$.

| Bayes factor $$BF_{10}$$ | $$\ln(BF_{10})$$ | Interpretation |
|---|---|---|
| > 100 | > 4.6 | Extreme evidence for H1 |
| 30 – 100 | (3.4, 4.6) | Very strong evidence for H1 |
| 10 – 30 | (2.3, 3.4) | Strong evidence for H1 |
| 3 – 10 | (1.1, 2.3) | Moderate evidence for H1 |
| 1 – 3 | (0, 1.1) | Anecdotal evidence for H1 |
| 1 | 0 | No evidence |
| 1/3 – 1 | (-1.1, 0) | Anecdotal evidence for H0 |
| 1/10 – 1/3 | (-2.3, -1.1) | Moderate evidence for H0 |
| 1/30 – 1/10 | (-3.4, -2.3) | Strong evidence for H0 |
| 1/100 – 1/30 | (-4.6, -3.4) | Very strong evidence for H0 |
| < 1/100 | < -4.6 | Extreme evidence for H0 |
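
To apply the interpretation table programmatically, a small helper can map a log Bayes factor to its label. This sketch assumes the Bayes factor is on the natural-log scale, as in the $$\ln(BF_{10})$$ column; the function name `interpret_ln_bf` is hypothetical, and values falling exactly on a boundary are assigned to the weaker category, a choice the table leaves open.

```python
import math

# (threshold on |ln BF|, label), strongest first: ln(100), ln(30), ln(10), ln(3)
STRENGTHS = [
    (math.log(100), "Extreme"),
    (math.log(30), "Very strong"),
    (math.log(10), "Strong"),
    (math.log(3), "Moderate"),
]


def interpret_ln_bf(ln_bf):
    """Return the evidence label for ln(BF_10), following the interpretation table."""
    if ln_bf == 0:
        return "No evidence"
    favored = "H1" if ln_bf > 0 else "H0"
    # The scale is symmetric: BF_10 = 1/30 is as strong for H0 as 30 is for H1
    for threshold, label in STRENGTHS:
        if abs(ln_bf) > threshold:
            return f"{label} evidence for {favored}"
    return f"Anecdotal evidence for {favored}"


print(interpret_ln_bf(5.0))   # Extreme evidence for H1
print(interpret_ln_bf(-2.0))  # Moderate evidence for H0
```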