Tutorial 9: 10x Visium mouse brain data from Squidpy package

In this tutorial, we demonstrate how to enhance and denoise the gene expression patterns of mouse brain data using STGMVA.

You can install Squidpy package and load this data via “adata = sq.datasets.visium_hne_adata()”

The original mouse brain is available at 10x Genomics website: https://support.10xgenomics.com/spatial-gene-expression/datasets/1.1.0/V1_Adult_Mouse_Brain

This data has been manually annotated in the Squidpy package.

Loading package

[3]:
import os
import torch
import pandas as pd
import scanpy as sc
from sklearn import metrics
import multiprocessing as mp
import squidpy as sq
import numpy as np
import matplotlib.pyplot as plt
[2]:
from STGMVA.STGMVA import STGMMVE
from STGMVA import mk_dir
2023-07-15 14:20:09.973455: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT

Reading ST data and showing the manual annotation

[13]:
adata = sq.datasets.visium_hne_adata() # read data
[19]:
sq.pl.spatial_scatter(adata, color="cluster", size=40, figsize=(8,6),shape=None)
WARNING: Please specify a valid `library_id` or set it permanently in `adata.uns['spatial']`
/home/tengliu/miniconda3/envs/Torch_pyG2.0/lib/python3.9/site-packages/squidpy/pl/_spatial_utils.py:955: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap', 'norm' will be ignored
  _cax = scatter(
_images/Tutorial_6_mouse_brain_gene_imputation_7_2.svg
[7]:
adata
[7]:
AnnData object with n_obs × n_vars = 2688 × 18078
    obs: 'in_tissue', 'array_row', 'array_col', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'total_counts_mt', 'log1p_total_counts_mt', 'pct_counts_mt', 'n_counts', 'leiden', 'cluster'
    var: 'gene_ids', 'feature_types', 'genome', 'mt', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'n_cells', 'highly_variable', 'highly_variable_rank', 'means', 'variances', 'variances_norm'
    uns: 'cluster_colors', 'hvg', 'leiden', 'leiden_colors', 'neighbors', 'pca', 'rank_genes_groups', 'spatial', 'umap'
    obsm: 'X_pca', 'X_umap', 'spatial'
    varm: 'PCs'
    obsp: 'connectivities', 'distances'

Create directory for pretrained model

[8]:
save_path='./Imputation_results/'
section_id = "mouse_brain"
mk_dir(save_path,section_id)

Training the model

STGMVA aims to learn the representations by two-step process. First, pretrained the GMM clustering model. Second, discerned the spatial domains for spatial transcriptomics data.

After model training, the learned representations will be saved in adata.obsm[‘embedding’], and can be used for spatial clustering.

[9]:
model = STGMMVE(adata, imputation=True,datatype = "Mouse", nCluster=15,save_path=save_path, section_id=section_id)

# model.pretrain() # Train your own pretranined model or use the pretrained model we provided.

adata_res = model.train_cluster()

/home/tengliu/miniconda3/envs/Torch_pyG2.0/lib/python3.9/site-packages/scanpy/preprocessing/_highly_variable_genes.py:62: UserWarning: `flavor='seurat_v3'` expects raw count data, but non-integers were found.
  warnings.warn(
Graph constructed!
  0%|          | 0/50 [00:00<?, ?it/s]R[write to console]:                    __           __
   ____ ___  _____/ /_  _______/ /_
  / __ `__ \/ ___/ / / / / ___/ __/
 / / / / / / /__/ / /_/ (__  ) /_
/_/ /_/ /_/\___/_/\__,_/____/\__/   version 5.4.10
Type 'citation("mclust")' for citing this R package in publications.

  2%|▏         | 1/50 [00:06<05:17,  6.49s/it]
Epoch: 0
NMI=0.643384, ARI=0.461683
Loss=12.1072,  ELBO Loss=60.4636
 12%|█▏        | 6/50 [00:40<05:10,  7.06s/it]
Epoch: 5
NMI=0.627292, ARI=0.439771
Loss=12.9829,  ELBO Loss=36.5022
 22%|██▏       | 11/50 [01:16<04:20,  6.69s/it]
Epoch: 10
NMI=0.647456, ARI=0.473869
Loss=12.9240,  ELBO Loss=29.2507
 32%|███▏      | 16/50 [01:47<03:25,  6.03s/it]
Epoch: 15
NMI=0.647690, ARI=0.458287
Loss=12.7701,  ELBO Loss=26.6341
 42%|████▏     | 21/50 [02:20<03:08,  6.52s/it]
Epoch: 20
NMI=0.637479, ARI=0.444379
Loss=12.6324,  ELBO Loss=25.4331
 52%|█████▏    | 26/50 [02:49<02:18,  5.75s/it]
Epoch: 25
NMI=0.647323, ARI=0.468131
Loss=12.5578,  ELBO Loss=24.8886
 62%|██████▏   | 31/50 [03:22<02:08,  6.74s/it]
Epoch: 30
NMI=0.650110, ARI=0.478146
Loss=12.4956,  ELBO Loss=24.4925
 72%|███████▏  | 36/50 [03:51<01:22,  5.92s/it]
Epoch: 35
NMI=0.643957, ARI=0.475068
Loss=12.4747,  ELBO Loss=24.2227
 82%|████████▏ | 41/50 [04:23<00:57,  6.42s/it]
Epoch: 40
NMI=0.638198, ARI=0.449252
Loss=12.4180,  ELBO Loss=23.9077
 92%|█████████▏| 46/50 [04:49<00:21,  5.29s/it]
Epoch: 45
NMI=0.649857, ARI=0.488727
Loss=12.4057,  ELBO Loss=23.7013
100%|██████████| 50/50 [05:11<00:00,  6.23s/it]
[10]:
adata_res
[10]:
AnnData object with n_obs × n_vars = 2688 × 2688
    obs: 'in_tissue', 'array_row', 'array_col', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'total_counts_mt', 'log1p_total_counts_mt', 'pct_counts_mt', 'n_counts', 'leiden', 'cluster', 'pre_label'
    var: 'gene_ids', 'feature_types', 'genome', 'mt', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'n_cells', 'highly_variable', 'highly_variable_rank', 'means', 'variances', 'variances_norm'
    uns: 'cluster_colors', 'hvg', 'leiden', 'leiden_colors', 'neighbors', 'pca', 'rank_genes_groups', 'spatial', 'umap', 'log1p', 'ari_list', 'loss'
    obsm: 'X_pca', 'X_umap', 'spatial', 'distance_matrix', 'graph_neigh', 'adj', 'feat_mat', 'embedding'
    varm: 'PCs'
    layers: 'Recons_features'
    obsp: 'connectivities', 'distances'

Visualization

We choose several marker genes to visualize the gene expression patterns in the spatial domain.

[12]:
plot_gene = '4732440D04Rik'
fig, axs = plt.subplots(1, 2, figsize=(8, 4),dpi=100)
sc.pl.spatial(adata_res, img_key="hires", color=plot_gene, show=False, ax=axs[0], title='RAW_'+plot_gene, vmax='p99')
sc.pl.spatial(adata_res, img_key="hires", color=plot_gene, show=False, ax=axs[1], title='STGMMVE_'+plot_gene, layer='Recons_features', vmax='p99')
[12]:
[<Axes: title={'center': 'STGMMVE_4732440D04Rik'}, xlabel='spatial1', ylabel='spatial2'>]
_images/Tutorial_6_mouse_brain_gene_imputation_17_1.svg
[10]:
plot_gene = 'Eya1'
fig, axs = plt.subplots(1, 2, figsize=(8, 4),dpi=100)
sc.pl.spatial(adata_res, img_key="hires", color=plot_gene, show=False, ax=axs[0], title='RAW_'+plot_gene, vmax='p99')
sc.pl.spatial(adata_res, img_key="hires", color=plot_gene, show=False, ax=axs[1], title='STGMMVE_'+plot_gene, layer='Recons_features', vmax='p99')
[10]:
[<Axes: title={'center': 'STGMMVE_Eya1'}, xlabel='spatial1', ylabel='spatial2'>]
_images/Tutorial_6_mouse_brain_gene_imputation_18_1.svg
[9]:
plot_gene = 'Tfap2b'
fig, axs = plt.subplots(1, 2, figsize=(8, 4),dpi=100)
sc.pl.spatial(adata_res, img_key="hires", color=plot_gene, show=False, ax=axs[0], title='RAW_'+plot_gene, vmax='p99')
sc.pl.spatial(adata_res, img_key="hires", color=plot_gene, show=False, ax=axs[1], title='STGMMVE_'+plot_gene, layer='Recons_features', vmax='p99')
[9]:
[<Axes: title={'center': 'STGMMVE_Tfap2b'}, xlabel='spatial1', ylabel='spatial2'>]
_images/Tutorial_6_mouse_brain_gene_imputation_19_1.svg
[8]:
plot_gene = 'Gm29107'
fig, axs = plt.subplots(1, 2, figsize=(8, 4),dpi=100)
sc.pl.spatial(adata_res, img_key="hires", color=plot_gene, show=False, ax=axs[0], title='RAW_'+plot_gene, vmax='p99')
sc.pl.spatial(adata_res, img_key="hires", color=plot_gene, show=False, ax=axs[1], title='STGMMVE_'+plot_gene, layer='Recons_features', vmax='p99')
[8]:
[<Axes: title={'center': 'STGMMVE_Gm29107'}, xlabel='spatial1', ylabel='spatial2'>]
_images/Tutorial_6_mouse_brain_gene_imputation_20_1.svg
[6]:
plot_gene = 'Col19a1'
fig, axs = plt.subplots(1, 2, figsize=(8, 4),dpi=100)
sc.pl.spatial(adata_res, img_key="hires", color=plot_gene, show=False, ax=axs[0], title='RAW_'+plot_gene, vmax='p99')
sc.pl.spatial(adata_res, img_key="hires", color=plot_gene, show=False, ax=axs[1], title='STGMMVE_'+plot_gene, layer='Recons_features', vmax='p99')
[6]:
[<Axes: title={'center': 'STGMMVE_Col19a1'}, xlabel='spatial1', ylabel='spatial2'>]
_images/Tutorial_6_mouse_brain_gene_imputation_21_1.svg

Save the results for more figures

[ ]:
adata_res.filename = './'+"mouse_brain_gene_imputation.h5ad"