Tutorial 10: Multi-sample integration task for mouse brain data

In this tutorial, we demonstrate how to integrate multiple samples of mouse brain data vertically using STGMVA.

You can download the two sections of mouse brain data at https://zenodo.org/record/8141084.

This data is stored in “8. Mouse brain_batch effects” folder.

STGMVA could remove the batch effects and identify the cell types in the integrated data.

Loading package

[1]:
import os
import torch
import pandas as pd
import scanpy as sc
from sklearn import metrics
import multiprocessing as mp
import squidpy as sq
import numpy as np
import matplotlib.pyplot as plt
import anndata as ad

import warnings
warnings.filterwarnings("ignore")
[2]:
from STGMVA.STGMVA import STGMMVE
from STGMVA import mk_dir
2023-07-15 17:15:24.324353: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT

Reading ST data and showing two sections of mouse brain data

[3]:
# load section1
file_fold = '/home/tengliu/Paper6-NC/STGMVA_Tutorials/ST data/8. Mouse brain_batch effects/Section1'
adata1 = sc.read_visium(file_fold, count_file='V1_Adult_Mouse_Brain_Coronal_Section_1_filtered_feature_bc_matrix.h5', load_images=True)
adata1.var_names_make_unique()

# load section2
file_fold = '/home/tengliu/Paper6-NC/STGMVA_Tutorials/ST data/8. Mouse brain_batch effects/Section2'
adata2 = sc.read_visium(file_fold, count_file='V1_Adult_Mouse_Brain_Coronal_Section_2_filtered_feature_bc_matrix.h5', load_images=True)
adata2.var_names_make_unique()

adata = ad.concat([adata1,adata2],label="data")
adata.obs['data'].replace({'0':'Section 1', '1':'Section 2'}, inplace=True)
[4]:
adata
[4]:
AnnData object with n_obs × n_vars = 5710 × 32285
    obs: 'in_tissue', 'array_row', 'array_col', 'data'
    obsm: 'spatial'
[5]:
plt.rcParams["figure.figsize"] = (6, 5)
sc.pl.embedding(adata, basis='spatial',size=35,
                # palette ='viridis',
                color='data',show=False) #vlag, icefire, viridis
[5]:
<Axes: title={'center': 'data'}, xlabel='spatial1', ylabel='spatial2'>
_images/Tutorial_7_mouse_brain_batch_effects_8_1.svg

Create directory for pretrained model

[6]:
save_path='./Samp_results/'
section_id = "mouse_brain2_batch"
mk_dir(save_path,section_id)

Training the model

STGMVA aims to learn the representations by two-step process. First, pretrained the GMM clustering model. Second, discerned the spatial domains for spatial transcriptomics data.

After model training, the learned representations will be saved in adata.obsm[‘embedding’], and can be used for spatial clustering.

[7]:
model = STGMMVE(adata, datatype="mouse_brain2", nCluster=8,save_path=save_path, section_id=section_id)

# model.pretrain() # Train your own pretranined model or use the pretrained model we provided.

adata_res = model.train_cluster()

Graph constructed!
R[write to console]:                    __           __
   ____ ___  _____/ /_  _______/ /_
  / __ `__ \/ ___/ / / / / ___/ __/
 / / / / / / /__/ / /_/ (__  ) /_
/_/ /_/ /_/\___/_/\__,_/____/\__/   version 5.4.10
Type 'citation("mclust")' for citing this R package in publications.

fitting ...
  |======================================================================| 100%
  0%|          | 0/50 [00:11<?, ?it/s]
Epoch: 0
NMI=0.627154, ARI=0.465271
Loss=25.7184,  ELBO Loss=65.6800
 10%|█         | 5/50 [01:00<07:12,  9.60s/it]
Epoch: 5
NMI=0.616150, ARI=0.456283
Loss=27.6566,  ELBO Loss=50.9886
 20%|██        | 10/50 [01:56<06:56, 10.41s/it]
Epoch: 10
NMI=0.626796, ARI=0.459680
Loss=27.8410,  ELBO Loss=45.3978
 30%|███       | 15/50 [02:45<06:26, 11.05s/it]
Epoch: 15
NMI=0.620105, ARI=0.437321
Loss=27.6282,  ELBO Loss=42.8611
 40%|████      | 20/50 [03:35<04:35,  9.19s/it]
Epoch: 20
NMI=0.628982, ARI=0.472545
Loss=27.4959,  ELBO Loss=41.7353
 50%|█████     | 25/50 [04:20<04:11, 10.07s/it]
Epoch: 25
NMI=0.612770, ARI=0.457316
Loss=27.1771,  ELBO Loss=40.8897
 60%|██████    | 30/50 [05:10<02:57,  8.86s/it]
Epoch: 30
NMI=0.614638, ARI=0.459332
Loss=27.0467,  ELBO Loss=40.2724
 70%|███████   | 35/50 [05:56<02:36, 10.43s/it]
Epoch: 35
NMI=0.611849, ARI=0.456538
Loss=26.9236,  ELBO Loss=39.7616
 80%|████████  | 40/50 [06:48<01:34,  9.43s/it]
Epoch: 40
NMI=0.582316, ARI=0.419671
Loss=26.9343,  ELBO Loss=39.3927
 90%|█████████ | 45/50 [07:34<00:52, 10.51s/it]
Epoch: 45
NMI=0.577671, ARI=0.415394
Loss=26.8497,  ELBO Loss=39.0086
100%|██████████| 50/50 [08:16<00:00,  9.93s/it]
[8]:
adata_res
[8]:
AnnData object with n_obs × n_vars = 5710 × 32285
    obs: 'in_tissue', 'array_row', 'array_col', 'data', 'pre_label'
    var: 'highly_variable', 'highly_variable_rank', 'means', 'variances', 'variances_norm'
    uns: 'data_colors', 'hvg', 'log1p', 'mclust_labels', 'ari_list', 'loss'
    obsm: 'spatial', 'distance_matrix', 'graph_neigh', 'adj', 'feat_mat', 'embedding'

Visualization

Batch effect removal figure

[10]:
from STGMVA import batch_effect
batch_effect(adata_res)
WARNING: adata.X seems to be already log-transformed.
_images/Tutorial_7_mouse_brain_batch_effects_17_1.svg

Spatial clustering for two sections

[11]:
from STGMVA import split_cluster
split_cluster(adata_res)
_images/Tutorial_7_mouse_brain_batch_effects_19_0.svg

save the results for more figures

[12]:
adata_res.filename = './'+"mouse_brain2_batch_effect.h5ad"