Subsetting AnnData on the basis of Louvain clusters


I want to subset an AnnData object by cluster, but I can't work out how to do it.

I am running the scVelo pipeline, and in it I ran the tl.louvain function to cluster the cells. I got around 32 clusters, of which clusters 2 and 4 are of interest to me, and I need to run the rest of the pipeline on these clusters only. (I started from a loom file, which I read into scVelo, so I now have an AnnData object.)

I tried adata.obs["louvain"], which gave me the cluster assignments, but I need to create a new AnnData object containing only these 2 clusters and process it further.

Please help me with how to subset the AnnData object. Any help is highly appreciated. (I am very new to this and finding it difficult to follow.)

Asked By: sidrah maryam



If your adata.obs has a "louvain" column, which I'd expect after running tl.louvain, you can subset with
adata[adata.obs["louvain"] == "2"]
if you want a single cluster, and
adata[adata.obs["louvain"].isin(["2", "4"])]
to get clusters 2 and 4. Note that the cluster labels are strings ("2"), not integers.

Answered By: puermaris

Feel free to use this function I wrote for my work.

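from anndata import AnnData
import numpy as np

def cluster_sampled(adata: AnnData, clusters: list, n_samples: int) -> AnnData:
    """Randomly sample n_samples cells from each of the provided louvain clusters.

    Parameters
    ----------
    adata
        AnnData object
    clusters
        List of clusters to sample from
    n_samples
        Number of samples to take from each cluster

    Returns
    -------
    Annotated data matrix with sampled cells from the clusters
    """
    sampled = []
    # Keep only the cells that belong to the requested clusters.
    adata_cluster_sampled = adata[adata.obs["louvain"].isin(clusters), :].copy()
    # .indices maps each cluster label to the positional indices of its cells.
    for _, idx in adata_cluster_sampled.obs.groupby("louvain", observed=True).indices.items():
        # Sampling without replacement raises if a cluster has fewer than n_samples cells.
        sampled.append(np.random.choice(idx, n_samples, replace=False))
    return adata_cluster_sampled[np.concatenate(sampled)]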
Answered By: Dinesh Palli