Advanced Tutorial: Full Pipeline using the Lupus Dataset#

This tutorial walks through the complete huCIRA analysis pipeline using the Lupus dataset as an example. It starts from raw, unprocessed data with ENSG symbols and covers data preparation, enrichment analysis, robustness filtering, and visualization. This comprehensive tutorial covers detailed explanations of all parameters, filtering mechanisms, and plotting options.

For minimal working examples, see the quick_start_cytokine_enrichment.ipynb, quick_start_CIP_enrichment.ipynb, and quick_start_cell-cell_communication.ipynb tutorials.

# Import core packages and the hucira package.

import os
import time
import warnings

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scanpy as sc
from tqdm import TqdmWarning

import hucira as hc

warnings.simplefilter("ignore", TqdmWarning)

start = time.time()
current_wd = os.getcwd()
print(f"Your current working directory: {current_wd}.")
Your current working directory: /ictstr01/home/icb/jenni.liu/all_projects/cytokine_dict_project/huCIRA/docs/notebooks.

1. Load input data#

The two main input files for the huCIRA analysis are:

  1. the Human Cytokine Dictionary

  2. your query transcriptome dataset. Here, we will download the Lupus dataset. It contains ENSG symbols for transcripts, which need to be converted to HGNC symbols. Additionally, the active layer in the dataset is currently scaled, so the raw counts have to be log-normalized for usage of huCIRA. For convenience, the Lupus data is subsampled to 10%.

The files are saved to or loaded from the current working directory. The path is explicitly specified here, although it’s also the default saving location if left unspecified. Choose your own path (directory to project folder) as the workspace for this analysis.

#### Load the human cytokine dictionary (full version) and the Lupus example adata.

df_hcd_all = hc.load_human_cytokine_dict(save_dir=current_wd, mini_version=False, force_download=False)
adata = hc.load_lupus_data(save_dir=current_wd, force_download=False)
sc.pp.subsample(adata, fraction=0.1, copy=False)
adata
AnnData object with n_obs × n_vars = 126367 × 30172
    obs: 'library_uuid', 'assay_ontology_term_id', 'mapped_reference_annotation', 'is_primary_data', 'cell_type_ontology_term_id', 'author_cell_type', 'cell_state', 'sample_uuid', 'tissue_ontology_term_id', 'development_stage_ontology_term_id', 'disease_state', 'suspension_enriched_cell_types', 'suspension_uuid', 'suspension_type', 'donor_id', 'self_reported_ethnicity_ontology_term_id', 'disease_ontology_term_id', 'sex_ontology_term_id', 'Processing_Cohort', 'ct_cov', 'ind_cov', 'tissue_type', 'cell_type', 'assay', 'disease', 'sex', 'tissue', 'self_reported_ethnicity', 'development_stage', 'observation_joinid'
    var: 'feature_is_filtered', 'feature_name', 'feature_reference', 'feature_biotype', 'feature_length', 'feature_type'
    uns: 'citation', 'default_embedding', 'organism', 'organism_ontology_term_id', 'schema_reference', 'schema_version', 'title'
    obsm: 'X_pca', 'X_umap'
#### Optional for easy workflow:
# Enter celltype column name and condition column name of your query adata, which you can read from previous cell output

your_celltype_colname = "cell_type"
your_contrast_colname = "disease_state"

The Enrichment analysis needs two main pieces of information from the query adata: cell types and experimental (disease) conditions.

Annotations have to be examined manually, because naming convention between objects differ. Compare the cell type annotation of the dictionary with the naming convention of your query adata and re-annotate your query adata’s annotation if necessary.

print(f"All cytokines in dictionary:\n  {df_hcd_all.cytokine.unique()}\n")
print(f"All celltypes in dictionary:\n  {df_hcd_all.celltype.unique()}\n")

print(f"All celltypes in query data:\n  {sorted(adata.obs[your_celltype_colname].unique())}\n")
print(
    f"All experimental states (contrasts/conditions) in query data:\n  {sorted(adata.obs[your_contrast_colname].unique())}\n"
)
All cytokines in dictionary:
  ['4-1BBL' 'ADSF' 'APRIL' 'BAFF' 'C5a' 'CD27L' 'CD30L' 'CD40L' 'CT-1'
 'Decorin' 'EGF' 'EPO' 'FGF-beta' 'FLT3L' 'FasL' 'G-CSF' 'GDNF' 'GITRL'
 'GM-CSF' 'HGF' 'IFN-alpha1' 'IFN-beta' 'IFN-epsilon' 'IFN-gamma'
 'IFN-lambda1' 'IFN-lambda2' 'IFN-lambda3' 'IFN-omega' 'IGF-1'
 'IL-1-alpha' 'IL-1-beta' 'IL-10' 'IL-11' 'IL-12' 'IL-13' 'IL-15' 'IL-16'
 'IL-17A' 'IL-17B' 'IL-17C' 'IL-17D' 'IL-17E' 'IL-17F' 'IL-19' 'IL-1Ra'
 'IL-2' 'IL-20' 'IL-21' 'IL-22' 'IL-23' 'IL-24' 'IL-26' 'IL-27' 'IL-3'
 'IL-31' 'IL-32-beta' 'IL-33' 'IL-34' 'IL-35' 'IL-36-alpha' 'IL-36Ra'
 'IL-4' 'IL-5' 'IL-6' 'IL-7' 'IL-8' 'IL-9' 'LIF' 'LIGHT' 'LT-alpha1-beta2'
 'LT-alpha2-beta1' 'Leptin' 'M-CSF' 'Noggin' 'OSM' 'OX40L' 'PRL' 'PSPN'
 'RANKL' 'SCF' 'TL1A' 'TNF-alpha' 'TPO' 'TRAIL' 'TSLP' 'TWEAK' 'VEGF']

All celltypes in dictionary:
  ['Intermediate_B_cell' 'NKT' 'CD8_Memory_T_cell' 'NK_CD56low' 'CD16_Mono'
 'NK_CD56hi' 'CD8_Naive_T_cell' 'pDC' 'ILC' 'MAIT' 'Naive_B_cell'
 'CD4_Naive_T_cell' 'Treg' 'Plasmablast' 'Granulocyte' 'B_cell'
 'CD4_T_cell' 'HSPC' 'CD8_T_cell' 'CD14_Mono' 'cDC' 'CD4_Memory_T_cell'
 'NK' 'Mono']

All celltypes in query data:
  ['B cell', 'CD4-positive, alpha-beta T cell', 'CD8-positive, alpha-beta T cell', 'classical monocyte', 'conventional dendritic cell', 'lymphocyte', 'natural killer cell', 'non-classical monocyte', 'plasmablast', 'plasmacytoid dendritic cell', 'progenitor cell']

All experimental states (contrasts/conditions) in query data:
  ['flare', 'managed', 'na', 'treated']

To create celltype_combos, enter ordered lists that match the same cell types across different naming conventions in your data vs the human cytokine dictionary. “create_celltype_combos()” can help you to create the right format, but you can also format manually.

# Define celltype_combos, an input parameter for enrichment testing
adata_celltypes = [
    "B cell",
    "CD8-positive, alpha-beta T cell",
    "classical monocyte",
    "non-classical monocyte",
    "natural killer cell",
]

hcd_celltypes = ["B_cell", "CD8_T_cell", "CD14_Mono", "CD16_Mono", "NK_CD56hi"]

celltype_combos = hc.create_celltype_combos(adata_celltypes, hcd_celltypes)
print(celltype_combos)

# Define condition of interest
contrast = ("flare", "treated")
contrast_list = [("flare", "treated"), ("flare", "managed"), ("flare", "na")]
print(contrast)
print(contrast_list)
[('B cell', 'B_cell'), ('CD8-positive, alpha-beta T cell', 'CD8_T_cell'), ('classical monocyte', 'CD14_Mono'), ('non-classical monocyte', 'CD16_Mono'), ('natural killer cell', 'NK_CD56hi')]
('flare', 'treated')
[('flare', 'treated'), ('flare', 'managed'), ('flare', 'na')]

2. Process query data#

The enrichment analysis requires log-normalized expression data, and gene identifiers must be provided as HGNC gene symbols, since the Human Cytokine Dictionary uses HGNC nomenclature.

To prepare your data in the appropriate format:

  1. Preprocess the expression data and apply log normalization.

  2. Convert gene identifiers from Ensembl IDs (ENSG) to HGNC gene symbols.
    You may use any preferred method for this conversion; below we provide an example using the mygene module. Install it in your environment if it’s missing.

# You can see that the active layer is currently storing scaled gene expression.
example_row = adata.X[:5].toarray()
example_row
array([[ 0.        ,  0.        ,  0.        , ...,  0.        ,
        -0.25323954,  0.        ],
       [ 0.        ,  0.        ,  0.        , ...,  0.        ,
        -0.35095528,  0.        ],
       [ 0.        ,  0.        ,  0.        , ...,  0.        ,
        -0.36150452,  0.        ],
       [ 0.        ,  0.        ,  0.        , ...,  0.        ,
        -0.37223315,  0.        ],
       [ 0.        ,  0.        ,  0.        , ...,  0.        ,
        -0.2582687 ,  0.        ]], shape=(5, 30172), dtype=float32)
# Save layers
adata.layers["counts"] = adata.raw.X.copy()
adata.layers["scaled_original_counts"] = adata.X.copy()
adata.X = adata.layers["counts"]

# Basic quality checks:
sc.pp.filter_cells(adata, min_genes=200)
sc.pp.filter_genes(adata, min_cells=3)

# Log-normalize:
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)

adata
AnnData object with n_obs × n_vars = 126343 × 20426
    obs: 'library_uuid', 'assay_ontology_term_id', 'mapped_reference_annotation', 'is_primary_data', 'cell_type_ontology_term_id', 'author_cell_type', 'cell_state', 'sample_uuid', 'tissue_ontology_term_id', 'development_stage_ontology_term_id', 'disease_state', 'suspension_enriched_cell_types', 'suspension_uuid', 'suspension_type', 'donor_id', 'self_reported_ethnicity_ontology_term_id', 'disease_ontology_term_id', 'sex_ontology_term_id', 'Processing_Cohort', 'ct_cov', 'ind_cov', 'tissue_type', 'cell_type', 'assay', 'disease', 'sex', 'tissue', 'self_reported_ethnicity', 'development_stage', 'observation_joinid', 'n_genes'
    var: 'feature_is_filtered', 'feature_name', 'feature_reference', 'feature_biotype', 'feature_length', 'feature_type', 'n_cells'
    uns: 'citation', 'default_embedding', 'organism', 'organism_ontology_term_id', 'schema_reference', 'schema_version', 'title', 'log1p'
    obsm: 'X_pca', 'X_umap'
    layers: 'counts', 'scaled_original_counts'
import mygene

mg = mygene.MyGeneInfo()
results = mg.querymany(adata.var_names.tolist(), scopes="ensembl.gene", fields="symbol", species="human")
df = pd.DataFrame(results)

# Create mapping dict (fallback to Ensembl ID when symbol missing)
ensembl_to_symbol = df.set_index("query")["symbol"].fillna(df["query"]).to_dict()
adata.var["gene_symbol"] = adata.var_names.map(ensembl_to_symbol)
adata.var_names_make_unique()

keep_idx = ~adata.var["gene_symbol"].isna()
adata = adata[:, keep_idx].copy()
dup_idx = adata.var["gene_symbol"].duplicated(keep="first")
adata = adata[:, ~dup_idx].copy()
adata.var_names = adata.var["gene_symbol"]
querying 1-1000...done.
querying 1001-2000...done.
querying 2001-3000...done.
querying 3001-4000...done.
querying 4001-5000...done.
querying 5001-6000...done.
querying 6001-7000...done.
querying 7001-8000...done.
querying 8001-9000...done.
querying 9001-10000...done.
querying 10001-11000...done.
querying 11001-12000...done.
querying 12001-13000...done.
querying 13001-14000...done.
querying 14001-15000...done.
querying 15001-16000...done.
querying 16001-17000...done.
querying 17001-18000...done.
querying 18001-19000...done.
querying 19001-20000...done.
querying 20001-20426...done.
Finished.
7 input query terms found dup hits:
	[('ENSG00000261600', 2), ('ENSG00000234162', 2), ('ENSG00000227110', 2), ('ENSG00000233656', 2), ('E
4 input query terms found no hit:
	['ENSG00000250410', 'ENSG00000168078', 'ENSG00000189144', 'ENSG00000267058']
Pass "returnall=True" to return complete lists of duplicate or missing query terms.
# Check your transcript names. Now, they are in HGNC format and you can start the huCIRA analysis.
adata.var.head()
feature_is_filtered feature_name feature_reference feature_biotype feature_length feature_type n_cells gene_symbol
gene_symbol
RPL23AP21 True ENSG00000228463 NCBITaxon:9606 gene 457 transcribed_processed_pseudogene 2063 RPL23AP21
LINC01409 True LINC01409 NCBITaxon:9606 gene 1089 lncRNA 570 LINC01409
FAM87B True FAM87B NCBITaxon:9606 gene 1960 lncRNA 20 FAM87B
LINC00115 True LINC00115 NCBITaxon:9606 gene 874 lncRNA 844 LINC00115
FAM41C True FAM41C NCBITaxon:9606 gene 919 lncRNA 1193 FAM41C

3. Run enrichment analysis#

run_all_enrichment_test() is a wrapper around the core function run_one_enrichment_test(). It runs enrichment analyses across multiple cell types and over a range of filtering parameters, allowing you to assess how robust your results are. If you only need to analyze a single cell type with fixed parameters, you can use run_one_enrichment_test() directly.

Gene set construction#

Before running GSEA, gene sets (signatures) are built from the cytokine dictionary by filtering for differentially expressed genes. Three thresholds control which genes make it into a gene set. Higher values for these parameters produce smaller, more specific gene sets, potentially more precise but at the risk of dropping below min_size.

  • threshold_lfc — minimum log-fold change a gene must show in response to a cytokine.

  • threshold_expression — minimum expression level, filtering out lowly expressed genes.

  • threshold_pval — maximum adjusted p-value, keeping only statistically significant genes.

With run_all_enrichment_test(), you can pass lists of values for threshold_lfc and threshold_expression to sweep over multiple filtering stringencies at once. This helps you gauge how sensitive your results are to the choice of these thresholds. This will later be important when computing robustness of results: Passing multiple values like threshold_lfc=[0.5, 0.8, 1.0, 1.5] generates multiple gene sets per cytokine. The robustness functions then assess whether enrichment is consistent across these variations. You can choose to run enrichment analyses either for upregulated or downregulated signals.

GSEA parameters#

The enrichment analysis is based on pre-rank Gene Set Enrichment Analysis (GSEA). A few parameters control how it runs:

  • min_size / max_size — gene sets smaller or larger than these bounds are excluded.

  • permutation_num — number of permutations for estimating p-values. More permutations give more stable results but take longer.

  • weight — controls how much the gene-level ranking metric influences the enrichment score. The default of 1.0 is standard weighted GSEA; setting it to 0.0 gives an unweighted statistic.

  • seed — random seed for reproducibility.

Ranking statistic#

By default, genes are ranked using the signal-to-noise ratio (SNR), which measures how large the expression difference between two groups is relative to the variability within each group. If you prefer a different ranking metric, you can pass your own function via ranking_statistic_fn. More details on how to do define your custom metric can be found in the quick start tutorial quick_start_cytokine_enrichment.ipynb.

# Enrichment analysis for one contrast

enrichment_results = hc.run_all_enrichment_test(
    adata=adata,
    df=df_hcd_all,
    contrast_combo=contrast,
    celltype_combos=celltype_combos,
    contrast_column=your_contrast_colname,
    celltype_column=your_celltype_colname,
    # The following parameters are optional (they have default specifications)
    direction="upregulated",
    # Filtering parameters for gene set construction (of dictionary or gene program)
    threshold_lfc=1.0,
    threshold_expression=0.0,
    threshold_pval=0.01,
    # GSEA parameters
    min_size=10,
    max_size=1000,
    permutation_num=1000,
    weight=1.0,
    seed=2025,
    ranking_statistic_fn=None,
)

# Look at all enrichment results (reduced view to columns of interest, not all statistics)
enrichment_results[
    [
        "celltype_combo",
        "cytokine",
        "contrast",
        "direction",
        "ES",
        "NES",
        "NOM p-val",
        "FDR q-val",
        "FWER p-val",
        "frac_shared_genes_signature",
        "Lead_genes",
        "threshold_lfc",
        "threshold_expression",
    ]
].sort_values("NES", ascending=False).head()
celltype_combo cytokine contrast direction ES NES NOM p-val FDR q-val FWER p-val frac_shared_genes_signature Lead_genes threshold_lfc threshold_expression
0 classical monocyte (CD14_Mono) IL-6 flare_vs_treated upregulated 0.503346 1.591792 0.005445 0.017648 0.045 0.892857 ENPP2;VCAN;MS4A4A;KCNJ15;OLFML2B;GPR34;LINC026... 1.0 0.0
2 classical monocyte (CD14_Mono) IL-10 flare_vs_treated upregulated 0.370541 1.409779 0.003257 0.049564 0.206 0.852174 CD14;RGS2;TXLNB;PTGFRN;SCARB1;SCD;KCNMB4;SPIRE... 1.0 0.0
5 classical monocyte (CD14_Mono) IL-21 flare_vs_treated upregulated 0.553939 1.278504 0.167984 0.082107 0.435 0.650000 FCGR1A;ENPP2;C1orf21;ABLIM3;ABHD17C;FAM20A 1.0 0.0
5 non-classical monocyte (CD16_Mono) IL-10 flare_vs_treated upregulated 0.367412 1.238451 0.039683 0.585217 0.62 0.871186 RFLNB;CAMKK1;PVR;S100A9;FPR1;ATP13A3;ZNRF1;INP... 1.0 0.0
2 B cell (B_cell) IL-21 flare_vs_treated upregulated 0.465881 1.223775 0.170391 1.0 0.645 0.914286 BATF;FURIN;SLC2A5;LMO2;LINC02156;BUB1;LINC0241... 1.0 0.0

You can combine results from multiple experimental conditions into a single DataFrame by looping over all contrasts of interest. The variable contrast_list was defined in Section 1. This can be useful if you want to aggregate or export results, or perform downstream analyses such as plotting multiple heatmaps side by side. For this tutorial, however, most steps focus on evaluating one contrast at a time.

# To run enrichment analysis for several contrasts, loop through all contrast conditions you are interested in.

all_results = []
for contrast in contrast_list:
    enrichment_df = hc.run_all_enrichment_test(
        adata=adata,
        df=df_hcd_all,
        contrast_combo=contrast,
        celltype_combos=celltype_combos,
        contrast_column=your_contrast_colname,
        celltype_column=your_celltype_colname,
        direction="upregulated",
        threshold_lfc=1.0,
        threshold_expression=0.0,
        threshold_pval=0.01,
        min_size=10,
        max_size=1000,
        permutation_num=1000,
        weight=1.0,
        seed=2025,
        ranking_statistic_fn=None,
    )
    all_results.append(enrichment_df)
multi_contrast_enrichment_results = pd.concat(all_results, ignore_index=True)

# Look at all enrichment results (reduced view to columns of interest, not all statistics)
multi_contrast_enrichment_results[
    [
        "celltype_combo",
        "cytokine",
        "contrast",
        "direction",
        "ES",
        "NES",
        "NOM p-val",
        "FDR q-val",
        "FWER p-val",
        "frac_shared_genes_signature",
        "Lead_genes",
        "threshold_lfc",
        "threshold_expression",
    ]
].sort_values("NES", ascending=False)
celltype_combo cytokine contrast direction ES NES NOM p-val FDR q-val FWER p-val frac_shared_genes_signature Lead_genes threshold_lfc threshold_expression
232 natural killer cell (NK_CD56hi) IFN-beta flare_vs_na upregulated 0.806282 2.530997 0.0 0.0 0.0 0.877419 IFI44L;ISG15;IFITM1;LY6E;IFI6;MX1;IFITM3;XAF1;... 1.0 0.0
164 B cell (B_cell) IFN-beta flare_vs_na upregulated 0.805907 2.49256 0.0 0.0 0.0 0.904110 IFI44L;IFITM1;ISG15;MX1;LY6E;EIF2AK2;IFI44;IFI... 1.0 0.0
233 natural killer cell (NK_CD56hi) IFN-omega flare_vs_na upregulated 0.80324 2.488184 0.0 0.0 0.0 0.875000 IFI44L;ISG15;LY6E;MX1;PRF1;IFITM3;XAF1;IFI44;G... 1.0 0.0
177 CD8-positive, alpha-beta T cell (CD8_T_cell) IFN-gamma flare_vs_na upregulated 0.880601 2.468958 0.0 0.0 0.0 0.904110 IFI44L;IFITM1;ISG15;IFI6;MX1;IFI44;EIF2AK2;OAS... 1.0 0.0
178 CD8-positive, alpha-beta T cell (CD8_T_cell) IL-2 flare_vs_na upregulated 0.812445 2.450793 0.0 0.0 0.0 0.922156 IFI44L;IFITM1;ISG15;LY6E;IFI6;GZMH;PRF1;GZMB;M... 1.0 0.0
... ... ... ... ... ... ... ... ... ... ... ... ... ...
47 non-classical monocyte (CD16_Mono) C5a flare_vs_treated upregulated -0.785597 -1.820491 0.002381 0.010364 0.005056 0.818182 CTSL;ADM;SLC16A10;HS3ST1;LGALS1;LILRA2 1.0 0.0
15 CD8-positive, alpha-beta T cell (CD8_T_cell) IFN-gamma flare_vs_treated upregulated -0.54836 -1.908967 0.0 0.0 0.0 0.904110 ISG15;IFI6;IFITM1;ISG20;TYMP;MX1;IFI44L;EPSTI1... 1.0 0.0
191 classical monocyte (CD14_Mono) TL1A flare_vs_na upregulated -0.83048 -1.965004 0.0 0.005078 0.002103 0.611111 MT-CO2;DUSP1;CSF1 1.0 0.0
14 CD8-positive, alpha-beta T cell (CD8_T_cell) IFN-alpha1 flare_vs_treated upregulated -0.716089 -2.157756 0.0 0.0 0.0 0.942857 ISG15;IFI6;TYMP;MX1;IFI44L;EPSTI1;IRF7;IFIT1;L... 1.0 0.0
13 CD8-positive, alpha-beta T cell (CD8_T_cell) IL-1-beta flare_vs_treated upregulated -0.689024 -2.215796 0.0 0.0 0.0 0.937500 RPL18A;ISG15;MT-CO1;IFI6;MT-CO2;IFITM1;TYMP;PI... 1.0 0.0

246 rows × 13 columns

CIP enrichment#

Cytokine-induced gene program enrichment analysis is ran in a very similar way. Please refer to the quick_start_CIP_enrichment.ipynb tutorial for details.

4. Identify robust and significant results#

Since run_all_enrichment_test() runs enrichment across multiple filtering thresholds, each cytokine–cell type pair can produce several results — one per parameter combination. This lets us assess whether a result holds up across different gene set constructions or only appears under one specific setting.

Robustness and significance are determined by two criteria:

  1. Is the result evaluable often enough? Depending on the filtering thresholds, some gene sets may end up too small (or too large) for GSEA to run (see min_size/max_size in run_all_enrichment_test). The threshold_valid parameter sets the minimum fraction of parameter combinations that must yield a valid result. Cytokines that are only evaluable under very few settings are discarded.

  2. Is the result significant often enough? Among the valid results, threshold_below_alpha sets the minimum fraction that must be significant (based on the FDR q-value). A cytokine that is only significant under one particular filtering setting is less reliable than one that is consistently significant across many.

Results passing both criteria are considered robust. They hold up across different gene set constructions rather than depending on a single parameter choice.

Two functions apply these criteria:

  • check_robustness filters for a single q-value threshold and directly returns the filtered enrichment results.

  • get_robust_significant_results additionally sweeps over three significance thresholds (alphas), which get assigned signficance stars. By default, alphas will represent [0.1, 0.05, 0.01]. Returns three DataFrames, the first two of which are used for plotting the enrichment heatmap. The main difference to check_robustness lies in its generation of functional and plot-ready outputs.

robust_results = hc.check_robustness(
    enrichment_results,
    threshold_qval=0.01,  # direct impact on NES results: filters FDR q-val
    threshold_valid=0.1,
    threshold_below_alpha=0.9,
)
robust_results
celltype_combo contrast cytokine frac_valid frac_significant is_robust NES_min NES_max FDR q-val qval_threshold threshold_frac_below_alpha
0 CD8-positive, alpha-beta T cell (CD8_T_cell) flare_vs_treated IL-1-beta 1.0 1.0 True -2.215796 -2.215796 0.0 0.01 0.9
1 CD8-positive, alpha-beta T cell (CD8_T_cell) flare_vs_treated IFN-alpha1 1.0 1.0 True -2.157756 -2.157756 0.0 0.01 0.9
2 CD8-positive, alpha-beta T cell (CD8_T_cell) flare_vs_treated IFN-gamma 1.0 1.0 True -1.908967 -1.908967 0.0 0.01 0.9
nes_plot_df, annot_plot_df, robust_results = hc.get_robust_significant_results(
    results=enrichment_results,
    alphas=[0.1, 0.05, 0.01],
    threshold_valid=0.1,
    threshold_below_alpha=0.9,
    display_robust_results=True,
)
contrast celltype_combo cytokine qval_threshold
0 flare_vs_treated CD8-positive, alpha-beta T cell (CD8_T_cell) IFN-alpha1 0.01
1 flare_vs_treated CD8-positive, alpha-beta T cell (CD8_T_cell) IFN-beta 0.05
2 flare_vs_treated CD8-positive, alpha-beta T cell (CD8_T_cell) IFN-gamma 0.01
3 flare_vs_treated CD8-positive, alpha-beta T cell (CD8_T_cell) IFN-omega 0.05
4 flare_vs_treated CD8-positive, alpha-beta T cell (CD8_T_cell) IL-1-beta 0.01
5 flare_vs_treated CD8-positive, alpha-beta T cell (CD8_T_cell) IL-15 0.05
6 flare_vs_treated CD8-positive, alpha-beta T cell (CD8_T_cell) IL-2 0.05
7 flare_vs_treated CD8-positive, alpha-beta T cell (CD8_T_cell) IL-27 0.05
8 flare_vs_treated CD8-positive, alpha-beta T cell (CD8_T_cell) IL-7 0.05
9 flare_vs_treated classical monocyte (CD14_Mono) IL-10 0.05
10 flare_vs_treated classical monocyte (CD14_Mono) IL-21 0.10
11 flare_vs_treated classical monocyte (CD14_Mono) IL-6 0.05
12 flare_vs_treated non-classical monocyte (CD16_Mono) C5a 0.05

You can see that the output robust_results contains similar information for both functions. However, get_robust_significant_results returns a set of enrichment results that pass at least one of the three qval_thresholds (alphas). All output dataframes from get_robust_significant_results are then used as input for the plotting function. In particular, take a look at nes_plot_df and annot_plot_df, which will be visualized as a heatmap in the next section.

nes_plot_df
celltype_combo B cell (B_cell) CD8-positive, alpha-beta T cell (CD8_T_cell) classical monocyte (CD14_Mono) natural killer cell (NK_CD56hi) non-classical monocyte (CD16_Mono)
cytokine
C5a NaN NaN -0.804926 NaN -1.820491
CD40L -0.747786 NaN NaN NaN NaN
GM-CSF NaN NaN -1.181667 NaN 0.986207
IFN-alpha1 NaN -2.157756 -1.408583 NaN -1.301530
IFN-beta -0.949559 -1.521071 -1.088863 -1.324211 -1.272691
IFN-gamma 0.838154 -1.908967 -0.978505 -1.176742 0.951379
IFN-omega -0.901971 -1.489039 -1.072829 -1.356344 -1.177617
IL-1-alpha NaN NaN -1.132500 NaN -1.434384
IL-1-beta -1.524112 -2.215796 -0.943179 -0.881689 -1.200225
IL-10 0.721797 NaN 1.409779 -1.092695 1.238451
IL-12 NaN NaN 0.966729 1.093809 -1.113725
IL-13 1.047150 NaN 0.871762 NaN -0.912971
IL-15 0.767731 -1.390675 -1.194397 0.939899 -0.996253
IL-2 -0.837790 -1.448697 -1.065552 1.039084 -1.247886
IL-21 1.223775 NaN 1.278504 1.062732 0.851697
IL-27 NaN -1.517245 NaN NaN NaN
IL-3 NaN NaN -1.209489 NaN 0.851473
IL-32-beta 0.903246 1.042213 -0.992106 1.073143 -1.016514
IL-4 0.954229 -0.815442 -0.950532 0.841600 0.826358
IL-6 NaN NaN 1.591792 NaN 0.958220
IL-7 -1.335636 -1.422273 -1.247765 1.163014 -1.055112
Leptin NaN NaN -1.327519 -1.225018 NaN
M-CSF NaN NaN -1.154549 NaN -1.083629
TL1A NaN NaN -1.483487 NaN NaN
TSLP NaN NaN -1.203972 1.120691 -0.992963
annot_plot_df
celltype_combo B cell (B_cell) CD8-positive, alpha-beta T cell (CD8_T_cell) classical monocyte (CD14_Mono) natural killer cell (NK_CD56hi) non-classical monocyte (CD16_Mono)
cytokine
C5a **
CD40L
GM-CSF
IFN-alpha1 ***
IFN-beta **
IFN-gamma ***
IFN-omega **
IL-1-alpha
IL-1-beta ***
IL-10 **
IL-12
IL-13
IL-15 **
IL-2 **
IL-21 *
IL-27 **
IL-3
IL-32-beta
IL-4
IL-6 **
IL-7 **
Leptin
M-CSF
TL1A
TSLP

5. Plotting#

huCIRA provides two plotting functions: plot_significant_results visualizes the enrichment results in heatmaps and plot_communication plots cytokine-mediated cell-cell communication.

Enrichment heatmap#

plot_significant_results() plots a heatmap of the robust enrichment results. Each cell shows the mean NES across parameter combinations, and significance is annotated with stars (*, **, ***) based on the alphas defined in get_robust_significant_results. The heatmap takes the outputs nes_plot_df and annot_plot_df as input. Optionally, robust_results can be provided as well to define the plotted contrast conditions.

You can subset the plot to specific cytokines or cell types using selected_cytokines and selected_celltypes. Use fontsize, fig_width, and fig_height to adjust the appearance. Set save_fig=True to save the figure as an .svg file to the directory specified by fig_path.

hc.plot_significant_results(
    nes_plot_df=nes_plot_df,
    annot_plot_df=annot_plot_df,
    robust_results=robust_results,
    selected_celltypes=None,
    selected_cytokines=None,
    fontsize=9,
    save_fig=False,
    fig_path="",
    fig_width=10.0,
    fig_height=10.0,
)
../_images/330ed5148c7424d878f7fcea03eb2f3f67c520c3fc58d48cfbf8975802d5ec0a.png

Earlier, we ran enrichment analyses over several contrast conditions, which returned the dataframe multi_contrast_enrichment_results. To visualize enrichment results across contrasts, you can plot multiple heatmaps side by side. You can do this manually by looping through the contrast conditions. Pass an ax argument to plot_significant_results to draw into a specific subplot. Otherwise, they will be plotted separately. If no robust results exist for a contrast, the subplot is left empty.

fig, axes = plt.subplots(1, 3, figsize=(12, 12))

for idx, contrast in enumerate(multi_contrast_enrichment_results.contrast.unique()):
    print(contrast)
    nes_plot_df, annot_plot_df, robust_results = hc.get_robust_significant_results(
        results=multi_contrast_enrichment_results,
        contrast_combo=contrast,
        alphas=[0.1, 0.05, 0.01],
        threshold_valid=0.1,
        threshold_below_alpha=0.9,
        display_robust_results=False,
    )

    if robust_results is not None and not robust_results.empty:
        hc.plot_significant_results(
            nes_plot_df,
            annot_plot_df,
            robust_results,
            ax=axes[idx],
            fontsize=9,
        )
    else:
        axes[idx].set_axis_off()
        axes[idx].set_title(f"{contrast}\nNo robust significant results")

plt.tight_layout()
plt.subplots_adjust(wspace=0.01)
plt.show()
flare_vs_treated
flare_vs_managed
flare_vs_na
../_images/eb6042e9051cb408d9ed8cf18d27f07510b5a03c6d3851c9bbf8deb34655ec5a.png

Cell-cell communication: compute senders and receivers#

get_all_senders_and_receivers() identifies which cell types produce a cytokine (senders) and which express its receptor (receivers). Pass in the list of cytokines you want to analyze via cytokine_list, typically the robust cytokines from the enrichment analysis, or a custom set. They must be present in the cytokine information file to define sender and receiver relationships. The function will then identify which genes are in your query data as well.

  • sender_pvalue_threshold — maximum p-value (from a Wilcoxon rank-sum test across cell types) for a cell type to be considered a sender. Only cell types where the cytokine gene is significantly more expressed compared to other cell types are included.

  • receiver_mean_X_threshold — minimum mean receptor gene expression for a cell type to be considered a receiver. Set to 0 to include all cell types with any detectable receptor expression.

cytokine_info = hc.load_cytokine_info()
cytokine_info.head()
Unnamed: 0 name gene used cytokines DEG_plot_family single_family family_level_1 family_level_2 receptor gene alternative names comments parse_cytokine single_gene go_annotations go_labels functional_classification
0 0 IL-1-alpha IL1A IL-1α IL-1 IL-1 interleukin IL-1 IL1R1, IL1RAP Interleukin 1 alpha NaN True IL1A ['GO:0005125', 'GO:0046688', 'GO:0034605', 'GO... cytokine activity, extracellular space, positi... Pro-inflammatory
1 1 IL-1-beta IL1B IL-1β IL-1 IL-1 interleukin IL-1 IL1R1, IL1RAP Interleukin 1 beta NaN True IL1B ['GO:0005125', 'GO:0070372', 'GO:0001934', 'GO... cytokine activity, positive regulation of prot... Pro-inflammatory
2 2 IL-1Ra IL1RN IL-1Ra IL-1 IL-1 interleukin IL-1 IL1R1 Interleukin 1 receptor antagonist NaN True IL1RN ['GO:0006953', 'GO:0005813', 'GO:0005149', 'GO... extracellular space, protein binding, cytosol,... Anti-inflammatory
3 3 IL-18 IL18 IL-18 IL-1 IL-1 interleukin IL-1 IL18R1, IL18RAP Interleukin 18 NaN True IL18 ['GO:0005125', 'GO:0005615', 'GO:0032819', 'GO... cytokine activity, extracellular space, positi... Pro-inflammatory, Th1-associated
4 4 IL-33 IL33 IL-33 IL-1 IL-1 interleukin IL-1 IL1RL1, IL1RAP Interleukin 33 NaN True IL33 ['GO:0005125', 'GO:0005737', 'GO:0005615', 'GO... cytokine activity, cytoplasm, extracellular sp... Pro-inflammatory
df_senders, df_receivers = hc.get_all_senders_and_receivers(
    adata=adata,
    cytokine_info=cytokine_info,
    cytokine_list=robust_results.cytokine.unique(),
    celltype_colname=your_celltype_colname,
    sender_pvalue_threshold=0.1,
    receiver_mean_X_threshold=0,
)
df_senders.head()
min_logfoldchanges min_pvals min_pvals_adj gene mean_X frac_X mean_X>0 cytokine celltype
CD8-positive, alpha-beta T cell 2.76905 0.0 0.0 IFNG 0.054769 0.058260 0.054769 IFN-gamma CD8-positive, alpha-beta T cell
lymphocyte 2.511762 0.0 0.0 IFNG 0.092878 0.185006 0.092878 IFN-gamma lymphocyte
natural killer cell 1.741828 0.0 0.0 IFNG 0.049104 0.054678 0.049104 IFN-gamma natural killer cell
classical monocyte 1.848429 0.0 0.0 IL15 0.029858 0.049086 0.029858 IL-15 classical monocyte
non-classical monocyte 1.758463 0.0 0.0 IL15 0.041609 0.090384 0.041609 IL-15 non-classical monocyte
df_receivers.head()
mean_X frac_X cytokine celltype
B cell 0.030110 0.045329 IFN-beta B cell
lymphocyte 0.054595 0.139057 IFN-beta lymphocyte
natural killer cell 0.034506 0.044113 IFN-beta natural killer cell
CD4-positive, alpha-beta T cell 0.042741 0.065390 IFN-beta CD4-positive, alpha-beta T cell
CD8-positive, alpha-beta T cell 0.037064 0.055736 IFN-beta CD8-positive, alpha-beta T cell

Cell-cell communication circos plot#

plot_communication() visualizes which cell types send and receive cytokine signals using a circos plot. It requires sender and receiver DataFrames computed by get_all_senders_and_receivers(), which identifies for each cytokine which cell types express the cytokine gene (senders) and which express its receptor (receivers). For an overview, refer to quick_start_cell-cell_communication.ipynb.

Appearance#

Colors for cytokines and cell types are assigned automatically, but you can provide your own mappings via cytokine2color and celltype2color. Use figsize to adjust the plot size, lw for the line width of signaling arrows, and fontsize for the labels. loc and bbox_to_anchor control the position of the legend. The figure is only saved if save_path is specified.

# Plots cell-cell communication

expression_threshold = 0.1 / 10_000  # = 100 TPM
expression_threshold = np.log2((expression_threshold) + 1)

legend_handles, legend_labels = hc.plot_communication(
    df_src=df_senders,
    df_tgt=df_receivers,
    df_enrichment=robust_results,
    all_celltypes=np.array(adata.obs[your_celltype_colname].unique()),
    # Filtering statistics
    frac_expressing_cells_sender=0,
    frac_expressing_cells_receiver=0,
    mean_cytokine_gene_expression_sender=expression_threshold,
    mean_cytokine_gene_expression_receiver=expression_threshold,
    # map your custom cytokine and cell type colors as dictionaries
    cytokine2color=None,
    celltype2color=None,
    # Plotting aeshetics
    show_legend=True,
    figsize=(8, 9),
    lw=0.5,
    fontsize=8,
    bbox_to_anchor=(1.1, 1.1),
    loc="center",
    save_path=None,
)
../_images/060eecc7a503e2ba73f9401f375bff91ba1f492ebdba52021ac31ff0d3c94f6a.png
end = time.time()
print(f"Time: {int((end - start) / 60)} min {int((end - start) % 60)} sec")
Time: 6 min 11 sec
## --END-- ##