
deepomicslab/FR_Hierarchy_Gut


DOI License: MIT GitHub release

Tutorial


Environment installation

Install with Conda environment

Create a conda environment (tested with conda 25.1.1):

conda create -n meta_fr python=3.8 r-base=4.2 -c conda-forge
conda activate meta_fr

Install the required Python packages:

pip install networkx==2.8.7
pip install ipykernel==5.3.4
pip install ipython==8.12.3
pip install ipython-genutils==0.2.0
pip install matplotlib
pip install statsmodels==0.14.0
pip install svglib
pip install scikit-learn==1.1.2
pip install scikit-learn-extra==0.2.0
pip install scikit-network==0.27.1
pip install scipy==1.10.1
pip install seaborn==0.12.0
pip install reportlab==3.6.12
pip install lifelines==0.27.8
pip install cliffs-delta
pip install pyseat
pip install numpy==1.22.4
pip install pandas==1.5.2
pip install matplotlib_venn
python -m ipykernel install --user --name meta_fr --display-name "Python (meta_fr)"

In Jupyter Notebook, select the kernel Python (meta_fr).

💡Note: PySEAT conflicts with the pinned NumPy version. Use numpy==1.22.4 and ignore the following warning shown during installation:

pyseat 0.0.1.4 requires numpy>=1.23.3, but you have numpy 1.22.4 which is incompatible.

Install the required R packages:

conda install r-effsize r-ggplot2 r-ggpubr r-svglite r-reshape2 r-dplyr r-tidyr r-readxl r-randomForest r-pROC

Input files

git clone https://github.com/deepomicslab/FR_Hierarchy_Gut
cd FR_Hierarchy_Gut/

Then unzip data.zip to obtain the data/ directory:

data/
├── gcn2008.tsv                                  # GCN of 2008 species
├── sp_d.tsv                                     # Precomputed distance matrix for 2008 species in GCN
├── module_def0507.tsv                           # Definition of module in KEGG
├── cMD.select_2008.select_genome.list           # Genomes to create GCN2008
├── cMD.select_2008.tax.fullname.txt             # Full taxonomy of species
├── cMD.select_2008.species_phylum.tsv           # Species phylum matching
│
├── [ACVD, CRC, asthma, carcinoma_surgery_history, STH, migraine, BD, IBD, 
│   T2D, hypertension, CFS, IGT, adenoma, schizofrenia]/  # Disease categories
│   ├── [cohort_name1, cohort_name2, ...]/                # Multiple cohorts per disease
│   │   ├── metadata.tsv                                  # Metadata (disease in header)
│   │   └── abd.tsv                                       # Abundance profile (species × samples)
│
├── NAFLD/                                       # NAFLD dataset
│   ├── NASH_forward_63_map.txt                  # Metadata of phenotypes for NASH dataset
│   ├── abd.tsv                                  # 16S species level profile
│   ├── NASH_GCN.tsv                             # GCN of NASH for 16S species name
│   └── taxonomy.tsv                             # Class, family and species matching
│
├── Anti/                                        # Antibiotic treatment dataset
│   ├── metadata.tsv                             # Metadata
│   ├── abd.tsv                                  # Abundance profile
│   ├── Anti.compare.list                        # List of group pairs to compare
│   ├── Anti.group.tsv                           # Group information of samples
│   └── Antibiotic.diversity.Frederic.tsv        # Taxonomic diversity (doi: 10.1038/ismej.2015.148, Supp. Table 1)
│
├── FMT/                                         # Fecal microbiota transplantation dataset
│   ├── FMT1/
│   │   ├── LiSS_2016.tsv                        # Species profile (index: species, header: sample name)
│   │   └── Li.txt                               # Fraction of donor specific strains
│   └── FMT2/
│       ├── Eric_abd.tsv                         # Species level profile
│       └── Eric.txt                             # Fraction of donor specific strains
│
└── NSCLC/                                        # Immunotherapy dataset
    ├── merged_species.tsv                       # Species level abundance profile
    ├── sig.txt                                  # Classification of species in original work
    ├── metadata.txt                             # Metadata including cohort
    ├── DS1_oncology_clinical_data.csv           # Metadata including death, os, akk in original work
    └── DS5_longitudinal_clinical_data.csv       # Metadata including akk level in original work
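The abundance tables (abd.tsv) are described as species × samples profiles. Below is a minimal standard-library sketch for loading one into per-sample relative abundances. It assumes a tab-separated file whose first row holds sample names (after an empty or label cell) and whose first column holds species names. `read_abundance` is a hypothetical helper, not part of this repository.

```python
import csv
from typing import Dict, Iterable


def read_abundance(lines: Iterable[str]) -> Dict[str, Dict[str, float]]:
    """Parse a species x samples TSV into {sample: {species: relative abundance}}.

    Assumes row 1 is a header (first cell: row label or empty, then sample
    names) and every later row is a species name followed by its abundances.
    """
    reader = csv.reader(lines, delimiter="\t")
    samples = next(reader)[1:]
    profiles: Dict[str, Dict[str, float]] = {s: {} for s in samples}
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        species = row[0]
        for sample, value in zip(samples, row[1:]):
            profiles[sample][species] = float(value)
    # Rescale each sample so its abundances sum to 1 (all-zero samples are left as-is).
    for sample in list(profiles):
        abd = profiles[sample]
        total = sum(abd.values())
        if total > 0:
            profiles[sample] = {sp: v / total for sp, v in abd.items()}
    return profiles
```

Usage: `with open("data/CRC/CRC1/abd.tsv") as fh: profiles = read_abundance(fh)`. The notebooks themselves most likely use pandas for this step.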

💡 Sources of data taken from other studies:
data/NAFLD/* from doi: 10.1002/imt2.61
data/FMT/FMT1/Li.txt from doi: 10.1038/s41467-020-19940-1
data/FMT/FMT2/Eric.txt from doi: 10.1038/s41467-020-19940-1
data/NSCLC/DS* from doi: 10.1016/j.cell.2024.05.029
data/NSCLC/sig.txt from doi: 10.1016/j.cell.2024.05.029

Scripts

We recommend running the scripts directory by directory, in the order below.

1. Prior GCN structure (01.script_priori_tree/)

Scripts of manuscript section Constructing a priori functional redundancy hierarchical structure of species via structural entropy

a. Compute species distance from GCN [optional]

01.script_priori_tree/a.compute_distance.ipynb

If you want to start the analysis from the GCN, run this script first to compute the distance matrix, sp_d.tsv. This takes around 20 minutes. To save time, you can use the precomputed sp_d.tsv in the data/ directory instead.

  • input: ../data/gcn2008.tsv GCN of 2008 species
  • output: ../data/sp_d.tsv Distance matrix
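This README does not state the distance metric used by a.compute_distance.ipynb. Purely as an illustration, the sketch below computes a weighted Jaccard distance (1 - Σmin / Σmax over KO copy numbers) between species. It assumes gcn2008.tsv has species as rows and KOs as columns. The metric and the helper names are assumptions. A pure-Python double loop over 2008 species will also be much slower than the notebook.

```python
import csv
from itertools import combinations
from typing import Dict, Iterable, List


def read_gcn(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse a GCN table assumed to be species rows x KO columns (tab-separated)."""
    reader = csv.reader(lines, delimiter="\t")
    next(reader)  # skip the KO header row
    return {row[0]: [float(x) for x in row[1:]] for row in reader if row}


def weighted_jaccard_distance(a: List[float], b: List[float]) -> float:
    """1 - sum(min) / sum(max) over paired KO copy numbers (0 = identical)."""
    num = sum(min(x, y) for x, y in zip(a, b))
    den = sum(max(x, y) for x, y in zip(a, b))
    return 0.0 if den == 0 else 1.0 - num / den


def distance_matrix(gcn: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Symmetric species x species distance dictionary with zeros on the diagonal."""
    dist = {s: {s: 0.0} for s in gcn}
    for s, t in combinations(gcn, 2):
        d = weighted_jaccard_distance(gcn[s], gcn[t])
        dist[s][t] = dist[t][s] = d
    return dist
```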

b. Constructing a priori functional redundancy hierarchical structure of species via structural entropy

01.script_priori_tree/b.GCN_tree.ipynb

💡Please run this script before the FMT, Antibiotic and NSCLC analyses (and any other step marked "GCN_fix_tree result is required"), which depend on the prior structure.

  • inputs:
    • data/gcn2008.tsv
    • data/sp_d.tsv
  • outputs:
    • result/GCN_fix_tree/
      • renamed_GCN_tree.newick.tsv Tree structure in newick format
      • leaves_cluster.tsv Species FRC annotation

🔍 Preview of leaves_cluster.tsv

species cluster supercluster
s__Rhodococcus_fascians S2_C1 S2
s__Nocardia_farcinica S2_C1 S2
s__Rhodococcus_hoagii S2_C1 S2
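Structural entropy (SE) drives the construction of the hierarchy. The sketch below only evaluates the standard two-dimensional structural entropy of a weighted, undirected graph under a fixed partition, for instance species grouped by FRC:

H = -Σ_i (d_i/vol(G)) log2(d_i/vol(P(i))) - Σ_j (g_j/vol(G)) log2(vol(P_j)/vol(G))

Here d_i is a node's weighted degree, vol(P_j) is the summed degree of module j, and g_j is the weight of edges leaving module j. The sketch does not reproduce how b.GCN_tree.ipynb builds the graph or searches for the SE-minimizing tree. Whether the notebook uses exactly this form is an assumption.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple

Edge = Tuple[Hashable, Hashable, float]  # (node u, node v, weight), each edge listed once


def structural_entropy_2d(edges: Iterable[Edge], partition: Dict[Hashable, Hashable]) -> float:
    """Two-dimensional structural entropy; partition maps node -> module label."""
    degree = defaultdict(float)      # d_i: weighted degree of each node
    module_vol = defaultdict(float)  # vol(P_j): sum of d_i over nodes in module j
    module_cut = defaultdict(float)  # g_j: weight of edges with one end in module j
    for u, v, w in edges:
        degree[u] += w
        degree[v] += w
        if partition[u] != partition[v]:
            module_cut[partition[u]] += w
            module_cut[partition[v]] += w
    for node, d in degree.items():
        module_vol[partition[node]] += d
    vol = sum(degree.values())

    se = 0.0
    # Node-level term: -(d_i / vol(G)) * log2(d_i / vol(P_j))
    for node, d in degree.items():
        if d > 0:
            se -= d / vol * math.log2(d / module_vol[partition[node]])
    # Module-level term: -(g_j / vol(G)) * log2(vol(P_j) / vol(G))
    for module, mv in module_vol.items():
        g = module_cut[module]
        if g > 0:
            se -= g / vol * math.log2(mv / vol)
    return se
```

Example: two disconnected edges a-b and c-d have H = 1.0 when each edge is its own module, and H = 2.0 when all four nodes share one module. In this case the partition that follows the graph's structure has lower SE.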

c. Detect FRC/supercluster enriched/depleted KOs

01.script_priori_tree/c.KO_compare.ipynb Uses S1-C8 as an example.

  • inputs:
    • data/gcn2008.tsv
    • result/GCN_fix_tree/leaves_cluster.tsv
  • outputs:
    • result/GCN_fix_tree/
      • S1_C8.kos_summary.tsv Statistics of KOs present in S1-C8
      • S1_C8.kos_fisher.tsv Fisher testing results

🔍 Preview of S1_C8.kos_fisher.tsv

KO | S1_C8 Present | S1_C8 Absent | Non S1_C8 Present | Non S1_C8 Absent | Odds Ratio | P-value | Adjusted P-value
K03648 | 6 | 48 | 1706 | 248 | 1.82E-02 | 4.47E-35 | 2.63E-31
K00560 | 5 | 49 | 1576 | 378 | 2.45E-02 | 1.30E-28 | 3.80E-25
K02837 | 6 | 48 | 1543 | 411 | 3.33E-02 | 1.54E-25 | 3.01E-22
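The Fisher testing step can be sketched with the standard library. Each 2×2 table has in-FRC vs. other species as rows and KO present vs. absent as columns, matching the preview columns. For K03648 the odds ratio is (6 × 248) / (48 × 1706) ≈ 1.82E-02, as listed. The preview's adjusted p-values scale roughly as p × m / rank, with m ≈ 5,900 tests. That pattern appears consistent with Benjamini-Hochberg correction, but this is an inference, not a description of what the notebook does. The helper names are hypothetical.

```python
from math import comb
from typing import List, Sequence, Tuple


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Odds ratio and two-sided Fisher exact p-value for the table [[a, b], [c, d]].

    Rows: species in the FRC / other species; columns: KO present / absent.
    """
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Hypergeometric probability of every table with the same margins.
    probs = {x: comb(row1, x) * comb(n - row1, col1 - x) / total for x in range(lo, hi + 1)}
    p_obs = probs[a]
    # Two-sided: sum all tables no more likely than the observed one
    # (a small relative tolerance absorbs floating-point ties).
    p = sum(q for q in probs.values() if q <= p_obs * (1 + 1e-7))
    return odds_ratio, min(p, 1.0)


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for offset, i in enumerate(order):
        rank = m - offset  # 1-based rank in ascending p-value order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

The installed environment already provides `scipy.stats.fisher_exact` and `statsmodels.stats.multitest.multipletests(..., method="fdr_bh")`, which compute the same quantities.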

Evaluation of GCN

01.script_priori_tree/util_evaluation.ipynb Evaluates the features of the GCN following the original study.

  • inputs:

    • data/gcn2008.tsv
    • data/sp_d.tsv
  • outputs:

    • result/GCN_evaluation/evaluation.png The plot of evaluation result

2. Completeness of FRC (02.script_signature_modules/)

Result of manuscript section Functional redundancy hierarchical structure reveals species clusters with distinct functions

a. Compute the module completeness of each taxon in GCN2008

02.script_signature_modules/a.genome_module_completeness.ipynb

  • input:
    • data/module_def0507.tsv
    • data/gcn2008.tsv
  • output:
    • result/signature_modules/genome_module.completeness.tsv Genome module completeness matrix, with species names as row names and KEGG modules as columns.
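KEGG module completeness is typically computed as the fraction of a module's steps covered by the KOs a genome carries. The simplified sketch below assumes each module has already been parsed into steps. Each step is a list of alternative KO sets, and every KO in a set is required (e.g. complex subunits). It does not reproduce parsing of module_def0507.tsv or the notebook's exact rules, such as optional components and nested definitions.

```python
from typing import Iterable, List, Mapping, Set

# Assumed simplified structure: a module is a list of steps, a step is a list of
# alternative KO sets, and an alternative counts only if all of its KOs are present.
Step = List[Set[str]]


def present_kos(gcn_row: Mapping[str, float]) -> Set[str]:
    """KOs with a non-zero copy number in one genome's GCN row."""
    return {ko for ko, copies in gcn_row.items() if copies > 0}


def module_completeness(steps: List[Step], kos: Iterable[str]) -> float:
    """Fraction of module steps satisfied by at least one fully present alternative."""
    if not steps:
        return 0.0
    have = set(kos)
    covered = sum(1 for step in steps if any(alt <= have for alt in step))
    return covered / len(steps)
```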

b. Signature modules of superclusters/FRCs

02.script_signature_modules/b.signature_modules.ipynb (require 02.script_signature_modules/cluster_completeness_testing.R)

  • input:
    • result/GCN_fix_tree/leaves_cluster.tsv
    • result/signature_modules/genome_module.completeness.tsv
  • output:
    • result/signature_modules/
      • *_species.tsv Species involved in comparison with FRC/superclusters annotation
      • *.genome_module.completeness.tsv Split genome module completeness of each supercluster
      • *.module_comp.wilcox.testing.tsv Testing results of module completeness comparison
      • cluster_module_signature.tsv Summary of signature modules of superclusters/FRCs.
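The module completeness comparison is a Wilcoxon rank-sum test, run in R by cluster_completeness_testing.R. For readers working in Python, here is a standard-library sketch of the same U statistic with a normal-approximation p-value. R's wilcox.test uses an exact p-value for small samples without ties, and this sketch applies no tie correction to the variance, so p-values can differ slightly.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Mann-Whitney U of x vs. y and a two-sided p-value
    (normal approximation with continuity correction, no tie correction)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = max(abs(u - mean) - 0.5, 0.0) / sd
    p = math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))
    return u, min(p, 1.0)
```

`scipy.stats.mannwhitneyu` is the usual library equivalent.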

3. FMT (03.script_FMT/)

Scripts of manuscript section Structural entropy of vitamin $K_1$, $K_2$ and $B_2$ biosynthesis FRC in the recipient decreased the fecal microbiota transplantation engraftment efficiency

GCN_fix_tree result is required

  • input:
    • result/GCN_fix_tree/renamed_GCN_tree.newick
    • ../data/sp_d.tsv
    • For FMT1:
      • data/FMT/FMT1/metadata.tsv
      • data/FMT/FMT1/fmt_abd.tsv
      • data/FMT/FMT1/Li.txt
    • For FMT2:
      • data/FMT/FMT2/Eric.txt
      • data/FMT/FMT2/deltat.txt
      • data/FMT/FMT2/triads.txt
      • data/FMT/FMT2/Eric_abd.tsv

a. Multiple regression on nFR

03.script_FMT/a.analysis_nfr*.ipynb Multiple regression of the fraction of donor-specific strains on recipient nFR and days after FMT, at each FRC/supercluster.

b. Multiple regression on SE value

03.script_FMT/b.analysis_se*.ipynb Multiple regression of the fraction of donor-specific strains on recipient SE and days after FMT, at each FRC/supercluster.

c. Multiple regression on FR

03.script_FMT/c.analysis_fr*.ipynb Multiple regression of the fraction of donor-specific strains on recipient FR and days after FMT, at each cluster/supercluster.

  • Output
    • result/FMT/*/*/ (First * can be nFR/SE/FR, second * can be FMT1/FMT2)
      • [cluster].tsv Regression plot data
      • [cluster].pdf Plot of regression
      • p_values.tsv F-test p-value of the regression, plus the coefficients and their p-values

🔍 Preview of [cluster].tsv

sample SE_pre t_post f_ds
FMT1 0.79168257 2 0.302325581
FMT1 0.83223 2 0.233333

🔍 Preview of p_values.tsv

F-pvalue se_co t_co const_co se_p t_p const_p
cluster_S1-C3 0.003328 -0.94618 -0.00035 0.514195 0.00079 0.7069 2.80E-15
cluster_S1-C15 0.019 -1.4490 -0.00035 0.515 0.005268 0.7230 2.90E-14
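The preview columns suggest a model of the form f_ds = const + se_co · SE_pre + t_co · t_post, fitted per FRC/supercluster. The standard-library least-squares sketch below recovers the coefficients and R². It assumes the design matrix has full rank. It does not compute the F-test or the coefficient p-values reported in p_values.tsv.

```python
from typing import List, Sequence, Tuple


def ols(y: Sequence[float], *predictors: Sequence[float]) -> Tuple[List[float], float]:
    """Least-squares fit y = b0 + b1*x1 + ...; returns ([b0, b1, ...], R^2)."""
    n = len(y)
    X = [[1.0] + [float(p[i]) for p in predictors] for i in range(n)]
    k = len(X[0])
    # Normal equations (X^T X) b = X^T y, solved by Gauss-Jordan elimination.
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)] for i in range(k)]
    rhs = [sum(X[r][i] * y[r] for r in range(n)) for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))  # partial pivoting
        A[col], A[pivot] = A[pivot], A[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
                rhs[r] -= factor * rhs[col]
    beta = [rhs[i] / A[i][i] for i in range(k)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot if ss_tot else 1.0
    return beta, r2
```

With statsmodels, which is part of the environment, `sm.OLS(y, sm.add_constant(X)).fit()` exposes `.params`, `.pvalues` and `.f_pvalue`. These appear to correspond to the coefficient, coefficient p-value and F-pvalue columns.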

d. Compute FR at each cluster/supercluster for each timepoint

03.script_FMT/d.analysis_compute_fr*.ipynb

  • Output
    • result/FMT/FR_timepoints/*/ (* can be FMT1/FMT2)
      • fr.tsv FR values of each sample at each timepoint

e. Multiple regression on fd/td/nFR at root

03.script_FMT/e.root_*.ipynb Multiple regression of the fraction of donor-specific strains on fd/td/nFR and days after FMT, computed only at the root.

  • Output
    • result/FMT/root/*/ (* can be FMT1/FMT2; fd is shown as an example, and the nFR and td outputs are analogous)
      • fd.tsv fd values of each sample
      • fd_root.pdf Plot of the regression of fd values
      • fd_p_values.tsv F-test p-value of the regression, plus the coefficients and their p-values
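This README does not define nFR, FR, td and fd. The sketch below assumes the common definitions from Tian et al. (2020). TD is the Gini-Simpson index 1 - Σ p_i². FD is Rao's quadratic entropy Σ_{i≠j} d_ij p_i p_j, computed with the GCN-based distances (sp_d.tsv). Then FR = TD - FD and nFR = FR / TD. The code computes these for one sample.

```python
from typing import Dict, Mapping


def fr_metrics(abundance: Mapping[str, float],
               dist: Mapping[str, Mapping[str, float]]) -> Dict[str, float]:
    """TD (Gini-Simpson), FD (Rao's quadratic entropy), FR = TD - FD, nFR = FR / TD.

    abundance: species -> abundance in one sample (rescaled to sum to 1 here).
    dist: pairwise functional distances in [0, 1], e.g. rows of sp_d.tsv.
    """
    present = {sp: a for sp, a in abundance.items() if a > 0}
    total = sum(present.values())
    p = {sp: a / total for sp, a in present.items()}
    td = 1.0 - sum(v * v for v in p.values())
    fd = sum(dist[i][j] * p[i] * p[j] for i in p for j in p if i != j)
    fr = td - fd
    nfr = fr / td if td > 0 else 0.0
    return {"TD": td, "FD": fd, "FR": fr, "nFR": nfr}
```

To get an FRC-level value, one option (an assumption here) is to pass only the species of that FRC.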

f. Merge and output results

03.script_FMT/f.merge_S4.ipynb

  • Output
    • result/FMT
      • supp_FMT.tsv Regression results for nFR and SE in the two cohorts (Supplementary Table S4)

4. Antibiotic treatment (04.script_Antibiotic/)

Scripts of manuscript section Low preservation of FRCs in the initial state leads to distinct reshaping of the gut microbiome after cefprozil exposure

GCN_fix_tree result is required

a. nFR analysis

04.script_Antibiotic/a.analysis_nFR.ipynb

  • input:
    • data/sp_d.tsv
    • result/GCN_fix_tree/renamed_GCN_tree.newick
    • data/Anti/metadata.tsv
    • data/Anti/abd.tsv
  • output:
    • result/Anti/nFR
      • nfr_df.tsv nFR value of each FRC at each timepoint for each sample
      • cluster_[FRC].pdf Boxplot of the FRC's nFR values at the three timepoints
      • p_value.tsv p-values of the nFR differential tests between the exposed and control groups at each timepoint for each FRC

b. SE analysis

04.script_Antibiotic/b.analysis_SE.ipynb

  • input:
    • data/sp_d.tsv
    • result/GCN_fix_tree/renamed_GCN_tree.newick
    • data/Anti/metadata.tsv
    • data/Anti/abd.tsv
  • output:
    • result/Anti/SE
      • se_df.tsv SE value of each FRC at each timepoint for each sample
      • cluster_[FRC].pdf Boxplot of the FRC's SE values at the three timepoints
      • p_value.tsv p-values of the SE differential tests between the exposed and control groups at each timepoint for each FRC

c. Differential testing of SE/nFR

04.script_Antibiotic/c.fr_differential_testing.ipynb

  • input:
    • result/Anti/nFR/nfr_df.tsv
    • result/Anti/SE/se_df.tsv
    • data/Anti/Anti.group.tsv Group information of samples
  • output:
    • result/Anti/nFR/nfr.EB_EN.differential.tsv
    • result/Anti/SE/SE.EB_EN.differential.tsv

The results are reported as Supplementary Table S5.

🔍 Preview of SE.EB_EN.differential.tsv

FR Group1 Group2 Cluster p_value enriched mean_g1 mean_g2
SE EB_7 EN_7 cluster_S1-C1 0.0135 EB_7 0.3299 0.0997
SE EB_7 EN_7 cluster_S1-C8 0.0415 EN_7 0.0140 0.1108
SE EB_7 EN_7 cluster_S3-C1 0.0296 EB_7 0.0043 0.0004

d. Eigenspecies analysis

04.script_Antibiotic/d.eigenspecies.ipynb (require 04.script_Antibiotic/eigenspecies_utils.py)

  • Prepare a group file for each comparison pair (two groups per comparison)

  • Calculate the eigenspecies of all FRCs in all samples of the two groups

  • Construct an eigenspecies correlation network for each of the two groups

  • Compute the preservation matrix between the two groups' correlation matrices

  • Compare the eigenspecies networks of the two groups

  • input:

    • data/Anti/Anti.group.tsv Group information of samples
    • data/Anti/Anti.compare.list Comparison list of groups, e.g. EB0 EN0
    • result/GCN_fix_tree/leaves_cluster.tsv
    • data/Anti/abd.tsv
  • output for given group {g1} and group {g2}:

    • result/Anti/eigenspecies
      • {g1}.{g2}.group.tsv Samples of two groups
      • {g1}.{g2}.eigenspecies.csv Eigenspecies of FRC
      • {g1}.{g2}.eigenspecies_cor.{g1}.tsv Eigenspecies correlation network of {g1}
      • {g1}.{g2}.eigenspecies_cor.{g2}.tsv Eigenspecies correlation network of {g2}
      • {g1}.{g2}.preserv_matrix.tsv Preservation matrix of two eigenspecies correlation networks
      • {g1}.{g2}.preserv_matrix.png Visualization of preservation matrix
      • {g1}.{g2}.compare_eigenspecies_networks.tsv Differential testing of FRC eigenspecies between two groups
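The steps above can be roughly sketched with the standard library under a WGCNA-style assumption. An FRC's eigenspecies is taken as the first principal component of its standardized species × samples abundance matrix. The preservation between two correlation networks is taken as 1 - |cor1 - cor2| / 2, as in Langfelder and Horvath's eigengene-network analysis. Whether eigenspecies_utils.py makes exactly these choices (centering, scaling, sign convention) is an assumption.

```python
import math
from typing import List, Sequence


def _zscore(row: Sequence[float]) -> List[float]:
    n = len(row)
    mu = sum(row) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in row) / (n - 1))
    return [(v - mu) / sd for v in row]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation of two non-constant vectors."""
    return sum(a * b for a, b in zip(_zscore(x), _zscore(y))) / (len(x) - 1)


def eigenspecies(rows: List[Sequence[float]], iters: int = 200) -> List[float]:
    """Per-sample score on the first principal component of one FRC's
    standardized species x samples matrix (WGCNA eigengene style)."""
    Z = [_zscore(r) for r in rows if len(set(r)) > 1]  # drop constant species
    m, n = len(Z), len(Z[0])
    cov = [[sum(Z[a][s] * Z[b][s] for s in range(n)) / (n - 1) for b in range(m)]
           for a in range(m)]
    v = [1.0 + 0.01 * i for i in range(m)]
    for _ in range(iters):  # power iteration for the leading eigenvector
        w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(v[a] * Z[a][s] for a in range(m)) for s in range(n)]
    # Sign convention: point the eigenspecies along the mean standardized profile.
    mean_profile = [sum(Z[a][s] for a in range(m)) / m for s in range(n)]
    if sum(x * y for x, y in zip(scores, mean_profile)) < 0:
        scores = [-x for x in scores]
    return scores


def correlation_network(eigen: List[Sequence[float]]) -> List[List[float]]:
    """Pairwise Pearson correlations between FRC eigenspecies."""
    return [[pearson(a, b) for b in eigen] for a in eigen]


def preservation(cor1: List[List[float]], cor2: List[List[float]]) -> List[List[float]]:
    """Elementwise preservation 1 - |cor1 - cor2| / 2 of two correlation networks."""
    return [[1 - abs(a - b) / 2 for a, b in zip(r1, r2)] for r1, r2 in zip(cor1, cor2)]
```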

e. Correlation between eigenspecies and taxonomic diversity

04.script_Antibiotic/e.correlation_diversity.ipynb

  • input:
    • data/Anti/Antibiotic.diversity.Frederic.tsv Taxonomic diversity from Supplementary Table 1 of doi: 10.1038/ismej.2015.148
    • result/Anti/eigenspecies/EB_0.EN_0.eigenspecies.csv Eigenspecies of EB and EN at day 0
  • output:
    • Correlations between FRC eigenspecies and diversity, with p-values, reported in the notebook.

5. NAFLD (05.script_NAFLD/)

Scripts of manuscript section FR keystone species in personalized FR network reveals polycentric structure in healthy individuals and monocentric in non-alcoholic steatohepatitis patients

a. Abundance differential testing of each taxon in the NAFLD 16S OTU profile

05.script_NAFLD/abundance_differential_testing.ipynb Tests abundance differences between the NASH and Normal groups.

  • input:
    • data/NAFLD/abd.tsv
    • data/NAFLD/NASH_forward_63_map.txt
  • output:
    • result/NAFLD/NASH.Normal.abundance.wilcox_testing.tsv Differential testing result

b. Analyze the NAFLD dataset using NAFLD GCN

05.script_NAFLD/procedure.ipynb Analyzes the NAFLD dataset using the NAFLD GCN, computes personalized FR networks, and finds keystone clusters in the NASH and Normal groups.

  • input:
    • data/NAFLD/abd.tsv
    • data/NAFLD/NASH_forward_63_map.txt
    • data/NAFLD/NASH_GCN.tsv
  • output:
    • result/NAFLD/cluster_*/ (* can be NASH/Normal)
      • keystone_node.tsv Species and FRCs with their PR score
    • result/NAFLD
      • genome_module.completeness.tsv Completeness of module for each species
      • *.module_comp.wilcox.testing.tsv Testing results of module completeness comparison
      • *_species.tsv Species involved in comparison with FRC/superclusters annotation

6. NSCLC (06.script_NSCLC/)

GCN_fix_tree result is required

Scripts of manuscript section FRCs as immune checkpoint inhibitor indicators can predict patient survival

a. Reproduce original SIG classification

06.script_NSCLC/SIG_SE.ipynb Tests the SE difference between the response and non-response groups at the SIG1/SIG2 clusters defined in the original study, and computes the S score for each sample.

  • input:
    • data/NSCLC/merged_species.txt
    • data/NSCLC/metadata.txt
    • data/NSCLC/sig.txt
    • data/gcn2008.tsv
    • data/sp_d.tsv
    • data/NSCLC/DS1_oncology_clinical_data.csv
  • output:
    • result/NSCLC/SIG_SE/
      • fig_kde_disc.pdf Distribution of TOPOSCORE in the NR and R groups
      • fig_ROC.pdf ROC curve for NR/R classification
      • pred_binary_disc.tsv Predicted class and true group label for each sample
      • NSCLC.pdf FRCs with significant SE differences between the NR and R groups
      • cluster_sp.json Species list of each FRC
      • existed_sp.json Species present in each sample for each FRC

b. Use SE of FRC to classify NR/R groups

06.script_NSCLC/FRC_SE.ipynb Tests the SE difference between the response and non-response groups at each cluster/supercluster, and computes the FR S score for each sample.

  • input:
    • data/NSCLC/merged_species.txt
    • data/NSCLC/metadata.txt
    • data/gcn2008.tsv
    • data/sp_d.tsv
    • data/NSCLC/DS1_oncology_clinical_data.csv
  • output:
    • result/NSCLC/FRC_SE/
      • fig_kde_disc.pdf Distribution of TOPOSCORE in the NR and R groups
      • fig_ROC.pdf ROC curve for NR/R classification
      • OS_curve.pdf Overall survival (OS) curve
      • pred_binary_disc.tsv Predicted class and true group label for each sample
      • NSCLC.pdf FRCs with significant SE differences between the NR and R groups
      • cluster_sp.json Species list of each FRC
      • existed_sp.json Species present in each sample for each FRC

c. Use FRC with SIG as SIG' to classify NR/R groups

06.script_NSCLC/c.combination_S_score.ipynb Computes the combined SIG' S score for each sample.

  • input:
    • result/NSCLC/SIG_SE/cluster_sp.json
    • result/NSCLC/SIG_SE/existed_sp.json
    • result/NSCLC/FRC_SE/existed_sp.json
    • result/NSCLC/FRC_SE/cluster_sp.json
    • data/NSCLC/DS1_oncology_clinical_data.csv
  • output:
    • result/NSCLC/combine/
      • fig_kde_disc.pdf Distribution of TOPOSCORE in the NR and R groups
      • fig_ROC.pdf ROC curve for NR/R classification
      • OS_curve.pdf Overall survival (OS) curve
      • pred_binary_disc.tsv Predicted class and true group label for each sample

The R scripts used for the analysis in the original study are available at https://github.com/valerioiebba/TOPOSCORE/tree/main.
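The ROC plots (fig_ROC.pdf) summarize how well a per-sample score separates the R and NR groups. The area under that curve equals the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counting one half. The sketch below computes that quantity. Which group is treated as positive is left to the caller.

```python
from typing import Sequence


def roc_auc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative), ties = 1/2."""
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```

`sklearn.metrics.roc_auc_score(y_true, y_score)` computes the same value from labels and scores.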

7. Large scale cohort analysis on priori tree (07.script_cohort_FRC/)

GCN_fix_tree result is required

Scripts of manuscript section Structural entropy of FRCs identified as robust phenotype-specific indicators

  • input:
    • data/gcn2008.tsv
    • data/sp_d.tsv
    • result/GCN_fix_tree/renamed_GCN_tree.newick
    • data/{disease}/{cohort}/ (disease include ACVD, CRC, asthma, carcinoma_surgery_history, STH, migraine, BD, IBD, T2D, hypertension, CFS, IGT, adenoma, schizofrenia)
      • metadata.tsv
      • abd.tsv

a. Compute SE values for FRCs

07.script_cohort_FRC/a.analysis_SE.ipynb Computes SE for FRCs in the disease and healthy groups and tests the difference.

b. Compute nFR values for FRCs

07.script_cohort_FRC/b.analysis_nFR.ipynb Computes nFR for FRCs in the disease and healthy groups and tests the difference.

  • output:
    SE is shown as an example; the nFR outputs are analogous.
    • result/large_scale_cohort/{disease}/{cohort}/SE/se_*.tsv SE values of FRCs in the two groups
    • result/large_scale_cohort/{disease}/{disease}_se.pdf Plot of FRCs with a significant SE difference in at least one cohort of the disease
    • result/large_scale_cohort/p_all_cohorts_se.tsv p-values of SE at each FRC in all cohorts
    • result/large_scale_cohort/p_all_cohorts_se.svg Plot of FRCs with a significant SE difference in at least one cohort of any disease

c. Differential testing between disease and health group

07.script_cohort_FRC/c.SE_nFR_differential_testing.ipynb Outputs detailed statistics of SE/nFR.

  • input:

    • result/large_scale_cohort/p_all_cohorts_se.tsv
    • result/large_scale_cohort/p_all_cohorts_nfr.tsv
    • result/large_scale_cohort/{disease}/{cohort}/SE/se_*.tsv
    • result/large_scale_cohort/{disease}/{cohort}/nFR/fr_*.tsv
  • output:

    • result/large_scale_cohort/{disease}/{cohort}/SE/p_detail.tsv Detailed statistics of SE
    • result/large_scale_cohort/{disease}/{cohort}/nFR/p_detail.tsv Detailed statistics of nFR

d. Predict phenotype by SE in FRCs

  • input:
    • result/large_scale_cohort/p_all_cohorts_se.tsv
    • result/large_scale_cohort/{disease}/{cohort}/SE/se_*.tsv

07.script_cohort_FRC/d.CRC_predict_LODO.ipynb Predict CRC by LODO.
07.script_cohort_FRC/d.IBD_predict_CV.ipynb Predict IBD by cross-validation.
07.script_cohort_FRC/d.IBD_predict_LODO.ipynb Predict IBD by LODO.

  • output:
    • result/predict/{cohort}_{prediction_type}.tsv ROC plot of the prediction
    • result/predict/feature_importance_{prediction_type}.tsv Importance of SE in FRCs
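Leave-one-dataset-out (LODO) validation holds out one cohort at a time and trains on the rest. The minimal sketch below shows only the split logic and assumes a mapping from sample ID to cohort name. The classifier and the SE feature table are left out, and `lodo_splits` is a hypothetical helper.

```python
from typing import Dict, Iterator, List, Tuple


def lodo_splits(sample_cohort: Dict[str, str]) -> Iterator[Tuple[str, List[str], List[str]]]:
    """Yield (held-out cohort, training samples, test samples), one split per cohort."""
    for held_out in sorted(set(sample_cohort.values())):
        train = [s for s, c in sample_cohort.items() if c != held_out]
        test = [s for s, c in sample_cohort.items() if c == held_out]
        yield held_out, train, test
```

Each split can feed any classifier trained on the per-FRC SE features. scikit-learn's `LeaveOneGroupOut` produces equivalent splits from index arrays.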

f. Random experiment on phenotype

07.script_cohort_FRC/f.pheno_related.ipynb Randomly shuffles sample labels 100 times to test whether the relation between SE of FRCs and phenotypes could arise by chance.

  • input:
    CRC is shown as an example; IBD is also used in this experiment.

    • result/large_scale_cohort/CRC/p_all_cohorts_se.tsv
    • result/large_scale_cohort/CRC/{cohort}/SE/se_*.tsv
    • data/CRC/{cohort}/
      • metadata.tsv
      • abd.tsv
  • output:

    • result/validation/phenotype_shuffle/CRC/{cohort}/se_{FRC}*.tsv SE of the FRC in one random experiment
    • result/validation/phenotype_shuffle/CRC/{cohort}/pvalues.tsv p-values of the disease vs. healthy group comparison in each of the 100 random experiments
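The label-shuffling experiment can be sketched as a permutation null. The notebook recomputes its own significance test for each shuffle. This sketch instead uses the difference in mean SE between the two label groups as the statistic, which is an assumption, and reports a two-sided empirical p-value. The default seed is arbitrary.

```python
import random
from statistics import mean
from typing import List, Sequence


def label_shuffle_null(values: Sequence[float], labels: Sequence[int],
                       n_shuffles: int = 100, seed: int = 42) -> List[float]:
    """Mean difference (label 1 minus label 0) after each random relabeling."""
    rng = random.Random(seed)
    shuffled = list(labels)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # permute labels; group sizes stay fixed
        g1 = [v for v, lab in zip(values, shuffled) if lab == 1]
        g0 = [v for v, lab in zip(values, shuffled) if lab == 0]
        null.append(mean(g1) - mean(g0))
    return null


def empirical_p(observed: float, null: Sequence[float]) -> float:
    """Two-sided permutation p-value with the usual +1 correction."""
    extreme = sum(1 for d in null if abs(d) >= abs(observed))
    return (extreme + 1) / (len(null) + 1)
```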

8. Personalized FR network analysis (08.script_cohort_keystone/)

Scripts of manuscript section Integrating taxonomic composition to construct a personalized FR network

a. Abundance differential testing of each species

08.script_cohorts_keystone/a.abundance_differential_testing.ipynb Tests abundance differences between the disease and healthy groups.

  • input:
    • data/{disease}/{cohort}/ (disease include ACVD, CRC, asthma, carcinoma_surgery_history, STH, migraine, BD, IBD, T2D, hypertension, CFS, IGT, adenoma, schizofrenia)
      • metadata.tsv
      • abd.tsv
  • output:
    • result/large_scale_cohort/{disease}/{cohort}/{cohort}.abundance.wilcox_testing.tsv Differential testing result

b. Find keystone species and keystone cluster in personalized FR network

08.script_cohorts_keystone/b.personalized_FR_keystone.ipynb

  • input:
    • data/gcn2008.tsv
    • data/sp_d.tsv
    • data/{disease}/{cohort}/ (disease include ACVD, CRC, asthma, carcinoma_surgery_history, STH, migraine, BD, IBD, T2D, hypertension, CFS, IGT, adenoma, schizofrenia)
      • metadata.tsv
      • abd.tsv
  • output:
    • result/large_scale_cohort/{disease}/{cohort}/sp/cluster_*/keystone_node.tsv Species and FRCs with their PR score
    • result/large_scale_cohort/{disease}/{cohort}/sp/layer_0/fr.tsv Personalized FR network
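This README does not spell out the edge weights of the personalized FR network. One construction follows from the FR definitions, assumed here to be TD = 1 - Σ p_i², FD = Σ_{i≠j} d_ij p_i p_j and FR = TD - FD. Because Σ p_i = 1, TD = Σ_{i≠j} p_i p_j, and therefore FR = Σ_{i≠j} p_i p_j (1 - d_ij). Taking w_ij = p_i p_j (1 - d_ij) as edge weights makes the total weight over ordered pairs equal FR. The sketch below builds this network for one sample. It then ranks nodes with weighted PageRank, assuming "PR score" in keystone_node.tsv refers to PageRank. Both the weighting and that reading are assumptions.

```python
from itertools import combinations
from typing import Dict, Mapping

Network = Dict[str, Dict[str, float]]


def personalized_fr_network(abundance: Mapping[str, float],
                            dist: Mapping[str, Mapping[str, float]]) -> Network:
    """Undirected weights w_ij = p_i * p_j * (1 - d_ij) over species present in one sample."""
    present = {sp: a for sp, a in abundance.items() if a > 0}
    total = sum(present.values())
    p = {sp: a / total for sp, a in present.items()}
    net: Network = {sp: {} for sp in p}
    for i, j in combinations(p, 2):
        w = p[i] * p[j] * (1.0 - dist[i][j])
        if w > 0:
            net[i][j] = net[j][i] = w
    return net


def pagerank(net: Network, damping: float = 0.85,
             tol: float = 1e-12, max_iter: int = 1000) -> Dict[str, float]:
    """Weighted PageRank by power iteration; isolated nodes spread their rank uniformly."""
    nodes = list(net)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    strength = {u: sum(net[u].values()) for u in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[u] for u in nodes if strength[u] == 0)
        new = {}
        for v in nodes:
            # Rank flowing into v from each neighbour u, proportional to edge weight.
            inflow = sum(rank[u] * w / strength[u] for u, w in net[v].items())
            new[v] = (1 - damping) / n + damping * (inflow + dangling / n)
        converged = sum(abs(new[u] - rank[u]) for u in nodes) < tol
        rank = new
        if converged:
            break
    return rank
```

networkx, already installed, offers `networkx.pagerank(G, weight="weight")` for the same computation on a `networkx.Graph`.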

c. Summarize the keystone species in different cohort

08.script_cohorts_keystone/c.keystone_summary.ipynb

  • input:
    • result/large_scale_cohort/{disease}/{cohort}/sp/cluster_*/keystone_node.tsv

9. Personalized FR network nestedness (09.script_personalized_FR_nestedness/)

Log effect on personalized FR network

09.script_personalized_FR_nestedness/util_log_effect.ipynb Computes and shows the distribution of the personalized FR network before and after log rescaling and normalization.

  • input:
    • data/gcn2008.tsv
    • data/sp_d.tsv
    • data/CRC/CRC1/metadata.tsv
    • data/CRC/CRC1/abd.tsv

Nestedness of personalized FR network

The personalized FR network (section 8) is required. 09.script_personalized_FR_nestedness/util_nestedness_experiment.ipynb Tests the nestedness of personalized FR networks against null-model experiments.

  • input:

    • result/large_scale_cohort/{disease}/{cohort}/sp/layer_0/fr.tsv (disease include ACVD, CRC, asthma, carcinoma_surgery_history, STH, migraine, BD, IBD, T2D, hypertension, CFS, IGT, adenoma, schizofrenia)
  • output:

    • result/personalized_FR_nestedness/p_df.tsv p-values of the comparison between real FR network nestedness and null-model nestedness
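Nestedness is commonly measured with NODF (Almeida-Neto et al., 2008). The sketch below computes it for a binary matrix. Applying it to the weighted fr.tsv networks would first require binarizing the edges, for example with a weight threshold. Both that step and the null model used in the notebook are assumptions not shown here.

```python
from itertools import combinations
from typing import List, Sequence


def _paired_nestedness(a: Sequence[int], b: Sequence[int]) -> float:
    """Percent of the sparser line's 1s shared with the fuller line (0 if fills are equal)."""
    da, db = sum(a), sum(b)
    if da == db or min(da, db) == 0:
        return 0.0
    big, small = (a, b) if da > db else (b, a)
    overlap = sum(1 for x, y in zip(big, small) if x and y)
    return 100.0 * overlap / sum(small)


def nodf(matrix: List[List[int]]) -> float:
    """NODF of a binary matrix, in [0, 100], averaged over all row pairs and column pairs."""
    cols = [list(c) for c in zip(*matrix)]
    pairs = [_paired_nestedness(a, b) for a, b in combinations(matrix, 2)]
    pairs += [_paired_nestedness(a, b) for a, b in combinations(cols, 2)]
    return sum(pairs) / len(pairs) if pairs else 0.0
```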

10. Eigenspecies analysis (10.script_cohorts_eigenspecies/)

Scripts of manuscript section Eigenspecies of FRCs demonstrate potential as cross-cohort indicators of age and BMI

GCN_fix_tree result is required

10.script_cohorts_eigenspecies/a.eigenspecies.ipynb Analyzes 28 cohorts with the eigenspecies framework.

  • input:

    • result/GCN_fix_tree/leaves_cluster.tsv
    • data/{disease}/{cohort}/abd.tsv (disease include ACVD, CRC, asthma, carcinoma_surgery_history, STH, migraine, BD, IBD, T2D, hypertension, CFS, IGT, adenoma, schizofrenia)
    • data/{disease}/{cohort}/metadata.tsv
  • output:

    • result/large_scale_cohort/{disease}/{cohort}/eigenspecies/
    • Same files as the 04.script_Antibiotic/d.eigenspecies.ipynb output.

10.script_cohorts_eigenspecies/b.confounders.ipynb Correlates eigenspecies with other phenotypes, adjusting for confounders.

  • input:

    • a.eigenspecies.ipynb output
  • output:

    • result/large_scale_cohort/{disease}/{cohort}/phenotype/
      • confounder.stats.tsv Confounder statistics
      • confounder.summary.tsv Summary of confounders
      • duplicate_variables.tsv Discarded duplicate variables
      • recommended_confounders.txt Recommended confounders used in the regression
      • eigenspecies_target_analysis.tsv Regression results
      • significant_associations.tsv Significant associations (FDR-adjusted p-value < 0.05)

11. Simulation (11.script_simulation/)

11.script_simulation/a.se_structure_simulation.ipynb Rearranges edges with large weights so that they fall inside clusters, outside clusters, or randomly, and compares the SE of the resulting networks.

  • input:

    • data/CRC/{cohort}/abd.tsv
    • data/CRC/{cohort}/metadata.tsv
    • result/large_scale_cohort/p_all_cohorts_se.tsv
    • result/GCN_fix_tree/renamed_GCN_tree.newick
  • output:

    • result/validation/se_structure_simulation/CRC/se_p_values.tsv p-values of the comparisons among the three scenarios
    • result/validation/se_structure_simulation/CRC/se_mean_std.tsv Summary statistics of SE values over the 100 experiments under each scenario
    • result/validation/se_structure_simulation/CRC/se_summary.tsv SE values of the 100 experiments under each scenario

11.script_simulation/b.reduction_simulation.ipynb Taxon abundance reduction perturbation simulation

  • input:

    • ../data/NAFLD/abd.tsv
    • ../data/NAFLD/NASH_forward_63_map.txt
    • cancer_causal_threshold_80/ Pre-built causal inference matrix
  • output:

    • result/perturbation_simulation/
      • simulation_params_root_seed_42.tsv Simulation parameters generated with seed = 42
      • simulation_results_root_seed_42_reduction_*.tsv Simulation results with reduction [0.05/0.1/0.15/0.2]
      • FR_boxplot_root_seed_42_all_reductions.png Boxplot of FR for all reductions

Plot tool

Scripts under plot_tools/ are used to plot figures.

  1. init_network.ipynb Initializes the FR network layout.
    • input:
      • data/cMD.select_2008.species_phylum.tsv
    • output:
      • plot_tool/sector_sp_layout.tsv Sector layout file for network plots
  2. NAFLD_draw.ipynb Plots the networks of the NASH and healthy datasets.
    • input:
      • data/NAFLD/taxonomy.tsv
      • plot_tool/NAFLD_layout.tsv
      • result/NAFLD/cluster_*/keystone_node.tsv
      • result/NAFLD/cluster_*/layer_0/fr.tsv
    • output:
      • result/NAFLD/cluster_*/network.svg
    • example figure: NASH network

  3. procedure_draw_network.ipynb Plots personalized FR networks for the disease and healthy groups.
    • input:
      • plot_tool/sector_sp_layout.tsv
      • result/large_scale_cohort/{disease}/{cohort}/sp/cluster_*/keystone_node.tsv
      • result/large_scale_cohort/{disease}/{cohort}/sp/cluster_*/layer_0/fr.tsv
    • output:
      • result/large_scale_cohort/{disease}/{cohort}/sp/cluster_*/network.svg
    • example figure: CRC network

  4. pheno_distribution_se.ipynb Plots SE distributions for the disease and healthy groups.
    • input:
      • result/large_scale_cohort/p_all_cohorts_se.tsv
      • result/large_scale_cohort/{disease}/{cohort}/SE/se_*.tsv
    • output:
      • result/large_scale_cohort/{disease}/{cohort}/SE_distribution/cluster_*/{cohort}.svg
    • example figure: se_distribution

  5. plot_keystone.ipynb Plots keystone results of the phenotype datasets.
    • input:
      • result/GCN_fix_tree/leaves_cluster.tsv
      • result/large_scale_cohort/{disease}/{cohort}/{cohort}.abundance.wilcox_testing.tsv
      • result/large_scale_cohort/{disease}/{cohort}/sp/cluster_*/keystone_node.tsv
    • output:
      • result/keystone/{cohort}.PR.svg
    • example figure: keystone_plot

  6. NSCLC_distribution_se.ipynb Plots SE distributions for the response and non-response groups.
    • input:
      • result/NSCLC/FRC_SE/Disc/se_*.tsv
      • result/NSCLC/FRC_SE/p_detail.tsv
    • output:
      • result/NSCLC/FRC_SE_distribution/{cluster}/Disc.svg
    • example figure: se_distribution_NSCLC

  7. pheno_simulation_plot.ipynb Plots simulated and real p-values.
    • input:
      • result/large_scale_cohort/p_all_cohorts_se.tsv
      • result/large_scale_cohort/{disease}/{cohort}/SE/p_detail.tsv
      • result/validation/phenotype_shuffle/{disease}/{cohort}/pvalues.tsv
    • output:
      • result/validation/phenotype_shuffle/{disease}/{cluster}.svg
    • example figure: pheno_simulation

  8. simu_se_strcture_plot.ipynb Plots simulated SE values.
    • input:
      • result/validation/se_structure_simulation/CRC/se_summary.tsv
      • result/validation/se_structure_simulation/CRC/se_p_values.tsv
    • output:
      • result/validation/se_structure_simulation/CRC/se_summary_scatter.svg
      • result/validation/se_structure_simulation/CRC/se_summary_boxplot.svg
    • example figures: se_simulation_scatter, se_simulation_boxplot
