Create the conda environment (tested with conda 25.1.1):
conda create -n meta_fr python=3.8 r-base=4.2 -c conda-forge
conda activate meta_fr
pip install networkx==2.8.7
pip install ipykernel==5.3.4
pip install ipython==8.12.3
pip install ipython-genutils==0.2.0
pip install matplotlib
pip install pandas==1.1.3
pip install statsmodels==0.14.0
pip install svglib
pip install scikit-learn==1.1.2
pip install scikit-learn-extra==0.2.0
pip install scikit-network==0.27.1
pip install scipy==1.10.1
pip install seaborn==0.12.0
pip install reportlab==3.6.12
pip install lifelines==0.27.8
pip install cliffs-delta
pip install pyseat
pip install numpy==1.22.4
pip install pandas==1.5.2
pip install matplotlib_venn
python -m ipykernel install --user --name meta_fr --display-name "Python (meta_fr)"
In Jupyter Notebook, select the kernel Python (meta_fr).
💡 Note: PySEAT conflicts with the pinned numpy version. Please use numpy 1.22.4 and ignore the following warning shown during installation:
pyseat 0.0.1.4 requires numpy>=1.23.3, but you have numpy 1.22.4 which is incompatible.
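Because pip reports the numpy conflict but still completes the install, it can be worth confirming which versions actually ended up in the environment. The following is a minimal sketch, assuming Python 3.8+ (for `importlib.metadata`); the helper name `check_pins` and the subset of pins checked are illustrative and not part of the repository.

```python
from importlib import metadata


def check_pins(pins, get_version=metadata.version):
    """Return {package: (expected, found)} for every pin that does not match.

    `found` is None when the package is not installed.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


# Subset of the pins from the install commands above.
PINS = {"numpy": "1.22.4", "pandas": "1.5.2", "scipy": "1.10.1"}

if __name__ == "__main__":
    for name, (expected, found) in check_pins(PINS).items():
        print(f"{name}: expected {expected}, found {found}")
```

Run inside the activated meta_fr environment, the script prints nothing when all checked pins match.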
conda install r-effsize r-ggplot2 r-ggpubr r-svglite r-reshape2 r-dplyr r-tidyr r-readxl r-randomForest r-pROC
git clone https://github.com/deepomicslab/FR_Hierarchy_Gut
cd FR_Hierarchy_Gut/
Then unzip data.zip. This creates the data/ directory:
data/
├── gcn2008.tsv # GCN of 2008 species
├── sp_d.tsv # Precomputed distance matrix for 2008 species in GCN
├── module_def0507.tsv # Definition of module in KEGG
├── cMD.select_2008.select_genome.list # Genomes to create GCN2008
├── cMD.select_2008.tax.fullname.txt # Full taxonomy of species
├── cMD.select_2008.species_phylum.tsv # Species phylum matching
│
├── [ACVD, CRC, asthma, carcinoma_surgery_history, STH, migraine, BD, IBD,
│ T2D, hypertension, CFS, IGT, adenoma, schizofrenia]/ # Disease categories
│ ├── [cohort_name1, cohort_name2, ...]/ # Multiple cohorts per disease
│ │ ├── metadata.tsv # Metadata (disease in header)
│ │ └── abd.tsv # Abundance profile (species × samples)
│
├── NAFLD/ # NAFLD dataset
│ ├── NASH_forward_63_map.txt # Metadata of phenotypes for NASH dataset
│ ├── abd.tsv # 16S species level profile
│ ├── NASH_GCN.tsv # GCN of NASH for 16S species name
│ └── taxonomy.tsv # Class family species matching
│
├── Anti/ # Antibiotic treatment dataset
│ ├── metadata.tsv # Metadata
│ ├── abd.tsv # Abundance profile
│ ├── Anti.compare.list # List of group pairs to compare
│ ├── Anti.group.tsv # Group information of samples
│ └── Antibiotic.diversity.Frederic.tsv # Taxonomic diversity (doi: 10.1038/ismej.2015.148, Supplementary Table 1)
│
├── FMT/ # Fecal microbiota transplantation dataset
│ ├── FMT1/
│ │ ├── LiSS_2016.tsv # Species profile (index: species, header: sample name)
│ │ └── Li.txt # Fraction of donor specific strains
│ └── FMT2/
│ ├── Eric_abd.tsv # Species level profile
│ └── Eric.txt # Fraction of donor specific strains
│
└── NSCLC/ # Immunotherapy dataset
├── merged_species.tsv # Species level abundance profile
├── sig.txt # Classification of species in original work
├── metadata.txt # Metadata including cohort
├── DS1_oncology_clinical_data.csv # Metadata including death, os, akk in original work
└── DS5_longitudinal_clinical_data.csv # Metadata including akk level in original work
💡 Data sources:
data/NAFLD/* from doi: 10.1002/imt2.61
data/FMT/FMT1/Li.txt from doi: 10.1038/s41467-020-19940-1
data/FMT/FMT2/Eric.txt from doi: 10.1038/s41467-020-19940-1
data/NSCLC/DS* from doi: 10.1016/j.cell.2024.05.029
data/NSCLC/sig.txt from doi: 10.1016/j.cell.2024.05.029
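After unzipping, the per-disease cohort folders can be enumerated programmatically, which is handy before running the cohort-level notebooks. This is a minimal sketch that relies only on the layout shown above (data/{disease}/{cohort}/ containing metadata.tsv and abd.tsv); the function name `list_cohorts` is hypothetical.

```python
from pathlib import Path


def list_cohorts(data_dir):
    """Return sorted (disease, cohort) pairs whose folder holds both
    metadata.tsv and abd.tsv."""
    root = Path(data_dir)
    found = []
    for meta in sorted(root.glob("*/*/metadata.tsv")):
        cohort_dir = meta.parent
        if (cohort_dir / "abd.tsv").is_file():
            found.append((cohort_dir.parent.name, cohort_dir.name))
    return found


if __name__ == "__main__":
    for disease, cohort in list_cohorts("data"):
        print(disease, cohort)
```

Datasets that keep their files directly under data/ (such as Anti/) are not matched, because the pattern requires two directory levels.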
We highly recommend running the scripts in the directory sequentially in the following order.
Scripts of manuscript section Constructing a priori functional redundancy hierarchical structure of species via structural entropy
01.script_priori_tree/a.compute_distance.ipynb
To start the analysis from the GCN, run this script first to compute the distance matrix, which is written to sp_d.tsv. This takes around 20 minutes. To save time, you can use the precomputed sp_d.tsv in the data/ directory instead.
- input:
  - ../data/gcn2008.tsv: GCN of 2008 species
- output:
  - ../data/sp_d.tsv: Distance matrix
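Whether you recompute sp_d.tsv or use the precomputed copy, a quick sanity check can catch a truncated or mismatched file before the later notebooks consume it. The sketch below assumes the file is a tab-separated matrix with species names in both the header row and the first column; that layout, and the helper names, are assumptions rather than something documented here.

```python
import csv
import math


def load_square_matrix(path):
    """Read a labelled TSV matrix: header row of names, first column of names."""
    with open(path, newline="") as handle:
        rows = [r for r in csv.reader(handle, delimiter="\t") if r]
    col_names = rows[0][1:]
    row_names = [r[0] for r in rows[1:]]
    values = [[float(x) for x in r[1:]] for r in rows[1:]]
    return row_names, col_names, values


def check_distance_matrix(row_names, col_names, values, tol=1e-9):
    """Return a list of problems; an empty list means the matrix looks sane."""
    problems = []
    if row_names != col_names:
        problems.append("row and column labels differ")
    n = len(values)
    if any(len(row) != n for row in values):
        problems.append("matrix is not square")
        return problems
    for i in range(n):
        if abs(values[i][i]) > tol:
            problems.append(f"non-zero diagonal at {row_names[i]}")
        for j in range(i + 1, n):
            if not math.isclose(values[i][j], values[j][i], abs_tol=tol):
                problems.append(f"asymmetric entry {row_names[i]}/{row_names[j]}")
    return problems
```

A call such as `check_distance_matrix(*load_square_matrix("data/sp_d.tsv"))` returning an empty list indicates matching labels, a zero diagonal, and symmetry.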
b. Constructing a priori functional redundancy hierarchical structure of species via structural entropy
01.script_priori_tree/b.GCN_tree.ipynb
💡 Please run this script before the FMT, Antibiotic, and NSCLC analyses, which depend on the prior structure.
- inputs:
  - data/gcn2008.tsv
  - data/sp_d.tsv
- outputs:
  - result/GCN_fix_tree/
    - renamed_GCN_tree.newick.tsv: Tree structure in Newick format
    - leaves_cluster.tsv: Species FRC annotation
🔍 Preview of leaves_cluster.tsv
| species | cluster | supercluster |
|---|---|---|
| s__Rhodococcus_fascians | S2_C1 | S2 |
| s__Nocardia_farcinica | S2_C1 | S2 |
| s__Rhodococcus_hoagii | S2_C1 | S2 |
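Several downstream steps need the species belonging to a given FRC or supercluster. A minimal sketch for grouping them is shown here, assuming leaves_cluster.tsv is tab-separated with exactly the three column names from the preview; the function name `species_by_cluster` is illustrative.

```python
import csv
from collections import defaultdict


def species_by_cluster(path, level="cluster"):
    """Map each cluster (or supercluster) label to its list of species.

    `level` is the column to group by: "cluster" or "supercluster".
    """
    groups = defaultdict(list)
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            groups[row[level]].append(row["species"])
    return dict(groups)
```

For the rows previewed above, grouping by "cluster" would place all three species under S2_C1, and grouping by "supercluster" would place them under S2.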
01.script_priori_tree/c.KO_compare.ipynb
Uses S1-C8 as an example.
- inputs:
  - data/gcn2008.tsv
  - result/GCN_fix_tree/leaves_cluster.tsv
- outputs:
  - result/GCN_fix_tree/
    - S1_C8.kos_summary.tsv: Statistics of KOs present in S1-C8
    - S1_C8.kos_fisher.tsv: Fisher test results
🔍 Preview of S1_C8.kos_fisher.tsv
| KO | S1_C8 Present | S1_C8 Absent | Non S1_C8 Present | Non S1_C8 Absent | Odds Ratio | P-value | Adjusted P-value |
|---|---|---|---|---|---|---|---|
| K03648 | 6 | 48 | 1706 | 248 | 1.82E-02 | 4.47E-35 | 2.63E-31 |
| K00560 | 5 | 49 | 1576 | 378 | 2.45E-02 | 1.30E-28 | 3.80E-25 |
| K02837 | 6 | 48 | 1543 | 411 | 3.33E-02 | 1.54E-25 | 3.01E-22 |
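Each row of the preview is a 2x2 contingency table (present/absent in S1_C8 versus the remaining species). The sketch below shows how the odds ratio and a two-sided Fisher exact p-value can be derived from such counts using only the standard library. It is an illustration, not the notebook's code: the notebook's exact test settings and its multiple-testing correction are not stated here, so the adjusted p-value is not reproduced.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Sample odds ratio of the 2x2 table [[a, b], [c, d]].

    Raises ZeroDivisionError when b * c == 0.
    """
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value: total probability of all tables with
    the same margins that are no more likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c

    def weight(k):
        # Unnormalised hypergeometric weight of a table whose top-left cell is k.
        return comb(row1, k) * comb(row2, col1 - k)

    observed = weight(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    extreme = sum(weight(k) for k in range(lo, hi + 1) if weight(k) <= observed)
    return extreme / comb(row1 + row2, col1)
```

With a = S1_C8 Present, b = S1_C8 Absent, c = Non S1_C8 Present and d = Non S1_C8 Absent, the K03648 row gives 6 × 248 / (48 × 1706) ≈ 1.82E-02, which matches the Odds Ratio column.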
01.script_priori_tree/util_evaluation.ipynb: Evaluates the features of the GCN, following the original study.
- inputs:
  - data/gcn2008.tsv
  - data/sp_d.tsv
- outputs:
  - result/GCN_evaluation/evaluation.png: Plot of the evaluation result
Result of manuscript section Functional redundancy hierarchical structure reveals species clusters with distinct functions
02.script_signature_modules/a.genome_module_completeness.ipynb
- input:
  - data/module_def0507.tsv
  - data/gcn2008.tsv
- output:
  - result/signature_modules/genome_module.completeness.tsv: Genome module completeness matrix, with species names as row names and KEGG modules as columns.
02.script_signature_modules/b.signature_modules.ipynb
(requires 02.script_signature_modules/cluster_completeness_testing.R)
- input:
  - result/GCN_fix_tree/leaves_cluster.tsv
  - result/signature_modules/genome_module.completeness.tsv
- output:
  - result/signature_modules/
    - *_species.tsv: Species involved in the comparison, with FRC/supercluster annotation
    - *.genome_module.completeness.tsv: Genome module completeness split by supercluster
    - *.module_comp.wilcox.testing.tsv: Test results of the module completeness comparison
    - cluster_module_signature.tsv: Summary of signature modules of superclusters/FRCs
Scripts of manuscript section Structural entropy of vitamin $K_1$, $K_2$ and $B_2$ biosynthesis FRC in the recipient decreased the fecal microbiota transplantation engraftment efficiency
GCN_fix_tree result is required
- input:
  - result/GCN_fix_tree/renamed_GCN_tree.newick
  - ../data/sp_d.tsv
  - For FMT1:
    - data/FMT/FMT1/metadata.tsv
    - data/FMT/FMT1/fmt_abd.tsv
    - data/FMT/FMT1/Li.txt
  - For FMT2:
    - data/FMT/FMT2/Eric.txt
    - data/FMT/FMT2/deltat.txt
    - data/FMT/FMT2/triads.txt
    - data/FMT/FMT2/Eric_abd.tsv
03.script_FMT/a.analysis_nfr*.ipynb
Multiple regression on nFR, days after FMT and fraction at each FRC/supercluster.
03.script_FMT/b.analysis_se*.ipynb
Multiple regression on SE value, days after FMT and fraction at each FRC/supercluster.
03.script_FMT/c.analysis_fr*.ipynb
Multiple regression on FR, days after FMT and fraction at each cluster/supercluster.
- Output
  - result/FMT/*/*/ (the first * can be nFR/SE/FR, the second * can be FMT1/FMT2)
    - [cluster].tsv: Regression plot data
    - [cluster].pdf: Plot of the regression
    - p_values.tsv: F-test p-value of the regression, coefficients and their p-values
🔍 Preview of [cluster].tsv
| sample | SE_pre | t_post | f_ds |
|---|---|---|---|
| FMT1 | 0.79168257 | 2 | 0.302325581 |
| FMT1 | 0.83223 | 2 | 0.233333 |
🔍 Preview of p_values.tsv
| | F-pvalue | se_co | t_co | const_co | se_p | t_p | const_p |
|---|---|---|---|---|---|---|---|
| cluster_S1-C3 | 0.003328 | -0.94618 | -0.00035 | 0.514195 | 0.00079 | 0.7069 | 2.80E-15 |
| cluster_S1-C15 | 0.019 | -1.4490 | -0.00035 | 0.515 | 0.005268 | 0.7230 | 2.90E-14 |
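The column names in the two previews (SE_pre, t_post, f_ds and se_co, t_co, const_co) suggest a linear model of the form f_ds = se_co · SE_pre + t_co · t_post + const_co. The sketch below fits such a model by ordinary least squares with the standard library only. It is an assumption-laden illustration: how the notebooks actually fit the model is not shown here, the function names are hypothetical, and the F-test and coefficient p-values are not computed.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution


def fit_ols(se_pre, t_post, f_ds):
    """Least-squares fit of f_ds = se_co * SE_pre + t_co * t_post + const_co."""
    design = [[s, t, 1.0] for s, t in zip(se_pre, t_post)]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(design, f_ds)) for i in range(3)]
    se_co, t_co, const_co = solve_linear_system(xtx, xty)
    return {"se_co": se_co, "t_co": t_co, "const_co": const_co}
```

A negative se_co, as in the previewed rows, would correspond to a lower donor-specific fraction when the pre-FMT SE is higher.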
03.script_FMT/d.analysis_compute_fr*.ipynb
- Output
  - result/FMT/FR_timepoints/*/ (* can be FMT1/FMT2)
    - fr.tsv: FR values of each sample at each timepoint
03.script_FMT/e.root_*.ipynb
Multiple regression on fd/td/nFR at root, days after FMT and fraction only at root.
- Output
  - result/FMT/root/*/ (* can be FMT1/FMT2; fd is used as an example here, nfr and td are similar)
    - fd.tsv: fd values of each sample
    - fd_root.pdf: Plot of the regression on fd values
    - fd_p_values.tsv: F-test p-value of the regression, coefficients and their p-values
03.script_FMT/f.merge_S4.ipynb
- Output
  - result/FMT
    - supp_FMT.tsv: Regression results for nFR and SE in the two cohorts. Reported as Supplementary Table S4.
Scripts of manuscript section Low preservation of FRCs in the initial state leads to distinct reshaping of the gut microbiome after cefprozil exposure
GCN_fix_tree result is required
04.script_Antibiotic/a.analysis_nFR.ipynb
- input:
  - data/sp_d.tsv
  - result/GCN_fix_tree/renamed_GCN_tree.newick
  - data/Anti/metadata.csv
  - data/Anti/abd.csv
- output:
  - result/Anti/nFR
    - nfr_df.tsv: nFR value of each FRC at each timepoint for each sample
    - cluster_[FRC].pdf: Boxplot of the nFR values of the FRC at three timepoints
    - p_value.tsv: p-values of the nFR differential test between the exposed and control groups at each timepoint for each FRC
04.script_Antibiotic/b.analysis_SE.ipynb
- input:
  - data/sp_d.tsv
  - result/GCN_fix_tree/renamed_GCN_tree.newick
  - data/Anti/metadata.csv
  - data/Anti/abd.csv
- output:
  - result/Anti/SE
    - se_df.tsv: SE value of each FRC at each timepoint for each sample
    - cluster_[FRC].pdf: Boxplot of the SE values of the FRC at three timepoints
    - p_value.tsv: p-values of the SE differential test between the exposed and control groups at each timepoint for each FRC
04.script_Antibiotic/c.fr_differential_testing.ipynb
- input:
  - result/Anti/nFR/nfr_df.tsv
  - result/Anti/SE/se_df.tsv
  - data/Anti/Anti.group.tsv: Group information of samples
- output:
  - result/Anti/nFR/nfr.EB_EN.differential.tsv
  - result/Anti/SE/SE.EB_EN.differential.tsv
Reported as Supplementary Table S5.
🔍 Preview of SE.EB_EN.differential.tsv
| FR | Group1 | Group2 | Cluster | p_value | enriched | mean_g1 | mean_g2 |
|---|---|---|---|---|---|---|---|
| SE | EB_7 | EN_7 | cluster_S1-C1 | 0.0135 | EB_7 | 0.3299 | 0.0997 |
| SE | EB_7 | EN_7 | cluster_S1-C8 | 0.0415 | EN_7 | 0.0140 | 0.1108 |
| SE | EB_7 | EN_7 | cluster_S3-C1 | 0.0296 | EB_7 | 0.0043 | 0.0004 |
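In the preview, the enriched column coincides with the group that has the larger mean (for example EB_7 for 0.3299 versus 0.0997, and EN_7 for 0.0140 versus 0.1108). A minimal sketch under that assumption follows; how the notebook actually derives this column is not shown here, and the function name is hypothetical.

```python
from statistics import mean


def enriched_group(group1, values1, group2, values2):
    """Return (label, mean_g1, mean_g2); label is the group with the larger mean.

    Ties are reported as None, since neither group is enriched.
    """
    mean_g1, mean_g2 = mean(values1), mean(values2)
    if mean_g1 == mean_g2:
        return None, mean_g1, mean_g2
    label = group1 if mean_g1 > mean_g2 else group2
    return label, mean_g1, mean_g2
```

The per-sample SE or nFR values of one FRC in each group would be passed as `values1` and `values2`.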
04.script_Antibiotic/d.eigenspecies.ipynb
(requires 04.script_Antibiotic/eigenspecies_utils.py)
- prepare the group file for comparison pairs (two groups per comparison)
- calculate the eigenspecies of all FRCs in all samples of the two groups
- construct an eigenspecies correlation network for each of the two groups
- compute the preservation matrix of the correlation matrices between the two groups
- compare the eigenspecies network differences between the two groups
- input:
  - data/Anti/Anti.group.tsv: Group information of samples
  - data/Anti/Anti.compare.list: Comparison list of groups, e.g. EB0 EN0
  - result/GCN_fix_tree/leaves_cluster.tsv
  - data/Anti/abd.tsv
- output for given groups {g1} and {g2}:
  - result/Anti/eigenspecies
    - {g1}.{g2}.group.tsv: Samples of the two groups
    - {g1}.{g2}.eigenspecies.csv: Eigenspecies of FRCs
    - {g1}.{g2}.eigenspecies_cor.{g1}.tsv: Eigenspecies correlation network of {g1}
    - {g1}.{g2}.eigenspecies_cor.{g2}.tsv: Eigenspecies correlation network of {g2}
    - {g1}.{g2}.preserv_matrix.tsv: Preservation matrix of the two eigenspecies correlation networks
    - {g1}.{g2}.preserv_matrix.png: Visualization of the preservation matrix
    - {g1}.{g2}.compare_eigenspecies_networks.tsv: Differential testing of FRC eigenspecies between the two groups
04.script_Antibiotic/e.correlation_diversity.ipynb
- input:
  - data/Anti/Antibiotic.diversity.Frederic.tsv: Taxonomic diversity provided in 10.1038/ismej.2015.148, Supplementary Table 1
  - result/Anti/eigenspecies/EB_0.EN_0.eigenspecies.csv: Eigenspecies of EB and EN at day 0
- output:
  - Correlation of FRC and diversity, with p-values, shown in the notebook.
Scripts of manuscript section FR keystone species in personalized FR network reveals polycentric structure in healthy individuals and monocentric in non-alcoholic steatohepatitis patients
05.script_NAFLD/abundance_differential_testing.ipynb: Tests the difference between the NASH and Normal groups.
- input:
  - data/NAFLD/abd.tsv
  - data/NAFLD/NASH_forward_63_map.txt
- output:
  - result/NAFLD/NASH.Normal.abundance.wilcox_testing.tsv: Differential testing result
05.script_NAFLD/procedure.ipynb: Analyzes the NAFLD dataset using the NAFLD GCN, computes the personalized FR network, and finds keystone clusters in the NASH and Normal groups.
- input:
  - data/NAFLD/abd.tsv
  - data/NAFLD/NASH_forward_63_map.txt
  - data/NAFLD/NASH_GCN.tsv
- output:
  - result/NAFLD/cluster_*/ (* can be NASH/Normal)
    - keystone_node.tsv: Species and FRCs with their PR scores
  - result/NAFLD
    - genome_module.completeness.tsv: Module completeness for each species
    - *.module_comp.wilcox.testing.tsv: Test results of the module completeness comparison
    - *_species.tsv: Species involved in the comparison, with FRC/supercluster annotation
GCN_fix_tree result is required
Scripts of manuscript section FRCs as immune checkpoint inhibitor indicators can predict patient survival
06.script_NSCLC/SIG_SE.ipynb: Tests the difference in SE between the response and non-response groups at the SIG1/SIG2 clusters proposed in the original study, and computes the S score for each sample.
- input:
  - data/NSCLC/merged_species.txt
  - data/NSCLC/metadata.txt
  - data/NSCLC/sig.txt
  - data/gcn2008.tsv
  - data/sp_d.tsv
  - data/NSCLC/DS1_oncology_clinical_data.csv
- output:
  - result/NSCLC/SIG_SE/
    - fig_kde_disc.pdf: Distribution of TOPOSCORE in the NR and R groups
    - fig_ROC.pdf: ROC curve for NR/R classification
    - pred_binary_disc.tsv: Classification result and true group label for each sample
    - NSCLC.pdf: FRCs with a significant SE difference between the NR and R groups
    - cluster_sp.json: Species list of each FRC
    - existed_sp.json: Species present in each sample for each FRC
06.script_NSCLC/FRC_SE.ipynb: Tests the difference in SE between the response and non-response groups at each cluster/supercluster, and computes the FR S score for each sample.
- input:
  - data/NSCLC/merged_species.txt
  - data/NSCLC/metadata.txt
  - data/gcn2008.tsv
  - data/sp_d.tsv
  - data/NSCLC/DS1_oncology_clinical_data.csv
- output:
  - result/NSCLC/FRC_SE/
    - fig_kde_disc.pdf: Distribution of TOPOSCORE in the NR and R groups
    - fig_ROC.pdf: ROC curve for NR/R classification
    - OS_curve.pdf: Plot of the OS curve
    - pred_binary_disc.tsv: Classification result and true group label for each sample
    - NSCLC.pdf: FRCs with a significant SE difference between the NR and R groups
    - cluster_sp.json: Species list of each FRC
    - existed_sp.json: Species present in each sample for each FRC
06.script_NSCLC/c.combination_S_score.ipynb: Computes the combined sig' S score for each sample.
- input:
  - result/NSCLC/SIG_SE/cluster_sp.json
  - result/NSCLC/SIG_SE/existed_sp.json
  - result/NSCLC/FRC_SE/existed_sp.json
  - result/NSCLC/FRC_SE/cluster_sp.json
  - data/NSCLC/DS1_oncology_clinical_data.csv
- output:
  - result/NSCLC/combine/
    - fig_kde_disc.pdf: Distribution of TOPOSCORE in the NR and R groups
    - fig_ROC.pdf: ROC curve for NR/R classification
    - OS_curve.pdf: Plot of the OS curve
    - pred_binary_disc.tsv: Classification result and true group label for each sample
The R scripts used for the analysis in the original study are provided at https://github.com/valerioiebba/TOPOSCORE/tree/main.
GCN_fix_tree result is required
Scripts of manuscript section Structural entropy of FRCs identified as robust phenotype-specific indicators
- input:
  - data/gcn2008.tsv
  - data/sp_d.tsv
  - result/GCN_fix_tree/renamed_GCN_tree.newick
  - data/{disease}/{cohort}/ (diseases include ACVD, CRC, asthma, carcinoma_surgery_history, STH, migraine, BD, IBD, T2D, hypertension, CFS, IGT, adenoma, schizofrenia)
    - metadata.tsv
    - abd.tsv
07.script_cohort_FRC/a.analysis_SE.ipynb
Computes SE for FRCs in the disease and healthy groups and tests the difference.
07.script_cohort_FRC/b.analysis_nFR.ipynb
Computes nFR for FRCs in the disease and healthy groups and tests the difference.
- output (SE is used as an example; nFR is similar):
  - result/large_scale_cohort/{disease}/{cohort}/SE/se_*.tsv: SE values of FRCs in the two groups
  - result/large_scale_cohort/{disease}/{disease}_se.pdf: Plot of FRCs with a significant difference in SE in at least one cohort of the disease
  - result/large_scale_cohort/p_all_cohorts_se.tsv: p-values of SE at each FRC in all cohorts
  - result/large_scale_cohort/p_all_cohorts_se.svg: Plot of FRCs with a significant difference in SE in at least one cohort across all diseases
07.script_cohort_FRC/c.SE_nFR_differential_testing.ipynb
Outputs detailed statistics of SE/nFR.
- input:
  - result/large_scale_cohort/p_all_cohorts_se.tsv
  - result/large_scale_cohort/p_all_cohorts_nfr.tsv
  - result/large_scale_cohort/{disease}/{cohort}/SE/se_*.tsv
  - result/large_scale_cohort/{disease}/{cohort}/nFR/fr_*.tsv
- output:
  - result/large_scale_cohort/{disease}/{cohort}/SE/p_detail.tsv: Statistics of SE
  - result/large_scale_cohort/{disease}/{cohort}/nFR/p_detail.tsv: Statistics of nFR
07.script_cohort_FRC/d.CRC_predict_LODO.ipynb Predict CRC by LODO.
07.script_cohort_FRC/d.IBD_predict_CV.ipynb Predict IBD by cross-validation.
07.script_cohort_FRC/d.IBD_predict_LODO.ipynb Predict IBD by LODO.
- output:
  - result/predict/{cohort}_{prediction_type}.tsv: ROC plot of the prediction
  - result/predict/feature_importance_{prediction_type}.tsv: Importance of SE in FRCs
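Assuming LODO stands for leave-one-dataset-out, each cohort is held out in turn while a model is trained on the remaining cohorts of the same disease. The sketch below only shows the splitting logic; the function name and the input structure (a mapping from cohort name to sample IDs) are hypothetical, and the classifier used in the notebooks is not reproduced.

```python
def lodo_splits(samples_by_cohort):
    """Yield (held_out, train_samples, test_samples) once per cohort."""
    for held_out in sorted(samples_by_cohort):
        train = [
            sample
            for cohort in sorted(samples_by_cohort)
            if cohort != held_out
            for sample in samples_by_cohort[cohort]
        ]
        yield held_out, train, list(samples_by_cohort[held_out])


if __name__ == "__main__":
    # Hypothetical cohorts and sample IDs, for illustration only.
    cohorts = {"CRC1": ["a", "b"], "CRC2": ["c"], "CRC3": ["d"]}
    for held_out, train, test in lodo_splits(cohorts):
        print(held_out, "train:", train, "test:", test)
```

In contrast to ordinary cross-validation, no sample from the held-out cohort ever appears in the training set, so the score reflects transfer across cohorts.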
07.script_cohort_FRC/f.pheno_related.ipynb: Randomly shuffles the sample labels 100 times to verify the relation between the SE of FRCs and phenotypes.
- input (CRC is used as an example; IBD is also used in this experiment):
  - result/large_scale_cohort/CRC/p_all_cohorts_se.tsv
  - result/large_scale_cohort/CRC/{cohort}/SE/se_*.tsv
  - data/CRC/{cohort}/
    - metadata.tsv
    - abd.tsv
- output:
  - result/validation/phenotype_shuffle/CRC/{cohort}/se_{FRC}*.tsv: SE of one random experiment for the FRC
  - result/validation/phenotype_shuffle/CRC/{cohort}/pvalues.tsv: p-values of the difference between the disease and healthy groups across the 100 random experiments
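The label-shuffling idea can be sketched as follows: permute the disease/health labels, repeat the group comparison on each permuted labeling, and see how often a "significant" result still appears. This is a minimal illustration under stated assumptions; the seed, the 0.05 threshold, and both function names are hypothetical, and the statistical test applied to each shuffle in the notebook is not reproduced.

```python
import random


def shuffled_labelings(labels, n_shuffles=100, seed=0):
    """Return n_shuffles random permutations of the sample labels."""
    rng = random.Random(seed)
    labelings = []
    for _ in range(n_shuffles):
        permuted = list(labels)
        rng.shuffle(permuted)
        labelings.append(permuted)
    return labelings


def fraction_significant(p_values, alpha=0.05):
    """Share of shuffled runs whose p-value falls below alpha."""
    return sum(p < alpha for p in p_values) / len(p_values)
```

If the SE difference is genuinely tied to the phenotype, the p-value obtained with the real labels should be much smaller than most p-values from the shuffled labelings.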
Scripts of manuscript section Integrating taxonomic composition to construct a personalized FR network
08.script_cohorts_keystone/a.abundance_differential_testing.ipynb: Tests the abundance difference between the disease and healthy groups.
- input:
  - data/{disease}/{cohort}/ (diseases include ACVD, CRC, asthma, carcinoma_surgery_history, STH, migraine, BD, IBD, T2D, hypertension, CFS, IGT, adenoma, schizofrenia)
    - metadata.tsv
    - abd.tsv
- output:
  - result/large_scale_cohort/{disease}/{cohort}/{cohort}.abundance.wilcox_testing.tsv: Differential testing result
08.script_cohorts_keystone/b.personalized_FR_keystone.ipynb
- input:
  - data/gcn2008.tsv
  - data/sp_d.tsv
  - data/{disease}/{cohort}/ (diseases include ACVD, CRC, asthma, carcinoma_surgery_history, STH, migraine, BD, IBD, T2D, hypertension, CFS, IGT, adenoma, schizofrenia)
    - metadata.tsv
    - abd.tsv
- output:
  - result/large_scale_cohort/{disease}/{cohort}/sp/cluster_*/keystone_node.tsv: Species and FRCs with their PR scores
  - result/large_scale_cohort/{disease}/{cohort}/sp/layer_0/fr.tsv: Personalized FR network
08.script_cohorts_keystone/c.keystone_summary.ipynb
- input:
  - result/large_scale_cohort/{disease}/{cohort}/sp/cluster_*/keystone_node.tsv
09.script_personalized_FR_nestedness/util_log_effect.ipynb: Computes and shows the effect of log rescaling and normalization on the distribution of the personalized FR network.
- input:
  - data/gcn2008.tsv
  - data/sp_d.tsv
  - data/CRC/CRC1/metadata.tsv
  - data/CRC/CRC1/abd.tsv
Personalized FR network is required.
09.script_personalized_FR_nestedness/util_nestedness_experiment.ipynb: Tests the nestedness of the personalized FR network against NULL experiments.
- input:
  - result/large_scale_cohort/{disease}/{cohort}/sp/layer_0/fr.tsv (diseases include ACVD, CRC, asthma, carcinoma_surgery_history, STH, migraine, BD, IBD, T2D, hypertension, CFS, IGT, adenoma, schizofrenia)
- output:
  - result/personalized_FR_nestedness/p_df.tsv: p-values of the comparison between the real FR network nestedness and the NULL model nestedness
Result - Eigenspecies of FRCs demonstrate potential as cross-cohort indicators of age and BMI
GCN_fix_tree result is required
10.script_cohorts_eigenspecies/a.eigenspecies.ipynb
Analyzes 28 cohorts with the eigenspecies framework.
- input:
  - result/GCN_fix_tree/leaves_cluster.tsv
  - data/{disease}/{cohort}/abd.tsv (diseases include ACVD, CRC, asthma, carcinoma_surgery_history, STH, migraine, BD, IBD, T2D, hypertension, CFS, IGT, adenoma, schizofrenia)
  - data/{disease}/{cohort}/metadata.tsv
- output:
  - result/large_scale_cohort/{disease}/{cohort}/eigenspecies/
    - same as the output of 04.script_Antibiotic/d.eigenspecies.ipynb
10.script_cohorts_eigenspecies/b.confounders.ipynb
Correlation of eigenspecies and other phenotypes, adjusted for confounders.
- input:
  - output of a.eigenspecies.ipynb
- output:
  - result/large_scale_cohort/{disease}/{cohort}/phenotype/
    - confounder.stats.tsv: Confounder statistics
    - confounder.summary.tsv: Summary of confounders
    - duplicate_variables.tsv: Duplicate variables that were discarded
    - recommended_confounders.txt: Recommended confounders used in regression
    - eigenspecies_target_analysis.tsv: Regression results
    - significant_associations.tsv: Significant associations with FDR-adjusted p-value < 0.05
11.script_simulation/a.se_structure_simulation.ipynb: Rearranges edges with large weights, placing them inside clusters, outside clusters, or randomly, and compares the SE of the resulting networks.
- input:
  - data/CRC/{cohort}/abd.tsv
  - data/CRC/{cohort}/metadata.tsv
  - result/large_scale_cohort/p_all_cohorts_se.tsv
  - result/GCN_fix_tree/renamed_GCN_tree.newick
- output:
  - result/validation/se_structure_simulation/CRC/se_p_values.tsv: p-values of the comparison between the three situations
  - result/validation/se_structure_simulation/CRC/se_mean_std.tsv: Summary statistics of the SE values of the 100 experiments under the three situations
  - result/validation/se_structure_simulation/CRC/se_summary.tsv: SE values of the 100 experiments under the three situations
11.script_simulation/b.reduction_simulation.ipynb
Taxonomy abundance reduction permutation simulation
- input:
  - ../data/NAFLD/abd.tsv
  - ../data/NAFLD/NASH_forward_63_map.txt
  - cancer_causal_threshold_80/: Pre-built causal inference matrix
- output:
  - result/perturbation_simulation/
    - simulation_params_root_seed_42.tsv: Simulation parameters generated with seed = 42
    - simulation_results_root_seed_42_reduction_*.tsv: Simulation results with reduction [0.05/0.1/0.15/0.2]
    - FR_boxplot_root_seed_42_all_reductions.png
Scripts under plot_tools/ are used to plot figures.
init_network.ipynb: Initializes the FR network layout.
input:
- data/cMD.select_2008.species_phylum.tsv
output:
- plot_tool/sector_sp_layout.tsv: Sector layout file for network plots
NAFLD_draw.ipynb: Plots networks of the NASH and healthy datasets.
input:
- NAFLD/taxonomy.tsv
- plot_tool/NAFLD_layout.tsv
- result/NAFLD/cluster_*/keystone_node.tsv
- result/NAFLD/cluster_*/layer_0/fr.tsv
output:
- result/NAFLD/cluster_*/network.svg
procedure_draw_network.ipynb: Plots the personalized FR network for the disease and healthy groups.
input:
- plot_tool/sector_sp_layout.tsv
- result/large_scale_cohort/{disease}/{cohort}/sp/cluster_*/keystone_node.tsv
- result/large_scale_cohort/{disease}/{cohort}/sp/cluster_*/layer_0/fr.tsv
output:
- result/large_scale_cohort/{disease}/{cohort}/sp/cluster_*/network.svg
pheno_distribution_se.ipynb: Plots the SE distribution for the disease and healthy groups.
input:
- result/large_scale_cohort/p_all_cohorts_se.tsv
- result/large_scale_cohort/{disease}/{cohort}/SE/se_*.tsv
output:
- result/large_scale_cohort/{disease}/{cohort}/SE_distribution/cluster_*/{cohort}.svg
plot_keystone.ipynb.ipynb: Plots the keystone results of the phenotype datasets.
input:
- result/GCN_fix_tree/leaves_cluster.tsv
- result/large_scale_cohort/{disease}/{cohort}/{cohort}.abundance.wilcox_testing.tsv
- result/large_scale_cohort/{disease}/{cohort}/sp/cluster_*/keystone_node.tsv
output:
- result/keystone/{cohort}.PR.svg
NSCLC_distribution_se.ipynb: Plots the SE distribution for the response and non-response groups.
input:
- result/NSCLC/FRC_SE/Disc/se_*.tsv
- result/NSCLC/FRC_SE/p_detail.tsv
output:
- result/NSCLC/FRC_SE_distribution/{cluster}/Disc.svg
pheno_simulation_plot.ipynb: Plots simulated and real p-values.
input:
- result/large_scale_cohort/p_all_cohorts_se.tsv
- result/large_scale_cohort/{disease}/{cohort}/SE/p_detail.tsv
- result/validation/phenotype_shuffle/{disease}/{cohort}/pvalues.tsv
output:
- result/validation/phenotype_shuffle/{disease}/{cluster}.svg
simu_se_strcture_plot.ipynb: Plots simulated SE values.
input:
- result/validation/se_structure_simulation/CRC/se_summary.tsv
- result/validation/se_structure_simulation/CRC/se_p_values.tsv
output:
- result/validation/se_structure_simulation/CRC/se_summary_scatter.svg
- result/validation/se_structure_simulation/CRC/se_summary_boxplot.svg