Hello! Thank you very much for creating and maintaining this package.
I'm currently using mashr to combine differentially expressed genes across multiple cell types and conditions. Since the cell types come from the same patients, I’m including a correlation matrix and testing different approaches (both null tests and EM), as well as comparing data-driven matrices (cov_pca and cov_flash).
I have a few questions and an issue to report:
1) Choosing the number of PCs in cov_pca
Do you have a general rule for deciding how many PCs to use? I have up to 44 conditions (columns) in my bhat / shat matrices, though I might not include all of them in a single analysis.
In your simulations, you had 5 conditions and used 5 PCs. Does that mean the number of PCs should equal the number of conditions, or are 5 PCs generally sufficient even for larger numbers of conditions?
2) EM method for correlated measurements
When running mash_estimate_corr_em, should one include the canonical matrices (as in the vignette), or is it preferable to use only data-driven matrices, or both?
3) Error when running cov_flash
I’ve been trying to follow the vignette for flash but I’m encountering the following error/output:
cov_flash(D.simple$strong, factors="nonneg", tag="non_neg")
Adding factor 1 to flash object...
Adding factor 2 to flash object...
Adding factor 3 to flash object...
Adding factor 4 to flash object...
Adding factor 5 to flash object...
Adding factor 6 to flash object...
Adding factor 7 to flash object...
Adding factor 8 to flash object...
Adding factor 9 to flash object...
Adding factor 10 to flash object...
Adding factor 11 to flash object...
Adding factor 12 to flash object...
Adding factor 13 to flash object...
Adding factor 14 to flash object...
Adding factor 15 to flash object...
Adding factor 16 to flash object...
Adding factor 17 to flash object...
Adding factor 18 to flash object...
Adding factor 19 to flash object...
Adding factor 20 to flash object...
Adding factor 21 to flash object...
Adding factor 22 to flash object...
Adding factor 23 to flash object...
Adding factor 24 to flash object...
Adding factor 25 to flash object...
Adding factor 26 to flash object...
Adding factor 27 to flash object...
Adding factor 28 to flash object...
Factor doesn't significantly increase objective and won't be added.
Wrapping up...
Done.
Backfitting 27 factors (tolerance: 1.61e-03)...
Difference between iterations is within 1.0e+03...
Difference between iterations is within 1.0e+02...
Difference between iterations is within 1.0e+01...
Error in if (any(s == 0)) { : missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In simpute.als(x, J, thresh, lambda, maxit, trace.it, warm.start, :
Convergence not achieved by 100 iterations
2: In handle_standard_errors(x, s) :
Nonpositive SEs have been replaced by small positive SEs.
Both my bhat and shat matrices have no missing values, and shat contains no negative entries, so I’m unsure what might be triggering this issue. The cov_pca function runs successfully with the same input, suggesting the issue may be specific to cov_flash.
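For reference, here is a minimal sketch of the kind of input check described above. The names `Bhat`, `Shat`, and `check_inputs` are placeholders, and the matrices are toy data rather than my real inputs. It uses base R only and counts non-finite values, negative SEs, and exact zeros separately, since the warning mentions nonpositive SEs rather than negative ones.

```r
# Toy stand-ins for the real inputs (hypothetical names and values).
set.seed(1)
Bhat <- matrix(rnorm(20), nrow = 5, ncol = 4)
Shat <- matrix(runif(20, 0.5, 1.5), nrow = 5, ncol = 4)
Shat[2, 3] <- 0  # an exact zero is not negative, but it is nonpositive

# Count entries that could cause trouble downstream.
check_inputs <- function(Bhat, Shat) {
  c(
    bhat_nonfinite = sum(!is.finite(Bhat)),        # NA, NaN, Inf in effects
    shat_nonfinite = sum(!is.finite(Shat)),        # NA, NaN, Inf in SEs
    shat_negative  = sum(Shat < 0, na.rm = TRUE),  # strictly negative SEs
    shat_zero      = sum(Shat == 0, na.rm = TRUE)  # SEs that are exactly zero
  )
}

res <- check_inputs(Bhat, Shat)
print(res)
```

On real data, the same function could be called on the bhat and shat matrices passed to mashr; whether zeros in shat are related to the error above is only a guess on my part.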
I'm using R 4.5.0, mashr_0.2.79 and flashier_1.0.7.
My data takes some time to run with flash, but I could try to find a smaller subset that reproduces the error, or share the full data if needed.
Thanks!