Guidance on cov_pca and EM usage + error running cov_flash #138

@lucia-ramirez

Hello! Thank you very much for creating and maintaining this package.

I'm currently using mashr to combine differential expression results across multiple cell types and conditions. Since the cell types come from the same patients, I’m including a correlation matrix and testing different ways of estimating it (both the null tests and EM), as well as comparing data-driven covariance matrices (cov_pca and cov_flash).

I have a few questions and an issue to report:

1) Choosing the number of PCs in cov_pca

Do you have a general rule for deciding how many PCs to use? I have up to 44 conditions (columns) in my bhat / shat matrices, though I might not include all of them in a single analysis.

In your simulations, you had 5 conditions and used 5 PCs. Does that mean the number of PCs should equal the number of conditions, or are 5 PCs generally sufficient even for larger numbers of conditions?

2) EM method for correlated measurements

When running mash_estimate_corr_em, should one include the canonical matrices (as in the vignette), or is it preferable to use only data-driven matrices, or both?

3) Error when running cov_flash

I’ve been trying to follow the vignette for flash but I’m encountering the following error/output:

cov_flash(D.simple$strong, factors="nonneg", tag="non_neg")
Adding factor 1 to flash object...
Adding factor 2 to flash object...
Adding factor 3 to flash object...
Adding factor 4 to flash object...
Adding factor 5 to flash object...
Adding factor 6 to flash object...
Adding factor 7 to flash object...
Adding factor 8 to flash object...
Adding factor 9 to flash object...
Adding factor 10 to flash object...
Adding factor 11 to flash object...
Adding factor 12 to flash object...
Adding factor 13 to flash object...
Adding factor 14 to flash object...
Adding factor 15 to flash object...
Adding factor 16 to flash object...
Adding factor 17 to flash object...
Adding factor 18 to flash object...
Adding factor 19 to flash object...
Adding factor 20 to flash object...
Adding factor 21 to flash object...
Adding factor 22 to flash object...
Adding factor 23 to flash object...
Adding factor 24 to flash object...
Adding factor 25 to flash object...
Adding factor 26 to flash object...
Adding factor 27 to flash object...
Adding factor 28 to flash object...
Factor doesn't significantly increase objective and won't be added.
Wrapping up...
Done.
Backfitting 27 factors (tolerance: 1.61e-03)...
  Difference between iterations is within 1.0e+03...
  Difference between iterations is within 1.0e+02...
  Difference between iterations is within 1.0e+01...
Error in if (any(s == 0)) { : missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In simpute.als(x, J, thresh, lambda, maxit, trace.it, warm.start,  :
  Convergence not achieved by 100 iterations
2: In handle_standard_errors(x, s) :
  Nonpositive SEs have been replaced by small positive SEs.

Both my bhat and shat matrices have no missing values, and shat contains no negative entries, so I’m unsure what might be triggering this issue. The cov_pca function runs successfully with the same input, suggesting the issue may be specific to cov_flash.
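
For reference, a check of this kind can be sketched in base R as follows. The matrices below are small made-up stand-ins for the real bhat and shat, and check_matrix is a hypothetical helper, not part of mashr or flashier. One thing it makes visible: an exact zero in shat passes a "no negative entries" check but still counts as nonpositive, which might be related to the second warning above.

```r
# Hypothetical stand-ins for the real bhat / shat matrices (5 effects x 4 conditions).
set.seed(1)
bhat <- matrix(rnorm(20), nrow = 5)
shat <- matrix(runif(20, 0.1, 1), nrow = 5)
shat[2, 3] <- 0  # an exact zero is not negative, but it is nonpositive

# Hypothetical helper: count entries that could trip a standard-error check.
# is.na() also catches NaN; nonpositive entries are only counted when requested.
check_matrix <- function(m, positive = FALSE) {
  c(n_na       = sum(is.na(m)),
    n_infinite = sum(is.infinite(m)),
    n_nonpos   = if (positive) sum(m <= 0, na.rm = TRUE) else NA_integer_)
}

check_matrix(bhat)
check_matrix(shat, positive = TRUE)
```

If n_nonpos comes back nonzero for the real shat, the rows and columns involved can be located with which(shat <= 0, arr.ind = TRUE).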

I'm using R 4.5.0, mashr_0.2.79 and flashier_1.0.7.

My data takes some time to run with flash, but I could try to find a smaller subset that reproduces the error, or share the full data if needed.

Thanks!
