v1.2.3 Improvements

Rauf Salamzade edited this page Nov 24, 2025 · 5 revisions

In v1.2.3 of codoff, we replaced the empirical P-value with a metric called the "Discordance Percentile". The two metrics are closely related: both rely on simulations to gauge how much the codon usage profile of the focal region of interest differs from that of the background genome, and the "Discordance Percentile" is effectively just the empirical P-value multiplied by 100. The rationale is that a percentile is simpler to interpret and better suited to systematic investigation of multiple focal regions (especially when independence between regions can't be assumed).
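To make the relationship between the two metrics concrete, here is a minimal Python sketch. It is not codoff's actual code: the function names are hypothetical, and it assumes the empirical P-value is the fraction of simulated regions whose discordance from the background is at least as large as that of the focal region, with no pseudocount.

```python
def empirical_p_value(observed, simulated):
    """Fraction of simulated discordance values >= the observed value.

    Assumption for illustration: larger values mean a codon usage profile
    that is more different from the background genome.
    """
    if not simulated:
        raise ValueError("at least one simulated value is required")
    at_least_as_extreme = sum(1 for value in simulated if value >= observed)
    return at_least_as_extreme / len(simulated)


def discordance_percentile(observed, simulated):
    """The same quantity expressed on a 0-100 scale."""
    return 100.0 * empirical_p_value(observed, simulated)
```

Under these assumptions, a focal region that is more discordant than nearly all simulated regions gets a Discordance Percentile close to 0.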

However, you will notice that results from v1.2.3 differ from those of v1.2.2 and earlier versions by more than just the switch to the "Discordance Percentile". This is because the main change in v1.2.3 was actually to how the simulations are carried out.

In v1.2.2 and earlier versions, each simulation built a hypothetical gene cluster, similar in size to the focal gene cluster, from genes drawn from across the genome. While the genes were real, they could come from very different genomic regions, which made the simulation less realistic. In v1.2.3, each simulation instead selects a random point in the genome and extracts codon usage information for a neighborhood of equivalent size to the focal region. The new approach thus preserves information on genomic structure and organization that the previous simulation did not.
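The difference between the two sampling strategies can be sketched as follows. This is an illustration only, not codoff's implementation: the function names are hypothetical, genes are represented just by their lengths in genomic order, and the handling of windows that run off the end of the gene list (simple truncation here) is an assumption.

```python
import random


def sample_scattered(gene_lengths, target_size, rng):
    """Sketch of the older approach: draw genes from anywhere in the genome
    until their combined length reaches the target size."""
    order = list(range(len(gene_lengths)))
    rng.shuffle(order)
    chosen, total = [], 0
    for idx in order:
        if total >= target_size:
            break
        chosen.append(idx)
        total += gene_lengths[idx]
    return chosen


def sample_neighborhood(gene_lengths, target_size, rng):
    """Sketch of the newer approach: pick a random starting gene, then take
    consecutive genes until their combined length reaches the target size.
    Assumption: a window that reaches the last gene is simply truncated."""
    idx = rng.randrange(len(gene_lengths))
    chosen, total = [], 0
    while total < target_size and idx < len(gene_lengths):
        chosen.append(idx)
        total += gene_lengths[idx]
        idx += 1
    return chosen


rng = random.Random(0)
lengths = [1000] * 50  # 50 hypothetical genes of 1 kb each, in genomic order
print(sample_scattered(lengths, 5000, rng))     # indices from across the genome
print(sample_neighborhood(lengths, 5000, rng))  # a run of consecutive indices
```

In the second function the sampled genes are neighbors, so any local structure in codon usage is carried into the simulated region.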

Checking the distributions:

The motivation for switching simulation approaches came from checking how the distribution of Discordance Percentile values (previously empirical P-values) looks for randomly selected regions of a given size across an input genome. If the simulation is working properly, we would expect these values to be uniformly distributed. This testing was performed using the run_simulations.py script included in the main folder of codoff.
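One simple way to quantify how close a set of percentiles is to uniform is sketched below. This is not taken from run_simulations.py; the function names are hypothetical, and the choice of a binned count plus a Kolmogorov-Smirnov statistic against the uniform distribution on [0, 100] is an assumption made for illustration.

```python
def histogram(percentiles, bins=10, upper=100.0):
    """Count values per equal-width bin on [0, upper]; a uniform sample
    should give roughly equal counts in every bin."""
    counts = [0] * bins
    for p in percentiles:
        idx = min(int(p / upper * bins), bins - 1)  # put p == upper in the last bin
        counts[idx] += 1
    return counts


def ks_statistic_uniform(percentiles, upper=100.0):
    """Largest gap between the empirical CDF and the uniform CDF on [0, upper].
    Values near 0 indicate a close match to uniform; values near 1 a poor one."""
    values = sorted(p / upper for p in percentiles)
    n = len(values)
    d = 0.0
    for i, v in enumerate(values):
        d = max(d, (i + 1) / n - v, v - i / n)
    return d
```

A strongly skewed histogram or a large statistic would suggest that the simulated regions are not a good null model for randomly chosen regions of that size.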

Corynebacterium simulans - 10 kb windows

[Figure: Corynebacterium_10k]

Corynebacterium simulans - 50 kb windows

[Figure: Corynebacterium_50k]

Corynebacterium simulans - 100 kb windows

[Figure: Corynebacterium_100k]

Aspergillus flavus - 10 kb windows

[Figure: Aflavus_10k]

Aspergillus flavus - 50 kb windows

[Figure: Aflavus_50k]

Aspergillus flavus - 100 kb windows

[Figure: Aflavus_100k]