Fixup the chromosome zarr encoding bug #8

Merged
yonromai merged 1 commit into main from rav-fixup-chr-zarr
Jun 19, 2025
ravwojdyla commented Jun 19, 2025

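If we start the inference from the beginning (i.e. chr1), chr_name gets encoded as <U4 (a fixed-width, 4-character Unicode string), but at some point <U5 is required, because names such as chr13 are five characters long.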

Below is the list of all the chr_name values:

{'chr1', 'chrX', 'chr13', 'chr16', 'chr4', 'chr7', 'chr14', 'chr11', 'chr22', 'chr20', 'chr17', 'chr2', 'chr15', 'chr10', 'chr19', 'chr5', 'chr9', 'chr21', 'chr12', 'chr8', 'chr6', 'chr18', 'chr3'}
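
For illustration, here is a minimal NumPy-only sketch of the width problem described above. It is not the code from this PR: the variable names (`first`, `buf`, `safe_dtype`, `fixed`) are hypothetical, and it only shows how a fixed-width Unicode dtype inferred from chr1 behaves in plain NumPy. How the actual zarr write reacts in the project's pipeline is not shown here.

```python
import numpy as np

# The chromosome names listed above.
chr_names = [
    "chr1", "chrX", "chr13", "chr16", "chr4", "chr7", "chr14", "chr11",
    "chr22", "chr20", "chr17", "chr2", "chr15", "chr10", "chr19", "chr5",
    "chr9", "chr21", "chr12", "chr8", "chr6", "chr18", "chr3",
]

# Inferring the dtype from the first chromosome alone gives a
# 4-character fixed-width Unicode dtype (U4).
first = np.array(["chr1"])
print(first.dtype)

# Storing a 5-character name in U4 storage silently truncates it in NumPy.
buf = np.empty(2, dtype=first.dtype)
buf[0] = "chr1"
buf[1] = "chr13"
print(buf[1])  # the trailing "3" is lost

# One possible remedy (an assumption, not necessarily what the PR does):
# size the dtype from the longest name before writing anything.
width = max(len(name) for name in chr_names)
safe_dtype = np.dtype(f"U{width}")

fixed = np.empty(2, dtype=safe_dtype)
fixed[0] = "chr1"
fixed[1] = "chr13"
print(safe_dtype, fixed[1])
```

Sizing the dtype from the full set of names up front avoids depending on the order in which chromosomes are processed.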

@yonromai I originally thought it was the token character, but it's actually the chromosome name, which is a lot more useful downstream, so we should keep it :)

We start writing the chr_name as U4, but at some point U5 is required.
ravwojdyla requested a review from yonromai June 19, 2025 01:53
yonromai merged commit b0a6cf7 into main Jun 19, 2025
2 checks passed
yonromai deleted the rav-fixup-chr-zarr branch June 19, 2025 13:21