ASCAT Exome processing steps incorrect for hg38 #2089

@mheskett

Description of the bug

The hg38 reference files provided in the Van Loo lab GitHub repository DO have 'chr' prefixes, so the following section of the documentation is incorrect:

How to generate ASCAT resources for exome or targeted sequencing
Fetch the GC content correction and replication timing (RT) correction files from the [Dropbox links provided by the ASCAT developers](https://github.com/VanLoo-lab/ascat/tree/master/ReferenceFiles/WGS) and intersect the SNP coordinates with the exome target coordinates. If the target file has ‘chr’ prefixes, make a copy with these removed first. Extract the GC and RT information for only the on target SNPs and zip the results.

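The code below, from the same section, fails for multiple reasons:

  1. It removes the 'chr' prefix from the target file. Because the hg38 reference files keep the prefix, the chromosome names no longer match and bedtools intersect finds no overlaps.
  2. The archives are named ${t}_G1000_WES_hg38.zip but unzip to xx_G100_hg38.txt, so the file names used in the script do not match.
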
sed -e 's/chr//' targets_with_chr.bed > targets.bed
 
for t in GC RT
do
  unzip ${t}_G1000_hg38.zip
 
  cut -f 1-3 ${t}_G1000_hg38.txt > ascat_${t}_snps_hg38.txt
  tail -n +2 ascat_${t}_snps_hg38.txt | awk '{ print $2 "\t" $3-1 "\t" $3 "\t" $1 }' > ascat_${t}_snps_hg38.bed
  bedtools intersect -a ascat_${t}_snps_hg38.bed -b targets.bed | awk '{ print $1 "_" $3 }' > ascat_${t}_snps_on_target_hg38.txt
 
  head -n 1 ${t}_G1000_hg38.txt > ${t}_G1000_on_target_hg38.txt
  grep -f ascat_${t}_snps_on_target_hg38.txt ${t}_G1000_hg38.txt >> ${t}_G1000_on_target_hg38.txt
  zip ${t}_G1000_on_target_hg38.zip ${t}_G1000_on_target_hg38.txt
 
  rm ${t}_G1000_hg38.zip
done
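
One possible way to address both points is sketched below. It is not the documentation's method: the function names `chrom_style` and `subset_on_target` are hypothetical, and the sketch assumes the correction tables are tab-separated with one header line and the SNP id, chromosome, and position in the first three columns, as the `cut` and `awk` calls above imply. It leaves the 'chr' prefix in place, refuses to run if the table and the target BED disagree on chromosome naming, and takes file names as arguments instead of hard-coding them. It also matches SNP ids exactly on the first column, since `grep -f` without `-w` or `-x` would match an id that is a substring of another.

```shell
# Sketch only: function names are hypothetical, not part of ASCAT or the pipeline.

# Print "chr" or "nochr" for the first data row of a tab-separated file.
# $1 = file, $2 = column holding the chromosome, $3 = header lines to skip
chrom_style() {
  awk -F'\t' -v col="$2" -v skip="$3" '
    NR > skip { if ($col ~ /^chr/) print "chr"; else print "nochr"; exit }
  ' "$1"
}

# Write the on-target subset of one correction table.
# $1 = correction table (assumed columns: SNP id, chromosome, position, ...)
# $2 = target BED (no header), $3 = output table
subset_on_target() {
  table="$1"; targets="$2"; out="$3"

  # Stop early if one file uses 'chr' prefixes and the other does not.
  if [ "$(chrom_style "$table" 2 1)" != "$(chrom_style "$targets" 1 0)" ]; then
    echo "chromosome naming differs between $table and $targets" >&2
    return 1
  fi

  # SNP loci as BED (0-based start), keeping the SNP id in column 4.
  tail -n +2 "$table" |
    awk -F'\t' -v OFS='\t' '{ print $2, $3 - 1, $3, $1 }' > "${table}.bed"

  # Ids of SNPs that overlap a target; -u reports each SNP at most once.
  bedtools intersect -u -a "${table}.bed" -b "$targets" |
    cut -f 4 | sort -u > "${table}.ids"

  # Keep the header, then rows whose first column is exactly an on-target id.
  awk -F'\t' '
    FILENAME == ARGV[1] { keep[$1]; next }
    FNR == 1 || ($1 in keep)
  ' "${table}.ids" "$table" > "$out"
}

# Example (file names taken from the snippet above; confirm the real ones
# with `unzip -l` on the downloaded archive before running):
#   unzip GC_G1000_hg38.zip
#   subset_on_target GC_G1000_hg38.txt targets_with_chr.bed GC_G1000_on_target_hg38.txt
#   zip GC_G1000_on_target_hg38.zip GC_G1000_on_target_hg38.txt
```

Passing the table and output names explicitly means a mismatch between the archive name and the unzipped file name shows up as a missing-file error rather than a silently empty result.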

Command used and terminal output

Relevant files

No response

System information

No response
