Skip to content

ingcoder/NGS-Pipeline

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

14 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

NGS-Pipeline

Next-Generation Sequencing (NGS) pipeline designed for identifying genetic variants through a comparative analysis of the mt1 sequence and the reference genome hq19.

This project was done as part of the UCSD extension course on processing large genomic datasets. https://extendedstudies.ucsd.edu/courses-and-programs/processing-actionable-data-in-genomics. Extract from the course description: "...This hands-on course focuses on UNIX-based analysis tasks commonly employed in the field. Instruction covers general considerations ranging from experiment configuration, data QC, and software systems, to tuning of algorithms and visualization of results. In addition, assigned work with public datasets will provide students with weekly hands-on experience with several widely used techniques, including read mapping, variant calling, annotation, and pipeline construction."

The pipeline encompasses the following key steps:

  • Sets up directories and file paths on the local machine.
  • Connects to Google Cloud and downloads reference genome and sequencing files.
  • Indexes the reference genome using bwa index command.
  • Aligns sequencing files to the reference genome using bwa mem.
  • Sorts the aligned file using samtools sort and converts .sam to .bam.
  • Calls variants using bcftools and outputs a .vcf file.
  • Performs a quality check by counting the number of variants found.
  • Handles errors at each step and prints corresponding error messages.
  • Requires user input for file names to start the pipeline.
  • Includes three classes: NGSPipeline, GCloud, and FilePath, handling different tasks.

About

NGS pipeline for variant calling between hq19 reference genome and mt1 sequence

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors