Abstract: Identification of canine disease-causing mutations from non-coding, mapped and unmapped regions of whole genome sequences of healthy and disease-carrying dogs, Y2

Since the submission and award of my Startup Grant (TG-MCB 160055), the number of dog whole genomes we have sequenced has doubled to more than 140. Our success rate in identifying causal mutations has remained at ~30% and has largely been limited to indels, missense and splice-junction mutations, but we have recently identified a few copy-number and apparently intronic variants as disease-causing mutations. As I argued in my earlier Startup grant application, mutation discovery rates can still be improved by intensifying the search for variants in non-coding regulatory regions, such as promoters, that reside in hard-to-sequence regions, and in unmapped contigs that result from assembly errors, gaps or other issues with the current dog reference genome assembly.

I initially applied for a modest Startup allocation of 50,000 SUs and 1 TB of storage to take advantage of virtual Galaxy machines on Jetstream. As my project developed, and after receiving guidance from my ECSS team, I requested and was granted additional resources on Jetstream, Bridges, Bridges Large and Bridges Pylon. I accomplished the main grant objective of filling in gaps in the dog reference genome on a genome-wide scale. However, I would like to renew my Startup grant for one more year so that I can add more value to the data generated with the gap-filling pipeline. I need the extra time and computing resources to combine my Python script that locates gaps in the reference sequence with reform, a recently published, simple open-source Python script, and/or the MAKER software package. The goal is a pipeline that semi-automates the editing, annotation and improvement of the existing dog reference genome. Although my primary focus will remain on dog genomics, I now plan to make my bioinformatics pipelines species-agnostic wherever possible, because we now collaborate with researchers working on other species.
Improved reference genome assemblies generated this way will not only facilitate the discovery of additional disease-causing variants but will also support the study of breed, strain or even line differences within species, as well as evolutionary and comparative genomics. I plan to use the GATK and bcbio pipelines for variant calling. I hope I have adequately justified my request for an additional year's access to the unused resources associated with my current Startup grant.
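To illustrate what the gap-locating step involves, here is a minimal sketch. It is not the author's script. It assumes that assembly gaps in the reference FASTA are encoded as runs of `N`/`n`, and the function names (`parse_fasta`, `find_gaps`, `gaps_to_bed`) are hypothetical. Gaps are reported as BED-style intervals (0-based, half-open), a widely used coordinate format.

```python
import re
from typing import Iterable, Iterator, List, Tuple

# Assumption: gaps are encoded as runs of 'N' or 'n' in the reference sequence.
GAP_RE = re.compile(r"[Nn]+")


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (sequence_name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].strip()
            # Keep only the first word of the header as the sequence name.
            name, chunks = (header.split()[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def find_gaps(seq: str, min_len: int = 1) -> List[Tuple[int, int]]:
    """Return 0-based, half-open (start, end) intervals of N runs >= min_len."""
    return [
        (m.start(), m.end())
        for m in GAP_RE.finditer(seq)
        if m.end() - m.start() >= min_len
    ]


def gaps_to_bed(records: Iterable[Tuple[str, str]], min_len: int = 1) -> Iterator[str]:
    """Yield tab-separated BED lines (chrom, start, end) for every gap found."""
    for name, seq in records:
        for start, end in find_gaps(seq, min_len):
            yield f"{name}\t{start}\t{end}"


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): python find_gaps.py reference.fa > gaps.bed
    with open(sys.argv[1]) as handle:
        for bed_line in gaps_to_bed(parse_fasta(handle), min_len=10):
            print(bed_line)
```

Reading line by line keeps the parser simple, but `parse_fasta` still holds one full sequence in memory at a time. That is acceptable for a single chromosome; a streaming scan would be needed for very large contigs on small machines.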
