Gene Discovery

Hunting for New Therapies

There is a treasure trove of valuable information in both the mapped and unmapped reads of the genome.

Gene Discovery involves identifying novel genes implicated in causing rare disease, developing methods to identify a patient’s predisposition to a rare disease, and building the knowledge base to improve clinical management of novel genetic diseases.

At RCIGM, this work is led by Matthew Bainbridge, PhD, Assistant Director of Translational Research. His team develops novel analysis techniques to squeeze every last bit of information from WGS and to identify uncommon disease mechanisms (such as Alu insertions and deep intronic mutations) in the pediatric patient population.

Bioinformatic analysis of Whole Genome Sequencing (WGS) data is used to gain a better understanding of the mechanisms by which pathogenic genomic variants contribute to the development of rare diseases.

Traditional wet-lab modeling of novel diseases is used to functionally characterize variants of uncertain significance.

Research Projects

Several grant-funded research projects are currently under Dr. Bainbridge’s direction:

  • Oligogenic Models of Cardiomyopathy
    The goal is to identify synergistic and modifier mutations that impact structural cardiomyopathies. Bioinformatically identified variants are prioritized and then functionally tested by Dr. Neil Chi at UC San Diego.

Matthew Bainbridge, PhD

RCIGM Assistant Director of Translational Research


Clinical Variants in C. elegans Expressing Human STXBP1 Reveals a Novel Class of Pathogenic Variants and Classifies Variants of Uncertain Significance

Genetics in Medicine Open (2023), doi:


Purpose: Modeling disease variants in animals is useful for drug discovery, understanding disease pathology, and classifying variants of uncertain significance (VUS) as pathogenic or benign.


Using Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR), we performed a Whole-gene Humanized Animal Model (WHAM) procedure to replace the coding sequence of the animal model’s unc-18 ortholog with the coding sequence for the human STXBP1 gene. Next, we used CRISPR to introduce precise point variants in the WHAM-humanized STXBP1 locus from three clinical categories (benign, pathogenic, and VUS). Twenty-six phenotypic features extracted from video recordings were used to train machine learning classifiers on 25 pathogenic and 32 benign variants.


Using multiple models, we were able to obtain a diagnostic sensitivity near 0.9. Twenty-three VUS were also interrogated and 8 of 23 (34.8%) were observed to be functionally abnormal. Interestingly, unsupervised clustering identified two distinct subsets of known pathogenic variants with distinct phenotypic features; both p.Tyr75Cys and p.Arg406Cys cluster away from other variants and show an increase in swim speed compared to hSTXBP1 worms. This leads to the hypothesis that the mechanism of disease for these two variants may differ from most STXBP1-mutated patients and may account for some of the clinical heterogeneity observed in the patient population.


PLoS One. 2023 Jan 26;18(1):e0279430. doi: 10.1371/journal.pone.0279430. eCollection 2023.


Short Tandem Repeats (STRs) have been found to play a role in a myriad of complex traits and genetic diseases. We examined the variability in the lengths of over 850,000 STR loci in 996 children with suspected genetic disorders and 1,178 parents across six separate ancestral groups: Africans, Europeans, East Asians, Admixed Americans, Non-admixed Americans, and Pacific Islanders. For each STR locus we compared allele length between and within each ancestry group. In relation to Europeans, admixed Americans had the most similar STR lengths with only 623 positions either significantly expanded or contracted, while the divergence was highest in Africans, with 4,933 chromosomal positions contracted or expanded. We also examined probands to identify STR expansions at known pathogenic loci. The genes TCF4, AR, and DMPK showed significant expansions with lengths 250% greater than their various average allele lengths in 49, 162, and 11 individuals respectively. All 49 individuals containing an expansion in TCF4 and six individuals containing an expansion in DMPK presented with allele lengths longer than the known pathogenic length for these genes. Next, we identified individuals with significant expansions in highly conserved loci across all ancestries. Eighty loci in conserved regions met criteria for divergence. Two of these individuals were found to have exonic STR expansions: one in ZBTB4 and the other in SLC9A7, which is associated with X-linked mental retardation. Finally, we used parent-child trios to detect and analyze de novo mutations. In total, we observed 3,219 de novo expansions, where proband allele lengths are greater than twice the longest parental allele length. This work helps lay the foundation for understanding STR lengths genome-wide across ancestries and may help identify new disease genes and novel mechanisms of pathogenicity in known disease genes.

PMID: 36701310 DOI: 10.1371/journal.pone.0279430
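
The de novo criterion described in the abstract above (a proband allele more than twice as long as the longest parental allele) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the TrioLocus record, the function names, the locus labels, and the allele lengths below are all hypothetical, and the choice to compare the proband's longest allele is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrioLocus:
    """Hypothetical record: STR allele lengths (in repeat units) at one
    locus for a parent-child trio. Each sample is assumed to have at
    least one called allele."""
    locus: str
    proband: tuple
    mother: tuple
    father: tuple


def is_de_novo_expansion(trio, fold=2.0):
    """Return True when the proband's longest allele exceeds `fold` times
    the longest allele seen in either parent (fold=2.0 mirrors the
    "greater than twice" wording of the abstract)."""
    longest_parental = max(trio.mother + trio.father)
    return max(trio.proband) > fold * longest_parental


def de_novo_expansions(trios, fold=2.0):
    """Return the locus labels that meet the expansion criterion."""
    return [t.locus for t in trios if is_de_novo_expansion(t, fold)]


# Made-up example data, for illustration only.
example_trios = [
    TrioLocus("locus_A", proband=(12, 40), mother=(12, 14), father=(11, 15)),
    TrioLocus("locus_B", proband=(20, 22), mother=(20, 21), father=(19, 22)),
    TrioLocus("locus_C", proband=(10, 30), mother=(10, 15), father=(9, 12)),
]

flagged = de_novo_expansions(example_trios)
```

In this made-up example only locus_A is flagged: its longest proband allele (40) exceeds twice the longest parental allele (15). At locus_C the proband allele (30) equals, but does not exceed, twice the longest parental allele, so the strict comparison leaves it out.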
