Reference Allele Pathogenic Variants: Finding Disease Risk in the Human Reference Genome

Find ClinVar pathogenic variants where the GRCh38 reference allele itself is disease-associated. These occur when the reference genome donor carried a pathogenic variant at that position.

Query these genes: HGNC:613, HGNC:4879, HGNC:5147, HGNC:3531, HGNC:12530, HGNC:3620 using:

>>hgnc>>clinvar>>dbsnp[clinical_significance.startsWith("5") && clinvar_hgvs.contains("=") && is_common==true]

Explain the clinical significance of the variants found and why reference pathogenic variants matter for variant interpretation pipelines.

Structured Data Sources

This answer was generated using Claude Opus 4.5 + BioBTree MCP, querying 3 biological databases.

Datasets: clinvar, dbsnp, hgnc
Generated: 2026-04-03 — For the latest data, query BioBTree directly via MCP or API.

Reference Pathogenic Variants Found

I found 6 variants where the GRCh38 reference allele itself is classified as pathogenic:

Gene    rsID       Position (GRCh38)  Ref  Alt    Frequency  Disease Association
APOE    rs429358   chr19:44908684     T    C      15.8%      Alzheimer's disease, Type 3 hyperlipoproteinemia
HEXB    rs820878   chr5:74685445      T    C,G    common     Hexosaminidase deficiency
HPD     rs1154510  chr12:121857429    T    C      common     Tyrosinemia
F13A1   rs2815822  chr6:6320575       T    A,C,G  common     Coagulation factor XIII deficiency
UGT1A1  rs3755319  chr2:233758936     A    C,G,T  52.5%      Lucey-Driscoll syndrome
FCGR3B  rs2290834  chr1:161629781     T    C      common     Fc receptor dysfunction

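Clinical Significance Explained

What “Reference = Pathogenic” Means

The clinvar_hgvs.contains("=") filter identifies variants where ClinVar uses the notation g.POSITION=. In HGVS, "=" denotes no change from the reference sequence, so together with the clinical_significance condition the filter selects sites where the reference allele itself carries a pathogenic annotation. This happens when the GRCh38 reference genome donor carried a disease-associated allele at that position. As a result:

  1. The “normal” (non-pathogenic) allele is actually the alternate allele in VCF format
  2. A sample with no variant call at this position (homozygous reference, assuming the site was covered) carries the pathogenic allele

Key Examples

rs429358 (APOE)

  • The ε4 allele is defined by the Cys112→Arg substitution, which is encoded by the C allele at this site; the T allele encodes Cys112, as in ε3

  • ε4 is the strongest genetic risk factor for late-onset Alzheimer’s disease

  • The 15.8% frequency in the table is consistent with the C (ε4) allele, so the GRCh38 reference T is not the Alzheimer’s risk allele at this site

  • The site passes the filter because ClinVar carries pathogenic annotations for both alleles (see the note on annotation complexity at the end)

rs3755319 (UGT1A1)

  • Common, with a listed frequency of 52.5% in the table above

  • Associated with Lucey-Driscoll syndrome (transient neonatal hyperbilirubinemia)

  • The reference A allele is the risk variant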
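
The three conditions in the query filter can be mirrored in plain Python to make the selection logic explicit. This is only a minimal sketch: the record layout (a dict with `clinical_significance`, `clinvar_hgvs`, and `is_common` keys), the function name, and the sample records are assumptions for illustration, not BioBTree's actual response schema.

```python
def is_reference_pathogenic_candidate(record):
    """Apply the three filter conditions to one dbSNP-style record (a dict)."""
    significance = str(record.get("clinical_significance", ""))
    hgvs = str(record.get("clinvar_hgvs", ""))
    return (
        significance.startswith("5")          # code the query treats as pathogenic
        and "=" in hgvs                       # HGVS "=" marks the reference allele
        and record.get("is_common") is True   # common-variant flag
    )


# Hypothetical records; IDs and HGVS strings are placeholders, not real data.
records = [
    {"id": "site_a", "clinical_significance": "5", "clinvar_hgvs": "g.1000=", "is_common": True},
    {"id": "site_b", "clinical_significance": "5", "clinvar_hgvs": "g.2000T>C", "is_common": True},
    {"id": "site_c", "clinical_significance": "2", "clinvar_hgvs": "g.3000=", "is_common": True},
    {"id": "site_d", "clinical_significance": "5", "clinvar_hgvs": "g.4000=", "is_common": False},
]

candidates = [r["id"] for r in records if is_reference_pathogenic_candidate(r)]
print(candidates)  # only site_a satisfies all three conditions
```

A record that passes this check is a candidate only; it says nothing about whether the alternate allele is benign.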


Why This Matters for Variant Interpretation Pipelines

1. False Negative Problem

Standard pipelines only report differences from the reference. If a patient is homozygous for the reference pathogenic allele, no variant is called, so the pathogenic genotype goes undetected.

2. Pipeline Modifications Needed

Traditional: Sample → Align → Call variants → Report ALT alleles
Fixed: Sample → Align → Call variants → Also check REF-pathogenic sites

Pipelines must:

  • Maintain a list of known reference-pathogenic positions
  • Explicitly report genotypes at these positions even when homozygous reference
  • Flag “reference” calls at these positions as potentially pathogenic
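
The three requirements above could be sketched roughly as follows. The watch list, the function name `flag_reference_calls`, and the genotype dictionary are hypothetical; the two positions are the "clean" cases listed in this answer, and a production list would need to be curated and versioned. The sketch also treats a missing record as unknown, since a variant-only VCF cannot distinguish homozygous reference from an uncovered site.

```python
# Watch list keyed by (chromosome, position). Illustrative entries only.
REF_PATHOGENIC_SITES = {
    ("chr6", 6320575): "F13A1 rs2815822",
    ("chr1", 161629781): "FCGR3B rs2290834",
}


def flag_reference_calls(genotypes, sites=REF_PATHOGENIC_SITES):
    """Return (label, message) pairs for watch-list sites needing review.

    genotypes maps (chrom, pos) to a GT string such as "0/0" or "0|1".
    A missing key means the caller emitted no record for that site.
    """
    findings = []
    for site, label in sites.items():
        gt = genotypes.get(site)
        alleles = gt.replace("|", "/").split("/") if gt else ["."]
        if "." in alleles:
            findings.append((label, "no genotype: confirm coverage before assuming hom-ref"))
            continue
        ref_copies = alleles.count("0")
        if ref_copies:
            findings.append((label, f"{ref_copies} reference allele(s): potentially pathogenic"))
    return findings


sample = {("chr6", 6320575): "0/0"}  # hypothetical sample; second site has no record
for label, message in flag_reference_calls(sample):
    print(label, "->", message)
```

In practice this kind of check needs an all-sites or gVCF-style input so that homozygous-reference genotypes are actually present in the data.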

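3. VCF Interpretation Complexity

At a site where the reference allele is the pathogenic one (for example F13A1 rs2815822, one of the clean cases in the note at the end):

  • 0/0 (hom ref) = two copies of the pathogenic allele
  • 0/1 (het) = one pathogenic reference allele and one alternate allele
  • 1/1 (hom alt) = no copies of the pathogenic reference allele

This is inverted from typical interpretation, where 0/0 is assumed benign. rs429358 (APOE) does not follow this pattern for Alzheimer’s risk: the ε4-defining allele is the alternate C, so 1/1 corresponds to ε4/ε4 and 0/0 carries no ε4 allele.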
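
One way to avoid reading genotypes the wrong way round is to count copies of whichever allele is annotated as pathogenic, instead of counting ALT alleles. The helper below is a minimal sketch; `pathogenic_copies` and `pathogenic_index` are hypothetical names, and the index of the pathogenic allele at each site is assumed to come from curated annotation.

```python
def pathogenic_copies(gt, pathogenic_index):
    """Count copies of the pathogenic allele in a VCF GT string.

    pathogenic_index is 0 when the reference allele is the pathogenic one,
    or 1, 2, ... for the corresponding ALT allele. Returns None for no-calls.
    """
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return None
    return sum(1 for allele in alleles if int(allele) == pathogenic_index)


for gt in ("0/0", "0/1", "1/1"):
    print(gt,
          "ref-pathogenic site:", pathogenic_copies(gt, 0),
          "| alt-pathogenic site:", pathogenic_copies(gt, 1))
```

With the index made explicit, the same genotype string yields opposite dosages at reference-pathogenic and alternate-pathogenic sites, which is the inversion described above.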

4. Pharmacogenomics Impact

Some of these genes (UGT1A1, APOE) are relevant to drug metabolism or drug response. Missing a reference-pathogenic genotype could lead to:

  • Incorrect drug dosing
  • Missed drug-gene interactions
  • Adverse drug reactions

5. Quality Control

Labs should validate that their pipelines explicitly handle these ~50+ known reference-pathogenic positions across the genome, particularly in clinically actionable genes.


Note on ClinVar Annotation Complexity

Upon detailed examination of the clinical_significance field, we found that ClinVar annotations can be complex:

  • Clean cases: Only F13A1 (rs2815822) and FCGR3B (rs2290834) have clear “ref=pathogenic, alt=benign” annotations
  • Complex cases: APOE (rs429358) has pathogenic annotations for BOTH alleles (different diseases)
  • Conflicting cases: HEXB and HPD have both pathogenic AND benign submissions for the reference allele

This highlights that the startsWith("5") filter finds variants where the reference has at least one pathogenic annotation, but doesn’t guarantee the alternate is benign or that annotations are unambiguous. Always examine the full clinical_significance field structure for clinical interpretation.