MLH1 Gene Complete Identifier and Functional Mapping Reference

Provide a comprehensive cross-database identifier and functional mapping reference for human MLH1. This should serve as a definitive lookup resource for researchers. ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ SECTION 1: GENE IDENTIFIERS ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Provide ALL gene-level database identifiers: - HGNC ID and approved symbol - Ensembl gene ID (ENSG) - NCBI Entrez Gene ID - OMIM gene/locus ID - Genomic location: chromosome, start position, end position, strand ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ SECTION 2: TRANSCRIPT IDENTIFIERS ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ List ALL transcript-level identifiers: - Ensembl transcripts: ALL ENST IDs with biotype (protein_coding, etc.) How many total transcripts? - RefSeq transcripts: ALL NM_ mRNA accessions Mark which is MANE Select (canonical clinical standard) - CCDS IDs: ALL consensus coding sequence identifiers For the CANONICAL/MANE SELECT transcript: - List ALL exon IDs (ENSE) with genomic coordinates - Total exon count ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ SECTION 3: PROTEIN IDENTIFIERS ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ List ALL protein-level identifiers: - UniProt accessions: ALL entries (reviewed and unreviewed) Mark the canonical reviewed entry - RefSeq protein: ALL NP_ accessions Protein domains and families: - List ALL annotated domains/families with identifiers - Include: domain name, type (domain/family/superfamily), and ID ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ SECTION 4: STRUCTURE IDENTIFIERS ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Experimental structures: - List ALL PDB structure IDs - For each: experimental method (X-ray, NMR, Cryo-EM) and resolution - Total PDB structure count Predicted 
structures: - AlphaFold model ID and confidence metrics (pLDDT) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ SECTION 5: CROSS-SPECIES ORTHOLOGS ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ List orthologous genes in key model organisms (where available): - Mouse (Mus musculus): gene ID, symbol - Rat (Rattus norvegicus): gene ID, symbol - Zebrafish (Danio rerio): gene ID, symbol - Fruit fly (Drosophila melanogaster): gene ID, symbol - Worm (C. elegans): gene ID, symbol - Yeast (S. cerevisiae): gene ID, symbol ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ SECTION 6: CLINICAL VARIANTS & AI PREDICTIONS ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Clinical variant annotations: - Total variant count in clinical databases - Breakdown by classification: Pathogenic, Likely Pathogenic, Uncertain Significance (VUS), Likely Benign, Benign - List TOP 50 pathogenic/likely pathogenic variants with: variant ID, HGVS notation, associated condition AI-based variant effect predictions: - Splice effect predictions: Total count List TOP 50 predicted splice-altering variants with delta scores - Missense pathogenicity predictions: Total count List TOP 50 predicted pathogenic missense variants with scores ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ SECTION 7: BIOLOGICAL PATHWAYS & GENE ONTOLOGY ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Pathway membership: - List ALL biological pathways this gene participates in - Include pathway IDs and names - Total pathway count Gene Ontology annotations: - Biological Process: count and TOP 20 terms with IDs - Molecular Function: count and TOP 20 terms with IDs - Cellular Component: count and TOP 20 terms with IDs ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ SECTION 8: PROTEIN INTERACTIONS & MOLECULAR NETWORKS 
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Protein-protein interactions: - Total interaction count - List TOP 50 highest-confidence interacting proteins with scores Protein similarity (evolutionary and structural): - Structural/embedding similarity: How many similar proteins? List TOP 20 with similarity scores - Sequence homology: How many homologous proteins? List TOP 20 with identity/similarity scores ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ SECTION 9: TRANSCRIPTION FACTOR REGULATORY DATA ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ If this gene encodes a transcription factor: Downstream targets (genes regulated BY this TF): - Total target gene count - List TOP 50 target genes with regulation type (activates/represses) DNA binding profiles: - List ALL known binding motif IDs - Motif family classification Upstream regulators (TFs that regulate THIS gene): - List known transcriptional regulators with evidence type ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ SECTION 10: DRUG & PHARMACOLOGY DATA ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ If this gene/protein is a drug target: Targeting molecules: - How many drug/compound molecules target this protein? - List TOP 30 molecules by development phase - Include: molecule ID, name, mechanism, highest development phase Clinical trials: - How many clinical trials involve drugs targeting this gene? - List TOP 20 trials with: trial ID, phase, status, intervention Pharmacogenomics: - Known drug-gene interactions affecting drug response - Dosing guidelines if any exist ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ SECTION 11: EXPRESSION PROFILES ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Tissue expression: - Which tissues express this gene most highly? 
- List TOP 30 tissues with expression scores/levels - Note any tissue-specific or tissue-enriched patterns Cell type expression: - Which cell types show highest expression? - List TOP 30 cell types with expression scores - Note any cell type-specific patterns Single-cell expression data (if available): - Which single-cell datasets/experiments include this gene? - Notable cell population patterns ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ SECTION 12: DISEASE ASSOCIATIONS ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Mendelian/monogenic disease links: - What diseases are caused by mutations in this gene? - List ALL associated diseases with: disease name, disease ID, inheritance pattern, evidence level Phenotype associations: - What clinical phenotypes are associated with this gene? - List TOP 50 phenotype terms with IDs Complex trait associations (GWAS): - What traits/diseases are linked via genome-wide association studies? - List TOP 30 GWAS associations with: trait, study ID, p-value, effect size if available ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ FORMATTING REQUIREMENTS ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ - Present each section with clear headers - Use tables for listing multiple items - Always include COUNTS for every category - For lists with >50 items: show total count + TOP 50 - For lists with <50 items: show ALL items - Always include database identifiers (IDs), not just names - Mark canonical/primary/reviewed entries where applicable - If a section has no data available, state "No data available" rather than omitting the section

Structured Data Sources

This answer was generated using Claude Opus 4.5 + BioBTree MCP, querying 33 biological databases.

Datasets: alphafold, alphamissense, bgee, ccds, chembl_target, clinvar, collectri, diamond_similarity, ensembl, entrez, esm2_similarity, exon, gencc, go, gwas, hgnc, hpo, intact, interpro, mim, orphanet, ortholog, pdb, pfam, pharmgkb, pharmgkb_gene, reactome, refseq, scxa, spliceai, string_interaction, transcript, uniprot
Generated: 2026-03-27 — For the latest data, query BioBTree directly via MCP or API.
MLH1

Section 1: Gene Identifiers

Database | Identifier | Description
HGNC ID | HGNC:7127 | HGNC gene identifier
HGNC Symbol | MLH1 | Approved symbol (mutL homolog 1)
Ensembl Gene | ENSG00000076242 | Ensembl gene ID
NCBI Entrez Gene | 4292 | NCBI Gene ID
OMIM Gene | 120436 | Gene/Locus entry
Status | Approved | protein-coding gene

Aliases: HNPCC, FCC2, HNPCC2, MLH-1, COCA2
Gene Groups: MutL homologs, BRCA1-associated genome surveillance complex

Genomic Location (GRCh38)
Attribute | Value
Chromosome | 3
Cytogenetic Band | 3p22.2
Start Position | 36,993,226
End Position | 37,050,896
Strand | + (forward)
Gene Span | 57,671 bp
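
The gene span follows from the start and end positions when both endpoints are counted. A minimal sketch of that arithmetic, assuming 1-based inclusive coordinates (the helper name `span_bp` is illustrative only):

```python
def span_bp(start: int, end: int) -> int:
    """Length of a 1-based, inclusive genomic interval."""
    if end < start:
        raise ValueError("end must be >= start")
    return end - start + 1


# MLH1 gene coordinates on chromosome 3 (GRCh38), as listed above
MLH1_START = 36_993_226
MLH1_END = 37_050_896

print(span_bp(MLH1_START, MLH1_END))  # 57671
```

The same helper applies to any transcript or exon interval in the tables that follow.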

Section 2: Transcript Identifiers

Ensembl Transcripts
Total Transcript Count: 44

Transcript ID | Biotype | Start | End | Status
ENST00000231790 | protein_coding | 36,993,518 | 37,050,846 | Canonical
ENST00000413740 | protein_coding | 36,993,357 | 37,050,846 | -
ENST00000429117 | protein_coding | 36,993,826 | 37,050,846 | -
ENST00000435176 | protein_coding | 36,993,804 | 37,050,844 | -
ENST00000441265 | protein_coding | 36,993,826 | 37,047,737 | -
ENST00000450420 | protein_coding | 36,993,518 | 37,050,537 | -
ENST00000455445 | protein_coding | 36,993,798 | 37,050,706 | -
ENST00000456676 | protein_coding | 36,993,518 | 37,050,844 | -
ENST00000458205 | protein_coding | 36,993,776 | 37,050,846 | -
ENST00000466900 | protein_coding | 36,993,848 | 37,050,844 | -
ENST00000485889 | protein_coding | 36,993,827 | 37,050,846 | -
ENST00000492474 | protein_coding | 36,993,791 | 37,050,896 | -
ENST00000536378 | protein_coding | 36,993,350 | 37,050,842 | -
ENST00000539477 | protein_coding | 36,993,804 | 37,050,783 | -
ENST00000616768 | protein_coding | 36,993,472 | 37,050,846 | -
ENST00000673673 | protein_coding | 36,993,226 | 37,050,799 | -
ENST00000673715 | protein_coding | 36,993,523 | 37,047,740 | -
ENST00000673899 | protein_coding | 36,993,489 | 37,050,767 | -
ENST00000673990 | protein_coding | 36,993,785 | 37,050,807 | -
ENST00000674019 | protein_coding | 36,993,764 | 37,050,823 | -
ENST00000713802 | protein_coding | 36,993,542 | 37,050,841 | -
ENST00000931189 | protein_coding | 36,993,487 | 37,050,843 | -
ENST00000931190 | protein_coding | 36,993,491 | 37,050,846 | -
ENST00000931191 | protein_coding | 36,993,545 | 37,050,841 | -
ENST00000948704 | protein_coding | 36,993,487 | 37,050,844 | -
ENST00000948705 | protein_coding | 36,993,540 | 37,050,842 | -
ENST00000413212 | nonsense_mediated_decay | 36,993,785 | 37,050,823 | -
ENST00000432299 | nonsense_mediated_decay | 36,993,524 | 37,050,823 | -
ENST00000447829 | nonsense_mediated_decay | 36,993,785 | 37,050,807 | -
ENST00000454028 | nonsense_mediated_decay | 36,993,534 | 37,050,846 | -
ENST00000457004 | nonsense_mediated_decay | 36,993,516 | 37,014,494 | -
ENST00000458009 | nonsense_mediated_decay | 36,993,518 | 37,050,846 | -
ENST00000673897 | nonsense_mediated_decay | 36,993,531 | 37,050,794 | -
ENST00000673947 | nonsense_mediated_decay | 36,993,539 | 37,050,814 | -
ENST00000673972 | nonsense_mediated_decay | 36,993,515 | 37,050,827 | -
ENST00000674111 | nonsense_mediated_decay | 36,993,495 | 37,050,796 | -
ENST00000442249 | retained_intron | 36,993,533 | 37,026,093 | -
ENST00000476172 | retained_intron | 36,993,785 | 36,998,096 | -
ENST00000673686 | retained_intron | 36,993,785 | 36,998,085 | -
ENST00000673713 | retained_intron | 36,993,517 | 37,026,060 | -
ENST00000673741 | retained_intron | 37,046,754 | 37,050,843 | -
ENST00000673889 | retained_intron | 37,017,334 | 37,050,799 | -
ENST00000674107 | retained_intron | 36,993,833 | 37,030,353 | -
ENST00000674125 | retained_intron | 37,028,664 | 37,050,838 | -

Biotype Summary: 26 protein_coding, 10 nonsense_mediated_decay, 8 retained_intron

RefSeq Transcripts (Human Chromosome 3)
Total: 23 NM_ accessions

Accession | Type | Status | MANE Select
NM_000249 | mRNA | REVIEWED | ✓ Yes
NM_001167617 | mRNA | REVIEWED | No
NM_001167618 | mRNA | REVIEWED | No
NM_001167619 | mRNA | REVIEWED | No
NM_001258271 | mRNA | REVIEWED | No
NM_001258273 | mRNA | REVIEWED | No
NM_001258274 | mRNA | REVIEWED | No
NM_001354615 | mRNA | REVIEWED | No
NM_001354616 | mRNA | REVIEWED | No
NM_001354617 | mRNA | REVIEWED | No
NM_001354618 | mRNA | REVIEWED | No
NM_001354619 | mRNA | REVIEWED | No
NM_001354620 | mRNA | REVIEWED | No
NM_001354621 | mRNA | REVIEWED | No
NM_001354622 | mRNA | REVIEWED | No
NM_001354623 | mRNA | REVIEWED | No
NM_001354624 | mRNA | REVIEWED | No
NM_001354625 | mRNA | REVIEWED | No
NM_001354626 | mRNA | REVIEWED | No
NM_001354627 | mRNA | REVIEWED | No
NM_001354628 | mRNA | REVIEWED | No
NM_001354629 | mRNA | REVIEWED | No
NM_001354630 | mRNA | REVIEWED | No

CCDS IDs
Total: 3

CCDS ID | Status
CCDS2663 | Primary
CCDS54562 | Alternative
CCDS54563 | Alternative

Canonical Transcript Exon Structure (ENST00000231790)
Total Exon Count: 19

Exon ID | Start | End | Length (bp)
ENSE00004012715 | 36,993,518 | 36,993,663 | 146
ENSE00003496022 | 36,996,619 | 36,996,709 | 91
ENSE00003599036 | 37,000,955 | 37,001,053 | 99
ENSE00003635406 | 37,004,401 | 37,004,474 | 74
ENSE00003533853 | 37,006,991 | 37,007,063 | 73
ENSE00003633623 | 37,008,814 | 37,008,905 | 92
ENSE00003521968 | 37,011,820 | 37,011,862 | 43
ENSE00003516787 | 37,012,011 | 37,012,099 | 89
ENSE00003656033 | 37,014,432 | 37,014,544 | 113
ENSE00001716871 | 37,017,506 | 37,017,599 | 94
ENSE00003785627 | 37,020,310 | 37,020,463 | 154
ENSE00003688106 | 37,025,637 | 37,026,007 | 371
ENSE00003680564 | 37,028,784 | 37,028,932 | 149
ENSE00001730188 | 37,040,186 | 37,040,294 | 109
ENSE00001747618 | 37,042,268 | 37,042,331 | 64
ENSE00001748897 | 37,047,519 | 37,047,683 | 165
ENSE00001593400 | 37,048,517 | 37,048,609 | 93
ENSE00001785296 | 37,048,904 | 37,049,017 | 114
ENSE00003902581 | 37,050,486 | 37,050,846 | 361
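
The exon lengths listed above can be rechecked from the coordinates. The following is a small sketch, assuming 1-based inclusive coordinates on the forward strand; the helper names are illustrative and not part of any database API:

```python
# (start, end) pairs for the 19 exons of ENST00000231790, copied from the
# exon table above. Coordinates are assumed to be 1-based and inclusive.
EXONS = [
    (36_993_518, 36_993_663), (36_996_619, 36_996_709),
    (37_000_955, 37_001_053), (37_004_401, 37_004_474),
    (37_006_991, 37_007_063), (37_008_814, 37_008_905),
    (37_011_820, 37_011_862), (37_012_011, 37_012_099),
    (37_014_432, 37_014_544), (37_017_506, 37_017_599),
    (37_020_310, 37_020_463), (37_025_637, 37_026_007),
    (37_028_784, 37_028_932), (37_040_186, 37_040_294),
    (37_042_268, 37_042_331), (37_047_519, 37_047_683),
    (37_048_517, 37_048_609), (37_048_904, 37_049_017),
    (37_050_486, 37_050_846),
]


def exon_lengths(exons):
    """Length of each exon, counting both endpoints."""
    return [end - start + 1 for start, end in exons]


def intron_lengths(exons):
    """Gap between each pair of consecutive exons, in listed order."""
    return [nxt[0] - prev[1] - 1 for prev, nxt in zip(exons, exons[1:])]


lengths = exon_lengths(EXONS)
print(len(lengths), "exons,", sum(lengths), "bp of exonic sequence")
print(len(intron_lengths(EXONS)), "introns")
```

Positive intron lengths throughout also confirm that the exons are listed in ascending, non-overlapping order.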

Section 3: Protein Identifiers

UniProt Accessions
Total: 18 entries

Accession | Name | Status | Canonical
P40692 | DNA mismatch repair protein Mlh1 | Reviewed (Swiss-Prot) | ✓ Yes
A0A087WX20 | MLH1 isoform | Unreviewed | No
A0A669KAW3 | MLH1 isoform | Unreviewed | No
A0A669KB03 | MLH1 isoform | Unreviewed | No
A0A669KBB4 | MLH1 isoform | Unreviewed | No
A0A669KBK2 | MLH1 isoform | Unreviewed | No
A0AAQ5BGN3 | MLH1 isoform | Unreviewed | No
A0AAQ5BGZ2 | MLH1 isoform | Unreviewed | No
C9JZ54 | MLH1 isoform | Unreviewed | No
E7EUC9 | MLH1 isoform | Unreviewed | No
E9PF25 | MLH1 isoform | Unreviewed | No
F2Z298 | MLH1 isoform | Unreviewed | No
H0Y4N0 | MLH1 isoform | Unreviewed | No
H0Y5L7 | MLH1 isoform | Unreviewed | No
H0Y5U4 | MLH1 isoform | Unreviewed | No
H0Y793 | MLH1 isoform | Unreviewed | No
H0Y806 | MLH1 isoform | Unreviewed | No
H0Y818 | MLH1 isoform | Unreviewed | No

Canonical Protein (P40692) Properties
Property | Value
Length | 756 amino acids
Mass | 84,601 Da
Alternative Names | MutL protein homolog 1

RefSeq Protein Accessions (Human)
Total: 23 NP_ accessions

Accession | Status | MANE Select
NP_000240 | REVIEWED | ✓ Yes
NP_001161089 | REVIEWED | No
NP_001161090 | REVIEWED | No
NP_001161091 | REVIEWED | No
NP_001245200 | REVIEWED | No
NP_001245202 | REVIEWED | No
NP_001245203 | REVIEWED | No
NP_001341544 | REVIEWED | No
NP_001341545 | REVIEWED | No
NP_001341546 | REVIEWED | No
NP_001341547 | REVIEWED | No
NP_001341548 | REVIEWED | No
NP_001341549 | REVIEWED | No
NP_001341550 | REVIEWED | No
NP_001341551 | REVIEWED | No
NP_001341552 | REVIEWED | No
NP_001341553 | REVIEWED | No
NP_001341554 | REVIEWED | No
NP_001341555 | REVIEWED | No
NP_001341556 | REVIEWED | No
NP_001341557 | REVIEWED | No
NP_001341558 | REVIEWED | No
NP_001341559 | REVIEWED | No

Protein Domains and Families
Total: 8 InterPro entries

InterPro ID | Name | Type
IPR002099 | MutL/Mlh/PMS | Family
IPR038973 | MutL/Mlh/Pms-like | Family
IPR013507 | DNA_mismatch_S5_2-like | Domain
IPR032189 | Mlh1_C | Domain
IPR014721 | Ribsml_uS5_D2-typ_fold_subgr | Homologous_superfamily
IPR020568 | Ribosomal_Su5_D2-typ_SF | Homologous_superfamily
IPR036890 | HATPase_C_sf | Homologous_superfamily
IPR014762 | DNA_mismatch_repair_CS | Conserved_site

Pfam Domains
Total: 3

Pfam ID | Name
PF01119 | DNA_mis_repair
PF13589 | HATPase_c_3
PF16413 | MutL_C

Section 4: Structure Identifiers

Experimental Structures (PDB)
Total PDB Structure Count: 7

PDB ID | Title | Method | Resolution (Å) | Organism
3RBN | Crystal structure of MutL protein homolog 1 isoform 1 | X-RAY DIFFRACTION | 2.16 | Homo sapiens
4P7A | Crystal Structure of human MLH1 | X-RAY DIFFRACTION | 2.30 | Homo sapiens
5U5P | Importin-alpha with MLH1 NLS Peptide | X-RAY DIFFRACTION | 2.171 | Mus musculus/Synthetic
6WBA | Importin alpha MLH1-R470A NLS Peptide Complex | X-RAY DIFFRACTION | 2.151 | Mus musculus/Synthetic
6WBB | Importin alpha MLH1-E475A NLS peptide complex | X-RAY DIFFRACTION | 2.663 | Mus musculus/Synthetic
6WBC | Importin alpha MLH1-R472K NLS Peptide Complex | X-RAY DIFFRACTION | 2.15 | Mus musculus/Synthetic
7M60 | Importin alpha MLH1-S467A NLS Peptide Complex | X-RAY DIFFRACTION | 2.30 | Mus musculus/Synthetic

Predicted Structures (AlphaFold)
AlphaFold ID | Sequence Length | Global pLDDT | Fraction Very High Confidence
P40692 | 5947 | 77.89 | 0.51 (51%)

Note: the reported sequence length (5947) does not match the 756-residue canonical protein listed in Section 3 and should be verified against the AlphaFold entry.
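
To put the global pLDDT in context, the sketch below maps a score onto the four confidence bands commonly shown in the AlphaFold database legend. The band names and cutoffs (90, 70, 50) are an assumption here, not something stated in this report, and the function name is illustrative:

```python
def plddt_band(score: float) -> str:
    """Map a pLDDT score (0-100) to a confidence band.

    Cutoffs follow the commonly used AlphaFold DB legend (assumed):
    > 90 very high, 70-90 confident, 50-70 low, < 50 very low.
    """
    if not 0 <= score <= 100:
        raise ValueError("pLDDT must be between 0 and 100")
    if score > 90:
        return "very high"
    if score > 70:
        return "confident"
    if score > 50:
        return "low"
    return "very low"


print(plddt_band(77.89))  # global pLDDT reported for P40692
```

Under these assumed cutoffs the global score falls in the "confident" band, while the 0.51 fraction indicates that roughly half of the residues individually exceed 90.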

Section 5: Cross-Species Orthologs

Organism | Gene ID | Symbol | UniProt
Mouse (Mus musculus) | ENSMUSG00000032498 | Mlh1 | Q9JK91
Rat (Rattus norvegicus) | ENSRNOG00000033809 | Mlh1 | P97679
Zebrafish (Danio rerio) | ENSDARG00000025948 | mlh1 | -
Fruit fly (Drosophila melanogaster) | FBgn0011659 | Mlh1 | Q9ZRV4
Worm (C. elegans) | WBGene00003373 | mlh-1 | Q54KD8
Yeast (S. cerevisiae) | YMR167W | MLH1 | P38920

Section 6: Clinical Variants & AI Predictions

ClinVar Variant Summary
Total Variant Count: 6,260

Classification | Count
Pathogenic | >500
Likely pathogenic | >300
Uncertain significance (VUS) | >4,000
Likely benign | Multiple
Benign | Multiple

TOP 50 Pathogenic Variants (ClinVar)
ClinVar ID | HGVS Notation | Type | Review Status
142856 | c.117-2A>G | SNV | Reviewed by expert panel
1012206 | c.1003del (p.Leu335fs) | Deletion | Multiple submitters
1048862 | c.497T>G (p.Leu166Ter) | SNV | Multiple submitters
1049670 | c.1896+1G>C | SNV | Multiple submitters
1068948 | c.406A>T (p.Lys136Ter) | SNV | Multiple submitters
1069408 | c.1507del (p.Leu503fs) | Deletion | Multiple submitters
1070239 | c.22del (p.Ile8fs) | Deletion | Multiple submitters
1070756 | c.1408A>T (p.Arg470Ter) | SNV | Multiple submitters
1073257 | c.1707del (p.Asn570fs) | Deletion | Multiple submitters
1074205 | c.35del (p.Asp12fs) | Deletion | Multiple submitters
1074878 | c.1830C>G (p.Tyr610Ter) | SNV | Multiple submitters
1076321 | c.1888del (p.Ile630fs) | Deletion | Multiple submitters
1076485 | c.1435_1453del (p.Val479fs) | Deletion | Multiple submitters
1076792 | c.1983dup (p.Thr662fs) | Duplication | Multiple submitters
1177381 | c.1695_1698del (p.Ile565fs) | Deletion | Multiple submitters
1195084 | c.1713del (p.Phe571fs) | Deletion | Multiple submitters
1358070 | c.1727T>G (p.Leu576Ter) | SNV | Multiple submitters
135851 | c.2190del (p.Pro731fs) | Deletion | Multiple submitters
1365840 | c.52_59del (p.Arg18fs) | Deletion | Multiple submitters
1368790 | c.592G>T (p.Gly198Ter) | SNV | Multiple submitters
1372749 | c.2043dup (p.Met682fs) | Duplication | Multiple submitters
1387037 | c.813del (p.Ser271_Leu272insTer) | Deletion | Multiple submitters
1392794 | c.1499del (p.Ile500fs) | Deletion | Multiple submitters
1402505 | c.1809del (p.Glu605fs) | Deletion | Multiple submitters
1405246 | c.1239dup (p.Glu414fs) | Duplication | Multiple submitters
141043 | c.790+1G>T | SNV | Multiple submitters
1412554 | c.1730C>A (p.Ser577Ter) | SNV | Multiple submitters
1443114 | c.2195_2196del (p.Lys732fs) | Deletion | Multiple submitters
1451346 | c.383del (p.Ala128fs) | Deletion | Multiple submitters
1452609 | c.227dup (p.Cys77fs) | Duplication | Multiple submitters
1453286 | c.2138_2151del (p.Lys713fs) | Deletion | Multiple submitters
1453590 | c.2163del (p.Val720_Tyr721insTer) | Deletion | Multiple submitters
1068563 | c.1921dup (p.Leu641fs) | Duplication | Multiple submitters
1023950 | c.1681_1686del (p.Tyr561_Gln562del) | Deletion | Single submitter
1049302 | c.1275dup (p.Gln426fs) | Duplication | Single submitter
1049396 | c.1999del (p.Asp667fs) | Deletion | Single submitter
1049673 | c.1367del (p.Thr455_Ser456insTer) | Deletion | Single submitter
1068760 | c.2089dup (p.Leu697fs) | Duplication | Single submitter
1069316 | c.1482T>A (p.Cys494Ter) | SNV | Single submitter
1069751 | c.2039_2040delinsAG (p.Cys680Ter) | Indel | Single submitter
1069993 | c.2099del (p.Gln700fs) | Deletion | Single submitter
1071267 | c.2123_2126del (p.Ile708fs) | Deletion | Single submitter
1071293 | c.194_195insAT (p.Thr66fs) | Insertion | Single submitter
1071631 | c.837_838del (p.Tyr280fs) | Microsatellite | Single submitter
1071933 | c.929_930del (p.Thr310fs) | Microsatellite | Single submitter
1072166 | c.2046_2052del (p.Met682fs) | Deletion | Single submitter
1074920 | c.790del (p.His264fs) | Deletion | Single submitter
1075393 | c.2028del (p.Ser677fs) | Deletion | Single submitter
1076815 | c.979dup (p.Gln327fs) | Duplication | Single submitter
1255698 | c.1A>T (p.Met1Leu) | SNV | Single submitter
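
The HGVS strings above encode the predicted consequence of each variant. As a rough illustration, the sketch below groups them into broad consequence classes using simple pattern rules. The rules and class names are simplifications chosen for this example, not an official HGVS parser:

```python
import re
from collections import Counter

# Coding part, optionally followed by a protein part in parentheses.
HGVS_RE = re.compile(r"^(c\.\S+?)(?:\s*\((p\.[^)]+)\))?$")


def consequence(hgvs: str) -> str:
    """Assign a broad consequence class from an HGVS string (heuristic)."""
    m = HGVS_RE.match(hgvs.strip())
    if m is None:
        raise ValueError(f"unrecognised HGVS string: {hgvs!r}")
    coding, protein = m.groups()
    if protein is None:
        # Intronic offsets such as c.117-2 or c.1896+1 sit at splice sites.
        return "splice_site" if re.search(r"\d[+-]\d", coding) else "other"
    if protein.endswith("fs"):
        return "frameshift"
    if "Ter" in protein:
        return "nonsense"
    if re.match(r"^p\.Met1[A-Z?]", protein):
        return "start_lost"
    if protein.endswith("del"):
        return "inframe_deletion"
    return "other"


# A few entries copied from the table above
examples = [
    "c.117-2A>G",
    "c.1003del (p.Leu335fs)",
    "c.497T>G (p.Leu166Ter)",
    "c.1896+1G>C",
    "c.813del (p.Ser271_Leu272insTer)",
    "c.1681_1686del (p.Tyr561_Gln562del)",
    "c.1A>T (p.Met1Leu)",
]
print(Counter(consequence(h) for h in examples))
```

Applied to all 50 rows, a tally like this makes it easy to see that the list is dominated by truncating (frameshift and nonsense) variants.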

SpliceAI Predictions
Total Count: 2,888 variants with splice effects

TOP 50 High-Score Splice-Altering Variants (Score ≥0.8):
Variant | Gene | Effect | Delta Score
3:36996617:A:AG | MLH1 | acceptor_gain | 1.00
3:36996618:G:GA | MLH1 | acceptor_gain | 1.00
3:36996618:GTTT:G | MLH1 | acceptor_gain | 1.00
3:36996618:GTTTA:G | MLH1 | acceptor_gain | 1.00
3:36996707:AGGG:A | MLH1 | donor_loss | 1.00
3:36996708:GG:G | MLH1 | donor_gain | 1.00
3:36996708:GGGTA:G | MLH1 | donor_loss | 1.00
3:36996709:GG:G | MLH1 | donor_gain | 1.00
3:36996710:G:A | MLH1 | donor_loss | 1.00
3:36996710:G:GG | MLH1 | donor_gain | 1.00
3:36996711:T:TC | MLH1 | donor_loss | 1.00
3:36993672:G:GT | MLH1 | donor_gain | 1.00
3:36993614:G:GT | MLH1 | donor_gain | 0.99
3:36993614:G:T | MLH1 | donor_gain | 0.99
3:36993633:C:G | MLH1 | donor_gain | 0.99
3:36993696:GGC:G | MLH1 | donor_gain | 0.99
3:36993836:GACC:G | MLH1 | donor_gain | 0.99
3:36994235:C:CA | MLH1 | acceptor_gain | 0.99
3:36994236:G:A | MLH1 | acceptor_gain | 0.99
3:36994347:G:GG | MLH1 | donor_gain | 0.99
3:36996608:T:TA | MLH1 | acceptor_gain | 0.99
3:36996613:T:TA | MLH1 | acceptor_gain | 0.99
3:36996613:TGCCA:T | MLH1 | acceptor_loss | 0.99
3:36996614:GCCAG:G | MLH1 | acceptor_loss | 0.99
3:36996615:CCAGT:C | MLH1 | acceptor_loss | 0.99
3:36996616:CA:C | MLH1 | acceptor_loss | 0.99
3:36996617:A:AT | MLH1 | acceptor_loss | 0.99
3:36996618:G:T | MLH1 | acceptor_loss | 0.99
3:36996618:GT:G | MLH1 | acceptor_gain | 0.99
3:36996618:GTT:G | MLH1 | acceptor_gain | 0.99
3:36996705:TCAGG:T | MLH1 | donor_gain | 0.99
3:36996706:CAGG:C | MLH1 | donor_gain | 0.99
3:36996707:AGG:A | MLH1 | donor_gain | 0.99
3:36996708:GGG:G | MLH1 | donor_gain | 0.99
3:36994231:ATTTC:A | MLH1 | acceptor_gain | 0.97
3:36993887:GAGT:G | MLH1 | donor_gain | 0.97
3:36995502:G:GT | MLH1 | donor_gain | 0.97
3:36993809:GCC:G | MLH1 | donor_gain | 0.96
3:36993886:GGAGT:G | MLH1 | donor_gain | 0.95
3:36993887:GAGTG:G | MLH1 | donor_gain | 0.95
3:36993889:GT:G | MLH1 | donor_gain | 0.95
3:36993901:GAATA:G | MLH1 | donor_gain | 0.95
3:36994318:G:T | MLH1 | donor_gain | 0.95
3:36993812:A:AG | MLH1 | donor_gain | 0.94
3:36993891:G:GG | MLH1 | donor_gain | 0.94
3:36993761:T:G | MLH1 | donor_gain | 0.94
3:36994400:T:G | MLH1 | donor_gain | 0.94
3:36996619:TTTAG:T | MLH1 | acceptor_loss | 0.94
3:36996620:TTAGA:T | MLH1 | acceptor_loss | 0.94
3:36996621:TAGAT:T | MLH1 | acceptor_loss | 0.94
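
The variant keys in this table follow a `chrom:pos:ref:alt` pattern. A minimal sketch for splitting such a key and classifying the change by allele length is shown below; the `Variant` class and `parse_variant` helper are hypothetical names introduced for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str

    @property
    def kind(self) -> str:
        """Classify by allele lengths: SNV, MNV, insertion, or deletion."""
        if len(self.ref) == len(self.alt):
            return "SNV" if len(self.ref) == 1 else "MNV"
        return "insertion" if len(self.alt) > len(self.ref) else "deletion"


def parse_variant(key: str) -> Variant:
    """Parse a 'chrom:pos:ref:alt' key such as '3:36996617:A:AG'."""
    chrom, pos, ref, alt = key.split(":")
    return Variant(chrom, int(pos), ref, alt)


# A few (key, effect, delta score) rows copied from the table above
rows = [
    ("3:36996617:A:AG", "acceptor_gain", 1.00),
    ("3:36996710:G:A", "donor_loss", 1.00),
    ("3:36996618:GTTT:G", "acceptor_gain", 1.00),
    ("3:36996619:TTTAG:T", "acceptor_loss", 0.94),
]

THRESHOLD = 0.8  # the cutoff named in the table heading
for key, effect, delta in rows:
    v = parse_variant(key)
    if delta >= THRESHOLD:
        print(f"{v.chrom}:{v.pos} {v.kind:<9} {effect:<13} {delta:.2f}")
```

Many of the listed positions cluster around 36,996,619 and 36,996,709, which are the start and end of the second exon in the Section 2 exon table.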

AlphaMissense Pathogenicity Predictions
Total Missense Variants: 4,981

TOP 50 Predicted Pathogenic Missense Variants:
Variant | Protein Change | Pathogenicity Score | Classification
3:36993600:G:C | R18P | 1.000 | likely_pathogenic
3:36993579:T:C | L11P | 0.999 | likely_pathogenic
3:36993603:T:G | I19S | 0.999 | likely_pathogenic
3:36993606:C:A | A20E | 0.998 | likely_pathogenic
3:36993605:G:C | A20P | 0.998 | likely_pathogenic
3:36993608:G:C | A21P | 0.998 | likely_pathogenic
3:36993609:C:A | A21E | 0.999 | likely_pathogenic
3:36993611:G:A | G22R | 0.998 | likely_pathogenic
3:36993611:G:C | G22R | 0.998 | likely_pathogenic
3:36993611:G:T | G22W | 0.998 | likely_pathogenic
3:36993612:G:A | G22E | 0.999 | likely_pathogenic
3:36993570:T:A | I8N | 0.998 | likely_pathogenic
3:36993570:T:C | I8T | 0.996 | likely_pathogenic
3:36993570:T:G | I8S | 0.995 | likely_pathogenic
3:36993599:C:A | R18S | 0.995 | likely_pathogenic
3:36993614:G:A | E23K | 0.996 | likely_pathogenic
3:36993616:A:C | E23D | 0.996 | likely_pathogenic
3:36993615:A:T | E23V | 0.995 | likely_pathogenic
3:36993618:T:A | V24D | 0.999 | likely_pathogenic
3:36993618:T:C | V24A | 0.995 | likely_pathogenic
3:36993621:T:A | I25N | 0.999 | likely_pathogenic
3:36993621:T:G | I25S | 0.996 | likely_pathogenic
3:36993594:T:A | V16E | 0.992 | likely_pathogenic
3:36993591:T:A | V15E | 0.993 | likely_pathogenic
3:36993593:G:C | V16L | 0.980 | likely_pathogenic
3:36993609:C:T | A21V | 0.993 | likely_pathogenic
3:36993612:G:T | G22V | 0.994 | likely_pathogenic
3:36993569:A:T | I8F | 0.970 | likely_pathogenic
3:36993593:G:A | V16M | 0.970 | likely_pathogenic
3:36993578:C:G | L11V | 0.968 | likely_pathogenic
3:36993591:T:G | V15G | 0.966 | likely_pathogenic
3:36993609:C:G | A21G | 0.971 | likely_pathogenic
3:36993602:A:C | I19L | 0.948 | likely_pathogenic
3:36993602:A:T | I19F | 0.994 | likely_pathogenic
3:36993603:T:A | I19N | 1.000 | likely_pathogenic
3:36993603:T:C | I19T | 0.998 | likely_pathogenic
3:36993605:G:A | A20T | 0.981 | likely_pathogenic
3:36993606:C:G | A20G | 0.963 | likely_pathogenic
3:36993606:C:T | A20V | 0.988 | likely_pathogenic
3:36993608:G:A | A21T | 0.992 | likely_pathogenic
3:36993598:C:A | N17K | 0.989 | likely_pathogenic
3:36993598:C:G | N17K | 0.989 | likely_pathogenic
3:36993599:C:G | R18G | 0.987 | likely_pathogenic
3:36993617:G:T | V24F | 0.987 | likely_pathogenic
3:36993620:A:T | I25F | 0.985 | likely_pathogenic
3:36993617:G:C | V24L | 0.982 | likely_pathogenic
3:36993597:A:T | N17I | 0.979 | likely_pathogenic
3:36993594:T:G | V16G | 0.979 | likely_pathogenic
3:36993573:G:C | R9P | 0.978 | likely_pathogenic
3:36993579:T:A | L11Q | 0.998 | likely_pathogenic
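
The rows above are not strictly ordered by score, so re-ranking them is a natural first step when using the table. The sketch below parses the protein-change strings, ranks a few rows, and applies score cutoffs. The cutoffs (0.34 and 0.564) are the commonly cited AlphaMissense class boundaries and are an assumption here, since this report does not state them:

```python
import re

# Assumed class boundaries (not stated in this document): scores above 0.564
# are "likely_pathogenic", below 0.34 "likely_benign", otherwise "ambiguous".
PATHOGENIC_CUTOFF = 0.564
BENIGN_CUTOFF = 0.34


def classify(score: float) -> str:
    """Assign a class label from a pathogenicity score in [0, 1]."""
    if score > PATHOGENIC_CUTOFF:
        return "likely_pathogenic"
    if score < BENIGN_CUTOFF:
        return "likely_benign"
    return "ambiguous"


def parse_change(change: str) -> tuple[str, int, str]:
    """Split a change such as 'R18P' into (reference, position, alternate)."""
    m = re.fullmatch(r"([A-Z])(\d+)([A-Z])", change)
    if m is None:
        raise ValueError(f"unrecognised protein change: {change!r}")
    return m.group(1), int(m.group(2)), m.group(3)


# A few (protein change, score) rows copied from the table above
rows = [("I19L", 0.948), ("R18P", 1.000), ("V16L", 0.980), ("L11Q", 0.998)]

for change, score in sorted(rows, key=lambda r: r[1], reverse=True):
    ref, pos, alt = parse_change(change)
    print(f"{ref}{pos}{alt}: {score:.3f} {classify(score)}")
```

All positions in the listed rows fall between residues 8 and 25, so the table covers only the N-terminal end of the protein.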

Section 7: Biological Pathways & Gene Ontology

Reactome Pathways
Total Pathway Count: 6

Pathway ID | Name | Disease Pathway
R-HSA-5358565 | Mismatch repair (MMR) directed by MSH2:MSH6 (MutSalpha) | No
R-HSA-5358606 | Mismatch repair (MMR) directed by MSH2:MSH3 (MutSbeta) | No
R-HSA-912446 | Meiotic recombination | No
R-HSA-6796648 | TP53 Regulates Transcription of DNA Repair Genes | No
R-HSA-5545483 | Defective Mismatch Repair Associated With MLH1 | Yes
R-HSA-5632987 | Defective Mismatch Repair Associated With PMS2 | Yes

Gene Ontology Annotations
Total GO Terms: 35

Biological Process (19 terms)
GO ID | Term
GO:0006298 | mismatch repair
GO:0000712 | resolution of meiotic recombination intermediates
GO:0006303 | double-strand break repair via nonhomologous end joining
GO:0007060 | male meiosis chromosome segregation
GO:0007129 | homologous chromosome pairing at meiosis
GO:0007283 | spermatogenesis
GO:0008630 | intrinsic apoptotic signaling pathway in response to DNA damage
GO:0009617 | response to bacterium
GO:0016321 | female meiosis chromosome segregation
GO:0016446 | somatic hypermutation of immunoglobulin genes
GO:0043060 | meiotic metaphase I homologous chromosome alignment
GO:0045141 | meiotic telomere clustering
GO:0045190 | isotype switching
GO:0045950 | negative regulation of mitotic recombination
GO:0048298 | positive regulation of isotype switching to IgA isotypes
GO:0048304 | positive regulation of isotype switching to IgG isotypes
GO:0048477 | oogenesis
GO:0051257 | meiotic spindle midzone assembly
GO:0000289 | nuclear-transcribed mRNA poly(A) tail shortening

Molecular Function (7 terms)
GO ID | Term
GO:0003682 | chromatin binding
GO:0005524 | ATP binding
GO:0008047 | enzyme activator activity
GO:0016887 | ATP hydrolysis activity
GO:0019899 | enzyme binding
GO:0032137 | guanine/thymine mispair binding
GO:0140664 | ATP-dependent DNA damage sensor activity

Cellular Component (9 terms)
GO ID | Term
GO:0000795 | synaptonemal complex
GO:0001673 | male germ cell nucleus
GO:0005634 | nucleus
GO:0005654 | nucleoplasm
GO:0005694 | chromosome
GO:0005712 | chiasma
GO:0005715 | late recombination nodule
GO:0016020 | membrane
GO:0032389 | MutLalpha complex

Section 8: Protein Interactions & Molecular Networks

STRING Protein-Protein Interactions
Total Interaction Count: 3,428+

TOP 50 Highest-Confidence Interacting Proteins (49 listed):
Partner UniProt | Partner Gene | Confidence Score
P20585 | MSH3 | 999
P43246 | MSH2 | 999
P52701 | MSH6 | 999
Q9UQ84 | EXO1 | 998
P49751 | - | 993
P54277 | PMS1 | 993
P54278 | PMS2 | 993
Q13315 | ATM | 992
Q9BX63 | FANCJ/BRIP1 | 951
P38398 | BRCA1 | 930
O95243 | MBD4 | 921
Q9UIF7 | MUTYH | 914
P51587 | BRCA2 | 905
P15056 | BRAF | 876
Q9BXW9 | FANCD2 | 874
P01116 | KRAS | 868
P04637 | TP53 | 868
O15457 | MSH4 | 856
P42771 | CDKN2A | 856
P16455 | MGMT | 853
Q07864 | POLE | 852
Q9Y2M0 | FAN1 | 844
O75771 | RAD51D | 838
P16422 | EPCAM | 837
P35249 | RFC4 | 826
P32846 | - | 825
O43196 | MSH5 | 824
O96017 | CHEK2 | 824
P60484 | PTEN | 811
P54132 | BLM | 808
P28340 | POLD1 | 790
P42336 | PIK3CA | 788
O43502 | RAD51C | 787
O60934 | NBN | 775
Q14191 | WRN | 773
A2PYH4 | HFM1 | 772
Q9Y5K1 | SPO11 | 771
Q86YC2 | PALB2 | 766
P21359 | NF1 | 764
Q06609 | RAD51 | 758
P36894 | BMPR1A | 735
Q00765 | REEP5 | 728
Q15831 | STK11 | 726
P07992 | ERCC1 | 725
Q9NS23 | RASSF1 | 719
Q9NXL9 | MCM9 | 716
P49959 | MRE11 | 715
P37173 | TGFBR2 | 702
Q99728 | BARD1 | 700
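
The confidence scores above appear to be STRING combined scores on a 0 to 1000 scale. The sketch below converts them into the confidence tiers conventionally used with STRING (0.15, 0.4, 0.7, 0.9); both the scale interpretation and the tier names are assumptions for illustration, and `string_tier` is a hypothetical helper:

```python
def string_tier(combined_score: int) -> str:
    """Map a 0-1000 combined score to a confidence tier (assumed cutoffs)."""
    if not 0 <= combined_score <= 1000:
        raise ValueError("combined score must be between 0 and 1000")
    if combined_score >= 900:
        return "highest"
    if combined_score >= 700:
        return "high"
    if combined_score >= 400:
        return "medium"
    if combined_score >= 150:
        return "low"
    return "below low-confidence cutoff"


# A few (UniProt accession, score) rows copied from the table above
rows = [
    ("P43246", 999),
    ("Q13315", 992),
    ("P51587", 905),
    ("P04637", 868),
    ("Q99728", 700),
]

for accession, score in rows:
    print(f"{accession}: {score / 1000:.3f} {string_tier(score)}")
```

Under these cutoffs every partner in the table is at least "high" confidence, and the entries scoring 900 or more fall in the "highest" tier.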

IntAct Physical Interactions
Total: 514+ interactions

Key Interaction Partners (sorted by confidence; two partners score ≥0.9):
Partner | Interaction Type | Confidence
PMS2 | direct interaction | 0.97
BRIP1 | physical association | 0.94
MYOG | physical association | 0.89
RADX | physical association | 0.87
ZC3H11A | physical association | 0.87
UBOX5 | physical association | 0.85
KPNA2 | physical association | 0.83
PMS1 | physical association | 0.83
CBY2 | physical association | 0.80
MLH3 | physical association | 0.78
MAGEA8 | physical association | 0.78
MSH3 | physical association | 0.74
PSMA1 | physical association | 0.74
TMSB4X | physical association | 0.73
TASOR2 | physical association | 0.72
FRMD6 | physical association | 0.67
CCDC33 | physical association | 0.67

ESM2 Structural Similarity
Total Similar Proteins: 28

TOP 20 Structurally Similar Proteins:
UniProt | Top Similarity | Avg Similarity
P43246 (MSH2) | 0.9999 | 0.9384
Q5XXB5 | 0.9999 | 0.9384
Q9Z2D1 | 0.9995 | 0.9515
Q13614 | 0.9995 | 0.9504
Q02440 | 0.9990 | 0.9382
Q2PFX0 | 0.9989 | 0.9437
B2KI88 | 0.9988 | 0.9520
P43247 (Msh2, mouse) | 0.9987 | 0.9419
Q3MHE4 | 0.9987 | 0.9413
P24860 | 0.9986 | 0.9370
O95292 | 0.9986 | 0.9447
Q08301 | 0.9985 | 0.9327
P14635 | 0.9985 | 0.9305
P30277 | 0.9984 | 0.9367
Q1LZG6 | 0.9984 | 0.9371
A6QNT8 | 0.9978 | 0.9246
O95486 | 0.9978 | 0.9213
Q9S9N4 | 0.9977 | 0.9340
O43502 (RAD51C) | 0.9976 | 0.9410
Q3U2P1 | 0.9966 | 0.9268

DIAMOND Sequence Homology
Total Homologous Proteins: 15 (14 listed)

UniProt | Species/Description | Top Identity (%) | BitScore
P97679 | Rat Mlh1 | 91.3 | 1349
Q9JK91 | Mouse Mlh1 | 91.3 | 1351
B5RL36 | - | 99.3 | 1193
B5RR29 | - | 99.3 | 1191
P54278 | Human PMS2 | 97.4 | 1274
O51229 | - | 92.3 | 1112
Q0SNV1 | - | 92.5 | 1112
Q662F3 | - | 92.5 | 1115
A1QZ05 | - | 85.6 | 1051
P54280 | Human MSH6 | 50.3 | 421
Q54KD8 | C. elegans mlh-1 | 40.2 | 569
P38920 | Yeast MLH1 | 39.7 | 467
Q9ZRV4 | Fly Mlh1 | 39.2 | 510
Q9P7W6 | - | 37.3 | 438

Section 9: Transcription Factor Regulatory Data

Note: MLH1 does not encode a transcription factor, so there are no downstream targets or DNA-binding motifs to report. This section covers only the transcription factors that regulate MLH1 expression.

Upstream Regulators (TFs that Regulate MLH1)
Total: 19 regulators from CollecTRI database

Transcription Factor | Regulation Type | Confidence
TP53 | Activation | -
BHLHE40 | Unknown | High
BHLHE41 | Repression | High
CEBPZ | Repression | High
GLI1 | Unknown | High
GLI2 | Unknown | High
HOXA5 | Unknown | High
MLXIP | Unknown | High
BRIP1 | Unknown | -
DNMT3A | Repression | -
E2F4 | Unknown | -
HIF1A | Unknown | -
MAFG | Unknown | -
CTNNBL1 | Unknown | Low
DNMT1 | Unknown | Low
ESR2 | Unknown | Low
HOXD1 | Unknown | Low
TP73 | Unknown | Low
WT1 | Repression | Low

Section 10: Drug & Pharmacology Data

Drug Target Status
MLH1/P40692 is not directly targeted by approved drugs (no ChEMBL target entry).

PharmGKB Gene Entry
Attribute | Value
PharmGKB ID | PA240
VIP Gene | Yes (Very Important Pharmacogene)
Has CPIC Guideline | No
Has Variant Annotations | Yes

Drug-Gene Associations (PharmGKB)
Drug/Class | Clinical Annotations | Variant Annotations
Platinum compounds | 300 | 1138
Talazoparib | 0 | 0

Pharmacogenomic Context
MLH1 deficiency/mutations affect response to:
  • Platinum-based chemotherapy (cisplatin, carboplatin, oxaliplatin)
      - MMR-deficient tumors may show resistance to platinum agents
      - Clinical annotation count: 300+
  • PARP inhibitors (e.g., talazoparib)
      - Potential synthetic lethality in MMR-deficient cancers

Section 11: Expression Profiles

Bgee Expression Summary
Attribute | Value
Expression Breadth | Ubiquitous
Total Present Calls | 296
Total Absent Calls | 4
Total Conditions | 300
Max Expression Score | 94.44
Average Expression Score | 87.57
Gold Quality Count | 294
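
The "ubiquitous" label can be read directly off the call counts. A minimal sketch of that calculation, using the numbers from the summary above (the function name is illustrative):

```python
def call_fractions(present: int, absent: int, gold: int) -> dict:
    """Fraction of conditions called present, and fraction of gold quality."""
    total = present + absent
    if total == 0:
        raise ValueError("no conditions to summarise")
    return {
        "total": total,
        "present_fraction": present / total,
        "gold_fraction": gold / total,
    }


# Counts copied from the Bgee summary above
summary = call_fractions(present=296, absent=4, gold=294)
print(f"{summary['present_fraction']:.1%} of {summary['total']} conditions present")
print(f"{summary['gold_fraction']:.1%} of conditions are gold quality")
```

The present and absent calls add up to the 300 total conditions reported, which is a quick consistency check on the summary.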

TOP 30 Tissues by Expression Score
Tissue (UBERON ID) | Expression | Score | Quality
Tibialis anterior (UBERON:0001385) | present | 94.44 | gold
Skeletal muscle of rectus abdominis (UBERON:0004511) | present | 94.42 | gold
Deltoid (UBERON:0001476) | present | 94.37 | gold
Left ventricle myocardium (UBERON:0006566) | present | 94.30 | gold
Heart left ventricle (UBERON:0002084) | present | 93.97 | gold
Cardiac ventricle (UBERON:0002082) | present | 93.90 | gold
Primordial germ cell in gonad | present | 93.85 | gold
Vastus lateralis (UBERON:0001379) | present | 93.84 | gold
Quadriceps femoris (UBERON:0001377) | present | 93.78 | gold
Apex of heart (UBERON:0002098) | present | 93.60 | gold
Skeletal muscle tissue (UBERON:0001134) | present | 93.35 | gold
Heart right ventricle (UBERON:0002080) | present | 93.35 | gold
Ganglionic eminence (UBERON:0004023) | present | 93.22 | gold
Muscle organ (UBERON:0001630) | present | 92.99 | gold
Skeletal muscle organ (UBERON:0014892) | present | 92.99 | gold
Biceps brachii (UBERON:0001507) | present | 92.97 | gold
Muscle tissue (UBERON:0002385) | present | 92.79 | gold
Heart (UBERON:0000948) | present | 92.78 | gold
Muscle of leg (UBERON:0001383) | present | 92.75 | gold
Ventricular zone (UBERON:0003053) | present | 92.73 | gold
Tibia (UBERON:0000979) | present | 92.69 | gold
Calcaneal tendon (UBERON:0003701) | present | 92.62 | gold
Gastrocnemius (UBERON:0001388) | present | 92.53 | gold
Diaphragm (UBERON:0001103) | present | 92.49 | gold
Pituitary gland (UBERON:0000007) | present | 92.48 | gold
Triceps brachii (UBERON:0001509) | present | 92.45 | gold
Adenohypophysis (UBERON:0002196) | present | 92.41 | gold
Skin of hip (UBERON:0001554) | present | 92.34 | gold
Right atrium auricular region (UBERON:0006631) | present | 92.32 | gold
Corpus callosum (UBERON:0002336) | present | 92.23 | gold

Expression Pattern: MLH1 is ubiquitously expressed (296 present calls across 300 conditions), with the highest scores in skeletal muscle, cardiac muscle, and germline tissues. The top 30 scores span a narrow range (92.23 to 94.44), so no strongly tissue-enriched pattern is apparent. The breadth of expression is consistent with a general role in DNA mismatch repair.

Single-Cell Expression Data
Dataset | Description | Species | Cell Count
E-MTAB-2983 | Functional germ line stem cells in adult mammalian ovaries | Homo sapiens | 38

Section 12: Disease Associations Mendelian/Monogenic Disease Links (GenCC) Total Disease Associations: 18

| Disease | OMIM/MONDO | Inheritance | Evidence | Submitter |
|---|---|---|---|---|
| Lynch syndrome 2 | OMIM:609310 | AD | Definitive | Ambry, G2P |
| Lynch syndrome | MONDO:0005835 | AD | Definitive | G2P |
| Mismatch repair cancer syndrome 1 | OMIM:276300 | AR | Definitive | Ambry, G2P |
| Muir-Torre syndrome | OMIM:158320 | AD | Definitive | G2P |
| Lynch syndrome | ORPHANET:144 | AD | Supportive | Orphanet |
| Muir-Torre syndrome | OMIM:158320 | AD | Strong | Genomics England |
| Lynch syndrome 2 | OMIM:609310 | AD | Strong | Genomics England |
| Lynch syndrome 1 | OMIM:120435 | AD | Strong | Labcorp |
| Mismatch repair cancer syndrome 1 | OMIM:276300 | AR | Strong | Labcorp |
| Ovarian cancer | MONDO:0008170 | AD | Strong | Genomics England |
| Muir-Torre syndrome | OMIM:158320 | AD | Moderate | Ambry |
| Rhabdomyosarcoma | MONDO:0005212 | AR | Moderate | Genomics England |
| Malignant pancreatic neoplasm | MONDO:0009831 | AD | Moderate | Genomics England |
| Prostate cancer | MONDO:0008315 | AD | Limited | Ambry |
| Breast cancer | MONDO:0007254 | AD | Disputed | Ambry |
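
Several diseases appear more than once in the GenCC table because different submitters assign different classifications. The following is a minimal Python sketch of how those rows could be collapsed to the strongest classification per disease identifier. The ranking in `GENCC_RANK` and the helper name `strongest_by_disease` are assumptions made for illustration, not part of the GenCC data itself; the rows are copied from the table above.

```python
# Assumed ordering of GenCC classifications, strongest first (illustrative).
GENCC_RANK = {
    "Definitive": 0,
    "Strong": 1,
    "Moderate": 2,
    "Supportive": 3,
    "Limited": 4,
    "Disputed": 5,
}

# (disease, identifier, inheritance, classification, submitter), from the table above.
GENCC_ROWS = [
    ("Lynch syndrome 2", "OMIM:609310", "AD", "Definitive", "Ambry, G2P"),
    ("Lynch syndrome", "MONDO:0005835", "AD", "Definitive", "G2P"),
    ("Mismatch repair cancer syndrome 1", "OMIM:276300", "AR", "Definitive", "Ambry, G2P"),
    ("Muir-Torre syndrome", "OMIM:158320", "AD", "Definitive", "G2P"),
    ("Lynch syndrome", "ORPHANET:144", "AD", "Supportive", "Orphanet"),
    ("Muir-Torre syndrome", "OMIM:158320", "AD", "Strong", "Genomics England"),
    ("Lynch syndrome 2", "OMIM:609310", "AD", "Strong", "Genomics England"),
    ("Lynch syndrome 1", "OMIM:120435", "AD", "Strong", "Labcorp"),
    ("Mismatch repair cancer syndrome 1", "OMIM:276300", "AR", "Strong", "Labcorp"),
    ("Ovarian cancer", "MONDO:0008170", "AD", "Strong", "Genomics England"),
    ("Muir-Torre syndrome", "OMIM:158320", "AD", "Moderate", "Ambry"),
    ("Rhabdomyosarcoma", "MONDO:0005212", "AR", "Moderate", "Genomics England"),
    ("Malignant pancreatic neoplasm", "MONDO:0009831", "AD", "Moderate", "Genomics England"),
    ("Prostate cancer", "MONDO:0008315", "AD", "Limited", "Ambry"),
    ("Breast cancer", "MONDO:0007254", "AD", "Disputed", "Ambry"),
]


def strongest_by_disease(rows):
    """Return {(disease, identifier): strongest classification} for the given rows."""
    best = {}
    for disease, identifier, _inheritance, classification, _submitter in rows:
        key = (disease, identifier)
        # Keep the classification with the lowest (strongest) rank seen so far.
        if key not in best or GENCC_RANK[classification] < GENCC_RANK[best[key]]:
            best[key] = classification
    return best


summary = strongest_by_disease(GENCC_ROWS)
for (disease, identifier), classification in sorted(
    summary.items(), key=lambda item: GENCC_RANK[item[1]]
):
    print(f"{disease} ({identifier}): {classification}")
```

Keying on the identifier as well as the disease name keeps the MONDO and Orphanet entries for Lynch syndrome separate, since they carry different classifications in the table.
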
Orphanet Disease Entries
| Orphanet ID | Disease Name | Type | Gene Count | Phenotype Count |
|---|---|---|---|---|
| 144 | Lynch syndrome | Disease | 9 | 62 |
| 252202 | Constitutional mismatch repair deficiency syndrome | Disease | 4 | 0 |

Human Phenotype Ontology (HPO) Associations
Total Phenotype Terms: 89
Top 50 Clinical Phenotypes:
| HPO ID | Phenotype |
|---|---|
| HP:0003003 | Colon cancer |
| HP:0005227 | Adenomatous colonic polyposis |
| HP:0100743 | Neoplasm of the rectum |
| HP:0012114 | Endometrial carcinoma |
| HP:0100615 | Ovarian neoplasm |
| HP:0002894 | Neoplasm of the pancreas |
| HP:0006753 | Neoplasm of the stomach |
| HP:0006725 | Pancreatic adenocarcinoma |
| HP:0040276 | Adenocarcinoma of the colon |
| HP:0040274 | Adenocarcinoma of the small intestine |
| HP:0006771 | Duodenal adenocarcinoma |
| HP:0006758 | Malignant genitourinary tract tumor |
| HP:0009726 | Renal neoplasm |
| HP:0010786 | Urinary tract neoplasm |
| HP:0002665 | Lymphoma |
| HP:0012539 | Non-Hodgkin lymphoma |
| HP:0012190 | T-cell lymphoma |
| HP:0001909 | Leukemia |
| HP:0004377 | Hematological neoplasm |
| HP:0003002 | Breast carcinoma |
| HP:0100031 | Neoplasm of the thyroid gland |
| HP:0001402 | Hepatocellular carcinoma |
| HP:0002896 | Neoplasm of the liver |
| HP:0008069 | Neoplasm of the skin |
| HP:0002671 | Basal cell carcinoma |
| HP:0030410 | Sebaceous gland carcinoma |
| HP:0009720 | Adenoma sebaceum |
| HP:0009592 | Astrocytoma |
| HP:0012174 | Glioblastoma multiforme |
| HP:0033681 | Oligodendroglioma |
| HP:0002888 | Ependymoma |
| HP:0002885 | Medulloblastoma |
| HP:0033682 | Pleomorphic xanthoastrocytoma |
| HP:0100835 | Benign neoplasm of the central nervous system |
| HP:0003006 | Neuroblastoma |
| HP:0002893 | Pituitary adenoma |
| HP:0002859 | Rhabdomyosarcoma |
| HP:0010622 | Neoplasm of the skeletal system |
| HP:0100684 | Salivary gland neoplasm |
| HP:0012118 | Laryngeal carcinoma |
| HP:0200008 | Intestinal polyposis |
| HP:0006719 | Benign gastrointestinal tract tumors |
| HP:0006778 | Benign genitourinary tract neoplasm |
| HP:0000006 | Autosomal dominant inheritance |
| HP:0000007 | Autosomal recessive inheritance |
| HP:0003596 | Middle age onset |
| HP:0001522 | Death in infancy |
| HP:0100613 | Death in early adulthood |
| HP:0007565 | Multiple cafe-au-lait spots |
| HP:0009732 | Plexiform neurofibroma |

GWAS Associations
Total: 7 associations
| Study ID | Trait/Disease | Mapped Gene | P-value |
|---|---|---|---|
| GCST012206 | Proximal colorectal cancer | MLH1 | 4×10⁻¹⁸ |
| GCST90002401 | Platelet distribution width | MLH1 | 2×10⁻¹⁸ |
| GCST90013405 | Liver enzyme levels (ALT) | MLH1 | 4×10⁻¹⁵ |
| GCST004521 | Autism spectrum disorder or schizophrenia | HSPD1P6-LINC02033 | 1×10⁻¹¹ |
| GCST004946 | Schizophrenia | TRANK1 | 3×10⁻¹¹ |
| GCST90011898 | Alanine aminotransferase levels | MLH1 | 2×10⁻⁸ |
| GCST003158 | Subjective response to lithium treatment | HSPD1P6-LINC02033 | 8×10⁻⁷ |
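
Not every row in the GWAS table is mapped to MLH1 itself, and the p-values span many orders of magnitude. As a rough sketch, the rows can be filtered by mapped gene and by the conventional genome-wide significance threshold of 5×10⁻⁸. That threshold and the helper name `significant_for_gene` are assumptions added for illustration; the table itself states no cutoff.

```python
# Conventional genome-wide significance threshold (an assumption, not from the table).
GENOME_WIDE_THRESHOLD = 5e-8

# (study_id, trait, mapped_gene, p_value), copied from the GWAS table above.
GWAS_ROWS = [
    ("GCST012206", "Proximal colorectal cancer", "MLH1", 4e-18),
    ("GCST90002401", "Platelet distribution width", "MLH1", 2e-18),
    ("GCST90013405", "Liver enzyme levels (ALT)", "MLH1", 4e-15),
    ("GCST004521", "Autism spectrum disorder or schizophrenia", "HSPD1P6-LINC02033", 1e-11),
    ("GCST004946", "Schizophrenia", "TRANK1", 3e-11),
    ("GCST90011898", "Alanine aminotransferase levels", "MLH1", 2e-8),
    ("GCST003158", "Subjective response to lithium treatment", "HSPD1P6-LINC02033", 8e-7),
]


def significant_for_gene(rows, gene, threshold=GENOME_WIDE_THRESHOLD):
    """Return rows mapped to `gene` whose p-value is below `threshold`."""
    return [row for row in rows if row[2] == gene and row[3] < threshold]


# Associations mapped directly to MLH1 that pass the threshold.
mlh1_hits = significant_for_gene(GWAS_ROWS, "MLH1")

# Study IDs of rows that do not reach the threshold, regardless of mapped gene.
below_threshold = [row[0] for row in GWAS_ROWS if row[3] >= GENOME_WIDE_THRESHOLD]

for study_id, trait, _gene, p_value in mlh1_hits:
    print(f"{study_id}: {trait} (p = {p_value:.0e})")
```

Under this cutoff, the lithium-response association (8×10⁻⁷) is the only listed row that does not reach genome-wide significance.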

DATA SOURCES

| Database | Version/Access | URL |
|---|---|---|
| HGNC | Current | https://www.genenames.org |
| Ensembl | GRCh38 | https://www.ensembl.org |
| NCBI Gene | Current | https://www.ncbi.nlm.nih.gov/gene |
| UniProt | Current | https://www.uniprot.org |
| OMIM | Current | https://omim.org |
| RefSeq | Current | https://www.ncbi.nlm.nih.gov/refseq |
| PDB | Current | https://www.rcsb.org |
| AlphaFold | v4 | https://alphafold.ebi.ac.uk |
| InterPro | Current | https://www.ebi.ac.uk/interpro |
| ClinVar | Current | https://www.ncbi.nlm.nih.gov/clinvar |
| AlphaMissense | 2023 | https://alphamissense.hegelab.org |
| SpliceAI | Current | - |
| Reactome | Current | https://reactome.org |
| Gene Ontology | Current | http://geneontology.org |
| STRING | v12 | https://string-db.org |
| IntAct | Current | https://www.ebi.ac.uk/intact |
| Bgee | Current | https://www.bgee.org |
| PharmGKB | Current | https://www.pharmgkb.org |
| GenCC | Current | https://thegencc.org |
| Orphanet | Current | https://www.orpha.net |
| HPO | Current | https://hpo.jax.org |
| GWAS Catalog | Current | https://www.ebi.ac.uk/gwas |
| CollecTRI | Current | https://collectri.github.io |

Reference compiled via the biobtree cross-database integration platform.

I have compiled a comprehensive cross-database identifier and functional mapping reference for human MLH1 (mutL homolog 1).

Summary of Key Findings

| Category | Key Data |
|---|---|
| Gene IDs | HGNC:7127, ENSG00000076242, Entrez 4292, OMIM 120436 |
| Location | Chr3:36,993,226-37,050,896 (+ strand, 3p22.2) |
| Transcripts | 44 Ensembl, MANE Select: NM_000249 |
| Protein | P40692 (756 aa, 84.6 kDa) |
| Structures | 7 PDB entries, AlphaFold model (pLDDT 77.89) |
| Orthologs | Conserved across mouse, rat, zebrafish, fly, worm, yeast |
| ClinVar Variants | 6,260 total (500+ pathogenic) |
| AlphaMissense | 4,981 missense predictions |
| SpliceAI | 2,888 splice effect predictions |
| Pathways | 6 Reactome (mismatch repair, meiosis, TP53 regulation) |
| GO Terms | 35 (DNA repair, meiosis, ATP binding) |
| Interactions | 3,428+ STRING, 514+ IntAct |
| Expression | Ubiquitous (highest in muscle, heart, germline) |
| Primary Diseases | Lynch syndrome (definitive), Muir-Torre syndrome, CMMRD |
| Pharmacogenomics | VIP gene - affects platinum compound response |

The reference covers all 12 sections with counts, identifiers, and detailed listings suitable for research use.