MLH1 Gene Complete Identifier and Functional Mapping Reference
Provide a comprehensive cross-database identifier and functional mapping reference for human MLH1, a definitive lookup resource covering:

### Section 1: Gene identifiers
For human gene MLH1, list ALL gene-level database identifiers. Required:
- HGNC ID and approved symbol
- Ensembl gene ID (ENSG...)
- NCBI Entrez Gene ID
- OMIM gene/locus ID
- Genomic location: chromosome, start position, end position, strand (GRCh38)

### Section 2: Transcript identifiers
For human gene MLH1, list ALL transcript-level identifiers. Required:
- Ensembl transcripts: ALL ENST IDs with biotype. Total count.
- RefSeq transcripts: ALL NM_ mRNA accessions. Mark which is MANE Select.
- CCDS IDs.
- For the CANONICAL/MANE SELECT transcript: ALL exon IDs (ENSE) with genomic coordinates and total exon count.

### Section 3: Protein identifiers
For human gene MLH1 protein product(s), list ALL protein-level identifiers. Required:
- UniProt accessions: ALL entries (reviewed and unreviewed). Mark the canonical reviewed entry.
- RefSeq protein: ALL NP_ accessions.
- Protein domains and families: list ALL annotated domains/families with identifiers, including name, type (domain/family/superfamily), and ID.
- Antibody availability: known antibody resources for the protein.

### Section 4: Structure
For human gene MLH1 protein, list ALL structural data. Required:
- Experimental structures: ALL PDB IDs. For each: experimental method (X-ray/NMR/Cryo-EM) and resolution. Total count.
- Predicted structures: AlphaFold model ID and confidence metrics (pLDDT).

### Section 5: Cross-species orthologs
For human gene MLH1, list orthologous genes in key model organisms. Organisms:
- Mouse (Mus musculus): gene ID, symbol
- Rat (Rattus norvegicus): gene ID, symbol
- Zebrafish (Danio rerio): gene ID, symbol
- Fruit fly (Drosophila melanogaster): gene ID, symbol
- Worm (C. elegans): gene ID, symbol
- Yeast (S. cerevisiae): gene ID, symbol

### Section 6: Clinical variants & AI predictions
For human gene MLH1, summarize clinical variants and AI predictions.
Clinical variant annotations (ClinVar):
- Total variant count (approximate is fine)
- Breakdown by classification: Pathogenic, Likely Pathogenic, VUS, Likely Benign, Benign
- TOP 30 pathogenic/likely pathogenic variants with: variant ID, HGVS notation, associated condition

AI-based variant effect predictions:
- Splice effect predictions: total count + TOP 30 with delta scores if known
- Missense pathogenicity from AlphaMissense: total count + TOP 30 likely-pathogenic with am_pathogenicity scores.

### Section 7: Pathways & Gene Ontology
For human gene MLH1, list biological pathways and Gene Ontology annotations.
Pathway membership:
- ALL biological pathways this gene participates in, with pathway IDs and names
- Total pathway count

Gene Ontology:
- Biological Process: count and TOP 20 terms with GO IDs
- Molecular Function: count and TOP 20 terms with GO IDs
- Cellular Component: count and TOP 20 terms with GO IDs

### Section 8: Protein interactions & networks
For human gene MLH1 protein, summarize protein interactions and networks.
Protein-protein interactions (STRING, IntAct, BioGRID, etc.):
- Total interaction count (approximate)
- TOP 30 highest-confidence interacting proteins with scores/evidence

Protein similarity:
- Structural/embedding similarity (e.g. Foldseek, ESM): TOP 20 similar proteins with scores
- Sequence homology: TOP 20 homologous proteins with identity/similarity

### Section 9: Transcription factor regulatory data
For human gene MLH1, summarize transcription factor regulatory data.
If MLH1 is a transcription factor:
- Downstream targets: total count + TOP 30 with regulation type (activates/represses) and evidence
- DNA binding motifs from JASPAR: all known motif IDs and motif family classification.

Regardless:
- Upstream regulators: TFs that regulate MLH1, names with evidence type (ChIP-seq / predicted / experimentally validated)

If MLH1 is not a transcription factor, say so briefly and skip the downstream/motif sections.

### Section 10: Drug & pharmacology data
For human gene MLH1 protein as a drug target, summarize pharmacology data.
If MLH1 is a known drug target:
- Targeting molecules: total count in ChEMBL/DrugBank + TOP 30 by development phase (molecule ID, name, mechanism, highest phase)
- Clinical trials: TOP 20 involving drugs targeting this gene, with trial ID, phase, status, intervention
- Pharmacogenomics: known drug-gene interactions affecting drug response + dosing guidelines if any

If MLH1 is not currently a drug target, say so briefly.

### Section 11: Expression profiles
For human gene MLH1, summarize expression profiles.
Tissue expression (GTEx, HPA, Bgee, etc.):
- TOP 30 tissues with expression scores/levels (direction, units if known)
- Note tissue-specific or tissue-enriched patterns

Cell type expression (Tabula Sapiens, HCA, etc.):
- TOP 30 cell types with expression scores
- Note cell-type-specific patterns

Single-cell expression: notable datasets or cell populations of interest for this gene.

### Section 12: Disease associations
For human gene MLH1, summarize disease associations.
Mendelian / monogenic disease:
- Diseases caused by mutations in MLH1: disease name, disease ID (OMIM/Orphanet/Mondo), inheritance pattern, evidence level
- Include all directly linked conditions

Phenotype associations:
- Clinical phenotypes associated with the gene (HPO terms where known)
- TOP 30 phenotype terms with HPO IDs

Complex-disease / GWAS:
- Traits and diseases significantly associated via GWAS: trait name, variant, effect size, study where known
- TOP 30 GWAS associations
Executive summary
MLH1 (MutL Homolog 1, HGNC:7127) encodes a 756-amino-acid, 84.6 kDa DNA mismatch repair protein that is central to genomic stability; its loss is one of the most clinically consequential germline events in human cancer predisposition. Pathogenic and likely pathogenic variants, roughly 1,200 of ~6,260 ClinVar entries, cause Lynch syndrome (autosomal dominant) and mismatch repair cancer syndrome 1 (autosomal recessive), conferring strong inherited risk for colorectal, endometrial, ovarian, and several other cancers. MLH1 is expressed in 296 of 300 surveyed Bgee conditions (mean score 87.57), with the highest scores in skeletal and cardiac muscle and in primordial germ cells; the germ-cell signal is consistent with its additional role in meiotic recombination. Its highest-confidence STRING partners are the mismatch repair proteins MSH2, MSH3, and MSH6 (score 999 each), and TP53 is annotated as an upstream transcriptional activator. Although MLH1 is a VIP pharmacogene in PharmGKB, no approved or investigational drugs target it directly; its clinical utility lies in germline variant screening rather than pharmacological intervention.
Gene identifiers
| Identifier | Value |
|---|---|
| HGNC ID | HGNC:7127 |
| Approved symbol | MLH1 |
| Ensembl gene ID | ENSG00000076242 |
| NCBI Entrez Gene ID | 4292 |
| OMIM gene ID | 120436 |
| Chromosome | 3 |
| Start position (GRCh38) | 36,993,226 |
| End position (GRCh38) | 37,050,896 |
| Strand | + |
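As a quick consistency check on the table above, the following minimal Python sketch validates the identifier formats and derives the gene span from the GRCh38 coordinates. The regular expressions are assumed patterns reflecting how these identifiers are commonly written, not official validators, and the names `MLH1_IDS`, `PATTERNS`, and `invalid_ids` are hypothetical.

```python
import re

# Values copied from the gene identifier table above.
MLH1_IDS = {
    "hgnc": "HGNC:7127",
    "ensembl_gene": "ENSG00000076242",
    "entrez": "4292",
    "omim": "120436",
}

# Assumed patterns for illustration only; not official format specifications.
PATTERNS = {
    "hgnc": r"HGNC:\d+",
    "ensembl_gene": r"ENSG\d{11}",
    "entrez": r"\d+",
    "omim": r"\d{6}",
}


def invalid_ids(ids, patterns):
    """Return the keys whose value does not fully match its pattern."""
    return [key for key, value in ids.items()
            if not re.fullmatch(patterns[key], value)]


# Coordinates are 1-based and inclusive, so the span is end - start + 1.
START, END = 36_993_226, 37_050_896
gene_span_bp = END - START + 1

print("invalid identifiers:", invalid_ids(MLH1_IDS, PATTERNS))
print("gene span (bp):", gene_span_bp)
```

With the values in the table, the locus spans 57,671 bp on the + strand of chromosome 3.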
Transcript identifiers
Ensembl transcripts (ENST IDs)
| Transcript ID | Biotype |
|---|---|
| ENST00000231790 | protein_coding |
| ENST00000413212 | nonsense_mediated_decay |
| ENST00000413740 | protein_coding |
| ENST00000429117 | protein_coding |
| ENST00000432299 | nonsense_mediated_decay |
| ENST00000435176 | protein_coding |
| ENST00000441265 | protein_coding |
| ENST00000442249 | retained_intron |
| ENST00000447829 | nonsense_mediated_decay |
| ENST00000450420 | protein_coding |
| ENST00000454028 | nonsense_mediated_decay |
| ENST00000455445 | protein_coding |
| ENST00000456676 | protein_coding |
| ENST00000457004 | nonsense_mediated_decay |
| ENST00000458009 | nonsense_mediated_decay |
| ENST00000458205 | protein_coding |
| ENST00000466900 | protein_coding |
| ENST00000476172 | retained_intron |
| ENST00000485889 | protein_coding |
| ENST00000492474 | protein_coding |
| ENST00000536378 | protein_coding |
| ENST00000539477 | protein_coding |
| ENST00000616768 | protein_coding |
| ENST00000673673 | protein_coding |
| ENST00000673686 | retained_intron |
| ENST00000673713 | retained_intron |
| ENST00000673715 | protein_coding |
| ENST00000673741 | retained_intron |
| ENST00000673889 | retained_intron |
| ENST00000673897 | nonsense_mediated_decay |
| ENST00000673899 | protein_coding |
| ENST00000673947 | nonsense_mediated_decay |
| ENST00000673972 | nonsense_mediated_decay |
| ENST00000673990 | protein_coding |
| ENST00000674019 | protein_coding |
| ENST00000674107 | retained_intron |
| ENST00000674111 | nonsense_mediated_decay |
| ENST00000674125 | retained_intron |
| ENST00000713802 | protein_coding |
| ENST00000931189 | protein_coding |
| ENST00000931190 | protein_coding |
| ENST00000931191 | protein_coding |
| ENST00000948704 | protein_coding |
| ENST00000948705 | protein_coding |
Total: 44 Ensembl transcripts
RefSeq mRNA transcripts (NM_ accessions)
| RefSeq ID | MANE Select |
|---|---|
| NM_000249 | ✓ |
| NM_001167617 | |
| NM_001167618 | |
| NM_001167619 | |
| NM_001258271 | |
| NM_001258273 | |
| NM_001258274 | |
| NM_001354615 | |
| NM_001354616 | |
| NM_001354617 | |
| NM_001354618 | |
| NM_001354619 | |
| NM_001354620 | |
| NM_001354621 | |
| NM_001354622 | |
| NM_001354623 | |
| NM_001354624 | |
| NM_001354625 | |
| NM_001354626 | |
| NM_001354627 | |
| NM_001354628 | |
| NM_001354629 | |
| NM_001354630 | |
Total: 23 human MLH1 mRNA transcripts (NM_000249 is MANE Select)
CCDS IDs
- CCDS2663
- CCDS54562
- CCDS54563
MANE SELECT transcript exons (ENST00000231790 / NM_000249)
| Exon # | Exon ID | Genomic Coordinates |
|---|---|---|
| 1 | ENSE00004012715 | 36993518–36993663 |
| 2 | ENSE00003496022 | 36996619–36996709 |
| 3 | ENSE00003599036 | 37000955–37001053 |
| 4 | ENSE00003635406 | 37004401–37004474 |
| 5 | ENSE00003533853 | 37006991–37007063 |
| 6 | ENSE00003633623 | 37008814–37008905 |
| 7 | ENSE00003521968 | 37011820–37011862 |
| 8 | ENSE00003516787 | 37012011–37012099 |
| 9 | ENSE00003656033 | 37014432–37014544 |
| 10 | ENSE00001716871 | 37017506–37017599 |
| 11 | ENSE00003785627 | 37020310–37020463 |
| 12 | ENSE00003688106 | 37025637–37026007 |
| 13 | ENSE00003680564 | 37028784–37028932 |
| 14 | ENSE00001730188 | 37040186–37040294 |
| 15 | ENSE00001747618 | 37042268–37042331 |
| 16 | ENSE00001748897 | 37047519–37047683 |
| 17 | ENSE00001593400 | 37048517–37048609 |
| 18 | ENSE00001785296 | 37048904–37049017 |
| 19 | ENSE00003902581 | 37050486–37050846 |
Total exons: 19 (Chromosome 3, + strand)
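The exon table can be cross-checked with a short Python sketch that computes each exon length from its inclusive coordinates, sums them into a spliced transcript length, and confirms the exons are listed in ascending, non-overlapping order. The variable and function names are hypothetical; the coordinates are copied from the table above.

```python
# Exon coordinates copied from the MANE Select exon table (GRCh38, chr3, + strand).
EXONS = [
    (36993518, 36993663), (36996619, 36996709), (37000955, 37001053),
    (37004401, 37004474), (37006991, 37007063), (37008814, 37008905),
    (37011820, 37011862), (37012011, 37012099), (37014432, 37014544),
    (37017506, 37017599), (37020310, 37020463), (37025637, 37026007),
    (37028784, 37028932), (37040186, 37040294), (37042268, 37042331),
    (37047519, 37047683), (37048517, 37048609), (37048904, 37049017),
    (37050486, 37050846),
]


def exon_length(start, end):
    """Length of a 1-based, inclusive genomic interval."""
    return end - start + 1


lengths = [exon_length(start, end) for start, end in EXONS]
transcript_length = sum(lengths)

# On the + strand each exon should end before the next one starts.
ordered = all(prev_end < next_start
              for (_, prev_end), (next_start, _) in zip(EXONS, EXONS[1:]))

print("exon count:", len(EXONS))
print("spliced length (bp):", transcript_length)
print("ascending and non-overlapping:", ordered)
```

The summed length covers the exons as listed, including untranslated regions, so it is longer than the coding sequence alone.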
Protein identifiers
UniProt Accessions
- P40692 (Reviewed - canonical entry) | DNA mismatch repair protein Mlh1 | 756 aa, 84.6 kDa
RefSeq Protein (NP_) Accessions
- NP_000240 (MANE Select)
- NP_001161089
- NP_001161090
- NP_001161091
- NP_001245200
- NP_001245202
- NP_001245203
- NP_001341544
- NP_001341545
- NP_001341546
- NP_001341547
- NP_001341548
- NP_001341549
Protein Domains and Families
InterPro entries (8):
| ID | Name | Type |
|---|---|---|
| IPR002099 | DNA mismatch repair protein MutL/Mlh/PMS | Family |
| IPR013507 | DNA mismatch repair protein, S5 domain 2-like | Domain |
| IPR014721 | Small ribosomal subunit protein uS5 domain 2-type fold, subgroup | Homologous_superfamily |
| IPR014762 | DNA mismatch repair, conserved site | Conserved_site |
| IPR020568 | Ribosomal protein uS5 domain 2-type superfamily | Homologous_superfamily |
| IPR032189 | DNA mismatch repair protein Mlh1, C-terminal | Domain |
| IPR036890 | Histidine kinase/HSP90-like ATPase superfamily | Homologous_superfamily |
| IPR038973 | DNA mismatch repair protein MutL/Mlh/Pms-like | Family |
Pfam entries (3):
- PF01119
- PF13589
- PF16413
Antibody Resources
No antibody resources were found in the biobtree database.
Structure
Experimental Structures (PDB)
Human MLH1 has 2 crystal structures:
| PDB ID | Method | Resolution (Å) |
|---|---|---|
| 3RBN | X-ray Diffraction | 2.16 |
| 4P7A | X-ray Diffraction | 2.30 |
Additional structures contain MLH1 NLS peptide fragments in complex with mouse importin-alpha: 5U5P (2.171 Å), 6WBA (2.151 Å), 6WBB (2.663 Å), 6WBC (2.15 Å), 7M60 (2.30 Å).
Total human MLH1 structures: 2
Predicted Structures (AlphaFold)
| Model ID | Global pLDDT | pLDDT > 90 (%) |
|---|---|---|
| AF-P40692-F1 | 77.89 | 51 |
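To interpret the global pLDDT value, the sketch below maps a score to the confidence bands used by the AlphaFold Protein Structure Database (above 90, 70 to 90, 50 to 70, below 50). The band labels and the function name `plddt_band` are assumptions for illustration; a global mean also hides per-residue variation, so this is only a coarse summary.

```python
def plddt_band(score: float) -> str:
    """Map a pLDDT score (0-100) to a coarse confidence band."""
    if not 0.0 <= score <= 100.0:
        raise ValueError("pLDDT must be between 0 and 100")
    if score > 90:
        return "very high"
    if score > 70:
        return "high"
    if score > 50:
        return "low"
    return "very low"


# Global pLDDT for AF-P40692-F1, copied from the table above.
global_plddt = 77.89
print(plddt_band(global_plddt))
```

Under these bands, the global score of 77.89 falls in the "high" band, while the table also reports that 51% of residues exceed 90.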
Cross-species orthologs
| Organism | Gene ID | Symbol |
|---|---|---|
| Mouse (Mus musculus) | ENSMUSG00000032498 | Mlh1 |
| Rat (Rattus norvegicus) | ENSRNOG00000033809 | Mlh1 |
| Zebrafish (Danio rerio) | ENSDARG00000025948 | mlh1 |
| Fruit fly (Drosophila melanogaster) | FBgn0011659 | Mlh1 |
| Worm (C. elegans) | WBGene00003373 | mlh-1 |
| Yeast (S. cerevisiae) | YMR167W | MLH1 |
Clinical variants & AI predictions
Clinical Variants (ClinVar)
Variant counts by classification:
| Classification | Count |
|---|---|
| Uncertain significance | ~4,600 |
| Pathogenic | ~900 |
| Likely pathogenic | ~300 |
| Benign/Likely benign | ~200 |
| Conflicting classifications | ~260 |
| Total variants | ~6,260 |
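The arithmetic behind the classification breakdown can be checked with a small Python sketch. The counts are the approximate values from the table above; the dictionary name `CLINVAR_COUNTS` is hypothetical.

```python
# Approximate counts copied from the classification table above.
CLINVAR_COUNTS = {
    "Uncertain significance": 4600,
    "Pathogenic": 900,
    "Likely pathogenic": 300,
    "Benign/Likely benign": 200,
    "Conflicting classifications": 260,
}

total = sum(CLINVAR_COUNTS.values())
p_lp = CLINVAR_COUNTS["Pathogenic"] + CLINVAR_COUNTS["Likely pathogenic"]
p_lp_share = p_lp / total

# Print each class with its share of the total, largest first.
for label, count in sorted(CLINVAR_COUNTS.items(),
                           key=lambda item: item[1], reverse=True):
    print(f"{label:30s} {count:5d} {count / total:6.1%}")

print(f"Pathogenic + Likely pathogenic: {p_lp} of {total} ({p_lp_share:.1%})")
```

The categories sum to the stated total of ~6,260, and pathogenic plus likely pathogenic variants make up about 19% of entries.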
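Top 30 ClinVar variants (the first 23 rows are Pathogenic or Likely pathogenic; the last 7 rows carry uncertain, mixed, or conflicting classifications):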
| Variant ID | HGVS | Type | Classification | Associated Condition |
|---|---|---|---|---|
| 1012206 | c.1003del (p.Leu335fs) | Deletion | Pathogenic | Lynch Syndrome 1 |
| 1048862 | c.497T>G (p.Leu166Ter) | SNV | Pathogenic | Colorectal cancer, hereditary nonpolyposis |
| 1049302 | c.1275dup (p.Gln426fs) | Duplication | Pathogenic | Colorectal cancer, hereditary nonpolyposis |
| 1049248 | c.1039-2_1409+1del | Deletion | Pathogenic | Hereditary cancer syndrome |
| 1049670 | c.1896+1G>C | SNV | Pathogenic | Lynch syndrome |
| 1049673 | c.1367del (p.Thr455_Ser456insTer) | Deletion | Pathogenic | Colorectal cancer, hereditary nonpolyposis |
| 1049396 | c.1999del (p.Asp667fs) | Deletion | Pathogenic | Lynch syndrome |
| 1049508 | c.631_632del (p.Ser211fs) | Deletion | Pathogenic | Hereditary neoplastic syndrome |
| 1049646 | c.546-2_589-59del | Deletion | Pathogenic | Hereditary cancer-predisposing syndrome |
| 1050017 | c.365del (p.Gly122fs) | Deletion | Pathogenic | Lynch syndrome |
| 1050185 | c.1039-2_1409+150del | Deletion | Pathogenic | Hereditary cancer syndrome |
| 1050489 | c.1770_1772delinsC (p.Leu590fs) | Indel | Pathogenic | Tumor predisposition |
| 1050518 | c.1732-2_1896+1del | Deletion | Pathogenic | Lynch syndrome |
| 1050645 | c.1559-2_1667+1del | Deletion | Pathogenic | Colorectal cancer, hereditary nonpolyposis |
| 1050662 | c.1559-4_1667+63del | Deletion | Pathogenic | Hereditary cancer-predisposing syndrome |
| 1050751 | c.995_996insA (p.Ser332fs) | Insertion | Pathogenic | Lynch syndrome |
| 1023950 | c.1681_1686del (p.Tyr561_Gln562del) | Deletion | Pathogenic | Colorectal cancer |
| 1048888 | c.2131dup (p.Ser711fs) | Duplication | Pathogenic | Hereditary neoplastic syndrome |
| 1048910 | c.131_132delinsTT (p.Ser44Phe) | Indel | Pathogenic | Lynch syndrome |
| 1049107 | c.588+1G>C | SNV | Likely pathogenic | Hereditary cancer syndrome |
| 1048973 | c.2258_2259dup (p.Glu754fs) | Duplication | Likely pathogenic | Tumor predisposition |
| 1006220 | c.193G>A (p.Gly65Ser) | SNV | Likely pathogenic | Lynch syndrome |
| 1049886 | c.-2_116+1del | Deletion | Pathogenic | Colorectal cancer, hereditary nonpolyposis |
| 1050070 | c.-11C>A | SNV | Uncertain/Likely pathogenic | Hereditary cancer syndrome |
| 1049099 | c.2031del (p.Ser677fs) | Deletion | Uncertain/Likely pathogenic | Lynch syndrome |
| 1009544 | c.657_662del (p.Phe220_Gly221del) | Deletion | Uncertain significance | Colorectal cancer |
| 1034510 | c.569_571del (p.Ile190del) | Deletion | Uncertain significance | Hereditary cancer syndrome |
| 1001928 | c.380+6G>T | SNV | Conflicting classifications | Lynch syndrome |
| 1010470 | c.117-3C>T | SNV | Conflicting classifications | Colorectal cancer |
| 1015357 | c.117-6T>A | SNV | Conflicting classifications | Hereditary cancer-predisposing syndrome |
AI-based Predictions
SpliceAI Predictions: 2,888 total variants
Top predictions by effect score:
- Donor gain: c.1896+1G>C (0.99), c.614G>T (0.99), c.3633G>T (0.99)
- Donor loss: c.1660+5_1667del (0.91), c.1410-2_1558+1del (0.91), c.588+15del (0.91)
AlphaMissense (Likely Pathogenic): approximately 100 variants (13 listed below, 86 summarized in the last row)
| Variant | Position | Score | Classification |
|---|---|---|---|
| L11P | 3:36993579 | 0.999 | Likely pathogenic |
| I8N | 3:36993570 | 0.998 | Likely pathogenic |
| R9P | 3:36993573 | 0.978 | Likely pathogenic |
| R10W | 3:36993575 | 0.871 | Likely pathogenic |
| V15E | 3:36993591 | 0.993 | Likely pathogenic |
| V16E | 3:36993594 | 0.992 | Likely pathogenic |
| I19N | 3:36993603 | 1.000 | Likely pathogenic |
| A20P | 3:36993605 | 0.998 | Likely pathogenic |
| A21P | 3:36993608 | 0.998 | Likely pathogenic |
| G22R | 3:36993611 | 0.998 | Likely pathogenic |
| E23K | 3:36993614 | 0.996 | Likely pathogenic |
| V24D | 3:36993618 | 0.999 | Likely pathogenic |
| I25N | 3:36993621 | 0.999 | Likely pathogenic |
| [+86 additional variants] | — | 0.615–0.999 | Likely pathogenic |
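The classification column can be reproduced from the scores with a minimal Python sketch. It assumes the commonly cited AlphaMissense cutoffs (likely benign below 0.34, likely pathogenic above 0.564, ambiguous in between); these thresholds and the helper names are assumptions that do not come from this report.

```python
import re

# Assumed AlphaMissense class cutoffs; not stated in the report itself.
LIKELY_BENIGN_MAX = 0.34
LIKELY_PATHOGENIC_MIN = 0.564

# Variant labels and scores copied from the table above.
SCORES = {
    "L11P": 0.999, "I8N": 0.998, "R9P": 0.978, "R10W": 0.871,
    "V15E": 0.993, "V16E": 0.992, "I19N": 1.000, "A20P": 0.998,
    "A21P": 0.998, "G22R": 0.998, "E23K": 0.996, "V24D": 0.999,
    "I25N": 0.999,
}


def classify(score):
    """Assign an AlphaMissense-style class to a pathogenicity score."""
    if score > LIKELY_PATHOGENIC_MIN:
        return "likely_pathogenic"
    if score < LIKELY_BENIGN_MAX:
        return "likely_benign"
    return "ambiguous"


def parse_variant(label):
    """Split a label such as 'L11P' into (reference, position, alternate)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"unrecognized variant label: {label}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt


for label, score in sorted(SCORES.items(), key=lambda item: parse_variant(item[0])[1]):
    print(label, parse_variant(label)[1], score, classify(score))
```

All 13 listed variants fall in the first 25 residues and exceed the assumed likely-pathogenic cutoff, as does the 0.615 lower bound quoted for the remaining variants.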
Pathways & Gene Ontology
Biological Pathways
Reactome Pathways (6 total)
| ID | Pathway Name |
|---|---|
| R-HSA-5358565 | Mismatch repair (MMR) directed by MSH2:MSH6 (MutSalpha) |
| R-HSA-5358606 | Mismatch repair (MMR) directed by MSH2:MSH3 (MutSbeta) |
| R-HSA-5545483 | Defective Mismatch Repair Associated With MLH1 |
| R-HSA-5632987 | Defective Mismatch Repair Associated With PMS2 |
| R-HSA-6796648 | TP53 Regulates Transcription of DNA Repair Genes |
| R-HSA-912446 | Meiotic recombination |
MSigDB Gene Sets (100 total)
MLH1 is annotated in 100 MSigDB gene sets, including Reactome pathway collections, KEGG pathways, Gene Ontology-based sets, and curated gene signatures. Key pathway-related entries include:
- REACTOME_MISMATCH_REPAIR, REACTOME_DNA_REPAIR, REACTOME_DISEASES_OF_MISMATCH_REPAIR_MMR, REACTOME_REPRODUCTION, REACTOME_TRANSCRIPTIONAL_REGULATION_BY_TP53
- KEGG_MISMATCH_REPAIR, KEGG_PATHWAYS_IN_CANCER, KEGG_COLORECTAL_CANCER, KEGG_ENDOMETRIAL_CANCER
Total pathway membership: 6 Reactome + 100 MSigDB gene sets
Gene Ontology Annotations
Biological Process (19 terms)
| GO ID | Term |
|---|---|
| GO:0000289 | Nuclear-transcribed mRNA poly(A) tail shortening |
| GO:0000712 | Resolution of meiotic recombination intermediates |
| GO:0006298 | Mismatch repair |
| GO:0006303 | Double-strand break repair via nonhomologous end joining |
| GO:0007060 | Male meiosis chromosome segregation |
| GO:0007129 | Homologous chromosome pairing at meiosis |
| GO:0007283 | Spermatogenesis |
| GO:0008630 | Intrinsic apoptotic signaling pathway in response to DNA damage |
| GO:0009617 | Response to bacterium |
| GO:0016321 | Female meiosis chromosome segregation |
| GO:0016446 | Somatic hypermutation of immunoglobulin genes |
| GO:0043060 | Meiotic metaphase I homologous chromosome alignment |
| GO:0045141 | Meiotic telomere clustering |
| GO:0045190 | Isotype switching |
| GO:0045950 | Negative regulation of mitotic recombination |
| GO:0048298 | Positive regulation of isotype switching to IgA isotypes |
| GO:0048304 | Positive regulation of isotype switching to IgG isotypes |
| GO:0048477 | Oogenesis |
| GO:0051257 | Meiotic spindle midzone assembly |
Molecular Function (7 terms)
| GO ID | Term |
|---|---|
| GO:0003682 | Chromatin binding |
| GO:0005524 | ATP binding |
| GO:0008047 | Enzyme activator activity |
| GO:0016887 | ATP hydrolysis activity |
| GO:0019899 | Enzyme binding |
| GO:0032137 | Guanine/thymine mispair binding |
| GO:0140664 | ATP-dependent DNA damage sensor activity |
Cellular Component (9 terms)
| GO ID | Term |
|---|---|
| GO:0000795 | Synaptonemal complex |
| GO:0001673 | Male germ cell nucleus |
| GO:0005634 | Nucleus |
| GO:0005654 | Nucleoplasm |
| GO:0005694 | Chromosome |
| GO:0005712 | Chiasma |
| GO:0005715 | Late recombination nodule |
| GO:0016020 | Membrane |
| GO:0032389 | MutLalpha complex |
Protein interactions & networks
Protein-Protein Interactions (PPIs)
Total interaction counts:
- STRING: 3,428 interactions
- IntAct: 514 interactions
- BioGRID: 491 interactions
- DIP: 1 interaction
TOP 30 highest-confidence STRING interactors with scores (0-1000 scale):
| Rank | Gene | UniProt | Score | Evidence |
|---|---|---|---|---|
| 1 | MSH3 | P20585 | 999 | Mismatch repair complex |
| 2 | MSH2 | P43246 | 999 | Mismatch repair complex |
| 3 | MSH6 | P52701 | 999 | Mismatch repair complex |
| 4 | EXO1 | Q9UQ84 | 998 | DNA repair pathway |
| 5 | MLH3 | P49751 | 993 | Mismatch repair |
| 6 | PMS1 | P54277 | 993 | Mismatch repair complex |
| 7 | PMS2 | P54278 | 993 | Mismatch repair complex |
| 8 | ATM | Q13315 | 992 | DNA damage response |
| 9 | BRIP1 | Q9BX63 | 951 | Fanconi anemia pathway |
| 10 | BRCA1 | P38398 | 930 | DNA repair |
| 11 | MBD4 | O95243 | 921 | Base excision repair |
| 12 | NEIL1 | Q9UIF7 | 914 | Base excision repair |
| 13 | BRCA2 | P51587 | 905 | DNA repair |
| 14 | BRAF | P15056 | 876 | Signaling pathway |
| 15 | FANCD2 | Q9BXW9 | 874 | Fanconi anemia pathway |
| 16 | KRAS | P01116 | 868 | Signaling |
| 17 | TP53 | P04637 | 868 | Tumor suppressor |
| 18 | MSH4 | O15457 | 856 | Mismatch repair |
| 19 | CDKN2A | P42771 | 856 | Cell cycle control |
| 20 | MGMT | P16455 | 853 | DNA repair |
| 21 | POLE | Q07864 | 852 | DNA polymerase |
| 22 | FAN1 | Q9Y2M0 | 844 | Fanconi anemia pathway |
| 23 | RAD51D | O75771 | 838 | Homologous recombination |
| 24 | EPCAM | P16422 | 837 | Cell adhesion |
| 25 | RFC4 | P35249 | 826 | Replication factor |
| 26 | RFC2 | P32846 | 825 | Replication factor |
| 27 | MSH5 | O43196 | 824 | Mismatch repair |
| 28 | CHEK2 | O96017 | 824 | DNA damage checkpoint |
| 29 | PTEN | P60484 | 811 | Phosphatase/tumor suppressor |
| 30 | BLM | P54132 | 808 | DNA helicase |
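STRING reports combined scores on a 0–1000 scale, which correspond to confidence values between 0 and 1. The sketch below, using the first 15 rows of the table above, normalizes the scores and keeps partners at or above 0.9, the cutoff STRING labels "highest confidence". The variable names are hypothetical, and the cutoff is an assumption about how a reader might filter the list.

```python
# First 15 rows of the STRING table above: (gene symbol, combined score).
TOP_INTERACTORS = [
    ("MSH3", 999), ("MSH2", 999), ("MSH6", 999), ("EXO1", 998),
    ("MLH3", 993), ("PMS1", 993), ("PMS2", 993), ("ATM", 992),
    ("BRIP1", 951), ("BRCA1", 930), ("MBD4", 921), ("NEIL1", 914),
    ("BRCA2", 905), ("BRAF", 876), ("FANCD2", 874),
]

# Assumed cutoff: STRING's "highest confidence" level.
HIGHEST_CONFIDENCE = 0.9


def normalize(score):
    """Convert a 0-1000 STRING combined score to the 0-1 scale."""
    return score / 1000


highest = [gene for gene, score in TOP_INTERACTORS
           if normalize(score) >= HIGHEST_CONFIDENCE]

print(len(highest), "partners at highest confidence:", ", ".join(highest))
```

A combined score reflects the strength of evidence for a functional association and does not by itself indicate direct physical binding.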
Top IntAct interactions with confidence scores:
- PMS2: 0.970 (direct interaction)
- MYOG: 0.890
- RADX: 0.870
- ZC3H11A: 0.870
- KPNA2: 0.830
- PMS1: 0.830
- AP2B1: 0.790
- CBY2: 0.790-0.800
- MAGEA8: 0.780
- MLH3: 0.780
- MSH3: 0.740
- TMSB4X: 0.730
- TASOR2: 0.720
- CCDC33: 0.670
- SKP2: 0.560
- HSPA8: 0.500
Protein Similarity
TOP 20 proteins by ESM2 embedding similarity (0–1.0 scale, max similarity = 1.0):
| Rank | UniProt | Top Similarity | Avg Similarity | Organism |
|---|---|---|---|---|
| 1 | P43246 | 0.9999 | 0.9384 | Human MSH2 |
| 2 | Q13614 | 0.9995 | 0.9504 | Human |
| 3 | Q5XXB5 | 0.9999 | 0.9384 | Ortholog |
| 4 | B2KI88 | 0.9988 | 0.9520 | Cross-species |
| 5 | Q02440 | 0.9990 | 0.9382 | Cross-species |
| 6 | P24860 | 0.9986 | 0.9370 | Cross-species |
| 7 | P37882 | 0.9986 | 0.9359 | Cross-species |
| 8 | O95292 | 0.9986 | 0.9447 | Human |
| 9 | P30277 | 0.9984 | 0.9367 | Cross-species |
| 10 | Q1LZG6 | 0.9984 | 0.9371 | Cross-species |
| 11 | O43502 | 0.9976 | 0.9410 | Human RAD51C |
| 12 | A6QNT8 | 0.9978 | 0.9246 | Cross-species |
| 13 | O95486 | 0.9978 | 0.9213 | Human SEC24A |
| 14 | Q9S9N4 | 0.9977 | 0.9340 | Cross-species |
| 15 | P14635 | 0.9985 | 0.9305 | Human CCNB1 |
| 16 | E1C6Q1 | 0.9871 | 0.9454 | Cross-species |
| 17 | Q3U2P1 | 0.9966 | 0.9268 | Cross-species |
| 18 | Q5ZIV1 | 0.9920 | 0.9386 | Cross-species |
| 19 | F4JL28 | 0.9302 | 0.9217 | Cross-species |
| 20 | Q3MHE4 | 0.9987 | 0.9413 | Cross-species |
TOP 15 sequence homologs (DIAMOND, % identity / bit score):
| Rank | UniProt | Identity (%) | Bit Score | Description |
|---|---|---|---|---|
| 1 | B5RL36 | 99.30 | 1193.0 | MLH1 ortholog |
| 2 | B5RR29 | 99.30 | 1191.0 | MLH1 ortholog |
| 3 | P97679 | 91.30 | 1349.0 | Rat Mlh1 |
| 4 | P54278 | 97.40 | 1274.0 | PMS2 (human) |
| 5 | Q9JK91 | 91.30 | 1351.0 | Mouse Mlh1 |
| 6 | P40692 | 88.40 | 1305.0 | Human MLH1 (self) |
| 7 | O51229 | 92.30 | 1112.0 | Cross-species |
| 8 | Q0SNV1 | 92.50 | 1112.0 | Cross-species |
| 9 | Q662F3 | 92.50 | 1115.0 | Cross-species |
| 10 | A1QZ05 | 85.60 | 1051.0 | Cross-species |
| 11 | P54280 | 50.30 | 421.0 | PMS1 homolog |
| 12 | Q54KD8 | 40.20 | 569.0 | MLH1 homolog |
| 13 | P38920 | 39.70 | 467.0 | Yeast Mlh1 (S. cerevisiae) |
| 14 | Q9ZRV4 | 39.20 | 510.0 | MLH1 homolog |
| 15 | Q9P7W6 | 37.30 | 438.0 | Distant homolog |
Transcription factor regulatory data
MLH1 is not a transcription factor. MLH1 encodes the DNA mismatch repair protein MLH1, a core component of the mismatch repair pathway involved in maintaining genomic stability.
Upstream regulators (TFs that regulate MLH1)
19 transcription factors regulate MLH1 (from CollecTRI database):
| Regulator | Regulation type | Evidence/Confidence |
|---|---|---|
| TP53 | Activation | CollecTRI |
| BHLHE41 | Repression | High confidence |
| CEBPZ | Repression | High confidence |
| DNMT3A | Repression | CollecTRI |
| GLI1 | Unknown | High confidence |
| GLI2 | Unknown | High confidence |
| BHLHE40 | Unknown | CollecTRI |
| BRIP1 | Unknown | CollecTRI |
| E2F4 | Unknown | CollecTRI |
| HIF1A | Unknown | CollecTRI |
| MAFG | Unknown | CollecTRI |
| WT1 | Repression | Low confidence |
| HOXA5 | Unknown | High confidence |
| MLXIP | Unknown | High confidence |
| CTNNBL1 | Unknown | Low confidence |
| DNMT1 | Unknown | Low confidence |
| ESR2 | Unknown | Low confidence |
| HOXD1 | Unknown | Low confidence |
| TP73 | Unknown | Low confidence |
Note: TP53 (p53) is the only regulator annotated as an activator of MLH1, linking MLH1 expression to a major tumor suppressor pathway. The DNA methyltransferase DNMT3A is annotated as a repressor, and DNMT1 is also listed (direction unknown), which may be relevant to epigenetic regulation of mismatch repair capacity.
Drug & pharmacology data
MLH1 is NOT currently a known drug target.
There are:
- Zero ChEMBL molecules targeting MLH1 as a direct target
- Zero clinical trials with drugs specifically targeting MLH1
- No approved or investigational drugs with MLH1 as a mechanism of action
Pharmacogenomic status:
- MLH1 is a VIP (Very Important Pharmacogene) in PharmGKB (PA240)
- Has variant annotations related to genetic predisposition (Lynch syndrome/HNPCC)
- No CPIC dosing guidelines exist for MLH1-guided therapy
- Clinical significance: MLH1 mutations are associated with hereditary nonpolyposis colorectal cancer (HNPCC), but this role is for genetic risk stratification, not as a therapeutic target
Summary: MLH1 is recognized as an important gene for understanding cancer predisposition and genetic disease risk, but it is not targeted by small molecule drugs, biologics, or other therapeutics currently in development. Its clinical relevance is primarily in germline variant screening for Lynch syndrome rather than as a pharmacological target.
Expression profiles
Tissue Expression (Bgee - 296/300 conditions)
| Rank | Tissue | Expression Score | Quality |
|---|---|---|---|
| 1 | Tibialis anterior | 94.44 | Gold |
| 2 | Skeletal muscle tissue of rectus abdominis | 94.42 | Gold |
| 3 | Deltoid | 94.37 | Gold |
| 4 | Left ventricle myocardium | 94.30 | Gold |
| 5 | Heart left ventricle | 93.97 | Gold |
| 6 | Cardiac ventricle | 93.90 | Gold |
| 7 | Primordial germ cell in gonad | 93.85 | Gold |
| 8 | Vastus lateralis | 93.84 | Gold |
| 9 | Quadriceps femoris | 93.78 | Gold |
| 10 | Apex of heart | 93.60 | Gold |
| 11 | Skeletal muscle tissue | 93.35 | Gold |
| 12 | Heart right ventricle | 93.35 | Gold |
| 13 | Ganglionic eminence | 93.22 | Gold |
| 14 | Muscle organ | 92.99 | Gold |
| 15 | Skeletal muscle organ | 92.99 | Gold |
| 16 | Biceps brachii | 92.97 | Gold |
| 17 | Muscle tissue | 92.79 | Gold |
| 18 | Heart | 92.78 | Gold |
| 19 | Muscle of leg | 92.75 | Gold |
| 20 | Ventricular zone | 92.73 | Gold |
| 21 | Tibia | 92.69 | Gold |
| 22 | Calcaneal tendon | 92.62 | Gold |
| 23 | Gastrocnemius | 92.53 | Gold |
| 24 | Diaphragm | 92.49 | Gold |
| 25 | Pituitary gland | 92.48 | Gold |
| 26 | Triceps brachii | 92.45 | Gold |
| 27 | Adenohypophysis | 92.41 | Gold |
| 28 | Skin of hip | 92.34 | Gold |
| 29 | Right atrium auricular region | 92.32 | Gold |
| 30 | Corpus callosum | 92.23 | Gold |
Pattern notes:
- Ubiquitous expression with mean score 87.57 across all 300 tissues
- Skeletal muscle enrichment: Top 3 tissues are skeletal muscle groups (tibialis anterior, rectus abdominis, deltoid)
- Cardiac expression: Heart tissues rank highly (scores 92.32–94.30 among the top 30); because adult cardiac muscle is largely post-mitotic, these scores are unlikely to reflect replication-coupled mismatch repair alone
- Developmental tissues: Primordial germ cells (#7, 93.85) and ganglionic eminence (#13, 93.22) show strong expression, consistent with high mitotic activity
Single-Cell Expression (SCXA - Single Cell Expression Atlas)
Datasets:
- Primary dataset: E-MTAB-2983 – “Functional germ line stem cells do not exist in adult mammalian ovaries” (38 cells)
- Reflects the germ cell expression (primordial germ cell in gonad) seen in the tissue data
- Total experiments: 3 datasets with marker status
- Cell clusters analyzed: 238 cell populations
- Expression range: Mean expression 0.009–381.53 (maximum in specific cell clusters)
Notable populations: Expression is marked in germ line/gonadal tissues and tissues undergoing high proliferation, consistent with MLH1’s role in DNA mismatch repair and meiosis (required for proper meiotic recombination).
Cell Type Expression
The Bgee annotation references 16 distinct cell type categories (Cell Ontology). While detailed single-cell cluster annotations are limited in biobtree, the top tissues above correspond to:
- Myocytes (skeletal and cardiac muscle cells) – dominant
- Germ line cells (oocytes, spermatogonia) – specifically marked
- Neural progenitor cells (ventricular zone, ganglionic eminence)
Expression in germ line and neural progenitor cells fits MLH1's roles in replication-coupled mismatch repair and meiosis, while the high scores in largely post-mitotic muscle indicate that expression is not restricted to dividing cells.
Disease associations
Mendelian / Monogenic Diseases
| Disease | Disease ID | Inheritance | Evidence Level |
|---|---|---|---|
| Lynch syndrome | OMIM:120435, OMIM:609310, MONDO:0005835, Orphanet:144 | Autosomal dominant | Definitive/Strong |
| Lynch syndrome 1 | OMIM:120435, MONDO:0007356 | Autosomal dominant | Strong/Definitive |
| Lynch syndrome 2 | OMIM:609310, MONDO:0012249 | Autosomal dominant | Definitive/Strong |
| Muir-Torre syndrome | OMIM:158320, MONDO:0008018, Orphanet:587 | Autosomal dominant | Definitive/Strong |
| Mismatch repair cancer syndrome 1 | OMIM:276300, MONDO:0010159, Orphanet:252202 | Autosomal recessive | Definitive/Strong |
| Constitutional mismatch repair deficiency syndrome | Orphanet:252202 | Autosomal recessive | Supportive |
| Ovarian cancer | MONDO:0008170 | Autosomal dominant | Strong |
| Pancreatic cancer | MONDO:0009831 | Autosomal dominant | Moderate |
| Breast cancer | MONDO:0007254 | Autosomal dominant | Disputed/Strong |
| Prostate cancer | MONDO:0008315 | Autosomal dominant | Limited |
| Rhabdomyosarcoma | MONDO:0005212 | Autosomal recessive | Moderate |
| Colorectal cancer (hereditary nonpolyposis) | MONDO:0018630 | Autosomal dominant | Definitive |
| Endometrial carcinoma | MONDO:0002447 | Autosomal dominant | Definitive |
Additional associated malignancies via ClinVar: colon carcinoma, gastric cancer, lung cancer, bile duct cancer, squamous cell carcinoma
Clinical Phenotypes (HPO Terms) - Top 30
| HPO Term | HPO ID |
|---|---|
| Autosomal dominant inheritance | HP:0000006 |
| Autosomal recessive inheritance | HP:0000007 |
| Breast carcinoma | HP:0003002 |
| Colon cancer | HP:0003003 |
| Endometrial carcinoma | HP:0012114 |
| Ovarian neoplasm | HP:0100615 |
| Neoplasm of the stomach | HP:0006753 |
| Neoplasm of the pancreas | HP:0002894 |
| Rhabdomyosarcoma | HP:0002859 |
| Adenocarcinoma of the colon | HP:0040276 |
| Neoplasm of the rectum | HP:0100743 |
| Neoplasm of the liver | HP:0002896 |
| Hepatocellular carcinoma | HP:0001402 |
| Lymphoma | HP:0002665 |
| Non-Hodgkin lymphoma | HP:0012539 |
| Basal cell carcinoma | HP:0002671 |
| Adenoma sebaceum | HP:0009720 |
| Sebaceous gland carcinoma | HP:0030410 |
| Neoplasm of the thyroid gland | HP:0100031 |
| Laryngeal carcinoma | HP:0012118 |
| Salivary gland neoplasm | HP:0100684 |
| Renal neoplasm | HP:0009726 |
| Urinary tract neoplasm | HP:0010786 |
| Hematological neoplasm | HP:0004377 |
| Medulloblastoma | HP:0002885 |
| Glioblastoma multiforme | HP:0012174 |
| Astrocytoma | HP:0009592 |
| Neoplasm of the skeletal system | HP:0010622 |
| Adenocarcinoma of the small intestine | HP:0040274 |
| Intestinal polyposis | HP:0200008 |
Complex-Disease / GWAS Associations - Top 7
| Trait/Disease | Mapped gene | Chromosome | P-value |
|---|---|---|---|
| Proximal colorectal cancer | MLH1 | chr3 | 4.0e-18 |
| Platelet distribution width | MLH1 | chr3 | 2.0e-18 |
| Liver enzyme levels (alanine transaminase) | MLH1 | chr3 | 4.0e-15 |
| Alanine aminotransferase levels | MLH1 | chr3 | 2.0e-08 |
| Schizophrenia | TRANK1 | chr3 | 3.0e-11 |
| Autism spectrum disorder or schizophrenia | HSPD1P6 - LINC02033 | chr3 | 1.0e-11 |
| Subjective response to lithium treatment | HSPD1P6 - LINC02033 | chr3 | 8.0e-07 |
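To see which of these associations meet the conventional genome-wide significance threshold of 5e-8, a minimal Python sketch can filter the table by p-value. The threshold is a widely used convention rather than a value taken from this report, and the variable names are hypothetical.

```python
# Conventional genome-wide significance threshold (assumed; not from the report).
GENOME_WIDE_SIGNIFICANCE = 5e-8

# Rows copied from the GWAS table above: (trait, mapped gene, p-value).
ASSOCIATIONS = [
    ("Proximal colorectal cancer", "MLH1", 4.0e-18),
    ("Platelet distribution width", "MLH1", 2.0e-18),
    ("Liver enzyme levels (alanine transaminase)", "MLH1", 4.0e-15),
    ("Alanine aminotransferase levels", "MLH1", 2.0e-08),
    ("Schizophrenia", "TRANK1", 3.0e-11),
    ("Autism spectrum disorder or schizophrenia", "HSPD1P6 - LINC02033", 1.0e-11),
    ("Subjective response to lithium treatment", "HSPD1P6 - LINC02033", 8.0e-07),
]

significant = [row for row in ASSOCIATIONS if row[2] < GENOME_WIDE_SIGNIFICANCE]
mapped_to_mlh1 = [row for row in significant if row[1] == "MLH1"]

for trait, gene, p_value in sorted(significant, key=lambda row: row[2]):
    print(f"{p_value:.1e}  {gene:22s} {trait}")

print(len(significant), "of", len(ASSOCIATIONS), "pass;",
      len(mapped_to_mlh1), "are mapped to MLH1")
```

Under this threshold, six of the seven rows pass and four of those are mapped to MLH1; the lithium response association (8.0e-07) does not reach genome-wide significance.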