Section 1: Gene Identifiers
| Database | Identifier | Description |
|---|---|---|
| HGNC ID | HGNC:7127 | HGNC gene ID |
| HGNC Symbol | MLH1 | Approved symbol (mutL homolog 1) |
| Ensembl Gene | ENSG00000076242 | Ensembl gene ID |
| NCBI Entrez Gene | 4292 | NCBI Gene ID |
| OMIM Gene | 120436 | Gene/Locus entry |
| Status | Approved | protein-coding gene |
Aliases: HNPCC, FCC2, HNPCC2, MLH-1, COCA2
Gene Groups: MutL homologs, BRCA1-associated genome surveillance complex
Genomic Location (GRCh38)
| Attribute | Value |
|---|---|
| Chromosome | 3 |
| Cytogenetic Band | 3p22.2 |
| Start Position | 36,993,226 |
| End Position | 37,050,896 |
| Strand | + (forward) |
| Gene Span | 57,671 bp |
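The gene span is consistent with 1-based, inclusive coordinates, where length = end - start + 1. A minimal Python sketch of that arithmetic follows; the function name `span_bp` is illustrative and not part of any database API.

```python
def span_bp(start: int, end: int) -> int:
    """Length in bp of a 1-based, inclusive genomic interval."""
    if end < start:
        raise ValueError("end must be >= start")
    return end - start + 1


# MLH1 gene coordinates on chromosome 3 (GRCh38), copied from the table above
MLH1_START = 36_993_226
MLH1_END = 37_050_896

print(span_bp(MLH1_START, MLH1_END))  # 57671
```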
Section 2: Transcript Identifiers
Ensembl Transcripts
Total Transcript Count: 44
| Transcript ID | Biotype | Start | End | Status |
|---|---|---|---|---|
| ENST00000231790 | protein_coding | 36,993,518 | 37,050,846 | Canonical |
| ENST00000413740 | protein_coding | 36,993,357 | 37,050,846 | |
| ENST00000429117 | protein_coding | 36,993,826 | 37,050,846 | |
| ENST00000435176 | protein_coding | 36,993,804 | 37,050,844 | |
| ENST00000441265 | protein_coding | 36,993,826 | 37,047,737 | |
| ENST00000450420 | protein_coding | 36,993,518 | 37,050,537 | |
| ENST00000455445 | protein_coding | 36,993,798 | 37,050,706 | |
| ENST00000456676 | protein_coding | 36,993,518 | 37,050,844 | |
| ENST00000458205 | protein_coding | 36,993,776 | 37,050,846 | |
| ENST00000466900 | protein_coding | 36,993,848 | 37,050,844 | |
| ENST00000485889 | protein_coding | 36,993,827 | 37,050,846 | |
| ENST00000492474 | protein_coding | 36,993,791 | 37,050,896 | |
| ENST00000536378 | protein_coding | 36,993,350 | 37,050,842 | |
| ENST00000539477 | protein_coding | 36,993,804 | 37,050,783 | |
| ENST00000616768 | protein_coding | 36,993,472 | 37,050,846 | |
| ENST00000673673 | protein_coding | 36,993,226 | 37,050,799 | |
| ENST00000673715 | protein_coding | 36,993,523 | 37,047,740 | |
| ENST00000673899 | protein_coding | 36,993,489 | 37,050,767 | |
| ENST00000673990 | protein_coding | 36,993,785 | 37,050,807 | |
| ENST00000674019 | protein_coding | 36,993,764 | 37,050,823 | |
| ENST00000713802 | protein_coding | 36,993,542 | 37,050,841 | |
| ENST00000931189 | protein_coding | 36,993,487 | 37,050,843 | |
| ENST00000931190 | protein_coding | 36,993,491 | 37,050,846 | |
| ENST00000931191 | protein_coding | 36,993,545 | 37,050,841 | |
| ENST00000948704 | protein_coding | 36,993,487 | 37,050,844 | |
| ENST00000948705 | protein_coding | 36,993,540 | 37,050,842 | |
| ENST00000413212 | nonsense_mediated_decay | 36,993,785 | 37,050,823 | |
| ENST00000432299 | nonsense_mediated_decay | 36,993,524 | 37,050,823 | |
| ENST00000447829 | nonsense_mediated_decay | 36,993,785 | 37,050,807 | |
| ENST00000454028 | nonsense_mediated_decay | 36,993,534 | 37,050,846 | |
| ENST00000457004 | nonsense_mediated_decay | 36,993,516 | 37,014,494 | |
| ENST00000458009 | nonsense_mediated_decay | 36,993,518 | 37,050,846 | |
| ENST00000673897 | nonsense_mediated_decay | 36,993,531 | 37,050,794 | |
| ENST00000673947 | nonsense_mediated_decay | 36,993,539 | 37,050,814 | |
| ENST00000673972 | nonsense_mediated_decay | 36,993,515 | 37,050,827 | |
| ENST00000674111 | nonsense_mediated_decay | 36,993,495 | 37,050,796 | |
| ENST00000442249 | retained_intron | 36,993,533 | 37,026,093 | |
| ENST00000476172 | retained_intron | 36,993,785 | 36,998,096 | |
| ENST00000673686 | retained_intron | 36,993,785 | 36,998,085 | |
| ENST00000673713 | retained_intron | 36,993,517 | 37,026,060 | |
| ENST00000673741 | retained_intron | 37,046,754 | 37,050,843 | |
| ENST00000673889 | retained_intron | 37,017,334 | 37,050,799 | |
| ENST00000674107 | retained_intron | 36,993,833 | 37,030,353 | |
| ENST00000674125 | retained_intron | 37,028,664 | 37,050,838 | |
Biotype Summary: 26 protein_coding, 10 nonsense_mediated_decay, 8 retained_intron
RefSeq Transcripts (Human Chromosome 3)
| Accession | Type | Status | MANE Select |
|---|---|---|---|
| NM_000249 | mRNA | REVIEWED | ✓ Yes |
| NM_001167617 | mRNA | REVIEWED | No |
| NM_001167618 | mRNA | REVIEWED | No |
| NM_001167619 | mRNA | REVIEWED | No |
| NM_001258271 | mRNA | REVIEWED | No |
| NM_001258273 | mRNA | REVIEWED | No |
| NM_001258274 | mRNA | REVIEWED | No |
| NM_001354615 | mRNA | REVIEWED | No |
| NM_001354616 | mRNA | REVIEWED | No |
| NM_001354617 | mRNA | REVIEWED | No |
| NM_001354618 | mRNA | REVIEWED | No |
| NM_001354619 | mRNA | REVIEWED | No |
| NM_001354620 | mRNA | REVIEWED | No |
| NM_001354621 | mRNA | REVIEWED | No |
| NM_001354622 | mRNA | REVIEWED | No |
| NM_001354623 | mRNA | REVIEWED | No |
| NM_001354624 | mRNA | REVIEWED | No |
| NM_001354625 | mRNA | REVIEWED | No |
| NM_001354626 | mRNA | REVIEWED | No |
| NM_001354627 | mRNA | REVIEWED | No |
| NM_001354628 | mRNA | REVIEWED | No |
| NM_001354629 | mRNA | REVIEWED | No |
| NM_001354630 | mRNA | REVIEWED | No |
CCDS IDs
| CCDS ID | Status |
|---|---|
| CCDS2663 | Primary |
| CCDS54562 | Alternative |
| CCDS54563 | Alternative |
Canonical Transcript Exon Structure (ENST00000231790)
Total Exon Count: 19
| Exon ID | Start | End | Length (bp) |
|---|---|---|---|
| ENSE00004012715 | 36,993,518 | 36,993,663 | 146 |
| ENSE00003496022 | 36,996,619 | 36,996,709 | 91 |
| ENSE00003599036 | 37,000,955 | 37,001,053 | 99 |
| ENSE00003635406 | 37,004,401 | 37,004,474 | 74 |
| ENSE00003533853 | 37,006,991 | 37,007,063 | 73 |
| ENSE00003633623 | 37,008,814 | 37,008,905 | 92 |
| ENSE00003521968 | 37,011,820 | 37,011,862 | 43 |
| ENSE00003516787 | 37,012,011 | 37,012,099 | 89 |
| ENSE00003656033 | 37,014,432 | 37,014,544 | 113 |
| ENSE00001716871 | 37,017,506 | 37,017,599 | 94 |
| ENSE00003785627 | 37,020,310 | 37,020,463 | 154 |
| ENSE00003688106 | 37,025,637 | 37,026,007 | 371 |
| ENSE00003680564 | 37,028,784 | 37,028,932 | 149 |
| ENSE00001730188 | 37,040,186 | 37,040,294 | 109 |
| ENSE00001747618 | 37,042,268 | 37,042,331 | 64 |
| ENSE00001748897 | 37,047,519 | 37,047,683 | 165 |
| ENSE00001593400 | 37,048,517 | 37,048,609 | 93 |
| ENSE00001785296 | 37,048,904 | 37,049,017 | 114 |
| ENSE00003902581 | 37,050,486 | 37,050,846 | 361 |
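The exon table can be cross-checked with a few lines of Python. This is a sketch that assumes the coordinates are 1-based and inclusive, which is what the listed lengths imply; the helper names are illustrative. It recomputes each exon length, the total spliced length, and the intron lengths between consecutive exons.

```python
# (start, end) for the 19 exons of ENST00000231790, copied from the table above
EXONS = [
    (36_993_518, 36_993_663), (36_996_619, 36_996_709), (37_000_955, 37_001_053),
    (37_004_401, 37_004_474), (37_006_991, 37_007_063), (37_008_814, 37_008_905),
    (37_011_820, 37_011_862), (37_012_011, 37_012_099), (37_014_432, 37_014_544),
    (37_017_506, 37_017_599), (37_020_310, 37_020_463), (37_025_637, 37_026_007),
    (37_028_784, 37_028_932), (37_040_186, 37_040_294), (37_042_268, 37_042_331),
    (37_047_519, 37_047_683), (37_048_517, 37_048_609), (37_048_904, 37_049_017),
    (37_050_486, 37_050_846),
]


def exon_lengths(exons):
    """Length of each exon, assuming 1-based inclusive coordinates."""
    return [end - start + 1 for start, end in exons]


def intron_lengths(exons):
    """Length of each intron lying between consecutive exons."""
    return [
        next_start - prev_end - 1
        for (_, prev_end), (next_start, _) in zip(exons, exons[1:])
    ]


lengths = exon_lengths(EXONS)
print(len(lengths), sum(lengths))  # 19 2494
```

The total of 2,494 bp is the spliced transcript length implied by the table, including untranslated regions.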
Section 3: Protein Identifiers
UniProt Accessions
Total: 18 entries
| Accession | Name | Status | Canonical |
|---|---|---|---|
| P40692 | DNA mismatch repair protein Mlh1 | Reviewed (Swiss-Prot) | ✓ Yes |
| A0A087WX20 | MLH1 isoform | Unreviewed | No |
| A0A669KAW3 | MLH1 isoform | Unreviewed | No |
| A0A669KB03 | MLH1 isoform | Unreviewed | No |
| A0A669KBB4 | MLH1 isoform | Unreviewed | No |
| A0A669KBK2 | MLH1 isoform | Unreviewed | No |
| A0AAQ5BGN3 | MLH1 isoform | Unreviewed | No |
| A0AAQ5BGZ2 | MLH1 isoform | Unreviewed | No |
| C9JZ54 | MLH1 isoform | Unreviewed | No |
| E7EUC9 | MLH1 isoform | Unreviewed | No |
| E9PF25 | MLH1 isoform | Unreviewed | No |
| F2Z298 | MLH1 isoform | Unreviewed | No |
| H0Y4N0 | MLH1 isoform | Unreviewed | No |
| H0Y5L7 | MLH1 isoform | Unreviewed | No |
| H0Y5U4 | MLH1 isoform | Unreviewed | No |
| H0Y793 | MLH1 isoform | Unreviewed | No |
| H0Y806 | MLH1 isoform | Unreviewed | No |
| H0Y818 | MLH1 isoform | Unreviewed | No |
Canonical Protein (P40692) Properties
| Property | Value |
|---|---|
| Length | 756 amino acids |
| Mass | 84,601 Da |
| Alternative Names | MutL protein homolog 1 |
RefSeq Protein Accessions (Human)
| Accession | Status | MANE Select |
|---|---|---|
| NP_000240 | REVIEWED | ✓ Yes |
| NP_001161089 | REVIEWED | No |
| NP_001161090 | REVIEWED | No |
| NP_001161091 | REVIEWED | No |
| NP_001245200 | REVIEWED | No |
| NP_001245202 | REVIEWED | No |
| NP_001245203 | REVIEWED | No |
| NP_001341544 | REVIEWED | No |
| NP_001341545 | REVIEWED | No |
| NP_001341546 | REVIEWED | No |
| NP_001341547 | REVIEWED | No |
| NP_001341548 | REVIEWED | No |
| NP_001341549 | REVIEWED | No |
| NP_001341550 | REVIEWED | No |
| NP_001341551 | REVIEWED | No |
| NP_001341552 | REVIEWED | No |
| NP_001341553 | REVIEWED | No |
| NP_001341554 | REVIEWED | No |
| NP_001341555 | REVIEWED | No |
| NP_001341556 | REVIEWED | No |
| NP_001341557 | REVIEWED | No |
| NP_001341558 | REVIEWED | No |
| NP_001341559 | REVIEWED | No |
Protein Domains and Families
Total: 8 InterPro entries
| InterPro ID | Name | Type |
|---|---|---|
| IPR002099 | MutL/Mlh/PMS | Family |
| IPR038973 | MutL/Mlh/Pms-like | Family |
| IPR013507 | DNA_mismatch_S5_2-like | Domain |
| IPR032189 | Mlh1_C | Domain |
| IPR014721 | Ribsml_uS5_D2-typ_fold_subgr | Homologous_superfamily |
| IPR020568 | Ribosomal_Su5_D2-typ_SF | Homologous_superfamily |
| IPR036890 | HATPase_C_sf | Homologous_superfamily |
| IPR014762 | DNA_mismatch_repair_CS | Conserved_site |
Pfam Domains
| Pfam ID | Name |
|---|---|
| PF01119 | DNA_mis_repair |
| PF13589 | HATPase_c_3 |
| PF16413 | MutL_C |
Section 4: Structure Identifiers
Experimental Structures (PDB)
Total PDB Structure Count: 7
| PDB ID | Title | Method | Resolution (Å) | Organism |
|---|---|---|---|---|
| 3RBN | Crystal structure of MutL protein homolog 1 isoform 1 | X-RAY DIFFRACTION | 2.16 | Homo sapiens |
| 4P7A | Crystal Structure of human MLH1 | X-RAY DIFFRACTION | 2.30 | Homo sapiens |
| 5U5P | Importin-alpha with MLH1 NLS Peptide | X-RAY DIFFRACTION | 2.171 | Mus musculus/Synthetic |
| 6WBA | Importin alpha MLH1-R470A NLS Peptide Complex | X-RAY DIFFRACTION | 2.151 | Mus musculus/Synthetic |
| 6WBB | Importin alpha MLH1-E475A NLS peptide complex | X-RAY DIFFRACTION | 2.663 | Mus musculus/Synthetic |
| 6WBC | Importin alpha MLH1-R472K NLS Peptide Complex | X-RAY DIFFRACTION | 2.15 | Mus musculus/Synthetic |
| 7M60 | Importin alpha MLH1-S467A NLS Peptide Complex | X-RAY DIFFRACTION | 2.30 | Mus musculus/Synthetic |
Predicted Structures (AlphaFold)
| AlphaFold ID | Sequence Length | Global pLDDT | Fraction Very High Confidence |
|---|---|---|---|
| P40692 | 756 | 77.89 | 0.51 (51%) |
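AlphaFold pLDDT values run from 0 to 100 and are conventionally read in four bands: very high (above 90), confident (70 to 90), low (50 to 70), and very low (below 50). The sketch below maps a score to its band. The function name and the handling of exact boundary values are assumptions made for illustration.

```python
def plddt_band(score: float) -> str:
    """Return the conventional AlphaFold confidence band for a pLDDT score."""
    if not 0.0 <= score <= 100.0:
        raise ValueError(f"pLDDT must be within 0-100, got {score}")
    if score > 90:
        return "very high"
    if score > 70:
        return "confident"
    if score > 50:
        return "low"
    return "very low"


# Global (mean) pLDDT reported for P40692 in the table above
print(plddt_band(77.89))  # confident
```

The global value is a mean over residues, so it can sit in the confident band even though about half of the residues are individually in the very high band.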
Section 5: Cross-Species Orthologs
| Organism | Gene ID | Symbol | UniProt |
|---|---|---|---|
| Mouse (Mus musculus) | ENSMUSG00000032498 | Mlh1 | P97679 |
| Rat (Rattus norvegicus) | ENSRNOG00000033809 | Mlh1 | Q9JK91 |
| Zebrafish (Danio rerio) | ENSDARG00000025948 | mlh1 | - |
| Fruit fly (Drosophila melanogaster) | FBGN0011659 | Mlh1 | Q9ZRV4 |
| Worm (C. elegans) | WBGENE00003373 | mlh-1 | Q54KD8 |
| Yeast (S. cerevisiae) | YMR167W | MLH1 | P38920 |
Section 6: Clinical Variants & AI Predictions
ClinVar Variant Summary
Total Variant Count: 6,260
| Classification | Count |
|---|---|
| Pathogenic | >500 |
| Likely pathogenic | >300 |
| Uncertain significance (VUS) | >4,000 |
| Likely benign | Multiple |
| Benign | Multiple |
TOP 50 Pathogenic Variants (ClinVar)
| ClinVar ID | HGVS Notation | Type | Review Status |
|---|---|---|---|
| 142856 | c.117-2A>G | SNV | Reviewed by expert panel |
| 1012206 | c.1003del (p.Leu335fs) | Deletion | Multiple submitters |
| 1048862 | c.497T>G (p.Leu166Ter) | SNV | Multiple submitters |
| 1049670 | c.1896+1G>C | SNV | Multiple submitters |
| 1068948 | c.406A>T (p.Lys136Ter) | SNV | Multiple submitters |
| 1069408 | c.1507del (p.Leu503fs) | Deletion | Multiple submitters |
| 1070239 | c.22del (p.Ile8fs) | Deletion | Multiple submitters |
| 1070756 | c.1408A>T (p.Arg470Ter) | SNV | Multiple submitters |
| 1073257 | c.1707del (p.Asn570fs) | Deletion | Multiple submitters |
| 1074205 | c.35del (p.Asp12fs) | Deletion | Multiple submitters |
| 1074878 | c.1830C>G (p.Tyr610Ter) | SNV | Multiple submitters |
| 1076321 | c.1888del (p.Ile630fs) | Deletion | Multiple submitters |
| 1076485 | c.1435_1453del (p.Val479fs) | Deletion | Multiple submitters |
| 1076792 | c.1983dup (p.Thr662fs) | Duplication | Multiple submitters |
| 1177381 | c.1695_1698del (p.Ile565fs) | Deletion | Multiple submitters |
| 1195084 | c.1713del (p.Phe571fs) | Deletion | Multiple submitters |
| 1358070 | c.1727T>G (p.Leu576Ter) | SNV | Multiple submitters |
| 135851 | c.2190del (p.Pro731fs) | Deletion | Multiple submitters |
| 1365840 | c.52_59del (p.Arg18fs) | Deletion | Multiple submitters |
| 1368790 | c.592G>T (p.Gly198Ter) | SNV | Multiple submitters |
| 1372749 | c.2043dup (p.Met682fs) | Duplication | Multiple submitters |
| 1387037 | c.813del (p.Ser271_Leu272insTer) | Deletion | Multiple submitters |
| 1392794 | c.1499del (p.Ile500fs) | Deletion | Multiple submitters |
| 1402505 | c.1809del (p.Glu605fs) | Deletion | Multiple submitters |
| 1405246 | c.1239dup (p.Glu414fs) | Duplication | Multiple submitters |
| 141043 | c.790+1G>T | SNV | Multiple submitters |
| 1412554 | c.1730C>A (p.Ser577Ter) | SNV | Multiple submitters |
| 1443114 | c.2195_2196del (p.Lys732fs) | Deletion | Multiple submitters |
| 1451346 | c.383del (p.Ala128fs) | Deletion | Multiple submitters |
| 1452609 | c.227dup (p.Cys77fs) | Duplication | Multiple submitters |
| 1453286 | c.2138_2151del (p.Lys713fs) | Deletion | Multiple submitters |
| 1453590 | c.2163del (p.Val720_Tyr721insTer) | Deletion | Multiple submitters |
| 1068563 | c.1921dup (p.Leu641fs) | Duplication | Multiple submitters |
| 1023950 | c.1681_1686del (p.Tyr561_Gln562del) | Deletion | Single submitter |
| 1049302 | c.1275dup (p.Gln426fs) | Duplication | Single submitter |
| 1049396 | c.1999del (p.Asp667fs) | Deletion | Single submitter |
| 1049673 | c.1367del (p.Thr455_Ser456insTer) | Deletion | Single submitter |
| 1068760 | c.2089dup (p.Leu697fs) | Duplication | Single submitter |
| 1069316 | c.1482T>A (p.Cys494Ter) | SNV | Single submitter |
| 1069751 | c.2039_2040delinsAG (p.Cys680Ter) | Indel | Single submitter |
| 1069993 | c.2099del (p.Gln700fs) | Deletion | Single submitter |
| 1071267 | c.2123_2126del (p.Ile708fs) | Deletion | Single submitter |
| 1071293 | c.194_195insAT (p.Thr66fs) | Insertion | Single submitter |
| 1071631 | c.837_838del (p.Tyr280fs) | Microsatellite | Single submitter |
| 1071933 | c.929_930del (p.Thr310fs) | Microsatellite | Single submitter |
| 1072166 | c.2046_2052del (p.Met682fs) | Deletion | Single submitter |
| 1074920 | c.790del (p.His264fs) | Deletion | Single submitter |
| 1075393 | c.2028del (p.Ser677fs) | Deletion | Single submitter |
| 1076815 | c.979dup (p.Gln327fs) | Duplication | Single submitter |
| 1255698 | c.1A>T (p.Met1Leu) | SNV | Single submitter |
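For single-nucleotide substitutions, the coding position and the protein position in an HGVS string must agree: coding nucleotide n falls in codon (n - 1) // 3 + 1. The sketch below checks that relationship for entries formatted like the rows above. The regular expression and function names are illustrative. The check is deliberately limited to substitutions, because HGVS describes deletions and duplications by the first amino acid that actually changes, which can lie downstream of the edited codon.

```python
import re

# Matches exonic substitutions such as "c.497T>G (p.Leu166Ter)"
_SNV = re.compile(r"c\.(\d+)[ACGT]>[ACGT]\s+\(p\.[A-Za-z]{3}(\d+)")


def codon_of(c_pos: int) -> int:
    """Codon number containing coding nucleotide c_pos (1-based)."""
    return (c_pos - 1) // 3 + 1


def snv_is_consistent(hgvs: str):
    """True/False for exonic substitutions; None for anything else."""
    match = _SNV.search(hgvs)
    if match is None:
        return None  # intronic variant, indel, or no protein annotation
    c_pos, p_pos = int(match.group(1)), int(match.group(2))
    return codon_of(c_pos) == p_pos


for entry in ["c.497T>G (p.Leu166Ter)", "c.1408A>T (p.Arg470Ter)", "c.117-2A>G"]:
    print(entry, snv_is_consistent(entry))
```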
SpliceAI Predictions
Total Count: 2,888 variants with splice effects
TOP 50 High-Score Splice-Altering Variants (Score ≥0.8):
| Variant | Gene | Effect | Delta Score |
|---|---|---|---|
| 3:36996617:A:AG | MLH1 | acceptor_gain | 1.00 |
| 3:36996618:G:GA | MLH1 | acceptor_gain | 1.00 |
| 3:36996618:GTTT:G | MLH1 | acceptor_gain | 1.00 |
| 3:36996618:GTTTA:G | MLH1 | acceptor_gain | 1.00 |
| 3:36996707:AGGG:A | MLH1 | donor_loss | 1.00 |
| 3:36996708:GG:G | MLH1 | donor_gain | 1.00 |
| 3:36996708:GGGTA:G | MLH1 | donor_loss | 1.00 |
| 3:36996709:GG:G | MLH1 | donor_gain | 1.00 |
| 3:36996710:G:A | MLH1 | donor_loss | 1.00 |
| 3:36996710:G:GG | MLH1 | donor_gain | 1.00 |
| 3:36996711:T:TC | MLH1 | donor_loss | 1.00 |
| 3:36993672:G:GT | MLH1 | donor_gain | 1.00 |
| 3:36993614:G:GT | MLH1 | donor_gain | 0.99 |
| 3:36993614:G:T | MLH1 | donor_gain | 0.99 |
| 3:36993633:C:G | MLH1 | donor_gain | 0.99 |
| 3:36993696:GGC:G | MLH1 | donor_gain | 0.99 |
| 3:36993836:GACC:G | MLH1 | donor_gain | 0.99 |
| 3:36994235:C:CA | MLH1 | acceptor_gain | 0.99 |
| 3:36994236:G:A | MLH1 | acceptor_gain | 0.99 |
| 3:36994347:G:GG | MLH1 | donor_gain | 0.99 |
| 3:36996608:T:TA | MLH1 | acceptor_gain | 0.99 |
| 3:36996613:T:TA | MLH1 | acceptor_gain | 0.99 |
| 3:36996613:TGCCA:T | MLH1 | acceptor_loss | 0.99 |
| 3:36996614:GCCAG:G | MLH1 | acceptor_loss | 0.99 |
| 3:36996615:CCAGT:C | MLH1 | acceptor_loss | 0.99 |
| 3:36996616:CA:C | MLH1 | acceptor_loss | 0.99 |
| 3:36996617:A:AT | MLH1 | acceptor_loss | 0.99 |
| 3:36996618:G:T | MLH1 | acceptor_loss | 0.99 |
| 3:36996618:GT:G | MLH1 | acceptor_gain | 0.99 |
| 3:36996618:GTT:G | MLH1 | acceptor_gain | 0.99 |
| 3:36996705:TCAGG:T | MLH1 | donor_gain | 0.99 |
| 3:36996706:CAGG:C | MLH1 | donor_gain | 0.99 |
| 3:36996707:AGG:A | MLH1 | donor_gain | 0.99 |
| 3:36996708:GGG:G | MLH1 | donor_gain | 0.99 |
| 3:36994231:ATTTC:A | MLH1 | acceptor_gain | 0.97 |
| 3:36993887:GAGT:G | MLH1 | donor_gain | 0.97 |
| 3:36995502:G:GT | MLH1 | donor_gain | 0.97 |
| 3:36993809:GCC:G | MLH1 | donor_gain | 0.96 |
| 3:36993886:GGAGT:G | MLH1 | donor_gain | 0.95 |
| 3:36993887:GAGTG:G | MLH1 | donor_gain | 0.95 |
| 3:36993889:GT:G | MLH1 | donor_gain | 0.95 |
| 3:36993901:GAATA:G | MLH1 | donor_gain | 0.95 |
| 3:36994318:G:T | MLH1 | donor_gain | 0.95 |
| 3:36993812:A:AG | MLH1 | donor_gain | 0.94 |
| 3:36993891:G:GG | MLH1 | donor_gain | 0.94 |
| 3:36993761:T:G | MLH1 | donor_gain | 0.94 |
| 3:36994400:T:G | MLH1 | donor_gain | 0.94 |
| 3:36996619:TTTAG:T | MLH1 | acceptor_loss | 0.94 |
| 3:36996620:TTAGA:T | MLH1 | acceptor_loss | 0.94 |
| 3:36996621:TAGAT:T | MLH1 | acceptor_loss | 0.94 |
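The variant keys above appear to follow a `chrom:pos:ref:alt` layout. The sketch below parses such a key and labels the change by comparing allele lengths. The class and function names are illustrative, and VCF-style left-anchored indels are assumed, since the rows do not state the convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str

    @property
    def kind(self) -> str:
        """Classify the change by comparing ref and alt allele lengths."""
        if len(self.ref) == len(self.alt):
            return "SNV" if len(self.ref) == 1 else "MNV"
        return "insertion" if len(self.alt) > len(self.ref) else "deletion"


def parse_variant(key: str) -> Variant:
    """Parse a 'chrom:pos:ref:alt' key into a Variant."""
    chrom, pos, ref, alt = key.split(":")
    return Variant(chrom, int(pos), ref, alt)


for key in ["3:36996617:A:AG", "3:36996710:G:A", "3:36996618:GTTT:G"]:
    variant = parse_variant(key)
    print(key, variant.kind)
```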
AlphaMissense Pathogenicity Predictions
Total Missense Variants: 4,981
TOP 50 Predicted Pathogenic Missense Variants:
| Variant | Protein Change | Pathogenicity Score | Classification |
|---|---|---|---|
| 3:36993600:G:C | R18P | 1.000 | likely_pathogenic |
| 3:36993579:T:C | L11P | 0.999 | likely_pathogenic |
| 3:36993603:T:G | I19S | 0.999 | likely_pathogenic |
| 3:36993606:C:A | A20E | 0.998 | likely_pathogenic |
| 3:36993605:G:C | A20P | 0.998 | likely_pathogenic |
| 3:36993608:G:C | A21P | 0.998 | likely_pathogenic |
| 3:36993609:C:A | A21E | 0.999 | likely_pathogenic |
| 3:36993611:G:A | G22R | 0.998 | likely_pathogenic |
| 3:36993611:G:C | G22R | 0.998 | likely_pathogenic |
| 3:36993611:G:T | G22W | 0.998 | likely_pathogenic |
| 3:36993612:G:A | G22E | 0.999 | likely_pathogenic |
| 3:36993570:T:A | I8N | 0.998 | likely_pathogenic |
| 3:36993570:T:C | I8T | 0.996 | likely_pathogenic |
| 3:36993570:T:G | I8S | 0.995 | likely_pathogenic |
| 3:36993599:C:A | R18S | 0.995 | likely_pathogenic |
| 3:36993614:G:A | E23K | 0.996 | likely_pathogenic |
| 3:36993616:A:C | E23D | 0.996 | likely_pathogenic |
| 3:36993615:A:T | E23V | 0.995 | likely_pathogenic |
| 3:36993618:T:A | V24D | 0.999 | likely_pathogenic |
| 3:36993618:T:C | V24A | 0.995 | likely_pathogenic |
| 3:36993621:T:A | I25N | 0.999 | likely_pathogenic |
| 3:36993621:T:G | I25S | 0.996 | likely_pathogenic |
| 3:36993594:T:A | V16E | 0.992 | likely_pathogenic |
| 3:36993591:T:A | V15E | 0.993 | likely_pathogenic |
| 3:36993593:G:C | V16L | 0.980 | likely_pathogenic |
| 3:36993609:C:T | A21V | 0.993 | likely_pathogenic |
| 3:36993612:G:T | G22V | 0.994 | likely_pathogenic |
| 3:36993569:A:T | I8F | 0.970 | likely_pathogenic |
| 3:36993593:G:A | V16M | 0.970 | likely_pathogenic |
| 3:36993578:C:G | L11V | 0.968 | likely_pathogenic |
| 3:36993591:T:G | V15G | 0.966 | likely_pathogenic |
| 3:36993609:C:G | A21G | 0.971 | likely_pathogenic |
| 3:36993602:A:C | I19L | 0.948 | likely_pathogenic |
| 3:36993602:A:T | I19F | 0.994 | likely_pathogenic |
| 3:36993603:T:A | I19N | 1.000 | likely_pathogenic |
| 3:36993603:T:C | I19T | 0.998 | likely_pathogenic |
| 3:36993605:G:A | A20T | 0.981 | likely_pathogenic |
| 3:36993606:C:G | A20G | 0.963 | likely_pathogenic |
| 3:36993606:C:T | A20V | 0.988 | likely_pathogenic |
| 3:36993608:G:A | A21T | 0.992 | likely_pathogenic |
| 3:36993598:C:A | N17K | 0.989 | likely_pathogenic |
| 3:36993598:C:G | N17K | 0.989 | likely_pathogenic |
| 3:36993599:C:G | R18G | 0.987 | likely_pathogenic |
| 3:36993617:G:T | V24F | 0.987 | likely_pathogenic |
| 3:36993620:A:T | I25F | 0.985 | likely_pathogenic |
| 3:36993617:G:C | V24L | 0.982 | likely_pathogenic |
| 3:36993597:A:T | N17I | 0.979 | likely_pathogenic |
| 3:36993594:T:G | V16G | 0.979 | likely_pathogenic |
| 3:36993573:G:C | R9P | 0.978 | likely_pathogenic |
| 3:36993579:T:A | L11Q | 0.998 | likely_pathogenic |
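AlphaMissense scores range from 0 to 1 and are binned into three classes. The sketch below assumes the commonly cited cutoffs of 0.34 and 0.564; verify them against the release you are using. The constant and function names are illustrative.

```python
# Assumed cutoffs; confirm against the AlphaMissense release in use
LIKELY_BENIGN_MAX = 0.34
LIKELY_PATHOGENIC_MIN = 0.564


def alphamissense_class(score: float) -> str:
    """Bin an AlphaMissense pathogenicity score into its class label."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be within 0-1, got {score}")
    if score < LIKELY_BENIGN_MAX:
        return "likely_benign"
    if score > LIKELY_PATHOGENIC_MIN:
        return "likely_pathogenic"
    return "ambiguous"


# Lowest score in the table above (I19L)
print(alphamissense_class(0.948))  # likely_pathogenic
```

Every row listed above scores at least 0.948, well clear of the assumed likely_pathogenic cutoff.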
Section 7: Biological Pathways & Gene Ontology
Reactome Pathways
Total Pathway Count: 6
| Pathway ID | Name | Disease Pathway |
|---|---|---|
| R-HSA-5358565 | Mismatch repair (MMR) directed by MSH2:MSH6 (MutSalpha) | No |
| R-HSA-5358606 | Mismatch repair (MMR) directed by MSH2:MSH3 (MutSbeta) | No |
| R-HSA-912446 | Meiotic recombination | No |
| R-HSA-6796648 | TP53 Regulates Transcription of DNA Repair Genes | No |
| R-HSA-5545483 | Defective Mismatch Repair Associated With MLH1 | Yes |
| R-HSA-5632987 | Defective Mismatch Repair Associated With PMS2 | Yes |
Gene Ontology Annotations
Total GO Terms: 35
Biological Process (19 terms)
| GO ID | Term |
|---|---|
| GO:0006298 | mismatch repair |
| GO:0000712 | resolution of meiotic recombination intermediates |
| GO:0006303 | double-strand break repair via nonhomologous end joining |
| GO:0007060 | male meiosis chromosome segregation |
| GO:0007129 | homologous chromosome pairing at meiosis |
| GO:0007283 | spermatogenesis |
| GO:0008630 | intrinsic apoptotic signaling pathway in response to DNA damage |
| GO:0009617 | response to bacterium |
| GO:0016321 | female meiosis chromosome segregation |
| GO:0016446 | somatic hypermutation of immunoglobulin genes |
| GO:0043060 | meiotic metaphase I homologous chromosome alignment |
| GO:0045141 | meiotic telomere clustering |
| GO:0045190 | isotype switching |
| GO:0045950 | negative regulation of mitotic recombination |
| GO:0048298 | positive regulation of isotype switching to IgA isotypes |
| GO:0048304 | positive regulation of isotype switching to IgG isotypes |
| GO:0048477 | oogenesis |
| GO:0051257 | meiotic spindle midzone assembly |
| GO:0000289 | nuclear-transcribed mRNA poly(A) tail shortening |
Molecular Function (7 terms)
| GO ID | Term |
|---|---|
| GO:0003682 | chromatin binding |
| GO:0005524 | ATP binding |
| GO:0008047 | enzyme activator activity |
| GO:0016887 | ATP hydrolysis activity |
| GO:0019899 | enzyme binding |
| GO:0032137 | guanine/thymine mispair binding |
| GO:0140664 | ATP-dependent DNA damage sensor activity |
Cellular Component (9 terms)
| GO ID | Term |
|---|---|
| GO:0000795 | synaptonemal complex |
| GO:0001673 | male germ cell nucleus |
| GO:0005634 | nucleus |
| GO:0005654 | nucleoplasm |
| GO:0005694 | chromosome |
| GO:0005712 | chiasma |
| GO:0005715 | late recombination nodule |
| GO:0016020 | membrane |
| GO:0032389 | MutLalpha complex |
Section 8: Protein Interactions & Molecular Networks
STRING Protein-Protein Interactions
Total Interaction Count: 3,428+
Highest-Confidence Interacting Proteins (49 listed):
| Partner UniProt | Partner Gene | Confidence Score |
|---|---|---|
| P20585 | MSH3 | 999 |
| P43246 | MSH2 | 999 |
| P52701 | MSH6 | 999 |
| Q9UQ84 | PMS2 | 998 |
| P49751 | EXO1 | 993 |
| P54277 | PMS1 | 993 |
| P54278 | MLH3 | 993 |
| Q13315 | ATM | 992 |
| Q9BX63 | FANCJ/BRIP1 | 951 |
| P38398 | BRCA1 | 930 |
| O95243 | MBD4 | 921 |
| Q9UIF7 | RADX | 914 |
| P51587 | BRCA2 | 905 |
| P15056 | BRAF | 876 |
| Q9BXW9 | FANCD2 | 874 |
| P01116 | KRAS | 868 |
| P04637 | TP53 | 868 |
| O15457 | RBBP8/CtIP | 856 |
| P42771 | CDKN2A | 856 |
| P16455 | MGMT | 853 |
| Q07864 | POLE | 852 |
| Q9Y2M0 | FAN1 | 844 |
| O75771 | RAD51B | 838 |
| P16422 | EPCAM | 837 |
| P35249 | RFC4 | 826 |
| P32846 | GPC1 | 825 |
| O43196 | ESCO1 | 824 |
| O96017 | CHK2 | 824 |
| P60484 | PTEN | 811 |
| P54132 | BLM | 808 |
| P28340 | DPYD | 790 |
| P42336 | PIK3CA | 788 |
| O43502 | RAD51C | 787 |
| O60934 | NBN | 775 |
| Q14191 | WRN | 773 |
| A2PYH4 | MRE11 | 772 |
| Q9Y5K1 | PCNA | 771 |
| Q86YC2 | FAM175A | 766 |
| P21359 | NF1 | 764 |
| Q06609 | RAD51 | 758 |
| P36894 | BMPR1A | 735 |
| Q00765 | RECC1 | 728 |
| Q15831 | STK11 | 726 |
| P07992 | ERCC1 | 725 |
| Q9NS23 | RINT1 | 719 |
| Q9NXL9 | RFWD3 | 716 |
| P49959 | MRE11A | 715 |
| P37173 | TGFBR2 | 702 |
| Q99728 | BARD1 | 700 |
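STRING combined scores are integers on a 0 to 1000 scale, conventionally read as low (150 and above), medium (400 and above), high (700 and above), and highest (900 and above). The sketch below applies those bands to a few rows from the table. The function name is illustrative.

```python
def string_confidence(score: int) -> str:
    """Map a STRING combined score (0-1000) to its conventional band."""
    if not 0 <= score <= 1000:
        raise ValueError(f"score out of range: {score}")
    if score >= 900:
        return "highest"
    if score >= 700:
        return "high"
    if score >= 400:
        return "medium"
    if score >= 150:
        return "low"
    return "below low-confidence cutoff"


# A few rows copied from the table above
partners = {"MSH2": 999, "PMS2": 998, "BRCA2": 905, "BRAF": 876, "BARD1": 700}

for gene, score in partners.items():
    # Dividing by 1000 gives the 0-1 form used in some STRING exports
    print(f"{gene}\t{score / 1000:.3f}\t{string_confidence(score)}")
```

Every partner listed above scores at least 700, so all fall in the high or highest band.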
IntAct Physical Interactions
Total: 514+ interactions
Key Interaction Partners (ranked by confidence; only PMS2 and BRIP1 score ≥0.9):
| Partner | Interaction Type | Confidence |
|---|---|---|
| PMS2 | direct interaction | 0.97 |
| BRIP1 | physical association | 0.94 |
| MYOG | physical association | 0.89 |
| RADX | physical association | 0.87 |
| ZC3H11A | physical association | 0.87 |
| UBOX5 | physical association | 0.85 |
| KPNA2 | physical association | 0.83 |
| PMS1 | physical association | 0.83 |
| CBY2 | physical association | 0.80 |
| MLH3 | physical association | 0.78 |
| MAGEA8 | physical association | 0.78 |
| MSH3 | physical association | 0.74 |
| PSMA1 | physical association | 0.74 |
| TMSB4X | physical association | 0.73 |
| TASOR2 | physical association | 0.72 |
| FRMD6 | physical association | 0.67 |
| CCDC33 | physical association | 0.67 |
ESM2 Structural Similarity
Total Similar Proteins: 28
TOP 20 Structurally Similar Proteins:
| UniProt | Top Similarity | Avg Similarity |
|---|---|---|
| P43246 (MSH2) | 0.9999 | 0.9384 |
| Q5XXB5 | 0.9999 | 0.9384 |
| Q9Z2D1 | 0.9995 | 0.9515 |
| Q13614 | 0.9995 | 0.9504 |
| Q02440 | 0.9990 | 0.9382 |
| Q2PFX0 | 0.9989 | 0.9437 |
| B2KI88 | 0.9988 | 0.9520 |
| P43247 (PMS1) | 0.9987 | 0.9419 |
| Q3MHE4 | 0.9987 | 0.9413 |
| P24860 | 0.9986 | 0.9370 |
| O95292 | 0.9986 | 0.9447 |
| Q08301 | 0.9985 | 0.9327 |
| P14635 | 0.9985 | 0.9305 |
| P30277 | 0.9984 | 0.9367 |
| Q1LZG6 | 0.9984 | 0.9371 |
| A6QNT8 | 0.9978 | 0.9246 |
| O95486 | 0.9978 | 0.9213 |
| Q9S9N4 | 0.9977 | 0.9340 |
| O43502 (RAD51C) | 0.9976 | 0.9410 |
| Q3U2P1 | 0.9966 | 0.9268 |
DIAMOND Sequence Homology
Total Homologous Proteins: 15
| UniProt | Species/Description | Top Identity (%) | BitScore |
|---|---|---|---|
| P97679 | Mouse Mlh1 | 91.3 | 1349 |
| Q9JK91 | Rat Mlh1 | 91.3 | 1351 |
| B5RL36 | - | 99.3 | 1193 |
| B5RR29 | - | 99.3 | 1191 |
| P54278 | Human MLH3 | 97.4 | 1274 |
| O51229 | - | 92.3 | 1112 |
| Q0SNV1 | - | 92.5 | 1112 |
| Q662F3 | - | 92.5 | 1115 |
| A1QZ05 | - | 85.6 | 1051 |
| P54280 | Human MSH6 | 50.3 | 421 |
| Q54KD8 | C. elegans mlh-1 | 40.2 | 569 |
| P38920 | Yeast MLH1 | 39.7 | 467 |
| Q9ZRV4 | Fly Mlh1 | 39.2 | 510 |
| Q9P7W6 | - | 37.3 | 438 |
Section 9: Transcription Factor Regulatory Data
Note: MLH1 does not encode a transcription factor. This section covers transcription factors that regulate MLH1 expression.
Upstream Regulators (TFs that Regulate MLH1)
Total: 19 regulators from CollecTRI database
| Transcription Factor | Regulation Type | Confidence |
|---|---|---|
| TP53 | Activation | - |
| BHLHE40 | Unknown | High |
| BHLHE41 | Repression | High |
| CEBPZ | Repression | High |
| GLI1 | Unknown | High |
| GLI2 | Unknown | High |
| HOXA5 | Unknown | High |
| MLXIP | Unknown | High |
| BRIP1 | Unknown | - |
| DNMT3A | Repression | - |
| E2F4 | Unknown | - |
| HIF1A | Unknown | - |
| MAFG | Unknown | - |
| CTNNBL1 | Unknown | Low |
| DNMT1 | Unknown | Low |
| ESR2 | Unknown | Low |
| HOXD1 | Unknown | Low |
| TP73 | Unknown | Low |
| WT1 | Repression | Low |
Section 10: Drug & Pharmacology Data
Drug Target Status
MLH1/P40692 is not directly targeted by approved drugs (no ChEMBL target entry).
PharmGKB Gene Entry
| Attribute | Value |
|---|---|
| PharmGKB ID | PA240 |
| VIP Gene | Yes (Very Important Pharmacogene) |
| Has CPIC Guideline | No |
| Has Variant Annotations | Yes |
Drug-Gene Associations (PharmGKB)
| Drug/Class | Clinical Annotations | Variant Annotations |
|---|---|---|
| Platinum compounds | 300 | 1138 |
| Talazoparib | 0 | 0 |
Pharmacogenomic Context
MLH1 deficiency/mutations affect response to:
- Platinum-based chemotherapy (cisplatin, carboplatin, oxaliplatin)
  - MMR-deficient tumors may show resistance to platinum agents
  - Clinical annotation count: 300+
- PARP inhibitors (e.g., talazoparib)
  - Potential synthetic lethality in MMR-deficient cancers
Section 11: Expression Profiles
Bgee Expression Summary
| Attribute | Value |
|---|---|
| Expression Breadth | Ubiquitous |
| Total Present Calls | 296 |
| Total Absent Calls | 4 |
| Total Conditions | 300 |
| Max Expression Score | 94.44 |
| Average Expression Score | 87.57 |
| Gold Quality Count | 294 |
TOP 30 Tissues by Expression Score
| Tissue (UBERON ID) | Expression | Score | Quality |
|---|---|---|---|
| Tibialis anterior (UBERON:0001385) | present | 94.44 | gold |
| Skeletal muscle of rectus abdominis (UBERON:0004511) | present | 94.42 | gold |
| Deltoid (UBERON:0001476) | present | 94.37 | gold |
| Left ventricle myocardium (UBERON:0006566) | present | 94.30 | gold |
| Heart left ventricle (UBERON:0002084) | present | 93.97 | gold |
| Cardiac ventricle (UBERON:0002082) | present | 93.90 | gold |
| Primordial germ cell in gonad | present | 93.85 | gold |
| Vastus lateralis (UBERON:0001379) | present | 93.84 | gold |
| Quadriceps femoris (UBERON:0001377) | present | 93.78 | gold |
| Apex of heart (UBERON:0002098) | present | 93.60 | gold |
| Skeletal muscle tissue (UBERON:0001134) | present | 93.35 | gold |
| Heart right ventricle (UBERON:0002080) | present | 93.35 | gold |
| Ganglionic eminence (UBERON:0004023) | present | 93.22 | gold |
| Muscle organ (UBERON:0001630) | present | 92.99 | gold |
| Skeletal muscle organ (UBERON:0014892) | present | 92.99 | gold |
| Biceps brachii (UBERON:0001507) | present | 92.97 | gold |
| Muscle tissue (UBERON:0002385) | present | 92.79 | gold |
| Heart (UBERON:0000948) | present | 92.78 | gold |
| Muscle of leg (UBERON:0001383) | present | 92.75 | gold |
| Ventricular zone (UBERON:0003053) | present | 92.73 | gold |
| Tibia (UBERON:0000979) | present | 92.69 | gold |
| Calcaneal tendon (UBERON:0003701) | present | 92.62 | gold |
| Gastrocnemius (UBERON:0001388) | present | 92.53 | gold |
| Diaphragm (UBERON:0001103) | present | 92.49 | gold |
| Pituitary gland (UBERON:0000007) | present | 92.48 | gold |
| Triceps brachii (UBERON:0001509) | present | 92.45 | gold |
| Adenohypophysis (UBERON:0002196) | present | 92.41 | gold |
| Skin of hip (UBERON:0001554) | present | 92.34 | gold |
| Right atrium auricular region (UBERON:0006631) | present | 92.32 | gold |
| Corpus callosum (UBERON:0002336) | present | 92.23 | gold |
Expression Pattern: MLH1 is expressed ubiquitously (present in 296 of 300 conditions), with the highest scores in skeletal muscle, cardiac muscle, and germline tissues. This breadth is consistent with its fundamental role in DNA mismatch repair in all dividing cells.
Single-Cell Expression Data
| Dataset | Description | Species | Cell Count |
|---|---|---|---|
| E-MTAB-2983 | Functional germ line stem cells in adult mammalian ovaries | Homo sapiens | 38 |
Section 12: Disease Associations
Mendelian/Monogenic Disease Links (GenCC)
Total Disease Associations: 18
| Disease | OMIM/MONDO | Inheritance | Evidence | Submitter |
|---|---|---|---|---|
| Lynch syndrome 2 | OMIM:609310 | AD | Definitive | Ambry, G2P |
| Lynch syndrome | MONDO:0005835 | AD | Definitive | G2P |
| Mismatch repair cancer syndrome 1 | OMIM:276300 | AR | Definitive | Ambry, G2P |
| Muir-Torre syndrome | OMIM:158320 | AD | Definitive | G2P |
| Lynch syndrome | ORPHANET:144 | AD | Supportive | Orphanet |
| Muir-Torre syndrome | OMIM:158320 | AD | Strong | Genomics England |
| Lynch syndrome 2 | OMIM:609310 | AD | Strong | Genomics England |
| Lynch syndrome 1 | OMIM:120435 | AD | Strong | Labcorp |
| Mismatch repair cancer syndrome 1 | OMIM:276300 | AR | Strong | Labcorp |
| Ovarian cancer | MONDO:0008170 | AD | Strong | Genomics England |
| Muir-Torre syndrome | OMIM:158320 | AD | Moderate | Ambry |
| Rhabdomyosarcoma | MONDO:0005212 | AR | Moderate | Genomics England |
| Malignant pancreatic neoplasm | MONDO:0009831 | AD | Moderate | Genomics England |
| Prostate cancer | MONDO:0008315 | AD | Limited | Ambry |
| Breast cancer | MONDO:0007254 | AD | Disputed | Ambry |
Orphanet Disease Entries
| Orphanet ID | Disease Name | Type | Gene Count | Phenotype Count |
|---|---|---|---|---|
| 144 | Lynch syndrome | Disease | 9 | 62 |
| 252202 | Constitutional mismatch repair deficiency syndrome | Disease | 4 | 0 |
Human Phenotype Ontology (HPO) Associations
Total Phenotype Terms: 89
TOP 50 Clinical Phenotypes:
| HPO ID | Phenotype |
|---|---|
| HP:0003003 | Colon cancer |
| HP:0005227 | Adenomatous colonic polyposis |
| HP:0100743 | Neoplasm of the rectum |
| HP:0012114 | Endometrial carcinoma |
| HP:0100615 | Ovarian neoplasm |
| HP:0002894 | Neoplasm of the pancreas |
| HP:0006753 | Neoplasm of the stomach |
| HP:0006725 | Pancreatic adenocarcinoma |
| HP:0040276 | Adenocarcinoma of the colon |
| HP:0040274 | Adenocarcinoma of the small intestine |
| HP:0006771 | Duodenal adenocarcinoma |
| HP:0006758 | Malignant genitourinary tract tumor |
| HP:0009726 | Renal neoplasm |
| HP:0010786 | Urinary tract neoplasm |
| HP:0002665 | Lymphoma |
| HP:0012539 | Non-Hodgkin lymphoma |
| HP:0012190 | T-cell lymphoma |
| HP:0001909 | Leukemia |
| HP:0004377 | Hematological neoplasm |
| HP:0003002 | Breast carcinoma |
| HP:0100031 | Neoplasm of the thyroid gland |
| HP:0001402 | Hepatocellular carcinoma |
| HP:0002896 | Neoplasm of the liver |
| HP:0008069 | Neoplasm of the skin |
| HP:0002671 | Basal cell carcinoma |
| HP:0030410 | Sebaceous gland carcinoma |
| HP:0009720 | Adenoma sebaceum |
| HP:0009592 | Astrocytoma |
| HP:0012174 | Glioblastoma multiforme |
| HP:0033681 | Oligodendroglioma |
| HP:0002888 | Ependymoma |
| HP:0002885 | Medulloblastoma |
| HP:0033682 | Pleomorphic xanthoastrocytoma |
| HP:0100835 | Benign neoplasm of the central nervous system |
| HP:0003006 | Neuroblastoma |
| HP:0002893 | Pituitary adenoma |
| HP:0002859 | Rhabdomyosarcoma |
| HP:0010622 | Neoplasm of the skeletal system |
| HP:0100684 | Salivary gland neoplasm |
| HP:0012118 | Laryngeal carcinoma |
| HP:0200008 | Intestinal polyposis |
| HP:0006719 | Benign gastrointestinal tract tumors |
| HP:0006778 | Benign genitourinary tract neoplasm |
| HP:0000006 | Autosomal dominant inheritance |
| HP:0000007 | Autosomal recessive inheritance |
| HP:0003596 | Middle age onset |
| HP:0001522 | Death in infancy |
| HP:0100613 | Death in early adulthood |
| HP:0007565 | Multiple cafe-au-lait spots |
| HP:0009732 | Plexiform neurofibroma |
GWAS Associations
Total: 7 associations
| Study ID | Trait/Disease | Mapped Gene | P-value |
|---|---|---|---|
| GCST012206 | Proximal colorectal cancer | MLH1 | 4×10⁻¹⁸ |
| GCST90002401 | Platelet distribution width | MLH1 | 2×10⁻¹⁸ |
| GCST90013405 | Liver enzyme levels (ALT) | MLH1 | 4×10⁻¹⁵ |
| GCST004521 | Autism spectrum disorder or schizophrenia | HSPD1P6-LINC02033 | 1×10⁻¹¹ |
| GCST004946 | Schizophrenia | TRANK1 | 3×10⁻¹¹ |
| GCST90011898 | Alanine aminotransferase levels | MLH1 | 2×10⁻⁸ |
| GCST003158 | Subjective response to lithium treatment | HSPD1P6-LINC02033 | 8×10⁻⁷ |
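Not every row in the table above clears the conventional genome-wide significance threshold, and not every row maps to MLH1 itself. A short Python sketch of both filters follows. The threshold of 5×10⁻⁸ is the commonly used convention and is an assumption here, not something stated by the source; `ASSOCIATIONS`, `genome_wide_significant`, and `mapped_to` are hypothetical names, and the rows are transcribed from the table rather than queried from the GWAS Catalog.

```python
# Conventional genome-wide significance threshold (an assumption of this sketch).
GENOME_WIDE_ALPHA = 5e-8

# (study_id, trait, mapped_gene, p_value), copied from the table above.
ASSOCIATIONS = [
    ("GCST012206", "Proximal colorectal cancer", "MLH1", 4e-18),
    ("GCST90002401", "Platelet distribution width", "MLH1", 2e-18),
    ("GCST90013405", "Liver enzyme levels (ALT)", "MLH1", 4e-15),
    ("GCST004521", "Autism spectrum disorder or schizophrenia", "HSPD1P6-LINC02033", 1e-11),
    ("GCST004946", "Schizophrenia", "TRANK1", 3e-11),
    ("GCST90011898", "Alanine aminotransferase levels", "MLH1", 2e-8),
    ("GCST003158", "Subjective response to lithium treatment", "HSPD1P6-LINC02033", 8e-7),
]


def genome_wide_significant(associations, alpha=GENOME_WIDE_ALPHA):
    """Keep rows whose p-value is strictly below the chosen threshold."""
    return [row for row in associations if row[3] < alpha]


def mapped_to(associations, gene):
    """Keep rows whose mapped-gene column equals the given symbol."""
    return [row for row in associations if row[2] == gene]


if __name__ == "__main__":
    for study_id, trait, gene, p_value in genome_wide_significant(ASSOCIATIONS):
        print(f"{study_id}\t{gene}\t{p_value:.0e}\t{trait}")
```

Under this threshold the lithium-response association (8×10⁻⁷) is the only row that drops out, and four of the seven rows are mapped to MLH1.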
DATA SOURCES
| Database | Version/Access | URL |
|---|---|---|
| HGNC | Current | https://www.genenames.org |
| Ensembl | GRCh38 | https://www.ensembl.org |
| NCBI Gene | Current | https://www.ncbi.nlm.nih.gov/gene |
| UniProt | Current | https://www.uniprot.org |
| OMIM | Current | https://omim.org |
| RefSeq | Current | https://www.ncbi.nlm.nih.gov/refseq |
| PDB | Current | https://www.rcsb.org |
| AlphaFold | v4 | https://alphafold.ebi.ac.uk |
| InterPro | Current | https://www.ebi.ac.uk/interpro |
| ClinVar | Current | https://www.ncbi.nlm.nih.gov/clinvar |
| AlphaMissense | 2023 | https://alphamissense.hegelab.org |
| SpliceAI | Current | - |
| Reactome | Current | https://reactome.org |
| Gene Ontology | Current | http://geneontology.org |
| STRING | v12 | https://string-db.org |
| IntAct | Current | https://www.ebi.ac.uk/intact |
| Bgee | Current | https://www.bgee.org |
| PharmGKB | Current | https://www.pharmgkb.org |
| GenCC | Current | https://thegencc.org |
| Orphanet | Current | https://www.orpha.net |
| HPO | Current | https://hpo.jax.org |
| GWAS Catalog | Current | https://www.ebi.ac.uk/gwas |
| CollecTRI | Current | https://collectri.github.io |
Reference compiled via biobtree cross-database integration platform
This document is a cross-database identifier and functional mapping reference for human MLH1 (mutL homolog 1).
Summary of Key Findings
| Category | Key Data |
|---|---|
| Gene IDs | HGNC:7127, ENSG00000076242, Entrez 4292, OMIM 120436 |
| Location | Chr3:36,993,226-37,050,896 (+ strand, 3p22.2) |
| Transcripts | 44 Ensembl, MANE Select: NM_000249 |
| Protein | P40692 (756 aa, 84.6 kDa) |
| Structures | 7 PDB entries, AlphaFold model (pLDDT 77.89) |
| Orthologs | Conserved across mouse, rat, zebrafish, fly, worm, yeast |
| ClinVar Variants | 6,260 total (500+ pathogenic) |
| AlphaMissense | 4,981 missense predictions |
| SpliceAI | 2,888 splice effect predictions |
| Pathways | 6 Reactome (mismatch repair, meiosis, TP53 regulation) |
| GO Terms | 35 (DNA repair, meiosis, ATP binding) |
| Interactions | 3,428+ STRING, 514+ IntAct |
| Expression | Ubiquitous (highest in muscle, heart, germline) |
| Primary Diseases | Lynch syndrome (definitive), Muir-Torre syndrome, constitutional mismatch repair deficiency (CMMRD) |
| Pharmacogenomics | PharmGKB VIP (Very Important Pharmacogene) - affects platinum compound response |
The reference covers all 12 sections with counts, identifiers, and detailed listings suitable for research use.
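The Location entry in the summary table can be cross-checked against the gene span reported in Section 1. The sketch below parses a region string of the form `Chr3:36,993,226-37,050,896` and computes its length, assuming 1-based inclusive coordinates (the Ensembl convention). `parse_region` and `span_bp` are hypothetical helper names introduced for illustration.

```python
import re


def parse_region(region):
    """Split 'Chr3:36,993,226-37,050,896' into (chromosome, start, end)."""
    match = re.fullmatch(r"[Cc]hr(\w+):([\d,]+)-([\d,]+)", region)
    if match is None:
        raise ValueError(f"unrecognised region string: {region!r}")
    chrom, start, end = match.groups()
    # Strip thousands separators before converting to integers.
    return chrom, int(start.replace(",", "")), int(end.replace(",", ""))


def span_bp(start, end):
    """Length in base pairs, assuming 1-based inclusive coordinates."""
    return end - start + 1


if __name__ == "__main__":
    chrom, start, end = parse_region("Chr3:36,993,226-37,050,896")
    print(f"chr{chrom}: {span_bp(start, end):,} bp")
```

With the coordinates given above, the span comes out to 57,671 bp, which agrees with the Gene Span value listed under Genomic Location (GRCh38).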