Nucleati Germline Cancer Evidence Base automates the collection and curation of evidence to support variant classification and gene validity assessment.

Technological advancements have significantly reduced the cost of panel assays and sequencing. However, the classification of genetic variants remains a challenge. Published case reports of patients with germline mutations, together with panel or sequencing studies on patient cohorts, serve as significant evidence (ACMG guideline criteria PP4, PM6, PS2, BP5, BS4, and PP1). Individual case reports, multiple case reports, and case series also strengthen or refute gene-disease associations. The sheer volume of published literature (more than 30 million articles) makes finding relevant case reports a nontrivial task. Typically, expert professionals collect and curate case reports to ensure accuracy. However, such efforts are expensive and limited in scale.

Nucleati is pioneering technology to fully automate the collection and curation of relevant literature for biomedical needs. One of the early outcomes is the Nucleati Germline Cancer Evidence Base. The knowledge base provides access to AI-curated evidence, categorized as case reports, case series, and GWAS, through an intuitive UI. For each identified case report, Nucleati AI attempts to extract and normalize genes, disease, patient age, sex, ethnicity, variations, variant location, molecular consequences, zygosity, pathogenicity, interventions, and diagnostic process. Additionally, for case series and GWAS data, Nucleati AI extracts the number of cases and controls as well as the ethnic background of the cohort. Nucleati AI-curated data is freely available for the community to use.

Here we provide primary and secondary insights and statistics to justify using the Nucleati Germline Cancer Evidence Base as a complementary resource for identifying clinical evidence in the literature and using it for patient care.

Nucleati Germline Cancer Evidence Base identifies and curates many case reports, case series, and GWAS evidence for well-established hereditary cancer genes.

Leading providers offer hereditary cancer screening panels. Each panel includes multiple genes chosen at the provider's discretion, primarily on the basis of existing evidence. Table 1 summarizes cancer-predisposing genes offered by nine cancer panel screening providers. While classic cancer-predisposing genes such as MLH1, MSH2, TP53, and POLE are present in all nine panels, others such as ATR, FANCA, and DDX41 appear in only one. Table 1 also summarizes the number of individual case reports and case series reports available in the Nucleati Germline Cancer Evidence Base. The Nucleati Germline Cancer Evidence Base has evidence for 95-97% of genes in one or more hereditary cancer panels. The case report and case series columns of Table 1 provide direct links to AI-curated data for a given gene.

Table 1: Gene in Panel vs Case Reports

Columns: gene symbol, number of panels that include the gene, number of case reports in KB, number of case series in KB
ABRAXAS1 1 0 7
ATR 1 1 22
CBL 1 1 5
CASR 1 3 9
CDH23 1 0 4
FANCA 1 8 25
DDX41 1 1 22
HNF1B 1 5 10
MC1R 1 6 76
ENG 1 3 13
MLH3 1 6 28
HNF1A 1 3 8
LZTR1 1 9 16
MRE11A 1 0 0
PRSS1 1 6 7
PAX5 1 0 3
PRF1 1 5 15
TGFBR2 1 1 12
PTPN11 1 21 9
XRCC3 1 1 145
SBDS 1 4 4
TSHR 1 6 14
CYLD 2 30 7
FANCM 2 4 34
CTNNA1 2 4 18
EXT1 2 8 19
EXT2 2 6 23
HRAS 2 12 16
RECQL4 2 15 8
RHBDF2 2 1 0
RPS20 2 3 3
WRN 2 9 22
DIS3L2 3 2 3
FANCC 3 6 15
GPC3 3 5 0
PMS1 3 7 34
RNF43 3 2 11
XRCC2 3 3 64
SMARCE1 3 7 2
TERC 3 0 12
AIP 4 27 46
BLM 4 14 28
CDC73 4 52 34
CDKN1C 4 4 4
CEBPA 4 1 11
GALNT12 4 0 4
MRE11 4 4 44
GATA2 4 7 15
RECQL 4 3 16
PDGFRA 4 35 49
PRKAR1A 4 38 5
RUNX1 4 11 26
TERT 4 6 109
ALK 5 12 13
EGFR 5 28 76
KIT 5 100 103
NF2 5 110 53
PHOX2B 5 11 12
RB1 5 61 162
RAD50 5 5 81
SMARCB1 5 101 50
CDKN1B 6 19 42
GREM1 6 4 20
MAX 6 20 61
HOXB13 6 5 60
POT1 6 7 32
PTCH1 6 45 26
SDHA 6 19 91
SUFU 6 13 8
TMEM127 6 13 40
WT1 6 83 29
AXIN2 7 9 20
MITF 7 3 38
DICER1 7 82 69
FH 7 97 39
FLCN 7 65 27
MET 7 15 55
MSH3 7 7 40
NF1 7 212 186
NTHL1 7 6 28
SDHAF2 7 10 49
SDHB 7 111 302
SDHC 7 46 181
SDHD 7 100 278
SMARCA4 7 23 13
TSC1 7 56 30
TSC2 7 73 41
BARD1 8 12 113
BRIP1 8 7 153
MEN1 8 251 215
RET 8 367 585
NBN 8 19 167
VHL 8 222 395
APC 9 372 565
BRCA1 9 397 2002
ATM 9 39 452
BRCA2 9 352 1860
MLH1 9 198 652
BAP1 9 61 99
MSH2 9 221 650
BMPR1A 9 11 18
MSH6 9 121 439
PMS2 9 94 296
EPCAM 9 15 69
CDH1 9 83 291
CDK4 9 16 113
MUTYH 9 49 253
CDKN2A 9 64 248
CHEK2 9 55 516
TP53 9 409 1371
PTEN 9 174 306
STK11 9 107 137
PALB2 9 52 400
SMAD4 9 56 72
POLD1 9 15 66
POLE 9 37 80
RAD51C 9 12 153
RAD51D 9 7 102
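
The coverage figure quoted above can be recomputed from the counts in Table 1. The following is a minimal Python sketch, assuming the table has been loaded as (gene, panels, case reports, case series) rows. Only a handful of rows from Table 1 are included for brevity, so the printed percentages describe this excerpt and not the full table. The names `PanelGene` and `coverage` are illustrative and not part of any Nucleati tooling.

```python
from typing import Callable, NamedTuple, Sequence


class PanelGene(NamedTuple):
    gene: str
    panels: int        # number of provider panels that list the gene
    case_reports: int  # case report count in the evidence base
    case_series: int   # case series count in the evidence base


# Excerpt of Table 1 (not the full table).
ROWS = [
    PanelGene("ABRAXAS1", 1, 0, 7),
    PanelGene("ATR", 1, 1, 22),
    PanelGene("MRE11A", 1, 0, 0),
    PanelGene("RHBDF2", 2, 1, 0),
    PanelGene("GPC3", 3, 5, 0),
    PanelGene("BAP1", 9, 61, 99),
    PanelGene("BRCA1", 9, 397, 2002),
    PanelGene("TP53", 9, 409, 1371),
]


def coverage(rows: Sequence[PanelGene],
             has_evidence: Callable[[PanelGene], bool]) -> float:
    """Return the fraction of genes for which has_evidence(row) is true."""
    if not rows:
        return 0.0
    return sum(1 for row in rows if has_evidence(row)) / len(rows)


with_reports = coverage(ROWS, lambda r: r.case_reports > 0)
with_series = coverage(ROWS, lambda r: r.case_series > 0)
with_either = coverage(ROWS, lambda r: r.case_reports > 0 or r.case_series > 0)

print(f"genes with case reports: {with_reports:.1%}")
print(f"genes with case series:  {with_series:.1%}")
print(f"genes with either:       {with_either:.1%}")
```

Passing the evidence test as a function keeps the same routine usable for case reports, case series, or any combined criterion.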

Nucleati Germline Cancer Evidence Base identifies evidence for emerging cancer or related disorder-predisposing genes.

Evidence collected through Nucleati AI extends beyond established cancer and related disorder-predisposing genes. Table 2 lists genes that are not present in the selected gene panels from the nine providers, along with the number of case report and case series articles for each. Although the genes at the top (e.g., XRCC1, BRAF, KRAS, ERCC2, MDM2, CTNNB1, PIK3CA) are primarily mutated in tumors (profiled along with germline mutations), the articles also include germline mutations in these genes. A few genes with emerging cancer-associated phenotypes are noteworthy: CDKN2B/ETV6 (leukemia), XPC (melanoma, xeroderma, skin neoplasm), and GATA3 (breast carcinoma). Nucleati AI also identifies and curates evidence for genes associated with uncontrolled or abnormal growth conditions such as dysplasia, meningioma, and hemochromatosis (e.g., GNAS, TP63, ARMC5, COL2A1, RUNX2, HFE). Exome sequencing has recently become affordable and amenable to routine clinical profiling, so more evidence associating new genes with cancer will likely be published in the coming decade. Resources like the Nucleati Germline Cancer Evidence Base will be essential in identifying and cataloging emerging associations.

Table 2: New Genes and Count

Columns: gene, combined articles, case report articles, case series articles
XRCC1 259 0 259
BRAF 205 53 152
KRAS 192 54 138
ERCC2 167 7 160
MDM2 158 6 152
RAD51 134 13 121
MTHFR 132 0 132
PIK3CA 125 22 103
CTNNB1 119 49 70
ERBB2 99 20 79
OGG1 92 5 87
CCND1 77 5 72
GSTM1 75 0 75
NOD2 71 10 61
GSTT1 64 0 64
ERCC5 61 0 61
GSTP1 59 0 59
ESR1 56 0 56
ERCC1 55 0 55
JAK2 53 12 41
FGFR2 50 6 44
XPC 49 7 42
IDH1 48 25 23
ERCC4 47 0 47
CYP1A1 46 0 46
CDKN2B 44 5 39
CASP8 44 0 44
MGMT 43 4 39
TP63 43 25 18
MYC 41 8 33
GNAS 38 28 10
ARMC5 38 15 23
TCF7L2 36 0 36
FGFR3 36 15 21
CYP1B1 34 0 34
VDR 33 0 33
VWF 31 0 31
RNASEL 30 0 30
AKT1 29 4 25
FBXW7 29 0 29
FHIT 29 6 23
HFE 28 11 17
XRCC4 28 0 28
NQO1 28 0 28
KCNQ1 27 0 27
RAD51B 27 5 22
BCL2 27 7 20
TOX3 27 0 27
FBN1 26 15 11
AR 26 10 16
IDH2 26 10 16
DNMT3B 25 0 25
PTPN22 25 0 25
KMT2D 25 4 21
KMT2C 25 0 25
NRAS 25 8 17
EPAS1 24 7 17
COL2A1 24 17 7
CYP21A2 24 24 0
XPA 24 8 16
RUNX2 23 19 4
CFTR 23 8 15
TET2 23 0 23
RAD52 23 0 23
SMAD7 23 0 23
MAP3K1 22 0 22
TGFBR1 22 0 22
FANCD2 22 7 15
KDR 21 5 16
FLT3 21 5 16
EXO1 21 0 21
MDM4 20 0 20
BAX 20 0 20
KIF1B 20 4 16
FGFR4 20 0 20
ELAC2 20 0 20
CDKN1A 19 0 19
PRPF31 19 0 19
EZH2 19 0 19
GJB2 19 7 12
MTRR 19 0 19
CFH 19 0 19
TYMS 19 5 14
RTEL1 19 0 19
NAT2 18 0 18
HMGA2 18 11 7
GATA3 18 6 12
MSR1 18 0 18
SETD2 18 0 18
LIG4 18 7 11
PTGS2 18 0 18
RBM45 18 4 14
ABCB1 18 0 18
PKD1 17 11 6
PGLS 17 0 17
EGF 17 0 17
DCC 17 0 17
CDK6 17 0 17
APOE 17 0 17
ABCA4 17 0 17
SERPINB3 17 0 17
CYP11B1 16 16 0
ATRX 16 10 6
SLCO6A1 16 0 16
CHEK1 16 0 16
FANCI 16 0 16
PARP1 16 0 16
COMT 16 0 16
CYP19A1 16 0 16
TLR4 16 0 16
ETV6 16 5 11
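
One way to derive a list like Table 2 is to subtract the set of panel genes from the set of genes with curated evidence. The sketch below assumes per-gene counts are available as a dictionary; the variable names are illustrative, and the data are a few rows taken from Tables 1 and 2. The combined count is taken as the sum of the two categories, which matches the rows of Table 2.

```python
# Genes offered in at least one of the nine panels (excerpt of Table 1).
panel_genes = {"BRCA1", "TP53", "BAP1", "ATR"}

# gene -> (case report count, case series count); excerpt of Tables 1 and 2.
evidence = {
    "BRCA1": (397, 2002),
    "BAP1": (61, 99),
    "XRCC1": (0, 259),
    "BRAF": (53, 152),
    "ETV6": (5, 11),
}

# Keep genes with evidence that no panel lists, ranked by combined count.
non_panel = sorted(
    (
        (gene, reports + series, reports, series)
        for gene, (reports, series) in evidence.items()
        if gene not in panel_genes
    ),
    key=lambda row: row[1],
    reverse=True,
)

for gene, combined, reports, series in non_panel:
    print(gene, combined, reports, series)
```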

Nucleati Germline Cancer Evidence Base is a complementary resource to identify evidence for gene validity and variant-pathogenicity classifications.

There is no direct way to compare AI-curated and human-curated data exhaustively. An indirect measure is to compare the articles collected by Nucleati AI with those in publicly accessible variant classification repositories: ClinVar and ClinGen. Chart 1 summarizes the overlapping and exclusive articles in ClinVar and the Nucleati Germline Cancer Evidence Base for genes present in all nine panels used in the analysis. While there is significant overlap between the two resources, each also contains articles that are absent from the other.
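
The overlap comparison amounts to set operations on PubMed IDs. The following is a minimal sketch, assuming each resource can be reduced to a collection of PubMed IDs for a gene; `compare_sources` is a hypothetical helper, and the IDs 1 and 2 are placeholders standing in for shared articles, not real records.

```python
def compare_sources(clinvar_pmids, nucleati_pmids):
    """Split two PubMed ID collections into shared and exclusive sets."""
    clinvar, nucleati = set(clinvar_pmids), set(nucleati_pmids)
    return {
        "shared": clinvar & nucleati,
        "clinvar_only": clinvar - nucleati,
        "nucleati_only": nucleati - clinvar,
    }


# 18757409 and 23684012 appear as ClinVar-exclusive BAP1 articles in Table 3;
# 22889334 and 23552620 appear as Nucleati-exclusive BAP1 articles.
# The IDs 1 and 2 are placeholders for articles found in both resources.
clinvar_pmids = [18757409, 23684012, 1, 2]
nucleati_pmids = [22889334, 23552620, 1, 2]

result = compare_sources(clinvar_pmids, nucleati_pmids)
for name, pmids in result.items():
    print(name, len(pmids), sorted(pmids))
```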


Extending the analysis to evidence present only in ClinVar or only in the Nucleati Germline Cancer Evidence Base for the BAP1 gene (56 and 95, respectively), we manually collected the variations reported in each article. Table 3 summarizes the variations collected from these exclusive articles. The variants recovered from the Nucleati-exclusive articles further support the use of articles identified by Nucleati AI as evidence.

Table 3: Variants collected from articles identified using ClinVar and Nucleati Germline Cancer Evidence Base

Columns: ClinVar-exclusive PubMed ID, mutation(s); Nucleati Germline Cancer Evidence Base-exclusive PubMed ID, mutation(s)
18757409 NM_004656.4(BAP1):c.2050C>T (p.Gln684Ter) NM_004656.4(BAP1):c.2017G>T (p.Glu673Ter) NM_004656.4(BAP1):c.1986del (p.Ile662fs) 22889334 NM_004656.3(BAP1):c.1708C>G (p.Leu570Val)
23684012 NM_004656.4(BAP1):c.2050C>T (p.Gln684Ter) 23552620 NM_004656.4(BAP1):c.214del (p.Ile72LeufsTer6)
24728327 NM_004656.4(BAP1):c.1786A>G (p.Ser596Gly) NM_004656.4(BAP1):c.1735G>A (p.Gly579Arg) NM_004656.4(BAP1):c.1408G>A (p.Gly470Arg) NM_004656.4(BAP1):c.1325C>G (p.Pro442Arg) NM_004656.4(BAP1):c.905C>T (p.Pro302Leu) NM_004656.4(BAP1):c.121G>A (p.Gly41Ser) 23585512 NM_004656.4(BAP1):c.758dup (p.Thr254AspfsTer30)
26467025 NM_004656.4(BAP1):c.2057-4G>T NM_004656.4(BAP1):c.1838C>T (p.Thr613Met) NM_004656.4(BAP1):c.1786A>G (p.Ser596Gly) NM_004656.4(BAP1):c.1730-1G>A NM_004656.4(BAP1):c.1729+8T>C NM_004656.4(BAP1):c.1413T>G (p.Ala471=) NM_004656.4(BAP1):c.1002A>G (p.Leu334=) NM_004656.4(BAP1):c.783G>A (p.Gln261=) NM_004656.4(BAP1):c.294C>T (p.Ser98=) NM_004656.4(BAP1):c.121G>A (p.Gly41Ser) 25830670 NM_004656.4(BAP1):c.2054A>T (p.Glu685Val)
26689913 NM_004656.4(BAP1):c.1735G>A (p.Gly579Arg) NM_004656.4(BAP1):c.1337A>G (p.Asn446Ser) 26140217 NM_004656.4(BAP1):c.518A>G (p.Tyr173Cys)
25929848 NM_004656.4(BAP1):c.1735G>A (p.Gly579Arg) 25342144 NM_004656.4(BAP1):c.134G>A (p.Gly45Glu)
28380455 NM_004656.4(BAP1):c.1735G>A (p.Gly579Arg) 27751355 NM_004656.4(BAP1):c.329_335delinsTC (p.Pro110LeufsTer14)
26166446 NM_004656.4(BAP1):c.188C>G (p.Ser63Cys) NM_004656.4(BAP1):c.121G>A (p.Gly41Ser) 29891518 NM_004656.4(BAP1):c.371C>T (p.Pro124Leu)
22683710 NM_004656.4(BAP1):c.121G>A (p.Gly41Ser) 31706282 NM_004656.4(BAP1):c.1265del (p.Gly422GlufsTer8)
26845104 NM_004656.4(BAP1):c.1063C>T (p.Gln355Ter) 30578689 NM_004656.4(BAP1):c.2001del (p.Thr668ProfsTer24)
24166983 NM_004656.4(BAP1):c.1946G>A (p.Cys649Tyr) 29554022 BAP1:p.D567X
32068069 NM_004656.4(BAP1):c.1147C>T (p.Arg383Cys) 33093002 NM_004656.4: c.255_255+6del
29641532 NM_004656.4(BAP1):c.1147C>T (p.Arg383Cys) 33330039 NC_000003.12:g.52406903_52406924del
30480620 NM_004656.4(BAP1):c.878C>T (p.Pro293Leu) 32583627 NM_004656.4(BAP1):c.1565_1566del (p.Pro522ArgfsTer14)
28687356 NM_004656.4(BAP1):c.944A>C (p.Glu315Ala) 34504799 NM_004656.4(BAP1):c.1777C>T (p.Gln593Ter)
28170043 NM_004656.4(BAP1):c.519T>G (p.Tyr173Ter) 34725624 NM_004656.4(BAP1):c.2050C>T (p.Gln684Ter)
29684080 NM_004656.4(BAP1):c.1943C>T (p.Ala648Val) NM_004656.4(BAP1):c.1810G>A (p.Val604Met) NM_004656.4(BAP1):c.1249A>G (p.Arg417Gly) NM_004656.4(BAP1):c.1201_1212del (p.Tyr401_Asp404del) 35381901 NM_004656.4(BAP1):c.898_899del (p.Arg300GlyfsTer6)
16199547 NM_004656.4(BAP1):c.1891-1G>A NM_004656.4(BAP1):c.1730-2A>G NM_004656.4(BAP1):c.1251-2A>G NM_004656.4(BAP1):c.783+2T>C NM_004656.4(BAP1):c.783+1G>A NM_004656.4(BAP1):c.581-1G>T NM_004656.4(BAP1):c.581-2A>G NM_004656.4(BAP1):c.437+1G>T NM_004656.4(BAP1):c.376-2A>G NM_004656.4(BAP1):c.375+2T>A NM_004656.4(BAP1):c.122+1G>T NM_004656.4(BAP1):c.122+1G>A NM_004656.4(BAP1):c.67+1del NM_004656.4(BAP1):c.38-1G>A 35992853 NM_004656.4(BAP1):c.535C>T (p.Arg179Trp)
28034829 NM_004656.4(BAP1):c.1441C>A (p.His481Asn) NM_004656.4(BAP1):c.1066C>T (p.Arg356Trp) 35360426 NM_004656.4(BAP1):c.604T>C (p.Trp202Arg)
28900502 NM_004656.4(BAP1):c.1441C>A (p.His481Asn) NM_004656.4(BAP1):c.1066C>T (p.Arg356Trp) 35483881 NM_004656.4(BAP1):c.535C>T (p.Arg179Trp)
17576681 NM_004656.4(BAP1):c.2057-3C>T NM_004656.4(BAP1):c.2056+5G>C NM_004656.4(BAP1):c.2056+3G>A NM_004656.4(BAP1):c.2056+1G>C NM_004656.4(BAP1):c.1983+6T>C NM_004656.4(BAP1):c.1983+4G>A NM_004656.4(BAP1):c.1890+6C>G NM_004656.4(BAP1):c.1890+5G>A NM_004656.4(BAP1):c.1729+6C>T NM_004656.4(BAP1):c.1729+3G>A NM_004656.4(BAP1):c.1729G>A (p.Glu577Lys) NM_004656.4(BAP1):c.1250+4A>G NM_004656.4(BAP1):c.931+6G>A NM_004656.4(BAP1):c.931+6G>T NM_004656.4(BAP1):c.931+4C>T NM_004656.4(BAP1):c.931+3A>C NM_004656.4(BAP1):c.931+3A>T NM_004656.4(BAP1):c.659+6G>A NM_004656.4(BAP1):c.659+5G>A NM_004656.4(BAP1):c.437G>A (p.Arg146Lys) NM_004656.4(BAP1):c.375+5G>C NM_004656.4(BAP1):c.375+4G>A NM_004656.4(BAP1):c.256-3C>T NM_004656.4(BAP1):c.256-3C>A NM_004656.4(BAP1):c.255+3C>G NM_004656.4(BAP1):c.255G>C (p.Gln85His) NM_004656.4(BAP1):c.123-3C>T NM_004656.4(BAP1):c.122+6T>A NM_004656.4(BAP1):c.122+5G>C NM_004656.4(BAP1):c.122G>T (p.Gly41Val) NM_004656.4(BAP1):c.67+6_67+7del NM_004656.4(BAP1):c.67+5G>C NM_004656.4(BAP1):c.38-3del NM_004656.4(BAP1):c.37+5G>A NM_007294.4(BRCA1):c.134+3A>C 35814862 BAP1:c.458_459delCT
9536098 NM_004656.4(BAP1):c.2057-3C>T NM_004656.4(BAP1):c.2056+5G>C NM_004656.4(BAP1):c.2056+3G>A NM_004656.4(BAP1):c.2056+1G>C NM_004656.4(BAP1):c.1983+6T>C NM_004656.4(BAP1):c.1983+4G>A NM_004656.4(BAP1):c.1890+6C>G NM_004656.4(BAP1):c.1890+5G>A NM_004656.4(BAP1):c.1729+6C>T NM_004656.4(BAP1):c.1729+3G>A NM_004656.4(BAP1):c.1729G>A (p.Glu577Lys) NM_004656.4(BAP1):c.1250+4A>G NM_004656.4(BAP1):c.931+6G>A NM_004656.4(BAP1):c.931+6G>T NM_004656.4(BAP1):c.931+4C>T NM_004656.4(BAP1):c.931+3A>C NM_004656.4(BAP1):c.931+3A>T NM_004656.4(BAP1):c.659+6G>A NM_004656.4(BAP1):c.659+5G>A NM_004656.4(BAP1):c.437G>A (p.Arg146Lys) NM_004656.4(BAP1):c.375+5G>C NM_004656.4(BAP1):c.375+4G>A NM_004656.4(BAP1):c.256-3C>T NM_004656.4(BAP1):c.256-3C>A NM_004656.4(BAP1):c.255+3C>G NM_004656.4(BAP1):c.255G>C (p.Gln85His) NM_004656.4(BAP1):c.123-3C>T NM_004656.4(BAP1):c.122+6T>A NM_004656.4(BAP1):c.122+5G>C NM_004656.4(BAP1):c.122G>T (p.Gly41Val) NM_004656.4(BAP1):c.67+6_67+7del NM_004656.4(BAP1):c.67+5G>C NM_004656.4(BAP1):c.38-3del NM_004656.4(BAP1):c.37+5G>A NM_007294.4(BRCA1):c.134+3A>C 35114507 BAP1:c.1780_1781insT, p.(G549Vfs*49)
30258054 NM_004656.4(BAP1):c.1339G>A 19197335 NM_004656.4(BAP1):c.294C>T (p.Ser98=) NM_004656.4(BAP1):c.1002A>G (p.Leu334=) NM_004656.4(BAP1):c.1026C>T (p.Ser342=)
26554828 NM_004656.4(BAP1):c.606G>T (p.Trp202Cys) NM_004656.4(BAP1):c.604T>C (p.Trp202Arg) 21956388
29610392 NM_004656.4(BAP1):c.374A>C (p.Glu125Ala) NM_004656.4(BAP1):c.188C>G (p.Ser63Cys) 24916674 NM_004656.4(BAP1):c.605G>A (p.Trp202Ter)
21642991 NM_004656.4(BAP1):c.188C>G 25468148 NM_004656.4(BAP1):c.1026C>T (p.Ser342=)
24894717 NM_004656.4(BAP1):c.188C>G (p.Ser63Cys) 27494029 NM_004656.4(BAP1):c.1550C>T (p.Thr517Met)
26452128 NM_004656.4(BAP1):c.188C>G (p.Ser63Cys) 29298805 NM_004656.4(BAP1):c.233A>G (p.Asn78Ser) NM_004656.4(BAP1):c.1147C>T (p.Arg383Cys) NM_004656.4(BAP1):c.1748C>T (p.Ser583Leu) NM_004656.4(BAP1):c.1695dup (p.Glu566fs*1) NM_004656.4(BAP1):c.1717delC (p.Leu573fs*3) NM_004656.4(BAP1):c.1882_1885delTCAC (p.Ser628fs*8) NM_004656.4(BAP1):c.1717delC (p.Leu573fs*3) NM_004656.4(BAP1):c.1729+1G>A (p.?) NM_004656.4(BAP1):c.1891-1G>A (p.?)
30039884 NM_004656.4(BAP1):c.1550C>T (p.Thr517Met) 29504908 NM_004656.4(BAP1):c.1135G>A (p.Ala379Thr)
29478780 NM_004656.4(BAP1):c.959dup (p.Cys320fs) 29769598 NM_004656.4(BAP1):c.79del (p.Val27CysfsTer?) NM_004656.4(BAP1):c.2T>A (p.M1K) NM_004656.4(BAP1):c.505dup (p.His169ProfsTer14)
27749792 NM_004656.4(BAP1):c.122+1G>A 30376426 NM_004656.4(BAP1):c.1938T>A (p.Tyr646Ter) NM_004656.4(BAP1):c.1882_1885del (p.Ser628ProfsTer8) NM_004656.4(BAP1):c.438-2A>G NM_004656.4(BAP1):c.1717del (p.Leu573TrpfsTer3) NM_004656.4(BAP1):c.659+1G>C NM_004656.4(BAP1):c.604T>C (p.Trp202Arg) NM_004656.4(BAP1):c.1153C>T (p.Arg385Ter) NM_004656.4(BAP1):c.1729G>T (p.Glu577Ter) NM_004656.4(BAP1):c.2050C>T (p.Gln684Ter) NM_004656.4(BAP1):c.122+1G>A NM_004656.4(BAP1):c.1203dup (p.Glu402Ter) NM_004656.4(BAP1):c.133G>A (p.Gly45Arg)
23032617 NM_004656.4(BAP1):c.79del (p.Val27fs) 32012241 NM_004656.4(BAP1):c.784-1G>A
29978187 NM_004656.4(BAP1):c.437G>A (p.Arg146Lys) 33748184 BAP1:p.C39fs
28724667 NM_004656.4(BAP1):c.132T>G (p.Tyr44Ter) NM_000059.4(BRCA2):c.3860del (p.Asn1287fs) 34767027 BAP1:p.K453Rfs* BAP1:p.L573fs*
29351919 NM_004656.4(BAP1):c.2116A>G (p.Ile706Val) 33600035 NM_004656.4(BAP1):c.783+2T>C
35920959 NM_004656.4(BAP1):c.1984-2A>C NC_000003.12:g.52407461_52407462del NM_004656.4(BAP1):c.2057-4G>T
35885614 NM_004656.4(BAP1):c.46dup (p.Thr16AsnfsTer?) NM_004656.4(BAP1):c.1153C>T (p.Arg385Ter) NM_004656.4(BAP1):c.605G>A (p.Trp202Ter)
35777164 NM_004656.4(BAP1):c.1337del (p.Asn446ThrfsTer?) NM_004656.4(BAP1):c.326_327dup (p.Pro110AspfsTer4) NM_004656.4(BAP1):c.605G>A (p.Trp202Ter) NM_004656.4(BAP1):c.677del (p.Ile226ThrfsTer5) NM_004656.4(BAP1):c.799_800del (p.Gln267AlafsTer16)
35032816 NM_004656.4(BAP1):c.38-1G>T

Curated and normalized evidence provides secondary insights into the characteristics of disorders.

Unlike raw article text, Nucleati-curated evidence carries extracted and normalized attributes such as age, sex, and ethnicity. Aggregated across case reports, these attributes provide secondary insights such as age of onset or distribution as a function of sex and ethnic background. Chart 2 summarizes the age and sex distribution of case reports. The age distribution chart supports the early onset of disease with DICER1 mutation. Similarly, the sex distribution chart shows that BRCA1-, CHEK2-, and PALB2-driven cancers are reported predominantly in females.
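
Once age and sex are normalized, such distributions reduce to a simple per-gene aggregation. The sketch below assumes a simplified record with only gene, age, and sex; the `CaseReport` class, the `summarize` function, and all records shown are hypothetical illustrations and not data or code from the evidence base.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass
class CaseReport:
    gene: str
    age: Optional[int]  # age in years, None when not reported
    sex: Optional[str]  # "female", "male", or None when not reported


# Hypothetical records for illustration only.
reports = [
    CaseReport("DICER1", 2, "female"),
    CaseReport("DICER1", 6, "male"),
    CaseReport("DICER1", 13, None),
    CaseReport("BRCA1", 41, "female"),
    CaseReport("BRCA1", None, "female"),
    CaseReport("BRCA1", 55, "male"),
]


def summarize(records):
    """Return {gene: (median age or None, {sex: count})}."""
    ages, sexes = defaultdict(list), defaultdict(Counter)
    for record in records:
        # Touch both maps so every gene appears even if a field is missing.
        gene_ages, gene_sexes = ages[record.gene], sexes[record.gene]
        if record.age is not None:
            gene_ages.append(record.age)
        if record.sex is not None:
            gene_sexes[record.sex] += 1
    return {
        gene: (median(ages[gene]) if ages[gene] else None, dict(sexes[gene]))
        for gene in ages
    }


summary = summarize(reports)
for gene, (median_age, sex_counts) in summary.items():
    print(gene, median_age, sex_counts)
```

Missing values are skipped rather than imputed, so the median and the sex counts for a gene may be based on different numbers of reports.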


Conclusions

The Nucleati Germline Cancer Evidence Base is a product of a first-of-its-kind, fully automated, medical-grade evidence curation pipeline. The pipeline consists of automated literature collection, curation, data normalization, and ontology mapping. Through the data presented above, we establish that the evidence in the Nucleati Germline Cancer Evidence Base aids variant classification and gene-validity assessment for well-established as well as emerging cancer-predisposing genes. Additionally, the data curated by the in-house AI-driven pipeline provide secondary insights that are impossible to derive from any other comparable resource. Nucleati is developing a scaled-up product to offer evidence for any genetically predisposed disorder as curated case reports, case series, GWAS, and experimental studies.