
Medical Knowledge Graph Construction: Techniques and Research Progress

Date: 2019-08-02 Medicine 1


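Abstract: Existing knowledge graph construction techniques, when applied to medicine, commonly suffer from low efficiency, many restrictions, and poor extensibility. Medical data is cross-lingual, highly specialized, and structurally complex. With these characteristics in mind, this paper analyzes the key techniques for building medical knowledge graphs from the bottom up, covering five parts: medical knowledge representation, extraction, fusion, reasoning, and quality assessment. It also reviews how medical knowledge graphs are currently applied in medical services such as information retrieval, knowledge question answering, and intelligent diagnosis. Finally, in light of the major challenges and key problems that medical knowledge graph construction faces today, it discusses the outlook for the field.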


Keywords: knowledge graph; knowledge extraction; knowledge fusion; knowledge reasoning; natural language processing

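In 1998, Berners-Lee, the father of the World Wide Web, proposed the Semantic Web, which led to a set of standards; at the same time, as linked open data grew in scale, more and more knowledge metadata became scattered across the web. Knowledge graphs emerged against this background as a new way to represent and manage knowledge. With the development of artificial intelligence, key problems that knowledge graphs involve, such as knowledge extraction, representation, fusion, reasoning, and question answering, have been solved or advanced to a certain degree. Knowledge graphs have become a hot topic in knowledge services and have drawn wide attention from scholars and industry.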

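Knowledge graphs grew out of earlier ways of organizing and expressing knowledge, which let knowledge be exchanged between computers and between computers and people. From the graph point of view, a knowledge graph consists of a schema graph, a data graph, and the relations between the two. The schema graph describes the conceptual level of human knowledge and emphasizes the formal expression of concepts and their relations: its nodes are concept entities and its edges are semantic relations between concepts, such as part-of. The data graph describes the physical world and emphasizes a collection of objective facts: its nodes include instances of the concept entities in the schema graph, and its edges are concrete facts. The relation between the two graphs is the correspondence between instances in the data graph and concepts in the schema graph; put simply, the schema graph is the template for the data graph. General-purpose knowledge graphs include Google's Knowledge Graph [1], Sogou's Zhilifang (https://www.sogou.com/), YAGO [2], and DBpedia [3], all of which are large in scale and broad in coverage. Medicine is currently one of the vertical domains where knowledge graphs are most widely applied. Examples include the medical knowledge graph of a Shanghai hospital [4] and the medical knowledge base SNOMED-CT (http://www.snomed.org/), and applications such as IBM Watson Health (http://www-935.ibm.com/industries/healthcare/index.html) have also begun to stand out.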
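The split between a schema graph and a data graph can be made concrete with a small sketch. The class below is an illustration only, not taken from the article: the names `KnowledgeGraph`, `add_schema`, `add_instance`, and `add_fact`, and the sample concepts, are all hypothetical. It assumes that a fact is accepted only when the schema graph allows that relation between the two concepts involved.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    """Toy two-layer graph: schema edges link concepts, data edges link instances."""
    schema_edges: set = field(default_factory=set)   # (concept, relation, concept)
    data_edges: set = field(default_factory=set)     # (instance, relation, instance)
    instance_of: dict = field(default_factory=dict)  # instance -> concept

    def add_schema(self, head, relation, tail):
        self.schema_edges.add((head, relation, tail))

    def add_instance(self, instance, concept):
        self.instance_of[instance] = concept

    def add_fact(self, head, relation, tail):
        # The schema graph acts as a template: reject facts it does not allow.
        signature = (self.instance_of[head], relation, self.instance_of[tail])
        if signature not in self.schema_edges:
            raise ValueError(f"schema does not allow {signature}")
        self.data_edges.add((head, relation, tail))


# Hypothetical concepts and instances, for illustration only.
kg = KnowledgeGraph()
kg.add_schema("Disease", "has_symptom", "Symptom")
kg.add_instance("gastritis", "Disease")
kg.add_instance("stomach pain", "Symptom")
kg.add_fact("gastritis", "has_symptom", "stomach pain")
```

Reversing the arguments in the last call (a symptom that "has" a disease) would raise `ValueError`, which is the sense in which the schema graph serves as a template for the data graph.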

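Knowledge graphs address current research problems in big data, and several of their features suit the development of the information age: progressive schema design, better data integration, support from existing standards such as RDF and OWL, and knowledge acquisition. The development of medical informatics and of hospital information systems has accumulated massive amounts of medical data. How to distill information from these data and then manage, share, and apply it is a key problem in making medicine intelligent, and it underlies medical knowledge retrieval, clinical diagnosis, and the intelligent processing of electronic medical records.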

1 Medical knowledge graph construction

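This section divides medical knowledge graph construction into five parts: the representation, extraction, fusion, and reasoning of medical knowledge, plus quality assessment. Entities, relations, attributes, and the other building blocks of a knowledge graph are extracted from large amounts of structured and unstructured medical data and then stored in a knowledge base in a reasonable and efficient form. Medical knowledge fusion disambiguates and links the contents of the knowledge base, strengthens its internal logic and expressive power, and updates old knowledge or adds new knowledge to the graph, either manually or automatically. Knowledge reasoning infers missing facts and supports automated diagnosis and treatment. Quality assessment is an important safeguard for the data and raises the credibility and accuracy of the medical knowledge graph.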

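1.1 Medical knowledge representation

Knowledge representation is a set of conventions for describing the world, a process of turning knowledge into symbols, formalisms, or models [5]. It is mainly concerned with how computers store knowledge, and the chosen representation affects how efficiently a system acquires, stores, and uses knowledge. Medical data, however, is of many kinds, is stored in inconsistent ways, follows differing electronic medical record formats and standards, and often crosses domain boundaries. These characteristics set medical knowledge representation apart from the general case and also make it challenging.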

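Knowledge representation methods commonly used for medical knowledge include predicate logic, production rules, frames, and semantic networks. Examples are SNOMED-CT, the early MYCIN system [6], and the E. coli database EcoCyc [7]. Because these early methods have shortcomings when it comes to expressing and visualizing the knowledge and relations in a knowledge graph, new representation methods are needed to improve on or supplement them for medical knowledge.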

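Ontology-based representation expresses knowledge as a network: a triple (entity 1, relation, entity 2) describes two connected nodes (entities). It gradually gained acceptance after knowledge graphs were proposed. It resembles semantic network representation, but an ontology concentrates on the inherent characteristics of entities, is more focused and goes deeper, and therefore has greater potential for development. Ontology description languages are varied; the main ones are RDF and RDF-S, DAML, and OWL. Representing medical terminology with an ontology makes it possible to build powerful, interoperable medical information systems, to meet the need for sharing medical data, and to provide statistical aggregation based on different semantic standards. The main difficulties in building a medical ontology are that the structure and concepts of medical terminology must be summarized, which is hard going, and that expressive power is limited, so complex medical knowledge is difficult to express effectively. Existing medical ontology knowledge bases include the medical concept knowledge base LinkBase [8] and the TAMBIS ontology (TaO) [9].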
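As a minimal sketch of the (entity 1, relation, entity 2) triple form described above, the snippet below stores a few triples and queries them with wildcards. The `Triple` type, the `match` helper, and the sample facts are hypothetical and are not drawn from any of the cited knowledge bases; a real system would more likely use an RDF store.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    head: str
    relation: str
    tail: str


def match(triples, head=None, relation=None, tail=None):
    """Return the triples that agree with every field that is not None."""
    return [
        t for t in triples
        if (head is None or t.head == head)
        and (relation is None or t.relation == relation)
        and (tail is None or t.tail == tail)
    ]


# Hypothetical facts, written as (entity 1, relation, entity 2).
facts = [
    Triple("gastritis", "is-a", "stomach disease"),
    Triple("gastritis", "has_symptom", "stomach pain"),
    Triple("stomach", "part-of", "digestive system"),
]

symptoms = [t.tail for t in match(facts, head="gastritis", relation="has_symptom")]
```

Leaving a field as `None` treats it as a wildcard, so `match(facts, relation="is-a")` returns every hierarchical fact in the list.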

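The number of nodes in a knowledge graph affects the structural complexity of the network and, in turn, the efficiency and difficulty of reasoning. Knowledge representation learning uses machine learning to represent the semantic information of the objects under study as dense, low-dimensional vectors. This effectively addresses data sparsity and thereby improves the performance of knowledge fusion and reasoning [10]. A low-dimensional vector representation is a distributed representation [11]: it imitates the way the human brain works, using many units together to store an object, and represents the semantic information of the object as a low-dimensional vector.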

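Representative models in knowledge representation learning include structure embedding (SE) [12], the single layer model (SLM) [13], the latent factor model (LFM) [14], and translation models based on TransE [15]. These models take into account the interplay between entities and the computational cost. They represent entities as vectors, apply the corresponding matrix transformations to the entity vectors and relations, and assess the association between entities as a whole, which provides an important reference for later knowledge completion. Kleyko et al. [16] verified that distributed representations can be used to represent medical images for classification, with results comparable to established methods. Henriksson et al. [17] compared several knowledge representation methods for records in EHRs, including diagnosis records and medication records. Knowledge representation learning thus offers a new line of thinking for knowledge representation in medical knowledge graphs.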
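To make the translation idea behind TransE concrete, here is a minimal sketch of its scoring function, in which a true triple should satisfy h + r ≈ t, together with a margin-based ranking loss. The two-dimensional embeddings are hand-picked for illustration and are an assumption of this example; in practice they are learned, and the entity names are hypothetical.

```python
import math


def transe_score(head, relation, tail):
    """L2 distance ||h + r - t||; a smaller value means a more plausible triple."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def margin_loss(positive_score, negative_score, margin=1.0):
    """Zero once the true triple beats the corrupted one by at least the margin."""
    return max(0.0, margin + positive_score - negative_score)


# Hand-picked toy embeddings (assumed, not learned).
entity = {
    "gastritis": [1.0, 0.0],
    "stomach pain": [1.0, 1.0],
    "fracture": [0.0, 0.0],
}
relation = {"has_symptom": [0.0, 1.0]}

good = transe_score(entity["gastritis"], relation["has_symptom"], entity["stomach pain"])
bad = transe_score(entity["gastritis"], relation["has_symptom"], entity["fracture"])
loss = margin_loss(good, bad)
```

Training would adjust the vectors so that, as here, true triples score lower than triples with a corrupted head or tail.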

1.2 Medical knowledge extraction

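Building a medical knowledge graph mainly means extracting entities, relations, and attributes from unstructured data, either manually or automatically. In manual extraction, experts collect and organize information according to set rules in order to extract knowledge. Medical knowledge bases built manually so far include a clinical medicine knowledge base [18], SNOMED-CT, and ICD-10. Automatic extraction uses information extraction techniques from machine learning, artificial intelligence, and data mining to pull the basic building blocks of a knowledge graph out of data sources automatically. A typical example of an automatically built medical knowledge base is the Unified Medical Language System, UMLS [19]. Manual extraction is too costly, so automatic knowledge extraction is the current research focus and the direction that knowledge graph construction is taking. This section mainly describes how to extract knowledge and information from data sources automatically, including entity, relation, and attribute extraction.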

1.2.1 Entity extraction

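Entity extraction identifies the medical entities in a text. Its purpose is to recognize key concepts so that relations and other information can be extracted in later steps, and to express the recognized concepts in a standardized form. Medical entity extraction pulls named entities of specific types out of medical data sources. This section groups the extraction methods into three categories.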

1) Methods based on medical dictionaries and rules

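These methods extract medical entities from a corpus by matching against manually defined rules and patterns, using either a generated dictionary or an existing medical dictionary. This is challenging. First, no dictionary currently covers every type of entity, so simple text matching algorithms are not enough for entity recognition. Second, the same word or phrase can refer to different things depending on context. Third, many biological or drug entities have several names at once (for example, PTEN and MMAC1 refer to the same gene). For these reasons, dictionary- and rule-based methods were widely used only in the early period. Friedman et al. [20] recognized medical information in electronic medical records using custom semantic patterns and grammar. Wu et al. [21] obtained good experimental results using CHV [22] and SNOMED-CT as medical dictionaries. Although these methods can reach high precision, they cannot capture everything, and they depend heavily on dictionaries and rules written by experts, so they cannot keep up with the steady stream of new vocabulary in medicine.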
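A minimal sketch of the dictionary-based approach, under stated assumptions, is given here: a greedy longest-match lookup that also maps synonyms to one canonical name. The lexicon and the `find_entities` helper are hypothetical; only the PTEN/MMAC1 pairing comes from the text. The sketch also shows the limitation noted above, since any term missing from the lexicon is silently skipped.

```python
import re

# Hypothetical lexicon: surface form (lowercase) -> canonical entity name.
LEXICON = {
    "pten": "PTEN",
    "mmac1": "PTEN",  # synonym of PTEN, as noted in the text
    "gastric cancer": "gastric cancer",
    "stomach cancer": "gastric cancer",
}


def find_entities(text, lexicon):
    """Greedy longest-match lookup; returns (surface form, canonical name) pairs."""
    tokens = re.findall(r"\w+", text.lower())
    max_len = max(len(key.split()) for key in lexicon)
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in lexicon:
                found.append((candidate, lexicon[candidate]))
                i += n
                break
        else:
            i += 1  # no dictionary entry starts at this token
    return found


entities = find_entities("MMAC1 mutations are reported in stomach cancer.", LEXICON)
```

Mapping both `mmac1` and `pten` to the same canonical name is one simple way to handle entities with multiple names; handling words whose meaning depends on context would need more than a lookup table.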

2) Machine learning methods based on medical data sources and mathematical models

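These methods use statistics and machine learning, together with the characteristics of medical data sources, to train models for entity recognition. For English medical entity extraction, a representative annotated resource is the i2b2 2010 [23] corpus of annotated English electronic medical records. Evaluations such as SemEval (http://www.senseval.org/) and NTCIR (http://research.nii.ac.jp/ntcir), as well as the NCBI [24] corpus, also provide annotated data for English medical entities.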

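Commonly used models include the hidden Markov model (HMM), the conditional random field (CRF), and the support vector machine (SVM). Kazama et al. [25] used an SVM for medical entity recognition, adding features such as POS tags, a word cache, and HMM states obtained through unsupervised training. The method improved accuracy on the GENIA corpus and, in particular, can be applied fairly efficiently to large corpora. Zhou et al. [26] trained an HMM on a range of features, including word formation, morphology, POS tags, semantic triggers, and name aliases, and reported a recognition precision of 66.5% and a recall of 66.6% on the GENIA corpus. Chen et al. [27] used the MedLEE system to recognize information in medical text, applying natural language processing to journal abstracts. Medical entity recognition often works with relatively small text collections and knowledge bases. The work in [28] automatically downloaded several thousand documents and, drawing on UMLS, reached an entity recognition precision of 64.0% and a recall of 77.1%. Although these figures are not high, the work gave later researchers a feasible line of approach.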
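The HMM mentioned above tags a sentence by finding the most probable sequence of hidden states, which is usually done with Viterbi decoding. The sketch below is a generic textbook implementation and does not reproduce any cited system. The two states, the probabilities, and the small fallback probability for unseen words are all made-up assumptions for illustration.

```python
def viterbi(observations, states, start_p, trans_p, emit_p, unseen=1e-6):
    """Return the most probable state sequence for the observed tokens."""
    # best[t][s] = (probability of the best path ending in s at step t, previous state)
    best = [{s: (start_p[s] * emit_p[s].get(observations[0], unseen), None)
             for s in states}]
    for obs in observations[1:]:
        prev = best[-1]
        layer = {}
        for s in states:
            prob, back = max((prev[p][0] * trans_p[p][s], p) for p in states)
            layer[s] = (prob * emit_p[s].get(obs, unseen), back)
        best.append(layer)

    # Backtrack from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for layer in reversed(best[1:]):
        state = layer[state][1]
        path.append(state)
    return path[::-1]


# Made-up toy parameters: "O" = outside any entity, "ENT" = medical entity.
states = ["O", "ENT"]
start_p = {"O": 0.8, "ENT": 0.2}
trans_p = {"O": {"O": 0.7, "ENT": 0.3}, "ENT": {"O": 0.6, "ENT": 0.4}}
emit_p = {
    "O": {"patient": 0.45, "has": 0.45, "gastritis": 0.10},
    "ENT": {"patient": 0.05, "has": 0.05, "gastritis": 0.90},
}

tags = viterbi(["patient", "has", "gastritis"], states, start_p, trans_p, emit_p)
```

In a real tagger the start, transition, and emission probabilities are estimated from an annotated corpus rather than written by hand, and log probabilities are normally used to avoid underflow on long sentences.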

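Using machine learning for medical entity recognition depends on manually annotated data, and annotation demands a high level of domain expertise. Current research therefore looks at how to use unlabeled data to improve model performance and how to learn from small amounts of annotated data.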

3) Deep learning methods

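Deep learning has come to be widely used for named entity recognition. A representative model is the neural network proposed by Collobert et al. [29] in 2011, which outperformed traditional algorithms. Sahu et al. [30] combined CNN and RNN methods with word embeddings, improving on the best algorithms of the time while reducing the need for feature engineering.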

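In medicine, Wei et al. [31] combined a CRF with a bidirectional RNN, using an SVM to merge their results, for disease named entity recognition. The deep learning model most commonly used today for named entity recognition on medical information is BiLSTM-CRF. Jagannatha et al. [32], for example, compared CRF, BiLSTM, BiLSTM-CRF, and several improved variants on named entity recognition in English electronic medical records. Their experiments showed that all the LSTM-based models performed better than the CRF, and that combining BiLSTM with CRF raised accuracy by a further 2% to 5%.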
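Comparisons such as the one above rest on entity-level precision, recall, and F1. The sketch below shows one common way to compute them from tag sequences. It assumes the BIO tagging scheme and exact span matching, neither of which is specified in the text, and the helper names and sample tags are hypothetical.

```python
def bio_to_spans(tags):
    """Convert BIO tags to a set of (start, end, label) spans, end exclusive."""
    spans = set()
    start = label = None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing entity
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, label))
            start = label = None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def precision_recall_f1(gold_tags, predicted_tags):
    """Entity-level scores: a prediction counts only if span and label both match."""
    gold, predicted = bio_to_spans(gold_tags), bio_to_spans(predicted_tags)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical gold and predicted tags for a four-token sentence.
gold = ["B-problem", "I-problem", "O", "B-treatment"]
predicted = ["B-problem", "I-problem", "O", "O"]
precision, recall, f1 = precision_recall_f1(gold, predicted)
```

Here the model finds one of the two gold entities and predicts nothing wrong, so precision is 1.0 and recall is 0.5.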

1.2.2 Entity relation extraction

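This section divides medical entity relation extraction into two kinds: extraction of hierarchical relations between medical entities of the same type, such as the relation between stomach disease and gastritis; and extraction of relations between entities of different types, such as disease and symptom.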

1) Extracting hierarchical relations between medical entities of the same type

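Hierarchical relations between medical entities of the same type are relatively uniform, consisting mainly of is-a and part-of relations. Because medicine has a rigorous disciplinary system and strict professional norms, these relations are usually spelled out in medical dictionaries, encyclopedias, and information standards.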

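Medical dictionaries and databases such as ICD-10 [33] and SNOMED concentrate on classifying and standardizing medical terminology and controlled vocabularies. They are authoritative, broad in coverage, dependable in both quantity and quality, and widely accepted by the medical profession, which makes them the first choice of data source for extracting hierarchical entity relations. Depending on the data formats and open API interfaces that the dictionaries and knowledge bases provide, the hierarchical structure can be extracted with techniques such as web crawlers, regular expressions, and D2R mapping, yielding the hypernym and hyponym relations between entities.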
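One way to read hierarchy out of a coded dictionary with a regular expression is sketched below. It assumes codes of the form letter, two digits, and an optional decimal part (for example `K29` and `K29.7`), and treats the three-character category as the parent of its subcategories. Higher levels of the classification, such as blocks and chapters, and extended code formats are not handled; the function names are hypothetical.

```python
import re

# Assumed code shape: one letter, two digits, optional ".digits" subcategory.
CODE_PATTERN = re.compile(r"^([A-Z]\d{2})(?:\.(\d+))?$")


def parent_code(code):
    """Return the three-character category of a subcategory code, else None."""
    found = CODE_PATTERN.match(code)
    if found is None:
        raise ValueError(f"unrecognized code format: {code!r}")
    category, subcategory = found.groups()
    return category if subcategory else None


def is_a_triples(codes):
    """Build (child, 'is-a', parent) triples for every subcategory code."""
    triples = []
    for code in codes:
        parent = parent_code(code)
        if parent is not None:
            triples.append((code, "is-a", parent))
    return triples


hierarchy = is_a_triples(["K29", "K29.7", "K29.0"])
```

The resulting triples can be loaded straight into a triple store, with the code labels attached as attributes taken from the dictionary itself.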

2) Extracting relations between medical entities of different types

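Relations between medical entities of different types can be recognized from two kinds of data source. One is encyclopedias and structured sources such as MEDLINE and UMLS. The other is semi-structured electronic medical records, where the entities are mainly diseases, symptoms, treatments, and drugs. The usual approach today is to define in advance the relation types to be extracted and then turn extraction into a classification task. There is not yet a unified standard for predefining entity relations. The choice depends on the schema graph design, entity recognition, data sources, purpose, and application scenario of the knowledge graph being built. In the i2b2 2010 evaluation, for example, entity relations in electronic medical records are divided into relations between medical problems, between medical problems and treatments, and between medical problems and tests.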

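Uzuner et al. [34] extracted medical entity relations at the sentence level, training an SVM classifier on features such as entity order, distance, grammar, and vocabulary. Through comparative experiments they showed that lexical features play an important role in entity relation recognition. Building on this, Frunza et al. [35] extracted several types of relation involving diseases from MEDLINE abstracts, and relations have also been extracted for medical entities based on UMLS with good experimental results. Abacha et al. [36] used a hybrid of manually written patterns and an SVM, reaching an average F-value of 94.07%. The study points out that pattern matching does most of the work when training data is scarce, while the SVM does most of the work when training data is sufficient.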
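The kind of features mentioned above (entity order, distance, and the words in context) can be sketched as a plain feature extractor whose output would then go to a classifier such as an SVM. This is an illustrative assumption about what such features might look like, not the feature set of any cited system; the function name, the span convention, and the example sentence are hypothetical.

```python
def relation_features(tokens, span_a, span_b):
    """Features for a candidate entity pair.

    Spans are (start, end) token offsets with an exclusive end and must not overlap.
    """
    first, second = sorted([span_a, span_b])
    between = tokens[first[1]:second[0]]
    return {
        "a_before_b": span_a[0] < span_b[0],            # entity order
        "token_distance": len(between),                 # distance between entities
        "words_between": [w.lower() for w in between],  # lexical context
    }


# Hypothetical sentence with a treatment mention (a) and a problem mention (b).
tokens = "Aspirin was given for headache".split()
features = relation_features(tokens, span_a=(0, 1), span_b=(4, 5))
```

A classifier would need these turned into numbers, for example by one-hot encoding the words between the two mentions.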

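In addition, among comparative studies of classification methods for relation recognition, De Bruijn et al. [37] used the i2b2 2010 evaluation to compare supervised classification with semi-supervised classification based on self-training, and examined how UMLS, syntactic information, and unlabeled data affect relation recognition. Less work has been done on methods that do not predefine relation types and then convert the task into classification. Such methods extract relations by pattern matching or co-occurrence statistics. For example, relations have been extracted from MEDLINE abstracts by counting co-occurrences, and a relation graph was built from the results [38]; relations have also been extracted from MEDLINE abstracts through syntactic graphs and pattern matching [39].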
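Co-occurrence statistics of the kind described above can be sketched in a few lines: count how often two entities are mentioned in the same abstract and keep the pairs that pass a threshold as candidate edges of a relation graph. The sample data, the threshold, and the function names are made-up assumptions, and the sketch says nothing about how the cited work weighted or filtered its counts.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count, for each unordered entity pair, the documents mentioning both."""
    counts = Counter()
    for entities in documents:
        for pair in combinations(sorted(set(entities)), 2):
            counts[pair] += 1
    return counts


def candidate_edges(counts, min_count=2):
    """Keep pairs seen together in at least min_count documents."""
    return sorted(pair for pair, n in counts.items() if n >= min_count)


# Made-up entity mentions, one list per abstract.
abstracts = [
    ["gastritis", "stomach pain"],
    ["gastritis", "stomach pain", "omeprazole"],
    ["omeprazole", "gastritis"],
]

counts = cooccurrence_counts(abstracts)
edges = candidate_edges(counts)
```

Raw counts favor frequent entities, so a measure such as pointwise mutual information is often used in place of a simple threshold.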
