About: Projection Word Embedding Model With Hybrid Sampling Training for Classifying ICD-10-CM Codes: Longitudinal Observational Study
An Entity of Type: schema:ScholarlyArticle, within Data Space: wasabi.inria.fr, associated with source document(s)
Type: Academic Article; research paper; schema:ScholarlyArticle
Attributes / Values
type: Academic Article; research paper; schema:ScholarlyArticle
isDefinedBy: Covid-on-the-Web dataset
has title: Projection Word Embedding Model With Hybrid Sampling Training for Classifying ICD-10-CM Codes: Longitudinal Observational Study
Creator: Lin, Chin; Fang, Wen-Hui; Hsu, Chia-Jung; Lee, Chia-Cheng; Lou, Yu-Sheng; Tsai, Dung-Jang; Wang, Mei-Chuen; Wu, Ding-Chung; Eysenbach, Gunther; Kakarmath, Sujay; Mendes, David; Paixao, Klerisson; Wang, Yanshan
Source: PMC
abstract:
BACKGROUND: Most current state-of-the-art models for searching the International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) codes use word embedding technology to capture useful semantic properties. However, they are limited by the quality of the initial word embeddings. Word embeddings trained on electronic health records (EHRs) are considered the best, but their vocabulary diversity is limited by previous medical records. Thus, we require a word embedding model that maintains the vocabulary diversity of open internet databases and the medical terminology understanding of EHRs. Moreover, we need to consider the particularity of disease classification, wherein discharge notes present only positive disease descriptions.
OBJECTIVE: We aimed to propose a projection word2vec model and a hybrid sampling method, and to conduct a series of experiments to validate the effectiveness of these methods.
METHODS: We compared the projection word2vec model with the traditional word2vec model using two corpora: English Wikipedia and PubMed journal abstracts. We used seven published datasets to measure the medical semantic understanding of the word2vec models and used these embeddings to identify the three-character-level ICD-10-CM diagnostic codes in a set of discharge notes. Building on this embedding improvement, we also applied the hybrid sampling method to improve accuracy. We used 94,483 labeled discharge notes from the Tri-Service General Hospital of Taipei, Taiwan, from June 1, 2015, to June 30, 2017. To evaluate model performance, 24,762 discharge notes from July 1, 2017, to December 31, 2017, from the same hospital were used, and 74,324 additional discharge notes collected from seven other hospitals were tested. The F-measure, the major global measure of effectiveness, was adopted.
RESULTS: In medical semantic understanding, the original EHR embeddings and PubMed embeddings exhibited performance superior to the original Wikipedia embeddings. After projection training was applied, the projection Wikipedia embeddings showed an obvious improvement but did not reach the level of the original EHR or PubMed embeddings. In the subsequent ICD-10-CM coding experiment, the model that used both projection PubMed and Wikipedia embeddings had the highest testing mean F-measure (0.7362 at Tri-Service General Hospital and 0.6693 at the seven other hospitals). Moreover, the hybrid sampling method further improved model performance (F-measure = 0.7371/0.6698).
CONCLUSIONS: The word embeddings trained on EHRs and PubMed understood medical semantics better, and the proposed projection word2vec model improved medical semantics extraction in Wikipedia embeddings. Although the improvement from the projection word2vec model in the real ICD-10-CM coding task was not substantial, the models could effectively handle emerging diseases. The proposed hybrid sampling method enables the model to behave like a human expert.
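The abstract names the F-measure as its global measure of effectiveness for multi-label ICD-10-CM coding. As a hedged illustration only (not the authors' implementation; the function name and the sample codes below are assumptions), a micro-averaged F-measure over per-note sets of predicted versus gold three-character ICD-10-CM codes can be sketched as:

```python
def micro_f_measure(gold, pred):
    """Micro-averaged F-measure over multi-label code assignments.

    gold, pred: lists of sets of code strings, one set per discharge note.
    Counts true positives, false positives, and false negatives globally
    across all notes, then combines precision and recall harmonically.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # codes predicted and present in the gold set
        fp += len(p - g)   # codes predicted but absent from the gold set
        fn += len(g - p)   # gold codes the model missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical example: two notes with three-character ICD-10-CM codes
gold = [{"I10", "J18"}, {"E11"}]
pred = [{"I10"}, {"E11", "J18"}]
print(round(micro_f_measure(gold, pred), 4))  # → 0.6667
```

Micro-averaging weights every code occurrence equally across notes, which is one common convention for a single global F-measure; the paper does not specify its averaging scheme here, so this choice is an assumption.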
has issue date: 2019-07-23 (xsd:dateTime)
bibo:doi: 10.2196/14499
bibo:pmid: 31339103
has license: cc-by
schema:url: https://doi.org/10.2196/14499
resource representing a document's title: Projection Word Embedding Model With Hybrid Sampling Training for Classifying ICD-10-CM Codes: Longitudinal Observational Study
has PubMed Central identifier: PMC6683650
has PubMed identifier: 31339103
schema:publication: JMIR Med Inform
resource representing a document's body: covid:PMC6683650#body_text
is schema:about of:
named entity 'ICD-10-CM'
named entity 'word2vec'
named entity 'word2vec'
named entity 'semantic'
named entity 'EHR'
named entity 'word2vec'
named entity 'PubMed'
named entity 'Wikipedia'
named entity 'PubMed'
named entity 'word2vec'
named entity 'coevolution'
named entity 'cosine similarity'
named entity 'ICD-10-CM'
named entity 'sampling'
named entity 'medical term'
named entity 'NLP'
named entity 'ICD-10-CM'
named entity 'PubMed'
named entity 'Multimedia'
named entity 'MXNet'
named entity 'database'
named entity 'word embedding'
named entity 'NLP'
named entity 'neoplasm'
named entity 'EHRs'
named entity 'EHRs'
named entity 'AlexNet'
named entity 'green point'
named entity 'data sources'
named entity 'hypertension'
named entity 'Tri-Service General Hospital'
named entity 'Wikipedia'
named entity 'medical applications'
named entity 'medical coders'
named entity 'SARS'
named entity 'learning rate'
named entity 'EHR'
named entity 'semantics'
named entity 'pneumonia'
named entity 'database'
named entity 'Mayo Clinic'
named entity 'Research ethics'
named entity 'EHRs'
named entity 'PubMed'
named entity 'linear projection'
named entity 'word embeddings'
named entity 'word2vec'
named entity 'Taichung'
named entity 'word embedding'
named entity 'pneumonia'
named entity 'similarity scores'
named entity 'discharge note'
named entity 'PubMed'
named entity 'NLP'
named entity 'medical records'
named entity 'cosine similarity'
named entity 'NLP'
named entity 'NLP'
named entity 'oversampling'
named entity 'unstructured data'
named entity 'word2vec'
named entity 'Word embedding'
named entity 'Wikipedia'
named entity 'PubMed'
named entity 'training set'
named entity 'transfer learning'
named entity 'ICD-10-CM'
named entity 'Tri-Service General Hospital'
named entity 'word2vec'
named entity 'hypertension'
Page 1 of 6
Faceted Search & Find service v1.13.91 as of Mar 24 2020
OpenLink Virtuoso version 07.20.3229 as of Jul 10 2020, on Linux (x86_64-pc-linux-gnu), Single-Server Edition (94 GB total memory)
Data on this page belongs to its respective rights holders.
Virtuoso Faceted Browser Copyright © 2009-2024 OpenLink Software