
Deep Learning for Electronic Health Record Analytics

Gaspard Harerimana, Member, IEEE; Jong Wook Kim, Member, IEEE; and Beakchol Jang, Member, IEEE
Corresponding author: [email protected]

Abstract—Recent technological advancements have led to a deluge of medical data from various domains. However, the recorded data from divergent sources come poorly annotated, noisy, and unstructured. Hence, the data are not fully leveraged to establish actionable insights that can be used in clinical applications. The data recorded in a hospital's Electronic Health Record (EHR) consist of patient information, clinical notes, charted events, medications, procedures, laboratory test results, diagnosis codes, etc. Traditional machine learning and statistical methods have failed to offer insights that can be used by physicians to treat patients, as they need expert-opinion-assisted features before building a predictive model. With the rise of deep learning methods, there is a need to understand how deep learning can save lives. The purpose of this study was to offer an intuitive explanation of possible use cases of deep learning with Electronic Health Records. We reflect on techniques that can be applied by health informatics professionals by giving technical intuitions and blueprints on how each clinical task can be approached by a deep learning algorithm.

Index Terms—Electronic Health Records, Convolutional Neural Networks, Recurrent Neural Networks, Adverse Drug Events, EHR raw features.

I. INTRODUCTION

The Health Information Technology for Economic and Clinical Health (HITECH) Act of 2009 led to an increase in the adoption of Electronic Health Records (EHR) [1] by hospitals. Hospitals and other points of care have diversified their efforts in constructing robust Electronic Health Record facilities to capture and leverage these data, which are usually ill understood. Currently there is a high ubiquity of raw health data, mainly caused by the abundance of state-of-the-art clinical testing devices and the medical Internet of Things (mIoT) [2]. This opportunity is a milestone for healthcare, and there is an undoubted belief that precision and personalized healthcare will be boosted. EHRs contain highly multidimensional, heterogeneous, multimodal, irregular, time series data such as laboratory test results, doctor notes, medication prescriptions, demographic information, diagnoses, epidemiology, and behavioral data. With these vast data, clinical tasks can range from critical care to long-term planning. Data in the EHR can help in choosing a treatment, finding patient similarity, integrating genomics data for personalized treatment, and predicting the number of days that a patient will spend in the hospital. However, due to this high heterogeneity, there is a high probability of missing or erroneous entries, resulting in high reluctance by practitioners to use these usually expensive technologies, mainly because they still need to use abductive reasoning to get clinical insights from them and perform an effective diagnosis.

Though hospitals have effectively used the EHR for administrative and corporate tasks such as patient logging, asset management, transfer management, and mainly billing operations, there is a need to find ways to effectively use the EHR for patient diagnosis. The only solution to this is the use of EHR analytic solutions that support the physician's expertise. With the recent achievements of artificial intelligence, machine learning methods ranging from simple regression to complex Recurrent Neural Networks (RNN) can be used to bridge the inferential gap for various EHR tasks. However, the various complex challenges of integrating them, coupled with the limited availability of labeled data for training models as well as the privacy issues associated with mistrust between providers, hinder the effective use of these learning systems to achieve effective care. Though deep learning techniques are highly regarded as a crosscutting novelty, there are still tasks in the EHR that can be efficiently solved by classical machine learning techniques like regression, random forests, and Bayesian methods. Machine learning has empowered the newest methods, such as computational phenotyping in medical care, as well as the integration of genomics data into clinical procedures.

A. Motivation for this study

Mining the EHR longitudinal data for clinical insights is a tiresome aspect of building health analytic solutions. Hospitals use customized EHRs which are comprised of a heterogeneous mix of elements, many of which are voluminous and unstructured. The noisiness and sparsity of the EHR require effective feature extraction and phenotyping before extracting insights from the data. Though there are various works that explore methods used to mine data from the EHR, there is a need to understand EHR data mining from an aggregation point of view. For example, adverse event prediction, a process intended to find the impending risk of a hospitalized patient, can be performed by aggregating insights from doctors' notes (unstructured text data), MRI tests (image data), the ICD-10 nomenclature database (structured text data), etc. Hence, this process needs an analytic solution that aggregates insights from these diverging data. In this paper we intend to help EHR analytics designers use deep learning techniques for effective analytics to be included in Clinical Decision Support Systems (CDSS) by tipping them with techniques and mechanisms to extract, transform, load, and leverage disparate EHR data.
TABLE I
STUDIES THAT COVERED DEEP LEARNING APPLICATION TO EHR

Study: Deep EHR: a survey of recent advances in deep learning techniques for electronic health record (EHR) analysis [5]. Year: 2018. Approach: Covered the aspects of EHR data and recent deep-learning-in-healthcare projects. The paper describes some of the deep learning algorithms used in health informatics and some of the clinical tasks that leverage deep learning methods. It does not provide a detailed technical implementation.

Study: Deep learning for health informatics [6]. Year: 2017. Approach: Covered the different deep learning architectures used in health informatics, as well as various software packages that provide deep neural network implementations. Mainly covered the clinical tasks in which deep learning is used, such as medical imaging and pervasive sensing.

Study: Opportunities and challenges in developing deep learning models using electronic health records data: a systematic review [7]. Year: 2018. Approach: A literature review covering clinical analytics tasks that leverage EHR data. It also covers various deep learning algorithms that are commonly used in health informatics.

Study: Deep learning in pharmacogenomics: from gene regulation to patient stratification [8]. Year: 2018. Approach: Also covered the opportunities and challenges of deep learning in healthcare, focusing mainly on issues of pharmacogenomics and drug targets.

Study: Our study: Deep learning for Electronic Health Record analytics. Year: 2019. Approach: We describe the challenges of applying deep learning to EHR data. The focus of the paper is to provide technical intuitions on the utilization of deep learning algorithms for each task of EHR clinical knowledge discovery. We try to cover each task that clinicians perform on a daily basis. This will help EHR-based application developers to consider a task-oriented approach rather than a data-oriented approach in analytics development.

B. Organization of this paper

This paper is organized as follows. In Section II, we cover the related works to review the approaches used by various authors in coming closer to providing a concise insight into applying deep learning methods to the EHR. In Section III, we cover the anatomy and structure of EHR data using examples from a real EHR database, and we explore the various aspects of these data, unraveling hidden patterns. In Section IV, we cover the challenges that an EHR analytics designer is likely to face. In Section V, we try to tip developers by giving a glance at techniques per clinical task, covering a successful case study for each. In Section VI, we conclude by giving future directions.

C. Methodology

We considered the hospital's workflow by covering clinical tasks that can be performed by clinicians. For each task we give insights about the type of EHR data that can be used, we show the challenges associated with the task, and we give a blueprint of an appropriate deep learning model by either analyzing an already-made model or proposing how one can be designed to produce the required insights. Our approach is to answer the question "how did they do it?" wherever there is an existing deep learning solution to an EHR task, and "how can we do it?" from our own perspective. Due to the high complexity of applying deep learning to the EHR, we try to explain the concepts in simple terms. However, because of the complexity of deep learning, it is impossible to thoroughly explain every concept; hence a modest understanding of machine learning is required to understand the content of this work.

II. RELATED WORKS

Using the vastly available EHR data for clinical analytics has recently gained a great deal of attention. However, few studies have come up with a complete set of methodologies and techniques that can be used to mine this unexplored big data. Most related research has focused on applying data mining methods to a single aspect of EHR data mining. Ching et al. [3] thoroughly discussed opportunities and challenges in using deep learning for biology and medicine; though this study was quite exhaustive, it did not elaborate on the technical side of the processes involved. [4] covered Deep EHR by surveying recent advances in deep learning techniques for the EHR; that study focused on identifying key works done in deep learning for the EHR. These works and their approaches are detailed in Table I.

III. ANATOMY OF THE EHR DATA

In this section we cover the structure and anatomy of EHR data. The EHR is composed of huge data sourced from daily recordings by practitioners and hospital instruments. Each patient record is saved in a table in the EHR warehouse. Though the structure of the database can vary depending on specific medical and computing requirements, the examples and cases described in this section are retrieved from the MIMIC-III dataset [115]. A more detailed description of EHR data is found in Table II, with the following components being the major constituents.
TABLE II
ANATOMY OF EHR DATA (WITH INTUITIONS FROM THE MIMIC-III DATA SET)

Type of data: Patient demographic information. Data source: Hospital database. Purpose: The background information, including name (de-identified for privacy), age, allergies, and any other demographic data. Example entries: Admission_type: Emergency; Admission location: Emergency Room: Admit; Discharge location: DISC-TRAN CANCER/CHLDRN H; Diagnosis: BENZODIAZEPINE OVERDOSE. Deep learning tasks: Adverse event detection [15]; clinical trial recruitment.

Type of data: Clinical notes/manuscripts. Data source: Notes filled in by physicians. Purpose: Describe patients' conditions during hospitalization or at discharge; contain prescriptions, reactions, and adverse conditions. Example entries: See Fig. 1. Deep learning tasks: Chronic disease prediction [16] [17]; medication information extraction [18]; disease phenotyping [19].

Type of data: Lab tests and mIoT events. Data source: Hospital database. Purpose: Record of all laboratory measurements based on their LOINC codes, as defined in another table. Example entries: 2160-0: Creatinine, blood; 8248-7: Sperm test, urine; 2950-4: Sodium body fluid, chemistry. Deep learning tasks: Adverse event detection [20]; adverse event prediction [21] [22]; disease phenotyping [23].

Type of data: Drug codes. Data source: Hospital dataset. Purpose: Contains RxNorm codes for each drug in the hospital. Example entries: 00597016203: Dulcolax. Deep learning tasks: Medication information extraction [24].

Type of data: Diagnosis standard codes. Data source: Online standard dataset. Purpose: Data for ICD diagnoses, where each code corresponds to a single diagnostic concept. Example entries: 0880: Bartonellosis; 08881: Lyme disease; 08882: Babesiosis; 0914: Syphilis adenopathy. Deep learning tasks: Disease phenotyping [25].

Type of data: EHR events. Data source: Hospital data records for each event per patient. Purpose: The friendly/adverse event tables record all events such as ventilator settings, MRI readings, laboratory values, code status, mental status, etc. Example entries: 27: Abdominal Assessment; 41: Alarm activated; 81: Brachial pulse. Deep learning tasks: Adverse event detection [26]; adverse event prediction [27]; patient phenotyping.

Type of data: Treatment procedures. Data source: Standard CPT coding. Purpose: Contains the types of procedures as coded in the CPT standard; the specific procedures performed on each patient are recorded in a separate table. Example entries: 161: Removal of penetrating foreign body from eye; 1522: Shortening procedure on one extraocular muscle. Deep learning tasks: Adverse event detection; adverse event prediction.

A. Patient information

Patient basic information is perhaps the simplest and most structured of the EHR data. It contains basic information about a patient, such as his hospital ID, an identifier which identifies the patient throughout his stay, as well as his gender, date of birth, date of discharge (or death), and other charted patient data. It also holds data related to admission, such as admission time, discharge time, admission type (emergency or pre-planned), insurance information, and any other basic information of interest. Though the recording of this information looks straightforward, many hospital workers do not consider accurate recording as one of their critical tasks. This becomes worse for physicians, who are mostly preoccupied with saving lives rather than proper recording; hence errors in EHR records can be introduced at any point. For instance, statistics at the English National Health Service (NHS) showed that about 20,000 adults were recorded in pediatric outpatient services; similarly, 17,000 men were admitted to obstetrical services, and 8,000 men were admitted to gynecology services [9].

B. Clinical notes/manuscripts

Perhaps the richest but most unstructured, vague, and noisy of all EHR data are the physicians' and nurses' clinical notes. The 2018 national physician poll [10] showed that though physicians view the EHR as necessary, they did not view it as a powerful clinical tool but as a mere data storage tool, and surprisingly only half of them agreed that using an EHR detracts from their clinical effectiveness. Moreover, the EHR does not provide a cognitive support design, which causes doctors to be reluctant to use the EHR interfaces and to continue relying on their manuscript-based documentation to reduce clinical burnout. Deep learning methods help in transforming these manuscripts into database-readable formats. Fig. 1 shows an example of a clinical note as extracted from the MIMIC-III database. Clinical notes can be analyzed using deep learning models to predict adverse events like heart attack, death, hospitalization length, etc. However, these notes must first be processed by a vectorization and feature representation algorithm before being fed to a deep learning model.

C. Laboratory measurements and mIoT readings

The EHR has a lab events table that is associated with the lab measurements of each patient. Each laboratory observation is linked to a lab item which is defined in another table containing all the definitions for laboratory measurements. The definitions contain the Logical Observation Identifiers Names and Codes (LOINC) for lab measurements. For instance, the MIMIC-III hospital dataset's lab events table has 27,854,055 lab events associated with all of its roughly 60,000 patients.

D. Medication, diagnosis, procedure, and drug codes

These EHR sections contain standard codes for diseases and symptoms described by the International Classification of Diseases (ICD) [11] and diagnosis-related group (DRG) codes (used for identifying billable items that the patient received) [12].
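As a concrete illustration of how these LOINC-coded lab events can be pulled together per patient, the following is a minimal sketch using pandas on the MIMIC-III CSV exports. It assumes the standard MIMIC-III file names (LABEVENTS.csv, D_LABITEMS.csv) and columns (SUBJECT_ID, ITEMID, LOINC_CODE, VALUENUM); adapt the paths and column names to your own EHR export.

```python
import pandas as pd

# Load the lab events and the lab item definitions (LOINC codes live in D_LABITEMS).
# File and column names follow the public MIMIC-III CSV layout; adjust for other EHRs.
lab_events = pd.read_csv("LABEVENTS.csv",
                         usecols=["SUBJECT_ID", "HADM_ID", "ITEMID",
                                  "CHARTTIME", "VALUENUM", "FLAG"],
                         parse_dates=["CHARTTIME"])
lab_items = pd.read_csv("D_LABITEMS.csv",
                        usecols=["ITEMID", "LABEL", "FLUID", "LOINC_CODE"])

# Attach the LOINC code and human-readable label to every measurement.
events = lab_events.merge(lab_items, on="ITEMID", how="left")

# One row per (patient, admission, LOINC code): mean numeric value and abnormal-flag count.
per_patient = (events.dropna(subset=["VALUENUM"])
                     .groupby(["SUBJECT_ID", "HADM_ID", "LOINC_CODE"])
                     .agg(mean_value=("VALUENUM", "mean"),
                          n_abnormal=("FLAG", lambda f: (f == "abnormal").sum()))
                     .reset_index())

print(per_patient.head())
```

A table shaped like this (one row per patient/admission/measurement type) is a common starting point for the feature representation steps discussed later.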

Fig. 1. Example of a clinical note as extracted from the MIMIC-III database.

Drugs are described by their RxNorm drug classification codes [13], and treatment procedures are described by their Current Procedural Terminology (CPT) codes [14].

E. EHR events

The EHR also contains 21,146,926 rows of input events (e.g., heart rate, glucose levels), 330,712,483 charted events, 4,349,218 output events, and many other events which record whatever happens to a patient.

IV. CHALLENGES FOR EHR MINING

EHR feature engineering is moving away from the usual expert-driven feature engineering to data-driven paradigms, or a combination of both [28], for sophisticated clinical tasks like feature construction, risk factor identification, and disease phenotyping. Hence analytic processes rely on the capability to find the proper machine learning technique for a distinct task. For example, while Natural Language Processing (NLP) [29] will help in dissecting clinical insights hidden in a million clinical manuscripts, it will be of little to no help in the understanding of an MRI brain scan. The following are key challenges that analytical solutions must address to provide actionable clinical insights.

A. Complexity of EHR analytical tasks

Even NLP, which performs well in text-based sentiment analysis, will hardly understand the clinical narrative and terms used in the clinical notes recorded by expert medical care staff. The reason is that health care experts write these notes for individual or co-worker reference, with no machine learning applications in sight. Various toolkits that tailor NLP for clinical texts have been developed, such as CLAMP (Clinical Language Annotation, Modeling, and Processing) [30], a popular NLP tool that helps clinical application developers quickly build customized NLP pipelines. However, EHR tasks like the prediction of clinical events need an amalgamation of structurally diverging data, such as lab tests together with charted events and clinical notes. The more data we incorporate, the higher the prediction accuracy that is achieved. The structure of lab tests comes as text flags with varying units of measure, hence combining them with clinical notes, which are raw texts without a standard, becomes a challenging task. Also, different EHR data do not contribute equally to the illnesses that have to be predicted or detected. As an example, mental sickness might depend more on the narratives in a clinical note than on charted events, as there might not be any events associated with the patient; hence coming up with a rationalized model that combines these data is a very complex task.

B. Contexts of EHR data

Even with tools that help people design customized pipelines, challenges related to clinical data are hard to surmount. The big challenge comes from the nature of the data and the kind of insights we want from it. Clinical experts are human beings who try to find solutions to intervention problems from a causal point of view. In his study about causality and machine learning, Judea Pearl [31] argued that intervention questions cannot be answered from observational statistical information alone. He also argues that you cannot answer counterfactual questions using intervention information. In a clinical example, you cannot re-perform a trial on patients who were treated with a drug to inspect how they would have behaved had they not been given the drug. Machine learning algorithms, which are observational algorithms that use statistical data, exhibit these fundamental impediments, which make their application to clinical questions require additional extra-statistical information.

C. Small labelled data sets

Except perhaps for the clinical notes and the patient's charted events used to perform certain deep learning tasks, most EHR data lack labelled ground truth. Even a model that is built hardly gets into implementation due to lack of acceptance. Establishing the true outcome of a clinical event is a redundant operation that relies on the abductive reasoning of a physician, hence deep learning gets stranded on this problem. As an example, you cannot find enough labelled cancer images that can be used to train a CNN for future predictions. Perhaps the most appropriate solution to the lack of labelled data seems to be the use of transfer learning.

There are vastly available labelled data sets on which models have been trained for other tasks; these pre-trained models can be applied to medical problems by only tailoring the last layer of the neural network to the EHR problem in question. Authors in [32] have used transfer learning on a pre-trained RNN model to establish the phenotypes of various diseases. Another method is to use unsupervised CNN pre-training and perform a supervised fine-tuning; authors in [33] have been able to use this method to classify lung tissue in high-resolution Computed Tomography (CT) data.

V. POPULAR DEEP EHR ALGORITHMS

Deep learning is a special branch of machine learning that utilizes layered computational nodes, with each node in each layer performing a computation on its inputs and their respective weights. A non-linearity function is applied to produce the node activation. The overall Artificial Neural Network is built by updating the weights of each node to minimize the final cost associated with the deviation of the output predictions from the ground truth labels. The neural network first initializes the parameters (weights and biases) and uses forward propagation to calculate a cost; then the chain rule is used to perform back propagation for the weight updates. The process is referred to as gradient descent due to the process of finding an optimal path to a minimum cost. Various more advanced optimization algorithms that solve this basic deep learning problem have been devised and are used in practice, such as Stochastic Gradient Descent [34], RMSProp [35], AdaBoost [36], and Adam [37]. Deep learning is more effective than other machine learning algorithms in that there is no need to spend much effort on feature engineering with a domain expert; raw data can be used instead, as the features can be learned by the system. However, as we will see in later sections, due to the complexity of EHR data and the special intolerance to errors, feature representation and selection, usually assisted by a domain expert, might be key to the success of a deep learning model. In this section we briefly describe popular deep learning algorithms used with the EHR. A complete reference of these algorithms and their use with the EHR can be found in Fig. 6.

A. Sparse auto-encoders

Fig. 2. Basic architecture of auto-encoders.

This is an unsupervised representation learning method mostly used in the feature engineering stage. Auto-encoders are used for non-linear dimensionality reduction and come as a better alternative to traditional dimensionality reduction techniques like principal component analysis (PCA) [38] and singular value decomposition (SVD) [39]. As shown in Fig. 2, an auto-encoder is used to transform (encode) a much bigger vector into a much smaller data vector by taking the input x, encoding it to discover a latent feature representation, and then decoding the latent feature representation to reconstruct the input. Auto-encoders are used in applications that require feature compression, such as finding document similarity, feature reduction, etc. Many variations of auto-encoders have been used extensively. Convolutional auto-encoders are special types of auto-encoders that do not use fully connected layers (each node in a layer connected with each node in the next layer) but convolutional layers instead.
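To make the encode-decode idea concrete, here is a minimal PyTorch sketch of a denoising auto-encoder over a bag-of-codes patient vector, trained with the Adam optimizer mentioned above. The input size, layer widths, corruption rate, and the random training data are illustrative assumptions, not values from any of the cited works.

```python
import torch
import torch.nn as nn

class DenoisingAutoEncoder(nn.Module):
    def __init__(self, n_features=2000, n_latent=128):
        super().__init__()
        # Encoder compresses the sparse patient vector into a small latent code.
        self.encoder = nn.Sequential(nn.Linear(n_features, 512), nn.ReLU(),
                                     nn.Linear(512, n_latent), nn.ReLU())
        # Decoder reconstructs the original vector from the latent code.
        self.decoder = nn.Sequential(nn.Linear(n_latent, 512), nn.ReLU(),
                                     nn.Linear(512, n_features), nn.Sigmoid())

    def forward(self, x):
        corrupted = x * (torch.rand_like(x) > 0.2).float()   # randomly drop ~20% of entries
        latent = self.encoder(corrupted)
        return self.decoder(latent), latent

# Toy training loop on random multi-hot "patient" vectors (stand-ins for real EHR features).
model = DenoisingAutoEncoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCELoss()

x = (torch.rand(64, 2000) < 0.05).float()   # batch of 64 sparse binary vectors
for epoch in range(10):
    reconstruction, _ = model(x)
    loss = loss_fn(reconstruction, x)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# After training, model.encoder(x) yields compressed patient representations.
```

The learned latent vectors can then replace PCA/SVD projections as compact inputs to a downstream prediction model.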

B. Convolutional Neural Networks (CNN) [40]

Fig. 3. Basic Convolutional Neural Network.

Convolutional Neural Networks are special algorithms that perform extremely well on image classification problems. In the EHR context, CNNs can yield good results in medical image analysis such as mammography, MRI images, CT scans, etc. They can be used to detect malignant cancer cells and differentiate them from benign cells in medical images. As shown in Fig. 3, a CNN is composed of layers, where each layer is composed of a convolutional layer, a pooling layer, and an activation that produces the input to the next layer. CNNs are a special architecture in which each node from the previous layer is not connected to each node of the next layer. Rather, each layer is composed of a filter (kernel), or several filters, that are applied to the input to produce intermediate values. The resulting next-layer input is a sum of products of each input feature value with the filter; we say that a filter is convolved with the input image. Each convolution stage detects certain attributes of the input such as lines, curves, and edges. As an example, if a 256x256 image is input to a CNN, the input layer will be 256x256x3 in size (with 3 representing the RGB channels); the convolutional layer will perform a dot product between a receptive field and a kernel on all the dimensions of the input. To minimize the training time and avoid overfitting, the pooling layer reduces the dimensionality in the network by taking the maximum or the average of a certain number of input cells. At each layer an output is obtained by applying a non-linearity function, usually ReLU. A fully connected layer is added towards the end of the network, followed by a softmax layer which produces the predictions. Various special types of CNNs have been produced and are being used with the EHR, like ResNets [41], VGGNet16 [42], Inception [43], etc.
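The following is a minimal PyTorch sketch of the convolution, pooling, ReLU, fully connected, and softmax stack just described, sized for 256x256 RGB inputs and a binary malignant/benign output. The channel counts and the two-class head are illustrative assumptions rather than a prescribed architecture.

```python
import torch
import torch.nn as nn

class TinyMedicalCNN(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # 3 RGB channels -> 16 feature maps
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 256x256 -> 128x128
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 128x128 -> 64x64
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 64 * 64, n_classes),           # fully connected head
        )

    def forward(self, x):
        logits = self.classifier(self.features(x))
        # Softmax turns the logits into class probabilities (e.g. malignant vs. benign).
        return torch.softmax(logits, dim=1)

model = TinyMedicalCNN()
batch = torch.rand(4, 3, 256, 256)        # four fake 256x256 RGB scans
print(model(batch).shape)                 # -> torch.Size([4, 2])
```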
C. Recurrent Neural Networks (RNN)

Fig. 4. Basic RNN: a 1 in the output vector represents the presence of a drug name in the input text.

With some types of EHR data, like clinical notes, the inputs do not have the same length and so cannot be used with a basic ANN. For instance, some medical applications require processing a vast amount of text (like clinical notes, web-based medical query platforms, etc.) to find keywords that are relevant to standard clinical entities like ICD codes and CPT codes. This application requires performing Named Entity Recognition (NER) [44] as a primordial step to the understanding of the bulk text. To understand RNNs in a medical context, let us take a user who tweets about an Adverse Drug Event (ADE). An RNN can be used to identify the drug names present in the tweet in the process of identifying the ADE from the tweets. In Fig. 4, taking the input tweet as a vector x, we want to produce a vector y that contains a 1 in each position that holds a drug name and a 0 in each position that holds any other word. As can be seen from the figure, using an NLP dictionary we can build a one-hot encoding of each word present in the document and feed the resulting vectors to the RNN.
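A minimal PyTorch sketch of this word-level tagging idea is shown below: one-hot word vectors go through a bidirectional LSTM, and each time step is classified as drug name (1) or other (0). The vocabulary, the example sentence, and the drug label are made up for illustration.

```python
import torch
import torch.nn as nn

# Toy vocabulary and a toy "tweet"; a real system would use an NLP dictionary/tokenizer.
vocab = ["i", "took", "dulcolax", "and", "felt", "dizzy"]
word_to_idx = {w: i for i, w in enumerate(vocab)}
sentence = ["i", "took", "dulcolax", "and", "felt", "dizzy"]
labels = torch.tensor([0, 0, 1, 0, 0, 0])            # 1 marks a drug-name token

# One-hot encode the sentence: shape (sequence_length, vocab_size).
one_hot = torch.eye(len(vocab))[[word_to_idx[w] for w in sentence]]

class DrugTagger(nn.Module):
    def __init__(self, vocab_size, hidden=32):
        super().__init__()
        self.rnn = nn.LSTM(vocab_size, hidden, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, 2)           # two tags: drug / not drug

    def forward(self, x):
        h, _ = self.rnn(x.unsqueeze(0))               # add a batch dimension
        return self.out(h).squeeze(0)                 # per-token logits

model = DrugTagger(len(vocab))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()
for _ in range(50):                                   # tiny training loop on one example
    logits = model(one_hot)
    loss = loss_fn(logits, labels)
    optimizer.zero_grad(); loss.backward(); optimizer.step()

print(model(one_hot).argmax(dim=1))                   # ideally tensor([0, 0, 1, 0, 0, 0])
```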
D. Deep transfer learning: solving the small labelled data set issue

Fig. 5. Basic transfer learning: weights learned from training a cat classifier are used to predict tumor malignancy from MRI images by only changing the last layer and introducing new weights for that layer.

One of the greatest impediments to applying machine learning to EHR data is finding enough labeled data for training. For instance, if we are analyzing CT scans to find a malignant tumor, we may not find enough recorded events that can be used to train a deep learning model. Transfer learning is a deep learning technique that takes intuition from human learning, which uses the knowledge gained from one problem on another problem. In the deep learning world, we can use the weights learned while modeling one problem for another problem. As an example, in Fig. 5, we can apply a model that was trained on a cat-and-dog dataset to MRI images to classify whether a brain tumor is malignant or benign.
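A minimal sketch of this "swap the last layer" recipe with torchvision is shown below: a ResNet-18 pre-trained on ImageNet (standing in for the cat/dog classifier of Fig. 5) has its final fully connected layer replaced with a two-class head, and only that head is trained. The choice of ResNet-18 and the freezing strategy are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a network pre-trained on natural images (ImageNet).
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze every pre-trained weight so only the new head will learn.
for param in backbone.parameters():
    param.requires_grad = False

# Replace the last layer with a fresh 2-way classifier: malignant vs. benign.
backbone.fc = nn.Linear(backbone.fc.in_features, 2)

optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Toy batch standing in for preprocessed MRI slices resized to 224x224 RGB.
images = torch.rand(8, 3, 224, 224)
targets = torch.randint(0, 2, (8,))

logits = backbone(images)
loss = loss_fn(logits, targets)
loss.backward()
optimizer.step()
```

With more labelled data available, the frozen layers can also be unfrozen and fine-tuned at a smaller learning rate.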
VI. TECHNIQUES FOR EHR TASKS

A. Clinical adverse event detection

One of the primary tasks of hospitals is to detect a clinical event in real time. All the causes of clinical events, including medication, diagnoses, adverse drug events, etc., can be found buried in the longitudinal data of the EHR. Critical medical events can be conceived as negative changes in a patient's medical status. Authors in [45] have applied bi-directional Recurrent Neural Networks (RNN) to the EHR to predict medical events; the experiment used sequence labeling techniques for the extraction of medical events from unstructured text in the EHR. The study in [46] used EEG (electroencephalogram) signals from the EHR and a Deep Convolutional Neural Network (DCNN) [47] to detect epileptic seizures. First, the EEG signal features were extracted using the EMD algorithm [48] to decompose the EEG signals into oscillation instances with varying frequencies, called the Intrinsic Mode Functions (IMFs). The next step was to feed the data to a deep CNN to classify the seizure into three classes of epilepsy: ictal (amid seizure), normal, and inter-ictal (between seizures).

Fig. 6. Popular deep learning algorithms used with Electronic Health Records.

EHR use case: dermatologist-level classification of skin cancer with deep neural networks [49]

One of the big challenges for health-related detection and classification is the absence of enough labeled data. In the work shown in Fig. 7, researchers combined data from open-access dermatology repositories, which were annotated by dermatologists, with data from the EHR. These skin lesion images were fed to an Inception V3 [50] deep CNN, which predicted whether the subject in the image has a malignant or a benign melanocytic lesion. This work leveraged the power of transfer learning by using Inception V3, a special type of deep CNN with a reduced number of learned parameters.

Fig. 7. Architecture of the skin cancer detection model, adapting the deep CNN Inception V3.

Inception V3 achieves its reduced parameter count by factorizing larger convolutions into smaller ones, for example replacing a 5x5 filter with two stacked 3x3 filters. This technique helps in reducing the number of parameters to be learned, hence shrinking the computational cost of the deep network.
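To see why this factorization shrinks the model, here is a quick back-of-the-envelope count (ignoring biases) for a layer mapping C input channels to C output channels; the channel count is an arbitrary illustration, not a figure from [50].

```python
C = 64                                   # example channel count
params_5x5 = 5 * 5 * C * C               # one 5x5 convolution
params_3x3_twice = 2 * (3 * 3 * C * C)   # two stacked 3x3 convolutions, same receptive field
print(params_5x5, params_3x3_twice)      # 102400 vs. 73728, roughly a 28% reduction
```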
B. Clinical adverse event prediction

The clinical adverse event prediction subtask of a learning algorithm is to predict the onset of diseases, a process that predicts the probability that patients might develop certain diseases given their current clinical status. The specific objective is to predict future events (hospitalization, suicide risk, heart failure risk, etc.) from longitudinally diverse events. For an intelligent support system to provide patient-centered support, each type of data would need its own support system. Choi et al. developed Doctor AI [51], a generic system that uses Recurrent Neural Networks (RNN) to predict clinical events via multi-label prediction using the diagnoses, medication categories, and visit times of a patient. They were able to use each patient visit to predict the diagnoses and medication orders of the next visit, as well as the time to the next visit. Razavian et al. [52] were able to use longitudinal lab tests to perform early diagnosis of diseases for people who do not yet have the disease. Miotto et al. proposed Deep Patient [53], a system that leverages raw patient data from the EHR, like medications, diagnoses, procedures, and lab tests, by applying them to an unsupervised deep feature learning algorithm to produce patient representations that can then be used to perform more advanced clinical tasks like personalized prescription, drug targeting, clinical trial recruitment, detecting patient similarity, etc. Prediction of future clinical events can also be achieved by modeling the EHR record as a longitudinal event matrix, with the horizontal dimension corresponding to the time stamps and the vertical dimension corresponding to the event values, and applying a non-standard CNN [54]. Many prediction algorithms leverage various EHR data types to predict an outcome; however, clinical notes contain a richer amount of patient data than other sources. Though unstructured, they can be the source of a big number of clinical predictions. As deep learning models only accept vectorized inputs, word embedding algorithms like Word2Vec and Doc2Vec must be applied beforehand to produce word vectors that can be understood by the learning algorithms.

Use case 1: using EHR clinical notes and Convolutional Neural Networks (CNN) to predict death

Fig. 8. Use-case scenario: clinical notes are vectorized using the Word2Vec skip-gram model; then, using labels obtained from the patient history (whether he died or not), a CNN model is trained which can predict near-future death from the patient's hospital notes.

This subsection serves as an intuition and use case for turning the clinical notes generated at the point of care into a prediction of an adverse event. An imminent patient death is the result of various time series events manifested after admission into the hospital. The unexplored clinical notes produced by physicians or nurses contain rich content in the form of text that requires critical analysis. The process of adverse event prediction is described in Fig. 8. The task of deep learning is to aggregate many data with or without known outcomes (labels) and train a model which can predict an outcome for new scenarios. As clinical notes cannot be directly analyzed by the deep learning model, they are vectorized by Word2Vec or Doc2Vec word embedding models that use the skip-gram approach to vectorize textual information. However, clinical notes contain ambiguous terms as well as important terms that are related to a certain disease phenotype (as an example, we expect a clinical note written for a patient suffering a heart attack to contain terms like chest pain, discomfort, shortness of breath, lightheadedness, etc.). Hence, before vectorization we must dissect the content of the clinical notes using standard ontologies for medical terminologies like SNOMED CT [55] or the Unified Medical Language System (UMLS) [56] [62].


After extracting the words that are related to the patient's phenotype, the notes can be fed as input features to a Word2Vec model for vectorization. The resulting vectors then constitute labeled training data for a Convolutional Neural Network which can predict the probability of death (the labels are obtained from the end status of the patient, i.e., whether he died or was discharged). The CNN is composed of a convolutional layer, a pooling layer, a fully connected layer, and an output softmax layer.
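As a rough sketch of the Fig. 8 pipeline, the snippet below trains a skip-gram Word2Vec model on tokenized notes with gensim, embeds and pads each note, and fits a small 1D CNN on the died/discharged label. The toy notes, the labels, the padding length, and the network sizes are assumptions for illustration only.

```python
import numpy as np
import torch
import torch.nn as nn
from gensim.models import Word2Vec

# Toy tokenized notes and outcome labels (1 = died, 0 = discharged); real notes come from the EHR.
notes = [["chest", "pain", "shortness", "of", "breath"],
         ["mild", "headache", "discharged", "stable"],
         ["severe", "sepsis", "hypotension", "intubated"],
         ["routine", "follow", "up", "no", "complaints"]]
labels = torch.tensor([1, 0, 1, 0])

# Skip-gram Word2Vec embeddings (sg=1) learned over the note corpus.
w2v = Word2Vec(sentences=notes, vector_size=50, window=2, min_count=1, sg=1, epochs=50)

# Embed and pad each note to a fixed length so a 1D CNN can slide over the word sequence.
max_len = 8
def embed(note):
    vecs = [w2v.wv[w] for w in note][:max_len]
    vecs += [np.zeros(50, dtype=np.float32)] * (max_len - len(vecs))
    return np.stack(vecs).T                              # shape (embedding_dim, max_len)

x = torch.tensor(np.stack([embed(n) for n in notes]), dtype=torch.float32)

# 1D CNN over the embedded note: convolution -> pooling -> fully connected head.
model = nn.Sequential(nn.Conv1d(50, 16, kernel_size=3, padding=1), nn.ReLU(),
                      nn.AdaptiveMaxPool1d(1), nn.Flatten(), nn.Linear(16, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()
for _ in range(100):
    loss = loss_fn(model(x), labels)
    optimizer.zero_grad(); loss.backward(); optimizer.step()

print(torch.softmax(model(x), dim=1))                    # per-note death probabilities
```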
Use case 2: using charted clinical events to predict future adverse medical events

Predicting the length of stay and the readmission probability helps in improving the quality of care as well as potentially decreasing unnecessary healthcare costs. However, being able to aggregate all of a patient's data and decide which of them should carry more weight in an intended prediction is a highly iterative process. Various machine learning and statistical models have been deployed to predict death risks for hospitalized patients. Medical charted events like ventilator settings, mIoT device alarms, laboratory values, heart rate, MRI readings, code status, mental status, and so on, can be used to predict a patient's risk of imminent death or the hospitalization period. For instance, a patient in the MIMIC-III database who was admitted with a hemorrhagic CVA (cerebrovascular accident) and hospitalized for 5 days recorded, among others, a total of 9,172 charted events, 68 prescriptions, and 12 microbiology events. These records are a potential source of data for prediction. Because all the outcomes are known (death or discharge), if we consider each patient and build a representative vector that accommodates all these events, we can train a deep neural network that can predict the patient's outcome. Esteban et al. used Recurrent Neural Networks (RNNs) with static information, like patient gender and blood type, and dynamic information, like clinical charted events, to predict future adverse events [57].
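One way to build such a representative input is to bin each patient's charted events into a fixed time grid and feed the resulting longitudinal matrix to a recurrent network. The sketch below does this with pandas and a GRU; the table layout (charttime, itemid, valuenum), the hourly binning, and the GRU head are assumptions for illustration rather than the setup of [57].

```python
import pandas as pd
import torch
import torch.nn as nn

# Toy charted events for one admission; a real frame would come from the EHR's chart-events table.
events = pd.DataFrame({
    "charttime": pd.to_datetime(["2130-01-01 00:10", "2130-01-01 00:40",
                                 "2130-01-01 01:20", "2130-01-01 02:05"]),
    "itemid":   ["heart_rate", "heart_rate", "sbp", "heart_rate"],
    "valuenum": [88.0, 92.0, 115.0, 97.0],
})

# Hourly bins x event types -> longitudinal matrix (time on one axis, event values on the other).
matrix = (events
          .assign(hour=events["charttime"].dt.floor("h"))
          .pivot_table(index="hour", columns="itemid", values="valuenum", aggfunc="mean")
          .ffill()               # carry the last observation forward
          .fillna(0.0))

sequence = torch.tensor(matrix.to_numpy(), dtype=torch.float32).unsqueeze(0)  # (1, hours, features)

# GRU over the hourly sequence; the final hidden state feeds a death/discharge head.
class OutcomeRNN(nn.Module):
    def __init__(self, n_features, hidden=32):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)

    def forward(self, x):
        _, h = self.gru(x)
        return self.head(h[-1])

model = OutcomeRNN(n_features=sequence.shape[-1])
print(model(sequence))           # untrained logits; training would use the known outcomes as labels
```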
C. EHR-driven phenotyping

Clinical phenotyping is the process of establishing disease characteristics. This process is traditionally performed through expert opinions and many years of research, which have already established the phenotypes of each disease. However, with the diversification and polymorphism of existing diseases, coupled with individual genetic variations, there is a high need for other methods to establish disease phenotypes, as well as individual patients' phenotypes, using the huge data stored in EHRs. Many studies have used methods that include a mix of clinical expert opinions and automated methods. A. Neuraz et al. [58] have developed a method that used term frequency and TF-IDF [59] to establish the relationship between clinical phenotypes and rare diseases. To assess the performance of deep learning methods on phenotyping tasks, Gehrmann et al. [60] have thoroughly compared the results of CNNs with those obtained from concept-extraction-based methods using clinical narratives and those from n-gram-based models. Concept extraction is a popular method utilized extensively in the phenotyping of many diseases. One popular project is cTAKES (clinical Text Analysis and Knowledge Extraction System), developed at Mayo [61]. cTAKES is an open-source NLP toolkit that can be used to extract clinical meaning from many clinical notes. It produces named entities from each word in the clinical note and checks their meaning in the UMLS through its concept unique identifier (CUI).

Use case 1: creating clinical phenotypes using a multi-layer perceptron deep neural network on EHR data [63]

Questioning the precision of the International Classification of Diseases (ICD) codes that establish medical codes and associated phenotypes, Rashidian et al. used lab results, patients' demographics, as well as medication data to establish a more trustworthy coding scheme using deep learning. To ensure the credibility of these codes, they partnered with medical experts who verified the trustworthiness of the model's codes vis-a-vis the accepted ICD9 codes. Their model was found to provide more extensive and precise phenotypes than those described in the ICD9 standard.

Use case 2: embedding medical concepts and words into a unified vector space [64]

Fig. 9. Jointly embedding ICD9 codes with clinical notes in a unified vector space to establish disease phenotypes and predict future visits.

Most of the studies that tried to leverage EHR data for patient phenotyping used the embedding of medical codes like ICD9 and fed the resulting vectors to a neural network to establish disease phenotypes or to predict a clinical adverse event [65] [66] [67] [68]. Other approaches have tried to embed the extracted medical codes and the accompanying words separately. This approach has its drawbacks, as the words lose their medical context. Rather than using the normal skip-gram, where the context words of the current word are established by calculating the probability of each neighboring word being a context word, Bai et al. used a joint skip-gram approach to jointly embed the medical codes and words from clinical notes. It is done by representing each patient visit by a pair made of diagnosis codes and words from clinical notes, (D, N), where D = {C1, C2, C3, ...} and N = {N1, N2, N3, ...}. With the MIMIC-III data set, 54,965 such pairs were obtained. The joint skip-gram was used to define the context of the diagnosis code in question using the other codes of the same visit as well as all the words in the clinical note. To aggregate data for the model, all diagnosis codes and all clinical notes were extracted for each patient visit. As shown in Fig. 9, stochastic gradient descent with negative sampling was used as the optimization algorithm to predict the ICD-9 codes associated with future visits as well as to establish disease phenotypes.
n-gram based models. Concept extraction is a popular method EHR task a precise patient representation and stratification is


D. Patient's feature representation

Fig. 10. Patient clinical feature representation and selection overview.

To perform adverse clinical event prediction, or any other EHR task, a precise patient representation and stratification is paramount. It is highly erroneous to feed raw EHR data to a deep network to perform clinical tasks like prediction, clinical trial recruitment, or disease detection, because of the high heterogeneity and sparsity of the EHR's raw data. Hence, before performing these clinical support tasks with deep networks, a feature learning framework which can represent the patient's features with less information overlap has to be constructed from the vastly heterogeneous EHR data. Various models have represented patients in the form of a 2D vector, with patients on one dimension and an amalgamation of each patient's records (ICD9 diagnoses, lab tests, clinical note content, ...) on the other dimension. A common approach is to have a clinical domain expert manually annotate the patterns to look for, including the clinical features and the targets of the learning scheme. However, annotating features using a domain expert in an ad hoc manner is tiresome and imprecise. Recently, unsupervised deep learning has revolutionized the process of feature learning and selection. Authors in [69] have used unsupervised learning for feature selection. First, the raw EHR data were divided into continuous features and categorical features. The continuous features were transformed into learned representations using stacked auto-encoders and combined with the categorical features; then an SVM was applied for feature selection. The resulting features were fed to a model which can predict the amount of LVMI (Left Ventricular Mass Index), a common indicator of heart damage risk.
The most challenging hustle for deep learning models is the small size of the input data set. This creates a natural incompatibility between the EHR and deep learning models, because when small data sets are directly fed to a deep network it leads to overfitting. One approach is to fuse deep features (obtained by using a deep network) with traditional features, like texture features and color moments, obtained by classical methods such as the Haralick [70] method. The study in [71] used lung tumor images and transfer learning techniques based on three existing CNN models that were pre-trained on the public ImageNet data set [72], and combined the obtained features with traditional features to predict survival among patients with lung adenocarcinoma. Authors in [73] used a CNN-based coding network for medical image classification, using deep features obtained with a convolutional neural network together with selected traditional features obtained with a solid background knowledge of medical images, like color histograms, color moments, and texture features.

Choi et al. proposed Med2Vec [74], a patient representation that learns from the medical codes associated with a clinical visit to predict the codes that are likely to characterize the next visits. The issues addressed in the study are that representations obtained from RNNs are difficult to interpret and difficult to scale with high-dimensional EHR data; moreover, such representations fall short of the critical information that is embedded in the patient's demographic information. The authors adapted the usual skip-gram embedding model to medical concepts. The first step of the solution is to represent a patient's visit as a unified vector consisting of codes (diagnoses, prescriptions, etc.). Using these codes as inputs, a ReLU activation is applied to obtain an intermediate vector, which is then combined with the patient's demographic information to produce an intermediate visit vector Vt, which is used to train a softmax classifier that is able to predict the medical codes of the other visits within a context window.
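A compressed sketch of that visit-level representation is shown below: a multi-hot vector of the visit's codes passes through a ReLU layer, is concatenated with a small demographics vector, and a prediction head scores the codes of neighboring visits. The dimensions, the demographics encoding, and the single-visit example are simplifications for illustration, not the exact Med2Vec architecture.

```python
import torch
import torch.nn as nn

n_codes, n_demo, code_dim, visit_dim = 1000, 4, 128, 64

class VisitRepresentation(nn.Module):
    def __init__(self):
        super().__init__()
        self.code_layer = nn.Sequential(nn.Linear(n_codes, code_dim), nn.ReLU())
        self.visit_layer = nn.Sequential(nn.Linear(code_dim + n_demo, visit_dim), nn.ReLU())
        self.predict_codes = nn.Linear(visit_dim, n_codes)   # scores for neighboring-visit codes

    def forward(self, codes_multi_hot, demographics):
        u = self.code_layer(codes_multi_hot)                       # intermediate code-level vector
        v = self.visit_layer(torch.cat([u, demographics], dim=1))  # visit vector V_t
        return self.predict_codes(v), v

model = VisitRepresentation()
# One toy visit: three active diagnosis codes plus a small demographics vector (e.g. age, sex flags).
codes = torch.zeros(1, n_codes); codes[0, [12, 87, 430]] = 1.0
demo = torch.tensor([[0.63, 1.0, 0.0, 0.0]])
logits, visit_vector = model(codes, demo)

# Training would push the logits toward the codes observed in the surrounding visits, e.g.
# loss = nn.BCEWithLogitsLoss()(logits, neighbor_codes_multi_hot)
```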
Use case 3: deep feature learning from EHR raw data

The most effective method is the one adopted by the authors in [75] [76]. In these works, the authors argue that supervised feature learning lacks the ability to fully grasp novel patterns and features. They propose a data-driven approach to automatically identify patterns and dependencies in the data without the need for a domain expert to annotate the features. Fig. 10 is a blueprint of this approach. The first step is to extract the patient data (medications, diagnoses, lab tests, clinical notes, ...). The data are pre-processed, including an appropriate embedding of the clinical notes, and then each patient is represented as a single vector. The next step is the dimensionality reduction stage, which consists of feature representation and selection using stacked denoising auto-encoders (SDAE), unsupervised neural networks that can generate their own labels from the training data. The SDAEs are used to transform these patient vectors into more representative descriptors which can be the input of another deep learning prediction model. The last stage is the use of supervised learning to perform various clinical support tasks like diagnosis proposition, adverse event prediction, clinical trial recruitment, etc. As an example, these features can be used together with risk factors (like death or ECG readings) to train a supervised model which can predict adverse events.
E. Medication information extraction

Medication information is an important area of biomedical research, as its extraction contributes greatly to pharmacovigilance, adverse event detection, bio-curation assistance, integrative biology, etc.

Though much of this information can be extracted from the social fabric, like social networks, the EHR also contains much of the immediate medication information. However, the process of mining this information from the EHR can be lengthy and tiresome, as the data is hidden deep in the EHR's clinical narratives, patient encounters, ICU discharge notes, and charted events. The task of computerized adverse drug event recognition involves three main subtasks: Named Entity Recognition (NER), a process of detecting key drug mentions; identifying these named events, a process of establishing the context of these mentions; and finding the relationships between them. A medication information extraction system aims to establish the medication names and their signatures, such as dosages, duration, prescription reasons, complications, frequencies, route of administration, and any other information deemed necessary by the prescribing entity. Early use cases include MedEx [77], a system that automatically extracts medication names and their signatures from clinical narratives using NLP. The authors of MedEx argue that the usual text parsing methods, like regular expressions, cannot be applied to medication information extraction, as they fall short of the contextual information in clinical narratives. MedEx uses a semantic-based approach with a much finer granularity.
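As a small illustration of the NER step (drug-mention detection) on a clinical sentence, the sketch below uses spaCy with a scispaCy biomedical model; the model name and the example sentence are assumptions, and a production system such as MedEx or CLAMP layers much richer context handling (dose, route, duration, reason) on top of this step.

```python
import spacy

# Assumes scispaCy and its en_core_sci_sm model package are installed.
nlp = spacy.load("en_core_sci_sm")

note = ("Patient started on metoprolol 25 mg twice daily; "
        "lisinopril was held due to hypotension.")

doc = nlp(note)
for ent in doc.ents:
    # Each entity carries its character span, which downstream relation
    # extraction (drug -> dose, drug -> reason) can link to nearby mentions.
    print(ent.text, ent.start_char, ent.end_char)
```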
F. Integrating the EHR, social networks, and web data

Fig. 11. Potential medical information hidden in messages to social acquaintances and its possible contribution to EHR tasks.

It is most likely that a patient shares an adverse drug event with social acquaintances rather than with his physician. With the explosion of social networks, there are huge, untapped medical insights which can be used together with the hospital's EHR in clinical support systems. Though the medical research community agrees that social networks should be part of the EHR, the modalities of how to go about it remain a highly debated subject. The concerns behind this reluctance are the high noise due to spelling errors, imprecise descriptions, and the ambiguous or casual use of medical terms. Some clinical tasks may even depend more on social data than on the more formal EHR data. For example, recent research has shown that these social network services can hold more data related to pharmacovigilance and medication adherence than the EHR, because a big number of patients might not return to the hospital to narrate drug reactions unless there is an acute condition that resulted from taking the drug. Recently, deep learning models have been applied to SNS data to contribute to various clinical tasks [78] [79] [80] [81] [82]. Integration of social media into the clinical care pipeline helps patients to participate in self-care, health promotion, and disease prevention efforts by the public. Proposals on how to integrate SNS data into the EHR argue that these data should be supplemental and should not override other EHR data like charted events, lab events, lab tests, etc. Fig. 11 shows a patient message to his acquaintances and the possible EHR tasks that can leverage these types of messages.
VII. CONCLUSION

We have given insights and technical intuitions on how to leverage EHR data using deep learning approaches. We unraveled the technical side of the various efforts that have been invested in applying deep learning models to clinical knowledge discovery using the vast data sets of Electronic Health Records. Despite the clear success of deep learning for other hospital tasks, like billing and patient management, there is still much to do in the application of deep learning methods to EHR data. The available successes in this domain still depend on the supervision of a medical domain expert. More research needs to be done to bring AI and deep learning to the patient's bedside. Unlike other deep learning applications, the medical field is challenged by the structure of the data itself and by the acceptance of the models by the medical community. Even if a model works from a computing point of view, its adoption will be hindered by the reluctance of clinicians, who still exercise their profession using abductive reasoning. Though deep learning algorithms perform better even with little or no feature engineering, considering the high risk factors associated with EHR tasks, coupled with the high longitudinality, sparsity, and noisiness of EHR data, a thorough patient representation, consisting of appropriate patient feature selection and representation, is required before building a predictive deep learning model.

REFERENCES

[1] A. Hoerbst and E. Ammenwerth, "Electronic health records," Methods of Information in Medicine, vol. 49, no. 4, pp. 320–336, 2010.
[2] V. Dimitrov, "Medical internet of things and big data in healthcare," Healthcare Informatics Research, vol. 22, no. 3, pp. 156–163, 2016.
[3] T. Ching et al., "Opportunities and obstacles for deep learning in biology and medicine," Journal of The Royal Society Interface, vol. 15, no. 141, p. 20170387, 2018.
[4] R. Miotto, F. Wang, S. Wang, X. Jiang, and J. T. Dudley, "Deep learning for healthcare: review, opportunities and challenges," Briefings in Bioinformatics, vol. 19, no. 6, pp. 1236–1246, 2017.
[6] D. Ravì, C. Wong, F. Deligianni, M. Berthelot, J. Andreu-Perez, B. Lo, and G.-Z. Yang, “Deep learning for health informatics,” IEEE Journal of Biomedical and Health Informatics, vol. 21, no. 1, pp. 4–21, 2017.
[7] C. Xiao, E. Choi, and J. Sun, “Opportunities and challenges in developing deep learning models using electronic health records data: a systematic review,” Journal of the American Medical Informatics Association, vol. 25, no. 10, pp. 1419–1428, 2018.
[8] A. A. Kalinin et al., “Deep learning in pharmacogenomics: from gene regulation to patient stratification,” Pharmacogenomics, vol. 19, no. 7, pp. 629–650, 2018.
[9] W. R. Hersh et al., “Caveats for the use of operational electronic health record data in comparative effectiveness research,” Medical care, vol. 51, no. 8, suppl. 3, p. S30, 2013.
[10] 2018. Available from: https://med.stanford.edu/content/dam/sm/ehr/documents/EHR-Poll-Presentation.pdf.
[11] 2018. Available from: https://www.who.int/classifications/icd/factsheet/en/.
[12] J. A. Mistichelli, Diagnosis related groups (DRGs) and the prospective payment system: forecasting social implications. Kennedy Institute of Ethics, Georgetown Univ., 1984.
[13] S. Liu, W. Ma, R. Moore, V. Ganesan, and S. Nelson, “RxNorm: prescription for electronic drug information exchange,” IT professional, vol. 7, no. 5, pp. 17–23, 2005.
[14] J. A. Hirsch et al., “Current procedural terminology; a primer,” Journal of neurointerventional surgery, vol. 7, no. 4, pp. 309–312, 2015.
[15] K. E. McBride et al., “Impact of serious mental illness on surgical patient outcomes,” ANZ journal of surgery, vol. 88, no. 7–8, pp. 673–677, 2018.
[16] D. J. Feller, J. Zucker, M. T. Yin, P. Gordon, and N. Elhadad, “Using clinical notes and natural language processing for automated HIV risk assessment,” JAIDS Journal of Acquired Immune Deficiency Syndromes, no. 2, pp. 160–166, 2018.
[17] J. Liu, Z. Zhang, and N. Razavian, “Deep ehr: Chronic disease prediction using medical notes,” arXiv preprint arXiv:1808.04928, 2018.
[18] A. Jagannatha, F. Liu, W. Liu, and H. Yu, “Overview of the First Natural Language Processing Challenge for Extracting Medication, Indication, and Adverse Drug Events from Electronic Health Record Notes (MADE 1.0),” Drug safety, pp. 1–13, 2018.
[19] B. S. Glicksberg et al., “Automated disease cohort selection using word embeddings from Electronic Health Records,” in Pac Symp Biocomput, 2018, vol. 23, pp. 145–156.
[20] S. Liu et al., “Correlating lab test results in clinical notes with structured lab data: A case study in hba1c and glucose,” AMIA Summits on Translational Science Proceedings, vol. 2017, p. 221, 2017.
[21] S. P. Mohanty, D. P. Hughes, and M. Salathé, “Using deep learning for image-based plant disease detection,” Frontiers in plant science, vol. 7, p. 1419, 2016.
[22] A. Passantino, F. Monitillo, M. Iacoviello, and D. Scrutinio, “Predicting mortality in patients with acute heart failure: Role of risk scores,” World journal of cardiology, vol. 7, no. 12, p. 902, 2015.
[23] T. A. Lasko, J. C. Denny, and M. A. Levy, “Computational phenotype discovery using unsupervised feature learning over noisy, sparse, and irregular clinical data,” PloS one, vol. 8, no. 6, p. e66341, 2013.
[24] S. Lim, K. Lee, and J. Kang, “Drug drug interaction extraction from the literature using a recursive neural network,” PloS one, vol. 13, no. 1, p. e0190926, 2018.
[25] J. A. Sinnott et al., “PheProb: probabilistic phenotyping using diagnosis codes to improve power for genetic association studies,” Journal of the American Medical Informatics Association, vol. 25, no. 10, pp. 1359–1365, 2018.
[26] R. C. Zink, “Detecting safety signals among adverse events in clinical trials,” in Biopharmaceutical Applied Statistics Symposium, 2018, pp. 107–125.
[27] J. C. Lauffenburger et al., “Predicting adherence to chronic disease medications in patients with long-term initial medication fills using indicators of clinical events and health behaviors,” Journal of managed care & specialty pharmacy, vol. 24, no. 5, pp. 469–477, 2018.
[28] J. Sun et al., “Combining knowledge and data driven insights for identifying risk factors using electronic health records,” in AMIA Annual Symposium Proceedings, 2012, vol. 2012, p. 901.
[29] G. G. Chowdhury, “Natural language processing,” Annual Review of Information Science and Technology, no. 1, pp. 51–89, 2003.
[30] E. Soysal et al., “CLAMP–a toolkit for efficiently building customized clinical natural language processing pipelines,” Journal of the American Medical Informatics Association, vol. 25, no. 3, pp. 331–336, 2017.
[31] J. Pearl, “Theoretical impediments to machine learning with seven sparks from the causal revolution,” arXiv preprint arXiv:1801.04016, 2018.
[32] P. Gupta, P. Malhotra, L. Vig, and G. Shroff, “Transfer Learning for Clinical Time Series Analysis using Recurrent Neural Networks,” arXiv preprint arXiv:1807.01705, 2018.
[33] T. Schlegl, J. Ofner, and G. Langs, “Unsupervised pre-training across image domains improves lung tissue classification,” in International MICCAI Workshop on Medical Computer Vision, 2014, pp. 82–93.
[34] T. Schlegl, J. Ofner, and G. Langs, “Unsupervised pre-training across image domains improves lung tissue classification,” in International MICCAI Workshop on Medical Computer Vision, 2014, pp. 82–93.
[35] T. Schlegl, J. Ofner, and G. Langs, “Unsupervised pre-training across image domains improves lung tissue classification,” in International MICCAI Workshop on Medical Computer Vision, 2014, pp. 82–93.
[36] R. E. Schapire, “Explaining adaboost,” in Empirical inference, Springer, 2013, pp. 37–52.
[37] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
[38] I. Jolliffe, Principal component analysis. Springer, 2011.
[39] G. H. Golub and C. Reinsch, “Singular value decomposition and least squares solutions,” in Linear Algebra, Springer, 1971, pp. 134–151.
[40] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in neural information processing systems, 2012, pp. 1097–1105.
[41] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.
[42] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
[43] C. Szegedy et al., “Going deeper with convolutions,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 1–9.
[44] G. Lample, M. Ballesteros, S. Subramanian, K. Kawakami, and C. Dyer, “Neural architectures for named entity recognition,” arXiv preprint arXiv:1603.01360, 2016.
[45] A. N. Jagannatha and H. Yu, “Bidirectional RNN for medical event detection in electronic health records,” in Proceedings of the conference. Association for Computational Linguistics. North American Chapter. Meeting, 2016, vol. 2016, p. 473.
[46] H. G. Daoud, A. M. Abdelhameed, and M. Bayoumi, “Automatic epileptic seizure detection based on empirical mode decomposition and deep neural network,” in 2018 IEEE 14th International Colloquium on Signal Processing & Its Applications (CSPA), 2018, pp. 182–186.
[47] T. N. Sainath, A. Mohamed, B. Kingsbury, and B. Ramabhadran, “Deep convolutional neural networks for LVCSR,” in 2013 IEEE international conference on acoustics, speech and signal processing, 2013, pp. 8614–8618.
[48] N. E. Huang et al., “The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis,” Proceedings of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences, vol. 454, no. 1971, pp. 903–995, 1998.
[49] A. Esteva et al., “Dermatologist-level classification of skin cancer with deep neural networks,” Nature, vol. 542, no. 7639, p. 115, 2017.
[50] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 2818–2826.
[51] E. Choi, M. T. Bahadori, A. Schuetz, W. F. Stewart, and J. Sun, “Doctor ai: Predicting clinical events via recurrent neural networks,” arXiv preprint arXiv:1511.05942, 2015.
[52] N. Razavian, J. Marcus, and D. Sontag, “Multi-task prediction of disease onsets from longitudinal laboratory tests,” in Machine Learning for Healthcare Conference, 2016, pp. 73–100.
[53] R. Miotto, L. Li, B. A. Kidd, and J. T. Dudley, “Deep patient: an unsupervised representation to predict the future of patients from the electronic health records,” Scientific reports, vol. 6, p. 26094, 2016.
[54] Y. Cheng, F. Wang, P. Zhang, and J. Hu, “Risk prediction with electronic health records: A deep learning approach,” in Proceedings of the 2016 SIAM International Conference on Data Mining, 2016, pp. 432–440.
[55] K. A. Spackman, K. E. Campbell, and R. A. Côté, “SNOMED RT: a reference terminology for health care,” in Proceedings of the AMIA annual fall symposium, 1997, p. 640.
[56] O. Bodenreider, “The unified medical language system (UMLS): integrating biomedical terminology,” Nucleic acids research, vol. 32, no. suppl_1, pp. D267–D270, 2004.
[57] C. Esteban, O. Staeck, S. Baier, Y. Yang, and V. Tresp, “Predicting clinical events by combining static and dynamic information using recurrent neural networks,” in 2016 IEEE International Conference on Healthcare Informatics (ICHI), 2016, pp. 93–101.
[58] N. Garcelon et al., “Next generation phenotyping using narrative reports in a rare disease clinical data warehouse,” Orphanet journal of rare diseases, vol. 13, no. 1, p. 85, 2018.
[59] A. Rajaraman and J. D. Ullman, Mining of massive datasets. Cambridge University Press, 2011.
[60] S. Gehrmann et al., “Comparing deep learning and concept extraction-based methods for patient phenotyping from clinical narratives,” PloS one, vol. 13, no. 2, p. e0192360, 2018.
[61] G. K. Savova et al., “Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications,” Journal of the American Medical Informatics Association, vol. 17, no. 5, pp. 507–513, 2010.
[62] B. L. Humphreys and D. A. Lindberg, “The UMLS project: making the conceptual connection between users and the information they need,” Bulletin of the Medical Library Association, vol. 81, no. 2, p. 170, 1993.
[63] S. Rashidian et al., “Disease phenotyping using deep learning: A diabetes case study,” arXiv preprint arXiv:1811.11818, 2018.
[64] T. Bai, A. K. Chanda, B. L. Egleston, and S. Vucetic, “EHR phenotyping via jointly embedding medical concepts and words into a unified vector space,” BMC medical informatics and decision making, vol. 18, no. 4, p. 123, 2018.
[65] E. Choi, A. Schuetz, W. F. Stewart, and J. Sun, “Using recurrent neural network models for early detection of heart failure onset,” Journal of the American Medical Informatics Association, vol. 24, no. 2, pp. 361–370, 2016.
[66] Y. Choi, C. Y.-I. Chiu, and D. Sontag, “Learning low-dimensional representations of medical concepts,” AMIA Summits on Translational Science Proceedings, vol. 2016, p. 41, 2016.
[67] T. Pham, T. Tran, D. Phung, and S. Venkatesh, “Predicting healthcare trajectories from medical records: A deep learning approach,” Journal of biomedical informatics, vol. 69, pp. 218–229, 2017.
[68] A. Perotte, R. Ranganath, J. S. Hirsch, D. Blei, and N. Elhadad, “Risk prediction for chronic kidney disease progression using heterogeneous electronic health record data and time series analysis,” Journal of the American Medical Informatics Association, vol. 22, no. 4, pp. 872–880, 2015.
[69] M. Z. Nezhad, D. Zhu, X. Li, K. Yang, and P. Levy, “Safs: A deep feature selection approach for precision medicine,” in 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2016, pp. 501–506.
[70] R. M. Haralick and K. Shanmugam, “Textural features for image classification,” IEEE Transactions on systems, man, and cybernetics, no. 6, pp. 610–621, 1973.
[71] R. Paul et al., “Deep feature transfer learning in combination with traditional features predicts survival among patients with lung adenocarcinoma,” Tomography, vol. 2, no. 4, p. 388, 2016.
[72] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet: A large-scale hierarchical image database,” 2009.
[73] Z. Lai and H. Deng, “Medical Image Classification Based on Deep Features Extracted by Deep Model and Statistic Feature Fusion with Multilayer Perceptron,” Computational intelligence and neuroscience, vol. 2018, 2018.
[74] E. Choi et al., “Multi-layer representation learning for medical concepts,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 1495–1504.
[75] M. Z. Nezhad, D. Zhu, N. Sadati, and K. Yang, “A predictive approach using deep feature learning for electronic medical records: A comparative study,” arXiv preprint arXiv:1801.02961, 2018.
[76] T. A. Lasko, J. C. Denny, and M. A. Levy, “Computational phenotype discovery using unsupervised feature learning over noisy, sparse, and irregular clinical data,” PloS one, vol. 8, no. 6, p. e66341, 2013.
[77] H. Xu, S. P. Stenner, S. Doan, K. B. Johnson, L. R. Waitman, and J. C. Denny, “MedEx: a medication information extraction system for clinical narratives,” Journal of the American Medical Informatics Association, vol. 17, no. 1, pp. 19–24, 2010.
[78] A. Cocos, A. G. Fiks, and A. J. Masino, “Deep learning for pharmacovigilance: recurrent neural network architectures for labeling adverse drug reactions in Twitter posts,” Journal of the American Medical Informatics Association, vol. 24, no. 4, pp. 813–821, 2017.
[79] A. Nikfarjam, A. Sarker, K. O'Connor, R. Ginn, and G. Gonzalez, “Pharmacovigilance from social media: mining adverse drug reaction mentions using sequence labeling with word embedding cluster features,” Journal of the American Medical Informatics Association, vol. 22, no. 3, pp. 671–681, 2015.
[80] L. Xia, G. A. Wang, and W. Fan, “A deep learning based named entity recognition approach for adverse drug events identification and extraction in health social media,” in International Conference on Smart Health, 2017, pp. 237–248.
[81] J. Xie, D. D. Zeng, and Z. A. Marcum, Using deep learning to improve medication safety: the untapped potential of social media. SAGE Publications Sage UK: London, England, 2017.
[82] S. Chowdhury, C. Zhang, and P. S. Yu, “Multi-task pharmacovigilance mining from social media posts,” arXiv preprint arXiv:1801.06294, 2018.
[83] T. Huynh, Y. He, A. Willis, and S. Rüger, “Adverse drug reaction classification with deep neural networks,” 2016.
[84] A. Akselrod-Ballin, L. Karlinsky, S. Alpert, S. Hasoul, R. Ben-Ari, and E. Barkan, “A region based convolutional network for tumor detection and classification in breast mammography,” in Deep Learning and Data Labeling for Medical Applications, Springer, 2016, pp. 197–205.
[85] V. K. Singh et al., “Conditional Generative Adversarial and Convolutional Networks for X-ray Breast Mass Segmentation and Shape Classification,” in International Conference on Medical Image Computing and Computer-Assisted Intervention, 2018, pp. 833–840.
[86] W. Zhu, X. Xiang, T. D. Tran, G. D. Hager, and X. Xie, “Adversarial deep structured nets for mass segmentation from mammograms,” in 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), 2018, pp. 847–850.
[87] B. B. Ahn, “The Compact 3D Convolutional Neural Network for Medical Images,” Stanford University, 2017.
[88] M. Z. Alom, M. Hasan, C. Yakopcic, T. M. Taha, and V. K. Asari, “Recurrent residual convolutional neural network based on u-net (r2u-net) for medical image segmentation,” arXiv preprint arXiv:1802.06955, 2018.
[89] C. Qin, J. Schlemper, J. Caballero, A. N. Price, J. V. Hajnal, and D. Rueckert, “Convolutional recurrent neural networks for dynamic MR image reconstruction,” IEEE transactions on medical imaging, vol. 38, no. 1, pp. 280–290, 2019.
[90] R. P. Poudel, P. Lamata, and G. Montana, “Recurrent fully convolutional neural networks for multi-slice MRI cardiac segmentation,” in Reconstruction, segmentation, and analysis of medical images, Springer, 2016, pp. 83–94.
[91] M. Turan, Y. Almalioglu, H. Araujo, E. Konukoglu, and M. Sitti, “Deep endovo: A recurrent convolutional neural network (rcnn) based visual odometry approach for endoscopic capsule robots,” Neurocomputing, vol. 275, pp. 1861–1870, 2018.
[92] M. Z. Alom, M. Hasan, C. Yakopcic, T. M. Taha, and V. K. Asari, “Recurrent residual convolutional neural network based on u-net (r2u-net) for medical image segmentation,” arXiv preprint arXiv:1802.06955, 2018.
[93] E. Choi, M. T. Bahadori, A. Schuetz, W. F. Stewart, and J. Sun, “Doctor ai: Predicting clinical events via recurrent neural networks,” arXiv preprint arXiv:1511.05942, 2015.
[94] E. Choi, A. Schuetz, W. F. Stewart, and J. Sun, “Using recurrent neural network models for early detection of heart failure onset,” Journal of the American Medical Informatics Association, vol. 24, no. 2, pp. 361–370, 2016.
[95] M. Golmohammadi et al., “Gated recurrent networks for seizure detection,” in 2017 IEEE Signal Processing in Medicine and Biology Symposium (SPMB), 2017, pp. 1–5.
[96] E. Choi, A. Schuetz, W. F. Stewart, and J. Sun, “Medical concept representation learning from electronic health records and its application on heart failure prediction,” arXiv preprint arXiv:1602.03686, 2016.
[97] Z. Che, S. Purushotham, R. Khemani, and Y. Liu, “Distilling knowledge from deep networks with applications to healthcare domain,” arXiv preprint arXiv:1512.03542, 2015.
[98] M. Puri, Y. Pathak, V. K. Sutariya, S. Tipparaju, and W. Moreno, Artificial neural network for drug design, delivery and disposition. Academic Press, 2015.
[99] T. T. Erguzel, S. Ozekes, O. Tan, and S. Gultekin, “Feature selection and classification of electroencephalographic signals: an artificial neural network and genetic algorithm based approach,” Clinical EEG and neuroscience, vol. 46, no. 4, pp. 321–326, 2015.
[100] J. Guo et al., “A Stacked Sparse Autoencoder-Based Detector for Automatic Identification of Neuromagnetic High Frequency Oscillations in Epilepsy,” IEEE transactions on medical imaging, vol. 37, no. 11, pp. 2474–2482, 2018.
[101] P. Cerveri, A. Belfatto, G. Baroni, and A. Manzotti, “Stacked sparse autoencoder networks and statistical shape models for automatic staging of distal femur trochlear dysplasia,” The International Journal of Medical Robotics and Computer Assisted Surgery, vol. 14, no. 6, p. e1947, 2018.
[102] Y. Qiu, W. Zhou, N. Yu, and P. Du, “Denoising Sparse Autoencoder-Based Ictal EEG Classification,” IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 26, no. 9, pp. 1717–1726, 2018.
[103] Z. Alhassan, D. Budgen, R. Alshammari, T. Daghstani, A. S. McGough, and N. Al Moubayed, “Stacked Denoising Autoencoders for Mortality Risk Prediction Using Imbalanced Clinical Data,” in 2018 17th IEEE
International Conference on Machine Learning and Applications
(ICMLA), 2018, pp. 541–546.
[104] T. Katsuki et al., “Feature Extraction from Electronic Health Records of
Diabetic Nephropathy Patients with Convolutional Autoencoder,” in
Workshops at the Thirty-Second AAAI Conference on Artificial
Intelligence, 2018.
[105] S. Dubois, N. Romano, K. Jung, N. Shah, and D. C. Kale, “The
Effectiveness of Transfer Learning in Electronic Health Records Data,”
2017.
[106] H. Suresh, P. Szolovits, and M. Ghassemi, “The use of autoencoders for
discovering patient phenotypes,” arXiv preprint arXiv:1703.07004, 2017.
[107] T. Katsuki et al., “Feature Extraction from Electronic Health Records of
Diabetic Nephropathy Patients with Convolutional Autoencoder,” in
Workshops at the Thirty-Second AAAI Conference on Artificial
Intelligence, 2018.
[108] M. Khademi and N. S. Nedialkov, “Probabilistic graphical models and
deep belief networks for prognosis of breast cancer,” in 2015 IEEE 14th
International Conference on Machine Learning and Applications
(ICMLA), 2015, pp. 727–732.
[109] A. M. Abdel-Zaher and A. M. Eldeib, “Breast cancer classification using
deep belief networks,” Expert Systems with Applications, vol. 46, pp.
139–144, 2016.
[110] T. Tran, T. D. Nguyen, D. Phung, and S. Venkatesh, “Learning vector
representation of medical objects via EMR-driven nonnegative restricted
Boltzmann machines (eNRBM),” Journal of biomedical informatics, vol.
54, pp. 96–105, 2015.
[111] K. H. Hoang and T. B. Ho, “Learning Treatment Regimens from
Electronic Medical Records,” in Pacific-Asia Conference on Knowledge
Discovery and Data Mining, 2018, pp. 411–422.
[112] P. M. Shakeel, S. Baskar, V. S. Dhulipala, S. Mishra, and M. M. Jaber,
“Maintaining security and privacy in health care system using learning
based deep-Q-networks,” Journal of medical systems, vol. 42, no. 10, p.
186, 2018.
[113] Y. Liu, B. Logan, N. Liu, Z. Xu, J. Tang, and Y. Wang, “Deep
reinforcement learning for dynamic treatment regimes on medical
registry data,” in 2017 IEEE International Conference on Healthcare
Informatics (ICHI), 2017, pp. 380–385.
[114] Z. C. Lipton, D. C. Kale, C. Elkan, and R. Wetzel, “Learning to diagnose
with LSTM recurrent neural networks,” arXiv preprint
arXiv:1511.03677, 2015.
[115] A. E. Johnson et al., “MIMIC-III, a freely accessible critical care
database,” Scientific data, vol. 3, p. 160035, 2016.
