Predicting Liver Failure Using Supervised Machine Learning Approach

ABSTRACT

The function of the liver is to filter blood that circulates through the body, converting
nutrients and drugs absorbed from the digestive tract into ready-to-use chemicals.
The liver performs many other important functions, such as removing toxins and
other chemical waste products from the blood and readying them for excretion. Liver
failure begins in the cells of the liver. Nowadays machine learning is applied to
healthcare systems, where it offers the chance of predicting disease early. The
main prerequisite of artificial intelligence is data. A past patient dataset is collected
and used to build a machine learning model. The necessary pre-processing
techniques are applied, and univariate and bivariate analyses are performed. The
data is visualized for a better understanding of the features, and based on that a
classification model is built using machine learning algorithms; the algorithms are
then compared on performance metrics such as accuracy, F1 score and recall.

Keywords — liver disease, demographic variables, prognostic/biochemical
variables, statistical learning for variable selection and classification, support vector
machine.

TABLE OF CONTENTS

CHAPTER NO TITLE NAME PAGE NO
ABSTRACT V
LIST OF FIGURES VIII

1 INTRODUCTION
1.1 GENERAL 1
1.2 DOMAIN OVERVIEW 3
1.3 PROBLEM STATEMENT 9

2 LITERATURE SURVEY
2.1 LITERATURE REVIEW 11
2.2 SURVEY WALKTHROUGH 15
2.2.1 NUMPY 15
2.2.2 PANDAS 18
2.2.3 MATPLOTLIB 20
2.2.4 SKLEARN 23
2.3 PROJECT GOALS 25

3 AIM AND SCOPE OF PRESENT INVESTIGATION
3.1 AIM 27
3.2 SCOPE 27
3.3 SYSTEM STUDY 27
3.4 EXISTING SYSTEM 29
3.5 PROPOSED SYSTEM 30
3.5.1 SYSTEM ARCHITECTURE 31

3.5.2 WORKING PROCESS 33

4 EXPERIMENTAL OR MATERIALS AND METHODS AND ALGORITHMS USED
4.1 DATA GATHERING 35
4.1.1 PREPARING DATASET 35
4.1.2 DATA PREPARATION 37
4.2 DATA PRE-PROCESSING 38
4.2.1 DATA WRANGLING 39
4.3 DATA VISUALIZATION 41
4.4 DATA ANALYSIS 45
4.4.1 DATA COLLECTION 46
4.4.2 CONSTRUCTION OF PREDICTIVE MODEL 46
4.4.3 PROJECT REQUIREMENTS 47
4.4.4 ENVIRONMENTAL REQUIREMENTS 48
4.5 WORKFLOW PROCESS 48
4.5.1 USECASE DIAGRAM 49
4.5.2 CLASS DIAGRAM 50
4.5.3 ACTIVITY DIAGRAM 51
4.5.4 SEQUENCE DIAGRAM 52
4.5.5 ER – DIAGRAM 53
4.5.6 PREDICTION RESULT BY ACCURACY 55
4.5.7 TRAINING AND TESTING MODEL 58
4.6 ALGORITHM AND TECHNIQUES 59
4.6.1 LOGISTIC REGRESSION 60
4.6.2 DECISION TREE 63
4.6.3 RANDOM FOREST 66
4.6.4 SUPPORT VECTOR MACHINE 71
4.6.5 DEPLOYMENT 75

5 RESULTS AND DISCUSSION 79

6 CONCLUSION AND FUTURE WORK 82
7 APPENDICES AND SCREENSHOTS 85

LIST OF FIGURES

Figure no. Name of the Figure Page no.

1.1 Stages of Liver Damage 2
1.2 Machine Learning 8
1.3 Types of Machine Learning 9
3.1 System Overview 31
3.2 System Architecture Diagram 32
4.1 Dataset Used 37
4.2 Unbiased evaluation of data 38
4.3 Missing Data 39
4.4 Total male and female patients 42
4.5 Liver Failure by Gender 43
4.6 Visualization of Total proteins and Albumin 44
4.7 Correlation between Data 44
4.8 Correlation Heatmap 45
4.9 Process of Dataflow Diagram 47
4.10 Workflow Diagram 49
4.11 Use case Diagram 50
4.12 Class Diagram 51
4.13 Activity Diagram 52
4.14 Sequence Diagram 53
4.15 E – R Diagram 54
4.16 Confusion Matrix of LR 61
4.17 Classification Report of LR 62
4.18 Confusion Matrix of DT 65
4.19 Classification Report of DT 66
4.20 Confusion Matrix of RF 69
4.21 Classification Report of RF 70
4.22 Confusion Matrix of SVM 73
4.23 Classification Report of SVM 74

CHAPTER 1

INTRODUCTION

1.1 GENERAL

The liver is a large, pyramid-shaped organ that lies behind your ribs on the right side
of your body. It's under the right lung. It's divided into right and left lobes. The liver
helps break down and store nutrients. These include sugars, starch, fats, and
proteins. It also makes proteins, such as albumin, which helps the body balance
fluids. The liver makes clotting factors, which help blood thicken or clot when a
person is bleeding. Bile made in the liver is important for digesting food and for other
bodily functions.

One of the liver's most important jobs is to filter out and destroy toxins in the blood.
When the liver isn't working well, chemicals can build up inside the body and cause
damage. Liver cancer is cancer that starts in your liver. It's also called primary liver
cancer. Primary liver cancer is not the same as cancer that started somewhere else
in the body and then spread (metastasized) to the liver.

Cancer that starts in another organ, such as the colon, breast, or lung, and then
spreads to the liver is called secondary liver cancer. Secondary liver cancer is far
more common in the U.S. than primary liver cancer. Cancer that has spread to the
liver from somewhere else is treated like the original cancer. For instance, lung
cancer that has spread to the liver is treated like lung cancer.

Fatty liver disease (FLD) has become a rampant condition, associated with a
high rate of morbidity and mortality in a population. Early prediction of FLD would
allow patients to seek timely preventive care, diagnosis, and treatment. Chronic liver
diseases and cirrhosis are the 11th leading cause of death in the world, accounting
for 1.1 million deaths annually. The global prevalence of cirrhosis has risen
substantially, from 71 million cases in 1990 to over 122 million in 2017. Common
causes of cirrhosis are chronic hepatitis B virus (HBV) and hepatitis C virus (HCV)
infections, alcohol-related liver disease and nonalcoholic steatohepatitis (NASH).
Over the past decade, there has been a temporal shift in the prevalence of causes
of cirrhosis: the prevalence of NASH has been dramatically increasing, whereas the
prevalence of other causes has been slowly decreasing. The estimated worldwide
prevalence of nonalcoholic fatty liver disease (NAFLD) is 25% and is projected to
rise to 33.5% by 2030, emphasizing the importance of both cirrhosis and NAFLD.

A computer-based approach is required for the non-invasive detection of chronic
liver diseases, which are asymptomatic, progressive, and potentially fatal in nature.
Machine learning has made a significant impact on the biomedical field for liver
disease prediction and diagnosis. Machine learning shows promise for improving
the detection and prediction of disease, which has generated interest in the
biomedical field, and it also increases the objectivity of the decision-making
process. Machine learning techniques can make medical problems easier to solve
and reduce the cost of diagnosis.
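
As a concrete illustration, the following minimal sketch trains the four classifiers discussed later in this report (logistic regression, decision tree, random forest and support vector machine) on a past patient dataset and compares them on accuracy, F1 score and recall. The file name liver_patient.csv and the column names are assumptions made purely for illustration and should be adapted to the actual dataset.

# Minimal sketch of the classification pipeline described in this report.
# The file name "liver_patient.csv" and the target column "Dataset"
# (1 = liver disease, 2 = no liver disease) are illustrative assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, f1_score, recall_score

df = pd.read_csv("liver_patient.csv")
X = pd.get_dummies(df.drop(columns=["Dataset"]))   # one-hot encode categoricals (e.g., gender)
y = df["Dataset"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "SVM": SVC(),
}

# Compare the algorithms on the performance metrics used in this report.
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    print(name,
          "accuracy:", round(accuracy_score(y_test, pred), 3),
          "F1:", round(f1_score(y_test, pred, pos_label=1), 3),
          "recall:", round(recall_score(y_test, pred, pos_label=1), 3))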

The gold standard for the diagnosis of liver fibrosis and nonalcoholic fatty liver
disease (NAFLD) is liver biopsy. Various noninvasive modalities, e.g.,
ultrasonography, elastography and clinical predictive scores, have been used as
alternatives to liver biopsy, with limited performance. Recently, artificial intelligence
(AI) models have been developed and integrated into noninvasive diagnostic tools
to improve their performance.

FIG: 1.1 STAGES OF LIVER DAMAGE

1.2 Domain Overview

DATA SCIENCE:

Data science is an interdisciplinary field that uses scientific methods, processes,
algorithms and systems to extract knowledge and insights from structured and
unstructured data, and apply knowledge and actionable insights from data across a
broad range of application domains.

The term "data science" has been traced back to 1974, when Peter Naur proposed
it as an alternative name for computer science. In 1996, the International Federation
of Classification Societies became the first conference to specifically feature data
science as a topic.

However, the definition was still in flux. The term "data science" in its modern sense
was popularized in 2008 by DJ Patil and Jeff Hammerbacher, the pioneering leads
of the data and analytics efforts at LinkedIn and Facebook. In less than a decade, it
has become one of the hottest and most in-demand professions in the market. Data
science is the field of study that combines domain expertise, programming skills,
and knowledge of mathematics and statistics to extract meaningful insights from data.

Data science can be defined as a blend of mathematics, business acumen, tools,
algorithms and machine learning techniques, all of which help us find hidden
insights or patterns in raw data that can be of major use in forming big business
decisions.
Data Scientist: Data scientists examine which questions need answering and
where to find the related data. They have business acumen and analytical skills as
well as the ability to mine, clean, and present data. Businesses use data scientists
to source, manage, and analyze large amounts of unstructured data.

ARTIFICIAL INTELLIGENCE:

Artificial intelligence (AI) refers to the simulation of human intelligence in
machines that are programmed to think like humans and mimic their actions. The
term may also be applied to any machine that exhibits traits associated with a human
mind, such as learning and problem-solving.
Artificial intelligence (AI) is intelligence demonstrated by machines, as opposed to
the natural intelligence displayed by humans or animals. Leading AI textbooks
define the field as the study of "intelligent agents": any system that perceives its
environment and takes actions that maximize its chance of achieving its goals.
Some popular accounts use the term "artificial intelligence" to describe machines
that mimic "cognitive" functions that humans associate with the human mind, such
as "learning" and "problem solving"; however, this definition is rejected by major AI
researchers.
Artificial intelligence is the simulation of human intelligence processes by machines,
especially computer systems. Specific applications of AI include expert systems,
natural language processing, speech recognition and machine vision.

AI applications include advanced web search engines, recommendation systems
(used by YouTube, Amazon and Netflix), understanding human speech (such
as Siri or Alexa), self-driving cars (e.g., Tesla), and competing at the highest level
in strategic game systems (such as chess and Go). As machines become
increasingly capable, tasks considered to require "intelligence" are often removed
from the definition of AI, a phenomenon known as the AI effect. For instance, optical
character recognition is frequently excluded from things considered to be AI, having
become a routine technology.

Artificial intelligence was founded as an academic discipline in 1956, and in the
years since has experienced several waves of optimism, followed by
disappointment and the loss of funding (known as an "AI winter"), followed by new
approaches, success and renewed funding. AI research has tried and discarded
many different approaches during its lifetime, including simulating the brain,
modeling human problem solving, formal logic, large databases of knowledge and
imitating animal behavior. In the first decades of the 21st century, highly
mathematical statistical machine learning has dominated the field, and this
technique has proved highly successful, helping to solve many challenging
problems throughout industry and academia.

The various sub-fields of AI research are centered around particular goals and the
use of particular tools. The traditional goals of AI research
include reasoning, knowledge representation, planning, learning, natural language
processing, perception and the ability to move and manipulate objects. General
intelligence (the ability to solve an arbitrary problem) is among the field's long-term
goals. To solve these problems, AI researchers use versions of search and
mathematical optimization, formal logic, artificial neural networks, and methods
based on statistics, probability and economics. AI also draws upon computer
science, psychology, linguistics, philosophy, and many other fields.

The field was founded on the assumption that human intelligence "can be so
precisely described that a machine can be made to simulate it". This raises
philosophical arguments about the mind and the ethics of creating artificial beings
endowed with human-like intelligence. These issues have been explored
by myth, fiction and philosophy since antiquity. Science fiction and futurology have
also suggested that, with its enormous potential and power, AI may become
an existential risk to humanity.

As the hype around AI has accelerated, vendors have been scrambling to promote
how their products and services use AI. Often what they refer to as AI is simply one
component of AI, such as machine learning. AI requires a foundation of specialized
hardware and software for writing and training machine learning algorithms. No one
programming language is synonymous with AI, but a few, including Python, R and
Java, are popular.

In general, AI systems work by ingesting large amounts of labeled training data,
analyzing the data for correlations and patterns, and using these patterns to make
predictions about future states. In this way, a chatbot that is fed examples of text
chats can learn to produce lifelike exchanges with people, or an image recognition
tool can learn to identify and describe objects in images by reviewing millions of
examples.
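
To make the ingest-train-predict cycle concrete, here is a minimal sketch using scikit-learn's bundled digits dataset; the dataset and model choice are assumptions made purely for illustration, not part of this project.

# Illustrative only: labeled data in, patterns learned, predictions out.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

digits = load_digits()                              # labeled training data
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=0)

model = RandomForestClassifier(random_state=0)
model.fit(X_train, y_train)                         # learn correlations and patterns
print("held-out accuracy:", model.score(X_test, y_test))   # predict unseen examples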

AI programming focuses on three cognitive skills: learning, reasoning and self-correction.

Learning processes. This aspect of AI programming focuses on acquiring data and
creating rules for how to turn the data into actionable information. The rules, which
are called algorithms, provide computing devices with step-by-step instructions for
how to complete a specific task.

Reasoning processes. This aspect of AI programming focuses on choosing the
right algorithm to reach a desired outcome.

Self-correction processes. This aspect of AI programming is designed to
continually fine-tune algorithms and ensure they provide the most accurate results
possible.
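
One loose way to see these three skills in everyday machine learning code (an analogy, not a formal definition): fitting a model is the learning step, selecting among candidate configurations is a form of reasoning, and cross-validated hyperparameter search repeatedly refits and keeps the most accurate configuration, much like self-correction. The dataset and parameter grid below are assumptions for illustration.

# Loose mapping of the three skills onto scikit-learn (illustrative only).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# "Reasoning": candidate algorithm configurations to choose among.
param_grid = {"max_depth": [3, 5, None], "min_samples_leaf": [1, 5, 10]}

# "Self-correction": cross-validation repeatedly refits and keeps the
# configuration that gives the most accurate results.
search = GridSearchCV(DecisionTreeClassifier(random_state=0),
                      param_grid, cv=5, scoring="accuracy")
search.fit(X, y)    # "Learning": rules (a decision tree) induced from data

print("best configuration:", search.best_params_)
print("cross-validated accuracy:", round(search.best_score_, 3))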

AI is important because it can give enterprises insights into their operations that they
may not have been aware of previously and because, in some cases, AI can perform
tasks better than humans. Particularly when it comes to repetitive, detail-oriented
tasks like analyzing large numbers of legal documents to ensure relevant fields are
filled in properly, AI tools often complete jobs quickly and with relatively few errors.

Artificial neural networks and deep learning artificial intelligence technologies are
quickly evolving, primarily because AI processes large amounts of data much faster
and makes predictions more accurately than humanly possible.

Natural Language Processing (NLP):

Natural language processing (NLP) allows machines to read and understand human
language. A sufficiently powerful natural language processing system would enable
natural-language user interfaces and the acquisition of knowledge directly from
human-written sources, such as newswire texts. Some straightforward applications
of natural language processing include information retrieval, text mining, question
answering and machine translation.

Many current approaches use word co-occurrence frequencies to construct
syntactic representations of text. "Keyword spotting" strategies for search are
popular and scalable but dumb; a search query for "dog" might only match
documents with the literal word "dog" and miss a document with the word "poodle".
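
The "dog"/"poodle" limitation is easy to demonstrate; the toy documents below are invented for this sketch.

# Toy illustration of the keyword-spotting limitation described above.
docs = [
    "the dog barked at the mailman",
    "my poodle needs grooming",
]

query = "dog"
hits = [d for d in docs if query in d.split()]
print(hits)   # only the first document matches; the poodle document is missed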
