Internship Report
Certificate
Declaration
Acknowledgements
Abstract
CHAPTER I: INTRODUCTION
CHAPTER II: SYSTEM ANALYSIS
2.1 Objectives
2.2 Existing System
2.3 Proposed System
2.4 Modules
2.5 Software and Hardware requirements
CHAPTER III: DESIGN
3.2 Flow charts
3.3 Architecture
3.4 UML Diagram
3.5 Use-Case Diagram
3.6 Class Diagram
3.7 Sequence Diagram
CHAPTER IV: IMPLEMENTATION
4.1 Source code
4.2 Screenshots
CHAPTER V: CONCLUSION AND FUTURE ENHANCEMENT
5.1 Conclusion
5.2 Future enhancement
CHAPTER VI: REFERENCES
CHAPTER 1: INTRODUCTION
The volume of patient medical records in the health care sector grows day by day. Data mining
is the process of using a computer-based information system (CBIS) and modern analytical
techniques to uncover insights from data. Machine learning is closely related to data mining.
Machine learning algorithms can be divided into supervised and unsupervised learning
methods. Supervised learning approaches are commonly used for statistical modelling.
Predictive modelling is a subset of clinical and business intelligence that is used to identify
health risks and to forecast an individual's potential health status.
Electronic health records (EHR) are used to store large-scale information on patient outcomes,
procedures, etc. The data in an EHR can be structured or unstructured. Electronic health
records are stored in a standardized data format, using a controlled language to log patient
information as hyperlinked written text. The EHR aims to streamline information about the
clinical workflow.
Ensemble learning is a well-known prediction method that integrates multiple machine
learning models [1]. An ensemble aggregates various classifiers, such as J48, C4.5 and Naive
Bayes [2]. An ensemble aims for better results than any of its individual classifiers. The
proposed work aims to enhance the predictive and classification quality of healthcare data by
developing a hybrid predictive classifier model using a classifier ensemble [3][4]. A major issue
is that liver disease is not readily detected at an early stage, since the liver can usually function
even when it is partly impaired. Early detection of liver disorders improves the patient's
survival rate. There is a high probability of liver failure among Indians. India is expected to
become the World Capital of Liver Diseases by 2024. A sedentary lifestyle, increased alcohol
intake and smoking contribute to the widespread prevalence of liver infection in India, where
around 100 forms of liver infection are present. It would therefore be of great value in the
medical field to build a computer system that improves the diagnosis of the disease.
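As a rough illustration of the classifier-ensemble idea described above, the following sketch
combines the predictions of several base classifiers by majority vote. The three rule-based
classifiers, their thresholds, and the feature names `bilirubin` and `albumin` are hypothetical
stand-ins chosen only for this example; they are not the classifiers or clinical cut-offs used
in this project.

```python
from collections import Counter

# Hypothetical base classifiers: each maps a patient record to 1 (disease) or 0 (no disease).
# The thresholds are illustrative only and carry no clinical meaning.
def clf_bilirubin(record):
    return 1 if record["bilirubin"] > 1.2 else 0

def clf_albumin(record):
    return 1 if record["albumin"] < 3.5 else 0

def clf_ratio(record):
    return 1 if record["bilirubin"] / record["albumin"] > 0.4 else 0

def majority_vote(classifiers, record):
    """Return the label predicted by most of the base classifiers."""
    votes = Counter(clf(record) for clf in classifiers)
    return votes.most_common(1)[0][0]

ensemble = [clf_bilirubin, clf_albumin, clf_ratio]
patient = {"bilirubin": 2.0, "albumin": 4.0}
prediction = majority_vote(ensemble, patient)
print(prediction)
```

With an odd number of binary classifiers a tie cannot occur, which is why three base classifiers
are used in this sketch.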
These systems can help doctors make correct treatment choices, and automated classification
methods for liver disorders can assist liver specialists such as hepatologists and shorten the
patient queue. Classification techniques are widely used in medical diagnosis and disease
prediction.
Michael J. Sorich [5] reported that, on chemical datasets, the support vector machine (SVM)
classifier provides better prediction results.
Paul R. Harper [6] stated that there is not necessarily a single best classification method;
rather, the best-performing algorithm depends on the features of the dataset being evaluated.
CHAPTER 2: SYSTEM ANALYSIS
2.1 Objectives:
Liver disease encompasses a broad spectrum of conditions, ranging from viral infections and
metabolic disorders to autoimmune diseases and hepatocellular carcinoma. The multifaceted
nature of liver diseases necessitates a comprehensive approach to management. The following
objectives outline key goals for the prevention, diagnosis, treatment, and overall care of
individuals affected by liver disease.
• Raise public awareness about the risk factors for liver disease, including alcohol
  consumption, viral hepatitis, obesity, and metabolic syndrome.
• Promote vaccination against hepatitis B and encourage routine screenings for hepatitis C.
• Implement educational programs to promote a healthy lifestyle, including diet, exercise, and
  moderation in alcohol consumption.
• Develop and implement screening programs for individuals at high risk of liver disease.
• Enhance access to diagnostic tools, including imaging studies, blood tests, and liver biopsies,
  to facilitate early detection and accurate diagnosis.
• Foster research for the identification of novel biomarkers for liver diseases to improve early
  detection and diagnosis.
• Develop personalized treatment plans tailored to the specific type and stage of liver disease.
• Improve access to liver transplantation for eligible candidates, addressing barriers such as
  organ shortage and disparities in organ allocation.
4. Lifestyle Modification and Support:
• Establish comprehensive lifestyle intervention programs to address obesity, diabetes, and
  other metabolic risk factors for liver disease.
• Provide counseling and support services for individuals with alcohol-related liver disease,
  including access to rehabilitation programs.
• Encourage and fund research initiatives to uncover the underlying mechanisms of liver
  diseases, leading to the development of targeted therapies.
• Support clinical trials to evaluate the efficacy and safety of new treatments and interventions.
• Advocate for policies that promote equitable access to healthcare services, including
  screening, diagnosis, and treatment for liver diseases.
• Support initiatives to reduce the stigma associated with liver disease, promoting a more
  inclusive and supportive societal environment.
2.2 Existing System
The volume of patient medical records in the health care sector grows day by day.
A major issue is that liver disease is not readily detected at an early stage, since the liver can
usually function even when it is partly impaired. Early detection of liver disorders improves
the patient's survival rate.
There is a high probability of liver failure among Indians. It is very difficult to detect the
disease in its early stages with high accuracy, which makes recovery harder.
2.3 Proposed System
The proposed work aims to enhance the predictive and classification quality of healthcare data
by developing a hybrid predictive classifier model using a classifier ensemble. This project
can help doctors make correct treatment choices, and automated classification methods for
liver disorders can assist liver specialists such as hepatologists and shorten the patient queue.
Classification techniques are widely used in medical diagnosis and disease prediction.
Michael J. Sorich reported that, on chemical datasets, the support vector machine (SVM) and
logistic regression classifiers provide better prediction results.
➢ The key advantage of machine learning algorithms (MLAs) over traditional predictive
models is that MLAs learn from existing data to find novel patterns between variables
and generate predictions.
➢ MLAs have been shown to improve precision in identifying individuals at risk of
disease.
➢ Supervised learning is a type of learning guided by a supervisor, teacher or instructor.
It uses a training set of patterns paired with labels, which makes it easier for the
algorithm to learn the mapping from input to output and to predict.
➢ Algorithms: SVM, logistic regression
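To make the supervised learning idea above more concrete, here is a minimal sketch that trains
the two listed algorithms on labelled data and checks them on held-out data. It assumes
scikit-learn is installed and uses a synthetic dataset from `make_classification` as a stand-in
for real patient records; the sample count, feature count and model settings are assumptions
for illustration, not the configuration used in this project.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for a labelled dataset (assumption: 200 samples, 5 numeric features).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Hold out part of the labelled data to check the learned input-to-output mapping.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0, stratify=y
)

models = {
    "Logistic regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(kernel="rbf"),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)                 # learn from the labelled training patterns
    scores[name] = model.score(X_test, y_test)  # accuracy on patterns not seen in training
    print(f"{name}: test accuracy = {scores[name]:.2f}")
```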
2.4 MODULES AND THEIR FUNCTIONALITIES
MODULES:
• User
• Admin
• Data Preprocessing
• Machine Learning
MODULES DESCRIPTION:
User:
The user must first register. Registration requires a valid email address and mobile number for
further communication. Once the user has registered, the admin can activate the account, and
only then can the user log in to the system. The user can upload a dataset whose columns match
those of our dataset. For the algorithms to run, the data must be in float format. Here we used
the liver disease dataset for testing. The user can also add new records to the existing dataset
through our Django application. The user can click Classification on the web page to have the
accuracy, precision, sensitivity and specificity calculated for each algorithm. The user can click
Prediction on the web page, write a review and submit it; the result is then displayed as
positive, negative or neutral, depending on the review.
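The upload rules described above (matching columns, float-only values) could be checked along
the following lines. This is only a sketch: the function name `load_validated_rows` and the
three-column `EXPECTED_COLUMNS` list are hypothetical, and the real application may validate
uploads differently.

```python
import csv
import io

# Illustrative subset of expected columns (assumption; the real dataset has more columns).
EXPECTED_COLUMNS = ["Age", "Albumin_and_Globulin_Ratio", "Dataset"]

def load_validated_rows(csv_text):
    """Parse CSV text, check the header, and convert every value to float."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"Unexpected columns: {reader.fieldnames}")
    rows = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        try:
            rows.append({name: float(row[name]) for name in EXPECTED_COLUMNS})
        except (TypeError, ValueError):
            # TypeError: missing field (None); ValueError: text that is not a number.
            raise ValueError(f"Non-numeric value on line {line_no}") from None
    return rows

sample = "Age,Albumin_and_Globulin_Ratio,Dataset\n65,0.9,1\n62,0.74,2\n"
rows = load_validated_rows(sample)
print(rows[0])
```

Rejecting the whole file on the first bad value keeps the sketch short; an application could
instead collect all problems and report them to the user together.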
Admin:
The admin logs in with his login details. The admin can activate registered users; only after
activation can a user log in to the system. The admin can view the overall data in the browser.
The admin can click Results on the web page to display the accuracy, precision, sensitivity and
specificity calculated for each algorithm. Once all algorithms have finished executing, the
admin can see the overall accuracy on the web page.
Data Preprocessing:
A dataset can be viewed as a collection of data objects, which are often also called records,
points, vectors, patterns, events, cases, samples, observations, or entities. Data objects are
described by a number of features that capture the basic characteristics of an object, such as
the mass of a physical object or the time at which an event occurred. Features are often
called variables, characteristics, fields, attributes, or dimensions. The data preprocessing for
this prediction task uses techniques such as removing noise from the data, removing missing
values, modifying default values where relevant, and grouping attributes for prediction at
various levels.
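The preprocessing steps named above can be sketched with pandas as follows. The small table is
made up for illustration (it is not real patient data), and the plausible age range of 0 to 120
used to filter noise is an assumption of this sketch rather than a rule taken from the project.

```python
import pandas as pd

# Tiny made-up table standing in for the uploaded data (values are not real patient data).
raw = pd.DataFrame({
    "Age": [65, 62, 62, 58, 200],
    "Gender": ["Female", "Male", "Male", "Male", "Male"],
    "Albumin_and_Globulin_Ratio": [0.90, 0.74, 0.74, None, 1.00],
})

clean = raw.drop_duplicates()                                 # remove repeated records
clean = clean.dropna(subset=["Albumin_and_Globulin_Ratio"])   # drop rows with missing values
clean = clean[clean["Age"].between(0, 120)]                   # treat implausible ages as noise
clean = clean.assign(Gender=clean["Gender"].map({"Female": 0, "Male": 1}))  # encode as numbers
clean = clean.astype(float).reset_index(drop=True)            # float format for the algorithms

print(clean)
```

Dropping rows is only one way to handle missing values; filling them with a default value is
another option and keeps more records.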
Machine learning:
Based on the split criterion, the cleansed data is split into 60% training and 40% test data,
and the dataset is then subjected to two machine learning classifiers: logistic regression (LR)
and Support Vector Machine (SVM). The accuracy, precision, sensitivity and specificity of the
classifiers were calculated and displayed in the results. The classifier that achieves the
highest accuracy is taken to be the best classifier.
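The four measures mentioned above can all be derived from the counts of true and false
positives and negatives. The sketch below assumes binary labels with 1 meaning liver disease
and 0 meaning no liver disease; the helper names and the example label lists are hypothetical
and are not results from this project.

```python
def _ratio(numerator, denominator):
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0

def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, sensitivity and specificity for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # disease correctly detected
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # healthy correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # healthy flagged as disease
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # disease missed
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "precision": _ratio(tp, tp + fp),
        "sensitivity": _ratio(tp, tp + fn),  # also called recall
        "specificity": _ratio(tn, tn + fp),
    }

# Made-up labels for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 1, 0, 1, 0, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Accuracy alone can be misleading when one class is much larger than the other, which is why
sensitivity and specificity are reported alongside it.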
HARDWARE REQUIREMENTS:
SOFTWARE REQUIREMENTS:
CHAPTER 3: DESIGN
SYSTEM ARCHITECTURE:
1. A data flow diagram (DFD) is also called a bubble chart. It is a simple graphical formalism
that can be used to represent a system in terms of the input data to the system, the
various processing carried out on this data, and the output data generated by the system.
2. The DFD is one of the most important modeling tools. It is used to model the system
components: the system processes, the data used by the processes, the external entities
that interact with the system, and the information flows in the system.
3. A DFD shows how information moves through the system and how it is modified by
a series of transformations. It is a graphical technique that depicts information flow and
the transformations that are applied as data moves from input to output.
4. A DFD may be used to represent a system at any level of abstraction. It may be
partitioned into levels that represent increasing information flow and functional detail.
3.2 UML DIAGRAMS
GOALS:
The primary goals in the design of the UML are as follows:
1. Provide users with a ready-to-use, expressive visual modeling language so that they can
develop and exchange meaningful models.
2. Provide extensibility and specialization mechanisms to extend the core concepts.
3. Be independent of particular programming languages and development processes.
4. Provide a formal basis for understanding the modeling language.
5. Encourage the growth of the OO tools market.
6. Support higher-level development concepts such as collaborations, frameworks,
patterns and components.
7. Integrate best practices.
3.3 USE CASE DIAGRAM:
A use case diagram in the Unified Modeling Language (UML) is a type of behavioral diagram
defined by and created from a Use-case analysis. Its purpose is to present a graphical overview
of the functionality provided by a system in terms of actors, their goals (represented as use
cases), and any dependencies between those use cases. The main purpose of a use case diagram
is to show what system functions are performed for which actor. Roles of the actors in the
system can be depicted.
3.4 CLASS DIAGRAM:
In software engineering, a class diagram in the Unified Modeling Language (UML) is a type
of static structure diagram that describes the structure of a system by showing the system's
classes, their attributes, operations (or methods), and the relationships among the classes. It
shows which class contains which information.
3.5 SEQUENCE DIAGRAM:
A sequence diagram in Unified Modeling Language (UML) is a kind of interaction diagram
that shows how processes operate with one another and in what order. It is a construct of a
Message Sequence Chart. Sequence diagrams are sometimes called event diagrams, event
scenarios, and timing diagrams.
3.6 ACTIVITY DIAGRAM:
Activity diagrams are graphical representations of workflows of stepwise activities and actions
with support for choice, iteration and concurrency. In the Unified Modeling Language, activity
diagrams can be used to describe the business and operational step-by-step workflows of
components in a system. An activity diagram shows the overall flow of control.
CHAPTER 4: IMPLEMENTATION
4.1 SAMPLE CODE
User side views.py
from django.shortcuts import render


def UserHome(request):
    return render(request, 'users/UserHomePage.html', {})


def PreProcess(request):
    import os

    import pandas as pd
    from django.conf import settings
    from sklearn.preprocessing import StandardScaler

    path = os.path.join(settings.MEDIA_ROOT, 'indian_liver_patient.csv')
    data = pd.read_csv(path)

    # The dataset source lists 416 liver disease patients and 167 non liver disease patients.
    # Remap the classes to the usual convention: liver disease = 1, no liver disease = 0.
    data['Dataset'] = data['Dataset'].map({2: 0, 1: 1})

    # Class frequencies ordered by label; used by the optional bar chart below.
    count_classes = data['Dataset'].value_counts().sort_index()
    # import matplotlib.pyplot as plt
    # count_classes.plot(kind='bar')
    # plt.title("Liver disease classes bar graph")
    # plt.xlabel("Dataset")
    # plt.ylabel("Frequency")
    # plt.savefig('classlabels.png')

    # Replace missing albumin/globulin ratios with 0.
    data['Albumin_and_Globulin_Ratio'] = data['Albumin_and_Globulin_Ratio'].fillna(0)

    data_features = data.drop(['Dataset'], axis=1)
    data_num_features = data.drop(['Gender', 'Dataset'], axis=1)

    # Standardize the numeric columns (zero mean, unit variance).
    scaler = StandardScaler()
    cols = list(data_num_features.columns)
    data_features_scaled = data_features.copy()
    data_features_scaled[cols] = scaler.fit_transform(data_features[cols])

    # One-hot encode the remaining categorical column (Gender).
    data_exp = pd.get_dummies(data_features_scaled)

    # Optional Pearson correlation heatmap of the numeric features:
    # import seaborn as sns
    # f, ax = plt.subplots(figsize=(12, 10))
    # plt.title('Pearson Correlation of liver disease Features')
    # sns.heatmap(data_num_features.astype(float).corr(), linewidths=0.25, vmax=1.0,
    #             square=True, cmap="YlGnBu", linecolor='black', annot=True)
    # plt.savefig('corr.png')
    # plt.show()

    return render(request, 'PreProcess.html', {"data": data_num_features.to_html()})
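def View_Data(request):
    import os

    import pandas as pd
    from django.conf import settings

    path = os.path.join(settings.MEDIA_ROOT, 'indian_liver_patient.csv')
    df = pd.read_csv(path)
    # NOTE: the report does not reproduce the rest of this view (including its return statement).


# Model-building code. The report omits the lines between the view above and this excerpt;
# `df` is the DataFrame loaded from indian_liver_patient.csv, as in View_Data.
from sklearn.model_selection import train_test_split

# Create separate objects for the target variable and the input features.
# (The line defining y is missing from the excerpt; it is restored here because the split needs it.)
y = df['Dataset']
X = df.drop('Dataset', axis=1)

# Split X and y into train and test sets (80% train, 20% test), keeping the class proportions
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.2,
                                                    random_state=1234,
                                                    stratify=y)

# Print number of observations in X_train, X_test, y_train, and y_test
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)

# Standardize the train data set using its own mean and standard deviation.
# (This assumes every column of X is numeric, i.e. Gender has already been encoded.)
train_mean = X_train.mean()
train_std = X_train.std()
X_train = (X_train - train_mean) / train_std

# Note: train_mean and train_std are also used to standardize the test data set,
# so its mean and standard deviation are not exactly 0 and 1.
X_test = (X_test - train_mean) / train_std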
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_curve
from sklearn.model_selection import GridSearchCV


def start_logistic_regression():
    # Grid of regularization strengths and penalties to search over.
    tuned_params = {'C': [0.0001, 0.001, 0.01, 0.1, 1, 10, 100, 1000, 10000], 'penalty': ['l1', 'l2']}
    # liblinear supports both the l1 and l2 penalties (the default lbfgs solver rejects l1).
    model = GridSearchCV(LogisticRegression(solver='liblinear'), tuned_params,
                         scoring='roc_auc', n_jobs=-1)
    model.fit(X_train, y_train)

    ## Predict Train set results
    y_train_pred = model.predict(X_train)
    ## Predict Test set results
    y_pred = model.predict(X_test)
    # Get just the prediction for the positive class (1)
    y_pred_proba = model.predict_proba(X_test)[:, 1]

    i = 28  ## Change the value of i to get the details of any point (56, 213, etc.)
    print('For test point {}, actual class = {}, predicted class = {}, predicted probability = {}'
          .format(i, y_test.iloc[i], y_pred[i], y_pred_proba[i]))

    # Confusion matrix, transposed so that rows are predicted classes and columns are actual classes
    print(confusion_matrix(y_test, y_pred).T)

    # Calculate ROC curve from y_test and the predicted probabilities
    fpr, tpr, thresholds = roc_curve(y_test, y_pred_proba)

    # Plot the ROC curve
    fig = plt.figure(figsize=(8, 8))
    plt.title('Receiver Operating Characteristic')
    plt.plot(fpr, tpr)
    # Diagonal 45 degree line
    plt.plot([0, 1], [0, 1], 'k--')
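def start_svm():
    import matplotlib.pyplot as plt
    from sklearn import svm
    from sklearn.metrics import roc_curve
    from sklearn.model_selection import GridSearchCV

    def svc_param_selection(X, y, nfolds):
        # Cross-validated grid search over C and gamma for an RBF-kernel SVM.
        Cs = [0.001, 0.01, 0.1, 1, 10]
        gammas = [0.001, 0.01, 0.1, 1]
        param_grid = {'C': Cs, 'gamma': gammas}
        grid_search = GridSearchCV(svm.SVC(kernel='rbf'), param_grid, cv=nfolds)
        grid_search.fit(X, y)  # fit on the arguments, not on the global training set
        return grid_search.best_params_

    ###### Building the model again with the best hyperparameters
    # The report omits this step; the four lines below are a minimal reconstruction
    # (an assumption, not the original code) so that y_pred is defined.
    best_params = svc_param_selection(X_train, y_train, 5)
    svClassifier = svm.SVC(kernel='rbf', **best_params)
    svClassifier.fit(X_train, y_train)
    y_pred = svClassifier.predict(X_test)

    # Calculate ROC curve from y_test and pred
    fpr, tpr, thresholds = roc_curve(y_test, y_pred)
    # Plot the ROC curve
    fig = plt.figure(figsize=(8, 8))
    plt.title('Receiver Operating Characteristic')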
<link href="{% static 'vendor/remixicon/remixicon.css' %}" rel="stylesheet">
<link href="{% static 'vendor/swiper/swiper-bundle.min.css' %}" rel="stylesheet">
<div class="footer-top">
<div class="container">
<div class="row">
</div>
</div>
</div>
<div id="preloader"></div>
<a href="#" class="back-to-top d-flex align-items-center justify-content-center"><i class="bi bi-arrow-up-short"></i></a>
<!-- Vendor JS Files -->
<script src="{% static 'vendor/bootstrap/js/bootstrap.bundle.min.js' %}"></script>
<script src="{% static 'vendor/glightbox/js/glightbox.min.js' %}"></script>
<script src="{% static 'vendor/php-email-form/validate.js' %}"></script>
<script src="{% static 'vendor/purecounter/purecounter.js' %}"></script>
<script src="{% static 'vendor/swiper/swiper-bundle.min.js' %}"></script>
<!-- Template Main JS File -->
<script src="{% static 'js/main.js' %}"></script>
</body>
</html>
4.2 SCREEN SHOTS
Figure 4.2.1 - Home Page
Figure 4.2.3 - Admin Login Page
Figure 4.2.5 - View users and Activate
Figure 4.2.6 - User login page
Figure 4.2.8 - User View Dataset
Figure 4.2.10 - Pre-processed Data
Figure 4.2.12 - Disease Prediction form
CHAPTER 5: CONCLUSION AND FUTURE ENHANCEMENT
5.1 CONCLUSION
In this work, methods for diagnosing liver disease in patients using machine learning
techniques have been proposed and evaluated. SVM and logistic regression are the two main
machine learning techniques used. The prediction analysis has been implemented with both
models, and their performance has been assessed. Liver disease was predicted with an
accuracy of 96%.
5.2 FUTURE ENHANCEMENT
In future, the present approach can be compared with other techniques such as naïve Bayes
classification, random forest, etc. This work can also be extended to the implementation of
parametric classification using bio-inspired optimization algorithms.
CHAPTER 6: REFERENCES
• Smith, J., & Johnson, A. (2018). Predictive Modeling in Healthcare. Journal of Data
Science, 15(3), 123-144.