Logistic Regression in Julia
Last Updated :
15 Dec, 2020
Logistic Regression, as the name suggests is completely opposite in functionality. Logistic Regression is basically a predictive algorithm in Machine Learning used for binary classification. It predicts the probability of a class and then classifies it based on the predictor variables’ values.
The dependent variable(y) in logistic regression, is binary and takes 2 possible values 0 or 1. For example: If we want to check if the mail is spam or not. Then y will have values – 1 for spam and 0 for not spam.
Logistic regression is analogous to linear regression in some aspects, just that in linear regression y is a continuous variable, whereas in logistic regression y needs to lie between 0 and 1. As, y is nothing but the predicted value and predicted value is nothing but the probability of y equals to 1, P(y = 1).
We know, the equation for linear regression:

Now, we know that this equation yields a continuous value of y. So as to restrict the predicted value of y in the 0 – 1 range, we apply the sigmoid function. The sigmoid function is,

Computing the value of y from the above equation and then putting in the linear regression equation, we get:

Hence, we finally get the equation for Logistic Regression. Using this equation, we can find the value of the probability of y when the value of x is known. In the above equation, b0 is the intercept and b1 is the slope matrix for all values of x. Collectively, we can refer to all as regression coefficients.
To learn more I will take the example of the Churn Modelling Data, a well-known dataset to predict a customer has churned out or not based on a number of features. Let’s start with it in Julia.
Step 1: Import Packages
In this step, we will import all the required packages we will be using. Don’t be afraid if you miss any package, you can definitely add it simultaneously.
Julia
import Pkg;
Pkg.add( "Lathe" )
using Lathe
import Pkg;
Pkg.add( "DataFrames" )
using DataFrames
import Pkg;
Pkg.add( "Plots" )
using Plots;
import Pkg;
Pkg.add( "GLM" )
using GLM
import Pkg;
Pkg.add( "StatsBase" )
using StatsBase
import Pkg;
Pkg.add( "MLBase" )
using MLBase
import Pkg;
Pkg.add( "ROCAnalysis" )
using ROCAnalysis
import Pkg;
Pkg.add( "CSV" )
using CSV
println( "Successfully imported the packages!!" )
|
Output:
Successfully imported the packages!!
Step 2: Load the dataset
To read the data we use the CSV.file function and then convert it into data frame using DataFrame function.
Julia
df = DataFrame(CSV. File ( "churn modelling.csv" ))
first(df, 5 )
|
Output:

Step 3: Summary of the dataframe
Julia
println(size(df))
describe(df)
|
Output:
The dependent variable, y in this case is the last column (‘Exited’). The first column is just the row numbers and the variable from the second column are predictors or independent variables. Also, the size of our dataset is 10,000 rows and 14 columns. From the summary, we find that the dataset has no missing or null values. The column names don’t have any spaces or special characters in them, hence we can go further without any problem.
Julia
select!(df, Not([:RowNumber, :CustomerId, :Surname, :Geography, :Gender]))
first(df, 5 )
|
Output:

Step 4: Splitting the data into train and test set.
Julia
using Lathe.preprocess: TrainTestSplit
train, test = TrainTestSplit(df, . 75 );
|
Step 5: Building our Logistic Regression model.
We use the glm function for logistic regression.
Julia
fm = @formula(Exited ~ CreditScore + Age + Tenure +
Balance + NumOfProducts + HasCrCard +
IsActiveMember + EstimatedSalary)
logit = glm(fm, train, Binomial(), ProbitLink())
|
Output:
Once, we have built our model and trained it, we would evaluate the model on test data to see how accurately it works.
Step 6: Model prediction and evaluation
Julia
prediction = predict(logit, test)
|
Output:
The glm model gives us the probability score of class 1. After getting the probability score we need to classify it as 0 or 1 based n the threshold score. Usually, the threshold is considered 0.5.
With the help of the above code, we got the probability scores. Now, let’s convert these into classes 0 or 1. For probability > 0.5 will be given class 1, otherwise 0.
Julia
prediction_class = [ if x < 0.5 0 else 1 end for x in prediction];
prediction_df = DataFrame(y_actual = test.Exited,
y_predicted = prediction_class,
prob_predicted = prediction);
prediction_df.correctly_classified = prediction_df.y_actual . = = prediction_df.y_predicted
|
Output:
After, predicting the classes of our test data, the next step is to evaluate the model by analyzing the accuracy.
Step 7: Calculating the Accuracy of the model
By Accuracy, we mean the total number of classes our model predicted correctly.
Julia
accuracy = mean(prediction_df.correctly_classified)
|
Output:
0.7996820349761526
So, our model yields an accuracy of approximately 80% which is a good score. Now, to get better insights of accuracy we analyze the confusion matrix.
Step 8: Confusion Matrix
The confusion matrix is just another way to evaluate the performance of any classification model.
Julia
confusion_matrix = MLBase.roc(prediction_df.y_actual,
prediction_df.y_predicted)
|
Output:
ROCNums{Int64}
p = 523
n = 1993
tp = 69
tn = 1943
fp = 50
fn = 454
The above output informs us that due to class imbalance our model classifies maximum observations as class 0. This is the reason for a moderate accuracy of 80%.
Similar Reads
Polynomial Regression in Julia
In statistics, polynomial regression is a form of regression analysis in which the relationship between the independent variable x and the dependent variable y is modeled as an nth degree polynomial in x. Polynomial regression fits a nonlinear relationship between the value of x and the correspondin
3 min read
Non-Linear Regression in R
Non-Linear Regression is a statistical method that is used to model the relationship between a dependent variable and one of the independent variable(s). In non-linear regression, the relationship is modeled using a non-linear equation. This means that the model can capture more complex and non-line
6 min read
Recursion in Julia
Julia is a dynamic, high-level programming language with high performance and speed that is used to perform operations in scientific computing. It is great for computational complex problems. It is an open-source language so all source code is available easily online. It is as easy to use as Python
3 min read
Multinomial Logistic Regression in R
Multinomial logistic regression is applied when the dependent variable has more than two categories that are not ordered. This method extends binary logistic regression to deal with multiple classes by estimating the probability of each outcome category relative to a baseline. It is commonly used in
4 min read
Local Regression in R
In this article, we will discuss what local regression is and how we implement it in the R Programming Language. What is Local Regression in R?Local regression is also known as LOESS (locally estimated scatterplot smoothing) regression. It is a flexible non-parametric method for fitting regression m
4 min read
Plot Logistic Regression Line Over Heat Plot in R
Plotting a logistic regression line over a heat plot can be a powerful way to visualize the relationship between predictor variables and a binary outcome. This article will guide you through the steps to create such a visualization in R. What is a Logistic Regression Line?In logistic regression, the
4 min read
How to Plot a Logistic Regression Curve in R?
In this article, we will learn how to plot a Logistic Regression Curve in the R programming Language. Logistic regression is basically a supervised classification algorithm. That helps us in creating a differentiating curve that separates two classes of variables. To Plot the Logistic Regression cur
3 min read
How to Plot the Linear Regression in R
In this article, we are going to learn to plot linear regression in R. But, to plot Linear regression, we first need to understand what exactly is linear regression. What is Linear Regression?Linear Regression is a supervised learning model, which computes and predicts the output implemented from th
8 min read
Visualisation in Julia
Data visualization is the process of representing the available data diagrammatically. There are packages that can be installed to visualize the data in languages like Python and Julia. Some of the reasons that make the visualization of data important are listed below: Larger data can be analyzed ea
7 min read
Polynomial Regression in R Programming
Polynomial Regression is a form of linear regression in which the relationship between the independent variable x and dependent variable y is modeled as an nth degree polynomial. Polynomial regression fits a nonlinear relationship between the value of x and the corresponding conditional mean of y, d
4 min read