Credit Card Prediction Project
Project Mentor
Sri. K. Siva Kumar
Assistant Professor, B.Tech., M.E., (Ph.D.)
Department of Computer Science and Engineering
Department of
Submitted by:
Regd. No.:
Department of
Endorsements
Faculty Guide
Principal
Certificate from Intern Organization
I owe a great many thanks to the many people who helped, supported, and guided us at every step.
Finally, I thank one and all who directly and indirectly helped us to complete our project successfully.
Project Associate
Contents
Executive Summary
Overview of the Organization
Introduction to Machine Learning
Project
Conclusion
Machine Learning with Python
Executive Summary
Objective:
The main objective of the Machine Learning with Python internship is to
provide participants with a comprehensive understanding of machine learning
concepts and practical experience in Python programming. The program is
designed to bridge the gap between theoretical knowledge and real-world
application by immersing interns in hands-on projects and industry-relevant
scenarios.
Curriculum:
The internship program encompasses a range of topics and activities to
ensure a well-rounded learning experience. The curriculum covers the
following key areas:
Python for Machine Learning: Participants will learn the essential Python
libraries and frameworks used in machine learning, such as NumPy, Pandas,
and Scikit-learn. They will also develop proficiency in data manipulation and
preprocessing techniques.
Model Evaluation and Validation: The program will cover techniques for
evaluating and validating machine learning models, such as cross-
validation, overfitting, and underfitting. Interns will gain insights into
optimizing model performance and avoiding common pitfalls.
Real-World Projects: Interns will work on practical projects throughout
the program to apply their knowledge and gain hands-on experience. These
projects may involve tasks such as data analysis, predictive modeling, and
pattern recognition.
Benefits:
The Machine Learning with Python internship offers several benefits to
participants:
Objectives :
Suggestive contents
Joffren Omar Company Sdn. Bhd. is a Brunei company which started off in 1982 as a humble
materials supplier to the local oil and gas industry. As the company develops, training is
becoming one of our business focus areas. Our Sungai Bera facilities include 3 multi-purpose
classrooms, a lecture theatre and modern amenities. Our welder training and certification
centre started operation in 2009.
Our capability has earned the approval of the Energy Department, Prime Minister's Office and
the Ministry of Education, as a Registered Training Organization (RTO) for welder and
scaffolder training and certification (Industrial Skill Qualification). JO is now venturing
into IT security training with Condition Zebra, conducting its professional training, IT
compliance, and information security services. No matter who the client is, we serve to the
best of our ability.
The Open Web Application Security Project (OWASP) is a 501(c)(3)
worldwide not- for-profit charitable organization focused on improving the
security of software. Our mission is to make software security visible, so
that individuals and organizations worldwide can make informed decisions
about true software security risks. Everyone is free to participate in
OWASP and all of our materials are available under a free and open
software license. You'll find everything about OWASP here on or linked
from our wiki and current information on our OWASP Blog. OWASP
does not endorse or recommend commercial products or services, allowing
our community to remain vendor neutral with the collective wisdom of the
best minds in software security worldwide. We ask that the community
look out for inappropriate uses of the OWASP brand, including use of our
name, logos, project names and other trademark issues.
Roles and responsibilities of the employees in the team in which the intern is
placed:
Introduction :
Machine Learning
There are several types of machine learning algorithms, each with its
own characteristics and applications. The main types of machine
learning are:
It's important to note that within these broad categories, there are
various subcategories, variations, and hybrid approaches to machine
learning. For example, semi-supervised learning combines labeled and
unlabeled data, while transfer learning leverages knowledge learned
from one task to improve performance on another task. Additionally,
ensemble methods combine multiple models to make more accurate
predictions. Each type of machine learning has its strengths and
limitations, and the choice of approach depends on the specific
problem and available data.
7 Steps of Machine Learning
1. Gathering Data
2. Preparing the Data
3. Choosing a Model
4. Training
5. Evaluation
6. Hyperparameter Tuning
7. Prediction
1. Gathering Data :
3. Data Sources:
Data can be obtained from a wide range of sources, and the choice of
sources depends on the problem domain and availability of data. Some
common data sources include:
a. Public Datasets: There are numerous publicly available datasets on
platforms like Kaggle, UCI Machine Learning Repository, or government
data portals. These datasets cover various domains and can provide a
starting point for many machine learning projects.
Ensuring data quality is crucial for effective machine learning. Here are
some considerations for data quality:
2. Data Cleaning:
3. Choosing a model :
d. Data Size and Dimensionality: Take into account the size of the
dataset and the number of features (dimensionality). Some algorithms
may work well with small datasets, while others require large amounts
of data for effective training. Similarly, some algorithms handle high-
dimensional data better than others.
a. Data Preparation: The labeled dataset is divided into two subsets: the
training set and the validation set. The training set is used to teach the model,
while the validation set is used to assess the model's performance during
training.
c. Forward Propagation: The training examples are fed into the model, and
their input features are processed through the model's layers or components.
The model produces predicted outputs for each example based on its current
parameters.
c. Model Fitting: The model is applied to the unlabeled data, and it learns
patterns or structures in the data without the use of labeled examples. The model
identifies clusters, associations, or latent variables that capture the underlying
patterns.
4. Considerations in Training:
When training machine learning models, several considerations should be taken
into account:
5. Training Challenges:
Training machine learning models can be challenging due to various factors:
6. Evaluation in Practice:
In practice, evaluation is an iterative process that involves multiple
iterations of model training, evaluation, and refinement. Here is a high-
level overview of the evaluation process:
a. Split the Data: Divide the available data into training and testing
datasets using appropriate techniques like holdout, cross-validation, or
time-series splits.
b. Train the Model: Train the machine learning model on the training data
using the chosen algorithm and hyperparameters.
1. Introduction to Hyperparameters:
In machine learning, hyperparameters are settings that are not learned from the
data but are defined by the user or machine learning engineer. They define the
behavior and characteristics of the model and can significantly impact its
performance. Some common examples of hyperparameters include learning rate,
regularization strength, number of hidden layers, number of decision tree nodes,
etc.
4. Evaluation Metrics:
To perform hyperparameter tuning effectively, it is crucial to define appropriate
evaluation metrics. The choice of metric depends on the specific problem and the
objective of the machine learning task.
Common evaluation metrics include accuracy, precision, recall, F1-score, mean
squared
error, mean absolute error, or area under the ROC curve. The evaluation metric
guides the selection of hyperparameters during the tuning process.
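As a brief illustration, these metrics can be computed directly with scikit-learn; the labels below are made up purely to show the calls:

```python
# Computing common classification metrics with scikit-learn (toy labels assumed)
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [0, 1, 1, 0, 1, 1, 0, 0]
y_pred = [0, 1, 0, 0, 1, 1, 1, 0]

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1-score :", f1_score(y_true, y_pred))
```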
5. Cross-Validation:
Cross-validation is a technique used to estimate the performance of a machine
learning model on unseen data. It is often employed during hyperparameter
tuning to assess the model's generalization capability. Cross-validation involves
dividing the available data into multiple subsets (folds), training the model on a
subset, and evaluating it on the remaining fold. This process is repeated for each
fold, and the average performance across all folds is used as an estimate of the
model's performance. Cross-validation helps in reducing the risk of overfitting
during hyperparameter tuning.
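A minimal sketch of k-fold cross-validation with scikit-learn; the iris dataset and logistic regression model here are illustrative stand-ins:

```python
# 5-fold cross-validation: the mean score estimates generalization performance
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

scores = cross_val_score(model, X, y, cv=5)
print(scores.mean(), scores.std())
```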
e. Use automated tools and libraries: Several libraries and frameworks, such
as scikit-learn, TensorFlow, or Keras, provide built-in support for
hyperparameter tuning. These tools offer convenient functions and classes to
automate the tuning process, making it easier to explore different
hyperparameter configurations efficiently.
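For example, scikit-learn's GridSearchCV automates this search; the model choice and parameter grid below are illustrative assumptions, not part of the original text:

```python
# Grid search over SVM hyperparameters with 5-fold cross-validation
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
param_grid = {'C': [0.1, 1, 10], 'gamma': ['scale', 0.01, 0.1]}

search = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5, scoring='accuracy')
search.fit(X, y)
print(search.best_params_, search.best_score_)
```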
8. Challenges and Limitations:
Hyperparameter tuning is a complex task that can be challenging and time-
consuming. Some challenges and limitations include:
d. Data Sensitivity: Hyperparameter values that work well for one dataset
may not generalize well to other datasets. The optimal set of hyperparameters
can be sensitive to the characteristics of the data, making it essential to validate
the performance on multiple datasets.
d. Model Training: The selected model is trained using the labeled training
data. This involves optimizing the model's parameters or weights to minimize
the difference between predicted and actual values, using techniques like
gradient descent, maximum likelihood estimation, or backpropagation.
3. Prediction Techniques:
Machine learning models employ various techniques to make predictions.
Some common prediction techniques include:
4. Prediction Evaluation:
Evaluating the accuracy and reliability of predictions is crucial in
machine learning. Common evaluation techniques include:
c. Data Drift and Concept Change: Prediction models may become less
accurate over time if the underlying patterns or relationships change.
Continuous monitoring and adaptation of models are necessary to handle
data drift and concept change.
e. Ethical and Legal Considerations: Predictive models may raise ethical and
legal concerns related to privacy, bias, fairness, or discrimination. Ensuring
transparency, fairness, and responsible use of predictions is essential.
Data Preparation: The labeled dataset is divided into two parts - the
input features (independent variables) and the corresponding class
labels (dependent variables). The features are extracted or selected
based on the problem domain.
Model Deployment: Once the model has been trained and evaluated, it
can be deployed to make predictions on new, unlabeled instances. The
input features of these instances are fed into the model, which assigns
them to the predicted class or category.
Logistic Regression
Naïve Bayes
K-Nearest Neighbors
Decision Tree
Support Vector Machines
Random Forest
2. Regression :
Linear Regression
Decision Tree Regressor
K Nearest Neighbor Regressor
Random Forest Regressor
Neural Networks
Linear Regression in Machine Learning
y = mx + b
where:
- y is the predicted value or target variable,
- x is the input feature,
- m is the slope or coefficient, indicating the change in y for each unit change
in x,
- b is the y-intercept, representing the value of y when x is 0.
With multiple input features, the equation extends to:
y = b0 + b1x1 + b2x2 + ... + bnxn
where:
- y is the predicted value or target variable,
- xi represents the i-th input feature,
- bi is the coefficient corresponding to the i-th input feature,
- b0 is the y-intercept.
Linear regression is widely used in various domains, including
economics, finance, social sciences, and engineering. It provides a
simple and interpretable approach for understanding the relationship
between variables and making predictions.
Examples:
Linear regression is a widely used technique in daily life for various purposes.
Here's an example of how linear regression can be applied in a real-life
scenario:
By applying linear regression to this data, you can build a model that estimates
the electricity consumption based on the temperature. The model will find the
best-fitting line that represents the relationship between temperature and
electricity usage.
Once the linear regression model is trained and validated, you can use it to
make predictions. For example, if you have the forecasted temperature for the
next day, you can input that temperature into the model and obtain an estimate
of the expected electricity consumption for that day. This prediction can help
you plan your energy usage, anticipate costs, or optimize energy efficiency.
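A minimal sketch of this temperature-versus-consumption idea with scikit-learn; the numbers below are invented for illustration only:

```python
# Fit a simple linear regression y = mx + b to temperature vs. electricity usage
import numpy as np
from sklearn.linear_model import LinearRegression

temperature = np.array([[20], [24], [28], [32], [36]])   # input feature x (degrees)
consumption = np.array([10.0, 12.5, 15.5, 19.0, 23.0])   # target y (kWh, made up)

model = LinearRegression()
model.fit(temperature, consumption)

print(model.coef_[0], model.intercept_)   # slope m and intercept b
print(model.predict([[30]]))              # estimated consumption for a 30-degree forecast
```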
Linear regression can be beneficial in various other daily-life scenarios as well.
Logistic Regression
Logistic regression models the probability that an instance belongs to the positive class
using the sigmoid (logistic) function:
p = 1 / (1 + e^(-(β0 + β1x1 + β2x2 + ... + βnxn)))
where:
- p is the predicted probability of belonging to the positive class,
- β0, β1, β2, ..., βn are the coefficients or weights associated
with each input feature (x1, x2, ..., xn).
During the training process, the logistic regression model learns the
optimal values of the coefficients that maximize the likelihood of the
observed data. This optimization is typically achieved using
algorithms like gradient descent or maximum likelihood estimation.
Logistic regression has several advantages:
1. Data Preparation: We have a labeled dataset with the exam scores and
admission decisions. We split the data into two parts: the input features (Exam
1 and Exam 2) and the target variable (admission decision).
4. Model Deployment: Once the model has been trained and evaluated,
it can be used to make predictions on new, unseen instances. For example,
given the exam scores of a new student, the logistic regression model can
estimate the probability of admission.
The logistic regression model uses a logistic function (sigmoid function) to
convert the linear combination of the input features and coefficients into a
probability score between 0 and 1. This probability score represents the likelihood
of the student being admitted. By setting a threshold (e.g., 0.5), we can classify the
instances into the two classes (admitted or not admitted) based on the predicted
probabilities.
In the context of this example, logistic regression would estimate the coefficients
for Exam 1 and Exam 2, along with the intercept. The model would learn the
relationship between the exam scores and the probability of admission, allowing
us to predict whether a student will be admitted based on their exam
performance.
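A minimal sketch of this admission example using scikit-learn; the exam scores and labels are invented for illustration:

```python
# Logistic regression on two exam scores; 1 = admitted, 0 = not admitted
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[45, 50], [60, 65], [75, 80], [85, 90], [30, 40], [90, 95]])
y = np.array([0, 0, 1, 1, 0, 1])

model = LogisticRegression()
model.fit(X, y)

new_student = [[70, 75]]
print(model.predict_proba(new_student)[0, 1])   # estimated probability of admission
print(model.predict(new_student))               # class after the default 0.5 threshold
```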
Decision Tree Classification Algorithm in Machine Learning
Examples :
Let's say you work for a bank, and your task is to determine whether
a loan applicant is likely to default or not based on certain attributes.
You have a dataset that includes information about previous loan
applicants, such as their age, income, credit score, and employment
status, along with the information on whether they defaulted on their
loan or not.
You can use the Decision Tree Classification algorithm to build a model
that predicts the likelihood of loan default based on these attributes.
Here's how the process would look:
The decision tree model will create a tree-like structure where each
internal node represents a decision based on an attribute, and each leaf
node represents a prediction (loan default or not). By traversing the
tree based on the applicant's attribute values, you can determine the
final prediction.
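A minimal sketch of the loan-default example with scikit-learn's decision tree; the applicant records are invented for illustration:

```python
# Decision tree on a tiny, made-up loan dataset (1 = defaulted)
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

data = pd.DataFrame({
    'age':          [25, 40, 35, 50, 23, 45],
    'income':       [30000, 80000, 50000, 90000, 25000, 60000],
    'credit_score': [580, 720, 650, 780, 560, 700],
    'default':      [1, 0, 0, 0, 1, 0],
})

X = data.drop(columns='default')
y = data['default']

tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X, y)

new_applicant = pd.DataFrame([[30, 40000, 600]], columns=X.columns)
print(tree.predict(new_applicant))   # predicted default / no default for a new applicant
```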
Random Forests are widely used across various domains and applications,
including finance, healthcare, marketing, and image recognition. The
algorithm's flexibility, robustness, and interpretability make it a popular choice
for both regression and classification tasks, especially when dealing with
complex and noisy data.
Example :
You can use the Random Forest algorithm to build a predictive model
that determines the likelihood of a customer making a purchase. Here's
how the process would look:
In the e-commerce example, the Random Forest model can help identify
potential customers who are likely to make a purchase. This information
can be used for targeted marketing campaigns, personalized
recommendations, or improving the overall customer experience.
It's important to note that Random Forests have hyperparameters that can
be tuned to optimize the model's performance, such as the number of trees
in the forest, the maximum depth of each tree, or the number of features
considered for each split. Hyperparameter tuning is crucial to avoid
overfitting or underfitting the data and to achieve the best possible
performance.
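A minimal Random Forest sketch; a synthetic dataset stands in for the e-commerce purchase data described above, and the hyperparameter values are illustrative:

```python
# Random Forest classification with a couple of the hyperparameters mentioned above
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

forest = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42)
forest.fit(X_train, y_train)
print(accuracy_score(y_test, forest.predict(X_test)))
```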
Support Vector Machine Algorithm
The Support Vector Machine (SVM) algorithm is a powerful and widely used
supervised machine learning algorithm for classification and regression tasks.
SVMs are particularly effective in scenarios where the data has clear class
separations or when dealing with high-dimensional data.
2. Model Training: Apply the SVM algorithm to the labeled training data.
The algorithm finds the optimal hyperplane by solving an optimization
problem. It aims to maximize the margin while minimizing classification
errors or regression residuals.
4. Model Deployment: Once the model has been trained and evaluated, it
can be used to make predictions on new, unseen data. The input features of
these instances are mapped into the feature space, and based on their position
relative to the hyperplane, they are classified into different classes or their
values are predicted for regression.
SVMs offer several advantages:
- Robust to outliers: SVMs are less sensitive to outliers in the data due to
the margin maximization objective.
SVMs are widely used in various domains such as image classification, text
analysis, bioinformatics, and finance. Choosing the appropriate kernel
function and tuning the hyperparameters, such as the regularization
parameter (C) and kernel-specific parameters (e.g., gamma for RBF), is
crucial for optimizing SVM performance for a given problem.
Example :
Here's an example of how the Support Vector Machine (SVM) algorithm can
be applied in a classification problem:
Let's say you are working on a project to classify emails as either spam or not
spam. You have a labeled dataset that includes various features extracted from
the emails, such as the frequency of certain words, presence of specific
patterns, or email metadata.
You can use the SVM algorithm to build a model that can classify incoming
emails as spam or not spam. Here's how the process would look:
2. Model Training: Apply the SVM algorithm to the labeled training data.
The algorithm will find the optimal hyperplane that separates the two classes by
maximizing the margin between them. The SVM algorithm will determine the
support vectors, which are the data points that lie closest to the decision
boundary.
4. Model Deployment: Once the model has been trained and evaluated,
it can be deployed to classify new, unseen emails. The input features of
these emails are fed into the model, and it predicts whether the email is
likely to be spam or not spam based on its position relative to the decision
boundary.
The SVM algorithm works by transforming the input features into a higher-
dimensional space using a kernel function, such as the radial basis function
(RBF) kernel. This allows the algorithm to handle nonlinear decision
boundaries and capture complex relationships between the features and the
target variable.
In the case of email classification, the SVM model will learn to differentiate
spam emails from non-spam emails based on the patterns and frequencies of
words or other features in the email content. By using support vectors, the
SVM model focuses on the most informative data points to make accurate
predictions.
SVMs are known for their ability to handle high-dimensional data and their
robustness to outliers. They have been successfully applied in various domains,
including text classification, image recognition, bioinformatics, and more.
It's important to note that SVMs have hyperparameters that need to be tuned,
such as the regularization parameter (C) and the kernel parameters (e.g.,
gamma for RBF kernel), to achieve the best performance for a specific
problem. Cross-validation or grid search can be used to find the optimal
hyperparameters for the SVM model.
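A minimal SVM sketch; a built-in numeric dataset stands in for the email features described above, and the C and gamma values are illustrative:

```python
# RBF-kernel SVM with feature scaling (scaling usually matters for SVMs)
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

scaler = StandardScaler().fit(X_train)
clf = SVC(kernel='rbf', C=1.0, gamma='scale')
clf.fit(scaler.transform(X_train), y_train)
print(accuracy_score(y_test, clf.predict(scaler.transform(X_test))))
```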
Neural Network Algorithms
Here are some common neural network algorithms used in machine learning:
The neural network algorithm, in this case, MLP, learns to recognize the
distinctive patterns and features of different digits through the training process.
By adjusting the weights and biases of the network, the model can make
accurate predictions on unseen digit images.
It's important to note that the performance of the neural network can be
influenced by various factors such as the architecture of the network, the
choice of activation functions, the number of hidden layers and neurons, and
the optimization algorithm used for training.
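A minimal sketch of digit recognition with a multilayer perceptron, using scikit-learn's small built-in digits dataset; the architecture is an illustrative choice:

```python
# MLP classifier on 8x8 handwritten digit images
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

mlp = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=1)
mlp.fit(X_train, y_train)
print(mlp.score(X_test, y_test))   # accuracy on unseen digit images
```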
Let's say you work for a retail company, and you have a large dataset
containing customer purchase history. You want to gain insights into
customer behavior and segment your customers into distinct groups for
targeted marketing strategies.
You can use unsupervised learning techniques to achieve this. Here's how
the process might look:
• Dimension Reduction
• Cluster Analysis
1. Clustering:
Clustering algorithms analyze the input data and assign data points to
different clusters based on similarity measures. The similarity between data
points is determined by various distance metrics, such as Euclidean
distance or cosine similarity. The objective is to minimize the intra-cluster
distance (distance between data points within the same cluster) and
maximize the inter-cluster distance (distance between data points in
different clusters).
It's important to note that the choice of clustering algorithm and parameters
depends on the nature of the data and the desired clustering objectives.
Additionally, evaluating the quality of clusters can be subjective and may
require domain expertise and validation measures such as silhouette score
or cohesion and separation metrics.
Example:
It's important to note that the choice of clustering algorithm and the
number of clusters can significantly impact the results. Additionally, the
interpretation of the clusters and their characteristics requires domain
expertise and further analysis.
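A minimal clustering sketch with K-means and a silhouette score; synthetic blobs stand in for the customer purchase data described above:

```python
# K-means clustering with a silhouette score as a rough quality check
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)
print(silhouette_score(X, labels))   # closer to 1 means better-separated clusters
```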
The most well-known algorithm for association rule learning is the Apriori
algorithm. Here's a general overview of how association rule learning
works:
Association rule learning is not limited to market basket analysis and can
be applied to other domains as well, such as web usage mining, healthcare,
and customer behavior analysis. It allows for the discovery of interesting
and actionable patterns in large datasets, providing valuable insights for
decision-making and improving business strategies.
Example:
Imagine you work for an online retailer, and you want to analyze the
purchasing patterns of your customers to identify associations or
relationships between the products they buy. This information can help
you understand customer preferences, optimize product placement, and
make personalized recommendations.
It's important to note that association rule mining may generate a large
number of rules, and some rules may be trivial or irrelevant. Careful
selection of support and confidence thresholds, as well as careful
evaluation and interpretation of the rules, is crucial for obtaining
meaningful and actionable insights.
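A minimal market-basket sketch using the Apriori implementation in the third-party mlxtend library (installable with `pip install mlxtend`); the transactions and thresholds are invented for illustration:

```python
# Frequent itemsets and association rules with Apriori (mlxtend)
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

transactions = [['bread', 'milk'], ['bread', 'butter'], ['milk', 'butter'],
                ['bread', 'milk', 'butter'], ['bread', 'milk']]

te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)

frequent = apriori(onehot, min_support=0.4, use_colnames=True)
rules = association_rules(frequent, metric='confidence', min_threshold=0.6)
print(rules[['antecedents', 'consequents', 'support', 'confidence']])
```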
Dimensionality Reduction
1. K-means clustering
2. KNN (K-Nearest Neighbours)
3. Hierarchical clustering
4. Anomaly detection
5. Neural Networks
6. Principal Component Analysis
7. Independent Component Analysis
8. Apriori Algorithm
9. Singular Value Decomposition
Principal Component Analysis
3. Noise Reduction: PCA can help remove noise or redundant features from
the dataset, as the lower- dimensional representation focuses on the most
informative features.
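A minimal PCA sketch with scikit-learn, reducing the 4-feature iris data to 2 principal components:

```python
# PCA: project the data onto its first two principal components
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)                 # (150, 2)
print(pca.explained_variance_ratio_)   # variance captured by each component
```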
4. Reward: The feedback or evaluation signal that the agent receives from the
environment after taking an action. The reward indicates the desirability or
quality of the agent's action. The agent's objective is to maximize the
cumulative reward over time.
Reinforcement learning algorithms use a trial-and-error approach to learn the optimal
policy. The agent explores different actions in the environment, receives rewards,
and updates its policy based on the received feedback. Common algorithms in
reinforcement learning include Q-learning, Deep Q-Networks (DQN), and policy
gradient methods.
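As a rough illustration of the trial-and-error idea, here is a tiny tabular Q-learning sketch on a made-up one-dimensional corridor environment (the environment, rewards, and parameter values are all invented for illustration):

```python
# Tabular Q-learning on a 5-state corridor; state 4 is the goal
import numpy as np

n_states, n_actions = 5, 2            # actions: 0 = left, 1 = right
alpha, gamma, epsilon = 0.1, 0.9, 0.2
Q = np.zeros((n_states, n_actions))
rng = np.random.default_rng(0)

for episode in range(500):
    state = 0
    while state != 4:
        # epsilon-greedy action selection
        action = rng.integers(n_actions) if rng.random() < epsilon else int(np.argmax(Q[state]))
        next_state = min(state + 1, 4) if action == 1 else max(state - 1, 0)
        reward = 1.0 if next_state == 4 else 0.0
        # Q-learning update rule
        Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
        state = next_state

print(np.argmax(Q, axis=1))   # learned greedy policy (mostly "right")
```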
3.Write your code: In the notebook, you'll see a code cell with a prompt
(`In [ ]:`) where you can write your code. Colab supports multiple
programming languages, including Python, so you can write code in the
selected language.
4.Run code cells: To run a code cell, click on the cell and either press the
play button on the left side of the cell or use the keyboard shortcut
Shift+Enter. Colab will execute the code in the cell and display the output
below it.
5.Add more code cells: To add more code cells, click on the "+" button
on the toolbar or use the keyboard shortcut Ctrl+M B (for adding a cell
below) or Ctrl+M A (for adding a cell above). You can then write code in
the new cells and execute them as mentioned in step 4.
6.Install libraries: If your code requires additional libraries or packages that
are not already installed in the Colab environment, you can install them
using the `!pip install` command in a code cell. For example, `!pip install
pandas` will install the Pandas library.
8.GPU and TPU usage: Google Colab provides free access to GPUs and
TPUs for running code that requires more computational power. You can
enable GPU or TPU acceleration by clicking on "Runtime" in the menu,
selecting "Change runtime type," and choosing the desired accelerator
under "Hardware Accelerator."
With these steps, you can open, write code, execute, and manage
notebooks in Google Colab. It provides a convenient web-based
environment for running programs, experimenting with machine learning
models, and collaborating with others.
Basic Datatypes of Python :
Python has several built-in data types that are commonly used for
representing different kinds of data. The basic data types in Python
include:
6. Tuples: Tuples are similar to lists but are immutable, meaning they
cannot be modified after creation. They are enclosed within parentheses
( ). For example: `(1, 2, 3)`, `('red', 'green', 'blue')`.
7. Dictionaries: Dictionaries are unordered collections of key-value
pairs enclosed within curly braces { }. Each key is unique and
associated with a value. For example: `{'name': 'John', 'age': 25, 'city':
'New York'}`.
These basic data types can be combined and manipulated to represent complex
data structures and solve a wide range of programming problems in Python.
Additionally, Python also provides various built-in functions and methods
to work with these data types efficiently.
Type conversion is the process of converting a data type into another data type.
• "Assign the integer value 10 to the variable x, then convert x to a
float and assign the result to the variable y. Finally, print the value
of y, which is 10.0."
• "Assign the floating-point value 10.2 to the variable x, then
convert x to an integer and assign the result to the variable y.
Finally, print the value of y, which is 10."
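The two conversions described above, written out as a short sketch:

```python
x = 10            # integer
y = float(x)      # convert to float
print(y)          # 10.0

x = 10.2          # floating-point value
y = int(x)        # convert to integer (the fractional part is discarded)
print(y)          # 10
```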
Boolean DataType
The Python Boolean type is one of Python's built-in data types. It's used
to represent the truth value of an expression.
"Assign the Boolean value True to the variable a, then print the
value of a, which is True. Finally, determine the data type of a,
which is a Boolean."
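The Boolean example described above, as code:

```python
a = True
print(a)          # True
print(type(a))    # <class 'bool'>
```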
Input Function
In Python, the input() function is used to prompt the user for input
from the keyboard. It allows you to interact with the user and obtain
values that can be stored in variables or used in your program's
logic.
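A small input() sketch; note that input() always returns a string, so numeric input must be converted explicitly:

```python
name = input("Enter your name: ")
age = int(input("Enter your age: "))   # convert the string to an integer
print(f"Hello {name}, you are {age} years old.")
```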
Types of objects in Python
• Immutable:-
int, float, string, boolean, tuple
• Mutable:-
list, set, dictionary
List :
2. Ordered: The elements in a list are ordered and maintain their position.
The order in which elements are added to the list is preserved.
4. Indexing and Slicing: Elements in a list are accessed using their index,
which starts from 0 for the first element. Negative indexing is also
supported to access elements from the end of the list. Slicing allows you to
retrieve a sublist by specifying a range of indices.
7. Iterable: Lists are iterable, meaning you can loop over the elements of
a list using loops like `for` or `while`.
Lists are widely used in Python for various purposes, such as storing
collections of data, implementing stacks and queues, representing
sequences, and working with data that needs to be modified or accessed in
a specific order.
• "Create a list containing the values 1, 2.5, 'venkat', and True, and
assign it to the variable a. Then, print the contents of a, which is [1,
2.5, 'venkat', True]. Finally, determine the data type of a, which
is a list."
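The list example described above, as code:

```python
a = [1, 2.5, 'venkat', True]
print(a)          # [1, 2.5, 'venkat', True]
print(type(a))    # <class 'list'>
```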
Set :
1. Unique Elements: Sets only contain unique elements. If you try to add a
duplicate element to a set, it will be ignored.
2. Unordered: The elements in a set are unordered, which means they are not
stored in any particular order. Therefore, you cannot access elements in a set by
their index.
3. Mutable: Sets are mutable, so you can add or remove elements from them.
5. Set Membership Test: You can quickly check whether an element is present
in a set using the `in` operator.
Sets are commonly used in scenarios where you want to store a collection of
unique elements and perform operations like finding common elements,
removing duplicates, or testing membership efficiently.
To create a set, you can use either curly braces { } or the `set()` constructor.
- Adding elements: Use the `add()` method to add a single element to a set, or
use the `update()` method to add multiple elements.
- Removing elements: Use the `remove()` or `discard()` method to remove a
specific element from a set. The `discard()` method does not raise an error if the
element is not found, while the `remove()` method does.
- Length of a set: Use the `len()` function to get the number of elements in a set.
Sets provide a powerful and efficient way to work with unique elements and
perform set operations in Python.
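A short sketch of the set operations described above:

```python
s = {1, 2, 3}
s.add(4)              # add a single element
s.update([5, 6])      # add multiple elements
s.discard(10)         # no error even though 10 is not in the set
s.remove(2)           # raises KeyError if the element is missing
print(3 in s)         # membership test -> True
print(len(s))         # number of elements -> 5
```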
Dictionary :
- Keys: Keys in a dictionary are unique and immutable, meaning they cannot be
changed once created. Common key types include strings, numbers, or tuples.
- Values: Values in a dictionary can be of any data type, such as numbers,
strings, lists, or even other dictionaries.
- Creation: Dictionaries can be created using curly braces { } and key-value
pairs, or by using the built-in `dict()` constructor function.
- Accessing values: Values in a dictionary can be accessed by providing the
corresponding key in square brackets [ ].
- Modifying values: You can modify the value associated with a key by
assigning a new value to that key.
- Adding new key-value pairs: You can add new key-value pairs to a dictionary
by assigning a value to a new key.
- Removing key-value pairs: Key-value pairs can be removed from a dictionary
using the `del` statement or the `pop()` method.
- Checking for key existence: You can check if a key exists in a dictionary
using the `in` keyword.
- Length of a dictionary: The number of key-value pairs in a dictionary can be
obtained using the `len()` function.
- Iterating over a dictionary: You can iterate over the keys or values of a
dictionary using loops or the `keys()`, `values()`, or `items()` methods.
Dictionaries are useful when you need to store and retrieve data based on
meaningful keys rather than numerical indices. They provide a flexible and
efficient way to organize and manipulate data in Python programs.
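A short sketch of the dictionary operations described above:

```python
person = {'name': 'John', 'age': 25, 'city': 'New York'}
print(person['name'])          # accessing a value by key
person['age'] = 26             # modifying a value
person['country'] = 'USA'      # adding a new key-value pair
del person['city']             # removing a key-value pair
print('name' in person)        # checking for key existence -> True
print(len(person))             # number of key-value pairs -> 3
for key, value in person.items():
    print(key, value)          # iterating over key-value pairs
```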
Tuple :
2. Ordered: Like lists, tuples maintain the order of their elements. The position of
an element in a tuple can be determined using indexing.
Tuples are commonly used in situations where you want to ensure that the data
remains unchanged. For example, you might use a tuple to represent coordinates
(x, y) in a 2D plane, or to store the RGB values of a color (red, green, blue).
Tuples can be used in various ways, such as returning multiple values from a
function, as keys in dictionaries, or as elements in sets. They can also be
unpacked, allowing you to assign the elements of a tuple to separate variables.
While tuples are immutable, you can perform operations that don't modify the
tuple itself, such as indexing, slicing, or combining tuples using the `+` operator.
However, any operation that attempts to modify a tuple will result in a TypeError.
Overall, tuples offer a lightweight and efficient way to store and access data when
immutability is desired.
Conversion of List to Tuple :
2. Iterate over the list: Loop through each element of the list.
3. Add elements to the tuple: For each element in the list, add it to a new
tuple using the `+=` operator (which builds a new tuple each time), or simply
convert the whole list at once with the `tuple()` constructor.
4. Complete the conversion: Once all the elements from the list have been
added to the tuple, the conversion is complete.
The resulting tuple will contain the same elements as the original list but in
an immutable format.
Note that when converting a list to a tuple, the elements themselves are not
modified or cloned. Instead, a new tuple object is created with the same
elements as the list, but in an immutable format.
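The conversion itself is usually a single call to tuple():

```python
numbers = [1, 2, 3, 4]
as_tuple = tuple(numbers)   # new immutable tuple with the same elements
print(as_tuple)             # (1, 2, 3, 4)
print(type(as_tuple))       # <class 'tuple'>
```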
Operators in Python
Arithmetic Operators :
These arithmetic operators can be used with numeric data types such as
integers and floats. They allow you to perform mathematical calculations and
manipulate numerical values in Python programs.
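The standard arithmetic operators in action:

```python
a, b = 7, 3
print(a + b)    # 10   addition
print(a - b)    # 4    subtraction
print(a * b)    # 21   multiplication
print(a / b)    # 2.333... true division
print(a // b)   # 2    floor division
print(a % b)    # 1    modulus (remainder)
print(a ** b)   # 343  exponentiation
```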
Assignment Operators :
1. = (Equal): The equal sign assigns the value on the right to the
variable onthe left. For example, `x = 5` assigns the value 5 to the
variable `x`.
2. += (Add and assign): The plus-equal operator adds the value on the
right tothe current value of the variable and assigns the result back to
the variable. For example, `x += 2` is equivalent to `x = x + 2`.
10. >>= (Right shift and assign): The right shift-equal operator performs
a bitwise right shift operation on the current value of the variable,
assigning theresult back to the variable. For example, `x >>= 2` is
equivalent to `x = x >> 2`.
11. &= (Bitwise AND and assign): The bitwise AND-equal operator
performs a bitwise AND operation between the current value of the
variable and the value on the right, assigning the result back to the
variable. For example, `x &= 3` is equivalent to `x = x & 3`.
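A short sketch of the assignment operators described above:

```python
x = 5       # plain assignment
x += 2      # x is now 7
x >>= 1     # bitwise right shift: x is now 3
x &= 3      # bitwise AND: x is still 3
print(x)    # 3
```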
Comparison operators in Python are used to compare the relationship between two
values or expressions. They return a boolean value (`True` or `False`) based on
the comparison result. Here are the comparison operators in Python:
2. Not equal to (`!=`): Checks if the values of two operands are not equal.
3. Greater than (`>`): Checks if the left operand is greater than the right operand.
4. Less than (`<`): Checks if the left operand is less than the right operand.
5. Greater than or equal to (`>=`): Checks if the left operand is greater than or
equal to the right operand.
6. Less than or equal to (`<=`): Checks if the left operand is less than or equal to
the right operand.
These comparison operators are commonly used in conditions and control flow
statements to make decisions based on the comparison results. For example, you
can use comparison operators to determine if a number is greater than another, if
two strings are equal, or if a condition is true or false.
It's important to note that comparison operators can be used with various data
types, including numbers (integers, floats), strings, and other objects that support
comparison operations. The result of a comparison operation is always a boolean
value, either `True` or `False`, indicating the result of the comparison.
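A short sketch of the comparison operators returning booleans:

```python
a, b = 5, 8
print(a == b)   # False
print(a != b)   # True
print(a > b)    # False
print(a < b)    # True
print(a >= 5)   # True
print(b <= 7)   # False
```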
Logical Operators :
2. `or`: The `or` operator returns True if at least one of the operands or
expressions on either side of it evaluates to True. If both operands evaluate to
False, it returns False. It performs a logical OR operation.
3. `not`: The `not` operator is a unary operator that returns the opposite boolean
value of the operand. If the operand is True, `not` returns False. If the operand
is False, `not` returns True.
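A short sketch of the logical operators:

```python
x = 7
print(x > 5 and x < 10)   # True: both conditions hold
print(x < 5 or x == 7)    # True: at least one condition holds
print(not x > 5)          # False: negates a True expression
```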
In Python, identity operators are used to compare the identity or memory location
of two objects. These operators determine if two objects are the same object or if
they refer to different objects in memory. The identity operators in Python are:
1. `is` operator: The `is` operator checks if two objects refer to the same memory
location. It evaluates to `True` if the objects are the same, and `False` otherwise.
2. `is not` operator: The `is not` operator checks if two objects refer to different
memory locations. It evaluates to `True` if the objects are different, and `False` if
they refer to the same memory location.
Identity operators are useful when you want to compare whether two variables or
objects refer to the same underlying memory location, rather than comparing their
values or content. These operators are often used with mutable objects like lists or
dictionaries to determine if they have been modified or updated.
It's important to note that the `is` and `is not` operators check for identity, not
equality. Even if two objects have the same value or content, they might not refer
to the same memory location and will be considered different by the identity
operators.
Identity operators are particularly useful in cases where you want to explicitly
check if two variables or objects are the same instance, rather than relying on their
values. However, in most cases, when comparing values or content, it is more
appropriate to use equality operators (`==` and `!=`) instead of identity operators.
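A short sketch contrasting identity (`is`) with equality (`==`):

```python
a = [1, 2, 3]
b = a              # b refers to the same list object as a
c = [1, 2, 3]      # c is a different object with equal content
print(a is b)      # True  - same object in memory
print(a is c)      # False - different objects
print(a == c)      # True  - equal values
print(a is not c)  # True
```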
Membership Operators :
2. `not in` operator: The `not in` operator checks if a value does not exist in
a sequence or container and returns `True` if the value is not found, and
`False` if it is found.
Membership operators are commonly used with data structures like strings,
lists, tuples, sets, and dictionaries to check for the presence or absence of
specific elements.
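A short sketch of the membership operators with different containers:

```python
print('py' in 'python')                        # True
print(3 in [1, 2, 3])                          # True
print('z' not in {'x', 'y'})                   # True
print('age' in {'name': 'John', 'age': 25})    # True - checks the keys
```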
To use NumPy in your Python code, you can start by installing it using a
package manager like pip:
```
pip install numpy
```
After installing NumPy, you can import it into your Python script or
interactive session using the following import statement:
```python
import numpy as np
```
3. Fixed Size: NumPy arrays have a fixed size upon creation. Once created,
the size (shape) of the array cannot be changed. However, you can create new
arrays with different shapes or reshape the existing array to change its
dimensions.
```python
import numpy as np
arr = np.array([1, 2, 3])  # array-creation line reconstructed; the original example is not shown
```
Once created, you can access elements of a NumPy array using indexing,
similar to regular Python lists. You can also perform various operations on the
arrays, such as element-wise arithmetic, array slicing, reshaping, and applying
mathematical functions.
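A short sketch of those array operations:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])
print(a[0], a[1:4])             # indexing and slicing
print(a * 2 + 1)                # element-wise arithmetic
print(a.reshape(2, 3))          # reshaping to a 2x3 array
print(np.mean(a), np.sqrt(a))   # mathematical functions
```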
NumPy arrays are widely used in scientific computing, data analysis, machine
learning, and other fields due to their efficiency and versatility in handling
large numerical datasets. They form the foundation for many other libraries
and tools in the Python scientific computing ecosystem.
Complete Pandas Tutorial for ML
4. Data Input and Output: Pandas supports reading and writing data from
various file formats, including CSV, Excel, SQL databases, JSON, and
more. It simplifies the process of importing data from external sources and
exporting data for further analysis or sharing.
5. Time Series Analysis: Pandas includes functionality for working with
time series data, which is essential in finance, economics, and other
domains. It offers tools to handle time series indexing, resampling,
frequency conversion, time shifting, and rolling window calculations.
```
pip install pandas
```
```python
import pandas as pd
```
With Pandas imported, you can create, manipulate, and analyze data using the
rich set of functions and methods provided by the library.
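For example, a small DataFrame can be created and explored like this (the records are invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Asha', 'Ravi', 'Meena'],
    'age': [25, 32, 28],
    'city': ['Delhi', 'Pune', 'Chennai'],
})
print(df.head())             # first rows
print(df['age'].mean())      # column statistics
print(df[df['age'] > 26])    # filtering rows
```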
Matplotlib
Matplotlib is a popular Python library used for creating visualizations and plots. It
provides a wide range of functions and tools for generating high-quality graphs,
charts, and figures from data.
4. Subplots and Multiple Axes: Matplotlib enables the creation of multiple plots
within a single figure using subplots. It allows you to arrange multiple plots in a
grid or in any custom layout. Additionally, you can add multiple axes within a
single plot, enabling the visualization of different datasets or additional
information in the same figure.
6. Interactive Plots: Matplotlib can be used with interactive backends, such as the
Jupyter Notebook, IPython, or GUI toolkits like Qt or Tkinter. These backends
enable you to interact with plots, zoom in/out, pan, and dynamically update the
displayed data.
```
pip install matplotlib
```
After installing, you can import Matplotlib in your Python script or interactive
session using:
```python
import matplotlib.pyplot as plt
```
With Matplotlib imported, you can start creating plots and visualizations using the
various functions and methods provided by the library.
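A minimal line plot, for example:

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

plt.plot(x, y, marker='o')
plt.xlabel('x')
plt.ylabel('y = x squared')
plt.title('A simple line plot')
plt.show()
```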
Importing data through Kaggle
```
pip install kaggle
```
3. Kaggle API Token: To authenticate and access Kaggle datasets, you'll need
an API token. Go to your Kaggle account settings and navigate to the "API"
section. Click on the "Create New API Token" button. This will download a
JSON file containing your API credentials.
4. Store Kaggle API Token: Save the downloaded JSON file (typically named
"kaggle.json") in a secure location on your local machine. This file contains
your Kaggle username and API key, which grants access to Kaggle's API.
5. Set Kaggle API Environment Variables: To securely store your Kaggle API
credentials, you can set them as environment variables on your local machine.
Open your command-line interface and set the following environment
variables:
```
set KAGGLE_USERNAME=your_username
set KAGGLE_KEY=your_api_key
```
```
export KAGGLE_USERNAME=your_username
export KAGGLE_KEY=your_api_key
```
6. Download Kaggle Dataset: Once you have the Kaggle API set up, you can
download datasets directly from Kaggle using the `kaggle datasets download`
command. Specify the dataset you want to download using the Kaggle dataset
URL or the dataset slug. For example:
```
kaggle datasets download username/dataset-name
```
This command will download the dataset in a compressed format (e.g., ZIP)
to your local machine.
7. Extract the Dataset: Unzip the downloaded archive (for example, with the `unzip`
command or a file archiver) to obtain the data files.
8. Access Dataset: Once the dataset is extracted, you can access the data files
in your code using standard file I/O operations or suitable libraries for data
manipulation, such as Pandas for structured data or NumPy for numerical
arrays.
By following these steps, you can import datasets from Kaggle directly into
your local environment and start working with the data for analysis, machine
learning, or any other tasks. Remember to comply with the terms and
conditions and any licensing requirements associated with the dataset you're
using.
Sample Project :
Diabetes Prediction
6. Model Evaluation: Evaluate the trained model using the testing dataset to
assess its performance. Common evaluation metrics for binary
classification include accuracy, precision, recall, F1 score, and area under
the ROC curve (AUC-ROC). These metrics provide insights into the
model's ability to correctly predict diabetes cases.
7. Hyperparameter Tuning: Fine-tune the model's hyperparameters to
optimize its performance. Hyperparameters are configuration settings that
affect the learning process, such as learning rate, regularization strength, or
maximum tree depth. This step involves techniques like grid search or
random search to find the best combination of hyperparameters.
Note that the above steps provide a general framework for diabetes
prediction using machine learning. The specific details may vary depending
on the chosen algorithm, dataset characteristics, and desired performance
metrics. It's important to consider ethical and privacy considerations when
handling sensitive health-related data and ensure compliance with
regulations and guidelines.
Project
Credit Card Fraud
Detection
Introduction :
Credit card fraud occurs when unauthorized transactions are made using
stolen credit card information. Detecting fraudulent transactions is
challenging because fraudsters continuously adapt their techniques to avoid
detection. Machine learning offers an effective approach to tackle this
problem by analyzing historical data and identifying patterns indicative of
fraudulent behavior.
The first step in credit card fraud detection is gathering relevant data. This
typically includes credit card transaction data, including transaction
amounts, timestamps, location, merchant information, and cardholder details.
The data may also contain labels indicating whether each transaction is
normal or fraudulent.
Data preprocessing is crucial to ensure the quality and suitability of the data
for machine learning algorithms. Steps involved in data preprocessing include:
The selected model is trained using the preprocessed data. The dataset is
typically divided into training and testing sets, with a portion of the data
reserved for evaluation purposes. During training, the model learns from the
labeled data and adjusts its parameters to minimize prediction errors.
[ ]: import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
[ ]: data = pd.read_csv('/content/credit_data.csv')
[ ]: data.head()
[ ]: Time V1 V2 V3 V4 V5 V6 V7 \
0 0 -1.359807 -0.072781 2.536347 1.378155 -0.338321 0.462388 0.239599
1 0 1.191857 0.266151 0.166480 0.448154 0.060018 -0.082361 -0.078803
2 1 -1.358354 -1.340163 1.773209 0.379780 -0.503198 1.800499 0.791461
3 1 -0.966272 -0.185226 1.792993 -0.863291 -0.010309 1.247203 0.237609
4 2 -1.158233 0.877737 1.548718 0.403034 -0.407193 0.095921 0.592941
2 -0.139097 -0.055353 -0.059752 378.66 0.0
3 -0.221929 0.062723 0.061458 123.50 0.0
4 0.502292 0.219422 0.215153 69.99 0.0
[5 rows x 31 columns]
[ ]: data.tail()
[ ]: Time V1 V2 V3 V4 V5 V6 \
3968 3617 1.134592 0.252051 0.488592 0.799826 -0.264819 -0.369918
3969 3621 -1.338671 1.080974 1.291196 0.719258 0.101320 0.053896
3970 3622 -0.339728 -2.417449 0.975517 2.537995 -1.720361 0.863005
3971 3623 -0.368639 0.947432 1.707755 0.932092 0.292956 0.189100
3972 3624 -0.663445 1.162921 1.508050 0.549405 0.231377 -0.106041
[5 rows x 31 columns]
[ ]: data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3973 entries, 0 to 3972
Data columns (total 31 columns):
# Column Non-Null Count Dtype
7 V7 3973 non-null float64
8 V8 3973 non-null float64
9 V9 3973 non-null float64
10 V10 3973 non-null float64
11 V11 3973 non-null float64
12 V12 3973 non-null float64
13 V13 3973 non-null float64
14 V14 3973 non-null float64
15 V15 3973 non-null float64
16 V16 3973 non-null float64
17 V17 3973 non-null float64
18 V18 3973 non-null float64
19 V19 3973 non-null float64
20 V20 3973 non-null float64
21 V21 3973 non-null float64
22 V22 3973 non-null float64
23 V23 3972 non-null float64
24 V24 3972 non-null float64
25 V25 3972 non-null float64
26 V26 3972 non-null float64
27 V27 3972 non-null float64
28 V28 3972 non-null float64
29 Amount 3972 non-null float64
30 Class 3972 non-null float64
dtypes: float64(30), int64(1)
memory usage: 962.3 KB
[ ]: data['Class'].value_counts()
[ ]: 0.0 3970
1.0 2
Name: Class, dtype: int64
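The cell that separates the normal and fraudulent transactions is not visible in this extract; a minimal reconstruction consistent with the shapes printed below would be (variable names assumed from the later cells):

```python
# Class 0.0 = valid transactions, Class 1.0 = fraudulent ones
valid = data[data.Class == 0]
invalid = data[data.Class == 1]
```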
[ ]: print(valid.shape)
print(invalid.shape)
(3970, 31)
(2, 31)
[ ]: # statistical measures of the data
valid.Amount.describe()
[ ]: count 3970.000000
mean 64.899597
std 213.612570
min 0.000000
25% 2.270000
50% 12.990000
75% 54.990000
max 7712.430000
Name: Amount, dtype: float64
[ ]: invalid.Amount.describe()
[ ]: count 2.000000
mean 264.500000
std 374.059487
min 0.000000
25% 132.250000
50% 264.500000
75% 396.750000
max 529.000000
Name: Amount, dtype: float64
[ ]: valid_data = valid.sample(n=492)
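The construction of `customer_data`, used from the next cell onward, is not shown in the extract; based on the 492 + 2 rows seen in the later value_counts() output, it is presumably the concatenation of the sampled valid rows with the invalid ones (a reconstruction, not the original cell):

```python
customer_data = pd.concat([valid_data, invalid], axis=0)
```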
[ ]: customer_data.head()
[ ]: Time V1 V2 V3 V4 V5 V6 \
2440 2007 -1.416097 -0.439945 2.951172 -1.043307 0.318020 -0.712666
280 199 1.187706 -0.141019 0.563459 0.766383 0.151071 1.596937
1955 1506 1.238569 0.207420 -0.022440 1.052196 0.341801 0.351660
1801 1402 1.167491 -0.014538 0.791353 0.393639 -0.400197 0.265558
3299 2846 1.233084 0.373278 0.445418 0.768739 -0.517563 -1.256164
V24 V25 V26 V27 V28 Amount Class
2440 0.442992 0.199112 0.810611 -0.587232 -0.743571 1.90 0.0
280 -1.686137 0.559743 -0.291580 0.086083 0.008422 9.99 0.0
1955 -0.811124 0.881769 -0.289689 0.021560 -0.000476 12.99 0.0
1801 -0.286483 0.113203 0.245470 0.008220 0.009749 3.23 0.0
3299 0.675560 0.177552 0.073693 -0.021715 0.034479 1.98 0.0
[5 rows x 31 columns]
[ ]: customer_data.tail()
[ ]: Time V1 V2 V3 V4 V5 V6 \
1930 1490 -0.798976 0.761254 -0.045669 -1.782954 2.405157 3.469894
2464 2029 -12.168192 -15.732974 -0.376474 3.792613 10.658654 -7.465603
1181 919 -0.222350 1.256333 0.972661 2.312995 0.756371 -0.782664
541 406 -2.312227 1.951992 -1.609851 3.997906 -0.522188 -1.426545
623 472 -3.043541 -3.157307 1.088463 2.288644 1.359805 -1.064823
[5 rows x 31 columns]
[ ]: customer_data['Class'].value_counts()
[ ]: 0.0 492
1.0 2
Name: Class, dtype: int64
[ ]: customer_data.groupby('Class').mean()
[ ]: Time V1 V2 V3 V4 V5 \
Class
0.0 1664.871951 -0.265443 0.190514 0.861039 0.049335 -0.085729
1.0 439.000000 -2.677884 -0.602658 -0.260694 3.143275 0.418809
V6 V7 V8 V9 … V20 V21 \
Class …
0.0 -0.031439 0.095845 -0.021388 -0.010232 … 0.084203 -0.008129
1.0 -1.245684 -1.105907 0.661932 -1.520521 … 1.114625 0.589464
Amount
Class
0.0 68.946951
1.0 264.500000
[2 rows x 30 columns]
Choosing the Model: Splitting the Data into Features & Targets
[ ]: X = customer_data.drop(columns='Class', axis=1)
Y = customer_data['Class']
[ ]: print(X)
Time V1 V2 V3 V4 V5 V6 \
2440 2007 -1.416097 -0.439945 2.951172 -1.043307 0.318020 -0.712666
280 199 1.187706 -0.141019 0.563459 0.766383 0.151071 1.596937
1955 1506 1.238569 0.207420 -0.022440 1.052196 0.341801 0.351660
1801 1402 1.167491 -0.014538 0.791353 0.393639 -0.400197 0.265558
3299 2846 1.233084 0.373278 0.445418 0.768739 -0.517563 -1.256164
… … … … … … … …
1930 1490 -0.798976 0.761254 -0.045669 -1.782954 2.405157 3.469894
2464 2029 -12.168192 -15.732974 -0.376474 3.792613 10.658654 -7.465603
1181 919 -0.222350 1.256333 0.972661 2.312995 0.756371 -0.782664
541 406 -2.312227 1.951992 -1.609851 3.997906 -0.522188 -1.426545
623 472 -3.043541 -3.157307 1.088463 2.288644 1.359805 -1.064823
623 0.325574 -0.067794 -0.270953 … 2.102339 0.661696 0.435477
[ ]: print(Y)
2440 0.0
280 0.0
1955 0.0
1801 0.0
3299 0.0
…
1930 0.0
2464 0.0
1181 0.0
541 1.0
623 1.0
Name: Class, Length: 494, dtype: float64
Data Standardisation
[ ]: from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(X)   # import and fit reconstructed; these lines are not visible in the extract
[ ]: StandardScaler()
[ ]: Standardized_data = scaler.transform(X)
print(Standardized_data)
…
[-7.26812294e-01 3.83046954e-02 7.68470309e-01 … 4.34742037e-01
7.55689096e-01 -2.91134332e-01]
[-1.23005190e+00 -1.47612023e+00 1.26854285e+00 … 7.87329728e-01
-5.69838928e-01 -3.13184294e-01]
[-1.16530762e+00 -2.00606563e+00 -2.40426443e+00 … -8.32813046e-01
1.47926263e-01 2.06246334e+00]]
[ ]: X=Standardized_data
Y = customer_data['Class']
[ ]: print(X)
print(Y)
Model Training
Logistic Regression
[ ]: # Train/test split (cell reconstructed; the exact split parameters are assumed)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, stratify=Y, random_state=2)
[ ]: model_classifier = LogisticRegression()
model_classifier.fit(X_train, Y_train)   # fitting line restored to match the output below
[ ]: LogisticRegression()
Model Evaluation
Accuracy Score
• Test Data
• Train Data
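The accuracy cells themselves are not visible in the extract; a minimal sketch using the accuracy_score import at the top of the notebook (variable names assumed to match the reconstructed training cell above):

```python
# Accuracy on the training data
train_prediction = model_classifier.predict(X_train)
print('Accuracy on training data:', accuracy_score(Y_train, train_prediction))

# Accuracy on the test data
test_prediction = model_classifier.predict(X_test)
print('Accuracy on test data:', accuracy_score(Y_test, test_prediction))
```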
[ ]: input_data = (166205.0, -1.359807134, -0.072781173, 2.536346738, 1.378155224,
-0.33832077, 0.462387778, 0.239598554, 0.098697901, 0.3637869, 0.090794172,
-0.551599533, -0.617800856, -0.991389847, -0.311169354, 1.468176972, -0.470400525,
0.207971242, 0.02579058, 0.40399296, 0.251412098, -0.018306778, 0.277837576,
-0.11047391, 0.066928075, 0.128539358, -0.189114844, 0.133558377, -0.021053053, 149.62)
# Reshape to a single-sample 2D array and standardise it with the fitted scaler
# (these two lines are reconstructed; they are not visible in this extract)
input_data_reshaped = np.asarray(input_data).reshape(1, -1)
std_data = scaler.transform(input_data_reshaped)
print(std_data)
prediction = model_classifier.predict(std_data)
print(prediction)
if (prediction[0] == 0):
    print('The user is a Valid User')
else:
    print('The user is an Invalid User')
[ ]: import pickle
[ ]: filename='trained_model.sav'
pickle.dump(model_classifier, open(filename,'wb'))
[ ]: # Load the saved model back from disk (loading line reconstructed; not visible in this extract)
loaded_model = pickle.load(open('trained_model.sav', 'rb'))
input_data = (166205.0, -1.359807134, -0.072781173, 2.536346738, 1.378155224,
-0.33832077, 0.462387778, 0.239598554, 0.098697901, 0.3637869, 0.090794172,
-0.551599533, -0.617800856, -0.991389847, -0.311169354, 1.468176972, -0.470400525,
0.207971242, 0.02579058, 0.40399296, 0.251412098, -0.018306778, 0.277837576,
-0.11047391, 0.066928075, 0.128539358, -0.189114844, 0.133558377, -0.021053053, 149.62)
input_data_reshaped = np.asarray(input_data).reshape(1, -1)
# Note: for consistency with training, the input may also need to be passed through the fitted scaler
prediction = loaded_model.predict(input_data_reshaped)
print(prediction)
if (prediction[0] == 0):
    print('The user is a Valid User')
else:
    print('The user is an Invalid User')
[0.]
The user is a Valid User
Files
creditcard.py :
import numpy as np
import pandas as pd
import pickle
import streamlit as st

# Loading the saved model (loading line reconstructed; the filename is assumed to match the one saved above)
loaded_model = pickle.load(open('trained_model.sav', 'rb'))

# Creating prediction
def credit_card_prediction(input_data):
    # Converting input data to numeric values
    input_data = [float(x) for x in input_data]
    # Reshaping into a single-sample 2D array (line reconstructed)
    input_data_reshaped = np.asarray(input_data).reshape(1, -1)
    prediction = loaded_model.predict(input_data_reshaped)
    if prediction[0] == 0:
        return 'The user is a Valid User'
    else:
        return 'The user is an Invalid User'
def main():
    # Giving title
    st.title('Valid Credit Card User Prediction')
    # Getting input data from the user
    Time = st.text_input('Time')
    V1 = st.text_input('V1 Value')
    V2 = st.text_input('V2 Value')
    V3 = st.text_input('V3 Value')
    V4 = st.text_input('V4 Value')
    V5 = st.text_input('V5 Value')
    V6 = st.text_input('V6 Value')
    V7 = st.text_input('V7 Value')
    V8 = st.text_input('V8 Value')
    V9 = st.text_input('V9 Value')
    V10 = st.text_input('V10 Value')
    V11 = st.text_input('V11 Value')
    V12 = st.text_input('V12 Value')
    V13 = st.text_input('V13 Value')
    V14 = st.text_input('V14 Value')
    V15 = st.text_input('V15 Value')
    V16 = st.text_input('V16 Value')
    V17 = st.text_input('V17 Value')
    V18 = st.text_input('V18 Value')
    V19 = st.text_input('V19 Value')
    V20 = st.text_input('V20 Value')
    V21 = st.text_input('V21 Value')
    V22 = st.text_input('V22 Value')
    V23 = st.text_input('V23 Value')
    V24 = st.text_input('V24 Value')
    V25 = st.text_input('V25 Value')
    V26 = st.text_input('V26 Value')
    V27 = st.text_input('V27 Value')
    V28 = st.text_input('V28 Value')
    Amount = st.text_input('Amount')
    if st.button("User Result"):
        input_data = [Time, V1, V2, V3, V4, V5, V6, V7, V8, V9, V10, V11, V12, V13,
                      V14, V15, V16, V17, V18, V19, V20, V21,
                      V22, V23, V24, V25, V26, V27, V28, Amount]
        diagnosis = credit_card_prediction(input_data)
        st.success(diagnosis)

if __name__ == '__main__':
    main()
requirements.txt:
numpy==1.24.3
pickle-mixin==1.0.2
streamlit==1.23.1
streamlit-option-menu==0.3.2
scikit-learn==1.2.2
Uploading Files to the Github
To upload files to the GitHub website in Chrome and make them live,
follow these steps:
1. Create a GitHub account: If you don't already have one, go to the GitHub
website (github.com) and sign up for an account.
2. Create a new repository: Once you're logged in, click on the "+" icon in
the top-right corner of the GitHub page and select "New repository."
Give your repository a name, choose whether it should be public or
private, and click on the "Create repository" button.
4. Set up Git: If you don't have Git installed on your computer, download
and install it from the official Git website (git-scm.com). Follow the
instructions for your operating system.
6. Navigate to the desired location: Use the `cd` command to navigate to the
directory where you want to clone the repository. For example, `cd
Documents` will take you to the "Documents" directory.
7. Clone the repository: In the terminal, use the following command to
clone the repository:
```
git clone <repository_URL>
```
8. Add files: Copy the files you want to upload into the cloned repository's
directory on your computer.
9. Stage files: In the terminal, run `git add .` from the repository directory.
This command stages all the files in the current directory for commit. If you
only want to stage specific files, replace `.` with the file names or paths.
10. Commit files: Use the following command to commit the staged files:
```
git commit -m "Your commit message"
```
11. Push files: Finally, push the committed files to GitHub with `git push`.
That's it! You have successfully uploaded files to the GitHub website using
Chrome and made them live.
Making the App live in Streamlit
2. Create a new repository: Once you're logged in, click on the "+"
button in the top-right corner and select "New repository". Give your
repository a name and choose the desired settings (e.g.,
public/private).
4. Select the branch you want to deploy (e.g., `master`) and choose
the root directory.
https://gunakar-polaki-creditcard-user-creditcard-injx6o.streamlit.app/
Conclusion
4. Model Selection and Evaluation: During the internship, I had the chance
to explore a variety of machine learning models, including linear regression,
decision trees, random forests, and neural networks. Understanding the
strengths and weaknesses of each model helped me choose the most
appropriate algorithm for different problem scenarios. I also gained insights
into evaluation metrics such as accuracy, precision, recall, and F1-score, which
assisted in assessing model performance and making informed decisions.
5. Hyperparameter Tuning: Optimizing the hyperparameters of machine
learning models is crucial to improve their performance. Through my
internship, I learned how to use techniques like grid search and randomized
search to systematically explore the hyperparameter space and identify the
optimal configuration. This skill allowed me to fine-tune models and achieve
better results.