GAYATRI VIDYA PARISHAD COLLEGE OF ENGINEERING
(AUTONOMOUS)
(Affiliated to JNTUK, Kakinada, A.P, Accredited by NBA & NAAC)
MADHURAWADA, VISAKHAPATNAM, A. P. – 530048
DEPARTMENT OF ELECTRICAL AND ELECTRONICS ENGINEERING
CERTIFICATE
This is to certify that the mini project (summer internship) report entitled “AWS AI-ML VIRTUAL INTERNSHIP”, submitted by KADA JASHWANTH, 20131A0249, in partial fulfilment of the requirements for the degree of Bachelor of Technology in Electrical and Electronics Engineering, GVP COLLEGE OF ENGINEERING (Autonomous), Affiliated to Jawaharlal Nehru Technological University (JNTUK), Kakinada, has been carried out under my supervision during 2022 – 2023.
ACKNOWLEDGEMENT
We would like to express our deep sense of gratitude to our esteemed institute “Gayatri Vidya Parishad College of Engineering (Autonomous)”, which has provided us with an opportunity to fulfil our cherished desire.
We express our sincere thanks to our Principal Dr. A. B. KOTESWARA RAO for his
encouragement to us during the course of this internship.
We express our heartfelt thanks and acknowledge our indebtedness to Dr. G V E Satish Kumar, Head of the Department, Department of Electrical and Electronics Engineering.
We express our profound gratitude and deep indebtedness to our guide Mr. P. Pawan Puthra, whose valuable suggestions, guidance and comprehensive assistance helped us a lot in realizing this internship and led us in completing it efficiently.
We would also like to thank all the members of the teaching and non-teaching staff of the Electrical and Electronics Engineering Department for all their support in the completion of this internship.
DECLARATION
ABSTRACT
The AI-ML virtual internship is about learning the latest technologies involving Artificial
Intelligence along with manipulation of data with the help of Machine Learning.
There is a wide range of courses and topics included in the course of Artificial Intelligence and Machine Learning, which gives brief information about the technologies mentioned above.
In this internship, before introducing AI-ML, the basics of cloud computing technologies, termed Cloud Foundations, were covered. The Cloud Foundations course (Course 1) gives a brief idea of using the AWS Cloud and its functionalities, such as hosting webpages or static websites along with data storage and manipulation.
The AI-ML course (Course 2) gives the basic idea of implementing Machine Learning and of using Machine Learning as an intermediary to manipulate data. Both courses consisted of various modules with pre-recorded videos along with labs for the learner to practice and perform tasks.
At the end of this internship, I understood the complete concepts of Cloud Foundations along with the basic topics of Artificial Intelligence, including Machine Learning, NLP, and deep learning.
INDEX
Course – 1: Cloud Foundations
Module 1   Cloud Concepts Overview
Module 2   Cloud Economics and Billing
Module 3   AWS Global Infrastructure Overview
Module 4   AWS Cloud Security (Lab 1 - Introduction to AWS IAM)
Module 5   Networking and Content Delivery (Lab 2 - Build Your VPC and Launch a Web Server)
Module 6   Compute (Lab 3 - Introduction to Amazon EC2)
Module 7   Storage (Lab 4 - Working with EBS)
Module 8   Databases (Lab 5 - Build a Database Server)
Module 9   Cloud Architecture
Module 10  Automatic Scaling and Monitoring (Lab 6 - Scale and Load Balance Your Architecture)
Course – 2: Machine Learning Foundations
Module 1   Welcome to AWS Academy Machine Learning Foundations
Module 2   Introducing Machine Learning
Module 3   Implementing a Machine Learning Pipeline with Amazon SageMaker (Lab 3.1 - Creating and Importing Data; Lab 3.2 - Exploring Data; Lab 3.3 - Generating Model Performance)
Module 4   Introducing Forecasting (Lab 4 - Creating a Forecast with Amazon Forecast)
Module 5   Introducing Computer Vision (Lab 5 - Guided Lab: Facial Recognition)
Module 6   Introducing Natural Language Processing (Lab 6 - Amazon Lex - Create a Chatbot)
Module 7   Course Wrap-Up
Case Study: Problem and Solution
Conclusion
AWS Academy Cloud Foundations
Module 1: Cloud Concepts Overview
Infrastructure as software
Cloud computing enables you to stop thinking of your infrastructure as hardware, and
instead think of (and use) it as software.
Cloud service models
● Infrastructure as a service (IaaS): Services in this category are the basic building blocks
for cloud IT and typically provide you with access to networking features, computers (virtual
or on dedicated hardware), and data storage space. IaaS provides you with the highest level of
flexibility and management control over your IT resources. It is the most similar to existing
IT resources that many IT departments and developers are familiar with today.
● Platform as a service (PaaS): Services in this category reduce the need for you to manage
the underlying infrastructure (usually hardware and operating systems) and enable you to
focus on the deployment and management of your applications.
● Software as a service (SaaS): Services in this category provide you with a completed product
that the service provider runs and manages. In most cases, software as a service refers to end-
user applications. With a SaaS offering, you do not have to think about how the service is
maintained or how the underlying infrastructure is managed. You need to think only about
how you plan to use that particular piece of software. A common example of a SaaS application is web-based email, where you can send and receive email without managing feature additions to the email product or maintaining the servers and operating systems that the email program runs on.
Module 2: Cloud Economics and Billing
There are three fundamental drivers of cost with AWS: compute, storage, and outbound data transfer. These characteristics vary somewhat, depending on the AWS product and pricing model you choose. In most cases, there is no charge for inbound data transfer or for data transfer between other AWS services within the same AWS Region. There are some exceptions, so be sure to verify data transfer rates before you begin to use the AWS service. Outbound data transfer is aggregated across services and then charged at the outbound data transfer rate. This charge appears on the monthly statement as AWS Data Transfer Out.
AWS Pricing Calculator
AWS offers the AWS Pricing Calculator to help you estimate a monthly AWS bill. You can use
this tool to explore AWS services and create an estimate for the cost of your use cases on AWS.
You can model your solutions before building them, explore the price points and calculations
behind your estimate, and find the available instance types and contract terms that meet your
needs. This enables you to make informed decisions about using AWS. You can plan your AWS
costs and usage or price out setting up a new set of instances and services.
Soft benefits include:
● Reuse of services and applications that enable you to define (and redefine) solutions by using the same cloud service
● Increased developer productivity
● Improved customer satisfaction
● Agile business processes that can quickly respond to new and emerging opportunities
● Increase in global reach
Cloud Total Cost of Ownership defines what will be spent on the technology after adoption—or
what it costs to run the solution. Typically, a TCO analysis looks at the as-is on-premises
infrastructure and compares it with the cost of the to-be infrastructure state in the cloud. While
this difference might be easy to calculate, it might only provide a narrow view of the total
financial impact of moving to the cloud.
A return on investment (ROI) analysis can be used to determine the value that is generated while
considering spending and saving. This analysis starts by identifying the hard benefits in terms of
direct and visible cost reductions and efficiency improvements.
Module 3: AWS Global Infrastructure Overview
AWS Regions
The AWS Cloud infrastructure is built around Regions. AWS has 22 Regions worldwide. An AWS Region is a physical geographical location with one or more Availability Zones. Availability Zones in turn consist of one or more data centers. To achieve fault tolerance and stability,
Regions are isolated from one another. Resources in one Region are not automatically replicated
to other Regions. Data replication across Regions is controlled by you. Communication between
Regions uses AWS backbone network infrastructure.
Selecting a Region
There are a few factors that you should consider when you select the optimal Region or Regions
where you store data and use AWS services. They are:
● Data governance, legal requirements
● Proximity to customers (latency)
● Services available within the Region
● Costs (vary by Region)
Availability Zones
Each AWS Region has multiple, isolated locations that are known as Availability Zones. Each
Availability Zone provides the ability to operate applications and databases that are more highly
available, fault-tolerant, and scalable than would be possible with a single data center. Each
Availability Zone can include multiple data centers (typically three). They are fully isolated
partitions of the AWS Global Infrastructure. All Availability Zones are interconnected with high-
bandwidth, low-latency networking that provides high-throughput between Availability Zones.
The network accomplishes synchronous replication between Availability Zones.
AWS data centers
AWS data centers are designed for security. Data centers are where the data resides and data
processing occurs. Each data center has redundant power, networking, and connectivity, and is
housed in a separate facility. A data center typically has 50,000 to 80,000 physical servers. Data centers are designed with several security factors in mind.
Module 4 - AWS Cloud Security
IAM: Essential components
An IAM user is a person or application that is defined in an AWS account, and that must make
API calls to AWS products. An IAM group is a collection of IAM users. An IAM policy is a
document that defines permissions to determine what users can do in the AWS account. An IAM
role is a tool for granting temporary access to specific AWS resources in an AWS account.
By default, IAM users do not have permissions to access any resources or data in an AWS account. Instead, you must explicitly grant permissions to a user, group, or role by creating a policy, which is a document in JavaScript Object Notation (JSON) format. A policy lists permissions that allow or deny access to resources in the AWS account.
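As a rough illustration (not part of the course labs), the following Python sketch uses boto3 to create a JSON policy document and attach it to a group; the policy name, group name, and permissions are hypothetical.

# Minimal sketch (hypothetical names): create a read-only S3 policy and
# attach it to an existing IAM group.
import json
import boto3

iam = boto3.client("iam")

policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",                      # explicit allow
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": "*",                        # narrow this in real use
        }
    ],
}

response = iam.create_policy(
    PolicyName="ExampleS3ReadOnly",                 # hypothetical policy name
    PolicyDocument=json.dumps(policy_document),
)

iam.attach_group_policy(
    GroupName="ExampleDevelopers",                  # hypothetical existing group
    PolicyArn=response["Policy"]["Arn"],
)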
IAM: Authorization
You assign permissions by creating an IAM policy. The permissions determine which resources and operations are allowed. All permissions are implicitly denied by default. If something is explicitly denied, it is never allowed. The principle of least privilege is an important concept in computer security. It promotes granting only the minimal privileges that are needed, based on the needs of your users. The scope of the IAM service configurations is global.
IAM policies
An IAM policy is a formal statement of permissions that will be granted to an entity. Policies can
be attached to any IAM entity. Policies specify what actions are allowed, which resources to
allow the actions on, and what the effect will be when the user requests access to the resources.
There are two types of IAM policies: Identity-based policies and Resource-based policies.
Module 5: Networking and Content Delivery
Networks
A computer network is two or more client machines that are connected together to share
resources. A network can be logically partitioned into subnets. Networking requires a
networking device to connect all the clients together and enable communication between them.
IP addresses
Each client machine in a network has a unique Internet Protocol (IP) address that identifies it. An IP address is a numerical label in decimal format. Machines convert that decimal number to a binary format.
Amazon VPC
Amazon Virtual Private Cloud (Amazon VPC) is a service that lets you provision a logically isolated section of the AWS Cloud where you can launch your AWS resources. Amazon VPC gives you control over your virtual networking resources, including the selection of your own IP address range, the creation of subnets, and the configuration of route tables and network gateways. You can use both IPv4 and IPv6 in your VPC for secure access to resources and applications. You can also customize the network configuration for your VPC.
Internet gateway
An internet gateway is a scalable, redundant, and highly available VPC component that allows
communication between instances in your VPC and the internet. An internet gateway serves two
purposes: to provide a target in your VPC route tables for internet-routable traffic, and to
perform network address translation for instances that were assigned public IPv4 addresses.
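A minimal boto3 sketch of the same idea, assuming default credentials and illustrative CIDR ranges; Lab 2 performs these steps in the console rather than in code.

# Minimal sketch: provision a VPC with one public subnet, an internet
# gateway, and a route for internet-bound traffic.
import boto3

ec2 = boto3.client("ec2")

vpc = ec2.create_vpc(CidrBlock="10.0.0.0/16")
vpc_id = vpc["Vpc"]["VpcId"]

subnet = ec2.create_subnet(VpcId=vpc_id, CidrBlock="10.0.1.0/24")

igw = ec2.create_internet_gateway()
igw_id = igw["InternetGateway"]["InternetGatewayId"]
ec2.attach_internet_gateway(InternetGatewayId=igw_id, VpcId=vpc_id)

# Route internet-bound traffic from the subnet through the internet gateway.
route_table = ec2.create_route_table(VpcId=vpc_id)
rt_id = route_table["RouteTable"]["RouteTableId"]
ec2.create_route(RouteTableId=rt_id, DestinationCidrBlock="0.0.0.0/0", GatewayId=igw_id)
ec2.associate_route_table(RouteTableId=rt_id, SubnetId=subnet["Subnet"]["SubnetId"])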
Security groups
A security group acts as a virtual firewall for your instance, and it controls inbound and
outbound traffic. Security groups act at the instance level, not the subnet level. Security groups
have rules that control inbound and outbound instance traffic. Default security groups deny all
inbound traffic and allow all outbound traffic. Security groups are stateful.
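A short boto3 sketch of a security group with two inbound rules; the group name and VPC ID are placeholders.

# Minimal sketch: a security group that allows inbound HTTP and SSH and, by
# default, all outbound traffic.
import boto3

ec2 = boto3.client("ec2")

sg = ec2.create_security_group(
    GroupName="example-web-sg",            # hypothetical name
    Description="Allow HTTP and SSH",
    VpcId="vpc-0123456789abcdef0",         # placeholder VPC ID
)

ec2.authorize_security_group_ingress(
    GroupId=sg["GroupId"],
    IpPermissions=[
        {"IpProtocol": "tcp", "FromPort": 80, "ToPort": 80,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},   # restrict in real use
    ],
)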
Amazon Route 53
Amazon Route 53 is a highly available and scalable cloud Domain Name System (DNS) web service. It is designed to give developers and businesses a reliable and cost-effective way to route users to internet applications by translating domain names into numeric IP addresses.
Amazon CloudFront
Amazon CloudFront is a fast content delivery network (CDN) service that securely delivers data, videos, applications, and application programming interfaces (APIs) to customers globally with low latency and high transfer speeds. It also provides a developer-friendly environment. Amazon CloudFront delivers files to users over a global network of edge locations and Regional edge caches.
Module 6 - Compute
AWS Compute Services
Amazon EC2
The EC2 in Amazon EC2 stands for Elastic Compute Cloud:
• Elastic refers to the fact that you can easily increase or decrease the number of servers you
run to support an application automatically.
• Compute refers to the reason why most users run servers in the first place, which is to host running applications or process data—actions that require compute resources, including processing power (CPU) and memory (RAM).
• Cloud refers to the fact that the EC2 instances that you run are hosted in the cloud.
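A minimal boto3 sketch of launching one instance; the AMI, key pair, subnet, and security group IDs are placeholders rather than real resources.

# Minimal sketch: launch a single t2.micro instance.
import boto3

ec2 = boto3.client("ec2")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",        # placeholder Amazon Linux AMI
    InstanceType="t2.micro",
    MinCount=1,
    MaxCount=1,
    KeyName="example-keypair",              # hypothetical key pair
    SubnetId="subnet-0123456789abcdef0",    # placeholder subnet
    SecurityGroupIds=["sg-0123456789abcdef0"],
)

print(response["Instances"][0]["InstanceId"])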
Amazon EC2 Cost Optimization
Amazon offers different pricing models to choose from when you want to run EC2 instances.
● Per second billing is only available for On-Demand Instances, Reserved Instances, and
Spot Instances that run Amazon Linux or Ubuntu.
● On-Demand Instances are eligible for the AWS Free Tier. They have the lowest upfront cost and
the most flexibility.
● Reserved Instances enable you to reserve computing capacity for a 1-year or 3-year term with lower hourly running costs.
● Spot Instances enable you to bid on unused EC2 instances, which can lower your costs.
Container Services
Containers are a method of operating system virtualization that enables you to run an application
and its dependencies in resource-isolated processes. By using containers, you can easily package an
application's code, configurations, and dependencies into easy-to-use building blocks that deliver
environmental consistency, operational efficiency, developer productivity, and version control.
AWS Lambda
AWS Lambda is an event-driven, serverless compute service. Lambda enables you to run code without provisioning or managing servers. You create a Lambda function, which is the AWS resource that contains the code that you upload. You then set the Lambda function to be triggered, either on a scheduled basis or in response to an event. Your code only runs when it is triggered.
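A minimal sketch of what a Python Lambda handler can look like; the function body is illustrative.

# Minimal sketch of a Lambda function handler. When the function is triggered
# (for example by a schedule or an event), Lambda calls handler(event, context).
import json

def handler(event, context):
    # Log the incoming event and return a simple response.
    print("Received event:", json.dumps(event))
    return {"statusCode": 200, "body": "Hello from Lambda"}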
Module 7 - Storage
Amazon EFS Features
● File storage in the AWS Cloud
● Petabyte-scale, low-latency file system
● Shared storage
● Elastic capacity
● Supports Network File System (NFS) versions 4.0 and 4.1 (NFSv4)
● Compatible with all Linux-based AMIs for Amazon EC2
Amazon S3 Glacier
When you use Amazon S3 Glacier to archive data, you can store your data at an extremely low
cost, but you cannot retrieve your data immediately when you want it. Data that is stored in
Amazon S3 Glacier can take several hours to retrieve, which is why it works well for archiving.
There are three key Amazon S3 Glacier terms you should be familiar with:
● Archive–Any object (such as a photo, video, file, or document) that you store in Amazon
S3 Glacier. It is the base unit of storage in Amazon S3 Glacier.
● Vault–A container for storing archives. When you create a vault, you specify the vault
name and the Region where you want to locate the vault.
● Vault access policy–Determines who can and cannot access the data that is stored in the vault, and what operations users can and cannot perform. One vault access permissions policy can be created for each vault to manage access permissions for that vault.
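A small boto3 sketch of the vault and archive terms above; the vault name and file are illustrative.

# Minimal sketch: create a vault and upload one archive to Amazon S3 Glacier.
import boto3

glacier = boto3.client("glacier")

glacier.create_vault(vaultName="example-archive-vault")

with open("backup-2023.zip", "rb") as f:        # placeholder archive file
    result = glacier.upload_archive(
        vaultName="example-archive-vault",
        archiveDescription="Monthly backup",
        body=f,
    )

print(result["archiveId"])                      # needed later for retrieval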
Module 8 - Databases
Amazon RDS read replicas – Features
● Offers asynchronous replication
● Can be promoted to primary if needed
Amazon DynamoDB
Here is a review of the differences between relational and non-relational databases:
● A relational database (RDB) works with structured data that is organized by tables, records, and columns. RDBs establish a well-defined relationship between database tables. RDBs use Structured Query Language (SQL), which is a standard user and application program interface for interacting with the database.
● A non-relational database is any database that does not follow the relational model that is
provided by traditional RDBMS. Non-Relational databases scale out horizontally, and
they can work with unstructured and semistructured data.
Amazon DynamoDB Features
● NoSQL database tables
● Virtually unlimited storage
● Items can have differing attributes
● Low-latency queries
● Scalable read/write throughput
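A minimal boto3 sketch that creates a table, writes an item, and reads it back; the table and attribute names are illustrative.

# Minimal sketch: create a DynamoDB table keyed on a partition key, then write
# and read one item.
import boto3

dynamodb = boto3.resource("dynamodb")

table = dynamodb.create_table(
    TableName="ExampleMusic",
    KeySchema=[{"AttributeName": "SongId", "KeyType": "HASH"}],
    AttributeDefinitions=[{"AttributeName": "SongId", "AttributeType": "S"}],
    BillingMode="PAY_PER_REQUEST",
)
table.wait_until_exists()

# Items in the same table can carry different attributes.
table.put_item(Item={"SongId": "s-001", "Title": "Example Song", "Plays": 42})
item = table.get_item(Key={"SongId": "s-001"})["Item"]
print(item)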
Amazon Redshift
Amazon Redshift is a fast, fully managed data warehouse that makes it simple and cost effective
to analyze all your data by using standard SQL and your existing business intelligence (BI) tools.
Amazon Redshift use cases
● Enterprise data warehouse (EDW)
● Respond faster to business needs
● Big data
● Low price point for small customers
● Managed service for ease of deployment and maintenance
● Focus more on data and less on database management
● Software as a service (SaaS)
Amazon Aurora
Amazon Aurora is a MySQL- and PostgreSQL-compatible relational database that is built for the cloud. It combines the performance and availability of high-end commercial databases with the simplicity and cost-effectiveness of open-source databases.
Module 9 - Cloud Architecture
AWS Well-Architected Framework
• A guide for designing infrastructures that are:
Secure
High-performing
Resilient
Efficient
• A consistent approach to evaluating and implementing cloud architectures
• A way to provide best practices that were developed through lessons learned by reviewing
customer architectures.
Operational Excellence Pillar
Operational excellence design principles include:
• Perform operations as code–Define your entire workload as code and update it with code.
• Make frequent, small, reversible changes–Design workloads to enable components to be
updated regularly. Make changes in small increments that can be reversed if they fail (without
affecting customers when possible).
• Refine operations procedures frequently–Look for opportunities to improve operations
procedures.
• Anticipate failure–Identify potential sources of failure so that they can be removed or
mitigated. Test failure scenarios and validate your understanding of their impact.
• Learn from all operational failures–Drive improvement through lessons learned from all
operational events and failures. Share what is learned across teams and through the entire
organization.
Security Pillar
The Security pillar focuses on the ability to protect information, systems, and assets while
delivering business value through risk assessments and mitigation strategies.
Security design principles
There are seven design principles that can improve security:
• Implement a strong identity foundation
• Enable traceability
• Apply security at all layers
• Automate security best practices
• Protect data in transit and at rest
• Keep people away from data
• Prepare for security events
Reliability Pillar
The Reliability pillar focuses on ensuring a workload performs its intended function correctly
and consistently when it’s expected to. Key topics include: designing distributed systems,
recovery planning, and handling change.
Reliability design principles
There are five design principles that can increase reliability:
• Automatically recover from failure–Monitor systems for key performance indicators
and configure your systems to trigger an automated recovery when a threshold is
breached.
• Test recovery procedures
• Scale horizontally to increase aggregate workload availability
• Stop guessing capacity
• Manage change in automation
Cost Optimization Pillar
The Cost Optimization pillar focuses on the ability to avoid unnecessary costs. Key topics include: understanding and controlling where money is being spent, selecting the most appropriate and right number of resource types, analyzing spend over time, and scaling to meet business needs without overspending.
Cost optimization design principles
There are five design principles that can optimize costs:
• Implement Cloud Financial Management
• Adopt a consumption model
• Measure overall efficiency
• Stop spending money on undifferentiated heavy lifting
• Analyze and attribute expenditure
Module 10 - Automatic Scaling and Monitoring
Elastic Load Balancing
Elastic Load Balancing is an AWS service that distributes incoming application or network
traffic across multiple targets—such as Amazon Elastic Compute Cloud (Amazon EC2) instances,
containers, internet protocol (IP) addresses, and Lambda functions—in a single Availability Zone
or across multiple Availability Zones.
Types of load balancers
Elastic Load Balancing is available in three types:
• An Application Load Balancer operates at the application level (Open Systems
Interconnection, or OSI, model layer 7).
• A Network Load Balancer operates at the network transport level (OSI model layer 4), routing
connections to targets—EC2 instances, microservices, and containers—based on IP protocol
data.
• A Classic Load Balancer provides basic load balancing across multiple EC2 instances, and it
operates at both the application level and network transport level.
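A rough boto3 sketch of creating an Application Load Balancer with a target group and listener; all resource IDs and names are placeholders.

# Minimal sketch: an Application Load Balancer forwarding HTTP traffic to a
# target group of EC2 instances.
import boto3

elbv2 = boto3.client("elbv2")

lb = elbv2.create_load_balancer(
    Name="example-alb",
    Subnets=["subnet-0123456789abcdef0", "subnet-0fedcba9876543210"],
    SecurityGroups=["sg-0123456789abcdef0"],
    Type="application",
)
lb_arn = lb["LoadBalancers"][0]["LoadBalancerArn"]

tg = elbv2.create_target_group(
    Name="example-web-targets",
    Protocol="HTTP",
    Port=80,
    VpcId="vpc-0123456789abcdef0",
    TargetType="instance",
)

elbv2.create_listener(
    LoadBalancerArn=lb_arn,
    Protocol="HTTP",
    Port=80,
    DefaultActions=[{"Type": "forward",
                     "TargetGroupArn": tg["TargetGroups"][0]["TargetGroupArn"]}],
)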
Amazon CloudWatch
Amazon CloudWatch is a monitoring and observability service that is built for DevOps
engineers, developers, site reliability engineers (SRE), and IT managers. CloudWatch monitors
your AWS resources (and the applications that you run on AWS) in real time. You can use
CloudWatch to collect and track metrics, which are variables that you can measure for your
resources and applications.
CloudWatch alarms
You can create a CloudWatch alarm that watches a single CloudWatch metric or the result of a
math expression based on CloudWatch metrics. You can create a CloudWatch alarm based on a
static threshold, anomaly detection, or a metric math expression. The alarm goes to ALARM
state when the metric breaches the threshold for a specified number of evaluation periods.
For an alarm based on a static threshold, you must specify the following (a minimal sketch follows this list):
• Namespace
• Metric
• Statistic
• Period
• Conditions
• Additional configuration information
• Actions
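A minimal boto3 sketch of such a static-threshold alarm; the instance ID and SNS topic are placeholders.

# Minimal sketch: a static-threshold alarm on average EC2 CPU utilization.
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="example-high-cpu",
    Namespace="AWS/EC2",                          # namespace
    MetricName="CPUUtilization",                  # metric
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    Statistic="Average",                          # statistic
    Period=300,                                   # period in seconds
    Threshold=80.0,                               # condition: >= 80 percent
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    EvaluationPeriods=2,                          # additional configuration
    AlarmActions=["arn:aws:sns:us-east-1:111122223333:example-topic"],  # action
)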
AWS Academy Machine Learning Foundations
Module 2 - Introducing Machine Learning
Types of Machine Learning
● Supervised Learning
Supervised learning is a popular type of ML because it’s widely applicable. It’s called
supervised learning because you need a supervisor—a teacher—who can show the right
answers to the model.
You can have different types of problems within supervised learning. These problems can be broadly categorized into two categories: classification and regression.
Classification problems have two types. The first type is binary classification, for example, identifying whether a transaction is fraudulent. Multiclass classification problems also exist.
Regression problems also exist. In a regression problem, you are no longer mapping an
input to a defined number of categories. Instead, you are mapping inputs to a continuous
value, like an integer. One example of an ML regression problem is predicting the price of
a company’s stock.
● Unsupervised Learning
In unsupervised learning, labels are not provided (like they are with supervised learning) because you don’t know all the variables and patterns. In these instances, the machine must uncover and create the labels itself. These models use the data that they are presented with to detect emerging properties of the entire dataset, and then construct patterns.
A common subcategory of unsupervised learning is called clustering. This kind of algorithm
groups data into different clusters that are based on similar features to better understand
the attributes of a specific cluster.
The advantage of unsupervised algorithms is that they enable you to see patterns in the data
that you were otherwise unaware of. An example might be the existence of two major
customer types.
● Reinforcement Learning
In reinforcement learning, an agent continuously learns, through trial and error, as it
interacts in an environment. Reinforcement learning is useful when the reward of an
intended outcome is known, but the path to achieving it is not. Discovering that path
requires much trial and error.
Consider the example of AWS DeepRacer. In the AWS DeepRacer simulator, the agent is
the virtual car, and the environment is a virtual racetrack. The actions are the throttle and
steering inputs to the car. The goal is to complete the racetrack as quickly as possible
without deviating from the track.
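A small sketch that contrasts supervised and unsupervised learning using scikit-learn (a library outside the course material) on synthetic data; dataset sizes and model choices are illustrative.

# Minimal sketch: a binary classifier as an example of supervised learning,
# and k-means clustering as an example of unsupervised learning.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Supervised: the labels y act as the "teacher" during training.
clf = LogisticRegression().fit(X_train, y_train)
print("classification accuracy:", clf.score(X_test, y_test))

# Unsupervised: no labels are given; the algorithm groups similar rows itself.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print("cluster sizes:", [int((clusters == c).sum()) for c in (0, 1)])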
Machine Learning Process
● Business problem
● Data preparation
● Iterative model training: feature engineering, model training, and evaluating and tuning the model
● Deployment
Module 3 - Implementing a Machine Learning Pipeline with Amazon SageMaker
Formulating Machine Learning Problems
Your first step in this phase is to define the problem that you’re trying to solve and the goal
that you want to reach. Understanding the business goal is key, because you use that goal to measure the performance of your solution. You frequently must clarify the business problem before you can begin to target a solution. You must ask other questions so that you can thoroughly understand the problem.
Is this problem a supervised or unsupervised machine learning problem? Do you have labeled data to train a supervised model? Again, you have many questions to ask yourself and the
business. Ultimately, you should try to validate the use of machine learning and confirm that
you have access to the right people and data. Then, devise the simplest solution to the problem.
Collecting and Securing data
You can obtain data from several places.
• Private data is data that you (or your customers) have in various existing systems.
Everything from log files to customer invoice databases can be useful, depending on
the problem that you want to solve.
• Commercial data is data that a commercial entity collected and made available.
Companies such as Reuters, Change Healthcare, Dun & Bradstreet, and
Foursquare maintain databases that you can subscribe to.
• Open-source data comprises many different open-source datasets that range
from scientific information to movie reviews. These datasets are usually available for
use in research or for teaching purposes.
Extracting, Transforming, and Loading Data
Data is typically spread across many different systems and data providers. The challenge is to
bring all these data sources together into something that a machine learning model can
consume.
The steps in an extract, transform, and load (ETL) process are defined as follows.
• Extract–Pull the data from the sources to a single location.
• Transform–During extraction, the data might need to be modified, matching
records might need to be combined, or other transformations might be necessary.
• Load–Finally, the data is loaded into a repository, such as Amazon S3 or Amazon Athena.
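A minimal pandas and boto3 sketch of these three steps; the file, column, and bucket names are illustrative.

# Minimal sketch of a small ETL step.
import boto3
import pandas as pd

# Extract: pull data from two hypothetical source files into one place.
orders = pd.read_csv("orders.csv")
customers = pd.read_csv("customers.csv")

# Transform: combine matching records and tidy up a column.
merged = orders.merge(customers, on="customer_id", how="left")
merged["order_date"] = pd.to_datetime(merged["order_date"])

# Load: write the result to a repository such as Amazon S3.
merged.to_csv("clean_orders.csv", index=False)
boto3.client("s3").upload_file("clean_orders.csv", "example-ml-bucket", "etl/clean_orders.csv")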
A typical ETL framework has several components, which include:
• Crawler–A program that connects to a data store (source or target). It progresses
through a prioritized list of classifiers to determine the schema for your data, and
creates metadata tables in the AWS Glue Data Catalog.
• Job–The business logic that is required to perform ETL work.
• Schedule or event–A scheduling service that periodically runs the ETL process.
Feature Engineering
Feature engineering is the task of selecting the columns of data that have the most impact on the model.
Feature Selection And Extraction
Feature selection is about selecting the features that are most relevant and discarding the rest.
Feature selection is applied to prevent either redundancy or irrelevance in the existing features,
or to get a limited number of features to prevent overfitting.
Handling Missing Data
If you decide to drop rows with missing data, you can use built-in functions. For example, the pandas dropna function can drop all rows with missing data, or you can limit the check to specific columns by using the subset parameter.
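A short pandas sketch of both options; the columns are illustrative.

# Minimal sketch: dropping rows with missing data using pandas.
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "age":    [34, np.nan, 29, 41],
    "income": [52000, 61000, np.nan, 73000],
})

all_complete = df.dropna()                    # drop rows with any missing value
age_complete = df.dropna(subset=["age"])      # drop only where "age" is missing

# An alternative to dropping is imputing, for example with the column mean.
imputed = df.fillna(df.mean(numeric_only=True))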
Outliers
Outliers are points in your dataset that lie at an abnormal distance from other values. They are not always something that you want to clean up, because they can add richness to your dataset. However, outliers affect accuracy because they skew values away from the other, more normal values that are related to that feature.
Feature Selection
Filter methods use statistical methods to measure the relevance of features by their correlation
with the target variable.
Wrapper methods measure the usefulness of a subset of features by training a model on it and
measuring the success of the model.
Module 4 - Introduction to Forecasting
Forecasting overview
Forecasting is an important area of machine learning. It is important because so many
opportunities for predicting future outcomes are based on historical data. Many of these
opportunities involve a time component. Although the time component adds more information,
it also makes time series problems more difficult to handle than other types of predictions.
Most time series datasets also follow one of the following patterns:
• Trend–A pattern that shows the values as they increase, decrease, or stay the same over time
• Seasonal–A repeating pattern that is based on the seasons in a year
• Cyclical–Some other form of a repeating pattern
• Irregular-Changes in the data over time that appear to be random or that have no discernable
pattern
Missing values in time series data can be filled in several ways:
• Forward fill–Uses the last known value for the missing value.
• Moving average–Uses the average of the last known values to calculate the missing value.
• Backward fill–Uses the next known value after the missing value. Be aware that it is a potential danger to use the future to calculate the past, which is bad in forecasting. This practice is known as lookahead, and it should be avoided.
• Interpolation–Essentially uses an equation to calculate the missing value.
Downsampling
We might obtain data at different frequencies. When you have data that is at a different frequency than other datasets, or isn’t compatible with your question, you might need to downsample.
Downsampling means moving from a more finely grained time to a less finely grained time. When you downsample, you must decide how to combine the values. Understanding your data helps you decide what the best course of action is.
Upsampling
The inverse of downsampling is upsampling. The problem with upsampling is that it’s difficult to achieve in most cases. Unless you have some other data source to reference, you wouldn’t be able to recreate the finer-grained values from the existing data alone. In some cases, you must use additional data or knowledge.
Smoothing Data
Smoothing your data can help you deal with outliers and other anomalies. You might consider
smoothing for the following reasons.
• Data preparation–Removing error values and outliers
• Visualization–Reducing noise in a plot
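A small pandas sketch of the fill methods, downsampling, and smoothing described in this module, on synthetic hourly data.

# Minimal sketch: filling missing values, downsampling hourly data to daily,
# and smoothing with a moving average.
import pandas as pd
import numpy as np

idx = pd.date_range("2023-01-01", periods=72, freq="H")
series = pd.Series(np.random.default_rng(0).normal(100, 5, 72), index=idx)
series.iloc[[10, 11, 40]] = np.nan            # introduce missing values

filled_forward = series.ffill()               # forward fill
filled_backward = series.bfill()              # backward fill (beware lookahead)
interpolated = series.interpolate()           # equation-based fill

daily = series.resample("D").mean()           # downsample: hourly -> daily
smoothed = series.rolling(window=6).mean()    # moving-average smoothing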
Module 5 - Introduction to Computer Vision
Computer Vision:
Computer vision enables machines to identify people, places, and things in images with accuracy
at or above human levels, with greater speed and efficiency. Often built with deep learning
models, computer vision automates the extraction, analysis, classification, and understanding of
useful information from a single image or a sequence of images. The image data can take many
forms, such as single images, video sequences, views from multiple cameras, or three-
dimensional data.
Amazon Rekognition provides several image and video analysis features, including (a minimal sketch follows this list):
• Unsafe content detection–Amazon Rekognition can detect inappropriate content in images and in stored videos.
• Text detection–Amazon Rekognition Text in Image enables you to recognize and extract text content from images.
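A minimal boto3 sketch of these two calls against an image in S3; the bucket and object names are placeholders.

# Minimal sketch: unsafe-content detection and text detection with Rekognition.
import boto3

rekognition = boto3.client("rekognition")
image = {"S3Object": {"Bucket": "example-images", "Name": "photo.jpg"}}

moderation = rekognition.detect_moderation_labels(Image=image)
for label in moderation["ModerationLabels"]:
    print(label["Name"], label["Confidence"])

text = rekognition.detect_text(Image=image)
for detection in text["TextDetections"]:
    print(detection["DetectedText"], detection["Type"])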
As with other machine learning processes, you must train Amazon Rekognition to recognize
scenes and objects that are in a domain. Thus, you need a training dataset and a test dataset that
contains labeled images. Amazon Rekognition Custom Labels can be helpful for these tasks. You can use Amazon Rekognition Custom Labels to find objects and scenes that are unique to your business needs.
Training a computer vision algorithm to recognize images requires a large input dataset, which
is impractical for most organizations.
You can use an existing model or a managed service like Amazon Rekognition Custom Labels to:
• Simplify data labeling–Amazon Rekognition Custom Labels provides a UI for labeling images,
including defining bounding boxes.
• Provide simplified model evaluation, inference, and feedback–You evaluate your custom
model’s performance on your test set. For every image in the test set, you can see the side-by-side comparison of the model’s prediction versus the label that it assigned. You can also review detailed performance metrics. You can start using your model immediately for image analysis, or you can iterate and retrain new versions with more images to improve performance.
Module 6 - Introducing Natural Language Processing
Overview of NLP
NLP is a broad term for a general set of business or computational problems that you can solve
with machine learning (ML). NLP systems predate ML. Two examples are speech-to-text on your
old cell phone and screen readers. Many NLP systems now use some form of machine learning.
NLP considers the hierarchical structure of language. Words are at the lowest layer of the hierarchy. A group of words makes a phrase, phrases make a sentence, and ultimately, sentences convey ideas.
Challenges Of NLP
Language is not precise. Words can have different meanings, which are based on the other
words that surround them (context). Often, the same words or phrases can have multiple
meanings.
Some of the main challenges for NLP include:
• Discovering the structure of the text–One of the first tasks of any NLP application is to break
the text into meaningful units, such as words, phrases, and sentences.
• Labeling data–After the system converts the text to data, the next challenge is to apply labels
that represent the various parts of speech. Every language requires a different labeling scheme to
match the language’s grammar.
• Representing context–Because word meaning depends on context, any NLP system needs a
way to represent context. It is a big challenge because of the large number of contexts.
Converting context into a form that computers can understand is difficult.
• Applying grammar–Although grammar defines a structure for language, the application of
grammar is nearly infinite. Dealing with the variation in how humans use language is a major
challenge for NLP systems. Addressing this challenge is where machine learning can have a big
impact.
A common application of NLP is chatbots that mimic human speech in applications.
NLP Workflow:
For NLP, collecting data consists of breaking the text into meaningful subsets and labeling the
sets. Feature engineering is a large part of NLP applications. This process gets more complicated
when you have irregular or unstructured text.
Labeling data in the NLP domain is sometimes also called tagging. In the labeling process, you must assign individual text strings to different parts of speech.
A typical NLP workflow involves the following steps (a minimal sketch follows this list):
1) Preprocessing text
2) Feature engineering and creating tokens
3) Text analysis
4) Deriving meaning
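A small plain-Python sketch of the first steps on a toy sentence; it is only illustrative and does not reflect how the AWS services implement these steps.

# Minimal sketch: break raw text into sentences and word tokens, then build
# simple count features (a bag of words) per sentence.
import re
from collections import Counter

raw = "AWS offers many services. Machine learning services analyse text!"

# Preprocessing: lowercase, split into sentences, strip punctuation.
sentences = [s for s in re.split(r"[.!?]+\s*", raw.lower()) if s]

# Tokenization: split each sentence into word tokens.
tokens = [re.findall(r"[a-z]+", s) for s in sentences]

# Feature engineering: bag-of-words counts per sentence.
features = [Counter(t) for t in tokens]
print(features)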
CASE STUDY
❖ Problem Statement
A machine (computer) can only process things that it has been introduced to, and to do so, a computer requires a pattern as its main reference and comparison resource. The same principle applies to researchers’ attempts to make computers capable of detecting and recognizing human faces. The computer needs a model that can represent the human face, and that model is later trained and then tested using positive and negative values.
❖ Solution
In order to perform facial emotion recognition, we need to define algorithms that process images beyond simple detection, implemented by computers using tools that are either provided or designed by ourselves.
Emotion detection: using open-source code from OpenCV for facial recognition, this program is able to scan the current image for a face and detect whether or not a given emotion is present. As an application, this program could be used in lie detection, vehicular safety cameras, and even as an aid for autistic children.
The design of the program begins with creating a database and filling it with images
containing happy faces. This database is then compared with another that contains images
without faces, creating a classifier. The program first accesses the webcam and searches for
a face. Once found, features are extracted from the face and compared with the classifier to see if the emotion is present. Results show that the program is able to accurately detect
happiness. Future work is needed to train the program to detect other emotions as well.
Currently, the program is only able to detect whether the subject is happy or not and going
forward the goal is to create a database and classifier for all emotions. Also, as it stands
right now the program simply detects the emotion asked of it (happiness).
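A rough OpenCV sketch in the same spirit as the case study, using the bundled Haar cascades for a face and a smile; it is not the original program, and the detector thresholds are illustrative.

# Minimal sketch: detect a face in one webcam frame, then look for a smile
# inside the face region as a rough "happiness" signal.
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
smile_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_smile.xml")

capture = cv2.VideoCapture(0)                  # default webcam
ok, frame = capture.read()
capture.release()

if ok:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
    for (x, y, w, h) in faces:
        roi = gray[y:y + h, x:x + w]           # search only inside the face
        smiles = smile_cascade.detectMultiScale(roi, scaleFactor=1.7, minNeighbors=20)
        print("happy" if len(smiles) > 0 else "not happy")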
CONCLUSION
In a nutshell, this internship has been an excellent and rewarding experience. I came to know about various technologies like Artificial Intelligence, Machine Learning, and cloud computing. I learnt how to analyze and deal with data and how to perform activities such as data cleaning and data preprocessing. Various topics such as the AWS Cloud, its security, and its architecture were learnt during the internship programme. On the other hand, we were also introduced to topics such as Deep Learning, Computer Vision, Natural Language Processing (NLP), and the confusion matrix.
Needless to say, the technical aspects of the work I have done are not flawless and could be improved given enough time. As someone with no prior experience with these topics whatsoever, I believe my time spent in research and discovery was well worth it. The internship was also a good way to find out what my strengths and weaknesses are. This helped me to define what skills and knowledge I have to improve in the coming time. While doing the internship we had to face many situations in which we had to make our own decisions and use our skills so that our main goal was attained. We got a chance to assess our interests, explore our choices, and gain confidence. This internship is a great opportunity for those who are passionate enough to learn AI and ML.
At last, this internship has given me new insights and motivation to pursue a career.