
AICTE AI-ML VIRTUAL INTERNSHIP

A mini project (summer internship) report submitted in partial fulfillment

of the requirements for the award of the degree of


BACHELOR OF TECHNOLOGY IN
ELECTRICAL AND ELECTRONICS
ENGINEERING

BY

KADA JASHWANTH (20131A0249)

Under the esteemed guidance of

Mr. P. Pawan Puthra


(Assistant Professor)

Department of Electrical and Electronics Engineering


GAYATRI VIDYA PARISHAD COLLEGE OF ENGINEERING (AUTONOMOUS)
(Affiliated to JNTU-K, Kakinada)
VISAKHAPATNAM
2022 – 2023

GAYATRI VIDYA PARISHAD COLLEGE OF ENGINEERING
(AUTONOMOUS)
(Affiliated to JNTUK, Kakinada, A.P, Accredited by NBA & NAAC)
MADHURAWADA, VISAKHAPATNAM, A. P. – 530048
DEPARTMENT OF ELECTRICAL AND ELECTRONICS ENGINEERING

CERTIFICATE

This is to certify that the mini project (summer internship) report entitled “AWS AI-
ML VIRTUAL INTERNSHIP” submitted by KADA JASHWANTH (20131A0249), in
partial fulfilment of the requirements for the degree of Bachelor of Technology in
Electrical and Electronics Engineering, GVP College of Engineering (Autonomous),
affiliated to Jawaharlal Nehru Technological University (JNTUK), Kakinada, has been
carried out under my supervision during 2022 – 2023.

PROJECT SUPERVISOR / Mentor HEAD OF THE DEPARTMENT

(Mr. P. Pawan Puthra) (Dr. G V E Satish Kumar)


(Assistant Professor) Professor & Head
Dept of EEE, GVPCE(A) Dept. of EEE, GVPCE(A)

ACKNOWLEDGEMENT

We would like to express our deep sense of gratitude to our esteemed institute “Gayatri Vidya
Parishad College of Engineering (Autonomous)”, which has provided us with an opportunity to
fulfil our cherished desire.

We express our sincere thanks to our Principal Dr. A. B. KOTESWARA RAO for his
encouragement during the course of this internship.

We express our heartfelt thanks and acknowledge our indebtedness to Dr. G V E Satish
Kumar, Head of the Department, Department of Electrical and Electronics
Engineering.

We express our profound gratitude and deep indebtedness to our guide Mr. P. Pawan
Puthra, whose valuable suggestions, guidance and comprehensive assistance helped me
in realizing my internship and led me to complete it efficiently.

We would also like to thank all the members of the teaching and non-teaching staff of the
Electrical and Electronics Engineering Department for all their support in the completion of
my internship.

KADA JASHWANTH (20131A0249)

DECLARATION

I hereby declare that the internship entitled “AWS AI-ML VIRTUAL INTERNSHIP” is a
bona fide work done by me and submitted to the Department of Electrical and Electronics
Engineering, G.V.P College of Engineering (Autonomous), Visakhapatnam, in partial
fulfilment for the award of the degree of B. Tech. It is my own work and has not been
submitted to any other university or published at any time before.

PLACE: VISAKHAPATNAM                                   KADA JASHWANTH

DATE:                                                  (20131A0249)

ABSTRACT

The AI-ML virtual internship is about learning the latest technologies in Artificial
Intelligence along with the manipulation of data with the help of Machine Learning.

A wide range of courses and topics is included in the Artificial Intelligence and Machine
Learning track, which gives a brief introduction to the technologies mentioned above.

In this internship, before the introduction to AI-ML, the basics of cloud computing
technologies, termed Cloud Foundations, were covered. Cloud Foundations (Course 1) gives a
brief idea of using the AWS Cloud and its functionalities, such as hosting webpages or static
websites along with data storage and manipulation.

The AI-ML course (Course 2) gives the basic idea of implementing Machine Learning and of
using Machine Learning as an intermediary to manipulate data. Both courses consisted of
various modules made up of pre-recorded videos along with labs for the learner to practice and
perform tasks.

At the end of this internship, I understood the core concepts of Cloud Foundations along with
the basic topics of Artificial Intelligence, including Machine Learning, NLP, and deep learning.

INDEX
S. No.       Topic Name                                                                  Page Number

Course – 1: Cloud Foundations
Module 1     Cloud Concepts Overview                                                     12-13
Module 2     Cloud Economics and Billing                                                 14-15
Module 3     AWS Global Infrastructure Overview                                          16-17
Module 4     AWS Cloud Security (Lab 1 - Introduction to AWS IAM)                        18-19
Module 5     Networking and Content Delivery (Lab 2 - Build your VPC and
             Launch a Web Server)                                                        20-22
Module 6     Compute (Lab 3 - Introduction to Amazon EC2)                                23-24
Module 7     Storage (Lab 4 - Working with EBS)                                          25-26
Module 8     Databases (Lab 5 - Build a Database Server)                                 27-28
Module 9     Cloud Architecture                                                          29-31
Module 10    Automatic Scaling and Monitoring (Lab 6 - Scale & Load Balance
             your Architecture)                                                          32-33

Course – 2: Machine Learning Foundations
Module 1     Welcome to AWS Academy Machine Learning Foundations                         34
Module 2     Introducing Machine Learning                                                35-37
Module 3     Implementing a Machine Learning Pipeline with Amazon SageMaker
             (Lab 3.1 - Creating and importing data; Lab 3.2 - Exploring data;
             Lab 3.3 - Generating model performance)                                     38-40
Module 4     Introducing Forecasting (Lab 4 - Creating a forecast with Amazon Forecast)  41-42
Module 5     Introducing Computer Vision (CV) (Lab 5 - Guided Lab: Facial Recognition)   43-44
Module 6     Introducing Natural Language Processing (Lab 6 - Amazon Lex -
             Create a chatbot)                                                           45
Module 7     Course Wrap-Up                                                              -
             Case Study: Problem and Solution                                            46
             Conclusion                                                                  47
             References                                                                  48

AWS Academy Cloud Foundations

Module 1: Cloud Concepts Overview

Cloud computing defined


Cloud computing is the on-demand delivery of compute power, database, storage, applications,
and other IT resources via the internet with pay-as-you-go pricing. These resources run on
server computers that are located in large data centers in different locations around the world.
When you use a cloud service provider like AWS, that service provider owns the computers that
you are using. These resources can be used together like building blocks to build solutions that
help meet business goals and satisfy technology requirements.

Infrastructure as software
Cloud computing enables you to stop thinking of your infrastructure as hardware, and
instead think of (and use) it as software.

Traditional computing model


In the traditional computing model, infrastructure is thought of as hardware. Hardware
solutions are physical, which means they require space, staff, physical security, planning, and
capital expenditure.

Cloud computing model


By contrast, cloud computing enables you to think of your infrastructure as software. Software
solutions are flexible. You can select the cloud services that best match your needs, provision
and terminate those resources on-demand, and pay for what you use. You can elastically scale
resources up and down in an automated fashion. With the cloud computing model, you can
treat resources as temporary and disposable. The flexibility that cloud computing offers enables
businesses to implement new solutions quickly and with low upfront costs.

Cloud service models

● Infrastructure as a service (IaaS): Services in this category are the basic building blocks
for cloud IT and typically provide you with access to networking features, computers (virtual
or on dedicated hardware), and data storage space. IaaS provides you with the highest level of
flexibility and management control over your IT resources. It is the most similar to existing
IT resources that many IT departments and developers are familiar with today.

● Platform as a service (PaaS): Services in this category reduce the need for you to manage
the underlying infrastructure (usually hardware and operating systems) and enable you to
focus on the deployment and management of your applications.

● Software as a service (SaaS): Services in this category provide you with a completed product
that the service provider runs and manages. In most cases, software as a service refers to end-
user applications. With a SaaS offering, you do not have to think about how the service is
maintained or how the underlying infrastructure is managed. You need to think only about
how you plan to use that particular piece of software. A common example of a SaaS
application is web-based email, where you can send and receive email without
managing feature additions to the email product or maintaining the servers and operating
systems that the email program runs on.

Module 2: Cloud Economics and Billing

AWS pricing model

There are three fundamental drivers of cost with AWS: compute, storage, and outbound data
transfer. These characteristics vary somewhat, depending on the AWS product and pricing
model you choose. In most cases, there is no charge for inbound data transfer or for data transfer
between other AWS services within the same AWS Region. There are some exceptions, so be sure
to verify data transfer rates before you begin to use the AWS service. Outbound data transfer is
aggregated across services and then charged at the outbound data transfer rate. This charge
appears on the monthly statement as AWS Data Transfer Out.

How do you pay for AWS?


AWS offers a range of cloud computing services. For each service, you pay for exactly the
amount of resources that you actually need. This utility-style pricing model includes:
● Pay for what you use
● Pay less when you reserve
● Pay less when you use more
● Pay even less as AWS grows

AWS Pricing Calculator
AWS offers the AWS Pricing Calculator to help you estimate a monthly AWS bill. You can use
this tool to explore AWS services and create an estimate for the cost of your use cases on AWS.
You can model your solutions before building them, explore the price points and calculations
behind your estimate, and find the available instance types and contract terms that meet your
needs. This enables you to make informed decisions about using AWS. You can plan your AWS
costs and usage or price out setting up a new set of instances and services.

The AWS Pricing Calculator helps you:


• Estimate monthly costs of AWS services
• Identify opportunities for cost reduction
• Model your solutions before building them
• Explore price points and calculations behind your estimate
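
As a rough illustration of how the three fundamental cost drivers (compute, storage, and outbound data transfer) combine into a monthly estimate, the short Python sketch below adds them up from assumed unit prices. All rates are made-up placeholders; real prices vary by service and Region, so use the AWS Pricing Calculator for actual figures.

    # Hypothetical monthly cost estimate across the three AWS cost drivers.
    # Every unit price below is a placeholder, not a real AWS rate.

    HOURLY_INSTANCE_RATE = 0.10    # USD per instance-hour (assumed)
    STORAGE_RATE_PER_GB = 0.023    # USD per GB-month (assumed)
    DATA_OUT_RATE_PER_GB = 0.09    # USD per GB transferred out (assumed)

    def estimate_monthly_cost(instance_hours, storage_gb, data_out_gb):
        compute = instance_hours * HOURLY_INSTANCE_RATE
        storage = storage_gb * STORAGE_RATE_PER_GB
        transfer = data_out_gb * DATA_OUT_RATE_PER_GB  # inbound transfer is typically free
        return compute + storage + transfer

    # Example: 2 instances running all month (~730 hours each),
    # 500 GB stored, 200 GB of outbound data transfer.
    print(f"Estimated bill: ${estimate_monthly_cost(2 * 730, 500, 200):.2f}")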

Additional benefit considerations


Hard benefits include reduced spending on compute, storage, networking, and security. They
also include reductions in hardware and software purchases; reductions in operational costs,
backup, and disaster recovery; and a reduction in operations personnel.

Soft benefits include the reuse of services and applications that enable you to define (and
redefine) solutions by using the same cloud service, increased developer productivity, improved
customer satisfaction, agile business processes that can quickly respond to new and emerging
opportunities, and an increase in global reach.

Cloud Total Cost of Ownership (TCO) defines what will be spent on the technology after adoption,
that is, what it costs to run the solution. Typically, a TCO analysis looks at the as-is on-premises
infrastructure and compares it with the cost of the to-be infrastructure state in the cloud. While
this difference might be easy to calculate, it might only provide a narrow view of the total
financial impact of moving to the cloud.

A return on investment (ROI) analysis can be used to determine the value that is generated while
considering spending and saving. This analysis starts by identifying the hard benefits in terms of
direct and visible cost reductions and efficiency improvements.

Module 3: AWS Global Infrastructure Overview

AWS Global Infrastructure


The AWS Global Infrastructure is designed and built to deliver a flexible, reliable, scalable, and
secure cloud computing environment with high-quality global network performance. AWS
continually updates its global infrastructure footprint.

AWS Regions
The AWS Cloud infrastructure is built around Regions. AWS has 22 Regions worldwide. An AWS
Region is a physical geographical location with one or more Availability Zones. Availability
Zones in turn consist of one or more data centers. To achieve fault tolerance and stability,
Regions are isolated from one another. Resources in one Region are not automatically replicated
to other Regions. Data replication across Regions is controlled by you. Communication between
Regions uses the AWS backbone network infrastructure.

Selecting a Region
There are a few factors that you should consider when you select the optimal Region or Regions
where you store data and use AWS services. They are:
● Data governance and legal requirements
● Proximity to customers (latency)
● Services available within the Region
● Costs (vary by Region)

Availability Zones
Each AWS Region has multiple, isolated locations that are known as Availability Zones. Each
Availability Zone provides the ability to operate applications and databases that are more highly
available, fault-tolerant, and scalable than would be possible with a single data center. Each
Availability Zone can include multiple data centers (typically three). They are fully isolated
partitions of the AWS Global Infrastructure. All Availability Zones are interconnected with high-
bandwidth, low-latency networking that provides high-throughput between Availability Zones.
The network accomplishes synchronous replication between Availability Zones.

AWS data centers
AWS data centers are designed for security. Data centers are where the data resides and data
processing occurs. Each data center has redundant power, networking, and connectivity, and is
housed in a separate facility. A data center typically has 50,000 to 80,000 physical servers. Data
centers are securely designed with several factors.

AWS Infrastructure features


The AWS Global Infrastructure has several valuable features:
● First, it is elastic and scalable. This means resources can dynamically adjust to increases
or decreases in capacity requirements. It can also rapidly adjust to accommodate growth.
● Second, this infrastructure is fault tolerant, which means it has built-in component
redundancy which enables it to continue operations despite a failed component.
● Finally, it requires minimal to no human intervention, while providing high availability
with minimal down time.

AWS foundational services

Module 4 - AWS Cloud Security

AWS responsibility: Security of the cloud


AWS is responsible for security of the cloud. AWS is responsible for protecting the global
infrastructure that runs all the services that are offered in the AWS Cloud. The global
infrastructure includes AWS Regions, Availability Zones, and edge locations.

Customer responsibility: Security in the cloud


The customer is responsible for what is implemented by using AWS services and for the
applications that are connected to AWS. The security steps that you must take depend on the
services that you use and the complexity of your system.Customer responsibilities include
selecting and securing any instance operating systems, securing the applications that are launched
on AWS resources, security group configurations, firewall configurations, network
configurations, and secure account management.

Service characteristics and security responsibility


AWS services such as Amazon EC2 can be categorized as IaaS and thus require the customer to
perform all necessary security configuration and management tasks. AWS services such as AWS
Lambda and Amazon RDS can be categorized as PaaS because AWS operates the infrastructure
layer, the operating system, and platforms. AWS services such as AWS Trusted Advisor, AWS
Shield, and Amazon Chime could be categorized as SaaS offerings, given their characteristics.

AWS Identity and Access Management (IAM)


AWS Identity and Access Management (IAM) allows you to control access to compute, storage,
database, and application services in the AWS Cloud. IAM is a tool that centrally manages access
to launching, configuring, managing, and terminating resources in your AWS account. With
IAM, you can manage which resources can be accessed by whom, and how these resources can be
accessed. IAM is a feature of your AWS account, and it is offered at no additional charge.

IAM: Essential components
An IAM user is a person or application that is defined in an AWS account, and that must make
API calls to AWS products. An IAM group is a collection of IAM users. An IAM policy is a
document that defines permissions to determine what users can do in the AWS account. An IAM
role is a tool for granting temporary access to specific AWS resources in an AWS account.

Authorization: What actions are permitted


Authorization is the process of determining what permissions a user, service, or application
should be granted. After a user has been authenticated, they must be authorized to access AWS
services.

By default, IAM users do not have permissions to access any resources or data in an AWS
account. Instead, you must explicitly grant permissions to a user, group, or role by creating a
policy, which is a document in JavaScript Object Notation (JSON) format. A policy lists
permissions that allow or deny access to resources in the AWS account.
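
As a hedged illustration of what such a JSON policy document looks like, the sketch below builds a minimal identity-based policy that allows read-only access to a hypothetical S3 bucket and registers it with the AWS SDK for Python (boto3). The bucket name and policy name are placeholders, and the caller needs appropriate IAM permissions of its own.

    import json
    import boto3

    # Minimal identity-based policy: read-only access to one (placeholder) bucket.
    policy_document = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    "arn:aws:s3:::example-reports-bucket",    # placeholder bucket
                    "arn:aws:s3:::example-reports-bucket/*"
                ]
            }
        ]
    }

    iam = boto3.client("iam")
    response = iam.create_policy(
        PolicyName="ExampleS3ReadOnly",                        # placeholder name
        PolicyDocument=json.dumps(policy_document),
    )
    print(response["Policy"]["Arn"])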

IAM: Authorization
You assign permissions by creating an IAM policy. The permissions determine which resources
and operations are allowed. All permissions are implicitly denied by default. If something is
explicitly denied, it is never allowed. The principle of least privilege is an important concept in
computer security. It promotes granting only the minimal privileges needed, based on the needs
of your users. The scope of the IAM service configurations is global.

IAM policies
An IAM policy is a formal statement of permissions that will be granted to an entity. Policies can
be attached to any IAM entity. Policies specify what actions are allowed, which resources to
allow the actions on, and what the effect will be when the user requests access to the resources.
There are two types of IAM policies: Identity-based policies and Resource-based policies.

Module 5: Networking and Content Delivery

Networks
A computer network is two or more client machines that are connected together to share
resources. A network can be logically partitioned into subnets. Networking requires a
networking device to connect all the clients together and enable communication between them.

IP addresses
Each client machine in a network has a unique Internet Protocol (IP) address that identifies it.
An IP address is a numerical label in decimal format. Machines convert that decimal number to a
binary format.

Open Systems Interconnection (OSI) model


The Open Systems Interconnection (OSI) model is a conceptual model that is used to explain
how data travels over a network. It consists of seven layers and shows the common protocols and
addresses that are used to send data at each layer.

Amazon VPC
Amazon Virtual Private Cloud (Amazon VPC) is a service that lets you provision a logically
isolated section of the AWS Cloud where you can launch your AWS resources. Amazon VPC
gives you control over your virtual networking resources, including the selection of your own IP
address range, the creation of subnets, and the configuration of route tables and network
gateways. You can use both IPv4 and IPv6 in your VPC for secure access to resources and
applications. You can also customize the network configuration for your VPC.

VPCs and subnets


A VPC is a virtual network that is logically isolated from other virtual networks in the AWS
Cloud. VPCs belong to a single AWS Region and can span multiple Availability Zones. After you
create a VPC, you can divide it into one or more subnets. A subnet is a range of IP addresses in a
VPC. Subnets belong to a single Availability Zone. You can create subnets in different
Availability Zones for high availability. Subnets are generally classified as public or private.
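
A minimal boto3 sketch of provisioning a VPC with one subnet and an internet gateway, assuming credentials and a default Region are already configured; the CIDR ranges and Availability Zone below are placeholders.

    import boto3

    ec2 = boto3.client("ec2")

    # Create a VPC with a /16 IPv4 CIDR block (placeholder range).
    vpc = ec2.create_vpc(CidrBlock="10.0.0.0/16")
    vpc_id = vpc["Vpc"]["VpcId"]

    # Carve out one subnet in a single Availability Zone (placeholder AZ).
    subnet = ec2.create_subnet(
        VpcId=vpc_id,
        CidrBlock="10.0.1.0/24",
        AvailabilityZone="us-east-1a",
    )

    # Attach an internet gateway; a route table entry would then make the subnet public.
    igw = ec2.create_internet_gateway()
    ec2.attach_internet_gateway(
        InternetGatewayId=igw["InternetGateway"]["InternetGatewayId"],
        VpcId=vpc_id,
    )
    print(vpc_id, subnet["Subnet"]["SubnetId"])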

Elastic network interface


An elastic network interface is a virtual network interface that you can attach or detach from an
instance in a VPC. Its attributes follow when it is reattached to a new instance. Each instance in
your VPC has a default network interface that is assigned a private IPv4 address from the IPv4
address range of your VPC.

Internet gateway
An internet gateway is a scalable, redundant, and highly available VPC component that allows
communication between instances in your VPC and the internet. An internet gateway serves two
purposes: to provide a target in your VPC route tables for internet-routable traffic, and to
perform network address translation for instances that were assigned public IPv4 addresses.

Network address translation (NAT) gateway


A network address translation (NAT) gateway enables instances in a private subnet to connect to
the internet or other AWS services.

Security groups
A security group acts as a virtual firewall for your instance, and it controls inbound and
outbound traffic. Security groups act at the instance level, not the subnet level. Security groups
have rules that control inbound and outbound instance traffic. Default security groups deny all
inbound traffic and allow all outbound traffic. Security groups are stateful.
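
The sketch below creates a security group in a hypothetical VPC and adds a single inbound HTTPS rule with boto3; the VPC ID is a placeholder. Because security groups are stateful, return traffic for this rule is allowed automatically.

    import boto3

    ec2 = boto3.client("ec2")

    # Create a security group in an existing VPC (placeholder VPC ID).
    sg = ec2.create_security_group(
        GroupName="web-server-sg",
        Description="Allow inbound HTTPS only",
        VpcId="vpc-0123456789abcdef0",        # placeholder
    )

    # Inbound rule: allow HTTPS (TCP 443) from anywhere.
    ec2.authorize_security_group_ingress(
        GroupId=sg["GroupId"],
        IpPermissions=[{
            "IpProtocol": "tcp",
            "FromPort": 443,
            "ToPort": 443,
            "IpRanges": [{"CidrIp": "0.0.0.0/0"}],
        }],
    )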

Network access control lists


A network access control list (network ACL) is an optional layer of security for your Amazon VPC.
It acts as a firewall for controlling traffic in and out of one or more subnets. Each subnet in your
VPC must be associated with a network ACL.

Amazon Route 53
Amazon Route 53 is a highly available and scalable cloud Domain Name System (DNS) web
service. It is designed to give developers and businesses a reliable and cost-effective way to route
users to internet applications by translating names into numeric IP addresses.

Content delivery network (CDN)


A content delivery network (CDN) is a globally distributed system of caching servers. A CDN
caches copies of commonly requested files that are hosted on the application origin server.
CDNs also deliver dynamic content that is unique to the requester and is not cacheable.

Amazon CloudFront
Amazon CloudFront is a fast CDN service that securely delivers data, videos, applications, and
application programming interfaces (APIs) to customers globally with low latency and high
transfer speeds. It also provides a developer-friendly environment. Amazon CloudFront delivers
files to users over a global network of edge locations and Regional edge caches.

Module 6 - Compute
AWS Compute Services

Amazon EC2
The EC2 in Amazon EC2 stands for Elastic Compute Cloud:
• Elastic refers to the fact that you can easily increase or decrease the number of servers you
run to support an application automatically.
• Compute refers to the reason why most users run servers in the first place, which is to host
running applications or process data—actions that require compute resources, including
processing power (CPU) and memory (RAM).
• Cloud refers to the fact that the EC2 instances that you run are hosted in the cloud.

Amazon EC2 Cost Optimization
Amazon offers different pricing models to choose from when you want to run EC2 instances.
● Per second billing is only available for On-Demand Instances, Reserved Instances, and
Spot Instances that run Amazon Linux or Ubuntu.
● On-Demand Instances are eligible for the AWS Free Tier. They have the lowest upfront cost and
the most flexibility.
● Reserved Instances enable you to reserve computing capacity for a 1-year or 3-year term
with lower hourly running costs.
● Spot Instances enable you to bid on unused EC2 instances, which can lower your costs.
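
As a minimal sketch (not part of the course labs), the boto3 call below launches a single On-Demand instance. The AMI ID and key pair name are placeholders and must be replaced with real values from your account and Region.

    import boto3

    ec2 = boto3.client("ec2")

    # Launch one small On-Demand instance. ImageId and KeyName are placeholders.
    response = ec2.run_instances(
        ImageId="ami-xxxxxxxxxxxxxxxxx",     # placeholder AMI ID
        InstanceType="t3.micro",
        MinCount=1,
        MaxCount=1,
        KeyName="my-key-pair",               # placeholder key pair
        TagSpecifications=[{
            "ResourceType": "instance",
            "Tags": [{"Key": "Name", "Value": "internship-demo"}],
        }],
    )
    print(response["Instances"][0]["InstanceId"])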

Container Services
Containers are a method of operating system virtualization that enables you to run an application
and its dependencies in resource-isolated processes. By using containers, you can easily package an
application's code, configurations, and dependencies into easy-to-use building blocks that deliver
environmental consistency, operational efficiency, developer productivity, and version control.

Amazon Elastic Container Service (Amazon ECS)


Amazon Elastic Container Service (Amazon ECS) is a highly scalable, high-performance
container management service that supports Docker containers. Amazon ECS enables you to
easily run applications on a managed cluster of Amazon EC2 instances.

AWS Lambda
AWS Lambda is an event-driven, serverless compute service. Lambda enables you to run code
without provisioning or managing servers. You create a Lambda function, which is the AWS
resource that contains the code that you upload. You then set the Lambda function to be
triggered, either on a scheduled basis or in response to an event. Your code only runs when it is
triggered.
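
A minimal sketch of the Python handler that a Lambda function runs when it is triggered; the event keys used here are only illustrative, since the real payload shape depends on the trigger you configure.

    import json

    def lambda_handler(event, context):
        """Entry point that Lambda invokes when the function is triggered."""
        # 'event' carries the trigger payload (for example, an API Gateway request
        # or an S3 notification); its shape depends on the event source.
        name = event.get("name", "world")          # illustrative key
        return {
            "statusCode": 200,
            "body": json.dumps({"message": f"Hello, {name}!"}),
        }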

AWS Elastic Beanstalk


AWS Elastic Beanstalk is another AWS compute service option. It is a platform as a service that
facilitates the quick deployment, scaling, and management of your web applications and
services. You remain in control. The entire platform is already built, and you only need to
upload your code. You can choose your instance type and database, set and adjust automatic
scaling, update your application, access the server log files, and enable HTTPS on the load balancer.

Module 7 - Storage

Amazon Elastic Block Store (Amazon EBS)


Amazon EBS provides persistent block storage volumes for use with Amazon EC2 instances.
Persistent storage is any data storage device that retains data after power to that device is shut
off. It is also sometimes called non-volatile storage. Each Amazon EBS volume is automatically
replicated within its Availability Zone to protect you from component failure. It is designed for
high availability and durability. Amazon EBS volumes provide the consistent and low-latency
performance that is needed to run your workloads.

Amazon Simple Storage Service (Amazon S3)


Amazon S3 is object-level storage, which means that if you want to change a part of a file, you
must make the change and then re-upload the entire modified file. Amazon S3 stores data as
objects within resources that are called buckets. You can store virtually as many objects as you
want in a bucket, and you can write, read, and delete objects in your bucket. Bucket names are
universal and must be unique across all existing bucket names in Amazon S3. Objects can be up
to 5 TB in size. By default, data in Amazon S3 is stored redundantly across multiple facilities and
multiple devices in each facility.
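
A small sketch of object-level access with boto3: it uploads a local file as an object and reads it back. The bucket is assumed to already exist, and its name is a placeholder (bucket names must be globally unique).

    import boto3

    s3 = boto3.client("s3")
    bucket = "example-unique-bucket-name"      # placeholder; must already exist

    # Upload a local file as an object (key) in the bucket.
    s3.upload_file("report.csv", bucket, "data/report.csv")

    # Read the whole object back. Because S3 is object storage, changing any part
    # of the file means re-uploading the entire modified object.
    obj = s3.get_object(Bucket=bucket, Key="data/report.csv")
    print(obj["Body"].read()[:100])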

Amazon S3 Storage Classes


Amazon S3 offers a range of object-level storage classes that are designed for different use cases.
These classes include:
● Amazon S3 Standard
● Amazon S3 Intelligent-Tiering
● Amazon S3 Standard-Infrequent Access (Amazon S3 Standard-IA)
● Amazon S3 One Zone-Infrequent Access (Amazon S3 One Zone-IA)
● Amazon S3 Glacier

Amazon Elastic File System (Amazon EFS)


Amazon Elastic File System (Amazon EFS) provides simple, scalable, elastic file storage for use
with AWS services and on-premises resources. It offers a simple interface that enables you to
create and configure file systems quickly and easily.

Amazon EFS Features
● File storage in the AWS Cloud
● Petabyte-scale, low-latency file system
● Shared storage
● Elastic capacity
● Supports Network File System (NFS) versions 4.0 and 4.1 (NFSv4)
● Compatible with all Linux-based AMIs for Amazon EC2

Amazon S3 Glacier
When you use Amazon S3 Glacier to archive data, you can store your data at an extremely low
cost, but you cannot retrieve your data immediately when you want it. Data that is stored in
Amazon S3 Glacier can take several hours to retrieve, which is why it works well for archiving.
There are three key Amazon S3 Glacier terms you should be familiar with:
● Archive–Any object (such as a photo, video, file, or document) that you store in Amazon
S3 Glacier. It is the base unit of storage in Amazon S3 Glacier.
● Vault–A container for storing archives. When you create a vault, you specify the vault
name and the Region where you want to locate the vault.
● Vault access policy–Determine who can and cannot access the data that is stored in the
vault, and what operations users can and cannot perform. One vault access permissions
policy can be created for each vault to manage access permissions for that vault.

Amazon S3 Glacier use cases


● Media asset archiving
● Healthcare information archiving
● Regulatory and compliance archiving
● Scientific data archiving
● Digital preservation

Module 8- Databases

Amazon Relational Database Service (Amazon RDS)


Amazon RDS is a managed service that sets up and operates a relational database in the
cloud. Amazon RDS provides cost-efficient and resizable capacity, while automating time-
consuming administrative tasks. It enables you to focus on your application, so you can give
applications the performance, high availability, security, and compatibility that they need. With
Amazon RDS, your primary focus is your data and optimizing your application.

Amazon RDS in a virtual private cloud (VPC)


You can run an instance by using Amazon Virtual Private Cloud (Amazon VPC). When you use a
virtual private cloud (VPC), you have control over your virtual networking environment. You
can select your own IP address range, create subnets, and configure routing and access control
lists (ACLs).

Features
● Offers asynchronous replication
● Can be promoted to primary if needed

Amazon DynamoDB
Here is a review of the differences between relational and non-relational databases:
● A relational database (RDB) works with structured data that is organized by tables,
records, and columns. RDBs establish a well-defined relationship between database tables.
RDBs use Structured Query Language (SQL), a standard language that provides a
programming interface for database interaction.
● A non-relational database is any database that does not follow the relational model that is
provided by traditional RDBMS. Non-relational databases scale out horizontally, and
they can work with unstructured and semistructured data.
Amazon DynamoDB Features
● NoSQL database tables
● Virtually unlimited storage
● Items can have differing attributes
● Low-latency queries
● Scalable read/write throughput
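
A short sketch of item-level access with the boto3 DynamoDB resource; the table name and attributes are placeholders, and it assumes a table whose partition key is CustomerId. Note that items in the same table can carry different attributes.

    import boto3

    dynamodb = boto3.resource("dynamodb")
    table = dynamodb.Table("Customers")    # placeholder table with partition key 'CustomerId'

    # Items in the same table can have differing attributes.
    table.put_item(Item={"CustomerId": "c-001", "Name": "Asha", "Plan": "premium"})
    table.put_item(Item={"CustomerId": "c-002", "Name": "Ravi"})

    # Low-latency key lookup.
    response = table.get_item(Key={"CustomerId": "c-001"})
    print(response.get("Item"))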

Amazon Redshift
Amazon Redshift is a fast, fully managed data warehouse that makes it simple and cost effective
to analyze all your data by using standard SQL and your existing business intelligence (BI) tools.
Amazon Redshift use cases
● Enterprise data warehouse (EDW)
● Respond faster to business needs
● Big data
● Low price point for small customers
● Managed service for ease of deployment and maintenance
● Focus more on data and less on database management
● Software as a service (SaaS)

Amazon Aurora
Amazon Aurora is a MySQL- and PostgreSQL-compatible relational database that is built for the
cloud. It combines the performance and availability of high-end commercial databases with the
simplicity and cost-effectiveness of open-source databases.

Amazon Aurora service benefits


● It is highly available and offers a fast, distributed storage subsystem.
● Amazon Aurora is straightforward to set up and uses SQL queries.
● It is designed to have drop-in compatibility with the MySQL and PostgreSQL database
engines.
● Amazon Aurora is a pay-as-you-go service.
● It’s a managed service that integrates with features such as AWS Database Migration
Service (AWS DMS) and the AWS Schema Conversion Tool.

Module 9- Cloud Architecture
AWS Well-Architected Framework
• A guide for designing infrastructures that are:
Secure
High-performing
Resilient
Efficient
• A consistent approach to evaluating and implementing cloud architectures
• A way to provide best practices that were developed through lessons learned by reviewing
customer architectures.

Pillars of the AWS Well-Architected Framework

Operational Excellence pillar


The Operational Excellence pillar focuses on the ability to run and monitor systems to deliver
business value, and to continually improve supporting processes and procedures.
There are five design principles for operational excellence in the cloud:

• Perform operations as code–Define your entire workload as code and update it with code.
• Make frequent, small, reversible changes–Design workloads to enable components to be
updated regularly. Make changes in small increments that can be reversed if they fail (without
affecting customers when possible).
• Refine operations procedures frequently–Look for opportunities to improve operations
procedures.
• Anticipate failure–Identify potential sources of failure so that they can be removed or
mitigated. Test failure scenarios and validate your understanding of their impact.
• Learn from all operational failures–Drive improvement through lessons learned from all
operational events and failures. Share what is learned across teams and through the entire
organization.

Security Pillar
The Security pillar focuses on the ability to protect information, systems, and assets while
delivering business value through risk assessments and mitigation strategies.
Security design principles
There are seven design principles that can improve security:
• Implement a strong identity foundation
• Enable traceability
• Apply security at all layers
• Automate security best practices
• Protect data in transit and at rest
• Keep people away from data
• Prepare for security events

Reliability Pillar
The Reliability pillar focuses on ensuring a workload performs its intended function correctly
and consistently when it’s expected to. Key topics include: designing distributed systems,
recovery planning, and handling change.
Reliability design principles
There are five design principles that can increase reliability:
• Automatically recover from failure–Monitor systems for key performance indicators
and configure your systems to trigger an automated recovery when a threshold is
breached.
• Test recovery procedures
• Scale horizontally to increase aggregate workload availability
• Stop guessing capacity
• Manage change in automation

Cost Optimization Pillar
The Cost Optimization pillar focuses on the ability to avoid unnecessary costs. Key topics
include: understanding and controlling where money is being spent, selecting the most
appropriate and right number of resource types, analyzing spend over time, and scaling to meet
business needs without overspending.
Cost optimization design principles
There are five design principles that can optimize costs:
• Implement Cloud Financial Management
• Adopt a consumption model
• Measure overall efficiency
• Stop spending money on undifferentiated heavy lifting
• Analyze and attribute expenditure

The AWS Well-Architected Tool


• Helps you review the state of your workloads and compares them to the latest AWS
architectural best practices
• Gives you access to knowledge and best practices used by
AWS architects, whenever you need it
• Delivers an action plan with step-by-step guidance on how to build better workloads for the
cloud
• Provides a consistent process for you to review and measure
your cloud architectures.

Reliability and availability


One way to improve availability is to architect your applications and workloads to withstand
failure. There are two important factors that cloud architects consider when designing
architectures to withstand failure: reliability and availability.
Reliability
Reliability is a measure of your system’s ability to provide functionality when desired by the
user. Because “everything fails, all the time,” you should think of reliability in statistical
terms. Reliability is the probability that an entire system will function as intended for a specified
period.
Availability
As you just learned, failure of system components impacts the availability of the system.
Formally, availability is the percentage of time that a system is operating normally or correctly
performing the operations expected of it (or normal operation time over total time). Availability
is reduced anytime the application isn’t operating normally, including both scheduled and
unscheduled interruptions.

Module 10 - Automatic Scaling and Monitoring
Elastic Load Balancing
Elastic Load Balancing is an AWS service that distributes incoming application or network
traffic across multiple targets—such as Amazon Elastic Compute Cloud (Amazon EC2) instances,
containers, internet protocol (IP) addresses, and Lambda functions—in a single Availability Zone
or across multiple Availability Zones.
Types of load balancers
Elastic Load Balancing is available in three types:
• An Application Load Balancer operates at the application level (Open Systems
Interconnection, or OSI, model layer 7).
• A Network Load Balancer operates at the network transport level (OSI model layer 4), routing
connections to targets—EC2 instances, microservices, and containers—based on IP protocol
data.
• A Classic Load Balancer provides basic load balancing across multiple EC2 instances, and it
operates at both the application level and network transport level.

Load balancer monitoring


You can use the following features to monitor your load balancers, analyze traffic patterns, and
troubleshoot issues with your load balancers and targets:
• Amazon CloudWatch metrics
• Access logs
• AWS CloudTrail logs

Amazon CloudWatch
Amazon CloudWatch is a monitoring and observability service that is built for DevOps
engineers, developers, site reliability engineers (SRE), and IT managers. CloudWatch monitors
your AWS resources (and the applications that you run on AWS) in real time. You can use
CloudWatch to collect and track metrics, which are variables that you can measure for your
resources and applications.

CloudWatch alarms
You can create a CloudWatch alarm that watches a single CloudWatch metric or the result of a
math expression based on CloudWatch metrics. You can create a CloudWatch alarm based on a
static threshold, anomaly detection, or a metric math expression. The alarm goes to ALARM
state when the metric breaches the threshold for a specified number of evaluation periods.
For an alarm based on a static threshold, you must specify the following (a sample call is
sketched after this list):
• Namespace

• Metric
• Statistic
• Period
• Conditions
• Additional configuration information
• Actions
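
The sketch below shows those pieces combined in a single boto3 call that alarms when the average CPU utilization of a hypothetical EC2 instance stays above 70 percent for two consecutive 5-minute periods; the instance ID, threshold, and alarm name are placeholders.

    import boto3

    cloudwatch = boto3.client("cloudwatch")

    cloudwatch.put_metric_alarm(
        AlarmName="HighCPUAlarm",                        # placeholder name
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],  # placeholder
        Statistic="Average",
        Period=300,                     # seconds per evaluation period
        EvaluationPeriods=2,            # breach must persist for 2 periods
        Threshold=70.0,
        ComparisonOperator="GreaterThanThreshold",
        AlarmActions=[],                # add an SNS topic or scaling policy ARN here
    )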

Amazon EC2 Auto Scaling


Scaling is the ability to increase or decrease the compute capacity of your application. When you
run your applications on AWS, you want to ensure that your architecture can scale to handle
changes in demand.
Auto Scaling groups
An Auto Scaling group is a collection of Amazon EC2 instances that are treated as a logical
grouping for the purposes of automatic scaling and management. The size of an Auto Scaling
group depends on the number of instances you set as the desired capacity.
How Amazon EC2 Auto Scaling works
Amazon EC2 Auto Scaling is designed to adjust the size of your group so it has the specified
number of instances. If you specify scaling policies, then Amazon EC2 Auto Scaling can launch
or terminate instances as demand on your application increases or decreases.
An Auto Scaling group uses a launch configuration, which is an instance configuration template.
You have many scaling options:
• Maintain current instance levels at all times
• Manual scaling
• Scheduled scaling – With scheduled scaling, scaling actions are performed automatically as a
function of date and time. This is useful for predictable workloads when you know exactly when
to increase or decrease the number of instances in your group.
• Dynamic scaling – A more advanced way to scale your resources that enables you to define
parameters that control the scaling process. For example, you have a web application that
currently runs on two instances and you want the CPU utilization of the Auto Scaling group to
stay close to 50 percent when the load on the application changes.
Implementing dynamic scaling
One common configuration for implementing dynamic scaling is to create a CloudWatch alarm
that is based on performance information from your EC2 instances or load balancer. When a
performance threshold is breached, a CloudWatch alarm triggers an automatic scaling event that
either scales out or scales in EC2 instances in the Auto Scaling group. Amazon CloudWatch,
Amazon EC2 Auto Scaling, and Elastic Load Balancing work well individually. Together,
however, they become more powerful and increase the control and flexibility over how your
application handles customer demand.
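
As a hedged sketch of the "keep CPU near 50 percent" example above, the call below attaches a target tracking scaling policy to a hypothetical Auto Scaling group; with this policy type, Amazon EC2 Auto Scaling creates and manages the underlying CloudWatch alarms for you. The group name is a placeholder.

    import boto3

    autoscaling = boto3.client("autoscaling")

    autoscaling.put_scaling_policy(
        AutoScalingGroupName="web-app-asg",            # placeholder group name
        PolicyName="keep-cpu-near-50",
        PolicyType="TargetTrackingScaling",
        TargetTrackingConfiguration={
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization",
            },
            "TargetValue": 50.0,       # scale out/in to hold average CPU near 50%
        },
    )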

AWS Academy Machine Learning Foundations

Module 1- Welcome to Machine Learning Foundations

Machine Learning Job roles


● Data Scientist role:
If you decide to work toward a data scientist role, you should focus on developing
analytical, statistical, and programming skills. As a data scientist, you use those skills to
collect, analyze, and interpret large datasets. Some universities now offer degrees in data
science, but often data scientists have degrees in statistics, math, computer science, or
economics. As a data scientist, you need technical competencies in statistics, machine
learning, programming languages, and data analytics.

● Machine Learning Engineer:


If you decide to work toward a career as a machine learning engineer, you need some
skills that are similar to a data scientist’s skills. However, your focus is more on
programming skills and software architecture. As a machine learning engineer, you can
apply those skills to design and develop machine learning systems. Machine learning
engineers often have previous experience with software development.

● Applied Science Researcher:


You might also decide to work toward a career in science where you can apply machine
learning technology. Machine learning has an impact on everything from astronomy to
zoology, so many different paths are open to you. As an applied science researcher, your
primary focus is on the type of science that you decide to concentrate on. You need
some of the skills of a data scientist, but you must also know how to apply those skills to
your chosen domain.

● Machine Learning developer role:


Many software developers are now integrating machine learning into their applications.
If you are working toward a career as a software developer, you should include machine
learning technology in your course of study. As a machine learning developer, your
primary focus is on software development skills, but you also need some of the skills of a
data scientist. Therefore, you should include coursework in statistics and applied
mathematics.

Module 2 - Introducing Machine Learning

What is Machine Learning?


Machine learning is a subset of AI, which is a broad branch of computer science for
building machines that can do human tasks. Deep learning itself is a subdomain of machine
learning. AI is about building machines that can perform tasks that a human would
typically perform. In modern culture, AIs appear in movies or works of fiction. You might
recall some AIs in science fiction movies or TV shows that control the future world, or act
intelligently on their own—sometimes, with negative effects for society or the human beings
around them. These AIs started as computer agents that perceived their environments and
took actions to achieve a specific goal. However, for some of these fictional AIs, their
actions were not the outcome that their creators had originally envisioned.
Machine learning is the scientific study of algorithms and statistical models to perform a task
by using inference instead of instructions.
Suppose that you must write an application that determines whether an email message is
spam or not. Without machine learning, you write a complex series of decision statements
(think if/else statements). Perhaps you use words in the subject or body, the number of links,
and the length of the email to determine whether an email message is spam. It would be
difficult and laborious to compile such a large set of rules to cover every possibility. However,
with machine learning, you can use a list of email messages that are marked spam or not
spam to train a machine learning model.
Deep learning represents a significant leap forward in the capabilities of AI and ML. The
theory behind deep learning was created from how the human brain works. An artificial
neural network (ANN) is inspired by the biological neurons in the brain, although the
implementation is different. Artificial neurons have one or more inputs and a single output.
These neurons fire (or activate their outputs) based on a transformation of the inputs.
A neural network is composed of layers of these artificial neurons, with connections between
the layers. Typically, a network has input, output, and hidden layers.
The output of a single neuron connects to the inputs of all the neurons in the next layer. The
network is then asked to solve a problem. The input layer is populated from the training
data. The neurons activate throughout the layers until an answer is presented in the output layer.
The accuracy of the output is then measured. If the output hasn’t met your threshold, the
training is repeated, but with slight changes to the weights of the connections between the
neurons. This process continues to repeat. Each time, it strengthens the connections that
lead to success, and diminishes the connections that lead to failure.

Types of Machine Learning

● Supervised Learning
Supervised learning is a popular type of ML because it’s widely applicable. It’s called
supervised learning because you need a supervisor—a teacher—who can show the right
answers to the model.
You can have different types of problems within supervised learning. These
problems can be broadly categorized into two categories: classification and regression.
Classification problems have two types. The first type is considered a binary classification
problem; one example is identifying fraudulent transactions.
Multiclass classification problems also exist.
Regression problems also exist. In a regression problem, you are no longer mapping an
input to a defined number of categories. Instead, you are mapping inputs to a continuous
value, like an integer. One example of an ML regression problem is predicting the price of
a company’s stock. (A short scikit-learn sketch contrasting supervised and unsupervised
learning follows this list.)

● Unsupervised Learning
In unsupervised learning, labels are not provided (like they are with supervised learning)
because you don’t know all the variables and patterns. In these instances, the machine must
uncover and create the labels itself. These models use the data that they are presented with
to detect emerging properties of the entire dataset, and then construct patterns.
A common subcategory of unsupervised learning is called clustering. This kind of algorithm
groups data into different clusters that are based on similar features to better understand
the attributes of a specific cluster.
The advantage of unsupervised algorithms is that they enable you to see patterns in the data
that you were otherwise unaware of. An example might be the existence of two major
customer types.

● Reinforcement Learning
In reinforcement learning, an agent continuously learns, through trial and error, as it
interacts in an environment. Reinforcement learning is useful when the reward of an
intended outcome is known, but the path to achieving it is not. Discovering that path
requires much trial and error.
Consider the example of AWS DeepRacer. In the AWS DeepRacer simulator, the agent is
the virtual car, and the environment is a virtual racetrack. The actions are the throttle and
steering inputs to the car. The goal is to complete the racetrack as quickly as possible
without deviating from the track.
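
To make the supervised/unsupervised distinction concrete, the sketch below trains a simple classifier on labeled data and a k-means clustering model on the same features without labels, using scikit-learn (introduced later in this course). The synthetic dataset is purely illustrative.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.cluster import KMeans

    # Synthetic, labeled dataset (purely illustrative).
    X, y = make_classification(n_samples=500, n_features=4, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Supervised learning: labels guide the model (binary classification).
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    print("classification accuracy:", clf.score(X_test, y_test))

    # Unsupervised learning: no labels; the algorithm groups similar rows itself.
    clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    print("first ten cluster assignments:", clusters[:10])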

Machine Learning Process
● Business problem
● Data preparation
● Iterative model training:
– Feature engineering
– Model training
– Evaluating and tuning the model
● Deployment

Machine Learning Tools Overview


Jupyter Notebook is an open-source web application that enables you to create and share
documents that contain live code, equations, visualizations, and narrative text. Uses include
data cleaning and transformation, numerical simulation, statistical modeling, data
visualization, machine learning, and much more.
JupyterLab is a web-based interactive development environment for Jupyter notebooks,
code, and data. JupyterLab is flexible. You can configure and arrange the user interface to
support a range of workflows in data science, scientific computing, and machine learning.
JupyterLab is also extensible and modular.
Pandas is an open-source Python library. It’s used for data handling and analysis. It
represents data in a table that is similar to a spreadsheet. This table is known as a
pandas DataFrame.
Matplotlib is a library for creating scientific static, animated, and interactive visualizations in
Python. You use it to generate plots of your data later in this course.
Seaborn is another data visualization library for Python. It’s built on matplotlib, and it
provides a high-level interface for drawing informative statistical graphics.
NumPy is one of the fundamental scientific computing packages in Python. It contains
functions for N-dimensional array objects and useful math functions such as linear algebra,
Fourier transform, and random number capabilities.
Scikit-learn is an open-source machine learning library that supports supervised and
unsupervised learning. It also provides various tools for model fitting, data preprocessing,
model selection and evaluation, and many other utilities. scikit-learn is built on NumPy,
SciPy, and matplotlib, and it’s a good package for exploring machine learning.
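
A tiny sketch tying a few of these tools together: NumPy generates data, pandas holds it in a DataFrame, and matplotlib plots it. The values are synthetic and only for illustration.

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    # NumPy: generate some synthetic numeric data.
    rng = np.random.default_rng(seed=0)
    hours = np.arange(24)
    temperature = 20 + 5 * np.sin(hours / 24 * 2 * np.pi) + rng.normal(0, 0.5, 24)

    # pandas: hold the data in a DataFrame (a spreadsheet-like table).
    df = pd.DataFrame({"hour": hours, "temperature": temperature})
    print(df.describe())          # quick descriptive statistics

    # matplotlib: plot the data.
    plt.plot(df["hour"], df["temperature"], marker="o")
    plt.xlabel("Hour of day")
    plt.ylabel("Temperature (°C)")
    plt.title("Synthetic hourly temperature")
    plt.show()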

Module 3 - Implementing a Machine Learning Pipeline with Amazon SageMaker
Formulating Machine Learning Problems
Your first step in this phase is to define the problem that you’re trying to solve and the goal
that you want to reach. Understanding the business goal is key, because you use that goal to
measure the performance of your solution. You frequently must clarify the business problem
before you can begin to target a solution. You must ask other questions so that you can
thoroughly understand the problem.
Is this problem a supervised or unsupervised machine learning problem? Do you have labeled
data to train a supervised model? Again, you have many questions to ask yourself and the
business. Ultimately, you should try to validate the use of machine learning and confirm that
you have access to the right people and data. Then, devise the simplest solution to the problem.
Collecting and Securing data
You can obtain data from several places.
• Private data is data that you (or your customers) have in various existing systems.
Everything from log files to customer invoice databases can be useful, depending on
the problem that you want to solve.
• Commercial data is data that a commercial entity collected and made available.
Companies such as Reuters, Change Healthcare, Dun & Bradstreet, and
Foursquare maintain databases that you can subscribe to.
• Open-source data comprises many different open-source datasets that range
from scientific information to movie reviews. These datasets are usually available for
use in research or for teaching purposes.
Extracting, Transforming, and Loading Data
Data is typically spread across many different systems and data providers. The challenge is to
bring all these data sources together into something that a machine learning model can
consume.
The steps in an extract, transform, and load (ETL) process are defined as follows (a short pandas sketch appears after this list).
• Extract–Pull the data from the sources to a single location.
• Transform–During extraction, the data might need to be modified, matching
records might need to be combined, or other transformations might be necessary.
• Load–Finally, the data is loaded into a repository, such as Amazon S3 or Amazon Athena.
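
A minimal, assumption-laden sketch of the extract, transform, and load steps using pandas and boto3; the file names, column names, and bucket are placeholders for whatever sources your problem actually has.

    import pandas as pd
    import boto3

    # Extract: pull data from two (hypothetical) source files into one place.
    orders = pd.read_csv("orders.csv")          # placeholder source
    customers = pd.read_csv("customers.csv")    # placeholder source

    # Transform: combine matching records and clean them up.
    merged = orders.merge(customers, on="customer_id", how="left")
    merged = merged.dropna(subset=["order_total"])
    merged["order_total"] = merged["order_total"].astype(float)

    # Load: write the result to a repository such as Amazon S3.
    merged.to_csv("clean_orders.csv", index=False)
    boto3.client("s3").upload_file("clean_orders.csv",
                                   "example-ml-data-bucket",   # placeholder bucket
                                   "etl/clean_orders.csv")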

A typical ETL framework has several components. These components
include:
• Crawler–A program that connects to a data store (source or target). It progresses
through a prioritized list of classifiers to determine the schema for your data, and
creates metadata tables in the AWS Glue Data Catalog.
• Job–The business logic that is required to perform ETL work.
• Schedule or event–A scheduling service that periodically runs the ETL process.

ETL with AWS Glue


AWS Glue is a fully managed ETL service. With AWS Glue, it is simple and cost-effective to
categorize your data, clean it, enrich it, and move it reliably between various data stores.
AWS Glue consists of the following components:
• A central metadata repository, which is known as the AWS Glue Data Catalog
• An ETL engine that automatically generates Python or Scala code
• A flexible scheduler that handles dependency resolution, job monitoring, and retries
AWS Glue is serverless, so you don’t have infrastructure to set up or manage.
AWS Glue is designed to work with semistructured data. It introduces a component that is
called a dynamic frame, which you can use in your ETL scripts. A dynamic frame is similar to
an Apache Spark DataFrame, except that each record is self-describing, so no schema is
required initially.
Securing the data
AWS provides encryption features for storage services, typically both at rest and in transit.
You can often meet these encryption requirements by enabling encryption on the object or
service that you need to protect. For in-transit data, you must use secure transports, such
as Secure Sockets Layer/Transport Layer Security (SSL/TLS).
Evaluating your data
Data generally must be put into numeric form so that ML algorithms can use the data to make
predictions.
Understanding the Data
● Descriptive statistics
● Statistical characteristics
● Plotting statistics
● Correlation matrix

Feature Engineering
Feature engineering is the task of selecting the columns of data that have the most impact on the
model.

Feature Selection And Extraction
Feature selection is about selecting the features that are most relevant and discarding the rest.
Feature selection is applied to prevent either redundancy or irrelevance in the existing features,
or to get a limited number of features to prevent overfitting.
Handling Missing Data
If you decide to drop rows with missing data, you can use built-in functions. For example, the
pandas dropna function can drop all rows with missing data, or it can drop rows only where
specific columns are missing by using the subset parameter.
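
A short sketch of both options with pandas, on a hypothetical DataFrame:

    import pandas as pd
    import numpy as np

    # Hypothetical data with missing values.
    df = pd.DataFrame({
        "age":    [34, np.nan, 51, 29],
        "income": [52000, 61000, np.nan, 43000],
        "city":   ["Pune", "Delhi", "Chennai", None],
    })

    # Drop every row that has any missing value.
    all_complete = df.dropna()

    # Or drop rows only when a specific column ('income') is missing.
    income_complete = df.dropna(subset=["income"])

    print(len(df), len(all_complete), len(income_complete))
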
Outliers
Outliers are points in your dataset that lie at an abnormal distance from other values. They are
not always something that you want to clean up, because they can add richness to your dataset.
However, outliers can affect accuracy because they skew values away from the other, more
normal values that are related to that feature.
Feature Selection
Filter methods use statistical methods to measure the relevance of features by their correlation
with the target variable.
Wrapper methods measure the usefulness of a subset of features by training a model on it and
measuring the success of the model.
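The following scikit-learn sketch illustrates one filter method and one wrapper method on a synthetic dataset; it is only an example of the two approaches, not a prescription for any particular problem:

    from sklearn.datasets import make_classification
    from sklearn.feature_selection import RFE, SelectKBest, f_classif
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=200, n_features=10, random_state=0)

    # Filter method: score each feature against the target and keep the five best
    X_filtered = SelectKBest(score_func=f_classif, k=5).fit_transform(X, y)

    # Wrapper method: repeatedly train a model and drop the weakest features
    rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=5)
    X_wrapped = rfe.fit_transform(X, y)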
Hyperparameter and model tuning
Hyperparameters fall into a few categories. The first kind is model hyperparameters, which help define the model itself. The second kind is optimizer hyperparameters, which relate to how the model learns patterns from the data and are used for neural-network models. The third kind is data hyperparameters, which relate to the attributes of the data itself.
Tuning hyperparameters
Tuning hyperparameters can be labor-intensive. Traditionally, this kind of tuning was done
manually. Someone—who had domain experience that was related to that hyperparameter and
use case—would manually select the hyperparameters, according to their intuition and
experience. Then, they would train the model and score it on the validation data. This process
would be repeated until satisfactory results were achieved.
Hyperparameter Tuning
Hyperparameter tuning might not necessarily improve your model. It is an advanced tool for building machine learning solutions. As such, it should be considered part of the process of using the scientific method.
Amazon SageMaker enables you to perform automated hyperparameter tuning. Amazon
SageMaker automatic model tuning, also known as hyperparameter tuning, finds the best
version of a model by running many training jobs on your dataset.
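A sketch of how such a tuning job might be set up with the SageMaker Python SDK follows; the IAM role, S3 paths, hyperparameter ranges, and objective metric are placeholders that would depend on your own account, algorithm, and dataset:

    import sagemaker
    from sagemaker.estimator import Estimator
    from sagemaker.tuner import ContinuousParameter, HyperparameterTuner, IntegerParameter

    session = sagemaker.Session()
    role = "arn:aws:iam::123456789012:role/ExampleSageMakerRole"  # hypothetical IAM role

    # A built-in XGBoost container is used purely as an example estimator
    image = sagemaker.image_uris.retrieve("xgboost", session.boto_region_name, version="1.5-1")
    estimator = Estimator(image_uri=image, role=role, instance_count=1,
                          instance_type="ml.m5.xlarge", sagemaker_session=session)
    estimator.set_hyperparameters(objective="binary:logistic", num_round=100)

    # Define which hyperparameters to search and how many training jobs to run
    tuner = HyperparameterTuner(
        estimator=estimator,
        objective_metric_name="validation:auc",
        hyperparameter_ranges={
            "eta": ContinuousParameter(0.01, 0.3),
            "max_depth": IntegerParameter(3, 10),
        },
        max_jobs=20,
        max_parallel_jobs=2)

    # Launch the tuning job against training and validation data in Amazon S3
    tuner.fit({"train": "s3://example-ml-bucket/train/",
               "validation": "s3://example-ml-bucket/validation/"})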
Module 4- Introduction to Forecasting
Forecasting overview
Forecasting is an important area of machine learning because many opportunities to predict future outcomes are based on historical data. Many of these opportunities involve a time component. Although the time component adds more information, it also makes time series problems more difficult to handle than other types of predictions.
Most time series datasets also follow one of the following patterns:
• Trend–A pattern that shows the values as they increase, decrease, or stay the same over time
• Seasonal–A repeating pattern that is based on the seasons in a year
• Cyclical–Some other form of a repeating pattern
• Irregular–Changes in the data over time that appear to be random or that have no discernible pattern
Processing Time Series Data
Time series data
Time series data is captured in chronological sequence over a defined period of time.
Introducing time into a machine learning model has a positive impact because the model can
derive meaning from change in the data points over time. Time series data tends to be
correlated, which means that a dependency exists between data points.
Because you have a regression problem—and because regression assumes independence of data
points—you must develop a method for handling data dependence. The purpose of this method
is to increase the validity of the predictions.
In addition to the time series data, you can add related data to augment a forecasting model. This information is in addition to the target values, such as the number of units that are sold per time period. The third type of data is metadata about the dataset.
Time Series Handling:
1)Missing Data
A common occurrence in real-world forecasting problems is missing values in the raw data. Missing values make it harder for a model to generate a forecast.
Values can be missing for various reasons. Missing values can occur because of no transaction, or possibly because of measurement errors: maybe a service that monitored certain data was not working correctly, or the measurement could not be taken correctly.
The missing values can be filled in several ways (see the pandas sketch after the upsampling discussion below):
• Forward fill–Uses the last known value for the missing value.
• Moving average–Uses the average of the last known values to calculate the missing value.
• Backward fill–Uses the next known value after the missing value. Be aware that it is potentially dangerous to use the future to calculate the past, which is bad practice in forecasting. This practice is known as lookahead, and it should be avoided.
• Interpolation–Essentially uses an equation to calculate the missing value.
Downsampling
We might obtain data at different frequencies. When you have data that is at a
different frequency than other datasets, or isn’t compatible with your question, you might need
to downsample.
Downsampling means moving from a more finely grained time to a less finely grained time. When you downsample, you must decide how to combine the values. Understanding your data helps you decide the best course of action.
Upsampling
The inverse of downsampling is upsampling. The problem with upsampling is that it is difficult to achieve in most cases: unless you have some other data source to reference, you cannot reliably produce the finer-grained values. In some cases, you must use additional data or knowledge.
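The pandas sketch below (with a made-up daily sales series) illustrates the fill methods described above as well as downsampling and upsampling with resample:

    import pandas as pd

    # Hypothetical daily sales series with two missing days
    idx = pd.date_range("2023-01-01", periods=7, freq="D")
    sales = pd.Series([10, None, 12, 14, None, 18, 20], index=idx)

    filled_forward = sales.ffill()        # forward fill: reuse the last known value
    filled_backward = sales.bfill()       # backward fill: beware of lookahead
    filled_moving = sales.fillna(sales.rolling(3, min_periods=1).mean())  # moving-average fill
    filled_interp = sales.interpolate()   # interpolation between known values

    weekly = sales.resample("W").sum()    # downsampling: combine daily values into weeks
    hourly = sales.resample("H").ffill()  # upsampling: finer grain, values can only be repeated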
Smoothing Data
Smoothing your data can help you deal with outliers and other anomalies. You might consider
smoothing for the following reasons.
• Data preparation–Removing error values and outliers
• Visualization–Reducing noise in a plot
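For instance, a centered rolling mean is one simple way to smooth a noisy series before plotting; the series below is made up:

    import pandas as pd

    idx = pd.date_range("2023-01-01", periods=30, freq="D")
    noisy = pd.Series(range(30), index=idx) + pd.Series([3, -2, 5, -4, 1, 0] * 5, index=idx)

    # A 7-day centered rolling mean reduces noise and dampens outliers before plotting
    smoothed = noisy.rolling(window=7, center=True).mean()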
Using Amazon Forecast
When you generate forecasts, you can apply the same machine learning development pipeline that you use for other ML problems.
• Import your data–You must import as much data as you have—both historical data and related
data. You should do some basic evaluation and feature engineering before you use the data to
train a model.
• Train a predictor–To train a predictor, you must choose an algorithm. If you are not sure
which algorithm is best for your data, you can let Amazon Forecast choose by selecting AutoML
as your algorithm. You also must select a domain for your data, but if you’re not sure which
domain fits best, you can select a custom domain. Domains have specific types of data that they
require. For more information, see Predefined Dataset Domains and Dataset Types in the
Amazon Forecast documentation.
• Generate forecasts–As soon as you have a trained model, you can use the model to make a
forecast by using an input dataset group. After you generate a forecast, you can query the
forecast, or you can export it to an Amazon Simple Storage Service (Amazon S3) bucket. You
also have the option to encrypt the data in the forecast before you export it.
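The same workflow can also be driven through the AWS SDK for Python. The sketch below only outlines the main calls; the dataset group ARN, names, and item ID are hypothetical, and in practice each step is asynchronous, so you would wait for it to finish before starting the next one:

    import boto3

    forecast = boto3.client("forecast")

    # Train a predictor on an existing dataset group, letting AutoML choose the algorithm
    predictor = forecast.create_predictor(
        PredictorName="demand_predictor",
        ForecastHorizon=14,
        PerformAutoML=True,
        InputDataConfig={"DatasetGroupArn":
                         "arn:aws:forecast:us-east-1:123456789012:dataset-group/demand"},
        FeaturizationConfig={"ForecastFrequency": "D"})

    # Generate a forecast from the trained predictor
    fc = forecast.create_forecast(
        ForecastName="demand_forecast",
        PredictorArn=predictor["PredictorArn"])

    # Query the forecast for a single hypothetical item
    query = boto3.client("forecastquery")
    result = query.query_forecast(ForecastArn=fc["ForecastArn"],
                                  Filters={"item_id": "item_001"})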
Module 5- Introduction to Computer Vision
Introduction To Computer Vision
Computer Vision:
Computer vision enables machines to identify people, places, and things in images with accuracy
at or above human levels, with greater speed and efficiency. Often built with deep learning
models, computer vision automates the extraction, analysis, classification, and understanding of
useful information from a single image or a sequence of images. The image data can take many
forms, such as single images, video sequences, views from multiple cameras, or three-
dimensional data.
Computer Vision Applications:
● Public safety and home security
● Authentication and enhanced computer-human interaction
● Content management and analysis
● Autonomous driving
● Medical imaging
● Manufacturing process control
Analyzing images and videos
Amazon Rekognition is a computer vision service based on deep learning. It is an AWS managed service that enables you to integrate image and video analysis into your applications. Because it is a managed service, Amazon Rekognition hosts the machine learning models, maintains an API, and scales out to meet demand for you.
Amazon Rekognition enables you to perform the following types of analysis:
• Searchable image and video libraries–Amazon Rekognition makes images and stored videos
searchable so that you can discover the objects and scenes that appear in them.
• Face-based user verification–Amazon Rekognition enables your applications to confirm user
identities by comparing their live image with a reference image.
• Sentiment and demographic analysis–Amazon Rekognition interprets emotional expressions, such as happy, sad, or surprised. It can also interpret demographic information, such as gender, from facial images.
• Unsafe content detection–Amazon Rekognition can detect inappropriate content in images and in stored videos.
• Text detection–Amazon Rekognition Text in Image enables you to recognize and extract text
content from images.
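As an illustration, the sketch below uses boto3 to run label detection and text detection on a hypothetical image stored in Amazon S3:

    import boto3

    rekognition = boto3.client("rekognition")
    image = {"S3Object": {"Bucket": "example-ml-bucket", "Name": "photos/street.jpg"}}

    # Detect objects and scenes in the image
    labels = rekognition.detect_labels(Image=image, MaxLabels=10, MinConfidence=80)
    for label in labels["Labels"]:
        print(label["Name"], round(label["Confidence"], 1))

    # Detect and extract text that appears in the same image
    text = rekognition.detect_text(Image=image)
    for detection in text["TextDetections"]:
        print(detection["DetectedText"])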
Preparing custom datasets for computer vision
As with other machine learning processes, you must train Amazon Rekognition to recognize
scenes and objects that are in a domain. Thus, you need a training dataset and a test dataset that contain labeled images. Amazon Rekognition Custom Labels can be helpful for these tasks. You can use Amazon Rekognition Custom Labels to find objects and scenes that are unique to your business needs.
Training a computer vision algorithm from scratch to recognize images requires a large input dataset, which is impractical for most organizations.
You can use an existing model or a managed service like Amazon Rekognition Custom Labels to:
• Simplify data labeling–Amazon Rekognition Custom Labels provides a UI for labeling images,
including defining bounding boxes.
• Provide automated machine learning–Amazon Rekognition Custom Labels includes automated machine learning capabilities that handle the ML process for you. When you provide training images, Amazon Rekognition Custom Labels can automatically load and inspect the data, select the correct machine learning algorithms, train a model, and provide model performance metrics.
• Provide simplified model evaluation, inference, and feedback–You evaluate your custom model's performance on your test set. For every image in the test set, you can see a side-by-side comparison of the model's prediction versus the label that it assigned. You can also review detailed performance metrics. You can start using your model immediately for image analysis, or you can iterate and retrain new versions with more images to improve performance.
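After a Custom Labels model version has been trained and started, analyzing a new image is a single API call. The sketch below assumes a hypothetical project version ARN, bucket, and image name:

    import boto3

    rekognition = boto3.client("rekognition")

    # Analyze a new image with a trained and running Custom Labels model version
    response = rekognition.detect_custom_labels(
        ProjectVersionArn="arn:aws:rekognition:us-east-1:123456789012:project/parts-inspection/version/1/1234567890123",
        Image={"S3Object": {"Bucket": "example-ml-bucket", "Name": "inspection/part_42.jpg"}},
        MinConfidence=70)

    for custom_label in response["CustomLabels"]:
        print(custom_label["Name"], custom_label["Confidence"])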
Module 6-Introducing Natural Language Processing
Overview of NLP
NLP is a broad term for a general set of business or computational problems that you can solve
with machine learning (ML). NLP systems predate ML. Two examples are speech-to-text on your
old cell phone and screen readers. Many NLP systems now use some form of machine learning.
NLP considers the hierarchical structure of language: words are at the lowest layer of the hierarchy, groups of words make phrases, phrases make sentences, and sentences ultimately convey ideas.
Challenges Of NLP
Language is not precise. Words can have different meanings, which are based on the other
words that surround them (context). Often, the same words or phrases can have multiple
meanings.
Some of the main challenges for NLP include:
• Discovering the structure of the text–One of the first tasks of any NLP application is to break
the text into meaningful units, such as words, phrases, and sentences.
• Labeling data–After the system converts the text to data, the next challenge is to apply labels
that represent the various parts of speech. Every language requires a different labeling scheme to
match the language’s grammar.
• Representing context–Because word meaning depends on context, any NLP system needs a
way to represent context. It is a big challenge because of the large number of contexts.
Converting context into a form that computers can understand is difficult.
• Applying grammar–Although grammar defines a structure for language, the application of
grammar is nearly infinite. Dealing with the variation in how humans use language is a major
challenge for NLP systems. Addressing this challenge is where machine learning can have a big
impact.
NLP is also behind applications such as chatbots that mimic human speech.
NLP Workflow:
For NLP, collecting data consists of breaking the text into meaningful subsets and labeling the
sets. Feature engineering is a large part of NLP applications. This process gets more complicated
when you have irregular or unstructured text.
Labeling data in the NLP domain is sometimes also called tagging. In the labeling process, you must assign individual text strings to different parts of speech.
1) Preprocessing Text
2) Feature Engineering and Creating Tokens
3) Text Analysis
4) Derive Meaning
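As a small sketch of the first steps, the following code uses scikit-learn to preprocess two made-up sentences, create tokens, and turn them into a numeric bag-of-words representation that later analysis can use:

    from sklearn.feature_extraction.text import CountVectorizer

    documents = [
        "The delivery was fast and the packaging was great.",
        "Terrible service, the delivery was late.",
    ]

    # Preprocessing and tokenization: lowercase the text, strip punctuation, remove stop words
    vectorizer = CountVectorizer(lowercase=True, stop_words="english")

    # Feature engineering: bag-of-words token counts in numeric form
    features = vectorizer.fit_transform(documents)

    print(vectorizer.get_feature_names_out())  # the tokens that became features
    print(features.toarray())                  # numeric matrix ready for text analysis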
CASE STUDY
❖ Problem Statement
Facial Emotion Recognition Using Computer Vision
Human emotion is often expressed through facial expressions, which can be recognized using computer vision. Because facial expression depends heavily on human emotion, to make computers capable of performing facial emotion recognition we need to understand how faces can be detected and processed by computers. Emotion detection in particular is a facet of facial recognition that has great potential in a wide range of fields.
A machine (computer) can only process things that it has been introduced to, and to do so, it requires a pattern as its main reference and comparison resource. The same principle applies to attempts to make computers capable of detecting and recognizing human faces: the computer needs a model that can represent the human face, and that model is later trained and then tested using positive and negative examples.
❖ Solution
In order to perform facial emotion recognition, we need to define algorithms that process images beyond simple face detection, implemented with tools that are either provided by existing libraries or designed by ourselves.
Emotion detection: using open-source code from OpenCV for facial recognition, the program is able to scan the current image for a face and detect whether or not a given emotion is present. As an application, this program could be used in lie detection, vehicular safety cameras, and even as an aid for autistic children.
The design of the program begins with creating a database and filling it with images
containing happy faces. This database is then compared with another that contains images
without faces, creating a classifier. The program first accesses the webcam and searches for a face. Once a face is found, features are extracted from it and compared with the classifier to see whether the emotion is present. Results show that the program is able to accurately detect happiness. Future work is needed to train the program to detect other emotions as well. Currently, the program is only able to detect whether the subject is happy or not; going forward, the goal is to create a database and classifier for all emotions. Also, as it stands right now, the program simply detects the emotion asked of it (happiness).
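The case study's own emotion classifier and image database are not reproduced here; the sketch below shows only the face-detection step with OpenCV's bundled Haar cascade, on top of which such an emotion classifier could be built:

    import cv2

    # Load OpenCV's bundled frontal-face Haar cascade
    cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    face_detector = cv2.CascadeClassifier(cascade_path)

    # Grab one frame from the webcam and search it for faces
    capture = cv2.VideoCapture(0)
    ok, frame = capture.read()
    capture.release()

    if ok:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = face_detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        for (x, y, w, h) in faces:
            # Each detected face region would then be passed to the emotion classifier
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.imwrite("detected_faces.jpg", frame)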
CONCLUSION
In a nutshell, this internship has been an excellent and rewarding experience. I came to know about various technologies such as Artificial Intelligence, Machine Learning, and cloud computing. I learnt how to analyze and deal with data and how to perform activities like data cleaning and data preprocessing. Topics such as the AWS cloud, its security, and its architecture were covered during the internship programme. We were also introduced to topics such as Deep Learning, Computer Vision, Natural Language Processing (NLP), and the confusion matrix.
Needless to say, the technical aspects of the work I have done are not flawless and could be improved given enough time. As someone with no prior experience with these topics, I believe my time spent in research and discovery was well worth it. The internship was also a good way to find out what my strengths and weaknesses are, which helped me define the skills and knowledge I have to improve in the coming time. While doing the internship, we faced many situations in which we had to make our own decisions and use our skills so that our main goal was attained. We got a chance to assess our interests, explore our choices, and gain confidence. This internship is a great opportunity for those who are passionate enough to learn AI & ML.
Finally, this internship has given me new insights and the motivation to pursue a career in this field.