CLOUD ia2

Google Cloud Platform (GCP) offers a comprehensive suite of cloud computing services, including IaaS, PaaS, and SaaS, leveraging Google's global infrastructure. Key service categories include Compute, Storage and Databases, Networking, Big Data, Data Transfer, Cloud AI, Identity and Security, Management Tools, Developer Tools, and IoT. Each category provides specialized tools and services designed to enhance scalability, performance, and security for various applications and workloads.

Uploaded by

saurabh
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
9 views17 pages

CLOUD ia2

Google Cloud Platform (GCP) offers a comprehensive suite of cloud computing services, including IaaS, PaaS, and SaaS, leveraging Google's global infrastructure. Key service categories include Compute, Storage and Databases, Networking, Big Data, Data Transfer, Cloud AI, Identity and Security, Management Tools, Developer Tools, and IoT. Each category provides specialized tools and services designed to enhance scalability, performance, and security for various applications and workloads.

Uploaded by

saurabh
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 17

UNIT-3

1. WRITE SHORT NOTES ON GCP / GIVE AN OVERVIEW OF GCP. DESCRIBE THE FEATURES OF GCP.
->
Google’s suite of cloud computing services runs on the same infrastructure and network as Google’s own products.
Google’s worldwide collection of data centres hosts IaaS, PaaS, and SaaS use cases.
At a basic level, it hosts and manages your computing infrastructure so you don’t have to, and it does so at
Google scale.

2. LIST THE VARIOUS CATEGORIES OF SERVICES OFFERED BY GCP. WRITE A BRIEF NOTE ON EACH CATEGORY.
->
- Compute:
• High-performance, scalable VMs
• Build apps, scale automatically
- Storage and databases:
• Managed databases
• Object storage
- Networking:
• Manage networking for your resources
• Content delivery network
- Big data:
• Analytics data warehouse
• Managed Hadoop and Spark
- Data transfer:
• Online transfer
• Cloud storage transfer service
- Cloud AI:
• Train custom machine learning models
• Large scale machine learning service
• Powerful video analysis, image analysis, speech recognition, text analysis
- API platforms and ecosystems:
• Provides access to numerous APIs
• Performs API analytics
• Supports API monetization
- Identity and security:
• Manages encryption keys
• User access control
• Full control over security
- Management tools:
• Integrated monitoring, logging, and diagnostics
• Real-time log management and analysis
• Detailed performance insights
• Manage your APIs
- Developer tools:
• Essential tools for cloud platform
• PowerShell on GCP
• Visual Studio as your cloud platform IDE
- Internet of Things:
• Help you connect sensors to cloud
3. WRITE BRIEF NOTES ON COMPUTE CATEGORY SERVICES.
->
• The following are the services that fall under the Compute category:
- GCP Compute Engine:
- Google Compute Engine is the IaaS component of GCP that allows you to create and run VMs
on Google's infrastructure.
- A Compute Engine instance can run Linux and Windows Server images provided by Google.
- The main purpose of Compute Engine is to provide readily available virtual machines that
are highly scalable and high-performance.
- Using Compute Engine, you can set up your own server on Linux or Windows.
- GCP App engine:
- Whenever we want to build a web app or an Android app, we always have to think of the
frontend, APIs, and backend infrastructure.
- But now using App Engine all we have to worry about is the code we are writing for our
app.
- If your team lacks infrastructure expertise, needs load balancing, or wants auto scaling
enabled, these are a few of the scenarios in which you can opt for App Engine.
- Supports almost all popular languages
- No need to think of infrastructure
- Auto scalable
- Load Balancer enabled
- High Security

4. WRITE BRIEF NOTES ON STORAGE AND DATABASES CATEGORY SERVICES.
->
- The services falling under the Storage and Databases category are as follows:
• Cloud Storage:
- On Cloud Storage, you can store any kind of data, such as movies, songs, text files, and live and
archival data. Each piece of stored data is called an object.
• GCP Cloud SQL:
- Cloud SQL provides fully managed MySQL and PostgreSQL. PostgreSQL is in Beta right now.
They can scale, and are secure and highly available.
• GCP Cloud Bigtable:
- Cloud Bigtable is one of the best NoSQL databases available on the market.
- Google uses Cloud Bigtable for Search, Maps, and Gmail.
- The scale at which we can interact seamlessly with this NoSQL database is remarkable.
• GCP Cloud Spanner:
- Cloud Spanner is an enterprise-grade, globally distributed, strongly consistent database,
combining the structure of a relational database with the horizontal scalability of
NoSQL.
• GCP Cloud Datastore:
- Cloud Datastore is a highly scalable NoSQL database providing features such as automatic
sharding and replication and it is highly durable and available.
- Although it is a NoSQL database, Cloud Datastore provides ACID properties, SQL-like queries for
simplicity, and indexes for faster results.
• GCP Persistent Disk:
- Persistent Disk is the storage component that underlies most of the other services.
- One major service extensively using Persistent Disk is Virtual Machines in Google Compute
Engine.
- Persistent Disks come in different types, such as Hard Disk Drive (HDD) and Solid-State
Drive (SSD).
5.WRITE BRIEF NOTES ON NETWORKING CATEGORY SERVICES.
->
• Virtual Private Cloud (VPC):
- Virtual Private Cloud lets you build your own private network on GCP, with IP addresses,
subnets, availability zones, and load balancers.
- VPN
- Security
- Easy routing
• Cloud Load Balancing:
- The Cloud Load Balancing service is used to provide auto scaling and load balancing to your
application at great performance and scale.
- Seamless Autoscaling
- HTTP(S) Load Balancing
- TCP/SSL and UDP support
• Cloud CDN:
- Cloud CDN stands for Cloud Content Delivery Network. This is used when you have a huge
global audience to serve your content.
• Cloud Interconnect
• Cloud DNS
• Network Service Tiers ALPHA
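The round-robin strategy at the heart of load balancing can be sketched in a few lines of Python. This is only an illustration of the idea: the backend names are invented, and Cloud Load Balancing itself is configured through GCP, not through code like this.

```python
from itertools import cycle

class RoundRobinBalancer:
    """Rotates through a fixed pool of backends, one request at a time."""
    def __init__(self, backends):
        self._backends = cycle(backends)

    def route(self, request):
        # Pick the next backend in rotation for this request.
        return next(self._backends)

lb = RoundRobinBalancer(["vm-a", "vm-b", "vm-c"])
targets = [lb.route(f"req-{i}") for i in range(6)]
print(targets)  # ['vm-a', 'vm-b', 'vm-c', 'vm-a', 'vm-b', 'vm-c']
```

Real load balancers add health checks and weighting on top of this basic rotation.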

6. WRITE BRIEF NOTES ON BIG DATA CATEGORY SERVICES.


->
• BigQuery:
- BigQuery is the data warehouse on Google Cloud Platform. It is cost-effective, fully managed,
and works at scale.
• Cloud Dataflow:
- The main purpose of Cloud Dataflow is to transform and enrich data in real time. It
also serves the purpose of reducing complexity. This is yet another fully managed component
with serverless capabilities.
• Cloud Dataproc:
- Cloud Dataproc provides us with the power of Hadoop and Spark. Using Cloud Dataproc, we can
build our own Hadoop and Spark servers, with all the prominent services auto installed. Cloud
Dataproc is well integrated with other GCP components.
• Cloud Datalab:
- Cloud Datalab is a very powerful exploration tool. Using Cloud Datalab, we can explore,
analyze, transform, and visualize data. Using this data, we can build our own machine learning
models.
• Cloud Dataprep BETA
• Cloud Pub/Sub:
- Delivers real-time, event-driven data at any scale with reliability. Using Cloud Pub/Sub you
can do stream analytics on real-time data.
• Genomics
• Google Data Studio BETA
7. WRITE BRIEF NOTES ON DATA TRANSFER CATEGORY SERVICES.
->
• Google Transfer Appliance:
- Google Transfer Appliance can be used to transfer data to Google servers offline.
- If you are migrating your complete data center, the data can exceed the petabyte scale.
- Secure: data is encrypted.
- Uploads huge amounts of data: 100 TB to 480 TB in a single appliance.
• Cloud Storage Transfer Service:
- Cloud Storage Transfer Service comes with two features: transferring data from another cloud
provider's bucket, and transferring data within Google Cloud Platform.
• Google BigQuery Data Transfer Service:
- Google BigQuery Data Transfer Service helps us to import all the data that Google SaaS
applications generate.
- Within a few clicks, you will have the data of AdWords, DoubleClick Campaign Manager, and
many other applications.

8. WRITE BRIEF NOTES ON CLOUD AI CATEGORY SERVICES.


->
• Cloud AutoML alpha:
- Cloud AutoML is in alpha stage right now.
- Cloud AutoML can be used to train high quality custom machine learning models in the simplest
form possible.
- Its first product is AutoML Vision; using this, you can train custom vision models of your
own.
• Cloud TPU beta:
- Cloud TPU can be called an accelerated machine learning service.
- Cloud TPU uses TensorFlow to accelerate machine learning.
• Cloud machine learning engine:
- Cloud Machine Learning Engine is the platform where you can build your own machine learning
models.
- Using TensorFlow, we can create our own models to execute on any kind of data at any scale.
- It is well integrated with Google Cloud Dataflow for pre-processing, and can access Google
Cloud Storage and Google BigQuery.
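As a toy stand-in for the kind of training loop such a service runs at scale: the real service uses TensorFlow, but the core idea of fitting a model by gradient descent can be shown in plain Python. The dataset and learning rate here are invented for illustration.

```python
# Fit y = 2x by gradient descent on mean squared error. Managed services
# such as Cloud Machine Learning Engine run this kind of loop (via
# TensorFlow) at scale; this plain-Python sketch only shows the idea.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w, lr = 0.0, 0.05  # initial weight and learning rate

for _ in range(200):
    # d/dw of mean((w*x - y)^2) over the dataset.
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= lr * grad

print(round(w, 3))  # converges close to 2.0
```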
• Cloud natural language:
- Cloud Natural Language can be used to build insights out of unstructured data such as
messages, texts, tweets, blog posts, and so on.
• Cloud Speech API, Translation API and Vision API:
- The Cloud Speech API converts speech to text using a powerful neural network.
- The translation API can be used to translate text from one language to another.
- Vision API is used to identify an object in an image. All these are REST-based APIs.
• Cloud Video Intelligence:
- Cloud video intelligence is used to perform analysis on videos.
9. WRITE BRIEF NOTES ON MANAGEMENT TOOLS CATEGORY SERVICES.
->
• Monitoring, logging, error Reporting, Trace, Debugger:
- Monitoring, logging, error reporting, trace, and debugger are all separate services in Google
Cloud Platform
- Monitoring is used to identify uptime and the overall health of the application.
- Logging can be used to manage logs in real time and perform further analysis.
- Error Reporting helps us understand, categorize, and identify errors in our
applications.
- To find performance bottlenecks in the application, we can use Trace.
• Cloud deployment manager:
- Cloud Deployment Manager is used to create and manage cloud resources with simple templates.
- We have to write these templates in YAML format. Using Cloud Deployment Manager, we can
deploy infrastructure that would otherwise require repeated effort.
• Cloud console:
- The Cloud Console is a web UI that you can use to control the complete Google Cloud Platform.
- Resource management
- SSH browser: you can use this interface to log in easily into a VM instance
- DevOps integration: using a mobile app, you can very easily perform your DevOps tasks
- Costing: Cloud console is free to use for GCP customers.
• Cloud shell:
- Cloud Shell helps us manage infrastructure from the command line. You can access it through the
Cloud Console web UI.
• Cloud console mobile app
10. ANALYZE THE GIVEN USE-CASE AND ANSWER THE QUESTIONS:
SMART PARKING SOLUTION BY MARK N PARK:
Abstract - In this use case, our aim is to find real-time parking occupancy of four-wheel vehicles and
show it on mobile and web apps.
I. Introduce the possible solutions to this use case.
II. Identify the Challenges to be addressed by the analytics team?
III. Identify the services to be used on GCP to address the challenges?
IV. Give the architecture diagram for the solution to this use case.
->
• CHALLENGES AND SOLUTIONS:
- Collection of sensor data in real time :
- When we are dealing with the data that is coming from HTTP or MQTT, the best solution is
using Google Cloud IoT Core
- Updating the right dataset/database:
- Once the data is received, our next task is to update the right dataset. Google IoT Core
publishes the data to Cloud Pub/Sub.
- Storing periodic data:
- Using Cloud Pub/Sub, we can store the data via Cloud Dataflow in Cloud BigQuery or Cloud
Bigtable.
- Transmitting the data to the end user :
- To show data to the customers, we can use the existing APIs or custom build our own
APIs. These APIs can then be connected to Firebase as well.
- Reports and dashboard output required:
- For reporting purposes, we can also use Google Data Studio.
- Scaling infrastructure:
- Most of the services that we have mentioned earlier are fully managed, so no user
involvement is required.
• SERVICES :
- Cloud IoT Core
- Cloud Pub/Sub
- Cloud Dataflow
- Cloud Bigtable
- Cloud BigQuery
- Google Data Studio
- Apigee API Platform
- Cloud Endpoints
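At the heart of this architecture is the publish/subscribe pattern that Cloud Pub/Sub provides as a managed, durable service. A minimal in-process sketch of the pattern (the topic name and message fields are invented for illustration; real Pub/Sub persists messages and delivers them over the network):

```python
from collections import defaultdict

class MiniPubSub:
    """In-process stand-in for the publish/subscribe pattern that
    Cloud Pub/Sub provides as a managed, durable service."""
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        # Register a consumer (e.g. a storage writer) for a topic.
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Deliver the message to every subscriber of the topic.
        for callback in self._subscribers[topic]:
            callback(message)

bus = MiniPubSub()
stored = []  # stands in for a BigQuery / Bigtable sink
bus.subscribe("parking-occupancy", stored.append)
bus.publish("parking-occupancy", {"slot": 17, "occupied": True})
print(stored)  # [{'slot': 17, 'occupied': True}]
```

Decoupling publishers from subscribers is what lets the sensors, the storage layer, and the dashboards scale independently.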
11. ANALYZE THE GIVEN USE-CASE AND ANSWER THE QUESTIONS:
DSS for web mining recommendation using TensorFlow:
Abstract - Decision support systems (DSSs) are computer-based information systems designed to help
managers select one of the many alternative solutions to a problem. Web mining is the application of
data mining techniques to discover patterns from the World Wide Web. The aim of this use case is to find
patterns to recommend to users using TensorFlow.
1. Identify the Challenges to be addressed by the analytics team?
2. Identify the services to be used on GCP to address the challenges?
3. Give the architecture diagram for the solution to this use case.
->
• CHALLENGES AND SOLUTION:
- Internet bandwidth:
- On low speed internet, more time is taken to process data on the web. Therefore, it is
suggested to have high speed internet.
- Local systems or mobile hardware configuration:
- There should be a common platform for representing semi-structured or unstructured
data.
- Collection of data in real time:
- To transfer the data through HTTPS, collect the data across a distributed network with
various structures of networks.
- Updating the right database:
- Once a dataset contains structured elements, the shapes and types of the dataset take on the
same structure.
- Storing periodic data:
- We will be storing data streamed from the Cloud Pub/Sub service to any one of these services:
Cloud Dataflow, Cloud BigQuery, or Cloud Bigtable.
- Extracting the data to the end user:
- Existing APIs are used to transmit the data; these can easily connect to services such as Firebase.
- Reports generation as per requirements of the end user:
- Through Google Data Studio or Estimators, we can generate the results in the form of
automatically generated graphs.
- Scaling infrastructure:
- Most services are scalable and fully managed, so we don't need to manage these
services manually.
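The actual recommendation models in this use case would be built with TensorFlow. As a toy illustration of the underlying idea, here is a pure-Python sketch that recommends the most similar user by cosine similarity over hypothetical page-visit counts (the users and counts are invented):

```python
from math import sqrt

def cosine(u, v):
    # Cosine similarity between two visit-count vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Rows: users; columns: page categories (hypothetical web-mining counts).
visits = {
    "alice": [5, 0, 2],
    "bob":   [4, 1, 2],
    "carol": [0, 6, 1],
}

def most_similar(user):
    # Recommend content liked by the nearest user in visit space.
    scores = [(cosine(visits[user], v), name)
              for name, v in visits.items() if name != user]
    return max(scores)[1]

print(most_similar("alice"))  # bob
```

A production system would learn embeddings with TensorFlow instead of using raw counts, but the nearest-neighbour idea is the same.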
UNIT-4
1. WHAT IS MICROSOFT AZURE? DISCUSS THE ROLE OF AZURE FOR
ANALYTICS.
->
• Microsoft Azure is an enterprise-grade set of cloud computing services created by Microsoft using
their own managed data centers.
• Azure is the only cloud with a true end-to-end analytics solution.
• With Azure, analysts can derive insights in seconds from all enterprise data.
• Azure provides a mature and robust data flow without limitations on concurrency.
• Azure supports Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and SaaS.
• Azure provides flexibility.
• Familiar Microsoft tools and infrastructure, as well as MySQL, Linux, PHP, Python, Java, and any other
open source technologies, can all run on the Azure cloud.
• You have the flexibility to choose other types of databases or storage, whether through a service installed on a
Linux server, a containerized solution, or a managed platform.
• This is very important because, in the real world, different scenarios require different solutions,
tools, and products.
• SECURITY :
- Azure has the most advanced security and privacy features in the analytics space.
- Azure services support data protection through Virtual Networks (VNets) so that, even though
they are in the cloud, data points cannot be accessed by the public internet.
• CLOUD SCALE :
- The true power of the cloud is its elasticity.
- This allows you to not only scale resources up but also scale them down when necessary.
- In data science, this is very useful because data science entails variable workloads.
- Services such as Azure Machine Learning allow you to scale according to demand.

2. DISCUSS THE FUNDAMENTAL CONCEPTS TO BE DEALT WITH IN TERMS OF POWER OF DATA.
->
• The scale of data being talked about here is massive; hence, the popular term big data is used to
describe harnessing the power of data at this scale.
• There are a number of ways in which data analytics can help your business thrive.
- Big data analytics:
- The term "big data" is often used to describe massive volumes of data that traditional tools
cannot handle. It can be characterized by the five Vs:
• Volume
• Variety
• Velocity
• Value
• Veracity
- DataOps:
- DataOps removes the co-ordination barrier between data and operations teams in order to
achieve speed and accuracy in data analytics.
- DataOps is about a culture of collaboration between different roles and functions.
- Data scientists have access to real-time data to explore, prepare, and serve results.
- IoT
- Machine Learning (ML)
- Artificial Intelligence (AI)
3. WRITE SHORT NOTES ON DATAOPS.
->
• DataOps removes the co-ordination barrier between data and operations teams in order to achieve
speed and accuracy in data analytics.
• DataOps is about a culture of collaboration between different roles and functions.
• Data scientists have access to real-time data to explore, prepare, and serve results.
• Automated processes and flows prove invaluable to this collaborative effort between analysts and
developers, as they provide easy access to data through visualization tools.
• Relevant data should be served to end users via web or mobile applications; this is usually possible
with an Application Programming Interface (API).
• For CEOs, DataOps means faster decision-making, as it allows them to monitor their business at a
high level without waiting for team leaders to report.
4. DISCUSS HOW MICROSOFT PROVIDES SECURITY WHILE UTILIZING
AZURE FOR ANALYTICS.
->
• Microsoft views security as the top priority.
• Azure has the most advanced security and privacy features in the analytics space.
• Azure services support data protection through Virtual Networks (VNets) so that, even though they
are in the cloud, data points cannot be accessed by the public internet.
• Only the users in the same VNet can communicate with each other.
• For web applications, you get a Web Application Firewall (WAF) provided by Azure Application
Gateway, which ensures that only valid requests can get into your network.
• With role-based access control (authorization), you can ensure that only those with the right roles,
such as administrators, have access to specific components and the capabilities of different
resources.
• Authentication, on the other hand, ensures that if you don't have the right credentials (such as
passwords), you will not be able to access a resource.
• Authorization and authentication are built into various services and components of Microsoft
Azure with the help of Azure Active Directory.
• Azure also provides a service called Azure Key Vault.
• Key Vault allows you to safely store and manage secrets and passwords, create encryption keys,
and manage certificates so that applications do not have direct access to private keys.
• By following this pattern with Key Vault, you do not have to hardcode your secrets and passwords
in your source code and script repository.
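The distinction between authentication and authorization can be sketched as two separate checks. In Azure both are backed by Azure Active Directory; the user, password, and roles below are invented for illustration, and a real system would never store plaintext passwords.

```python
# Two separate checks: authentication (who are you?) and authorization
# (what may you do?). The user, password, and roles are made up.
USERS = {"ana": {"password": "s3cret", "roles": {"admin"}}}

def authenticate(user, password):
    # Authentication: are the presented credentials valid?
    return user in USERS and USERS[user]["password"] == password

def authorize(user, required_role):
    # Authorization: does this identity hold the required role?
    return required_role in USERS.get(user, {}).get("roles", set())

print(authenticate("ana", "s3cret"))  # True
print(authenticate("ana", "wrong"))   # False
print(authorize("ana", "admin"))      # True
print(authorize("ana", "owner"))      # False
```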

5. ELUCIDATE ON HOW MICROSOFT AZURE SUPPORTS THE SCALING FEATURE OF CLOUD FOR ANALYTICS.
->
- With Azure, you can scale your cloud resources effortlessly up or down, in or out, in minutes.
- The true power of the cloud is its elasticity.
- This allows you to not only scale resources up but also scale them down when necessary.
- In data science, this is very useful because data science entails variable workloads.
- When data scientists and engineers are analyzing a dataset, for instance, there is a need for
more computation.
- Services such as Azure Machine Learning allow you to scale according to demand.
- Azure basically offers a pay-as-you-go, or pay-for-what-you-use, service.
- Azure also provides a Service Level Agreement (SLA) for its services as a commitment to
ensure uptime and connectivity for its production customers.
- There are different scaling approaches and patterns that Microsoft Azure provides:
• Vertical scaling:
- This is when more resources are added to the same instance (server or service).
• Horizontal scaling:
- This is when you deploy your application to multiple instances.
• Geographical scaling:
- This is when you scale your applications to different geographical locations for two major
reasons: resilience and reduced latency.
• Sharding:
- This is one of the techniques for distributing huge volumes of related, structured data onto
multiple independent databases.
• Development, Testing, Acceptance, and Production (DTAP):
- This is the approach of having multiple instances living in different logical environments.
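Of the patterns above, sharding is the easiest to show in code: a stable hash routes each key to one of several independent databases. The shard names below are invented, and real sharding strategies also handle rebalancing when shards are added or removed.

```python
import hashlib

# Route each record to one of several independent databases using a
# stable hash of its key. Shard names are invented for illustration.
SHARDS = ["db-0", "db-1", "db-2"]

def shard_for(key: str) -> str:
    # md5 is stable across processes, unlike Python's built-in hash().
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return SHARDS[digest % len(SHARDS)]

# The same key always lands on the same shard.
print(shard_for("customer-42") == shard_for("customer-42"))  # True
```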
6. DEFINE BIG DATA. DISCUSS THE FIVE Vs THAT CHARACTERIZE BIG DATA. DISCUSS ANY ONE USE CASE OF BIG DATA.
->
• Big data analytics is the process of finding patterns, trends, and correlations in unstructured data
to derive meaningful insights that shape business decisions.
• This unstructured data is usually large in file size.
• The term "big data" is often used to describe massive volumes of data that traditional tools cannot
handle. It can be characterized by the five Vs:
• Volume:
• This indicates the volume of data that needs to be analyzed for big data analytics. We
are now dealing with larger datasets than ever before.
• This has been made possible because of the availability of electronic products such as
mobile devices and IoT sensors that have been widely adopted all over the globe for
commercial purposes.
• Velocity:
• This refers to the rate at which data is being generated.
• Devices and platforms, such as those just mentioned, constantly produce data on a
large scale and at rapid speed.
• This makes collecting, processing, analyzing, and serving data at rapid speeds necessary.
• Variety:
• This refers to the structure of data being produced.
• Data sources are inconsistent, having a mix of structured, unstructured, and some semi-
structured data.
• Value:
• This refers to the value of the data being extracted.
• Accessible data may not always be valuable. With the right tools, you can derive value
from the data in a cost-effective and scalable way.
• Veracity:
• This is the quality or trustworthiness of data.
• A raw dataset will usually contain a lot of noise and bias, and will need
cleaning.
• APPLICATIONS:
• Social media analysis
• Fraud prevention
• Price optimization

7. DISCUSS THE VARIOUS REASONS FOR ORGANIZATIONS TO ADOPT CLOUD FOR THEIR ANALYTICS.
->
• RAPID GROWTH AND SCALE:
- With the rapid growth of mobile applications, IoT sensors, and social media, there is just so
much data to capture.
- This means enterprises and businesses need to scale their infrastructure to support these
massive demands.
- Company database sizes continuously grow from gigabytes of data to terabytes, or even
petabytes, of data.
- Scaling does not only apply to the consumers of the applications; it is also important for data
scientists, data engineers, and data analysts in order to analyze a company's data.
- Scaling an infrastructure is vital, as you cannot expect your data engineers to handle massive
chunks of data and run scripts to test your data models on a single machine.
• REDUCING COSTS:
- Due to scaling demands, enterprises and businesses need a mechanism to expand their
data infrastructure in a cost-effective and financially viable way. The cloud helps reduce costs
such as the following:
- Networking and other physical infrastructure costs, such as hardware cooling and data center
real estate
- Professional services costs associated with setting up and maintaining these servers
- Licensing costs (if any)
- The productivity lost from people and teams who cannot ship their products faster
• DRIVING INNOVATION:
- Companies need to innovate constantly in a very competitive market to protect their market
share; hence, companies need a mechanism to explore new things based on what they already
know.
- With advanced analytics, these companies can come up with decisions that are relevant today
and are not just restricted to analyzing historical data.

8. WRITE BRIEF NOTES ON DATA WAREHOUSE. WHY DO YOU NEED A MODERN DATA WAREHOUSE?
->
• A data warehouse is a centralized repository that aggregates different (often disparate) data
sources.
• The main difference between a data warehouse and a database is that data warehouses are
meant for Online Analytical Processing (OLAP), while databases are intended for
Online Transaction Processing (OLTP).
• OLAP means that data warehouses are primarily used to generate analytics, business intelligence
and even machine learning models.
• OLTP means that databases are primarily used for transactions.
• A data warehouse is essential if you want to analyze your big data as it also contains historical
data.
• Here are some of the advantages of having a modern data warehouse:
- Supports any data source
- Highly scalable and available
- Provides insights from analytical dashboards in real-time
- Supports a machine learning environment
• For a data warehouse to be modern, we need to understand how two major concepts, compute and
storage, are managed in a traditional data warehouse:
- Compute:
- This refers to the ability to process the data and make sense out of it. It can be in the form
of a database query to make the results accessible to another interface, such as web
applications.
- Storage:
- This refers to the ability to keep data in order for it to be accessible at any time in the
future.
9. WHAT IS DATA INTEGRATION IN A DATA WAREHOUSE?
->
Data integration is the process of combining data from the different (often disparate) sources into the
warehouse. The data coming from these different sources is of different data types:
1. structured
2. unstructured
3. semi-structured
(NOTE: explain all these three types.)

10. WHAT IS A MODERN DATA PIPELINE? DISCUSS VARIOUS STAGES IN A DATA PIPELINE WITH A SUITABLE DIAGRAM.
->

• Data ingestion:
• means transferring data from the source to your storage, data lake, or data warehouse.
• This would involve something such as Azure Synapse Analytics using data integration to
transfer data from various sources such as on-premises databases and SaaS products to a
data lake.
• Data storage :
• Once data has been ingested from various data sources, all the data is stored in a data lake.
• The data residing within the lake will still be in a raw format and includes both structured
and unstructured data formats.
• Data sharing :
• Azure Data Share allows you to securely manage and share your big data to other parties
and organizations.
• Data preparation:
• Once data is ingested, the next step is data preparation.
• This is a phase where the data from different data sources is pre-processed for data analytics
purposes.
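The stages above can be sketched end to end in a few lines. The field names and the noisy record are invented for illustration; a real pipeline would use Azure Synapse and a data lake rather than Python lists.

```python
import json

# Raw records as they might arrive from different sources.
raw_sources = [
    '{"sensor": "A", "temp": "21.5"}',
    '{"sensor": "B", "temp": "bad"}',  # noisy record
]

def ingest(sources):
    # Data ingestion: pull raw records into the lake as-is.
    return [json.loads(s) for s in sources]

data_lake = ingest(raw_sources)  # data storage: still raw, mixed quality

def prepare(records):
    # Data preparation: keep only records whose temp parses as a number.
    clean = []
    for r in records:
        try:
            clean.append({"sensor": r["sensor"], "temp": float(r["temp"])})
        except ValueError:
            pass
    return clean

print(prepare(data_lake))  # [{'sensor': 'A', 'temp': 21.5}]
```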
11. DISCUSS THE ARCHITECTURE OF AZURE SYNAPSE ANALYTICS SERVICE
WITH A SUITABLE DIAGRAM.
->
• Azure Synapse is an enterprise analytics service that accelerates time to insight across data
warehouses and big data systems.
• Azure Synapse Analytics is a fully managed, integrated data analytics service that blends data
warehousing, data integration, and big data processing into a single service, accelerating time to
insight.

• Synapse SQL is a distributed query system for T-SQL that enables data warehousing and data
virtualization scenarios and extends T-SQL to address streaming and machine learning scenarios.
• Apache Spark for Azure Synapse deeply and seamlessly integrates Apache Spark, the most
popular open source big data engine, used for data preparation, data engineering, ETL, and
machine learning.
• Azure Synapse removes the traditional technology barriers between using SQL and Spark together.
You can seamlessly mix and match based on your needs and expertise.
• Azure Synapse contains the same Data Integration engine and experiences as Azure Data Factory,
allowing you to create rich at-scale ETL pipelines without leaving Azure Synapse Analytics.

12. WRITE SHORT NOTES ON: A) SYNAPSE SQL B) APACHE SPARK


->
• Synapse SQL :
- Synapse SQL is a distributed query system for T-SQL that enables data warehousing and data
virtualization scenarios and extends T-SQL to address streaming and machine learning
scenarios.
- Synapse SQL offers both serverless and dedicated resource models.
- For predictable performance and cost, create dedicated SQL pools to reserve processing power
for data stored in SQL tables.
- For unplanned or bursty workloads, use the always-available, serverless SQL endpoint.
- Use built-in streaming capabilities to land data from cloud data sources into SQL tables.
- Integrate AI with SQL by using machine learning models to score data with the T-SQL PREDICT
function.
• Apache Spark:
- Apache Spark for Azure Synapse deeply and seamlessly integrates Apache Spark, the most
popular open source big data engine, used for data preparation, data engineering, ETL, and
machine learning.
- Build ML models with SparkML algorithms and Azure ML integration for Apache Spark 3.1, with
built-in support for Linux Foundation Delta Lake.
- Simplified resource model that frees you from having to worry about managing clusters.
- Fast Spark start-up and aggressive autoscaling.
- Built-in support for .NET for Spark, allowing you to reuse your C# expertise and existing .NET
code within a Spark application.
13. WRITE BRIEF NOTES ON THE FOLLOWING WITH RESPECT TO
MICROSOFT AZURE SYNAPSE ANALYTICS:
A. SYNAPSE WORKSPACE
B. LINKED SERVICES
C. SYNAPSE SQL
D. APACHE SPARK FOR SYNAPSE
E. SYNAPSEML
F. PIPELINES
G. DATA EXPLORER
->
• Synapse workspace:
- A Synapse workspace is a securable collaboration boundary for doing cloud-based enterprise
analytics in Azure.
- A workspace allows you to perform analytics with SQL and Apache spark. Resources available
for SQL and Spark analytics are organized into SQL and Spark pools.
• Linked services:
- A workspace can contain any number of linked services: essentially connection strings that
define the connection information needed for the workspace to connect to external resources.
• Synapse SQL:
- Synapse SQL is the ability to do T-SQL-based analytics in a Synapse workspace.
- Synapse SQL has two consumption models:
- Dedicated: uses dedicated SQL pools.
- Serverless: uses serverless SQL pools.
- Every workspace has one of these pools.
• Apache Spark for Synapse:
- There are two ways within Synapse to use Spark:
- Spark notebooks for doing data science and engineering, using Scala, PySpark, C#, and
Spark SQL.
- Spark job definitions for running batch Spark jobs using JAR files.
• SynapseML:
- SynapseML is an open-source library that simplifies the creation of massively scalable machine
learning (ML) pipelines.
- It is an ecosystem of tools used to expand the Apache Spark framework in several new
directions.
• Pipelines:
- Pipelines provide data integration, allowing you to move data between services and orchestrate
activities.
- A pipeline is a logical grouping of activities that together perform a task.
14. WHAT IS A LINKED SERVICE POWER BI ON AZURE? DISCUSS THE POWER BI
ARCHITECTURE THAT PERFORMS VARIOUS FUNCTIONS.
->
• Power BI is a suite of tools that enables users to visualize data and share insights across teams
and organizations, or embed dashboard analytics in their websites or applications.
• It supports different data sources and helps analysts and end users create live dashboards and
reports about business data on demand.
• With Power BI, you can leverage the power of the cloud to harness complex data and present it
through rich graphs or charts.
• Power BI is composed of different components that can perform different functions:
- Power BI Desktop:
- This is a Windows desktop-based application that is often referred to as an authoring tool,
where you primarily design and publish reports to the service.
- Power BI Service:
- This is the managed platform to deploy your reports on the web for your organization to
access.
- Power BI Mobile Apps:
- These are native mobile applications that can access reports from a workspace that is
hosted in Power BI.
- Power BI Gateways:
- A Gateway is a mechanism to sync external data into Power BI.
- Power BI Report Server:
- Power BI Report Server allows you to host Power BI reports within your own data center.
- These reports are still shareable across different members, as long as they have the right
network access.
- Power BI Embedded:
- Embedded allows you to white-label Power BI in your own custom applications.
