Preparing For The Google Cloud Professional Data Engineer Exam

1) AVRO is used for serialization and deserialization of data so it can be transmitted and stored while maintaining an object structure. 2) To optimize a Cloud Bigtable solution for performance when promoting from development to production, change the instance type to Production and set the number of nodes to at least 3, verifying storage is SSD. 3) For a Cloud SQL database serving infrequently changing lookup tables across regions, use read replicas to ensure good performance.


----------------------QUIZ 4 and 5

What is AVRO used for?
- Serialization and de-serialization of data so that it can be transmitted and
stored while maintaining an object structure.
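
Because Avro files carry their schema with the data, they can, for example, be loaded into BigQuery without declaring a schema. A minimal sketch, assuming a hypothetical dataset and bucket:

# Load an Avro file into BigQuery; the schema is read from the file itself.
bq load --source_format=AVRO mydataset.events gs://my-bucket/events.avro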

Promote a Cloud Bigtable solution with a lot of data from development to production
and optimize for performance.
- Change your Cloud Bigtable instance type from Development to Production, and set
the number of nodes to at least 3. Verify that the storage type is SSD.
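
A sketch with hypothetical instance and cluster IDs; the node count can be changed in place, while the SSD or HDD storage type is fixed when the instance is created:

# Scale the production cluster to three nodes.
gcloud bigtable clusters update my-cluster --instance=my-instance --num-nodes=3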

An application has the following data requirements.
- Cloud SQL

A client is using a Cloud SQL database to serve infrequently changing lookup tables
that host data used by applications. The applications will not modify the tables.
As they expand into other geographic regions they want to ensure good performance.
What do you recommend?
- Read replicas
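
A sketch with hypothetical instance names; a read replica in the new region lets applications there read from a nearby copy:

# Create a cross-region read replica of the lookup database.
gcloud sql instances create lookup-replica-eu \
    --master-instance-name=lookup-primary --region=europe-west1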

Customer wants to maintain investment in an existing Apache Spark code data
pipeline.
- Dataproc
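
Existing Spark code can be submitted to a Dataproc cluster largely unchanged. A sketch with hypothetical cluster and file names:

# Submit an existing PySpark job to a Dataproc cluster.
gcloud dataproc jobs submit pyspark gs://my-bucket/etl_job.py \
    --cluster=my-cluster --region=us-central1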

BigQuery data is stored in external CSV files in Cloud Storage; as the data has
increased, the query performance has dropped.
- Import the data into BigQuery for better performance.
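
A sketch with hypothetical names; loading the CSVs into native BigQuery storage avoids re-reading the external files on every query:

# Import the external CSV files into a native BigQuery table.
bq load --source_format=CSV --skip_leading_rows=1 --autodetect \
    mydataset.sales gs://my-bucket/sales/*.csv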

Host a deep neural network machine learning model on Google Cloud. Run and monitor
jobs that could occasionally fail.
- Use Vertex AI to host your model. Monitor the status of the Jobs object for
'failed' job states.
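
A hedged sketch using the gcloud CLI (region and job ID are placeholders); a custom job's state field reports values such as JOB_STATE_FAILED:

# List training jobs and check the state of one of them.
gcloud ai custom-jobs list --region=us-central1
gcloud ai custom-jobs describe 1234567890 --region=us-central1 --format="value(state)"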

A client wants to store files from one location and retrieve them from another
location. Security requirements are that no one should be able to access the
contents of the file while it is hosted in the cloud. What is the best option?
- Client-side encryption
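
With client-side encryption the file is encrypted before upload, so the plaintext never reaches the cloud. A sketch with hypothetical file and bucket names:

# Encrypt locally, then upload only the ciphertext.
gpg --symmetric --cipher-algo AES256 report.txt
gcloud storage cp report.txt.gpg gs://my-bucket/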

Three Google Cloud services commonly used together in data engineering solutions.
(Described in this course).
- Dataproc, Cloud SQL, BigQuery

A company has a new IoT pipeline. Which services will make this design work?
Select the services that should be used to replace the icons with the number "1"
and number "2" in the diagram.
- IoT Core, Pub/Sub

Source data is streamed in bursts and must be transformed before use.
- Use Pub/Sub to buffer the data, and then use Dataflow for ETL.
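
One way to wire this up is a Google-provided Dataflow template (names are placeholders, and the template path and parameters should be checked against current documentation):

# Buffer events in Pub/Sub, then stream them through Dataflow into BigQuery.
gcloud pubsub topics create device-events
gcloud dataflow jobs run events-etl --region=us-central1 \
    --gcs-location=gs://dataflow-templates/latest/PubSub_to_BigQuery \
    --parameters=inputTopic=projects/my-project/topics/device-events,outputTableSpec=my-project:mydataset.events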

Calculate a running average on streaming data that can arrive late and out of
order.
- Use Pub/Sub and Dataflow with Sliding Time Windows.

A company has migrated their Hadoop cluster to the cloud and is now using Dataproc
with the same settings and methods as in the data center. What would you advise
them to do to make better use of the cloud environment?
- Store persistent data off-cluster. Start a cluster for one kind of work, then shut
it down when it is not processing data.
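
A sketch of the ephemeral-cluster pattern with hypothetical names; data lives in Cloud Storage, and the cluster exists only while work runs:

# Create a cluster, run the job against data in Cloud Storage, then delete it.
gcloud dataproc clusters create batch-cluster --region=us-central1 --max-idle=30m
gcloud dataproc jobs submit pyspark gs://my-bucket/job.py \
    --cluster=batch-cluster --region=us-central1
gcloud dataproc clusters delete batch-cluster --region=us-central1 --quiet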

Storage of JSON files with occasionally changing schema, for ANSI SQL queries.
- Store in BigQuery. Select "Automatically detect" in the Schema section.
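
The equivalent with the bq CLI (dataset, table, and bucket names are placeholders):

# Load newline-delimited JSON and let BigQuery infer the schema.
bq load --autodetect --source_format=NEWLINE_DELIMITED_JSON \
    mydataset.events gs://my-bucket/events/*.json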

Cost-effective backup to Google Cloud of multi-TB databases from another cloud,
including monthly DR drills.
- Use Storage Transfer Service. Transfer to a Cloud Storage Nearline bucket.
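
A hedged sketch of the setup (bucket and source names are placeholders, and the transfer-job flags should be checked against current gcloud documentation):

# Create a Nearline bucket, then a transfer job from the other cloud's storage.
gcloud storage buckets create gs://my-backup-bucket --default-storage-class=NEARLINE
gcloud transfer jobs create s3://source-db-backups gs://my-backup-bucket \
    --source-creds-file=aws-creds.json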

Low-cost one-way, one-time migration of two 100-TB file servers to Google Cloud;
data will be frequently accessed and only from Germany.
- Use Transfer Appliance. Transfer to a Cloud Storage Standard bucket.

Cost-effective way to run non-critical Apache Spark jobs on Dataproc?
- Set up a cluster in standard mode with high-memory machine types. Add 10
additional preemptible worker nodes.
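
A sketch with placeholder names showing the preemptible (secondary) workers:

# Standard-mode cluster with high-memory machines and 10 preemptible workers.
gcloud dataproc clusters create spark-batch --region=us-central1 \
    --master-machine-type=n1-highmem-4 --worker-machine-type=n1-highmem-4 \
    --num-workers=2 --num-secondary-workers=10 --secondary-worker-type=preemptible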

250,000 devices produce a JSON device status every 10 seconds. How do you capture
event data for outlier time series analysis?
- Capture data in Cloud Bigtable. Use the Cloud Bigtable cbt tool to display device
outlier data.
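
A sketch of the cbt usage, assuming a hypothetical instance and a table keyed by device ID and timestamp:

# Scan recent rows for one device to inspect outlier readings.
cbt -project=my-project -instance=iot-instance read device_status \
    prefix=device-4711# count=100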

A client has been developing a pipeline based on PCollections using local
programming techniques and is ready to scale up to production. What should they do?
- They should use the Dataflow Cloud Runner.
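
For a Beam pipeline written in Python, switching from local runs to Dataflow is mostly a matter of pipeline options (the script and bucket names below are placeholders):

# Run the same pipeline code on the Dataflow managed service.
python my_pipeline.py --runner=DataflowRunner --project=my-project \
    --region=us-central1 --temp_location=gs://my-bucket/tmp/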

A company wants to connect cloud applications to an Oracle database in its data
center. Requirements are a maximum of 9 Gbps of data and a Service Level Agreement
(SLA) of 99%.
- Partner Interconnect

A Data Analyst is concerned that a BigQuery query could be too expensive.
- Use the SELECT clause to limit the amount of data in the query. Partition data by
date so the query can be more focused.
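
A sketch of both ideas against a hypothetical date-partitioned table; --dry_run reports how many bytes the query would scan without running it:

# Select only the needed columns and restrict to one partition; estimate cost first.
bq query --use_legacy_sql=false --dry_run \
    'SELECT order_id, amount FROM mydataset.orders WHERE order_date = "2024-06-01"'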

Event data in CSV format to be queried for individual values over time windows.
Which storage and schema to minimize query costs?
- Use Cloud Bigtable. Design tall and narrow tables, and use a new row for each
single event version.
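
A sketch of the tall-and-narrow pattern with the cbt tool (table, column family, and key format are hypothetical); each event version gets its own row keyed by entity and timestamp:

# One row per event; the row key combines the sensor ID and the event time.
cbt -instance=iot-instance createtable events
cbt -instance=iot-instance createfamily events data
cbt -instance=iot-instance set events sensor-42#2024-06-01T120000Z data:value=17.3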

You want to minimize costs to run Google Data Studio reports on BigQuery queries by
using prefetch caching.
- Set up the report to use the Owner's credentials to access the underlying data in
BigQuery, and verify that the 'Enable cache' checkbox is selected for the report.

------------------------------------LABS
LAB 1 QUERY 1

-- Average flight minutes per airline from FRA to KUL.
SELECT airline, AVG(minutes) AS avg
FROM `<project>.JasmineJasper.triplog`
WHERE origin = "FRA" AND destination = "KUL"
GROUP BY airline

LAB 1 QUERY 2

-- Average flight minutes per airline from LHR to KUL, fastest first.
SELECT airline, AVG(minutes) AS avg
FROM `<project>.JasmineJasper.triplog`
WHERE origin = "LHR" AND destination = "KUL"
GROUP BY airline
ORDER BY avg ASC

LAB 2

# Create a Cloud Storage bucket for the lab.
gcloud storage buckets create gs://qwiklabs-gcp-02-03c51757f7ba

# Copy the benchmark script into the bucket.
gsutil cp -r gs://cloud-training/preppde/benchmark.py gs://qwiklabs-gcp-02-03c51757f7ba

# Resulting path of the copied script:
gs://qwiklabs-gcp-02-03c51757f7ba/benchmark.py
