Preparing For The Google Cloud Professional Data Engineer Exam
Promote a Cloud Bigtable solution with a large amount of data from development to
production, and optimize it for performance.
- Change your Cloud Bigtable instance type from Development to Production, and set
the number of nodes to at least 3. Verify that the storage type is SSD.
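
A minimal Python sketch of the node-scaling step, assuming the google-cloud-bigtable
client and hypothetical instance/cluster IDs (the Development-to-Production type
change itself is done in the console or with gcloud):

    from google.cloud import bigtable

    client = bigtable.Client(project="my-project", admin=True)
    instance = client.instance("my-instance")       # hypothetical instance ID
    cluster = instance.cluster("my-instance-c1")    # hypothetical cluster ID
    cluster.reload()                                # fetch the current config
    cluster.serve_nodes = 3                         # production minimum of 3 nodes
    cluster.update()
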
A client is using a Cloud SQL database to serve infrequently changing lookup tables
that host data used by applications. The applications will not modify the tables.
As they expand into other geographic regions they want to ensure good performance.
What do you recommend?
- Read replicas
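
Cross-region read replicas put read-only copies of the lookup tables close to each
region's applications. A sketch using the Cloud SQL Admin API through the discovery
client; the project, instance, and tier names are hypothetical:

    from googleapiclient import discovery

    service = discovery.build("sqladmin", "v1beta4")
    replica_body = {
        "name": "lookup-replica-eu",               # hypothetical replica name
        "masterInstanceName": "lookup-primary",    # hypothetical primary instance
        "region": "europe-west1",                  # serve European users locally
        "settings": {"tier": "db-n1-standard-1"},
    }
    service.instances().insert(project="my-project", body=replica_body).execute()
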
BigQuery queries data stored in external CSV files in Cloud Storage; as the data has
grown, query performance has dropped.
- Import the data into BigQuery for better performance.
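
Loading the CSV files into native BigQuery storage removes the external-table
overhead. A sketch with the google-cloud-bigquery client; the bucket and table names
are hypothetical:

    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,
    )
    client.load_table_from_uri(
        "gs://my-bucket/exports/*.csv",            # hypothetical source files
        "my_dataset.lookup_table",                 # hypothetical destination table
        job_config=job_config,
    ).result()                                     # block until the load finishes
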
Host a deep neural network machine learning model on Google Cloud. Run and monitor
jobs that could occasionally fail.
- Use Vertex AI to host your model. Monitor the status of the Jobs object for
'failed' job states.
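
One way to poll for failed jobs with the Vertex AI Python SDK; the project, region,
and filter string are assumptions for illustration:

    from google.cloud import aiplatform

    aiplatform.init(project="my-project", location="us-central1")

    # List custom training jobs that ended in a failed state.
    for job in aiplatform.CustomJob.list(filter='state="JOB_STATE_FAILED"'):
        print(job.resource_name, job.state)
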
A client wants to store files from one location and retrieve them from another
location. Security requirements are that no one should be able to access the
contents of the file while it is hosted in the cloud. What is the best option?
- Client-side encryption
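
With client-side encryption, data is encrypted before upload and the key never
leaves your environment, so nothing in the cloud can read the plaintext. A sketch
using the cryptography package with google-cloud-storage; the file and bucket names
are hypothetical:

    from cryptography.fernet import Fernet
    from google.cloud import storage

    key = Fernet.generate_key()                    # store this key off-cloud
    with open("report.pdf", "rb") as f:
        ciphertext = Fernet(key).encrypt(f.read())

    bucket = storage.Client().bucket("my-transfer-bucket")
    bucket.blob("report.pdf.enc").upload_from_string(ciphertext)
    # Retrieval side: download the blob, then Fernet(key).decrypt(...)
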
Three Google Cloud services commonly used together in data engineering solutions.
(Described in this course).
- Dataproc, Cloud SQL, BigQuery
A company has a new IoT pipeline. Which services will make this design work?
Select the services that should replace the icons numbered "1" and "2" in the
diagram.
- IoT Core, Pub/Sub
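
Devices (or a gateway such as IoT Core) publish status messages into Pub/Sub for
downstream processing. A minimal publisher sketch with hypothetical project and
topic names:

    import json
    from google.cloud import pubsub_v1

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path("my-project", "device-events")

    payload = json.dumps({"device_id": "sensor-42", "temp_c": 21.5}).encode("utf-8")
    future = publisher.publish(topic_path, payload)
    print(future.result())                         # message ID on success
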
Calculate a running average on streaming data that can arrive late and out of
order.
- Use Pub/Sub and Dataflow with Sliding Time Windows.
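
Sliding windows overlap, so each element contributes to several averages, and
Dataflow's watermark and trigger machinery handles late, out-of-order events. A
Beam sketch assuming numeric message payloads and a hypothetical topic, with a
5-minute window sliding every minute:

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions
    from apache_beam.transforms import window

    options = PipelineOptions(streaming=True)
    with beam.Pipeline(options=options) as p:
        (
            p
            | beam.io.ReadFromPubSub(topic="projects/my-project/topics/device-events")
            | beam.Map(lambda b: float(b.decode("utf-8")))   # one number per message
            | beam.WindowInto(window.SlidingWindows(size=300, period=60))
            | beam.CombineGlobally(beam.combiners.MeanCombineFn()).without_defaults()
            | beam.Map(print)
        )
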
A company has migrated their Hadoop cluster to the cloud and is now using Dataproc
with the same settings and methods as in the data center. What would you advise
them to do to make better use of the cloud environment?
- Store persistent data off-cluster. Start a cluster for one kind of work then shut
it down when it is not processing data.
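
With persistent data in Cloud Storage rather than cluster-local HDFS, clusters
become disposable: create one per workload and delete it (or let an idle TTL do so)
when the work finishes. A sketch with the google-cloud-dataproc client; the names
and TTL value are hypothetical:

    from google.cloud import dataproc_v1

    region = "us-central1"
    clusters = dataproc_v1.ClusterControllerClient(
        client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
    )

    cluster = {
        "project_id": "my-project",
        "cluster_name": "ephemeral-etl",
        "config": {
            # Keep inputs and outputs in gs:// buckets, off-cluster.
            "lifecycle_config": {"idle_delete_ttl": {"seconds": 1800}},
        },
    }
    clusters.create_cluster(
        request={"project_id": "my-project", "region": region, "cluster": cluster}
    ).result()
    # ...submit jobs, then delete_cluster() or rely on the 30-minute idle TTL.
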
Storage of JSON files with occasionally changing schema, for ANSI SQL queries.
- Store in BigQuery. Select "Automatically detect" in the Schema section.
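
A load-job sketch showing schema auto-detection plus field addition for the
occasionally changing schema; the dataset and bucket names are hypothetical (note
BigQuery expects newline-delimited JSON):

    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
        autodetect=True,                           # "Automatically detect" schema
        schema_update_options=[bigquery.SchemaUpdateOption.ALLOW_FIELD_ADDITION],
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    client.load_table_from_uri(
        "gs://my-bucket/events/*.json", "my_dataset.events", job_config=job_config
    ).result()
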
Low-cost, one-way, one-time migration of two 100-TB file servers to Google Cloud;
data will be frequently accessed and only from Germany.
- Use Transfer Appliance. Transfer to a Cloud Storage Standard bucket.
250,000 devices produce a JSON device status every 10 seconds. How do you capture
event data for outlier time series analysis?
- Capture data in Cloud Bigtable. Use the Cloud Bigtable cbt tool to display device
outlier data.
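
The cbt-read equivalent in Python: scan the row range for one device, assuming row
keys of the hypothetical form device_id#timestamp:

    from google.cloud import bigtable

    client = bigtable.Client(project="my-project")
    table = client.instance("iot-instance").table("device-status")

    # Scan one device's time series; "#" < "~" bounds the key range.
    for row in table.read_rows(start_key=b"device-42#", end_key=b"device-42#~"):
        cell = row.cells["status"][b"payload"][0]  # family/qualifier are assumptions
        print(row.row_key, cell.value)
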
Event data in CSV format to be queried for individual values over time windows.
Which storage and schema to minimize query costs?
- Use Cloud Bigtable. Design tall and narrow tables, and use a new row for each
single event version.
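
Tall-and-narrow means one event per row with very few columns, so a time-window
query becomes a cheap contiguous row scan. A write sketch with a hypothetical
reversed-timestamp key so the newest events sort first:

    import time
    from google.cloud import bigtable

    client = bigtable.Client(project="my-project")
    table = client.instance("iot-instance").table("events")

    # One row per event; the reversed timestamp keeps recent events at the top.
    reverse_ts = 2**63 - int(time.time() * 1000)
    row = table.direct_row(f"device-42#{reverse_ts}".encode())
    row.set_cell("ev", "value", b"23.1")           # single narrow column
    row.commit()
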
You want to minimize the cost of running Google Data Studio reports on BigQuery by
using prefetch caching.
- Set up the report to use the Owner's credentials to access the underlying data in
BigQuery, and verify that the 'Enable cache' checkbox is selected for the report.
------------------------------------LABS
LAB 1 QUERY 1
LAB 1 QUERY 2
LAB 2
gs://qwiklabs-gcp-02-03c51757f7ba/benchmark.py