GCP Notes For Certification
Cloud Dataflow is a fully-managed service for transforming and enriching data in stream (real time) and
batch (historical) modes with equal reliability and expressiveness -- no more complex workarounds or
compromises needed. And with its serverless approach to resource provisioning and management, you
have access to virtually limitless capacity to solve your biggest data processing challenges, while paying
only for what you use.
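The core idea behind Dataflow's unified model is that the same transform logic applies to both a bounded batch of historical events and an unbounded stream. A Dataflow pipeline would express this with the Apache Beam SDK (e.g. windowing transforms); the sketch below illustrates just the fixed-window aggregation concept in plain Python, with all names being illustrative rather than the actual Beam API.

```python
from collections import defaultdict

def fixed_window_sums(events, window_secs=60):
    """Group (timestamp, value) events into fixed windows and sum each window.

    Plain-Python sketch of the windowed aggregation a Dataflow/Beam
    pipeline would express with windowing transforms; illustrative only,
    not the Beam SDK.
    """
    windows = defaultdict(int)
    for ts, value in events:
        window_start = ts - (ts % window_secs)   # align to window boundary
        windows[window_start] += value
    return dict(windows)

# The same function serves batch (a historical file of events) and
# streaming (events as they arrive) -- the unified model Dataflow offers.
events = [(3, 10), (45, 5), (61, 7), (130, 2)]
print(fixed_window_sums(events))  # {0: 15, 60: 7, 120: 2}
```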
Google Compute Engine delivers virtual machines running in Google's innovative data
centers and worldwide fiber network. Compute Engine's tooling and workflow
support enable scaling from single instances to global, load-balanced cloud computing.
Compute Engine's VMs boot quickly, come with persistent disk storage, and deliver
consistent performance. Our virtual servers are available in many configurations
including predefined sizes or the option to create Custom Machine Types optimized for
your specific needs. Flexible pricing and automatic sustained use discounts make
Compute Engine the leader in price/performance.
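To see how sustained use discounts work, consider the incremental scheme documented for N1 machine types: each successive quarter of the month's usage is billed at a lower fraction of the base rate, so a VM running the full month nets roughly a 30% discount. The tier values below are from memory of the N1 scheme and should be checked against the current pricing pages.

```python
def sustained_use_multiplier():
    """Rough sketch of N1 sustained-use discount accrual: each quarter of
    a month of continuous usage is billed at a decreasing fraction of the
    base rate, so a full month averages 70% of base (a 30% discount).
    Tier percentages are assumptions to verify against current pricing.
    """
    tiers = [100, 80, 60, 40]   # per-quarter billing percentage of base rate
    return sum(tiers) / (100 * len(tiers))

print(sustained_use_multiplier())  # 0.7 -> effective 30% discount for a full month
```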
Preemptible VMs
App Engine (GAE)
App Engine Standard Environment:
Compare when an App Engine app needs the standard environment versus the flexible environment
Cloud Pub/Sub:
Cloud Bigtable is well suited for a variety of large-scale, high-throughput workloads such
as advertising technology or IoT data infrastructure.
• Real-time app data: Cloud Bigtable can be accessed from apps running in App Engine
flexible environment, GKE, and Compute Engine for real-time live-serving workloads.
• Stream processing: As data is ingested by Cloud Pub/Sub, Cloud Dataflow can be
used to transform and load the data into Cloud Bigtable.
• IoT time series data: Data captured by sensors and streamed into GCP can be stored
using time-series schemas in Cloud Bigtable.
• Adtech workloads: Cloud Bigtable can be used to store and track ad impressions, as
well as a source for follow-on processing and analysis using Cloud Dataproc and Cloud
Dataflow.
• Data ingestion: Cloud Dataflow or Cloud Dataproc can be used to transform and load
data from Cloud Storage into Cloud Bigtable.
• Analytical workloads: Cloud Dataflow can be used to perform complex aggregations
directly from data stored in Cloud Bigtable, and Cloud Dataproc can be used to execute
Hadoop or Spark processing and machine-learning tasks.
• Apache HBase replacement: Cloud Bigtable can also be used as a drop-in
replacement for systems built using Apache HBase, an open source database based on
the original Cloud Bigtable paper authored by Google. Cloud Bigtable is compliant with
the HBase 1.x APIs so it can be integrated into many existing big-data systems. Apache
Cassandra uses a data model based on the one found in the Cloud Bigtable paper,
meaning Cloud Bigtable can also support several workloads that leverage a wide-
column-oriented schema and structure.
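A key part of the IoT time-series use case above is row-key design: Bigtable keeps rows sorted lexicographically by row key, so a common documented pattern is prefixing keys with the device id and using a reversed timestamp so the newest reading sorts first under each prefix. A sketch of that key scheme in plain Python (the actual write would go through the google-cloud-bigtable client):

```python
def timeseries_row_key(sensor_id, ts, max_ts=10**10):
    """Build a Bigtable-style row key for IoT time-series data.

    Bigtable sorts rows lexicographically by key, so prefixing with the
    sensor id groups each device's readings, and a reversed timestamp
    makes the newest reading sort first within that prefix.
    (Illustrative sketch only; not a client library call.)
    """
    reversed_ts = max_ts - ts
    return f"{sensor_id}#{reversed_ts:010d}"

keys = sorted(timeseries_row_key("sensor42", ts) for ts in (100, 500, 300))
# A prefix scan over "sensor42#" now yields readings newest-first:
# ts=500, then ts=300, then ts=100.
```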
Cloud Spanner is a fully managed relational database service for mission-critical OLTP
apps. Cloud Spanner is horizontally scalable, and built for strong consistency, high
availability, and global scale. This combination of qualities makes it unique as a service.
Because Cloud Spanner is a fully managed service, you can focus on designing your
app and not your infrastructure.
Cloud Spanner is a good fit if you want the ease of use and familiarity of a
relational database along with the scalability typically associated with a NoSQL
database. Like relational databases, Cloud Spanner supports schemas, ACID
transactions, and SQL queries (ANSI 2011). Like many NoSQL databases, Cloud
Spanner scales horizontally in regions, but it can also scale across regions for
workloads that have more stringent availability requirements. Cloud Spanner also
performs automatic sharding while serving data with single-digit millisecond latencies.
Security features in Cloud Spanner include data-layer encryption, audit logging, and
Cloud IAM integration.
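The "automatic sharding" mentioned above means Spanner stores each table as contiguous primary-key ranges (splits) and moves their boundaries as data size and load change. A hypothetical sketch of the routing idea, with made-up boundary values just for illustration:

```python
import bisect

# Cloud Spanner stores a table as contiguous primary-key ranges
# ("splits") and rebalances their boundaries automatically based on
# size and load. Hypothetical sketch of routing a key to its split:
split_boundaries = ["g", "p"]   # 3 splits: (-inf,"g"), ["g","p"), ["p",+inf)

def split_for_key(key):
    # bisect_right finds which contiguous key range the primary key falls into
    return bisect.bisect_right(split_boundaries, key)

print(split_for_key("alice"), split_for_key("karl"), split_for_key("zoe"))  # 0 1 2
```

This is also why key design matters in Spanner: monotonically increasing keys funnel all writes to the last split, while well-distributed keys spread load across splits.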
To get started with Cloud Spanner, refer to the Cloud Spanner Documentation.
Cloud Firestore is a database that stores JSON data. JSON data can be synchronized
in real time to connected clients across different platforms, including iOS, Android,
JavaScript, IoT devices, and desktop apps. If a client does not have network
connectivity, the Cloud Firestore API lets your app persist data to a local disk. After
connectivity is reestablished, the client device synchronizes itself with the current server
state.
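The offline-persistence behavior described above amounts to queuing local writes while disconnected and replaying them against the server state on reconnect. A minimal sketch of that idea, assuming a simple last-write-wins merge per document (the real Firestore SDKs handle this transparently, with richer conflict semantics):

```python
class OfflineCache:
    """Sketch of offline persistence: writes made while disconnected are
    queued locally, then replayed against server state on reconnect.
    Illustrative only; not the Firestore SDK, and the merge policy here
    (last write wins per document) is an assumption for the sketch."""

    def __init__(self):
        self.pending = []          # (doc_path, data) writes made offline

    def write(self, path, data):
        self.pending.append((path, data))

    def sync(self, server_state):
        # Replay queued writes in order; later writes win per document.
        for path, data in self.pending:
            server_state[path] = data
        self.pending.clear()
        return server_state

cache = OfflineCache()
cache.write("users/ada", {"status": "offline edit"})
server = cache.sync({"users/ada": {"status": "old"}})
print(server)  # {'users/ada': {'status': 'offline edit'}}
```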
Cloud Firestore is a NoSQL database with an API that you can use to build a real-time
experience serving millions of users without compromising responsiveness. To facilitate
this level of scale and responsiveness, it's important to structure your data appropriately.
To get started with Cloud Firestore, refer to the documentation. Cloud Firestore has
SDKs for iOS, Android, web, C++, and Unity clients.
• Chat and social: Store and retrieve images, audio, video, and other user-generated
content.
• Mobile games: Keep track of game progress and statistics across devices and device
platforms.
Ecosystem databases
In addition to the database services provided by GCP, you can deploy your own
database software on high-performance Compute Engine virtual machines with highly
scalable persistent storage. Traditional RDBMS such as EnterpriseDB and Microsoft
SQL Server are supported on GCP. NoSQL database systems such
as MongoDB and Cassandra are also supported in high-performance configurations.
Using GCP Marketplace you can deploy many types of databases onto GCP using
pre-built images, storage, and network settings. Deployment resources, such as
Compute Engine instances, persistent disks, and network configurations, can be
managed directly and easily customized for different workloads or use cases.
For ingested data that will be ultimately analyzed in BigQuery, you can store data
directly in BigQuery, bypassing other storage mediums. BigQuery supports loading data
through the web interface, command line tools, and REST API calls.
When loading data in bulk, the data should be in the form of CSV, JSON, or Avro files.
You can then use the BigQuery web interface, command line tools, or REST API calls to
load data from these file formats into BigQuery tables.
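For the JSON case, BigQuery expects newline-delimited JSON (one object per line, no outer array) rather than a single JSON document. A small helper sketch for preparing rows in that shape:

```python
import json

def to_ndjson(rows):
    """Serialize rows as newline-delimited JSON, the JSON variant
    BigQuery expects for bulk loads: one object per line, no outer
    array. (Helper name is illustrative, not part of any SDK.)"""
    return "\n".join(json.dumps(row, sort_keys=True) for row in rows)

rows = [{"name": "alice", "score": 10}, {"name": "bob", "score": 7}]
ndjson = to_ndjson(rows)
# Each line is now an independent JSON object, ready to hand to a
# load job via the command line tools or the REST API.
```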
For streaming data, you can use Cloud Pub/Sub and Cloud Dataflow in combination to
process incoming streams and store the resulting data in BigQuery. In some workloads,
however, it might be appropriate to stream data directly into BigQuery without additional
processing. You can also build custom apps, running on GCP or on-premises
infrastructure, that read from data sources with defined schemas and rows. The custom
app can then stream that data into BigQuery tables using the GCP SDKs or direct
REST API calls.
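When streaming directly via the REST API, rows go to the tabledata.insertAll method, and each row can carry an insertId that BigQuery uses for best-effort deduplication of retried sends. A sketch of building that request body (payload shape only; a real app would POST it with authentication, e.g. through a GCP client library):

```python
import uuid

def streaming_payload(rows):
    """Build a tabledata.insertAll-style request body for streaming rows
    into BigQuery. Each row carries an insertId, which BigQuery uses for
    best-effort deduplication when a send is retried. (Sketch of the
    REST payload shape only; helper name is illustrative.)"""
    return {
        "rows": [
            {"insertId": str(uuid.uuid4()), "json": row}
            for row in rows
        ]
    }

payload = streaming_payload([{"sensor": "s1", "temp": 21.5}])
```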