Dataproc pricing
Dataproc pricing is based on the size of Dataproc clusters and the duration of time that they
run. The size of a cluster is based on the aggregate number of virtual CPUs (vCPUs)
(https://cloud.google.com/compute/docs/machine-types?hl=da) across the entire cluster, including
the master and worker nodes. The duration of a cluster is the length of time between cluster
creation and cluster deletion.
Although the pricing formula is expressed as an hourly rate, Dataproc is billed by the second,
and all Dataproc clusters are billed in one-second clock-time increments, subject to a 1-minute
minimum billing. Usage is stated in fractional hours (for example, 30 minutes is expressed as
0.5 hours) in order to apply hourly pricing to second-by-second use.
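To make the conversion concrete, here is a minimal sketch (not Google's billing code; the handling of the 1-minute minimum is an assumption based on the description above) of turning a cluster's runtime in seconds into the fractional hours used by the hourly formula:

```python
# Illustrative sketch only: convert per-second usage into the fractional
# hours that the hourly pricing formula expects, applying the 1-minute
# minimum described above.

def billable_hours(runtime_seconds: int) -> float:
    """Return usage in fractional hours, with a 60-second minimum."""
    billed_seconds = max(runtime_seconds, 60)  # 1-minute minimum billing
    return billed_seconds / 3600.0             # seconds expressed as hours

print(billable_hours(30 * 60))  # 30 minutes -> 0.5 hours
print(billable_hours(10))       # 10 seconds -> 60-second minimum -> ~0.0167 hours
```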
Scaling (https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/scaling-clusters?hl=da) and
autoscaling (https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/autoscaling?hl=da) clusters
changes the number of VMs in the cluster. When VMs are added to the cluster, those machines are
charged for the period of time that they are active. When machines are deleted, they are no longer
billed.
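As a rough illustration (the node timings below are hypothetical), each VM contributes vCPU-hours only for the time it is active, so a node added by autoscaling and later deleted adds only its own runtime to the bill:

```python
# Hypothetical node timings; illustrates that scaled nodes are billed
# only for the period they are active.

nodes = [
    {"vcpus": 4, "active_hours": 2.0},  # master, runs the full 2 hours
    {"vcpus": 4, "active_hours": 2.0},  # original worker
    {"vcpus": 4, "active_hours": 0.5},  # worker added by autoscaling, then deleted
]

vcpu_hours = sum(n["vcpus"] * n["active_hours"] for n in nodes)
print(vcpu_hours)  # 18.0 vCPU-hours billed across the cluster
```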
Pricing example
As an example, consider a cluster (with master and worker nodes) that has the following
configuration:
Item           Machine Type    Virtual CPUs   Attached persistent disk   Number in cluster
Master Node    n1-standard-4   4              500 GB                     1
Worker Nodes   n1-standard-4   4              500 GB                     5
This Dataproc cluster has 24 virtual CPUs, 4 for the master and 20 spread across the workers.
For Dataproc billing purposes, the pricing for this cluster would be based on those 24 virtual
CPUs and the length of time the cluster ran (assuming no nodes are scaled down or
preempted). If the cluster runs for 2 hours, the Dataproc pricing would use the following
formula:
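The charge is the cluster's vCPU count multiplied by its runtime and the hourly Dataproc rate. A minimal sketch of that arithmetic, assuming a rate of $0.010 per vCPU per hour (the rate is an assumption here and should be confirmed against the current price list):

```python
# Sketch of the example above; the $0.010 per-vCPU-hour rate is an
# assumption and should be verified against current Dataproc pricing.

VCPUS = 24                  # 4 master vCPUs + 20 worker vCPUs
HOURS = 2                   # cluster runtime
RATE_PER_VCPU_HOUR = 0.010  # assumed Dataproc rate, USD

dataproc_charge = VCPUS * HOURS * RATE_PER_VCPU_HOUR
print(f"Dataproc charge: ${dataproc_charge:.2f}")  # $0.48
```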
In this example, the cluster would also incur charges for Compute Engine and Standard
Persistent Disk Provisioned Space in addition to the Dataproc charge (see Use of other Google
Cloud resources (https://cloud.google.com/dataproc/use_of_other_google_cloud_resources?hl=da)). The
billing calculator (https://cloud.google.com/products/calculator?hl=da) can be used to determine
separate Google Cloud resource costs.
As a managed and integrated solution, Dataproc is built on top of other Google Cloud
technologies. Dataproc clusters consume the following resources, each billed at its own
pricing:
Compute Engine (https://cloud.google.com/compute?hl=da)
Persistent Disk (https://cloud.google.com/persistent-disk?hl=da)
Dataproc clusters can optionally utilize the following resources, each billed at its own pricing,
including but not limited to:
Cloud Storage (https://cloud.google.com/storage?hl=da)
Global Networking (https://cloud.google.com/products/networking?hl=da)
BigQuery (https://cloud.google.com/bigquery?hl=da)
Cloud Bigtable (https://cloud.google.com/bigtable?hl=da)
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License
(https://creativecommons.org/licenses/by/4.0/), and code samples are licensed under the Apache 2.0 License
(https://www.apache.org/licenses/LICENSE-2.0). For details, see the Google Developers Site Policies
(https://developers.google.com/site-policies?hl=da). Java is a registered trademark of Oracle and/or its
affiliates.