gcloud dataproc clusters create <CLUSTER>

Create a cluster

Arguments

NameDescription
CLUSTERID of the cluster or fully qualified identifier for the cluster

Options

NameDescription
--account <ACCOUNT>Google Cloud Platform user account to use for invocation. Overrides the default *core/account* property value for this command invocation
--asyncReturn immediately, without waiting for the operation in progress to complete
--autoscaling-policy <AUTOSCALING_POLICY>ID of the autoscaling policy or fully qualified identifier for the autoscaling policy
--billing-project <BILLING_PROJECT>The Google Cloud Platform project that will be charged quota for operations performed in gcloud. If you need to operate on one project, but need quota against a different project, you can use this flag to specify the billing project. If both `billing/quota_project` and `--billing-project` are specified, `--billing-project` takes precedence. Run `$ gcloud config set --help` to see more information about `billing/quota_project`
--bucket <BUCKET>The Google Cloud Storage bucket to use by default to stage job dependencies, miscellaneous config files, and job driver console output when using this cluster
--configuration <CONFIGURATION>The configuration to use for this command invocation. For more information on how to use configurations, run: `gcloud topic configurations`. You can also use the CLOUDSDK_ACTIVE_CONFIG_NAME environment variable to set the equivalent of this flag for a terminal session
--enable-component-gatewayEnable access to the web UIs of selected components on the cluster through the component gateway
--enable-kerberosEnable Kerberos on the cluster
--expiration-time <EXPIRATION_TIME>The time when the cluster will be auto-deleted, such as "2017-08-29T18:52:51.142Z". See `$ gcloud topic datetimes` for information on time formats
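As a rough illustration of the timestamp shape shown in the `--expiration-time` example, the sketch below formats "now plus a lifetime" as a UTC timestamp with millisecond precision. The helper name `expiration_timestamp` is hypothetical, and the sketch assumes a timezone-aware input; other formats accepted by `gcloud topic datetimes` are not covered.

```python
from datetime import datetime, timedelta, timezone


def expiration_timestamp(now: datetime, lifetime: timedelta) -> str:
    """Return now + lifetime as a UTC timestamp like 2017-08-29T18:52:51.142Z.

    `now` is assumed to be timezone-aware (hypothetical helper, not part of gcloud).
    """
    expires = (now + lifetime).astimezone(timezone.utc)
    # isoformat() renders UTC as "+00:00"; swap it for the "Z" suffix used in the flag example.
    return expires.isoformat(timespec="milliseconds").replace("+00:00", "Z")


start = datetime(2017, 8, 29, 16, 52, 51, 142000, tzinfo=timezone.utc)
print(expiration_timestamp(start, timedelta(hours=2)))
```

The resulting string could then be passed as the value of `--expiration-time`.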
--flags-file <YAML_FILE>A YAML or JSON file that specifies a *--flag*:*value* dictionary. Useful for specifying complex flag values with special characters that work with any command interpreter. Additionally, each *--flags-file* arg is replaced by its constituent flags. See $ gcloud topic flags-file for more information
--flatten <KEY>Flatten _name_[] output resource slices in _KEY_ into separate records for each item in each slice. Multiple keys and slices may be specified. This also flattens keys for *--format* and *--filter*. For example, *--flatten=abc.def* flattens *abc.def[].ghi* references to *abc.def.ghi*. A resource record containing *abc.def[]* with N elements will expand to N records in the flattened output. This flag interacts with other flags that are applied in this order: *--flatten*, *--sort-by*, *--filter*, *--limit*
--format <FORMAT>Set the format for printing command output resources. The default is a command-specific human-friendly output format. The supported formats are: `config`, `csv`, `default`, `diff`, `disable`, `flattened`, `get`, `json`, `list`, `multi`, `none`, `object`, `table`, `text`, `value`, `yaml`. For more details run $ gcloud topic formats
--gce-pd-kms-key <GCE_PD_KMS_KEY>ID of the key or fully qualified identifier for the key
--gce-pd-kms-key-keyring <GCE_PD_KMS_KEY_KEYRING>The KMS keyring of the key
--gce-pd-kms-key-location <GCE_PD_KMS_KEY_LOCATION>The Cloud location for the key
--gce-pd-kms-key-project <GCE_PD_KMS_KEY_PROJECT>The Cloud project for the key
--helpDisplay detailed help
--image <IMAGE>The custom image used to create the cluster. It can be the image name, the image URI, or the image family URI, which selects the latest image from the family
--image-version <VERSION>The image version to use for the cluster. Defaults to the latest version
--impersonate-service-account <SERVICE_ACCOUNT_EMAIL>For this gcloud invocation, all API requests will be made as the given service account instead of the currently selected account. This is done without needing to create, download, and activate a key for the account. In order to perform operations as the service account, your currently selected account must have an IAM role that includes the iam.serviceAccounts.getAccessToken permission for the service account. The roles/iam.serviceAccountTokenCreator role has this permission or you may create a custom role. Overrides the default *auth/impersonate_service_account* property value for this command invocation
--initialization-action-timeout <TIMEOUT>The maximum duration of each initialization action. See $ gcloud topic datetimes for information on duration formats
--initialization-actions <CLOUD_STORAGE_URI>A list of Google Cloud Storage URIs of executables to run on each node in the cluster
--kerberos-config-file <KERBEROS_CONFIG_FILE>Path to a YAML (or JSON) file containing the configuration for Kerberos on the cluster. If you pass `-` as the value of the flag the file content will be read from stdin.

The YAML file is formatted as follows:

```
# Optional. Flag to indicate whether to Kerberize the cluster.
# The default value is true.
enable_kerberos: true

# Required. The Google Cloud Storage URI of a KMS encrypted file
# containing the root principal password.
root_principal_password_uri: gs://bucket/password.encrypted

# Required. The URI of the KMS key used to encrypt various
# sensitive files.
kms_key_uri: projects/myproject/locations/global/keyRings/mykeyring/cryptoKeys/my-key

# Configuration of SSL encryption. If specified, all sub-fields
# are required. Otherwise, Dataproc will provide a self-signed
# certificate and generate the passwords.
ssl:
  # Optional. The Google Cloud Storage URI of the keystore file.
  keystore_uri: gs://bucket/keystore.jks

  # Optional. The Google Cloud Storage URI of a KMS encrypted
  # file containing the password to the keystore.
  keystore_password_uri: gs://bucket/keystore_password.encrypted

  # Optional. The Google Cloud Storage URI of a KMS encrypted
  # file containing the password to the user provided key.
  key_password_uri: gs://bucket/key_password.encrypted

  # Optional. The Google Cloud Storage URI of the truststore
  # file.
  truststore_uri: gs://bucket/truststore.jks

  # Optional. The Google Cloud Storage URI of a KMS encrypted
  # file containing the password to the user provided
  # truststore.
  truststore_password_uri: gs://bucket/truststore_password.encrypted

# Configuration of cross realm trust.
cross_realm_trust:
  # Optional. The remote realm the Dataproc on-cluster KDC will
  # trust, should the user enable cross realm trust.
  realm: REMOTE.REALM

  # Optional. The KDC (IP or hostname) for the remote trusted
  # realm in a cross realm trust relationship.
  kdc: kdc.remote.realm

  # Optional. The admin server (IP or hostname) for the remote
  # trusted realm in a cross realm trust relationship.
  admin_server: admin-server.remote.realm

  # Optional. The Google Cloud Storage URI of a KMS encrypted
  # file containing the shared password between the on-cluster
  # Kerberos realm and the remote trusted realm, in a cross
  # realm trust relationship.
  shared_password_uri: gs://bucket/cross-realm.password.encrypted

# Optional. The Google Cloud Storage URI of a KMS encrypted file
# containing the master key of the KDC database.
kdc_db_key_uri: gs://bucket/kdc_db_key.encrypted

# Optional. The lifetime of the ticket granting ticket, in
# hours. If not specified, or user specifies 0, then default
# value 10 will be used.
tgt_lifetime_hours: 1

# Optional. The name of the Kerberos realm. If not specified,
# the uppercased domain name of the cluster will be used.
realm: REALM.NAME
```
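Because the Kerberos configuration file may be YAML or JSON, one possible way to produce it is to build the structure in code and serialize it as JSON. The sketch below covers only the two required fields plus `tgt_lifetime_hours`; the helper name `kerberos_config` is hypothetical, and the `gs://` prefix check is an assumption based on the fields being described as Cloud Storage URIs.

```python
import json


def kerberos_config(root_password_uri: str, kms_key_uri: str, tgt_lifetime_hours: int = 0) -> dict:
    """Build a minimal Kerberos config dict (hypothetical helper, not part of gcloud)."""
    # Assumption: Cloud Storage URIs in this file start with "gs://".
    if not root_password_uri.startswith("gs://"):
        raise ValueError(f"expected a gs:// URI, got {root_password_uri!r}")
    config = {
        "enable_kerberos": True,
        "root_principal_password_uri": root_password_uri,
        "kms_key_uri": kms_key_uri,
    }
    # 0 means "not specified", in which case the documented default of 10 hours applies.
    if tgt_lifetime_hours:
        config["tgt_lifetime_hours"] = tgt_lifetime_hours
    return config


config = kerberos_config(
    "gs://bucket/password.encrypted",
    "projects/myproject/locations/global/keyRings/mykeyring/cryptoKeys/my-key",
    tgt_lifetime_hours=1,
)
print(json.dumps(config, indent=2))
```

The printed JSON could be saved to a file and passed with `--kerberos-config-file`, or piped to the command with `-` as the flag value.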
--kerberos-kms-key <KERBEROS_KMS_KEY>ID of the key or fully qualified identifier for the key
--kerberos-kms-key-keyring <KERBEROS_KMS_KEY_KEYRING>The KMS keyring of the key
--kerberos-kms-key-location <KERBEROS_KMS_KEY_LOCATION>The Cloud location for the key
--kerberos-kms-key-project <KERBEROS_KMS_KEY_PROJECT>The Cloud project for the key
--kerberos-root-principal-password-uri <KERBEROS_ROOT_PRINCIPAL_PASSWORD_URI>Google Cloud Storage URI of a KMS encrypted file containing the root principal password. Must be a Cloud Storage URL beginning with 'gs://'
--labels <KEY=VALUE>List of label KEY=VALUE pairs to add.

Keys must start with a lowercase character and contain only hyphens (`-`), underscores (`_`), lowercase characters, and numbers. Values must contain only hyphens (`-`), underscores (`_`), lowercase characters, and numbers
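The label rules above can be checked before invoking the command. This is a minimal sketch: it assumes "lowercase characters" means ASCII `a-z` and that multiple pairs are joined with commas, neither of which is stated in the flag description. The helper name `format_labels` is hypothetical.

```python
import re

# Assumption: "lowercase characters" is read as ASCII a-z only.
KEY_RE = re.compile(r"[a-z][a-z0-9_-]*")
VALUE_RE = re.compile(r"[a-z0-9_-]*")


def format_labels(labels: dict) -> str:
    """Validate labels and render them as a KEY=VALUE list (hypothetical helper)."""
    for key, value in labels.items():
        if not KEY_RE.fullmatch(key):
            raise ValueError(f"invalid label key: {key!r}")
        if not VALUE_RE.fullmatch(value):
            raise ValueError(f"invalid label value: {value!r}")
    # Assumption: pairs are comma-separated when passed to --labels.
    return ",".join(f"{key}={value}" for key, value in labels.items())


print(format_labels({"env": "dev", "team": "data-eng"}))
```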
--log-httpLog all HTTP server requests and responses to stderr. Overrides the default *core/log_http* property value for this command invocation
--master-accelerator <type=TYPE,[count=COUNT]>Attaches accelerators (e.g. GPUs) to the master instance(s).

*type*: The specific type (e.g. nvidia-tesla-k80 for NVIDIA Tesla K80) of accelerator to attach to the instances. Use 'gcloud compute accelerator-types list' to learn about all available accelerator types.

*count*: The number of accelerators of this type to attach to each instance. The default value is 1
--master-boot-disk-size <MASTER_BOOT_DISK_SIZE>The size of the boot disk. The value must be a whole number followed by a size unit of "KB" for kilobyte, "MB" for megabyte, "GB" for gigabyte, or "TB" for terabyte. For example, "10GB" will produce a 10 gigabyte disk. The minimum size a boot disk can have is 10 GB. Disk size must be a multiple of 1 GB
--master-boot-disk-type <MASTER_BOOT_DISK_TYPE>The type of the boot disk. The value must be "pd-standard" or "pd-ssd"
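The boot disk size rules (whole number plus unit, at least 10 GB, a multiple of 1 GB) can be sketched as a small validator. The same rules are stated for the worker and secondary-worker boot disk flags. The helper name `boot_disk_size_gb` is hypothetical, and the sketch assumes binary units (1 GB = 1024 MB), which the flag description does not spell out.

```python
import re

# Assumption: units are binary multiples (1 MB = 1024 KB, and so on).
KB_PER_UNIT = {"KB": 1, "MB": 1024, "GB": 1024**2, "TB": 1024**3}
SIZE_RE = re.compile(r"(\d+)(KB|MB|GB|TB)")


def boot_disk_size_gb(value: str) -> int:
    """Validate a boot disk size string and return its size in GB (hypothetical helper)."""
    match = SIZE_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"expected a whole number followed by KB, MB, GB, or TB: {value!r}")
    size_kb = int(match.group(1)) * KB_PER_UNIT[match.group(2)]
    size_gb, remainder = divmod(size_kb, KB_PER_UNIT["GB"])
    if remainder:
        raise ValueError(f"disk size must be a multiple of 1 GB: {value!r}")
    if size_gb < 10:
        raise ValueError(f"boot disk must be at least 10 GB: {value!r}")
    return size_gb


for candidate in ("10GB", "1TB", "10240MB"):
    print(candidate, "->", boot_disk_size_gb(candidate), "GB")
```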
--master-machine-type <MASTER_MACHINE_TYPE>The type of machine to use for the master. Defaults to server-specified
--master-min-cpu-platform <PLATFORM>When specified, the VM will be scheduled on a host with the specified CPU architecture or a newer one. To list the available CPU platforms in a given zone, run:

$ gcloud compute zones describe ZONE

CPU platform selection is available only in selected zones; zones that allow CPU platform selection will have an `availableCpuPlatforms` field that contains the list of available CPU platforms for that zone.

You can find more information online: https://cloud.google.com/compute/docs/instances/specify-min-cpu-platform
--max-age <MAX_AGE>The lifespan of the cluster before it is auto-deleted, such as "2h" or "1d". See $ gcloud topic datetimes for information on duration formats
--max-idle <MAX_IDLE>The duration the cluster may stay idle after the last job completes before it is auto-deleted, such as "2h" or "1d". See $ gcloud topic datetimes for information on duration formats
--metadata <KEY=VALUE>Metadata to be made available to the guest operating system running on the instances
--network <NETWORK>The Compute Engine network that the VM instances of the cluster will be part of. This is mutually exclusive with --subnet. If neither is specified, this defaults to the "default" network
--no-addressIf provided, the instances in the cluster will not be assigned external IP addresses.

If omitted, the instances in the cluster will each be assigned an ephemeral external IP address.

Note: Dataproc VMs need access to the Dataproc API. This can be achieved without external IP addresses using Private Google Access (https://cloud.google.com/compute/docs/private-google-access)
--node-group <NODE_GROUP>The name of the sole-tenant node group to create the cluster on. Can be a short name ("node-group-name") or in the format "projects/{project-id}/zones/{zone}/nodeGroups/{node-group-name}"
--num-master-local-ssds <NUM_MASTER_LOCAL_SSDS>The number of local SSDs to attach to the master in a cluster
--num-masters <NUM_MASTERS>The number of master nodes in the cluster.

Number of Masters | Cluster Mode
--- | ---
1 | Standard
3 | High Availability
--num-secondary-worker-local-ssds <NUM_SECONDARY_WORKER_LOCAL_SSDS>The number of local SSDs to attach to each secondary worker in a cluster
--num-secondary-workers <NUM_SECONDARY_WORKERS>The number of secondary worker nodes in the cluster
--num-worker-local-ssds <NUM_WORKER_LOCAL_SSDS>The number of local SSDs to attach to each worker in a cluster
--num-workers <NUM_WORKERS>The number of worker nodes in the cluster. Defaults to server-specified
--optional-components <COMPONENT>List of optional components to be installed on cluster machines.

The following page documents the optional components that can be installed: https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/optional-components
--private-ipv6-google-access-type <PRIVATE_IPV6_GOOGLE_ACCESS_TYPE>The private IPv6 Google access type for the cluster. _PRIVATE_IPV6_GOOGLE_ACCESS_TYPE_ must be one of: *inherit-subnetwork*, *outbound*, *bidirectional*
--project <PROJECT_ID>The Google Cloud Platform project ID to use for this invocation. If omitted, then the current project is assumed; the current project can be listed using `gcloud config list --format='text(core.project)'` and can be set using `gcloud config set project PROJECT_ID`.

`--project` and its fallback `core/project` property play two roles in the invocation. They specify the project of the resource to operate on. They also specify the project for the API enablement check, quota, and billing. To specify a different project for quota and billing, use `--billing-project` or the `billing/quota_project` property
--properties <PREFIX:PROPERTY=VALUE>Specifies configuration properties for installed packages, such as Hadoop and Spark.

Properties are mapped to configuration files by specifying a prefix, such as "core:io.serializations". The following are supported prefixes and their mappings:

Prefix | File | Purpose of file
--- | --- | ---
capacity-scheduler | capacity-scheduler.xml | Hadoop YARN Capacity Scheduler configuration
core | core-site.xml | Hadoop general configuration
distcp | distcp-default.xml | Hadoop Distributed Copy configuration
hadoop-env | hadoop-env.sh | Hadoop specific environment variables
hdfs | hdfs-site.xml | Hadoop HDFS configuration
hive | hive-site.xml | Hive configuration
mapred | mapred-site.xml | Hadoop MapReduce configuration
mapred-env | mapred-env.sh | Hadoop MapReduce specific environment variables
pig | pig.properties | Pig configuration
spark | spark-defaults.conf | Spark configuration
spark-env | spark-env.sh | Spark specific environment variables
yarn | yarn-site.xml | Hadoop YARN configuration
yarn-env | yarn-env.sh | Hadoop YARN specific environment variables

See https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/cluster-properties for more information.
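To illustrate the `PREFIX:PROPERTY=VALUE` form, the sketch below checks each property name against the supported prefixes listed for `--properties` and renders a flag value. The helper name `format_properties` is hypothetical, the comma separator between entries is an assumption, and the two property names in the example are ordinary Spark and HDFS settings chosen only for illustration.

```python
SUPPORTED_PREFIXES = {
    "capacity-scheduler", "core", "distcp", "hadoop-env", "hdfs", "hive",
    "mapred", "mapred-env", "pig", "spark", "spark-env", "yarn", "yarn-env",
}


def format_properties(properties: dict) -> str:
    """Render {"prefix:property": "value"} as a --properties value (hypothetical helper)."""
    parts = []
    for name, value in properties.items():
        prefix, separator, key = name.partition(":")
        if not separator or not key:
            raise ValueError(f"expected PREFIX:PROPERTY, got {name!r}")
        if prefix not in SUPPORTED_PREFIXES:
            raise ValueError(f"unsupported prefix {prefix!r} in {name!r}")
        parts.append(f"{name}={value}")
    # Assumption: multiple entries are comma-separated.
    return ",".join(parts)


print(format_properties({
    "spark:spark.executor.memory": "4g",
    "hdfs:dfs.replication": "2",
}))
```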
--quietDisable all interactive prompts when running gcloud commands. If input is required, defaults will be used, or an error will be raised. Overrides the default core/disable_prompts property value for this command invocation. This is equivalent to setting the environment variable `CLOUDSDK_CORE_DISABLE_PROMPTS` to 1
--region <REGION>Dataproc region for the cluster. Each Dataproc region constitutes an independent resource namespace constrained to deploying instances into Compute Engine zones inside the region. Overrides the default `dataproc/region` property value for this command invocation
--reservation <RESERVATION>The name of the reservation, required when `--reservation-affinity=specific`
--reservation-affinity <RESERVATION_AFFINITY>The type of reservation for the instance. _RESERVATION_AFFINITY_ must be one of: *any*, *none*, *specific*
--scopes <SCOPE>Specifies scopes for the node instances. Multiple SCOPEs can be specified, separated by commas. Examples:

$ gcloud dataproc clusters create example-cluster --scopes https://www.googleapis.com/auth/bigtable.admin

$ gcloud dataproc clusters create example-cluster --scopes sqlservice,bigquery

The following *minimum scopes* are necessary for the cluster to function properly and are always added, even if not explicitly specified:

https://www.googleapis.com/auth/devstorage.read_write
https://www.googleapis.com/auth/logging.write

If the `--scopes` flag is not specified, the following *default scopes* are also included:

https://www.googleapis.com/auth/bigquery
https://www.googleapis.com/auth/bigtable.admin.table
https://www.googleapis.com/auth/bigtable.data
https://www.googleapis.com/auth/devstorage.full_control

If you want to enable all scopes use the 'cloud-platform' scope.

SCOPE can be either the full URI of the scope or an alias. *default* scopes are assigned to all instances. Available aliases are:

Alias | URI
--- | ---
bigquery | https://www.googleapis.com/auth/bigquery
cloud-platform | https://www.googleapis.com/auth/cloud-platform
cloud-source-repos | https://www.googleapis.com/auth/source.full_control
cloud-source-repos-ro | https://www.googleapis.com/auth/source.read_only
compute-ro | https://www.googleapis.com/auth/compute.readonly
compute-rw | https://www.googleapis.com/auth/compute
datastore | https://www.googleapis.com/auth/datastore
default | https://www.googleapis.com/auth/devstorage.read_only
 | https://www.googleapis.com/auth/logging.write
 | https://www.googleapis.com/auth/monitoring.write
 | https://www.googleapis.com/auth/pubsub
 | https://www.googleapis.com/auth/service.management.readonly
 | https://www.googleapis.com/auth/servicecontrol
 | https://www.googleapis.com/auth/trace.append
gke-default | https://www.googleapis.com/auth/devstorage.read_only
 | https://www.googleapis.com/auth/logging.write
 | https://www.googleapis.com/auth/monitoring
 | https://www.googleapis.com/auth/service.management.readonly
 | https://www.googleapis.com/auth/servicecontrol
 | https://www.googleapis.com/auth/trace.append
logging-write | https://www.googleapis.com/auth/logging.write
monitoring | https://www.googleapis.com/auth/monitoring
monitoring-read | https://www.googleapis.com/auth/monitoring.read
monitoring-write | https://www.googleapis.com/auth/monitoring.write
pubsub | https://www.googleapis.com/auth/pubsub
service-control | https://www.googleapis.com/auth/servicecontrol
service-management | https://www.googleapis.com/auth/service.management.readonly
sql (deprecated) | https://www.googleapis.com/auth/sqlservice
sql-admin | https://www.googleapis.com/auth/sqlservice.admin
storage-full | https://www.googleapis.com/auth/devstorage.full_control
storage-ro | https://www.googleapis.com/auth/devstorage.read_only
storage-rw | https://www.googleapis.com/auth/devstorage.read_write
taskqueue | https://www.googleapis.com/auth/taskqueue
trace | https://www.googleapis.com/auth/trace.append
userinfo-email | https://www.googleapis.com/auth/userinfo.email

DEPRECATION WARNING: The https://www.googleapis.com/auth/sqlservice account scope and the `sql` alias do not provide SQL instance management capabilities and have been deprecated. Use https://www.googleapis.com/auth/sqlservice.admin or `sql-admin` to manage your Google SQL Service instances.
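The scope rules described for `--scopes` (aliases expand to URIs, and the two minimum scopes are always added) can be modelled as follows. This is only a sketch of the documented behaviour, not the service's actual logic: `resolve_scopes` is a hypothetical helper, and the alias map holds just a few entries copied from the alias table.

```python
AUTH = "https://www.googleapis.com/auth/"

# A small subset of the documented aliases, for illustration only.
ALIASES = {
    "bigquery": [AUTH + "bigquery"],
    "cloud-platform": [AUTH + "cloud-platform"],
    "pubsub": [AUTH + "pubsub"],
    "sql-admin": [AUTH + "sqlservice.admin"],
    "storage-rw": [AUTH + "devstorage.read_write"],
}

# Always added, even if not explicitly specified.
MINIMUM_SCOPES = [AUTH + "devstorage.read_write", AUTH + "logging.write"]


def resolve_scopes(scopes: list) -> list:
    """Expand aliases to URIs and prepend the minimum scopes (hypothetical helper)."""
    resolved = list(MINIMUM_SCOPES)
    for scope in scopes:
        # Anything that is not a known alias is passed through unchanged (e.g. a full URI).
        for uri in ALIASES.get(scope, [scope]):
            if uri not in resolved:
                resolved.append(uri)
    return resolved


for uri in resolve_scopes(["sql-admin", "bigquery"]):
    print(uri)
```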
--secondary-worker-accelerator <type=TYPE,[count=COUNT]>Attaches accelerators (e.g. GPUs) to the secondary-worker instance(s).

*type*: The specific type (e.g. nvidia-tesla-k80 for NVIDIA Tesla K80) of accelerator to attach to the instances. Use 'gcloud compute accelerator-types list' to learn about all available accelerator types.

*count*: The number of accelerators of this type to attach to each instance. The default value is 1
--secondary-worker-boot-disk-size <SECONDARY_WORKER_BOOT_DISK_SIZE>The size of the boot disk. The value must be a whole number followed by a size unit of "KB" for kilobyte, "MB" for megabyte, "GB" for gigabyte, or "TB" for terabyte. For example, "10GB" will produce a 10 gigabyte disk. The minimum size a boot disk can have is 10 GB. Disk size must be a multiple of 1 GB
--secondary-worker-boot-disk-type <SECONDARY_WORKER_BOOT_DISK_TYPE>The type of the boot disk. The value must be "pd-standard" or "pd-ssd"
--secondary-worker-type <TYPE>The type of the secondary worker group. _TYPE_ must be one of: *preemptible*, *non-preemptible*
--service-account <SERVICE_ACCOUNT>The Google Cloud IAM service account to be authenticated as
--single-nodeCreate a single node cluster.

A single node cluster has all master and worker components. It cannot have any separate worker nodes. If this flag is not specified, a cluster with separate workers is created
--subnet <SUBNET>Specifies the subnet that the cluster will be part of. This is mutually exclusive with --network
--tags <TAG>Specifies a list of tags to apply to the instance. These tags allow network firewall rules and routes to be applied to specified VM instances. See gcloud_compute_firewall-rules_create(1) for more details.

To read more about configuring network tags, read this guide: https://cloud.google.com/vpc/docs/add-remove-network-tags

To list instances with their respective status and tags, run:

$ gcloud compute instances list --format='table(name,status,tags.list())'

To list instances tagged with a specific tag, `tag1`, run:

$ gcloud compute instances list --filter='tags:tag1'
--temp-bucket <TEMP_BUCKET>The Google Cloud Storage bucket to use by default to store ephemeral cluster and jobs data, such as Spark and MapReduce history files
--trace-token <TRACE_TOKEN>Token used to route traces of service requests for investigation of issues. Overrides the default *core/trace_token* property value for this command invocation
--user-output-enabledPrint user intended output to the console. Overrides the default *core/user_output_enabled* property value for this command invocation. Use *--no-user-output-enabled* to disable
--verbosity <VERBOSITY>Override the default verbosity for this command. Overrides the default *core/verbosity* property value for this command invocation. _VERBOSITY_ must be one of: *debug*, *info*, *warning*, *error*, *critical*, *none*
--worker-accelerator <type=TYPE,[count=COUNT]>Attaches accelerators (e.g. GPUs) to the worker instance(s).

*type*: The specific type (e.g. nvidia-tesla-k80 for NVIDIA Tesla K80) of accelerator to attach to the instances. Use 'gcloud compute accelerator-types list' to learn about all available accelerator types.

*count*: The number of accelerators of this type to attach to each instance. The default value is 1
--worker-boot-disk-size <WORKER_BOOT_DISK_SIZE>The size of the boot disk. The value must be a whole number followed by a size unit of "KB" for kilobyte, "MB" for megabyte, "GB" for gigabyte, or "TB" for terabyte. For example, "10GB" will produce a 10 gigabyte disk. The minimum size a boot disk can have is 10 GB. Disk size must be a multiple of 1 GB
--worker-boot-disk-type <WORKER_BOOT_DISK_TYPE>The type of the boot disk. The value must be "pd-standard" or "pd-ssd"
--worker-machine-type <WORKER_MACHINE_TYPE>The type of machine to use for workers. Defaults to server-specified
--worker-min-cpu-platform <PLATFORM>When specified, the VM will be scheduled on a host with the specified CPU architecture or a newer one. To list the available CPU platforms in a given zone, run:

$ gcloud compute zones describe ZONE

CPU platform selection is available only in selected zones; zones that allow CPU platform selection will have an `availableCpuPlatforms` field that contains the list of available CPU platforms for that zone.

You can find more information online: https://cloud.google.com/compute/docs/instances/specify-min-cpu-platform
--zone <ZONE>The compute zone (e.g. us-central1-a) for the cluster. If empty and --region is set to a value other than `global`, the server will pick a zone in the region. Overrides the default *compute/zone* property value for this command invocation
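Several of the options above constrain one another: `--network` and `--subnet` are mutually exclusive, `--reservation` is required when `--reservation-affinity=specific`, and a single node cluster cannot have separate workers. The sketch below assembles an argument list while enforcing those three rules. The function `build_create_command` and the cluster, region, and subnet names are hypothetical; only flags documented on this page are used.

```python
import shlex


def build_create_command(cluster, region, network=None, subnet=None,
                         reservation_affinity=None, reservation=None,
                         single_node=False, num_workers=None):
    """Assemble a `gcloud dataproc clusters create` argv list (hypothetical helper)."""
    if network and subnet:
        raise ValueError("--network and --subnet are mutually exclusive")
    if reservation_affinity not in (None, "any", "none", "specific"):
        raise ValueError("--reservation-affinity must be one of: any, none, specific")
    if reservation_affinity == "specific" and not reservation:
        raise ValueError("--reservation is required when --reservation-affinity=specific")
    if single_node and num_workers:
        raise ValueError("a single node cluster cannot have separate worker nodes")

    argv = ["gcloud", "dataproc", "clusters", "create", cluster, f"--region={region}"]
    if network:
        argv.append(f"--network={network}")
    if subnet:
        argv.append(f"--subnet={subnet}")
    if reservation_affinity:
        argv.append(f"--reservation-affinity={reservation_affinity}")
    if reservation:
        argv.append(f"--reservation={reservation}")
    if single_node:
        argv.append("--single-node")
    if num_workers is not None:
        argv.append(f"--num-workers={num_workers}")
    return argv


argv = build_create_command("example-cluster", "us-central1", subnet="my-subnet", single_node=True)
print(shlex.join(argv))
```

Returning a list rather than a single string keeps each argument separate, so it can be handed to `subprocess.run` without shell quoting concerns.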