gcloud dataproc jobs submit pyspark <PY_FILE> <JOB_ARGS>

Submit a PySpark job to a cluster

Arguments

NameDescription
PY_FILEMain .py file to run as the driver
JOB_ARGSArguments to pass to the driver. + The '--' argument must be specified between gcloud specific args on the left and JOB_ARGS on the right

Options

NameDescription
--account <ACCOUNT>Google Cloud Platform user account to use for invocation. Overrides the default *core/account* property value for this command invocation
--archives <ARCHIVE>Comma separated list of archives to be extracted into the working directory of each executor. Must be one of the following file formats: .zip, .tar, .tar.gz, or .tgz
--asyncReturn immediately, without waiting for the operation in progress to complete
--billing-project <BILLING_PROJECT>The Google Cloud Platform project that will be charged quota for operations performed in gcloud. If you need to operate on one project, but need quota against a different project, you can use this flag to specify the billing project. If both `billing/quota_project` and `--billing-project` are specified, `--billing-project` takes precedence. Run `$ gcloud config set --help` to see more information about `billing/quota_project`
--bucket <BUCKET>The Cloud Storage bucket to stage files in. Defaults to the cluster's configured bucket
--cluster <CLUSTER>The Dataproc cluster to submit the job to
--configuration <CONFIGURATION>The configuration to use for this command invocation. For more information on how to use configurations, run: `gcloud topic configurations`. You can also use the CLOUDSDK_ACTIVE_CONFIG_NAME environment variable to set the equivalent of this flag for a terminal session
--driver-log-levels <PACKAGE=LEVEL>List of key value pairs to configure driver logging, where key is a package and value is the log4j log level. For example: root=FATAL,com.example=INFO
--files <FILE>Comma separated list of files to be placed in the working directory of both the app master and executors
--flags-file <YAML_FILE>A YAML or JSON file that specifies a *--flag*:*value* dictionary. Useful for specifying complex flag values with special characters that work with any command interpreter. Additionally, each *--flags-file* arg is replaced by its constituent flags. See $ gcloud topic flags-file for more information
--flatten <KEY>Flatten _name_[] output resource slices in _KEY_ into separate records for each item in each slice. Multiple keys and slices may be specified. This also flattens keys for *--format* and *--filter*. For example, *--flatten=abc.def* flattens *abc.def[].ghi* references to *abc.def.ghi*. A resource record containing *abc.def[]* with N elements will expand to N records in the flattened output. This flag interacts with other flags that are applied in this order: *--flatten*, *--sort-by*, *--filter*, *--limit*
--format <FORMAT>Set the format for printing command output resources. The default is a command-specific human-friendly output format. The supported formats are: `config`, `csv`, `default`, `diff`, `disable`, `flattened`, `get`, `json`, `list`, `multi`, `none`, `object`, `table`, `text`, `value`, `yaml`. For more details run $ gcloud topic formats
--helpDisplay detailed help
--impersonate-service-account <SERVICE_ACCOUNT_EMAIL>For this gcloud invocation, all API requests will be made as the given service account instead of the currently selected account. This is done without needing to create, download, and activate a key for the account. In order to perform operations as the service account, your currently selected account must have an IAM role that includes the iam.serviceAccounts.getAccessToken permission for the service account. The roles/iam.serviceAccountTokenCreator role has this permission or you may create a custom role. Overrides the default *auth/impersonate_service_account* property value for this command invocation
--jars <JAR>Comma separated list of jar files to be provided to the executor and driver classpaths
--labels <KEY=VALUE>List of label KEY=VALUE pairs to add. + Keys must start with a lowercase character and contain only hyphens (`-`), underscores (```_```), lowercase characters, and numbers. Values must contain only hyphens (`-`), underscores (```_```), lowercase characters, and numbers
--log-httpLog all HTTP server requests and responses to stderr. Overrides the default *core/log_http* property value for this command invocation
--max-failures-per-hour <MAX_FAILURES_PER_HOUR>Specifies the maximum number of times a job can be restarted per hour in event of failure. Default is 0 (no retries after job failure)
--project <PROJECT_ID>The Google Cloud Platform project ID to use for this invocation. If omitted, then the current project is assumed; the current project can be listed using `gcloud config list --format='text(core.project)'` and can be set using `gcloud config set project PROJECTID`. + `--project` and its fallback `core/project` property play two roles in the invocation. It specifies the project of the resource to operate on. It also specifies the project for API enablement check, quota, and billing. To specify a different project for quota and billing, use `--billing-project` or `billing/quota_project` property
--properties <PROPERTY=VALUE>List of key value pairs to configure PySpark. For a list of available properties, see: https://spark.apache.org/docs/latest/configuration.html#available-properties
--py-files <PY_FILE>Comma separated list of Python files to be provided to the job. Must be one of the following file formats ".py, .zip, or .egg"
--quietDisable all interactive prompts when running gcloud commands. If input is required, defaults will be used, or an error will be raised. Overrides the default core/disable_prompts property value for this command invocation. This is equivalent to setting the environment variable `CLOUDSDK_CORE_DISABLE_PROMPTS` to 1
--region <REGION>Cloud Dataproc region to use. Each Cloud Dataproc region constitutes an independent resource namespace constrained to deploying instances into Compute Engine zones inside the region. Overrides the default *dataproc/region* property value for this command invocation
--trace-token <TRACE_TOKEN>Token used to route traces of service requests for investigation of issues. Overrides the default *core/trace_token* property value for this command invocation
--user-output-enabledPrint user intended output to the console. Overrides the default *core/user_output_enabled* property value for this command invocation. Use *--no-user-output-enabled* to disable
--verbosity <VERBOSITY>Override the default verbosity for this command. Overrides the default *core/verbosity* property value for this command invocation. _VERBOSITY_ must be one of: *debug*, *info*, *warning*, *error*, *critical*, *none*