aws emr

Amazon EMR is a web service that makes it easier to process large amounts of data efficiently. Amazon EMR uses Hadoop processing combined with several AWS services to do tasks such as web indexing, data mining, log file analysis, machine learning, scientific simulation, and data warehouse management.

Subcommands

| Name | Description |
| --- | --- |
| add-instance-fleet | Adds an instance fleet to a running cluster. The instance fleet configuration is available only in Amazon EMR versions 4.8.0 and later, excluding 5.0.x versions. |
| add-instance-groups | Adds an instance group to a running cluster. |
| add-tags | Adds tags to an Amazon EMR resource. Tags make it easier to associate clusters in various ways, such as grouping clusters to track your Amazon EMR resource allocation costs. For more information, see Tag Clusters. |
| cancel-steps | Cancels a pending step or steps in a running cluster. Available only in Amazon EMR versions 4.8.0 and later, excluding version 5.0.0. A maximum of 256 steps is allowed in each CancelSteps request. CancelSteps is idempotent but asynchronous; it does not guarantee that a step will be canceled, even if the request is successfully submitted. You can only cancel steps that are in the PENDING state. |
| create-security-configuration | Creates a security configuration, which is stored in the service and can be specified when a cluster is created. |
| create-studio | Creates a new Amazon EMR Studio. |
| create-studio-session-mapping | Maps a user or group to the Amazon EMR Studio specified by StudioId, and applies a session policy to refine Studio permissions for that user or group. |
| delete-security-configuration | Deletes a security configuration. |
| delete-studio | Removes an Amazon EMR Studio from the Studio metadata store. |
| delete-studio-session-mapping | Removes a user or group from an Amazon EMR Studio. |
| describe-cluster | Provides cluster-level details including status, hardware and software configuration, VPC settings, bootstrap actions, instance groups, and so on. For information about the cluster steps, see `list-steps`. |
| describe-notebook-execution | Provides details of a notebook execution. |
| describe-security-configuration | Provides the details of a security configuration by returning the configuration JSON. |
| describe-step | Provides details about a cluster step. |
| describe-studio | Returns details for the specified Amazon EMR Studio, including ID, Name, VPC, Studio access URL, and so on. |
| get-block-public-access-configuration | Returns the Amazon EMR block public access configuration for your AWS account in the current Region. For more information, see Configure Block Public Access for Amazon EMR in the Amazon EMR Management Guide. |
| get-managed-scaling-policy | Fetches the managed scaling policy attached to an Amazon EMR cluster. |
| get-studio-session-mapping | Fetches mapping details for the specified Amazon EMR Studio and identity (user or group). |
| list-clusters | Provides the status of all clusters visible to this AWS account. You can filter the list of clusters by criteria such as cluster creation date and time or status. This call returns a maximum of 50 clusters per call, along with a marker to track the paging of the cluster list across multiple ListClusters calls. |
| list-instance-fleets | Lists all available details about the instance fleets in a cluster. The instance fleet configuration is available only in Amazon EMR versions 4.8.0 and later, excluding 5.0.x versions. |
| list-instances | Provides information for all active EC2 instances and EC2 instances terminated in the last 30 days, up to a maximum of 2,000. EC2 instances in any of the following states are considered active: AWAITING_FULFILLMENT, PROVISIONING, BOOTSTRAPPING, RUNNING. |
| list-notebook-executions | Provides summaries of all notebook executions. You can filter the list based on multiple criteria such as status, time range, and editor ID. Returns a maximum of 50 notebook executions and a marker to track the paging of a longer notebook execution list across multiple ListNotebookExecutions calls. |
| list-security-configurations | Lists all the security configurations visible to this account, providing their names and their creation dates and times. This call returns a maximum of 50 security configurations per call, along with a marker to track the paging of the list across multiple ListSecurityConfigurations calls. |
| list-steps | Provides a list of steps for the cluster in reverse order unless you specify stepIds with the request or filter by StepStates. You can specify a maximum of 10 stepIds. |
| list-studio-session-mappings | Returns a list of all user or group session mappings for the Amazon EMR Studio specified by StudioId. |
| list-studios | Returns a list of all Amazon EMR Studios associated with the AWS account. The list includes details such as ID, Studio Access URL, and creation time for each Studio. |
| modify-cluster | Modifies the number of steps that can be executed concurrently for the cluster specified using ClusterID. |
| modify-instance-fleet | Modifies the target On-Demand and target Spot capacities for the instance fleet with the specified InstanceFleetID within the cluster specified using ClusterID. The call either succeeds or fails atomically. The instance fleet configuration is available only in Amazon EMR versions 4.8.0 and later, excluding 5.0.x versions. |
| modify-instance-groups | Modifies the number of nodes and the configuration settings of an instance group. The input parameters include the new target instance count for the group and the instance group ID. The call either succeeds or fails atomically. |
| put-auto-scaling-policy | Creates or updates an automatic scaling policy for a core instance group or task instance group in an Amazon EMR cluster. The automatic scaling policy defines how an instance group dynamically adds and terminates EC2 instances in response to the value of a CloudWatch metric. |
| put-block-public-access-configuration | Creates or updates an Amazon EMR block public access configuration for your AWS account in the current Region. For more information, see Configure Block Public Access for Amazon EMR in the Amazon EMR Management Guide. |
| put-managed-scaling-policy | Creates or updates a managed scaling policy for an Amazon EMR cluster. The managed scaling policy defines the limits for resources, such as EC2 instances, that can be added to or terminated from a cluster. The policy applies only to the core and task nodes. The master node cannot be scaled after initial configuration. |
| remove-auto-scaling-policy | Removes an automatic scaling policy from a specified instance group within an Amazon EMR cluster. |
| remove-managed-scaling-policy | Removes a managed scaling policy from a specified Amazon EMR cluster. |
| remove-tags | Removes tags from an Amazon EMR resource. Tags make it easier to associate clusters in various ways, such as grouping clusters to track your Amazon EMR resource allocation costs. For more information, see Tag Clusters. |
| start-notebook-execution | Starts a notebook execution. |
| stop-notebook-execution | Stops a notebook execution. |
| update-studio | Updates an Amazon EMR Studio configuration, including attributes such as name, description, and subnets. |
| update-studio-session-mapping | Updates the session policy attached to the user or group for the specified Amazon EMR Studio. |
| terminate-clusters | Shuts down one or more clusters, each specified by cluster ID. Use this command only on clusters that do not have termination protection enabled; clusters with termination protection enabled are not terminated. When a cluster is shut down, any step not yet completed is canceled and the Amazon EC2 instances in the cluster are terminated. Any log files not already saved are uploaded to Amazon S3 if a `--log-uri` was specified when the cluster was created. The list can contain at most 10 clusters. The command is asynchronous. Depending on the configuration of the cluster, it can take 5 to 20 minutes for the cluster to terminate completely and release allocated resources such as Amazon EC2 instances. |
| modify-cluster-attributes | Modifies the cluster attributes `visible-to-all-users` and `termination-protected`. |
| install-applications | Installs applications on a running cluster. Currently only Hive and Pig can be installed using this command, and the command is supported only by AMI versions 3.x and 2.x. |
| create-cluster | Creates an Amazon EMR cluster with the specified configurations. Quick start: `aws emr create-cluster --release-label <release-label> --instance-type <instance-type> --instance-count <instance-count>`. Values for the following can be set in the AWS CLI config file using the `aws configure set` command: `--service-role`, `--log-uri`, and the InstanceProfile and KeyName arguments under `--ec2-attributes`. |
| add-steps | Adds a list of steps to a cluster. |
| restore-from-hbase-backup | Restores HBase from S3. This command is available only when using Amazon EMR versions earlier than 4.0. |
| create-hbase-backup | Creates an HBase backup in S3. This command is available only when using Amazon EMR versions earlier than 4.0. |
| schedule-hbase-backup | Adds a step to schedule automated HBase backups. This command is available only when using Amazon EMR versions earlier than 4.0. |
| disable-hbase-backups | Adds a step to disable automated HBase backups. This command is available only when using Amazon EMR versions earlier than 4.0. |
| create-default-roles | Creates the default IAM roles EMR_EC2_DefaultRole and EMR_DefaultRole, which can be used when creating a cluster with the `create-cluster` command. The default roles for EMR use managed policies, which are updated automatically to support future EMR functionality. If you do not have a Service Role and Instance Profile variable set for your `create-cluster` command in the AWS CLI config file, `create-default-roles` automatically sets these variables to the default roles. If you have already set a value for Service Role or Instance Profile, `create-default-roles` does not set the defaults for these variables in the AWS CLI config file. You can view settings for variables in the config file using the `aws configure get` command. |
| ssh | Opens an SSH session to the master node of the cluster. A value for the Key Pair File variable can be set in the AWS CLI config file using the `aws configure set emr.key_pair_file <value>` command. |
| socks | Creates a SOCKS tunnel on port 8157 from your machine to the master node. A value for the Key Pair File variable can be set in the AWS CLI config file using the `aws configure set emr.key_pair_file <value>` command. |
| get | Gets a file from the master node. A value for the Key Pair File variable can be set in the AWS CLI config file using the `aws configure set emr.key_pair_file <value>` command. |
| put | Puts a file onto the master node. A value for the Key Pair File variable can be set in the AWS CLI config file using the `aws configure set emr.key_pair_file <value>` command. |
| wait | Waits until a particular condition is satisfied. Each subcommand polls an API until the listed requirement is met. |
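
The `cancel-steps` entry above has two limits that matter when scripting: only PENDING steps can be canceled, and each request accepts at most 256 step IDs. The following is a minimal Python sketch under two assumptions. First, you already have the `Steps` array from `aws emr list-steps --cluster-id <cluster-id> --output json`. Second, `cancel_steps_argvs` is a hypothetical helper name. It only builds argument lists and does not call AWS itself.

```python
from typing import Iterable

MAX_STEPS_PER_CANCEL = 256  # per-request limit stated for CancelSteps


def cancel_steps_argvs(cluster_id: str, steps: Iterable[dict]) -> list[list[str]]:
    """Build one `aws emr cancel-steps` command per batch of PENDING steps.

    `steps` is shaped like the "Steps" array in list-steps JSON output,
    e.g. {"Id": "s-...", "Status": {"State": "PENDING"}}.
    """
    # Only PENDING steps are eligible for cancellation.
    pending = [s["Id"] for s in steps if s["Status"]["State"] == "PENDING"]
    commands = []
    for start in range(0, len(pending), MAX_STEPS_PER_CANCEL):
        batch = pending[start:start + MAX_STEPS_PER_CANCEL]
        commands.append(["aws", "emr", "cancel-steps",
                         "--cluster-id", cluster_id,
                         "--step-ids", *batch])
    return commands
```

Each list can be passed to `subprocess.run(cmd, check=True)`. Because CancelSteps is asynchronous, run `list-steps` again afterwards to see which steps were actually canceled.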
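
Several list commands above (`list-clusters`, `list-security-configurations`, `list-notebook-executions`) return at most 50 items per call plus a marker. The AWS CLI follows that marker for you, but code that calls the API directly has to loop until no marker comes back. Below is a minimal sketch of that loop. It assumes `fetch(marker)` returns a dict shaped like a ListClusters response: a `Clusters` list plus an optional `Marker`. `iter_clusters` is a hypothetical helper name.

```python
from typing import Any, Callable, Iterator, Optional

Page = dict[str, Any]


def iter_clusters(fetch: Callable[[Optional[str]], Page]) -> Iterator[dict]:
    """Yield every cluster across ListClusters-style pages.

    fetch(None) returns the first page; fetch(marker) returns the page that
    follows `marker`. The last page has no "Marker" key.
    """
    marker: Optional[str] = None
    while True:
        page = fetch(marker)
        yield from page.get("Clusters", [])
        marker = page.get("Marker")
        if not marker:  # no marker means this was the last page
            break
```

With boto3, `fetch` could be `lambda m: emr.list_clusters(**({"Marker": m} if m else {}))`, where `emr = boto3.client("emr")`. boto3's `emr.get_paginator("list_clusters")` performs the same loop, so the hand-written version is mainly useful for seeing the mechanics or for wrapping a custom client.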
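
The `list-instances` entry above defines "active" as one of four instance states. Here is a minimal Python sketch that applies that definition to saved CLI output. It assumes the JSON has an `Instances` array whose items carry `Id`, an optional `Ec2InstanceId`, and `Status.State`. The function name `active_instance_ids` is hypothetical.

```python
from typing import Any

# Active states exactly as listed for list-instances above.
ACTIVE_STATES = frozenset(
    {"AWAITING_FULFILLMENT", "PROVISIONING", "BOOTSTRAPPING", "RUNNING"}
)


def active_instance_ids(list_instances_output: dict[str, Any]) -> list[str]:
    """Return IDs of active instances from `aws emr list-instances` JSON.

    Prefers the EC2 instance ID and falls back to the EMR instance ID ("Id"),
    because an instance that is still awaiting fulfillment may not have an
    EC2 ID yet.
    """
    return [
        inst.get("Ec2InstanceId", inst["Id"])
        for inst in list_instances_output.get("Instances", [])
        if inst["Status"]["State"] in ACTIVE_STATES
    ]
```

For example, save the output of `aws emr list-instances --cluster-id <cluster-id> --output json` to a file, load it with `json.load`, and pass the resulting dict to the function.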
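
The `put-managed-scaling-policy` entry describes limits on how far core and task nodes can scale. The sketch below builds the JSON passed to `--managed-scaling-policy`. It assumes the `ComputeLimits` fields `UnitType`, `MinimumCapacityUnits`, and `MaximumCapacityUnits`, with unit types `Instances`, `InstanceFleetUnits`, or `VCPU`. The range checks are local sanity checks only, not the service's full validation, and `managed_scaling_policy` is a hypothetical name.

```python
import json

UNIT_TYPES = {"Instances", "InstanceFleetUnits", "VCPU"}


def managed_scaling_policy(min_units: int, max_units: int,
                           unit_type: str = "Instances") -> str:
    """Return a JSON string for `--managed-scaling-policy`."""
    if unit_type not in UNIT_TYPES:
        raise ValueError(f"unknown unit type: {unit_type}")
    # Local sanity check (an assumption, not the complete server-side rules).
    if not 0 < min_units <= max_units:
        raise ValueError("need 0 < min_units <= max_units")
    limits = {
        "UnitType": unit_type,
        "MinimumCapacityUnits": min_units,
        "MaximumCapacityUnits": max_units,
    }
    return json.dumps({"ComputeLimits": limits})
```

The resulting string can be passed as `aws emr put-managed-scaling-policy --cluster-id <cluster-id> --managed-scaling-policy '<json>'`. Per the table, the policy affects only core and task nodes.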
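
The `terminate-clusters` entry says that at most 10 clusters can be listed per call and that clusters with termination protection are skipped. The `modify-cluster-attributes` entry covers turning that protection off. This minimal sketch combines the two by emitting `--no-termination-protected` commands first, when asked, and then batches of at most 10 IDs. `terminate_argvs` is a hypothetical helper that only builds argument lists.

```python
MAX_CLUSTERS_PER_CALL = 10  # limit stated for terminate-clusters


def terminate_argvs(cluster_ids: list[str],
                    disable_protection: bool = False) -> list[list[str]]:
    """Build the commands needed to terminate `cluster_ids`."""
    commands: list[list[str]] = []
    if disable_protection:
        # Protected clusters are not terminated, so switch protection off first.
        for cid in cluster_ids:
            commands.append(["aws", "emr", "modify-cluster-attributes",
                             "--cluster-id", cid, "--no-termination-protected"])
    for start in range(0, len(cluster_ids), MAX_CLUSTERS_PER_CALL):
        batch = cluster_ids[start:start + MAX_CLUSTERS_PER_CALL]
        commands.append(["aws", "emr", "terminate-clusters", "--cluster-ids", *batch])
    return commands
```

Termination is asynchronous and can take 5 to 20 minutes. A script that needs to block can follow up with the `wait` subcommand.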
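
The `create-cluster` quick start above needs only a release label, an instance type, and an instance count. The sketch below builds that command as an argument list so it can be inspected before running. It optionally adds `--use-default-roles`, which assumes the roles from `create-default-roles` already exist. The values `emr-6.10.0` and `m5.xlarge` in the usage are example inputs only, and `create_cluster_argv` and `render` are hypothetical names.

```python
import shlex


def create_cluster_argv(release_label: str, instance_type: str,
                        instance_count: int,
                        use_default_roles: bool = True) -> list[str]:
    """Build the argument list for the create-cluster quick-start form."""
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    argv = [
        "aws", "emr", "create-cluster",
        "--release-label", release_label,
        "--instance-type", instance_type,
        "--instance-count", str(instance_count),
    ]
    if use_default_roles:
        # Relies on EMR_DefaultRole and EMR_EC2_DefaultRole already existing,
        # e.g. created earlier with `aws emr create-default-roles`.
        argv.append("--use-default-roles")
    return argv


def render(argv: list[str]) -> str:
    """Quote the arguments for display or pasting into a POSIX shell."""
    return shlex.join(argv)
```

For example, `render(create_cluster_argv("emr-6.10.0", "m5.xlarge", 3))` produces a command line you can review. Passing the list to `subprocess.run(argv, check=True)` actually creates the cluster. Omit `--use-default-roles` when the service role and instance profile are already set in the CLI config file.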
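
The `ssh`, `socks`, `get`, and `put` entries share one detail: `--key-pair-file` can be left out once `emr.key_pair_file` is set with `aws configure set`. The sketch below builds these commands under the assumption that `get` and `put` take `--src` plus an optional `--dest`. `emr_remote_argv` is a hypothetical name.

```python
from typing import Optional

REMOTE_ACTIONS = {"ssh", "socks", "get", "put"}


def emr_remote_argv(action: str, cluster_id: str,
                    key_pair_file: Optional[str] = None,
                    src: Optional[str] = None,
                    dest: Optional[str] = None) -> list[str]:
    """Build an `aws emr ssh|socks|get|put` command.

    key_pair_file may be None when emr.key_pair_file is already configured.
    """
    if action not in REMOTE_ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    argv = ["aws", "emr", action, "--cluster-id", cluster_id]
    if key_pair_file:
        argv += ["--key-pair-file", key_pair_file]
    if action in {"get", "put"}:
        # File transfers need a source path; the destination is optional.
        if not src:
            raise ValueError(f"{action} needs a source path")
        argv += ["--src", src]
        if dest:
            argv += ["--dest", dest]
    return argv
```

Keeping the key path in the CLI config rather than in each command keeps it out of shell history and scripts.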
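
The `wait` entry says each subcommand polls an API until a condition holds. The sketch below is a hand-rolled polling loop in the same spirit, not the CLI's waiter itself. The 30-second delay, the 60-attempt limit, and the failure-state set are assumptions, not the waiter's actual settings. `wait_for_state` is a hypothetical name, and `sleep` is injectable so the loop can be tested without real delays.

```python
import time
from typing import Callable

# Assumed terminal states from which RUNNING/WAITING can no longer be reached.
DEFAULT_FAILURE_STATES = frozenset({"TERMINATING", "TERMINATED", "TERMINATED_WITH_ERRORS"})


def wait_for_state(get_state: Callable[[], str],
                   targets: frozenset,
                   failure_states: frozenset = DEFAULT_FAILURE_STATES,
                   delay: float = 30.0,
                   max_attempts: int = 60,
                   sleep: Callable[[float], None] = time.sleep) -> int:
    """Poll get_state() until it returns a state in `targets`.

    Returns the number of polls made. Raises RuntimeError on a failure state
    and TimeoutError after max_attempts polls without success.
    """
    for attempt in range(1, max_attempts + 1):
        state = get_state()
        if state in targets:
            return attempt
        if state in failure_states:
            raise RuntimeError(f"entered {state} while waiting")
        if attempt < max_attempts:
            sleep(delay)  # no sleep after the final attempt
    raise TimeoutError(f"no target state after {max_attempts} polls")
```

With boto3, `get_state` could be `lambda: emr.describe_cluster(ClusterId=cid)["Cluster"]["Status"]["State"]`, called with `targets=frozenset({"RUNNING", "WAITING"})`. For everyday scripts, `aws emr wait cluster-running --cluster-id <cluster-id>` is the simpler choice.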