aws emr
Amazon EMR is a web service that makes it easier to process large amounts of data efficiently. It combines Hadoop processing with several AWS services to perform tasks such as web indexing, data mining, log file analysis, machine learning, scientific simulation, and data warehouse management.
Subcommands
Name | Description |
---|---|
add-instance-fleet | Adds an instance fleet to a running cluster. The instance fleet configuration is available only in Amazon EMR versions 4.8.0 and later, excluding 5.0.x |
add-instance-groups | Adds an instance group to a running cluster |
add-tags | Adds tags to an Amazon EMR resource. Tags make it easier to associate clusters in various ways, such as grouping clusters to track your Amazon EMR resource allocation costs. For more information, see Tag Clusters |
cancel-steps | Cancels a pending step or steps in a running cluster. Available only in Amazon EMR versions 4.8.0 and later, excluding version 5.0.0. A maximum of 256 steps are allowed in each CancelSteps request. CancelSteps is idempotent but asynchronous; it does not guarantee that a step will be canceled, even if the request is successfully submitted. You can only cancel steps that are in a PENDING state |
create-security-configuration | Creates a security configuration, which is stored in the service and can be specified when a cluster is created |
create-studio | Creates a new Amazon EMR Studio |
create-studio-session-mapping | Maps a user or group to the Amazon EMR Studio specified by StudioId, and applies a session policy to refine Studio permissions for that user or group |
delete-security-configuration | Deletes a security configuration |
delete-studio | Removes an Amazon EMR Studio from the Studio metadata store |
delete-studio-session-mapping | Removes a user or group from an Amazon EMR Studio |
describe-cluster | Provides cluster-level details including status, hardware and software configuration, VPC settings, bootstrap actions, instance groups, and so on. For information about the cluster steps, see `list-steps` |
describe-notebook-execution | Provides details of a notebook execution |
describe-security-configuration | Provides the details of a security configuration by returning the configuration JSON |
describe-step | Provides more detail about the cluster step |
describe-studio | Returns details for the specified Amazon EMR Studio including ID, Name, VPC, Studio access URL, and so on |
get-block-public-access-configuration | Returns the Amazon EMR block public access configuration for your AWS account in the current Region. For more information, see Configure Block Public Access for Amazon EMR in the Amazon EMR Management Guide |
get-managed-scaling-policy | Fetches the attached managed scaling policy for an Amazon EMR cluster |
get-studio-session-mapping | Fetches mapping details for the specified Amazon EMR Studio and identity (user or group) |
list-clusters | Provides the status of all clusters visible to this AWS account. Allows you to filter the list of clusters based on certain criteria; for example, filtering by cluster creation date and time or by status. This call returns a maximum of 50 clusters per call, but returns a marker to track the paging of the cluster list across multiple ListClusters calls |
list-instance-fleets | Lists all available details about the instance fleets in a cluster. The instance fleet configuration is available only in Amazon EMR versions 4.8.0 and later, excluding 5.0.x versions |
list-instances | Provides information for all active EC2 instances and EC2 instances terminated in the last 30 days, up to a maximum of 2,000. EC2 instances in any of the following states are considered active: AWAITING_FULFILLMENT, PROVISIONING, BOOTSTRAPPING, RUNNING |
list-notebook-executions | Provides summaries of all notebook executions. You can filter the list based on multiple criteria such as status, time range, and editor ID. Returns a maximum of 50 notebook executions and a marker to track the paging of a longer notebook execution list across multiple ListNotebookExecutions calls |
list-security-configurations | Lists all the security configurations visible to this account, providing their creation dates and times, and their names. This call returns a maximum of 50 security configurations per call, but returns a marker to track the paging of the security configuration list across multiple ListSecurityConfigurations calls |
list-steps | Provides a list of steps for the cluster in reverse order unless you specify stepIds with the request or filter by StepStates. You can specify a maximum of 10 stepIds |
list-studio-session-mappings | Returns a list of all user or group session mappings for the Amazon EMR Studio specified by StudioId |
list-studios | Returns a list of all Amazon EMR Studios associated with the AWS account. The list includes details such as ID, Studio Access URL, and creation time for each Studio |
modify-cluster | Modifies the number of steps that can be executed concurrently for the cluster specified using ClusterID |
modify-instance-fleet | Modifies the target On-Demand and target Spot capacities for the instance fleet with the specified InstanceFleetID within the cluster specified using ClusterID. The call either succeeds or fails atomically. The instance fleet configuration is available only in Amazon EMR versions 4.8.0 and later, excluding 5.0.x versions |
modify-instance-groups | ModifyInstanceGroups modifies the number of nodes and configuration settings of an instance group. The input parameters include the new target instance count for the group and the instance group ID. The call will either succeed or fail atomically |
put-auto-scaling-policy | Creates or updates an automatic scaling policy for a core instance group or task instance group in an Amazon EMR cluster. The automatic scaling policy defines how an instance group dynamically adds and terminates EC2 instances in response to the value of a CloudWatch metric |
put-block-public-access-configuration | Creates or updates an Amazon EMR block public access configuration for your AWS account in the current Region. For more information, see Configure Block Public Access for Amazon EMR in the Amazon EMR Management Guide |
put-managed-scaling-policy | Creates or updates a managed scaling policy for an Amazon EMR cluster. The managed scaling policy defines the limits for resources, such as EC2 instances that can be added or terminated from a cluster. The policy only applies to the core and task nodes. The master node cannot be scaled after initial configuration |
remove-auto-scaling-policy | Removes an automatic scaling policy from a specified instance group within an EMR cluster |
remove-managed-scaling-policy | Removes a managed scaling policy from a specified EMR cluster |
remove-tags | Removes tags from an Amazon EMR resource. Tags make it easier to associate clusters in various ways, such as grouping clusters to track your Amazon EMR resource allocation costs. For more information, see Tag Clusters |
start-notebook-execution | Starts a notebook execution |
stop-notebook-execution | Stops a notebook execution |
update-studio | Updates an Amazon EMR Studio configuration, including attributes such as name, description, and subnets |
update-studio-session-mapping | Updates the session policy attached to the user or group for the specified Amazon EMR Studio |
terminate-clusters | Shuts down one or more clusters, each specified by cluster ID. Use this command only on clusters that do not have termination protection enabled. Clusters with termination protection enabled are not terminated. When a cluster is shut down, any step not yet completed is canceled and the Amazon EC2 instances in the cluster are terminated. Any log files not already saved are uploaded to Amazon S3 if a --log-uri was specified when the cluster was created. The maximum number of clusters allowed in the list is 10. The command is asynchronous. Depending on the configuration of the cluster, it may take from 5 to 20 minutes for the cluster to terminate completely and release allocated resources such as Amazon EC2 instances |
modify-cluster-attributes | Modifies the cluster attributes 'visible-to-all-users' and 'termination-protected' |
install-applications | Installs applications on a running cluster. Currently only Hive and Pig can be installed using this command, and it is supported only on AMI versions 2.x and 3.x |
create-cluster | Creates an Amazon EMR cluster with the specified configurations. Quick start: `aws emr create-cluster --release-label <release-label> --instance-type <instance-type> --instance-count <instance-count>`. Values for the following can be set in the AWS CLI config file using the `aws configure set` command: --service-role, --log-uri, and the InstanceProfile and KeyName arguments under --ec2-attributes |
add-steps | Adds a list of steps to a cluster |
restore-from-hbase-backup | Restores HBase from S3. This command is only available when using Amazon EMR versions earlier than 4.0 |
create-hbase-backup | Creates an HBase backup in S3. This command is only available when using Amazon EMR versions earlier than 4.0 |
schedule-hbase-backup | Adds a step to schedule automated HBase backups. This command is only available when using Amazon EMR versions earlier than 4.0 |
disable-hbase-backups | Adds a step to disable automated HBase backups. This command is only available when using Amazon EMR versions earlier than 4.0 |
create-default-roles | Creates the default IAM roles EMR_EC2_DefaultRole and EMR_DefaultRole, which can be used when creating a cluster with the create-cluster command. The default roles for EMR use managed policies, which are updated automatically to support future EMR functionality. If you do not have a Service Role and Instance Profile variable set for your create-cluster command in the AWS CLI config file, create-default-roles automatically sets these variables to the default roles. If you have already set a value for Service Role or Instance Profile, create-default-roles does not overwrite it in the AWS CLI config file. You can view settings for variables in the config file using the "aws configure get" command |
ssh | Connects over SSH to the master node of the cluster. A value for the variable Key Pair File can be set in the AWS CLI config file using the `aws configure set emr.key_pair_file <value>` command |
socks | Creates a SOCKS tunnel on port 8157 from your machine to the master node. A value for the variable Key Pair File can be set in the AWS CLI config file using the `aws configure set emr.key_pair_file <value>` command |
get | Gets a file from the master node. A value for the variable Key Pair File can be set in the AWS CLI config file using the `aws configure set emr.key_pair_file <value>` command |
put | Puts a file onto the master node. A value for the variable Key Pair File can be set in the AWS CLI config file using the `aws configure set emr.key_pair_file <value>` command |
wait | Waits until a particular condition is satisfied. Each subcommand polls an API until the listed requirement is met |
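
Several subcommands in the table cap how many IDs a single request accepts: terminate-clusters takes at most 10 cluster IDs and cancel-steps at most 256 step IDs. The following is a minimal Python sketch of one way to split a longer ID list into compliant batches and build the matching command lines. It only constructs argument lists and does not call AWS. The helper names (`chunked`, `terminate_commands`, `cancel_step_commands`) and the placeholder IDs are hypothetical. The option names `--cluster-ids`, `--cluster-id`, and `--step-ids` are not listed in the table above; they are assumed from the AWS CLI and should be checked against `aws emr <subcommand> help`.

```python
from typing import Iterator, List, Sequence

# Per-request limits quoted in the subcommand table above.
MAX_CLUSTERS_PER_TERMINATE = 10   # terminate-clusters
MAX_STEPS_PER_CANCEL = 256        # cancel-steps


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def terminate_commands(cluster_ids: Sequence[str]) -> List[List[str]]:
    """Build one terminate-clusters argv per batch of cluster IDs."""
    # Assumption: --cluster-ids accepts a space-separated list of IDs.
    return [
        ["aws", "emr", "terminate-clusters", "--cluster-ids", *batch]
        for batch in chunked(cluster_ids, MAX_CLUSTERS_PER_TERMINATE)
    ]


def cancel_step_commands(cluster_id: str, step_ids: Sequence[str]) -> List[List[str]]:
    """Build one cancel-steps argv per batch of step IDs for one cluster."""
    # Assumption: --cluster-id and --step-ids are the relevant option names.
    return [
        ["aws", "emr", "cancel-steps", "--cluster-id", cluster_id,
         "--step-ids", *batch]
        for batch in chunked(step_ids, MAX_STEPS_PER_CANCEL)
    ]


if __name__ == "__main__":
    # Placeholder IDs for illustration only.
    example_ids = [f"j-EXAMPLE{i:02d}" for i in range(12)]
    for argv in terminate_commands(example_ids):
        print(" ".join(argv))
```

Each returned list can be passed to `subprocess.run(argv, check=True)` once the option names have been verified. Because both commands are described as asynchronous, a successful call means only that the request was accepted, so the resulting state still needs to be checked afterwards (for example with describe-cluster or list-steps).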