Install MLRun on AWS#

For AWS users, the easiest way to install MLRun is to use a native AWS deployment. This option deploys MLRun on an AWS EKS service using a CloudFormation stack.

Prerequisites#

  1. An AWS account with permissions that include the ability to:

    • Run a CloudFormation stack

    • Create an EKS cluster

    • Create EC2 instances

    • Create VPC

    • Create S3 buckets

    • Deploy and pull images from ECR

    For the full set of required permissions, download the IAM policy or copy the IAM policy below:

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "BasicServices",
                "Effect": "Allow",
                "Action": [
                    "autoscaling:*",
                    "cloudwatch:*",
                    "elasticloadbalancing:*",
                    "sns:*",
                    "ec2:*",
                    "s3:*",
                    "s3-object-lambda:*",
                    "eks:*",
                    "elasticfilesystem:*",
                    "cloudformation:*",
                    "acm:*",
                    "route53:*"
                ],
                "Resource": "*"
            },
            {
                "Sid": "ServiceLinkedRoles",
                "Effect": "Allow",
                "Action": "iam:CreateServiceLinkedRole",
                "Resource": "*",
                "Condition": {
                    "StringEquals": {
                        "iam:AWSServiceName": [
                            "autoscaling.amazonaws.com",
                            "ec2scheduled.amazonaws.com",
                            "elasticloadbalancing.amazonaws.com",
                            "spot.amazonaws.com",
                            "spotfleet.amazonaws.com",
                            "transitgateway.amazonaws.com"
                        ]
                    }
                }
            },
            {
                "Sid": "IAMPermissions",
                "Effect": "Allow",
                "Action": [
                    "iam:AddRoleToInstanceProfile",
                    "iam:AttachRolePolicy",
                    "iam:TagOpenIDConnectProvider",
                    "iam:CreateInstanceProfile",
                    "iam:CreateOpenIDConnectProvider",
                    "iam:CreateRole",
                    "iam:CreateServiceLinkedRole",
                    "iam:DeleteInstanceProfile",
                    "iam:DeleteOpenIDConnectProvider",
                    "iam:DeleteRole",
                    "iam:DeleteRolePolicy",
                    "iam:DetachRolePolicy",
                    "iam:GenerateServiceLastAccessedDetails",
                    "iam:GetAccessKeyLastUsed",
                    "iam:GetAccountPasswordPolicy",
                    "iam:GetAccountSummary",
                    "iam:GetGroup",
                    "iam:GetInstanceProfile",
                    "iam:GetLoginProfile",
                    "iam:GetOpenIDConnectProvider",
                    "iam:GetPolicy",
                    "iam:GetPolicyVersion",
                    "iam:GetRole",
                    "iam:GetRolePolicy",
                    "iam:GetServiceLastAccessedDetails",
                    "iam:GetUser",
                    "iam:ListAccessKeys",
                    "iam:ListAccountAliases",
                    "iam:ListAttachedGroupPolicies",
                    "iam:ListAttachedRolePolicies",
                    "iam:ListAttachedUserPolicies",
                    "iam:ListGroupPolicies",
                    "iam:ListGroups",
                    "iam:ListGroupsForUser",
                    "iam:ListInstanceProfilesForRole",
                    "iam:ListMFADevices",
                    "iam:ListOpenIDConnectProviders",
                    "iam:ListPolicies",
                    "iam:ListPoliciesGrantingServiceAccess",
                    "iam:ListRolePolicies",
                    "iam:ListRoles",
                    "iam:ListRoleTags",
                    "iam:ListSAMLProviders",
                    "iam:ListSigningCertificates",
                    "iam:ListUserPolicies",
                    "iam:ListUsers",
                    "iam:ListUserTags",
                    "iam:PassRole",
                    "iam:PutRolePolicy",
                    "iam:RemoveRoleFromInstanceProfile",
                    "kms:CreateGrant",
                    "kms:CreateKey",
                    "kms:Decrypt",
                    "kms:DescribeKey",
                    "kms:Encrypt",
                    "kms:GenerateDataKeyWithoutPlaintext",
                    "kms:GetKeyPolicy",
                    "kms:GetKeyRotationStatus",
                    "kms:ListResourceTags",
                    "kms:PutKeyPolicy",
                    "kms:ScheduleKeyDeletion",
                    "kms:TagResource"
                ],
                "Resource": "*"
            },
            {
                "Sid": "AllowLambda",
                "Effect": "Allow",
                "Action": [
                    "lambda:CreateAlias",
                    "lambda:CreateCodeSigningConfig",
                    "lambda:CreateEventSourceMapping",
                    "lambda:CreateFunction",
                    "lambda:CreateFunctionUrlConfig",
                    "lambda:Delete*",
                    "lambda:Get*",
                    "lambda:InvokeAsync",
                    "lambda:InvokeFunction",
                    "lambda:InvokeFunctionUrl",
                    "lambda:List*",
                    "lambda:PublishLayerVersion",
                    "lambda:PublishVersion",
                    "lambda:PutFunctionCodeSigningConfig",
                    "lambda:PutFunctionConcurrency",
                    "lambda:PutFunctionEventInvokeConfig",
                    "lambda:PutProvisionedConcurrencyConfig",
                    "lambda:TagResource",
                    "lambda:UntagResource",
                    "lambda:UpdateAlias",
                    "lambda:UpdateCodeSigningConfig",
                    "lambda:UpdateEventSourceMapping",
                    "lambda:UpdateFunctionCode",
                    "lambda:UpdateFunctionCodeSigningConfig",
                    "lambda:UpdateFunctionConfiguration",
                    "lambda:UpdateFunctionEventInvokeConfig",
                    "lambda:UpdateFunctionUrlConfig"
                ],
                "Resource": "*"
            },
            {
                "Sid": "CertificateService",
                "Effect": "Allow",
                "Action": "iam:CreateServiceLinkedRole",
                "Resource": "arn:aws:iam::*:role/aws-service-role/acm.amazonaws.com/AWSServiceRoleForCertificateManager*",
                "Condition": {
                    "StringEquals": {
                        "iam:AWSServiceName": "acm.amazonaws.com"
                    }
                }
            },
            {
                "Sid": "DeleteRole",
                "Effect": "Allow",
                "Action": [
                    "iam:DeleteServiceLinkedRole",
                    "iam:GetServiceLinkedRoleDeletionStatus",
                    "iam:GetRole"
                ],
                "Resource": "arn:aws:iam::*:role/aws-service-role/acm.amazonaws.com/AWSServiceRoleForCertificateManager*"
            },
            {
                "Sid": "SSM",
                "Effect": "Allow",
                "Action": [
                    "logs:*",
                    "ssm:AddTagsToResource",
                    "ssm:GetParameter",
                    "ssm:DeleteParameter",
                    "ssm:PutParameter",
                    "cloudtrail:GetTrail",
                    "cloudtrail:ListTrails"
                ],
                "Resource": "*"
            }
        ]
    }
    

    For more information, see how to create a new AWS account and policies and permissions in IAM.

  2. A Route 53 domain configured in the same AWS account. Specify the full domain name in the Route 53 hosted DNS domain field (under Iguazio MLRun configuration below). External domain registration is currently not supported. For more information, see What is Amazon Route 53?.
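As a quick sanity check before attaching the IAM policy from item 1, the sketch below lists which AWS service prefixes each policy statement allows. It uses only the Python standard library. The function name `summarize_policy` and the abbreviated `policy` sample are illustrative assumptions, not part of MLRun or any AWS tool.

```python
import json


def summarize_policy(policy: dict) -> dict:
    """Map each statement Sid to the sorted AWS service prefixes it allows."""
    summary = {}
    for stmt in policy.get("Statement", []):
        actions = stmt.get("Action", [])
        # "Action" may be a single string or a list of strings.
        if isinstance(actions, str):
            actions = [actions]
        services = sorted({action.split(":", 1)[0] for action in actions})
        summary[stmt.get("Sid", "<no Sid>")] = services
    return summary


# Abbreviated sample for illustration; in practice, load the full policy
# from the file you downloaded or copied.
policy = json.loads("""
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "IAMPermissions",
            "Effect": "Allow",
            "Action": ["iam:CreateRole", "iam:PassRole", "kms:CreateKey"],
            "Resource": "*"
        },
        {
            "Sid": "CertificateService",
            "Effect": "Allow",
            "Action": "iam:CreateServiceLinkedRole",
            "Resource": "arn:aws:iam::*:role/aws-service-role/acm.amazonaws.com/AWSServiceRoleForCertificateManager*"
        }
    ]
}
""")

for sid, services in summarize_policy(policy).items():
    print(sid, services)
```

A summary like this makes it easier to review the policy with your security team before granting it.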

Notes

The MLRun software is free of charge; however, there is a cost for the AWS infrastructure services such as EKS, EC2, S3, and ECR. The actual price depends on many factors, including the region, the number of EC2 instances, the amount of storage consumed, and data transfer costs. Other factors include reserved instance configuration, Savings Plans, and any AWS credits associated with your account. It is recommended to use the AWS Pricing Calculator to estimate the expected cost, and AWS Cost Explorer to monitor costs and set up alerts.

Post deployment expectations#

The key components deployed on your EKS cluster are:

  • MLRun server (including the feature store and the MLRun graph)

  • MLRun UI

  • Kubeflow Pipelines

  • Real time serverless framework (Nuclio)

  • Spark operator

  • JupyterLab

  • Grafana

Configuration settings#

Make sure you are logged in to the correct AWS account.

Click the button below to deploy MLRun.

[Image: AWS Launch Stack button]

After clicking the icon, the browser directs you to the CloudFormation stack page in your AWS account, or redirects you to the AWS login page if you are not currently logged in.

Note

You must fill in fields marked as mandatory (m) for the configuration to complete. Fields marked as optional (o) can be left blank.

  1. Stack name (m) — the name of the stack. You cannot continue if left blank. This field becomes the logical id of the stack. Stack name can include letters (A-Z and a-z), numbers (0-9), and dashes (-). For example: “John-1”.
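The naming rule above can be checked locally before opening the console. This is a minimal sketch: the allowed characters come from the description above, while the leading-letter requirement and the 128-character limit are CloudFormation naming rules that this guide does not state. The helper name `is_valid_stack_name` is hypothetical.

```python
import re

# Letters, digits, and dashes are allowed per the description above.
# The leading letter and 128-character maximum are CloudFormation rules
# that are not stated in this guide.
_STACK_NAME = re.compile(r"[A-Za-z][A-Za-z0-9-]{0,127}")


def is_valid_stack_name(name: str) -> bool:
    """Return True if the whole string is an acceptable stack name."""
    return _STACK_NAME.fullmatch(name) is not None


for candidate in ("John-1", "john_1", "1-john", ""):
    print(repr(candidate), is_valid_stack_name(candidate))
```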

Parameters

  1. EKS cluster name (m) — the name of the EKS cluster to create. The EKS cluster is used to run the MLRun services. For example: “John-1”.

VPC network Configuration

  1. Number of Availability Zones (m) — number of availability zones. The default is set to 3. Choose from the dropdown to change the number. The minimum is 2.

  2. Availability zones (m) — select the zones from the dropdown. The list is based on the region of the instance. The number of zones selected must match the value of Number of Availability Zones.

  3. Allowed external access CIDR (m) — range of IP addresses allowed to access the cluster. Addresses outside this range cannot access the cluster. Contact your IT manager or network administrator if you are not sure what to enter here.
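The three network settings above have simple constraints that can be checked before you submit the form. The sketch below uses Python's standard `ipaddress` module. The function names, the zone names, and the documentation-range addresses are illustrative assumptions; the CloudFormation template may apply additional validation of its own.

```python
import ipaddress


def check_network_settings(num_azs: int, zones: list[str], cidr: str) -> list[str]:
    """Return a list of problems found; an empty list means the values look consistent."""
    problems = []
    if num_azs < 2:
        problems.append("Number of Availability Zones must be at least 2")
    if len(set(zones)) != num_azs:
        problems.append(f"expected {num_azs} distinct zones, got {len(set(zones))}")
    try:
        # strict=True rejects values with host bits set, such as 203.0.113.7/24.
        ipaddress.ip_network(cidr, strict=True)
    except ValueError as exc:
        problems.append(f"invalid CIDR {cidr!r}: {exc}")
    return problems


def is_allowed(client_ip: str, cidr: str) -> bool:
    """Return True if client_ip falls inside the allowed CIDR range."""
    return ipaddress.ip_address(client_ip) in ipaddress.ip_network(cidr)


print(check_network_settings(3, ["us-east-1a", "us-east-1b", "us-east-1c"], "203.0.113.0/24"))
print(is_allowed("203.0.113.42", "203.0.113.0/24"))
```

Running `is_allowed` against your own public IP address is a quick way to confirm that you will still be able to reach the cluster after deployment.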

Amazon EKS configuration

  1. Additional EKS admin ARN (IAM user) (o) — add an additional admin user to the instance. Users can be added after the stack has been created. For more information see Create a kubeconfig for Amazon EKS.

  2. Instance type (m) — select from the dropdown list. The default is m5.4xlarge. For size considerations see Amazon EC2 Instance Types.

  3. Maximum Number of Nodes (m) — maximum number of nodes in the cluster. The number of nodes combined with the Instance type determines the AWS infrastructure cost.
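Because the node count and instance type drive the cost, a rough upper bound for worker-node compute is the maximum number of nodes multiplied by the hourly instance price and the hours in a month. The sketch below shows only that arithmetic. The hourly price is a placeholder, not a real AWS price, and the estimate ignores the EKS control plane, storage, and data transfer; use the AWS pricing calculator for real figures.

```python
def max_monthly_node_cost(
    max_nodes: int,
    hourly_price_usd: float,
    hours_per_month: float = 730.0,  # common approximation: 8760 hours / 12 months
) -> float:
    """Upper bound on worker-node compute cost if every node runs all month."""
    if max_nodes < 0 or hourly_price_usd < 0:
        raise ValueError("max_nodes and hourly_price_usd must be non-negative")
    return max_nodes * hourly_price_usd * hours_per_month


# Hypothetical value for illustration only; look up the on-demand rate for
# your instance type and region.
PLACEHOLDER_HOURLY_PRICE_USD = 1.00

print(max_monthly_node_cost(4, PLACEHOLDER_HOURLY_PRICE_USD))
```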

Amazon EC2 configuration

  1. SSH key name (o) — Users who wish to access the EC2 instance via SSH can enter an existing key. If left empty, it is possible to access the EC2 instance using the AWS Systems Manager Session Manager. For more information about SSH Keys see Amazon EC2 key pairs and Linux instances.

  2. Provision bastion host (m) — create a bastion host for SSH access to the Kubernetes nodes. The default is enabled. This allows SSH access to your EKS EC2 instances through a public IP.

Iguazio MLRun configuration

  1. Route 53 hosted DNS domain (m) — Enter the name of your registered Route 53 domain. Only Route 53 domains are accepted.

  2. The URL of your REDIS database (o) — This is only required if you’re using Redis with the online feature store. See how to configure the online feature store for more details.

Other parameters

  1. MLRun CE Helm Chart version (m) — the MLRun Community Edition version to install. Leave the default value for the latest CE release.

Capabilities

  1. Check all the capabilities boxes (m).

Press Create Stack to continue the deployment. The stack creates a VPC with an EKS cluster and deploys all the services on top of it.

Note

It could take up to 2 hours for your stack to be created.

Getting started#

When the stack is complete, go to the output tab for the stack you created. There are links for the MLRun UI, Jupyter, and the Kubeconfig command.

It’s recommended to go through the quick-start and the other tutorials in the documentation. These tutorials and demos are included with Jupyter, under its root folder.

Storage resources#

When installing the MLRun Community Edition via CloudFormation, several storage resources are created:

  • PVs via AWS storage provider: Used to hold the file system of the stack’s pods, including the MySQL database of MLRun. These are deleted when the stack is uninstalled.

  • S3 Bucket: A bucket named <EKS cluster name>-<Random string> is created in the AWS account that installs the stack (where <EKS cluster name> is the name of the EKS cluster you chose and <Random string> is part of the CloudFormation stack ID). You can see the bucket name in the output tab of the stack. The bucket is used for MLRun’s artifact storage, and is not deleted when uninstalling the stack. The user must empty the bucket and delete it.

  • Container Images in ECR: When building and deploying MLRun and Nuclio functions via the MLRun Community Edition, the function images are stored in an ECR belonging to the AWS account that installs the stack. These images persist in the account’s ECR and are not deleted either.
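Since the bucket outlives the stack, it helps to be able to pick it out of a list of bucket names later. The sketch below filters names by the `<EKS cluster name>-` prefix described above. It assumes the cluster name appears in lowercase in the bucket name (S3 bucket names are lowercase); the exact form of the random suffix is not specified here, so treat the result as candidates and confirm against the stack's output tab. The function name and sample names are hypothetical.

```python
def candidate_buckets(bucket_names: list[str], eks_cluster_name: str) -> list[str]:
    """Return bucket names that start with '<EKS cluster name>-'."""
    # Assumption: the cluster name is lowercased in the bucket name.
    prefix = f"{eks_cluster_name.lower()}-"
    return [name for name in bucket_names if name.startswith(prefix)]


# Hypothetical bucket names for illustration.
names = ["john-1-abc123", "john-10-xyz789", "team-logs"]
print(candidate_buckets(names, "John-1"))
```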

How to configure the online feature store#

The feature store can store data on a fast key-value database table for quick serving. This online feature store capability requires an external key-value database.

Currently the MLRun feature store supports the following options:

  • Redis

  • Iguazio key-value database

To use Redis, you must install Redis separately and provide the Redis URL when configuring the AWS CloudFormation stack. Refer to the Redis getting-started page for information about Redis installation.
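Before pasting a Redis URL into the stack parameters, you can check that it parses as expected. The sketch below uses the standard library's `urllib.parse`. It assumes the conventional `redis://` and `rediss://` (TLS) schemes and the default Redis port 6379; which forms the CloudFormation parameter accepts is not stated in this guide. The host names are placeholders.

```python
from urllib.parse import urlsplit


def parse_redis_url(url: str) -> tuple[str, int]:
    """Return (host, port) for a Redis URL, or raise ValueError if it is malformed."""
    parts = urlsplit(url)
    if parts.scheme not in ("redis", "rediss"):
        raise ValueError(f"unexpected scheme {parts.scheme!r}; expected redis:// or rediss://")
    if not parts.hostname:
        raise ValueError("Redis URL has no host")
    # parts.port raises ValueError itself if the port is not a valid number.
    return parts.hostname, parts.port or 6379


print(parse_redis_url("redis://redis.example.internal:6379"))
```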

Streaming support#

For online serving, it is often convenient to use the MLRun graph with a streaming engine. This allows managing queues between steps and functions. MLRun supports Kafka streams as well as Iguazio V3IO streams. See the examples on how to configure the MLRun serving graph with Kafka and V3IO.

Cleanup#

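To free up the resources used by MLRun:

  • Delete the CloudFormation stack.

  • Empty and delete the S3 bucket that was created for artifact storage.

  • Delete the function images that remain in ECR.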

You may also need to check any external storage that you used.