MLOps Basics [Week 7]: Container Registry - AWS ECR
Author: Raviraja Ganta (@raviraja_ganta)
What is a Container Registry?
A container registry is a place to store container images. A container image is a file, comprised of multiple layers, that can execute an application in a single instance. Hosting all the images in one central location allows users to commit, identify, and pull images when needed.
There are many tools with which we can store container images (Docker Hub, Google Container Registry, Azure Container Registry, and many more). I will be using AWS ECR.
In this post, I will be going through the following topics:
Basics of S3
Programmatic access to S3
Configuring AWS S3 as remote storage in DVC
Basics of ECR
Configuring GitHub Actions to use S3, ECR
Basics of S3
What is S3?
Amazon Simple Storage Service (S3) is storage for the internet. It is designed for large-capacity, low-cost storage across multiple geographical regions.
Amazon S3 provides developers and IT teams with secure, durable, and highly scalable object storage.
How is data organized in S3?
Data in S3 is organized in the form of buckets.
A Bucket is a logical unit of storage in S3.
A Bucket contains objects which contain the data and metadata.
Before adding any data to S3, the user has to create a bucket that will be used to store objects.
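To make the bucket and object model concrete, here is a minimal boto3 sketch (this jumps ahead to the programmatic access covered below; bucket names must be globally unique, so models-dvc is illustrative):

import boto3

# connect to S3 (assumes the credentials configured later in this post)
s3 = boto3.resource('s3')

# a bucket is the top-level unit of storage
bucket = s3.create_bucket(
    Bucket='models-dvc',
    CreateBucketConfiguration={'LocationConstraint': 'us-west-2'},
)

# an object inside the bucket carries the data plus metadata
bucket.put_object(Key='sample.txt', Body=b'hello s3')

The console offers the same flow with a few clicks, which is what we will use next.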
Creating a bucket
Sign in to the AWS Management Console and open the Amazon S3 console at https://console.aws.amazon.com/s3/
Click on Create Bucket, fill in the bucket name details, and create the bucket.
Select any sample file and upload it to the newly created bucket.
Now that we have seen how to create a bucket and upload files, let's see how to access S3 programmatically.
Programmatic access to S3
We can access S3 either via the CLI or a programming language. Let's see both ways.
Credentials are required to access any AWS service. There are different ways of configuring credentials; let's look at a simple one.
- Go to My Security Credentials
- Navigate to the Access Keys section and click on the Create New Access Key button.
This will download a csv file containing the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.
Do not share the secrets with others
Set the access key ID and secret access key values as environment variables.
export AWS_ACCESS_KEY_ID=<ACCESS KEY ID>
export AWS_SECRET_ACCESS_KEY=<ACCESS SECRET>
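To sanity-check that boto3 picks up these environment variables, a quick STS call can be used (not part of the original workflow, just a verification sketch):

import boto3

# boto3 reads AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY from the environment
sts = boto3.client('sts')
identity = sts.get_caller_identity()

# prints the account id and the ARN of the user the keys belong to
print(identity['Account'], identity['Arn'])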
Accessing S3 using the CLI
Download the AWS CLI package and install it from here
The aws cli comes with a lot of commands. Check the documentation here
Let's see what is present in the S3 bucket using the CLI.
aws s3 ls s3://models-dvc/
Output looks like
(base) ravirajas-MacBook-Pro » aws s3 ls s3://models-dvc/
2021-07-24 12:39:21 22 sample.txt
For the full list of available commands, refer to the documentation here
Accessing S3 using Python
Install the boto3 library, which is the AWS SDK for Python
pip install boto3
The following code prints the contents of the S3 bucket.
import boto3

# connect to S3 and point at the bucket created earlier
s3 = boto3.resource('s3')
bucket = s3.Bucket('models-dvc')

# iterate over every object in the bucket and print its key
for obj in bucket.objects.all():
    print(obj.key)
Output looks like:
sample.txt
For the full list of available methods, refer to the documentation here
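Listing is not the only operation; the same resource can also download objects. A small sketch along the same lines (the local file name is illustrative):

import boto3

s3 = boto3.resource('s3')

# copy s3://models-dvc/sample.txt to a local file of the same name
s3.Bucket('models-dvc').download_file('sample.txt', 'sample.txt')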
Configuring AWS S3 as remote storage in DVC
Let's see how to configure S3 as the remote storage in DVC, to which trained models can be pushed.
Let's create a folder called trained_models in S3, which will be used for storing the trained models.
In order to use DVC with S3, make sure you install DVC with S3 support.
pip install "dvc[s3]"
Initialise DVC (if not already initialised) using the following command:
dvc init
Configure the remote storage to point to the S3 location.
Copy the S3 URL of the trained_models folder from the console, then add it as the remote storage for models in DVC.
dvc remote add -d model-store s3://models-dvc/trained_models/
Make sure the AWS credentials are set in ENV.
Now let's add the trained model to dvc using the following command:
cd dvcfiles
dvc add ../models/model.onnx --file trained_model.dvc
Push the model to remote storage
dvc push trained_model.dvc
Once the model is pushed via DVC, refresh the S3 console to see the uploaded files.
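Once pushed, the model can also be fetched from Python via DVC's API instead of the CLI. A minimal sketch, assuming the tracked path models/model.onnx (based on this project's layout) and that the code runs inside this DVC repository:

import dvc.api

# DVC resolves the file from the S3 remote configured above;
# the path is an assumption based on this project's layout
with dvc.api.open('models/model.onnx', mode='rb') as f:
    model_bytes = f.read()

print(f"fetched {len(model_bytes)} bytes")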
Basics of ECR
In the previous week, we built the container using CI/CD, but the image was not persisted anywhere for further usage. This is where a Container Registry comes into the picture.
Search for ECR in the AWS console and click on Get Started
Create a repository with the name mlops-basics when prompted
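The same repository can also be created programmatically; a hedged boto3 sketch that mirrors the console's Create repository flow:

import boto3

# region and repository name follow the values used in this post
ecr = boto3.client('ecr', region_name='us-west-2')
ecr.create_repository(repositoryName='mlops-basics')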
Let's build the docker image and push it to ECR.
Before building the docker image, we need to modify the Dockerfile. Till now I have been using Google Drive as the remote storage; that needs to be changed to S3.
The dockerfile looks like:
FROM huggingface/transformers-pytorch-cpu:latest
COPY ./ /app
WORKDIR /app
ARG AWS_ACCESS_KEY_ID
ARG AWS_SECRET_ACCESS_KEY
# aws credentials configuration
ENV AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID \
AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY
# install requirements
RUN pip install "dvc[s3]" # since s3 is the remote storage
RUN pip install -r requirements_inference.txt
# initialise dvc
RUN dvc init --no-scm
# configuring remote server in dvc
RUN dvc remote add -d model-store s3://models-dvc/trained_models/
RUN cat .dvc/config
# pulling the trained model
RUN dvc pull dvcfiles/trained_model.dvc
ENV LC_ALL=C.UTF-8
ENV LANG=C.UTF-8
# running the application
EXPOSE 8000
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
Build the docker image using the command:
docker build --build-arg AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID --build-arg AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY -t inference:test .
Now let's push the image to ECR.
The commands required to push the image to ECR can be found in the ECR console itself
Following the commands there:
- Authenticating docker client to ECR
aws ecr get-login-password --region us-west-2 | docker login --username AWS --password-stdin 246113150184.dkr.ecr.us-west-2.amazonaws.com
- Tagging the image
docker tag inference:test 246113150184.dkr.ecr.us-west-2.amazonaws.com/mlops-basics:latest
- Pushing the image
docker push 246113150184.dkr.ecr.us-west-2.amazonaws.com/mlops-basics:latest
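To confirm that the push landed, the registry can be queried programmatically as well; a small boto3 sketch using the repository name from this post:

import boto3

ecr = boto3.client('ecr', region_name='us-west-2')

# list the tags and push timestamps of the images in the repository
response = ecr.describe_images(repositoryName='mlops-basics')
for image in response['imageDetails']:
    print(image.get('imageTags'), image['imagePushedAt'])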
Configuring GitHub Actions to use S3, ECR
Now let's see how to configure S3 and ECR in GitHub Actions.
We need AWS credentials for fetching the model from S3 and pushing the image to ECR. We can't share this information publicly. Fortunately, GitHub Actions has a way to store this information securely: it's called Secrets.
- Go to the Settings tab of the repository
- Go to the Secrets section and click on the New repository secret button
Save the following secrets:
- AWS_ACCESS_KEY_ID
- AWS_SECRET_ACCESS_KEY
- AWS_ACCOUNT_ID (this is the account ID of the profile)
These values can be used in GitHub Actions in the following manner:
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
AWS_ACCOUNT_ID: ${{ secrets.AWS_ACCOUNT_ID }}
Let's modify the workflow file.
The GitHub Actions Marketplace comes with a lot of predefined actions that are useful for us:
- aws-actions/configure-aws-credentials@v1 configures the AWS credential environment variables for use in other GitHub Actions.
- jwalton/gh-ecr-push@v1 pushes/pulls the image to/from ECR.
name: Create Docker Container
on: [push]
jobs:
  mlops-container:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: ./week_7_ecr
    steps:
      - name: Checkout
        uses: actions/checkout@v2
        with:
          ref: ${{ github.ref }}
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v1
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-west-2
      - name: Build container
        run: |
          docker build --build-arg AWS_ACCOUNT_ID=${{ secrets.AWS_ACCOUNT_ID }} \
            --build-arg AWS_ACCESS_KEY_ID=${{ secrets.AWS_ACCESS_KEY_ID }} \
            --build-arg AWS_SECRET_ACCESS_KEY=${{ secrets.AWS_SECRET_ACCESS_KEY }} \
            --tag mlops-basics .
      - name: Push2ECR
        id: ecr
        uses: jwalton/gh-ecr-push@v1
        with:
          access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          region: us-west-2
          image: mlops-basics:latest
Let's understand what is happening here:
- The job runs on the ubuntu-latest runner
- Clones the code and navigates to the week_7_ecr directory
- Sets the AWS environment variables using the aws-actions/configure-aws-credentials@v1 action
- Builds the image and tags it with mlops-basics
- Pushes the image to ECR using the jwalton/gh-ecr-push@v1 action
The output can be seen in the Actions tab on GitHub and in the ECR console.
🔚
This concludes the post. We have seen how to automatically create a docker image using GitHub Actions and save it to ECR, and how to use S3 as the remote storage in DVC.
Complete code for this post can also be found here: Github