This tutorial walks you through deploying OpenAI's Contrastive Language–Image Pre-training (CLIP) model for inference on Amazon SageMaker. The goal is to show you how to create an endpoint for real-time inference and how to use SageMaker's Batch Transform feature for offline inference.
To keep things consistent, and a little more interesting, we'll use a running theme throughout this tutorial: identifying and classifying images of different types of animals.
💡 Mixpeek simplifies all of this with a package that handles ML model deployment, hosting, versioning, tuning, and inference at scale, all without your data leaving your AWS account.
Prerequisites
- An AWS account
- Familiarity with Python, AWS, and machine learning concepts
- A copy of the CLIP model, accessible in S3
Step 1: Setting Up Your Environment
First, log in to your AWS account and go to the SageMaker console. In your desired region, create a new SageMaker notebook instance (e.g., 'clip-notebook'). Once the instance is ready, open Jupyter and create a new Python 3 notebook.
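If you'd rather script this than click through the console, a minimal boto3 sketch works too (the instance type and the role ARN placeholder here are assumptions; substitute your own):

import boto3

sm = boto3.client('sagemaker')

# Hypothetical values: choose your own instance type and execution role ARN.
sm.create_notebook_instance(
    NotebookInstanceName='clip-notebook',
    InstanceType='ml.t3.medium',
    RoleArn='arn:aws:iam::<your-account-id>:role/<your-sagemaker-role>'
)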
In this notebook, let's start by importing the necessary libraries:
import sagemaker
from sagemaker import get_execution_role
Step 2: Define S3 Bucket and Roles
Next, we need to define our S3 bucket and the IAM role:
sagemaker_session = sagemaker.Session()
# Get a SageMaker-compatible role used by this Notebook Instance.
role = get_execution_role()
bucket = sagemaker_session.default_bucket()
prefix = 'clip-model'
Step 3: Upload the Model to S3
You'll need to upload your trained CLIP model to an S3 bucket. Here's how:
model_location = sagemaker_session.upload_data(
    'model_path',
    bucket=bucket,
    key_prefix=prefix
)

Remember to replace 'model_path' with the path to your model file.
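One detail worth flagging: SageMaker expects the model artifact to be a single .tar.gz archive. If your CLIP weights aren't packaged yet, here's a minimal sketch, assuming they live in a hypothetical local clip_model/ directory:

import tarfile

# Bundle the model directory into the tar.gz format SageMaker expects.
with tarfile.open('model.tar.gz', 'w:gz') as tar:
    tar.add('clip_model', arcname='.')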
Step 3.5: Verify the Model in S3 with boto3
In this step, we'll use boto3 to check if our model was successfully uploaded to our S3 bucket. First, let's import the library and initialize our S3 client:
import boto3
s3 = boto3.client('s3')
Next, let's list the objects under our model prefix:

response = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
for content in response.get('Contents', []):
    print(content['Key'])
You should see the path to your uploaded model in the printed output.
Step 4: Create a Model
Once the model is uploaded to S3, you can create a SageMaker model. To do this, you need a Docker container that contains the necessary libraries and dependencies to run CLIP. If you don't have this Docker image yet, you would need to create one. For the purpose of this tutorial, let's assume you have a Docker image named 'clip-docker-image' in your Elastic Container Registry (ECR).
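Before wiring the image into a model, it can be worth a quick sanity check that the repository actually exists in ECR. A small boto3 lookup, assuming the repository is named clip-docker-image in your default region:

import boto3

ecr = boto3.client('ecr')

# Raises an error if the repository doesn't exist; otherwise prints image tags.
images = ecr.describe_images(repositoryName='clip-docker-image')
for detail in images['imageDetails']:
    print(detail.get('imageTags'))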
from sagemaker.model import Model

clip_model = Model(
    model_data=model_location,
    # Replace with the full ECR URI of your clip-docker-image.
    image_uri='<account-id>.dkr.ecr.<region>.amazonaws.com/clip-docker-image:latest',
    role=role
)
Step 5: Deploy the Model for Real-Time Inference
With the model in place, you can now deploy it to a real-time SageMaker endpoint named 'clip-endpoint':
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

clip_predictor = clip_model.deploy(
    initial_instance_count=1,
    instance_type='ml.m5.large',
    endpoint_name='clip-endpoint',
    serializer=JSONSerializer(),   # JSON in/out so predict() accepts plain dicts
    deserializer=JSONDeserializer()
)
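Deployment typically takes several minutes. While it provisions, you can poll the endpoint's status with boto3:

import boto3

sm = boto3.client('sagemaker')

# 'Creating' while provisioning, 'InService' once it's ready to serve traffic.
status = sm.describe_endpoint(EndpointName='clip-endpoint')['EndpointStatus']
print(status)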
Step 6: Create a Predictor from an Existing Deployment
If you already have a running endpoint from an earlier session, you can attach a predictor to it instead of redeploying (swap in your own endpoint name):

from sagemaker.predictor import Predictor

endpoint_name = "huggingface-pytorch-inference-2023-03-18-13-33-18-657"

clip_predictor = Predictor(
    endpoint_name,
    serializer=JSONSerializer(),   # same JSON in/out as in Step 5
    deserializer=JSONDeserializer()
)
You can now use this clip_predictor for inference, just like the one returned by deploy() in Step 5.
Making Inferences
Now you can use the predictor to make real-time inferences:
import requests
from PIL import Image
import numpy as np

# Fetch a test image (this COCO photo happens to contain two cats).
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
image_array = np.array(image)

data = {
    # Candidate animal captions for zero-shot classification; adjust this
    # payload schema to whatever your inference container expects.
    "inputs": ["a photo of a cat", "a photo of a dog", "a photo of a horse", "a photo of a rabbit"],
    "pixel_values": image_array.tolist()
}

# The JSON serializer attached to the predictor handles encoding,
# so we pass the dict directly.
response = clip_predictor.predict(data)
print(response)
Step 7: Offline Inference with Batch Transform
For offline inference, you can use SageMaker's Batch Transform feature. First, let's define a transformer:
clip_transformer = clip_model.transformer(
    instance_count=1,
    instance_type='ml.m5.large',
    strategy='SingleRecord',
    assemble_with='Line',
    output_path='s3://{}/{}/output'.format(bucket, prefix)
)
Then, start a transform job:
clip_transformer.transform(
    data='s3://{}/{}/input'.format(bucket, prefix),
    content_type='application/x-image',
    split_type='None'
)

clip_transformer.wait()
In this case, the input data is a collection of animal images stored in an S3 bucket.
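If you haven't staged those images yet, one option is to upload a local folder with the same session helper from Step 3 (animal_images/ is a hypothetical directory name):

# Upload a local folder of animal images to the transform input prefix.
input_location = sagemaker_session.upload_data(
    'animal_images',
    bucket=bucket,
    key_prefix='{}/input'.format(prefix)
)
print(input_location)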
After the transform job completes, the predictions are stored at the S3 location specified in output_path.
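You can list and peek at those output files with boto3:

import boto3

s3 = boto3.client('s3')

# Each input image produces one output object under the output prefix.
listing = s3.list_objects_v2(Bucket=bucket, Prefix='{}/output'.format(prefix))
for obj in listing.get('Contents', []):
    body = s3.get_object(Bucket=bucket, Key=obj['Key'])['Body'].read()
    print(obj['Key'], body[:200])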
Sample inference response:
{
    "predictions": [
        {
            "label": "cat",
            "probability": 0.002
        },
        {
            "label": "dog",
            "probability": 0.98
        },
        {
            "label": "horse",
            "probability": 0.001
        },
        {
            "label": "rabbit",
            "probability": 0.017
        }
    ]
}
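To reduce a response of that shape to a single classification, take the label with the highest probability; a small self-contained sketch, assuming the structure shown above:

# A response parsed into a Python dict, shaped like the sample above.
result = {
    "predictions": [
        {"label": "cat", "probability": 0.002},
        {"label": "dog", "probability": 0.98},
        {"label": "horse", "probability": 0.001},
        {"label": "rabbit", "probability": 0.017},
    ]
}

best = max(result["predictions"], key=lambda p: p["probability"])
print(best["label"])  # -> 'dog'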
Step Forever: Overwhelmed?
If not yet, think about versioning, maintenance, continual improvement, and doing all of it at scale. We at Mixpeek are focused on abstracting all of this into one-click model hosting that never leaves your AWS account.