Set Up and Run OpenAI's CLIP on SageMaker for Inference

How to deploy and run OpenAI's CLIP model on Amazon SageMaker for efficient real-time and offline inference.

This tutorial walks you through deploying OpenAI's Contrastive Language–Image Pre-training (CLIP) model for inference using Amazon SageMaker. The goal is to show you how to create an endpoint for real-time inference and how to use SageMaker's Batch Transform feature for offline inference.

For consistency and to make things more interesting, we'll use a theme of identifying and classifying images of different types of animals throughout this tutorial.

💡 Mixpeek simplifies all of this with a package that handles ML model deployment, hosting, versioning, tuning, and inference at scale, all without your data leaving your AWS account.


Prerequisites

  • An AWS account
  • Familiarity with Python, AWS, and machine learning concepts
  • A copy of the CLIP model, accessible in S3

Step 1: Setting Up Your Environment

First, log in to your AWS account and go to the SageMaker console. In your desired region, create a new SageMaker notebook instance (e.g., 'clip-notebook'). Once the instance is ready, open Jupyter and create a new Python 3 notebook.

In this notebook, let's start by importing the necessary libraries:

import sagemaker
from sagemaker import get_execution_role

Step 2: Define S3 Bucket and Roles

Next, we need to define our S3 bucket and the IAM role:

sagemaker_session = sagemaker.Session()

# Get a SageMaker-compatible role used by this Notebook Instance.
role = get_execution_role()

bucket = sagemaker_session.default_bucket()
prefix = 'clip-model'

Step 3: Upload the Model to S3

You'll need to upload your trained CLIP model to an S3 bucket. Here's how:

model_location = sagemaker_session.upload_data(
    'model_path',            # path to your local model artifact (model.tar.gz)
    bucket=bucket,
    key_prefix=prefix
)

Remember to replace 'model_path' with the path to your model file.
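If your model isn't packaged yet, note that SageMaker expects model artifacts as a gzipped tarball (model.tar.gz). Here's a minimal sketch using Python's standard library, assuming a hypothetical clip_model/ directory holding your weights (we create a placeholder file purely for illustration):

```python
import os
import tarfile

# Hypothetical local directory holding your CLIP weights; the placeholder
# file stands in for your real model files.
os.makedirs('clip_model', exist_ok=True)
with open('clip_model/pytorch_model.bin', 'wb') as f:
    f.write(b'\x00')

# Package the directory into the model.tar.gz layout SageMaker expects:
# the archive root should contain the model files themselves.
with tarfile.open('model.tar.gz', 'w:gz') as tar:
    tar.add('clip_model', arcname='.')
```

You can then pass the resulting model.tar.gz path to upload_data as shown above.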

Step 3.5: Verify the Model in S3 with boto3

In this step, we'll use boto3 to check if our model was successfully uploaded to our S3 bucket. First, let's import the library and initialize our S3 client:

import boto3

s3 = boto3.client('s3')

Next, let's list all objects in our S3 bucket:

response = s3.list_objects(Bucket=bucket)

for content in response['Contents']:
    print(content['Key'])

You should see the path to your uploaded model in the printed output.

Step 4: Create a Model

Once the model is uploaded to S3, you can create a SageMaker model. To do this, you need a Docker container that contains the necessary libraries and dependencies to run CLIP. If you don't have this Docker image yet, you would need to create one. For the purpose of this tutorial, let's assume you have a Docker image named 'clip-docker-image' in your Elastic Container Registry (ECR).

from sagemaker.model import Model

clip_model = Model(
    image_uri='<account-id>.dkr.ecr.<region>.amazonaws.com/clip-docker-image:latest',  # your ECR image URI
    model_data=model_location,
    role=role,
    sagemaker_session=sagemaker_session
)

Step 5: Deploy the Model for Real-Time Inference

With the model in place, you can now deploy it to a SageMaker endpoint. The deploy call creates the endpoint configuration and endpoint for you:

clip_predictor = clip_model.deploy(
    initial_instance_count=1,
    instance_type='ml.m5.xlarge',  # choose an instance type that fits your latency and cost needs
    endpoint_name='clip-endpoint'
)

Step 6: Create a Predictor from an Existing Deployment

Create a predictor from an existing deployment:

from sagemaker.predictor import Predictor

# Use an existing endpoint
endpoint_name = "huggingface-pytorch-inference-2023-03-18-13-33-18-657"
clip_predictor = Predictor(endpoint_name)

You can now use this clip_predictor for inference, just like the one deployed in Step 5.

Making Inferences

Now you can use the predictor to make real-time inferences:

import requests
from PIL import Image
import numpy as np
import json

url = ""  # URL of an image to classify

image = Image.open(requests.get(url, stream=True).raw)
image_array = np.array(image)

data = {
    "inputs": "the mesmerizing performances of the leads keep the film grounded and keep the audience riveted.",
    "pixel_values": image_array.tolist()
}

response = clip_predictor.predict(json.dumps(data))
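The endpoint returns a JSON string. Here's a sketch of extracting the most likely label, assuming the response carries a predictions list of label/probability pairs; the exact schema depends on your container's inference script, and the response string below is a hypothetical example:

```python
import json

# Hypothetical response body; the real schema depends on your inference script.
response = '{"predictions": [{"label": "cat", "probability": 0.002}, {"label": "dog", "probability": 0.98}]}'

result = json.loads(response)

# Pick the label with the highest probability
best = max(result['predictions'], key=lambda p: p['probability'])
print(best['label'])  # → dog
```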

Step 7: Offline Inference with Batch Transform

For offline inference, you can use SageMaker's Batch Transform feature. First, let's define a transformer:

clip_transformer = clip_model.transformer(
    instance_count=1,
    instance_type='ml.m5.xlarge',
    output_path='s3://{}/{}/output'.format(bucket, prefix)
)

Then, start a transform job:

clip_transformer.transform(
    data='s3://{}/{}/input'.format(bucket, prefix),
    content_type='application/x-image'  # adjust to match your container's expected input format
)
clip_transformer.wait()

In this case, the input data is a collection of animal images stored in an S3 bucket.

After the transform job is completed, the predictions are stored in the S3 bucket specified in the output_path.
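Batch Transform writes one output object per input (with an .out suffix) under the output path. Here's a sketch of aggregating the top label across outputs, using hypothetical in-memory file contents in place of the actual S3 downloads:

```python
import json

# Hypothetical contents of two Batch Transform output files; in practice you
# would download these from the output_path in S3.
output_files = [
    '{"predictions": [{"label": "dog", "probability": 0.98}]}',
    '{"predictions": [{"label": "cat", "probability": 0.91}]}',
]

top_labels = []
for body in output_files:
    result = json.loads(body)
    # Keep the highest-probability label for each image
    top = max(result['predictions'], key=lambda p: p['probability'])
    top_labels.append(top['label'])

print(top_labels)  # → ['dog', 'cat']
```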

Sample inference response:

{
  "predictions": [
    { "label": "cat", "probability": 0.002 },
    { "label": "dog", "probability": 0.98 },
    { "label": "horse", "probability": 0.001 },
    { "label": "rabbit", "probability": 0.017 }
  ]
}
Step Forever: Overwhelmed?

If not, think about versioning, maintenance, improvement, and doing it all at scale. We at Mixpeek are focused on abstracting all of this into one-click model hosting that never leaves your AWS account.

About the author

Ethan Steininger

Probably outside.
