Reverse Video Search

Reverse video search lets us use a video clip as the query against videos that have been indexed in a vector store.

You may have used some kind of reverse image search before. Put simply, instead of searching using text: australian shepherds running, you can use an image: australian_shepherd_running.png. The search engine will then find all similar images based on that input.

But have you used reverse video search? The approach is the same: use your video as a query to find other videos.

💻 Code Snippets: https://github.com/mixpeek/use-cases/blob/master/reverse-video-search/reverse_video.py

📓 Runnable Jupyter Notebook: https://github.com/mixpeek/use-cases/blob/master/reverse-video-search/reverse_video.ipynb

🚀 Live Demo: https://mixpeek.com/video

Try it on Google Images: https://images.google.com/

In the example below, I'll upload a picture of an Australian Shepherd dog, and Google's reverse image search will find all similar pictures of Australian Shepherds.

There are tons of awesome use cases for reverse image search, like:

  • E-commerce: Helps customers find products by uploading images, increasing sales by simplifying the shopping experience.
  • Intellectual Property: Identifies unauthorized use of images, aiding in copyright enforcement and protecting creators' rights.
  • Content Verification: Verifies the authenticity of images in news and social media, combating misinformation.
  • Real Estate: Allows users to find properties by uploading photos, enhancing user experience and engagement.

For example, here's how you'd embed an image with Mixpeek:

from mixpeek import Mixpeek

mixpeek = Mixpeek("API_KEY")

embedding = mixpeek.embed.image(
  model_id="openai/clip-vit-large-patch14",
  input="s3://dog.png",
  input_type="url"
)

But what about video?

Reverse video search works the same way. We first embed a couple of videos, then provide a sample video as the search query.

For our index, we'll use a movie trailer from the 1940s classic, The Third Man:

1. Prepare the video(s)

Since there's an upper bound on the size of inputs to Mixpeek's video embedding endpoint, we need to preprocess the video first. This also helps ensure we get proper granularity; it's a similar technique to tokenizing a corpus.

First let's cut up the video using mixpeek's tools service:

from mixpeek import Mixpeek

mixpeek = Mixpeek("YOUR_API_KEY")

response = mixpeek.tools.process(
    modality="video",
    url="https://mixpeek-public-demo.s3.us-east-2.amazonaws.com/media-analysis/The+Third+Man++Official+Trailer.mp4",
    frame_interval=5,
    resolution=[720, 1280],
    return_base64=True
)

We're telling the video processor tool to cut the video into 5-frame intervals, resize each segment to 720x1280, and return each snippet as a base64 string.

The response will be something like this:

[
    {
        "base64_string": "...",
        "start_time": 0.0,
        "end_time": 1.0,
        "fps": 5.0,
        "duration": 41.8,
        "resolution": [
            768,
            432
        ],
        "size_kb": 69.93
    },
    ...
]

2. Embed the videos

Next we'll take the video segments from the video processor endpoint, and send each base64_string to the embed endpoint:

from mixpeek import Mixpeek

mixpeek = Mixpeek("YOUR_API_KEY")

# embed a single segment from the processor output
embedding = mixpeek.embed(
  modality="video",
  model="mixpeek/vuse-generic-v1",
  input=response[0]["base64_string"],
  input_type="base64"
)

This will return an embedding for the segment, so we just iterate through each base64_string from the tools processor and embed them, keeping the start_time and end_time values alongside each embedding.
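
Here's a minimal sketch of that loop, assuming the segments land in a MongoDB Atlas collection behind the vector_index search index we'll query later (the connection string, database, and collection names are placeholders):

from pymongo import MongoClient
from mixpeek import Mixpeek

mixpeek = Mixpeek("YOUR_API_KEY")

# placeholder connection: the collection backing the "vector_index" search index
collection = MongoClient("YOUR_MONGODB_URI")["demo"]["video_index"]

docs = []
for segment in response:  # `response` is the list returned by mixpeek.tools.process above
    emb = mixpeek.embed(
        modality="video",
        model="mixpeek/vuse-generic-v1",
        input=segment["base64_string"],
        input_type="base64"
    )
    docs.append({
        "embedding": emb["embedding"],        # the vector $vectorSearch will query
        "start_time": segment["start_time"],  # keep timestamps for the results
        "end_time": segment["end_time"],
        "file_url": "https://mixpeek-public-demo.s3.us-east-2.amazonaws.com/media-analysis/The+Third+Man++Official+Trailer.mp4"
    })

collection.insert_many(docs)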

3. Embed the query video

Now we have a grainy video clip from some CCTV that we'll use for our reverse video search:

We'll do the same thing; the only difference is that this embedding comes from the query video, which we'll search against the already indexed and embedded videos:

from mixpeek import Mixpeek

mixpeek = Mixpeek("YOUR_API_KEY")

embedding = mixpeek.embed(
  modality="video",
  model="mixpeek/vuse-generic-v1",
  input="https://mixpeek-public-demo.s3.us-east-2.amazonaws.com/media-analysis/video_queries/exiting_sewer.mp4",
  input_type="url"
)

This will return an object containing the key embedding, which holds our query vector.

4. Compare results

Now that we have our embeddings, we can run a KNN search with a $vectorSearch aggregation pipeline:

[
    {
        "$vectorSearch": {
            "index": "vector_index",
            "path": "embedding",
            "queryVector": response['embedding'],
            "numCandidates": 10,
            "limit": 3
        }
    },
    {
        "$project": {
            "embedding": 0
        }
    }
]
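
The pipeline above follows MongoDB Atlas Vector Search syntax, so one way to run it is with pymongo against the same hypothetical collection we indexed into (names are placeholders):

from pymongo import MongoClient

# placeholder connection: same collection the embedded segments were inserted into
collection = MongoClient("YOUR_MONGODB_URI")["demo"]["video_index"]

pipeline = [
    {
        "$vectorSearch": {
            "index": "vector_index",
            "path": "embedding",
            "queryVector": embedding["embedding"],  # the query embedding from step 3
            "numCandidates": 10,
            "limit": 3
        }
    },
    {"$project": {"embedding": 0}}  # strip the raw vectors from the output
]

results = list(collection.aggregate(pipeline))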

This will return an array of objects that we can render in our application, indicating which video timestamps are most similar to the video embedding we used as the query:

[
    {
        "start_time": 25.83,
        "end_time": 26.67,
        "file_url": "https://mixpeek-public-demo.s3.us-east-2.amazonaws.com/media-analysis/The+Third+Man++Official+Trailer.mp4"
    },
    {
        "start_time": 25.83,
        "end_time": 26.67,
        "file_url": "https://mixpeek-public-demo.s3.us-east-2.amazonaws.com/media-analysis/The+Third+Man++Official+Trailer.mp4"
    },
    {
        "start_time": 24.17,
        "end_time": 25.0,
        "file_url": "https://mixpeek-public-demo.s3.us-east-2.amazonaws.com/media-analysis/The+Third+Man++Official+Trailer.mp4"
    }
]
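
As a small illustration (not part of the Mixpeek API), you could render each match as a link that jumps straight to the matching segment using a media-fragment timestamp:

# hypothetical rendering: deep-link into each matching segment from the results above
for match in results:
    link = f"{match['file_url']}#t={match['start_time']},{match['end_time']}"
    print(f"{match['start_time']}s-{match['end_time']}s: {link}")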

Now if we look at the original video @ 25.83 seconds in:

Amazing: we found a scene that would be hard to describe in text, using nothing but a video query as input. Now imagine doing that across billions of videos 🤯

Now productionize it

In the interest of not having to think about processing the data and keeping it in sync with our database, we can create a pipeline that connects S3 with our database:

from mixpeek import Mixpeek, SourceS3

def handler(event, context):
    mixpeek = Mixpeek("API_KEY")
    # create a presigned S3 URL for the newly added object
    file_url = SourceS3.file_url(event['bucket'], event['key'])

    # chunk the video, just like in step 1
    video_chunks = mixpeek.tools.process(
        modality="video",
        url=file_url,
        frame_interval=5,
        resolution=[720, 1280],
        return_base64=True
    )

    full_video = []

    for chunk in video_chunks:
        obj = {
            # embed each chunk, just like in step 2
            "embedding": mixpeek.embed(
                modality="video",
                model="mixpeek/vuse-generic-v1",
                input=chunk["base64_string"],
                input_type="base64"
            )["embedding"],
            "file_url": file_url,
            "metadata": {
                "time_start": chunk["start_time"],
                "time_end": chunk["end_time"],
            }
        }
        full_video.append(obj)

    return full_video
Mixpeek provides intelligent one-click deployable pipeline templates for processing S3 objects and integrating with your existing database in real-time.

Using this template, whenever a new object is added to our S3 bucket it's automatically processed and inserted into our database (connection established beforehand). Additionally, if a video is ever deleted from our S3 bucket, its embeddings are deleted from our database as well.
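
Mixpeek's template handles that sync for us, but as a rough sketch of what the deletion side could look like, a handler triggered by S3 delete events might remove the matching documents (the event shape, URL format, and collection name here are assumptions, not the actual template):

from pymongo import MongoClient

# placeholder connection: the collection holding the embedded segments
collection = MongoClient("YOUR_MONGODB_URI")["demo"]["video_index"]

def delete_handler(event, context):
    # assumed event shape: an S3 ObjectRemoved notification with bucket and key
    file_url = f"https://{event['bucket']}.s3.amazonaws.com/{event['key']}"
    # remove every embedded segment indexed under this video's URL
    collection.delete_many({"file_url": file_url})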

Just like reverse image search, reverse video search has plenty of use cases:

  1. Content Creation: Enables creators to find specific video clips quickly, streamlining the editing process.
  2. Media Monitoring: Identifies reused video content across platforms, aiding in tracking content spread and copyright enforcement.
  3. E-commerce: Helps customers find products by uploading video snippets, enhancing the shopping experience.
  4. Security and Surveillance: Analyzes footage to detect specific events or objects, improving security measures.
About the author
Ethan Steininger

Probably outside.
