You may have used some kind of reverse image search before. Put simply, instead of searching with text like "australian shepherds running", you can search with an image like australian_shepherd_running.png. The search engine will then find all similar images based on that input.
But have you used reverse video search? The approach is the same: use your video as a query to find other videos.
💻 Code Snippets: https://github.com/mixpeek/use-cases/blob/master/reverse-video-search/reverse_video.py
📓 Runnable Jupyter Notebook: https://github.com/mixpeek/use-cases/blob/master/reverse-video-search/reverse_video.ipynb
🚀 Live Demo: https://mixpeek.com/video
Let's first explore reverse image search
Try it on Google Images: https://images.google.com/
In the example below, I'll upload a picture of an Australian Shepherd dog, and Google's reverse image search will find all similar pictures of Australian Shepherds.
Use cases for reverse image search
There are tons of awesome use cases for reverse image search, like:
- E-commerce: Helps customers find products by uploading images, increasing sales by simplifying the shopping experience.
- Intellectual Property: Identifies unauthorized use of images, aiding in copyright enforcement and protecting creators' rights.
- Content Verification: Verifies the authenticity of images in news and social media, combating misinformation.
- Real Estate: Allows users to find properties by uploading photos, enhancing user experience and engagement.
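Under the hood, this works by embedding each image into a vector so that visually similar images land close together. With Mixpeek, embedding an image looks like this: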
from mixpeek import Mixpeek

mixpeek = Mixpeek("API_KEY")

# Embed the query image into a vector using a CLIP model
embedding = mixpeek.embed.image(
    model_id="openai/clip-vit-large-patch14",
    input="s3://dog.png",
    input_type="url"
)
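Once images are embedded, "similar" just means "nearby in vector space". Here's a minimal sketch of the comparison itself, using toy vectors in place of real CLIP embeddings (which have hundreds of dimensions):

import numpy as np

def cosine_similarity(a, b):
    """Score in [-1, 1]; higher means more visually similar."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional vectors for illustration only
query = [0.1, 0.9, 0.2, 0.0]
corpus = {
    "aussie_running.png": [0.12, 0.88, 0.18, 0.02],
    "tabby_cat.png": [0.9, 0.1, 0.05, 0.3],
}
ranked = sorted(corpus, key=lambda name: cosine_similarity(query, corpus[name]), reverse=True)
print(ranked)  # most similar image first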
But what about video?
Reverse video search works the same way. We first embed a couple of videos, then provide a sample video as the search query.
For our index, we'll use a movie trailer from the 1940s classic, The Third Man:
1. Prepare the video(s)
Since there's an upper bound on the size of inputs to the mixpeek.embed.video service, we need to preprocess the video first. Chunking also ensures we get proper granularity; it's a similar technique to tokenizing a corpus of text.
First, let's cut up the video using Mixpeek's tools service:
from mixpeek import Mixpeek

mixpeek = Mixpeek("YOUR_API_KEY")

# Split the trailer into short snippets we can embed individually
response = mixpeek.tools.process(
    modality="video",
    url="https://mixpeek-public-demo.s3.us-east-2.amazonaws.com/media-analysis/The+Third+Man++Official+Trailer.mp4",
    frame_interval=5,
    resolution=[720, 1280],
    return_base64=True
)
We're telling the video processor tool to cut the video into 5-frame intervals, resize it to 720x1280, and return each snippet as a base64_string.
The response will look something like this:
[
  {
    "base64_string": "...",
    "start_time": 0.0,
    "end_time": 1.0,
    "fps": 5.0,
    "duration": 41.8,
    "resolution": [768, 432],
    "size_kb": 69.93
  },
  ...
]
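Before embedding anything, it can be worth sanity-checking a chunk: each base64_string decodes back into a playable snippet (assuming here that the snippets are self-contained MP4 files):

import base64

# Decode the first segment back to a file for manual inspection
first = response[0]
with open("segment_0.mp4", "wb") as f:
    f.write(base64.b64decode(first["base64_string"]))
print(f'{first["start_time"]}s to {first["end_time"]}s, {first["size_kb"]} KB')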
2. Embed the videos
Next, we'll take the video segments from the video processor endpoint and send each base64_string to the embed endpoint:
from mixpeek import Mixpeek

mixpeek = Mixpeek("YOUR_API_KEY")

# Embed each processed segment, keeping its timestamps alongside the vector
segments = []
for chunk in response:
    result = mixpeek.embed(
        modality="video",
        model="mixpeek/vuse-generic-v1",
        input=chunk["base64_string"],
        input_type="base64"
    )
    segments.append({"embedding": result["embedding"], "start_time": chunk["start_time"], "end_time": chunk["end_time"]})
Each embed call returns an object containing the segment's vector, so we simply iterate through every base64_string from the tools processor and embed it, carrying the start_time and end_time keys along as well.
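Step 4 will run a MongoDB Atlas vector search, so a natural home for these documents is a MongoDB collection with a vector index on the embedding field. A minimal sketch of the insert, with hypothetical connection and collection names:

from pymongo import MongoClient

# Hypothetical cluster and collection names; the "vector_index" used in
# step 4 is created separately on the "embedding" field in Atlas
client = MongoClient("mongodb+srv://user:password@cluster0.example.mongodb.net")
collection = client["demo"]["video_segments"]

TRAILER_URL = "https://mixpeek-public-demo.s3.us-east-2.amazonaws.com/media-analysis/The+Third+Man++Official+Trailer.mp4"

# One document per segment: vector, timestamps, and the source file
collection.insert_many([{**seg, "file_url": TRAILER_URL} for seg in segments])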
3. Embed the video to search
Now we have a grainy clip from some CCTV footage that we'll use for our reverse video search:
We'll do the same thing; the only difference is that this time we embed the query video we want to search against the already indexed and embedded videos:
from mixpeek import Mixpeek

mixpeek = Mixpeek("YOUR_API_KEY")

# Embed the query clip the same way the indexed segments were embedded
query_embedding = mixpeek.embed(
    modality="video",
    model="mixpeek/vuse-generic-v1",
    input="https://mixpeek-public-demo.s3.us-east-2.amazonaws.com/media-analysis/video_queries/exiting_sewer.mp4",
    input_type="url"
)
This will return an object that contains the key embedding, which we'll use as our query vector.
4. Compare results
Now that we have our embeddings, we can run a KNN search. Here we use a MongoDB Atlas $vectorSearch aggregation pipeline:
pipeline = [
    {
        "$vectorSearch": {
            "index": "vector_index",
            "path": "embedding",
            "queryVector": query_embedding["embedding"],
            "numCandidates": 10,
            "limit": 3
        }
    },
    {
        "$project": {
            "embedding": 0
        }
    }
]
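To execute it, pass the pipeline to the collection from step 2; a quick sketch:

# Run the KNN search against the embedded segments
results = list(collection.aggregate(pipeline))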
This will return an array of objects that we can render in our application, indicating the most similar video timestamps for our query embedding:
[
  {
    "start_time": 25.83,
    "end_time": 26.67,
    "file_url": "https://mixpeek-public-demo.s3.us-east-2.amazonaws.com/media-analysis/The+Third+Man++Official+Trailer.mp4"
  },
  {
    "start_time": 25.83,
    "end_time": 26.67,
    "file_url": "https://mixpeek-public-demo.s3.us-east-2.amazonaws.com/media-analysis/The+Third+Man++Official+Trailer.mp4"
  },
  {
    "start_time": 24.17,
    "end_time": 25.0,
    "file_url": "https://mixpeek-public-demo.s3.us-east-2.amazonaws.com/media-analysis/The+Third+Man++Official+Trailer.mp4"
  }
]
Now, if we look at the original video at 25.83 seconds in:
Amazing: we found a scene that would be hard to describe in text, using a video query as the input. Now imagine doing that across billions of videos 🤯
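To surface these hits in an application, one lightweight option is the media fragment syntax (#t=start,end), which most browsers honor when loading a video URL:

# Build links that start playback at each matched segment
for hit in results:
    print(f'{hit["file_url"]}#t={hit["start_time"]},{hit["end_time"]}')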
Now productionize it
In the interest of not having to think about processing the data and keeping it in sync with our database, we can create a pipeline that connects S3 to our database:
from mixpeek import Mixpeek, SourceS3

def handler(event, context):
    mixpeek = Mixpeek("API_KEY")

    # Create a presigned S3 URL for the newly added object
    file_url = SourceS3.file_url(event["bucket"], event["key"])

    # Chunk the video exactly as we did in step 1
    video_chunks = mixpeek.tools.process(
        modality="video",
        url=file_url,
        frame_interval=5,
        resolution=[720, 1280],
        return_base64=True
    )

    # Embed each chunk, keeping its timestamps and source location
    full_video = []
    for chunk in video_chunks:
        result = mixpeek.embed(
            modality="video",
            model="mixpeek/vuse-generic-v1",
            input=chunk["base64_string"],
            input_type="base64"
        )
        full_video.append({
            "embedding": result["embedding"],
            "file_url": file_url,
            "metadata": {
                "time_start": chunk["start_time"],
                "time_end": chunk["end_time"],
                # Record where the file came from so deletions can be synced
                "bucket": event["bucket"],
                "key": event["key"],
            }
        })
    return full_video
Using this template, whenever a new object is added to our S3 bucket, it's automatically processed and inserted into our database (with the connection established beforehand). Additionally, if a video is ever deleted from our S3 bucket, its embeddings are deleted from our database as well; a sketch of that handler follows.
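The deletion half can be a second handler wired to the bucket's s3:ObjectRemoved notifications, a sketch assuming the same MongoDB collection and the bucket/key recorded in each document's metadata above:

def delete_handler(event, context):
    # Fired on s3:ObjectRemoved:*; drop every segment embedded from this file
    collection.delete_many({
        "metadata.bucket": event["bucket"],
        "metadata.key": event["key"],
    })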
Use cases for video search
- Content Creation: Enables creators to find specific video clips quickly, streamlining the editing process.
- Media Monitoring: Identifies reused video content across platforms, aiding in tracking content spread and copyright enforcement.
- E-commerce: Helps customers find products by uploading video snippets, enhancing the shopping experience.
- Security and Surveillance: Analyzes footage to detect specific events or objects, improving security measures.