0.1.0
- New visual feature extraction models (applicable to any image or video `interval_sec`) for the `/index` pipeline. All are accessible standalone with `/understand/<method>`:
  - `read`: Grab the text from the screen
  - `embed`: Create a multimodal embedding of the `interval_sec` (if video) or image
  - `detect.faces`: Saves to keys:
    - `detected_face_ids`: returns face_ids for each face that has been registered via `/index/face`
    - `face_details`: returns various characteristics about faces that were found in the visual asset
  - `json_output`: More of a catch-all for structured data generation; supports all standard JSON types
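Since every method is also exposed standalone, the `/understand/<method>` URLs can be derived mechanically. A minimal sketch; the base URL is an assumption, since the changelog does not name the host:

```python
from urllib.parse import urljoin

BASE_URL = "https://api.example.com/"  # assumption: actual host not given in the changelog

def understand_url(method: str) -> str:
    """Build the standalone /understand/<method> URL for one extraction method."""
    return urljoin(BASE_URL, f"understand/{method}")

# Each method from the /index pipeline is also callable on its own:
for method in ("read", "embed", "detect.faces", "json_output"):
    print(understand_url(method))
```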
For example, an `/index` request:

```json
{
  "url": "video.mp4",
  "collection_id": "name",
  "should_save": false,
  "video_settings": [
    {
      "interval_sec": 1,
      "read": {
        "model_id": "video-descriptor-v1"
      },
      "embed": {
        "model_id": "multimodal-v1"
      },
      "detect": {
        "faces": {
          "model_id": "face-detector-v1"
        }
      },
      "json_output": {
        "response_shape": {
          "emotions": [
            "str",
            "str"
          ]
        },
        "prompt": "This is a list of emotion labels, each one should be a string representing the scene."
      }
    },
    {
      "interval_sec": 30,
      "transcribe": {
        "model_id": "polyglot-v1"
      }
    },
    {
      "interval_sec": 120,
      "describe": {
        "model_id": "video-descriptor-v1",
        "prompt": "Create a holistic description of the video, include sounds and screenplay"
      }
    }
  ]
}
```
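Each entry in `video_settings` pairs an `interval_sec` with the methods to run at that sampling rate. A minimal client-side sanity check before sending such a body; the validation rules are assumptions inferred from the example above, not a documented schema:

```python
# Known top-level method keys, inferred from the changelog example
# (this is an assumption, not a documented schema).
METHODS = {"read", "embed", "detect", "transcribe", "describe", "json_output"}

def validate_settings(video_settings: list[dict]) -> list[int]:
    """Return the interval_sec of each entry, raising if an entry is malformed."""
    intervals = []
    for entry in video_settings:
        if "interval_sec" not in entry:
            raise ValueError("every entry needs an interval_sec")
        unknown = set(entry) - METHODS - {"interval_sec"}
        if unknown:
            raise ValueError(f"unknown keys: {unknown}")
        intervals.append(entry["interval_sec"])
    return intervals

settings = [
    {"interval_sec": 1, "read": {"model_id": "video-descriptor-v1"}},
    {"interval_sec": 30, "transcribe": {"model_id": "polyglot-v1"}},
]
print(validate_settings(settings))  # -> [1, 30]
```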
- Image and video embeddings now share the same embedding space (1408 dimensions), making multimodal search more accurate and faster. Uses the `multimodal-v1` `model_id`.
- `/collections/file/<file_id>/full` now returns the full set of features extracted by the indexing pipeline:
```json
{
  "index_id": "ix-...",
  "file_id": "...",
  "collection_id": "name",
  "status": "DONE",
  "url": "video.mp4",
  "metadata": {
    "preview_url": "thumbnail.jpg"
  },
  "created_at": "2024-09-30T15:48:35.219000",
  "video_segments": [
    {
      "interval_sec": 1,
      "segments": [
        {
          "start_time": 1.0,
          "end_time": 2.0,
          "transcription": null,
          "description": null,
          "text": "\n\n© copyright \n",
          "detect": {
            "faces": {
              "detected_face_ids": ["123"],
              "face_details": [{}]
            }
          }
        }
      ]
    },
    {
      "interval_sec": 30,
      "segments": [
        {
          "start_time": 0.0,
          "end_time": 2.8333333333333335,
          "transcription": "Something "
        }
      ]
    },
    {
      "interval_sec": 120,
      "segments": [
        {
          "start_time": 0.0,
          "end_time": 2.8333333333333335,
          "description": "The video starts ..."
        }
      ]
    }
  ]
}
```
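Since the response nests one segment list per `interval_sec`, a common first step is to index those lists by interval. A minimal sketch; the `response` dict below is abridged from the example above:

```python
# Abridged /collections/file/<file_id>/full response, taken from the example.
response = {
    "video_segments": [
        {"interval_sec": 1, "segments": [
            {"start_time": 1.0, "end_time": 2.0,
             "transcription": None, "description": None,
             "text": "\n\n© copyright \n",
             "detect": {"faces": {"detected_face_ids": ["123"],
                                  "face_details": [{}]}}}]},
        {"interval_sec": 30, "segments": [
            {"start_time": 0.0, "end_time": 2.8333333333333335,
             "transcription": "Something "}]},
        {"interval_sec": 120, "segments": [
            {"start_time": 0.0, "end_time": 2.8333333333333335,
             "description": "The video starts ..."}]},
    ]
}

def segments_by_interval(resp: dict) -> dict[int, list[dict]]:
    """Index each per-interval segment list by its interval_sec."""
    return {block["interval_sec"]: block["segments"]
            for block in resp["video_segments"]}

by_interval = segments_by_interval(response)
print(by_interval[30][0]["transcription"])                        # -> Something 
print(by_interval[1][0]["detect"]["faces"]["detected_face_ids"])  # -> ['123']
```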