Mixpeek & FLUX for Multimodal RAG

FLUX has quickly become one of the most talked-about state-of-the-art image generation models. I've seen some phenomenal images generated with FLUX, but none that are generated dynamically from existing images. That gap is the opportunity for multimodal RAG (retrieval-augmented generation).

Let's build a pipeline that combines indexed images with prompts to generate relevant images using AI.

Overview

The pipeline will consist of the following steps:

Image Indexing: Index images by their URL.
Image Retrieval: Retrieve images using text-based queries or other images.
Image Generation: Generate new images based on text prompts.
Integrated Workflow: Combine all steps into a unified system that can dynamically generate, index, and search for images.

Here’s how these components work together:

graph TD;
    A[Image URL or Generated Image] --> B[Mixpeek Indexing];
    B --> C[Text/Image-based Search];
    D[Text Prompt] --> E[FLUX Image Generation];
    C --> F[Results Displayed to User];
    E --> B;
    E --> F;

Step 1: Image Indexing with Mixpeek

The first step involves indexing images using Mixpeek’s API. This allows us to create a searchable database of images that can later be queried by text descriptions or similar images.

Code Example: Indexing an Image

import requests

mixpeek_api_key = "your_mixpeek_api_key"
collection_id = "shimmer"

def index_image(url, collection_id):
    headers = {
        'Authorization': f'Bearer {mixpeek_api_key}',
        'Content-Type': 'application/json'
    }
    data = {
        "url": url,
        "collection_id": collection_id
    }
    response = requests.post('https://api.mixpeek.com/index/url', headers=headers, json=data, timeout=30)
    response.raise_for_status()  # surface HTTP errors instead of parsing an error body
    return response.json()

image_url = "https://replicate.delivery/yhqm/Od36elqD9uX3byUfJHfAoi4nYaSv77HfG4Rih8LZjbzDXP5MB/out-0.webp"
index_response = index_image(image_url, collection_id)
print("Indexing Response:", index_response)
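
The snippet above hard-codes the API key as a placeholder string. One common alternative is to read it from the environment so the key never lands in source control. The following is a minimal sketch, not part of the Mixpeek client: the `MIXPEEK_API_KEY` variable name and the `load_api_key` and `mixpeek_headers` helpers are hypothetical names chosen for illustration.

```python
import os

def load_api_key(env_var="MIXPEEK_API_KEY"):
    """Return the API key stored in `env_var`, failing loudly if it is unset."""
    # The variable name is an assumption; use whatever name your deployment sets.
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable before running this script")
    return api_key

def mixpeek_headers(api_key):
    """Build the headers that the Mixpeek requests in this post share."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

With this in place, `mixpeek_api_key = load_api_key()` can replace the placeholder assignment, and the script stops early with a clear message when the key is missing.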

Mixpeek also generates a caption for the indexed image.

Step 2: Image Retrieval Using Mixpeek

Once the images are indexed, you can retrieve them either through text-based queries or by using another image as a search query.

Text-Based Search Example

def search_images_by_text(query, collection_id):
    headers = {
        'Authorization': f'Bearer {mixpeek_api_key}',
        'Content-Type': 'application/json'
    }
    data = {
        "modality": "image",
        "input": query,
        "filters": {
            "$or": [{"collection_id": collection_id}]
        }
    }
    response = requests.post('https://api.mixpeek.com/search/text', headers=headers, json=data, timeout=30)
    response.raise_for_status()  # surface HTTP errors instead of parsing an error body
    return response.json()

text_query = "woman skateboarding on the street"
search_results = search_images_by_text(text_query, collection_id)
print("Search Results:", search_results)

Image-Based Search Example

def search_images_by_image(query_url, collection_id):
    headers = {
        'Authorization': f'Bearer {mixpeek_api_key}',
        'Content-Type': 'application/json'
    }
    data = {
        "url": query_url,
        "filters": {
            "$or": [{"collection_id": collection_id}]
        }
    }
    response = requests.post('https://api.mixpeek.com/search/url', headers=headers, json=data, timeout=30)
    response.raise_for_status()  # surface HTTP errors instead of parsing an error body
    return response.json()

query_image_url = "https://replicate.delivery/yhqm/Od36elqD9uX3byUfJHfAoi4nYaSv77HfG4Rih8LZjbzDXP5MB/out-0.webp"
image_search_results = search_images_by_image(query_image_url, collection_id)
print("Image Search Results:", image_search_results)
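
Both search functions build the same `filters` object by hand. A small helper can keep that shape in one place. This is only a sketch: `collection_filter` and `text_search_payload` are hypothetical names, and passing more than one clause in `$or` is an assumption about the API that this post does not confirm.

```python
def collection_filter(collection_ids):
    """Build a filter in the `$or` shape used by the search examples above."""
    # Accept a single ID for convenience, matching how the examples are called.
    if isinstance(collection_ids, str):
        collection_ids = [collection_ids]
    if not collection_ids:
        raise ValueError("at least one collection_id is required")
    # Assumption: one clause per collection; only the single-clause form appears in the post.
    return {"$or": [{"collection_id": cid} for cid in collection_ids]}

def text_search_payload(query, collection_ids):
    """Assemble the request body sent by search_images_by_text."""
    return {
        "modality": "image",
        "input": query,
        "filters": collection_filter(collection_ids),
    }
```

Calling `text_search_payload("woman skateboarding on the street", "shimmer")` produces the same body that `search_images_by_text` builds inline.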

Step 3: Image Generation with Replicate's FLUX

Next, we use the FLUX model (black-forest-labs/flux-dev), hosted on Replicate, to generate new images from text prompts. These images can then be indexed in Mixpeek or used directly.

Code Example: Generating an Image

import replicate

def generate_image(prompt):
    output = replicate.run(
        "black-forest-labs/flux-dev",
        input={
            "prompt": prompt,
            "guidance": 3.5,
            "aspect_ratio": "1:1",
            "output_format": "webp",
            "output_quality": 80
        }
    )
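    # replicate.run may return a list of outputs for this model; keep the first one
    # as a plain URL string so it can be passed to index_image and search_images_by_image.
    first = output[0] if isinstance(output, (list, tuple)) else output
    return str(first)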

image_prompt = "women's street skateboarding final in Paris Olympics 2024"
generated_image_url = generate_image(image_prompt)
print("Generated Image URL:", generated_image_url)

Step 4: Integrating the Pipeline

The final step integrates the entire process. We first generate a new image, index it using Mixpeek, and then use that image to search for similar images in our indexed collection.

Code Example: Integrated Pipeline

# Step 1: Generate a new image
generated_image_url = generate_image("women's street skateboarding final in Paris Olympics 2024")

# Step 2: Index the generated image
index_response = index_image(generated_image_url, collection_id)
print("Generated and Indexed Image:", index_response)

# Step 3: Search for similar images using the generated image
similar_images = search_images_by_image(generated_image_url, collection_id)
print("Similar Images Found:", similar_images)
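
The pipeline above generates first and retrieves second. For the retrieval-augmented direction described in the introduction, one option is to fold the captions of retrieved images into the prompt before calling `generate_image`. Here is a minimal sketch, assuming you have already pulled caption strings out of the search results (this post does not show the response schema, so that extraction step is left out). The `augment_prompt` helper and the "Visual references" wording are hypothetical and not part of Mixpeek or Replicate.

```python
def augment_prompt(prompt, captions, max_captions=3):
    """Append up to `max_captions` retrieved captions to a text prompt."""
    # Drop empty captions and case-insensitive duplicates, keeping retrieval order.
    seen = set()
    unique = []
    for caption in captions:
        text = (caption or "").strip()
        if text and text.lower() not in seen:
            seen.add(text.lower())
            unique.append(text)
    if not unique:
        # Nothing was retrieved, so fall back to the original prompt.
        return prompt
    context = "; ".join(unique[:max_captions])
    return f"{prompt}. Visual references: {context}"
```

The result can be passed straight to the generator, for example `generate_image(augment_prompt(image_prompt, captions))`. Capping the number of captions keeps the prompt short and limits how much any single retrieved image steers the output.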

Full code: https://github.com/mixpeek/use-cases/blob/master/multimodal-rag/flux-replicate.py

Conclusion

This pipeline combines image indexing, retrieval, and generation in a single workflow. With Mixpeek’s multimodal search and the FLUX image generation model running on Replicate, developers can generate new images, index them, and search them alongside existing ones automatically.

Ethan Steininger

Multimodal Makers | Mixpeek
