FLUX has quickly become a state-of-the-art (SOTA) image generation model. I've seen some phenomenal images generated with FLUX, but none that were generated dynamically from existing images. Therein lies the opportunity for multimodal RAG.
Let's build a pipeline that combines indexed images with prompts to generate relevant images using AI.
Overview
The pipeline will consist of the following steps:
- Image Indexing: Index images by their URL
- Image Retrieval: Retrieve images using text-based queries or other images
- Image Generation: Generate new images based on text prompts
- Integrated Workflow: Combine all steps into a unified system that can dynamically generate, index, and search for images.
The steps below show how these components work together.
Step 1: Image Indexing with Mixpeek
The first step involves indexing images using Mixpeek’s API. This allows us to create a searchable database of images that can later be queried by text descriptions or similar images.
Code Example: Indexing an Image
import requests

mixpeek_api_key = "your_mixpeek_api_key"
collection_id = "shimmer"

def index_image(url, collection_id):
    # Send the image URL to Mixpeek so it is added to the given collection
    headers = {
        'Authorization': f'Bearer {mixpeek_api_key}',
        'Content-Type': 'application/json'
    }
    data = {
        "url": url,
        "collection_id": collection_id
    }
    response = requests.post('https://api.mixpeek.com/index/url', headers=headers, json=data)
    # Fail loudly on HTTP errors instead of parsing an error body as a result
    response.raise_for_status()
    return response.json()

image_url = "https://replicate.delivery/yhqm/Od36elqD9uX3byUfJHfAoi4nYaSv77HfG4Rih8LZjbzDXP5MB/out-0.webp"
index_response = index_image(image_url, collection_id)
print("Indexing Response:", index_response)
Indexing also generates a caption for the image.
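When indexing more than one image, it can help to keep going after a failure and report which URLs did not make it in. The following is a minimal sketch of that idea, not part of the Mixpeek API: `index_many` and `fake_index` are hypothetical names, and the indexing function is passed in as a parameter so that `index_image` from above can be used in its place.

```python
from typing import Callable, Dict, Iterable, Tuple


def index_many(
    urls: Iterable[str],
    collection_id: str,
    index_fn: Callable[[str, str], dict],
) -> Tuple[Dict[str, dict], Dict[str, str]]:
    """Index each URL with index_fn; return (successes, failures) keyed by URL."""
    indexed: Dict[str, dict] = {}
    failed: Dict[str, str] = {}
    for url in urls:
        try:
            indexed[url] = index_fn(url, collection_id)
        except Exception as exc:  # keep going so one bad URL does not stop the batch
            failed[url] = str(exc)
    return indexed, failed


# Stand-in indexer used only for this illustration; pass index_image for real calls.
def fake_index(url: str, collection_id: str) -> dict:
    if not url.startswith("https://"):
        raise ValueError("unsupported URL scheme")
    return {"url": url, "collection_id": collection_id}


indexed, failed = index_many(
    ["https://example.com/a.webp", "ftp://example.com/b.webp"],
    "shimmer",
    fake_index,
)
print("Indexed:", list(indexed))
print("Failed:", failed)
```

Passing the indexer as an argument keeps the batching logic testable without network access.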
Step 2: Image Retrieval Using Mixpeek
Once the images are indexed, you can retrieve them either through text-based queries or by using another image as a search query.
Text-Based Search Example
def search_images_by_text(query, collection_id):
    # Search indexed images in the collection using a natural-language query
    headers = {
        'Authorization': f'Bearer {mixpeek_api_key}',
        'Content-Type': 'application/json'
    }
    data = {
        "modality": "image",
        "input": query,
        "filters": {
            "$or": [{"collection_id": collection_id}]
        }
    }
    response = requests.post('https://api.mixpeek.com/search/text', headers=headers, json=data)
    # Fail loudly on HTTP errors instead of parsing an error body as a result
    response.raise_for_status()
    return response.json()

text_query = "woman skateboarding on the street"
search_results = search_images_by_text(text_query, collection_id)
print("Search Results:", search_results)
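The search response usually needs some post-processing before it is useful downstream. The response format is not shown in this post, so the sketch below assumes a hypothetical shape: a `results` list whose items carry `url` and `score` keys. Adjust the key names to whatever the API actually returns; `top_matches` is an illustrative helper, not a Mixpeek function.

```python
from typing import List


def top_matches(response: dict, min_score: float = 0.0, limit: int = 5) -> List[str]:
    """Return URLs of the best-scoring results, highest score first."""
    # ASSUMPTION: results live under "results", each with "url" and "score" keys.
    results = [r for r in response.get("results", []) if "url" in r]
    kept = [r for r in results if r.get("score", 0.0) >= min_score]
    kept.sort(key=lambda r: r.get("score", 0.0), reverse=True)
    return [r["url"] for r in kept[:limit]]


# Hypothetical response, hand-written for illustration only.
sample_response = {
    "results": [
        {"url": "https://example.com/1.webp", "score": 0.42},
        {"url": "https://example.com/2.webp", "score": 0.91},
        {"url": "https://example.com/3.webp", "score": 0.10},
    ]
}
best = top_matches(sample_response, min_score=0.3)
print(best)
```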
Image-Based Search Example
def search_images_by_image(query_url, collection_id):
    # Search indexed images in the collection using another image as the query
    headers = {
        'Authorization': f'Bearer {mixpeek_api_key}',
        'Content-Type': 'application/json'
    }
    data = {
        "url": query_url,
        "filters": {
            "$or": [{"collection_id": collection_id}]
        }
    }
    response = requests.post('https://api.mixpeek.com/search/url', headers=headers, json=data)
    # Fail loudly on HTTP errors instead of parsing an error body as a result
    response.raise_for_status()
    return response.json()

query_image_url = "https://replicate.delivery/yhqm/Od36elqD9uX3byUfJHfAoi4nYaSv77HfG4Rih8LZjbzDXP5MB/out-0.webp"
image_search_results = search_images_by_image(query_image_url, collection_id)
print("Image Search Results:", image_search_results)
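The text and image search functions differ only in their endpoint and payload, so the request construction can be shared. Below is one possible refactor, offered as a sketch: `build_search_request` is a hypothetical helper that only builds the endpoint, headers, and payload used in the two examples above and leaves the actual `requests.post` call to the caller.

```python
from typing import Optional, Tuple

BASE_URL = "https://api.mixpeek.com"


def build_search_request(
    api_key: str,
    collection_id: str,
    *,
    text: Optional[str] = None,
    image_url: Optional[str] = None,
) -> Tuple[str, dict, dict]:
    """Return (endpoint, headers, payload) for a text or image search."""
    if (text is None) == (image_url is None):
        raise ValueError("Provide exactly one of text or image_url")
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    # Same collection filter as in the two search examples
    filters = {"$or": [{"collection_id": collection_id}]}
    if text is not None:
        payload = {"modality": "image", "input": text, "filters": filters}
        return f"{BASE_URL}/search/text", headers, payload
    payload = {"url": image_url, "filters": filters}
    return f"{BASE_URL}/search/url", headers, payload


text_request = build_search_request(
    "your_mixpeek_api_key", "shimmer", text="woman skateboarding on the street"
)
image_request = build_search_request(
    "your_mixpeek_api_key", "shimmer", image_url="https://example.com/query.webp"
)
print(text_request[0])
print(image_request[0])
```

The returned tuple can be passed on as `requests.post(endpoint, headers=headers, json=payload)`.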
Step 3: Image Generation with Replicate's FLUX
Next, we use the FLUX model hosted on Replicate (black-forest-labs/flux-dev) to generate new images from text prompts. These images can then be indexed in Mixpeek or used directly.
Code Example: Generating an Image
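import replicate

# The replicate client reads its API token from the REPLICATE_API_TOKEN environment variable

def generate_image(prompt):
    output = replicate.run(
        "black-forest-labs/flux-dev",
        input={
            "prompt": prompt,
            "guidance": 3.5,
            "aspect_ratio": "1:1",
            "output_format": "webp",
            "output_quality": 80
        }
    )
    # replicate.run may return a list of outputs for this model; take the first
    # and convert it to a plain URL string so it can be sent to Mixpeek as JSON
    first = output[0] if isinstance(output, (list, tuple)) else output
    return str(first)

image_prompt = "womens street skateboarding final in Paris Olympics 2024"
generated_image_url = generate_image(image_prompt)
print("Generated Image URL:", generated_image_url)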
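Since indexing produces a caption for each image, one way to connect retrieval to generation is to fold the captions of retrieved images into the prompt before calling the model. The sketch below shows only the prompt-building step. `augment_prompt` is a hypothetical helper, the captions are hand-written stand-ins, and how captions are exposed in the search response is an assumption that should be checked against the actual API output.

```python
from typing import Iterable, List


def augment_prompt(prompt: str, captions: Iterable[str], max_captions: int = 3) -> str:
    """Append a few non-empty, de-duplicated captions to the prompt as visual context."""
    unique: List[str] = []
    for caption in captions:
        cleaned = caption.strip()
        if cleaned and cleaned not in unique:
            unique.append(cleaned)
    if not unique:
        return prompt  # nothing retrieved, so leave the prompt unchanged
    context = "; ".join(unique[:max_captions])
    return f"{prompt}. Visual references: {context}"


# Hand-written captions for illustration; real ones would come from retrieved images.
retrieved_captions = [
    "a woman riding a skateboard down a city street",
    "",
    "a woman riding a skateboard down a city street",
    "a crowd watching a skate park",
]
augmented = augment_prompt("women's street skateboarding final", retrieved_captions)
print(augmented)
```

The augmented string can then be passed to `generate_image` in place of the original prompt.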
Step 4: Integrating the Pipeline
The final step integrates the entire process. We first generate a new image, index it using Mixpeek, and then use that image to search for similar images in our indexed collection.
Code Example: Integrated Pipeline
# Step 1: Generate a new image
generated_image_url = generate_image("womens street skateboarding final in Paris Olympics 2024")
# Step 2: Index the generated image
index_response = index_image(generated_image_url, collection_id)
print("Generated and Indexed Image:", index_response)
# Step 3: Search for similar images using the generated image
similar_images = search_images_by_image(generated_image_url, collection_id)
print("Similar Images Found:", similar_images)
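The three calls above can also be wrapped in a single function so the pipeline is easier to reuse and test. This is a minimal sketch under the assumption that each step is passed in as a callable. `run_pipeline` and the lambda stand-ins are hypothetical; in practice `generate_image`, `index_image`, and `search_images_by_image` would be passed instead.

```python
from typing import Any, Callable, Dict


def run_pipeline(
    prompt: str,
    collection_id: str,
    generate_fn: Callable[[str], str],
    index_fn: Callable[[str, str], Any],
    search_fn: Callable[[str, str], Any],
) -> Dict[str, Any]:
    """Generate an image, index it, then search the collection with it."""
    image_url = generate_fn(prompt)
    if not isinstance(image_url, str) or not image_url:
        raise ValueError("generate_fn must return a non-empty URL string")
    index_response = index_fn(image_url, collection_id)
    similar_images = search_fn(image_url, collection_id)
    return {
        "image_url": image_url,
        "index_response": index_response,
        "similar_images": similar_images,
    }


# Stand-in steps so the sketch runs without network access or API keys.
result = run_pipeline(
    "womens street skateboarding final in Paris Olympics 2024",
    "shimmer",
    generate_fn=lambda prompt: "https://example.com/generated.webp",
    index_fn=lambda url, cid: {"indexed": url, "collection_id": cid},
    search_fn=lambda url, cid: [{"url": url}],
)
print(result["image_url"])
```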
Full code: https://github.com/mixpeek/use-cases/blob/master/multimodal-rag/flux-replicate.py
Conclusion
This pipeline combines image indexing, retrieval, and generation in a single workflow. With Mixpeek's multimodal search and the FLUX model hosted on Replicate, developers can build automated systems for creating and managing visual content.