Each of the world’s largest banks has published its 2023 Market Outlook in PDF form, as they do every year.

The format makes it hard to sift through the reports quickly and uncover trends.

To solve this, we’ll use Mixpeek’s indexer, which extracts and stores the contents of each PDF so that we can do some quick investor due diligence.
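To make the idea of "extract, store, then search" concrete, here is a toy in-memory keyword index. It is only a sketch of the general technique and makes no claim about how Mixpeek works internally; the file names, the sample text, and the `build_index` and `search` helpers are all hypothetical.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index


def search(index, term):
    """Return the sorted names of documents that contain the term."""
    return sorted(index.get(term.lower(), set()))


# Hypothetical extracted text; in practice this would come from the PDFs.
documents = {
    "bank_a.pdf": "Placeholder text mentioning nuclear energy.",
    "bank_b.pdf": "Placeholder text mentioning the semiconductor market.",
}

index = build_index(documents)
print(search(index, "nuclear"))  # ['bank_a.pdf']
```

A hosted indexer saves you from writing the PDF text extraction and storage yourself, which is the part this sketch skips.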

Here is a GitHub repo that contains each of the PDFs in raw form, in addition to the code used in this walkthrough:

https://github.com/mixpeek/use-cases/tree/master/2023-market-outlooks

Extract the Content and Index

from mixpeek import Mixpeek

# index entire S3 bucket, which contains each PDF
mix = Mixpeek(
    api_key="mixpeek_api_key",
    access_key="aws_access_key",
    secret_key="aws_secret_key",
    region="us-east-2"
)
mix.index_bucket("2023-market-outlooks")

The response lists an ID for each indexed file:

{
    "file_ids": [
        "63b5c2f274017e585607f1d7",
        "63b5c2f474017e585607f1d8",
        "63b5c2f674017e585607f1d9",
        "63b5c2f774017e585607f1da",
        "63b5c2f874017e585607f1db",
        "63b5c2f974017e585607f1dc",
        "63b5c2fa74017e585607f1dd",
        "63b5c2fb74017e585607f1de",
        "63b5c2fc74017e585607f1df",
        "63b5c2fd74017e585607f1e0",
        "63b5c2fe74017e585607f1e1",
        "63b5c2fe74017e585607f1e2",
        "63b5c2ff74017e585607f1e3",
        "63b5c30074017e585607f1e4",
        "63b5c30274017e585607f1e5",
        "63b5c30374017e585607f1e6"
    ]
}
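The constructor call above uses placeholder strings for the keys. In a real project you would probably not hard-code them. A minimal sketch, assuming the credentials live in environment variables: `MIXPEEK_API_KEY` is a made-up variable name, the two AWS names follow the usual AWS convention, and `load_credentials` is a hypothetical helper. The keyword names mirror the constructor arguments shown above.

```python
import os

# Mapping from constructor argument to environment variable.
# MIXPEEK_API_KEY is a hypothetical name; rename to match your setup.
REQUIRED_VARS = {
    "api_key": "MIXPEEK_API_KEY",
    "access_key": "AWS_ACCESS_KEY_ID",
    "secret_key": "AWS_SECRET_ACCESS_KEY",
}


def load_credentials(env=os.environ, region="us-east-2"):
    """Build the keyword arguments used in the constructor call above."""
    missing = [var for var in REQUIRED_VARS.values() if not env.get(var)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    creds = {arg: env[var] for arg, var in REQUIRED_VARS.items()}
    creds["region"] = region
    return creds
```

With the variables set, the client could then be created with `Mixpeek(**load_credentials())`, which keeps secrets out of the source file.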

Search Across Every File

# I want to see what their stance on nuclear energy is
n = mix.search("nuclear")
print(n)

# Or maybe how they perceive the semiconductor market
s = mix.search("semiconductor")
s = mix.search("semiconductor")
print(s)
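If you have a longer list of topics, the two searches above generalize to a loop. The sketch below takes any search function as an argument, so it runs here with a stand-in; `search_terms` and `fake_search` are hypothetical names, and the shape of the real results is not assumed.

```python
def search_terms(search, terms):
    """Run each term through `search` and collect the raw results by term."""
    return {term: search(term) for term in terms}


def fake_search(term):
    """Stand-in for mix.search so the sketch runs without an API key."""
    return [f"placeholder result for {term}"]


results = search_terms(fake_search, ["nuclear", "semiconductor"])
for term, hits in results.items():
    print(term, "->", hits)
```

With a real client, you would pass `mix.search` in place of `fake_search`.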

Demo of it in action: https://demo.mixpeek.com/files?defaultSearch=nuclear

There are plenty of other finance-related use cases.

We also wrote a tutorial on searching across Amazon’s various Investor Relations assets, such as its 10-K (PDF), webcast (audio), and 5K (Excel): https://learn.mixpeek.com/automated-stock-analysis/

Build Your Own Search Application

  1. Register an API key for free
  2. Read the Docs
  3. Build, build, build and contact us for help
About the author
Ethan Steininger

Former GTM Lead of MongoDB's NLP platform, Atlas Search. Occasionally off the grid in his self-converted camper van.
