Getting Started
Wagtail Vector Index integrates with AI 'embedding' APIs and vector databases to provide tools for advanced AI-powered querying across your content.
To do this:
- You set up models/pages to be searchable.
- Wagtail Vector Index splits the content of those pages into chunks and fetches embeddings from the configured AI backend.
- It then stores all those embeddings in the configured vector database.
- When querying, the query is converted to an embedding and, using the vector database, is compared to the embeddings for all your existing content.
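The chunking step in the list above can be illustrated with a small sketch. This is not Wagtail Vector Index's actual splitter: the function name `split_into_chunks`, the word-based windows, and the `chunk_size` and `overlap` parameters are all assumptions made for illustration. It only shows the general idea of breaking long content into overlapping pieces before each piece is sent off for an embedding.

```python
def split_into_chunks(text: str, chunk_size: int = 5, overlap: int = 1) -> list[str]:
    """Split text into windows of at most `chunk_size` words.

    Hypothetical helper for illustration only; consecutive chunks share
    `overlap` words so that context is not lost at chunk boundaries.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


chunks = split_into_chunks("a b c d e f g h i", chunk_size=5, overlap=1)
print(chunks)
```

Each resulting chunk would then get its own embedding, which is why a single long page can end up as several entries in the vector database.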
What's an Embedding?
An embedding is a big list (vector) of floating point numbers that represents your content in some way. Models like OpenAI's ada-002 can take content and turn it into a list of numbers such that similar content will have a similar list of numbers.
This way, when you provide a query, we can use the same model to get an embedding of that query and do some maths (cosine similarity) to see what content in your vector database is similar to your query.
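To make the "maths" concrete, here is a minimal sketch of cosine similarity in plain Python. The three-dimensional vectors and the labels in `stored` are made up for illustration; real embeddings have hundreds or thousands of dimensions, and in practice the vector database performs this comparison for you.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical embeddings standing in for stored content.
stored = {
    "cats": [1.0, 0.0, 0.0],
    "kittens": [0.9, 0.1, 0.0],
    "tax law": [0.0, 0.0, 1.0],
}
# Hypothetical embedding of the user's query.
query = [1.0, 0.1, 0.0]

# Rank stored content from most to least similar to the query.
ranked = sorted(
    stored,
    key=lambda name: cosine_similarity(query, stored[name]),
    reverse=True,
)
print(ranked)
```

Cosine similarity compares the direction of two vectors and ignores their length, so a score near 1 means "pointing the same way" (similar content) and a score near 0 means unrelated.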
Indexing Your Models/Pages
To index your models:
- Add Wagtail Vector Index's VectorIndexedMixin mixin to your model
- Set embedding_fields to a list of EmbeddingFields representing the fields you want to be included in the embeddings
from django.db import models
from wagtail.models import Page
from wagtail_vector_index.storage.django import VectorIndexedMixin, EmbeddingField


class MyPage(VectorIndexedMixin, Page):
    body = models.TextField()
    embedding_fields = [EmbeddingField("title"), EmbeddingField("body")]
A VectorIndex will be generated for your model which can be accessed using the vector_index class property, e.g.:
index = MyPage.vector_index
If you want more control over how content is indexed, you can instead create your own indexes. See Vector Indexes for more details.
Now you can index your content. See Using Indexes for how to make these indexes work for you.
Updating indexes
To update all indexes, run the update_vector_indexes management command:
python manage.py update_vector_indexes
To skip the prompt, use the --noinput flag.