Cloudflare Vectorize Setup

This guide walks you through configuring Cloudflare Vectorize as a vector database destination for Sync or Swim. Vectorize enables semantic search by storing vector embeddings of your synchronized data.

Overview

Cloudflare Vectorize is a vector database that works alongside Cloudflare Workers AI for embedding generation. With Sync or Swim, you can:

- Sync data from a source (PostgreSQL, MySQL, Salesforce, etc.)
- Automatically generate embeddings using Cloudflare Workers AI
- Store vectors with metadata in Vectorize for semantic search

Prerequisites

- Cloudflare Account with Workers paid plan ($5/month minimum)
- Existing data source configured in Sync or Swim (PostgreSQL, MySQL, Salesforce, etc.)
- Wrangler CLI installed (for creating the Vectorize index)

Setup Steps

Install Wrangler CLI

Wrangler is Cloudflare’s CLI tool for managing Workers and Vectorize.

```
# Using npm
npm install -g wrangler

# Using yarn
yarn global add wrangler

# Verify installation
wrangler --version
```

Authenticate with Cloudflare

```
# Login to Cloudflare
wrangler login

# This will open a browser window for authentication
# After authorizing, you'll see "Successfully logged in"
```

Find Your Account ID

You’ll need your Cloudflare Account ID for configuration.

Via Wrangler:
```
wrangler whoami
```
Via Dashboard:
1. Log in to the Cloudflare Dashboard
2. Select any domain or go to Workers & Pages
3. Your Account ID is displayed in the right sidebar
Save this value - you’ll need it for Sync or Swim configuration.

Create a Vectorize Index

A Vectorize index stores your vectors. The index dimensions must match the output dimensions of your embedding model, and they cannot be changed after the index is created.

| Model | Dimensions | Use Case |
| --- | --- | --- |
| `@cf/baai/bge-small-en-v1.5` | 384 | Faster, lower resource usage |
| `@cf/baai/bge-base-en-v1.5` | 768 | Recommended - Good balance |
| `@cf/baai/bge-large-en-v1.5` | 1024 | Highest quality, more resources |

```
# For bge-base (768 dimensions) - Recommended
wrangler vectorize create my-sync-index --dimensions=768 --metric=cosine

# For bge-small (384 dimensions)
wrangler vectorize create my-sync-index --dimensions=384 --metric=cosine

# For bge-large (1024 dimensions)
wrangler vectorize create my-sync-index --dimensions=1024 --metric=cosine
```

Verify index creation:

```
wrangler vectorize list
```
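
Because a dimension mismatch only surfaces once vectors are inserted, it can help to derive the `--dimensions` value from the model name instead of typing it by hand. The following is a minimal sketch using the table above; `MODEL_DIMENSIONS` and `createIndexCommand` are hypothetical names for illustration and are not part of Wrangler or Sync or Swim.

```javascript
// Output dimensions of the Workers AI BGE models listed in the table above.
const MODEL_DIMENSIONS = new Map([
  ['@cf/baai/bge-small-en-v1.5', 384],
  ['@cf/baai/bge-base-en-v1.5', 768],
  ['@cf/baai/bge-large-en-v1.5', 1024],
]);

// Hypothetical helper: build the `wrangler vectorize create` command
// for an index name and embedding model.
function createIndexCommand(indexName, model) {
  const dimensions = MODEL_DIMENSIONS.get(model);
  if (dimensions === undefined) {
    throw new Error(`Unknown embedding model: ${model}`);
  }
  return `wrangler vectorize create ${indexName} --dimensions=${dimensions} --metric=cosine`;
}

console.log(createIndexCommand('my-sync-index', '@cf/baai/bge-base-en-v1.5'));
// → wrangler vectorize create my-sync-index --dimensions=768 --metric=cosine
```

Throwing on an unknown model keeps a typo in the model name from silently producing an index with the wrong size.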

Create an API Token

1. Go to Cloudflare API Tokens
2. Click Create Token > Create Custom Token
3. Configure the token:
  - Token name: Sync or Swim Vectorize
  - Permissions: Account - Workers AI - Edit, Account - Vectorize - Edit
  - Account Resources: Include - Your Account
4. Click Continue to summary > Create Token
5. Copy and save the token immediately - it won’t be shown again
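
Before entering the token in Sync or Swim, you may want to confirm it was copied intact. The sketch below calls Cloudflare's token verification endpoint for user API tokens; `buildVerifyRequest` and `isTokenActive` are hypothetical helper names, and the check only confirms the token is active, not that it carries the Workers AI and Vectorize permissions.

```javascript
// Build the request for Cloudflare's token verification endpoint.
function buildVerifyRequest(apiToken) {
  return {
    url: 'https://api.cloudflare.com/client/v4/user/tokens/verify',
    options: {
      method: 'GET',
      headers: { Authorization: `Bearer ${apiToken}` },
    },
  };
}

// Resolves to true when Cloudflare reports the token as active.
// Requires a runtime with a global fetch (for example Node.js 18+).
async function isTokenActive(apiToken) {
  const { url, options } = buildVerifyRequest(apiToken);
  const response = await fetch(url, options);
  const body = await response.json();
  return body.success === true && body.result?.status === 'active';
}

// Usage (reads the token from the environment instead of hard-coding it):
// isTokenActive(process.env.CLOUDFLARE_API_TOKEN).then(console.log);
```
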

Configure Sync or Swim

1. Navigate to /settings in the Sync or Swim web interface
2. Click “Add Service”
3. Select “Cloudflare Vectorize” as the adapter type
4. Enter your connection details:
  - Account ID: Your Cloudflare account ID
  - API Token: The API token you created
  - Index Name: Name of your Vectorize index
  - Embedding Model: Select the model that matches your index dimensions
5. Click “Test Connection” to verify
6. Click “Create Service” to save

Create Object Mapping

Create an object mapping that syncs data from your source to Vectorize.

1. Navigate to the Mapping Editor
2. Click Create New Mapping
3. Select your source service (e.g., PostgreSQL)
4. Select Vectorize as the destination service
5. Choose the source object to sync

Configure Embeddings

When Vectorize is the destination, an Embeddings tab appears in the mapping editor:

1. Select Embedding Fields: Choose which text fields to combine for the embedding
  - Only text/string fields are available
  - Order matters - fields are combined in the order shown
2. Embedding Template (Optional): Customize how fields are combined
```
Title: {{title}}

Description: {{description}}

Content: {{body}}
```
  Leave empty to join fields with newlines.
3. Select Metadata Fields: Choose fields to store alongside the vector
  - Metadata enables filtering in semantic search
  - Common choices: ID fields, timestamps, categories, titles
4. Vector ID Template: How to generate the vector ID
  - Default: `{{external_id}}`
  - Available variables: `{{external_id}}`, `{{source_object}}`
5. Click Save Embedding Config
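
To make the template behavior concrete, here is a minimal sketch of how `{{field}}` placeholders and the newline fallback could work. It is an illustration under stated assumptions, not Sync or Swim's actual implementation: `renderTemplate` and `buildEmbeddingText` are hypothetical names, and treating missing values as empty strings is an assumption.

```javascript
// Replace each {{name}} placeholder with the matching value.
// Assumption for this sketch: missing values become empty strings.
function renderTemplate(template, values) {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (_, name) =>
    values[name] == null ? '' : String(values[name])
  );
}

// Build the text to embed: use the template when one is given, otherwise
// join the selected fields with newlines in the order they were selected.
function buildEmbeddingText(record, fields, template) {
  const tpl = template ?? '';
  if (tpl.trim() !== '') {
    return renderTemplate(tpl, record);
  }
  return fields.map((field) => String(record[field] ?? '')).join('\n');
}

const record = { external_id: '42', title: 'Hello', body: 'World' };

// No template: fields are joined with a newline, in order.
console.log(buildEmbeddingText(record, ['title', 'body']));

// With a template: placeholders are substituted.
console.log(buildEmbeddingText(record, ['title'], 'Title: {{title}}'));

// The default vector ID template resolves to the record's external ID.
console.log(renderTemplate('{{external_id}}', record));
```

Since fields are combined in the order selected, reordering them changes the text that is embedded and therefore the resulting vector.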

Querying Vectorize

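After syncing, query your vectors using the Vectorize API or a Cloudflare Worker. The Worker below assumes two bindings are configured for it: a Workers AI binding named `AI` and a Vectorize binding named `VECTORIZE_INDEX`.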

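```
export default {
  async fetch(request, env) {
    // Expects a JSON body such as { "query": "search text" }
    const { query } = await request.json();
    if (typeof query !== 'string' || query.trim() === '') {
      return Response.json({ error: 'Missing "query" string' }, { status: 400 });
    }

    // Generate an embedding for the query. Use the same model that
    // produced the stored vectors so the dimensions match the index.
    const embedding = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
      text: [query]
    });

    // Search Vectorize for the 10 nearest vectors, including metadata
    const results = await env.VECTORIZE_INDEX.query(embedding.data[0], {
      topK: 10,
      returnMetadata: true
    });

    return Response.json(results);
  }
};
```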
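
For the Vectorize API route mentioned above, the same two steps (embed the query, then search) can be done over Cloudflare's REST API. The sketch below is an assumption-laden illustration: the helper names are hypothetical, and the `/vectorize/v2/indexes/.../query` path and the `result` response envelope reflect my understanding of the current API, so check them against the API reference before relying on them.

```javascript
const API_BASE = 'https://api.cloudflare.com/client/v4';

// Shared headers for authenticated JSON requests.
function authHeaders(apiToken) {
  return {
    Authorization: `Bearer ${apiToken}`,
    'Content-Type': 'application/json',
  };
}

// Request that asks Workers AI to embed the query text.
function buildEmbeddingRequest(accountId, apiToken, model, text) {
  return {
    url: `${API_BASE}/accounts/${accountId}/ai/run/${model}`,
    options: {
      method: 'POST',
      headers: authHeaders(apiToken),
      body: JSON.stringify({ text: [text] }),
    },
  };
}

// Request that searches an index with an embedding vector.
// Assumption: the v2 index path shown here.
function buildQueryRequest(accountId, apiToken, indexName, vector, topK = 10) {
  return {
    url: `${API_BASE}/accounts/${accountId}/vectorize/v2/indexes/${indexName}/query`,
    options: {
      method: 'POST',
      headers: authHeaders(apiToken),
      body: JSON.stringify({ vector, topK, returnMetadata: 'all' }),
    },
  };
}

// Usage sketch (needs a runtime with global fetch, such as Node.js 18+).
// Assumption: both responses wrap their payload in a `result` field.
async function search(accountId, apiToken, indexName, text) {
  const model = '@cf/baai/bge-base-en-v1.5';
  const embed = buildEmbeddingRequest(accountId, apiToken, model, text);
  const embedded = await (await fetch(embed.url, embed.options)).json();
  const vector = embedded.result.data[0];
  const query = buildQueryRequest(accountId, apiToken, indexName, vector);
  return (await (await fetch(query.url, query.options)).json()).result;
}
```

Keeping request construction separate from `fetch` makes the URLs and bodies easy to inspect or log when a call fails.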

Filtering with Metadata

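Vectorize supports filtering query results by metadata. On current Vectorize indexes, a property can be used in a filter only if a metadata index exists for it (created with `wrangler vectorize create-metadata-index`), and only vectors inserted after that metadata index was created are covered by it.

```
// Inside the Worker's fetch handler, after generating `embedding`:
// return only vectors whose author_id metadata equals 123
const results = await env.VECTORIZE_INDEX.query(embedding.data[0], {
  topK: 10,
  returnMetadata: true,
  filter: {
    author_id: 123
  }
});
```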
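
When filter criteria come from optional request parameters, it helps to drop the ones that were not supplied before querying. The following is a small sketch; `buildFilter` is a hypothetical helper, and `category` and `status` are made-up metadata fields used only for illustration. The `$ne` ("not equal") operator shown is one of the comparison operators Vectorize accepts in filters.

```javascript
// Hypothetical helper: build a Vectorize metadata filter from optional
// criteria, skipping any that were not provided.
function buildFilter(criteria) {
  const filter = {};
  for (const [key, value] of Object.entries(criteria)) {
    if (value !== undefined && value !== null) {
      filter[key] = value;
    }
  }
  return filter;
}

const queryOptions = {
  topK: 10,
  returnMetadata: true,
  filter: buildFilter({
    author_id: 123,
    category: undefined,        // not supplied, so it is omitted
    status: { $ne: 'draft' },   // exclude vectors whose status is "draft"
  }),
};

console.log(JSON.stringify(queryOptions.filter));
// → {"author_id":123,"status":{"$ne":"draft"}}
```

The resulting `queryOptions` object can be passed as the second argument to `env.VECTORIZE_INDEX.query(...)`.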

Embedding Models Reference

BGE Model Comparison

| Model | Dimensions | Speed | Quality | Best For |
| --- | --- | --- | --- | --- |
| `bge-small` | 384 | Fastest | Good | High-volume, cost-sensitive |
| `bge-base` | 768 | Balanced | Better | General use (recommended) |
| `bge-large` | 1024 | Slower | Best | Quality-critical applications |

Changing Models

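Warning: Changing the embedding model requires recreating the index. An index's dimensions are fixed at creation, and vectors produced by different models are not comparable with each other, so existing vectors cannot be reused.

1. Create a new index with the correct dimensions
2. Update the Sync or Swim configuration
3. Re-sync all data to generate new embeddings
4. Delete the old index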

Rate Limits and Quotas

Workers AI (Embeddings)

| Plan | Requests/Day | Notes |
| --- | --- | --- |
| Free | 10,000 | Limited for testing |
| Workers Paid | 100,000+ | Scales with usage |

Vectorize

| Plan | Vectors | Queries/Month |
| --- | --- | --- |
| Free | 5,000 | 30,000 |
| Workers Paid | 5M+ | Scales with usage |

Sync or Swim implements automatic rate limit handling with exponential backoff.
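
For readers building their own clients against the same limits, exponential backoff can be sketched as follows. This is a generic illustration, not Sync or Swim's internal implementation: the function names, the 500 ms base delay, the 30 second cap, and the retry count are all assumptions.

```javascript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
function backoffDelayMs(attempt, baseMs = 500, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Call `request` (a function returning a promise of a fetch-style Response)
// and retry on HTTP 429 until `maxRetries` retries have been used.
async function withBackoff(request, maxRetries = 5) {
  for (let attempt = 0; ; attempt++) {
    const response = await request();
    if (response.status !== 429 || attempt >= maxRetries) {
      return response;
    }
    await sleep(backoffDelayMs(attempt));
  }
}

console.log([0, 1, 2, 3].map((n) => backoffDelayMs(n)));
// → [ 500, 1000, 2000, 4000 ]
```

Production clients commonly add random jitter to each delay so that many workers retrying at once do not hit the API in lockstep.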

Troubleshooting

“Invalid API Token” Error

Symptoms: Connection validation fails with authentication error.

Solutions:

- Verify the API token was copied correctly (no extra spaces)
- Check the token has both Workers AI and Vectorize permissions
- Regenerate the token if necessary

“Index Not Found” Error

Symptoms: Sync fails with index not found.

Solutions:

- Verify the index_name matches exactly (case-sensitive)
- Run `wrangler vectorize list` to confirm the index exists
- Check you’re using the correct account ID

“Dimension Mismatch” Error

Symptoms: Vectors fail to insert with dimension error.

Solutions:

- Verify your index dimensions match your embedding model:
  - bge-small: 384 dimensions
  - bge-base: 768 dimensions
  - bge-large: 1024 dimensions
- If mismatched, create a new index with correct dimensions

Rate Limit Errors

Symptoms: Sync slows down or fails with 429 errors.

Solutions:

- Sync or Swim automatically retries with backoff
- For high-volume syncs, consider upgrading your Cloudflare plan
- Reduce the number of objects being synced simultaneously

Empty Embeddings

Symptoms: Vectors are created but with zero or incorrect embeddings.

Solutions:

- Verify embedding fields are configured in the Embeddings tab
- Check that selected fields contain text content
- Review the embedding preview in the UI
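
To check source data for the second cause, a quick script can flag records whose selected fields hold no text. This is a minimal sketch over records you have exported yourself; `findEmptyEmbeddingInputs` and the `id` field name are hypothetical and not part of Sync or Swim.

```javascript
// Return the IDs of records whose selected embedding fields contain no
// non-whitespace text, so their embedding input would be empty.
function findEmptyEmbeddingInputs(records, fields, idField = 'id') {
  return records
    .filter((record) =>
      fields.every((field) => String(record[field] ?? '').trim() === '')
    )
    .map((record) => record[idField]);
}

const records = [
  { id: 1, title: 'Quarterly report', body: '' },
  { id: 2, title: '   ', body: null },
  { id: 3 },
];

console.log(findEmptyEmbeddingInputs(records, ['title', 'body']));
// → [ 2, 3 ]
```

Record 1 is not flagged because one of its selected fields still contains text.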

Security Best Practices

- Use dedicated API tokens: Create tokens specifically for Sync or Swim
- Limit token permissions: Only grant necessary permissions
- Rotate tokens regularly: Update tokens periodically
- Monitor usage: Review Cloudflare analytics for unusual activity
- Secure credentials: Never commit tokens to version control
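
One common way to keep tokens out of version control in your own scripts is to read them from environment variables. The sketch below uses `CLOUDFLARE_ACCOUNT_ID` and `CLOUDFLARE_API_TOKEN`, the same variable names Wrangler reads; `loadCloudflareConfig` itself is a hypothetical helper.

```javascript
// Read Cloudflare credentials from environment variables and fail fast
// when one is missing, so secrets never need to live in source files.
function loadCloudflareConfig(env = process.env) {
  const required = ['CLOUDFLARE_ACCOUNT_ID', 'CLOUDFLARE_API_TOKEN'];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  return {
    accountId: env.CLOUDFLARE_ACCOUNT_ID,
    apiToken: env.CLOUDFLARE_API_TOKEN,
  };
}

// Usage: const { accountId, apiToken } = loadCloudflareConfig();
```

The error message names the missing variables but never prints their values, which keeps secrets out of logs.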

Cost Considerations

Workers AI Pricing

Embedding generation is billed per request:

- First 10,000 requests/day: Free
- Additional requests: Based on model and usage

Vectorize Pricing

| Resource | Free Tier | Paid |
| --- | --- | --- |
| Stored vectors | 5,000 | $0.05/1M vectors |
| Queries | 30,000/month | $0.01/1,000 queries |
| Dimensions | Any | Any |

Estimating Costs

For 100,000 records synced monthly:

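- Embedding generation: ~$0-5 (depending on plan)
- Vector storage: about $0.005/month at the rate in the table above (100,000 vectors × $0.05 per 1M vectors), so the $5/month Workers Paid plan minimum is the dominant cost
- Queries: Usage-dependent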
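
The arithmetic behind such an estimate can be sketched as below. The rates and free allowances are copied from the Vectorize pricing table above; subtracting the free allowance before applying paid rates is an assumption, the function name is hypothetical, and the Workers Paid plan's base fee and Workers AI charges are not included.

```javascript
// Rates and free allowances copied from the Vectorize pricing table above.
const FREE_VECTORS = 5000;
const FREE_QUERIES = 30000;
const USD_PER_MILLION_VECTORS = 0.05;
const USD_PER_THOUSAND_QUERIES = 0.01;

// Assumption: the free allowance is subtracted before paid rates apply.
function estimateVectorizeCostUsd(storedVectors, monthlyQueries) {
  const billableVectors = Math.max(0, storedVectors - FREE_VECTORS);
  const billableQueries = Math.max(0, monthlyQueries - FREE_QUERIES);
  const storage = (billableVectors / 1_000_000) * USD_PER_MILLION_VECTORS;
  const queries = (billableQueries / 1000) * USD_PER_THOUSAND_QUERIES;
  return { storage, queries, total: storage + queries };
}

// 100,000 stored vectors and 50,000 queries in a month:
// storage = 95,000 / 1M × $0.05, queries = 20,000 / 1,000 × $0.01
const estimate = estimateVectorizeCostUsd(100000, 50000);
console.log(estimate);
```

At this volume the query charge is larger than the storage charge, so query traffic is the figure worth estimating carefully.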

Support

If you encounter issues not covered in this guide, please contact support with:

- Cloudflare account type (Free/Paid)
- Vectorize index configuration (dimensions, metric)
- Embedding model being used
- Error messages from Sync or Swim logs
- Approximate data volume being synced