
Cloudflare Vectorize Setup

This guide walks you through configuring Cloudflare Vectorize as a vector database destination for Sync or Swim. Vectorize enables semantic search by storing vector embeddings of your synchronized data.

Cloudflare Vectorize is a vector database that works alongside Cloudflare Workers AI for embedding generation. With Sync or Swim, you can:

  1. Sync data from a source (PostgreSQL, MySQL, Salesforce, etc.)
  2. Automatically generate embeddings using Cloudflare Workers AI
  3. Store vectors with metadata in Vectorize for semantic search

Prerequisites

  • Cloudflare Account with Workers paid plan ($5/month minimum)
  • Existing data source configured in Sync or Swim (PostgreSQL, MySQL, Salesforce, etc.)
  • Wrangler CLI installed (for creating the Vectorize index)

  1. Install Wrangler CLI

    Wrangler is Cloudflare’s CLI tool for managing Workers and Vectorize.

    # Using npm
    npm install -g wrangler
    # Using yarn
    yarn global add wrangler
    # Verify installation
    wrangler --version
  2. Authenticate with Cloudflare

    # Login to Cloudflare
    wrangler login
    # This will open a browser window for authentication
    # After authorizing, you'll see "Successfully logged in"
  3. Find Your Account ID

    You’ll need your Cloudflare Account ID for configuration.

    Via Wrangler:

    wrangler whoami

    Via Dashboard:

    1. Log in to the Cloudflare Dashboard
    2. Select any domain or go to Workers & Pages
    3. Your Account ID is displayed in the right sidebar

    Save this value - you’ll need it for Sync or Swim configuration.

  4. Create a Vectorize Index

    A Vectorize index stores your vectors. The index's dimensions must equal the output dimensions of your embedding model and cannot be changed after the index is created.

    | Model | Dimensions | Use Case |
    |---|---|---|
    | @cf/baai/bge-small-en-v1.5 | 384 | Faster, lower resource usage |
    | @cf/baai/bge-base-en-v1.5 | 768 | Recommended - Good balance |
    | @cf/baai/bge-large-en-v1.5 | 1024 | Highest quality, more resources |

    # Run only the one command that matches your embedding model.
    # For bge-base (768 dimensions) - Recommended
    wrangler vectorize create my-sync-index --dimensions=768 --metric=cosine
    # For bge-small (384 dimensions)
    wrangler vectorize create my-sync-index --dimensions=384 --metric=cosine
    # For bge-large (1024 dimensions)
    wrangler vectorize create my-sync-index --dimensions=1024 --metric=cosine

    Verify index creation:

    wrangler vectorize list
  5. Create an API Token

    1. Go to Cloudflare API Tokens
    2. Click Create Token > Create Custom Token
    3. Configure the token:
      • Token name: Sync or Swim Vectorize
      • Permissions: Account - Workers AI - Edit, Account - Vectorize - Edit
      • Account Resources: Include - Your Account
    4. Click Continue to summary > Create Token
    5. Copy and save the token immediately - it won’t be shown again
  6. Configure Sync or Swim

    1. Navigate to /settings in the Sync or Swim web interface
    2. Click “Add Service”
    3. Select “Cloudflare Vectorize” as the adapter type
    4. Enter your connection details:
      • Account ID: Your Cloudflare account ID
      • API Token: The API token you created
      • Index Name: Name of your Vectorize index
      • Embedding Model: Select the model that matches your index dimensions
    5. Click “Test Connection” to verify
    6. Click “Create Service” to save
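The most common setup mistake is an index whose dimensions do not match the selected model. The sketch below shows one way to check this yourself before saving the service. It is not how Sync or Swim's "Test Connection" is implemented; the function name `checkIndexMatchesModel` is hypothetical, and the endpoint path and response shape are assumptions based on Cloudflare's REST API for Vectorize (v2), so confirm them against the current API reference.

```javascript
// Output dimensions for each embedding model, taken from the table in this guide.
const MODEL_DIMENSIONS = {
  '@cf/baai/bge-small-en-v1.5': 384,
  '@cf/baai/bge-base-en-v1.5': 768,
  '@cf/baai/bge-large-en-v1.5': 1024,
};

// Hypothetical helper: compares an index's dimensions with the chosen model.
// `fetchImpl` is injectable so the check can be exercised without network access.
async function checkIndexMatchesModel({ accountId, apiToken, indexName, model }, fetchImpl = fetch) {
  const expected = MODEL_DIMENSIONS[model];
  if (expected === undefined) {
    throw new Error(`Unknown embedding model: ${model}`);
  }

  // Assumed endpoint: Cloudflare REST API, Vectorize v2 "get index".
  const url = `https://api.cloudflare.com/client/v4/accounts/${accountId}` +
    `/vectorize/v2/indexes/${encodeURIComponent(indexName)}`;
  const response = await fetchImpl(url, {
    headers: { Authorization: `Bearer ${apiToken}` },
  });
  if (!response.ok) {
    throw new Error(`Index lookup failed with HTTP ${response.status}`);
  }

  // Assumed response shape: { result: { config: { dimensions, metric } } }.
  const body = await response.json();
  const actual = body.result.config.dimensions;
  return { ok: actual === expected, expected, actual };
}
```

A result with `ok: false` means the index has to be recreated with the `expected` dimensions, or a different model selected.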

Create an object mapping that syncs data from your source to Vectorize.

  1. Navigate to the Mapping Editor
  2. Click Create New Mapping
  3. Select your source service (e.g., PostgreSQL)
  4. Select Vectorize as the destination service
  5. Choose the source object to sync

When Vectorize is the destination, an Embeddings tab appears in the mapping editor:

  1. Select Embedding Fields: Choose which text fields to combine for the embedding

    • Only text/string fields are available
    • Order matters - fields are combined in the order shown
  2. Embedding Template (Optional): Customize how fields are combined

    Title: {{title}}
    Description: {{description}}
    Content: {{body}}

    Leave empty to join fields with newlines.

  3. Select Metadata Fields: Choose fields to store alongside the vector

    • Metadata enables filtering in semantic search
    • Common choices: ID fields, timestamps, categories, titles
  4. Vector ID Template: How to generate the vector ID

    • Default: {{external_id}}
    • Available variables: {{external_id}}, {{source_object}}
  5. Click Save Embedding Config
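To make the embedding settings above concrete, here is a minimal sketch of how the embedding text, vector ID, and metadata could be assembled for one record. It illustrates the described behavior only and is not Sync or Swim's actual code: the function names are hypothetical, and skipping empty or non-string fields and blanking unknown placeholders are assumptions.

```javascript
// Replace each {{name}} placeholder with the matching value.
// Assumption: unknown or null values render as an empty string.
function renderTemplate(template, values) {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (_, name) => {
    const value = values[name];
    return value === undefined || value === null ? '' : String(value);
  });
}

// Build the text that gets embedded. With no template, the selected fields
// are joined with newlines in the order given.
function buildEmbeddingText(record, fields, template = '') {
  if (template.trim() !== '') {
    return renderTemplate(template, record);
  }
  return fields
    .map((field) => record[field])
    .filter((value) => typeof value === 'string' && value !== '') // assumption: skip empty/non-text
    .join('\n');
}

// Assemble one vector in the { id, values, metadata } shape that Vectorize stores.
// `embedding` is the numeric array returned by the embedding model.
function buildVector({ record, externalId, sourceObject, embedding, metadataFields, idTemplate = '{{external_id}}' }) {
  const id = renderTemplate(idTemplate, { external_id: externalId, source_object: sourceObject });
  const metadata = {};
  for (const field of metadataFields) {
    if (record[field] !== undefined) metadata[field] = record[field];
  }
  return { id, values: embedding, metadata };
}
```

With a record `{ title: 'Hello', body: 'Text' }`, selected fields `['title', 'body']`, and no template, `buildEmbeddingText` returns `'Hello\nText'`. An ID template such as `{{source_object}}:{{external_id}}` keeps IDs unique when several source objects share one index.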

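After syncing, query your vectors using the Vectorize API or a Cloudflare Worker. The Worker example below assumes a Workers AI binding named AI and a Vectorize binding named VECTORIZE_INDEX in the Worker's Wrangler configuration, and it must use the same embedding model that was used for the sync.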

export default {
  async fetch(request, env) {
    const { query } = await request.json();

    // Generate embedding for the query
    const embedding = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
      text: [query]
    });

    // Search Vectorize
    const results = await env.VECTORIZE_INDEX.query(embedding.data[0], {
      topK: 10,
      returnMetadata: true
    });

    return Response.json(results);
  }
};

Filtering with Metadata

Vectorize supports filtering by metadata:

const results = await env.VECTORIZE_INDEX.query(embedding.data[0], {
  topK: 10,
  returnMetadata: true,
  filter: {
    author_id: 123
  }
});
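The same search can also be run outside a Worker through Cloudflare's REST API. The following is a rough sketch under stated assumptions: the helper names `cfPost` and `semanticSearch` are hypothetical, and the endpoint paths, request bodies, and response shapes reflect the Workers AI and Vectorize v2 REST endpoints as I understand them, so verify them against the current API reference before relying on this.

```javascript
const API_BASE = 'https://api.cloudflare.com/client/v4';

// POST a JSON body to the Cloudflare REST API and return the `result` field.
async function cfPost(path, apiToken, payload, fetchImpl) {
  const response = await fetchImpl(`${API_BASE}${path}`, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiToken}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(payload),
  });
  if (!response.ok) {
    throw new Error(`Cloudflare API returned HTTP ${response.status}`);
  }
  return (await response.json()).result;
}

// Embed the query text, then search the index, optionally with a metadata filter.
// `fetchImpl` is injectable so the flow can be tested without network access.
async function semanticSearch({ accountId, apiToken, indexName, model }, queryText, options = {}, fetchImpl = fetch) {
  // Step 1: embed the query with the same model used during the sync.
  const embedding = await cfPost(
    `/accounts/${accountId}/ai/run/${model}`,
    apiToken,
    { text: [queryText] },
    fetchImpl,
  );

  // Step 2: query Vectorize with the resulting vector.
  const payload = {
    vector: embedding.data[0],
    topK: options.topK ?? 10,
    returnMetadata: 'all',
  };
  if (options.filter) payload.filter = options.filter;
  return cfPost(
    `/accounts/${accountId}/vectorize/v2/indexes/${indexName}/query`,
    apiToken,
    payload,
    fetchImpl,
  );
}
```

Calling `semanticSearch(config, 'refund policy', { filter: { author_id: 123 } })` mirrors the filtered Worker query above.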

Embedding Models Reference

| Model | Dimensions | Speed | Quality | Best For |
|---|---|---|---|---|
| bge-small | 384 | Fastest | Good | High-volume, cost-sensitive |
| bge-base | 768 | Balanced | Better | General use (recommended) |
| bge-large | 1024 | Slower | Best | Quality-critical applications |

Warning: Changing the embedding model requires recreating the index, because an index's dimensions are fixed and must match the model. To switch models:

  1. Create a new index with the correct dimensions
  2. Update the Sync or Swim configuration
  3. Re-sync all data to generate new embeddings
  4. Delete the old index
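
Rate Limits and Quotas

Workers AI (embedding generation):

| Plan | Requests/Day | Notes |
|---|---|---|
| Free | 10,000 | Limited for testing |
| Workers Paid | 100,000+ | Scales with usage |

Vectorize (vector storage and queries):

| Plan | Vectors | Queries/Month |
|---|---|---|
| Free | 5,000 | 30,000 |
| Workers Paid | 5M+ | Scales with usage |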

Sync or Swim implements automatic rate limit handling with exponential backoff.
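For readers calling Cloudflare directly, exponential backoff can be sketched as follows. This is a generic illustration of the technique, not Sync or Swim's implementation: the function names, the 500 ms base delay, the 30 s cap, and the retry count are all assumed values.

```javascript
// Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt, baseMs = 500, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Run `operation` and retry while it returns HTTP 429, waiting longer each time.
// `sleep` is injectable so the behavior can be tested without real delays.
async function withBackoff(operation, { maxRetries = 5, baseMs = 500, maxMs = 30000, sleep } = {}) {
  const wait = sleep ?? ((ms) => new Promise((resolve) => setTimeout(resolve, ms)));
  for (let attempt = 0; ; attempt += 1) {
    const response = await operation();
    // Return on any non-429 response, or once the retry budget is used up.
    if (response.status !== 429 || attempt >= maxRetries) {
      return response;
    }
    await wait(backoffDelayMs(attempt, baseMs, maxMs));
  }
}
```

With the defaults, the waits are 500 ms, 1 s, 2 s, 4 s, and 8 s. Production implementations often add random jitter to each delay so that many clients do not retry at the same moment.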

Troubleshooting

Symptoms: Connection validation fails with authentication error.

Solutions:

  • Verify the API token was copied correctly (no extra spaces)
  • Check the token has both Workers AI and Vectorize permissions
  • Regenerate the token if necessary

Symptoms: Sync fails with index not found.

Solutions:

  • Verify the index_name matches exactly (case-sensitive)
  • Run wrangler vectorize list to confirm the index exists
  • Check you’re using the correct account ID

Symptoms: Vectors fail to insert with dimension error.

Solutions:

  • Verify your index dimensions match your embedding model:
    • bge-small: 384 dimensions
    • bge-base: 768 dimensions
    • bge-large: 1024 dimensions
  • If mismatched, create a new index with correct dimensions

Symptoms: Sync slows down or fails with 429 errors.

Solutions:

  • Sync or Swim automatically retries with backoff
  • For high-volume syncs, consider upgrading your Cloudflare plan
  • Reduce the number of objects being synced simultaneously

Symptoms: Vectors are created but with zero or incorrect embeddings.

Solutions:

  • Verify embedding fields are configured in the Embeddings tab
  • Check that selected fields contain text content
  • Review the embedding preview in the UI

Security Best Practices

  1. Use dedicated API tokens: Create tokens specifically for Sync or Swim
  2. Limit token permissions: Only grant necessary permissions
  3. Rotate tokens regularly: Update tokens periodically
  4. Monitor usage: Review Cloudflare analytics for unusual activity
  5. Secure credentials: Never commit tokens to version control
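If you write your own scripts against the Cloudflare API, one way to keep tokens out of version control is to read them from environment variables. The sketch below assumes the variable names `CLOUDFLARE_ACCOUNT_ID` and `CLOUDFLARE_API_TOKEN` (the names Wrangler reads); the helper names are hypothetical and unrelated to how Sync or Swim stores credentials.

```javascript
// Read Cloudflare credentials from environment variables instead of source files.
// Values are trimmed because stray whitespace from copy/paste causes auth failures.
function loadCloudflareCredentials(env = process.env) {
  const accountId = (env.CLOUDFLARE_ACCOUNT_ID ?? '').trim();
  const apiToken = (env.CLOUDFLARE_API_TOKEN ?? '').trim();

  const missing = [];
  if (accountId === '') missing.push('CLOUDFLARE_ACCOUNT_ID');
  if (apiToken === '') missing.push('CLOUDFLARE_API_TOKEN');
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  return { accountId, apiToken };
}

// Mask all but the last four characters of a token before writing it to logs.
function redactToken(token) {
  return token.length <= 4 ? '****' : `${'*'.repeat(token.length - 4)}${token.slice(-4)}`;
}
```

Logging `redactToken(apiToken)` instead of the raw value keeps tokens out of log files while still letting you tell tokens apart.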

Cost Considerations

Embedding generation is billed per request:

  • First 10,000 requests/day: Free
  • Additional requests: Based on model and usage
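
Vectorize storage and queries are billed as follows:

| Resource | Free Tier | Paid |
|---|---|---|
| Stored vectors | 5,000 | $0.05/1M vectors |
| Queries | 30,000/month | $0.01/1,000 queries |
| Dimensions | Any | Any |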

For 100,000 records synced monthly:

  • Embedding generation: ~$0-5 (depending on plan)
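  • Vector storage: ~$5/month including the Workers Paid plan minimum (at the rate above, 100,000 stored vectors alone cost about $0.005)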
  • Queries: Usage-dependent
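The estimate above can be reproduced with a little arithmetic. This sketch uses only the rates listed in this guide's table plus the $5/month Workers Paid minimum from the prerequisites. It assumes no free allowance is subtracted and leaves out embedding generation, for which this guide gives no per-request rate. The function name and the query volume in the example are hypothetical, and the rates should be checked against Cloudflare's current pricing.

```javascript
// Rates copied from the table in this guide (USD); treat them as assumptions.
const RATES = {
  storagePerMillionVectors: 0.05, // per 1M stored vectors
  queriesPerThousand: 0.01,       // per 1,000 queries
  planMinimum: 5,                 // Workers Paid plan, per month
};

// Estimate monthly Vectorize charges for a given vector count and query volume.
function estimateMonthlyCost({ storedVectors, queriesPerMonth }, rates = RATES) {
  const storage = (storedVectors / 1e6) * rates.storagePerMillionVectors;
  const queries = (queriesPerMonth / 1e3) * rates.queriesPerThousand;
  return { storage, queries, total: rates.planMinimum + storage + queries };
}
```

For 100,000 stored vectors and a hypothetical 50,000 queries per month, usage-based charges come to roughly $0.005 for storage and $0.50 for queries, so the plan minimum dominates the bill at this scale.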

If you encounter issues not covered in this guide, please contact support with:

  • Cloudflare account type (Free/Paid)
  • Vectorize index configuration (dimensions, metric)
  • Embedding model being used
  • Error messages from Sync or Swim logs
  • Approximate data volume being synced