Cloudflare Vectorize Setup
This guide walks you through configuring Cloudflare Vectorize as a vector database destination for Sync or Swim. Vectorize enables semantic search by storing vector embeddings of your synchronized data.
Overview
Cloudflare Vectorize is a vector database that works alongside Cloudflare Workers AI for embedding generation. With Sync or Swim, you can:
- Sync data from a source (PostgreSQL, MySQL, Salesforce, etc.)
- Automatically generate embeddings using Cloudflare Workers AI
- Store vectors with metadata in Vectorize for semantic search
Prerequisites
- Cloudflare account with the Workers Paid plan ($5/month minimum)
- Existing data source configured in Sync or Swim (PostgreSQL, MySQL, Salesforce, etc.)
- Wrangler CLI installed (for creating the Vectorize index)
Setup Steps
Install Wrangler CLI
Wrangler is Cloudflare’s CLI tool for managing Workers and Vectorize.

    # Using npm
    npm install -g wrangler
    # Using yarn
    yarn global add wrangler
    # Verify installation
    wrangler --version

Authenticate with Cloudflare

    # Log in to Cloudflare
    wrangler login
    # This opens a browser window for authentication
    # After authorizing, you'll see "Successfully logged in"

Find Your Account ID
You’ll need your Cloudflare Account ID for configuration.
Via Wrangler:

    wrangler whoami

Via Dashboard:
- Log in to the Cloudflare Dashboard
- Select any domain or go to Workers & Pages
- Your Account ID is displayed in the right sidebar
Save this value; you’ll need it for the Sync or Swim configuration.
Create a Vectorize Index
A Vectorize index stores your vectors. The index dimensions must match your embedding model.
| Model | Dimensions | Use Case |
|---|---|---|
| @cf/baai/bge-small-en-v1.5 | 384 | Faster, lower resource usage |
| @cf/baai/bge-base-en-v1.5 | 768 | Recommended - Good balance |
| @cf/baai/bge-large-en-v1.5 | 1024 | Highest quality, more resources |

    # For bge-base (768 dimensions) - Recommended
    wrangler vectorize create my-sync-index --dimensions=768 --metric=cosine
    # For bge-small (384 dimensions)
    wrangler vectorize create my-sync-index --dimensions=384 --metric=cosine
    # For bge-large (1024 dimensions)
    wrangler vectorize create my-sync-index --dimensions=1024 --metric=cosine

Verify index creation:

    wrangler vectorize list

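Because a mismatch between model and index dimensions only surfaces later as an insert error, it can help to check the pairing up front. The following is a minimal sketch, not part of Sync or Swim: `MODEL_DIMENSIONS` simply restates the table above, and `checkIndexDimensions` is a hypothetical helper name.

```javascript
// Dimensions per embedding model, restated from the table above.
const MODEL_DIMENSIONS = {
  '@cf/baai/bge-small-en-v1.5': 384,
  '@cf/baai/bge-base-en-v1.5': 768,
  '@cf/baai/bge-large-en-v1.5': 1024,
};

// Hypothetical helper: returns null when the model and index are compatible,
// otherwise a human-readable description of the problem.
function checkIndexDimensions(model, indexDimensions) {
  const expected = MODEL_DIMENSIONS[model];
  if (expected === undefined) {
    return `Unknown model: ${model}`;
  }
  if (expected !== indexDimensions) {
    return `${model} produces ${expected}-dimensional vectors, but the index has ${indexDimensions}`;
  }
  return null;
}
```

For example, `checkIndexDimensions('@cf/baai/bge-small-en-v1.5', 768)` returns a mismatch message, which signals that the index should be recreated with `--dimensions=384`.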
Create an API Token
- Go to Cloudflare API Tokens
- Click Create Token > Create Custom Token
- Configure the token:
  - Token name: Sync or Swim Vectorize
  - Permissions: Account - Workers AI - Edit, Account - Vectorize - Edit
  - Account Resources: Include - Your Account
- Click Continue to summary > Create Token
- Copy and save the token immediately - it won’t be shown again
Configure Sync or Swim
- Navigate to `/settings` in the Sync or Swim web interface
- Click “Add Service”
- Select “Cloudflare Vectorize” as the adapter type
- Enter your connection details:
  - Account ID: Your Cloudflare account ID
  - API Token: The API token you created
  - Index Name: Name of your Vectorize index
  - Embedding Model: Select the model that matches your index dimensions
- Click “Test Connection” to verify
- Click “Create Service” to save
Create Object Mapping
Create an object mapping that syncs data from your source to Vectorize.
- Navigate to the Mapping Editor
- Click Create New Mapping
- Select your source service (e.g., PostgreSQL)
- Select Vectorize as the destination service
- Choose the source object to sync
Configure Embeddings
When Vectorize is the destination, an Embeddings tab appears in the mapping editor:
- Select Embedding Fields: Choose which text fields to combine for the embedding
  - Only text/string fields are available
  - Order matters - fields are combined in the order shown
- Embedding Template (Optional): Customize how fields are combined

      Title: {{title}}
      Description: {{description}}
      Content: {{body}}

  Leave empty to join fields with newlines.
- Select Metadata Fields: Choose fields to store alongside the vector
  - Metadata enables filtering in semantic search
  - Common choices: ID fields, timestamps, categories, titles
- Vector ID Template: How to generate the vector ID
  - Default: `{{external_id}}`
  - Available variables: `{{external_id}}`, `{{source_object}}`
- Click Save Embedding Config
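To make the template behaviour concrete, here is a minimal sketch of how `{{field}}` placeholders could be expanded and how fields could be joined when no template is set. This is an illustration, not Sync or Swim's actual implementation: `renderTemplate` and `buildEmbeddingText` are hypothetical names, and rendering missing fields as empty strings is an assumption.

```javascript
// Replace each {{name}} placeholder with the matching record field.
// Missing or null fields render as an empty string (an assumption).
function renderTemplate(template, record) {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (_, name) => {
    const value = record[name];
    return value === undefined || value === null ? '' : String(value);
  });
}

// Build the text to embed: use the template when one is set,
// otherwise join the selected fields with newlines, in the order given.
function buildEmbeddingText(record, fields, template = '') {
  if (template.trim() !== '') {
    return renderTemplate(template, record);
  }
  return fields
    .map((name) => record[name])
    .filter((value) => typeof value === 'string' && value.trim() !== '')
    .join('\n');
}
```

The same `renderTemplate` function also covers the Vector ID Template: with the default `{{external_id}}`, a record whose `external_id` is `42` would get the vector ID `42`.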
Querying Vectorize
After syncing, query your vectors using the Vectorize API or a Cloudflare Worker.
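For the API route, the sketch below assembles the two HTTP requests involved: one to Workers AI for the query embedding, one to Vectorize for the search. The function names are hypothetical, and the endpoint paths and the `returnMetadata: 'all'` value are assumptions based on Cloudflare's REST API at the time of writing, so verify them against the current API reference before relying on them.

```javascript
const API_BASE = 'https://api.cloudflare.com/client/v4';

// Headers shared by both requests; the token is the one created during setup.
function authHeaders(apiToken) {
  return {
    Authorization: `Bearer ${apiToken}`,
    'Content-Type': 'application/json',
  };
}

// Request asking Workers AI to embed the query text (path is an assumption).
function buildEmbeddingRequest({ accountId, apiToken, model, text }) {
  return {
    url: `${API_BASE}/accounts/${accountId}/ai/run/${model}`,
    init: {
      method: 'POST',
      headers: authHeaders(apiToken),
      body: JSON.stringify({ text: [text] }),
    },
  };
}

// Request searching the index with an embedding vector (path is an assumption).
// `filter` is omitted from the JSON body when it is undefined.
function buildQueryRequest({ accountId, apiToken, indexName, vector, topK = 10, filter }) {
  return {
    url: `${API_BASE}/accounts/${accountId}/vectorize/v2/indexes/${indexName}/query`,
    init: {
      method: 'POST',
      headers: authHeaders(apiToken),
      body: JSON.stringify({ vector, topK, returnMetadata: 'all', filter }),
    },
  };
}
```

Each returned object can be passed to `fetch(url, init)`. Building the requests separately from sending them keeps the sketch easy to inspect and test without network access.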
The following Worker assumes bindings named `AI` (Workers AI) and `VECTORIZE_INDEX` (your Vectorize index) are configured for the Worker:

    export default {
      async fetch(request, env) {
        const { query } = await request.json();

        // Generate embedding for the query
        const embedding = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
          text: [query]
        });

        // Search Vectorize
        const results = await env.VECTORIZE_INDEX.query(embedding.data[0], {
          topK: 10,
          returnMetadata: true
        });

        return Response.json(results);
      }
    };

Filtering with Metadata
Vectorize supports filtering by metadata:

    const results = await env.VECTORIZE_INDEX.query(embedding.data[0], {
      topK: 10,
      returnMetadata: true,
      filter: { author_id: 123 }
    });

Embedding Models Reference
BGE Model Comparison
| Model | Dimensions | Speed | Quality | Best For |
|---|---|---|---|---|
| bge-small | 384 | Fastest | Good | High-volume, cost-sensitive |
| bge-base | 768 | Balanced | Better | General use (recommended) |
| bge-large | 1024 | Slower | Best | Quality-critical applications |
Changing Models
Warning: Changing the embedding model requires recreating the index, as dimensions must match.
- Create a new index with the correct dimensions
- Update the Sync or Swim configuration
- Re-sync all data to generate new embeddings
- Delete the old index
Rate Limits and Quotas
Workers AI (Embeddings)
| Plan | Requests/Day | Notes |
|---|---|---|
| Free | 10,000 | Limited for testing |
| Workers Paid | 100,000+ | Scales with usage |
Vectorize
| Plan | Vectors | Queries/Month |
|---|---|---|
| Free | 5,000 | 30,000 |
| Workers Paid | 5M+ | Scales with usage |
Sync or Swim implements automatic rate limit handling with exponential backoff.
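If you call Workers AI or Vectorize from your own code, the same idea can be sketched as follows. This is a generic illustration of exponential backoff, not Sync or Swim's internal logic: the function names, the 500 ms base delay, the 30 s cap, and the convention that errors carry a `status` property are all assumptions.

```javascript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at maxMs.
function backoffDelay(attempt, baseMs = 500, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Call `fn`, retrying only when it throws an error whose `status` is 429.
// After `retries` failed retries, or on any other error, the error is rethrown.
async function withBackoff(fn, { retries = 5, baseMs = 500, sleep = defaultSleep } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (err.status !== 429 || attempt >= retries) throw err;
      await sleep(backoffDelay(attempt, baseMs));
    }
  }
}
```

With the defaults, successive waits are 500 ms, 1 s, 2 s, 4 s, and 8 s. Production code often adds random jitter to each delay so that parallel workers do not retry in lockstep.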
Troubleshooting
“Invalid API Token” Error
Symptoms: Connection validation fails with an authentication error.
Solutions:
- Verify the API token was copied correctly (no extra spaces)
- Check the token has both Workers AI and Vectorize permissions
- Regenerate the token if necessary
“Index Not Found” Error
Symptoms: Sync fails with an index-not-found error.
Solutions:
- Verify the `index_name` matches exactly (case-sensitive)
- Run `wrangler vectorize list` to confirm the index exists
- Check you’re using the correct account ID
“Dimension Mismatch” Error
Symptoms: Vectors fail to insert with a dimension error.
Solutions:
- Verify your index dimensions match your embedding model:
  - bge-small: 384 dimensions
  - bge-base: 768 dimensions
  - bge-large: 1024 dimensions
- If mismatched, create a new index with the correct dimensions
Rate Limit Errors
Symptoms: Sync slows down or fails with 429 errors.
Solutions:
- Sync or Swim automatically retries with backoff
- For high-volume syncs, consider upgrading your Cloudflare plan
- Reduce the number of objects being synced simultaneously
Empty Embeddings
Symptoms: Vectors are created but with zero or incorrect embeddings.
Solutions:
- Verify embedding fields are configured in the Embeddings tab
- Check that selected fields contain text content
- Review the embedding preview in the UI
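To check the second point outside the UI, you could scan a sample of source records for rows whose selected fields carry no text. The helper below is a hypothetical sketch that assumes records are plain objects keyed by field name.

```javascript
// Return the records whose selected embedding fields contain no usable text,
// i.e. every selected field is missing, non-string, or whitespace only.
function findRecordsWithoutText(records, fields) {
  return records.filter((record) =>
    fields.every(
      (name) => typeof record[name] !== 'string' || record[name].trim() === ''
    )
  );
}
```

Any record this returns would produce an empty embedding input, so it is worth fixing at the source or excluding from the sync.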
Security Best Practices
- Use dedicated API tokens: Create tokens specifically for Sync or Swim
- Limit token permissions: Only grant necessary permissions
- Rotate tokens regularly: Update tokens periodically
- Monitor usage: Review Cloudflare analytics for unusual activity
- Secure credentials: Never commit tokens to version control
Cost Considerations
Workers AI Pricing
Embedding generation is billed per request:
- First 10,000 requests/day: Free
- Additional requests: Based on model and usage
Vectorize Pricing
| Resource | Free Tier | Paid |
|---|---|---|
| Stored vectors | 5,000 | $0.05/1M vectors |
| Queries | 30,000/month | $0.01/1,000 queries |
| Dimensions | Any | Any |
Estimating Costs
For 100,000 records synced monthly:
- Embedding generation: ~$0-5 (depending on plan)
- Vector storage: ~$5/month
- Queries: Usage-dependent
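As a rough worked example, the sketch below applies only the paid rates from the Vectorize Pricing table above. Those rates may not match Cloudflare's current pricing, and the function name is hypothetical. At these rates the per-vector charge for 100,000 vectors is well under a dollar, so the storage figure above presumably reflects mostly the $5/month Workers Paid minimum listed in the Prerequisites.

```javascript
// Paid rates copied from the Vectorize Pricing table; verify before budgeting.
const STORAGE_USD_PER_MILLION_VECTORS = 0.05;
const QUERY_USD_PER_THOUSAND = 0.01;

// Estimate monthly usage-based Vectorize charges, ignoring any free allowance
// and the plan's base fee (both are deliberately left out of this sketch).
function estimateVectorizeCost(vectors, queriesPerMonth) {
  return {
    storageUsd: (vectors / 1e6) * STORAGE_USD_PER_MILLION_VECTORS,
    queriesUsd: (queriesPerMonth / 1000) * QUERY_USD_PER_THOUSAND,
  };
}
```

For 100,000 stored vectors and 50,000 queries per month, this gives about $0.005 for storage and $0.50 for queries.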
Additional Resources
- Cloudflare Vectorize Documentation
- Cloudflare Workers AI Documentation
- Wrangler CLI Documentation
- BGE Embedding Models
Support
If you encounter issues not covered in this guide, please contact support with:
- Cloudflare account type (Free/Paid)
- Vectorize index configuration (dimensions, metric)
- Embedding model being used
- Error messages from Sync or Swim logs
- Approximate data volume being synced