Serverless scheduled data sync with incremental cursors
Automate incremental data sync between external APIs and internal databases using serverless cron pipelines: watermark-based cursor tracking, idempotent upserts, per-run execution history, and retries on failure—all without a VPS crontab or always-on worker process.
Last updated: 2026-04-20
Answer first
Direct answer
Run the sync as an Inquir scheduled pipeline: a serverless function fires on a cron schedule, reads the last successful watermark from a state store, fetches only records updated since that cursor, upserts idempotently by external ID, and stores the new watermark. Failed runs retry from the same cursor, and every run is recorded in execution history.
When it fits
- CRM, ERP, or third-party API data that needs periodic refresh in your database
- Hourly or daily delta sync where full re-fetch is too slow or quota-expensive
Tradeoffs
- Full re-sync on every run only works for small datasets; at 100k+ records it is slow, burns API quota, and trips rate limits
- Without a watermark cursor, a sync that fails at record 95k restarts at record 0 instead of resuming where it left off
Workload and what breaks
Why scheduled data sync fails silently
- VPS crontab: failed syncs go to /dev/null or root mail nobody reads
- Full re-sync every run: re-fetching all records wastes API quota, and sync time grows with the full dataset instead of the delta
- Missing idempotency: network failure mid-sync creates partial state and duplicate records on retry
- No run history: "did the sync run last night?" requires SSH and log search
Data sync jobs are among the most common sources of silent production failures. The job runs, something fails partway through, no alert fires, and data drift compounds silently until a user reports stale records days later.
Where shortcuts fail
Why stateless cron sync without a cursor fails at scale
Syncing all records on every run works for small datasets. At 100k+ records, a full sync takes too long and consumes too much API quota. External APIs rate-limit you; your sync starts failing intermittently.
Without a watermark cursor, every failed sync must restart from the beginning. A sync that fails at record 95k restarts at record 0—wasteful, slow, and likely to hit the same failure again.
How Inquir helps
Incremental sync pipelines with cursor state
Inquir scheduled pipelines fire as serverless functions with execution history. The sync handler reads the last successful watermark (stored in a database or environment variable), fetches only records updated since that cursor, upserts idempotently, and stores the new watermark.
If a run fails, it retries from the same cursor position—not from the beginning. Execution history shows every run, its watermark range, records synced, and any failures.
What you get
Scheduled data sync features
Incremental cursor tracking
Store an updatedAt timestamp or sequence cursor per sync job. Fetch only the delta since the last success: efficient API quota use and fast syncs.
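A minimal sketch of that cursor store, assuming a Postgres-style client with a query(text, params) method; the pool handle, table, and column names here are illustrative, not Inquir APIs:

// Hypothetical cursor table, one row per sync job:
// CREATE TABLE sync_cursors (job_id TEXT PRIMARY KEY, cursor TEXT NOT NULL);

export async function getCursor(pool, jobId) {
  const { rows } = await pool.query(
    'SELECT cursor FROM sync_cursors WHERE job_id = $1', [jobId]);
  return rows[0]?.cursor ?? null; // null means "first run"
}

export async function setCursor(pool, jobId, cursor) {
  await pool.query(
    `INSERT INTO sync_cursors (job_id, cursor) VALUES ($1, $2)
     ON CONFLICT (job_id) DO UPDATE SET cursor = EXCLUDED.cursor`,
    [jobId, cursor]);
}

Keying the table on job_id lets many sync jobs share one store.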
Idempotent upsert
Upsert by stable external ID. Re-running the same cursor range produces the same database state—safe to retry on failure.
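One way to implement the upsert step, sketched as parameterized SQL keyed on the external ID; the table, columns, and pool handle are assumptions for illustration:

// Upsert one batch of contacts keyed on the stable external ID.
// Re-running the same batch rewrites identical rows: no duplicates.
export async function upsertBatch(pool, contacts) {
  for (const c of contacts) {
    await pool.query(
      `INSERT INTO contacts (external_id, name, email, updated_at)
       VALUES ($1, $2, $3, $4)
       ON CONFLICT (external_id) DO UPDATE
         SET name = EXCLUDED.name,
             email = EXCLUDED.email,
             updated_at = EXCLUDED.updated_at`,
      [c.externalId, c.name, c.email, c.updatedAt]);
  }
}

Row-at-a-time keeps the sketch readable; a production version would batch with a multi-row VALUES list or COPY.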
Per-run execution history
Every scheduled run creates an execution record: cursor range, records synced, duration, success/failure. No SSH needed to answer "did it run?"
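An execution record might carry fields like these; a hypothetical shape for illustration, not Inquir's actual schema:

const execution = {
  jobId: 'crm-contacts',
  startedAt: '2026-04-20T02:00:00Z',
  finishedAt: '2026-04-20T02:01:12Z',
  cursorBefore: '2026-04-19T02:00:00Z',
  cursorAfter: '2026-04-20T01:58:41Z',
  recordsSynced: 412,
  status: 'success', // or 'failed', with an error message attached
};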
Configurable cron expressions
Every 15 minutes, hourly, daily, or custom—validated at save time. Run history tracks all runs across expression changes.
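The common schedules map to standard five-field cron expressions (minute hour day month weekday); the config object below is just an illustrative shape:

const schedules = {
  every15min: '*/15 * * * *',
  hourly:     '0 * * * *',
  nightly:    '0 3 * * *', // 03:00 every day
};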
What to do next
Incremental sync pipeline pattern
Read cursor from state store
Retrieve the last successful watermark from the database or environment. Default to a reasonable lookback window (for example, 24 hours) on the first run.
Fetch delta and upsert
Query external API for records updated since cursor. Upsert by external ID in batches. Handle pagination inside the step.
Persist new cursor
On success, store the new cursor (max updatedAt from this batch). Return stats for execution history.
Code example
Incremental sync with watermark cursor
Reads cursor from environment/DB, fetches delta, upserts idempotently, returns new cursor. Cron trigger fires on schedule; retries on failure.
export async function handler(event) {
  // Read watermark; default to 24h ago on first run
  const cursor = await db.syncCursors.get('crm-contacts')
    ?? new Date(Date.now() - 86_400_000).toISOString();

  let page = 0, synced = 0, newCursor = cursor;
  do {
    const { contacts, nextPage } = await crm.fetchContacts({ updatedAfter: cursor, page });
    if (contacts.length === 0) break;

    await db.contacts.upsertBatch(contacts); // upsert by contacts[].externalId
    synced += contacts.length;

    // Assumes the API returns contacts ordered by updatedAt ascending,
    // so the last record in the page carries the highest watermark seen
    newCursor = contacts.at(-1)?.updatedAt ?? newCursor;
    page = nextPage;
  } while (page);

  // Persist the cursor only after a fully successful run,
  // so a failed run retries from the same watermark
  await db.syncCursors.set('crm-contacts', newCursor);
  return { synced, cursor, newCursor };
}
When it fits
Use scheduled data sync for
When this works
- CRM, ERP, or third-party API data that needs periodic refresh in your database
- Hourly or daily delta sync where full re-fetch is too slow or quota-expensive
When to skip it
- Real-time sync that needs sub-minute latency—use webhooks instead of cron for event-driven updates
FAQ
How do I handle external API rate limits in sync jobs?
Add a per-page delay in the sync loop; catch 429 responses and sleep before retrying the page. Configure pipeline step retry with backoff for transient rate limit failures.
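A sketch of that per-page handling, assuming the client surfaces the HTTP status and any Retry-After header on the thrown error (names are illustrative):

async function fetchPageWithBackoff(fetchPage, maxAttempts = 5) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fetchPage();
    } catch (err) {
      if (err.status !== 429 || attempt === maxAttempts) throw err;
      // Honor Retry-After if present, else exponential backoff: 1s, 2s, 4s...
      const delayMs = err.retryAfterSeconds
        ? err.retryAfterSeconds * 1000
        : 1000 * 2 ** (attempt - 1);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}

In the sync loop, wrap each page fetch: await fetchPageWithBackoff(() => crm.fetchContacts({ updatedAfter: cursor, page })).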
What if the external API does not support incremental queries?
Fetch all records and diff against your database using hash or updatedAt comparison. For large datasets, use a staging table approach: full import to staging, diff against production, apply delta.
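A sketch of the staging-table diff in Postgres-style SQL, using the same hypothetical client; table and column names are assumptions:

// Full import lands in contacts_staging; only new or changed rows
// are applied to production, then staging is cleared for the next run.
export async function applyStagingDelta(pool) {
  await pool.query(`
    INSERT INTO contacts (external_id, name, email, updated_at)
    SELECT s.external_id, s.name, s.email, s.updated_at
    FROM contacts_staging s
    LEFT JOIN contacts c ON c.external_id = s.external_id
    WHERE c.external_id IS NULL            -- new rows
       OR c.updated_at < s.updated_at      -- changed rows
    ON CONFLICT (external_id) DO UPDATE
      SET name = EXCLUDED.name,
          email = EXCLUDED.email,
          updated_at = EXCLUDED.updated_at
  `);
  await pool.query('TRUNCATE contacts_staging');
}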