Serverless CSV processing: parse, transform, and load large files
Accept a CSV upload via HTTP, return 202 immediately, and process the file in a background pipeline. Parse millions of rows, validate and transform data, upsert in batches with idempotency keys, and notify users when the import completes—all outside the HTTP timeout window.
Last updated: 2026-04-20
Answer first
Direct answer
Move the work out of the request. The HTTP handler validates the file reference (URL or storage key), stores job metadata, and returns 202 with a job ID. A pipeline step then reads the CSV in chunks, upserts idempotently by row ID, and records progress for resumability.
When it fits
- Customer data imports, product catalog uploads, bulk user migrations
- Regular CSV exports from external systems that need nightly or triggered processing
Tradeoffs
- Even with streaming parsers, inline processing keeps the connection open while rows are upserted; a database slow-down mid-import means gateway timeouts, client retries, and duplicate rows unless writes are idempotent.
- A "streaming" parse that buffers validated rows for one bulk insert still loads the entire file into RAM before writing.
Workload and what breaks
Why CSV processing breaks synchronous HTTP handlers
- Files with 100k+ rows take 30–300 seconds to parse and insert—well past gateway timeouts
- Inline processing holds the HTTP connection open—clients time out or retry, causing double imports
- Memory pressure: reading a 50 MB CSV into memory in one pass can OOM small serverless runtimes
CSV import is one of the most common patterns that does not fit synchronous HTTP handlers. The file size is unpredictable, the parse and validation time grows linearly with rows, and any failure mid-import without idempotency creates partial data that is hard to recover from.
Where shortcuts fail
Why streaming CSV in the HTTP handler is fragile
Even with streaming parsers, the HTTP handler must stay open while rows are being upserted. A database slow-down mid-import causes a timeout at the gateway, the client retries, and you get duplicate rows unless you built idempotency from the start.
Memory management is harder than it looks: a "streaming" CSV parse that collects validated rows in memory for a bulk insert still loads the entire file into RAM before writing.
How Inquir helps
HTTP accepts, pipeline processes in chunks
The HTTP handler validates the file reference (URL or storage key), stores job metadata, and returns 202 with a job ID. The pipeline step reads the CSV in chunks, upserts idempotently by row ID, and records progress for resumability.
Long CSVs can be processed in multiple pipeline steps—split by row range, fan out in parallel, fan in to a summary step. Each step has its own timeout budget; failure in one step retries that step without restarting from row 1.
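A minimal sketch of that fan-out, assuming a hypothetical pipeline.startStep trigger and a countRows helper rather than any specific Inquir API; the row-range split is the part that matters.

// Coordinator step: split the file into row ranges and fan out one step per range.
// pipeline.startStep and countRows are placeholders for your orchestration client
// and a helper that streams the file once to count data rows.
const RANGE_SIZE = 10000;

export async function planImport(event) {
  const { fileUrl, importId } = event.payload ?? {};
  const totalRows = await countRows(fileUrl);

  const ranges = [];
  for (let start = 0; start < totalRows; start += RANGE_SIZE) {
    ranges.push({ start, end: Math.min(start + RANGE_SIZE, totalRows) });
  }

  // Fan out: each range gets its own step and its own timeout budget
  await Promise.all(
    ranges.map((range) => pipeline.startStep('process-range', { fileUrl, importId, ...range }))
  );

  // Fan in: enqueue a summary step; how completion of all ranges is awaited
  // depends on your orchestrator's semantics
  await pipeline.startStep('summarize-import', { importId, expectedRanges: ranges.length });
}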
What you get
Serverless CSV processing patterns
Chunked processing with resumability
Split large CSVs into row-range pipeline steps. A failure at row 80k resumes from that checkpoint, not from row 1.
Parallel batch upsert
Fan out multiple pipeline steps to process row ranges in parallel—reduce total import time for large files.
Idempotent row upsert
Use a stable row identifier (external ID or row hash) as the upsert key. Re-running the import on the same file produces the same database state.
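One way to derive that stable key when rows have no external ID, sketched with Node's crypto module and a Postgres-style upsert; the records table and its columns are illustrative, not a required schema.

import { createHash } from 'node:crypto';

// Deterministic key from the row's content, used when no external_id is present
function rowKey(row) {
  return createHash('sha256').update(JSON.stringify(row)).digest('hex');
}

// ON CONFLICT makes re-running the same file converge on the same state
async function upsertRow(client, importId, row) {
  await client.query(
    `INSERT INTO records (row_key, import_id, payload)
     VALUES ($1, $2, $3)
     ON CONFLICT (row_key) DO UPDATE SET payload = EXCLUDED.payload`,
    [row.external_id ?? rowKey(row), importId, JSON.stringify(row)]
  );
}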
Progress tracking and notification
HTTP handler returns a job ID. Client polls a status endpoint; final pipeline step notifies via webhook or email when complete.
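The polling side can stay small; a sketch of a status endpoint assuming the same placeholder db client and a route that carries the job ID as a path parameter.

// GET /imports/{importId}: polled by the client while the pipeline runs
export async function statusHandler(event) {
  const importId = event.pathParameters?.importId;
  const job = await db.imports.get(importId);
  if (!job) {
    return { statusCode: 404, body: JSON.stringify({ error: 'unknown import' }) };
  }

  return {
    statusCode: 200,
    body: JSON.stringify({
      importId,
      status: job.status,                 // pending | processing | done | failed
      processedRows: job.processedRows ?? 0,
      rowCount: job.rowCount ?? null,
    }),
  };
}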
What to do next
Serverless CSV import flow
HTTP handler accepts file reference, returns 202
Validate the file URL or storage key. Store job record with status=pending. Trigger pipeline with file reference and job ID.
Pipeline step reads and processes CSV
Download and parse the CSV in the pipeline step. Upsert rows in batches of 500–1000 with idempotency keys. Update progress on job record.
Final step notifies
After last batch, mark job complete and notify the user via email, webhook, or status update.
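A sketch of that final step, assuming a webhookUrl stored on the job record; swap the fetch call for an email provider or any other channel.

// Final pipeline step: mark the job complete, then notify the caller
export async function notifyHandler(event) {
  const { importId } = event.payload ?? {};
  const job = await db.imports.get(importId);

  await db.imports.update(importId, { status: 'done', completedAt: new Date().toISOString() });

  if (job?.webhookUrl) {
    // Best-effort webhook; the status endpoint stays the source of truth
    await fetch(job.webhookUrl, {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify({ importId, status: 'done', rowCount: job.rowCount }),
    });
  }
}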
Code example
Chunked CSV import pipeline
HTTP handler returns 202; pipeline step processes the file in batches, and the idempotency key prevents duplicate rows on retry. The db client and the global.durable trigger are placeholders for your own storage layer and orchestration client.
// HTTP handler: validate the request, record the job, trigger the pipeline, return 202
export async function handler(event) {
  const { fileUrl, importId } = JSON.parse(event.body || '{}');
  if (!fileUrl || !importId) {
    return { statusCode: 400, body: JSON.stringify({ error: 'fileUrl and importId required' }) };
  }

  // Record the job before triggering the pipeline so status polling works immediately
  await db.imports.create({ id: importId, status: 'pending', fileUrl });
  await global.durable.startNew('process-csv', undefined, { fileUrl, importId });

  return { statusCode: 202, body: JSON.stringify({ importId, status: 'pending' }) };
}
import { parse } from 'csv-parse/sync';

// Split an array of rows into fixed-size batches
function chunk(rows, size) {
  const batches = [];
  for (let i = 0; i < rows.length; i += size) batches.push(rows.slice(i, i + size));
  return batches;
}

// Pipeline step: download, parse, and upsert the file outside the HTTP timeout window
export async function handler(event) {
  const { fileUrl, importId } = event.payload ?? {};
  const csvText = await fetch(fileUrl).then((r) => r.text());
  const rows = parse(csvText, { columns: true, skip_empty_lines: true });

  let inserted = 0;
  for (const batch of chunk(rows, 500)) {
    // Upsert by external_id — idempotent on retry
    await db.records.upsertBatch(batch.map((r) => ({ ...r, importId })));
    inserted += batch.length;
  }

  await db.imports.update(importId, { status: 'done', rowCount: rows.length });
  return { importId, rows: rows.length, inserted };
}
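The synchronous parse above still reads the whole response body into memory, which is exactly the buffering problem called out in the tradeoffs. A streaming variant using csv-parse's stream interface (assuming Node 18+ for fetch and Readable.fromWeb), with the same placeholder db client:

import { parse } from 'csv-parse';
import { Readable } from 'node:stream';

export async function handler(event) {
  const { fileUrl, importId } = event.payload ?? {};
  const res = await fetch(fileUrl);

  // Bridge the web stream into a Node stream and parse records as they arrive
  const parser = Readable.fromWeb(res.body).pipe(parse({ columns: true, skip_empty_lines: true }));

  let batch = [];
  let inserted = 0;
  for await (const row of parser) {
    batch.push({ ...row, importId });
    if (batch.length === 500) {
      await db.records.upsertBatch(batch); // same idempotent upsert as above
      inserted += batch.length;
      batch = [];
    }
  }
  if (batch.length > 0) {
    await db.records.upsertBatch(batch);
    inserted += batch.length;
  }

  await db.imports.update(importId, { status: 'done', rowCount: inserted });
  return { importId, inserted };
}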
When it fits
When this works
- Customer data imports, product catalog uploads, bulk user migrations
- Regular CSV exports from external systems that need nightly or triggered processing
When to skip it
- Tiny CSV files under 1000 rows that process in under 5 seconds—keep those synchronous for simpler debugging
FAQ
How do I handle CSV validation errors?
Collect validation errors per row and store them on the import job record. Return a summary (valid rows, invalid rows, error list) when the job completes. Let the client decide whether to proceed with partial import or fix errors first.
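A sketch of that collection step inside the pipeline handler, with illustrative rules; validateRow and the errors field on the job record are assumptions, not a fixed schema.

// Illustrative per-row validation; keep errors instead of aborting the import
function validateRow(row, lineNumber) {
  const errors = [];
  if (!row.email || !row.email.includes('@')) errors.push({ lineNumber, field: 'email', message: 'invalid email' });
  if (!row.external_id) errors.push({ lineNumber, field: 'external_id', message: 'missing external_id' });
  return errors;
}

// Inside the pipeline step, after parsing:
const validRows = [];
const rowErrors = [];
rows.forEach((row, i) => {
  const errors = validateRow(row, i + 2); // +2: rows are 1-based and the header occupies line 1
  if (errors.length > 0) rowErrors.push(...errors);
  else validRows.push(row);
});

// Store the summary so the client can decide: proceed with a partial import or fix and retry
await db.imports.update(importId, {
  validRows: validRows.length,
  invalidRows: rows.length - validRows.length,
  errors: rowErrors.slice(0, 1000), // cap stored errors for very large files
});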
How do I resume a failed import?
Track the last successfully processed row offset on the job record. Re-trigger the pipeline with that offset; the step skips already-processed rows. Use upsert-on-external-id as a safety net.
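A sketch of the offset checkpoint with the same placeholder db client; loadRows stands in for the parse step shown in the code example above.

// Pipeline step that resumes from the last committed offset on retry
export async function handler(event) {
  const { fileUrl, importId } = event.payload ?? {};
  const job = await db.imports.get(importId);
  const startAt = job?.lastOffset ?? 0;

  const rows = await loadRows(fileUrl); // parse as in the code example above
  for (let i = startAt; i < rows.length; i += 500) {
    const batch = rows.slice(i, i + 500);
    await db.records.upsertBatch(batch.map((r) => ({ ...r, importId })));

    // Checkpoint after every committed batch so a retry resumes here, not at row 0
    await db.imports.update(importId, { lastOffset: i + batch.length });
  }

  await db.imports.update(importId, { status: 'done', rowCount: rows.length });
  return { importId, resumedFrom: startAt, rows: rows.length };
}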