Serverless nightly ETL without Airflow or Kubernetes
Run nightly ETL pipelines—extract from APIs or databases, transform with Node.js or Python, load to your data warehouse or store—as serverless pipeline steps with cron triggers, step-level retries, and execution history. No Airflow DAG to manage, no Kubernetes cluster for workers.
Last updated: 2026-04-20
Answer first
Direct answer
Inquir pipelines run nightly ETL as chained serverless function steps with dependsOn relationships: an extract step fetches data from the source, a transform step applies business logic, and a load step writes to the destination. Each step has its own retry budget; a failing load step retries without re-running the extract.
When it fits
- You have 1–10 nightly data sync jobs that do not justify an Airflow cluster
- ETL steps take 5–60 minutes and need step-level retries without restarting from scratch
Tradeoffs
- Self-hosted Airflow means a metadata database, a scheduler, and workers; for a 3-step nightly ETL that finishes in 20 minutes, that infrastructure outweighs the business value for most early teams.
- Managed Airflow (MWAA, Astronomer, Cloud Composer) still carries meaningful fixed cost and a DAG authoring model separate from the rest of your serverless backend.
Workload and what breaks
ETL orchestration options and their trade-offs
- Airflow: powerful DAG orchestration, but a Kubernetes cluster or managed cloud service to maintain
- Prefect / Dagster: better developer experience than Airflow, but still requires a separate deployment
- crontab: simple but no run history, no retries, no dependency tracking between steps
- Lambda + EventBridge: possible, but step chaining and observability require significant wiring
Most nightly ETL jobs at early-stage companies are not complex enough to justify Airflow. They need: reliable scheduling, step-level retries, visible run history, and shared secrets with the rest of the backend. Inquir pipelines cover this without a separate orchestration service.
Where shortcuts fail
Why Airflow is overkill for small ETL pipelines
Airflow requires a metadata database (typically PostgreSQL), a scheduler process, workers, and optionally Celery or Kubernetes for task execution. For a 3-step nightly ETL that runs in 20 minutes, the infrastructure cost exceeds the business value for most early teams.
Even managed Airflow (MWAA, Astronomer, Cloud Composer) carries meaningful fixed cost and a DAG authoring model that is different from the rest of your serverless backend.
How Inquir helps
Pipeline steps as a lightweight ETL orchestrator
Inquir pipelines chain serverless function steps with dependsOn relationships. The extract step fetches data from the source, the transform step applies business logic, and the load step writes to the destination. Each step has its own retry budget; a failing load step retries without re-running the extract.
Nightly ETL pipelines share workspace secrets (DB URLs, API keys) and observability with HTTP routes and background jobs. One platform for all backend workloads—no separate Airflow cluster.
What you get
Nightly ETL pipeline features
Multi-step DAG with dependsOn
Extract → Transform → Load as sequential pipeline steps. Failed load retries independently without restarting extract.
Fan-out parallel extraction
Trigger N parallel extract steps (one per source), then fan in to a single transform step to reduce total ETL run time. A hypothetical pipeline shape for this is sketched below, after the feature descriptions.
Python for transformations
Python 3.12 with pandas, numpy, and sqlalchemy for data transformation steps. Node.js for API extraction steps. Both in the same pipeline.
Alerting on duration SLO
Set an alert when the nightly ETL takes longer than expected—before stale data reaches dashboards and reports.
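As noted in the fan-out item above, here is a hypothetical pipeline shape for parallel extraction. The schema is illustrative, not Inquir's actual syntax: the extract steps declare no dependencies, so they can run in parallel, and the transform step depends on all of them.

// Hypothetical fan-out pipeline shape. Illustrative only; not Inquir's actual schema.
export const pipeline = {
  name: "nightly-multi-source-etl",
  trigger: { cron: "0 2 * * *" },
  steps: [
    // No dependsOn on the extract steps, so they can run in parallel.
    { name: "extract-source-a", handler: "etl/extractSourceA.handler", retries: 2 },
    { name: "extract-source-b", handler: "etl/extractSourceB.handler", retries: 2 },
    { name: "extract-source-c", handler: "etl/extractSourceC.handler", retries: 2 },
    // Fan-in: transform waits for every extract step to finish.
    {
      name: "transform",
      handler: "etl/transform.handler",
      retries: 2,
      dependsOn: ["extract-source-a", "extract-source-b", "extract-source-c"],
    },
    { name: "load", handler: "etl/load.handler", retries: 3, dependsOn: ["transform"] },
  ],
};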
What to do next
Nightly ETL pipeline structure
Extract step: fetch from sources
Call external APIs or query the source database. Store raw data in intermediate storage or pass it as the pipeline payload.
Transform step: apply business logic
Clean, normalize, enrich, and aggregate. Python with pandas; Node.js for JSON transforms. Output structured data.
Load step: write to destination
Upsert to data warehouse, update analytics tables, generate aggregates. Idempotent by batch ID or date range.
Code example
Nightly ETL: API → transform → Postgres
A cron trigger runs the pipeline nightly at 02:00 UTC. The extract step fetches, the transform step normalizes, and the load step upserts. Each step retries independently.
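Before the handlers, a hypothetical pipeline definition may help orient the example. It is illustrative only: the field names, trigger syntax, and retry settings are assumptions rather than Inquir's actual configuration schema, but it shows how the 02:00 UTC cron trigger, the three steps, and the dependsOn wiring fit together. The two handlers after it are the extract and transform steps, in that order; a load-step sketch follows them.

// Hypothetical pipeline definition. Illustrative only; the actual Inquir
// configuration syntax may differ.
export const pipeline = {
  name: "nightly-revenue-etl",
  trigger: { cron: "0 2 * * *" }, // nightly at 02:00 UTC
  steps: [
    { name: "extract", handler: "etl/extract.handler", retries: 2 },
    { name: "transform", handler: "etl/transform.handler", retries: 2, dependsOn: ["extract"] },
    { name: "load", handler: "etl/load.handler", retries: 3, dependsOn: ["transform"] },
  ],
};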
export async function handler(event) {
  const date = new Date().toISOString().slice(0, 10); // YYYY-MM-DD
  const records = await externalApi.fetchDailyReport(date);
  // Store to object storage — pipeline passes storage key to next step
  const key = `etl/raw/${date}.json`;
  await storage.putJson(key, records);
  return { key, count: records.length, date };
}
export async function handler(event) {
  const { key, date } = event.previousOutput ?? {};
  const raw = await storage.getJson(key);
  const transformed = raw.map((r) => ({
    id: r.external_id,
    date,
    revenue: parseFloat(r.revenue_usd),
    region: r.region?.toLowerCase(),
    updatedAt: new Date().toISOString(),
  }));
  const outKey = `etl/transformed/${date}.json`;
  await storage.putJson(outKey, transformed);
  return { outKey, count: transformed.length, date };
}
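The load step is not shown above, so here is a minimal sketch. It assumes the same storage helper as the other steps, a Postgres destination reached through the pg client, and an illustrative daily_revenue table with id as its primary key; upserting on that key keeps the step idempotent, so a retry overwrites rather than duplicates.

import pg from "pg";

export async function handler(event) {
  const { outKey, date } = event.previousOutput ?? {};
  // Same assumed object-storage helper used by the extract and transform steps.
  const rows = await storage.getJson(outKey);

  const client = new pg.Client({ connectionString: process.env.DATABASE_URL });
  await client.connect();
  try {
    // Idempotent upsert on the primary key: replaying the load cannot create duplicates.
    for (const r of rows) {
      await client.query(
        `INSERT INTO daily_revenue (id, date, revenue, region, updated_at)
         VALUES ($1, $2, $3, $4, $5)
         ON CONFLICT (id) DO UPDATE SET
           revenue = EXCLUDED.revenue,
           region = EXCLUDED.region,
           updated_at = EXCLUDED.updated_at`,
        [r.id, r.date, r.revenue, r.region, r.updatedAt]
      );
    }
  } finally {
    await client.end();
  }
  return { loaded: rows.length, date };
}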
When it fits
When to use serverless nightly ETL
When this works
- You have 1–10 nightly data sync jobs that do not justify an Airflow cluster
- ETL steps take 5–60 minutes and need step-level retries without restarting from scratch
When to skip it
- Complex DAGs with hundreds of tasks, branching conditions, and cross-DAG dependencies—Airflow or Dagster are better orchestrators at that scale
FAQ
Can I pass data between pipeline steps?
Yes—pipeline steps return structured output. The next step receives it as event.previousOutput. For large payloads (CSV, JSON files), store to object storage and pass the storage key between steps.
How do I handle a failed ETL load step?
Configure retry count and delay on the load step. Ensure the load step is idempotent (upsert by stable ID or date range). If retries exhaust, execution history records the failure and alerts fire.
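If the destination table has no stable ID to upsert on, idempotency by date range is an alternative: replace the whole day's rows inside one transaction so a retried load lands exactly once. A sketch, assuming Postgres and reusing client, rows, and date from the load-step sketch in the code example above:

// Idempotency by date range instead of per-row upsert (Postgres assumed).
// Drop-in replacement for the upsert loop in the load-step sketch above.
await client.query("BEGIN");
try {
  await client.query("DELETE FROM daily_revenue WHERE date = $1", [date]);
  for (const r of rows) {
    await client.query(
      "INSERT INTO daily_revenue (id, date, revenue, region, updated_at) VALUES ($1, $2, $3, $4, $5)",
      [r.id, r.date, r.revenue, r.region, r.updatedAt]
    );
  }
  await client.query("COMMIT");
} catch (err) {
  await client.query("ROLLBACK");
  throw err;
}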