Inquir Compute

Serverless nightly ETL without Airflow or Kubernetes

Run nightly ETL pipelines—extract from APIs or databases, transform with Node.js or Python, load to your data warehouse or store—as serverless pipeline steps with cron triggers, step-level retries, and execution history. No Airflow DAG to manage, no Kubernetes cluster for workers.

Last updated: 2026-04-20

Direct answer

Run nightly ETL as an Inquir pipeline: serverless function steps chained with dependsOn relationships. The extract step fetches data from the source, the transform step applies business logic, and the load step writes to the destination. Each step has its own retry budget, so a failing load step retries without re-running the extract.

When it fits

  • A handful of nightly data sync jobs (1–10) that do not justify an Airflow cluster
  • ETL steps of 5–60 minutes that need step-level retries without restarting from scratch

Tradeoffs

  • Self-hosted Airflow needs a metadata database, a scheduler, and workers: heavy infrastructure for a 3-step nightly job
  • Even managed Airflow (MWAA, Astronomer, Cloud Composer) carries a fixed cost and a DAG authoring model separate from your serverless backend

ETL orchestration options and their trade-offs

  • Airflow: powerful DAG orchestration, but a Kubernetes cluster or managed cloud service to maintain
  • Prefect / Dagster: better developer experience than Airflow, but still requires a separate deployment
  • crontab: simple but no run history, no retries, no dependency tracking between steps
  • Lambda + EventBridge: possible, but step chaining and observability require significant wiring

Most nightly ETL jobs at early-stage companies are not complex enough to justify Airflow. They need: reliable scheduling, step-level retries, visible run history, and shared secrets with the rest of the backend. Inquir pipelines cover this without a separate orchestration service.

Why Airflow is overkill for small ETL pipelines

Airflow requires a PostgreSQL metadata database, a scheduler process, workers, and optionally Celery or Kubernetes for task execution. For a 3-step nightly ETL that runs in 20 minutes, the infrastructure cost exceeds the business value for most early teams.

Even managed Airflow (MWAA, Astronomer, Cloud Composer) carries meaningful fixed cost and a DAG authoring model that is different from the rest of your serverless backend.

Pipeline steps as a lightweight ETL orchestrator

Inquir pipelines chain serverless function steps with dependsOn relationships. An extract step fetches data from the source, a transform step applies business logic, and a load step writes to the destination. Each step has its own retry budget: a failing load step retries without re-running the extract.

Nightly ETL pipelines share workspace secrets (DB URLs, API keys) and observability with HTTP routes and background jobs. One platform for all backend workloads—no separate Airflow cluster.

Nightly ETL pipeline features

Multi-step DAG with dependsOn

Extract → Transform → Load as sequential pipeline steps. A failed load step retries independently without restarting the extract.
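A three-step chain might be declared like this. The exact Inquir config schema is not shown on this page, so field names (`trigger`, `steps`, `retries`, `dependsOn`) are illustrative assumptions; the point is the shape of the DAG and the per-step retry budgets.

```javascript
// Hypothetical pipeline definition — field names are assumed, not the
// documented Inquir schema. Each step declares its own retry budget, and
// dependsOn expresses the Extract → Transform → Load chain.
const pipeline = {
  name: "nightly-etl",
  trigger: { cron: "0 2 * * *" }, // nightly at 02:00 UTC
  steps: [
    { id: "extract", handler: "jobs/etl-extract.mjs", retries: 2 },
    { id: "transform", handler: "jobs/etl-transform.mjs", retries: 2, dependsOn: ["extract"] },
    { id: "load", handler: "jobs/etl-load.mjs", retries: 3, dependsOn: ["transform"] },
  ],
};
```

Because retries attach to steps rather than the pipeline, exhausting the load step's budget never re-runs the extract.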

Fan-out parallel extraction

Trigger N parallel extract steps (one per source), fan in to a single transform step. Reduce total ETL run time.
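The fan-out/fan-in shape can be sketched as one extract step per source with the transform step depending on all of them. Source names and step fields here are illustrative assumptions, not a documented schema.

```javascript
// Hypothetical fan-out: one extract step per source runs in parallel; the
// transform step fans in by depending on every extract step. Source names
// ("stripe", "ads", "crm") are placeholders.
const sources = ["stripe", "ads", "crm"];
const steps = [
  ...sources.map((s) => ({ id: `extract-${s}`, handler: `jobs/extract-${s}.mjs` })),
  {
    id: "transform",
    handler: "jobs/etl-transform.mjs",
    dependsOn: sources.map((s) => `extract-${s}`), // fan-in: wait for all extracts
  },
];
```

Total run time then tracks the slowest single extract rather than the sum of all of them.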

Python for transformations

Python 3.12 with pandas, numpy, and sqlalchemy for data transformation steps. Node.js for API extraction steps. Both in the same pipeline.

Alerting on duration SLO

Set an alert when the nightly ETL takes longer than expected—before stale data reaches dashboards and reports.

Nightly ETL pipeline structure

1. Extract step: fetch from sources

Call external APIs or query source database. Store raw data to intermediate storage or pass as pipeline payload.

2. Transform step: apply business logic

Clean, normalize, enrich, and aggregate. Python with pandas; Node.js for JSON transforms. Output structured data.

3. Load step: write to destination

Upsert to data warehouse, update analytics tables, generate aggregates. Idempotent by batch ID or date range.

Nightly ETL: API → transform → Postgres

Cron triggers nightly at 02:00 UTC. Extract step fetches, transform step normalizes, load step upserts. Each step retries independently.

jobs/etl-extract.mjs (step 1)
// externalApi and storage are assumed workspace helper clients
export async function handler(event) {
  const date = new Date().toISOString().slice(0, 10); // YYYY-MM-DD
  const records = await externalApi.fetchDailyReport(date);
  // Store to object storage — pipeline passes storage key to next step
  const key = `etl/raw/${date}.json`;
  await storage.putJson(key, records);
  return { key, count: records.length, date };
}
jobs/etl-transform.mjs (step 2 — receives step 1 output)
export async function handler(event) {
  const { key, date } = event.previousOutput ?? {};
  if (!key) throw new Error("missing extract output key"); // fail fast so the step retries
  const raw = await storage.getJson(key);
  const transformed = raw.map((r) => ({
    id: r.external_id,
    date,
    revenue: parseFloat(r.revenue_usd),
    region: r.region?.toLowerCase(),
    updatedAt: new Date().toISOString(),
  }));
  const outKey = `etl/transformed/${date}.json`;
  await storage.putJson(outKey, transformed);
  return { outKey, count: transformed.length, date };
}
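The walkthrough stops at step 2, so here is a sketch of the load step. It assumes `db` is a parameterized-query client (pg-style) and `storage` is the same object-storage helper used above; `buildUpsertSql` and the `daily_revenue` table are illustrative names, not part of the platform.

```javascript
// jobs/etl-load.mjs (step 3 — receives step 2 output). Sketch only:
// `db` and `storage` stand in for your workspace's database client and
// object-storage helper.

// Build one idempotent upsert keyed on (id, date), so a retried load step
// can safely re-run against the same transformed batch.
function buildUpsertSql(table, rows) {
  const cols = ["id", "date", "revenue", "region", "updated_at"];
  const values = [];
  const tuples = rows.map((row, i) => {
    const base = i * cols.length;
    values.push(row.id, row.date, row.revenue, row.region, row.updatedAt);
    return `(${cols.map((_, j) => `$${base + j + 1}`).join(", ")})`;
  });
  const text =
    `INSERT INTO ${table} (${cols.join(", ")}) VALUES ${tuples.join(", ")} ` +
    "ON CONFLICT (id, date) DO UPDATE SET " +
    "revenue = EXCLUDED.revenue, region = EXCLUDED.region, updated_at = EXCLUDED.updated_at";
  return { text, values };
}

async function handler(event) { // exported as `handler` from the .mjs file
  const { outKey, date } = event.previousOutput ?? {};
  const rows = await storage.getJson(outKey); // transformed batch from step 2
  const { text, values } = buildUpsertSql("daily_revenue", rows);
  await db.query(text, values); // single parameterized statement, retry-safe
  return { loaded: rows.length, date };
}
```

Because the upsert is keyed on a stable (id, date) pair, the step's retry budget can fire any number of times without producing duplicate rows.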

Use serverless nightly ETL when

When this works

  • You have 1–10 nightly data sync jobs that do not justify an Airflow cluster
  • ETL steps take 5–60 minutes and need step-level retries without restarting from scratch

When to skip it

  • Complex DAGs with hundreds of tasks, branching conditions, and cross-DAG dependencies—Airflow or Dagster are better orchestrators at that scale

FAQ

Can I pass data between pipeline steps?

Yes—pipeline steps return structured output. The next step receives it as event.previousOutput. For large payloads (CSV, JSON files), store to object storage and pass the storage key between steps.

How do I handle a failed ETL load step?

Configure retry count and delay on the load step. Ensure the load step is idempotent (upsert by stable ID or date range). If retries exhaust, execution history records the failure and alerts fire.

Inquir Compute

The simplest way to run AI agents and backend jobs without infrastructure.

Contact info@inquir.org

© 2025 Inquir Compute. All rights reserved.