Serverless nightly ETL without Airflow or Kubernetes
Run nightly ETL pipelines—extract from APIs or databases, transform with Node.js or Python, load to your data warehouse or store—as serverless pipeline steps with cron triggers, step-level retries, and execution history. No Airflow DAG to manage, no Kubernetes cluster for workers.
Last updated: 2026-04-20
Answer first
Direct answer
Inquir pipelines run nightly ETL as chained serverless function steps with dependsOn relationships: an extract step fetches data from the source, a transform step applies business logic, and a load step writes to the destination. Each step has its own retry budget; a failing load step retries without re-running the extract.
When it fits
- You have 1–10 nightly data sync jobs that do not justify an Airflow cluster
- ETL steps take 5–60 minutes and need step-level retries without restarting from scratch
Tradeoffs
- Self-hosted Airflow means a metadata database, a scheduler, and workers; for a 3-step nightly ETL that finishes in 20 minutes, that infrastructure outweighs the business value for most early teams.
- Managed Airflow (MWAA, Astronomer, Cloud Composer) still carries meaningful fixed cost and a DAG authoring model separate from the rest of your serverless backend.
Workload and what breaks
ETL orchestration options and their trade-offs
- Airflow: powerful DAG orchestration, but a Kubernetes cluster or managed cloud service to maintain
- Prefect / Dagster: better developer experience than Airflow, but still requires a separate deployment
- crontab: simple but no run history, no retries, no dependency tracking between steps
- Lambda + EventBridge: possible, but step chaining and observability require significant wiring
Most nightly ETL jobs at early-stage companies are not complex enough to justify Airflow. They need: reliable scheduling, step-level retries, visible run history, and shared secrets with the rest of the backend. Inquir pipelines cover this without a separate orchestration service.
Where shortcuts fail
Why Airflow is overkill for small ETL pipelines
Airflow requires a metadata database (typically PostgreSQL), a scheduler process, workers, and optionally Celery or Kubernetes for task execution. For a 3-step nightly ETL that runs in 20 minutes, the infrastructure cost exceeds the business value for most early teams.
Even managed Airflow (MWAA, Astronomer, Cloud Composer) carries meaningful fixed cost and a DAG authoring model that is different from the rest of your serverless backend.
How Inquir helps
Pipeline steps as a lightweight ETL orchestrator
Inquir pipelines chain serverless function steps with dependsOn relationships. The extract step fetches data from the source, the transform step applies business logic, and the load step writes to the destination. Each step has its own retry budget; a failing load step retries without re-running the extract.
Nightly ETL pipelines share workspace secrets (DB URLs, API keys) and observability with HTTP routes and background jobs. One platform for all backend workloads—no separate Airflow cluster.
What you get
Nightly ETL pipeline features
Multi-step DAG with dependsOn
Extract → Transform → Load as sequential pipeline steps. Failed load retries independently without restarting extract.
Fan-out parallel extraction
Trigger N parallel extract steps (one per source), then fan in to a single transform step to reduce total ETL run time. A hypothetical pipeline shape for this is sketched below, after the feature descriptions.
Python for transformations
Python 3.12 with pandas, numpy, and sqlalchemy for data transformation steps. Node.js for API extraction steps. Both in the same pipeline.
Alerting on duration SLO
Set an alert when the nightly ETL takes longer than expected—before stale data reaches dashboards and reports.
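As noted in the fan-out item above, here is a hypothetical pipeline shape for parallel extraction. The schema is illustrative, not Inquir's actual syntax: the extract steps declare no dependencies, so they can run in parallel, and the transform step depends on all of them.

// Hypothetical fan-out pipeline shape. Illustrative only; not Inquir's actual schema.
export const pipeline = {
  name: "nightly-multi-source-etl",
  trigger: { cron: "0 2 * * *" },
  steps: [
    // No dependsOn on the extract steps, so they can run in parallel.
    { name: "extract-source-a", handler: "etl/extractSourceA.handler", retries: 2 },
    { name: "extract-source-b", handler: "etl/extractSourceB.handler", retries: 2 },
    { name: "extract-source-c", handler: "etl/extractSourceC.handler", retries: 2 },
    // Fan-in: transform waits for every extract step to finish.
    {
      name: "transform",
      handler: "etl/transform.handler",
      retries: 2,
      dependsOn: ["extract-source-a", "extract-source-b", "extract-source-c"],
    },
    { name: "load", handler: "etl/load.handler", retries: 3, dependsOn: ["transform"] },
  ],
};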
What to do next
Nightly ETL pipeline structure
Extract step: fetch from sources
Call external APIs or query the source database. Store raw data in intermediate storage or pass it as the pipeline payload.
Transform step: apply business logic
Clean, normalize, enrich, and aggregate. Python with pandas; Node.js for JSON transforms. Output structured data.
Load step: write to destination
Upsert to data warehouse, update analytics tables, generate aggregates. Idempotent by batch ID or date range.
Code example
Nightly ETL: API → transform → Postgres
A cron trigger runs the pipeline nightly at 02:00 UTC. The extract step fetches, the transform step normalizes, and the load step upserts. Each step retries independently.
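Before the handlers, a hypothetical pipeline definition may help orient the example. It is illustrative only: the field names, trigger syntax, and retry settings are assumptions rather than Inquir's actual configuration schema, but it shows how the 02:00 UTC cron trigger, the three steps, and the dependsOn wiring fit together. The two handlers after it are the extract and transform steps, in that order; a load-step sketch follows them.

// Hypothetical pipeline definition. Illustrative only; the actual Inquir
// configuration syntax may differ.
export const pipeline = {
  name: "nightly-revenue-etl",
  trigger: { cron: "0 2 * * *" }, // nightly at 02:00 UTC
  steps: [
    { name: "extract", handler: "etl/extract.handler", retries: 2 },
    { name: "transform", handler: "etl/transform.handler", retries: 2, dependsOn: ["extract"] },
    { name: "load", handler: "etl/load.handler", retries: 3, dependsOn: ["transform"] },
  ],
};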
export async function handler(event) {
  const date = new Date().toISOString().slice(0, 10); // YYYY-MM-DD
  const records = await externalApi.fetchDailyReport(date);
  // Store to object storage — pipeline passes storage key to next step
  const key = `etl/raw/${date}.json`;
  await storage.putJson(key, records);
  return { key, count: records.length, date };
}
export async function handler(event) {
  const { key, date } = event.previousOutput ?? {};
  const raw = await storage.getJson(key);
  const transformed = raw.map((r) => ({
    id: r.external_id,
    date,
    revenue: parseFloat(r.revenue_usd),
    region: r.region?.toLowerCase(),
    updatedAt: new Date().toISOString(),
  }));
  const outKey = `etl/transformed/${date}.json`;
  await storage.putJson(outKey, transformed);
  return { outKey, count: transformed.length, date };
}
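The load step is not shown above, so here is a minimal sketch. It assumes the same storage helper as the other steps, a Postgres destination reached through the pg client, and an illustrative daily_revenue table with id as its primary key; upserting on that key keeps the step idempotent, so a retry overwrites rather than duplicates.

import pg from "pg";

export async function handler(event) {
  const { outKey, date } = event.previousOutput ?? {};
  // Same assumed object-storage helper used by the extract and transform steps.
  const rows = await storage.getJson(outKey);

  const client = new pg.Client({ connectionString: process.env.DATABASE_URL });
  await client.connect();
  try {
    // Idempotent upsert on the primary key: replaying the load cannot create duplicates.
    for (const r of rows) {
      await client.query(
        `INSERT INTO daily_revenue (id, date, revenue, region, updated_at)
         VALUES ($1, $2, $3, $4, $5)
         ON CONFLICT (id) DO UPDATE SET
           revenue = EXCLUDED.revenue,
           region = EXCLUDED.region,
           updated_at = EXCLUDED.updated_at`,
        [r.id, r.date, r.revenue, r.region, r.updatedAt]
      );
    }
  } finally {
    await client.end();
  }
  return { loaded: rows.length, date };
}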
When it fits
When to use serverless nightly ETL
When this works
- You have 1–10 nightly data sync jobs that do not justify an Airflow cluster
- ETL steps take 5–60 minutes and need step-level retries without restarting from scratch
When to skip it
- Complex DAGs with hundreds of tasks, branching conditions, and cross-DAG dependencies—Airflow or Dagster are better orchestrators at that scale
FAQ
Can I pass data between pipeline steps?
Yes—pipeline steps return structured output. The next step receives it as event.previousOutput. For large payloads (CSV, JSON files), store to object storage and pass the storage key between steps.
How do I handle a failed ETL load step?
Configure retry count and delay on the load step. Ensure the load step is idempotent (upsert by stable ID or date range). If retries exhaust, execution history records the failure and alerts fire.
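If the destination table has no stable ID to upsert on, idempotency by date range is an alternative: replace the whole day's rows inside one transaction so a retried load lands exactly once. A sketch, assuming Postgres and reusing client, rows, and date from the load-step sketch in the code example above:

// Idempotency by date range instead of per-row upsert (Postgres assumed).
// Drop-in replacement for the upsert loop in the load-step sketch above.
await client.query("BEGIN");
try {
  await client.query("DELETE FROM daily_revenue WHERE date = $1", [date]);
  for (const r of rows) {
    await client.query(
      "INSERT INTO daily_revenue (id, date, revenue, region, updated_at) VALUES ($1, $2, $3, $4, $5)",
      [r.id, r.date, r.revenue, r.region, r.updatedAt]
    );
  }
  await client.query("COMMIT");
} catch (err) {
  await client.query("ROLLBACK");
  throw err;
}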