Run ML models (ONNX)

Run model inference inside your functions: load a model once on cold start and serve predictions on every request. The Python 3.12 runtime (glibc) ships onnxruntime, onnx, and scikit-learn as attachable layers, so manylinux wheels work out of the box. Inference is CPU-only; no GPU is available.

Ship the model with your function

Put the model file next to your handler: either drop it into the editor's file tree (binary assets are stored byte-exact) or include it in the folder you deploy with inquir deploy. At runtime the file lives in /var/task alongside your code. To make onnxruntime importable, attach the py-onnxruntime layer in the Layers panel.

handler.py
import os, json
import numpy as np
import onnxruntime as ort

# Created once per cold start, reused by warm containers — not per request.
_MODEL = os.path.join(os.path.dirname(__file__), "model.onnx")
_session = ort.InferenceSession(_MODEL, providers=["CPUExecutionProvider"])
_input = _session.get_inputs()[0].name

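def handler(event, context):
    event = event or {}  # guard against a missing event before calling .get
    raw = event.get("body")
    # Accept either an HTTP-style event with a JSON string body or a direct payload.
    payload = json.loads(raw) if isinstance(raw, str) else event
    # Shape (1, n_features): a batch containing a single sample.
    features = np.array(payload["features"], dtype=np.float32).reshape(1, -1)
    outputs = _session.run(None, {_input: features})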
    return {
        "statusCode": 200,
        "body": json.dumps({"prediction": outputs[0].tolist()}),
    }

Create the inference session at module scope so that it loads once per cold start and is reused by warm containers instead of loading on every request. Warm invokes then take roughly 5 ms.
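To make the effect of module scope visible without a real model, the sketch below swaps the inference session for a hypothetical stand-in and counts how often the load step runs. FakeSession, load_session, and LOAD_COUNT are illustrative names, not part of onnxruntime or the platform; the sketch only assumes that a warm container reuses the same Python process between invokes.

```python
import json

LOAD_COUNT = 0  # illustrative counter, only here to make the load visible


class FakeSession:
    """Hypothetical stand-in for an expensive-to-build inference session."""

    def run(self, features):
        # Pretend "inference": sum the features.
        return [sum(features)]


def load_session():
    """Simulates the costly load step (reading and initializing a model)."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    return FakeSession()


# Module scope: runs once when the process imports this file (the cold start).
_session = load_session()


def handler(event, context):
    features = (event or {}).get("features", [])
    return {
        "statusCode": 200,
        "body": json.dumps({"prediction": _session.run(features)}),
    }


# Three invocations in the same process, as a warm container would see them.
responses = [handler({"features": [1.0, 2.0, 3.0]}, None) for _ in range(3)]
print(LOAD_COUNT)  # the load ran once, not three times
```

If load_session() were called inside handler instead, the counter would grow with every request, which is the cost the module-scope pattern avoids.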

Memory, size & cold starts

  • The model must fit in the function's memory limit—256MB by default, configurable up to 2GB in the function settings.
  • The session loads on cold start; warm containers keep it in memory, so subsequent invokes skip the load.
  • Large models lengthen cold starts. Keep models small, or raise the memory limit and the timeout (5s by default, up to 15 minutes).
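A quick pre-deploy check can flag models that are likely too large for the configured limit. The sketch below is a rough heuristic, not a platform feature: model_fits and the headroom factor are assumptions made for illustration, based on the general observation that a loaded model plus the runtime usually needs more memory than the file occupies on disk.

```python
import os

DEFAULT_LIMIT_MB = 256  # default memory limit stated above


def model_fits(path, limit_mb=DEFAULT_LIMIT_MB, headroom=0.5):
    """Return True if the file uses at most `headroom` of the memory limit.

    `headroom` is an assumed rule of thumb, not a platform guarantee:
    it leaves room for the interpreter, the runtime, and request buffers.
    """
    size_mb = os.path.getsize(path) / (1024 * 1024)
    return size_mb <= limit_mb * headroom
```

For example, calling model_fits("model.onnx", limit_mb=2048) before deploying tells you whether the file stays under half of a 2GB limit; if it returns False, consider a smaller model or a higher memory setting.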

Other ML libraries

Beyond ONNX Runtime, attach py-scikit-learn for classic machine learning (load a fitted estimator and call predict) or py-onnx to build and inspect .onnx graphs. Any pure-Python or manylinux-wheel package can be added as a layer.
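For the scikit-learn case, a handler might look like the sketch below. It assumes a fitted estimator was saved with pickle under the hypothetical name model.pkl next to the handler, and that the py-scikit-learn layer is attached so the pickle can be restored. The estimator is loaded on first use and then cached, a variation on the module-scope pattern with the same reuse across warm invokes. Only unpickle files you produced yourself.

```python
import json
import os
import pickle

_estimator = None  # filled on first use, then reused by warm invokes


def _load_estimator():
    """Load the pickled estimator stored next to this file (hypothetical name)."""
    here = os.path.dirname(os.path.abspath(__file__))
    with open(os.path.join(here, "model.pkl"), "rb") as fh:
        return pickle.load(fh)


def predict_one(estimator, features):
    """Call predict on a single sample and return JSON-friendly values."""
    result = estimator.predict([features])  # estimators expect 2D input
    # NumPy arrays expose tolist(); fall back to list() for other sequences.
    return result.tolist() if hasattr(result, "tolist") else list(result)


def handler(event, context):
    global _estimator
    if _estimator is None:
        _estimator = _load_estimator()
    event = event or {}
    raw = event.get("body")
    payload = json.loads(raw) if isinstance(raw, str) else event
    prediction = predict_one(_estimator, payload["features"])
    return {"statusCode": 200, "body": json.dumps({"prediction": prediction})}
```

Keeping predict_one separate from the loading code makes it possible to exercise the prediction path locally with any object that has a predict method.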