Run model inference inside your functions: load a model once on cold start and serve predictions on every request. The Python 3.12 runtime (glibc) ships onnxruntime, onnx, and scikit-learn as attachable layers, so manylinux wheels work out of the box. Inference is CPU-only; no GPU is available.
Ship the model with your function
Put the model file next to your handler: drop it into the editor's file tree (binary assets are stored byte-exact), or include it in the folder you deploy with inquir deploy. At runtime the file lives in /var/task alongside your code. Attach the py-onnxruntime layer in the Layers panel.
    import json
    import os

    import numpy as np
    import onnxruntime as ort

    # Created once per cold start and reused by warm containers, not per request.
    _MODEL = os.path.join(os.path.dirname(__file__), "model.onnx")
    _session = ort.InferenceSession(_MODEL, providers=["CPUExecutionProvider"])
    _input = _session.get_inputs()[0].name


    def handler(event, context):
        # Guard against a missing event before reading from it.
        event = event or {}
        raw = event.get("body")
        # HTTP-style events carry a JSON string in "body"; direct invocations
        # pass the payload as the event itself.
        payload = json.loads(raw) if isinstance(raw, str) else event
        # Shape the flat feature list into a single-row float32 batch.
        features = np.array(payload["features"], dtype=np.float32).reshape(1, -1)
        outputs = _session.run(None, {_input: features})
        return {
            "statusCode": 200,
            "body": json.dumps({"prediction": outputs[0].tolist()}),
        }
Create the inference session at module scope so that it loads once per cold start and is reused by warm containers (~5ms warm invokes) instead of loading on every request.
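The handler accepts two event shapes: an HTTP-style event whose body is a JSON string, and a direct invocation where the event is the payload itself. The following is a minimal sketch of that parsing step pulled out into a standalone function, so it can be exercised without a model or the onnxruntime layer. The name extract_features and the validation it performs are illustrative assumptions, not part of the platform.

```python
import json


def extract_features(event):
    """Return the "features" list from either supported event shape.

    Hypothetical helper: the name and the error handling are illustrative.
    """
    event = event or {}
    raw = event.get("body")
    # HTTP-style events carry the payload as a JSON string in "body";
    # direct invocations pass the payload as the event itself.
    payload = json.loads(raw) if isinstance(raw, str) else event
    features = payload.get("features") if isinstance(payload, dict) else None
    if not isinstance(features, list) or not features:
        raise ValueError('expected a non-empty "features" list')
    return features


# Both event shapes yield the same feature list.
http_event = {"body": json.dumps({"features": [5.1, 3.5, 1.4, 0.2]})}
direct_event = {"features": [5.1, 3.5, 1.4, 0.2]}
print(extract_features(http_event))
print(extract_features(direct_event))
```

Rejecting malformed input before it reaches the session lets the handler return a clear client error instead of failing inside the array conversion.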
Memory, size & cold starts
- The model must fit in the function's memory limit: 256MB by default, configurable up to 2GB in the function settings.
- The session loads on cold start; warm containers keep it in memory, so subsequent invokes skip the load.
- Large models lengthen cold starts. Keep models small, or raise the memory limit and the timeout (5s by default, up to 15 minutes).
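One way to catch an oversized model before deploying is to compare the file size against the memory limit. The sketch below assumes a hypothetical helper named model_fits and a headroom factor of 0.5, which is only a rule of thumb: the loaded session, the interpreter, and the input and output tensors all need memory beyond the file itself, and the real overhead depends on the model.

```python
import os

DEFAULT_MEMORY_MB = 256  # default memory limit stated in the list above


def model_fits(path, memory_limit_mb=DEFAULT_MEMORY_MB, headroom=0.5):
    """Return (fits, size_mb) for a model file.

    headroom is an assumed rule of thumb, not a platform guarantee: the
    fraction of the memory limit the file may occupy, leaving the rest for
    the interpreter, the runtime, and the input and output tensors.
    """
    size_mb = os.path.getsize(path) / (1024 * 1024)
    return size_mb <= memory_limit_mb * headroom, size_mb
```

Calling model_fits("model.onnx") locally returns whether the file stays under half of the default limit, along with its size in megabytes. If it does not fit, pass the raised limit as memory_limit_mb to check against the new setting.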
Other ML libraries
Beyond ONNX Runtime, attach py-scikit-learn for classic machine learning (load a fitted estimator and call predict) or py-onnx to build and inspect .onnx graphs. Any pure-Python or manylinux-wheel package can be added as a layer.
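For the scikit-learn case, the same load-once pattern applies. The sketch below assumes the fitted estimator was saved with pickle to a file named model.pkl; that file name and the helpers load_estimator and make_handler are hypothetical. The handler is built around an already-loaded estimator so that the load still happens once per cold start.

```python
import json
import pickle


def load_estimator(path):
    """Unpickle a fitted estimator. Only unpickle files you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


def make_handler(estimator):
    """Build a handler around an already-loaded estimator."""

    def handler(event, context):
        event = event or {}
        raw = event.get("body")
        payload = json.loads(raw) if isinstance(raw, str) else event
        # scikit-learn estimators expect 2-D input: one row per sample.
        result = estimator.predict([payload["features"]])
        # predict() returns a NumPy array; tolist() makes it JSON-serializable.
        prediction = result.tolist() if hasattr(result, "tolist") else list(result)
        return {"statusCode": 200, "body": json.dumps({"prediction": prediction})}

    return handler


# In a deployed function, build the handler once at module scope, e.g.:
# handler = make_handler(load_estimator("/var/task/model.pkl"))
```

A pickled estimator should be loaded with the same scikit-learn version that saved it, so train against the version the py-scikit-learn layer provides where possible.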