Kaemon Lovendahl

Originally published at glitchedgoblet.blog

Summoning Your ML Model from the Cloud: FastAPI + AWS Lambda Speedrun

Intro

Hello, fellow wizards! If you've been following my posts, you'll know I've been learning about Machine Learning and MLOps. Lately I've been working on a new Logistic Regression model that can be queried via API calls. I've already gone through training and fine-tuning, so in this article we'll tackle the next steps: deployment and making a request.

TL;DR: In under 15 minutes we'll train a ~45 KB Logistic Regression that predicts whether a Magic: The Gathering card will become an EDH staple (EDHREC rank ≤ 5,000) using only four features: mana value, card type, color identity, and rarity. Then we'll sling the model into AWS Lambda via a containerised FastAPI app so you can make predictions on demand.

Why Even Go Serverless

  • Cold starts are small these days - and Lambda container images can be up to 10 GB, which is plenty of headroom for most models.
  • Scale to zero - pay only when your model actually gets summoned.
  • Ops? Lol - no fleet of EC2 instances to babysit.

If your model needs sub-100 ms latency 24/7, maybe spin up a GPU box instead. For everything else, Lambda is your bff.

Prerequisites

Tool          Tested Version   Purpose
Python        3.11             Training + inference script
FastAPI       0.110            Lightweight API
Docker CLI    25+              Build container image
AWS CLI       2.15             Push + deploy
AWS Account   -                Obviously
MTGJSON       -                Card data source

Why these four features?

  • Mana value - A quick proxy for playability: cheap cards slot into far more decks than expensive ones.
  • Card type (creature / non-creature) - Creatures are the most popular cards to play.
  • Color count - More colors means narrower deck pairings, whereas fewer colors fit into more decks.
  • Rarity tier - Mythics and rares tend to be pushed (though not always!). Encoding it gives the model extra signal for power level.

All four are text-or-numeric columns already present in Scryfall/MTGJSON dumps, which means no embeddings or heavy tokenisation needed. Perfect for a Lambda free-tier budget.
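
To make the encoding concrete, here are two hand-encoded rows in the scheme above (values written from the cards' printed stats, for illustration only):

# (manaValue, numColors, isCreature, rarityScore)
sol_ring = {"manaValue": 1, "numColors": 0, "isCreature": 0, "rarityScore": 1}   # colorless artifact, uncommon
craw_wurm = {"manaValue": 6, "numColors": 1, "isCreature": 1, "rarityScore": 0}  # mono-green common creature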

1 · Train a snack-sized model

# train.py
import joblib, pandas as pd
from sklearn.linear_model import LogisticRegression
from pathlib import Path

cards = pd.read_csv("cards.csv") # download from https://mtgjson.com

# --- feature engineering --------------------------------------------
# colorIdentity in the MTGJSON CSV is a comma-separated string (e.g. "B,G,U"),
# so count the entries rather than the characters
cards["numColors"] = cards["colorIdentity"].fillna("").apply(
    lambda s: len(s.split(",")) if s else 0
)
cards["isCreature"] = cards["type"].str.contains("Creature", na=False).astype(int)
rarity_map = {"common": 0, "uncommon": 1, "rare": 2, "mythic": 3}
cards["rarityScore"] = cards["rarity"].str.lower().map(rarity_map).fillna(0)

cards = cards.dropna(subset=["manaValue"])  # guard against rows missing a mana value
X = cards[["manaValue", "numColors", "isCreature", "rarityScore"]]
y = (cards["edhrecRank"] <= 5000).astype(int)  # missing rank counts as "not a staple"

model = LogisticRegression(max_iter=1000).fit(X, y)
joblib.dump(model, "edh_staple_model.joblib")
print("Saved model — size:", round(Path('edh_staple_model.joblib').stat().st_size / 1024, 1), "KB")

Result? ≈45 KB. Tiny enough that cold starts won't feel like summoning Eldrazi.
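
Quick sanity check that the artifact loads and scores a row (the feature values here are just an example, not a real card):

import joblib

model = joblib.load("edh_staple_model.joblib")
# [manaValue, numColors, isCreature, rarityScore]
print(model.predict_proba([[2, 1, 1, 2]])[0, 1])  # P(EDH staple)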

2 · Wrap the model in FastAPI (app.py)

This will create a simple API that accepts card features and returns the probability of being an EDH staple.

from fastapi import FastAPI
from pydantic import BaseModel
import joblib, numpy as np

model = joblib.load("edh_staple_model.joblib")
app = FastAPI(title="EDH-Staple-Predictor")

class CardFeatures(BaseModel):
    manaValue: float
    numColors: int # 0-5
    isCreature: int # 1 = Creature, 0 = not
    rarityScore: int # 0-common … 3-mythic

@app.post("/predict")
def predict(card: CardFeatures):
    feats = [[card.manaValue, card.numColors, card.isCreature, card.rarityScore]]
    prob = float(model.predict_proba(feats)[0, 1])
    return {"stapleProbability": round(prob, 3)}

# for local dev fun
if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
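
Before touching AWS, you can kick the tires locally (assuming uvicorn is installed in your environment):

python app.py   # or: uvicorn app:app --reload --port 8000

# in another terminal:
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"manaValue":2,"numColors":1,"isCreature":1,"rarityScore":2}'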

3 · Dockerise → ECR → Lambda

Next, we'll need to create a Dockerfile to package our FastAPI app and model into a container image that AWS Lambda can run.

# Base image: AWS Lambda Python 3.11 runtime
FROM public.ecr.aws/lambda/python:3.11

COPY app.py edh_staple_model.joblib ./
RUN pip install --no-cache-dir fastapi uvicorn gunicorn joblib scikit-learn pydantic mangum

# ASGI-to-Lambda shim
CMD [ "app.handler" ]

Heads-up: Mangum provides the ASGI-to-Lambda adapter that app.handler points at. Add this to the bottom of app.py:

from mangum import Mangum
handler = Mangum(app)

Then, from the repo root:

# build & push (replace with your IDs)
docker build -t gg-staple-api .
docker tag gg-staple-api:latest <aws-id>.dkr.ecr.<region>.amazonaws.com/gg-staple-api:latest
docker push <aws-id>.dkr.ecr.<region>.amazonaws.com/gg-staple-api:latest

# one-liner Lambda creation
aws lambda create-function \
  --function-name gg-staple-api \
  --package-type Image \
  --code ImageUri=<aws-id>.dkr.ecr.<region>.amazonaws.com/gg-staple-api:latest \
  --memory-size 256 \
  --timeout 10
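
Optional detour: the AWS Lambda base images bundle the Runtime Interface Emulator, so you can smoke-test the image locally. Because Mangum expects an API Gateway-shaped event, the payload below is a trimmed v2-format example (assumed to be sufficient; add fields if Mangum complains):

docker run -p 9000:8080 gg-staple-api

# in another terminal:
curl "http://localhost:9000/2015-03-31/functions/function/invocations" \
  -d '{"version":"2.0","routeKey":"POST /predict","rawPath":"/predict",
       "rawQueryString":"","headers":{"content-type":"application/json"},
       "requestContext":{"http":{"method":"POST","path":"/predict","sourceIp":"127.0.0.1"}},
       "body":"{\"manaValue\":2,\"numColors\":1,\"isCreature\":1,\"rarityScore\":2}",
       "isBase64Encoded":false}'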

4 · Test the spell (curl)

Make sure your Lambda function is deployed and ready. You can test it using curl or any HTTP client of your choice.
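The curl below assumes you've put an API Gateway HTTP API in front of the function. If you'd rather skip that step, a Lambda Function URL speaks the same v2 payload format Mangum understands; a minimal sketch (auth-type NONE means anyone can invoke it, so treat this as demo-only):

aws lambda create-function-url-config \
  --function-name gg-staple-api \
  --auth-type NONE
aws lambda add-permission \
  --function-name gg-staple-api \
  --statement-id public-url \
  --action lambda:InvokeFunctionUrl \
  --principal "*" \
  --function-url-auth-type NONE
# the first call returns a FunctionUrl like https://<id>.lambda-url.<region>.on.aws/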

curl -X POST https://<api-gw>.execute-api.<region>.amazonaws.com/predict \
  -H "Content-Type: application/json" \
  -d '{"manaValue":2,"numColors":1,"isCreature":1,"rarityScore":2}'
# → {"stapleProbability":0.73}

5 · Side quests & power-ups

Pain point             Quick fix
Cold starts ≥ 500 ms   Use an ARM/Graviton base image, SnapStart, or Provisioned Concurrency
Accuracy meh?          Try LightGBM (~200 KB) or XGBoost with tree_method=hist
Model getting chonky   Strip scikit-learn from the runtime; ship pure-NumPy weights (sketch below)
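
That last row deserves a sketch: binary logistic regression is just a dot product and a sigmoid, so you can export the fitted weights once and do inference with NumPy alone (the file name and helper below are illustrative):

import numpy as np

# one-time export after training (in train.py):
#   np.savez("weights.npz", coef=model.coef_, intercept=model.intercept_)

# sklearn-free inference:
w = np.load("weights.npz")
coef, intercept = w["coef"][0], w["intercept"][0]

def predict_proba(x):  # x = [manaValue, numColors, isCreature, rarityScore]
    z = float(np.dot(coef, x) + intercept)
    return 1.0 / (1.0 + np.exp(-z))  # sigmoid of the linear score

print(predict_proba([2, 1, 1, 2]))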

Obviously this model had to be small for Lambda to be viable. There is a lot more optimisation you can do to improve accuracy, such as:

  • Feature engineering: Add more features like keywords, card text, or even historical price trends.
  • Model selection: Try more complex models like LightGBM or XGBoost, which can handle categorical features better and often yield higher accuracy.
  • Hyperparameter tuning: Use tools like Optuna or Hyperopt to find the best hyperparameters for your model (minimal sketch below).
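
For the tuning bullet, here's a minimal Optuna loop over the regularisation strength (assumes optuna is installed and X, y come from train.py):

import optuna
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def objective(trial):
    # search C on a log scale - the main knob for LogisticRegression
    C = trial.suggest_float("C", 1e-3, 1e2, log=True)
    clf = LogisticRegression(C=C, max_iter=1000)
    return cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=30)
print(study.best_params)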

Final Thoughts

With a pinch of feature engineering and a docker build && docker push, you've moved an ML model from dev box to a globally scalable Lambda endpoint. No K8s, no autoscaling groups, just a pay-per-invocation goblet of goodness.

Feel free to fork the repo, toss in extra features (e.g., keywords in rules text), or swap in a boosted tree. All the plumbing stays the same.

Drink deeply, code boldly, and may your deployments stay glitch-free. brb, brewing more coffee.
