// A Field Guide

Python Typing
for ML Engineers

From exploratory scripts to production-grade pipelines. The complete mental model, not just the syntax.

Audience: ML / MLOps Engineers
Python: 3.10+
Approach: Incremental · Opinionated
Chapters: 4 + Reference
Chapter 01

Why Types? The Real Argument.

Not "good practice." Not "professional code." The actual reason types pay dividends specifically when you've crossed from data science into ML engineering.

// The Core Insight

Data science code has one reader: you, this weekend.
Production ML code has one reader: the developer six months from now who has no context and is debugging a pipeline failure at 11pm, right before a model rollout.

Types are documentation that cannot lie.

The ML Engineer's Specific Pain Points

Let's be precise. Here's where untyped code specifically hurts in ML engineering:

Pipeline contracts break silently. You have a preprocess() function. Someone changes it to return a pd.DataFrame instead of np.ndarray. The function downstream still accepts it — until a specific code path in production hits the wrong branch. Types would have caught this at commit time, not in the post-deployment incident.

Config objects are the worst offenders. Untyped config dicts travel across your entire codebase: training scripts, evaluation harnesses, model registries. When a key is renamed, nothing tells you. With typed configs, a key rename shows up as a type-checker error at every place the old key is still used, the next time mypy runs.

Model interfaces are implicit contracts. If you're building a model registry, a serving layer, or an A/B testing harness, you're relying on every model implementing some interface — predict(), transform(), etc. Types let you declare that interface and have it verified, rather than hoping every model implementation "just follows convention."

📌 Note — Runtime vs Static

Python types are not enforced at runtime by default. They are annotations — hints for static analysis tools (mypy, pyright) and IDEs. Think of them less as constraints and more as a formal specification language you layer on top of Python. You get the benefits through tooling, not through the interpreter.

What You Actually Get

Refactor confidence. The single highest-value return from typing. When you need to change a feature engineering function's return shape, or rename a config field, types let you make the change and then run mypy to get a list of every type-checked callsite that now needs updating. This is transformative in codebases with 20+ modules.

IDE superpowers. Autocomplete on return types. Inline documentation from function signatures. "Find all usages" that actually works. In a large ML repo, you'll spend significantly less time reading source to understand what a function does when the signature tells you.

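CI as a safety net. Once mypy is in your CI pipeline, entire categories of bugs (wrong argument types, misspelled config keys, a None passed where a value is required) are blocked at merge time in every module the checker covers. This is the MLOps win: type safety as a gate in your deployment pipeline.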

✦ Strategy — Gradual Adoption

You do not type everything at once. The correct strategy is: type the interfaces first. Public function signatures, config classes, model contracts. The internals can come later. 30% of typing effort buys you 80% of the safety benefit.

§
Chapter 02

The Core 20% That Does 80% of the Work

Master these seven constructs and you can type virtually any ML codebase. Everything else is either shorthand or edge cases.

2.1 — Basic Annotations Essential

The syntax is minimal. Parameters get : Type, return values get -> Type. That's the entire grammar.

model_utils.py
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Before: caller has to read the body to know what to pass
def train(df, target, n_estimators, max_depth):
    ...

# After: the signature is the documentation
def train(
    df: pd.DataFrame,
    target: str,
    n_estimators: int = 100,
    max_depth: int | None = None,
) -> RandomForestClassifier:
    ...

Use the built-in types directly. On Python 3.9+, lowercase generics work without imports:

types_cheatsheet.py
# ✓ Modern (Python 3.9+) — prefer these
list[int]
dict[str, float]
set[str]
tuple[int, str, float]   # fixed-length, typed positions
tuple[float, ...]          # variable-length, all floats

# ✗ Old style (pre-3.9) — only if you're stuck on older Python
from typing import List, Dict, Set, Tuple

2.2 — Optional and Union Essential

These appear constantly in ML config and pipeline code. Learn the modern union syntax — it's cleaner and more readable.

✗ Old Syntax
from typing import Optional, Union

def load(
  path: Optional[str] = None
) -> Optional[Model]:
    ...

def parse(
  v: Union[int, float]
) -> float:
    ...
✓ Modern Syntax (3.10+)
# No imports needed

def load(
  path: str | None = None
) -> Model | None:
    ...

def parse(
  v: int | float
) -> float:
    ...
📌 Reading Optional Correctly

str | None doesn't mean "optional parameter." It means the value can be either a string or None. You still need = None to make the parameter itself optional. These are two orthogonal concepts: the type of the value vs. whether the parameter has a default.

2.3 — TypedDict: Typed Config Objects Essential

If there's one thing ML engineers should adopt immediately, it's replacing raw dict configs with TypedDict. Config dicts are one of the most common sources of runtime errors in ML pipelines (misspelled keys, missing keys, wrong value types), and a type checker catches most of them before anything runs.

config.py
from typing import Any, TypedDict

# ✗ Untyped: keys can be anything, values can be anything
config: dict[str, Any] = {"batch_size": 32, "lr": 0.01}

# ✓ TypedDict: keys are fixed, types are declared
class TrainingConfig(TypedDict):
    batch_size: int
    learning_rate: float
    model_name: str
    early_stopping: bool
    checkpoint_dir: str | None

# Still a dict at runtime — no overhead, no changes to call sites
config: TrainingConfig = {
    "batch_size": 32,
    "learning_rate": 0.001,
    "model_name": "xgboost-v2",
    "early_stopping": True,
    "checkpoint_dir": None,
}

Key insight: TypedDict creates zero runtime overhead. It's still a plain dict. The structure only exists in the type system — but that's exactly where you want it for static analysis.
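To make "still a plain dict" concrete, here is a trimmed-down sketch of the config above. Both construction styles produce an ordinary `dict`, and nothing is validated at runtime; only a static checker sees the declared keys and value types.

```python
from typing import TypedDict


class TrainingConfig(TypedDict):
    batch_size: int
    learning_rate: float


# Literal syntax and the class-call syntax both produce a plain dict.
a: TrainingConfig = {"batch_size": 32, "learning_rate": 1e-3}
b = TrainingConfig(batch_size=32, learning_rate=1e-3)

print(type(a) is dict, type(b) is dict)  # True True
print(a == b)                            # True

# A type checker rejects the misspelled key below before anything runs;
# without types, the interpreter would only fail here with a KeyError.
# lr = a["learning_rte"]

# Nothing is validated at runtime: this executes without error,
# and only a static checker reports the str/int mismatch.
c = TrainingConfig(batch_size="32", learning_rate=1e-3)  # mypy error, runtime OK
```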

2.4 — Callable: Typing Functions as Arguments Essential

ML pipelines pass transforms, preprocessors, and callbacks as arguments constantly. Callable types these with precision.

pipeline.py
from typing import Callable
import pandas as pd

# Callable[[arg_types...], return_type]

def build_pipeline(
    data: list[float],
    transforms: list[Callable[[float], float]],  # list of float→float fns
    on_complete: Callable[[str], None],            # callback: str → nothing
) -> list[float]:
    result = data
    for fn in transforms:
        result = [fn(x) for x in result]
    on_complete("done")
    return result

# When signature doesn't matter — use Callable[..., ReturnType]
loader: Callable[..., pd.DataFrame]
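A usage sketch for `build_pipeline`, repeated so the snippet runs on its own. Named functions, lambdas, and bound methods all satisfy the `Callable` types as long as their signatures match; `normalize` is a hypothetical transform.

```python
from collections.abc import Callable


def build_pipeline(
    data: list[float],
    transforms: list[Callable[[float], float]],
    on_complete: Callable[[str], None],
) -> list[float]:
    result = data
    for fn in transforms:
        result = [fn(x) for x in result]
    on_complete("done")
    return result


def normalize(x: float) -> float:
    # Hypothetical transform: scale a 0-255 pixel value into [0, 1].
    return x / 255.0


messages: list[str] = []

out = build_pipeline(
    [0.0, 51.0, 255.0],
    transforms=[normalize, lambda x: round(x, 2)],
    on_complete=messages.append,  # list.append takes one str and returns None
)
print(out)       # [0.0, 0.2, 1.0]
print(messages)  # ['done']
```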

2.5 — Literal: Constraining String Parameters Essential

One of the most immediately useful tools. Replace "magic strings" in your pipeline stages, modes, and environment flags with a type that prevents typos at static-check time.

runner.py
from typing import Literal

RunMode = Literal["train", "eval", "predict"]   # plain alias: works on 3.10+
Env = Literal["dev", "staging", "prod"]

def run_experiment(
    mode: RunMode,
    env: Env = "dev",
) -> None:
    ...

run_experiment("train", "prod")   # ✓
run_experiment("trn", "prod")    # ✗ mypy: "trn" not in Literal["train","eval","predict"]
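Literal constraints are checked statically, so a mode string arriving from a CLI flag or environment variable still needs a runtime check before it can be trusted as a `RunMode`. One way to bridge the two, sketched below with a hypothetical `parse_mode` helper, is `typing.get_args`, which returns the values a `Literal` allows. The alias uses plain assignment here so it also works on 3.10 and 3.11.

```python
from typing import Literal, cast, get_args

RunMode = Literal["train", "eval", "predict"]


def parse_mode(raw: str) -> RunMode:
    # get_args(RunMode) == ("train", "eval", "predict")
    if raw in get_args(RunMode):
        # Membership was checked above; cast records that for the type checker.
        return cast(RunMode, raw)
    raise ValueError(f"invalid mode {raw!r}; expected one of {get_args(RunMode)}")


mode = parse_mode("eval")  # typed as RunMode from here on
```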

2.6 — Type Aliases: Naming Complex Types Essential

When your types get complex, don't embed them inline. Name them. This is where types start reading like domain language.

ml_types.py
from collections.abc import Callable

# Python 3.12+: explicit type alias syntax
type Features = dict[str, float]
type Batch = list[Features]
type Predictions = list[float]
type Transform = Callable[[Features], Features]

# Now your signatures read like domain language:
def run_batch(
    batch: Batch,
    transforms: list[Transform],
) -> Predictions:
    ...

# Python 3.10/3.11 — use TypeAlias
from typing import TypeAlias
Features: TypeAlias = dict[str, float]

2.7 — Dataclasses: Upgrade from TypedDict Essential

TypedDict is great for data-as-dicts (e.g., loaded from JSON, passed to APIs). When you want objects with behavior, or defaults, or validation — reach for @dataclass.

config.py
from dataclasses import dataclass, field

@dataclass
class TrainingConfig:
    model_name: str
    batch_size: int = 32
    learning_rate: float = 1e-3
    tags: list[str] = field(default_factory=list)  # ← correct mutable default
    checkpoint_dir: str | None = None

    def experiment_id(self) -> str:
        return f"{self.model_name}-bs{self.batch_size}"

# Constructed like a class — no dict boilerplate
cfg = TrainingConfig(model_name="xgboost", batch_size=64)
print(cfg.experiment_id())  # "xgboost-bs64"
✦ When to use TypedDict vs dataclass

Use TypedDict when you need to interop with JSON/dicts, e.g. API responses or loaded config files. Use dataclass when you want a proper object with methods and defaults, and you control construction. For heavy validation, consider Pydantic (which extends this pattern with runtime type-checking).
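In practice the two often work together: a `TypedDict` describes the raw JSON shape at the I/O boundary, and a dataclass is what the rest of the code passes around. A minimal sketch with illustrative names (`RawConfig`, `Config`, `from_raw`); `dataclasses.asdict` converts back to a dict for serialization.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import TypedDict


class RawConfig(TypedDict):
    # Shape of the JSON as it arrives from disk or an API.
    model_name: str
    batch_size: int


@dataclass
class Config:
    model_name: str
    batch_size: int = 32
    tags: list[str] = field(default_factory=list)

    @classmethod
    def from_raw(cls, raw: RawConfig) -> "Config":
        return cls(model_name=raw["model_name"], batch_size=raw["batch_size"])


# json.loads returns Any, so this annotation is trusted, not verified.
raw: RawConfig = json.loads('{"model_name": "xgboost", "batch_size": 64}')
cfg = Config.from_raw(raw)
print(cfg)          # Config(model_name='xgboost', batch_size=64, tags=[])
print(asdict(cfg))  # {'model_name': 'xgboost', 'batch_size': 64, 'tags': []}
```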

§
Chapter 03

Going Deeper: Power Constructs

These are the tools that make typing feel like a first-class design language, not just annotation syntax. You won't need all of these immediately, but you will reach for them.

3.1 — Protocol: Structural Typing for ML Interfaces Power Tool

This is arguably the most important advanced construct for ML engineering. Protocol lets you define an interface that any class can satisfy without inheriting from it. This is structural typing, or "typed duck typing."

The problem it solves: you want to write a model registry that works with sklearn models, PyTorch modules, custom wrappers — without requiring them all to inherit from some base class you control.

interfaces.py
from typing import Protocol
import numpy as np

# Define what "being a model" means — no inheritance required
class Predictor(Protocol):
    def predict(self, X: np.ndarray) -> np.ndarray: ...
    def predict_proba(self, X: np.ndarray) -> np.ndarray: ...

# Your registry works with ANYTHING that satisfies Predictor
class ModelRegistry:
    def __init__(self) -> None:
        self._models: dict[str, Predictor] = {}

    def register(self, name: str, model: Predictor) -> None:
        self._models[name] = model

# sklearn RandomForest "satisfies" Predictor automatically
# A custom PyTorch wrapper satisfies it too, as long as it defines both methods
# No inheritance. No coupling. Just structural compatibility.
✦ Protocol vs ABC

Abstract Base Classes (ABC) require explicit inheritance: class MyModel(BaseModel). Protocol requires nothing. If a class has the right methods, it satisfies the protocol. This makes Protocol the better fit for ML code that integrates third-party models you don't own.
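A self-contained sketch of structural typing. To keep it runnable without NumPy, it swaps `np.ndarray` for plain lists; otherwise the interface matches `Predictor` above. `ThresholdModel` is hypothetical and never mentions `Predictor`, yet a checker accepts it. The `@runtime_checkable` decorator additionally enables `isinstance` checks, which only confirm that the methods exist, not that their signatures match.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Predictor(Protocol):
    def predict(self, X: list[list[float]]) -> list[float]: ...
    def predict_proba(self, X: list[list[float]]) -> list[float]: ...


class ThresholdModel:
    # Hypothetical model: no base class, no reference to Predictor.
    def __init__(self, threshold: float) -> None:
        self.threshold = threshold

    def predict_proba(self, X: list[list[float]]) -> list[float]:
        # Toy score: clamp the first feature into [0, 1].
        return [min(max(row[0], 0.0), 1.0) for row in X]

    def predict(self, X: list[list[float]]) -> list[float]:
        return [1.0 if p >= self.threshold else 0.0 for p in self.predict_proba(X)]


def score(model: Predictor, X: list[list[float]]) -> list[float]:
    # Accepts any object with matching predict/predict_proba methods.
    return model.predict(X)


model = ThresholdModel(threshold=0.5)
print(score(model, [[0.2], [0.9]]))     # [0.0, 1.0]
print(isinstance(model, Predictor))     # True
print(isinstance(object(), Predictor))  # False
```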

3.2 — Generics: Reusable Typed Utilities Power Tool

Generics let you write functions and classes where the type is a parameter. The canonical example: "this function returns the same type it receives."

dataset.py
from typing import TypeVar, Generic

T = TypeVar("T")

# Function: return type follows input type
def first(items: list[T]) -> T:
    return items[0]

first([1, 2, 3])           # inferred return: int
first(["a", "b"])          # inferred return: str

# Generic class: a typed data store abstraction
class DataStore(Generic[T]):
    def __init__(self) -> None:
        self._data: list[T] = []

    def add(self, item: T) -> None:
        self._data.append(item)

    def get_all(self) -> list[T]:
        return self._data

store: DataStore[TrainingConfig] = DataStore()
store.add(TrainingConfig(model_name="rf"))
configs = store.get_all()  # inferred: list[TrainingConfig]
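Mini-batching is a typical generic utility in training code. In this sketch (`batched_list` is a hypothetical name), the element type flows from input to output, so a checker knows that batching a list of strings yields `list[list[str]]`. On Python 3.12+, `itertools.batched` offers a lazy variant that yields tuples.

```python
from collections.abc import Sequence
from typing import TypeVar

T = TypeVar("T")


def batched_list(items: Sequence[T], size: int) -> list[list[T]]:
    # Split items into consecutive chunks of `size`; the last may be shorter.
    if size < 1:
        raise ValueError("size must be >= 1")
    return [list(items[i : i + size]) for i in range(0, len(items), size)]


ids = batched_list(["a", "b", "c", "d", "e"], 2)  # inferred: list[list[str]]
print(ids)                                         # [['a', 'b'], ['c', 'd'], ['e']]
```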

3.3 — NewType: Semantic Type Safety Power Tool

When two things are both int but should never be confused. This is type safety at the domain semantics level.

ids.py
from typing import NewType

ExperimentId = NewType("ExperimentId", int)
DatasetId    = NewType("DatasetId",    int)
RunId        = NewType("RunId",        int)

def fetch_run(run_id: RunId) -> dict: ...
def fetch_experiment(exp_id: ExperimentId) -> dict: ...

run_id = RunId(42)
exp_id = ExperimentId(7)

fetch_run(run_id)          # ✓
fetch_run(exp_id)          # ✗ mypy: ExperimentId is not RunId
fetch_experiment(run_id)   # ✗ caught — mixing IDs is now a type error
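NewType is a static-only distinction: at runtime `RunId(42)` returns the plain int `42`, with no wrapper object and no overhead. A small sketch of what that means in practice:

```python
from typing import NewType

RunId = NewType("RunId", int)

run_id = RunId(42)
print(type(run_id) is int)  # True: the "constructor" returns its argument unchanged
print(run_id == 42)         # True

# Arithmetic falls back to the base type (checkers infer plain int),
# so re-wrap explicitly if the result should still be a RunId.
next_id = RunId(run_id + 1)
print(next_id)              # 43
```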

3.4 — Annotated: Types + Metadata Power Tool

You'll encounter this in FastAPI and Pydantic. Annotated lets you attach arbitrary metadata to a type — documentation, validators, units, constraints — without changing the type itself.

serving.py
from typing import Annotated
from fastapi import FastAPI, Query

# The first arg is the actual type. Everything after is metadata.
# Plain assignment keeps these aliases valid on Python 3.10+.
Probability = Annotated[float, "must be in [0, 1]"]
PositiveInt = Annotated[int, "must be > 0"]

# FastAPI uses Annotated to embed validation metadata
app = FastAPI()

@app.get("/predict")
async def predict(
    threshold: Annotated[float, Query(ge=0.0, le=1.0)] = 0.5
) -> dict:
    ...  # FastAPI validates threshold is in [0, 1] from the metadata

3.5 — @overload: Multiple Signatures Power Tool

When one function has different return types depending on the arguments. Common in utility functions that handle multiple input formats.

loader.py
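import pandas as pd
from typing import overload

@overload
def load_data(path: str) -> pd.DataFrame: ...
@overload
def load_data(path: list[str]) -> list[pd.DataFrame]: ...

# Actual implementation: its signature must accept every overload's inputs.
# Callers never see it; they only see the overloads above.
def load_data(path: str | list[str]) -> pd.DataFrame | list[pd.DataFrame]:
    if isinstance(path, str):
        return pd.read_csv(path)
    return [pd.read_csv(p) for p in path]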

# Callers get the right return type inferred:
df   = load_data("data.csv")        # → pd.DataFrame
dfs  = load_data(["a.csv", "b.csv"]) # → list[pd.DataFrame]
§
Chapter 04

Gotchas & Hard-Won Lessons

The things that will bite you if nobody warns you. Read these before you start typing your first module.

⚠ Gotcha #1 — Types Don't Enforce at Runtime

The single most important thing to understand: Python's type annotations are not enforced by the interpreter. This will run without error:

example.py
def double(x: int) -> int:
    return x * 2

double("oh no")  # ← Python runs this fine and returns "oh nooh no".
# mypy catches it. The runtime doesn't.

Types are a static analysis tool. Their value comes from running mypy / pyright in your CI pipeline, not from Python itself. If you want runtime enforcement, use Pydantic for data models or beartype for function arguments.
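To see what runtime enforcement involves mechanically, here is a deliberately tiny decorator that checks arguments against plain-class annotations with `isinstance`. It is a toy (`enforce_types` is a hypothetical name): it skips generics like `list[int]`, unions, and return values, which is the kind of work Pydantic and beartype handle properly.

```python
import functools
import inspect
from collections.abc import Callable
from typing import Any, TypeVar, cast, get_origin, get_type_hints

F = TypeVar("F", bound=Callable[..., Any])


def enforce_types(fn: F) -> F:
    # Resolve annotations once, at decoration time.
    hints = get_type_hints(fn)
    sig = inspect.signature(fn)

    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            # Only plain classes (int, str, ...) are checked; parameterized
            # generics like list[int] have an origin and are skipped.
            if isinstance(expected, type) and get_origin(expected) is None:
                if not isinstance(value, expected):
                    raise TypeError(
                        f"{name}={value!r} is {type(value).__name__}, "
                        f"expected {expected.__name__}"
                    )
        return fn(*args, **kwargs)

    return cast(F, wrapper)


@enforce_types
def repeat(x: int, times: int) -> int:
    return x * times


print(repeat(3, 2))  # 6
# repeat("ab", 2)    # raises TypeError instead of silently returning "abab"
```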

⚠ Gotcha #2 — Mutable Default Arguments Are Still Broken

Types don't fix one of Python's oldest footguns: mutable default arguments are shared across all calls.

example.py
# ✗ Classic bug — one list shared across ALL calls
def add_feature(name: str, features: list[str] = []) -> list[str]:
    features.append(name)
    return features

# ✓ Correct pattern — use None sentinel
def add_feature(name: str, features: list[str] | None = None) -> list[str]:
    features = features if features is not None else []
    features.append(name)
    return features

# With dataclasses — use field(default_factory=list)
from dataclasses import dataclass, field

@dataclass
class Config:
    tags: list[str] = field(default_factory=list)  # ← correct
⚠ Gotcha #3 — Covariance and Lists

A list[Dog] is NOT a list[Animal], even if Dog is a subclass of Animal. This surprises most people. (It's correct behavior — adding a Cat to a list[Dog] via an Animal reference would be a bug.)

example.py
class Animal: ...
class Dog(Animal): ...

def process_animals(animals: list[Animal]) -> None: ...

dogs: list[Dog] = [Dog(), Dog()]
process_animals(dogs)  # ✗ mypy error!

# Solution: use Sequence[Animal] (read-only) for covariant use
from collections.abc import Sequence

def process_animals(animals: Sequence[Animal]) -> None: ...
# ✓ Sequence is covariant — list[Dog] accepted
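Invariance exists because of mutation. This sketch (class and function names are illustrative) shows the kind of function the checker is protecting a `list[Dog]` from, alongside a read-only version typed with `Sequence` that accepts `list[Dog]` safely.

```python
from collections.abc import Sequence


class Animal:
    def speak(self) -> str:
        return "..."


class Dog(Animal):
    def speak(self) -> str:
        return "woof"


class Cat(Animal):
    def speak(self) -> str:
        return "meow"


def adopt_cat(animals: list[Animal]) -> None:
    animals.append(Cat())  # fine for a list[Animal], corrupting for a list[Dog]


def chorus(animals: Sequence[Animal]) -> list[str]:
    # Sequence has no append/insert, so accepting list[Dog] here is safe.
    return [a.speak() for a in animals]


dogs: list[Dog] = [Dog(), Dog()]
print(chorus(dogs))  # ['woof', 'woof'], accepted by the checker
# adopt_cat(dogs)    # rejected by the checker: list[Dog] is not list[Animal]
```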
⚠ Gotcha #4 — Any Is a Supertype AND Subtype

Any turns off type checking in both directions. It's assignable to everything and everything is assignable to it. This means one careless Any can silently propagate through your entire call graph — it "infects" types.

example.py
from typing import Any

def parse_config(raw: Any) -> Any:     # ← Any escapes here
    return raw["settings"]

config = parse_config(raw_json)       # config: Any (type info is gone)
lr = config["learning_rate"]           # lr: Any (still lost)
model = train(lr=lr)                   # no type errors, but no safety either

# Better — isolate Any at the boundary, then narrow immediately
import json

def parse_config(raw: str) -> TrainingConfig:
    data: Any = json.loads(raw)          # Any stays local
    return TrainingConfig(               # ← re-typed immediately
        batch_size=data["batch_size"],
        learning_rate=data["learning_rate"],
        model_name=data["model_name"],
    )

Rule of thumb: Any should exist at the boundary (I/O, external libraries) and be immediately narrowed into typed structures. Never let it flow inward.
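In the boundary example above, each `data[...]` lookup is still `Any`, so the checker accepts it without confirming, say, that `batch_size` really is an int. A stricter sketch narrows each field with `isinstance` before building the typed object; `RunConfig` and `parse_run_config` are illustrative names, and for larger configs Pydantic automates this kind of check.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass
class RunConfig:
    model_name: str
    batch_size: int
    learning_rate: float


def parse_run_config(raw: str) -> RunConfig:
    data: Any = json.loads(raw)  # Any is confined to this function
    if not isinstance(data, dict):
        raise TypeError("config must be a JSON object")
    name, bs, lr = data.get("model_name"), data.get("batch_size"), data.get("learning_rate")
    # Each isinstance check narrows the value from Any to a concrete type.
    # bool is excluded explicitly because it is a subclass of int.
    if not isinstance(name, str):
        raise TypeError("model_name must be a string")
    if not isinstance(bs, int) or isinstance(bs, bool):
        raise TypeError("batch_size must be an integer")
    if not isinstance(lr, (int, float)) or isinstance(lr, bool):
        raise TypeError("learning_rate must be a number")
    return RunConfig(model_name=name, batch_size=bs, learning_rate=float(lr))


cfg = parse_run_config('{"model_name": "rf", "batch_size": 64, "learning_rate": 0.01}')
print(cfg)  # RunConfig(model_name='rf', batch_size=64, learning_rate=0.01)
```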

⚠ Gotcha #5 — NumPy and Pandas Typing Is Imperfect

The elephant in the room for ML engineers. np.ndarray typing doesn't encode shape or dtype. pd.DataFrame typing doesn't encode column names or types. This is a known limitation of the ecosystem.

example.py
import numpy as np
from numpy.typing import NDArray

# This is all mypy knows:
def encode_features(X: np.ndarray) -> np.ndarray:
    ...  # shape and dtype: not encoded in the type

# For richer numpy typing, use numpy.typing
def encode_features(X: NDArray[np.float64]) -> NDArray[np.float64]:
    ...  # at least dtype is now encoded

# For shape-safe arrays in serious codebases, consider jaxtyping
# (shape/dtype annotations that also work with NumPy) checked at runtime by beartype

Be pragmatic. Don't let the imperfection of array typing discourage you from typing everything else. The value comes from typing your pipeline interfaces, not individual array shapes.

⚠ Gotcha #6 — Forward References Need Quotes (or __future__)

If a type is referenced before it's defined in the file (common in mutually recursive types or self-referential classes), Python will raise a NameError at import time — unless you quote it or use from __future__ import annotations.

example.py
# Option 1: quote the forward reference
class Node:
    def __init__(self, children: list["Node"]) -> None:
        ...

# Option 2: PEP 563 — all annotations become lazy strings
from __future__ import annotations  # put at top of file

class Node:
    def __init__(self, children: list[Node]) -> None:  # no quotes needed
        ...
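Self-referential types appear in ML code for things like nested pipeline stages or experiment trees. A small sketch with a hypothetical `Stage` dataclass, using the quoted form so it runs without the `__future__` import:

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    name: str
    # Quoted because Stage is not yet defined while its own body executes.
    children: list["Stage"] = field(default_factory=list)

    def depth(self) -> int:
        # A leaf has depth 1; max(..., default=0) handles the empty case.
        return 1 + max((child.depth() for child in self.children), default=0)


root = Stage("pipeline", [Stage("preprocess", [Stage("impute")]), Stage("train")])
print(root.depth())  # 3
```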
§
Reference

Quick Reference & Learning Path

The Type You Should Reach For First

Situation → Reach for (notes)

Function params / return values → int, str, list[X], dict[K, V] (always start here)
Value may be absent → X | None (replaces Optional[X])
Value can be one of several types → X | Y | Z (replaces Union[X, Y, Z])
Config / structured dicts → TypedDict or @dataclass (TypedDict for JSON-origin data, dataclass for constructed objects)
Function as argument → Callable[[ArgTypes], ReturnType] (transforms, callbacks, pipelines)
Fixed string choices / modes → Literal["a", "b", "c"] (pipeline stages, environments, modes)
Model interface (no inheritance) → Protocol (best for third-party model integration)
Reusable typed utility → TypeVar + Generic (data stores, loaders, registries)
Semantically distinct int/str IDs → NewType (prevents mixing experiment/run/dataset IDs)
NumPy arrays → NDArray[np.float64] (from numpy.typing; dtype encoded, shape not)
FastAPI / Pydantic integration → Annotated[Type, metadata] (types + validation constraints)

A Realistic 3-Week Adoption Plan

  1. Week 1 — Type the interfaces

    Add type hints to all public function signatures in one service or module. Don't touch the internals yet. Replace any raw config dicts with TypedDict or @dataclass. Run mypy locally and fix the errors it surfaces.

  2. Week 2 — Type the contracts

    Introduce Protocol wherever you have implicit model interfaces. Replace Any usages with proper types where possible. Add Literal for string-enum parameters. Add type aliases for complex types to improve readability.

  3. Week 3 — Enforce in CI

    Add mypy --strict to your CI pipeline on core modules. Enable it progressively: strict on interfaces first, lenient on internals. This is the point where types start providing compounding returns — every future commit gets checked automatically.

Tooling Stack

mypy · pyright · ruff · beartype · pydantic · numpy.typing · pandas-stubs

mypy is the standard, battle-tested static checker. pyright (from Microsoft) is faster and often stricter; it powers Pylance in VS Code. Run at least one in CI. ruff is a fast linter whose rule sets can flag missing annotations and upgrade old-style Optional/Union syntax to X | None. beartype provides runtime type enforcement when you genuinely need it. pandas-stubs installs type stubs for Pandas to improve mypy accuracy.

// The Final Mental Model Shift

Untyped code describes what runs.

Typed code describes what is true.

The gap between those two things is where production bugs live.

— Types as formal specification, not as constraint