From exploratory scripts to production-grade pipelines. The complete mental model, not just the syntax.
Not "good practice." Not "professional code." The actual reason types pay dividends specifically when you've crossed from data science into ML engineering.
Data science code has one reader: you, this weekend.
Production ML code has one reader: the developer six months from now who has no context, is debugging a pipeline failure, at 11pm, before a model rollout.
Types are documentation that cannot lie.
Let's be precise. Here's where untyped code specifically hurts in ML engineering:
Pipeline contracts break silently. You have a preprocess() function. Someone changes it to return a pd.DataFrame instead of np.ndarray. The function downstream still accepts it — until a specific code path in production hits the wrong branch. Types would have caught this at commit time, not in the post-deployment incident.
Config objects are the worst offenders. Untyped config dicts travel across your entire codebase: training scripts, evaluation harnesses, model registries. When a key is renamed, nothing tells you. With typed configs, a key rename becomes a type-checker error at every use site the next time mypy or pyright runs.
Model interfaces are implicit contracts. If you're building a model registry, a serving layer, or an A/B testing harness, you're relying on every model implementing some interface — predict(), transform(), etc. Types let you declare that interface and have it verified, rather than hoping every model implementation "just follows convention."
Python types are not enforced at runtime by default. They are annotations — hints for static analysis tools (mypy, pyright) and IDEs. Think of them less as constraints and more as a formal specification language you layer on top of Python. You get the benefits through tooling, not through the interpreter.
Refactor confidence. The single highest-value return from typing. When you need to change a feature engineering function's return shape, or rename a config field, types let you make the change and then run mypy to get a complete list of every callsite that now needs updating. This is transformative in codebases with 20+ modules.
IDE superpowers. Autocomplete on return types. Inline documentation from function signatures. "Find all usages" that actually works. In a large ML repo, you'll spend significantly less time reading source to understand what a function does when the signature tells you.
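CI as a safety net. Once mypy is in your CI pipeline, whole categories of bugs (wrong argument types, misspelled keys, unhandled None) are caught before merge instead of in production, at least in the code that is annotated. This is the MLOps win: type safety as a gate in your deployment pipeline.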
You do not type everything at once. The correct strategy is: type the interfaces first. Public function signatures, config classes, model contracts. The internals can come later. As a rough rule of thumb, a minority of the typing effort buys most of the safety benefit.
Master these seven constructs and you can type virtually any ML codebase. Everything else is either shorthand or edge cases.
The syntax is minimal. Parameters get : Type, return values get -> Type. That's the entire grammar.
    # Before: caller has to read the body to know what to pass
    def train(df, target, n_estimators, max_depth): ...

    # After: the signature is the documentation
    def train(
        df: pd.DataFrame,
        target: str,
        n_estimators: int = 100,
        max_depth: int | None = None,
    ) -> RandomForestClassifier: ...
Use the built-in types directly. On Python 3.9+, lowercase generics work without imports:
    # ✓ Modern (Python 3.9+): prefer these
    list[int]
    dict[str, float]
    set[str]
    tuple[int, str, float]   # fixed-length, typed positions
    tuple[float, ...]        # variable-length, all floats

    # ✗ Old style (pre-3.9): only if you're stuck on older Python
    from typing import List, Dict, Set, Tuple
These appear constantly in ML config and pipeline code. Learn the modern union syntax — it's cleaner and more readable.
    # Old style (required before Python 3.10)
    from typing import Optional, Union

    def load(path: Optional[str] = None) -> Optional[Model]: ...
    def parse(v: Union[int, float]) -> float: ...

    # Modern (Python 3.10+): no imports needed
    def load(path: str | None = None) -> Model | None: ...
    def parse(v: int | float) -> float: ...
str | None doesn't mean "optional parameter." It means the value can be either a string or None. You still need = None to make the parameter itself optional. These are two orthogonal concepts: the type of the value vs. whether the parameter has a default.
If there's one thing ML engineers should adopt immediately, it's replacing raw dict configs with TypedDict. Config dicts are the most common source of runtime errors in ML pipelines, and they're entirely preventable.
    from typing import Any, TypedDict

    # ✗ Untyped: keys can be anything, values can be anything
    untyped_config: dict[str, Any] = {"batch_size": 32, "lr": 0.01}

    # ✓ TypedDict: keys are fixed, types are declared
    class TrainingConfig(TypedDict):
        batch_size: int
        learning_rate: float
        model_name: str
        early_stopping: bool
        checkpoint_dir: str | None

    # Still a dict at runtime: no overhead, no changes to call sites
    config: TrainingConfig = {
        "batch_size": 32,
        "learning_rate": 0.001,
        "model_name": "xgboost-v2",
        "early_stopping": True,
        "checkpoint_dir": None,
    }
Key insight: TypedDict creates zero runtime overhead. It's still a plain dict. The structure only exists in the type system — but that's exactly where you want it for static analysis.
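A small check of that claim, as a sketch: the two-field `TrainingConfig` below is a trimmed-down stand-in for the fuller class, used here only to keep the example short. It also shows the flip side of zero overhead, which is that nothing is validated when the program runs.

```python
from typing import TypedDict


class TrainingConfig(TypedDict):
    batch_size: int
    learning_rate: float


config: TrainingConfig = {"batch_size": 32, "learning_rate": 0.001}

# At runtime this is an ordinary dict: no wrapper class, no extra layer.
print(type(config) is dict)  # True

# Calling the class is just another way to build the same plain dict.
same = TrainingConfig(batch_size=32, learning_rate=0.001)
print(same == config, type(same) is dict)  # True True

# The flip side: a wrong value type is NOT rejected at runtime.
# mypy / pyright flag the next line; the interpreter does not.
bad: TrainingConfig = {"batch_size": "32", "learning_rate": 0.001}  # type: ignore
print(type(bad["batch_size"]).__name__)  # str
```

If the dict comes from an untrusted source such as a user-edited file, the static declaration alone is not a guarantee about its contents.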
ML pipelines pass transforms, preprocessors, and callbacks as arguments constantly. Callable lets you type these precisely.
    from typing import Callable

    import pandas as pd

    # Callable[[arg_types...], return_type]
    def build_pipeline(
        data: list[float],
        transforms: list[Callable[[float], float]],  # list of float→float fns
        on_complete: Callable[[str], None],          # callback: str → nothing
    ) -> list[float]:
        result = data
        for fn in transforms:
            result = [fn(x) for x in result]
        on_complete("done")
        return result

    # When signature doesn't matter, use Callable[..., ReturnType]
    loader: Callable[..., pd.DataFrame]
One of the most immediately useful tools. Replace "magic strings" in your pipeline stages, modes, and environment flags with a type that prevents typos at static-check time.
    from typing import Literal

    # The `type` statement requires Python 3.12+
    type RunMode = Literal["train", "eval", "predict"]
    type Env = Literal["dev", "staging", "prod"]

    def run_experiment(
        mode: RunMode,
        env: Env = "dev",
    ) -> None: ...

    run_experiment("train", "prod")  # ✓
    run_experiment("trn", "prod")    # ✗ mypy: "trn" not in Literal["train","eval","predict"]
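Literal only protects strings the checker can see. Values arriving at runtime (CLI flags, environment variables) still need a check, and one option is to derive that check from the same Literal so the two cannot drift apart. The sketch below assumes a plain assignment alias rather than the 3.12 `type` statement, and `parse_mode` is a hypothetical helper.

```python
from typing import Literal, cast, get_args

# Plain-assignment alias, so typing.get_args can read the values directly.
RunMode = Literal["train", "eval", "predict"]

# get_args returns the literal values as a tuple, in declaration order.
VALID_MODES: tuple[str, ...] = get_args(RunMode)


def parse_mode(raw: str) -> RunMode:
    """Validate an untrusted string and hand back a value typed as RunMode."""
    if raw not in VALID_MODES:
        raise ValueError(f"unknown mode {raw!r}; expected one of {VALID_MODES}")
    # The membership test is invisible to the checker, so cast explicitly.
    return cast(RunMode, raw)


print(VALID_MODES)         # ('train', 'eval', 'predict')
print(parse_mode("eval"))  # eval
```

The `cast` is the one place where the checker is told to trust the code, so it sits directly behind the runtime check that justifies it.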
When your types get complex, don't embed them inline. Name them. This is where types start reading like domain language.
    from collections.abc import Callable

    # Python 3.12+: explicit type alias syntax
    type Features = dict[str, float]
    type Batch = list[Features]
    type Predictions = list[float]
    type Transform = Callable[[Features], Features]

    # Now your signatures read like domain language:
    def run_batch(
        batch: Batch,
        transforms: list[Transform],
    ) -> Predictions: ...

    # Python 3.10/3.11: use TypeAlias
    from typing import TypeAlias
    Features: TypeAlias = dict[str, float]
TypedDict is great for data-as-dicts (e.g., loaded from JSON, passed to APIs). When you want objects with behavior, or defaults, or validation — reach for @dataclass.
    from dataclasses import dataclass, field

    @dataclass
    class TrainingConfig:
        model_name: str
        batch_size: int = 32
        learning_rate: float = 1e-3
        tags: list[str] = field(default_factory=list)  # ← correct mutable default
        checkpoint_dir: str | None = None

        def experiment_id(self) -> str:
            return f"{self.model_name}-bs{self.batch_size}"

    # Constructed like a class, no dict boilerplate
    cfg = TrainingConfig(model_name="xgboost", batch_size=64)
    print(cfg.experiment_id())  # "xgboost-bs64"
Use TypedDict when you need to interop with JSON/dicts — e.g., API responses, loaded config files. Use dataclass when you want a proper object with methods, defaults, and you control construction. For heavy validation, consider Pydantic (which extends this pattern with runtime type-checking).
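For light validation that does not justify a new dependency, a dataclass can check its own fields in `__post_init__`. This is a minimal sketch using only the standard library; `OptimizerConfig` and its bounds are hypothetical, chosen just to show the pattern.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OptimizerConfig:
    learning_rate: float = 1e-3
    batch_size: int = 32

    def __post_init__(self) -> None:
        # Runs right after the generated __init__, so an invalid object
        # never finishes construction.
        if not 0.0 < self.learning_rate < 1.0:
            raise ValueError(f"learning_rate out of range: {self.learning_rate}")
        if self.batch_size <= 0:
            raise ValueError(f"batch_size must be positive: {self.batch_size}")


print(OptimizerConfig(learning_rate=0.01, batch_size=64))

try:
    OptimizerConfig(batch_size=0)
except ValueError as exc:
    print(f"rejected: {exc}")
```

This checks value ranges, not types: passing a string for `batch_size` would fail on the comparison rather than with a clear message, which is where a library such as Pydantic starts to pay off.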
These are the tools that make typing feel like a first-class design language, not just annotation syntax. You won't need all of these immediately, but you will reach for them.
This is arguably the most important advanced construct for ML engineering. Protocol lets you define an interface that any class can satisfy without inheriting from it. This is structural typing, or "typed duck typing."
The problem it solves: you want to write a model registry that works with sklearn models, PyTorch modules, custom wrappers — without requiring them all to inherit from some base class you control.
    from typing import Protocol

    import numpy as np

    # Define what "being a model" means, no inheritance required
    class Predictor(Protocol):
        def predict(self, X: np.ndarray) -> np.ndarray: ...
        def predict_proba(self, X: np.ndarray) -> np.ndarray: ...

    # Your registry works with ANYTHING that satisfies Predictor
    class ModelRegistry:
        def __init__(self) -> None:
            self._models: dict[str, Predictor] = {}

        def register(self, name: str, model: Predictor) -> None:
            self._models[name] = model

    # sklearn RandomForest "satisfies" Predictor automatically
    # Your custom PyTorch wrapper "satisfies" it too
    # No inheritance. No coupling. Just structural compatibility.
Abstract Base Classes (ABC) require explicit inheritance — class MyModel(BaseModel). Protocol requires nothing. If a class has the right methods, it satisfies the protocol. This is far better for ML code where you integrate third-party models you don't own.
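The structural matching can be seen in a dependency-free sketch. `Scorer`, `MeanScorer`, and `NotAScorer` are hypothetical names, and plain lists stand in for arrays so the example runs without NumPy. The `runtime_checkable` decorator is optional; it is added here only so `isinstance` can demonstrate the match.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Scorer(Protocol):
    def score(self, features: list[float]) -> float: ...


class MeanScorer:  # note: no base class at all
    def score(self, features: list[float]) -> float:
        return sum(features) / len(features)


class NotAScorer:  # has a method, but not the one the protocol names
    def predict(self, features: list[float]) -> float:
        return 0.0


def evaluate(model: Scorer, rows: list[list[float]]) -> list[float]:
    return [model.score(row) for row in rows]


print(evaluate(MeanScorer(), [[1.0, 3.0], [2.0, 2.0]]))  # [2.0, 2.0]
print(isinstance(MeanScorer(), Scorer))   # True
print(isinstance(NotAScorer(), Scorer))   # False
```

One caveat: the runtime `isinstance` check only looks for the presence of the method names. Argument and return types are verified by the static checker alone.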
Generics let you write functions and classes where the type is a parameter. The canonical example: "this function returns the same type it receives."
    from typing import TypeVar, Generic

    T = TypeVar("T")

    # Function: return type follows input type
    def first(items: list[T]) -> T:
        return items[0]

    first([1, 2, 3])    # inferred return: int
    first(["a", "b"])   # inferred return: str

    # Generic class: a typed data store abstraction
    class DataStore(Generic[T]):
        def __init__(self) -> None:
            self._data: list[T] = []

        def add(self, item: T) -> None:
            self._data.append(item)

        def get_all(self) -> list[T]:
            return self._data

    store: DataStore[TrainingConfig] = DataStore()
    store.add(TrainingConfig(model_name="rf"))
    configs = store.get_all()  # inferred: list[TrainingConfig]
When two things are both int but should never be confused. This is type safety at the domain semantics level.
    from typing import NewType

    ExperimentId = NewType("ExperimentId", int)
    DatasetId = NewType("DatasetId", int)
    RunId = NewType("RunId", int)

    def fetch_run(run_id: RunId) -> dict: ...
    def fetch_experiment(exp_id: ExperimentId) -> dict: ...

    run_id = RunId(42)
    exp_id = ExperimentId(7)

    fetch_run(run_id)         # ✓
    fetch_run(exp_id)         # ✗ mypy: ExperimentId is not RunId
    fetch_experiment(run_id)  # ✗ caught: mixing IDs is now a type error
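It helps to know what NewType does when the program actually runs, which is almost nothing. The sketch below redefines two of the ID types and a stub `fetch_run` that returns a string, purely for illustration.

```python
from typing import NewType

RunId = NewType("RunId", int)
ExperimentId = NewType("ExperimentId", int)

run_id = RunId(42)

# At runtime, RunId(...) simply returns its argument: the value is a plain int.
print(type(run_id) is int)  # True
print(run_id == 42)         # True
print(run_id + 1)           # 43 (and the result is typed as int, not RunId)


def fetch_run(run_id: RunId) -> str:
    return f"run-{run_id}"


# No runtime protection: this mix-up executes without complaint.
# Only the static checker rejects it.
print(fetch_run(ExperimentId(7)))  # type: ignore[arg-type]
```

So the distinction costs nothing at runtime, and for the same reason it protects nothing unless a checker runs over the code.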
You'll encounter this in FastAPI and Pydantic. Annotated lets you attach arbitrary metadata to a type — documentation, validators, units, constraints — without changing the type itself.
    from typing import Annotated

    from fastapi import FastAPI, Query

    # The first arg is the actual type. Everything after is metadata.
    # (The `type` statement requires Python 3.12+.)
    type Probability = Annotated[float, "must be in [0, 1]"]
    type PositiveInt = Annotated[int, "must be > 0"]

    # FastAPI uses Annotated to embed validation metadata
    app = FastAPI()

    @app.get("/predict")
    async def predict(
        threshold: Annotated[float, Query(ge=0.0, le=1.0)] = 0.5,
    ) -> dict:
        ...  # FastAPI validates threshold is in [0, 1] from the metadata
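Frameworks read that metadata through the standard library's introspection helpers, and the same mechanism can be tried directly. This is a sketch using only `typing`; the `classify` function is hypothetical, and the alias uses plain assignment so it also runs on versions before 3.12.

```python
from typing import Annotated, get_args, get_type_hints

Probability = Annotated[float, "must be in [0, 1]"]


def classify(score: float, threshold: Probability = 0.5) -> bool:
    return score >= threshold


# By default the metadata is stripped and only the underlying type remains.
plain = get_type_hints(classify)
print(plain["threshold"])  # <class 'float'>

# include_extras=True keeps the Annotated wrapper so the metadata can be read.
rich = get_type_hints(classify, include_extras=True)
base, *metadata = get_args(rich["threshold"])
print(base, metadata)  # <class 'float'> ['must be in [0, 1]']
```

To a type checker, `Probability` is just `float`. The metadata only has an effect when some library or your own code chooses to read it.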
When one function has different return types depending on the arguments. Common in utility functions that handle multiple input formats.
    from typing import overload

    import pandas as pd

    @overload
    def load_data(path: str) -> pd.DataFrame: ...
    @overload
    def load_data(path: list[str]) -> list[pd.DataFrame]: ...

    # Single implementation behind both overloads; callers only see the overloads
    def load_data(path: str | list[str]) -> pd.DataFrame | list[pd.DataFrame]:
        if isinstance(path, str):
            return pd.read_csv(path)
        return [pd.read_csv(p) for p in path]

    # Callers get the right return type inferred:
    df = load_data("data.csv")             # → pd.DataFrame
    dfs = load_data(["a.csv", "b.csv"])    # → list[pd.DataFrame]
The things that will bite you if nobody warns you. Read these before you start typing your first module.
The single most important thing to understand: Python's type annotations are not enforced by the interpreter. This will run without error:
    def double(x: int) -> int:
        return x * 2

    double("oh no")  # ← Python runs this fine. "oh nooh no" is the result.
    # mypy catches it. The runtime doesn't.
Types are a static analysis tool. Their value comes from running mypy / pyright in your CI pipeline, not from Python itself. If you want runtime enforcement, use Pydantic.
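To show what runtime enforcement means mechanically, here is a deliberately limited sketch built on the standard library. `enforce_simple_types` is a hypothetical decorator, not how Pydantic or beartype work internally: it checks only parameters annotated with a plain class and silently skips generics, unions, and return values.

```python
import functools
import inspect
from typing import Any, Callable, TypeVar, cast, get_origin, get_type_hints

F = TypeVar("F", bound=Callable[..., Any])


def enforce_simple_types(func: F) -> F:
    """Raise TypeError when an argument does not match a plain-class annotation."""
    hints = get_type_hints(func)
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            # Skip anything that is not a bare class (e.g. list[int], int | None).
            if get_origin(expected) is not None or not isinstance(expected, type):
                continue
            if not isinstance(value, expected):
                raise TypeError(
                    f"{name} must be {expected.__name__}, got {type(value).__name__}"
                )
        return func(*args, **kwargs)

    return cast(F, wrapper)


@enforce_simple_types
def double(x: int) -> int:
    return x * 2


print(double(21))  # 42
try:
    double("oh no")  # type: ignore[arg-type]
except TypeError as exc:
    print(f"rejected at runtime: {exc}")
```

Every call now pays for the check, which is why this kind of enforcement usually belongs at system boundaries rather than on every internal function.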
Types don't fix one of Python's oldest footguns: mutable default arguments are shared across all calls.
    # ✗ Classic bug: one list shared across ALL calls
    def add_feature(name: str, features: list[str] = []) -> list[str]:
        features.append(name)
        return features

    # ✓ Correct pattern: use None sentinel
    def add_feature(name: str, features: list[str] | None = None) -> list[str]:
        features = features if features is not None else []
        features.append(name)
        return features

    # With dataclasses, use field(default_factory=list)
    from dataclasses import dataclass, field

    @dataclass
    class Config:
        tags: list[str] = field(default_factory=list)  # ← correct
A list[Dog] is NOT a list[Animal], even if Dog is a subclass of Animal. This surprises most people. (It's correct behavior — adding a Cat to a list[Dog] via an Animal reference would be a bug.)
    def process_animals(animals: list[Animal]) -> None: ...

    dogs: list[Dog] = [Dog(), Dog()]
    process_animals(dogs)  # ✗ mypy error!

    # Solution: use Sequence[Animal] (read-only) for covariant use
    from collections.abc import Sequence

    def process_animals(animals: Sequence[Animal]) -> None: ...
    # ✓ Sequence is covariant: list[Dog] accepted
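The bug that invariance prevents can be reproduced by silencing the checker. In this sketch the `Animal`, `Dog`, and `Cat` classes and the helper names `add_stray` and `chorus` are hypothetical, defined only so the example is self-contained.

```python
from collections.abc import Sequence


class Animal:
    def speak(self) -> str:
        return "..."


class Dog(Animal):
    def speak(self) -> str:
        return "woof"

    def fetch(self) -> str:
        return "fetching"


class Cat(Animal):
    def speak(self) -> str:
        return "meow"


def add_stray(animals: list[Animal]) -> None:
    animals.append(Cat())  # perfectly legal for a list[Animal]


def chorus(animals: Sequence[Animal]) -> list[str]:
    return [a.speak() for a in animals]  # read-only use, so list[Dog] is safe


dogs: list[Dog] = [Dog()]
add_stray(dogs)  # type: ignore[arg-type]  # the error mypy would have reported

print(chorus(dogs))  # ['woof', 'meow']: a Cat is now inside list[Dog]
try:
    for dog in dogs:
        dog.fetch()
except AttributeError as exc:
    print(f"runtime failure: {exc}")
```

`Sequence` is safe to accept because it exposes no way to append, so nothing the function does can put the wrong element into the caller's list.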
Any turns off type checking in both directions. It's assignable to everything and everything is assignable to it. This means one careless Any can silently propagate through your entire call graph — it "infects" types.
    from typing import Any

    def parse_config(raw: Any) -> Any:  # ← Any escapes here
        return raw["settings"]

    config = parse_config(raw_json)  # config: Any (type info is gone)
    lr = config["learning_rate"]     # lr: Any (still lost)
    model = train(lr=lr)             # no type errors, but no safety either

    # Better: isolate Any at the boundary, then narrow immediately
    import json

    def parse_config(raw: str) -> TrainingConfig:
        data: Any = json.loads(raw)   # Any stays local
        return TrainingConfig(        # ← re-typed immediately
            batch_size=data["batch_size"],
            learning_rate=data["learning_rate"],
            model_name=data["model_name"],
        )
Rule of thumb: Any should exist at the boundary (I/O, external libraries) and be immediately narrowed into typed structures. Never let it flow inward.
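Re-typing alone trusts the input. When the data comes from outside the process, narrowing can be backed by explicit checks, as in this standard-library sketch. The three-field `TrainingConfig` dataclass here is a simplified stand-in, and the error messages are illustrative.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class TrainingConfig:
    batch_size: int
    learning_rate: float
    model_name: str


def parse_config(raw: str) -> TrainingConfig:
    data: Any = json.loads(raw)  # Any exists only inside this function
    if not isinstance(data, dict):
        raise ValueError("config must be a JSON object")

    batch_size = data.get("batch_size")
    learning_rate = data.get("learning_rate")
    model_name = data.get("model_name")

    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(batch_size, bool) or not isinstance(batch_size, int):
        raise ValueError("batch_size must be an integer")
    if isinstance(learning_rate, bool) or not isinstance(learning_rate, (int, float)):
        raise ValueError("learning_rate must be a number")
    if not isinstance(model_name, str):
        raise ValueError("model_name must be a string")

    # Past this point every value has a precise, checked type.
    return TrainingConfig(batch_size, float(learning_rate), model_name)


cfg = parse_config('{"batch_size": 32, "learning_rate": 0.001, "model_name": "rf"}')
print(cfg)
```

The `isinstance` checks do double duty: they validate at runtime and they let the static checker narrow each value from `Any` to a concrete type.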
The elephant in the room for ML engineers. np.ndarray typing doesn't encode shape or dtype. pd.DataFrame typing doesn't encode column names or types. This is a known limitation of the ecosystem.
    import numpy as np
    from numpy.typing import NDArray

    # This is all mypy knows:
    def encode_features(X: np.ndarray) -> np.ndarray: ...
    # shape and dtype: not encoded in the type

    # For richer numpy typing, use numpy.typing
    def encode_features(X: NDArray[np.float64]) -> NDArray[np.float64]: ...
    # at least dtype is now encoded

    # For shape-safe numpy, consider beartype or jaxtyping in serious codebases
Be pragmatic. Don't let the imperfection of array typing discourage you from typing everything else. The value comes from typing your pipeline interfaces, not individual array shapes.
If a type is referenced before it's defined in the file (common in mutually recursive types or self-referential classes), Python will raise a NameError at import time — unless you quote it or use from __future__ import annotations.
    # Option 1: quote the forward reference
    class Node:
        def __init__(self, children: list["Node"]) -> None: ...

    # Option 2: PEP 563, all annotations become lazy strings
    from __future__ import annotations  # put at top of file

    class Node:
        def __init__(self, children: list[Node]) -> None:  # no quotes needed
            ...
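A quoted annotation is stored as a plain string and only turned into a real type when something asks for it. The sketch below shows both states using `typing.get_type_hints`; the `add_child` method is a hypothetical example.

```python
from typing import get_type_hints


class Node:
    def __init__(self, value: int) -> None:
        self.value = value
        self.children: list[Node] = []

    # "Node" is quoted because the class is not fully defined yet at this point.
    def add_child(self, child: "Node") -> "Node":
        self.children.append(child)
        return self


# The raw annotation is still just the string we wrote.
print(Node.add_child.__annotations__["child"])  # Node (a str)

# get_type_hints evaluates the string against the module namespace.
resolved = get_type_hints(Node.add_child)
print(resolved["child"] is Node)  # True
```

This is why tools that inspect annotations at runtime should go through `get_type_hints` rather than reading `__annotations__` directly.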
| Situation | Reach For | Notes |
|---|---|---|
| Function params / return values | int, str, list[X], dict[K,V] | Always start here |
| Value may be absent | X \| None | Replaces Optional[X] |
| Value can be one of several types | X \| Y \| Z | Replaces Union[X, Y, Z] |
| Config / structured dicts | TypedDict or @dataclass | TypedDict for JSON-origin, dataclass for constructed objects |
| Function as argument | Callable[[ArgTypes], ReturnType] | Transforms, callbacks, pipelines |
| Fixed string choices / modes | Literal["a", "b", "c"] | Pipeline stages, environments, modes |
| Model interface (no inheritance) | Protocol | Best for third-party model integration |
| Reusable typed utility | TypeVar + Generic | Data stores, loaders, registries |
| Semantically distinct int/str IDs | NewType | Prevents mixing experiment/run/dataset IDs |
| NumPy arrays | NDArray[np.float64] | From numpy.typing; dtype encoded, shape not |
| FastAPI / Pydantic integration | Annotated[Type, metadata] | Types + validation constraints |
Add type hints to all public function signatures in one service or module. Don't touch the internals yet. Replace any raw config dicts with TypedDict or @dataclass. Run mypy locally and fix the errors it reports.
Introduce Protocol wherever you have implicit model interfaces. Replace Any usages with proper types where possible. Add Literal for string-enum parameters. Add type aliases for complex types to improve readability.
Add mypy --strict to your CI pipeline on core modules. Enable it progressively: strict on interfaces first, lenient on internals. This is the point where types start providing compounding returns — every future commit gets checked automatically.
mypy is the standard, battle-tested static checker. pyright (from Microsoft) is faster and often stricter — used by Pylance in VSCode. Run at least one in CI. beartype provides runtime type enforcement when you genuinely need it. pandas-stubs installs community type stubs for Pandas to improve mypy accuracy.
Untyped code describes what runs.
Typed code describes what is true.
The gap between those two things is where production bugs live.