Consider Python’s NewType Instead of an Alias

2020 December 19 • 8 min read

Updated on November 2, 2022

TL;DR: If you’re trying to document or encode information, you probably want a NewType. If you’re trying to save keypresses, a type alias is fine.

I’ve written this post from the perspective of Python, but the idea applies to any language with a similar construct, such as Haskell’s newtype, the newtype pattern in Rust, or defined types in Go.

Python typing background

Python’s typing module, introduced in version 3.5, lets developers add type hints to their functions and variables. Tools like mypy can then statically check these hints for correctness. When type hints are used throughout your program and checked with mypy, they catch whole classes of bugs without running the program. They also serve as verified documentation for your functions (though they are not a complete replacement for full documentation).

# Example of function without type hints
def copy_and_scale(features, x):
    return {name: value * x for name, value in features.items()}

# Example of function with type hints
from typing import Dict


def copy_and_scale(
    features: Dict[str, int], x: int
) -> Dict[str, int]:
    return {name: value * x for name, value in features.items()}
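One point worth spelling out: the interpreter does not enforce these hints at runtime; only a checker such as mypy reads them. The sketch below repeats the annotated copy_and_scale so it is self-contained, then calls it once correctly and once with a string value. mypy would report an error on the second call, but Python runs it without complaint.

```python
from typing import Dict


def copy_and_scale(
    features: Dict[str, int], x: int
) -> Dict[str, int]:
    return {name: value * x for name, value in features.items()}


# Well-typed call: both mypy and Python accept it.
scaled = copy_and_scale({"height": 2, "width": 3}, 10)
print(scaled)  # {'height': 20, 'width': 30}

# Ill-typed call: mypy flags the str value, but at runtime "ab" * 3
# is ordinary string repetition, so no exception is raised.
surprising = copy_and_scale({"label": "ab"}, 3)
print(surprising)  # {'label': 'ababab'}
```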

One aspect of typing that, in my experience, doesn’t get enough use is the NewType helper. NewTypes are often lumped together with type aliases, and aliases end up being used in cases where a NewType would be more appropriate.

NewType overview

The typing docs provide an excellent summary of when you would want to use a NewType vs. a type alias:

Recall that the use of a type alias declares two types to be equivalent to one another. Doing Alias = Original will make the static type checker treat Alias as being exactly equivalent to Original in all cases. This is useful when you want to simplify complex type signatures.
In contrast, NewType declares one type to be a subtype of another. Doing Derived = NewType('Derived', Original) will make the static type checker treat Derived as a subclass of Original, which means a value of type Original cannot be used in places where a value of type Derived is expected. This is useful when you want to prevent logic errors with minimal runtime cost.

Let’s look at a simple pair of concrete examples to see the difference:

# Type alias example with mypy output below
Uid = str


def process_uid(uid: Uid) -> None:
    ...


process_uid("foo")

# mypy output:
# Success: no issues found in 1 source file

# NewType example with mypy output below
from typing import NewType

Uid = NewType("Uid", str)


def process_uid(uid: Uid) -> None:
    ...


process_uid("foo")

# mypy output:
# error: Argument 1 to "process_uid" has incompatible type "str"; expected "Uid"
# Found 1 error in 1 file (checked 1 source file)

To resolve the error, you need something like the following:

from typing import NewType

Uid = NewType("Uid", str)


def process_uid(uid: Uid) -> None:
    ...


process_uid(Uid("foo"))
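The “minimal runtime cost” mentioned in the typing docs is easy to see: at runtime, calling Uid simply returns its argument, so the value is still an ordinary str. The distinction between Uid and str exists only for the static checker. A quick sketch:

```python
from typing import NewType

Uid = NewType("Uid", str)

raw = "foo"
uid = Uid(raw)

# Calling a NewType returns its argument unchanged: no wrapper object,
# no validation, no copy.
print(uid is raw)        # True
print(type(uid) is str)  # True

# Uid is not a real class, so isinstance() rejects it.
try:
    isinstance(uid, Uid)
except TypeError:
    print("isinstance() does not accept a NewType")
```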

So, what’s the point?

From these examples, we can hopefully see why mypy succeeds or fails, but it probably looks like NewType adds boilerplate without much benefit. The point of NewType is to help prevent logic bugs, so how does casting our string via Uid("foo") help achieve this?

For that, let’s look at another example:

import logging
from typing import Dict, List, NewType

logger = logging.getLogger(__name__)

Age = NewType("Age", int)


def query_for_options_with_ages(
    url: str,
) -> Dict[str, List[Age]]:
    ...


def calculate_cost(options: int, age: Age) -> int:
    ...


options_age_info = query_for_options_with_ages(
    "https://example.com/"
)

for options, ages in options_age_info.items():
    age_option_len_pairs = [(age, len(options)) for age in ages]
    costs = [
        calculate_cost(option_len, age)
        for option_len, age in age_option_len_pairs
    ]

    logger.info(
        "options: %s | estimated total cost: %d",
        options,
        sum(costs),
    )

Do you see the bug in our code? Luckily, mypy does:

error: Argument 2 to "calculate_cost" has incompatible type "int"; expected "Age"

Why is our age variable of type int instead of Age? Because we swapped our tuple unpacking!

Instead of for option_len, age, it should be for age, option_len.
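Without mypy, this bug would run silently: NewType performs no runtime check, so both orderings execute and simply produce different numbers. The sketch below makes that concrete. The data and the pricing rule are hypothetical stand-ins, since query_for_options_with_ages and calculate_cost are left unimplemented in the example above.

```python
from typing import Dict, List, NewType

Age = NewType("Age", int)


def calculate_cost(options: int, age: Age) -> int:
    # Hypothetical pricing rule, purely for illustration.
    return options * 10 + age


# Hypothetical stand-in for what query_for_options_with_ages(...) might return.
options_age_info: Dict[str, List[Age]] = {"abc": [Age(30), Age(45)]}

for options, ages in options_age_info.items():
    age_option_len_pairs = [(age, len(options)) for age in ages]

    # Swapped unpacking: mypy flags the calculate_cost call,
    # but Python executes it without complaint.
    buggy = [
        calculate_cost(option_len, age)
        for option_len, age in age_option_len_pairs
    ]

    # Corrected unpacking matches the (age, option_len) tuple order.
    fixed = [
        calculate_cost(option_len, age)
        for age, option_len in age_option_len_pairs
    ]

    print("buggy:", buggy, "fixed:", fixed)
```

With this made-up data, the buggy version yields [303, 453] and the fixed one [60, 75]; nothing at runtime signals that the first result is wrong.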

So yes, this contrived example shows a case where mypy catches a bug that it would have missed if we had used an Age = int type alias. Of course, if we had been more careful while writing the code, we would have spotted this obvious blunder and wouldn’t need to rely on mypy (or the NewType boilerplate). We even named our variable age_option_len_pairs in the correct order!

That’s true, but the crux of the issue is that humans make mistakes. Even the best programmers make silly ones.

To borrow an excellent phrase from a fantastic blog post:

Engineering is not about “not doing mistakes”. Engineering is about designing systems that ensure fewer mistakes occur.

Luckily, we have tools that catch some of these mistakes before they reach production. Type systems are one of our better safeguards, and the typing module combined with mypy is the closest thing Python has to a static type system right now.

Alas, type systems are not a silver bullet, and it’s essential to understand their limitations. But these limitations shouldn’t stop us from leveraging them whenever we can. We can’t let perfection be the enemy of good.

How to know when to use a NewType?

The next time you write Python and reach for a type alias, ask yourself whether the name you are introducing really is the same thing as the type it aliases. Or is it a subtype, and therefore better expressed as a NewType? In our first examples, every Uid is a str, but not every str is a Uid. Likewise, every Age is an int, but not every int is an Age.
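The subtype relationship runs in one direction only: an Age can go anywhere an int is accepted, but a bare int needs an explicit Age(...) to go where an Age is expected. Because Age(...) itself checks nothing, one common pattern (not something the typing module requires) is to route construction through a small validating function. make_age, years_until, describe, and the non-negative rule below are hypothetical illustrations.

```python
from typing import NewType

Age = NewType("Age", int)


def make_age(value: int) -> Age:
    """Hypothetical validating constructor: rejects negative ages."""
    if value < 0:
        raise ValueError(f"age must be non-negative, got {value}")
    return Age(value)


def years_until(target: int, current: int) -> int:
    # Accepts plain ints, so an Age is welcome here too.
    return target - current


def describe(age: Age) -> str:
    # Accepts only Age; mypy rejects a bare int.
    return f"{age} years old"


age = make_age(42)
print(years_until(65, age))  # 23; fine for mypy, since an Age is an int
print(describe(age))         # 42 years old
next_year = age + 1          # arithmetic on an Age gives a plain int to mypy
# describe(next_year)        # mypy would reject this: "int" is not "Age"
# describe(42)               # ...and this
```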

Unfortunately, there’s no magic formula for when something should be a NewType vs. a simple type alias. Let’s look at a simple example where a type alias probably makes more sense:

from typing import Dict, Union


def get_features() -> Dict[str, Union[int, float, str]]:
    ...


def copy_and_scale(
    features: Dict[str, Union[int, float, str]], x: int
) -> Dict[str, Union[int, float, str]]:
    ...


def log_features(
    features: Dict[str, Union[int, float, str]]
) -> None:
    ...

The Dict[str, Union[int, float, str]] type appears in multiple places in this code. Introducing a type alias removes this duplication from our code and arguably makes it more readable:

from typing import Dict, Union

Features = Dict[str, Union[int, float, str]]


def get_features() -> Features:
    ...


def copy_and_scale(features: Features, x: int) -> Features:
    ...


def log_features(features: Features) -> None:
    ...

What about a NewType? Wouldn’t that help mypy catch bugs where we pass in some other dictionary instead of an actual Features dictionary?

We could, but it would cost us. We want to add new features to the dictionary, update or delete existing ones, and perhaps merge two feature dictionaries into one. In-place updates would still type-check, since a NewType is a subtype of its base type, but every operation that builds a new dictionary (a copy, a comprehension, a merge) returns a plain Dict, so we would have to wrap the result in Features(...) each time.
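To make that trade-off concrete, here is a sketch of Features as a NewType instead of an alias. In-place updates still work because Features is a subtype of the dict type, but merging two feature sets produces a plain Dict that has to be wrapped in Features(...) again. merge_features is a hypothetical helper, not part of the original code.

```python
from typing import Dict, NewType, Union

Features = NewType("Features", Dict[str, Union[int, float, str]])


def merge_features(a: Features, b: Features) -> Features:
    # {**a, **b} is a plain Dict to mypy; returning it unwrapped would be
    # a type error, so we wrap it again. Keys in b win on conflict.
    return Features({**a, **b})


base = Features({"height": 180, "unit": "cm"})
base["weight"] = 75.5  # in-place update: fine, Features is still a Dict
extra = Features({"unit": "in"})

merged = merge_features(base, extra)
print(merged)  # {'height': 180, 'unit': 'in', 'weight': 75.5}
```

Every helper that returns a new dictionary needs the same re-wrapping, which is the boilerplate an alias avoids.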

A heuristic I use to decide between a NewType and a type alias is how simple the actual type is. By actual type, I mean the type on the right-hand side of the assignment. For example, in Features = Dict[str, Union[int, float, str]] the actual type is relatively complex, while in Uid = str it is simple.

If the actual type is simple, you probably want a NewType. Why? An alias for a simple type probably isn’t there to reduce redundancy, and it almost certainly isn’t there to save keystrokes.

Usually it signals that we have a particular case of the actual type, or that we want to better “document” our data. If we care enough to give a float, str, or bool a different name, that name probably describes something different from the simple type.

NewTypes are not guaranteed to prevent logic bugs in your code. They are one small but important tool among those available for catching bugs during static analysis instead of at runtime. Used judiciously, these tools make our code more robust and easier to maintain, and they help create a system where mistakes are harder to make.

So, next time you create a type alias, ask yourself if a NewType is more appropriate.