Consider Python’s NewType Instead of an Alias

Note: While the examples in this post are in Python, the information should be relevant to any language with a similar NewType construct.

Python’s typing module, introduced in version 3.5, allows developers to specify type hints for their methods and variables. These type hints can then be statically checked for correctness via a program like mypy. When used throughout your program and checked with mypy, they allow you to catch whole classes of bugs without ever running your program. They also serve as verified documentation for your methods (though they are not a complete replacement for full documentation).

Example of function without type hints:

def copy_and_scale(features, x):
    return {name: value * x for name, value in features.items()}

Example of function with type hints:

from typing import Dict


def copy_and_scale(
    features: Dict[str, int], x: int
) -> Dict[str, int]:
    return {name: value * x for name, value in features.items()}
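These hints are not enforced when the program runs. The interpreter ignores them, and only a static checker such as mypy reports a violation. The minimal sketch below reuses copy_and_scale from above and passes a float where the hint says int. mypy flags the call, but Python runs it anyway:

```python
from typing import Dict


def copy_and_scale(
    features: Dict[str, int], x: int
) -> Dict[str, int]:
    return {name: value * x for name, value in features.items()}


# mypy rejects this call (a float is not an int), but the interpreter
# does not check annotations, so it runs and returns float values.
scaled = copy_and_scale({"height": 3}, 2.5)
print(scaled)  # {'height': 7.5}
```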

One aspect of typing that doesn’t get enough use in my experience is the NewType helper. NewTypes are often lumped together with type aliases, with the latter being used in cases where the former is more appropriate.

The typing docs provide a nice summary of when you would want to use a NewType vs a type alias:

Recall that the use of a type alias declares two types to be equivalent to one another. Doing Alias = Original will make the static type checker treat Alias as being exactly equivalent to Original in all cases. This is useful when you want to simplify complex type signatures.

In contrast, NewType declares one type to be a subtype of another. Doing Derived = NewType('Derived', Original) will make the static type checker treat Derived as a subclass of Original, which means a value of type Original cannot be used in places where a value of type Derived is expected. This is useful when you want to prevent logic errors with minimal runtime cost.

Let’s look at a simple pair of concrete examples to see the difference:

Type alias:

Uid = str


def process_uid(uid: Uid) -> None:
    ...


process_uid("foo")

mypy output:

Success: no issues found in 1 source file

NewType:

from typing import NewType

Uid = NewType("Uid", str)


def process_uid(uid: Uid) -> None:
    ...


process_uid("foo")

mypy output:

error: Argument 1 to "process_uid" has incompatible type "str"; expected "Uid"
Found 1 error in 1 file (checked 1 source file)

To resolve the error, you need to explicitly wrap the string in a Uid:

from typing import NewType

Uid = NewType("Uid", str)


def process_uid(uid: Uid) -> None:
    ...


process_uid(Uid("foo"))
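At runtime, Uid(...) performs no conversion or validation. The callable that NewType returns simply hands back its argument unchanged, which is where the "minimal runtime cost" from the typing docs comes from. A type alias, by contrast, is just a second name bound to the same class. The quick runtime check below illustrates both. UidAlias is a name made up for this comparison:

```python
from typing import NewType

UidAlias = str             # plain alias: literally the same object as str
Uid = NewType("Uid", str)  # distinct type for mypy, identity function at runtime

uid = Uid("foo")

print(UidAlias is str)   # True
print(type(uid) is str)  # True: no wrapper object is created
print(uid == "foo")      # True
```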

So, what’s the point?

From these examples, we can hopefully see why mypy succeeds in the first case and fails in the second, but it probably looks like NewTypes just add boilerplate without much benefit. The point of NewTypes is to help prevent logic bugs, so how does wrapping our string in Uid("foo") help achieve this?

For that, let’s look at another example:

import logging
from typing import Dict, List, NewType

logger = logging.getLogger(__name__)

Age = NewType("Age", int)


def query_for_options_with_ages(
    url: str,
) -> Dict[str, List[Age]]:
    ...


def calculate_cost(options: int, age: Age) -> int:
    ...


options_age_info = query_for_options_with_ages(
    "https://example.com/"
)

for options, ages in options_age_info.items():
    age_option_len_pairs = [(age, len(options)) for age in ages]
    costs = [
        calculate_cost(option_len, age)
        for option_len, age in age_option_len_pairs
    ]

    logger.info(
        "options: %s | estimated total cost: %d",
        options,
        sum(costs),
    )

Do you see the bug in our code? Luckily, mypy does:

error: Argument 2 to "calculate_cost" has incompatible type "int"; expected "Age"

Why is our age variable of type int instead of Age? Because we swapped our tuple unpacking! Instead of for option_len, age it should be for age, option_len.
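Below is a runnable sketch of the corrected loop. The bodies of query_for_options_with_ages and calculate_cost are hypothetical stand-ins, since the post leaves them as "...", and are chosen only so the code runs. The fix itself is just the unpacking order age, option_len. The totals dictionary is an addition for illustration:

```python
import logging
from typing import Dict, List, NewType

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

Age = NewType("Age", int)


def query_for_options_with_ages(url: str) -> Dict[str, List[Age]]:
    # Hypothetical stub: a real version would fetch data from `url`.
    return {"basic": [Age(30), Age(65)], "premium": [Age(40)]}


def calculate_cost(options: int, age: Age) -> int:
    # Hypothetical pricing rule, for illustration only.
    return options * 10 + age


options_age_info = query_for_options_with_ages("https://example.com/")

totals: Dict[str, int] = {}
for options, ages in options_age_info.items():
    age_option_len_pairs = [(age, len(options)) for age in ages]
    costs = [
        calculate_cost(option_len, age)
        for age, option_len in age_option_len_pairs  # fixed: matches tuple order
    ]
    totals[options] = sum(costs)
    logger.info(
        "options: %s | estimated total cost: %d",
        options,
        sum(costs),
    )
```

With the unpacking fixed, the age variable is an Age again and mypy no longer reports the error.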

So yes, the contrived example above shows a case where mypy catches a bug that would have slipped through if we had used an Age = int type alias. Of course, if we had been more careful while writing the code, we would have spotted this obvious blunder and wouldn’t need to rely on mypy (or the NewType boilerplate). We even named our variable age_option_len_pairs with the correct order!

While this is true, the crux of the issue is the following: humans make mistakes. Even the best programmers make silly mistakes.

To borrow an excellent phrase from an excellent blog post:

Engineering is not about "not doing mistakes". Engineering is about designing systems that ensure fewer mistakes occur.

Luckily, we have tools that we can leverage to catch some of these mistakes before they are deployed out in the wild. Type systems are one of our better safeguards, and the typing module together with mypy is the closest thing Python has to a static type system right now.

Alas, type systems are not a silver bullet, and it’s important to understand their limitations. But these limitations shouldn’t stop us from leveraging them whenever we can. We can’t let perfection be the enemy of good.

How to know when to use a NewType?

Next time you are coding Python and write out a type alias, ask yourself if the name you are aliasing is the exact same thing as the type you are aliasing it to. Or is it actually a subtype and therefore would be better as a NewType? In our first examples, all Uids were strs, but not all strs are Uids. Likewise, all Ages are ints, but not all ints are Ages.
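That one-way relationship is exactly what mypy enforces: a Uid can go anywhere a str is accepted, but not the other way around. One consequence worth knowing is that operations on a NewType value typically return the base type, so concatenating onto a Uid gives back a plain str. In the sketch below, shout and lookup are hypothetical helpers, and the comments describe what mypy reports, since none of this is checked at runtime:

```python
from typing import NewType

Uid = NewType("Uid", str)


def shout(text: str) -> str:
    # Accepts any str, including a Uid.
    return text.upper()


def lookup(uid: Uid) -> str:
    # Hypothetical lookup that only makes sense for real Uids.
    return "user-" + uid


uid = Uid("abc123")
loud = shout(uid)             # OK for mypy: every Uid is a str
user = lookup(uid)            # OK for mypy: the argument is a Uid
suffixed = uid + "-copy"      # mypy infers plain str here, not Uid
# lookup(suffixed)            # mypy would reject this: "str" is not "Uid"
relookup = lookup(Uid(suffixed))  # OK: explicitly wrapped again
```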

Unfortunately, there’s no magic formula for when something should be a NewType vs a simple type alias. Let’s look at a simple example where a type alias probably makes more sense:

from typing import Dict, Union


def get_features() -> Dict[str, Union[int, float, str]]:
    ...


def copy_and_scale(
    features: Dict[str, Union[int, float, str]], x: int
) -> Dict[str, Union[int, float, str]]:
    ...


def log_features(
    features: Dict[str, Union[int, float, str]]
) -> None:
    ...

In this code, we see that the Dict[str, Union[int, float, str]] type appears in multiple places. Introducing a type alias removes this duplication from our code and arguably makes it more readable:

from typing import Dict, Union

Features = Dict[str, Union[int, float, str]]


def get_features() -> Features:
    ...


def copy_and_scale(features: Features, x: int) -> Features:
    ...


def log_features(features: Features) -> None:
    ...

What about a NewType? Wouldn’t that help mypy catch bugs where we pass in some other dictionary instead of an actual Features dictionary?

Well, we could make a NewType, but then we would have to preserve it through every operation we want to perform. We want to be able to add new features to our dictionary, update or delete existing ones, and easily combine two dictionaries of features into one. Any of these operations that builds a new dictionary (a comprehension, a copy, a merge) gives back a plain Dict, which we would have to wrap in Features(...) again every time.
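To make that cost concrete, here is a sketch of what a NewType version of Features might look like. FeatureValue and merge_features are hypothetical names introduced for this illustration. The point is that each function building a new dictionary has to wrap its result in Features(...) before mypy will accept it as a Features:

```python
from typing import Dict, NewType, Union

FeatureValue = Union[int, float, str]
# Hypothetical NewType version of the Features alias from the text.
Features = NewType("Features", Dict[str, FeatureValue])


def copy_and_scale(features: Features, x: int) -> Features:
    # The comprehension produces a plain Dict, so it must be re-wrapped.
    return Features({name: value * x for name, value in features.items()})


def merge_features(a: Features, b: Features) -> Features:
    # {**a, **b} is also a plain Dict; wrap it again.
    return Features({**a, **b})


base = Features({"height": 3, "width": 1.5})
scaled = copy_and_scale(base, 2)
combined = merge_features(base, Features({"label": "box"}))
```

With the plain type alias, none of these Features(...) wrappers would be needed.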

A heuristic I use for deciding between a NewType and a type alias is how simple the actual type is. By actual type, I mean the type on the right-hand side of the assignment. For example, in Features = Dict[str, Union[int, float, str]], the actual type is fairly complex, whereas in Uid = str, it is simple.

In my experience, if the actual type is simple, then you probably want a NewType. Why? An alias for a simple type is rarely there to reduce redundancy, and it almost certainly isn’t there to save keystrokes.

It’s usually there to signal that we have a special case of the actual type, or to better "document" our data. If we care enough to give our float, str, bool, etc. a different name, then that name probably refers to something different from the plain type.
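For instance, suppose two kinds of IDs are both plain strings. As aliases they would be interchangeable. As NewTypes they are distinct to mypy, even though both are str underneath. UserId, OrderId, and cancel_order are hypothetical names for illustration:

```python
from typing import NewType

UserId = NewType("UserId", str)
OrderId = NewType("OrderId", str)


def cancel_order(order_id: OrderId) -> str:
    # Hypothetical operation that must only ever receive an order ID.
    return f"cancelled {order_id}"


user_id = UserId("u-42")
order_id = OrderId("o-7")

result = cancel_order(order_id)  # OK for mypy
# cancel_order(user_id)          # mypy would reject this: "UserId" is not "OrderId"
# Had both been aliases (UserId = str, OrderId = str),
# mypy would have accepted the line above without complaint.
```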

Using NewTypes does not guarantee that your code is free of logic bugs. They are only a small but important piece of the toolkit that helps catch bugs during static analysis instead of at runtime. Judicious use of these tools makes our code more robust and easier to maintain, and helps create a system where it’s harder to make a mistake.

So, next time you are creating a type alias, ask yourself if you’d be better served by using a NewType instead.