Parameters: One Class to Rule Them All
When working on a complex project in Python that has many parameters, it helps to keep all parameters in one place. Read this to learn why, when, and how to best implement it.
My philosophy in writing code (and in other facets of life, but that’s for another post) is that I prefer to do some extra work now in order to do much less work later.
This is the motivation for creating a class that holds all my parameters in one place.
Do I Need This?
Not every function you ever write will have its own
Params class. The time is right when you have multiple parameters in your project, and when your architecture is non-trivial.
Let's ask ourselves two questions to understand if we need this new feature in our project:
- Is my project more than just a simple module?
Am I working on a big project that has one (or very few) entry points, and many many (many) internal components?
When drawing the architecture of my project, do I have multiple levels in the hierarchy?
- Do I use parameters in my project?
Do any of the internal components use “magic numbers” that are hardly ever changed, but I would still like to keep track of?
Do any of the internal components use configurable inputs that are trickled down from the main entry point?
If the answer is “yes” on both items, read on, and learn how to make your project easier to develop, expand, and maintain.
The Params Class
Now that we are certain we could use a
Params class, how should we implement it?
In this template, I have included a few hard-learned-lessons:
dataclass, you don’t have to specify an explicit
__init__method. You list all your attributes “as if” they were class attributes.
You can either have NO default values or have default values for ALL attributes. I prefer to have default values for all, and use
Nonewhere needed. Then, in the
__post_init__, I can handle all
- Since lists are mutable, we don’t want to have a default list value. Instead, let’s use a string of comma-separated-values, and in the
__post_init__we will fill the default value of our “real” list parameter.
__post_init__is also a great place to concatenate paths that we will later want to use in a concise manner. Specifically:
- When I want to keep track of my experiments (and also in production!) I want to make sure that outputs from different runs will not override each other. Every output, whether final or mid-process, that is distinct to a specific run, I will keep in a folder that is marked with a timestamp of the start time.
If you want to learn more about keeping track of algorithmic experiments you can read this.
- It helps to have an instantiating method
from_parserthat uses the input from the terminal; read on to learn how and when to use it.
Using the Param Class
To use the
Params class, instantiate this class outside the scope of the project, and instantiate our project using this existing
Now if you want to run your project from the terminal, you can do so with a lean runner file:
Never Instantiate Mid-Run
If you have some mid-process entry point, or if you have some tests you perform on chunks of modules, always instantiate the Params class outside the scope of your modules. Explicitly — never do something like this:
This is bad practice since
SomeMidProcessModule expects to have values trickled down to it. If you have non-default values configured, you don’t want to allow this module to use an instance of
Params that has only the default values.
This becomes even worse (don’t ask me how many hours I spent debugging because of this) when some parameters are computed or overwritten mid-run. Then, if you create a new instance of
Params mid-run, you lose the most updated values.
The good practice, in this case, is to force an input of an existing
Everything Is So Much Easier Now
Now, every time you want to create a new parameter, you only need to do that in two places — in the Params class and in the “leaf” of the tree.