Making the first steps might be hard, but it’s totally worth it.

Parameters: One Class to Rule Them All

When working on a complex project in Python that has many parameters, it helps to keep all parameters in one place. Read this to learn why, when, and how to best implement it.

My philosophy in writing code (and in other facets of life, but that’s for another post) is that I prefer to do some extra work now in order to do much less work later.
This is the motivation for creating a class that holds all my parameters in one place.

Do I Need This?

Not every function you ever write will have its own Params class. The time is right when you have multiple parameters in your project, and when your architecture is non-trivial.

Let's ask ourselves two questions to understand if we need this new feature in our project:

  1. Is my project more than just a simple module?
    Am I working on a big project that has one (or very few) entry points, and many many (many) internal components?
    When drawing the architecture of my project, do I have multiple levels in the hierarchy?
  2. Do I use parameters in my project?
    Do any of the internal components use “magic numbers” that are hardly ever changed, but I would still like to keep track of?
    Do any of the internal components use configurable inputs that are trickled down from the main entry point?

If the answer is “yes” on both items, read on, and learn how to make your project easier to develop, expand, and maintain.

If your project looks something like this, you could really use a Params class…

The Params Class

Now that we are certain we could use a Params class, how should we implement it?

I suggest a dataclass, which becomes even more useful when developing in Python3.6+ that has optional typing:

In this template, I have included a few hard-learned-lessons:

  1. In dataclass, you don’t have to specify an explicit __init__ method. You list all your attributes “as if” they were class attributes.
    You can either have NO default values or have default values for ALL attributes. I prefer to have default values for all, and use None where needed. Then, in the __post_init__, I can handle all None values.
  2. Since lists are mutable, we don’t want to have a default list value. Instead, let’s use a string of comma-separated-values, and in the __post_init__ we will fill the default value of our “real” list parameter.
  3. The __post_init__ is also a great place to concatenate paths that we will later want to use in a concise manner. Specifically:
  4. When I want to keep track of my experiments (and also in production!) I want to make sure that outputs from different runs will not override each other. Every output, whether final or mid-process, that is distinct to a specific run, I will keep in a folder that is marked with a timestamp of the start time.
    If you want to learn more about keeping track of algorithmic experiments you can read this.
  5. It helps to have an instantiating method from_parser that uses the input from the terminal; read on to learn how and when to use it.
The opposite of forgetting is writing (all your parameters in one place)

Using the Param Class

To use the Params class, instantiate this class outside the scope of the project, and instantiate our project using this existing Params singleton:

Now if you want to run your project from the terminal, you can do so with a lean runner file:

When you have all your parameters in one place, you can easily use them where needed.

Never Instantiate Mid-Run

If you have some mid-process entry point, or if you have some tests you perform on chunks of modules, always instantiate the Params class outside the scope of your modules. Explicitly — never do something like this:

This is bad practice since SomeMidProcessModule expects to have values trickled down to it. If you have non-default values configured, you don’t want to allow this module to use an instance of Params that has only the default values.

This becomes even worse (don’t ask me how many hours I spent debugging because of this) when some parameters are computed or overwritten mid-run. Then, if you create a new instance of Params mid-run, you lose the most updated values.

The good practice, in this case, is to force an input of an existing Params instance:

Everything Is So Much Easier Now

Now, every time you want to create a new parameter, you only need to do that in two places — in the Params class and in the “leaf” of the tree.

That’s how you’re going to feel very soon! Good luck y`all!

Mathematician, Algorithmatician, gives meaningful names to variables.