About Python Imports
Seeing some code snippets sometimes hurts. Sometimes it hurts even more when I realize I’m the author. But very often, it hurts right at the beginning of the file: the imports.
How it should be, according to the theory
Almost every code editor or IDE offers some form of linting. That’s the feature that nags you to fix things while you write your code. Many people treat it as a game in which silencing those suggestions is the goal. It isn’t.
Every Python story begins with PEP-8. I won’t copy-paste it. Just read it if you haven’t already. Then go through PEP-328.
How to begin
It has been a while since we saw this often, and we should be really glad it is not so common these days.
from my_unicorn import *
Now, let’s see what happens under the hood if you decide to reinvent the wheel.
""" my_unicorn/__init__.py"""
import os
import sys
and
Python 3.8.5 (default, Jul 20 2020, 19:48:14)
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> os
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'os' is not defined
>>> sys
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'sys' is not defined
>>> from my_unicorn import *
>>> os
<module 'os' from '/usr/lib/python3.8/os.py'>
>>> sys
<module 'sys' (built-in)>
This is the simplest explanation. We just pulled every public name from my_unicorn into our own namespace, including the modules it imported for itself. We can go even further. Imagine that, for some reason, we want to import a useful function from a poorly written executable like this:
""" my_unicorn/__init__.py"""
import os
import sys
def yarp():
    print("yarp")
yarp()
and
Python 3.8.5 (default, Jul 20 2020, 19:48:14)
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from my_unicorn import *
yarp
If multiple people worked on the code base and each did whatever they liked, output like this would show up sooner or later. That is, by the way, why we should never call functions at module level outside an if __name__ == '__main__' block.
That’s why we should do the following instead:
""" my_unicorn/__init__.py"""
import os
import sys
def yarp():
    print("yarp")
if __name__ == '__main__':
    yarp()
and of course import like this:
from my_unicorn import yarp
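To see the difference without touching a real project, here is a minimal sketch that writes a throwaway package into a temporary directory and imports it both ways. The package name my_unicorn_demo and the helper import_into_fresh_namespace are made up for this illustration. It is only meant to show which names end up in the importing namespace, under the assumption that the package defines no __all__.

```python
import contextlib
import importlib
import io
import sys
import tempfile
from pathlib import Path

# Source of a throwaway package; the name my_unicorn_demo is hypothetical.
PACKAGE_SOURCE = '''\
import os
import sys


def yarp():
    print("yarp")


if __name__ == "__main__":
    yarp()
'''


def import_into_fresh_namespace(statement):
    """Run an import statement in an empty namespace.

    Returns the public names it created and anything printed meanwhile.
    """
    namespace = {}
    captured = io.StringIO()
    with contextlib.redirect_stdout(captured):
        exec(statement, namespace)
    names = sorted(name for name in namespace if not name.startswith("__"))
    return names, captured.getvalue()


with tempfile.TemporaryDirectory() as tmp:
    package_dir = Path(tmp) / "my_unicorn_demo"
    package_dir.mkdir()
    (package_dir / "__init__.py").write_text(PACKAGE_SOURCE)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        explicit_names, explicit_output = import_into_fresh_namespace(
            "from my_unicorn_demo import yarp"
        )
        star_names, star_output = import_into_fresh_namespace(
            "from my_unicorn_demo import *"
        )
    finally:
        # Clean up so the demo leaves no trace in the interpreter.
        sys.path.remove(tmp)
        sys.modules.pop("my_unicorn_demo", None)

print(explicit_names)         # ['yarp']
print(star_names)             # ['os', 'sys', 'yarp']
print(repr(explicit_output))  # '' because the guard kept yarp() from running
```

The explicit import brings in exactly one name, while the wildcard import also drags along os and sys, which the importing file never asked for.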
Order
It is really painful to go through a file whose many imports are scattered randomly at the top (let’s not pretend: sometimes files grow much larger than recommended, have many more imports than recommended, and so on). I’m not only talking about failing to separate standard library imports, 3rd party library imports and project imports, but about not grouping imports at all.
from my_unicorn.yarp import yarpees
import os
import sys, logging
from my_unicorn.narp import narpees
import my_unicorn
We were just making sure we have all read PEP-8. This is explained well there.
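For comparison, here is a sketch of how the messy block above might look once grouped the PEP-8 way: one import per line, standard library first, then third-party, then local imports, with a blank line between groups. The alphabetical order inside each group is a common convention rather than something PEP-8 demands. The project imports are left commented out only so the snippet runs without the my_unicorn package.

```python
"""Imports grouped in the order PEP 8 recommends."""
# Standard library imports, one module per line.
import logging
import os
import sys

# Third-party imports would form the second group (none in this example).

# Local application imports form the last group. They are commented out
# here because the my_unicorn package only exists in the example above.
# import my_unicorn
# from my_unicorn.narp import narpees
# from my_unicorn.yarp import yarpees

logger = logging.getLogger(__name__)
logger.debug("running on %s with Python %s", os.name, sys.version_info[:2])
```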
What Java converts often do
./my_unicorn/
├── __init__.py
├── org
│   └── my
│       └── whatever
│           └── path
│               ├── __init__.py
│               └── __pycache__
│                   └── __init__.cpython-38.pyc
└── __pycache__
    └── __init__.cpython-38.pyc
""" my_unicorn/org/my/whatever/path/__init__.py"""
def narp():
    print("narp")
Python 3.8.5 (default, Jul 20 2020, 19:48:14)
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from my_unicorn.org.my.whatever import path as whatever_path
>>> whatever_path.narp()
narp
>>>
Why shouldn’t we do this? First, it is always better to follow PEP-20, a.k.a. The Zen of Python. Several of its principles clash with the solution above, most obviously “Flat is better than nested.”
Second, if our code gets so complex that we need a whatever_path alias to keep things clear, our file is most likely too long and hosts too many features and use cases at once. It is not just for fun that Pylint nags us once a file grows past 1000 lines. We really shouldn’t do this routinely.
Third is the question of what else we need from the path module that makes us import it as a whole, instead of doing from my_unicorn.org.my.whatever.path import narp. There used to be discussions about which solution is faster or more memory-efficient: path.narp() or whatever_path.narp() versus plain narp(). With current Python versions and contemporary computers, it mostly doesn’t matter. If we run loops with millions of iterations and care about squeezing some performance out of our imports, from ....whatever.path import narp will perform a little bit better, because the bare name skips an attribute lookup on every call. This includes cases where an API or pipeline is hammered with many requests and loops inside each one. Otherwise, it won’t make much difference. The gap between the two approaches also seems to have narrowed in Python 3. And in a huge loop, you will always find better candidates for improvement.
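If you want to check the size of that gap on your own machine, a rough sketch with the standard timeit module could look like this. Here math.sqrt merely stands in for path.narp, and the function names and loop count are arbitrary choices for the illustration. The numbers will vary by interpreter version and hardware, so treat them as a hint rather than a benchmark.

```python
import math
import timeit
from math import sqrt

LOOPS = 1_000_000  # arbitrary; large enough to make the difference visible


def via_module():
    # Looks up the global name "math", then its attribute "sqrt".
    return math.sqrt(2.0)


def via_name():
    # Looks up the global name "sqrt" only.
    return sqrt(2.0)


module_time = timeit.timeit(via_module, number=LOOPS)
name_time = timeit.timeit(via_name, number=LOOPS)

print(f"math.sqrt(2.0): {module_time:.3f} s for {LOOPS:,} calls")
print(f"sqrt(2.0):      {name_time:.3f} s for {LOOPS:,} calls")
```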
In the end, it is just easier to simply do from my_unicorn.whatever_path import yarp.
Special cases
import numpy as np
import pandas as pd
These aliases don’t really make much sense, and both can draw Pylint complaints, because np and pd are not descriptive names. So we just don’t do this when we use these tools in a “programmed” code base. However, we will see it in many statistics/analytics/data science/“data science” tutorials and examples, and there it is perfectly okay, since that work is more about getting the job done than about having a sustainable code base.
That is all. There are, for sure, some Python internals that can explain all of the above in a more exhaustive way, but that is for some deeper Python developers to write. We ordinary ones will get by just fine with the above.