Python project structure
-
I want to start my adventure with Python (for real; not like the previous seven tries). I want to make an "executable" project - not necessarily compiled to actual binary, but it will behave like a complete application, not a library.
So, I have several very basic questions about what files I need to make and where.
- If my project is named Foo, what would be the path to the file containing the `main()` function, relative to the repo's root directory? Let's say that the repo is also named Foo.
- If I want to add a module named `bar` which is all in a single file, where do I put it?
- If I want to add a module named `qux` with a submodule `quz`, what files do I have to create and where?
- When and where should I make `__init__.py` files? Are there other special files I have to remember?
- What do I have to do to make the project installable (and uninstallable) via pip? Or is it a bad idea?
I tried to look it all up, but the articles I've found are all very old, very confusing and/or contradict each other. So, please help me here
fucking markdown... fucking double underscores...
-
There's a real language that this has to be asked about?
-
No, there isn't. But scripting languages are a different beast.
-
I don't know anything about python so I can't help, but I'm curious; what will the program do?
-
A little GUI tool to make my life easier at work that no one except me and possibly my teammates will ever use. But I want to make it as "professional" as possible because why not? It's a great learning opportunity!
-
Doesn't the documentation answer most of these questions?
-
No, it answers only about 0.8 of 5 questions - specifically, how to name single-file-module files, where to put child modules and how to make scripts executable. Not where to put single-file-module files in the project tree, not where to put the code of a module that has both its own items and submodules (directories cannot have text data in them after all), and not how to make an executable project.
-
Basically you need to structure it like you would structure a well-behaved Python library package, and use setuptools entry points so that `python setup.py install` (and by extension, `pip install`) does the right thing and creates executables for you. Also, it will do the right thing on Windows.
For gorier details, you might want to read the Setuptools documentation.
-
Basically you need to structure it like you would structure a well-behaved Python library package
That raises another five questions...
-
Is a plain ol' HTML file with a bunch of Javascript out of the question here?
-
I'm not exactly a Python guru (mostly use it for small scripts and have some experience with Flask), but here's what I know:
If my project is named Foo, what would be the path to file containing main() function, relative to repo's root directory? Let's say that repo is also named Foo.
I don't know of any convention, and I don't think it matters. In some python file, write:
    if __name__ == '__main__':
        main_function()
This is the script you use to start the application now.
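A minimal sketch of that guard in a complete file. The file name `app.py`, the function name `main_function` and the `argv` parameter are only assumptions for illustration; Python does not prescribe any of them.

```python
# app.py (hypothetical file name)
import sys


def main_function(argv=None):
    """Entry point of the application; returns an exit code."""
    # Fall back to the real command line when no arguments are passed in.
    args = sys.argv[1:] if argv is None else argv
    print("Foo started with arguments:", args)
    return 0


# True only when the file is executed directly (python app.py),
# not when it is imported from another module.
if __name__ == '__main__':
    main_function()
```

Running `python app.py` calls `main_function()`, while `import app` from another file only defines it.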
If I want to add module named bar which is all in a single file, where do I put it? If I want to add module named qux with a submodule quz, what files do I have to create and where? When and where should I make `__init__.py` files? Are there other special files I have to remember?
See http://programmers.stackexchange.com/questions/111871/module-vs-package and https://docs.python.org/2/tutorial/modules.html#packages:
- A package is a directory full of Python modules with some optional initialization code (which you put in the `__init__.py`).
- A module is a Python file containing any kind of definitions (functions, classes, ...) and/or code.
- It doesn't really matter what you put where, just structure your code into modules and packages however you want.
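To make the module/package distinction above concrete, here is a small sketch that builds the layout from the question (`bar.py`, `qux/__init__.py`, `qux/quz.py`) in a temporary directory and imports from it. The function `hello` and the constant `VALUE` are made up for the example, and adding the directory to `sys.path` by hand only stands in for what a real installation would do.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway project tree:
#   bar.py            -> module "bar"
#   qux/__init__.py   -> package "qux"
#   qux/quz.py        -> submodule "qux.quz"
root = Path(tempfile.mkdtemp())
(root / "bar.py").write_text("def hello():\n    return 'hello from bar'\n")
(root / "qux").mkdir()
(root / "qux" / "__init__.py").write_text("")
(root / "qux" / "quz.py").write_text("VALUE = 42\n")

# Make the tree importable (assumption: done by hand here for the demo).
sys.path.insert(0, str(root))
importlib.invalidate_caches()

import bar
from qux import quz

print(bar.hello())   # hello from bar
print(quz.VALUE)     # 42
print(quz.__name__)  # qux.quz
```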
What do I have to do to make the project installable (and uninstallable) via pip? Or is it a bad idea?
-
I would presume/suggest that you would typically want to make a module correspond to a class (or a group of tightly related classes, if they're related enough), and then go nuts with the object oriented patterns to organize code.
-
Is a plain ol' HTML file with a bunch of Javascript out of the question here?
Javascript doesn't integrate well with shell commands.
That link. Have you read it?
No, in fact, I didn't until now. And after I did read it, I'm none the wiser - one example there is for a single file, and another is for a single module - both of these are too small to be useful for me.
- A package is a directory full of Python modules with some optional initialization code (which you put in the `__init__.py`).
- A module is a Python file containing any kind of definitions (functions, classes, ...) and/or code
- It doesn't really matter what you put where, just structure your code into modules and packages however you want.
The natural follow-up question is, if a package is a directory, and a module is a file, and (if I understood correctly) modules can't have child modules because they're files, not directories, then how do I make some functions available directly in the package's namespace?
Also, if I'm going to use pip, do I still need the `__main__` trick?
-
how do I make some functions available directly in package's namespace?
You don't necessarily need to. Let's say you have a package `foo` containing a module `baz` defining a class `bar`. You can just write:
    from foo.baz import bar
If you really want to make a symbol from a module available in the containing package's namespace, just import it in `__init__.py`.
Also, if I'm going to use pip, do I still need the `__main__` trick?
You don't need that trick at all, you can just use "normal" scripts as entry points for your application all the time. But that trick can be quite handy: It ensures that you can import the class/function definitions from the "main" script in another module without executing the actual script.
-
You don't need that trick at all, you can just write a "normal" script, but that trick can be handy.
Let me ask again. The `__main__` trick is for differentiating between being run as an import vs. being run as an executable script. It works by having some code at top level (outside of any function) in an if block that runs when the script is executed but not when it's imported. My question is, if I'm going to use pip, and pip uses setup.py and the list of entry points defined there instead of just running the script's top-level code, and I'm not caring about non-pip users, do I get any benefit from this `__main__` trick?
-
If you don't write scripts at all, but declare a function in a certain module as a console_scripts entry point instead (in setup.py), then using that trick doesn't make any sense at all, of course, since setuptools will automatically create a wrapper script for you and you'll never need to execute the module_xy.py directly.
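Roughly speaking, the generated wrapper imports the module named before the colon and calls the function named after it. The sketch below mimics that lookup with a hypothetical helper, `resolve_entry_point`, which is not part of setuptools and ignores details the real implementation handles (extras, dotted attribute paths). The entry point string uses the standard library's `json.dumps` purely as a stand-in target.

```python
import importlib


def resolve_entry_point(spec):
    """Resolve '<name> = <module>:<function>' to (name, callable).

    Hypothetical helper for illustration; not part of setuptools.
    """
    name, target = (part.strip() for part in spec.split("=", 1))
    module_name, func_name = target.split(":", 1)
    module = importlib.import_module(module_name)
    return name, getattr(module, func_name)


# Stand-in target: the standard library function json.dumps.
name, func = resolve_entry_point("to-json = json:dumps")
print(name, func([1, 2]))  # to-json [1, 2]
```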
-
I see. Thanks!
So, to summarize:
- the Python Way of organizing project is to have a directory in your repo named exactly like the repo itself (assuming the repo is named exactly like the project, which it should), in which all (or most) implementation files go
- the Python Way doesn't say anything at all about the insides of this directory - it just has to be a valid package (ie. have an `__init__.py` file)
- I can do what the fuck I want and I won't be breaking any conventions because there aren't any?
Am I mostly right here?
-
AFAIK, yes.
BTW:
Are there other special files I have to remember?
There is one other special file I can think of: `__main__.py`
https://docs.python.org/3/library/main.html#module-main
It's basically the `__name__ == '__main__'` equivalent for packages (instead of modules). I've never needed it, just thought I'd mention it since you asked.
-
Basically you need to structure it like you would structure a well-behaved Python library package,
Saying "you do it how you do it" is not very helpful.
http://click.pocoo.org/5/setuptools/#setuptools-integration
That seems to be more useful.
-
how do I make some functions available directly in package's namespace?
You import them in your package's `__init__.py`.
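A small sketch of that re-export, reusing the `foo` / `baz` / `bar` names from earlier in the thread. The package is written to a temporary directory and put on `sys.path` by hand only so the example runs on its own; in a real project the one-line import would simply live in `foo/__init__.py`.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Throwaway package "foo" with a module "baz" that defines class "bar".
root = Path(tempfile.mkdtemp())
pkg = root / "foo"
pkg.mkdir()
(pkg / "baz.py").write_text("class bar:\n    pass\n")
# The re-export itself: a relative import inside foo/__init__.py.
(pkg / "__init__.py").write_text("from .baz import bar\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

import foo
from foo.baz import bar

# Both spellings refer to the same class object.
print(foo.bar is bar)  # True
```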
-
Saying "you do it how you do it" is not very helpful.
If I read the OP correctly, it was implied he knows a bit how to structure a library project that other code uses and the unclear bit was how it's different when one needs to ship actual executables.
-
- I can do what the fuck I want and I won't be breaking any conventions because there aren't any?
Well, you're gonna make people upset if you mess with, say, the `__builtins__` namespace in the `__init__.py`. "Why did my code go haywire after I imported your shit?!"
-
If I read the OP correctly, it was implied he knows a bit how to structure a library project that other code uses
Well, you read it wrong. Sorry if I made it not very clear that I have virtually zero Python experience.
-
Well, you're gonna make people upset if you mess with, say, the `__builtins__` namespace in the `__init__.py`. "Why did my code go haywire after I imported your shit?!"
a.k.a. "don't use ugly hacks unless you understand the consequences"
a.k.a. common sense
-
I do a lot of Python and this post is probably going to be long.
To start with, use Python 3.5. That is the newest official implementation from python.org (also known as CPython). Don't bother with alternative implementations or Python 2. Read the official tutorial, too, it's fairly good.
If my project is named Foo, what would be the path to file containing main() function, relative to repo's root directory?
I'd recommend going with this kind of basic structure:
    repo/
        <main package name>/
            __init__.py
            __main__.py
            <other modules and packages>
        setup.py
Names of packages follow the rules of Python identifiers: must start with underscore or letter, might contain only letters, numbers and underscores, can't be a keyword (see [url=https://docs.python.org/3.5/reference/lexical_analysis.html#identifiers]here[/url]). While it's possible to not have a single root package for everything, it eliminates the problem of name collisions, so it's worth doing (that's pretty universal though).
A Python package is a directory that contains an `__init__.py` file. This is a module that corresponds to the package name, so e.g.
    # package/__init__.py
    foo = 'bar'

    # somewhere else
    import package
    print(package.foo)
Otherwise a module is any `.py` file. You import nested things by using dot notation, so `package/package2/module.py` is `package.package2.module`.
If my project is named Foo, what would be the path to file containing main() function, relative to repo's root directory?
That's the `__main__.py` in the main package. It should look like this:
    def main():
        pass

    if __name__ == '__main__':
        main()
Others explained what the `if` means, but you want the `__main__` module and a function for two things:
- It makes the package directly executable (via `python -m package` and if you ZIP it, via `python package.zip`)
- It makes it easier to create an entry script for your project through setup.py
Plus it's a well-known location so it's easier for people to find your entry points.
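To see the first point in action, the sketch below writes a tiny package with a `__main__.py` into a temporary directory and runs it the same way `python -m <package>` would from the repo directory. The package name `demo_app` and the printed message are made up for the example.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
pkg = root / "demo_app"  # hypothetical package name
pkg.mkdir()
(pkg / "__init__.py").write_text("")
(pkg / "__main__.py").write_text(
    "def main():\n"
    "    print('demo_app is running')\n"
    "\n"
    "if __name__ == '__main__':\n"
    "    main()\n"
)

# Equivalent to typing `python -m demo_app` inside the repo directory.
result = subprocess.run(
    [sys.executable, "-m", "demo_app"],
    cwd=root, capture_output=True, text=True, check=True,
)
print(result.stdout.strip())  # demo_app is running
```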
- If I want to add module named bar which is all in a single file, where do I put it?
- If I want to add module named qux with a submodule quz, what files do I have to create and where?
As examples, `bar.py` and `qux/__init__.py`, `qux/quz.py`.
The search path for modules is in the `sys.path` variable. This is determined by a few things (you can change it from code, it's an ordinary list, or from the environment by setting `PYTHONPATH`), but you don't have to worry about it if you structure the project like this and write a proper `setup.py`.
- What do I have to do to make the project installable (and uninstallable) via pip? Or is it a bad idea?
You need a `setup.py` file that calls `setuptools.setup()`. The most basic form looks like this:
    from setuptools import setup, find_packages

    package_name = '<main package name>'
    packages = find_packages(include=[package_name, package_name + '.*'])

    setup(
        name='<project name>',
        version='0.1.0',
        packages=packages,
    )
Then you can install the project in editable mode (that is, without copying the files, only making it a part of `sys.path` without manually adjusting `PYTHONPATH`, so you only have to do it once) via `pip install -e <repo>`.
You can use the `entry_points` argument to make use of that `__main__` module like this:
    setup(
        ...,
        entry_points={
            'console_scripts': [
                '<executable name> = <main project package>.__main__:main',
            ],
        },
    )
Then `pip install` will generate a wrapper executable that calls the given function. This is not standalone though, it will still require a full Python installation.
You can declare dependencies using the `install_requires` argument to `setup()`, it takes a list. Dependencies are installed from [url=https://pypi.python.org/pypi]PyPI[/url], so search on there. You can also install directly from Git/Hg/SVN repos (you'll need the `dependency_links` argument).
From other stuff I recommend using virtualenvs from the start: in Python 3.5 this is built-in, just do `python -m venv <directory>` and it will create an isolated environment there. Then in shell use `. <directory>/bin/activate` (*nixes), `. <directory>/Scripts/Activate.ps1` (Windows, PowerShell) or `<directory>/scripts/activate` (Windows, cmd). After that `pip` will know what to do.
There's some more story to creating distributions and uploading them to PyPI, not sure if you need that at the moment.
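If you are ever unsure whether the virtualenv is actually active, one quick check (a sketch, assuming environments created with the built-in `venv` module) is to compare `sys.prefix` with `sys.base_prefix` from inside Python. The helper name `in_virtualenv` is made up for the example.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("virtualenv active:", in_virtualenv())
print("interpreter prefix:", sys.prefix)
```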
You import them in your package's init.py
Note here: careful with this. Python executes scripts as it goes, and that's also true for imports. And modules are created before the code of the module starts executing (because the names have to go somewhere). Which has a nasty consequence of circular imports having rather unintuitive behaviour:
    # a.py
    import b
    x = 42

    # b.py
    import a
    print(a.x)  # AttributeError, because import above returns an already-created but yet-unfilled module
Circular imports are otherwise fine, as long as the module code has a chance to execute to the end before it's imported (`import` statements don't have to be at the top-level). But I'd avoid them if possible.
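The sketch below reproduces both cases in a temporary directory: a pair of modules that fails on a circular import, and a pair where the import is deferred into a function so the first module can finish executing. The module names (`circ_a`, `circ_b`, `ok_a`, `ok_b`) and the function `show` are made up for the example.

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
sys.path.insert(0, str(root))

# Broken pair: circ_b reads circ_a.x while circ_a is still half-initialised.
(root / "circ_a.py").write_text("import circ_b\nx = 42\n")
(root / "circ_b.py").write_text("import circ_a\nvalue = circ_a.x\n")

# Working pair: the import of ok_a is deferred until show() is called.
(root / "ok_a.py").write_text("import ok_b\nx = 42\n")
(root / "ok_b.py").write_text(
    "def show():\n"
    "    import ok_a\n"
    "    return ok_a.x\n"
)
importlib.invalidate_caches()

try:
    import circ_a
    failed = False
except AttributeError as exc:
    failed = True
    print("circular import failed:", type(exc).__name__)

import ok_a
import ok_b

print(ok_b.show())  # 42
```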
-
From other stuff I recommend using virtualenvs from the start: in Python 3.5 this is built-in, just do python -m venv <directory> and it will create an isolated environment there. Then in shell use . <directory>/bin/activate (*nixes), . <directory>/Scripts/Activate.ps1 (Windows, PowerShell) or <directory>/scripts/activate (Windows, cmd). After that pip will know what to do.
Uuuh, nice.
I didn't know they added that.
-
That was a great post - both in size and in quality. Thank you very much!
Just one more question: from what I understood,
__init__.py
is just a regular Python module that assumes the name of the directory it's in, and otherwise there's nothing special about it? I can treat it just like any other file and put any code I want there? Or would it be against some convention?
-
Just one more question: from what I understood, `__init__.py` is just a regular Python module that assumes the name of the directory it's in, and otherwise there's nothing special about it? I can treat it just like any other file and put any code I want there?
Yes to all of those. Sometimes projects put metadata in their root package (see for example [url=https://github.com/kennethreitz/requests/blob/master/requests/__init__.py]requests[/url]) but that's it as far as conventions go. Might be worth looking at some other [url=http://pypi-ranking.info/alltime]popular projects[/url], too.