How to Set up a Perfect Python Project

André Felipe Dias

2022-01-05

Also available in:

Starting a new Python project doesn't have to be a challenge because the basic needs are always the same even for different types of projects. This article presents how to create a perfect initial base that can be used for any Python project.

If you are in a hurry and don't want to read the article to the end, you can go straight to the perfect Python project template on Github. Instructions for use are in README.rst.

Definition of the Perfect Initial Base

To ensure a perfect Python project, it's important to consider the following key features:

Proper file and directory structure to organize the project, separating code from application, tests, documentation, and project configuration. This helps keep the project organized and makes it easier to navigate and maintain in the long run.
Virtual environments provide a sandboxed development environment where the project can be developed in isolation from external dependencies. This helps avoid conflicts between different Python packages and makes it easier to manage dependencies.
Linter tools provide static analysis of the code to identify defects, formatting problems, optimization, security, and other issues early in the development stage. This helps maintain code quality and consistency throughout the project.
Automated testing is crucial for ensuring code quality and catching bugs early. It's important to have a suite of tests that cover all aspects of the application, along with reports that indicate the percentage of code covered by tests.
Continuous integration (CI) helps ensure code quality by running automated tests and performing other checks on the code every time it's pushed to the server. This helps catch issues early and makes it easier to identify and fix problems before they become larger issues.
Version control is essential for managing changes to the codebase and collaborating with other developers. It's important to use a properly configured version control system that can ignore files that should not be versioned, such as temporary files or compiled code.

By following these key features, you can create a solid foundation for any Python project, whether it's a small script or a large web application.

Except for the file and directory structure which is unique, all other items depend on choices. And there are many options. Virtual environment management, for example, can be done with venv, pipenv, poetry or conda. There are dozens of linting tools such as ruff, flake8, pylint, mypy, etc., that are equivalent or complementary. In the end, the choices that form one or another combination depend on technical and personal decisions.

Virtual Environment Management

The management of Python versions, virtual environments and dependencies will be done through the combination of pyenv + poetry (read the previous article).

Initial Directory Structure

To create the initial structure of your project, use poetry new <project_name>:

$ poetry new project_x

The previous command creates the following directory structure:

project_x
├── project_x
│   └── __init__.py
├── pyproject.toml
├── README.rst
└── tests
    ├── __init__.py
    └── test_project_x.py

This is an excellent minimum file and directory structure. It separates the project-specific code in the project_x subdirectory of the code only related to tests in the tests directory, and the project's configuration and documentation files (pyproject.toml and README.rst). However, some adjustments are needed:

README.rst comes empty, and you need to complete it. Creating this type of file is beyond the scope of this article, but you can find good tips and more information in 1 and 2.
Edit pyproject.toml and change the settings created automatically for name, version, description, and authors.
Check the Python version specified in section [tool.poetry.dependencies] in pyproject.toml. poetry new uses the environment version, but you can install and specify other Python versions via pyenv.

The .rst extension refers to the restructuredText format, which is the official markup language for Python project documentation. Particularly I think it's the easiest to read and write but you can use whatever format you like, like markdown (.md) for example.

Linting and Testing Tools

The recommended minimum set of testing tools is:

pytest: testing tool for Python
pytest-cov: pytest plugin to measure code coverage

For linting, I recommend using:

mypy: static type analysis tool.
pip-audit: tool for scanning Python environments for packages with known vulnerabilities.
ruff: An extremely fast Python linter, written in Rust. It replaces other tools such as blue, black, flake8, isort, pep-naming, pyupgrade and bandit.

Installation and Configuration

All libraries and tools related to testing and linting are necessary for the development of the project, but not for its operation in production. They should be installed in a separate section in pyproject.toml to not get mixed up with the essential dependencies. To install them, use poetry add --dev:

$ poetry add --dev pytest=="*" pytest-cov=="*" \
                mypy=="*" pip-audit=="*" ruff=="*"

Using "*" in specifying dependency versions lets you use any version available now and in future updates. See more version specification options in the poetry documentation.

Settings

We can keep most dependencies configurations in pyproject.toml, in sections named following the pattern [tool.<tool-name>]:

	`[tool.pytest.ini_options]`
	`filterwarnings = ["ignore::DeprecationWarning"]`

	`[tool.mypy]`
	`ignore_missing_imports = true`
	`disallow_untyped_defs = true`

	`[tool.ruff]`
	`line-length = 100`
	`select = [`
	`"A",`
	`"ARG",`
	`"B",`
	`"C",`
	`"C4",`
	`"E",`
	`"ERA",`
	`"F",`
	`"I",`
	`"N",`
	`"PLC",`
	`"PLE",`
	`"PLR",`
	`"PLW",`
	`"RET",`
	`"S",`
	`"T10",`
	`"UP",`
	`"W",`
	`]`
	`ignore = ["A003"]`
	`target-version = "py310"`

	`[tool.ruff.format]`
	`quote-style = "single"`

	`[tool.ruff.per-file-ignores]`
	`"__init__.py" = ["F401"]`
	`"tests/**" = ["ARG", "S"]`

Some observations:

Lines 2 and 12 change the default line length from 79 to 100
mypy has several config options. ignore_missing_imports suppresses error messages about imports that cannot be resolved (line 12). disallow_untyped_defs disallows defining functions without type annotations or with incomplete type annotations (line 13).
ruff is capable of checking the rules used by several other linting tools.

Automation

Testing and linting should be easy to run without remembering each command and its arguments. For this, I recommend using a Makefile with the necessary tasks:

test:
    pytest --cov-report term-missing --cov-report html --cov-branch \
           --cov project_x/

lint:
    ruff check --diff .
    @echo
    ruff format --diff .
    @echo
    mypy .


format:
    ruff check --silent --exit-zero --fix .
    @echo
    ruff format .


audit:
    pip-audit

And then, just use make <task> :

make test runs the tests and generates test coverage reports.
make lint runs several linting tools in sequence.
make format formats Python code according to the patterns used by ruff and blue.
make audit checks for known vulnerabilities in the project's dependencies.

We can use these same commands in version control hooks and in the continuous integration configuration.

Continuous Integration System Configuration

Most modern continuous integration systems keep their configuration in the source code. GitHub Actions, for example, keep your configuration in yaml files, inside the .github/workflows directory. For our project, we are going to use .github/workflows/continuous_integration.yml:

	`name: Continuous Integration`
	`on: [push]`
	`jobs:`
	`lint_and_test:`
	`runs-on: ubuntu-latest`
	`steps:`

	`- name: Set up python`
	`uses: actions/setup-python@v3`
	`with:`
	`python-version: '3.10'`

	`- name: Check out repository`
	`uses: actions/checkout@v2`

	`- name: Install Poetry`
	`uses: snok/install-poetry@v1`
	`with:`
	`virtualenvs-in-project: true`

	`- name: Load cached venv`
	`id: cached-poetry-dependencies`
	`uses: actions/cache@v2`
	`with:`
	`path: .venv`
	`key: venv-${{ hashFiles('**/poetry.lock') }}`

	`- name: Install dependencies`
	`if: steps.cached-poetry-dependencies.outputs.cache-hit != 'true'`
	`run: poetry install --no-interaction`

	`- name: Lint`
	`run: poetry run make lint`

	`- name: Auditing`
	`run: poetry run make audit`

	`- name: Run tests`
	`run: poetry run make test`

This configuration works as follows:

This workflow will be executed every time the repository receives a push (line 2).
The stream will run on an Ubuntu operating system in the latest available version (line 5).
Use Python version 3.10 (line 11).
Next, install poetry (line 16) and configure it to use virtual environments in .venv directories (line 19).
Create a cache policy for the .venv directory (line 25). The key that identifies the cache is formed by the concatenation of the word venv and poetry.lock hash (line 26).
Dependencies are installed only if the cache is not found (lines 28 to 30).
Run the lint, audit and test tasks (lines 32 to 39)

`pre-commit` and `pre-push` Events

It is good practice to do the code quality check locally, even if continuous integration does the same process on the server again. It saves time because the result is immediate, and corrections can be made outside a continuous integration cycle.

Local check should happen before sharing changes with other developers or the official project repository. We can automate this process via version control hooks. The most suitable are pre-commit and pre-push.

Configuration	`pre-commit`	`pre-push`
1	`make lint`	`make test && make audit`
2		`make lint && make test && make audit`

In the configuration 1, static analysis is done before each commit. Tests and auditing are only done before push because they take longer. The advantage of this distribution is that every local revision is checked and formatted. On the other hand, running make lint before each commit can be a little annoying depending on your workflow.

In the configuration 2, linting, tests and auditing are done just before push. This workflow is more fluid but local reviews can be inconsistent. It is necessary to create an additional revision with the necessary adjustments in case of pre-push failure.

To make the developer's life easier, let's add a install_hooks task to the Makefile, which calls the scripts/install_hooks.sh to create the hooks:

install_hooks:
    scripts/install_hooks.sh

Makefile has certain syntax limitations that make it difficult to write more complex tasks. For those cases, the best is to call a separate script, which can be written in a more appropriate language.

And this is scripts/install_hooks.sh:

	`#!/usr/bin/env bash`

	`GIT_PRE_PUSH='#!/bin/bash`
	`cd $(git rev-parse --show-toplevel)`
	`poetry run make lint && poetry run make test && poetry run make audit`
	`'`

	`HG_HOOKS='[hooks]`
	pre-push.lint_test = (cd `hg root`; poetry run make lint && poetry run make test && poetry run make audit)
	`'`

	`if [ -d '.git' ]; then`
	`echo "$GIT_PRE_PUSH" > .git/hooks/pre-push`
	`chmod +x .git/hooks/pre-*`
	`elif ! grep -s -q 'pre-push.lint_test' '.hg/hgrc'; then`
	`echo "$HG_HOOKS" >> .hg/hgrc`
	`fi`

Some explanations:

On Git, hooks are executable files named according to the desired event, located in .git/hooks.
On Mercurial, hooks are defined in the [hooks] section of the .hg/hgrc configuration file, where each hook can be Python commands or functions.
Both scripts in bash (lines 3-6) as the commands used for Mercurial (lines 8-10) do the same thing: change the current directory to the root of the project, where Makefile is located, and run the command poetry run make <task>. Remember that poetry run <command> runs the command within the context of the project's virtual environment.
If a .git directory exists, so the Git hooks are created in the .git/hooks directory (lines 12-14). Otherwise, Mercurial hooks are created in the .hg/hgrc directory (lines 15-16).
The code snippet presented serves those who use Mercurial (my case) and those who use Git. You can remove some parts if only need one or the other.

Preparing Version Control

To prevent unwanted files from being mistakenly added to version control, you need to create a filter list in a particular file located at the root of the project, at the same level as the .hg or .git directory depending on which tool you use. If you use Mercurial, this file should be called .hgignore and should contain:

syntax: glob

.venv
.env
*~
*.py[cod]
*.orig

# Unit test / coverage reports
.coverage
htmlcov/

# cache
__pycache__
.mypy_cache
.pytest_cache

If you use Git, the filename must be called .gitignore and contain the same lines as above minus the first line (syntax: glob), which should be removed.

With the filters defined, we can start version control. For Mercurial, the commands are:

$ hg init .
$ poetry run make install_hooks
$ hg commit -Am 'Initial project structure'

If you use Git, execute:

$ git init .
$ poetry run make install_hooks
$ git add -A .
$ git commit -m 'Initial project structure'

And everything is ready to upload to the official project repository on GitHub.

Ready-to-Use Template on GitHub

It is important to know all the steps to create the perfect Python project. But instead of executing these same steps at each new project, you can just use the template that I made available on Github. Instructions for use are in README.rst.

Final Considerations

The design basis presented in this article works very well and can be easily adapted to other tools, if you want to try a different combination. The most important thing is to maintain the project structure and linting and testing activities automated.

References

1	Make a README

2	READMEs on READMEs (and other README-related resources)

3	How to set up a perfect Python project

Next article: Minimal FastAPI Project

Previous article: Managing Version, Virtual Environments and Dependencies with Pyenv and Poetry

Blog

Definition of the Perfect Initial Base

Virtual Environment Management

Initial Directory Structure

Linting and Testing Tools

Installation and Configuration

Settings

Automation

Continuous Integration System Configuration

pre-commit and pre-push Events

Preparing Version Control

Ready-to-Use Template on GitHub

Final Considerations

References

Comments

`pre-commit` and `pre-push` Events