Python Engineering (2): Project Structure — From Script to Package

Learn how to organize Python code into proper packages with imports, entry points, and CLI tools. Build a real command-line application from scratch.

Every project starts as a single file. You write main.py, it works, you add features, and one day you realize you have 1,500 lines in one file with functions that call other functions that depend on globals defined 800 lines above. The code works, but nobody (including future you) can understand it.

The jump from script to package is the first real engineering decision in a Python project. Get it right early, and testing, packaging, and deployment become easier. Get it wrong, and you’ll spend weeks untangling circular imports.


When a Single File Is Not Enough

A single-file script is fine when:

  • The code is under 300 lines
  • There is one clear flow from top to bottom
  • You are the only person who will ever read it
  • It is a throwaway script, not a maintained tool

You need a package when:

  • Multiple people work on the code
  • You want to test individual components
  • You need to reuse functions across scripts
  • The code has distinct logical sections (config, data, logic, CLI)
  • You plan to distribute it (pip install)

Flat Layout vs src Layout

There are two dominant project structures in the Python ecosystem.

Flat Layout

my_tool/
  my_tool/
    __init__.py
    core.py
    cli.py
    utils.py
  tests/
    test_core.py
    test_cli.py
  pyproject.toml
  README.md

The package directory sits at the project root. This is simpler and is used by many projects, including FastAPI and Django.

src Layout

my_tool/
  src/
    my_tool/
      __init__.py
      core.py
      cli.py
      utils.py
  tests/
    test_core.py
    test_cli.py
  pyproject.toml
  README.md

The package directory is inside src/. This layout is recommended by the Python Packaging Authority (PyPA) and has one critical advantage: it forces you to install your package before testing it. This catches packaging errors (missing files, broken imports) before you ship.

With the flat layout, import my_tool resolves to the local directory even if the package is not properly installable. With src layout, Python cannot find my_tool unless you run pip install -e . first. This is a feature, not a bug.

Which to Choose

| Criterion | Flat Layout | src Layout |
|---|---|---|
| Simplicity | Simpler | Slightly more nesting |
| Testing accuracy | May hide packaging bugs | Catches them early |
| Popular examples | FastAPI, Django | pytest, pip, Flask, Requests |
| PyPA recommendation | Acceptable | Recommended |
| Import safety | Accidental imports possible | Must install first |

Use src layout for libraries you plan to publish. Use flat layout for applications where you control the deployment environment. When in doubt, use src layout.

__init__.py: The Package Marker

A directory becomes a Python package when it contains __init__.py. This file can be empty or contain initialization code.

# src/my_tool/__init__.py

"""My Tool — a file downloader CLI."""

__version__ = "0.1.0"

What __init__.py Does

  1. Marks a directory as a package so Python can import from it
  2. Runs on import — code in __init__.py executes when someone does import my_tool
  3. Controls the public API via __all__

# src/my_tool/__init__.py

from my_tool.core import download_file, validate_url
from my_tool.utils import format_size

__all__ = ["download_file", "validate_url", "format_size"]

Now users can write from my_tool import download_file instead of from my_tool.core import download_file.
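The behaviors described in this section can be observed with a throwaway package. This sketch writes a hypothetical package named demo_init_pkg into a temporary directory, imports it, and then performs a star-import to show what __all__ lets through. All names in it are made up for the demonstration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

INIT_SOURCE = '''
print("demo_init_pkg/__init__.py is running")
from demo_init_pkg.core import download_file

__all__ = ["download_file"]
'''

CORE_SOURCE = '''
def download_file(url):
    return f"would download {url}"

def _internal_helper():
    return "not part of the public API"
'''

with tempfile.TemporaryDirectory() as tmp:
    pkg_dir = Path(tmp) / "demo_init_pkg"
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_text(INIT_SOURCE)
    (pkg_dir / "core.py").write_text(CORE_SOURCE)

    sys.path.insert(0, tmp)
    try:
        # Importing the package executes __init__.py, so the message prints once.
        pkg = importlib.import_module("demo_init_pkg")
        # A star-import only copies the names listed in __all__.
        star_namespace: dict = {}
        exec("from demo_init_pkg import *", star_namespace)
    finally:
        sys.path.remove(tmp)

exported = sorted(name for name in star_namespace if not name.startswith("__"))
print(exported)
print(pkg.download_file("https://example.com/data.csv"))
```

Note that __all__ only governs star-imports; it does not stop anyone from importing _internal_helper explicitly from the submodule.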

When __init__.py Should Be Empty

Keep it empty when:

  • The package has submodules with distinct purposes
  • You want users to import from specific submodules
  • There are circular dependency risks between submodules

Example: numpy has a large __init__.py that wires everything together, so np.array is available right after import numpy. SQLAlchemy, by contrast, keeps its ORM API in a subpackage and expects from sqlalchemy.orm import Session.

Namespace Packages (No __init__.py)

Since Python 3.3, directories without __init__.py are namespace packages. These allow a package to span multiple directories on disk. Unless you are building a plugin system, always include __init__.py.

Relative vs Absolute Imports

# Absolute import — always works, always clear
from my_tool.core import download_file
from my_tool.utils import format_size

# Relative import — works inside the package
from .core import download_file
from .utils import format_size
from ..other_module import something  # parent package

Rules of Thumb

| Situation | Use |
|---|---|
| Importing from within the same package | Relative (.module) |
| Importing stdlib or third-party | Absolute (import os, import requests) |
| In __init__.py | Either, but be consistent |
| In scripts run directly (python script.py) | Absolute only |
| In tests | Absolute |

Relative imports fail when you run a module directly as a script (python src/my_tool/core.py) because Python does not know the package context. Use python -m my_tool.core instead.
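The difference between the two invocations can be reproduced in a few lines. This sketch builds a hypothetical package demo_rel_pkg in a temporary directory and runs its core.py both ways in a subprocess. The package and file contents are invented for the demonstration.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg_dir = Path(tmp) / "demo_rel_pkg"
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_text("")
    (pkg_dir / "utils.py").write_text("GREETING = 'hello from utils'\n")
    (pkg_dir / "core.py").write_text("from .utils import GREETING\nprint(GREETING)\n")

    # Run the file directly: core.py becomes a top-level script with no parent package.
    as_script = subprocess.run(
        [sys.executable, str(pkg_dir / "core.py")],
        capture_output=True, text=True, cwd=tmp,
    )
    # Run it as a module: Python imports demo_rel_pkg first, so ".utils" can be resolved.
    as_module = subprocess.run(
        [sys.executable, "-m", "demo_rel_pkg.core"],
        capture_output=True, text=True, cwd=tmp,
    )

print(as_script.returncode, as_script.stderr.strip().splitlines()[-1])
print(as_module.returncode, as_module.stdout.strip())
```

The first run exits with a non-zero status and the "attempted relative import with no known parent package" error. The second exits with 0 and prints the greeting.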

Circular Imports

Circular imports happen when module A imports from module B, and module B imports from module A.

# core.py
from my_tool.utils import format_size  # utils imports from core!

# utils.py
from my_tool.core import DEFAULT_TIMEOUT  # core imports from utils!

Solutions:

  1. Move shared constants to a separate module (constants.py or config.py)
  2. Import inside functions instead of at module level (delays the import)
  3. Restructure — if two modules are tightly coupled, maybe they should be one module
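The first solution can be sketched end to end. The snippet below writes a broken pair of modules and a fixed trio into a temporary directory, then imports both. The module names (cyc_core, cyc_utils, ok_config, ok_utils, ok_core) are hypothetical stand-ins for core.py, utils.py, and config.py.

```python
import importlib
import sys
import tempfile
from pathlib import Path

MODULES = {
    # Broken pair: each module needs a name from the other at import time.
    "cyc_core.py": "from cyc_utils import format_size\nDEFAULT_TIMEOUT = 30\n",
    "cyc_utils.py": (
        "from cyc_core import DEFAULT_TIMEOUT\n"
        "def format_size(n):\n    return f'{n} B'\n"
    ),
    # Fixed trio: the shared constant lives in its own module that imports nothing.
    "ok_config.py": "DEFAULT_TIMEOUT = 30\n",
    "ok_utils.py": (
        "from ok_config import DEFAULT_TIMEOUT\n"
        "def format_size(n):\n    return f'{n} B'\n"
    ),
    "ok_core.py": "from ok_config import DEFAULT_TIMEOUT\nfrom ok_utils import format_size\n",
}

with tempfile.TemporaryDirectory() as tmp:
    for filename, source in MODULES.items():
        (Path(tmp) / filename).write_text(source)
    sys.path.insert(0, tmp)
    try:
        try:
            importlib.import_module("cyc_core")
            circular_error = None
        except ImportError as exc:
            # cyc_utils asked for DEFAULT_TIMEOUT before cyc_core had defined it.
            circular_error = str(exc)
        fixed = importlib.import_module("ok_core")
    finally:
        sys.path.remove(tmp)

print(circular_error)
print(fixed.format_size(fixed.DEFAULT_TIMEOUT))
```

The dependency graph of the fixed version is a tree (core and utils both depend on config), which is why the import order no longer matters.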

pyproject.toml for Package Metadata

The full project metadata in pyproject.toml:

[build-system]
requires = ["setuptools>=68.0", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "my-tool"
version = "0.1.0"
description = "A CLI file downloader"
readme = "README.md"
requires-python = ">=3.10"
license = {text = "MIT"}
authors = [
    {name = "Your Name", email = "you@example.com"},
]
keywords = ["download", "cli", "tool"]
classifiers = [
    "Development Status :: 3 - Alpha",
    "Programming Language :: Python :: 3",
    "Programming Language :: Python :: 3.10",
    "Programming Language :: Python :: 3.11",
    "Programming Language :: Python :: 3.12",
]
dependencies = [
    "requests>=2.28",
    "click>=8.0",
    "rich>=13.0",
]

[project.optional-dependencies]
dev = [
    "pytest>=7.0",
    "pytest-cov",
    "ruff",
]

[project.scripts]
my-tool = "my_tool.cli:main"

[project.urls]
Homepage = "https://github.com/you/my-tool"
Repository = "https://github.com/you/my-tool"
Issues = "https://github.com/you/my-tool/issues"

[tool.setuptools.packages.find]
where = ["src"]

Entry Points and Console Scripts

The [project.scripts] section in pyproject.toml creates executable commands when the package is installed:

[project.scripts]
my-tool = "my_tool.cli:main"

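After pip install ., the my-tool command is available whenever that environment is active (or its bin/ directory is on your PATH). It calls the main() function in my_tool/cli.py and uses the return value as the process exit code.

This is how CLI tools like black, pytest, and flask work. You pip install flask and the flask command appears on your PATH.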

How It Works Internally

pip install creates a small wrapper script in the venv’s bin/ directory (Scripts\ on Windows):

$ cat .venv/bin/my-tool
#!/home/user/project/.venv/bin/python
# -*- coding: utf-8 -*-
import re
import sys
from my_tool.cli import main
if __name__ == '__main__':
    sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
    sys.exit(main())
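
The "module:function" string is the only thing the wrapper needs: it imports the part before the colon and looks up the part after it. A rough sketch of that resolution step follows. The helper resolve_entry_point is hypothetical, and json:dumps is used as a stand-in so the snippet runs without my_tool being installed.

```python
import importlib
from functools import reduce
from typing import Any


def resolve_entry_point(spec: str) -> Any:
    """Resolve a 'package.module:object.attr' string to the object it names."""
    module_name, _, attr_path = spec.partition(":")
    module = importlib.import_module(module_name)
    if not attr_path:
        return module
    # Walk dotted attribute paths such as "Class.method" one step at a time.
    return reduce(getattr, attr_path.split("."), module)


# Stand-in spec; for the tool above it would be "my_tool.cli:main".
dumps = resolve_entry_point("json:dumps")
print(dumps({"ok": True}))
```

This also explains a common failure mode: if the function named after the colon does not exist, the command installs fine and only fails when it is run.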

__main__.py for Runnable Packages

__main__.py lets you run a package with python -m:

$ python -m my_tool

Python looks for my_tool/__main__.py and executes it.

# src/my_tool/__main__.py

"""Allow running as: python -m my_tool"""

from my_tool.cli import main

if __name__ == "__main__":
    # Propagate main()'s return value as the process exit code
    raise SystemExit(main())

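This is useful when the console script is not on your PATH, when you want to pick the interpreter explicitly (for example python3.12 -m my_tool), and for packages that need to be both importable and runnable. With the src layout the package still has to be installed first; python -m only changes how it is launched.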
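The standard library's runpy module implements the -m switch, so the lookup of __main__.py can be reproduced in-process. The sketch below creates a hypothetical package demo_runnable in a temporary directory and runs it the way python -m would. The helper name run_package_main is an assumption made for this example.

```python
import runpy
import sys
import tempfile
from pathlib import Path


def run_package_main(root: Path, package: str) -> dict:
    """Execute <package>/__main__.py as python -m would, returning its globals."""
    sys.path.insert(0, str(root))
    try:
        return runpy.run_module(package, run_name="__main__")
    finally:
        sys.path.remove(str(root))


with tempfile.TemporaryDirectory() as tmp:
    pkg_dir = Path(tmp) / "demo_runnable"
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_text("")
    # Record the name the file sees while it is being executed.
    (pkg_dir / "__main__.py").write_text("executed_as = __name__\n")
    namespace = run_package_main(Path(tmp), "demo_runnable")

executed_as = namespace["executed_as"]
print(executed_as)
print(namespace.get("__package__"))
```

Because the file runs with __name__ set to "__main__" while still belonging to its package, both the usual main guard and package imports work inside it.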

CLI with argparse

The standard library includes argparse for command-line interfaces:

# src/my_tool/cli.py

import argparse
import sys

from my_tool.core import download_file


def parse_args(argv: list[str] | None = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        prog="my-tool",
        description="Download files from URLs",
    )
    parser.add_argument(
        "url",
        help="URL to download",
    )
    parser.add_argument(
        "-o", "--output",
        help="Output file path (default: derive from URL)",
    )
    parser.add_argument(
        "-q", "--quiet",
        action="store_true",
        help="Suppress progress output",
    )
    parser.add_argument(
        "--timeout",
        type=int,
        default=30,
        help="Request timeout in seconds (default: 30)",
    )
    return parser.parse_args(argv)


def main(argv: list[str] | None = None) -> int:
    args = parse_args(argv)
    try:
        path = download_file(
            url=args.url,
            output=args.output,
            quiet=args.quiet,
            timeout=args.timeout,
        )
        if not args.quiet:
            print(f"Downloaded: {path}")
        return 0
    except Exception as e:
        print(f"Error: {e}", file=sys.stderr)
        return 1

Usage:

$ my-tool https://example.com/data.csv -o data.csv --timeout 60
Downloading: data.csv [100.0%] 2.4 MB
Downloaded: data.csv

$ my-tool --help
usage: my-tool [-h] [-o OUTPUT] [-q] [--timeout TIMEOUT] url

Download files from URLs

positional arguments:
  url                   URL to download

options:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Output file path (default: derive from URL)
  -q, --quiet           Suppress progress output
  --timeout TIMEOUT     Request timeout in seconds (default: 30)

The argv parameter in parse_args and main makes testing easy:

from my_tool.cli import parse_args


def test_parse_args():
    args = parse_args(["https://example.com/file.txt", "-o", "out.txt"])
    assert args.url == "https://example.com/file.txt"
    assert args.output == "out.txt"

CLI with click

For more complex CLIs, click is the de facto standard. It uses decorators instead of imperative parser setup:

# src/my_tool/cli.py

import click

from my_tool.core import download_file


@click.command()
@click.argument("url")
@click.option("-o", "--output", default=None, help="Output file path")
@click.option("-q", "--quiet", is_flag=True, help="Suppress progress output")
@click.option("--timeout", default=30, type=int, help="Timeout in seconds")
def main(url: str, output: str | None, quiet: bool, timeout: int) -> None:
    """Download files from URLs."""
    try:
        path = download_file(url=url, output=output, quiet=quiet, timeout=timeout)
        if not quiet:
            click.echo(f"Downloaded: {path}")
    except Exception as e:
        click.echo(f"Error: {e}", err=True)
        raise SystemExit(1)

click advantages over argparse:

| Feature | argparse | click |
|---|---|---|
| Subcommands | Possible but verbose | @click.group() |
| Type validation | Basic | Extensible: click.Path, click.Choice |
| Testing | Parse argv manually | CliRunner built in |
| Colored output | Manual | click.style(), click.echo() |
| Prompts | Manual | click.prompt(), click.confirm() |
| Progress bars | Not included | click.progressbar() |

click with Subcommands

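# src/my_tool/cli.py (subcommand variant)
# With this variant, [project.scripts] should point at the group:
#   my-tool = "my_tool.cli:cli"

import click

# convert_file is assumed to exist in core.py; it is not shown in this article
from my_tool.core import convert_file, download_file


@click.group()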
@click.version_option()
def cli():
    """My Tool — file downloader and converter."""
    pass


@cli.command()
@click.argument("url")
@click.option("-o", "--output", default=None)
def download(url: str, output: str | None) -> None:
    """Download a file from a URL."""
    path = download_file(url=url, output=output)
    click.echo(f"Downloaded: {path}")


@cli.command()
@click.argument("input_file", type=click.Path(exists=True))
@click.argument("output_format", type=click.Choice(["csv", "json", "parquet"]))
def convert(input_file: str, output_format: str) -> None:
    """Convert a file to another format."""
    result = convert_file(input_file, output_format)
    click.echo(f"Converted: {result}")

Usage:

$ my-tool download https://example.com/data.csv
$ my-tool convert data.csv json
$ my-tool --help
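
For comparison, here is roughly what the same two subcommands look like with argparse subparsers. This is a sketch of the parser only, with no download or conversion logic attached; build_parser is a name chosen for this example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="my-tool", description="File downloader and converter"
    )
    # dest records which subcommand was chosen; required=True rejects a bare "my-tool".
    subparsers = parser.add_subparsers(dest="command", required=True)

    download = subparsers.add_parser("download", help="Download a file from a URL")
    download.add_argument("url")
    download.add_argument("-o", "--output", default=None)

    convert = subparsers.add_parser("convert", help="Convert a file to another format")
    convert.add_argument("input_file")
    convert.add_argument("output_format", choices=["csv", "json", "parquet"])
    return parser


args = build_parser().parse_args(["convert", "data.csv", "json"])
print(args.command, args.input_file, args.output_format)
```

Dispatching on args.command is left to the caller, which is the main source of the extra verbosity compared with click's decorators.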

Real Example: Building a File Downloader

Let us build the complete project structure for the downloader tool.

Project Layout

my-downloader/
  src/
    my_downloader/
      __init__.py         # Package version, public API
      __main__.py          # python -m my_downloader
      cli.py               # Click CLI interface
      core.py              # Download logic
      utils.py             # Helper functions
      config.py            # Constants, defaults
  tests/
    __init__.py
    conftest.py            # Shared fixtures
    test_core.py
    test_cli.py
    test_utils.py
  pyproject.toml
  requirements.txt
  .python-version
  .gitignore
  README.md

config.py: Constants

# src/my_downloader/config.py

"""Application constants and defaults."""

DEFAULT_TIMEOUT = 30
DEFAULT_CHUNK_SIZE = 8192
MAX_RETRIES = 3
USER_AGENT = "my-downloader/0.1.0"

utils.py: Helpers

# src/my_downloader/utils.py

"""Utility functions for file operations and formatting."""

from pathlib import Path
from urllib.parse import urlparse


def format_size(size_bytes: int) -> str:
    """Format byte count as human-readable string.

    Args:
        size_bytes: Number of bytes.

    Returns:
        Formatted string like '2.4 MB'.
    """
    size = float(size_bytes)  # work on a float copy so the int parameter keeps its type
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if abs(size) < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} PB"


def filename_from_url(url: str) -> str:
    """Extract filename from a URL.

    Args:
        url: The URL to parse.

    Returns:
        The filename portion of the URL path,
        or 'download' if none can be determined.
    """
    parsed = urlparse(url)
    name = Path(parsed.path).name
    return name if name else "download"

core.py: Business Logic

# src/my_downloader/core.py

"""Core download logic."""

from pathlib import Path

import requests

from my_downloader.config import DEFAULT_CHUNK_SIZE, DEFAULT_TIMEOUT, USER_AGENT
from my_downloader.utils import filename_from_url, format_size


def download_file(
    url: str,
    output: str | None = None,
    quiet: bool = False,
    timeout: int = DEFAULT_TIMEOUT,
) -> Path:
    """Download a file from a URL.

    Args:
        url: The URL to download from.
        output: Output file path. Derived from URL if None.
        quiet: If True, suppress progress output.
        timeout: Request timeout in seconds.

    Returns:
        Path to the downloaded file.

    Raises:
        requests.HTTPError: If the request fails.
    """
    headers = {"User-Agent": USER_AGENT}
    dest = Path(output) if output else Path(filename_from_url(url))

    # The context manager releases the connection even if writing fails midway.
    with requests.get(url, headers=headers, stream=True, timeout=timeout) as response:
        response.raise_for_status()
        # content-length is optional; 0 means the total size is unknown
        total = int(response.headers.get("content-length", 0))
        downloaded = 0

        with open(dest, "wb") as f:
            for chunk in response.iter_content(chunk_size=DEFAULT_CHUNK_SIZE):
                f.write(chunk)
                downloaded += len(chunk)
                if not quiet and total > 0:
                    pct = downloaded / total * 100
                    print(
                        f"\rDownloading: {dest.name} "
                        f"[{pct:5.1f}%] {format_size(downloaded)}",
                        end="",
                        flush=True,
                    )

    if not quiet and total > 0:
        print()  # newline after the progress line

    return dest

Install in Development Mode

$ cd my-downloader
$ python -m venv .venv
$ source .venv/bin/activate
(.venv) $ pip install -e ".[dev]"

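The -e flag installs in “editable” mode. Changes to the source code take effect immediately without reinstalling; changes to pyproject.toml (dependencies, entry points) still require running the install command again.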

After installation, the my-downloader command is available, assuming pyproject.toml declares my-downloader = "my_downloader.cli:main" under [project.scripts]:

(.venv) $ my-downloader https://example.com/data.csv
Downloading: data.csv [100.0%] 1.2 KB
Downloaded: data.csv

And python -m my_downloader also works because of __main__.py.

Monorepo: Multiple Packages in One Repository

As projects grow, you often end up with multiple related packages: a core library, a CLI tool, a web API, and shared utilities. A monorepo keeps them together with shared CI and synchronized releases.

Monorepo Layout

my-platform/
  packages/
    core/
      src/core/
        __init__.py
        models.py
        database.py
      pyproject.toml
      tests/
    api/
      src/api/
        __init__.py
        routes.py
        middleware.py
      pyproject.toml
      tests/
    cli/
      src/cli/
        __init__.py
        commands.py
      pyproject.toml
      tests/
  pyproject.toml          # workspace root (uv/hatch)
  uv.lock                 # single lockfile for all packages
  .python-version

uv Workspaces

uv supports workspaces natively. The root pyproject.toml declares member packages:

# Root pyproject.toml
[tool.uv.workspace]
members = ["packages/*"]

Each member has its own pyproject.toml with cross-references:

# packages/api/pyproject.toml
[project]
name = "my-platform-api"
version = "0.1.0"
dependencies = [
    "my-platform-core",  # references sibling package
    "fastapi>=0.100",
]

[tool.uv.sources]
my-platform-core = { workspace = true }

Commands work at the workspace level:

# Install all packages in the workspace
$ uv sync

# Run tests for one package
$ uv run --package my-platform-api pytest

# Add a dependency to a specific package
$ uv add --package my-platform-cli typer

When to Use a Monorepo vs Separate Repos

| Factor | Monorepo | Separate repos |
|---|---|---|
| Team size | Small to medium (1-10 devs) | Large (many independent teams) |
| Release cadence | Packages released together | Independent release cycles |
| Shared code | Heavy cross-package imports | Minimal coupling |
| CI complexity | One pipeline tests everything | Per-repo CI, simpler individually |
| Version management | Synchronized versions | Independent semver |
| Dependency management | Single lockfile | Per-repo lockfiles |

Namespace Packages

Namespace packages allow multiple distributions to contribute to the same import path. This is common in plugin systems and large organizations.

Implicit Namespace Packages (PEP 420)

Since Python 3.3, a directory without __init__.py that is found on the import path is treated as a namespace package:

# Package A (installable separately)
company-auth/
  src/
    company/        # NO __init__.py
      auth/
        __init__.py
        login.py

# Package B (installable separately)
company-billing/
  src/
    company/        # NO __init__.py
      billing/
        __init__.py
        invoice.py

After installing both:

from company.auth import login
from company.billing import invoice

The company namespace is shared without either package “owning” it.
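The sharing can be demonstrated without building two real distributions. This sketch simulates them with two temporary directories on sys.path, each contributing one subpackage to a hypothetical demo_company namespace. The directory and module names are invented for the example.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_distribution(root: Path, subpackage: str, module: str, source: str) -> str:
    """Create <root>/demo_company/<subpackage>/<module>.py and return the sys.path entry."""
    sub_dir = root / "demo_company" / subpackage  # demo_company/ gets NO __init__.py
    sub_dir.mkdir(parents=True)
    (sub_dir / "__init__.py").write_text("")
    (sub_dir / f"{module}.py").write_text(source)
    return str(root)


with tempfile.TemporaryDirectory() as tmp:
    entries = [
        make_distribution(Path(tmp) / "auth_dist", "auth", "login", "NAME = 'login'\n"),
        make_distribution(Path(tmp) / "billing_dist", "billing", "invoice", "NAME = 'invoice'\n"),
    ]
    sys.path[:0] = entries
    try:
        login = importlib.import_module("demo_company.auth.login")
        invoice = importlib.import_module("demo_company.billing.invoice")
        # A namespace package's __path__ lists every directory that contributes to it.
        namespace_locations = len(list(importlib.import_module("demo_company").__path__))
    finally:
        for entry in entries:
            sys.path.remove(entry)

print(login.NAME, invoice.NAME, namespace_locations)
```

If either demo_company/ directory contained an __init__.py, it would become a regular package and hide the other directory's contribution.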

Rules for Namespace Packages

  1. The shared directory (company/) must not have __init__.py
  2. Sub-packages (auth/, billing/) must have __init__.py
  3. Each distribution installs into the same namespace independently
  4. Use find_namespace_packages() or configure [tool.setuptools.packages.find]:

# pyproject.toml for company-auth
[tool.setuptools.packages.find]
where = ["src"]

Plugin Architecture with Entry Points

Namespace packages work well with entry points for discoverable plugins:

# In plugin package's pyproject.toml
[project.entry-points."my_app.plugins"]
csv_export = "my_plugin_csv:CsvExporter"
json_export = "my_plugin_json:JsonExporter"

# In the main application: discover all installed plugins
from importlib.metadata import entry_points

def load_plugins():
    plugins = {}
    for ep in entry_points(group="my_app.plugins"):
        plugins[ep.name] = ep.load()
    return plugins

# Returns: {"csv_export": <class CsvExporter>, "json_export": <class JsonExporter>}

This pattern lets users install plugins via pip without the main application knowing about them at build time.

Typer: Modern CLI (Type Hints → CLI)

Typer generates CLI interfaces from type annotations. It needs one decorator per command, no per-argument decorators, and no argument-parsing boilerplate:

# src/my_tool/cli.py
import typer
from pathlib import Path
from enum import Enum

app = typer.Typer(help="File processing toolkit")

class Format(str, Enum):
    json = "json"
    csv = "csv"
    parquet = "parquet"

@app.command()
def convert(
    input_file: Path,
    output_format: Format = Format.json,
    verbose: bool = False,
    limit: int = typer.Option(0, help="Max rows (0=unlimited)"),
):
    """Convert a file to another format."""
    if verbose:
        typer.echo(f"Converting {input_file} → {output_format.value}")
    # ... conversion logic

@app.command()
def validate(
    files: list[Path],
    strict: bool = typer.Option(False, "--strict", "-s"),
):
    """Validate one or more data files."""
    for f in files:
        if not f.exists():
            typer.echo(f"✗ {f}: not found", err=True)
            raise typer.Exit(1)
        typer.echo(f"✓ {f}: valid")

if __name__ == "__main__":
    app()

Usage:

$ my-tool convert data.csv --output-format parquet --verbose
Converting data.csv → parquet

$ my-tool validate *.json --strict
✓ users.json: valid
✓ config.json: valid

$ my-tool --help
Usage: my-tool [OPTIONS] COMMAND [ARGS]...

  File processing toolkit

Commands:
  convert   Convert a file to another format.
  validate  Validate one or more data files.

argparse vs click vs Typer

| Feature | argparse | click | Typer |
|---|---|---|---|
| Standard library | Yes | No | No |
| Type annotations | No | No | Yes (core concept) |
| Subcommands | Verbose setup | @group.command() | app.command() |
| Shell completion | Manual | Built-in (click 8.0+) | Built-in |
| Rich output | No | Partial | Yes (via Rich) |
| Learning curve | Medium | Low | Very low |
| Testing | Manual parsing | CliRunner | CliRunner (inherited) |

Recommendation: Use Typer for new CLIs (simplest, most modern). Use click if you need advanced plugin systems. Use argparse only for zero-dependency scripts.

Common Import Errors and Fixes

| Error | Cause | Fix |
|---|---|---|
| ModuleNotFoundError: No module named 'my_tool' | Package not installed | pip install -e . |
| ImportError: attempted relative import with no known parent package | Running file directly | Use python -m my_tool.module |
| ImportError: cannot import name 'X' from 'my_tool' | X not in __init__.py or circular import | Check __init__.py, break circular deps |
| ModuleNotFoundError: No module named 'my_tool.core' | Missing __init__.py or wrong package structure | Verify __init__.py exists, check find config in pyproject.toml |

What’s Next

With a proper project structure in place, the next step is making sure it actually works. Testing is not about writing tests for the sake of coverage numbers. It is about building confidence that your code does what you think it does. In the next article, we will set up pytest, write meaningful tests with fixtures and parametrize, and learn to debug efficiently when tests reveal problems.
