You change one line and three unrelated features break. You refactor a function and spend two hours manually clicking through the app to check if everything still works. You deploy on Friday and get paged at midnight. All of these are symptoms of the same disease: no tests.
Tests are not bureaucracy. They are the fastest way to know that your code does what you think it does. A good test suite runs in seconds and catches the bugs that would take hours to find manually.
Writing tests costs time up front. Not writing tests costs more time later. Here is the math:
| Activity | Without Tests | With Tests |
|---|---|---|
| Initial development | Faster (no tests to write) | Slower (tests add 20-40% time) |
| Refactoring | Terrifying (did I break something?) | Confident (tests catch regressions) |
| Debugging | Read the whole codebase | Run tests, see exactly what broke |
| Onboarding new developer | “Ask Sarah, she knows how it works” | Tests document expected behavior |
| Deploying to production | Manual QA, hope for the best | Automated gate, deploy with confidence |
| Bug reported by customer | Reproduce manually, fix, manually verify | Write test that reproduces bug, fix, test verifies |
The real payoff comes on the second change to any piece of code. The first time, you know the code works because you just wrote it. Every subsequent change, you do not know unless you test.
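To make the last row of the table concrete, here is a minimal sketch of the bug-report workflow. The `parse_size` helper and the bug it once had are hypothetical, invented for illustration: a customer reports that a lowercase suffix like "10kb" crashes, you write the failing test first, then fix the code until the test passes.

```python
def parse_size(text: str) -> int:
    """Convert a human-readable size such as '10KB' to bytes (hypothetical helper)."""
    units = {"B": 1, "KB": 1024, "MB": 1024**2}
    # The fix for the hypothetical bug: normalize case before matching suffixes.
    text = text.strip().upper()
    # Check longer suffixes first so "KB" is not mistaken for "B".
    for suffix in sorted(units, key=len, reverse=True):
        if text.endswith(suffix):
            return int(text[: -len(suffix)]) * units[suffix]
    return int(text)


def test_parse_size_lowercase_suffix():
    # Regression test written from the bug report, before touching the code.
    assert parse_size("10kb") == 10 * 1024
```

The test stays in the suite afterwards, so the same bug cannot quietly return on a later change.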
# Verbose output (show each test name)
(.venv) $ pytest -v

# Stop on first failure
(.venv) $ pytest -x

# Run tests matching a keyword expression
(.venv) $ pytest -k "download"

# Run tests in a specific file
(.venv) $ pytest tests/test_core.py

# Run a specific test function
(.venv) $ pytest tests/test_core.py::test_download_file

# Show print statements (not captured)
(.venv) $ pytest -s

# Show local variables in tracebacks
(.venv) $ pytest -l

# Re-run only failed tests from last run
(.venv) $ pytest --lf

# Run failed tests first, then the rest
(.venv) $ pytest --ff
import pytest

from my_tool.core import FileDownloader


@pytest.fixture
def downloader():
    """Create a configured downloader instance."""
    return FileDownloader(timeout=5, retries=1)


def test_download_sets_timeout(downloader):
    assert downloader.timeout == 5


def test_download_sets_retries(downloader):
    assert downloader.retries == 1
pytest sees downloader in the test function’s parameter list, finds the fixture with that name, calls it, and passes the result. Each test gets a fresh instance.
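A small sketch makes the "fresh instance" point visible. The `cart` fixture and both tests are hypothetical; the only assumption is pytest's default function scope, under which the fixture function is called once per test.

```python
import pytest


@pytest.fixture
def cart():
    """Hypothetical fixture: a new, empty list for every test."""
    return []


def test_add_one_item(cart):
    cart.append("apple")
    assert cart == ["apple"]


def test_cart_starts_empty(cart):
    # Passes regardless of test order: pytest called the fixture again,
    # so the "apple" added in the other test is not here.
    assert cart == []
```

If the two tests shared one list, the second would fail whenever it ran after the first. That kind of order-dependent failure is what per-test fixtures are meant to prevent.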
import shutil
import tempfile
from pathlib import Path

import pytest


@pytest.fixture
def temp_dir():
    """Create a temporary directory, clean up after test."""
    path = Path(tempfile.mkdtemp())
    yield path
    # Cleanup runs after the test, even if the test fails
    shutil.rmtree(path, ignore_errors=True)


def test_file_creation(temp_dir):
    test_file = temp_dir / "test.txt"
    test_file.write_text("hello")
    assert test_file.read_text() == "hello"
import pytest

# create_connection, load_test_data and APIClient are placeholders
# for your project's own helpers.


@pytest.fixture(scope="session")
def database_connection():
    """Create a database connection, shared across all tests."""
    conn = create_connection("test.db")
    yield conn
    conn.close()


@pytest.fixture(scope="module")
def sample_data():
    """Load sample data, shared within one test file."""
    return load_test_data("fixtures/sample.json")


@pytest.fixture(scope="class")
def api_client():
    """Create API client, shared within one test class."""
    return APIClient(base_url="http://localhost:8000")


@pytest.fixture  # scope="function" is the default
def clean_state():
    """Fresh state for each test."""
    return {}
Fixtures in conftest.py are available to all tests in the same directory and subdirectories without importing:
tests/
    conftest.py          # fixtures available to all tests
    test_core.py
    test_cli.py
    integration/
        conftest.py      # additional fixtures for integration tests
        test_api.py
import os


def test_capture_output(capsys):
    """capsys captures stdout and stderr."""
    print("hello world")
    captured = capsys.readouterr()
    assert captured.out == "hello world\n"


def test_temp_path(tmp_path):
    """tmp_path provides a unique temporary directory."""
    file = tmp_path / "data.txt"
    file.write_text("content")
    assert file.read_text() == "content"


def test_monkeypatch_env(monkeypatch):
    """monkeypatch modifies environment for the test."""
    monkeypatch.setenv("API_KEY", "test-key-123")
    assert os.environ["API_KEY"] == "test-key-123"
When testing a function that calls an external service, you do not want your tests to make real HTTP requests. Mocking replaces parts of your code with controlled fakes.
from unittest.mock import MagicMock, patch

import pytest

from my_tool.core import download_file


@patch("my_tool.core.requests.get")
def test_download_file_success(mock_get, tmp_path):
    # Configure the mock response
    mock_response = MagicMock()
    mock_response.status_code = 200
    mock_response.headers = {"content-length": "11"}
    mock_response.iter_content.return_value = [b"hello world"]
    mock_response.raise_for_status.return_value = None
    mock_get.return_value = mock_response

    output = tmp_path / "test.txt"
    result = download_file("https://example.com/test.txt", str(output), quiet=True)

    assert result == output
    assert output.read_bytes() == b"hello world"
    mock_get.assert_called_once()


@patch("my_tool.core.requests.get")
def test_download_file_http_error(mock_get):
    mock_response = MagicMock()
    mock_response.raise_for_status.side_effect = Exception("404 Not Found")
    mock_get.return_value = mock_response

    with pytest.raises(Exception, match="404"):
        download_file("https://example.com/missing.txt", quiet=True)
Over-mocking is a common mistake. If you mock everything, your tests verify the mocks, not your code. Mock at the boundary (network, disk, clock) and test real logic with real code.
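As a sketch of what mocking at the boundary can look like, the example below fakes only the clock and leaves the comparison logic untouched. `is_expired` is a hypothetical function, not part of my_tool.

```python
import time
from unittest.mock import patch


def is_expired(created_at: float, ttl_seconds: float) -> bool:
    """Return True once ttl_seconds have passed since created_at."""
    return time.time() - created_at >= ttl_seconds


def test_is_expired_fakes_only_the_clock():
    # The clock is the boundary: freeze it at a known value.
    with patch("time.time", return_value=1_000.0):
        # The subtraction and comparison under test run for real.
        assert is_expired(created_at=900.0, ttl_seconds=100.0)
        assert not is_expired(created_at=950.0, ttl_seconds=100.0)
```

Had the test mocked `is_expired` itself, it would only prove that the mock returns what it was told to return.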
For simple cases, monkeypatch is cleaner than patch:
def test_download_with_env_config(monkeypatch):
    monkeypatch.setenv("DOWNLOAD_TIMEOUT", "60")
    monkeypatch.setattr("my_tool.config.DEFAULT_TIMEOUT", 60)
    # test code that reads from config
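One way the placeholder comment might be filled in, assuming a hypothetical `get_timeout` helper that reads `DOWNLOAD_TIMEOUT` from the environment and falls back to 30 seconds (the real my_tool.config may look different):

```python
import os


def get_timeout() -> int:
    """Hypothetical config reader: the environment wins, else a default of 30."""
    return int(os.environ.get("DOWNLOAD_TIMEOUT", "30"))


def test_timeout_from_env(monkeypatch):
    monkeypatch.setenv("DOWNLOAD_TIMEOUT", "60")
    assert get_timeout() == 60


def test_timeout_default(monkeypatch):
    # raising=False: do not fail if the variable was never set.
    monkeypatch.delenv("DOWNLOAD_TIMEOUT", raising=False)
    assert get_timeout() == 30
```

monkeypatch undoes its changes when the test ends, so neither test needs manual cleanup.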
Do not chase 100% coverage. Some code (error handlers for impossible states, __repr__ methods) is not worth testing. Focus on business logic and edge cases.
# Run only unit tests (skip integration)
(.venv) $ pytest -m "not integration"

# Run only integration tests
(.venv) $ pytest -m integration
Register custom marks in pyproject.toml:
[tool.pytest.ini_options]
markers = [
    "integration: marks tests as integration tests (deselect with '-m \"not integration\"')",
    "slow: marks tests as slow (deselect with '-m \"not slow\"')",
]
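The `-m` filters only select tests that actually carry a mark. A minimal sketch of applying one, with a hypothetical `normalize_url` helper standing in for real code:

```python
import pytest


def normalize_url(url: str) -> str:
    """Hypothetical helper, here only to give the tests something to check."""
    return url.strip().rstrip("/")


def test_normalize_url_strips_trailing_slash():
    # Unmarked, so it still runs under: pytest -m "not integration"
    assert normalize_url(" https://example.com/ ") == "https://example.com"


@pytest.mark.integration
def test_marked_as_integration():
    # Selected by: pytest -m integration
    # A real integration test would talk to a live service here.
    assert normalize_url("https://example.com/api/") == "https://example.com/api"
```

A whole file can be marked at once by assigning `pytestmark = pytest.mark.integration` at module level.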
Test the full application from the user’s perspective:
# tests/e2e/test_cli.py
from click.testing import CliRunner

from my_tool.cli import main


def test_cli_download(tmp_path, monkeypatch):
    monkeypatch.chdir(tmp_path)
    runner = CliRunner()
    result = runner.invoke(main, ["https://httpbin.org/bytes/100", "-o", "test.bin", "-q"])
    assert result.exit_code == 0
    assert (tmp_path / "test.bin").exists()


def test_cli_help():
    runner = CliRunner()
    result = runner.invoke(main, ["--help"])
    assert result.exit_code == 0
    assert "Download files from URLs" in result.output
Debug a test that just failed, at the point of failure:
(.venv) $ pytest --pdb tests/test_core.py
# Drops into pdb at the point of failure
This is extremely powerful. You do not need to add breakpoint() anywhere. Just run with --pdb and pytest will open the debugger at the exact line where the assertion failed, with all local variables intact.
Traditional tests check specific examples. Property-based testing checks invariants that should hold for any valid input. Hypothesis generates hundreds of random inputs to find edge cases you would never think to test manually.
from hypothesis import assume, given, settings
from hypothesis import strategies as st


@given(st.lists(st.integers()))
def test_sort_is_idempotent(xs):
    """Sorting twice gives the same result as sorting once."""
    assert sorted(sorted(xs)) == sorted(xs)


@given(st.lists(st.integers()))
def test_sort_preserves_length(xs):
    """Sorting does not add or remove elements."""
    assert len(sorted(xs)) == len(xs)


@given(st.lists(st.integers(), min_size=1))
def test_sort_result_is_ordered(xs):
    """Every element is <= the next."""
    result = sorted(xs)
    for a, b in zip(result, result[1:]):
        assert a <= b
Instead of testing sorted([3, 1, 2]) == [1, 2, 3], we test properties that any correct sort must satisfy. Hypothesis will try empty lists, single elements, duplicates, negative numbers, huge values, and more.
from dataclasses import dataclass

from hypothesis import given
from hypothesis import strategies as st


@dataclass
class User:
    name: str
    age: int
    email: str


# Build a strategy for generating Users
users = st.builds(
    User,
    name=st.text(min_size=1, max_size=50),
    age=st.integers(min_value=0, max_value=150),
    email=st.emails(),
)


@given(users)
def test_user_serialization_roundtrip(user):
    """Serialize → deserialize should return the original object."""
    # serialize and deserialize are the functions under test,
    # imported from your own code.
    data = serialize(user)
    restored = deserialize(data)
    assert restored == user
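The roundtrip test assumes `serialize` and `deserialize` already exist. They are not shown in this article, so here is one possible sketch using JSON and the standard library; your real implementation may differ.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class User:
    name: str
    age: int
    email: str


def serialize(user: User) -> str:
    """Hypothetical implementation: dump the dataclass fields as a JSON object."""
    return json.dumps(asdict(user))


def deserialize(data: str) -> User:
    """Hypothetical inverse: rebuild the User from the JSON object."""
    return User(**json.loads(data))


# A single hand-picked example of the property the Hypothesis test checks broadly.
original = User(name="Ada", age=36, email="ada@example.com")
assert deserialize(serialize(original)) == original
```

The hand-written assertion covers one input. The property-based test earns its keep on the inputs you would not think of, such as names with quotes, emoji, or control characters.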
from hypothesis import strategies as st
from hypothesis.stateful import RuleBasedStateMachine, precondition, rule


class SetMachine(RuleBasedStateMachine):
    """Test that our CustomSet behaves like Python's built-in set."""

    def __init__(self):
        super().__init__()
        self.model = set()  # reference
        self.actual = CustomSet()  # system under test (your own class)

    @rule(value=st.integers())
    def add(self, value):
        self.model.add(value)
        self.actual.add(value)
        assert value in self.actual

    @rule(value=st.integers())
    def remove(self, value):
        if value in self.model:
            self.model.remove(value)
            self.actual.remove(value)
        assert value not in self.actual

    @rule()
    def check_length(self):
        assert len(self.actual) == len(self.model)


TestSet = SetMachine.TestCase
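`CustomSet` is not defined in this article. For experimenting with the state machine, a deliberately simple, hypothetical list-backed version could look like this:

```python
class CustomSet:
    """Hypothetical list-backed set, just enough for the state machine's rules."""

    def __init__(self):
        self._items = []

    def add(self, value):
        # Skip duplicates so the length matches the built-in set.
        if value not in self._items:
            self._items.append(value)

    def remove(self, value):
        # Raises ValueError if the value is absent (the built-in set raises KeyError).
        self._items.remove(value)

    def __contains__(self, value):
        return value in self._items

    def __len__(self):
        return len(self._items)
```

Try breaking it, for example by deleting the duplicate check in `add`. Hypothesis should then report a short sequence of steps that makes `check_length` fail.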
import os

from hypothesis import given, settings
from hypothesis import strategies as st


# Slow but thorough (CI)
@settings(max_examples=1000, deadline=None)
@given(st.text())
def test_thorough(s):
    ...


# Fast iteration (dev)
@settings(max_examples=50)
@given(st.text())
def test_quick(s):
    ...


# Configure profiles globally in conftest.py
settings.register_profile("ci", max_examples=1000)
settings.register_profile("dev", max_examples=50)
settings.load_profile(os.environ.get("HYPOTHESIS_PROFILE", "dev"))
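import pytest


# The timeout mark comes from the pytest-timeout plugin (pip install pytest-timeout).
@pytest.mark.timeout(5)
def test_api_response():
    """Must complete within 5 seconds."""
    # client is your application's test client, defined elsewhere
    response = client.get("/api/heavy-endpoint")
    assert response.status_code == 200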
Or globally in pyproject.toml:
[tool.pytest.ini_options]
timeout = 30  # seconds; any single test exceeding this fails
# src/my_tool/processor.py
from datetime import datetime


def parse_log_entry(line: str) -> dict:
    """Parse a log line into structured data.

    Expected format: YYYY-MM-DD HH:MM:SS LEVEL message

    Args:
        line: Raw log line string.

    Returns:
        Dict with keys: timestamp, level, message.

    Raises:
        ValueError: If the line format is invalid.
    """
    parts = line.strip().split(" ", 3)
    if len(parts) < 4:
        raise ValueError(f"Invalid log format: {line!r}")

    date_str, time_str, level, message = parts
    timestamp = datetime.fromisoformat(f"{date_str} {time_str}")

    level = level.upper()
    if level not in ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"):
        raise ValueError(f"Unknown log level: {level!r}")

    return {
        "timestamp": timestamp,
        "level": level,
        "message": message,
    }
# tests/test_processor.py
from datetime import datetime

import pytest

from my_tool.processor import parse_log_entry


class TestParseLogEntry:
    """Tests for parse_log_entry function."""

    def test_valid_info_line(self):
        result = parse_log_entry("2024-01-15 10:30:00 INFO Server started")
        assert result == {
            "timestamp": datetime(2024, 1, 15, 10, 30, 0),
            "level": "INFO",
            "message": "Server started",
        }

    def test_valid_error_line(self):
        result = parse_log_entry("2024-01-15 10:30:00 ERROR Connection refused")
        assert result["level"] == "ERROR"
        assert result["message"] == "Connection refused"

    def test_message_with_spaces(self):
        result = parse_log_entry("2024-01-15 10:30:00 WARNING Disk usage at 90% on /dev/sda1")
        assert result["message"] == "Disk usage at 90% on /dev/sda1"

    def test_level_case_insensitive(self):
        result = parse_log_entry("2024-01-15 10:30:00 info lowercase level")
        assert result["level"] == "INFO"

    @pytest.mark.parametrize("level", ["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"])
    def test_all_valid_levels(self, level):
        line = f"2024-01-15 10:30:00 {level} test message"
        result = parse_log_entry(line)
        assert result["level"] == level

    def test_empty_string_raises(self):
        with pytest.raises(ValueError, match="Invalid log format"):
            parse_log_entry("")

    def test_incomplete_line_raises(self):
        with pytest.raises(ValueError, match="Invalid log format"):
            parse_log_entry("2024-01-15 10:30:00")

    def test_invalid_level_raises(self):
        with pytest.raises(ValueError, match="Unknown log level"):
            parse_log_entry("2024-01-15 10:30:00 TRACE message")

    def test_invalid_timestamp_raises(self):
        with pytest.raises(ValueError):
            parse_log_entry("not-a-date 10:30:00 INFO message")

    def test_strips_whitespace(self):
        result = parse_log_entry(" 2024-01-15 10:30:00 INFO padded \n")
        assert result["level"] == "INFO"
Tests tell you that your code works. Type hints and linting catch a different class of mistakes before you even run it. In the next article, we will add type annotations to our codebase, set up mypy for static type checking, and configure ruff and black so that style arguments become automated, not debated.