Pre-commit : Protecting your future self

Neil Shephard

View these slides…

ns-rse.github.io/pre-commit/

Introduction

  • Research Software Engineer at University of Sheffield
  • Background : Statistical Genetics, Medical Statistics and Data Scientist for a Telematics Company
  • Blog Post (2022-10-10) : pre-commit : Protecting your future self

Structure

  • (Very) brief Git version control.
  • A digression into Linting and Testing.
  • Git Hooks.
  • pre-commit installation.
  • pre-commit configuration.
  • pre-commit usage.
  • pre-commit in CI/CD.

Git

xkcd (1597)

https://xkcd.com/1597/

Git Workflow

%%{init: { 'logLevel': 'debug', 'theme': 'base', 'gitGraph': {'showBranches': true,'showCommitLabel': true, 'rotateCommitLabel': true}} }%%
gitGraph
    commit
    commit
    branch bug1
    checkout main
    commit
    checkout bug1
    commit
    commit
    checkout main
    branch feature1
    checkout feature1
    commit
    commit
    checkout bug1
    commit
    checkout main
    merge bug1 tag: "v0.1.1"
    checkout feature1
    commit
    commit
    checkout main
    merge feature1 tag: "v0.1.2"
    commit

Linting and Testing

A digression…

  • Good practice to lint code & conform to Style Guides
  • Good practice to have tests in place for code.

Linting - What is all the fluff about?

A simple Python function

sample.py

import numpy as np
import pandas as pd

from pathlib import Path


def find_files(file_path: Union[str, Path], file_ext: str) -> list:
    """Recursively find files of the stated type along the given file path."""
    # We have a really long comment on this line just for demonstration purposes so that we can generate a few errors that need linting
    try:
        return list(Path(file_path).rglob(f"*{file_ext}"))
    except:
        raise

A Simple Test

test_sample.py

from .sample import find_files

def test_find_files():
    """Test the find_files() function"""
    py_files = find_files(file_path="./", file_ext=".py")
    assert isinstance(py_files, list)
    assert "sample.py" in [path.name for path in py_files]
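This test has one subtlety. `rglob()` yields `pathlib.Path` objects rather than strings, so checking for the plain string `"sample.py"` in that list can never succeed. The small sketch below shows the difference. It uses a throwaway `tempfile` directory, which is our own addition and not part of the original project.

```python
import tempfile
from pathlib import Path

# Throwaway directory holding a single Python file (illustration only).
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sample.py").write_text("x = 1\n")
    py_files = list(Path(tmp).rglob("*.py"))

    # rglob() yields Path objects, and a str never compares equal to a Path...
    string_match = "sample.py" in py_files
    # ...whereas comparing file names (plain strings) does what the test intends.
    name_match = "sample.py" in [path.name for path in py_files]

print(string_match, name_match)  # False True
```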

Linting and Testing Tools

Linting and Testing manually…

black sample.py
flake8 sample.py
pylint sample.py
pytest test_sample.py
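Running these by hand before every commit is tedious and easy to forget. The minimal Python sketch below illustrates what gets automated: it runs the same four commands and records their exit statuses. `CHECKS`, `run_checks()`, `all_passed()` and `main()` are hypothetical helpers, and each tool is assumed to be installed and on `PATH`.

```python
import subprocess
import sys
from typing import Dict, List, Sequence

# The manual checks from the slide above; assumes each tool is installed and on PATH.
CHECKS: List[List[str]] = [
    ["black", "sample.py"],
    ["flake8", "sample.py"],
    ["pylint", "sample.py"],
    ["pytest", "test_sample.py"],
]


def run_checks(checks: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Run each command in turn and record its exit status (0 means it passed)."""
    results: Dict[str, int] = {}
    for command in checks:
        label = " ".join(command)
        try:
            results[label] = subprocess.run(list(command), check=False).returncode
        except FileNotFoundError:
            # Tool not installed: use the shell's "command not found" status.
            results[label] = 127
    return results


def all_passed(results: Dict[str, int]) -> bool:
    """True only if every command exited with status 0."""
    return all(code == 0 for code in results.values())


def main() -> None:
    """Exit non-zero if any check failed, so the script could gate a commit."""
    sys.exit(0 if all_passed(run_checks(CHECKS)) else 1)
```

Remembering to call `main()` before every commit is precisely the chore that pre-commit takes over.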

Linting manually

black

 black sample.py
All done! ✨ 🍰 ✨
1 file reformatted.

flake8

 flake8 sample.py
sample.py:1:1: D100 Missing docstring in public module
sample.py:1:1: F401 'numpy as np' imported but unused
sample.py:2:1: F401 'pandas as pd' imported but unused
sample.py:7:36: F821 undefined name 'Union'
sample.py:8:80: E501 line too long (87 > 79 characters)
sample.py:9:80: E501 line too long (135 > 79 characters)
sample.py:12:5: E722 do not use bare 'except'
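Each flake8 line follows its default `path:row:col: CODE message` format. The code's prefix identifies the checker that raised it:

- E/W come from pycodestyle.
- F comes from pyflakes.
- D comes from pydocstyle, via the flake8-docstrings plugin.

The sketch below turns such a report into structured records. `parse_flake8()` and `count_by_checker()` are illustrative names, not part of flake8, and the simple path pattern assumes paths without colons.

```python
import re
from collections import Counter
from typing import List, NamedTuple

# flake8's default report format is "path:row:col: CODE message".
FLAKE8_LINE = re.compile(
    r"^(?P<path>[^:]+):(?P<row>\d+):(?P<col>\d+): (?P<code>[A-Z]+\d+) (?P<text>.*)$"
)


class Violation(NamedTuple):
    path: str
    row: int
    col: int
    code: str
    text: str


def parse_flake8(report: str) -> List[Violation]:
    """Convert flake8's text report into records, ignoring non-matching lines."""
    violations = []
    for line in report.splitlines():
        match = FLAKE8_LINE.match(line)
        if match:
            violations.append(
                Violation(match["path"], int(match["row"]), int(match["col"]),
                          match["code"], match["text"])
            )
    return violations


def count_by_checker(violations: List[Violation]) -> Counter:
    """Count violations by code prefix, e.g. E/W (pycodestyle), F (pyflakes)."""
    return Counter(re.match(r"[A-Z]+", v.code).group() for v in violations)


# A few lines from the report above.
REPORT = """\
sample.py:1:1: D100 Missing docstring in public module
sample.py:1:1: F401 'numpy as np' imported but unused
sample.py:7:36: F821 undefined name 'Union'
sample.py:12:5: E722 do not use bare 'except'
"""

print(count_by_checker(parse_flake8(REPORT)))
```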

pylint

 pylint sample.py
************* Module sample
sample.py:9:0: C0301: Line too long (135/120) (line-too-long)
sample.py:1:0: C0114: Missing module docstring (missing-module-docstring)
sample.py:7:35: E0602: Undefined variable 'Union' (undefined-variable)
sample.py:12:4: W0706: The except handler raises immediately (try-except-raise)
sample.py:4:0: C0411: standard import "from pathlib import Path" should be placed before "import numpy as np" (wrong-import-order)
sample.py:1:0: W0611: Unused numpy imported as np (unused-import)
sample.py:2:0: W0611: Unused pandas imported as pd (unused-import)

-------------------------------------
Your code has been rated at -10.00/10

pytest

 pytest test_sample.py
================= test session starts =================
platform linux -- Python 3.7.11, pytest-7.1.1, pluggy-1.0.0
rootdir: /home/neil/work/projects/pre-commit/assets/python
plugins: hydra-core-1.2.0, regtest-1.5.0, cov-3.0.0
collected 0 items / 1 error

======================= ERRORS ========================
___________ ERROR collecting test_sample.py ___________
test_sample.py:1: in <module>
    from .sample import find_files
sample.py:7: in <module>
    def find_and_load_files(file_path: Union[str, Path], file_type: str):
E   NameError: name 'Union' is not defined
================ short test summary info ===============
ERROR test_sample.py - NameError: name 'Union' is not defined
!!!!!!!! Interrupted: 1 error during collection !!!!!!!!
=================== 1 error in 0.49s ===================

Only once all of these pass can you commit and push.

Automate with pre-commit

  • Uses Git Hooks to run checks automatically.
  • Written in Python, but hooks are available for most languages.
  • A large number of ready-made hooks are available to use.
  • Supports : C, C++, R, Java, JavaScript, PHP, LISP, Markdown, Go, Bash, Ansible, Docker, Lua, Jupyter Notebooks and more.

What are Hooks?

  • Scripts that Git runs automatically before or after certain actions, e.g. commit, merge or push.
 ls -lha .git/hooks
drwxr-xr-x neil neil 4.0 KB Mon Oct 24 10:26:37 2022  .
drwxr-xr-x neil neil 4.0 KB Tue Jan  3 18:48:37 2023  ..
.rwxr-xr-x neil neil 478 B  Sun Aug 14 13:35:27 2022  applypatch-msg.sample
.rwxr-xr-x neil neil 896 B  Sun Aug 14 13:35:27 2022  commit-msg.sample
.rwxr-xr-x neil neil 4.6 KB Sun Aug 14 13:35:27 2022  fsmonitor-watchman.sample
.rwxr-xr-x neil neil 189 B  Sun Aug 14 13:35:27 2022  post-update.sample
.rwxr-xr-x neil neil 424 B  Sun Aug 14 13:35:27 2022  pre-applypatch.sample
.rwxr-xr-x neil neil 1.6 KB Sun Aug 14 13:35:27 2022  pre-commit.sample
.rwxr-xr-x neil neil 416 B  Sun Aug 14 13:35:27 2022  pre-merge-commit.sample
.rwxr-xr-x neil neil 1.3 KB Sun Aug 14 13:35:27 2022  pre-push.sample
.rwxr-xr-x neil neil 4.8 KB Sun Aug 14 13:35:27 2022  pre-rebase.sample
.rwxr-xr-x neil neil 544 B  Sun Aug 14 13:35:27 2022  pre-receive.sample
.rwxr-xr-x neil neil 1.5 KB Sun Aug 14 13:35:27 2022  prepare-commit-msg.sample
.rwxr-xr-x neil neil 2.7 KB Sun Aug 14 13:35:27 2022  push-to-checkout.sample
.rwxr-xr-x neil neil 3.6 KB Sun Aug 14 13:35:27 2022  update.sample
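Each `*.sample` file is only an example. Git runs a hook when an executable file with the bare name (e.g. `pre-commit`) exists, and a non-zero exit status from `pre-commit` aborts the commit. A hook can be any executable script. The sketch below is a hypothetical hand-written Python `pre-commit` hook that rejects trailing whitespace, and it shows the kind of script that pre-commit manages for you. For simplicity it reads the working-tree copy of each staged file rather than the staged version.

```python
#!/usr/bin/env python3
"""Sketch of a hand-written .git/hooks/pre-commit: reject trailing whitespace."""
import subprocess
from pathlib import Path
from typing import List


def staged_files() -> List[str]:
    """Files added, copied or modified in the index (must run inside a Git repo)."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, text=True, check=True,
    )
    return [name for name in result.stdout.splitlines() if name]


def trailing_whitespace_lines(text: str) -> List[int]:
    """1-based numbers of lines that end in spaces or tabs."""
    return [n for n, line in enumerate(text.splitlines(), start=1)
            if line != line.rstrip(" \t")]


def main(paths: List[str]) -> int:
    """Report offending lines; a non-zero return value should abort the commit."""
    status = 0
    for path in paths:
        # Simplification: reads the working-tree copy, not the staged blob.
        text = Path(path).read_text(encoding="utf-8", errors="replace")
        for lineno in trailing_whitespace_lines(text):
            print(f"{path}:{lineno}: trailing whitespace")
            status = 1
    return status
```

To use it, you would save it as `.git/hooks/pre-commit`, make it executable with `chmod +x`, and finish it with `raise SystemExit(main(staged_files()))`. It would then run on every `git commit`.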

Installation of pre-commit

Python

 workon a_virtual_env  # Optional
 pip install pre-commit

Conda Environment

 conda activate conda_env
 conda install -c \
        conda-forge pre-commit

GNU/Linux

# Arch
 pacman -Syu python-pre-commit
# Gentoo
 emerge -av pre-commit
# Debian/Ubuntu
 sudo apt install pre-commit

OSX

 sudo port install pre-commit
 brew install pre-commit

.pre-commit-config.yaml

Lives in the root of a project under Git version control.

repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.3.0
    hooks:
      - id: trailing-whitespace
        types: [file, text]
      - id: end-of-file-fixer
        types: [file, text]
      - id: check-docstring-first
      - id: check-case-conflict
      - id: check-yaml
  - repo: https://github.com/psf/black
    rev: 22.10.0
    hooks:
      - id: black
        types: [python]
        additional_dependencies: ['click==8.0.4']
        args: ["--config=pyproject.toml"]
  - repo: https://github.com/pycqa/flake8.git
    rev: 5.0.4
    hooks:
      - id: flake8
        args: ["--config=setup.cfg"]
        additional_dependencies: [flake8-print]
        types: [python]
  - repo: local
    hooks:
      - id: pylint
        args: ["--rcfile=.pylintrc"]
        name: Pylint
        entry: python -m pylint
        language: system
        files: \.py$
  - repo: local
    hooks:
      - id: pytest
        name: pytest
        entry: pytest --cov
        language: system
        pass_filenames: false
        always_run: true

Hook configuration - pre-commit

  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.3.0  # Use the rev you want to point at
    hooks:
      - id: trailing-whitespace
        types: [file, text]
      - id: end-of-file-fixer
        types: [file, text]
      - id: check-docstring-first
      - id: check-case-conflict
      - id: check-yaml

Hook configuration - Black

  - repo: https://github.com/psf/black
    rev: 22.10.0
    hooks:
      - id: black
        types: [python]
        additional_dependencies: ['click==8.0.4']
        args: ["--config=pyproject.toml"]
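A note on `args`: pre-commit passes each list item to the tool as a single command-line argument, without shell-style splitting. An entry such as `"--config pyproject.toml"` therefore arrives as one token containing a space, which the tool may reject or misread. Writing `"--config=pyproject.toml"`, or using two separate items, avoids this. The toy `argparse` parser below shows the effect; it is not flake8's or black's real CLI.

```python
import argparse

# Toy parser loosely modelled on a linter's CLI; NOT flake8's or black's real parser.
parser = argparse.ArgumentParser(prog="toy-lint")
parser.add_argument("--config")
parser.add_argument("filenames", nargs="*")

# pre-commit hands each item of `args` to the tool as one argv token.
broken = parser.parse_args(["--config setup.cfg", "sample.py"])
fixed = parser.parse_args(["--config=setup.cfg", "sample.py"])
split = parser.parse_args(["--config", "setup.cfg", "sample.py"])

print(broken.config, broken.filenames)  # None ['--config setup.cfg', 'sample.py']
print(fixed.config, fixed.filenames)    # setup.cfg ['sample.py']
```

argparse treats a token containing a space as a positional argument, so the "broken" form silently becomes a filename instead of setting the config.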

Hook Configuration - Local

  - repo: local
    hooks:
      - id: pytest
        name: pytest
        entry: pytest --cov
        language: system
        pass_filenames: false
        always_run: true
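A local hook's `entry` can be any command, including your own script. pre-commit passes the names of matching staged files as command-line arguments (unless `pass_filenames: false` is set) and treats a non-zero exit status as a failure. Below is a minimal sketch of a hypothetical `scripts/check_breakpoints.py`, which would be registered with `entry: python scripts/check_breakpoints.py`, `language: system` and `types: [python]`. In practice the `debug-statements` hook from pre-commit-hooks already covers this case.

```python
"""Hypothetical local hook: fail if any file still calls breakpoint()."""
import ast
import sys
from pathlib import Path
from typing import List, Optional, Sequence


def breakpoint_lines(source: str) -> List[int]:
    """Line numbers of direct calls to the built-in breakpoint()."""
    tree = ast.parse(source)
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "breakpoint"
    )


def main(argv: Optional[Sequence[str]] = None) -> int:
    """Check each file named on the command line; return 1 if any breakpoint() remains."""
    paths = sys.argv[1:] if argv is None else argv
    status = 0
    for path in paths:
        source = Path(path).read_text(encoding="utf-8")
        for lineno in breakpoint_lines(source):
            print(f"{path}:{lineno}: breakpoint() call left in code")
            status = 1
    return status
```

As a standalone script it would end with `raise SystemExit(main())`.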

pre-commit installation

 git add .pre-commit-config.yaml
 pre-commit --version
pre-commit 2.20.0
 pre-commit install
pre-commit installed at .git/hooks/pre-commit
 pre-commit install-hooks    # Optional

Check existing Files

 pre-commit run --all-files
[INFO] Initializing environment for https://github.com/pre-commit/pre-commit-hooks.
[INFO] Initializing environment for https://github.com/psf/black.
[INFO] Initializing environment for https://github.com/pycqa/flake8.git.
[INFO] Installing environment for https://github.com/pre-commit/pre-commit-hooks.
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
[INFO] Installing environment for https://github.com/psf/black.
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
[INFO] Installing environment for https://github.com/pycqa/flake8.git
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
Check Yaml.....................................................Passed
Fix End of Files...............................................Passed
Check for case conflicts.......................................Passed
Check docstring is first.......................................Failed
Trim Trailing Whitespace.......................................Failed
- hook id: trailing-whitespace
- exit code: 1

Files were modified by this hook. Additional output:

Fixing sample.py

black..........................................................Failed
reformatted sample.py

All done! ✨ 🍰 ✨
1 file reformatted.
flake8.........................................................Failed
- hook id: flake8
- exit code: 1

sample.py:1:1: D100 Missing docstring in public module
sample.py:1:1: F401 'numpy as np' imported but unused
sample.py:2:1: F401 'pandas as pd' imported but unused
sample.py:7:36: F821 undefined name 'Union'
sample.py:8:80: E501 line too long (87 > 79 characters)
sample.py:9:80: E501 line too long (135 > 79 characters)
sample.py:12:5: E722 do not use bare 'except'

pylint.........................................................Failed
- hook id: pylint
- exit code: 1

************* Module python.sample
sample.py:9:0: C0301: Line too long (135/120) (line-too-long)
sample.py:1:0: C0114: Missing module docstring (missing-module-docstring)
sample.py:7:35: E0602: Undefined variable 'Union' (undefined-variable)
sample.py:12:4: W0706: The except handler raises immediately (try-except-raise)
sample.py:4:0: C0411: standard import "from pathlib import Path" should be placed before "import numpy as np" (wrong-import-order)
sample.py:1:0: W0611: Unused numpy imported as np (unused-import)
sample.py:2:0: W0611: Unused pandas imported as pd (unused-import)

-------------------------------------
Your code has been rated at -10.00/10

pytest.........................................................Failed
================= test session starts =================
platform linux -- Python 3.7.11, pytest-7.1.1, pluggy-1.0.0
rootdir: /home/neil/work/projects/pre-commit/assets/python
plugins: hydra-core-1.2.0, regtest-1.5.0, cov-3.0.0
collected 0 items / 1 error

======================= ERRORS ========================
___________ ERROR collecting test_sample.py ___________
test_sample.py:1: in <module>
    from .sample import find_files
sample.py:7: in <module>
    def find_and_load_files(file_path: Union[str, Path], file_type: str):
E   NameError: name 'Union' is not defined
================ short test summary info ===============
ERROR test_sample.py - NameError: name 'Union' is not defined
!!!!!!!! Interrupted: 1 error during collection !!!!!!!!
=================== 1 error in 0.49s ===================

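Who’s to Blame? git blame --ignore-rev

Formatting-only commits (e.g. the first run of black over a codebase) can be hidden from git blame with --ignore-rev <commit>, or by listing them in a file passed to --ignore-revs-file.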

Correcting Errors

Original sample.py

import numpy as np
import pandas as pd

from pathlib import Path


def find_files(file_path: Union[str, Path], file_ext: str) -> list:
    """Recursively find files of the stated type along the given file path."""
    # We have a really long comment on this line just for demonstration purposes so that we can generate a few errors that need linting
    try:
        return list(Path(file_path).rglob(f"*{file_ext}"))
    except:
        raise

pylint errors

************* Module python.sample
sample.py:9:0: C0301: Line too long (135/120) (line-too-long)
sample.py:1:0: C0114: Missing module docstring (missing-module-docstring)
sample.py:7:35: E0602: Undefined variable 'Union' (undefined-variable)
sample.py:12:4: W0706: The except handler raises immediately (try-except-raise)
sample.py:4:0: C0411: standard import "from pathlib import Path" should be placed before "import numpy as np" (wrong-import-order)
sample.py:1:0: W0611: Unused numpy imported as np (unused-import)
sample.py:2:0: W0611: Unused pandas imported as pd (unused-import)

-------------------------------------
Your code has been rated at -10.00/10

Linted

"""Find files of a given type."""
from pathlib import Path
from typing import Union


def find_files(file_path: Union[str, Path], file_ext: str) -> list:
    """Recursively find files of the stated type along the given file path."""
    return list(Path(file_path).rglob(f"*{file_ext}"))
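You can check that the linted function behaves as expected, without depending on the current working directory, by running it against a temporary directory tree. This sketch copies `find_files()` using the `file_ext` keyword that `test_sample.py` passes. The `demo()` helper and the files it creates are hypothetical.

```python
"""Find files of a given type."""
import tempfile
from pathlib import Path
from typing import List, Union


def find_files(file_path: Union[str, Path], file_ext: str) -> List[Path]:
    """Recursively find files of the stated type along the given file path."""
    return list(Path(file_path).rglob(f"*{file_ext}"))


def demo() -> List[str]:
    """Build a small temporary tree and return the names find_files() reports."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "pkg").mkdir()
        (root / "sample.py").write_text("")
        (root / "pkg" / "test_sample.py").write_text("")  # found recursively
        (root / "notes.txt").write_text("")  # wrong extension, ignored
        return sorted(path.name for path in find_files(root, ".py"))
```

`demo()` returns the names of both Python files, including the one in the subdirectory, and leaves out `notes.txt`.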

Add and commit changes

git add sample.py
git commit -m "Linting sample.py"
Check Yaml.....................................................Passed
Fix End of Files...............................................Passed
Check for case conflicts.......................................Passed
Check docstring is first.......................................Passed
Trim Trailing Whitespace.......................................Passed
black..........................................................Passed
flake8.........................................................Passed
pylint.........................................................Passed
pytest.........................................................Passed
[INFO] Restored changes from /home/neil/.cache/pre-commit/patch1674045267-394193.
[main 05b1568] Linting sample.py
 1 file changed, 2 insertions(+), 2 deletions(-)

Continuous Integration/Delivery (CI/CD)

GitHub Actions

  • Actions are hooks that run when triggered by events, e.g. a push to the main branch or a tag beginning with v.
  • Useful for CI/CD.
  • Defined in .github/workflows/*.yaml
  • Write your own .github/workflows/pre-commit.yaml or…

pre-commit.ci

  • Currently supports GitHub, with other platforms planned.
  • Zero configuration: only .pre-commit-config.yaml is needed.
  • Automatically fixes and commits some formatting issues, so developers do not have to reformat by hand.
  • Only runs on Pull Request commits (not on individual branches).
  • Automatically updates .pre-commit-config.yaml for you (e.g. new rev).
  • Free for open source repositories (paid version for private/organisation repositories).

Configuration (.pre-commit-config.yaml)


ci:
  autofix_prs: true
  autofix_commit_msg: '[pre-commit.ci] Fixing issues with pre-commit'
  autoupdate_schedule: weekly
  autoupdate_commit_msg: '[pre-commit.ci] pre-commit automatically updated.'
  skip: [pylint, pytest] # Optionally list ids of hooks to skip on CI
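`skip` only applies on pre-commit.ci. Running `pre-commit` locally still executes every hook, although the `SKIP` environment variable serves a similar purpose there (e.g. `SKIP=pylint git commit`). Skipping pylint and pytest on CI is presumably because these `language: system` hooks rely on tools installed on the developer's machine. The sketch below shows the selection logic using a hypothetical in-memory copy of the configuration rather than parsing the YAML.

```python
from typing import Any, Dict, List

# Hypothetical in-memory copy of .pre-commit-config.yaml (normally parsed from YAML).
CONFIG: Dict[str, Any] = {
    "ci": {"autoupdate_schedule": "weekly", "skip": ["pylint", "pytest"]},
    "repos": [
        {"repo": "https://github.com/pre-commit/pre-commit-hooks",
         "hooks": [{"id": "trailing-whitespace"}, {"id": "end-of-file-fixer"},
                   {"id": "check-docstring-first"}, {"id": "check-case-conflict"},
                   {"id": "check-yaml"}]},
        {"repo": "https://github.com/psf/black", "hooks": [{"id": "black"}]},
        {"repo": "https://github.com/pycqa/flake8.git", "hooks": [{"id": "flake8"}]},
        {"repo": "local", "hooks": [{"id": "pylint"}]},
        {"repo": "local", "hooks": [{"id": "pytest"}]},
    ],
}


def all_hook_ids(config: Dict[str, Any]) -> List[str]:
    """Every hook id, in the order it appears in the config."""
    return [hook["id"] for repo in config["repos"] for hook in repo["hooks"]]


def ci_hook_ids(config: Dict[str, Any]) -> List[str]:
    """Hook ids left to run on pre-commit.ci once `ci: skip` is applied."""
    skipped = set(config.get("ci", {}).get("skip", []))
    return [hook_id for hook_id in all_hook_ids(config) if hook_id not in skipped]
```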

pre-commit.ci Setup

Manage Repos for pre-commit.ci

GitLab pre-commit

Pre-commit GitHub Action

Pre-commit Pass

Pre-commit Fail

Summary

  • ✔️ pre-commit is useful for automating repetitive tasks.
  • ✔️ Helps keep git history clean (no more “linting code” commit messages).
  • ✔️ Improves code quality by ensuring style guides are adhered to.
  • ✔️ Automates running test suites and ensures they pass.
  • ✔️ Integrates with CI/CD on GitHub and others.
  • ✔️ Frees up developer time.

Alternatives

  • Megalinter.io - comparable, lots of languages, lots of tools.
  • Codacy - hosted service for GitHub, GitLab and Bitbucket repositories, not run locally.
  • No doubt many others I’m not aware of!

Bonus - Linting with IDE

Popular IDEs have tools to run linting automatically on file save…
