Contributing guide#

Scanpy provides extensive developer documentation, most of which also applies to this project. This document does not reproduce that content in full; instead, it summarizes the most important information to get you started contributing.

We assume that you are already familiar with git and with making pull requests on GitHub. If not, please refer to the scanpy developer guide.

Installing dev dependencies#

In addition to the packages needed to use this package, you need additional Python packages to run tests and build the documentation.

The easiest way is to get familiar with hatch environments, with which these tasks become one-liners:

hatch test  # defined in the table [tool.hatch.envs.hatch-test] in pyproject.toml
hatch run docs:build  # defined in the table [tool.hatch.envs.docs]

Handling anndata objects#

The central data structure of alphapepttools is the anndata.AnnData object. All functions should be compatible with anndata.AnnData.

Functions that act on the omics data in the anndata object (typically in the .pp and .tl modules) should generally follow this call signature:

alphapepttools.pp.func(adata: ad.AnnData, ..., layer: str | None = None, copy: bool = False) -> None | ad.AnnData

alphapepttools.tl.func(adata: ad.AnnData, ..., layer: str | None = None, copy: bool = False) -> None | ad.AnnData

Layer modification: functions take an anndata.AnnData object and modify/update a specific measurement layer in it. By default (layer=None), this is the anndata.AnnData.X attribute; otherwise, it is the specified layer.

In-place modification: by default (copy=False), the anndata.AnnData object is modified in place; the current object is updated and the function returns None. If copy=True, an updated copy of the object is returned and the original object remains unchanged.

This behaviour is adapted from scanpy and aims to maximize the compatibility of the interfaces.

Examples#

Default behaviour:

adata.layers["original"] = adata.X.copy()

return_value = alphapepttools.pp.func(adata)
assert return_value is None
assert not np.array_equal(adata.X, adata.layers["original"])

Act on a specific layer:

adata.layers["original"] = adata.X.copy()
adata.layers["new_layer"] = adata.X.copy()

return_value = alphapepttools.pp.func(adata, layer="new_layer")
assert return_value is None

# adata.X is unchanged
assert np.array_equal(adata.X, adata.layers["original"])

# New layer is changed
assert not np.array_equal(adata.layers["new_layer"], adata.layers["original"])

Return an updated copy:

adata_original = adata.copy()
adata_new = alphapepttools.pp.func(adata, copy=True)
# Returns an updated anndata object
assert not np.array_equal(adata.X, adata_new.X)

# The original anndata remains unchanged
assert np.array_equal(adata.X, adata_original.X)

Code-style#

This package uses pre-commit to enforce a consistent code style. On every commit, the pre-commit checks either automatically fix issues with the code or raise an error message.

To enable pre-commit locally, simply run

pre-commit install

in the root of the repository. Pre-commit will automatically download all dependencies when it is run for the first time.

Finally, most editors have an autoformat-on-save feature. Consider enabling it for ruff and prettier.

Writing tests#

This package uses pytest for automated testing. Please write tests for every function added to the package.
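A minimal test might look like the following. The function double is a hypothetical stand-in for a package function; the point is the shape of a pytest test: plain functions prefixed with test_ containing bare assertions.

```python
import numpy as np


def double(x: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a package function under test."""
    return 2 * x


def test_double():
    # Compare floating-point arrays with a tolerance, not ==
    x = np.array([1.0, 2.0, 3.0])
    np.testing.assert_allclose(double(x), np.array([2.0, 4.0, 6.0]))


def test_double_empty():
    # Edge case: empty input should round-trip without error
    assert double(np.array([])).size == 0
```

pytest discovers such functions automatically in files named test_*.py, so no registration is needed.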

Most IDEs integrate with pytest and provide a GUI to run tests. Just point yours to one of the environments returned by

hatch env create hatch-test  # create test environments for all supported versions
hatch env find hatch-test  # list all possible test environment paths

Alternatively, you can run all tests from the command line by executing

hatch test  # test with the highest supported Python version
# or
hatch test --all  # test with all supported Python versions

in the root of the repository.

Continuous integration#

Continuous integration will automatically run the tests on all pull requests, testing against the minimum and maximum supported Python versions.

Additionally, there’s a CI job that tests against pre-releases of all dependencies (if there are any). The purpose of this check is to detect incompatibilities with new package versions early on, giving you time to fix the issue or reach out to the dependency’s developers before the package is released to a wider audience.

Publishing a release#

Updating the version number#

Before making a release, you need to update the version number in the pyproject.toml file. Please adhere to Semantic Versioning; in brief:

Given a version number MAJOR.MINOR.PATCH, increment the:

  1. MAJOR version when you make incompatible API changes,

  2. MINOR version when you add functionality in a backwards compatible manner, and

  3. PATCH version when you make backwards compatible bug fixes.

Additional labels for pre-release and build metadata are available as extensions to the MAJOR.MINOR.PATCH format.

Once you are done, commit and push your changes and navigate to the “Releases” page of this project on GitHub. Specify vX.X.X as a tag name and create a release. For more information, see managing GitHub releases. This will automatically create a git tag and trigger a GitHub workflow that creates a release on PyPI.

Writing documentation#

Please write documentation for new or changed features and use cases. This project uses sphinx to build the documentation.

See scanpy’s Documentation for more information on how to write your own.

Tutorials with myst-nb and jupyter notebooks#

The documentation is set up to render jupyter notebooks stored in the docs/notebooks directory using myst-nb. Currently, only notebooks in .ipynb format are supported; they are included with both their input and output cells. It is your responsibility to update and re-run a notebook whenever necessary.

If you are interested in automatically running notebooks as part of the continuous integration, please check out this feature request in the cookiecutter-scverse repository.

Hints#

  • If you refer to objects from other packages, please add an entry to intersphinx_mapping in docs/conf.py. Only if you do so can sphinx automatically create a link to the external documentation.

  • If building the documentation fails because of a missing link that is outside your control, you can add an entry to the nitpick_ignore list in docs/conf.py.
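As a sketch, the two settings mentioned above could look like this in docs/conf.py. The mapped projects and the ignored target are illustrative assumptions; adapt them to the dependencies you actually reference.

```python
# docs/conf.py (excerpt) -- illustrative values, adapt to your dependencies

# Map external projects so sphinx can link e.g. :class:`anndata.AnnData`
intersphinx_mapping = {
    "python": ("https://docs.python.org/3", None),
    "numpy": ("https://numpy.org/doc/stable/", None),
    "anndata": ("https://anndata.readthedocs.io/en/stable/", None),
}

# Suppress nitpicky warnings for references sphinx cannot resolve;
# each entry is a (reference type, target) pair
nitpick_ignore = [
    ("py:class", "SomeUnresolvableType"),  # hypothetical example target
]
```

Each intersphinx entry maps a project name to its documentation base URL; sphinx fetches the objects.inv inventory from there to resolve cross-references.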

Building the docs locally#

hatch run docs:build
hatch run docs:open