alphapepttools.data.get_data


alphapepttools.data.get_data(study, output_dir=None)

Download data from a specific study.

Downloads proteomics data from predefined study datasets and returns the path to the downloaded file. The data can then be loaded using alphapepttools readers for further analysis.

Parameters:
  • study (str) – Name of the study to download. Use available_data() to list all available studies.

  • output_dir (str | Path | None (default: None)) – Directory into which the data is downloaded. If None, the current working directory is used.
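The documented default for output_dir can be pictured with a small sketch. The helper name resolve_output_dir is hypothetical and is not part of alphapepttools; it only mirrors the behavior described above (None falls back to the current working directory) and makes no claim about how the library implements it internally.

```python
from pathlib import Path
from typing import Optional, Union


def resolve_output_dir(output_dir: Optional[Union[str, Path]] = None) -> Path:
    """Hypothetical helper: mirror the documented output_dir default.

    This is an illustration only, not the library's actual code.
    """
    if output_dir is None:
        # Documented behavior: fall back to the current working directory.
        return Path.cwd()
    # Accept both str and Path inputs and normalize to Path.
    return Path(output_dir)


print(resolve_output_dir())              # current working directory
print(resolve_output_dir("./my_data"))   # the given directory as a Path
```

Passing an explicit directory, as in the examples further down, keeps downloads out of whatever directory the script happens to run from.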

Return type:
  Path

Returns:
  Path to the downloaded data file

Raises:
  KeyError – If the specified study name is not found in the collection
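Since an unknown study name raises KeyError, callers may want to catch it and point the user to available_data(). The following is a minimal sketch under stated assumptions: try_get_data is a hypothetical wrapper, not part of alphapepttools, and it takes the download function as an argument so the pattern can be tried without network access.

```python
from pathlib import Path
from typing import Callable, Optional, Union

PathLike = Union[str, Path]


def try_get_data(
    get_data: Callable[..., PathLike],
    study: str,
    output_dir: Optional[PathLike] = None,
) -> Optional[Path]:
    """Hypothetical wrapper: return the downloaded path, or None if the study is unknown."""
    try:
        return Path(get_data(study, output_dir=output_dir))
    except KeyError:
        # Documented failure mode: the study name is not in the collection.
        print(f"Unknown study: {study!r}. Check available_data() for valid names.")
        return None


# Intended usage (requires alphapepttools and network access):
#   import alphapepttools as at
#   path = try_get_data(at.data.get_data, "pelsa_report_diann")
```

Returning None instead of re-raising is a design choice for interactive use; in a pipeline it is usually better to let the KeyError propagate so that a misspelled study name fails loudly.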

Examples

Download and load PELSA data:

import tempfile
import pandas as pd
import alphapepttools as at

# Download the data using the alphapepttools data module
report_path = at.data.get_data("pelsa_report_diann", output_dir=tempfile.mkdtemp())

# Get the full report as DataFrame
full_report = pd.read_parquet(report_path)

# Create AnnData object with protein level data
adata_protein = at.io.read_psm_table(
    file_paths=report_path,
    search_engine="diann",
    level="proteins",
    var_columns=["genes"],
)

Download to a specific directory:

from pathlib import Path
import alphapepttools as at

# Download to a specific project folder
data_dir = Path("./my_project/data")
data_dir.mkdir(parents=True, exist_ok=True)

file_path = at.data.get_data("bader2020_pg_alphadia", output_dir=data_dir)
print(f"Data downloaded to: {file_path}")

See also

available_data – List all available study datasets

StudyData.download – Lower-level download method for individual studies