alphapepttools.tl.map_genes_to_protein_groups
- alphapepttools.tl.map_genes_to_protein_groups(id2gene_map, protein_groups, delimiter=';')
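Map gene names to protein groups using the provided id2gene_map mapping.
Protein groups may consist of multiple UniProt IDs separated by a delimiter. This function iterates over each protein group, splits it into its UniProt IDs, and assigns the unique gene names of those IDs to the group. For example, "ID1;ID2" becomes "GN1" when both IDs map to GN1, and "ID3;ID4" becomes "GN3;GN4". The result is a list with one entry per protein group.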
- Parameters:
id2gene_map (dict) – Dictionary mapping UniProt IDs to gene names
protein_groups (list) – List of protein group identifiers, where each identifier may consist of multiple UniProt IDs
delimiter (str, optional) – Delimiter used to separate UniProt IDs in the protein group identifiers, by default “;”
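The splitting and deduplication described above can be sketched in plain Python. This is a hypothetical re-implementation for illustration only (map_genes_sketch is not part of alphapepttools), written so that it reproduces the documented example. Two details are assumptions rather than documented behaviour: UniProt IDs missing from id2gene_map are skipped, and the gene names are joined with the same delimiter used to split the group.

```python
def map_genes_sketch(id2gene_map, protein_groups, delimiter=";"):
    """Hypothetical sketch of protein-group-to-gene mapping (not the library code)."""
    result = []
    for group in protein_groups:
        genes = []
        for uniprot_id in group.split(delimiter):
            gene = id2gene_map.get(uniprot_id)
            # Assumption: IDs without a known gene are skipped;
            # duplicate gene names are kept only once, in first-seen order.
            if gene is not None and gene not in genes:
                genes.append(gene)
        # Assumption: unique gene names are re-joined with the input delimiter.
        result.append(delimiter.join(genes))
    return result


id2gene_map = {"ID0": "GN0", "ID1": "GN1", "ID2": "GN1", "ID3": "GN3", "ID4": "GN4"}
print(map_genes_sketch(id2gene_map, ["ID0", "ID1;ID2", "ID3;ID4"]))
# ['GN0', 'GN1', 'GN3;GN4']
```

Collecting genes in a list rather than a set keeps the first-seen order, so "ID3;ID4" yields "GN3;GN4" instead of an arbitrary ordering.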
Examples
You can map a list of protein groups, each consisting of one or more UniProt IDs, to gene names:
from alphapepttools.tl.tools import map_genes_to_protein_groups

id2gene_map = {"ID0": "GN0", "ID1": "GN1", "ID2": "GN1", "ID3": "GN3", "ID4": "GN4"}
protein_groups = ["ID0", "ID1;ID2", "ID3;ID4"]
map_genes_to_protein_groups(id2gene_map, protein_groups, delimiter=";")
# ["GN0", "GN1", "GN3;GN4"]
To map gene names to an AnnData object, you can use the get_id2gene_map() function to create a mapping from a FASTA file or string and then assign the extracted gene names to the adata.var attribute:

from alphapepttools.tl.tools import get_id2gene_map, map_genes_to_protein_groups

fasta = '''\
>tr|ID0|ID0_HUMAN Protein1 OS=Homo sapiens OX=9606 GN=GN0 PE=1 SV=1
PEPTIDEKPEPTIDEK
>tr|ID1|ID1_HUMAN Protein1 OS=Homo sapiens OX=9606 GN=GN1 PE=1 SV=1
PEPTIDEKPEPTIDEK
'''
mapping = get_id2gene_map(fasta, source_type="string")
mapping
# {'ID0': 'GN0', 'ID1': 'GN1'}

# adata is an existing AnnData object whose var_names are the protein group IDs
adata.var
# Empty DataFrame
# Columns: []
# Index: [ID0, ID1]

adata.var["gene_id"] = map_genes_to_protein_groups(
    id2gene_map=mapping, protein_groups=adata.var_names
)