cellmaps_utils package

Description of various functions and methods provided by this package.

cellmaps_utils.logutils module

Contains methods used internally by the Cell Maps tools to set up logging

cellmaps_utils.logutils.setup_cmd_logging(args)[source]

Sets up logging based on parsed command line arguments. If args.logconf is set, that configuration is used; otherwise args.verbose determines the logging level for this module.

This function assumes the following:

  • args.logconf exists and is None or set to str containing path to logconf file

  • args.verbose exists and is set to int to one of these values:

    • 0 = no logging

    • 1 = critical

    • 2 = error

    • 3 = warning

    • 4 = info

    • 5 = debug

Parameters:

args (argparse.Namespace) – parsed command line arguments from argparse

Raises:

AttributeError – If args is None, if args.logconf is missing, or if args.verbose is None or missing
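The verbosity scale above can be sketched as a mapping onto the standard library's logging levels. This is a minimal illustration, not the package's implementation: verbosity_to_level is a hypothetical helper, and the arithmetic is simply one way to reproduce the table listed above.

```python
import argparse
import logging


def verbosity_to_level(verbose):
    """Map the 0-5 verbosity scale onto stdlib logging levels.

    Hypothetical helper: 0 yields a level above CRITICAL so nothing
    is emitted, 1 yields CRITICAL, and 5 yields DEBUG.
    """
    if not 0 <= verbose <= 5:
        raise ValueError('verbose must be between 0 and 5')
    return logging.CRITICAL + 10 - (10 * verbose)


# Namespace shaped the way setup_cmd_logging() is documented to expect:
# a logconf attribute (None here) and an integer verbose attribute.
args = argparse.Namespace(logconf=None, verbose=3)
level = verbosity_to_level(args.verbose)
logging.getLogger('demo').setLevel(level)
```

In practice argparse would populate logconf and verbose from command line flags; the Namespace is built by hand here only to keep the sketch self-contained.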

cellmaps_utils.logutils.setup_filelogger(outdir=None, handlerprefix='cellmaps')[source]

Sets up a logger that writes all log messages of debug level and higher to outdir/OUTPUT_LOG_FILE and all log messages of error level and higher to outdir/ERROR_LOG_FILE

Parameters:
  • outdir (str) – directory in which to store the OUTPUT_LOG_FILE and ERROR_LOG_FILE files

  • handlerprefix (str) – prefix of the name to give to handlers and formatters; if None, the code sets the value to cellmaps

cellmaps_utils.logutils.write_task_finish_json(outdir=None, start_time=None, end_time=None, status=None)[source]

Writes a TASK_FILE_PREFIX ## TASK_FINISH_FILE_SUFFIX file to the outdir directory, where ## is the start_time value

from cellmaps_utils import logutils
import time

logutils.write_task_finish_json(outdir='./mydir', start_time=int(time.time())-10,
                                end_time=int(time.time()),
                                status=0)

Parameters:
  • outdir (str) – directory to write file

  • start_time (int) – time in seconds since epoch; if set to -1 or None, the value is set to the current time

  • end_time (int) – time in seconds since epoch; if set to -1 or None, the value is set to the current time

  • status (int) – status of task, 0 means success, otherwise error

cellmaps_utils.logutils.write_task_start_json(outdir=None, start_time=None, data=None, version=None)[source]

Writes a TASK_FILE_PREFIX ## TASK_START_FILE_SUFFIX file with information about what is to be run. The ## in the name is the value of start_time

from cellmaps_utils import logutils
import time

logutils.write_task_start_json(outdir='./mydir', start_time=int(time.time()),
                               data={'someparam': 'some value'},
                               version='1.0.0')

Parameters:
  • outdir (str) – directory to write file

  • start_time (int) – time in seconds since epoch; if set to -1 or None, the value is set to the current time

  • data (dict) – additional data to persist in the task start file

  • version (str) – Version of software

cellmaps_utils.provenance module

Contains wrapper functionality for calls to FAIRSCAPE CLI

class cellmaps_utils.provenance.ProvenanceUtil(fairscape_binary='fairscape-cli', default_date_format_str='%Y-%m-%d', raise_on_error=False)[source]

Bases: object

Wrapper around FAIRSCAPE-cli calls

Constructor

Parameters:
  • fairscape_binary (str) – FAIRSCAPE command line binary. If no path separators are included in this value (for example, no / on Linux or Mac), this code assumes the binary resides in the same directory as the python binary executing this script. To bypass this, set the value to a full path, for example /tmp/foo.py

  • default_date_format_str (str) – Default date format string

  • raise_on_error (bool) – Flag to determine if exceptions should be raised on errors
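The lookup rule described for fairscape_binary can be sketched as follows. This is an illustration under stated assumptions rather than the package's own code: resolve_binary is a hypothetical helper, and it only mirrors the rule that a bare name is looked up next to the running Python interpreter while a value containing a path separator is used as given.

```python
import os
import sys


def resolve_binary(binary='fairscape-cli'):
    """Return the path that would be used for the given binary value.

    Hypothetical helper: a value containing a path separator is
    returned unchanged; a bare name is joined to the directory that
    holds the Python interpreter running this script.
    """
    separators = [os.sep] + ([os.altsep] if os.altsep else [])
    if any(sep in binary for sep in separators):
        return binary
    return os.path.join(os.path.dirname(sys.executable), binary)


print(resolve_binary('fairscape-cli'))
print(resolve_binary(os.path.join(os.sep, 'tmp', 'foo.py')))
```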

static example_dataset_provenance(requiredonly=True, with_ids=False)[source]

Returns example provenance dataset dict

Parameters:
  • requiredonly (bool) – If True only output required fields, otherwise output all fields. This is ignored if with_ids parameter is True

  • with_ids (bool) – If True ignore requiredonly and just output dict where caller has dataset id

Returns:

Example provenance dictionary

Return type:

dict

get_default_date_format_str()[source]

Gets default date format string set via constructor

Returns:

default date format string, usually something like %Y-%m-%d

Return type:

str

get_id_of_rocrate(rocrate)[source]

Gets id of RO-Crate

Parameters:

rocrate (str or dict) – RO-Crate dict, directory containing a ro-crate-metadata.json file, or path to a file assumed to be an RO-Crate metadata file

Returns:

get_login()[source]

Attempts to get login of user

Returns:

login of user or empty string if unable to obtain

Return type:

str

get_merged_rocrate_provenance_attrs(rocrate=None, override_name=None, override_project_name=None, override_organization_name=None, extra_keywords=None, keywords_to_preserve=6, merged_delimiter='|')[source]

Creates a merged provenance attributes object when given one or more RO-Crates. It does this by the following rules:

Values for name, project_name, and organization_name are each put into a set for uniqueness, sorted alphabetically, and joined using the value of merged_delimiter.

If override_name, override_project_name, or override_organization_name is not None, that value is used in lieu of the merged data described above.

For keywords, the first keywords_to_preserve elements are put into respective sets for uniqueness, joined using the value of merged_delimiter, and put back into a list. Any entries in extra_keywords are appended to this list.

The description is a merging of the keywords list with a space delimiter

Parameters:
  • rocrate (str or dict or list) – RO-Crate dict, directory containing a ro-crate-metadata.json file, path to a file assumed to be an RO-Crate metadata file, or a list of any of these

  • override_name (str) – If not None, overrides name returned

  • override_project_name (str) – If not None, overrides project name returned

  • override_organization_name (str) – If not None, overrides organization-name

  • extra_keywords (list or str) – Any extra keywords to append

  • keywords_to_preserve (int) – Number of keywords to preserve. A value of 5 means keep the first 5. None means preserve all keywords

  • merged_delimiter (str) – Delimiter used to join merged values. Default is ‘|’

Raises:

CellMapsProvenanceError – If rocrate, extra_keywords

Returns:

Merged rocrate provenance attributes

Return type:

ROCrateProvenanceAttributes
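The merge rules for get_merged_rocrate_provenance_attrs() can be sketched with plain lists. This is a simplified illustration, not the method's implementation: merge_unique and merge_keywords are hypothetical helpers, and sorting the keyword sets is an assumption made here so the output is deterministic (the description above only states sorting for the name fields).

```python
def merge_unique(values, delimiter='|'):
    """Join the unique, non-None values in alphabetical order."""
    return delimiter.join(sorted({v for v in values if v is not None}))


def merge_keywords(keyword_lists, keywords_to_preserve=6,
                   delimiter='|', extra_keywords=None):
    """Merge keywords position by position across several crates.

    Position i of the result holds the unique values found at
    position i of every input list, joined with the delimiter.
    """
    longest = max((len(kw) for kw in keyword_lists), default=0)
    limit = longest if keywords_to_preserve is None \
        else min(keywords_to_preserve, longest)
    merged = []
    for i in range(limit):
        merged.append(merge_unique([kw[i] for kw in keyword_lists
                                    if i < len(kw)], delimiter))
    merged.extend(extra_keywords or [])
    return merged


names = ['crate B', 'crate A', 'crate B']
keywords = [['proj', 'release1', 'HeLa'],
            ['proj', 'release2', 'HeLa']]

merged_name = merge_unique(names)
merged_keywords = merge_keywords(keywords, keywords_to_preserve=2,
                                 extra_keywords=['merged'])
# The description is the keyword list joined with spaces.
description = ' '.join(merged_keywords)
```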

get_name_project_org_of_rocrate(rocrate)[source]

Gets name, project, and organization name of RO-Crate

Parameters:

rocrate (str or dict) – RO-Crate dict, directory containing a ro-crate-metadata.json file, or path to a file assumed to be an RO-Crate metadata file

Returns:

(name, project, organization-name)

Return type:

tuple

get_rocrate_as_dict(rocrate_path)[source]

Loads RO-Crate as a dict

Parameters:

rocrate_path (str) – Directory containing a ro-crate-metadata.json file or path to a file assumed to be an RO-Crate metadata file

Raises:

CellMapsProvenanceError – If rocrate_path is None, or if raise_on_error passed into the constructor is True and there is an issue parsing the RO-Crate metadata file

Returns:

RO-Crate

Return type:

dict

get_rocrate_provenance_attributes(rocrate)[source]

Gets provenance attributes for an RO-Crate

Parameters:

rocrate (str or dict) – RO-Crate dict or directory containing ro-crate-metadata.json file or path to file assumed to be RO-Crate metadata file

Returns:

Return type:

ROCrateProvenanceAttributes

register_computation(rocrate_path, name='', run_by='', command='', date_created=None, description='Must be at least 10 characters', used_software=[], used_dataset=[], generated=[], keywords=[''], guid=None, timeout=60)[source]

Registers computation adding information to ro-crate-metadata.json file stored in rocrate_path directory.

Parameters:
  • rocrate_path (str) – Path to existing RO-Crate directory

  • name (str)

  • run_by (str)

  • command (str)

  • date_created

  • description (str)

  • used_software (list) – list of FAIRSCAPE software ids

  • used_dataset (list) – list of FAIRSCAPE dataset ids used by this computation

  • generated (list) – list of FAIRSCAPE dataset ids for datasets generated by this computation

  • keywords (list)

  • guid (str) – ID for RO-Crate

  • timeout (float) – Time in seconds to wait for registration of computation to complete

register_dataset(rocrate_path, data_dict=None, source_file=None, skip_copy=True, guid=None, timeout=30)[source]

Adds a dataset to existing rocrate specified by rocrate_path by adding information to ro-crate-metadata.json file

Information about dataset should be specified in the data_dict dict passed in.

Expected format of data_dict:

{'name': 'Name of dataset',
 'author': 'Author of dataset',
 'version': 'Version of dataset',
 'url': 'Url of dataset (optional)',
 'date-published': 'Date dataset was published MM-DD-YYYY',
 'description': 'Description of dataset',
 'data-format': 'Format of data',
 'schema': 'Path or URL to schema file in JSON format',
 'keywords': ['keyword1','keyword2']}

Changed in version 0.2.0: Added support for schema in data_dict passed in

Parameters:
  • rocrate_path (str) – Path to directory with registered rocrate

  • data_dict (dict) – Information about dataset to add. See above for expected data

  • source_file (str) – Path to source file of dataset

  • skip_copy (bool) – If True, skip copying the source file into rocrate_path. Use this when the source file already resides in rocrate_path

  • guid (str) – ID for RO-Crate

  • timeout (float) – Time in seconds to wait for registration of dataset to complete

Returns:

id of dataset from FAIRSCAPE

Return type:

str
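A data_dict in the shape listed above can be assembled with a small helper. This is only a sketch: build_data_dict is hypothetical, and the date is formatted as MM-DD-YYYY because that is what the format note above states. Treat that as an assumption, since the ProvenanceUtil constructor's default date format is %Y-%m-%d.

```python
from datetime import date


def build_data_dict(name, author, version, description,
                    data_format, keywords, published=None):
    """Build a dict with the keys documented for register_dataset().

    Hypothetical helper; optional keys such as 'url' and 'schema'
    are left out.
    """
    published = published or date.today()
    return {'name': name,
            'author': author,
            'version': version,
            # MM-DD-YYYY per the format note above (assumption).
            'date-published': published.strftime('%m-%d-%Y'),
            'description': description,
            'data-format': data_format,
            'keywords': list(keywords)}


data_dict = build_data_dict(name='Example edge list',
                            author='Example Lab',
                            version='0.1.0',
                            description='Example protein interaction table',
                            data_format='tsv',
                            keywords=['example', 'edgelist'],
                            published=date(2024, 1, 31))
```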

register_rocrate(rocrate_path, name='', organization_name='', project_name='', description='Please enter a description', keywords=[''], guid=None, timeout=30)[source]

Creates/registers an RO-Crate in the directory specified by rocrate_path. Upon completion, a ro-crate-metadata.json file will be created in that directory.

Parameters:
  • rocrate_path (str)

  • name (str) – Name for RO-Crate

  • organization_name (str) – Name of organization

  • project_name (str) – Name of project

  • description (str) – Description

  • keywords (list) – keywords

  • guid (str) – ID for RO-Crate

  • timeout (float) – Time in seconds to wait for registration of RO-Crate to complete

register_software(rocrate_path, name='unknown', description='Must be at least 10 characters', author='', version='', file_format='', url='', date_modified=None, keywords=[''], guid=None, timeout=30)[source]

Registers software by adding information to ro-crate-metadata.json file stored in rocrate_path directory.

Parameters:
  • name (str) – Name of software

  • description (str) – Description of software

  • author (str) – Author(s) of software

  • version (str) – Version of software

  • file_format (str) – Format of software file(s)

  • url (str) – URL to repository for software

  • date_modified (str)

  • keywords (list)

  • guid (str) – ID for RO-Crate

  • rocrate_path (str) – Path to directory with registered rocrate

  • timeout (float) – Time in seconds to wait for registration of software to complete

Raises:

CellMapsProvenanceError – If FAIRSCAPE call fails

Returns:

guid of software from FAIRSCAPE

Return type:

str

class cellmaps_utils.provenance.ROCrateProvenanceAttributes(name='Please enter a name', organization_name='Please enter an organization', project_name='Please enter a project', description='Please enter a description', keywords=[''])[source]

Bases: object

Wrapper object to hold subset of RO-Crate provenance attributes

Constructor

Parameters:
  • name (str) – name for RO-Crate

  • organization_name (str) – what lab or group

  • project_name (str) – usually funding source

  • description (str) – describes RO-Crate

  • keywords (list) – keywords to identify RO-Crate usually set with these values in order: project, data_release_name, cell_line, treatment, name_of_computation

get_description()[source]

Gets description for RO-Crate

Returns:

description

Return type:

str

get_keywords()[source]

Gets keywords for RO-Crate

Returns:

keywords

Return type:

list

get_name()[source]

Gets name for RO-Crate

Returns:

name for RO-Crate

Return type:

str

get_organization_name()[source]

Gets organization name for RO-Crate

Returns:

organization name

Return type:

str

get_project_name()[source]

Gets project name for RO-Crate

Returns:

project name

Return type:

str

cellmaps_utils.basecmdtool module

Contains base class for all command line tools. Command line tools MUST subclass this

class cellmaps_utils.basecmdtool.BaseCommandLineTool[source]

Bases: object

Base class for all command line tools. Command line tools MUST subclass this

Constructor

COMMAND = 'BaseCommandLineTool'

static add_subparser(subparsers)[source]

Should add any argparse command line arguments to the subparsers passed in. This must be implemented by subclasses; the base implementation always raises an error.

Parameters:

subparsers (argparse)

Returns:

run()[source]

Should contain the logic run by the command line tool. This must be implemented by subclasses; the base implementation always raises an error.

Raises:

CellMapsError – will always raise this

Returns:

save_dataset_info_to_json(outdir, info_dict, file_name)[source]

Saves project information to a JSON file.

Parameters:
  • outdir – Output directory where the file will be saved

  • info_dict – Dictionary with dataset information

  • file_name – Name of the file to save the information.

class cellmaps_utils.basecmdtool.HelloWorldCommand(theargs)[source]

Bases: BaseCommandLineTool

Simply prints Hello World and returns 0 always

Parameters:

theargs

COMMAND = 'helloworld'

add_subparser()[source]

Returns:

run()[source]

Returns:
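The subclassing pattern described for command line tools (a COMMAND name, an add_subparser() hook, and a run() method) can be sketched with argparse alone. Everything below is a stand-in written for illustration: BaseTool, HelloTool, and main are hypothetical names, the stand-in base class raises NotImplementedError rather than the package's own error type, and the dispatch in main() is an assumption about how such tools might be wired together.

```python
import argparse


class BaseTool:
    """Simplified stand-in for a command line tool base class."""
    COMMAND = 'BaseTool'

    @staticmethod
    def add_subparser(subparsers):
        raise NotImplementedError('subclasses must implement')

    def run(self):
        raise NotImplementedError('subclasses must implement')


class HelloTool(BaseTool):
    """Prints a greeting and always returns 0."""
    COMMAND = 'hello'

    def __init__(self, theargs):
        self._args = theargs

    @staticmethod
    def add_subparser(subparsers):
        parser = subparsers.add_parser(HelloTool.COMMAND,
                                       help='Prints a greeting')
        parser.add_argument('--name', default='World')
        return parser

    def run(self):
        print('Hello ' + self._args.name)
        return 0


def main(argv):
    """Register every tool as a subcommand and run the selected one."""
    parser = argparse.ArgumentParser(prog='demo')
    subparsers = parser.add_subparsers(dest='command', required=True)
    tools = {HelloTool.COMMAND: HelloTool}
    for tool in tools.values():
        tool.add_subparser(subparsers)
    args = parser.parse_args(argv)
    return tools[args.command](args).run()


exit_code = main(['hello', '--name', 'Cell Maps'])
```

Keeping the subcommand name in a class attribute lets the dispatcher map the parsed command back to the class without a separate registry of strings.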

cellmaps_utils.apmstool (AP-MS) module

Contains class that creates RO-Crate of AP-MS data from raw AP-MS tables.

class cellmaps_utils.apmstool.APMSDataLoader(theargs, provenance_utils=<cellmaps_utils.provenance.ProvenanceUtil object>)[source]

Bases: BaseCommandLineTool

Creates RO-Crate of AP-MS data from raw AP-MS tables

Constructor

Parameters:

theargs (Namespace) – Command line arguments that at minimum need to have the following attributes:

BAIT_COL_NAME = 'Bait'
COMMAND = 'apmsconverter'

add_subparser()[source]

Adds a command-line subparser for the APMSDataLoader tool.

Returns:

run()[source]

Run method to create RO-Crate from AP-MS data tables. This process involves merging input tables, registering the dataset and related software in the RO-Crate.

Returns:

cellmaps_utils.iftool (Immunofluorescent Images) module

Contains classes that download images.

class cellmaps_utils.iftool.FakeImageDownloader[source]

Bases: ImageDownloader

Creates fake download by downloading the first image in each color from Human Protein Atlas and making renamed copies. The download_file() function is used to download the first image of each color

Constructor

download_images(download_list=None)[source]

Downloads the first image from the server and then makes renamed copies for subsequent images

Parameters:

download_list (list of tuple)

Returns:

class cellmaps_utils.iftool.IFImageDataConverter(theargs, imgsuffix='.jpg', provenance_utils=<cellmaps_utils.provenance.ProvenanceUtil object>, imagedownloader=None)[source]

Bases: BaseCommandLineTool

Converts IF Image data into format consumable by Cell Maps Pipeline

Constructor

Parameters:

theargs (Namespace) – Command line arguments that at minimum need to have the following attributes:

COMMAND = 'ifconverter'

add_subparser()[source]

Adds command-line argument parsing for the IFImageDataConverter tool.

Returns:

run()[source]

Runs the process of converting IF Image data into a format consumable by the Cell Maps Pipeline. This includes generating a directory path for the RO-Crate, creating the output directory, registering the RO-Crate, filtering the input data based on criteria, downloading and organizing the images. It also handles the registration of datasets and computations in the FAIRSCAPE ecosystem.

Returns:

Return type:

int

class cellmaps_utils.iftool.ImageDownloader[source]

Bases: object

Abstract class that defines interface for classes that download images

download_images(download_list=None)[source]

Subclasses should implement

Parameters:

download_list (list) – list of tuples where first element is full URL of image to download and 2nd element is destination path

Returns:

class cellmaps_utils.iftool.MultiProcessImageDownloader(poolsize=4, skip_existing=False, override_dfunc=None)[source]

Bases: ImageDownloader

Uses multiprocess package to download images in parallel

Constructor

Warning

Exceeding poolsize of 4 causes errors from Human Protein Atlas site

Parameters:
  • poolsize (int) – Number of concurrent downloaders to use.

  • skip_existing (bool) – If True, skip the download if the image file exists and has a size greater than 0

  • override_dfunc (function) – Function that takes a tuple (image URL, download str path) and downloads the image. If None download_file() function is used

download_images(download_list=None)[source]

Downloads images returning a list of failed downloads

from cellmaps_utils.iftool import MultiProcessImageDownloader

dloader = MultiProcessImageDownloader(poolsize=2)

d_list = [('https://images.proteinatlas.org/992/1_A1_1_red.jpg',
           '/tmp/1_A1_1_red.jpg')]
failed = dloader.download_images(download_list=d_list)

Parameters:

download_list (list of tuple) – Each tuple of format (image URL, dest file path)

Returns:

Failed downloads, format of tuple (http status code, text of error, (link, destfile))

Return type:

list of tuple
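The contract described above (each download returns None on success or a failure tuple, and only the failures are returned) can be sketched without touching the network. This is not the class's implementation: it swaps in concurrent.futures.ThreadPoolExecutor from the standard library in place of the multiprocess package, and fake_download and download_all are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_download(downloadtuple):
    """Hypothetical stand-in for a download function; no network access.

    Returns None on success, otherwise
    (status code, error text, downloadtuple).
    """
    url, _dest = downloadtuple
    if url.endswith('.jpg'):
        return None
    return 404, 'Not Found', downloadtuple


def download_all(download_list, dfunc=fake_download, poolsize=4):
    """Run dfunc over every tuple in parallel and collect the failures."""
    with ThreadPoolExecutor(max_workers=poolsize) as pool:
        results = list(pool.map(dfunc, download_list))
    return [res for res in results if res is not None]


d_list = [('https://example.org/1_A1_1_red.jpg', '/tmp/1_A1_1_red.jpg'),
          ('https://example.org/missing.png', '/tmp/missing.png')]
failed = download_all(d_list, poolsize=2)
```

Because each failure tuple carries the original (link, destfile) pair, a caller could feed the failed entries back in for a retry.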

cellmaps_utils.iftool.download_file(downloadtuple)[source]

Downloads file pointed to by ‘download_url’ to ‘destfile’

Note

Default download function used by MultiProcessImageDownloader

Parameters:

downloadtuple (tuple) – (download link, dest file path)

Returns:

None upon success otherwise: (requests status code, text from request, downloadtuple)

Return type:

tuple

cellmaps_utils.iftool.download_file_skip_existing(downloadtuple)[source]

Downloads the file in downloadtuple unless the file already exists with a size greater than 0 bytes, in which case the function just returns

Parameters:

downloadtuple (tuple) – (download link, dest file path)

Returns:

None upon success otherwise: (requests status code, text from request, downloadtuple)

Return type:

tuple
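The skip-existing check can be sketched as a thin wrapper around any download function. This is an illustration only: needs_download and download_skip_existing are hypothetical names, and the wrapped function here just records calls instead of fetching anything.

```python
import os
import tempfile


def needs_download(destfile):
    """True unless destfile already exists with a size above 0 bytes."""
    return not (os.path.isfile(destfile) and os.path.getsize(destfile) > 0)


def download_skip_existing(downloadtuple, dfunc):
    """Call dfunc only when the destination file is missing or empty."""
    if not needs_download(downloadtuple[1]):
        return None
    return dfunc(downloadtuple)


calls = []
with tempfile.TemporaryDirectory() as tmpdir:
    dest = os.path.join(tmpdir, 'image.jpg')
    # First call: the file is missing, so the download function runs.
    download_skip_existing(('https://example.org/image.jpg', dest),
                           calls.append)
    with open(dest, 'w') as f:
        f.write('not really an image')
    # Second call: the file exists and is non-empty, so it is skipped.
    download_skip_existing(('https://example.org/image.jpg', dest),
                           calls.append)
```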

cellmaps_utils.crisprtool (CRISPR) module

Contains class that creates RO-Crate of CRISPR data from raw CRISPR data files

class cellmaps_utils.crisprtool.CRISPRDataLoader(theargs, provenance_utils=<cellmaps_utils.provenance.ProvenanceUtil object>)[source]

Bases: BaseCommandLineTool

Creates RO-Crate of CRISPR data from raw CRISPR data files

Constructor

Parameters:

theargs (Namespace) – Command line arguments that at minimum need to have the following attributes:

COMMAND = 'crisprconverter'

add_subparser()[source]

Adds a subparser for the CRISPR data loader command.

Returns:

run()[source]

Runs the process of loading CRISPR data into an RO-Crate. It includes generating the output directory, linking and registering the h5ad file, and registering the computation and software used in the process.

Returns:

cellmaps_utils.tabletool (RO-Crate Table) module

Contains class that creates table of meta data and links from one or more RO-Crates.

class cellmaps_utils.tabletool.TableFromROCrates(theargs, provenance_utils=<cellmaps_utils.provenance.ProvenanceUtil object>)[source]

Bases: BaseCommandLineTool

Creates table of meta data and links from one or more RO-Crates

Constructor

Parameters:

theargs (Namespace) – Command line arguments that at minimum need to have the following attributes:

CELL_LINE_COL = 'Cell Line'
COLUMNS = ['FAIRSCAPE ARK ID', 'Date', 'Version', 'Type', 'Cell Line', 'Tissue', 'Treatment', 'Gene set', 'Generated By Software', 'Name', 'Description', 'Keywords', 'Download RO-Crate Data Package', 'Download RO-Crate Data Package Size MB', 'Generated By Software', 'Output Dataset', 'Responsible Lab']
COMMAND = 'rocratetable'
COMPUTATION_COL = 'Name'
DATA_ROCRATE = 'Data'
DATE_COL = 'Date'
DESCRIPTION_COL = 'Description'
DOWNLOAD_COL = 'Download RO-Crate Data Package'
DOWNLOAD_COL_SIZE = 'Download RO-Crate Data Package Size MB'
GENERATED_COL = 'Generated By Software'
GENESET_COL = 'Gene set'
ID_COL = 'FAIRSCAPE ARK ID'
INTERMEDIATE_ROCRATE = 'Intermediate'
KEYWORDS_COL = 'Keywords'
MODEL_ROCRATE = 'Model'
OTHER_ROCRATE = 'Other'
OUTPUT_COL = 'Output Dataset'
RESPONSIBLE_COL = 'Responsible Lab'
TISSUE_COL = 'Tissue'
TREATMENT_COL = 'Treatment'
TYPE_COL = 'Type'
VERSION_COL = 'Version'

static add_subparser(subparsers)[source]

Adds the command-line subparser for the TableFromROCrates tool

Returns:

run()[source]

Runs the process of creating a metadata table from one or more RO-Crates. This method iterates through each RO-Crate, collects metadata and provenance information, and writes it into a tab-separated values (TSV) file.

Returns:

cellmaps_utils.hidefconverter (HiDeF) module

Contains classes that convert a hierarchy network (in CX2 format) to a HiDeF format and vice versa.

class cellmaps_utils.hidefconverter.HiDeFToHierarchyConverter(output_dir, nodes_file_path, edges_file_path, parent_edgelist_path=None, parent_ndex_url=None, host='ndexbio.org', parent_uuid=None, ndex_user=None, ndex_password=None)[source]

Bases: object

A class to convert an edge list and node list in HiDeF format to a hierarchy in HCX.

Added in version 0.5.0: The class was added to enable conversion from HiDeF-formatted edge and node files to hierarchy in HCX.

Initializes the converter with file paths and optional parent network details.

Parent network can be specified in one of the following ways:

  • edge list file

  • link to interactome in NDEx

  • UUID of NDEx network along with the host where the network is hosted. If the network is private, username and password need to be specified.

Parameters:
  • output_dir (str) – Directory where the output files will be stored.

  • nodes_file_path (str) – File path for the nodes file.

  • edges_file_path (str) – File path for the edges file.

  • parent_edgelist_path (str, optional) – Path to the edge list of the interactome (optional).

  • parent_ndex_url (str, optional) – URL of parent interactome in NDEx (optional).

  • parent_uuid (str, optional) – UUID of the network in NDEx.

  • host (str, optional) – NDEx host.

  • ndex_user (str, optional) – NDEx username (optional).

  • ndex_password (str, optional) – NDEx password (optional).

generate_hierarchy_hcx_file(hierarchy_filename='hierarchy.cx2', interactome_filename='hierarchy_parent.cx2')[source]

Generates the HiDeF hierarchy file in CX2 format. If the object is initialized with parent network’s edge list, the interactome in CX2 format will be generated in output directory as well. If the object is initialized with uuid of parent network, only hierarchy will be generated.

Parameters:
  • hierarchy_filename (str) – The name of the file to write the hierarchy.

  • interactome_filename (str) – The name of the file to write the interactome (parent network of the hierarchy).

class cellmaps_utils.hidefconverter.HierarchyToHiDeFConverter(output_dir, input_dir=None, hierarchy=None)[source]

Bases: object

A class to convert a hierarchy network (in CX2 format) to a HiDeF format.

Constructor

Parameters:
  • input_dir (str) – The directory containing the hierarchy file.

  • output_dir (str) – The directory where the output files will be stored.

HIDEF_OUT_PREFIX = 'hidef_output_gene_names'

generate_hidef_files(nodes_filename='hidef_output_gene_names.nodes', edges_filename='hidef_output_gene_names.edges')[source]

Generates HiDeF files .nodes and .edges from the hierarchy network.

cellmaps_utils.ddotconverter (DDOT) module

Contains classes that convert an interaction or hierarchical network (in CX2 format) to DDOT format and vice versa.

class cellmaps_utils.ddotconverter.DDOTToHierarchyConverter(output_dir, ontology_ddot_path, parent_ddot_path=None, parent_ndex_url=None, host='ndexbio.org', parent_uuid=None, ndex_user=None, ndex_password=None, hierarchy_filename='hierarchy.cx2', interactome_filename='hierarchy_parent.cx2')[source]

Bases: object

Initializes the converter to create a hierarchy in CX2 from DDOT ontology file.

Parameters:
  • output_dir (str) – Directory where the output files will be saved

  • ontology_ddot_path (str) – Path to the DDOT formatted ontology file

  • parent_ddot_path (str, optional) – Optional path to the parent interactome DDOT file

  • parent_ndex_url (str, optional) – Optional URL to the parent interactome in NDEx

  • host (str) – Hostname for the NDEx server

  • parent_uuid (str, optional) – UUID of the parent network in NDEx

  • ndex_user (str, optional) – Username for NDEx server authentication

  • ndex_password (str, optional) – Password for NDEx server authentication

  • hierarchy_filename (str) – Name for the output hierarchy file

  • interactome_filename (str) – Name for the output interactome file (parent network)

generate_hierarchy_hcx_file()[source]

Constructs a hierarchy network from a DDOT formatted ontology file, utilizing a parent interactome. The hierarchy is enriched with hierarchical context (HCX) information and saved as CX2.

class cellmaps_utils.ddotconverter.DDOTToInteractomeConverter(output_dir, interactome_ddot_path, interactome_file_name='interactome.cx2')[source]

Bases: object

Initializes the converter to transform a DDOT formatted file to CX2 format.

Parameters:
  • output_dir (str) – Directory where the output file will be saved

  • interactome_ddot_path (str) – Path to the DDOT formatted file

  • interactome_file_name (str) – Name of the output file for the interactome

generate_interactome_file()[source]

Converts a DDOT formatted file to an interactome CX2 network file. It parses the DDOT file, constructs nodes and edges in the interactome, and saves the output.

class cellmaps_utils.ddotconverter.HierarchyToDDOTConverter(output_dir, hierarchy_path, ontology_file_name='ontology.ont')[source]

Bases: object

Initializes the converter to transform a hierarchy data file in CX2 format into a DDOT ontology file.

Parameters:
  • output_dir (str) – Directory where the output ontology file will be saved

  • hierarchy_path (str) – Path to the hierarchy data file

  • ontology_file_name (str) – Name of the output ontology file

generate_ontology_ddot_file()[source]

Converts a hierarchy network file into a DDOT ontology file format. This method extracts hierarchy edges and node information to create an ontology representation suitable for DDOT applications.

class cellmaps_utils.ddotconverter.InteractomeToDDOTConverter(output_dir, interactome_path, ddot_file_name='interactome_ddot.txt')[source]

Bases: object

Initializes the converter with the path to the interactome data in CX2 format, output directory, and file name.

Parameters:
  • output_dir (str) – Directory where the output file will be saved

  • interactome_path (str) – Path to the interactome file

  • ddot_file_name (str) – Name of the output file to write DDOT formatted data

generate_ddot_format_file()[source]

Reads an interactome from a specified path and writes it out in DDOT format. The output includes nodes and their interactions.

cellmaps_utils.hcx_utils (HCX) module

Contains functions that annotate a CX2 network with HCX annotations.

cellmaps_utils.hcx_utils.add_hcx_members_annotation(hierarchy, interactome, gene_column='CD_MemberList')[source]

Adds the ‘HCX::members’ attribute to nodes in the hierarchy based on the interactome.

Parameters:
  • hierarchy (CX2Network) – The hierarchical network in CX2 format.

  • interactome (CX2Network) – The interactome network.

  • gene_column (str, optional) – Column name containing gene members.

cellmaps_utils.hcx_utils.add_hcx_network_annotations(hierarchy, interactome=None, output_dir='.', interactome_name='hierarchy_parent.cx2', host='www.ndexbio.org', uuid=None)[source]

Adds HCX network annotations to the hierarchy.

Parameters:
  • hierarchy (CX2Network) – The hierarchical network in CX2 format.

  • interactome (CX2Network, optional) – The interactome network.

  • output_dir (str, optional) – Directory where the interactome file will be saved.

  • interactome_name (str, optional) – Name of the interactome file.

  • host (str, optional) – NDEx host for interactome retrieval.

  • uuid (str, optional) – UUID of the interactome in NDEx.

Returns:

The updated hierarchy network with annotations.

Return type:

CX2Network

cellmaps_utils.hcx_utils.add_isroot_node_attribute(hierarchy, root_nodes)[source]

Using the root_nodes set or list, adds HCX::isRoot to every node, setting the value to True if the node id is in root_nodes and to False otherwise
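The rule above can be sketched on plain Python data. This is not the function's implementation, which operates on a CX2Network: mark_root_nodes is a hypothetical helper that returns a dict of per-node attributes instead of updating a network in place.

```python
def mark_root_nodes(node_ids, root_nodes):
    """Return an HCX::isRoot flag for every node id.

    Hypothetical helper working on plain ids rather than a network.
    """
    roots = set(root_nodes)
    return {node_id: {'HCX::isRoot': node_id in roots}
            for node_id in node_ids}


attrs = mark_root_nodes(node_ids=[0, 1, 2], root_nodes=[0])
```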

cellmaps_utils.hcx_utils.convert_hierarchical_network_to_hcx(hierarchy, interactome_url, ndex_username=None, ndex_password=None, gene_column='CD_MemberList')[source]

Converts a hierarchical network into HCX format by adding necessary annotations and interactome details.

Parameters:
  • hierarchy (CX2Network or str) – The hierarchical network in CX2 format or a path to the CX2 file.

  • interactome_url (str) – URL of the interactome network in NDEx.

  • ndex_username (str, optional) – NDEx username for authentication.

  • ndex_password (str, optional) – NDEx password for authentication.

  • gene_column (str, optional) – Column name containing gene members.

Returns:

The updated hierarchy network in CX2 format with HCX annotations.

Return type:

CX2Network

cellmaps_utils.hcx_utils.get_host_and_uuid_from_network_url(network_url)[source]

Extracts the host and UUID from a given NDEx network URL.

Parameters:

network_url (str) – The URL of the NDEx network.

Returns:

A tuple containing the host and the UUID of the network.

Return type:

tuple
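Splitting a network URL into host and UUID can be sketched with urllib.parse. The URL layout used here is an assumption (the UUID is taken to be the last path segment), split_network_url is a hypothetical helper, and the all-zero UUID is a placeholder rather than a real network.

```python
from urllib.parse import urlparse


def split_network_url(network_url):
    """Return (host, uuid), assuming the UUID is the last path segment."""
    parsed = urlparse(network_url)
    uuid = parsed.path.rstrip('/').split('/')[-1]
    return parsed.netloc, uuid


host, uuid = split_network_url(
    'https://www.ndexbio.org/viewer/networks/'
    '00000000-0000-0000-0000-000000000000')
```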

cellmaps_utils.hcx_utils.get_interactome(host, uuid, username, password, parent_edgelist)[source]

Retrieves the interactome either from NDEx or from a local edge list.

Parameters:
  • host (str) – The NDEx server host.

  • uuid (str) – The UUID of the interactome network in NDEx.

  • username (str) – The NDEx username for authentication.

  • password (str) – The NDEx password for authentication.

  • parent_edgelist (str) – Path to a file containing the interactome edge list.

Returns:

A CX2Network object representing the interactome.

Return type:

CX2Network

cellmaps_utils.hcx_utils.get_root_nodes(hierarchy)[source]

Identifies the root nodes in a hierarchical network.

In CDAPS the root node has only source edges to children, so this function counts the number of target edges for each node; nodes with 0 target edges are roots

Returns:

root node ids

Return type:

set
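The counting approach described above can be sketched on a plain edge list. This is an illustration, not the function's implementation: find_root_nodes is a hypothetical helper, and edges are represented as (source, target) tuples rather than read from a CX2Network.

```python
from collections import Counter


def find_root_nodes(node_ids, edges):
    """Return ids of nodes that never appear as the target of an edge."""
    incoming = Counter(target for _source, target in edges)
    return {node_id for node_id in node_ids if incoming[node_id] == 0}


# Node 0 points at 1 and 2, and node 1 points at 3, so only node 0
# has no incoming (target) edges.
roots = find_root_nodes(node_ids=[0, 1, 2, 3],
                        edges=[(0, 1), (0, 2), (1, 3)])
```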

cellmaps_utils.hierdiff (Hierarchy comparison) module

Contains class that compares hierarchies (Jaccard similarity)

class cellmaps_utils.hierdiff.HierarchyDiff[source]

Bases: object

A class to compare two hierarchies in CX2 (HCX)

Constructor

compare_hierarchies(hierarchy_a=None, hierarchy_b=None)[source]

Compare two hierarchies in CX2 format by calculating Jaccard overlaps and assigning a ‘robustness’ (overlap) score to each node in the first hierarchy.

Parameters:
  • hierarchy_a (ndex2.cx2.CX2Network) – The first (reference) hierarchy to compare.

  • hierarchy_b (ndex2.cx2.CX2Network) – The second (alternative) hierarchy to compare against.

Returns:

The first hierarchy with an added ‘robustness’ node attribute.

Return type:

ndex2.cx2.CX2Network

compare_hierarchies_from_files(hierarchy_a_path=None, hierarchy_b_path=None)[source]

Compares two hierarchies loaded from files by calculating overlap-based scores.

Parameters:
  • hierarchy_a_path (str) – Path to the first (reference) hierarchy file.

  • hierarchy_b_path (str) – Path to the second (alternative) hierarchy file.

Returns:

The first hierarchy (from hierarchy_a_path) with added robustness scores.

Return type:

ndex2.cx2.CX2Network

compute_hierarchy_robustness(ref_hierarchy, alt_hierarchies, ji_thre=0.4)[source]

Computes a robustness score for each node in a reference hierarchy based on its structural overlap across multiple alternative hierarchies. The overlap is measured using the Jaccard Index (JI). A threshold determines whether a node has sufficient overlap in a given alternative hierarchy: values above the threshold count as 1 and all other values count as 0. The higher the overlap across the alternative hierarchies, the higher the robustness score.

robustness = (# hierarchies where JI > ji_thre) / (total number of alternative hierarchies)

Parameters:
  • ref_hierarchy (ndex2.cx2.CX2Network or dict (raw CX2)) – The reference hierarchy whose nodes’ robustness is computed.

  • alt_hierarchies (list[ndex2.cx2.CX2Network or dict]) – A list of alternative hierarchies to compare against.

  • ji_thre (float) – The Jaccard threshold used to determine overlap.

Returns:

The reference hierarchy with an added ‘robustness’ attribute for each node.

Return type:

ndex2.cx2.CX2Network
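
The robustness formula above can be illustrated with plain sets. This is a sketch under stated assumptions, not the method's actual code: each hierarchy is represented as a dict mapping a node id to its set of member names, and a node's overlap with an alternative hierarchy is assumed to be its best Jaccard match among that hierarchy's nodes. The names jaccard_index and robustness_scores are hypothetical.

```python
def jaccard_index(set_a, set_b):
    """Size of the intersection divided by the size of the union."""
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


def robustness_scores(ref_nodes, alt_hierarchies, ji_thre=0.4):
    """Fraction of alternative hierarchies whose best overlap exceeds ji_thre.

    ref_nodes: dict of node id -> set of member names (assumed representation)
    alt_hierarchies: list of dicts with the same shape as ref_nodes
    """
    scores = {}
    for node_id, members in ref_nodes.items():
        hits = 0
        for alt in alt_hierarchies:
            # Assumption: overlap with a hierarchy is the best match among its nodes
            best = max((jaccard_index(members, other) for other in alt.values()),
                       default=0.0)
            if best > ji_thre:
                hits += 1
        scores[node_id] = hits / len(alt_hierarchies) if alt_hierarchies else 0.0
    return scores


ref = {'c1': {'A', 'B', 'C'}, 'c2': {'D', 'E'}}
alts = [
    {'x': {'A', 'B', 'C'}, 'y': {'D', 'F', 'G', 'H'}},
    {'x': {'A', 'B'}, 'y': {'D', 'E'}},
]
scores = robustness_scores(ref, alts)
```

In this toy input, c1 clears the threshold in both alternatives, while c2 clears it in only one of the two.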

cellmaps_utils.ndexupload (NDEx upload) module

Contains a class that aids in uploading a hierarchy and interactome to NDEx

class cellmaps_utils.ndexupload.NDExHierarchyUploader(ndexserver, ndexuser, ndexpassword, visibility=None)[source]

Bases: object

Base class for uploading hierarchy and its parent network to NDEx.

Constructor

Parameters:
  • ndexserver (str)

  • ndexuser (str)

  • ndexpassword (str)

  • visibility (str or bool) – If set to public, PUBLIC or True, makes the hierarchy and interactome publicly visible on NDEx; otherwise they are left private

get_cytoscape_url(ndexurl)[source]

Generates a Cytoscape URL for a given NDEx network URL.

Parameters:

ndexurl (str) – The URL of the NDEx network.

Returns:

The URL pointing to the network’s view on the Cytoscape platform.

Return type:

str

save_hierarchy_and_parent_network(hierarchy, parent_ppi)[source]

Saves both the hierarchy and its parent network to the NDEx server. This method first saves the parent network, then updates the hierarchy with HCX annotations based on the parent network’s UUID, and finally saves the updated hierarchy. It returns the UUIDs and URLs for both the hierarchy and the parent network.

Parameters:
  • hierarchy (CX2Network) – The hierarchy network to be saved.

  • parent_ppi (CX2Network) – The parent protein-protein interaction network associated with the hierarchy.

Returns:

UUIDs and URLs for both the parent network and the hierarchy.

Return type:

tuple

upload_hierarchy_and_parent_network_from_files(hier_dir=None, hierarchy_path=None, parent_path=None)[source]

Uploads hierarchy and parent network to NDEx from CX2 files. It first checks whether hierarchy_path and parent_path are provided. If they are not, it tries to get them from the hier_dir directory. If neither is specified, or the hierarchy and parent cannot be found in hier_dir, it raises an exception.

Parameters:
  • hier_dir (str) – The directory where the hierarchy and parent network files are located.

  • hierarchy_path (str, optional) – The path to the hierarchy network file.

  • parent_path (str, optional) – The path to the parent network file.

Returns:

UUIDs and URLs for both the hierarchy and parent network.

Return type:

tuple

Raises:

CellMapsError – If the required hierarchy or parent network files do not exist.
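
The path lookup described above might look roughly like the following. This is only a sketch: the file names are assumed to be HIERARCHY_NETWORK_PREFIX and HIERARCHY_PARENT_NETWORK_PREFIX followed by CX2_SUFFIX, the fallback is assumed to apply whenever either explicit path is missing, and resolve_network_paths and MissingNetworkFileError are hypothetical names (the latter stands in for CellMapsError). No upload is performed.

```python
import os


class MissingNetworkFileError(Exception):
    """Hypothetical stand-in for cellmaps_utils.exceptions.CellMapsError."""


def resolve_network_paths(hier_dir=None, hierarchy_path=None, parent_path=None):
    """Return (hierarchy_path, parent_path), falling back to files in hier_dir."""
    if hierarchy_path is None or parent_path is None:
        if hier_dir is None:
            raise MissingNetworkFileError('Set hier_dir or both explicit paths')
        # Assumed names: 'hierarchy' / 'hierarchy_parent' prefixes plus '.cx2'
        hierarchy_path = os.path.join(hier_dir, 'hierarchy' + '.cx2')
        parent_path = os.path.join(hier_dir, 'hierarchy_parent' + '.cx2')
    for path in (hierarchy_path, parent_path):
        if not os.path.isfile(path):
            raise MissingNetworkFileError(path + ' does not exist')
    return hierarchy_path, parent_path
```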

cellmaps_utils.music_utils module

Contains helper methods for MUSIC.

cellmaps_utils.music_utils.canberra_similarity(df)[source]

Calculate Canberra similarity between each pair of rows in a DataFrame. Similarity scaled into [0, 1]

Parameters:

df

Returns:

Return type:

pandas.DataFrame

cellmaps_utils.music_utils.check_symmetric(a, rtol=1e-05, atol=1e-08)[source]

Check if the given numpy matrix is symmetric or not.

Parameters:
  • a

  • rtol

  • atol

Returns:

cellmaps_utils.music_utils.cosine_similarity_scaled(df)[source]

Calculate Cosine similarity between each pair of rows in a DataFrame. Similarity scaled into [0, 1]

Parameters:

df

Returns:

Return type:

pandas.DataFrame

cellmaps_utils.music_utils.euclidean_similarity(df)[source]

Calculate Euclidean similarity between each pair of rows in a DataFrame. Similarity scaled into [0, 1]

Parameters:

df

Returns:

Return type:

pandas.DataFrame

cellmaps_utils.music_utils.jaccard(setA, setB)[source]

Calculates the Jaccard index of two sets

Parameters:
  • setA

  • setB

Returns:

cellmaps_utils.music_utils.kendall_scaled(df)[source]

Calculate Kendall correlation between each pair of rows in a DataFrame. Correlation scaled into [0, 1]

Parameters:

df

Returns:

cellmaps_utils.music_utils.load_obj(fname, method='pickle')[source]

Loads an object that was saved in pickle format

Parameters:
  • fname (str) – path to file

  • method (str) – {pickle, dill} package used for serialization

Raises:

ValueError – if method is not set to pickle or dill

cellmaps_utils.music_utils.manhattan_similarity(df)[source]

Calculate Manhattan similarity between each pair of rows in a DataFrame. Similarity scaled into [0, 1]

Parameters:

df

Returns:

Return type:

pandas.DataFrame

cellmaps_utils.music_utils.num_comb(x)[source]

TODO: Add doc here

Parameters:

x

Returns:

cellmaps_utils.music_utils.pearson_scaled(df)[source]

Calculate Pearson correlation between each pair of rows in a DataFrame. Correlation scaled into [0, 1]

Parameters:

df

Returns:

Return type:

pandas.DataFrame
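
To make the scaling concrete, here is a small standard-library sketch of a row-wise Pearson correlation mapped into [0, 1]. It assumes the linear map (r + 1) / 2, which the documentation above does not confirm, and it uses plain lists in place of DataFrame rows. The function names are hypothetical, and constant rows (zero variance) are not handled.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def pearson_scaled_rows(rows):
    """Pairwise row correlations mapped from [-1, 1] into [0, 1].

    Assumption: the scaling is the linear map (r + 1) / 2.
    """
    return [[(pearson(a, b) + 1) / 2 for b in rows] for a in rows]


rows = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 2.0, 1.0]]
scaled = pearson_scaled_rows(rows)
```

Under this assumed map, perfectly correlated rows score 1 and perfectly anti-correlated rows score 0.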

cellmaps_utils.music_utils.save_obj(obj, fname, method='pickle')[source]

Saves an object to a file using pickle or dill

Parameters:
  • obj – object to save

  • fname (str) – path to saved file

  • method (str) – {pickle, dill} package used for serialization

Raises:

ValueError – if method is not set to pickle or dill
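
A save and load round trip with the method check can be sketched using only the standard library. This is an illustration of the documented behavior, not the library's code: save_object and load_object are hypothetical names, and only the pickle branch is shown, so dill is rejected here even though the real functions accept it.

```python
import os
import pickle
import tempfile


def save_object(obj, fname, method='pickle'):
    """Serialize obj to fname; only the pickle branch is sketched here."""
    if method != 'pickle':
        # The documented functions also accept 'dill'; omitted to stay stdlib-only
        raise ValueError('Unsupported method: ' + str(method))
    with open(fname, 'wb') as f:
        pickle.dump(obj, f)


def load_object(fname, method='pickle'):
    """Read back an object written by save_object."""
    if method != 'pickle':
        raise ValueError('Unsupported method: ' + str(method))
    with open(fname, 'rb') as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'example.pkl')
    save_object({'genes': ['A', 'B']}, path)
    restored = load_object(path)
```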

cellmaps_utils.music_utils.scaled_P_to_nm(scaled_P)[source]

TODO: Add doc here

Parameters:

scaled_P

Returns:

cellmaps_utils.music_utils.spearman_scaled(df)[source]

Calculate Spearman correlation between each pair of rows in a DataFrame. Correlation scaled into [0, 1]

Parameters:

df

Returns:

cellmaps_utils.music_utils.upper_tri_values(df)[source]

Return array with values of upper triangle of the DataFrame

Parameters:

df (pandas.DataFrame) – Symmetric DataFrame

Returns:

Return type:

numpy.ndarray

cellmaps_utils.music_utils.znorm(df)[source]

Z-transform within each column.

Parameters:

df

Returns:

Return type:

pandas.DataFrame
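
A column-wise Z-transform subtracts each column's mean and divides by its standard deviation. The sketch below uses plain lists instead of a DataFrame and assumes the population standard deviation; the library may use the sample standard deviation instead. The name znorm_columns is hypothetical.

```python
import statistics


def znorm_columns(rows):
    """Z-transform each column of a list of equal-length rows.

    Assumption: population standard deviation; the library may differ.
    """
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stdevs = [statistics.pstdev(col) for col in columns]
    return [
        [(value - mean) / stdev for value, mean, stdev in zip(row, means, stdevs)]
        for row in rows
    ]


normalized = znorm_columns([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
```

After the transform, both columns hold the same values even though their original scales differ by a factor of ten.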

Constants module

Contains constants used by the various Cell Maps Tools

cellmaps_utils.constants.ANTIBODY_GENE_TABLE_FILE = 'antibody_gene_table.tsv'

Antibody Gene Table file

cellmaps_utils.constants.APMS_TSV_FILE = 'apms.tsv'

AP-MS tsv file

class cellmaps_utils.constants.ArgParseFormatter(prog, indent_increment=2, max_help_position=24, width=None)[source]

Bases: ArgumentDefaultsHelpFormatter, RawDescriptionHelpFormatter

Combine two argparse Formatters to get help and default values displayed when showing help
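
The effect of combining the two formatters can be seen with a small standalone parser. CombinedFormatter below is a hypothetical class that mirrors the documented bases; it is not imported from cellmaps_utils.

```python
import argparse


class CombinedFormatter(argparse.ArgumentDefaultsHelpFormatter,
                        argparse.RawDescriptionHelpFormatter):
    """Keeps description line breaks and appends defaults to argument help."""


parser = argparse.ArgumentParser(
    prog='demo',
    description='First line\nSecond line',
    formatter_class=CombinedFormatter,
)
parser.add_argument('--verbose', type=int, default=0, help='Logging level')
help_text = parser.format_help()
```

The description keeps its line break and the --verbose help ends with its default value, which neither base formatter provides on its own.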

cellmaps_utils.constants.BLUE = 'blue'

Blue color directory name and color name in blue color files

cellmaps_utils.constants.COEMBEDDING_STEP_DIR = '3.coembedding_fold'

Name of directory where co-embeddings are stored

cellmaps_utils.constants.COLORS = ['red', 'blue', 'green', 'yellow']

List of colors

cellmaps_utils.constants.COLOR_INDEXS = {'blue': 2, 'green': 1, 'red': 0, 'yellow': 0}

Indexes for colors in .jpeg image

cellmaps_utils.constants.COLOR_LABELS_MAP = {'blue': 'Nucleus (DAPI)', 'green': 'protein of interest', 'red': 'Microtubules (Tubulin antibody)', 'yellow': 'ER (Calreticulin antibody)'}

Map of what each color refers to

cellmaps_utils.constants.CO_EMBEDDING_FILE = 'coembedding_emd.tsv'

Name of file containing coembedding

cellmaps_utils.constants.CX2_SUFFIX = '.cx2'

Suffix for files in CX 2.0 format (https://cytoscape.org/cx/cx2/specification/2022/12/01/cytoscape-exchange-format-specification-(version-2).html)

cellmaps_utils.constants.CX_SUFFIX = '.cx'

Suffix for files in CX format

cellmaps_utils.constants.DATASET_AUTHOR = 'author'

Author of the dataset

cellmaps_utils.constants.DATASET_CELL_LINE = 'cell_line'

Cell line

cellmaps_utils.constants.DATASET_COLLECTION_SET = 'collection_set'

Collection set

cellmaps_utils.constants.DATASET_GENE_SET = 'gene_set'

Gene set

cellmaps_utils.constants.DATASET_INFO_FILE = 'dataset_info.json'

Name of file where information about a dataset is stored

cellmaps_utils.constants.DATASET_NAME = 'name'

Name of the dataset

cellmaps_utils.constants.DATASET_ORGANIZATION_NAME = 'organization_name'

Name of the organization

cellmaps_utils.constants.DATASET_PROJECT_NAME = 'project_name'

Name of the project

cellmaps_utils.constants.DATASET_RELEASE = 'release'

Release of the dataset

cellmaps_utils.constants.DATASET_SLICE = 'slice'

Slice

cellmaps_utils.constants.DATASET_TISSUE = 'tissue'

Tissue

cellmaps_utils.constants.DATASET_TREATMENT = 'treatment'

Treatment

cellmaps_utils.constants.ERROR_LOG_FILE = 'error.log'

Error log file name

cellmaps_utils.constants.GREEN = 'green'

Green color directory name and color name in green color files

cellmaps_utils.constants.HCX_SUFFIX = '.hcx'

Suffix for files in HCX format

cellmaps_utils.constants.HIERARCHYEVAL_STEP_DIR = '5.hierarchyeval'

Name of directory where hierarchy evaluations are stored

cellmaps_utils.constants.HIERARCHY_NETWORK_PREFIX = 'hierarchy'

CX2 format hierarchy filename

cellmaps_utils.constants.HIERARCHY_NODES_FILE = 'hierarchy_node_attributes.tsv'

Hierarchy node attributes file

cellmaps_utils.constants.HIERARCHY_PARENT_NETWORK_PREFIX = 'hierarchy_parent'

CX2 format hierarchy parent (interactome) filename

cellmaps_utils.constants.HIERARCHY_STEP_DIR = '4.hierarchy'

Name of directory where hierarchies are stored

cellmaps_utils.constants.IMAGE_DOWNLOAD_STEP_DIR = '1.image_download'

Name of directory where downloaded images are stored

cellmaps_utils.constants.IMAGE_EMBEDDING_FILE = 'image_emd.tsv'

Name of image embedding file

cellmaps_utils.constants.IMAGE_EMBEDDING_STEP_DIR = '2.image_embedding_fold'

Name of directory where image embeddings are stored

cellmaps_utils.constants.IMAGE_GENE_NODE_AMBIGUOUS_COL = 'ambiguous'

Ambiguous column

cellmaps_utils.constants.IMAGE_GENE_NODE_ANTIBODY_COL = 'antibody'

Antibody name column

cellmaps_utils.constants.IMAGE_GENE_NODE_ATTR_FILE = 'image_gene_node_attributes.tsv'

Image gene node attributes filename

cellmaps_utils.constants.IMAGE_GENE_NODE_COLS = ['name', 'represents', 'ambiguous', 'antibody', 'filename', 'imageurl']

Columns in IMAGE_GENE_NODE_ATTR_FILE file

cellmaps_utils.constants.IMAGE_GENE_NODE_ERRORS_FILE = 'image_gene_node_attributes.errors'

Image gene node attributes errors filename

cellmaps_utils.constants.IMAGE_GENE_NODE_FILENAME_COL = 'filename'

File name column

cellmaps_utils.constants.IMAGE_GENE_NODE_IMAGEURL_COL = 'imageurl'

Image URL column

cellmaps_utils.constants.IMAGE_GENE_NODE_NAME_COL = 'name'

Gene Symbol name column

cellmaps_utils.constants.IMAGE_GENE_NODE_REPRESENTS_COL = 'represents'

Ensembl ids column

cellmaps_utils.constants.IMAGE_LABELS_PROBABILITY_FILE = 'labels_prob.tsv'

Name of image labels probability file

cellmaps_utils.constants.LOG_FORMAT = '%(asctime)-15s %(levelname)s %(relativeCreated)dms %(filename)s::%(funcName)s():%(lineno)d %(message)s'

Sets format of logging messages
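
To see what this format produces, the string can be passed to a standard logging.Formatter. The sketch copies the value shown above rather than importing it, and the logger name log_format_demo is arbitrary.

```python
import io
import logging

LOG_FORMAT = ('%(asctime)-15s %(levelname)s %(relativeCreated)dms '
              '%(filename)s::%(funcName)s():%(lineno)d %(message)s')

# Capture output in memory so the formatted line can be inspected
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter(LOG_FORMAT))

logger = logging.getLogger('log_format_demo')
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)
logger.propagate = False
logger.info('pipeline started')

output = stream.getvalue()
```

Each record carries the timestamp, level, milliseconds since the logging module was loaded, and the file, function, and line that emitted the message.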

cellmaps_utils.constants.OUTPUT_LOG_FILE = 'output.log'

Output log file name

cellmaps_utils.constants.PERTURBATION_FILE = 'perturbation.h5ad'

Perturbation/CRISPRi h5ad file

cellmaps_utils.constants.PPI_DOWNLOAD_STEP_DIR = '1.ppi_download'

Name of directory where the downloaded PPI edgelist and baitlist are stored

cellmaps_utils.constants.PPI_EDGELIST_COLS = ['geneA', 'geneB']

Columns in PPI_EDGELIST_FILE

cellmaps_utils.constants.PPI_EDGELIST_FILE = 'ppi_edgelist.tsv'

Protein to Protein interaction edgelist file name

cellmaps_utils.constants.PPI_EDGELIST_GENEA_COL = 'geneA'

First column name

cellmaps_utils.constants.PPI_EDGELIST_GENEB_COL = 'geneB'

Second column name

cellmaps_utils.constants.PPI_EMBEDDING_FILE = 'ppi_emd.tsv'

Name of Protein to Protein embedding file

cellmaps_utils.constants.PPI_EMBEDDING_STEP_DIR = '2.ppi_embedding'

Name of directory where PPI embeddings are stored

cellmaps_utils.constants.PPI_GENE_NODE_ATTR_FILE = 'ppi_gene_node_attributes.tsv'

Protein to Protein gene node attributes file

cellmaps_utils.constants.PPI_GENE_NODE_COLS = ['name', 'represents', 'ambiguous', 'bait']

Columns in PPI_GENE_NODE_ATTR_FILE

cellmaps_utils.constants.PPI_GENE_NODE_ERRORS_FILE = 'ppi_gene_node_attributes.errors'

Protein to Protein gene node attributes error filename

cellmaps_utils.constants.PROVENANCE_ERRORS_FILE = 'provenance_errors.json'

Contains log of any failed fairscape-cli calls

cellmaps_utils.constants.RED = 'red'

Red color directory name and color name in red color files

cellmaps_utils.constants.RO_CRATE_METADATA_FILE = 'ro-crate-metadata.json'

RO-Crate metadata JSON file name

cellmaps_utils.constants.TASK_FILE_PREFIX = 'task_'

Prefix for task file

cellmaps_utils.constants.TASK_FINISH_FILE_SUFFIX = '_finish.json'

Suffix for task finish file

cellmaps_utils.constants.TASK_START_FILE_SUFFIX = '_start.json'

Suffix for task start file
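
These three constants combine into task file names. The logutils documentation describes the finish file as TASK_FILE_PREFIX, then the start_time value, then TASK_FINISH_FILE_SUFFIX; the sketch below follows that pattern and assumes the start file is named the same way. The constant values are copied from this page and task_file_path is a hypothetical helper.

```python
import os

TASK_FILE_PREFIX = 'task_'
TASK_START_FILE_SUFFIX = '_start.json'
TASK_FINISH_FILE_SUFFIX = '_finish.json'


def task_file_path(outdir, start_time, suffix):
    """Join outdir with prefix + start_time + suffix."""
    return os.path.join(outdir, TASK_FILE_PREFIX + str(start_time) + suffix)


finish_path = task_file_path('mydir', 1700000000, TASK_FINISH_FILE_SUFFIX)
start_path = task_file_path('mydir', 1700000000, TASK_START_FILE_SUFFIX)
```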

cellmaps_utils.constants.WEIGHTED_PPI_EDGELIST_WEIGHT_COL = 'Weight'

Weight column name

cellmaps_utils.constants.YELLOW = 'yellow'

Yellow color directory name and color name in yellow color files

Exceptions

Base error classes for Cell Maps Tools

class cellmaps_utils.exceptions.CellMapsError[source]

Base exception for CellMapsUtils

class cellmaps_utils.exceptions.CellMapsProvenanceError[source]

Base exception for provenance errors

Module contents

Top-level package for cellmaps_utils.

cellmaps_utils.cellmaps_utilscmd.main(args)[source]

Main entry point for program

Parameters:

args (list) – arguments passed to command line, usually sys.argv[1:]

Returns:

return value of cellmaps_imagedownloader.runner.CellmapsImageDownloader.run() or 2 if an exception is raised

Return type:

int