cellmaps_utils package
Description of various functions and methods provided by this package.
cellmaps_utils.logutils module
Contains methods used internally by the Cell Maps tools to set up logging
- cellmaps_utils.logutils.setup_cmd_logging(args)[source]
Sets up logging based on parsed command line arguments. If args.logconf is set, use that configuration; otherwise look at args.verbose and set logging for this module.
This function assumes the following:
args.logconf exists and is None or set to a str containing the path to a logconf file
args.verbose exists and is set to an int with one of these values:
0 = no logging
1 = critical
2 = error
3 = warning
4 = info
5 = debug
- Parameters:
args (argparse.Namespace) – parsed command line arguments from argparse
- Raises:
AttributeError – If args is None or if args.logconf is None or missing or if args.verbose is None or missing
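The verbosity table above can be sketched with the standard library alone. This is an illustration of the documented mapping, not the package's implementation; the names VERBOSITY_TO_LEVEL and level_for are hypothetical, and treating 0 as a level above CRITICAL is an assumption about how "no logging" could be achieved.

```python
import argparse
import logging

# Verbosity table from the description: 0 = no logging ... 5 = debug.
# Illustrative mapping only; not taken from the cellmaps_utils source.
VERBOSITY_TO_LEVEL = {
    0: logging.CRITICAL + 10,  # above CRITICAL, so nothing is emitted
    1: logging.CRITICAL,
    2: logging.ERROR,
    3: logging.WARNING,
    4: logging.INFO,
    5: logging.DEBUG,
}


def level_for(args):
    """Return the logging level implied by args.verbose (hypothetical helper)."""
    return VERBOSITY_TO_LEVEL[args.verbose]


# Build a Namespace with the two attributes the function expects.
parser = argparse.ArgumentParser()
parser.add_argument('--logconf', default=None)
parser.add_argument('--verbose', '-v', action='count', default=0)
args = parser.parse_args(['-vvv'])

chosen_level = level_for(args)
logging.getLogger('demo').setLevel(chosen_level)
```

With three -v flags, args.verbose is 3, which corresponds to warning in the table.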
- cellmaps_utils.logutils.setup_filelogger(outdir=None, handlerprefix='cellmaps')[source]
Sets up a logger to write all debug and higher logs to outdir/OUTPUT_LOG_FILE and all error level and higher log messages to outdir/ERROR_LOG_FILE
- Parameters:
outdir (str) – directory in which to store the OUTPUT_LOG_FILE and ERROR_LOG_FILE files
handlerprefix (str) – prefix of name to give to handlers and formatters; if None, code will set value to cellmaps
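The two-file behavior described above can be approximated with plain logging handlers. This is a minimal sketch, assuming one handler per file with different levels; add_file_handlers is a hypothetical helper, and the file names are copied from the constants module documented later on this page.

```python
import logging
import os
import tempfile

OUTPUT_LOG_FILE = 'output.log'  # value of constants.OUTPUT_LOG_FILE
ERROR_LOG_FILE = 'error.log'    # value of constants.ERROR_LOG_FILE


def add_file_handlers(logger, outdir):
    """Hypothetical helper: attach a debug-level and an error-level file handler."""
    fmt = logging.Formatter('%(asctime)-15s %(levelname)s %(message)s')
    handlers = []
    for fname, level in ((OUTPUT_LOG_FILE, logging.DEBUG),
                         (ERROR_LOG_FILE, logging.ERROR)):
        handler = logging.FileHandler(os.path.join(outdir, fname))
        handler.setLevel(level)
        handler.setFormatter(fmt)
        logger.addHandler(handler)
        handlers.append(handler)
    return handlers


demo_logger = logging.getLogger('filelogger_demo')
demo_logger.setLevel(logging.DEBUG)
demo_logger.propagate = False

with tempfile.TemporaryDirectory() as outdir:
    demo_handlers = add_file_handlers(demo_logger, outdir)
    demo_logger.info('goes to output.log only')
    demo_logger.error('goes to both files')
    # Detach and close handlers before reading the files back.
    for handler in demo_handlers:
        demo_logger.removeHandler(handler)
        handler.close()
    with open(os.path.join(outdir, OUTPUT_LOG_FILE)) as f:
        output_lines = f.read().splitlines()
    with open(os.path.join(outdir, ERROR_LOG_FILE)) as f:
        error_lines = f.read().splitlines()
```

The info message lands only in the output log, while the error message is written to both files.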
- cellmaps_utils.logutils.write_task_finish_json(outdir=None, start_time=None, end_time=None, status=None)[source]
Writes TASK_FILE_PREFIX##TASK_FINISH_FILE_SUFFIX file in outdir directory where ## is the start_time value
    from cellmaps_utils import logutils
    import time

    logutils.write_task_finish_json(outdir='./mydir',
                                    start_time=int(time.time())-10,
                                    end_time=int(time.time()),
                                    status=0)
- Parameters:
outdir (str) – directory to write file
start_time (int) – time in seconds since epoch; if set to -1 or None, value will be set to current time
end_time (int) – time in seconds since epoch; if set to -1 or None, value will be set to current time
status (int) – status of task, 0 means success, otherwise error
- cellmaps_utils.logutils.write_task_start_json(outdir=None, start_time=None, data=None, version=None)[source]
Writes TASK_FILE_PREFIX##TASK_START_FILE_SUFFIX file with information about what is to be run. The ## in the name is the value of start_time
    from cellmaps_utils import logutils
    import time

    logutils.write_task_start_json(outdir='./mydir',
                                   start_time=int(time.time()),
                                   data={'someparam': 'some value'},
                                   version='1.0.0')
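To make the file naming concrete, the sketch below builds the start and finish file names from the prefix and suffix constants documented in the constants module. task_file_names is a hypothetical helper; the fallback to the current time for None or -1 is documented for write_task_finish_json, and applying it to the start file as well is an assumption.

```python
import time

TASK_FILE_PREFIX = 'task_'                # constants.TASK_FILE_PREFIX
TASK_START_FILE_SUFFIX = '_start.json'    # constants.TASK_START_FILE_SUFFIX
TASK_FINISH_FILE_SUFFIX = '_finish.json'  # constants.TASK_FINISH_FILE_SUFFIX


def task_file_names(start_time=None):
    """Hypothetical helper: build the start/finish file names for a start_time."""
    if start_time is None or start_time == -1:
        start_time = int(time.time())
    stamp = str(start_time)
    return (TASK_FILE_PREFIX + stamp + TASK_START_FILE_SUFFIX,
            TASK_FILE_PREFIX + stamp + TASK_FINISH_FILE_SUFFIX)


start_name, finish_name = task_file_names(start_time=1700000000)
```

Because both names share the same start_time value, the start and finish files of one task can be paired by name.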
cellmaps_utils.provenance module
Contains wrapper functionality for calls to FAIRSCAPE CLI
- class cellmaps_utils.provenance.ProvenanceUtil(fairscape_binary='fairscape-cli', default_date_format_str='%Y-%m-%d', raise_on_error=False)[source]
Bases:
object
Wrapper around FAIRSCAPE-cli calls
Constructor
- Parameters:
fairscape_binary (str) – FAIRSCAPE command line binary. If no path separators are included in this value (for example, no / on Linux or macOS), this code assumes the binary resides in the same directory as the Python binary executing this script. To bypass this, set the value to a full path, for example /tmp/foo.py
default_date_format_str (str) – Default date format string
raise_on_error (bool) – Flag to determine if exceptions should be raised on errors
- static example_dataset_provenance(requiredonly=True, with_ids=False)[source]
Returns example provenance dataset dict
- get_default_date_format_str()[source]
Gets default date format string set via constructor
- Returns:
default date format string usually something like %Y-%m-%d
- Return type:
- get_login()[source]
Attempts to get login of user
- Returns:
login of user or empty string if unable to obtain
- Return type:
- get_merged_rocrate_provenance_attrs(rocrate=None, override_name=None, override_project_name=None, override_organization_name=None, extra_keywords=None, keywords_to_preserve=6, merged_delimiter='|')[source]
Creates a merged provenance attributes object when given one or more RO-Crates, using the following rules:
Values for name, project_name, and organization_name are put into respective sets for uniqueness, sorted alphabetically, and joined together using the value of merged_delimiter
If override_name, override_project_name, or override_organization_name is not None, those values will be used in lieu of the merged data mentioned above.
For keywords, the first keywords_to_preserve elements are put into respective sets for uniqueness, joined together using the value of merged_delimiter, and put back into a list. Any entries in extra_keywords are appended to this list.
The description is a merging of the keywords list with a space delimiter
- Parameters:
rocrate (str or dict or list) – dict, or directory containing ro-crate-metadata.json file, or path to file assumed to be RO-Crate metadata file, or a list of any of the previously mentioned items
override_name (str) – If not None, overrides name returned
override_project_name (str) – If not None, overrides project name returned
override_organization_name (str) – If not None, overrides organization name returned
keywords_to_preserve (int) – Denotes number of keywords to preserve. A value of 5 means keep the first 5. None means preserve all keywords
merged_delimiter (str) – default is ‘|’
- Raises:
CellMapsProvenanceError – If rocrate, extra_keywords
- Returns:
Merged rocrate provenance attributes
- Return type:
ROCrateProvenanceAttributes
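The merging rules can be illustrated on plain dicts. This is a sketch of one reading of the rules, in which keywords are merged position by position; merge_unique and merge_keywords are hypothetical helpers and the crate values are made up for the example. It does not reproduce the package's code.

```python
def merge_unique(values, delimiter='|'):
    """Deduplicate, sort alphabetically, and join with the delimiter."""
    return delimiter.join(sorted(set(values)))


def merge_keywords(keyword_lists, keywords_to_preserve=6, delimiter='|',
                   extra_keywords=None):
    """Merge keywords position by position (an assumed reading of the rule)."""
    if keywords_to_preserve is None:
        # None means keep every keyword position that exists.
        keywords_to_preserve = max((len(k) for k in keyword_lists), default=0)
    merged = []
    for index in range(keywords_to_preserve):
        at_index = [k[index] for k in keyword_lists if index < len(k)]
        if at_index:
            merged.append(merge_unique(at_index, delimiter))
    merged.extend(extra_keywords or [])
    return merged


# Made-up attribute values standing in for two RO-Crates.
crates = [
    {'name': 'Crate B', 'keywords': ['CM4AI', 'IF', 'images']},
    {'name': 'Crate A', 'keywords': ['CM4AI', 'AP-MS']},
]
merged_name = merge_unique(c['name'] for c in crates)
merged_keywords = merge_keywords([c['keywords'] for c in crates],
                                 extra_keywords=['merged'])
merged_description = ' '.join(merged_keywords)
```

Here the names collapse to a single sorted, delimited string, and the description is the keyword list joined with spaces.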
- get_name_project_org_of_rocrate(rocrate)[source]
Gets name, project, and organization name of RO-Crate
- get_rocrate_as_dict(rocrate_path)[source]
Loads RO-Crate as a dict
- Parameters:
rocrate_path (str) – Directory containing ro-crate-metadata.json file or path to file assumed to be ro-crate meta data file
- Raises:
CellMapsProvenanceError – If rocrate_path is None or if raise_on_error passed into constructor is True and there is an issue parsing the ro-crate metadata file
- Returns:
RO-Crate as a dict
- Return type:
dict
- register_computation(rocrate_path, name='', run_by='', command='', date_created=None, description='Must be at least 10 characters', used_software=[], used_dataset=[], generated=[], keywords=[''], guid=None, timeout=60)[source]
Registers computation adding information to ro-crate-metadata.json file stored in rocrate_path directory.
- Parameters:
name (str)
run_by (str)
command (str)
date_created
description (str)
used_dataset (list) – list of FAIRSCAPE dataset ids used by this computation
generated (list) – list of FAIRSCAPE dataset ids for datasets generated by this computation
keywords (list)
timeout (float) – Time in seconds to wait for registration of computation to complete
- register_dataset(rocrate_path, data_dict=None, source_file=None, skip_copy=True, guid=None, timeout=30)[source]
Adds a dataset to existing rocrate specified by rocrate_path by adding information to ro-crate-metadata.json file
Information about dataset should be specified in the data_dict dict passed in.
Expected format of data_dict:
    {'name': 'Name of dataset',
     'author': 'Author of dataset',
     'version': 'Version of dataset',
     'url': 'Url of dataset (optional)',
     'date-published': 'Date dataset was published MM-DD-YYYY',
     'description': 'Description of dataset',
     'data-format': 'Format of data',
     'schema': 'Path or URL to schema file in JSON format',
     'keywords': ['keyword1','keyword2']}
Changed in version 0.2.0: Added support for schema in data_dict passed in
- Parameters:
rocrate_path (str) – Path to directory with registered rocrate
data_dict (dict) – Information about dataset to add. See above for expected data
source_file (str) – Path to source file of dataset
skip_copy (bool) – If True skip the copy of source file into rocrate_path. Use this when source file already resides in rocrate_path
timeout (float) – Time in seconds to wait for registration of dataset to complete
- Returns:
id of dataset from FAIRSCAPE
- Return type:
- register_rocrate(rocrate_path, name='', organization_name='', project_name='', description='Please enter a description', keywords=[''], guid=None, timeout=30)[source]
Creates/registers RO-Crate in directory specified by rocrate_path. Upon completion a ro-crate-metadata.json file will be created in the directory
- register_software(rocrate_path, name='unknown', description='Must be at least 10 characters', author='', version='', file_format='', url='', date_modified=None, keywords=[''], guid=None, timeout=30)[source]
Registers software by adding information to ro-crate-metadata.json file stored in rocrate_path directory.
- Parameters:
name (str) – Name of software
description (str) – Description of software
author (str) – Author(s) of software
version (str) – Version of software
file_format (str) – Format of software file(s)
url (str) – URL to repository for software
date_modified (str)
keywords (list)
rocrate_path (str) – Path to directory with registered rocrate
timeout (float) – Time in seconds to wait for registration of ro-crate to complete
- Raises:
CellMapsProvenanceError – If FAIRSCAPE call fails
- Returns:
guid of software from FAIRSCAPE
- Return type:
- class cellmaps_utils.provenance.ROCrateProvenanceAttributes(name='Please enter a name', organization_name='Please enter an organization', project_name='Please enter a project', description='Please enter a description', keywords=[''])[source]
Bases:
object
Wrapper object to hold subset of RO-Crate provenance attributes
Constructor
- Parameters:
cellmaps_utils.basecmdtool module
Contains base class for all command line tools. Command line tools MUST subclass this
- class cellmaps_utils.basecmdtool.BaseCommandLineTool[source]
Bases:
object
Base class for all command line tools. Command line tools MUST subclass this
Constructor
- COMMAND = 'BaseCommandLineTool'
- static add_subparser(subparsers)[source]
Should add any argparse commandline arguments to subparsers passed in This must be implemented by sub classes and will always raise an error
- Parameters:
subparsers (argparse)
- Returns:
- run()[source]
Should contain logic that will be run by command line tool. This must be implemented by sub classes and will always raise an error
- Raises:
CellMapsError – will always raise this
- Returns:
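The subclassing contract can be sketched as follows. So that the example runs without the package installed, CommandLineToolSketch is a stand-in for BaseCommandLineTool; it raises NotImplementedError where the real base class is documented to raise CellMapsError. HelloTool, its --name flag, and the 0 return value are all hypothetical.

```python
import argparse


class CommandLineToolSketch:
    """Stand-in for BaseCommandLineTool so this sketch runs without the package."""
    COMMAND = 'BaseCommandLineTool'

    @staticmethod
    def add_subparser(subparsers):
        # The documented base class raises an error here as well.
        raise NotImplementedError('subclasses must implement add_subparser')

    def run(self):
        raise NotImplementedError('subclasses must implement run')


class HelloTool(CommandLineToolSketch):
    """Hypothetical tool registered under the 'hello' subcommand."""
    COMMAND = 'hello'

    def __init__(self, theargs):
        self._name = theargs.name

    @staticmethod
    def add_subparser(subparsers):
        parser = subparsers.add_parser(HelloTool.COMMAND, help='Say hello')
        parser.add_argument('--name', default='world')
        return parser

    def run(self):
        print('hello ' + self._name)
        return 0


parser = argparse.ArgumentParser()
subparsers = parser.add_subparsers(dest='command')
HelloTool.add_subparser(subparsers)
theargs = parser.parse_args(['hello', '--name', 'cellmaps'])
exit_code = HelloTool(theargs).run()
```

Each tool contributes its own subparser keyed by COMMAND, so a single entry point can dispatch to whichever tool the user named.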
cellmaps_utils.apmstool (AP-MS) module
Contains class that creates RO-Crate of AP-MS data from raw AP-MS tables.
- class cellmaps_utils.apmstool.APMSDataLoader(theargs, provenance_utils=<cellmaps_utils.provenance.ProvenanceUtil object>)[source]
Bases:
BaseCommandLineTool
Creates RO-Crate of AP-MS data from raw AP-MS tables
Constructor
- Parameters:
theargs (Namespace) – Command line arguments that at minimum need to have the following attributes:
- BAIT_COL_NAME = 'Bait'
- COMMAND = 'apmsconverter'
cellmaps_utils.iftool (Immunofluorescent Images) module
Contains classes that download images.
- class cellmaps_utils.iftool.FakeImageDownloader[source]
Bases:
ImageDownloader
Creates fake download by downloading the first image in each color from Human Protein Atlas and making renamed copies. The download_file() function is used to download the first image of each color
Constructor
- class cellmaps_utils.iftool.IFImageDataConverter(theargs, imgsuffix='.jpg', provenance_utils=<cellmaps_utils.provenance.ProvenanceUtil object>, imagedownloader=None)[source]
Bases:
BaseCommandLineTool
Converts IF Image data into format consumable by Cell Maps Pipeline
Constructor
- Parameters:
theargs (Namespace) – Command line arguments that at minimum need to have the following attributes:
- COMMAND = 'ifconverter'
- add_subparser()[source]
Adds command-line argument parsing for the IFImageDataConverter tool.
- Returns:
- run()[source]
Runs the process of converting IF Image data into a format consumable by the Cell Maps Pipeline. This includes generating a directory path for the RO-Crate, creating the output directory, registering the RO-Crate, filtering the input data based on criteria, downloading and organizing the images. It also handles the registration of datasets and computations in the FAIRSCAPE ecosystem.
- Returns:
- Return type:
- class cellmaps_utils.iftool.ImageDownloader[source]
Bases:
object
Abstract class that defines interface for classes that download images
- class cellmaps_utils.iftool.MultiProcessImageDownloader(poolsize=4, skip_existing=False, override_dfunc=None)[source]
Bases:
ImageDownloader
Uses multiprocess package to download images in parallel
Constructor
Warning
Exceeding poolsize of 4 causes errors from Human Protein Atlas site
- Parameters:
poolsize (int) – Number of concurrent downloaders to use.
skip_existing (bool) – If True skip download if image file exists and has size greater than 0
override_dfunc (function) – Function that takes a tuple (image URL, download str path) and downloads the image. If None, the download_file() function is used
- download_images(download_list=None)[source]
Downloads images returning a list of failed downloads
    from cellmaps_utils.iftool import MultiProcessImageDownloader

    dloader = MultiProcessImageDownloader(poolsize=2)

    d_list = [('https://images.proteinatlas.org/992/1_A1_1_red.jpg',
               '/tmp/1_A1_1_red.jpg')]
    failed = dloader.download_images(download_list=d_list)
- cellmaps_utils.iftool.download_file(downloadtuple)[source]
Downloads file pointed to by ‘download_url’ to ‘destfile’
Note
Default download function used by MultiProcessImageDownloader
cellmaps_utils.crisprtool (CRISPR) module
Contains class that creates RO-Crate of CRISPR data from raw CRISPR data files
- class cellmaps_utils.crisprtool.CRISPRDataLoader(theargs, provenance_utils=<cellmaps_utils.provenance.ProvenanceUtil object>)[source]
Bases:
BaseCommandLineTool
Creates RO-Crate of CRISPR data from raw CRISPR data files
Constructor
- Parameters:
theargs (Namespace) – Command line arguments that at minimum need to have the following attributes:
- COMMAND = 'crisprconverter'
cellmaps_utils.tabletool (RO-Crate Table) module
Contains class that creates table of meta data and links from one or more RO-Crates.
- class cellmaps_utils.tabletool.TableFromROCrates(theargs, provenance_utils=<cellmaps_utils.provenance.ProvenanceUtil object>)[source]
Bases:
BaseCommandLineTool
Creates table of meta data and links from one or more RO-Crates
Constructor
- Parameters:
theargs (Namespace) – Command line arguments that at minimum need to have the following attributes:
- CELL_LINE_COL = 'Cell Line'
- COLUMNS = ['FAIRSCAPE ARK ID', 'Date', 'Version', 'Type', 'Cell Line', 'Tissue', 'Treatment', 'Gene set', 'Generated By Software', 'Name', 'Description', 'Keywords', 'Download RO-Crate Data Package', 'Download RO-Crate Data Package Size MB', 'Generated By Software', 'Output Dataset', 'Responsible Lab']
- COMMAND = 'rocratetable'
- COMPUTATION_COL = 'Name'
- DATA_ROCRATE = 'Data'
- DATE_COL = 'Date'
- DESCRIPTION_COL = 'Description'
- DOWNLOAD_COL = 'Download RO-Crate Data Package'
- DOWNLOAD_COL_SIZE = 'Download RO-Crate Data Package Size MB'
- GENERATED_COL = 'Generated By Software'
- GENESET_COL = 'Gene set'
- ID_COL = 'FAIRSCAPE ARK ID'
- INTERMEDIATE_ROCRATE = 'Intermediate'
- KEYWORDS_COL = 'Keywords'
- MODEL_ROCRATE = 'Model'
- OTHER_ROCRATE = 'Other'
- OUTPUT_COL = 'Output Dataset'
- RESPONSIBLE_COL = 'Responsible Lab'
- TISSUE_COL = 'Tissue'
- TREATMENT_COL = 'Treatment'
- TYPE_COL = 'Type'
- VERSION_COL = 'Version'
cellmaps_utils.hidefconverter (HiDeF) module
Contains classes that convert a hierarchy network (in CX2 format) to a HiDeF format and vice versa.
- class cellmaps_utils.hidefconverter.HiDeFToHierarchyConverter(output_dir, nodes_file_path, edges_file_path, parent_edgelist_path=None, parent_ndex_url=None, host='ndexbio.org', parent_uuid=None, ndex_user=None, ndex_password=None)[source]
Bases:
object
A class to convert an edge list and node list in HiDeF format to a hierarchy in HCX.
Added in version 0.5.0: The class was added to enable conversion from HiDeF-formatted edge and node files to hierarchy in HCX.
Initializes the converter with file paths and optional parent network details.
Parent network can be specified in one of the following ways:
- edge list file
- link to interactome in NDEx
- UUID of NDEx network along with the host where the network is hosted. If the network is private, username and password need to be specified.
- Parameters:
output_dir (str) – Directory where the output files will be stored.
nodes_file_path (str) – File path for the nodes file.
edges_file_path (str) – File path for the edges file.
parent_edgelist_path (str, optional) – Path to the edge list of the interactome (optional).
parent_ndex_url (str, optional) – URL of parent interactome in NDEx (optional).
parent_uuid (str, optional) – UUID of the network in NDEx.
host (str, optional) – NDEx host.
ndex_user (str, optional) – NDEx username (optional).
ndex_password (str, optional) – NDEx password (optional).
- generate_hierarchy_hcx_file(hierarchy_filename='hierarchy.cx2', interactome_filename='hierarchy_parent.cx2')[source]
Generates the HiDeF hierarchy file in CX2 format. If the object is initialized with parent network’s edge list, the interactome in CX2 format will be generated in output directory as well. If the object is initialized with uuid of parent network, only hierarchy will be generated.
cellmaps_utils.ddotconverter (DDOT) module
Contains classes that convert an interaction or hierarchical network (in CX2 format) to DDOT format and vice versa.
- class cellmaps_utils.ddotconverter.DDOTToHierarchyConverter(output_dir, ontology_ddot_path, parent_ddot_path=None, parent_ndex_url=None, host='ndexbio.org', parent_uuid=None, ndex_user=None, ndex_password=None, hierarchy_filename='hierarchy.cx2', interactome_filename='hierarchy_parent.cx2')[source]
Bases:
object
Initializes the converter to create a hierarchy in CX2 from DDOT ontology file.
- Parameters:
output_dir (str) – Directory where the output files will be saved
ontology_ddot_path (str) – Path to the DDOT formatted ontology file
parent_ddot_path (str, optional) – Optional path to the parent interactome DDOT file
parent_ndex_url (str, optional) – Optional URL to the parent interactome in NDEx
host (str) – Hostname for the NDEx server
parent_uuid (str, optional) – UUID of the parent network in NDEx
ndex_user (str, optional) – Username for NDEx server authentication
ndex_password (str, optional) – Password for NDEx server authentication
hierarchy_filename (str) – Name for the output hierarchy file
interactome_filename (str) – Name for the output interactome file (parent network)
- class cellmaps_utils.ddotconverter.DDOTToInteractomeConverter(output_dir, interactome_ddot_path, interactome_file_name='interactome.cx2')[source]
Bases:
object
Initializes the converter to transform a DDOT formatted file to CX2 format.
- Parameters:
- class cellmaps_utils.ddotconverter.HierarchyToDDOTConverter(output_dir, hierarchy_path, ontology_file_name='ontology.ont')[source]
Bases:
object
Initializes the converter to transform a hierarchy data file in CX2 format into a DDOT ontology file.
- Parameters:
cellmaps_utils.hcx_utils (HCX) module
Contains functions that annotate a CX2 network with HCX annotations.
- cellmaps_utils.hcx_utils.add_hcx_members_annotation(hierarchy, interactome, gene_column='CD_MemberList')[source]
Adds the ‘HCX::members’ attribute to nodes in the hierarchy based on the interactome.
- Parameters:
hierarchy (CX2Network) – The hierarchical network in CX2 format.
interactome (CX2Network) – The interactome network.
gene_column (str, optional) – Column name containing gene members.
- cellmaps_utils.hcx_utils.add_hcx_network_annotations(hierarchy, interactome=None, output_dir='.', interactome_name='hierarchy_parent.cx2', host='www.ndexbio.org', uuid=None)[source]
Adds HCX network annotations to the hierarchy.
- Parameters:
hierarchy (CX2Network) – The hierarchical network in CX2 format.
interactome (CX2Network, optional) – The interactome network.
output_dir (str, optional) – Directory where the interactome file will be saved.
interactome_name (str, optional) – Name of the interactome file.
host (str, optional) – NDEx host for interactome retrieval.
uuid (str, optional) – UUID of the interactome in NDEx.
- Returns:
The updated hierarchy network with annotations.
- Return type:
CX2Network
- cellmaps_utils.hcx_utils.add_isroot_node_attribute(hierarchy, root_nodes)[source]
Using the root_nodes set or list, add HCX::isRoot to every node, setting the value to True if the node id is in root_nodes; otherwise set the value to False
- cellmaps_utils.hcx_utils.convert_hierarchical_network_to_hcx(hierarchy, interactome_url, ndex_username=None, ndex_password=None, gene_column='CD_MemberList')[source]
Converts a hierarchical network into HCX format by adding necessary annotations and interactome details.
- Parameters:
hierarchy (CX2Network or str) – The hierarchical network in CX2 format or a path to the CX2 file.
interactome_url (str) – URL of the interactome network in NDEx.
ndex_username (str, optional) – NDEx username for authentication.
ndex_password (str, optional) – NDEx password for authentication.
gene_column (str, optional) – Column name containing gene members.
- Returns:
The updated hierarchy network in CX2 format with HCX annotations.
- Return type:
CX2Network
- cellmaps_utils.hcx_utils.get_host_and_uuid_from_network_url(network_url)[source]
Extracts the host and UUID from a given NDEx network URL.
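As a rough illustration, the extraction can be done with urllib.parse. The sketch assumes the UUID is the last path segment of the URL and that a (host, uuid) pair is returned; neither detail is confirmed by this page, and host_and_uuid is a hypothetical name. The UUID in the example is a placeholder.

```python
from urllib.parse import urlparse


def host_and_uuid(network_url):
    """Hypothetical helper: split an NDEx network URL into (host, uuid)."""
    parsed = urlparse(network_url)
    # Assumption: the UUID is the last non-empty path segment.
    segments = [s for s in parsed.path.split('/') if s]
    if not parsed.netloc or not segments:
        raise ValueError('Unable to parse NDEx network URL: ' + str(network_url))
    return parsed.netloc, segments[-1]


host, uuid = host_and_uuid(
    'https://www.ndexbio.org/viewer/networks/'
    '00000000-0000-0000-0000-000000000000')
```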
- cellmaps_utils.hcx_utils.get_interactome(host, uuid, username, password, parent_edgelist)[source]
Retrieves the interactome either from NDEx or from a local edge list.
- Parameters:
- Returns:
A CX2Network object representing the interactome.
- Return type:
CX2Network
- cellmaps_utils.hcx_utils.get_root_nodes(hierarchy)[source]
Identifies the root nodes in a hierarchical network.
In CDAPS the root node has only source edges to children so this function counts up number of target edges for each node and the one with 0 is the root
- Returns:
root node ids
- Return type:
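The counting approach described above can be sketched on plain Python data instead of a CX2Network. find_root_nodes and isroot_attributes are hypothetical helpers, and the edge representation as (source, target) pairs pointing from parent to child is an assumption made for the example.

```python
def find_root_nodes(node_ids, edges):
    """Return ids of nodes that are never the target of an edge.

    edges is an iterable of (source, target) pairs pointing parent -> child.
    """
    target_counts = {node_id: 0 for node_id in node_ids}
    for _source, target in edges:
        target_counts[target] = target_counts.get(target, 0) + 1
    return {node_id for node_id, count in target_counts.items() if count == 0}


def isroot_attributes(node_ids, root_nodes):
    """Map every node id to True/False, mirroring the HCX::isRoot rule."""
    return {node_id: node_id in root_nodes for node_id in node_ids}


nodes = [0, 1, 2, 3]
edges = [(0, 1), (0, 2), (2, 3)]
roots = find_root_nodes(nodes, edges)
is_root = isroot_attributes(nodes, roots)
```

Node 0 is the only node that never appears as an edge target, so it is the only one flagged as a root.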
cellmaps_utils.hierdiff (Hierarchy comparison) module
Contains class that compares hierarchies (Jaccard similarity)
- class cellmaps_utils.hierdiff.HierarchyDiff[source]
Bases:
object
A class to compare two hierarchies in CX2 (HCX)
Constructor
- compare_hierarchies(hierarchy_a=None, hierarchy_b=None)[source]
Compare two hierarchies in CX2 format by calculating Jaccard overlaps and assigning a ‘robustness’ (overlap) score to each node in the first hierarchy.
- Parameters:
hierarchy_a (ndex2.cx2.CX2Network) – The first (reference) hierarchy to compare.
hierarchy_b (ndex2.cx2.CX2Network) – The second (alternative) hierarchy to compare against.
- Returns:
The first hierarchy with an added ‘robustness’ node attribute.
- Return type:
ndex2.cx2.CX2Network
- compare_hierarchies_from_files(hierarchy_a_path=None, hierarchy_b_path=None)[source]
Compares two hierarchies loaded from files, then calculates overlap-based scores.
- compute_hierarchy_robustness(ref_hierarchy, alt_hierarchies, ji_thre=0.4)[source]
Computes a robustness score for each node in a reference hierarchy based on its structural overlap across multiple alternative hierarchies. The overlap is measured using the Jaccard Index (JI), and a threshold that determines if a node is considered to have sufficient overlap in a given alternative hierarchy (values above the threshold are set to 1, while values below are set to 0). The higher the overlap across the alternative hierarchies, the higher the robustness score.
robustness = (# hierarchies where JI > ji_thre) / (total number of alternative hierarchies)
- Parameters:
- Returns:
The reference hierarchy with an added ‘robustness’ attribute for each node.
- Return type:
ndex2.cx2.CX2Network
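The robustness formula can be worked through on gene sets. This sketch assumes each hierarchy node is represented by its set of member genes and that, within one alternative hierarchy, the best-matching node decides whether the threshold is exceeded; the function names and the example sets are hypothetical.

```python
def jaccard(set_a, set_b):
    """Jaccard Index: size of intersection divided by size of union."""
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


def robustness(ref_members, alt_hierarchies, ji_thre=0.4):
    """Fraction of alternative hierarchies with a node whose JI exceeds ji_thre."""
    if not alt_hierarchies:
        return 0.0
    hits = 0
    for alt_nodes in alt_hierarchies:
        # Assumption: the best-matching node in each hierarchy is what counts.
        best = max((jaccard(ref_members, members) for members in alt_nodes),
                   default=0.0)
        if best > ji_thre:
            hits += 1
    return hits / len(alt_hierarchies)


ref = {'A', 'B', 'C', 'D'}
alts = [
    [{'A', 'B', 'C'}, {'X', 'Y'}],  # best JI = 3/4, above the threshold
    [{'A', 'X', 'Y', 'Z'}],         # best JI = 1/7, below the threshold
]
score = robustness(ref, alts)
```

One of the two alternative hierarchies contains a sufficiently overlapping node, so the reference node scores 1/2.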
cellmaps_utils.ndexupload (NDEx upload) module
Contains class that aids uploading hierarchy and interactome to NDEx
- class cellmaps_utils.ndexupload.NDExHierarchyUploader(ndexserver, ndexuser, ndexpassword, visibility=None)[source]
Bases:
object
Base class for uploading hierarchy and its parent network to NDEx.
Constructor
- Parameters:
- save_hierarchy_and_parent_network(hierarchy, parent_ppi)[source]
Saves both the hierarchy and its parent network to the NDEx server. This method first saves the parent network, then updates the hierarchy with HCX annotations based on the parent network’s UUID, and finally saves the updated hierarchy. It returns the UUIDs and URLs for both the hierarchy and the parent network.
- Parameters:
hierarchy (CX2Network) – The hierarchy network to be saved.
parent_ppi (CX2Network) – The parent protein-protein interaction network associated with the hierarchy.
- Returns:
UUIDs and URLs for both the parent network and the hierarchy.
- Return type:
- upload_hierarchy_and_parent_network_from_files(hier_dir=None, hierarchy_path=None, parent_path=None)[source]
Uploads hierarchy and parent network to NDEx from CX2 files. It first checks if hierarchy_path and parent_path are provided. If not provided, it tries to get them from hier_dir directory. If none is specified or cannot find hierarchy and parent in hier_dir, it raises an exception.
- Parameters:
- Returns:
UUIDs and URLs for both the hierarchy and parent network.
- Return type:
- Raises:
CellMapsError – If the required hierarchy or parent network files do not exist.
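The path lookup order described above might look like the sketch below. It is an illustration only: resolve_upload_paths is a hypothetical helper, the default file names are assumed to be the hierarchy prefixes plus the CX2 suffix from the constants module, and built-in exceptions stand in for CellMapsError.

```python
import os
import tempfile

# 'hierarchy' + '.cx2' mirrors HIERARCHY_NETWORK_PREFIX + CX2_SUFFIX from the
# constants module; using these as default file names is an assumption.
HIERARCHY_FILE = 'hierarchy.cx2'
PARENT_FILE = 'hierarchy_parent.cx2'


def resolve_upload_paths(hier_dir=None, hierarchy_path=None, parent_path=None):
    """Hypothetical helper: use explicit paths first, then fall back to hier_dir."""
    if hierarchy_path is None or parent_path is None:
        if hier_dir is None:
            raise ValueError('Set hierarchy_path and parent_path, or hier_dir')
        hierarchy_path = os.path.join(hier_dir, HIERARCHY_FILE)
        parent_path = os.path.join(hier_dir, PARENT_FILE)
    for path in (hierarchy_path, parent_path):
        if not os.path.isfile(path):
            # The documented method raises CellMapsError in this situation.
            raise FileNotFoundError(path)
    return hierarchy_path, parent_path


with tempfile.TemporaryDirectory() as hier_dir:
    # Create empty placeholder files so the lookup succeeds.
    for fname in (HIERARCHY_FILE, PARENT_FILE):
        open(os.path.join(hier_dir, fname), 'w').close()
    resolved = resolve_upload_paths(hier_dir=hier_dir)
    resolved_names = tuple(os.path.basename(p) for p in resolved)
```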
cellmaps_utils.music_utils module
Contains helper methods for MUSIC.
- cellmaps_utils.music_utils.canberra_similarity(df)[source]
Calculate Canberra similarity between each pair of rows in a DataFrame. Similarity scaled into [0, 1]
- Parameters:
df
- Returns:
- Return type:
- cellmaps_utils.music_utils.check_symmetric(a, rtol=1e-05, atol=1e-08)[source]
Check if the given numpy matrix is symmetric or not.
- Parameters:
a
rtol
atol
- Returns:
- cellmaps_utils.music_utils.cosine_similarity_scaled(df)[source]
Calculate Cosine similarity between each pair of rows in a DataFrame. Similarity scaled into [0, 1]
- Parameters:
df
- Returns:
- Return type:
- cellmaps_utils.music_utils.euclidean_similarity(df)[source]
Calculate Euclidean similarity between each pair of rows in a DataFrame. Similarity scaled into [0, 1]
- Parameters:
df
- Returns:
- Return type:
- cellmaps_utils.music_utils.jaccard(setA, setB)[source]
Calculates jaccard
- Parameters:
setA
setB
- Returns:
- cellmaps_utils.music_utils.kendall_scaled(df)[source]
Calculate Kendall correlation between each pair of rows in a DataFrame. Correlation scaled into [0, 1]
- Parameters:
df
- Returns:
- cellmaps_utils.music_utils.load_obj(fname, method='pickle')[source]
Loads object that was saved in pickle format
- Parameters:
- Raises:
ValueError – if method is not set to pickle or dill
- cellmaps_utils.music_utils.manhattan_similarity(df)[source]
Calculate Manhattan similarity between each pair of rows in a DataFrame. Similarity scaled into [0, 1]
- Parameters:
df
- Returns:
- Return type:
- cellmaps_utils.music_utils.pearson_scaled(df)[source]
Calculate Pearson correlation between each pair of rows in a DataFrame. Correlation scaled into [0, 1]
- Parameters:
df
- Returns:
- Return type:
- cellmaps_utils.music_utils.save_obj(obj, fname, method='pickle')[source]
- Parameters:
- Raises:
ValueError – if method is not set to pickle or dill
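A save and load round trip with the method check can be sketched with the standard library. The stand-in functions below support pickle only, because dill is a third-party package; their names and the example data are hypothetical, and they are not the package's implementation.

```python
import os
import pickle
import tempfile


def save_obj_sketch(obj, fname, method='pickle'):
    """Hypothetical stand-in for save_obj that supports pickle only."""
    if method != 'pickle':
        raise ValueError('method must be pickle in this sketch, got: ' + str(method))
    with open(fname, 'wb') as f:
        pickle.dump(obj, f)


def load_obj_sketch(fname, method='pickle'):
    """Hypothetical stand-in for load_obj that supports pickle only."""
    if method != 'pickle':
        raise ValueError('method must be pickle in this sketch, got: ' + str(method))
    with open(fname, 'rb') as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'similarity.pkl')
    original = {'geneA': [0.1, 0.9], 'geneB': [0.4, 0.6]}
    save_obj_sketch(original, path)
    restored = load_obj_sketch(path)
```

Only load pickle files from sources you trust, since unpickling can execute arbitrary code.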
- cellmaps_utils.music_utils.scaled_P_to_nm(scaled_P)[source]
TODO: Add doc here
- Parameters:
scaled_P
- Returns:
- cellmaps_utils.music_utils.spearman_scaled(df)[source]
Calculate Spearman correlation between each pair of rows in a DataFrame. Correlation scaled into [0, 1]
- Parameters:
df
- Returns:
- cellmaps_utils.music_utils.upper_tri_values(df)[source]
Return array with values of upper triangle of the DataFrame
- Parameters:
df (pandas.DataFrame) – Symmetric DataFrame
- Returns:
- Return type:
Constants module
Contains constants used by the various Cell Maps Tools
- cellmaps_utils.constants.ANTIBODY_GENE_TABLE_FILE = 'antibody_gene_table.tsv'
Antibody Gene Table file
- cellmaps_utils.constants.APMS_TSV_FILE = 'apms.tsv'
AP-MS tsv file
- class cellmaps_utils.constants.ArgParseFormatter(prog, indent_increment=2, max_help_position=24, width=None)[source]
Bases:
ArgumentDefaultsHelpFormatter, RawDescriptionHelpFormatter
Combine two argparse Formatters to get help and default values displayed when showing help
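Combining the two argparse formatters through multiple inheritance is a common pattern, sketched below with a hypothetical CombinedFormatter class. The parser, its --poolsize option, and the description text are made up for the example.

```python
import argparse


class CombinedFormatter(argparse.ArgumentDefaultsHelpFormatter,
                        argparse.RawDescriptionHelpFormatter):
    """Show argument defaults and keep the description's line breaks."""
    pass


parser = argparse.ArgumentParser(
    prog='demo',
    description='First line\n  indented second line',
    formatter_class=CombinedFormatter)
parser.add_argument('--poolsize', type=int, default=4,
                    help='Number of concurrent downloaders')
help_text = parser.format_help()
```

The resulting help text keeps the description exactly as written and appends the default value to each option's help string.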
- cellmaps_utils.constants.BLUE = 'blue'
Blue color directory name and color name in blue color files
- cellmaps_utils.constants.COEMBEDDING_STEP_DIR = '3.coembedding_fold'
Name of directory where co-embeddings are stored
- cellmaps_utils.constants.COLORS = ['red', 'blue', 'green', 'yellow']
List of colors
- cellmaps_utils.constants.COLOR_INDEXS = {'blue': 2, 'green': 1, 'red': 0, 'yellow': 0}
Indexes for colors in .jpeg image
- cellmaps_utils.constants.COLOR_LABELS_MAP = {'blue': 'Nucleus (DAPI)', 'green': 'protein of interest', 'red': 'Microtubules (Tubulin antibody)', 'yellow': 'ER (Calreticulin antibody)'}
Map of what each color refers to
- cellmaps_utils.constants.CO_EMBEDDING_FILE = 'coembedding_emd.tsv'
Name of file containing coembedding
- cellmaps_utils.constants.CX2_SUFFIX = '.cx2'
Suffix for files in CX 2.0 format: https://cytoscape.org/cx/cx2/specification/2022/12/01/cytoscape-exchange-format-specification-(version-2).html
- cellmaps_utils.constants.DATASET_AUTHOR = 'author'
Author of the dataset
- cellmaps_utils.constants.DATASET_CELL_LINE = 'cell_line'
Cell line
- cellmaps_utils.constants.DATASET_COLLECTION_SET = 'collection_set'
Collection set
- cellmaps_utils.constants.DATASET_GENE_SET = 'gene_set'
Gene set
- cellmaps_utils.constants.DATASET_INFO_FILE = 'dataset_info.json'
Name of file where information about a dataset is stored
- cellmaps_utils.constants.DATASET_NAME = 'name'
Name of the dataset
- cellmaps_utils.constants.DATASET_ORGANIZATION_NAME = 'organization_name'
Name of the organization
- cellmaps_utils.constants.DATASET_PROJECT_NAME = 'project_name'
Name of the project
- cellmaps_utils.constants.DATASET_RELEASE = 'release'
Release of the dataset
- cellmaps_utils.constants.DATASET_SLICE = 'slice'
Slice
- cellmaps_utils.constants.DATASET_TISSUE = 'tissue'
Tissue
- cellmaps_utils.constants.DATASET_TREATMENT = 'treatment'
Treatment
- cellmaps_utils.constants.ERROR_LOG_FILE = 'error.log'
Error log file name
- cellmaps_utils.constants.GREEN = 'green'
Green color directory name and color name in green color files
- cellmaps_utils.constants.HIERARCHYEVAL_STEP_DIR = '5.hierarchyeval'
Name of directory where hierarchy evaluations are stored
- cellmaps_utils.constants.HIERARCHY_NETWORK_PREFIX = 'hierarchy'
CX2 format hierarchy filename
- cellmaps_utils.constants.HIERARCHY_NODES_FILE = 'hierarchy_node_attributes.tsv'
Hierarchy node attributes file
- cellmaps_utils.constants.HIERARCHY_PARENT_NETWORK_PREFIX = 'hierarchy_parent'
CX2 format hierarchy parent (interactome) filename
- cellmaps_utils.constants.HIERARCHY_STEP_DIR = '4.hierarchy'
Name of directory where hierarchies are stored
- cellmaps_utils.constants.IMAGE_DOWNLOAD_STEP_DIR = '1.image_download'
Name of directory where downloaded images are stored
- cellmaps_utils.constants.IMAGE_EMBEDDING_FILE = 'image_emd.tsv'
Name of image embedding file
- cellmaps_utils.constants.IMAGE_EMBEDDING_STEP_DIR = '2.image_embedding_fold'
Name of directory where image embeddings are stored
- cellmaps_utils.constants.IMAGE_GENE_NODE_AMBIGUOUS_COL = 'ambiguous'
Ambiguous column
- cellmaps_utils.constants.IMAGE_GENE_NODE_ANTIBODY_COL = 'antibody'
Antibody name column
- cellmaps_utils.constants.IMAGE_GENE_NODE_ATTR_FILE = 'image_gene_node_attributes.tsv'
Image gene node attributes filename
- cellmaps_utils.constants.IMAGE_GENE_NODE_COLS = ['name', 'represents', 'ambiguous', 'antibody', 'filename', 'imageurl']
Columns in IMAGE_GENE_NODE_ATTR_FILE file
- cellmaps_utils.constants.IMAGE_GENE_NODE_ERRORS_FILE = 'image_gene_node_attributes.errors'
Image gene node attributes errors filename
- cellmaps_utils.constants.IMAGE_GENE_NODE_FILENAME_COL = 'filename'
File name column
- cellmaps_utils.constants.IMAGE_GENE_NODE_IMAGEURL_COL = 'imageurl'
Image URL column
- cellmaps_utils.constants.IMAGE_GENE_NODE_NAME_COL = 'name'
Gene Symbol name column
- cellmaps_utils.constants.IMAGE_GENE_NODE_REPRESENTS_COL = 'represents'
Ensembl ids column
- cellmaps_utils.constants.IMAGE_LABELS_PROBABILITY_FILE = 'labels_prob.tsv'
Name of image labels probability file
- cellmaps_utils.constants.LOG_FORMAT = '%(asctime)-15s %(levelname)s %(relativeCreated)dms %(filename)s::%(funcName)s():%(lineno)d %(message)s'
Sets format of logging messages
- cellmaps_utils.constants.OUTPUT_LOG_FILE = 'output.log'
Output log file name
- cellmaps_utils.constants.PERTURBATION_FILE = 'perturbation.h5ad'
Perturbation/CRISPRi h5ad file
- cellmaps_utils.constants.PPI_DOWNLOAD_STEP_DIR = '1.ppi_download'
Name of directory where downloaded PPI edge and baitlist are stored
- cellmaps_utils.constants.PPI_EDGELIST_COLS = ['geneA', 'geneB']
Columns in PPI_EDGELIST_FILE
- cellmaps_utils.constants.PPI_EDGELIST_FILE = 'ppi_edgelist.tsv'
Protein to Protein interaction edgelist file name
- cellmaps_utils.constants.PPI_EDGELIST_GENEA_COL = 'geneA'
First column name
- cellmaps_utils.constants.PPI_EDGELIST_GENEB_COL = 'geneB'
Second column name
- cellmaps_utils.constants.PPI_EMBEDDING_FILE = 'ppi_emd.tsv'
Name of Protein to Protein embedding file
- cellmaps_utils.constants.PPI_EMBEDDING_STEP_DIR = '2.ppi_embedding'
Name of directory where PPI embeddings are stored
- cellmaps_utils.constants.PPI_GENE_NODE_ATTR_FILE = 'ppi_gene_node_attributes.tsv'
Protein to Protein gene node attributes file
- cellmaps_utils.constants.PPI_GENE_NODE_COLS = ['name', 'represents', 'ambiguous', 'bait']
Columns in PPI_GENE_NODE_ATTR_FILE
- cellmaps_utils.constants.PPI_GENE_NODE_ERRORS_FILE = 'ppi_gene_node_attributes.errors'
Protein to Protein gene node attributes error filename
- cellmaps_utils.constants.PROVENANCE_ERRORS_FILE = 'provenance_errors.json'
Contains log of any failed fairscape-cli calls
- cellmaps_utils.constants.RED = 'red'
Red color directory name and color name in red color files
- cellmaps_utils.constants.RO_CRATE_METADATA_FILE = 'ro-crate-metadata.json'
rocrate metadata JSON file name
- cellmaps_utils.constants.TASK_FILE_PREFIX = 'task_'
Prefix for task file
- cellmaps_utils.constants.TASK_FINISH_FILE_SUFFIX = '_finish.json'
Suffix for task finish file
- cellmaps_utils.constants.TASK_START_FILE_SUFFIX = '_start.json'
Suffix for task start file
- cellmaps_utils.constants.WEIGHTED_PPI_EDGELIST_WEIGHT_COL = 'Weight'
weight column
- cellmaps_utils.constants.YELLOW = 'yellow'
Yellow color directory name and color name in yellow color files
Exceptions
Base error classes for Cell Maps Tools
Module contents
Top-level package for cellmaps_utils.