Outputs
This page describes the format of the outputs generated by the cellmaps_utilscmd.py tools, which either create RO-Crate directories or write a table file summarizing RO-Crate directories that have already been created.
RO-Crate directory summary table
Outputs described below are created by the invocation of cellmaps_utilscmd.py rocratetable
- data.tsv:
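Contents of the `tsv`_ file data.tsv (one header row, then one row per RO-Crate).
Columns, in order:
- FAIRSCAPE ARK ID
- Date
- Version
- Type
- Cell Line
- Tissue
- Treatment
- Gene set
- Generated By Software
- Name
- Description
- Keyword
- Download RO-Crate Data Package
- Download RO-Crate Data Package Size MB
- Generated By Software
- Output Dataset
- Responsible Lab
Example rows (columns that are empty in a row are omitted):
Row 1:
- FAIRSCAPE ARK ID: d4d80b1d-8d49-4204-8c0d-209c5b9ccdf2:cm4ai_chromatin_kolf2.1j_undifferentiated_untreated_crispr_4channel_0.1_alpha
- Date: 2024-04-29
- Version: 0.1 alpha
- Type: Data
- Cell Line: KOLF2.1J
- Tissue: undifferentiated
- Treatment: untreated
- Gene set: chromatin
- Name: CRISPR
- Description: CM4AI 0.1 alpha KOLF2.1J untreated CRISPR undifferentiated 4channel chromatin
- Keyword: CM4AI,0.1 alpha,KOLF2.1J,untreated,CRISPR,undifferentiated,4channel,chromatin
- Download RO-Crate Data Package: https://cm4ai.org/Data/cm4ai_chromatin_kolf2.1j_undifferentiated_untreated_crispr_4channel_0.1_alpha.tar.gz
- Download RO-Crate Data Package Size MB: 1
- Responsible Lab: Mali Lab
Row 2:
- FAIRSCAPE ARK ID: 134e01c8-90ea-457d-9e6e-ca046ecc860f:cm4ai_chromatin_mda-mb-468_paclitaxel_ifimage_0.1_alpha
- Date: 2024-04-29
- Version: 0.1 alpha
- Type: Data
- Cell Line: MDA-MB-468
- Tissue: breast; mammary gland
- Treatment: paclitaxel
- Gene set: chromatin
- Name: IF images
- Description: CM4AI 0.1 alpha MDA-MB-468 paclitaxel IF microscopy images breast; mammary gland chromatin
- Keyword: CM4AI,0.1 alpha,MDA-MB-468,paclitaxel,IF microscopy,images,breast; mammary gland,chromatin
- Download RO-Crate Data Package: https://cm4ai.org/Data/cm4ai_chromatin_mda-mb-468_paclitaxel_ifimage_0.1_alpha.tar.gz
- Download RO-Crate Data Package Size MB: 1
- Responsible Lab: Lundberg Lab
Row 3:
- FAIRSCAPE ARK ID: 7240c7d7-327c-423c-834d-1e99ab8a417b:cm4ai_chromatin_mda-mb-468_untreated_apms_0.1_alpha
- Date: 2024-04-29
- Version: 0.1 alpha
- Type: Data
- Cell Line: MDA-MB-468
- Tissue: breast; mammary gland
- Treatment: untreated
- Gene set: chromatin
- Name: AP-MS
- Description: CM4AI 0.1 alpha MDA-MB-468 untreated breast; mammary gland AP-MS edgelist chromatin
- Keyword: CM4AI,0.1 alpha,MDA-MB-468,untreated,breast; mammary gland,AP-MS edgelist,chromatin
- Download RO-Crate Data Package: https://cm4ai.org/Data/cm4ai_chromatin_mda-mb-468_untreated_apms_0.1_alpha.tar.gz
- Download RO-Crate Data Package Size MB: 1
- Responsible Lab: Krogan Lab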
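As a rough illustration of how the summary table might be consumed, the sketch below filters rows by cell line using only the Python standard library. It assumes data.tsv is tab-delimited with a single header row; the sample text and the helper name rows_for_cell_line are made up for this example and include only a few of the real columns.

```python
import csv
import io

# Two-row stand-in for data.tsv (hypothetical values, subset of the columns).
SAMPLE_TSV = (
    "FAIRSCAPE ARK ID\tDate\tCell Line\tTreatment\tResponsible Lab\n"
    "id-1\t2024-04-29\tKOLF2.1J\tuntreated\tMali Lab\n"
    "id-2\t2024-04-29\tMDA-MB-468\tpaclitaxel\tLundberg Lab\n"
)


def rows_for_cell_line(handle, cell_line):
    """Return the table rows (as dicts) whose 'Cell Line' column matches."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row for row in reader if row["Cell Line"] == cell_line]


# With a real file, pass open("data.tsv", newline="") instead of StringIO.
matches = rows_for_cell_line(io.StringIO(SAMPLE_TSV), "MDA-MB-468")
for row in matches:
    print(row["FAIRSCAPE ARK ID"], row["Treatment"], row["Responsible Lab"])
```

The header lists Generated By Software twice. csv.DictReader keeps only the last value when column names repeat, so csv.reader with positional access is the safer choice if both of those columns are needed.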
Perturbation/CRISPR
Outputs described below are created by the invocation of cellmaps_utilscmd.py crisprconverter
- perturbation.h5ad:
This file is a .h5ad file. Details about the data structure of this format can be found here: https://anndata.readthedocs.io/en/latest/. The data is a cell x gene matrix, with all preprocessing steps done in accordance with the single-cell best practices guide (https://www.sc-best-practices.org/conditions/perturbation_modeling.html#analysing-single-pooled-crispr-screens) up to section 19.4.5. The .X layer is set to the ‘X_pert’ output of the mixscape pipeline.
dataset_info.json
readme.txt
ro-crate-metadata.json
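One way to inspect perturbation.h5ad is with the third-party anndata package, sketched below. The helper names load_perturbation and describe are hypothetical, and the file path in the comment is an assumption; only anndata.read_h5ad and the n_obs and n_vars attributes come from the anndata library itself.

```python
def load_perturbation(path):
    """Read a .h5ad file into an AnnData object (requires the anndata package)."""
    import anndata  # third-party; imported lazily so describe() has no dependency

    return anndata.read_h5ad(path)


def describe(adata):
    """Summarize the cell x gene matrix held by an AnnData-like object."""
    return {
        "cells": adata.n_obs,   # number of rows in .X
        "genes": adata.n_vars,  # number of columns in .X
    }


# Typical use (the path is hypothetical):
# adata = load_perturbation("my_crispr_rocrate/perturbation.h5ad")
# print(describe(adata))
# adata.X is the matrix the text above describes as the mixscape 'X_pert' output.
```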
Affinity Purification Mass Spectrometry (AP-MS)
Outputs described below are created by the invocation of cellmaps_utilscmd.py apmsconverter
- apms.tsv:
Columns:
- Bait:
Name of the pulled-down (bait) protein.
- Prey:
UniProt ID of the protein identified by MS in the pull-down (putative bait interactor).
- PreyGene.x:
UniProt protein name of the protein identified by MS in the pull-down (putative bait interactor).
- Spec:
Spectral counts in each test biological replicate (separated by | ).
- SpecSum:
Sum of spectral counts in test samples.
- AvgSpec:
Average spectral count across replicates in test samples.
- NumReplicates.x:
Number of replicates in test samples.
- ctrlCounts:
Spectral counts in each control replicate (separated by | ).
- AvgP.x:
Average probability that an interaction is true, that is, a measure of the likelihood that a given interaction is a true positive rather than a random or non-specific interaction. Because it is a probability of a true interaction, a higher AvgP indicates higher confidence in the interaction being genuine.
- MaxP.x:
Maximum probability associated with a protein interaction in the context of its prey-bait pair. As with AvgP, a higher MaxP suggests a higher likelihood of the interaction being true.
- TopoAvgP.x:
Extension of the AvgP score that also takes the topology of the interaction network into consideration. It incorporates information about the hierarchical structure of the interaction data to provide a refined assessment of the interactions.
- TopoMaxP.x:
Topology-aware score that considers the maximum probability of an interaction in the context of the interaction network’s topology.
- SaintScore.x:
Composite score that integrates multiple aspects of the interaction data, including spectral counts and probability estimates. It is designed to prioritize interactions based on their strength and reliability. Higher SaintScores indicate interactions that are more likely to be true.
- logOddsScore:
Logarithm of the odds ratio between test and control conditions for each prey, used as a measure of interaction significance. The odds ratio compares the likelihood of the interaction occurring to the likelihood of it not occurring; taking its logarithm gives a more symmetric scale that is easier to compare across interactions. Higher logOddsScore values typically indicate stronger evidence for the interaction.
- FoldChange.x:
Ratio of the abundance of a protein or interaction in one experimental condition (test) to its abundance in another (control). It helps assess whether the abundance of a protein changes significantly between conditions.
- BFDR.x:
Bayesian False Discovery Rate
dataset_info.json
readme.txt
ro-crate-metadata.json
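The relationship between the Spec, SpecSum, and AvgSpec columns, together with a simple filter on BFDR.x, can be sketched as follows. This assumes apms.tsv is tab-delimited with a header row and that the counts in Spec are integers; the sample rows, the helper names, and the 0.05 BFDR cutoff are illustrative choices, not values taken from the tool.

```python
import csv
import io

# Made-up rows with a subset of the apms.tsv columns (illustrative values only).
SAMPLE_APMS = (
    "Bait\tPrey\tSpec\tBFDR.x\n"
    "BAIT1\tP00001\t3|5|4\t0.01\n"
    "BAIT1\tP00002\t0|1|0\t0.40\n"
)


def parse_spec(spec):
    """Split a '|'-separated Spec value into per-replicate integer counts."""
    return [int(count) for count in spec.split("|")]


def confident_interactions(handle, max_bfdr=0.05):
    """Yield (bait, prey, spec_sum, avg_spec) for rows with BFDR.x <= max_bfdr."""
    for row in csv.DictReader(handle, delimiter="\t"):
        if float(row["BFDR.x"]) > max_bfdr:
            continue  # skip rows above the (assumed) false discovery cutoff
        counts = parse_spec(row["Spec"])
        yield row["Bait"], row["Prey"], sum(counts), sum(counts) / len(counts)


hits = list(confident_interactions(io.StringIO(SAMPLE_APMS)))
print(hits)  # [('BAIT1', 'P00001', 12, 4.0)]
```

Here the recomputed sum and mean correspond to what SpecSum and AvgSpec report for the test replicates, which can serve as a quick consistency check when reading a real file.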
Size Exclusion Chromatography with Mass Spectrometry (SEC-MS)
Outputs described below are created by the invocation of cellmaps_utilscmd.py secmsconverter
TODO
Immunofluorescent Image (IFImage)
Outputs described below are created by the invocation of cellmaps_utilscmd.py ifconverter
- antibody_gene_table.tsv:
This .tsv file describes each image in the data set, one image per row. The columns describe the staining from which the image was taken.
Columns:
- Antibody ID:
ID of the antibody applied to stain the protein visible in the “green” channel. The antibody ID can be looked up at proteinatlas.org to find out more about the antibody.
- ENSEMBL ID:
ENSEMBL ID(s) of the gene(s) encoding the protein(s) visualized in the “green” channel.
- Treatment:
How the cells depicted in the image were treated (with Paclitaxel, Vorinostat, or untreated).
- Well:
Well coordinate on the 96-well plate.
- Region:
Unique identifier for the position in the well where the cells were acquired.
red, e.g. B2AI_1_Paclitaxel_C1_R1_z01_red.jpg
blue, e.g. B2AI_1_Paclitaxel_C1_R1_z01_blue.jpg
green, e.g. B2AI_1_Paclitaxel_C1_R1_z01_green.jpg
yellow, e.g. B2AI_1_Paclitaxel_C1_R1_z01_yellow.jpg
dataset_info.json
readme.txt
ro-crate-metadata.json
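The example image names listed above appear to encode treatment, well, region, and channel. The sketch below splits a name into those parts with a regular expression. The field layout is inferred from the four example names only, so the pattern, the meaning of the leading B2AI_1 prefix and of z01, and the function name parse_image_name should all be read as assumptions.

```python
import re

# Pattern inferred from names such as B2AI_1_Paclitaxel_C1_R1_z01_red.jpg.
# Assumption: <prefix>_<treatment>_<well>_<region>_<z>_<channel>.jpg
IMAGE_NAME = re.compile(
    r"^(?P<prefix>.+)_(?P<treatment>[^_]+)_(?P<well>[A-H]\d{1,2})"
    r"_(?P<region>R\d+)_(?P<z>z\d+)_(?P<channel>red|blue|green|yellow)\.jpg$"
)


def parse_image_name(name):
    """Return the parts of an image file name as a dict, or None if it does not match."""
    match = IMAGE_NAME.match(name)
    return match.groupdict() if match else None


info = parse_image_name("B2AI_1_Paclitaxel_C1_R1_z01_red.jpg")
print(info["treatment"], info["well"], info["region"], info["channel"])
```

The well and region values extracted this way could then be matched against the Well and Region columns of antibody_gene_table.tsv, if those columns use the same notation.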