Synthetic lethality arises where cell death is produced by loss or inhibition of function events in two or more genes - and where cells remain viable when any one of these events occurs in isolation [O'Neil N, et al. 2017, Wappett M, et al. 2016]. Therefore, synthetic lethality can be exploited to kill cancer cells; a classic example is where mutations in BRCA1 or BRCA2, part of the homologous recombination DNA repair pathway, result in dependency on DNA repair by PARP genes, making cells exquisitely sensitive to pharmacological inhibition of PARP1/2 [Lord C & Ashworth A, 2017, Bryant H, et al. 2005, McCabe N, et al. 2006].
Therefore, mutually exclusive loss is a key consequence of synthetic lethal relationships. SynLeGG enables exploration of mutually exclusive, or depleted, gene states in polyomics data across a large number of cell lines and tissues. In addition to having therapeutic potential, these dependencies may reveal gene functions.
SynLeGG presents results from the MultiSEp algorithm for RNA-seq data from the Cancer Cell Line Encyclopedia (CCLE) [Ghandi M et al. 2019]. MultiSEp applies Gaussian mixture modelling with expectation-maximisation to discover gene expression clusters, or modes, where cardinality is determined by Bayesian Information Criterion regularisation [Lubbock A et al. 2013]. This method allows categorisation of cell lines into different clusters based on the expression of a given gene, flexibly capturing between two and five different sub-populations of cell lines (Figures 1-3, below). MultiSEp assigns each cell line a probability of belonging to each cluster, and these probabilities determine cluster assignment; this matters, for example, when a cell line is positioned in a region of overlap between two clusters.
Figure 1. Two gene expression clusters identified by MultiSEp. A bimodal distribution is shown, where MultiSEp identified two clusters of cell lines from the CCLE data for EAF2. The first cluster is coloured red; the second cluster is shown in blue. Each of the two clusters could potentially be split further; however, MultiSEp found that the bimodal model was top-scoring when compared to the other models evaluated (from 2 to 5 clusters, scored using Bayesian Information Criterion). Therefore EAF2 gene expression was divided into two clusters.
Figure 2. Three gene expression clusters identified by MultiSEp. MultiSEp identified three clusters (modes) of cell lines from the CCLE data for CDK6. The first cluster is coloured red; the second cluster is shown in green, the third in blue. Each of the three clusters could potentially be split further; however, MultiSEp determined that the trimodal model was top-scoring when compared to the others evaluated (from 2 to 5 clusters, scored using Bayesian Information Criterion). Therefore three clusters were assigned to the CDK6 gene expression data.
Figure 3. Four gene expression clusters identified by MultiSEp. MultiSEp identified four clusters (modes) of cell lines from the CCLE data for FERMT3. The first cluster is coloured red; the second, third and fourth clusters are respectively shown in green, blue and purple. Each of the four clusters could potentially be split further; however, MultiSEp determined that the above model was top-scoring when compared to the others evaluated (from 2 to 5 clusters, scored using Bayesian Information Criterion). Therefore four clusters were assigned to the FERMT3 gene expression data.
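The probabilistic cluster assignment and model scoring described above can be sketched as follows. This is a minimal illustration, not the MultiSEp implementation: it assumes a univariate Gaussian mixture that has already been fitted, the component parameters are hypothetical, and the BIC is written in the common "lower is better" form, which may differ in sign convention from the scoring used by MultiSEp.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_probabilities(x, components):
    """Probability that expression value x belongs to each mixture component.

    components: list of (weight, mean, sd) tuples for an already-fitted mixture.
    """
    weighted = [w * normal_pdf(x, m, s) for w, m, s in components]
    total = sum(weighted)
    return [v / total for v in weighted]


def assign_cluster(x, components):
    """1-based index of the most probable cluster for expression value x."""
    probs = posterior_probabilities(x, components)
    return probs.index(max(probs)) + 1


def bic(log_likelihood, n_components, n_samples):
    """BIC for a 1-D Gaussian mixture (lower is better in this form).

    Free parameters: k means + k variances + (k - 1) mixing weights.
    """
    n_params = 3 * n_components - 1
    return n_params * math.log(n_samples) - 2.0 * log_likelihood


# Hypothetical two-component fit: (weight, mean, sd) per cluster.
components = [(0.6, 1.0, 0.5), (0.4, 4.0, 0.8)]
print(posterior_probabilities(2.2, components))  # a value in the overlap region
print(assign_cluster(2.2, components))
```

A cell line whose expression value lies between the two modes receives a non-trivial probability for both clusters, and is assigned to whichever is larger. Candidate models with two to five components would each be scored with `bic`, and the best-scoring model retained.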
The cell line by gene cluster assignment matrix is available to download here
MultiSEp
Gene expression clusters enable partitioning of effect scores from CRISPR screen data, and therefore may reveal pairwise gene dependencies. SynLeGG visualises MultiSEp mRNA expression clusters and the CRISPR scores from CERES, including statistical evaluation of dependency relationships (Figure 4). A log2 fold-change and two-tailed t-test p-value are calculated between the CRISPR scores for each consecutive pair of mRNA expression clusters. Only gene pairs with log2 fold-change values less than -0.1 between clusters and p-values less than 0.1 are returned; q-values are also provided [Benjamini & Hochberg 1995]. We strongly recommend taking the q-values as a baseline for determining statistical significance, because correction for multiple hypothesis testing is crucial to the statistical validity of results. Users who are following up a hypothesis for a single gene pair that arose from analysis of data outside SynLeGG might have grounds to refer to the trends described by the uncorrected p-values. SynLeGG also allows the user to explore tissue-specific penetrance (Figure 5).
Figure 4. Visualisation of NMT1 CRISPR scores by NMT2 gene expression clusters for all tissues represented in CCLE. Each dot represents a cell line and the tissue of origin is shown in the key. The NMT1 CRISPR score is shown on the y-axis, NMT2 gene expression clusters are shown on the x-axis. Lower CRISPR scores represent greater depletion with NMT1 loss of function and so correspond to a larger effect upon cell viability/growth. The CRISPR scores are shifted and scaled per cell line so that nonessential genes have a median score of 0 and essential genes have a median score of -1. The NMT1 CRISPR scores are lower in the first NMT2 expression cluster (left), relative to the second expression cluster (right). Therefore, low expression of NMT2 appears to potentiate the effect of NMT1 loss of function upon viability/growth in the CCLE cancer cell lines examined.
Figure 5. Visualisation of NMT1 CRISPR scores by NMT2 gene expression clusters for lung cancer cell lines. Each dot represents a lung cancer cell line. The NMT1 CRISPR score is shown on the y-axis, NMT2 gene expression clusters are shown on the x-axis. Lower CRISPR scores represent greater depletion with NMT1 loss of function and so correspond to a larger effect upon cell viability/growth. The NMT1 CRISPR scores are lower in the first NMT2 expression cluster (left), relative to the second expression cluster (right). Therefore, low expression of NMT2 appears to potentiate the effect of NMT1 loss of function upon viability/growth in the lung cancer cell lines shown.
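The thresholding and multiple-testing correction described above can be illustrated with a short sketch. The Benjamini-Hochberg step is the standard procedure; the record layout, the gene names and the choice to compute q-values over all tested pairs before filtering are assumptions made for illustration, and may not match the order of operations used by SynLeGG.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg q-values in the same order as the input p-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotonic.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


def filter_pairs(records, fc_cutoff=-0.1, p_cutoff=0.1):
    """Keep gene pairs with log2 fold-change < fc_cutoff and p-value < p_cutoff.

    Assumption: q-values are computed across every tested pair, then attached
    to the pairs that pass both thresholds.
    """
    qvals = benjamini_hochberg([r["pvalue"] for r in records])
    kept = []
    for record, q in zip(records, qvals):
        if record["log2fc"] < fc_cutoff and record["pvalue"] < p_cutoff:
            kept.append({**record, "qvalue": q})
    return kept


# Hypothetical test results for four mRNA/CRISPR gene pairs.
records = [
    {"pair": "GENE_A/GENE_B", "log2fc": -0.50, "pvalue": 0.01},
    {"pair": "GENE_C/GENE_D", "log2fc": -0.05, "pvalue": 0.04},  # fold-change too small
    {"pair": "GENE_E/GENE_F", "log2fc": -0.30, "pvalue": 0.03},
    {"pair": "GENE_G/GENE_H", "log2fc": -0.80, "pvalue": 0.50},  # p-value too large
]
for row in filter_pairs(records):
    print(row["pair"], round(row["qvalue"], 4))
```

Only the first and third hypothetical pairs pass both thresholds. Their q-values are larger than the raw p-values, which is why the q-value is the safer baseline when many gene pairs are screened at once.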
SynLeGG displays mutations that are enriched in mRNA expression clusters, providing insight into the genetic background relevant to candidate 'Achilles Heel' relationships suggested by analysis of the CRISPR screen and transcriptome data (Figure 6). A chi-squared test was performed for each gene expression / CRISPR gene pair to derive p-values for the enrichment of mutation classes across the gene expression clusters. Q-values were produced using Benjamini-Hochberg false discovery rate correction [Benjamini & Hochberg 1995].
Figure 6. Visualisation of NMT1 CRISPR scores by NMT2 gene expression clusters for lung cancer cell lines. Each dot represents a lung cancer cell line. The NMT1 CRISPR score is shown on the y-axis, NMT2 gene expression clusters are shown on the x-axis. The colour of each dot represents mutational status, as shown in the key. Cell lines with wild-type (WT) CACNA1B are coloured blue. There is strong enrichment of CACNA1B mutations in the NMT2 high expression group; moreover, cell lines with these mutations and high NMT2 expression are relatively unaffected by NMT1 loss of function in the CRISPR screen. These data are consistent with elevated NMT2 expression as a mechanism to compensate for loss of NMT1 function due to targeting by CRISPR in a CACNA1B mutant background. Similarly, cell lines with low NMT2 expression appear vulnerable to NMT1 loss.
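As a rough illustration of the mutation enrichment test, the sketch below computes a Pearson chi-squared statistic for a contingency table of mutation class by expression cluster. The counts are hypothetical, and the table layout (rows as mutation classes, columns as clusters) is an assumption rather than a description of the SynLeGG pipeline.

```python
def chi_squared_statistic(table):
    """Pearson chi-squared statistic and degrees of freedom for a contingency table.

    table: list of rows; each row holds counts of one mutation class
    (e.g. wild-type, mutant) across the gene expression clusters.
    Assumes every row and every column contains at least one cell line.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical counts: rows = [wild-type, mutant], columns = [cluster 1, cluster 2].
table = [
    [30, 10],
    [10, 30],
]
statistic, dof = chi_squared_statistic(table)
print(statistic, dof)
```

The p-value is the upper tail of the chi-squared distribution with `dof` degrees of freedom evaluated at `statistic`; with SciPy installed this could be obtained from `scipy.stats.chi2.sf(statistic, dof)`. The resulting p-values across genes would then be corrected with the Benjamini-Hochberg procedure.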
In addition to statistical comparison of CRISPR scores across mRNA expression modes, SynLeGG integrates further sources of evidence that support gene functional similarity, specifically:
- Overlapping Gene Ontology annotations [Carlson M et al. 2019, Yu G et al. 2010].
- Physical protein-protein interactions (PPIs) from BioGRID [Stark C et al. 2006].
- Paralogous gene pairs; defined by Ensembl and within human only [Zerbino D et al. 2018].
The checkboxes in the 'Optional Results Filters' panel limit the display of results to require the above additional evidence. When multiple checkboxes are ticked, the results shown combine the filters using a logical AND operator. For example, selecting both the 'PPI' and 'Gene Ontology' checkboxes displays results where the gene pairs have annotated PPIs and overlapping Gene Ontology terms in addition to passing the statistical filters outlined above.
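The logical AND behaviour of the filters can be sketched in a few lines. The field names (`ppi`, `paralogue`, `gene_ontology`) and the example gene pairs are hypothetical and serve only to show how ticking several checkboxes narrows the results.

```python
def apply_evidence_filters(pairs, required):
    """Keep gene pairs that carry every evidence type named in `required`.

    pairs: list of dicts with boolean evidence flags.
    required: names of the ticked checkboxes; an empty list keeps all pairs.
    """
    return [pair for pair in pairs if all(pair[flag] for flag in required)]


# Hypothetical gene pairs that already passed the statistical filters.
pairs = [
    {"pair": "GENE_A/GENE_B", "ppi": True, "paralogue": True, "gene_ontology": True},
    {"pair": "GENE_C/GENE_D", "ppi": True, "paralogue": False, "gene_ontology": False},
    {"pair": "GENE_E/GENE_F", "ppi": False, "paralogue": False, "gene_ontology": True},
]

# Ticking both 'PPI' and 'Gene Ontology' requires both kinds of evidence.
selected = apply_evidence_filters(pairs, ["ppi", "gene_ontology"])
print([p["pair"] for p in selected])
```

Because the filters are combined with AND, each additional ticked checkbox can only shrink the set of displayed gene pairs, never enlarge it.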
Integrated 'Achilles Heel' analysis
This section gives a succinct summary for each of the plot types available within SynLeGG.
The 'Integrated' plot shows a boxplot of CRISPR dependency scores. Please see Figure 17, below.
Visualisation
The CRISPR tab provides integrated gene expression and CRISPR scores for investigation of candidate 'Achilles-Heel' relationships. The 'Results' table (Figure 7) displays gene pairs that pass the MultiSEp thresholds; by default, rows are ordered by qvalue. Results may be filtered by clicking on any of the PPI, Paralogue and Gene Ontology checkboxes. For example, if the PPI box is checked then only mRNA/CRISPR gene pairs that share a physical protein-protein interaction in BioGRID will be returned. If both the 'PPI' and 'Paralogue' boxes are checked then only within-human paralogues with evidence for a physical protein interaction are displayed in the 'Results' table. The 'Search' bar above the Results table allows querying by gene name; there are also individual search boxes for the columns: mRNA_gene, crispr_gene, and cluster. The 'clusters' column refers to the number of gene expression clusters identified by MultiSEp and requires integer values between 2 and 5. For example, entering a value of 3 in the 'clusters' search box will return genes with three expression clusters. Clicking on the 'U' next to each gene name navigates to the relevant UniProt page (opens in a new tab). Clicking the download icon underneath the 'Results' table exports the contents as a .csv file.
Figure 7. Selecting a gene pair in the CRISPR tab. Gene pairs that have dependencies identified by MultiSEp are shown in the Results panel; the gene pair EAF2, EAF1 is selected (blue row). The 'Details' panel shows statistical and Gene Ontology information for the selected gene pairing (EAF2, EAF1).
Clicking on a row in the 'Results' table populates two additional panels: the 'Details' panel (Figure 7) and the 'Plot' panel (Figure 8). The 'Details' panel provides summary statistics and Gene Ontology information, if available, for the selected gene pair. The following statistical information is provided:
- 'Cell Line Number: Cluster X': The number of cell lines in gene expression cluster 'X' (where X has value between 1 and 5).
- 'Average CRISPR Score: Cluster X': The average CRISPR score for each gene expression cluster. Non-essential genes have a median value of 0, essential genes have a median value of -1. Therefore, lower values indicate a stronger effect for the CRISPR gene upon the cell lines in the given expression cluster.
- 'CRISPR Score Fold Change (Cluster X - Cluster Y)': The fold-change in CRISPR dependency score between neighbouring clusters (for example, cluster 2 - cluster 1).
- 'CRISPR Score P Value (Cluster X - Cluster Y)': The p-value relating to the change in CRISPR dependency score between neighbouring clusters (for example, cluster 2 - cluster 1).
- 'CRISPR Score Q Value (Cluster X - Cluster Y)': The q-value relating to the change in CRISPR dependency score between neighbouring clusters (for example, cluster 2 - cluster 1).
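The per-cluster summary values listed above can be approximated from exported data with a sketch like the one below. It assumes one CRISPR score and one 1-based cluster label per cell line, takes 'average' to mean the arithmetic mean, and reports the change between neighbouring clusters as a simple difference of averages, following the 'Cluster X - Cluster Y' labels; the exact calculation used by SynLeGG may differ.

```python
from statistics import mean


def cluster_summary(scores, clusters):
    """Summarise CRISPR scores by gene expression cluster.

    scores: CRISPR score per cell line.
    clusters: 1-based expression cluster label per cell line (same order).
    Returns (counts, averages, changes), where changes maps
    (higher_cluster, lower_cluster) to the difference in average score.
    """
    grouped = {}
    for score, cluster in zip(scores, clusters):
        grouped.setdefault(cluster, []).append(score)
    labels = sorted(grouped)
    counts = {c: len(grouped[c]) for c in labels}
    averages = {c: mean(grouped[c]) for c in labels}
    # Neighbouring clusters only, e.g. cluster 2 - cluster 1.
    changes = {
        (upper, lower): averages[upper] - averages[lower]
        for lower, upper in zip(labels, labels[1:])
    }
    return counts, averages, changes


# Hypothetical scores for four cell lines in two expression clusters.
scores = [-0.9, -0.7, -0.1, 0.1]
clusters = [1, 1, 2, 2]
counts, averages, changes = cluster_summary(scores, clusters)
print(counts, averages, changes)
```

In this hypothetical example the cell lines in cluster 1 have a much lower average CRISPR score than those in cluster 2, the pattern expected when low expression of one gene increases dependency on the other.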
The Gene Ontology Details table has two columns:
- 'Identifier': The identifiers for any overlapping Gene Ontology terms; all three branches are considered (i.e. Molecular Function, Biological Process and Cellular Component).
- 'Name': The name assigned to overlapping GO terms.
Clicking the download icon beneath the 'Gene Ontology Details' table exports all of the information in the 'Details' panel to a .csv file (for the selected gene pair).
The 'Plot' panel shows MultiSEp gene expression clusters and, by default, CRISPR scores from CERES for the selected gene pair ('Integrated' plot type). Several plot types are available from the 'Select Plot Type' drop down menu. The 'crispr_only' plot visualises the CRISPR scores for the 'crispr_gene' selected in the 'Results' panel. The 'mRNA_only' plot visualises the MultiSEp gene expression clusters for the 'mrna_gene' selected in the 'Results' panel. The 'protein' plot visualises protein concentrations from mass spectrometry, where available, for the 'mrna_gene' selected in the 'Results' panel. Clicking the download icon beneath the panel exports the data shown in the plot in .csv format.
Figure 8. The Plot Panel Provides Visualisation of CRISPR scores and Gene Expression Clusters. The Plot type menu is shown at the top of the image, and the 'Integrated' plot type is selected. EAF2 gene expression clusters from MultiSEp are on the x-axis and CRISPR scores are given on the y-axis. The dots represent cell lines and are coloured by tissue of origin, which is detailed in the key. Overall, cell lines in the high EAF2 expression cluster are robust to EAF1 loss of function. On the other hand, many cell lines in the EAF2 low expression cluster have a substantial loss of viability/growth when EAF1 is disrupted, indicated by low CRISPR score values. The median CRISPR score for non-essential genes is 0, the median CRISPR score for essential genes is -1. Please note that plots are generated dynamically and there may be some variability in the x coordinates of individual cell lines that fall within the same mRNA expression cluster.
A relatively small proportion of the exome is covered by the available mass spectrometry proteomics data, compared to transcriptome (RNA-seq) data. However, proteomic information is particularly useful in drug discovery applications because proteins form the molecular machines that ensure cellular function, and are the target of the vast majority of drugs. Protein mass-spectrometry data [Nusinow D et al. 2020] is accessible within the CRISPR tab in the drop-down menu of the 'Plot' section (select 'protein' in the menu). The protein concentration values are visualised in the gene expression clusters from MultiSEp analysis of the corresponding mRNA. Figure 9 shows protein expression for NMT2 gene expression clusters.
Figure 9. Visualisation of NMT2 protein concentrations for MultiSEp gene expression clusters. The y-axis shows protein concentration values, and mRNA expression clusters are given on the x-axis. All of the CCLE cell lines are displayed where mass spectrometry proteomic data is available for NMT2. The high mRNA expression cluster also has higher median NMT2 protein concentration, although some cell lines with a low NMT2 mRNA value have relatively high NMT2 protein concentration and vice versa. Please note that plots are generated dynamically and there may be some variability in the x coordinates of individual cell lines that fall within the same mRNA expression cluster.
Exploration of gene expression and CRISPR score relationships for individual tissues is available by using the 'Tissue Type' tab. A gene pair must be selected in the 'Results' panel of the 'All Tissue' tab (Figure 7) before navigating to 'Tissue Type' (Figure 10). The 'Tissue Breakdown' table displays individual tissue types for the gene pair selected within the 'All Tissue' tab. Text entered into the 'Search' bar applies a filter to select tissues with matching names. Selecting a row in the 'Tissue Breakdown' table populates the 'Details' and 'Tissue Specific Plot' (Figure 11) panels with information for the tissue of interest. The 'Details' panel (Figure 10) provides summary statistics and includes a 'Top Mutations' table that lists genes for the selected tissue with significantly different mutational profiles between the MultiSEp gene expression clusters. By default, the 'Top Mutations' table is ordered by p-value. The following statistical information is provided:
- 'Cell Line Number: Cluster X': The number of cell lines for the selected tissue in gene expression cluster 'X' (where X has value between 1 and 5).
- 'Average CRISPR Score: Cluster X': The average CRISPR score for each gene expression cluster in the selected tissue. Non-essential genes have a median value of 0, essential genes have a median value of -1. Therefore, lower values indicate a stronger effect for the CRISPR gene upon the cell lines in the given expression cluster.
- 'CRISPR Score Fold Change (Cluster X - Cluster Y)': The fold-change in CRISPR dependency score between neighbouring clusters (for example, cluster 2 - cluster 1) in the selected tissue.
- 'CRISPR Score P Value (Cluster X - Cluster Y)': The p-value relating to the change in CRISPR dependency score between neighbouring clusters (for example, cluster 2 - cluster 1) in the selected tissue.
The 'Top Mutations' table has three columns:
- 'mutation_gene': The gene for which mutations were evaluated across the MultiSEp gene expression clusters.
- 'tissue': The selected tissue.
- 'pvalue': Chi-squared test p-value for the enrichment of mutation classes for the mutation_gene across the MultiSEp gene expression clusters.
Selecting a row in the 'Top Mutations' table displays mutations for the mutation_gene in the 'Tissue Specific Plot' panel, if the 'Integrated' plot type is selected (Figure 12).
Figure 10. Tissue-specific Analysis in the 'Tissue Type' Tab. Selecting the 'Tissue Type' tab displays the 'Tissue Breakdown' table for the gene pair that was selected in the 'All Tissue' tab; EAF2, EAF1 was selected in this example. The row for Lung Cancer is selected (blue) in the 'Tissue Breakdown' table; relevant summary statistics and mutations are displayed in the 'Details' panel (right), including the 'Top Mutations' table (bottom right).
The 'Tissue Specific Plot' panel is shown in Figure 11, which gives MultiSEp gene expression clusters and, by default, CRISPR scores derived from CERES for the selected gene pair and tissue ('Integrated' plot type). Several plot types are available from the 'Select Plot Type' drop down menu. The 'crispr_only' plot visualises the CRISPR scores for the 'crispr_gene' and selected tissue from the 'Tissue Breakdown' panel. The 'mRNA_only' plot visualises the MultiSEp gene expression clusters for the 'mrna_gene' and selected tissue in the 'Tissue Breakdown' panel. The 'protein' plot visualises protein concentrations from mass spectrometry, where available, for the 'mrna_gene' and selected tissue in the 'Tissue Breakdown' panel. Clicking the download icon beneath the panel exports the data shown in the displayed plot in .csv format.
Figure 11. The Tissue Specific Plot Panel Provides Visualisation of CRISPR scores and Gene Expression Clusters. The Plot type menu is shown at the top of the image, and the 'Integrated' plot type is selected. EAF2 gene expression clusters from MultiSEp are on the x-axis and CRISPR scores are given on the y-axis. The dots represent lung cancer cell lines. Overall, cell lines in the high EAF2 expression cluster are robust to EAF1 loss of function. On the other hand, many cell lines in the EAF2 low expression cluster have a substantial loss of viability/growth when EAF1 is disrupted, indicated by low CRISPR score values. The median CRISPR score for non-essential genes is 0, the median CRISPR score for essential genes is -1. Please note that plots are generated dynamically and there may be some variability in the x coordinates of individual cell lines that fall within the same mRNA expression cluster.
Figure 12. Integrated Visualisation of CRISPR Scores, Gene Expression and Mutations in the Tissue Specific Plot Panel. The Plot type menu is shown at the top of the image, and the 'Integrated' plot type is selected. EAF2 gene expression clusters from MultiSEp are on the x-axis and CRISPR scores are given on the y-axis. The dots represent lung cancer cell lines, coloured by mutations in the gene selected in the 'Top Mutations' panel. Please note that plots are generated dynamically and there may be some variability in the x coordinates of individual cell lines that fall within the same mRNA expression cluster.
Browse & Query
Documentation
SynLeGG is free of charge for all to use (including for commercial use). We make every effort to ensure that the software is free from errors, however all usage of SynLeGG implies acceptance of the disclaimer notice. Usage of SynLeGG also implies acceptance of these Terms of Use and the Privacy & Cookie Policy.
If you use SynLeGG in any published academic or commercial materials, please cite:
Wappett M, Harris A, Lubbock ALR, Lobb I, McDade S, Overton IM. SynLeGG: finding ‘Achilles Heel’ relationships in large CRISPR screens using multi-modal mRNA expression. Manuscript in preparation. URL: www.overton-lab.uk/synlegg
DepMap, Broad (2020): DepMap 20Q3 Public. figshare. Dataset doi:10.6084/m9.figshare.12931238.v1.
Robin M. Meyers, Jordan G. Bryan, James M. McFarland, Barbara A. Weir, ... David E. Root, William C. Hahn, Aviad Tsherniak. Computational correction of copy number effect improves specificity of CRISPR-Cas9 essentiality screens in cancer cells. Nature Genetics 2017 October 49:1779–1784. doi:10.1038/ng.3984
Dempster, J. M., Rossen, J., Kazachkova, M., Pan, J., Kugener, G., Root, D. E., & Tsherniak, A. (2019). Extracting Biological Insights from the Project Achilles Genome-Scale CRISPR Screens in Cancer Cell Lines. BioRxiv, 720243.
Mahmoud Ghandi, Franklin W. Huang, Judit Jané-Valbuena, Gregory V. Kryukov, ... Todd R. Golub, Levi A. Garraway & William R. Sellers. 2019. Next-generation characterization of the Cancer Cell Line Encyclopedia. Nature 569, 503–508 (2019).
Please also see the DepMap terms and the CC Attribution 4.0 (CC BY) license.
We hope you find SynLeGG a useful research tool. We are always keen to hear feedback and comments. Please contact us with your views, they are much appreciated.
DISCLAIMER OF LIABILITIES AND WARRANTIES
THE SYNLEGG WEBSITE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS AND/OR QUEEN'S UNIVERSITY BELFAST AND/OR ALMAC DIAGNOSTICS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SYNLEGG WEBSITE OR THE USE OR OTHER DEALINGS IN THE SYNLEGG WEBSITE.
QUEEN'S UNIVERSITY BELFAST AND/OR ALMAC DIAGNOSTICS MAKE NO WARRANTIES ABOUT THE ACCURACY, RELIABILITY, COMPLETENESS, OR TIMELINESS OF THE MATERIAL, SERVICES, SOFTWARE, TEXT, GRAPHICS AND LINKS. DESCRIPTION OF OR REFERENCES TO PRODUCTS OR PUBLICATIONS DOES NOT IMPLY ENDORSEMENT OF THAT PRODUCT OR PUBLICATION.
BY USING THE SYNLEGG WEBSITE YOU AGREE TO ASSUME ALL RISKS ASSOCIATED WITH YOUR USE OR TRANSFER OF ANY AND ALL INFORMATION CONTAINED ON THIS SITE AND TO HOLD THE AUTHORS AND/OR QUEEN'S UNIVERSITY BELFAST AND/OR ALMAC DIAGNOSTICS HARMLESS FROM ANY CLAIMS RELATING TO CONTENT OR INFORMATION IN EXCHANGE FOR YOUR USE OF THE SITE.
THE AUTHORS AND/OR QUEEN'S UNIVERSITY BELFAST AND/OR ALMAC DIAGNOSTICS DOES NOT WARRANT THE SYNLEGG WILL OPERATE ERROR-FREE OR THAT THIS SITE AND ITS SERVER ARE FREE OF COMPUTER VIRUSES OR OTHER HARMFUL MECHANISMS. IF YOUR USE OR TRANSFER OF THIS SITE OR THE MATERIALS RESULTS IN THE NEED FOR SERVICING OR REPLACING EQUIPMENT OR DATA, THE AUTHORS AND/OR QUEEN'S UNIVERSITY BELFAST AND/OR ALMAC DIAGNOSTICS IS NOT RESPONSIBLE FOR THOSE COSTS.
INDEMNITY
YOU AGREE TO DEFEND, INDEMNIFY AND HOLD HARMLESS THE QUEEN'S UNIVERSITY BELFAST AND/OR ALMAC DIAGNOSTICS, ITS OFFICERS, DIRECTORS, EMPLOYEES AND AGENTS FROM AND AGAINST ANY CLAIMS ACTIONS OR DEMANDS, (INCLUDING WITHOUT LIMITATION ALL LEGAL AND ACCOUNTING FEES) WHICH MAY ARISE DUE TO YOUR USE OR TRANSFER OF THE SYNLEGG WEBSITE, ITS CONTENTS, OR INFORMATION THEREOF.
By using the SynLeGG website, you consent to the collection, retention and use of your personal information in accordance with the terms of this policy.
We employ security measures to protect your information from access by unauthorised persons and against unlawful processing, accidental loss, destruction or damage.
Unfortunately, the transmission of information via the internet is never completely secure. Although we will do our best to protect your personal data, we cannot guarantee the security of your personal or other data transmitted to our website. Any transmission is at your own risk.
Once we have received your information, we will use strict procedures and security features to prevent unauthorised access.
To cite SynLeGG, please reference the publication below:
Wappett M, Harris A, Lubbock ALR, Lobb I, McDade S, Overton IM. SynLeGG: finding ‘Achilles Heel’ relationships in large CRISPR screens using multi-modal mRNA expression. Nucleic Acids Research, Volume 49, Issue W1, 2 July 2021, Pages W613–W618, DOI: https://doi.org/10.1093/nar/gkab338
The CRISPR tab provides integrated gene expression and CRISPR scores for investigation of candidate 'Achilles-Heel' relationships. The 'Results' table (Figure 7) displays gene pairs that pass the MultiSEp thresholds; by default, rows are ordered by qvalue. Results may be filtered by clicking on any of the PPI, Paralogue and Gene Ontology checkboxes. For example, if the PPI box is checked then only mRNA/CRISPR gene pairs that share a physical protein-protein interaction in BioGRID will be returned. If both the 'PPI' and 'Paralogue' boxes are checked then only within-human paralogues with evidence for a physical protein interaction are displayed in the 'Results' table. The 'Search' bar above the Results table allows querying by gene name; there are also individual search boxes for the columns: mRNA_gene, crispr_gene, and cluster. The 'clusters' column refers to the number of gene expression clusters identified by MultiSEp and requires integer values between 2 and 5. For example, entering a value of 3 in the 'clusters' search box will return genes with three expression clusters. Clicking on the 'U' next to each gene name navigates to the relevant UniProt page (opens in a new tab). Clicking the download icon underneath the 'Results' table exports the contents as a .csv file.
Figure 7. Selecting a gene pair in the CRISPR tab. Gene pairs that have dependencies identified by MultiSEp are shown in the Results panel, the gene pair EAF2, EAF1 is selected (blue row). The 'Details' panel shows statistical and Gene Ontology information for the selected gene pairing (EAF2, EAF1).
Clicking on a row in the 'Results' table populates two additional panels; the 'Details' panel (Figure 7) and the 'Plot' panel (Figure 8). The 'Details' panel provides summary statistics and Gene Ontology information, if available, for the selected gene pair. The following statistical information is provided:
- 'Cell Line Number: Cluster X': The number of cell lines in gene expression cluster 'X' (where X has value between 1 and 5).
- 'Average CRISPR Score: Cluster X': The average CRISPR score for each gene expression cluster. Non-essential genes have a median value of 0, essential genes have a median value of -1. Therefore, lower values indicate a stronger effect for the CRISPR gene upon the cell lines in the given expression cluster.
- 'CRISPR Score Fold Change (Cluster X - Cluster Y): The fold-change in CRISPR dependency score between neighbouring clusters (for example, cluster 2 - cluster 1).
- 'CRISPR Score P Value (Cluster X - Cluster Y): The p value relating to the change in CRISPR dependency score between neighbouring clusters (for example, cluster 2 - cluster 1).
- 'CRISPR Score Q Value (Cluster X - Cluster Y): The q value relating to the change in CRISPR dependency score between neighbouring clusters (for example, cluster 2 - cluster 1).
The Gene Ontology Details table has two columns:
- 'Identifier': The identifiers for any overlapping Gene Ontology terms, all three branches are considered (i.e. Molecular Function, Biological Process and Cellular Component).
- 'Name': The name assigned to overlapping GO terms.
Clicking the download icon beneath the 'Gene Ontology Details' table exports all of the information in the 'Details' panel to a .csv file (for the selected gene pair).
The 'Plot' panel shows MultiSEp gene expression clusters and, by default, CRISPR scores from CERES for the selected gene pair ('Integrated' plot type). Several plot types are available from the 'Select Plot Type' drop down menu. The 'crispr_only' plot visualises the CRISPR scores for the 'crispr_gene' selected in the 'Results' panel. The 'mRNA_only' plot visualises the MultiSEp gene expression clusters for the 'mrna_gene' selected in the 'Results' panel. The 'protein' plot visualises protein concentrations from mass spectrometry, where available, for the 'mrna_gene' selected in the 'Results' panel. Clicking the download icon beneath the panel exports the data shown in the plot in .csv format.
Figure 8. The Plot Panel Provides Visualisation of CRISPR scores and Gene Expression Clusters. The Plot type menu is shown at the top of the image, and the 'Integrated' plot type is selected. EAF2 gene expression clusters from MultiSEp are on the x-axis and CRISPR scores are given on the y-axis. The dots represent cell lines and are coloured by tissue of origin, which is detailed in the key. Overall, cell lines in the high EAF2 expression cluster are robust to EAF1 loss of function. On the other hand, many cell lines in the EAF2 low expression cluster have a substantial loss of viability/growth when EAF1 is disrupted, indicated by low CRISPR score values. The median CRISPR score for non-essential genes is 0, the median CRISPR score for essential genes is -1. Please note that plots are generated dynamically and there may be some variability in the x coordinates of individual cell lines that fall within the same mRNA expression cluster.
A relatively small proportion of the exome is covered by the available mass spectrometry proteomics data, compared to transcriptome (RNA-seq) data. However, proteomic information is particularly useful in drug discovery applications because proteins form the molecular machines that ensure cellular function, and are the target of the vast majority of drugs. Protein mass-spectrometry data [Nusinow D et al. 2020] is accessible within the CRISPR tab in the drop-down menu of the 'Plot' section (select 'protein' in the menu). The protein concentration values are visualised in the gene expression clusters from MultiSEp analysis of the corresponding mRNA. Figure 9 shows protein expression for NMT2 gene expression clusters.
Figure 9. Visualisation of NMT2 protein concentrations for MultiSEp gene expression clusters. The y-axis shows protein concentration values, and mRNA expression clusters are given on the x-axis. All of the CCLE cell lines are displayed where mass spectrometry proteomic data is available for NMT2. The high mRNA expression cluster also has higher median NMT2 protein concentration, although some cell lines with a low NMT2 mRNA value have relatively high NMT2 protein concentration and vice versa. Please note that plots are generated dynamically and there may be some variability in the x coordinates of individual cell lines within that fall within the same mRNA expression cluster.
Exploration of gene expression and CRISPR score relationships for individual tissues is available by using the 'Tissue Type' tab. A gene pair must be selected in the 'Results' panel of the 'All Tissue' tab (Figure 7) before navigating to 'Tissue Type' (Figure 10). The 'Tissue Breakdown' table displays individual tissue types for the gene pair selected within the 'All Tissue' tab. Text entered into the 'Search' bar applies a filter to select tissues with matching names. Selecting a row in the 'Tissue Breakdown' table populates the 'Details' and 'Tissue Specific Plot' (Figure 11) panels with information for the tissue of interest. The 'Details' panel (Figure 10) provides summary statistics and includes a 'Top Mutations' table that lists genes for the selected tissue with significantly different mutational profiles between the MultiSEp gene expression clusters. By default, the 'Top Mutations' table is ordered by p-value. The following statistical information is provided:
- 'Cell Line Number: Cluster X': The number of cell lines for the selected tissue in gene expression cluster 'X' (where X has value between 1 and 5).
- 'Average CRISPR Score: Cluster X': The average CRISPR score for each gene expression cluster in the selected tissue. Non-essential genes have a median value of 0, essential genes have a median value of -1. Therefore, lower values indicate a stronger effect for the CRISPR gene upon the cell lines in the given expression cluster.
- 'CRISPR Score Fold Change (Cluster X - Cluster Y)': The fold-change in CRISPR dependency score between neighbouring clusters (for example, cluster 2 - cluster 1) in the selected tissue.
- 'CRISPR Score P Value (Cluster X - Cluster Y)': The p value relating to the change in CRISPR dependency score between neighbouring clusters (for example, cluster 2 - cluster 1) in the selected tissue.
The 'Top Mutations' table has three columns:
- 'mutation_gene': The gene for which mutations were evaluated across the MultiSEp gene expression clusters.
- 'tissue': The selected tissue.
- 'pvalue': Chi-squared test p-value for the enrichment of mutation classes for the mutation_gene across the MultiSEp gene expression clusters.
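As an illustration of the kind of test behind the 'pvalue' column, the sketch below computes a Pearson chi-squared statistic for a table of mutation classes against expression clusters. It is a generic textbook calculation under stated assumptions, not SynLeGG's own implementation. The counts are invented, and the function assumes every row and column total is non-zero.

```python
def chi_squared_statistic(table):
    """Return (statistic, degrees_of_freedom) for a contingency table.

    `table` is a list of rows; each row holds the observed counts for one
    mutation class across the expression clusters. Assumes that no row or
    column sums to zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if mutation class and cluster were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof

# Invented counts: rows are WT and Missense_Mutation, columns are clusters 1 and 2.
observed = [
    [10, 30],
    [20, 10],
]
stat, dof = chi_squared_statistic(observed)
print(round(stat, 3), dof)
```

The p-value is then the upper tail probability of a chi-squared distribution with `dof` degrees of freedom, evaluated at `stat`.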
Selecting a row in the 'Top Mutations' table displays mutations for the mutation_gene in the 'Tissue Specific Plot' panel, if the 'Integrated' plot type is selected (Figure 12).
Figure 10. Tissue-specific Analysis in the 'Tissue Type' Tab. Selecting the 'Tissue Type' tab displays the 'Tissue Breakdown' table for the gene pair that was selected in the 'All Tissue' tab; EAF2, EAF1 was selected in this example. The row for Lung Cancer is selected (blue) in the 'Tissue Breakdown' table; relevant summary statistics and mutations are displayed in the 'Details' panel (right), including the 'Top Mutations' table (bottom right).
The 'Tissue Specific Plot' panel is shown in Figure 11, which gives MultiSEp gene expression clusters and, by default, CRISPR scores derived from CERES for the selected gene pair and tissue ('Integrated' plot type). Several plot types are available from the 'Select Plot Type' drop-down menu. The 'crispr_only' plot visualises the CRISPR scores for the 'crispr_gene' and selected tissue from the 'Tissue Breakdown' panel. The 'mRNA_only' plot visualises the MultiSEp gene expression clusters for the 'mrna_gene' and selected tissue in the 'Tissue Breakdown' panel. The 'protein' plot visualises protein concentrations from mass spectrometry, where available, for the 'mrna_gene' and selected tissue in the 'Tissue Breakdown' panel. Clicking the download icon beneath the panel exports the data shown in the displayed plot in .csv format.
Figure 11. The Tissue Specific Plot Panel Provides Visualisation of CRISPR scores and Gene Expression Clusters. The Plot type menu is shown at the top of the image, and the 'Integrated' plot type is selected. EAF2 gene expression clusters from MultiSEp are on the x-axis and CRISPR scores are given on the y-axis. The dots represent lung cancer cell lines. Overall, cell lines in the high EAF2 expression cluster are robust to EAF1 loss of function. On the other hand, many cell lines in the EAF2 low expression cluster have a substantial loss of viability/growth when EAF1 is disrupted, indicated by low CRISPR score values. The median CRISPR score for non-essential genes is 0; the median CRISPR score for essential genes is -1. Please note that plots are generated dynamically and there may be some variability in the x coordinates of individual cell lines that fall within the same mRNA expression cluster.
Figure 12. Integrated Visualisation of CRISPR Scores, Gene Expression and Mutations in the Tissue Specific Plot Panel. The Plot type menu is shown at the top of the image, and the 'Integrated' plot type is selected. EAF2 gene expression clusters from MultiSEp are on the x-axis and CRISPR scores are given on the y-axis. The dots represent lung cancer cell lines, coloured by mutations in the gene selected in the 'Top Mutations' panel. Please note that plots are generated dynamically and there may be some variability in the x coordinates of individual cell lines that fall within the same mRNA expression cluster.
The CRISPR tab provides integrated gene expression and CRISPR scores for investigation of candidate 'Achilles-Heel' relationships. The 'Results' table (Figure 7) displays gene pairs that pass the MultiSEp thresholds; by default, rows are ordered by qvalue. Results may be filtered by clicking on any of the PPI, Paralogue and Gene Ontology checkboxes. For example, if the PPI box is checked then only mRNA/CRISPR gene pairs that share a physical protein-protein interaction in BioGRID will be returned. If both the 'PPI' and 'Paralogue' boxes are checked then only within-human paralogues with evidence for a physical protein interaction are displayed in the 'Results' table. The 'Search' bar above the 'Results' table allows querying by gene name; there are also individual search boxes for the columns: mRNA_gene, crispr_gene, and clusters. The 'clusters' column refers to the number of gene expression clusters identified by MultiSEp and requires integer values between 2 and 5. For example, entering a value of 3 in the 'clusters' search box will return genes with three expression clusters. Clicking on the 'U' next to each gene name navigates to the relevant UniProt page (opens in a new tab). Clicking the download icon underneath the 'Results' table exports the contents as a .csv file.
Figure 7. Selecting a gene pair in the CRISPR tab. Gene pairs that have dependencies identified by MultiSEp are shown in the 'Results' panel; the gene pair EAF2, EAF1 is selected (blue row). The 'Details' panel shows statistical and Gene Ontology information for the selected gene pairing (EAF2, EAF1).
Clicking on a row in the 'Results' table populates two additional panels: the 'Details' panel (Figure 7) and the 'Plot' panel (Figure 8). The 'Details' panel provides summary statistics and Gene Ontology information, if available, for the selected gene pair. The following statistical information is provided:
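The way the checkboxes and the 'clusters' search box combine can be pictured as a logical AND over the rows of the 'Results' table. The sketch below is illustrative only. The gene names, the `ppi` and `paralogue` flags and the function name are hypothetical, and it does not describe how SynLeGG is implemented internally.

```python
# Hypothetical result rows; the flags and values are invented for illustration.
results = [
    {"mRNA_gene": "GENE_A", "crispr_gene": "GENE_B", "clusters": 2, "ppi": True, "paralogue": True},
    {"mRNA_gene": "GENE_C", "crispr_gene": "GENE_D", "clusters": 3, "ppi": True, "paralogue": False},
    {"mRNA_gene": "GENE_E", "crispr_gene": "GENE_F", "clusters": 2, "ppi": False, "paralogue": True},
]

def filter_results(rows, require_ppi=False, require_paralogue=False, clusters=None):
    """Keep only rows that satisfy every requested condition (logical AND)."""
    if clusters is not None and not 2 <= clusters <= 5:
        raise ValueError("clusters must be an integer between 2 and 5")
    kept = []
    for row in rows:
        if require_ppi and not row["ppi"]:
            continue
        if require_paralogue and not row["paralogue"]:
            continue
        if clusters is not None and row["clusters"] != clusters:
            continue
        kept.append(row)
    return kept

both = filter_results(results, require_ppi=True, require_paralogue=True)
three_clusters = filter_results(results, clusters=3)
print([row["mRNA_gene"] for row in both], [row["mRNA_gene"] for row in three_clusters])
```

Checking both boxes narrows the table to pairs that satisfy both conditions, matching the behaviour described for the 'PPI' and 'Paralogue' checkboxes.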
- 'Cell Line Number: Cluster X': The number of cell lines in gene expression cluster 'X' (where X has value between 1 and 5).
- 'Average CRISPR Score: Cluster X': The average CRISPR score for each gene expression cluster. Non-essential genes have a median value of 0, essential genes have a median value of -1. Therefore, lower values indicate a stronger effect for the CRISPR gene upon the cell lines in the given expression cluster.
- 'CRISPR Score Fold Change (Cluster X - Cluster Y)': The fold-change in CRISPR dependency score between neighbouring clusters (for example, cluster 2 - cluster 1).
- 'CRISPR Score P Value (Cluster X - Cluster Y)': The p value relating to the change in CRISPR dependency score between neighbouring clusters (for example, cluster 2 - cluster 1).
- 'CRISPR Score Q Value (Cluster X - Cluster Y)': The q value relating to the change in CRISPR dependency score between neighbouring clusters (for example, cluster 2 - cluster 1).
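The per-cluster statistics listed above can be approximated from exported data. The sketch below makes two assumptions that the text does not confirm: that 'average' means the arithmetic mean, and that the 'Cluster X - Cluster Y' change is the difference between the two cluster averages. The scores are invented.

```python
from collections import defaultdict
from statistics import mean

# Invented (expression cluster, CRISPR score) pairs, one per cell line.
scores = [(1, -1.0), (1, -0.6), (1, -0.8), (2, -0.1), (2, 0.1), (2, 0.0)]

def summarise(pairs):
    """Return cell line counts, mean scores and neighbouring-cluster differences."""
    by_cluster = defaultdict(list)
    for cluster, score in pairs:
        by_cluster[cluster].append(score)
    clusters = sorted(by_cluster)
    counts = {c: len(by_cluster[c]) for c in clusters}
    averages = {c: mean(by_cluster[c]) for c in clusters}
    # Assumed definition: higher-numbered cluster minus its lower neighbour.
    differences = {
        (upper, lower): averages[upper] - averages[lower]
        for lower, upper in zip(clusters, clusters[1:])
    }
    return counts, averages, differences

counts, averages, differences = summarise(scores)
print(counts, averages, differences)
```

Here cluster 1 has the lower average score, so knocking out the CRISPR gene has the stronger effect on cell lines in that cluster.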
The Gene Ontology Details table has two columns:
- 'Identifier': The identifiers for any overlapping Gene Ontology terms; all three branches are considered (i.e. Molecular Function, Biological Process and Cellular Component).
- 'Name': The name assigned to overlapping GO terms.
Clicking the download icon beneath the 'Gene Ontology Details' table exports all of the information in the 'Details' panel to a .csv file (for the selected gene pair).
The 'Plot' panel shows MultiSEp gene expression clusters and, by default, CRISPR scores from CERES for the selected gene pair ('Integrated' plot type). Several plot types are available from the 'Select Plot Type' drop-down menu. The 'crispr_only' plot visualises the CRISPR scores for the 'crispr_gene' selected in the 'Results' panel. The 'mRNA_only' plot visualises the MultiSEp gene expression clusters for the 'mrna_gene' selected in the 'Results' panel. The 'protein' plot visualises protein concentrations from mass spectrometry, where available, for the 'mrna_gene' selected in the 'Results' panel. Clicking the download icon beneath the panel exports the data shown in the plot in .csv format.
Figure 8. The Plot Panel Provides Visualisation of CRISPR scores and Gene Expression Clusters. The Plot type menu is shown at the top of the image, and the 'Integrated' plot type is selected. EAF2 gene expression clusters from MultiSEp are on the x-axis and CRISPR scores are given on the y-axis. The dots represent cell lines and are coloured by tissue of origin, which is detailed in the key. Overall, cell lines in the high EAF2 expression cluster are robust to EAF1 loss of function. On the other hand, many cell lines in the EAF2 low expression cluster have a substantial loss of viability/growth when EAF1 is disrupted, indicated by low CRISPR score values. The median CRISPR score for non-essential genes is 0; the median CRISPR score for essential genes is -1. Please note that plots are generated dynamically and there may be some variability in the x coordinates of individual cell lines that fall within the same mRNA expression cluster.
The 'Mutation' tab provides integrated gene expression and mutation data for investigation of candidate 'Achilles-Heel' relationships. To commence analysis, HGNC gene symbol(s) are typed into the box 'Please Enter Mutation Gene(s)' located at the top-left of the page (Figure 14); these genes are referred to as 'mutation gene(s)'. If needed, the gene symbol(s) can be checked using a built-in synonym checker by clicking the button 'Check Symbol(s)', which may also provide suggestions based on a dictionary lookup of gene aliases. For queries with multiple gene symbols, separation with whitespace (e.g. a space) is required. SynLeGG evaluates the enrichment of the mutational status (including Wild-Type) of the 'mutation gene(s)' across MultiSEp gene expression clusters for the 'mRNA_gene(s)'. Pairings between the 'mutation gene(s)' and 'mRNA_gene(s)' are displayed in the 'Mutation Results' table, which also has columns for 'tissue', 'pvalue', 'qvalue' and 'Mutation_Number'. Clicking on the 'U' next to each gene name navigates to the relevant UniProt page (opens in a new tab). The 'Mutation_Number' column identifies the number of mutations in the 'mutation gene' for the tissue specified in the 'tissue' column. Rows are ordered according to 'Mutation_Number' by default. The results may be filtered by clicking on any of the PPI, Paralogue and Gene Ontology checkboxes. For example, if the PPI box is checked then only mRNA gene and mutation gene pairs that share a physical protein-protein interaction in BioGRID will be displayed. If both the 'PPI' and 'Paralogue' boxes are checked then only within-human paralogues with evidence for a physical protein interaction are shown in the 'Mutation Results' table. The 'Search' bar allows querying by gene name; there are also individual search boxes for the columns: 'mRNA_gene', 'tissue', 'qvalue' and 'Mutation_Number'.
Clicking on the 'qvalue', 'pvalue' and 'Mutation_Number' column filter boxes displays a range filter, where both upper and lower limits may be set for the filtering threshold values.
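The whitespace-separated query and the range filters can be mimicked on exported results. The sketch below is a hypothetical illustration: the function names and example rows are invented, and treating the range limits as inclusive is an assumption.

```python
def parse_gene_symbols(query):
    """Split a query on any run of whitespace (spaces, tabs, newlines)."""
    return query.split()

def in_range(value, lower=None, upper=None):
    """True when value lies within the optional bounds (assumed inclusive)."""
    if lower is not None and value < lower:
        return False
    if upper is not None and value > upper:
        return False
    return True

# Invented rows standing in for the 'Mutation Results' table.
rows = [
    {"mRNA_gene": "GENE_A", "qvalue": 0.001, "Mutation_Number": 40},
    {"mRNA_gene": "GENE_B", "qvalue": 0.04, "Mutation_Number": 12},
    {"mRNA_gene": "GENE_C", "qvalue": 0.2, "Mutation_Number": 55},
]

symbols = parse_gene_symbols("TP53  KRAS\tBRCA1")
selected = [
    row["mRNA_gene"]
    for row in rows
    if in_range(row["qvalue"], upper=0.05)
    and in_range(row["Mutation_Number"], lower=10, upper=50)
]
print(symbols, selected)
```

Leaving a bound as `None` corresponds to setting only one limit in the range filter.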
Selecting a row in the 'Mutation Results' table populates the 'Mutation Details' panel (Figure 14, right) and the 'Plot Results' panel (Figures 15 and 16). Clicking the download icon underneath either the 'Mutation Results' table or the 'Mutation Details' table exports the relevant data as a .csv file. The 'Mutation Details' panel has two columns, 'Category' and 'Outcome', which provide information about protein-protein interactions, within-human paralogues and overlapping Gene Ontology terms, summarised below:
- Category: Biogrid: May have an 'Outcome' value of either 'Yes' or 'No', according to the availability of evidence for a physical interaction in the BioGRID database for the gene pair selected in the Mutation Results table.
- Category: Paralogue: May have an 'Outcome' value of either 'Yes' or 'No', indicating whether or not the gene pair selected in the Mutation Results table are within-human paralogues.
- Category: GO_TERM_Overlap: The 'Outcome' value for this category provides the identifiers for any Gene Ontology terms that overlap for the gene pair selected in the Mutation Results table.
- Category: GO_TERM_Description: The 'Outcome' value for this category provides the name(s) of any Gene Ontology terms that overlap for the gene pair selected in the Mutation Results table.
Figure 14. Querying Relationships Between Mutation and Gene Expression. The gene symbol TP53 was typed into the 'Please Enter Mutation Genes' box and the GO (Gene Ontology) checkbox is ticked, therefore all of the gene pairs in the 'Mutation Results' table have overlapping GO terms. The table row corresponding to the RPS27L expression gene in Lung Cancer is selected (blue) and so the 'Mutation Details' panel (right) shows the paralogue, PPI and GO information for TP53 and RPS27L.
The 'Plot Results' panel shows gene expression values, MultiSEp gene expression clusters and mutational status for the selected gene pair. The default visualisation is the 'Cluster' plot (Figure 15) and a 'Cluster_Type' plot is also available (Figure 16). Clicking the download icon beneath the panel exports the data shown in the plot in .csv format. Descriptions of the different mutation calls are listed below:
- WT: Wild-Type.
- Missense_Mutation: a single nucleotide change that leads to an amino acid substitution.
- Nonsense_Mutation: a single nucleotide change that leads to insertion of a premature stop codon.
- Nonstop_Mutation: a single nucleotide change within a stop codon that removes the stop signal.
- Frame_Shift_Ins: a DNA insertion that shifts the reading frame, changing the way the remaining sequence is translated into amino acids.
- Frame_Shift_Del: a DNA deletion that shifts the reading frame, changing the way the remaining sequence is translated into amino acids.
- In_Frame_Ins: a DNA insertion that inserts additional amino acids into the sequence, leaving the remaining sequence unchanged.
- In_Frame_Del: a DNA deletion that deletes amino acids, leaving the remaining sequence unchanged.
- Silent: a DNA sequence mutation that does not lead to a change in amino acid sequence.
- Splice_Site: a DNA sequence mutation that changes the sequence at a splice site.
- Start_Codon_SNP: a single nucleotide change that occurs within a start codon.
- Start_Codon_Ins: a DNA insertion that occurs within a start codon.
- Start_Codon_Del: a DNA deletion that occurs within a start codon.
- De_novo_Start_OutOfFrame: a DNA sequence mutation that generates a novel start codon, out of frame with the coding sequence.
- Stop_Codon_SNP: a single nucleotide change that occurs within a stop codon.
- Stop_Codon_Ins: a DNA insertion that occurs within a stop codon.
- Stop_Codon_Del: a DNA deletion that occurs within a stop codon.
- Intron: a DNA sequence mutation that occurs within an intronic region.
- IGR: a DNA sequence mutation that occurs within an intergenic region.
- 5'Flank: a DNA sequence mutation that occurs within the 5' flank region.
- 3'UTR: a DNA sequence mutation that occurs within the 3' untranslated region (UTR) of the transcript.
- 5'UTR: a DNA sequence mutation that occurs within the 5' untranslated region (UTR) of the transcript.
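Once plot data have been downloaded as .csv, the mutation calls above can be tallied per expression cluster. The sketch below is illustrative only: the CSV text and its column names (`cell_line`, `cluster`, `mutation`) are invented, and a real SynLeGG export may use different headers.

```python
import csv
import io
from collections import Counter

# Invented CSV text standing in for a downloaded plot export.
csv_text = """cell_line,cluster,mutation
line_a,1,WT
line_b,1,Missense_Mutation
line_c,1,Missense_Mutation
line_d,2,WT
line_e,2,WT
line_f,2,Nonsense_Mutation
"""

def count_calls_by_cluster(text):
    """Count cell lines for each (cluster, mutation call) combination."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(text)):
        counts[(int(row["cluster"]), row["mutation"])] += 1
    return counts

call_counts = count_calls_by_cluster(csv_text)
for (cluster, call), n in sorted(call_counts.items()):
    print(cluster, call, n)
```

A table of counts in this shape is the kind of input a test of mutation enrichment across clusters would use.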
Figure 15. Visualisation of Gene Expression and Mutational Status in the Plot Results Panel. The Plot type menu is shown at the top of the image, and the 'Cluster' plot type is selected. RPS27L gene expression clusters from MultiSEp are on the x-axis and RPS27L gene expression values on the y-axis. The dots represent lung cancer cell lines and are coloured by TP53 mutation status. Details of the mutation types are shown in the key to the right of the plot. There is an enrichment of TP53 mutations in the RPS27L low expression cluster.
Figure 16. Alternative Visualisation of Gene Expression and Mutations with the 'Cluster_Type Plot'. Nine plots are shown, corresponding to the nine different classes of mutational status. RPS27L gene expression clusters from MultiSEp are on the x-axis and RPS27L gene expression values on the y-axis of each plot. The dots represent lung cancer cell lines and are coloured by TP53 mutation status.
This section gives a succinct summary for each of the plot types available within SynLeGG.
The 'Integrated' plot shows a boxplot of CRISPR dependency scores. Please see Figure 17, below.