  • Protocol

Integrative analysis of pooled CRISPR genetic screens using MAGeCKFlute

Abstract

Genome-wide screening using CRISPR coupled with nuclease Cas9 (CRISPR–Cas9) is a powerful technology for the systematic evaluation of gene function. Statistically principled analysis is needed for the accurate identification of gene hits and associated pathways. Here, we describe how to perform computational analysis of CRISPR screens using the MAGeCKFlute pipeline. MAGeCKFlute combines the MAGeCK and MAGeCK-VISPR algorithms and incorporates additional downstream analysis functionalities. MAGeCKFlute is distinguished from other currently available tools by its comprehensive pipeline, which contains a series of functions for analyzing CRISPR screen data. This protocol explains how to use MAGeCKFlute to perform quality control (QC), normalization, batch effect removal, copy-number bias correction, gene hit identification and downstream functional enrichment analysis for CRISPR screens. We also describe gene identification and data analysis in CRISPR screens involving drug treatment. Completing the entire MAGeCKFlute pipeline requires ~3 h on a desktop computer running Linux or Mac OS with R support.

Fig. 1: Schematic representation of CRISPR–Cas9 screen analysis using MAGeCKFlute.
Fig. 2: Example quality control assessment of CRISPR–Cas9 screen data.
Fig. 3: Batch effect correction and normalization of read counts and beta scores from CRISPR screen data.
Fig. 4: CRISPR–Cas9 screen analysis by MAGeCKFlute.

Data availability

The source code of MAGeCKFlute (version 0.99.18) is freely available at https://bitbucket.org/liulab/mageckflute/ under the three-clause Berkeley Software Distribution (BSD) open-source license. Questions or comments can be submitted through the MAGeCK Google group: https://groups.google.com/d/forum/mageck. The datasets used in this paper are presented in http://cistrome.org/MAGeCKFlute/.

References

  1. Cong, L. et al. Multiplex genome engineering using CRISPR/Cas systems. Science 339, 819–823 (2013).

  2. Gilbert, L. A. et al. Genome-scale CRISPR-mediated control of gene repression and activation. Cell 159, 647–661 (2014).

  3. Konermann, S. et al. Genome-scale transcriptional activation by an engineered CRISPR-Cas9 complex. Nature 517, 583–588 (2015).

  4. Mali, P. et al. RNA-guided human genome engineering via Cas9. Science 339, 823–826 (2013).

  5. Wang, T., Wei, J. J., Sabatini, D. M. & Lander, E. S. Genetic screens in human cells using the CRISPR-Cas9 system. Science 343, 80–84 (2014).

  6. Koike-Yusa, H., Li, Y., Tan, E. P., Velasco-Herrera Mdel, C. & Yusa, K. Genome-wide recessive genetic screening in mammalian cells with a lentiviral CRISPR-guide RNA library. Nat. Biotechnol. 32, 267–273 (2014).

  7. Shalem, O. et al. Genome-scale CRISPR-Cas9 knockout screening in human cells. Science 343, 84–87 (2014).

  8. Hart, T. et al. High-resolution CRISPR screens reveal fitness genes and genotype-specific cancer liabilities. Cell 163, 1515–1526 (2015).

  9. Wang, T. et al. Identification and characterization of essential genes in the human genome. Science 350, 1096–1101 (2015).

  10. Chen, S. et al. Genome-wide CRISPR screen in a mouse model of tumor growth and metastasis. Cell 160, 1246–1260 (2015).

  11. Manguso, R. T. et al. In vivo CRISPR screening identifies Ptpn2 as a cancer immunotherapy target. Nature 547, 413–418 (2017).

  12. Burr, M. L. et al. CMTM6 maintains the expression of PD-L1 and regulates anti-tumour immunity. Nature 549, 101–105 (2017).

  13. Kurata, M. et al. Using genome-wide CRISPR library screening with library resistant DCK to find new sources of Ara-C drug resistance in AML. Sci. Rep. 6, 36199 (2016).

  14. Han, K. et al. Synergistic drug combinations for cancer identified in a CRISPR screen for pairwise genetic interactions. Nat. Biotechnol. 35, 463–474 (2017).

  15. Shi, J. et al. Discovery of cancer drug targets by CRISPR-Cas9 screening of protein domains. Nat. Biotechnol. 33, 661–667 (2015).

  16. Mootha, V. K. et al. PGC-1α-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat. Genet. 34, 267–273 (2003).

  17. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. USA 102, 15545–15550 (2005).

  18. Li, W. et al. MAGeCK enables robust identification of essential genes from genome-scale CRISPR/Cas9 knockout screens. Genome Biol. 15, 554 (2014).

  19. Li, W. et al. Quality control, modeling, and visualization of CRISPR screens with MAGeCK-VISPR. Genome Biol. 16, 281 (2015).

  20. Toledo, C. M. et al. Genome-wide CRISPR-Cas9 screens reveal loss of redundancy between PKMYT1 and WEE1 in glioblastoma stem-like cells. Cell Rep. 13, 2425–2439 (2015).

  21. Kanehisa, M. & Goto, S. KEGG: Kyoto Encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27–30 (2000).

  22. Luo, B. et al. Highly parallel identification of essential genes in cancer cells. Proc. Natl. Acad. Sci. USA 105, 20380–20385 (2008).

  23. Konig, R. et al. A probability-based approach for the analysis of large-scale RNAi screens. Nat. Methods 4, 847–849 (2007).

  24. Hart, T. & Moffat, J. BAGEL: a computational framework for identifying essential genes from pooled library screens. BMC Bioinformatics 17, 164 (2016).

  25. Yu, J., Silva, J. & Califano, A. ScreenBEAM: a novel meta-analysis algorithm for functional genomics screens via Bayesian hierarchical modeling. Bioinformatics 32, 260–267 (2016).

  26. Morgens, D. W., Deans, R. M., Li, A. & Bassik, M. C. Systematic comparison of CRISPR-Cas9 and RNAi screens for essential genes. Nat. Biotechnol. 34, 634–636 (2016).

  27. Falcon, S. & Gentleman, R. Using GOstats to test gene lists for GO term association. Bioinformatics 23, 257–258 (2007).

  28. Yu, G., Wang, L.-G., Han, Y. & He, Q.-Y. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS 16, 284–287 (2012).

  29. Shalem, O., Sanjana, N. E. & Zhang, F. High-throughput functional genomics using CRISPR-Cas9. Nat. Rev. Genet. 16, 299–311 (2015).

  30. Gini, C. “Concentration and dependency ratios” (in Italian). Rev. Pol. Econ. 87, 769–789 (1997).

  31. Johnson, W. E., Li, C. & Rabinovic, A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 8, 118–127 (2007).

  32. Chen, C. H. et al. Improved design and analysis of CRISPR knockout screens. Bioinformatics 34, 4095–4101 (2018).

  33. Jiang, P. et al. Network analysis of gene essentiality in functional genomics experiments. Genome Biol. 16, 239 (2015).

  34. DeKelver, R. C. et al. Functional genomics, proteomics, and regulatory DNA analysis in isogenic settings using zinc finger nuclease-driven transgenesis into a safe harbor locus in the human genome. Genome Res. 20, 1133–1142 (2010).

  35. Hockemeyer, D. et al. Efficient targeting of expressed and silent genes in human ESCs and iPSCs using zinc-finger nucleases. Nat. Biotechnol. 27, 851–857 (2009).

  36. Barretina, J. et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483, 603–607 (2012).

  37. Aguirre, A. J. et al. Genomic copy number dictates a gene-independent cell response to CRISPR/Cas9 targeting. Cancer Discov. 6, 914–929 (2016).

  38. Sherr, C. J. & Roberts, J. M. CDK inhibitors: positive and negative regulators of G1-phase progression. Genes Dev. 13, 1501–1512 (1999).

  39. Huang, D. W., Sherman, B. T. & Lempicki, R. A. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 4, 44–57 (2009).

  40. Wang, T. et al. Gene essentiality profiling reveals gene networks and synthetic lethal interactions with oncogenic Ras. Cell 168, 890–903 (2017).

  41. Tzelepis, K. et al. A CRISPR dropout screen identifies genetic vulnerabilities and therapeutic targets in acute myeloid leukemia. Cell Rep. 17, 1192–1205 (2016).

  42. Wang, T., Wei, J. J., Sabatini, D. M. & Lander, E. S. Genetic screens in human cells using the CRISPR-Cas9 system. Science 343, 80–84 (2014).

  43. Chen, C. H. et al. Improved design and analysis of CRISPR knockout screens. Bioinformatics 34, 4095–4101 (2018).

  44. Leek, J. T., Johnson, W. E., Parker, H. S., Jaffe, A. E. & Storey, J. D. The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics 28, 882–883 (2012).

  45. Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).

  46. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).

  47. Luo, W. & Brouwer, C. Pathview: an R/Bioconductor package for pathway-based data integration and visualization. Bioinformatics 29, 1830–1831 (2013).

Acknowledgements

This project was supported by the National Institutes of Health (R01 HG008927), the National Key Research and Development Program of China (2017YFC0908500 to X.S.L.), the Breast Cancer Research Foundation, the Department of Defense (PC140817P1 to M.B. and X.S.L.), and the start-up fund of the Center for Genetic Medicine Research and the Gilbert Family Neurofibromatosis Institute (to W.L.).

Author information

Contributions

W.L. and X.S.L. developed the original MAGeCK and MAGeCK-VISPR algorithms. B.W., M.W. and W.Z. developed the R package MAGeCKFlute. B.W. and W.Z. performed the data analysis. B.W., M.W., F.W., W.L. and X.S.L. wrote the manuscript with the help of Z.L., N.T. and X.W. W.L., X.S.L., B.W., M.W., W.Z., F.W., Z.L., N.T., X.W., T.X., C.-H.C., A.W., S.M., Y.C., S.S., J.J.L., M.H., J.Z. and M.B. contributed to the discussion and writing of the final manuscript.

Corresponding authors

Correspondence to Wei Li or X. Shirley Liu.

Ethics declarations

Competing interests

T.X. and X.S.L. are co-founders and M.B. and X.S.L. are on the Scientific Advisory Board of GV20 Oncotherapy. The other authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links

Key references using this protocol

Li, W. et al. Genome Biol. 15, 554 (2014): https://doi.org/10.1186/s13059-014-0554-4

Li, W. et al. Genome Biol. 16, 281 (2015): https://doi.org/10.1186/s13059-015-0843-6

Jeselsohn, R. et al. Cancer Cell 33, 173–186 (2018): https://doi.org/10.1016/j.ccell.2018.01.004

Xiao, T. et al. Proc. Natl Acad. Sci. USA 115, 7869–7878 (2018): https://doi.org/10.1073/pnas.1722617115

Key data used in this protocol

Toledo, C. M. et al. Cell Rep. 13, 2425–2439 (2015): https://doi.org/10.1016/j.celrep.2015.11.021

Hart, T. et al. Cell 163, 1515–1526 (2015): https://doi.org/10.1016/j.cell.2015.11.015

Shalem, O. et al. Science 343, 84–87 (2014): https://doi.org/10.1126/science.1247005

Wang, T., Wei, J. J., Sabatini, D. M. & Lander, E. S. Science 343, 80–84 (2014): https://doi.org/10.1126/science.1246981

Chen, C.-H. et al. Bioinformatics 34, 4095–4101 (2018): https://doi.org/10.1093/bioinformatics/bty450

Integrated supplementary information

Supplementary Figure 1 Selection of non-essential genes for normalization of CRISPR screen data.

(a) Distribution of expression of all non-essential genes in CCLE cell lines. The x-axis is the relative expression of all non-essential genes measured by microarray. The y-axis is the density of expression of all non-essential genes. Genes with expression levels below the cutoff (red dashed line) were excluded from the non-essential gene list. (b) Each dot gives the number of genes (y-axis) whose expression ranked between the 5th and 100th percentile in a given number of cell lines (x-axis). The dashed lines indicate that 350 of 937 non-essential genes had expression that ranked between the 5th and 100th percentile in 98.3% (1019 out of 1036) of Cancer Cell Line Encyclopedia (CCLE; ref. 36) cell lines.

Supplementary Figure 2 Copy number bias correction in MAGeCK-MLE.

Model of the relationship between β scores and gene copy numbers before (a) and after (b) copy-number correction. The red line in each panel is the regression line, and the inflection point is calculated by minimizing the squared error. Without copy-number bias correction, the beta score shows a positive correlation with copy number. This bias can be corrected using MAGeCKFlute.
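The idea of choosing an inflection point by minimizing the squared error can be illustrated with a small sketch. It assumes a hinge-shaped model in which the beta score changes linearly with copy number up to a breakpoint and is flat beyond it; that model form, the function names `fit_hinge` and `remove_trend`, and the detrending step are illustrative assumptions, not the MAGeCK-MLE or MAGeCKFlute implementation.

```python
def fit_hinge(copy_number, beta):
    """Fit beta ~ a + b * min(copy_number, k), scanning every observed
    copy-number value as a candidate breakpoint k and keeping the one
    with the smallest sum of squared errors (assumed model form)."""
    n = len(beta)
    mean_y = sum(beta) / n
    best = None
    for k in sorted(set(copy_number)):
        x = [min(cn, k) for cn in copy_number]  # clip copy number at k
        mean_x = sum(x) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        if sxx == 0:  # every value clipped to the same x: slope undefined
            continue
        b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, beta)) / sxx
        a = mean_y - b * mean_x
        sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, beta))
        if best is None or sse < best["sse"]:
            best = {"breakpoint": k, "intercept": a, "slope": b, "sse": sse}
    if best is None:
        raise ValueError("need at least two distinct copy-number values")
    return best


def remove_trend(copy_number, beta, fit):
    """Subtract the fitted copy-number term so that the remaining scores
    no longer depend on the clipped copy number (illustrative only)."""
    k, b = fit["breakpoint"], fit["slope"]
    return [yi - b * min(cn, k) for cn, yi in zip(copy_number, beta)]


# Toy data: beta falls by 0.5 per copy up to 4 copies, then stays flat.
cn = [1, 2, 3, 4, 5, 6, 7, 8]
beta = [-0.5 * min(c, 4) for c in cn]
fit = fit_hinge(cn, beta)
corrected = remove_trend(cn, beta, fit)
```

Scanning only observed copy-number values keeps the search exhaustive and simple; a finer grid or a continuous optimizer would be a reasonable alternative for real data.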

Supplementary Figure 3 Normalization with essential genes.

Beta scores of core essential genes (blue dots) and all genes except essential genes (red dots) before and after normalization with essential genes. The histograms (blue bars) show the beta scores of treatment (top) and control (right) conditions. Before normalization (a), the beta score distributions of treatment and control conditions are not comparable. After normalization (b), these two distributions are more comparable. (c) The formula for normalizing the beta score using essential genes, where c is an empirical value used to scale the normalized beta score. The value of c is 0.6 and was obtained from public screen data (ref. 8).

Supplementary Figure 4 Output figures of MAGeCKFlute.

All the data are from a genome-wide CRISPR screen on the A375 cell line (see Equipment), and downstream analysis was performed with FluteMLE. (a) Beta score distribution of treatment samples (PLX7_R1, PLX7_R2) and control samples (D7_R1, D7_R2). (b) Scatterplot of beta scores of treatment (PLX7_R1) and control (D7_R1) samples. The regression line (dashed line) indicates the consistency of beta scores between the two conditions. (c) The MA plot can be used to visualize the differences between beta scores in two samples by transforming the data onto M (log ratio) and A (mean average) scales, in which M = β_T − β_C and A = (β_T + β_C)/2, where β_T is the beta score of treatment samples and β_C is the beta score of control samples. The blue line is M = 0 and the red line is the loess regression line. (d) Identification of treatment-related genes. The horizontal and vertical dashed lines indicate the mean plus or minus one standard deviation of the treatment and control beta scores, respectively. The diagonal dashed lines indicate the mean plus or minus one standard deviation of the differential beta score, which is calculated by subtracting the control beta score from the treatment beta score. The number in red is the number of genes classified in each group. The top 5 genes in each group, selected by the largest absolute value of the differential beta score, are labelled. Genes in the green group are strongly negatively selected in the control samples and are weakly positively or negatively selected in the treatment samples. These genes are potentially located in the pathways targeted by the treatment. The orange group contains genes that are weakly selected in the control and strongly positively selected in the treatment. These are genes whose loss confers treatment resistance. Genes in the blue group are strongly positively selected in the control and weakly selected in the treatment. These genes may be either potential regulators of cell proliferation in general, or regulators of the treatment target. Genes in the purple group are weakly selected in the control and strongly negatively selected in the treatment. These genes are potentially synthetically lethal in combination with the drug treatment. The histograms (grey bars) show the beta scores of treatment (top) and control (right) conditions.
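The MA transform and the four colored groups described in this caption can be sketched in a few lines. This is one reading of the caption, not the code behind FluteMLE: the function names, the strict inequalities, and the requirement that a gene also fall outside the differential (diagonal) cutoffs are assumptions made for illustration.

```python
from statistics import mean, stdev


def ma_transform(beta_t, beta_c):
    """Return (M, A): the difference and the average of the two beta scores."""
    return beta_t - beta_c, (beta_t + beta_c) / 2.0


def sd_cutoffs(values, n_sd=1.0):
    """Return (mean - n_sd * SD, mean + n_sd * SD) using the sample SD."""
    m, s = mean(values), stdev(values)
    return m - n_sd * s, m + n_sd * s


def _band(x, cut):
    """Label x as 'low', 'mid' or 'high' relative to a (lower, upper) cutoff."""
    lo, hi = cut
    return "low" if x < lo else "high" if x > hi else "mid"


# (control band, treatment band) -> group colour named in the caption
GROUPS = {
    ("low", "mid"): "green",    # strongly negative in control, weak in treatment
    ("mid", "high"): "orange",  # weak in control, strongly positive in treatment
    ("high", "mid"): "blue",    # strongly positive in control, weak in treatment
    ("mid", "low"): "purple",   # weak in control, strongly negative in treatment
}


def classify_gene(beta_t, beta_c, t_cut, c_cut, d_cut):
    """Assign one gene to a group, or 'none' if it falls in no group.
    Assumption: the differential score must also lie outside d_cut."""
    if _band(beta_t - beta_c, d_cut) == "mid":
        return "none"
    return GROUPS.get((_band(beta_c, c_cut), _band(beta_t, t_cut)), "none")


# Usage with hypothetical beta scores: gene -> (treatment, control)
betas = {"G1": (0.1, -2.0), "G2": (2.0, 0.0), "G3": (0.2, 0.1)}
t_cut = sd_cutoffs([t for t, _ in betas.values()])
c_cut = sd_cutoffs([c for _, c in betas.values()])
d_cut = sd_cutoffs([t - c for t, c in betas.values()])
labels = {g: classify_gene(t, c, t_cut, c_cut, d_cut) for g, (t, c) in betas.items()}
```

Passing the cutoffs explicitly keeps the classification rule separate from how the thresholds are chosen, so a different number of standard deviations can be tried without touching `classify_gene`.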

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–4 and Supplementary Methods

Reporting Summary

Supplementary Data 1

The nonessential gene list.

Supplementary Data 2

Copy-number file used to perform the copy-number correction.

Supplementary Data 3

The list of core essential genes.

Supplementary Data 4

The LNCaP data, which include AAVS1, CCR5 and ROSA26 as negative-control genes.

Supplementary Video 1

A video tutorial showing how to edit the ‘config.yaml’ file used by MAGeCK-VISPR.

About this article

Cite this article

Wang, B., Wang, M., Zhang, W. et al. Integrative analysis of pooled CRISPR genetic screens using MAGeCKFlute. Nat Protoc 14, 756–780 (2019). https://doi.org/10.1038/s41596-018-0113-7

  • DOI: https://doi.org/10.1038/s41596-018-0113-7
