E.8 Genome Analyses

The primary goal of the ICGC is to generate catalogues of somatic genomic abnormalities (mutations) in different tumor types and/or subtypes which are of clinical and societal importance across the globe. Assembling a restricted, core set of genomic analyses at the outset is challenging, given that technologies are rapidly evolving and more platforms are in development. It will therefore be necessary to review the recommendations on a regular and frequent basis. It is also preferable to avoid being unduly prescriptive about study designs and platform choices, as it is conceivable that several different technologies and designs could be used to achieve the same ultimate goals.

Ultimately, however, it is critical to the overall success of the ICGC that the datasets obtained from one class of cancer (generated in a particular way) will be directly comparable to the datasets obtained from another class of cancer (even if generated using a different approach/technology). It is particularly important, therefore, that members adhere to quality standards set by the ICGC, including sufficient depth and coverage to detect a high proportion of somatic mutations in each sample that will be interrogated. We outline below the yields of somatic alteration from each cancer expected by the ICGC and a process for evaluating the quality of data generated by different centers.

The following classes of genome analyses are recommended for ICGC membership:

Class 1: Catalogue of Somatic Mutations

POLICY: Genomic DNA analyses of tumors (and matching control DNA) are core elements of the project, and are therefore referred to as mandatory studies.

Ultimately, catalogues for each tumor type or subtype will include the full range of somatic mutations including single base substitutions, insertions, deletions, copy number changes, translocations and other chromosomal rearrangements. It is anticipated that cancer genome studies will expand to include high coverage whole genome shotguns of cancer and normal genomes. This is an anticipated goal that should be envisaged by all participants joining the ICGC. A whole genome shotgun design will ultimately provide the optimal, pragmatic strategy and primary output of the ICGC. New generation sequencing technologies are becoming available, the parameters of which make the whole genome shotgun approach potentially a realistic option for application to substantial numbers of cancers within the next 2-5 years.

All sequencing platforms have an error rate. The error rate may in part be mitigated by the high genome coverage provided by the whole genome shotgun. However, it is likely that errors will remain. We propose that at least 95% of somatic variants listed in the catalogue for each sample should be real. Generating a high quality catalogue of variants may therefore require confirmation of somatic variants by a targeted technology, both to exclude sequence artifacts and to eliminate residual private polymorphisms. This curation of each cancer genome may add a substantial workload to the analysis of cancer genomes and would essentially constitute a “finishing” phase in the generation of the catalogue of somatic variants.

Box 6. Whole genome shotgun analyses (anticipated policy)

The aim of the whole genome shotgun will be to harvest as high a proportion as practically feasible of the somatic mutations present in each individual cancer sample. We propose that at least 80% of the somatic alterations should be identified in each sample and that coverage calculations on each sample should be based on this expectation.

Several issues will need to be understood, however, before specific recommendations on whole genome shotgun designs can be made. In order to provide an almost complete catalogue of somatic mutations in each cancer sample the most important consideration will be the overall depth of sequence coverage obtained, which will determine the sensitivity and specificity of detection of somatic mutations. The coverage required will be influenced by a number of factors including the presence of aneuploidy in the cancer, tissue heterogeneity (normal contamination and tumor subclones), the prevalence of somatic mutations in the cancer, sequence error rates, other data features of the sequencing technology adopted and the proportion of known SNPs.

In order to ascertain which variants are somatic it will be necessary to evaluate variants found in the cancer sample in normal DNA from the same individual. It is likely that this will primarily be obtained through a whole genome shotgun of the normal DNA sample. However, other approaches are not excluded.
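The comparison against matched normal DNA can be sketched as follows. This is only an illustrative sketch, not an ICGC procedure: the `Variant` record, the `candidate_somatic` function, the per-site `normal_depth` table and the `min_normal_depth` threshold of 10 reads are all assumptions introduced here. The idea is that a variant seen in the tumor is retained as a candidate somatic variant only if it is absent from the normal sample and the normal sample was sequenced deeply enough at that site to make its absence meaningful.

```python
from typing import Dict, NamedTuple, Set, Tuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record (not an ICGC data format)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def candidate_somatic(
    tumor_variants: Set[Variant],
    normal_variants: Set[Variant],
    normal_depth: Dict[Tuple[str, int], int],
    min_normal_depth: int = 10,  # assumed threshold, for illustration only
) -> Set[Variant]:
    """Return tumor variants that are absent from an adequately covered normal."""
    somatic = set()
    for variant in tumor_variants:
        if variant in normal_variants:
            continue  # also present in the normal sample: treat as germline
        depth = normal_depth.get((variant.chrom, variant.pos), 0)
        if depth < min_normal_depth:
            continue  # normal coverage too low to rule out a germline variant
        somatic.add(variant)
    return somatic
```

Variants excluded for low normal coverage are not necessarily germline; in practice they would be candidates for follow-up by a targeted technology.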

Preliminary estimates indicate that approximately 30-fold genome coverage, and possibly more, of the cancer sample will be required. Formal power calculations will be provided to support these predictions and will be adapted in future on the basis of specific information on each platform used.
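As a rough illustration of how such a power calculation might look, the sketch below uses a deliberately simplified model that is an assumption of this example and not the formal calculation referred to above. Read depth at a site is taken to be Poisson distributed around the mean coverage, so that the number of reads carrying the variant is Poisson with mean equal to coverage times the expected variant allele fraction, and a variant counts as detected when at least `min_variant_reads` reads support it. Sequencing errors, mapping artifacts and specificity are ignored. The function names, the threshold of 3 reads, and the assumption that normal cells are diploid are all hypothetical choices.

```python
import math


def variant_allele_fraction(
    purity: float, mutated_copies: int = 1, tumor_copy_number: int = 2
) -> float:
    """Expected fraction of reads carrying the mutation at a site.

    Assumes contaminating normal cells are diploid (copy number 2).
    """
    total_copies = purity * tumor_copy_number + (1.0 - purity) * 2
    return purity * mutated_copies / total_copies


def detection_power(
    mean_coverage: float, allele_fraction: float, min_variant_reads: int = 3
) -> float:
    """Probability of observing at least min_variant_reads variant reads.

    Variant-read count is modelled as Poisson(mean_coverage * allele_fraction).
    """
    lam = mean_coverage * allele_fraction
    p_too_few = sum(
        math.exp(-lam) * lam ** i / math.factorial(i)
        for i in range(min_variant_reads)
    )
    return 1.0 - p_too_few


if __name__ == "__main__":
    for purity in (1.0, 0.5, 0.2):
        vaf = variant_allele_fraction(purity)
        print(f"purity={purity:.1f} power={detection_power(30, vaf):.3f}")
```

Under these illustrative assumptions, 30-fold coverage gives well over 80% power for a heterozygous mutation in a pure diploid tumor, but power falls below 80% when only 20% of the cells in the sample are tumor cells, which is consistent with the caveat that more than 30-fold coverage may be needed.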

Whole genome shotgun analyses of cancer genomes may not be feasible for two years or more. Box 7 lists the recommended initial interim goals that are proposed until whole genome shotgun approaches are feasible on thousands of samples. It is expected that members joining the ICGC plan to go beyond the interim analyses, and launch whole genome sequencing when the technologies are shown to be robust and affordable. The ICGC will monitor progress and re-evaluate these guidelines periodically.

Box 7. Interim, large-scale, catalogues of somatic mutations
  1. Sequencing of all coding exons and other genomic regions of particular biological interest for point mutations. There are several technologies now available to achieve this goal including enrichment by array pull down or PCR followed by sequencing on one of the new technology platforms. The aim of these analyses would be to find at least 80% of somatic alterations in these regions in each cancer sample. Sequence coverage should be estimated on this basis. The primary targets would be all coding exons / splice sites and microRNAs, followed by regulatory and conserved non-coding sequences;
  2. Analysis of low genome coverage of paired-end reads for rearrangements. Paired-end designs will be available for most new sequencing technologies. The aim of these analyses will be to identify at least 80% of somatic genomic rearrangements down to sequence level resolution. Paired-end sequence coverage should be estimated on this basis;
  3. Genotyping arrays
    It is recommended that a high density genotyping array be performed at an early stage on all samples in the ICGC set. This is a straightforward and inexpensive additional experiment that will provide copy number, LOH and breakpoint information that is highly useful in the interpretation of other analyses. Information from the genotyping array will also be critically useful in tracking samples and confirming the relationship between a tumor and a normal sample.

The above studies should preferably be conducted on the samples that will be entered into the final whole genome shotgun approach, so that the data can ultimately be merged.

Class 2: Complementary genomic analyses

POLICY: Additional studies of DNA methylation and RNA expression are recommended on the same samples that are used to find somatic mutations.

The potential list of complementary analyses is long. The recommended supplementary analyses for ICGC have therefore been selected to be pragmatic, to have relatively easily achievable aims, to not significantly complicate sample acquisition, and to be likely to enrich interpretation of somatic mutation information. The ICGC members have restricted the scope of the Consortium to include only those analyses requiring DNA and RNA.

2.1. Analyses of DNA methylation

Optimally, the outcome of DNA methylation profiling should be the assignment of the methylation state of every CpG and the identification of CpGs that are differentially methylated in the cancer versus normal genomes of the same tissues.

The current expectation is that it may not be necessary to determine the frequency/extent of DNA methylation at each and every CpG, but it may be sufficient to determine the frequency/extent of DNA methylation within a small genomic region containing multiple CpGs.

Current ‘comprehensive’ DNA methylation analyses have focused mostly on (CpG-island containing) promoters and readout of differentially methylated regions (DMRs) on microarrays. However, DNA methylation outside CpG-islands and/or promoters may well be diagnostic and prognostic and instrumental in classification even though the expression profile in the cancer cell may not be affected by the extent of this type of DNA methylation.

A wide variety of techniques have been described for identifying and profiling DNA methylation. They differ in the resolution of methylation mapping, the ability to give qualitative rather than quantitative measurements, and in their potential to be used in global rather than gene-specific analysis. There is consensus at this point that it is unwise to make specific recommendations regarding the methylation approach that must be used by ICGC cancer projects.

2.2. Analyses of RNA expression

There are currently several technologies and platforms for analysis of RNA expression. These continue to develop. In particular, digital quantification of expression based on sequencing technologies may become practical soon and may be optimal for these purposes. At this time, ICGC therefore is not making specific recommendations on technological approaches to be used for RNA expression studies.

It is recommended, however, that analyses of expression include all protein coding genes (or use easily available commercial platforms that include most protein coding genes) and consider some of the non-coding RNAs, notably microRNAs. Analysis of the transcriptome may be more critical in some cancer types than in others, for example in breast cancer where it is fundamental to the classification.

Class 3: Optional analyses

Although outside of the initial scope of the ICGC, other analyses of samples used in the somatic mutation screen are clearly to be encouraged. These could include:

  • Proteomic analyses;
  • Metabolomic analyses;
  • Immunohistochemical analyses;
  • Analyses of chromatin state;
  • Construction of tissue arrays from the cancers in each category, which may be particularly helpful in the long term for future immunohistochemical and other in situ analyses.

No specific recommendation is made to ICGC members regarding approaches, platforms, and other issues related to optional analyses.

Quality control
The ability of each center to produce data of the requisite quality will be assessed by circulating a small set of inexhaustible tumor/normal samples which each center will have to analyze for each component of the project in which it is engaged. It is proposed that these samples be publicly available cancer cell lines for which a normal DNA sample is available. Cancer cell lines may be spiked with normal DNA to better recapitulate the state of a primary tumor specimen. Centers will be expected to provide coverage of these samples such that at least 80% of somatic alterations are detected and at least 95% of the alterations reported are real.
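The two thresholds can be made concrete with a small sketch. It assumes, purely for illustration, that a curated set of known somatic alterations (`truth`) exists for each circulated reference sample and that alterations can be compared as hashable identifiers; the function names and this evaluation scheme are hypothetical and are not specified by the ICGC. The fraction of known alterations detected corresponds to the 80% target, and the fraction of reported alterations that are real corresponds to the 95% target.

```python
from typing import Hashable, Set, Tuple


def qc_metrics(called: Set[Hashable], truth: Set[Hashable]) -> Tuple[float, float]:
    """Return (fraction of true alterations detected, fraction of calls that are real)."""
    true_positives = len(called & truth)
    detected = true_positives / len(truth) if truth else 0.0
    real = true_positives / len(called) if called else 0.0
    return detected, real


def passes_qc(
    called: Set[Hashable],
    truth: Set[Hashable],
    min_detected: float = 0.80,  # at least 80% of somatic alterations detected
    min_real: float = 0.95,      # at least 95% of reported alterations real
) -> bool:
    """Check a center's calls on a reference sample against both thresholds."""
    detected, real = qc_metrics(called, truth)
    return detected >= min_detected and real >= min_real
```

Both conditions must hold together: a call set can detect most true alterations and still fail if too many of its reported alterations are artifacts, and a very conservative call set can be almost entirely real and still fail on detection.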

Coverage requirements for primary tumor samples analyzed by the ICGC will be estimated on an individual basis by assessment of sequence error rate in each sample and other parameters that determine sensitivity and specificity of variant detection. These ongoing quality control measures will continue to be refined by the ICGC with a view to implementation during the course of the project.