TCGA mutational profiles across 12 major cancer types from Kandoth et al. (2013)


This dataset is available from TCGA and contains somatic mutational profiles for 3096 cancer samples with survival data. Each sample belongs to one of 12 major cancer types: breast adenocarcinoma (BRCA), lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC), uterine corpus endometrial carcinoma (UCEC), glioblastoma multiforme (GBM), head and neck squamous cell carcinoma (HNSC), colon and rectal carcinoma (COAD and READ, combined as COADREAD in this dataset), bladder urothelial carcinoma (BLCA), kidney renal clear cell carcinoma (KIRC), ovarian serous carcinoma (OV) and acute myeloid leukaemia (LAML). For each patient sample, somatic mutations are represented as a profile over genes, where a non-zero entry gives the number of mutations that have occurred in that gene in the tumor relative to the germ line. The dataset is provided as an 'ExpressionSet' object.
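As a minimal sketch of what such a count profile looks like, the base R snippet below builds a small made-up matrix and derives a binary mutated/not-mutated view from it. The gene labels, sample names and counts are illustrative assumptions, not values taken from the dataset.

```r
# Toy mutation-count matrix (hypothetical values): rows are genes, columns are samples
mut <- matrix(
  c(0, 2, 0, 1,
    1, 0, 0, 0,
    0, 0, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("TP53", "KRAS", "A1BG"), paste0("S", 1:4))
)

# Binary view: TRUE where a gene carries at least one somatic mutation
is_mutated <- mut > 0

# Fraction of samples in which each gene is mutated
gene_freq <- rowMeans(is_mutated)

# Number of mutated genes per sample
sample_burden <- colSums(is_mutated)

print(gene_freq)
print(sample_burden)
```

The same two summaries should carry over to the real matrix, assuming it is extracted with exprs() from the 'ExpressionSet' object.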




An object of class "ExpressionSet" with slots for "assayData", "phenoData" and "featureData":

  • assayData: a matrix of 19420 genes X 3096 samples
  • phenoData: variables describing sample phenotypes (i.e. columns in assayData), including clinical/survival information about samples: "time" (i.e. survival time in days), "status" (i.e., survival status: 0=alive; 1=dead), "Age" (the patient age in years), "Gender" (the patient gender: male/female), "TCGA_tumor_type", "Tumor_stage", "Tumor_grade"
  • featureData: variables describing features (i.e. rows in assayData), including information about genes: "EntrezID" for the Entrez gene ID, "Symbol" for the gene symbol, "Desc" for the gene description, "Synonyms" for gene symbol aliases
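To illustrate how the three slots fit together, here is a small sketch that assembles a toy 'ExpressionSet' with Biobase and reads each slot back. The sample names (S1 to S3) and the matrix values are made up for illustration; only the column names mirror those described above.

```r
library(Biobase)

# Hypothetical toy data: 2 genes x 3 samples
mat <- matrix(c(0, 1, 0,
                2, 0, 0),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("A1BG", "A2M"), c("S1", "S2", "S3")))

# Sample annotations: row names must match the columns of the matrix
pd <- data.frame(time = c(404, 277, 254),
                 status = c(0, 0, 1),
                 TCGA_tumor_type = c("KIRC", "OV", "GBM"),
                 row.names = colnames(mat))

# Gene annotations: row names must match the rows of the matrix
fd <- data.frame(EntrezID = c(1, 2),
                 Symbol = c("A1BG", "A2M"),
                 row.names = rownames(mat))

eset <- ExpressionSet(assayData = mat,
                      phenoData = AnnotatedDataFrame(pd),
                      featureData = AnnotatedDataFrame(fd))

dim(exprs(eset))     # genes x samples
pData(eset)$status   # survival status per sample
fData(eset)$Symbol   # gene symbols

# Subsetting by column keeps all three slots aligned
dead <- eset[, pData(eset)$status == 1]
sampleNames(dead)
```

Column subsetting of this kind is one way to restrict the real dataset to a single cancer type, for example by filtering on the "TCGA_tumor_type" variable.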


Kandoth et al. (2013). Mutational landscape and significance across 12 major cancer types. Nature, 502(7471):333-9.


#TCGA_mutations <- dRDataLoader(RData='TCGA_mutations')
data(TCGA_mutations)
TCGA_mutations
ExpressionSet (storageMode: lockedEnvironment)
assayData: 19420 features, 3096 samples
  element names: exprs
protocolData: none
phenoData
  sampleNames: TCGA-B8-4153-01B-11D-1669-08 TCGA-24-1469-01A-01W-0553-09 ...
    TCGA-34-5241-01A-01D-1441-08 (3096 total)
  varLabels: time status ... Tumor_grade (7 total)
  varMetadata: labelDescription
featureData
  featureNames: 1060P11.3 A1BG ... ZZZ3 (19420 total)
  fvarLabels: EntrezID Symbol Desc Synonyms
  fvarMetadata: labelDescription
experimentData: use 'experimentData(object)'
Annotation:
library(Biobase)
# extract information about the first 5 samples
pData(TCGA_mutations)[1:5,]
                             time status Age Gender TCGA_tumor_type Tumor_stage
TCGA-B8-4153-01B-11D-1669-08  404      0  74   male            KIRC           3
TCGA-24-1469-01A-01W-0553-09  277      0  71 female              OV           3
TCGA-06-5411-01A-01D-1696-08  254      1  51   male             GBM          NA
TCGA-05-4249-01A-01D-1105-08 1158      0  67   male            LUAD           1
TCGA-18-3406-01A-01D-0983-08  371      1  67   male            LUSC           1
                             Tumor_grade
TCGA-B8-4153-01B-11D-1669-08           3
TCGA-24-1469-01A-01W-0553-09           3
TCGA-06-5411-01A-01D-1696-08          NA
TCGA-05-4249-01A-01D-1105-08          NA
TCGA-18-3406-01A-01D-0983-08          NA
# extract information about the first 5 features
fData(TCGA_mutations)[1:5,]
           EntrezID    Symbol
1060P11.3 100506173 1060P11.3
A1BG              1      A1BG
A1CF          29974      A1CF
A2M               2       A2M
A2ML1        144568     A2ML1
                                                                         Desc
1060P11.3 killer cell immunoglobulin-like receptor, three domains, pseudogene
A1BG                                                   alpha-1-B glycoprotein
A1CF                                           APOBEC1 complementation factor
A2M                                                     alpha-2-macroglobulin
A2ML1                                            alpha-2-macroglobulin-like 1
                               Synonyms
1060P11.3                             -
A1BG               A1B|ABG|GAB|HYST2477
A1CF      ACF|ACF64|ACF65|APOBEC1CF|ASP
A2M           A2MD|CPAMD5|FWP007|S863-7
A2ML1                            CPAMD9
# number of samples for each cancer type
table(pData(TCGA_mutations)$TCGA_tumor_type)

    BLCA     BRCA COADREAD      GBM     HNSC     KIRC     LAML     LUAD
      92      763      193      275      300      417      185      155
    LUSC       OV     UCEC
     171      315      230
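
As a quick consistency check on the counts above, the following base R sketch re-enters them by hand (the vector name n is an assumption for illustration) and confirms that they add up to the 3096 samples stated in the description, then expresses each cancer type as a share of the cohort.

```r
# Sample counts per cancer type, copied from the table() output above
n <- c(BLCA = 92, BRCA = 763, COADREAD = 193, GBM = 275, HNSC = 300, KIRC = 417,
       LAML = 185, LUAD = 155, LUSC = 171, OV = 315, UCEC = 230)

# Total number of samples across all cancer types
total <- sum(n)
print(total)

# Percentage of samples contributed by each cancer type
pct <- round(100 * n / total, 1)
print(pct)

# Cancer type with the most samples
largest <- names(n)[which.max(n)]
print(largest)
```

With the dataset loaded, prop.table() applied to the table() result should give the same proportions directly.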