Clinical archives of patient material consist almost exclusively of formalin-fixed and paraffin-embedded (FFPE) blocks. The ability to precisely characterise mutational signatures from FFPE-derived DNA has tremendous translational potential. However, sequencing of DNA derived from FFPE material is known to be riddled with artefacts. Here we derive genome-wide mutational signatures caused by formalin fixation. We show that the FFPE-signature is highly similar to signature 30 (the signature of Base Excision Repair deficiency due to NTHL1 mutations), and that chemical repair of DNA lesions leads to a signature highly similar to signature 1 (the clock-like signature due to spontaneous deamination of methylcytosine). We demonstrate that using uncorrected mutational catalogues of FFPE samples leads to major mis-assignment of signature activities. To correct for this, we introduce FFPEsig, a computational algorithm to rectify the formalin-induced artefacts in the mutational catalogue. We demonstrate that FFPEsig enables accurate mutational signature analysis in both simulated and whole-genome sequenced FFPE cancer samples. FFPEsig thus provides an opportunity to unlock additional clinical potential of archival patient tissues.
Patient samples are routinely processed with formalin fixation and paraffin embedding (FFPE) by pathology laboratories around the world. FFPE preserves tissue morphology and enables immunohistochemical analysis for clinical diagnosis1,2. However, genomic analysis of DNA extracted from FFPE blocks is problematic, as formalin fixation negatively impacts DNA quality and quantity compared to fresh frozen (FF) material3,4. The pathology archive of any large hospital is likely to contain tens of thousands of FFPE blocks. Enabling accurate genomic analysis of FFPE material would unlock tremendous translational research potential from these vast collections of archival material5,6,7.
During the fixation step of FFPE preservation, buffered formalin (4% formaldehyde) penetrates the biospecimen and generates cross-links between intracellular macromolecules (DNA–DNA, DNA–RNA and DNA–protein). These cross-links stall DNA polymerases during library amplification7,8,9. As a consequence, the diversity and the number of templates that can be amplified by PCR from FFPE DNA are significantly depleted4,10. Furthermore, formalin causes hydrolytic deamination of cytosine bases to uracil1,7. This produces U:G mismatches; in amplicon-based protocols, DNA polymerase then incorporates adenine opposite uracil, generating artefactual C:G>T:A substitutions in sequencing data5,6,7.
To mitigate deamination artefacts, some FFPE sequencing library preparations include a repair treatment in which uracil DNA glycosylase (UDG) is added to remove uracil bases prior to amplification5,6,11. However, formalin-induced deamination converts 5-methylcytosine (5mC; exclusively present in CG dinucleotides) directly to thymine instead of uracil3,10. This second class of formalin artefacts is not corrected by the UDG treatment; therefore, downstream bioinformatics approaches are necessary to attempt their removal7.
Mutational signatures derived from whole-genome sequencing (WGS) data characterise the mutational processes that have acted upon the cancer genome12,13. Single base substitution (SBS) signatures are derived by considering the type of specific base pair change (e.g. C>T) together with the flanking base pair context (e.g. ACA>ATA)12,13. The recently updated mutational signature catalogue, derived from an unprecedentedly large number of samples, provides a comprehensive source of mutational processes active in human cancers14. Activities of signatures have immediate translational relevance15,16,17,18,19. For example, the homologous recombination (HR) deficiency signature (SBS3) is one of the response indicators to poly (ADP-ribose) polymerase (PARP) inhibitors for targeted therapy17,20,21.
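To make the substitution-plus-context classification concrete, the following minimal sketch enumerates SBS channels under the commonly used 96-channel convention (six pyrimidine-centred substitution types, each with 16 flanking-base combinations). The function names and the `A[C>T]A` label format are illustrative assumptions, not part of the original text.

```python
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")
# Substitutions are reported from the pyrimidine (C or T) of the base pair.
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]


def sbs96_channels():
    """Return all substitution-in-context labels, e.g. 'A[C>T]A'."""
    return [
        f"{five}[{sub}]{three}"
        for sub in SUBSTITUTIONS
        for five, three in product(BASES, repeat=2)
    ]


def reverse_complement(seq):
    """Reverse-complement a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def classify(trinucleotide, alt):
    """Map a substitution (reference trinucleotide, alternate base)
    to its pyrimidine-centred channel label (hypothetical helper)."""
    if trinucleotide[1] in "AG":
        # Purine reference: describe the change from the opposite strand.
        trinucleotide = reverse_complement(trinucleotide)
        alt = alt.translate(COMPLEMENT)
    ref = trinucleotide[1]
    return f"{trinucleotide[0]}[{ref}>{alt}]{trinucleotide[2]}"


channels = sbs96_channels()
example = classify("ACA", "T")  # the ACA>ATA example from the text
```

Counting how many mutations in a sample fall into each channel gives the mutational catalogue on which signature analysis operates.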
Mutational signature analysis on FFPE material is problematic because of the artefactual mutations induced by formalin fixation5,6,7. Here, we use the statistical machinery of mutational signature analysis to derive a mutational footprint caused by formalin exposure during FFPE biospecimen processing. First, we identify formalin artefact mutational signatures in both unrepaired (without UDG) and repaired (with UDG) FFPE samples, using paired FFPE and FF sample sequencing data from the same tissue. We next design and validate a decomposition algorithm, FFPEsig, to subtract FFPE artefacts and thereby infer the biological mutation profile in a given FFPE sample. We demonstrate the efficiency of our method on synthetic and sequenced FFPE samples and show that FFPEsig can correctly recover the true activities of mutational signatures otherwise masked by FFPE-induced artefacts. Our method enables robust mutational signature analysis on FFPE samples, thus paving the way towards clinical implementation using FFPE WGS data.
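The idea of separating a known artefact profile from the biological signal can be illustrated with a toy refit. The sketch below is not the FFPEsig implementation. It assumes that the artefact profile and two candidate biological profiles are known, uses a standard multiplicative (expectation-maximisation style) update to estimate non-negative activities, and then subtracts the estimated artefact contribution. All profiles, counts and names are made-up illustrative values over four channels rather than real signatures.

```python
def fit_activities(catalogue, signatures, n_iter=2000):
    """Estimate non-negative activities so that the weighted sum of
    signatures approximates the catalogue (each signature sums to 1)."""
    n_sig = len(signatures)
    n_chan = len(catalogue)
    # Start from an even split of the total mutation count.
    h = [sum(catalogue) / n_sig] * n_sig
    for _ in range(n_iter):
        recon = [
            sum(h[k] * signatures[k][i] for k in range(n_sig))
            for i in range(n_chan)
        ]
        # Multiplicative update; keeps activities non-negative.
        h = [
            h[k] * sum(
                signatures[k][i] * catalogue[i] / recon[i]
                for i in range(n_chan)
                if recon[i] > 0
            )
            for k in range(n_sig)
        ]
    return h


# Hypothetical 4-channel profiles (real SBS profiles have 96 channels).
artefact = [0.7, 0.1, 0.1, 0.1]
bio_a = [0.1, 0.6, 0.2, 0.1]
bio_b = [0.1, 0.1, 0.2, 0.6]

# Synthetic catalogue: 300 artefact + 100 bio_a + 50 bio_b mutations.
catalogue = [
    300 * artefact[i] + 100 * bio_a[i] + 50 * bio_b[i] for i in range(4)
]

activities = fit_activities(catalogue, [artefact, bio_a, bio_b])

# Subtract the estimated artefact contribution, clipping at zero.
corrected = [
    max(catalogue[i] - activities[0] * artefact[i], 0.0) for i in range(4)
]
```

In this noise-free toy case the fitted activities approach the values used to build the catalogue. With real data, sampling noise and similarity between artefact and biological signatures make the problem considerably harder.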