The genomic and transcriptional landscape of primary central nervous system lymphoma

Primary lymphomas of the central nervous system (PCNSL) are mainly diffuse large B-cell lymphomas (DLBCLs) confined to the central nervous system (CNS). Despite extensive research, the molecular alterations leading to PCNSL have not been fully elucidated. To provide a comprehensive description of the genomic and transcriptional landscape of PCNSL, we performed whole-genome and transcriptome sequencing and integrative analysis of 51 lymphomas presenting in the CNS, including 42 EBV-negative PCNSL, 6 secondary CNS lymphomas (SCNSL) and 3 EBV+ CNSL, together with matched controls. The results were compared to an independent validation cohort of 31 FFPE CNSL specimens (PCNSL, n = 19; SCNSL, n = 9; EBV+ CNSL, n = 3) as well as 39 follicular lymphoma (FL) and 36 systemic DLBCL cases outside the CNS. Somatic genomic alterations in PCNSL mainly affect the JAK-STAT, NFkB, and B-cell receptor signaling pathways, with hallmark recurrent mutations including MYD88 L265P (67%) and CD79B (63%), CDKN2A deletions (83%) and also non-coding RNA genes such as MALAT1 (70%), NEAT (60%), and MIR142 (80%). Kataegis events, which affected 15 of 50 identified driver genes and 21 of the top 50 mutated ncRNAs, played a decisive role in shaping the mutational repertoire of PCNSL. Compared to systemic DLBCL, PCNSLs exhibited significantly more focal deletions in 6p21 targeting the HLA-D locus, which encodes MHC class II molecules, as a potential mechanism of immune evasion. Mutational signatures correlating with DNA replication and mitosis (SBS1, ID1 and ID2) were significantly enriched in PCNSL (SBS1: p = 0.0027; ID1/ID2: p < 1 × 10⁻⁴). Furthermore, TERT gene expression was significantly higher in PCNSL compared to ABC-DLBCL (p = 0.027). Although PCNSL share many genetic alterations with systemic ABC-DLBCL in the same signaling pathways, transcriptome analysis clearly separated the two into distinct molecular subtypes.
EBV+ CNSL cases may be distinguished by lack of recurrent mutational hotspots apart from IG and HLA-DRB loci.


Introduction
Central nervous system (CNS) lymphomas are predominantly aggressive neoplasms involving brain, meninges, spinal cord and eyes 1,2 . Two clinical subtypes of CNSL can be distinguished: primary central nervous system lymphoma (PCNSL), which is confined to the CNS; and secondary central nervous system lymphoma (SCNSL) presenting initially with systemic, non-CNS involvement. SCNSL reflects a spread of a peripheral lymphoma to the CNS and its presentation, tropism, outcome and therapeutic options differ from PCNSL 3,4 .
PCNSL incidence is increased in immunocompromised patients, in whom the tumor cells are typically Epstein-Barr virus (EBV)-positive. In contrast, PCNSL in immunocompetent patients is typically EBV-negative. The mechanisms leading to the topographical restriction of PCNSL are still a matter of scientific debate 5. PCNSL is classified as diffuse large B-cell lymphoma (DLBCL) in the vast majority of cases (approx. 90%), which immunohistochemically most often show a non-GCB immunophenotype 1,6,7 according to the Hans classification 8. The tumor cells express pan B-cell markers (CD19, CD20, and CD79a), the germinal center (GC)-associated molecule BCL6 9, and the post-GC-associated marker MUM1/IRF4 10. By gene expression profiling, the tumor cells are most closely related to late germinal center (exit) B-cells 11. Pathomechanistic genomic alterations involving Toll-like receptor (TLR) and B-cell receptor (BCR) signaling pathways have been identified in previous studies, revealing a very high frequency of somatic nonsynonymous mutations in genes such as MYD88, CARD11, and CD79B 12-16. Additionally, often homozygous HLA class II 17,18 and CDKN2A loss, recurrent BCL6 translocations 19,20 and structural variants at chromosome band 9p24.1 (affecting CD274/PD-L1 and PDCD1LG2/PD-L2) 21 as well as TBL1XR1 variants 22 have been repeatedly described in PCNSL 23,24. These mutational patterns suggest that PCNSL is genetically similar to the recently described "MCD", "C5" or "MYD88-like" subtypes, for which a derivation from long-lived memory B-cells has been proposed 25-30.
The outcome of PCNSL, even in immunocompetent hosts, is poor compared to most primary systemic DLBCL 31, though probably not worse than that of DLBCL of the MCD/C5 group in general 26. High-dose methotrexate (MTX) remains the commonly administered therapy, but the use of rituximab (monoclonal anti-CD20 antibody) has been shown to be effective 32,33. However, reports on rituximab efficacy in PCNSL are conflicting 34-36. Genomic studies have suggested that lymphoma cell proliferation and survival are driven, at least in part, by deregulated TLR, BCR, JAK-STAT, and NFkB signaling pathways inducing constitutive NFkB activation 37-39. Therefore, inhibitors acting up- and downstream of NFkB have been applied, such as ibrutinib, which inhibits Bruton's tyrosine kinase (BTK), a critical mediator of B-cell receptor signaling, and lenalidomide, which was shown to have indirect effects on tumor immunity; these seem to be effective therapeutic alternatives in PCNSL 40-43. PD-L1/2 blockade is discussed as another [...]

[...] Barr virus (EBV) PCR as previously described 53 (Supplementary figure 1 B). For the categorization of GCB and non-GCB, the samples were stratified according to the Hans classification 8 (CD10, BCL6, MUM1). We enrolled CNSL from a total of 51 patients for whole-genome (WGS, n = 38) and RNA sequencing (RNAseq, n = 37), including n = 24 samples subjected to both workflows. The study cohort and sample size as well as the experimental design, analysis workflow, diagnosis, and quality metrics of WGS and RNAseq are displayed in Figure 1 and Supplementary table 1. The inclusion criteria were based on the diagnosis of PCNSL and SCNSL according to the recent WHO classifications of tumors of hematopoietic and lymphoid organs and tumors of the central nervous system 1,2,7,54,55.

Alignment of sequencing reads
Sequencing reads were aligned using the DKFZ alignment workflow from the ICGC Pan-Cancer Analysis of Whole Genomes project (https://dockstore.org/containers/quay.io/pancancer/pcawg-bwa-memworkflow). Briefly, read pairs were mapped to the human reference genome (build 37, version hs37d5) using bwa mem (version 0.7.8) with the minimum alignment score for output set to zero [-T 0] and the remaining settings left at default values 57, followed by coordinate sorting with biobambam bamsort (version 0.0.148) with the compression option set to fast (1) and marking of duplicate read pairs with biobambam bammarkduplicates with the compression option set to best (9) 58.

Small mutation calling and annotation
Somatic small variants (SNVs and indels) in matched tumor-normal pairs were called using the DKFZ in-house pipelines as previously described 59. Briefly, SNVs were identified using samtools and bcftools version 0.1.19 57,60 and then classified as somatic or germline by comparing the tumor sample to the control. Each call was assigned a confidence score, which is initially set to 10 and subsequently reduced based on overlaps with repeats, DAC blacklisted regions, DUKE excluded regions, self-chain regions, [...] 61, and additionally if the SNV exhibited PCR or sequencing strand bias. Only SNVs with confidence 8 or above were considered for further analysis. Tumor and matched blood samples were analyzed by Platypus 62 to identify indels. Indel calls were filtered based on Platypus internal confidence calls, and only indels with confidence 8 or greater were used for subsequent analysis. In order to remove recurrent artifacts and misclassified germline events, somatic indels that were identified as germline in at least two patients in the CNS lymphoma cohort were excluded.
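The two indel filters described above (confidence of 8 or greater, and exclusion of events called germline in at least two patients) can be illustrated with a minimal sketch. The record layout (`patient`, `key`, `status`, `confidence`) and the function name are hypothetical and are not taken from the DKFZ pipelines; only the two thresholds come from the text.

```python
from collections import defaultdict

MIN_CONFIDENCE = 8          # confidence cutoff stated in the text
MIN_GERMLINE_PATIENTS = 2   # germline recurrence cutoff stated in the text


def filter_indels(calls):
    """Keep high-confidence somatic indels that are not recurrent germline events.

    `calls` is a list of dicts with hypothetical keys: patient, key
    (any hashable event description, e.g. (chrom, pos, ref, alt)),
    status ("somatic" or "germline") and confidence.
    """
    # Collect, per event, the set of patients in which it was called germline.
    germline_patients = defaultdict(set)
    for call in calls:
        if call["status"] == "germline":
            germline_patients[call["key"]].add(call["patient"])

    # Events seen as germline in enough patients are treated as artifacts.
    blacklist = {
        key for key, patients in germline_patients.items()
        if len(patients) >= MIN_GERMLINE_PATIENTS
    }

    return [
        call for call in calls
        if call["status"] == "somatic"
        and call["confidence"] >= MIN_CONFIDENCE
        and call["key"] not in blacklist
    ]
```

Whether the germline recurrence count should itself be restricted to high-confidence calls is not specified in the text; the sketch counts all germline calls.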
The protein-coding effects of somatic SNVs and indels from all samples were annotated using ANNOVAR 63 according to the GENCODE gene annotation (version 19) and overlapped with variants from dbSNP10 (build 141) and the 1000 Genomes Project database. Mutations of interest were defined as somatic SNVs and indels that were predicted to cause protein-coding changes (non-synonymous SNVs, gain or loss of stop codons, splice site mutations, and both frameshift and non-frameshift indels), and also synonymous exonic mutations on non-coding genes.

Tumor in normal contamination detection
We applied the TiNDA (tumor in normal detection algorithm) workflow to account for potential tumor in normal contamination leading to false negative calls as previously described 59 . Briefly, the B-allele frequency (BAF) was calculated from the tumor and control samples. Positions overlapping with common variants were filtered out. Then, the clustering algorithm from Canopy 64 was applied to the BAF values for the positions in tumor vs control using a single pass run, assuming 9 clusters. The clusters that were determined to be tumor-in-normal had to have 75% of positions above the identity line (where the VAF in the tumor sample is the same as the VAF in the control sample). These identified mutations were then reclassified as somatic instead of the original germline annotation. All but 4 CNSL WGS samples exhibited evidence for tumor in normal. On average 31 SNVs (range 0-136) were "rescued" in PCNSL, 6 in PCNSL-M (6-6, single sample), 27 (19-34) in SCNSL, 22 (9-43) in SCNSL-M, and 0 in the EBV-positive sub-cohorts (0-0, 2 samples). In total, only 6 SNVs with protein coding effects were rescued, including the MYD88 p.L265P mutation in sample LS-0102, which had 3 of 47 read support in the control, and 86 of 170 reads supporting the variant in the tumor sample (Supplementary table 2).
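The cluster-level rescue rule can be sketched as follows. This is only an illustration of the 75% criterion: the Canopy clustering itself is not reproduced, cluster assignments are taken as given input, and "above the identity line" is assumed here to mean that the tumor VAF exceeds the control VAF. The function name and input layout are hypothetical.

```python
def tumor_in_normal_clusters(positions, min_fraction_above=0.75):
    """Return the cluster ids that satisfy the tumor-in-normal criterion.

    `positions` is an iterable of (cluster_id, vaf_control, vaf_tumor).
    A cluster is flagged when at least `min_fraction_above` of its positions
    lie above the identity line (assumed: vaf_tumor > vaf_control).
    """
    counts = {}  # cluster_id -> (total positions, positions above the line)
    for cluster, vaf_control, vaf_tumor in positions:
        total, above = counts.get(cluster, (0, 0))
        counts[cluster] = (total + 1, above + (vaf_tumor > vaf_control))
    return {
        cluster for cluster, (total, above) in counts.items()
        if above / total >= min_fraction_above
    }
```

For the rescued MYD88 p.L265P call in LS-0102, the read counts given above correspond to a control VAF of 3/47 (about 0.06) and a tumor VAF of 86/170 (about 0.51), so this position lies well above the identity line.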

Genomic structural rearrangements
Genomic structural rearrangements (SVs) were detected using SOPHIA v.34.0 65 . Briefly, SOPHIA uses supplementary alignments as produced by bwa-mem as indicators of a possible underlying SV. SV candidates are filtered by comparing them to a background control set of sequencing data obtained using normal blood samples from a background population database of 3261 patients from published TCGA and ICGC studies and both published and unpublished DKFZ studies, sequenced using Illumina HiSeq 2000, 2500 (100 bp) and HiSeq X (151 bp) platforms and aligned uniformly using the same workflow as in this study. Gencode V19 was used for the gene annotations. We used the script draw_fusions.R from the Arriba package 66 to visualize SVs generated by SOPHIA.

Copy number alterations and allelic imbalances
Allele-specific copy-number aberrations were detected using ACEseq (allele-specific copy-number estimation from whole-genome sequencing) 67. ACEseq determines absolute allele-specific copy [...]
Final copy number segments were further smoothed to calculate the total number of gains and losses.
Neighboring segments were merged if they rounded to the same copy number and deviated by less than 0.5 copies in case of segments <20 kb or deviated by less than 0.3 copies otherwise. Remaining segments <500 kb were merged with their closer neighbor based on allele-specific and total copy number and once again segments smaller than 2 Mb deviating by less than 0.4 copies were merged.
Based on the resulting segments the number of gains and losses was estimated.
Furthermore, the fraction of aberrant genome was calculated as the fraction of the genome that is classified either as duplication or deletion (>0.7 deviation from the ploidy) or was identified as a loss of heterozygosity.
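The first merging pass and the aberrant-fraction calculation can be sketched as below. This is a simplified illustration, not the ACEseq implementation: segments are assumed to be sorted, contiguous and on one chromosome, the <20 kb rule is assumed to apply when either neighbor is that short, merged copy numbers are assumed to be length-weighted means, and segments with different LOH status are assumed not to be merged. The later passes (<500 kb and <2 Mb) are omitted.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: int
    end: int
    tcn: float          # total copy number estimate
    loh: bool = False   # loss of heterozygosity flag

    @property
    def length(self) -> int:
        return self.end - self.start


def merge_neighbors(segments, short_len=20_000, short_tol=0.5, long_tol=0.3):
    """Merge adjacent segments that round to the same copy number.

    The allowed deviation is `short_tol` copies if either segment is shorter
    than `short_len` (assumption) and `long_tol` copies otherwise.
    """
    merged = []
    for seg in segments:
        if merged:
            prev = merged[-1]
            tol = short_tol if min(prev.length, seg.length) < short_len else long_tol
            same_state = round(prev.tcn) == round(seg.tcn) and prev.loh == seg.loh
            if same_state and abs(prev.tcn - seg.tcn) < tol:
                total = prev.length + seg.length
                tcn = (prev.tcn * prev.length + seg.tcn * seg.length) / total
                merged[-1] = Segment(prev.start, seg.end, tcn, prev.loh)
                continue
        merged.append(seg)
    return merged


def fraction_aberrant(segments, ploidy, min_deviation=0.7):
    """Fraction of covered bases deviating >min_deviation from ploidy, or in LOH."""
    total = sum(s.length for s in segments)
    aberrant = sum(
        s.length for s in segments
        if abs(s.tcn - ploidy) > min_deviation or s.loh
    )
    return aberrant / total if total else 0.0
```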

Classification of mutational hotspots (kataegis events)
Mutational hotspots indicating putative kataegis events (likely due to somatic hypermutation (SHM) or aberrant SHM) were defined as regions with at least 6 somatic SNVs within an average intermutational distance of 1000 bp or less, as previously used by Alexandrov and colleagues 68 . A gene was described to be targeted by kataegis if its definition (from Gencode version 19 gene models) overlapped with at least 1 kataegis region in at least 1 sample. While many of these kataegis loci are indeed SHM/aSHM targets, located 2.5 kb from the transcription start site (TSS), we cannot completely control for all PCNSL-specific TSSs due to the normal brain background tissue.
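One possible greedy reading of this definition is sketched below for the SNV positions of a single chromosome in a single sample: a window of at least 6 SNVs is opened when its mean intermutational distance is 1000 bp or less, and is extended while that still holds. The segmentation actually used in the study may differ, and the function names are hypothetical.

```python
def kataegis_regions(positions, min_snvs=6, max_mean_distance=1000):
    """Return (start, end, n_snvs) for putative kataegis regions.

    `positions` are SNV coordinates on one chromosome of one sample.
    Greedy assumption: open a window of `min_snvs` SNVs, then extend it
    as long as the mean intermutational distance stays within the limit.
    """
    pos = sorted(positions)
    n = len(pos)
    regions = []
    i = 0
    while i + min_snvs <= n:
        j = i + min_snvs - 1
        if (pos[j] - pos[i]) / (j - i) > max_mean_distance:
            i += 1
            continue
        while j + 1 < n and (pos[j + 1] - pos[i]) / (j + 1 - i) <= max_mean_distance:
            j += 1
        regions.append((pos[i], pos[j], j - i + 1))
        i = j + 1
    return regions


def gene_is_targeted(gene_start, gene_end, regions):
    """True if the gene interval overlaps at least one kataegis region."""
    return any(start <= gene_end and end >= gene_start for start, end, _ in regions)
```

A gene would then be reported as targeted if `gene_is_targeted` is true in at least one sample.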

Mutational signatures
Supervised mutational signature analysis was performed using YAPSA development version 3.13 69 using R 4.0.0. Briefly, the linear combination decomposition (LCD) of the mutational catalog with known and predefined PCAWG COSMIC signatures 70 was computed by non-negative least squares (NNLS).
The mutational signature analysis was applied to the mutational catalogs for SNVs (or single base substitutions, SBS) and indels of all tumor samples. Signature-specific cutoffs were applied and cohort-level analysis was used for detecting signatures as recommended by Huebschmann et al. 30. The cutoff used corresponds to "cost factors" of 10 for SNVs and 3 for indels in the modified ROC analysis.
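The idea behind the linear combination decomposition can be illustrated with a toy non-negative least squares fit. The sketch below does not use YAPSA and swaps in a simple multiplicative-update solver in pure Python rather than the NNLS routine of the actual analysis. The function name is hypothetical, and the signature-specific cutoffs are not modeled.

```python
def nnls_multiplicative(signatures, catalog, n_iter=500, eps=1e-12):
    """Estimate non-negative exposures x minimizing ||S x - m||^2.

    `signatures` is a list of k non-negative signature vectors (one value per
    mutation channel) and `catalog` is the observed count vector m.
    Uses the multiplicative update x_j <- x_j * (S^T m)_j / (S^T S x)_j,
    which keeps exposures non-negative for non-negative inputs.
    """
    k = len(signatures)
    # Precompute S^T m and S^T S once.
    stm = [sum(s * m for s, m in zip(sig, catalog)) for sig in signatures]
    sts = [
        [sum(a * b for a, b in zip(sig_i, sig_j)) for sig_j in signatures]
        for sig_i in signatures
    ]
    x = [sum(catalog) / k] * k  # positive starting point
    for _ in range(n_iter):
        x = [
            x[j] * stm[j] / (sum(sts[j][l] * x[l] for l in range(k)) + eps)
            for j in range(k)
        ]
    return x
```

With two toy signatures over four channels and a catalog built as an exact mixture of them, the estimated exposures approach the mixing weights.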

Integration of different variant types
SNVs, indels, SVs and CNAs were integrated in order to account for all variant types in the recurrence analysis. All genes with SNVs or indels in coding regions (nonsynonymous, stop gain, stop loss, splicing, frameshift and non-frameshift events) and in ncRNA (exonic) were included. SVs with breakpoints lying directly on a gene (SV direct) were considered for oncoprints; for the SV recurrence analysis, SVs were additionally annotated to a gene when they were within 100 kb of it (SV near), or to the closest gene (SV close), to account for regulatory mutations such as enhancer hijacking events. Genes were annotated with CNAs if they were completely or partially affected. Chromosome-level CNV events were called when >30% of a chromosome arm was altered. Only focal CNA events were taken into account for variant integration, as these are more likely to target specific genes within the affected region than large events such as whole chromosome arm events. To capture the precise target of CNVs, we [...]
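The three SV annotation levels (SV direct, SV near, SV close) can be illustrated for a single breakpoint with the sketch below. The function name, the gene tuple layout and the gene names in the usage example are hypothetical, and "near" is assumed here to exclude genes that are hit directly.

```python
def annotate_breakpoint(pos, genes, near_window=100_000):
    """Annotate one SV breakpoint against genes on the same chromosome.

    `genes` is a list of (name, start, end). Returns a dict with
    'direct' (breakpoint inside the gene), 'near' (within `near_window` bp,
    excluding direct hits; an assumption) and 'close' (the closest gene).
    """
    def distance(gene):
        _, start, end = gene
        if start <= pos <= end:
            return 0
        return min(abs(pos - start), abs(pos - end))

    direct = [g[0] for g in genes if distance(g) == 0]
    near = [g[0] for g in genes if 0 < distance(g) <= near_window]
    close = min(genes, key=distance)[0] if genes else None
    return {"direct": direct, "near": near, "close": close}
```

For example, with `genes = [("GENE_A", 1000, 5000), ("GENE_B", 60_000, 80_000), ("GENE_C", 500_000, 600_000)]`, a breakpoint at position 3000 is direct for GENE_A and near GENE_B, while a breakpoint at 300,000 has no direct or near gene and is assigned to GENE_C as the closest gene.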

Mutual exclusivity and inclusivity analysis
Mutual exclusivity analysis was performed to investigate the relationship between MYD88 mutations and other implicated drivers from the IntOGen analysis, including SNVs, indels, SVs, and CNAs. The minimal recurrence threshold was set to 5. We applied the commonly used Fisher's exact test and the CoMET test 71 for both co-occurrence and mutual exclusivity. Fisher's right-tailed test was used to support co-occurrence when the number of samples with alterations in both genes was significantly higher than expected by chance. Additionally, Fisher's left-tailed test was used to suggest mutual exclusivity when the number of samples with alterations in both genes was significantly lower than expected. Resultant p-values were corrected for multiple testing by FDR.
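The one-sided Fisher tests can be written out from the hypergeometric distribution, as in the sketch below. The CoMET test is not reproduced, and the FDR correction is assumed here to be the Benjamini-Hochberg procedure, which the text does not state explicitly. Function names are hypothetical.

```python
from math import comb


def fisher_one_sided(both, only_a, only_b, neither):
    """Return (p_cooccurrence, p_exclusivity) for a 2x2 alteration table.

    p_cooccurrence is the right tail P(X >= both), p_exclusivity the left
    tail P(X <= both), where X is the hypergeometric count of samples
    altered in both genes given the table margins.
    """
    n = both + only_a + only_b + neither
    row = both + only_a   # samples altered in gene A
    col = both + only_b   # samples altered in gene B
    denom = comb(n, col)

    def pmf(k):
        return comb(row, k) * comb(n - row, col - k) / denom

    lowest = max(0, row + col - n)
    highest = min(row, col)
    p_right = sum(pmf(k) for k in range(both, highest + 1))
    p_left = sum(pmf(k) for k in range(lowest, both + 1))
    return p_right, p_left


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```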

Mutational significance analysis
The IntOGen pipeline 72 was applied to identify significant cancer drivers. IntOGen reported a core set of 50 genes as significant drivers.

Telomere content estimation
The telomere content was determined from whole-genome sequencing data using the software tool TelomereHunter 73 with default settings (filtering of telomere reads: at least 6 telomere repeats per 100 bp read length). Briefly, unmapped reads or reads with a very low alignment confidence (mapping quality lower than 8) containing six non-consecutive instances of the four most common telomeric repeat types (TTAGGG, TCAGGG, TGAGGG, and TTGGGG) were extracted. The telomere content was determined by normalizing the telomere read count to all reads in the sample with a GC-content of 48-52%. In the case of tumor samples, the telomere content was further corrected for the tumor purity (as estimated by ACEseq) using the following formula: [...]

RNA sequencing and data processing

RNA library preparation and sequencing
RNA libraries of the tumor samples and normal brain samples were prepared using the TruSeq RNA library preparation Kit Set A and B, following the manufacturer's instructions at an insert size of ~300 bp. Two barcoded libraries were pooled per lane and sequenced on Illumina HiSeq2000 or HiSeq4000 platforms.

RNAseq alignment and expression quantification
RNAseq reads were aligned and gene expression was quantified as previously described 74. Briefly, the RNAseq read pairs were aligned to a STAR index generated from the reference genome (build 37, version hs37d5) using STAR (version 2.5.2b) in 2-pass mode 74,75. Duplicate reads were marked using sambamba (version 0.4.6) and BAM files were coordinate sorted using SAMtools (version 0.1.19).
featureCounts (version 1.5.1) 76 was used to perform non-strand-specific read counting for genes over exon features based on the Gencode V19 gene model (without excluding read duplicates). Read pairs were counted towards gene read counts when both reads of the pair aligned uniquely (indicated by a STAR alignment quality score of 255). For total library abundance calculations during estimation of TPM and FPKM expression values, genes on chromosomes X, Y and MT as well as rRNA and tRNA genes were omitted, as they can introduce library size estimation biases.
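The effect of omitting genes from the library-size term can be illustrated with a minimal TPM sketch. The function name and the use of a per-gene length dictionary are hypothetical; it is also an assumption here that omitted genes still receive an expression value and are only left out of the denominator.

```python
def tpm(counts, lengths, excluded=frozenset()):
    """Compute TPM with selected genes left out of the library-size term.

    `counts` maps gene -> read count, `lengths` maps gene -> length in bp,
    and `excluded` holds genes (e.g. chrX, chrY, MT, rRNA, tRNA genes)
    that do not contribute to the denominator (assumption: they still
    receive a value scaled by the same denominator).
    """
    rate = {gene: counts[gene] / lengths[gene] for gene in counts}
    denom = sum(r for gene, r in rate.items() if gene not in excluded)
    return {gene: r / denom * 1e6 for gene, r in rate.items()}
```

With this choice, the TPM values of the retained genes sum to one million regardless of how strongly the excluded genes are expressed.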
Hierarchical consensus clustering was applied using the cola package (version 1.5.6) with "MAD" as top-value method and "kmeans" as partitioning method. Classification on CNS samples was applied using cola with "ATC" as top-value method and "skmeans" as partitioning method. All other parameters took default values 77 .

RNA dilution experiment
To further investigate the impact of brain tissue contamination on unsupervised clustering analysis of PCNSL gene expression data, we performed a serial dilution experiment with total RNA from a PCNSL sample considered "pure" (LS-027, estimated tumor cell content > 80%) and a normal brain tissue control (CTRL). Total RNA from LS-027 was mixed with CTRL RNA at increasing concentrations (0%, 20%, 40%, 60%, 80%) and sequenced. The z-score transformed TPM expression levels of the PCNSL group 1 and group 2 signature genes for the serially diluted H050-0027 sample were compared against the cohort and individually using clustering analysis.

Differential expression analysis to identify signature genes
Differential expression (DE) of genes was analyzed using DESeq2 (version 1.14.1) with default settings using raw read counts from featureCounts. Genes without any count in all samples were excluded from the analysis.

Data availability
The raw sequencing data of the 51 CNSL samples have been deposited at the European Genome- [...]

Driver mutations in CNSL
We first identified the genes recurrently mutated in CNSL ( [...]

[...] As STAT3 is not highly mutated or hit by SVs or CNVs, its activation seems, in line with previous reports, to be induced by extrinsic factors such as infiltrating macrophages/microglial cells 84, or intrinsic factors such as activation downstream of MYD88 85.
Next, we used IntOGen and MutSigCV to discover putative driver mutations in the PCNSL WGS subcohort (Figure 2 D, Supplementary Table 7). We identified a total of 50 mutated driver genes, of which 21 were previously known drivers. Many of the predicted drivers were associated with MCD-enriched MYD88 L265P mutations, which is in line with previous findings 21. Mutations in TBL1XR1, which also modulates TLR/MYD88 signaling 21, were identified in 40% of PCNSL (Figures 2 A, B). We investigated mutual exclusivity and co-occurrence patterns for MYD88 among the driver genes that affect at least five patients using the Fisher and CoMET tests. We observed mutual exclusivity between alterations in MYD88 [...]

Kataegis shapes the mutational repertoire of PCNSL
Kataegis is a pattern of mutational hotspots that has been associated with a number of cancers 68, and is a frequent consequence of AID activity in lymphomas 95. Many of the recurrently mutated genes in PCNSL were dominated by alterations located in these highly mutated hotspots 11, of which several have previously been described as targets of aSHM, such as OSBPL10, PIM1, BTG2, and PAX5 [...] Physiologically, SHM is the process of introducing mutations in the antibody genes to alter the antigen-binding site, increasing the immunoglobulin (IG) diversity 98. Kataegis events were found at the IGH (100%), IGL (100%) and IGK (70%) loci but also outside IG loci, targeting BTG2 (63%), GRHPR (50%), [...] While patterns of aSHM and kataegis were similar between CNSL and systemic DLBCL subtypes, we found that EBV+ CNSL cases did not share many of the recurrent mutational hotspots apart from IGH and the HLA-DRB locus (Figure 3 B, Supplementary figures 3 A, B, Supplementary table 5).
Additionally, we found deletions on chromosomes 1p13 and 3q13, affecting genes such as CD58 and CD80, both candidates reported to lead to immune evasion 104. Further CN losses were detected on chromosomes 8q12 (TOX), 12p13 (ETV6), and 15q21 (B2M) as well as 3p14, affecting the fragile site tumor suppressor gene, fragile histidine triad (FHIT). TOX deletions have been previously described by array-based imbalance profiling 105. TOX is required for the development of various T-cell subsets and was described as a putative tumor suppressor in MCD DLBCL 26. TOX downregulation has been associated with poor prognosis in different cancers 106 and is a predictor for anti-PD1 response 107.
Significant CN gains in PCNSL mapped to 2q37 and 18q21, affecting DIS3L2 and MALT1. DIS3L2 encodes an exoribonuclease that is mutated in Perlman syndrome 108 and was recently described to promote HCC tumour progression by upregulating production of the oncogenic isoform of RAC1, RAC1B 109. MALT1 is a regulator of NFkB signaling and a potential therapeutic target in B-cell lymphoma 110.

Recurrent structural variations (SVs)
We defined SVs as genomic breakpoints, which can correspond to the borders of amplifications and [...] Recent studies have shown that translocations can cause enhancer hijacking even when the event lies several hundred thousand base pairs away from the target gene 113. To investigate this, we also annotated SV breakpoints to genes within 100 kbp and to the closest genes. We found a number of genes involved in G protein-coupled receptor signaling (ARAP2, LPHN2, LPHN3, EPHA4, ADGRL2, and GPC5), consistent with observations in pan-cancer studies 114 [...] 125-127. IGH-BCL6 fusions are recurrent in PCNSL 20, which mirrors observations in ABC-DLBCL 128. IGH-BCL2 fusions are more prominent in GCB-DLBCL 129. We investigated the recurrent translocations (≥ 2 patients) in our cohort and identified five CNSL samples with IGH-BCL6 translocations (Figure 5 A, Supplementary figures 4 A-D). We also identified three cases with IGH-BCL2 translocations (Figure 5 [...]

[...] (Figure 6 A). The presence of SBS3, a hallmark of defective DNA break repair by homologous recombination, and SBS40 may be therapeutically relevant, as these indicate potential effectiveness of combination therapy with PARP inhibitors (e.g., olaparib) alongside cytotoxic chemotherapy 130,131. The three most prominent signatures in DLBCL, FL and CNSL were SBS9, SBS5, and SBS40 (Figure 6 B).
Direct comparison of PCNSL and DLBCL revealed that signature SBS1, which correlates with DNA replication at mitosis (mitotic clock) 70, was significantly enriched in PCNSL (p = 0.0027; Figure 6 C, Supplementary figures 5 A-G).
Analysis of small insertion and deletion signatures (ID) revealed mutational patterns associated with slippage during DNA replication of the replicated DNA strand (ID1) and template DNA strand (ID2); both of these signatures appeared significantly (p < 1 × 10⁻⁴, Wilcoxon) more prominent in PCNSL compared to DLBCL and FL (Figure 6 D), though different read depths may have influenced this analysis.
Interestingly, only CNSL samples, but not DLBCL or FL, revealed mutations attributed to mutational signature ID12, which is of unknown etiology and has been observed in prostate adenocarcinoma and soft tissue liposarcoma 70.

[...] (Figure 7 A). For each cluster, we identified signature gene sets that significantly correlated with the groupings. Interestingly, all meningeal PCNSL (PCNSL-M) and SCNSL-M grouped together with either GCB- or ABC-DLBCL, clearly indicating that these subtypes are molecularly and pathomechanistically distinct from intraparenchymal CNSL, which formed one separate cluster, suggesting a distinct signature of CNS tropism. The ABC-type DLBCL cluster was enriched for MYD88 mutant samples, which were still distinct from MYD88 mutant PCNSL at the gene expression level. To further exclude an impact of the potentially contaminating surrounding CNS tissue, we analyzed total RNA from normal brain controls (n = 2), of which one control was spiked with increasing concentrations of RNA from a pure PCNSL sample (0%, 20%, 40%, 60%, 80%). Then, we further stratified the PCNSL group by another round of consensus clustering using the two different classification methods (Figure 7 B, Supplementary figures 6 A-C), which both revealed two groups based on tumor purity.
The first PCNSL expression group (PCNSL1, "pure") consisted of samples with high tumor cell content (determined by whole-genome sequencing and histopathological analysis). Expression of its signature gene set did not show similarity to normal brain tissue expression. However, the second PCNSL expression group (PCNSL2, "impure") contained mainly samples with lower tumor cell content, and expression of its signature gene set was indeed similar to normal brain tissue expression. We identified the PCNSL signature gene sets relative to ABC and GCB type DLBCLs and FLs, and removed potential background signatures from contaminating brain tissue (Figure 7 B). The marker genes in each group were identified based on differential gene expression analysis (Supplementary table 14). Among the marker genes for PCNSL were e.g. LAPTM5, a CD40 related gene expressed in malignant B-cell lymphoma 132 and ITGAE, mediating cell adhesion, migration, and lymphocyte homing through interaction with E-cadherin 133

Expression of IGHM is characteristic for PCNSL
Additionally, we analyzed the expression of IG constant genes, which again revealed the same clusters as the unsupervised consensus clustering approach, demonstrating that PCNSL can be differentiated from DLBCL based on only the expression of IG constant genes. In contrast to DLBCL and FL, PCNSL show generally low expression of IG constant genes, but higher expression of IGHM (Figure 7 C).

[...] 134. We used TelomereHunter, a software for detailed characterization of telomere maintenance mechanisms 73, to estimate the telomere content in a representative cohort of PCNSL, SCNSL, peripheral lymphoma, as well as non-tumorous naïve and GC B-cells as control 51. In approximately 1/3 of the samples, the purity-corrected telomere content was higher in the tumor than in the matched control (whole blood) (Supplementary figure 7 A).
Nevertheless, telomere content (tumor/control log2 ratio) was not significantly different between the different histological, clinical and molecular subgroups (Supplementary figure 7 B). However, expression of the TERT gene, whose encoded protein mainly acts to elongate telomeres, was significantly higher in GC B-cells 51 and in PCNSL compared to ABC-DLBCL (Figures 8 A, B). This was consistent with observations when stratifying samples by RNA subgroups, where TERT expression was significantly higher in PCNSL compared to ABC-DLBCL, GCB-DLBCL and FL (Figure 8 C).

[...] Moreover, somatic hypermutation has previously been described as having a pathogenic role in PCNSL development, with a greater extent in PCNSL than in systemic DLBCL 96. In agreement with previous reports, we identified several aSHM targets including the proto-oncogenes PIM1, PAX5, BTG2, and OSBPL10 21,45,96. Exploiting a whole-genome sequencing approach, we observed additional mutational hotspots indicative of aSHM in other genes, including MIR142, FHIT, ETV6, BTG1, GRHPR, and CD79B. Our data suggest that kataegis loci are reasonable indicators of aSHM. We observed significantly higher RNA expression of genes with putative aSHM loci compared to those without. In addition, these putative aSHM loci were significantly enriched in genes involved in BCR signaling.
Together, this implies that BCR signaling genes are both upregulated and targeted by putative aSHM, raising the question of cause and effect: is aSHM upregulating these genes, or are the high expression levels of these genes priming them for aSHM? This becomes even more complex when considering that highly expressed genes should have lower mutation rates due to transcription-coupled repair 151.
The landscape of copy number aberrations and structural variations revealed potentially clinically exploitable deletion of TOX as a predictor for anti-PD1 response 107, amplification of MALT1, whose inhibition has been shown to be selectively toxic for ABC-DLBCL 152, and potential enhancer-hijacking events involving PIK3C3 and EPHA4, whose inhibition has shown therapeutic advantage in a number of cancer models 117-119,124.

Conflicts of interest
The authors declare that there is no conflict of interest.

Figure Legends
medRxiv preprint doi: https://doi.org/10.1101/2021.07.30.21261280; this version posted August 5, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.