Global patterns of geo-ecological controls on the response of soil respiration to warming

While soil respiration is known to be controlled by a range of biotic and abiotic factors, its temperature sensitivity in global models is largely related to climate parameters. Here, we show that temperature sensitivity of soil respiration is primarily controlled by interacting soil properties and only secondarily by vegetation traits and plant growth conditions. Temperature was not identified as a primary driver for the response of soil respiration to warming. In contrast, the nonlinearity and large spatial variability of identified controls stress the importance of the interplay among soil, vegetation and climate parameters in controlling warming responses. Global models might predict current soil respiration but not future rates because they neglect the controls exerted by soil development. To accurately predict the response of soil respiration to warming at the global scale, more observational studies across pedogenetically diverse soils are needed rather than focusing on the isolated effect of warming alone. Understanding the temperature sensitivity of soil respiration is critical to determining soil carbon dynamics under climate change. Spatial heterogeneity in controls highlights the importance of interactions between vegetation, soil and climate in driving the response of respiration to warming.

W ith implemented climate policies struggling to limit global warming to an average of <1.5 °C (ref. 1 ), elucidating the response of an adapting ecosphere to warming is more and more important. Understanding soil carbon (C) dynamics is key to this because it directly determines a large portion of future net GHG emissions from terrestrial ecosystems 2 .
Soils are considered net sinks for C with current net sequestration estimated at 1 PgC yr -1 (ref. 3 ). This is only a minor part of the continuous exchange of C between soil and atmosphere due to C input to soils through plants and release of C through soil respiration, approximately balanced at annual fluxes of 58-80 PgC yr −1 (refs. 4-6 ). Rising global temperatures are expected to lead to notably higher decomposition rates of soil C and thus CO 2 release from soils 7,8 , largely because of more energy available for microbial decomposer communities 9 . However, despite its importance, the response of soil C to warming is still one of the great uncertainties in global carbon cycling 10 . Great uncertainties are related to the effect of warming on vegetation, C input 11 across different soil depths 12 , microbial responses 13 and estimates for losses of soil C in Arctic plus high latitudes 14 and tropical plus low latitudes 15 .
While the temperature sensitivity of soil carbon has been long studied 10,16 , only now ecosystem models begin to implement mechanistic controls of microbial soil respiration in response to climate and soil changes 17,18 . One issue is that soil properties, often crucially related to subsoils, are hidden from air-and space-borne sensing techniques that do not 'see' soils. Therefore, statistical models are needed to better represent relationships between microscopic and macroscopic processes, especially on broader scales 19,20 . Furthermore, most of our mechanistic understanding of soil processes and warming is derived from studies in temperate zones; their numbers dwarf the number of studies in boreal and tropical ecosystems (Supplementary Figs. 1 and 2). Due to the nature of small-scale studies with often homogenous soil and environmental properties, an holistic, global assessment on factors controlling soil respiration, except for basic variables that integrate various processes at once (clay content) has not been done yet 16 . Soil is not mechanistically represented in global ecosystem models but is rather given a mostly budgetary function. Thus, future global soil GHG emissions might be critically misrepresented under changing environmental conditions. For example, global climate and ecosystem models 21,22 dealing with warming focus on GHG fluxes from environments where climatic and hydrological barriers are the key controls to limit C decomposition 23 . However, these climate-and hydrology-driven, geochemically speaking 'young' , soil systems do not represent soil conditions found for the largest part of globally relevant soil C stocks 24 . Most soil C is stored in geochemically more complex and weathered soil systems, where soils have developed over millennia and the biosphere adapted to warmer conditions over millions of years of evolution 25 . Hence, soils in every (geo)climatic zone will probably show very different responses in respirations with warming due to their different, soil type-dependent, properties and drivers 26 . To the best of our knowledge, previous models of soil Q 10 took the average air temperature as main predictor for soil Q 10 (refs. [27][28][29] ). Thus, the global representation of soils and GHG emissions from them with their drivers and controls are not well represented in Earth system models and Q 10 is still treated as an average value over all climate zones and state-of-the-art in Coupled Model Intercomparison Project Phase 5 models to consider temperature sensitivity in soil 29-32 . By using highly averaged values of temperature sensitivity of soil C (refs. 28,33-36 ) that do not represent the underlying processes 16 or by focusing on selected climatic drivers, current Earth system and climate models unintentionally neglect the variability of crucial biogeochemical factors altering the response of soils to climate forcing 37 . Doing so introduces large biases and uncertainties in global estimates of future C emissions from soils.
Here, we brought together large-and small-scale controls that have been identified as key variables to explain the soil respiration response to warming-expressed as soil Q 10 -at the global scale and used machine learning techniques to identify the most important groups of explaining variables for soil Q 10 . More specifically, we combined experimental results with a large database on climate-, vegetation-and soil-related parameters (further called best data approach) as proxies of soil respiration influencing factors under warming 38,39 (Supplementary Table 3). While Q 10 of soil respiration is not a mechanistic depiction of soil C response, it can be interpreted as a phenomenological response of multiple instantaneous Global patterns of geo-ecological controls on the response of soil respiration to warming David Haaf , Johan Six and Sebastian Doetterl ✉ While soil respiration is known to be controlled by a range of biotic and abiotic factors, its temperature sensitivity in global models is largely related to climate parameters. Here, we show that temperature sensitivity of soil respiration is primarily controlled by interacting soil properties and only secondarily by vegetation traits and plant growth conditions. Temperature was not identified as a primary driver for the response of soil respiration to warming. In contrast, the nonlinearity and large spatial variability of identified controls stress the importance of the interplay among soil, vegetation and climate parameters in controlling warming responses. Global models might predict current soil respiration but not future rates because they neglect the controls exerted by soil development. To accurately predict the response of soil respiration to warming at the global scale, more observational studies across pedogenetically diverse soils are needed rather than focusing on the isolated effect of warming alone.
processes that differ across geoclimatic and land use settings 38,39 and is widely used in global-scale ecosystem models. We compiled 3,400 observations from 560 soil warming studies conducted from 1971 to 2018 with incubation lengths of several days to >3 yr from all major climate and land use combinations (Methods and Supplementary  Fig. 2). For our analyses, we concentrated on climate zones in which rich plant-soil interactions occur and excluded regions with bare soils (polar and non-polar (semi)deserts and high alpine environments) for which not enough data to train models and/or global maps of independent predictors were available. Then, we (1) built linear and nonlinear predictive models for soil Q 10 , (2) derived the relative importance of the derived groups of explaining variables for soil Q 10 and (3) determined the changing importance of the identified controls in different climate systems and land use zones using partial dependence analyses ( Figs. 1 and 2). To assess the validity of our interpretation and the robustness of our models, we have repeated steps 1-3 by using only predictors of soil Q 10 derived from global datasets, further referred to as the generalized data approach (Supplementary Tables 4 and 6).

Predicting soil Q 10 and its controls
Our model satisfactorily predicted soil Q 10 across all included systems for both the best data and the generalized data approach ( Fig. 1a and Supplementary Fig. 3a), showing that the temperature sensitivity of heterotrophic soil respiration is driven by a combination of soil properties, vegetation and climate interactions at the global scale. Similarly to previous assessments of soil Q 10 at the regional scale 40 , nonlinear model approaches (R 2 = 0.18-0.46; root mean square error of cross-validation (RMSE) 0.58-0.72) greatly outperformed linear models (R 2 = 0.07-0.08; RMSE 0.76-0.77) (Supplementary Table 6). Both the best data and generalized data model approaches performed similarly in explaining the variability in temperature sensitivity of soil respiration (R 2 = 0.46) and with reasonable uncertainty (relative RMSE = 24%). Only a relative small part of soil Q 10 was directly controlled by plant growth conditions (11.6%) as well as evapotranspiration and precipitation (12.6%). In contrast, a much larger share of soil Q 10 variability was controlled by soil properties (63.1%) (Fig. 1b). Interestingly, climate and vegetation variables were more intercorrelated and their effects on soil Q 10 were not clearly separable (Supplementary Table 3).

Global patterns of controls on soil Q 10
Our analyses also revealed an extremely high variability in the controlling factors for soil respiration (Fig. 1c). Vegetation-and climate-related parameters like growth conditions and evapotranspiration had a strong influence at both extreme ends of their respective range of values, which represent climatic extremes; as a general trend, climate was a strong control at lower temperatures, low precipitation or higher evaporation (Figs. 1c and 2). This is probably related to the lack of mineral stabilization of C in these colder climate zones 41 leading to a faster response of microorganisms to warming and hence a decomposition of labile C once temperature barriers are released 42 . Notably, temperature was not a separate dominant control on soil Q 10 and climatic variables in general exert little influence in environments with more moderate climate; moreover, temperature ceases to influence soil Q 10 in warmer climate zones.
In contrast, a wide range of biotic and abiotic soil variables controlled the variability of soil Q 10 across their full range of values, resulting in the observed high heterogeneity. This dominance of soil variables is most likely because of the variety of parent materials that soils develop from and the various stages of weathering across the globe that affect plant growth and C stabilization. In cold climates, soils show low reactivity due to climatic barriers to chemical soil weathering 16 . Plant litter, and not microbially processed or mineral associated C, is often the main source of energy for microorganisms under these cold conditions. In temperate climates, soils have generally higher chemical reactivity and high C stabilization potential, thereby diversifying potential C sources for microorganisms.  This diversification of energy sources can lead to very variable competitive strategies driving C use efficiency 43 and thus soil Q 10 (ref. 42 ).
In tropical climates, chemical weathering has depleted many soils of reactive minerals and reduced C stabilization potential, leading to a reduction in the variety of C resources. Hence, strategies for an efficient recycling of nutrients from litter back into plants are prevailing 26,44 . The implementation of all identified controls in our model resulted into a spatially highly variable map of soil Q 10 (Fig. 2a,b) and a similarly diverse map of relative uncertainty of prediction (Fig. 2c,d). More specifically, in Arctic and boreal environments, where temperature is a major barrier for decomposition of labile C, soil Q 10 was particularly high across all major land use systems. In contrast, soil Q 10 was highly variable in temperate zones where local soil development drives C stabilization and thus responsiveness to warming.
Lastly, soil Q 10 was generally lowest in tropical environments where soils are deeply weathered and C accessibility is driven by litter quality. Deviations from this general pattern were tied to local variations in climatic, topographic and biogeochemical soil conditions (Supplementary Table 9). Our uncertainty map (Fig. 2c,  shows high spatial variability, especially in data-poor regions of the (sub)tropics or in regions with highly diverse soil landscapes (temperate and tropical zones). We explain this with the fact that in data-poor regions the model cannot be trained to the same degree as in data-rich regions due to a lack of data and precision for both response and independent variables. In regions of highly developed soils, our results point at the importance of considering local soil development and land use history for predicting soil Q 10 because these can differ greatly from one geoclimatic region to the next leading to varying model complexity and strength of predictors ( Fig. 1) that is not fully captured at the global scale. In summary, our analyses allowed for predicting global patterns of soil Q 10 with reasonable uncertainty at a much higher accuracy and spatial variability than comparable approaches using climatic and vegetation variables alone 27,28,40,45 across major climate zones in which forests, grasslands and agricultural land use appears. Nevertheless, a larger share of variability in soil Q 10 remained unexplained (~55%). We relate this lack of identifiability to the coarse spatial and temporal resolution of global key datasets, where information on local heterogeneity is lost, paired with a lack of accurate data from data-poor regions (that is, mountains, boreal zones, wetlands and tropics). Furthermore, global studies and predictions are in parts driven by completely different parameters than comparable regional studies due to the different resolution and data availability 46 . A large number of local to regional scale controls on soil Q 10 and microbial decomposition processes exist (land management) that cannot be represented currently through proxy variables at the global scale 39 .

CO 2 release from soils in the decades to come
Our study showed much higher and more variable temperature sensitivity of respiration than comparable ecosystem-level assessments 27 . Soil Q 10 predicted by our model was on average 33 ± 10% higher compared to soil Q 10 in climate-driven models 28 . Our results are consistent with, and can help explain, the predicted reduced uptake of C in soils by the end of the twenty-first century 47, 48 . As has been demonstrated before 49 , boreal and temperate climate zones of the northern hemisphere showed increased C release from soils with changing temperature and precipitation while soils of the southern hemisphere showed only limited responses and tropical soils even less. However, on the basis of our results, we would predict that in colder environments, warming will create over time a more reactive soil matrix, similar to those found in temperate climates. Examples for the expected changes in Arctic soils are higher rock-derived nutrient release due to (bio)chemical weathering, higher potential to stabilize carbon with minerals, thicker soils for higher water retention capacity and larger rooting zones 50-52 . It is thus likely that in many of these changed future soils of Arctic, Antarctic or alpine environments, plant productivity will increase, C stabilization through various mineral-related physicochemical mechanisms 53 will improve and microbial communities will respond to the changed climatic conditions with, for example, higher C use efficiency 43 . Greening and weathering are likely to compensate some of the projected soil C loss from thawing and regressing permafrost 54 losses through additional C sequestration and create new terrestrial C sinks in higher latitudes. However, recent studies show 36 that it is unreasonable to assume that these processes can fully compensate for the additional release of C from soils. Plant growth is limited by more than atmospheric parameters and weathering leading to nutrient release or C stabilization potential is slow and on decadal timescales 55 . Warming in the next decades could lead to an additional C release from soil that is equal to all other current anthropogenic C emissions. A warming climate, however, will ultimately lead to lower soil Q 10 in boreal zones in the long term, as plant-soil systems become more adapted to warming 56 with Arctic soil systems becoming more similar to boreal or even temperate systems if climate change is progressing as predicted 57 . Predicting these contrasting trends of soil Q 10 in changed soil landscapes requires Earth system models to incorporate soil development trajectories as a control for future C fluxes and account correctly for the carbon flux between soil and atmosphere 58 . Indeed, to estimate C fluxes further into the future, a more mechanistic approach is needed that includes processes like soil formation (accelerated soil formation in Arctic due to warming and increased weathering) or soil degradation (that is, in the tropics due to land use change and erosion) to accurately predict the future warming response of these dynamic systems.
Our results illustrate how complex the interplay and strengths of controlling factors for soil Q 10 can be at global scales. First, using a large range of independent variables to predict soil Q 10 in heterogeneous ecosystems, we confirm that controls on soil C responses to climate change are drastically different between climate zones and environmental settings, limiting the transferability of experimental and mechanistic knowledge on soil processes across geoclimatic zones. Second, almost all variables showed spatially varying influence on soil Q 10 , meaning that soil Q 10 is highly nonlinear and multifactorial. Lastly, from poles to the equator, temperature has not been identified as the main driving factor for soil Q 10 . While temperature was certainly a limiting and controlling factor for biological activity in high latitudinal environments, soil Q 10 was increasingly more strongly related to biogeochemical and physical soil conditions than to warming per se in mid-and lower-latitudes. Thus, large changes to the soil C cycle will occur through a warming-induced feedback loop that is more strongly controlled by changing soil parameters and development due to better conditions for chemical weathering than by temperature itself. Our study, focusing on soil development-related variables shows which key controls have to be considered in Earth system models besides warming to understand and predict a changing terrestrial C sink versus source by the end of the twenty-first century. Lastly, improving our mechanistic understanding of the effects of developing soil characteristics in different climate zones and ecosystems, especially in tropical regions, is required before soil respiration responses to warming can be accurately projected into the future.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/ s41558-021-01068-9.     Fig. 1). In our compiled database, soil Q 10 data in these studies were taken from temperature ranges −5 to +50 °C, conducted from 1971 to 2018 with incubation lengths of several days to >3 yr. We constrained our study to observations of topsoil samples (weighted averages for 0-30 cm of soil depth) and excluded studies that targeted autotrophic soil respiration. Reported Q 10 in these studies represent the average soil Q 10 during the length of the experiment and are considered as soil basal respiration.
The included soil Q 10 data were tested for fulfilling normal distribution using the Shapiro-Wilk normality test 61 and for fulfilling homogeneity of variances with the Fligner-Killeen test 62 . Comparability of soil Q 10 and to avoid introducing potential biases was tested in several ways.
To identify experiment-specific influencing factors (measures taken by the experimenter, Supplementary Fig. 2) we used one-way analysis of variances (ANOVA) 63,64 and in case of significant rejection of the statistical requirements for ANOVA, using the Kruskal-Wallis test 65 , to test for differences in soil Q 10 between (1) laboratory and field studies, (2) studies reporting explicitly heterotrophic respiration versus mixed respiration where remnants of autotrophic respiration cannot be excluded, (3) sequential versus parallel warming of soils and (4) explicit pretreatments of the samples versus non-treated samples. Results of this test indicated only minor differences between the above compared studies ( Supplementary Fig. 2). Furthermore, we evaluate the effect size of the applied Kruskal-Wallis test pairs to show the strength of the analysed relationship of statistical significant differences between subgroups of the database. We computed the effect size as follows 66 : where H is the test statistic, n is the number of observations and k is the number of groups in the model. The analyses revealed that, among all pairs, only grouping by climate zone has a strong effect on Q 10 differences between subgroups. Other pairings, including the division of laboratory-versus field-derived Q 10 did not show a significant effect size (Supplementary Table 2). Additionally, we tested our model performance on a data-rich and environmentally diverse region (Continental Europe, Scandinavia and the United Kingdom) using the same independent predictor variables and model structures as for the global approach but predicting soil Q 10 with only subsets of the data: one prediction where we use both field and laboratory data (n = 786) combined and one prediction each where we used only laboratory (n = 237) or field data (n = 549). Our results (Supplementary Fig. 3) show that no difference in model performance or potential bias can be observed on the basis of the origin of parts of our data. Hence, we continued with a unified dataset for all other analyses but included these experiment-specific criteria in our later modelling approach as a confining factor (Methods section 'Statistical analyses' , results in Supplementary Tables 3-5 and Fig. 1).
From the compiled Q 10 data, values <1 and >4.5 were excluded from further analysis, as (1) we want to represent natural conditions that follow current paradigm, namely that soil basal respiration increases with incubation temperatures 27 and (2) that Q 10 > 4.5 are the result of the decomposition of large amounts of poorly decomposed, isolated organic matter (litter, roots 67 ) in litter layers or defrosting former permafrost soils. Furthermore, including these values would lead to inaccuracy in calculation with exponential equations 68 . These criteria led to the exclusion of 8% of the compiled observations (262 observations), resulting in a total of 3,413 observations remaining across all major land uses (grassland, cropland, forest and wetland) for the boreal, temperate, subtropical and tropical climate zones of the northern and southern hemisphere used in this study (Supplementary Fig. 2).
Included independent variables. To analyse the influence of soil properties, vegetation and climate parameters on Q 10 , five climatic and vegetation as well as eight soil parameters were selected as independent variables. These parameters were used for all further statistical analyses. Where available, we used high-resolution local data taken from the included studies directly, resulting in our 'best data' dataset. Where local studies did not include all the desired independent variables, global datamaps and satellite remote sensing data were used to fill gaps in climate and soil properties (Supplementary Table 3). Note that values of pH <3 were replaced with a pH 3, due to the fact that soils with a pH <3 do not occur in the ecosystems investigated in this study 69 and are an artefact created during the assembly of the original dataset (best data approach, nine datapoints replaced; generalized data approach, zero datapoints replaced). Note that these global datamaps of independent controls show variable spatial resolutions ranging from 250 m to 0.5° and represent averages over 1-30 yr (Supplementary Table 1). To assess the potential impact of spatially highly variable data in our analyses, we used the data in the highest available resolution and did not transform the data to match resolutions. In addition, to represent potential controls that result from the interaction of soil parameters with climate and vegetation, a series of interaction terms were included. Organic carbon/organic nitrogen/total phosphorus ratios were included to represent effects of nutrient stoichiometry in soils 70 . Clay content/ mean annual temperature ratios were included to represent soil weathering and changes in mineral surface area 71 . Base saturation/clay content and potential cation exchange capacity/clay content as well as base saturation/cation exchange capacity ratios were used to assess mineral surface charge effects. Base saturation/pH ratios were used to assess soil acidity effects. Mean annual precipitation/potential evapotranspiration and potential evapotranspiration/normalized vegetation index ratios were used to assess plant productivity as well as precipitation-and evapotranspiration-related effects 72 .
The resulting dataset of independent variables is not inclusive for all experimentally identified controls (variability of microbial decomposers and their strategies are not included) 73,74 . However, key criteria for their selection in our modelling exercise were availability as global datasets to fill data gaps of the metadata of the included warming studies. Furthermore, all included variables stand in a causal relationship for controlling biological processes and C cycling between soils and atmosphere and vary across a large range of possible values (Supplementary Table 1) that represent most conditions in which biological processes take place in soils (that is, very acidic, to very basic, very low and very high temperatures and so on). This compilation of empirical data was selected to bridge a crucial gap from experimental finding to implementation of soil processes into Earth system models.
Rotated principal component analysis. To increase the identifiability of larger groups of controls and to reduce the number of independent variables that are autocorrelated, we used rotated principal component analysis (rPCA), performed for both our best data model building (Supplementary Table 3) as for our generalized data approach (Supplementary Table 4) and interpreted the loading of each principal component according to their underlying relevance as a controlling factor for soil Q 10 . To minimize multicollinearity effects, the variance inflation factor was estimated for all independent predictor variables and maximal variance inflation factor was eliminated until all independent variables possessed a variance inflation factor of <5. As rotation method and to minimize multicollinearity, variance maximizing (VARIMAX) was used. The selection of an optimal number of principal components was done on the basis of the Kaiser-Guttman rule and limited to principal components with an eigenvalue of >1. This resulted in eight rotated principal components (rPC), identifying the eight most important groups of explaining variables for soil Q 10 (Supplementary Table 3).
Predictive modelling. To build and identify the best model for predicting soil Q 10 and using the results of the rPCA analyses, regression modelling was conducted including four different linear and four different nonlinear regression types. Linear regression included models without (LM) and with (LEAPS) stepwise selection 75 as well as models such as least angle regression (LARS) 76 and Elastic Net (ENET) 77 that use a penalizing term to the regression coefficients of those variables with minor influence on the prediction 78 . Nonlinear regressions included the tree-and rule-based (= representing the path of partitioned regression(s) by using distinct if-then rules to create prediction models) 77 models random forest (RF) 79 and boosted tree model (BOOSTED) 80 , as well as model bagged tree (BAGGED) 81 and cubist (CUBIST) 82 . All models, except for the LM linear regression and the BAGGED model, have built-in feature selection procedures and were tuned individually, to increase the accuracy and control the complexity of the models 78 . As part of the tuning process, the following steps have been taken: LEAPS models were trained for the maximal number of variables. For penalizing models, penalty terms for feature reduction (that is, lowering the effect of less important variables on the final linear equation) varied between 0 and 0.1 in 0.01 steps. The RF models were constrained by setting the maximum number of allowed trees to 1,000. The number of included predictors was set to the maximum number of possible predictors divided by three 83 . BOOSTED were trained with a minimum of ten to a maximum of 100 trees with one to seven nodes, a shrinkage factor of 0.01 or 0.1 and a maximum size of five. To train the CUBIST models, 1-9 by 2 neighbours and 1, 5, 10, 50, 75 and 100 communities were used. For all models, Monte Carlo cross-validation 84 , with 100 repeated data resamples and a ratio of 80% training to 20% validation data were used to assess the uncertainty of model structures and prevent overfitting. RMSE and R² were estimated for all tuned models and used to analyse the residual variance and accuracy of the models 85 and as a criterion for ranking model performance (Supplementary Table 5). For an easier interpretation of the uncertainty of estimated soil Q 10 , relative RMSE was estimated by dividing the absolute error by the global mean of Q 10 . Random forest regressions resulted in the best model performance within one-standard error of minimal RMSE 86 and were used for all further analyses of variable importance. Furthermore, residual plots for the global best model (Supplementary Fig. 4) and the three data-rich examples of continental Europe (Supplementary Fig. 3) were created. All residual plots show random patterns, indicating a good fit of the used random forest models for the global and the European models.
Assessing variable importance. To estimate the influence of the identified rPC variables for predicting Q 10 , we assessed variable importance using permutation variable importance measurements through the variable importance tool implemented in R caret package 87 for the model with the highest accuracy and prediction quality (RF). Briefly, to assess the error of prediction in the model, the permutation variable importance measurements method calculates the mean square error for every given regression tree with out-of-bag estimates 79,88 . The resulting measure of variable importance of RF models represents the influence of the predictor variables on the model results 89 . For better comparability all independent controls in our models 90 , the included independent rPC control variables were normalized on a scale of 0 to 100% to represent relative importance for the model outcome.
Partial dependency of controls. Partial dependence analyses using the R package pdp (ref. 91 ) were used to test effects between predicted Q 10 and independent controls across the whole range of possible values that were included in the RF modelling. Briefly, the method results in a statement about the global relationship of an independent variable to the predicted across the whole range of all potential values by removing and averaging out the effect of other independent controls and isolating the effect of the targeted independent variable(s) 80 . In contrast to the assessment of the relative importance of an independent variable overall, partial dependence analyses and their visual representations (partial dependence plots, PDP) can illustrate the average marginal effect of one or more independent variables on the predicted outcome of a machine learning model 80 across a specific range of values. For example, PDPs can show whether the relationship between the predicted variable and an independent control is linear, monotonic or complex 92 . The shape and knickpoints of the PDP curve can then be used to interpret and identify areas where an independent has a particular strong and direct effect on the predicted, and where its control is rather indirect, for example through influencing other independent variables. For simpler interpretation of the PDPs x axis from low to high, the curves of rPCs with dominant negative loading (best data approach: rPC1, rPC7; Supplementary Table 3) were reversed.
As an example in our study, PDPs illustrate that precipitation and evapotranspiration have a weak effect and control on Q 10 at lower ranges but a stronger effect at higher ones (Fig. 1c). As the loading of our rPC variable 'precipitation and evapotranspiration' is not mixed with other controls (Supplementary Tables 3  and 4), the PDP allows a direct interpretation of the variable's value. In contrast, temperature has a complex relationship to the predicted soil Q 10 , mostly through affecting plant growth conditions, experimental setup and weathering.
Global soil Q 10 mapping. A map of the global distribution of soil Q 10 , expressed as Q 10 of soil basal respiration and a corresponding map of the relative uncertainty of prediction (Fig. 2) was derived using our best data rPCA structure and scores (Supplementary Table 4) and an RF model with the included global climate, vegetation and soil datasets (Supplementary Table 1) that we used to build our generalized data model of soil Q 10 . Using the datasets of the generalized data approach, we calculated factor maps on the basis of the primary input variables for our eight rPC scores for each according raster cell before using them to calculate a spatial explicit map of global soil Q 10 . In consequence, the resulting map corresponds in quality to the results of our RF model results without experiment-specific modifiers as explanatory variables (Supplementary Table 7; R² = 0.42, RMSE 0.61). For this mapping exercise at a global scale, input variables were run at a 0.5° resolution and later aggregated at 0.25° latitudinal resolution to derive a mean Q 10 value separately for major land use systems at the respective latitude. Land use was derived using the 2015 ESA CCI-LC (ref. 93 ) land cover maps (300 m original resolution) and summarized to agriculture, forest and grassland systems. We excluded those areas from our prediction where (1) data in any of the required predictors were missing, (2) land use was different to the aggregated land use systems listed above or (3) areas were located in climate zones which were not targeted by our model (polar and non-polar (semi)deserts). Predictors that were available at a higher resolution were resampled using geostatistics to match a 0.5° resolution. The resulting map's averages shows significant differences for distinct USDA and WRB (ref. 94 ) soil orders across climate zones and land use systems (Supplementary Table 9).To assess the uncertainty related to the creation of the map due to resampling of data and unexplained variability not captured by the rPC scores we run the model also at a finer resolved 1-km 2 grid or those areas where input variables were available at this higher resolution. This analysis revealed an overall uncertainty of our global soil Q 10 map averaging at 27.4 ± 10%. The corresponding map of relative uncertainty of prediction was built by displaying standard deviation/divided by the mean of prediction based on the results of our final random forest model with standard deviation related to the range of possible predictions based on the build-up of the used decision tree after 500 model runs.
Caveats. The 'real' controls and the influence of experimental modifications. The identification of variables for regression models, including their importance and dependency assessments, are highly dependent on the range in which the included variables can vary. In our global model design, we addressed this by including independent variables that vary across a large range of possible values in which biological processes take place and which represent most conditions that can occur in soils (Supplementary Table 1). To assess the validity of our interpretation and the robustness of our models, we repeated all statistical analyses that involve independent predictors by using data only derived from global datamaps, further referred to as the generalized data approach (Supplementary Table 4 and Supplementary Fig. 3). An approach that excluded experiment-specific modifiers (Supplementary Table 7) generally resulted in less performance than fully parameterized models but differences were marginal (R 2 = 0.03-0.42; RMSE 0.61-0.79). Together with our analyses of potential biases in the database that yielded negative results ( Supplementary Fig. 2) this suggests that experimental and climatic conditions, if made comparable across larger gradients, do not exceed the control of soil variables on soil Q 10 .
Spatial autocorrelation. Building our predictive models of soil Q 10 ( Fig. 1a and Supplementary Fig. 3a), we tested for and quantified spatial autocorrelation of modelled residuals using Moran I test 95 . Results indicated only a minor influence of spatial autocorrelation for all linear models (Moran I ≅ 0.3 for all models). Further corrections taking into account spatial variability and the accuracy of geographic coordinates 96 in the modelling structure of the linear models showed no improvement. In combination with the good results of the machine learning models (Supplementary Tables 6 and 7), we interpret these results as supportive to our finding that the relationship of soil Q 10 and the included independent controls are primarily nonlinear.

Data availability
Datasets used in this study are deposited in a permanent open access online repository at ETH Zurich's Research Collection under the following DOI: https://doi.org/10.3929/ethz-b-000479158.

Code availability
The R code used and produced in this study is deposited in a permanent open-access online repository at ETH Zurich's Research Collection at https://doi.org/10.3929/ethz-b-000479158.