Journal of Global Change Data & Discovery, 2022, 6(3): 386-394


Citation: Zhang, C. P., Ye, Y., Fang, X. Q. Development of a Global Land Suitability Dataset for Cultivation Based on Physiogeographic Factors [J]. Journal of Global Change Data & Discovery, 2022, 6(3): 386-394. DOI: 10.3974/geodp.2022.03.08.

Development of a Global Land Suitability Dataset for Cultivation Based on Physiogeographic Factors

Zhang, C. P.  Ye, Y. *  Fang, X. Q.

Faculty of Geographical Science, Beijing Normal University, Beijing 100875, China

 

Abstract: Land suitability data for cultivation based on physiogeographical factors are a basic input for the spatially explicit reconstruction of historical cropland cover, and the credibility of the gridded allocation results is partly determined by the quality of these data. Most previous studies at global or continental scales assumed that each physiogeographical factor affects cultivation intensity through the same linear relationship everywhere, which is unreasonable. In this study, the correlation between modern cultivation intensity and 13 physiogeographic factors (including climate, topography, soil, etc.) was detected within each 0.5° × 0.5° grid cell around the world, and a 5′ × 5′ global dataset of land suitability was then developed by integrating all identified factors that were significantly correlated with the cultivation intensity. The results mainly aid research on the spatially explicit reconstruction of historical cropland cover at the global scale. The dataset is archived in .img format and consists of 4 data files. The data size is 38.7 MB (compressed to 3.55 MB).

Keywords: physiogeographical factors; cropland cover; spatial differentiation; correlation; land suitability for cultivation

DOI: https://doi.org/10.3974/geodp.2022.03.08

CSTR: https://cstr.escience.org.cn/CSTR:20146.14.2022.03.08

Dataset Availability Statement:

The dataset supporting this paper was published and is accessible through the Digital Journal of Global Change Data Repository at: https://doi.org/10.3974/geodb.2022.04.01.V1 or https://cstr.escience.org.cn/CSTR:20146.11.2022.04.01.V1.

1 Introduction

Since the agricultural revolution, anthropogenic land use and cover changes (LUCC) have deeply influenced terrestrial ecosystems and have become one of the increasingly critical driving forces of global environmental change[1–4]. Especially since the industrial revolution, the original natural landscape has undergone unprecedented alteration as the population boomed, which has exerted profound impacts on the Earth’s matter and energy cycles as well as on global climate change[5].

Since the 1990s, several global historical LUCC gridded datasets (SAGE, HYDE, PJ, KK10, etc.) have been successfully developed and published. This work relied on the integration of multiple disciplines such as paleoenvironmental science, archaeology, and history[6], on the promotion of IGBP and PAGES, and on the development of model simulation[5,7–10]. As input parameters of climate models, carbon cycle models, etc., these datasets have been widely used in research to assess the impact of anthropogenic land cover change (ALCC) on the carbon budget and climate change. The data quality directly affects the reliability of the simulation and assessment results of past global changes[11,12].

Cropland is an extensive and long-lasting land use type, and the quantitative reconstruction of cropland area and its gridded, spatially explicit allocation have been the main objects of historical LUCC research[13–15]. When the quantitative reconstruction of the historical cropland area is relatively accurate, the uncertainty of the gridded dataset is mostly caused by the allocation algorithm[16–18]. In previous studies, most gridded allocation methods proceed in three steps. They subjectively select several physiogeographical factors to determine the suitability for cultivation, assuming a single linear relationship between each factor and cultivation intensity. They then multiply the normalized results of each factor with equal weights to calculate the land suitability for cultivation (here, the suitability of the land for planting crops). Finally, they use these data as weights to allocate the total reconstructed cropland area of each administrative unit to grid cells[14]. At global and continental scales, however, these assumptions are obviously unreasonable. On the one hand, the correlation between any given physiogeographical factor and cultivation intensity differs among spatial units; on the other hand, the combination of physiogeographical factors that affect cultivation intensity also differs[18].

Given the abovementioned problems, this study collected 13 physiogeographical factors (including climate, topography, soil, etc.) intended to broadly represent the influences of heat, water availability, soil fertility, labor difficulty, etc. on cultivation. The global land was divided into regular 0.5°×0.5° grid units, the correlation between each factor and the modern cultivation intensity was identified in each unit, and all factors significantly correlated with the cultivation intensity in each unit were normalized. Then, the geometric averages of these normalized results were calculated to form the global land suitability dataset for cultivation.

2 Metadata of the Dataset

The metadata of the Global cultivation land suitability for dataset based on physiogeographic factors[19] are summarized in Table 1.

Table 1  Metadata summary of the Global cultivation land suitability for dataset based on physiogeographic factors

| Items | Description |
| --- | --- |
| Dataset full name | Global cultivation land suitability for dataset based on physiogeographic factors |
| Dataset short name | GlobalCultivLandSuitability |
| Authors | Zhang, C. P. HGE-4534-2022, Beijing Normal University, cpzhang@mail.bnu.edu.cn; Ye, Y., Beijing Normal University, yeyuleaffish@bnu.edu.cn; Fang, X. Q., Beijing Normal University, xfang@bnu.edu.cn |
| Geographical region | Global land area |
| Year | Modern (approximately 2000 C.E.) |
| Data format | .img |
| Data size | 3.55 MB (compressed) |
| Data files | The raster dataset of land suitability for cultivation with 5′×5′ resolution |
| Foundation | Ministry of Science and Technology of P. R. China (2017YFA0603304) |
| Data publisher | Global Change Research Data Publishing & Repository, http://www.geodoi.ac.cn |
| Address | No. 11A, Datun Road, Chaoyang District, Beijing 100101, China |
| Data sharing policy | Data from the Global Change Research Data Publishing & Repository includes metadata, datasets (in the Digital Journal of Global Change Data Repository), and publications (in the Journal of Global Change Data & Discovery). Data sharing policy includes: (1) Data are openly available and can be freely downloaded via the Internet; (2) End users are encouraged to use Data subject to citation; (3) Users, who are by definition also value-added service providers, are welcome to redistribute Data subject to written permission from the GCdataPR Editorial Office and the issuance of a Data redistribution license; and (4) If Data are used to compile new datasets, the ‘ten per cent principle’ should be followed such that Data records utilized should not surpass 10% of the new dataset contents, while sources should be clearly noted in suitable places in the new dataset[20] |
| Communication and searchable system | DOI, CSTR, Crossref, DCI, CSCD, CNKI, SciEngine, WDS/ISC, GEOSS |

3 Methods

3.1 Raw Data

Considering the spatial coverage of the study and the data availability, this study selected 13 factors based on expert knowledge of physical geography, including climate, topography, and soil (Table 2). Among them, the climate data came from WorldClim[1][21], and the annual mean temperature and precipitation were selected. The altitude data came from the USGS GTOPO30 DEM[2][22], and the slope was calculated from the DEM data. The soil data came from the World Soil Information (ISRIC) SoilGrids dataset[3][24], and 8 factors were selected, including mechanical composition, bulk density, organic carbon density, cation exchange capacity, and pH. NDVI data came from GIMMS[4][23]. The annual value was calculated from the semimonthly values by using the maximum value composite (MVC) method, and then the multiyear averages from 1982 to 2015 C.E. were calculated. The modern cropland fraction data adopted the synergistic 0.5′×0.5′ cropland dataset derived from multiple sets of global land cover products developed by Zhang et al.[25]. Compared with the original satellite-derived products, these data have relatively higher reliability in terms of the spatial distribution and cultivation intensity at the pixel scale[25].

Table 2  Basic information on the physiogeographic factors and cropland fraction dataset used in this study

| Type | Name | Unit | Resolution | Description |
| --- | --- | --- | --- | --- |
| Physiogeographic factors | Annual mean temperature | °C | 0.5′×0.5′ | WorldClim (1970-2000 C.E.)[21] |
| | Annual mean precipitation | mm | 0.5′×0.5′ | WorldClim (1970-2000 C.E.)[21] |
| | Altitude | m | 3″×3″ | GTOPO30 DEM[22] |
| | Slope | ° | 3″×3″ | Calculated from DEM data |
| | NDVI | - | 5′×5′ | Derived from the GIMMS NDVI dataset[23]; the annual value was calculated from the semimonthly data from 1981 to 2015 by using the MVC method, and the multiyear average value was calculated |
| | Bulk density | kg/m3 | 10″×10″ | Derived from SoilGrids[24], calculating the average value of 5 layers from the surface to a depth of 60 cm |
| | Organic carbon content | kg/m3 | 10″×10″ | As for bulk density |
| | Coarse fragment content | % | 10″×10″ | As for bulk density |
| | Sand content | % | 10″×10″ | As for bulk density |
| | Silt content | % | 10″×10″ | As for bulk density |
| | Clay content | % | 10″×10″ | As for bulk density |
| | Cation exchange capacity | mol/kg | 10″×10″ | As for bulk density |
| | pH | - | 10″×10″ | As for bulk density |
| Cropland | Cropland fraction | % | 0.5′×0.5′ | Adopting the global synergistic cropland fraction dataset developed by Zhang et al.[25] |
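The NDVI preprocessing described above (an annual maximum value composite followed by a multiyear mean) can be sketched as follows. This is a minimal illustration, not the authors' processing chain: it assumes the semimonthly composites are already stacked in a NumPy array covering whole years with 24 composites per year, and the function names `annual_mvc` and `multiyear_mean` are hypothetical.

```python
import numpy as np


def annual_mvc(ndvi, per_year=24):
    """Annual maximum value composite (MVC) per pixel.

    ndvi: array of shape (n_composites, rows, cols) holding semimonthly
    NDVI for whole years (assumed layout; NaN marks missing data).
    Returns an array of shape (n_years, rows, cols).
    """
    n, rows, cols = ndvi.shape
    if n % per_year != 0:
        raise ValueError("stack must cover whole years")
    yearly = ndvi.reshape(n // per_year, per_year, rows, cols)
    # Keep the largest NDVI observed in each year at each pixel.
    return np.nanmax(yearly, axis=1)


def multiyear_mean(annual):
    """Average the annual composites over all years, ignoring NaN."""
    return np.nanmean(annual, axis=0)


# Synthetic two-year stack on a 3 x 4 grid, for illustration only.
rng = np.random.default_rng(0)
stack = rng.uniform(0.0, 0.8, size=(2 * 24, 3, 4))
ndvi_mean = multiyear_mean(annual_mvc(stack))
print(ndvi_mean.shape)  # (3, 4)
```

Taking the per-year maximum before averaging reduces the influence of cloud-contaminated low values, which is the usual motivation for MVC.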

The selection principles considered the representativeness, hierarchy and comprehensiveness of these factors. First, the climate determines the basic pattern of agricultural types and the distribution of cropland on macroscales; second, the large geomorphological structure and topography affect the farming methods and utilization intensity; third, the distribution of water bodies determines the spatial order in which land is preferentially cultivated; and on a smaller scale, humans can modify the field topography and improve the soil fertility relatively easily. In essence, these factors influence the land suitability for cultivation through differences in land fertility (represented by yield), which shape human preferences for which land to cultivate. With the same labor cost and economic input, it is easier to obtain high yields on land with better soil fertility, so such land is most likely to be cultivated first. Data such as potential photosynthetic production and potential farmland production can comprehensively indicate crop yield conditions, but they are not independent of temperature, precipitation, soil, etc.; this study therefore selected NDVI as an alternative indicator.

3.2 Algorithms

The spatial resolution of all data was unified to 5′×5′ (using the Zonal Statistics tool in ArcGIS), and 0.5°×0.5° was adopted as the basic unit to detect the correlation of physiogeographic factors and cultivation intensity. The basic steps of the development of land suitability for cultivation are described as follows:

(1) In each 0.5°×0.5° grid unit, Pearson correlation analysis (Sig = 0.001) is performed to detect the relationship between physiogeographic factors and cultivation intensity, and the schematic is shown in Figure 1.

 

Figure 1  Schematic of the correlation detection between physiogeographic factors and cultivation intensity under the global 0.5° grid

 

(2) In each 0.5°×0.5° grid, the factors significantly correlated with cultivation intensity were normalized. When it is a positive correlation, Equation 1 is adopted; when it is a negative correlation, Equation 2 is adopted.

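$$x_i^{MaxNorm} = \frac{x_i - x_i^{min}}{x_i^{max} - x_i^{min}} \qquad (1)$$

$$x_i^{MinNorm} = \frac{x_i^{max} - x_i}{x_i^{max} - x_i^{min}} \qquad (2)$$

where $x_i^{MaxNorm}$ and $x_i^{MinNorm}$ are the normalized pixel values in spatial unit i; $x_i$ is the 5′ grid value in spatial unit i; $x_i^{max}$ is the maximum value in spatial unit i; and $x_i^{min}$ is the minimum value in spatial unit i.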

(3) Calculate the geometric average value of the normalization results of all factors correlated with the cultivation intensity in spatial unit i (converting the original interval from 0–1 to 1–100) (Equation 3). The geometric average is used because the previous method directly multiplied the factors, which changed the linear relationship between physiogeographic factors and cultivation intensity to an exponential relationship.

$$W_i = \sqrt[n]{f_{1i} \times f_{2i} \times \cdots \times f_{ni}} \qquad (3)$$

where $W_i$ is the land suitability for cultivation in spatial unit i; $f_{ni}$ is the n-th factor used to construct the land suitability for cultivation in spatial unit i; and n is the number of factors engaged in constructing the land suitability in spatial unit i.

(4) Perform the interval stretching transformation on the results calculated in step 3, using the modern cultivation intensity interval in spatial unit i (Equation 4), to obtain the dataset of land suitability for cultivation with 5′×5′ resolution (Figure 2). Before identifying the maximum and minimum values of the cropland fraction in spatial unit i, the cropland data were first smoothed by using the Focal Statistics tool in ArcGIS with a 3×3 window. For the few spatial units where there was no correlation between physiogeographic factors and cultivation intensity, the modern cultivation intensity was used to represent the land suitability.

Figure 2  Schematic of the construction of global land suitability

(“+” indicates a significant positive correlation; “-” indicates a significant negative correlation; “×” indicates no correlation.)

                                                              (4)

where Wimdfy is the modified result of land suitability in spatial unit i; Wi is the value of land suitability in spatial unit i; CFmaxi is the maximum cropland fraction in spatial unit i; and CFmini is the minimum cropland fraction in spatial unit i.
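Steps (1) to (4) for a single 0.5°×0.5° unit can be sketched as below. This is an illustrative reading of the text, not the authors' implementation, and it rests on several labeled assumptions: Equations 1 and 2 are taken to be min-max normalizations; the conversion from 0–1 to 1–100 is taken to be the linear map 1 + 99x; the stretching in step (4) is taken to be a linear rescaling of the suitability onto the unit's cropland fraction range; and the 3×3 focal smoothing of the cropland data is omitted. All function names are hypothetical.

```python
import numpy as np
from scipy import stats

ALPHA = 0.001  # significance level quoted in the text (Sig = 0.001)


def significant_factors(factors, cropland, alpha=ALPHA):
    """Step 1: sign (+1 or -1) of each significantly correlated factor."""
    signs = {}
    for name, values in factors.items():
        if np.ptp(values) == 0 or np.ptp(cropland) == 0:
            continue  # Pearson r is undefined for constant input
        r, p = stats.pearsonr(values, cropland)
        if p < alpha:
            signs[name] = 1 if r > 0 else -1
    return signs


def normalize(values, sign):
    """Step 2: min-max normalization, inverted for negative correlation."""
    lo, hi = values.min(), values.max()
    scaled = (values - lo) / (hi - lo)
    return scaled if sign > 0 else 1.0 - scaled


def geometric_suitability(factors, signs):
    """Step 3: geometric mean of the factors rescaled to 1-100 (assumed map)."""
    scaled = [1.0 + 99.0 * normalize(factors[n], s) for n, s in signs.items()]
    # exp(mean(log(x))) is the geometric mean; inputs are >= 1, so log is safe.
    return np.exp(np.mean(np.log(scaled), axis=0))


def stretch(w, cf_min, cf_max):
    """Step 4 (assumed form): map w linearly onto [cf_min, cf_max]."""
    span = w.max() - w.min()
    if span == 0:
        return np.full_like(w, cf_min)  # arbitrary choice for a flat unit
    return cf_min + (w - w.min()) / span * (cf_max - cf_min)


def unit_suitability(factors, cropland):
    """Land suitability for the 5' cells of one 0.5 degree unit."""
    signs = significant_factors(factors, cropland)
    if not signs:
        # Fallback stated in the text: use the modern cultivation intensity.
        return cropland.astype(float).copy()
    w = geometric_suitability(factors, signs)
    return stretch(w, cropland.min(), cropland.max())


# One unit holds 6 x 6 = 36 cells of 5' each; synthetic values only.
cropland = np.linspace(0.0, 60.0, 36)
factors = {
    "temperature": 2.0 * cropland + 5.0,  # perfectly positive
    "slope": 100.0 - cropland,            # perfectly negative
    "flat": np.ones(36),                  # constant, therefore skipped
}
suitability = unit_suitability(factors, cropland)
```

Working in log space for the geometric mean avoids overflow when many factors are multiplied, and it makes the contrast with the plain product used by earlier methods explicit.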

4 Data Results and Validation

4.1 Data Composition

The global dataset of land suitability for cultivation with 5′×5′ resolution based on physiogeographic factors (Figure 3) consists of 4 data files, archived in .img format, and the data size is 38.7 MB (compressed into 1 file, 3.55 MB).

 

Figure 3  The global dataset of land suitability for cultivation with 5′×5′ resolution

4.2 Data Results

Overall, the spatial pattern of land suitability is very similar to the distribution of cultivation intensity. Namely, high suitability is generally located in major agricultural regions around the world (such as the Eastern European Plain, the North China Plain, the Ganges Plain, and the Central Plains of North America), while in regions with relatively harsh cultivation conditions, the land suitability is generally low. The land suitability result shows slight irrationality in a few regions. Unlike the cultivation intensity, which varies in space with smooth gradients, the suitability values on the two sides of a 0.5° grid boundary can differ markedly. More detailed information on this dataset at the regional scale can be viewed by loading it into ArcGIS.

4.3 Data Validation

Land suitability should effectively indicate the potential cultivation conditions. In this study, modern cultivation intensity data were used to assess the reliability of the land suitability results. The reliability of the land suitability dataset was evaluated by calculating the correlation between the two datasets (Sig = 0.001) within each 0.5°×0.5° grid unit. As shown in Figure 4, the land suitability results showed a good positive correlation with the modern cultivation intensity dataset. More than 74% of the regions have correlation coefficients above 0.7.
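The validation statistic above (the share of 0.5° units whose correlation coefficient exceeds 0.7) can be sketched as follows. This is a minimal illustration under stated assumptions: both rasters are 5′ NumPy arrays aligned to the same grid, each unit is a 6×6 block of cells, and the significance filter (Sig = 0.001) is omitted for brevity. The names `block_correlations` and `share_above` are hypothetical.

```python
import numpy as np


def block_correlations(suit, crop, block=6):
    """Pearson r between two aligned rasters within each block x block unit."""
    rows, cols = suit.shape
    out = np.full((rows // block, cols // block), np.nan)
    for i in range(rows // block):
        for j in range(cols // block):
            window = (slice(i * block, (i + 1) * block),
                      slice(j * block, (j + 1) * block))
            s = suit[window].ravel()
            c = crop[window].ravel()
            # r is undefined when either window is constant; leave NaN.
            if np.ptp(s) > 0 and np.ptp(c) > 0:
                out[i, j] = np.corrcoef(s, c)[0, 1]
    return out


def share_above(r, threshold=0.7):
    """Fraction of units with a defined r greater than the threshold."""
    valid = ~np.isnan(r)
    n_valid = np.count_nonzero(valid)
    if n_valid == 0:
        return float("nan")
    return np.count_nonzero(r[valid] > threshold) / n_valid


# Synthetic 12 x 12 rasters (four units); one unit is negatively correlated.
crop = np.arange(144, dtype=float).reshape(12, 12)
suit = crop.copy()
suit[:6, :6] = -crop[:6, :6]
r = block_correlations(suit, crop)
print(share_above(r))  # 0.75
```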

Figure 4  Correlation between land suitability and modern cultivation intensity under a 0.5°×0.5° grid unit

 

5 Discussion and Conclusion

From the analysis of the correlation between physiogeographic factors and modern cultivation intensity around the world, the following findings were obtained. The relationship between the two varies by region and should be identified at regional scales. First, almost no physiogeographic factor has a uniform linear relationship with cultivation intensity on a global scale; the correlation may be positive or negative, and its significance also differs markedly. Second, the types and number of physiogeographic factors that affect cultivation intensity differ considerably among large regions. The influence of physiogeographic factors on the cultivation intensity in low mountainous and hilly areas is significantly greater than that in plains and basins.

The spatial differentiation characteristics of the correlation between some physiogeographic factors and cultivation intensity in China are shown in Figure 5. For precipitation, most previous studies simply assume that the greater the precipitation is, the higher the cultivation intensity. However, the two sample points in southern Anhui and the Qiongzhou Peninsula, where rainfall is abundant, show a negative correlation and no correlation, respectively; only Longdong (with precipitation less than 600 mm) shows a clear positive correlation. NDVI shows a clear positive correlation with the cultivation intensity at the southwestern margin of the Tarim Basin in Xinjiang (oasis agriculture relying on irrigation), while in the southeastern part of the Northeast Plain, the two show a strong negative correlation.

Regarding the applicability of the data to studies of historical periods, the following issue needs to be emphasized. To indicate historical cultivation conditions by modern land suitability, a basic assumption is that the relationship between physiogeographic factors and land suitability has remained constant from ancient to modern times; namely, the relationship of land suitability for cultivation with climate, topography, and soil has not changed drastically with time. It is difficult to obtain historical physiogeographic factor data (not simulation results) on a large scale. However, factors such as terrain and soil in historical periods were approximately similar to those in modern periods, the magnitude of climate change was relatively small, and the changes were relatively consistent on a regional scale. Therefore, this study uses modern factors to represent historical conditions.

Figure 5  Example of spatial differentiation of correlations between physiogeographic factors and cultivation intensity in China

(Notes: The upper left shows the coordinates of each sample point (0.5°×0.5° grid). Colors and shapes indicate the factors and the correlation types. The x-axis indicates the cultivation intensity, and the y-axis indicates the normalized value of the factor.)

 

In this study, the correlation between physiogeographic factors and modern cultivation intensity was identified on a global scale, and the physiogeographic factors significantly correlated with cultivation intensity were selected to construct a dataset of global land suitability for cultivation. The validation shows that the land suitability results can effectively indicate cultivation intensity. The dataset can indicate the potential cultivation capacity, and the method also provides a new perspective for developing region-based, spatially explicit allocation of historical cropland.

Author Contributions

Ye, Y. and Fang, X. Q. designed the overall dataset development. Zhang, C. P. contributed to the data analysis, designed the algorithms and wrote the data paper.

Conflicts of Interest

The authors declare no conflicts of interest.

References

[1] Ellis, E., Kaplan, J., Fuller, D., et al. Used planet: a global history [J]. Proceedings of the National Academy of Sciences, 2013, 110(20): 7978–7985.

[2] Foley, J., DeFries, R., Asner, G., et al. Global consequences of land use [J]. Science, 2005, 309(5734): 570–574.

[3] Gaillard, M. LandCover6k: global anthropogenic land-cover change and its role in past climate [J]. PAGES Magazine, 2015, 23(1): 38–39.

[4] Lambin, E., Geist, H. Land-use and Land-cover Change: Local Processes and Global Impacts [M]. Berlin: Springer Science & Business Media, 2008.

[5] Klein Goldewijk, K., Beusen, A., Doelman, J., et al. Anthropogenic land use estimates for the Holocene–HYDE 3.2 [J]. Earth System Science Data, 2017, 9(2): 927–953.

[6] Moran, E., Ojima, D., Buchmann, B., et al. Global Land Project: Science Plan and Implementation Strategy [M]. Stockholm: IGBP Secretariat, 2005.

[7] Ramankutty, N., Foley, J., Hall, F., et al. ISLSCP II historical croplands cover, 1700-1992 [DB/OL]. ORNL DAAC, 2010.

[8] Ramankutty, N., Foley, J. Estimating historical changes in global land cover: croplands from 1700 to 1992 [J]. Global Biogeochemical Cycles, 1999, 13(4): 997–1027.

[9] Pongratz, J., Reick, C., Raddatz, T., et al. A reconstruction of global agricultural areas and land cover for the last millennium [J]. Global Biogeochemical Cycles, 2008, 22(6): 1–16.

[10] Kaplan, J., Krumhardt, K., Ellis, E., et al. Holocene carbon emissions as a result of anthropogenic land cover change [J]. The Holocene, 2011, 21(5): 775–791.

[11] Boyle, J., Gaillard, M., Kaplan, J., et al. Modelling prehistoric land use and carbon budgets: a critical review [J]. The Holocene, 2011, 21(5): 1–8.

[12] Pielke, R., Pitman, A., Niyogi, D., et al. Land use/land cover changes and climate: modeling analysis and observational evidence [J]. Wiley Interdisciplinary Reviews: Climate Change, 2011, 2(6): 828–850.

[13] Ge, Q. S., Dai, J. H., He, F. N., et al. Numerical changes and driving factor of provincial cropland resources in China over the past 300 years [J]. Natural Resources Advance, 2003, 13(8): 825–832.

[14] Lin, S. S., Zheng, J. Y., He, F. N. The approach for gridding data derived from historical cropland records of the traditional cultivated region in China [J]. Acta Geographica Sinica, 2008, 61(1): 83–92.

[15] Ye, Y., Fang, X. Q., Ren, Y. Y., et al. Reconstruction of cropland cover changes in the Northeast China over the past 300 years [J]. Science China: D Series, 2009, 39(3): 340–350.

[16] He, F. N., Li, S. C., Zhang, X. Z., et al. Comparisons of cropland area from multiple datasets over the past 300 years in the traditional cultivated region of China [J]. Journal of Geographical Sciences, 2013, 23(6): 978–990.

[17] Li, S. C., He, F. N., Zhang, X. Z. A spatially explicit reconstruction of cropland cover in China from 1661 to 1996 [J]. Regional Environmental Change, 2016, 16(2): 417–428.

[18] Yang, X., Jin, X., Guo, B., et al. Research on reconstructing spatial distribution of historical cropland over 300 years in traditional cultivated regions of China [J]. Global and Planetary Change, 2015, 128: 90–102.

[19] Zhang, C. P., Ye, Y., Fang, X. Q. Global cultivatable land suitability dataset based on physical-geographic factors [J/DB/OL]. Digital Journal of Global Change Data Repository, 2022. https://doi.org/10.3974/geodb.2022.04.01.V1. https://cstr.escience.org.cn/CSTR:20146.11.2022.04.01.V1.

[20] GCdataPR Editorial Office. GCdataPR data sharing policy [OL]. https://doi.org/10.3974/dp.policy.2014.05 (Updated 2017).

[21] Fick, S., Hijmans, R. WorldClim2: new 1km spatial resolution climate surfaces for global land areas [J]. International Journal of Climatology, 2017, 37(12): 4302–4315.

[22] Danielson, J., Gesch, D. Global Multi-resolution Terrain Elevation Data 2010 (GMTED2010) [M]. Washington, DC, USA: US Department of the Interior, US Geological Survey, 2011.

[23] Pinzon, J., Tucker, C. A non-stationary 1981–2012 AVHRR NDVI3g time series [J]. Remote Sensing, 2014, 6(8): 6929–6960.

[24] Mantel, S., Kempen, B. SoilGrids250m: Global gridded soil information based on machine learning [J]. PLoS ONE, 2017, 12: e0169748.

[25] Zhang, C., Ye, Y., Fang, X., et al. Synergistic modern global 1 km cropland dataset derived from multi-sets of land cover products [J]. Remote Sensing, 2019, 11(19): 1–18.



[1] https://www.worldclim.org/data/worldclim21.html.

[2] https://earthexplorer.usgs.gov/.

[3] https://soilgrids.org/.

[4] https://ecocast.arc.nasa.gov/data/pub/gimms/3g.v1/.
