A Multi-Source Remote Sensing and Machine Learning Integrated Dataset of Multi-Layer Soil Total Nitrogen Content in Taiyuan, China (2020)
SHAO Xin1, YANG Ting*2
1 Faculty of Geography, Yunnan Normal University, Kunming 650500, China
2 The CAS Engineering Laboratory for Yellow River Delta Modern Agriculture, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China
DOI:10.3974/geodb.2025.04.01.V1
Published:Apr. 2025
Key Words:
GEE, soil total nitrogen, multi-source remote sensing data, machine learning models
Abstract:
Soil total nitrogen content is a crucial indicator of soil nutrient levels and ecological functions, with significant effects on agricultural productivity, ecological conservation, and environmental safety. This study focuses on Taiyuan City, Shanxi Province, China, and uses the Google Earth Engine (GEE) cloud computing platform to integrate multi-source remote sensing data and develop a multi-layer (0-200 cm) soil total nitrogen dataset for 2020. The selected environmental factors include: (1) NDVI (AVHRR NDVI long-term series dataset, 16-day composite, approximately 5.1 km resolution); (2) near-infrared reflectance from Sentinel-2 (Level-2A product, B8 band, 10 m resolution); (3) surface soil moisture (OpenLandMap soil moisture at -33 kPa, b10 band, approximately 250 m resolution); (4) precipitation (CHIRPS dataset, 0.05° resolution, approximately 5.6 km); (5) surface temperature (MOD11A1 dataset, daytime surface temperature LST_Day_1km band, 1 km resolution); (6) elevation (SRTM DEM dataset, 30 m resolution). To minimize interference from artificial surfaces, land use data at 30 m resolution based on the Chinese Academy of Sciences LUCC classification system were used to mask built-up areas and water bodies, so that only natural soil regions were retained for modeling. Three machine learning methods, Random Forest Regression (RF), Classification and Regression Trees (CART), and Gradient Boosting Regression Trees (GBRT), were used for modeling and inversion, with the ISRIC SoilGrids dataset serving as the reference. Model performance was assessed by cross-validation using the Root Mean Square Error (RMSE) and the coefficient of determination (R²). The RMSE values for RF, CART, and GBRT were 0.16 g/kg, 0.21 g/kg, and 0.17 g/kg, respectively, with corresponding R² values of 0.79, 0.64, and 0.75; RF therefore had the lowest error and the highest R² of the three models.
The dataset comprises soil total nitrogen content for Taiyuan City in 2020 in six depth layers (0-5 cm, 5-15 cm, 15-30 cm, 30-60 cm, 60-100 cm, and 100-200 cm) at a spatial resolution of 30 m. It is archived in .tif format and consists of 18 data files totaling 1.52 GB (compressed into one 219 MB file).
Foundation Item:
Ministry of Science and Technology of P. R. China (2023YFD1701804)
Data Citation:
SHAO Xin, YANG Ting*. A Multi-Source Remote Sensing and Machine Learning Integrated Dataset of Multi-Layer Soil Total Nitrogen Content in Taiyuan, China (2020) [J/DB/OL]. Digital Journal of Global Change Data Repository, 2025. https://doi.org/10.3974/geodb.2025.04.01.V1.