Journal of Global Change Data & Discovery, 2025, 9(2): 175-188

Citation: Lu, H. Y., Zhang, R., Jiang, L. L. Dataset Development of the Habitat Suitability and Richness of 285 Bird Species in China [J]. Journal of Global Change Data & Discovery, 2025, 9(2): 175-188. DOI: 10.3974/geodp.2025.02.06.

Dataset Development of the Habitat Suitability and Richness of 285 Bird Species in China

Lu, H. Y.1,2  Zhang, R.1*  Jiang, L. L.3

1. State Key Laboratory of Cryospheric Science and Frozen Soil Engineering, Northwest Institute of Eco-Environment and Resources, Chinese Academy of Sciences, Lanzhou 730000, China;

2. School of Geography Science and Geomatics Engineering, Suzhou University of Science and Technology, Suzhou 215009, China;

3. School of Geography and Tourism, Anhui Normal University, Wuhu 241002, China

 

Abstract: China is one of the countries with the richest bird populations in the world. Based on bird observation data from the eBird and GBIF platforms, along with digital elevation model (DEM), annual precipitation, annual mean temperature, and evapotranspiration data, the authors developed the dataset of habitat suitability and richness of 285 bird species in China using the Biomod2 platform with GLM, MaxEnt, RF, and ensemble models. The dataset covers habitat suitability and species richness for 285 bird species, such as the Northern Goshawk, Common Starling, and Eurasian Skylark, for the years 2000, 2005, 2010, 2015, and 2020. Model validation indicates high accuracy, with the ensemble model reaching an average AUC of 0.991 and an average TSS of 0.923 on the test set. The dataset has a spatial resolution of 0.05° and a temporal resolution of 5 years. It is archived in .img and .tif formats as 1,430 data files totaling 5.66 GB (compressed into a single 37 MB file).

Keywords: biodiversity; species distribution models; birds; remote sensing monitoring

DOI: https://doi.org/10.3974/geodp.2025.02.06

Dataset Availability Statement:

The dataset supporting this paper was published and is accessible through the Digital Journal of Global Change Data Repository at: https://doi.org/10.3974/geodb.2025.01.08.V1.

1 Introduction

As one of the most widely distributed groups of species in the world, birds are not only indispensable members of ecosystems[1] but can also help alleviate human psychological anxiety and promote physical and mental well-being[2]. However, the IUCN Red List of Threatened Species shows that hundreds of bird species worldwide are either extinct or extinct in the wild, and over 15% of bird species are categorized as Critically Endangered (CR), Endangered (EN), or Vulnerable (VU)[3]. Urbanization has converted large areas of biodiversity-rich land, such as forests and wetlands, into construction land, compressing the living space for birds[4,5]. At the same time, the impacts of climate change on bird diversity are intensifying. Since the Industrial Revolution in the 19th century, urbanization has advanced continuously and global temperatures have risen steadily. Some bird species have shown declining population trends because they cannot adapt to climate change[6,7]. Birds are being forced to shift to cooler areas, such as the poles and high-altitude regions, to cope with climate change[8,9].

Species Distribution Models (SDMs), also known as niche models, are methods used to simulate potential species distributions by utilizing species observation data and environmental data[10]. SDMs can simulate species habitat suitability using only environmental data and a small number of species observation samples, making them an indispensable method for bird conservation[11,12]. Biomod2, a mainstream species distribution modeling platform, integrates several mainstream SDMs, such as GLM (Generalized Linear Model), MaxEnt (Maximum Entropy Model), and RF (Random Forest) models, and provides ensemble models (EM) to improve prediction accuracy by integrating different models[13].

For SDMs, the higher the quality of species observation data, the more accurate the simulation results tend to be. eBird provides bird observation data with a large number of records, broad spatial coverage, and long time-series data, making it the most widely used bird observation platform among citizen science platforms[14,15]. The Global Biodiversity Information Facility (GBIF) website, as the leading species observation record platform globally, collects species observation data from eBird. Therefore, this study uses bird observation data from eBird and GBIF to construct 3 single-species distribution models and ensemble models using Biomod2, to simulate bird distributions.

This study is based on bird observation data from the eBird and GBIF platforms and has developed a large-scale distribution dataset covering 285 bird species. The purposes of this dataset are: (1) to provide a basis for studying the spatial distribution patterns of birds; (2) to offer a reference for assessing changes in bird habitats under global climate change; and (3) to supply data support for the formulation of ecological conservation policies and the delineation of ecological redline areas.

2 Metadata of the Dataset

The metadata of the Dataset of habitat suitability and richness of 285 bird species in China[16] includes information on geographical region, data period, temporal resolution, spatial resolution, dataset composition, data publication and sharing platform, and data sharing policy, as shown in Table 1. 

3 Methods

The study first processed bird observation data by time series, splitting the data into different years based on observation dates. Then, the data for each year were further divided by species. Finally, spatial filtering was applied to remove records that were spatially close, ensuring relative spatial independence of bird observations. To improve model reliability and accuracy, only species with more than 5 available records each year during the study period were selected. Ultimately, 285 bird species were included.
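The preprocessing steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the record layout `(species, year, lon, lat)`, the function names, and the grid-cell thinning rule with a 0.05° cell are assumptions, since the paper does not state the filtering distance or the tool used for this step.

```python
from collections import defaultdict

# Hypothetical record layout: (species, year, longitude, latitude).
CELL_DEG = 0.05   # ASSUMED thinning cell size; the paper does not give the distance
MIN_RECORDS = 5   # a species needs MORE than this many records in every study year
STUDY_YEARS = (2000, 2005, 2010, 2015, 2020)


def thin_records(records, cell_deg=CELL_DEG):
    """Keep only the first record per (species, year, grid cell)."""
    seen = set()
    kept = []
    for species, year, lon, lat in records:
        cell = (species, year, int(lon // cell_deg), int(lat // cell_deg))
        if cell not in seen:
            seen.add(cell)
            kept.append((species, year, lon, lat))
    return kept


def select_species(records, years=STUDY_YEARS, min_records=MIN_RECORDS):
    """Return species with more than min_records records in every study year."""
    counts = defaultdict(int)
    for species, year, _lon, _lat in records:
        counts[(species, year)] += 1
    species_names = {species for species, _year in counts}
    return sorted(
        name for name in species_names
        if all(counts.get((name, year), 0) > min_records for year in years)
    )
```

Thinning is applied before counting, so a species qualifies only if enough spatially separated records remain in each of the five years.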

 

Table 1  Metadata summary of the Dataset of habitat suitability and richness of 285 bird species in China

Item | Description
Dataset full name | Dataset of habitat suitability and richness of 285 bird species in China
Dataset short name | HabitatSuitability&Richness2000-2020
Authors | Lu, H. Y., Northwest Institute of Eco-Environment and Resources, Chinese Academy of Sciences, 19200214139@post.usts.edu.cn; Zhang, R., Northwest Institute of Eco-Environment and Resources, Chinese Academy of Sciences, zhangrui@radi.ac.cn; Jiang, L. L., Anhui Normal University, jiangll@ahnu.edu.cn
Geographical region | China (excluding smaller islands and reefs)
Year | 2000–2020
Temporal resolution | One phase every 5 years
Spatial resolution | 0.05°
Data format | .img, .tif
Data size | 37 MB (compressed)
Data files | Habitat suitability and species richness of 285 bird species for 2000, 2005, 2010, 2015, and 2020
Foundation | Jiangsu Education Department (202210332083Y)
Data publisher | Global Change Research Data Publishing & Repository, http://www.geodoi.ac.cn
Address | No. 11A, Datun Road, Chaoyang District, Beijing 100101, China
Data sharing policy | (1) Data are openly available and can be freely downloaded via the Internet; (2) End users are encouraged to use Data subject to citation; (3) Users, who are by definition also value-added service providers, are welcome to redistribute Data subject to written permission from the GCdataPR Editorial Office and the issuance of a Data redistribution license; and (4) If Data are used to compile new datasets, the "ten percent principal" should be followed such that Data records utilized should not surpass 10% of the new dataset contents, while sources should be clearly noted in suitable places in the new dataset[17]
Communication and searchable system | DOI, CSTR, Crossref, DCI, CSCD, CNKI, SciEngine, WDS, GEOSS, PubScholar, CKRSC

 

 

 

Figure 1  Technical workflow

Species distributions were analyzed using the Biomod2 platform in R, which integrates multiple single species distribution models and builds ensemble models based on them. Three single models (GLM, MaxEnt, and RF) were used and combined to construct an EM (Figure 1). GLM handles various data types and offers good interpretability. MaxEnt, based on information entropy, is among the most popular SDMs. RF is a machine learning method that builds multiple decision trees and aggregates their predictions to improve accuracy and robustness, effectively capturing complex predictor relationships. During modeling, pseudo-absence points equal in number to the presence points were generated. Three sets of pseudo-absence points were created, and models were run on each set. For each run, 80% of the data were randomly selected as training data and 20% as testing data, and each model run was repeated 5 times.
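The run design described above (3 pseudo-absence sets, 5 repeated 80/20 splits, 3 algorithms) can be laid out as a plain Python sketch. It only enumerates the runs and their train/test indices; it does not fit any model and does not call Biomod2. The function names, the dictionary keys, and the choice to share one split across the three algorithms within a repeat are assumptions made for illustration.

```python
import random

ALGORITHMS = ("GLM", "MaxEnt", "RF")  # the three single models named in the text
N_PA_SETS = 3          # pseudo-absence sets
N_REPEATS = 5          # repeated random splits per pseudo-absence set
TRAIN_FRACTION = 0.8   # 80% training, 20% testing


def split_indices(n_points, train_fraction, rng):
    """Randomly split range(n_points) into train and test index lists."""
    idx = list(range(n_points))
    rng.shuffle(idx)
    cut = round(n_points * train_fraction)
    return idx[:cut], idx[cut:]


def build_run_plan(n_presence, seed=0):
    """Enumerate every (pseudo-absence set, repeat, algorithm) combination."""
    rng = random.Random(seed)
    # Presences plus an equal number of pseudo-absences, as stated in the text.
    n_points = 2 * n_presence
    plan = []
    for pa_set in range(1, N_PA_SETS + 1):
        for repeat in range(1, N_REPEATS + 1):
            # ASSUMPTION: all algorithms in one repeat share the same split.
            train, test = split_indices(n_points, TRAIN_FRACTION, rng)
            for algorithm in ALGORITHMS:
                plan.append({
                    "pa_set": pa_set,
                    "repeat": repeat,
                    "algorithm": algorithm,
                    "train": train,
                    "test": test,
                })
    return plan


plan = build_run_plan(n_presence=50)
print(len(plan))  # 3 pseudo-absence sets x 5 repeats x 3 algorithms = 45
```

Under this reading, each species and year yields 45 single-model fits that feed the ensemble.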

This study selected 4 environmental variables that may influence species distribution: elevation, temperature, precipitation, and evapotranspiration intensity (Table 2). The elevation data were sourced from the WorldClim dataset version 2.1. The temperature, precipitation, and evapotranspiration data were sourced from the National Tibetan Plateau Data Center; these datasets were derived from meteorological station data through interpolation and downscaling techniques[18].

 

Table 2  Data sources of environmental variables

Data name | Data source
Elevation | WorldClim dataset[19]
Average temperature | China 1-km monthly average temperature dataset (1901–2023)[20]
Average precipitation | China 1-km monthly precipitation dataset (1901–2023)[21]
Evapotranspiration intensity | China 1-km monthly potential evapotranspiration dataset (1901–2023)[22]

4 Data Results and Validation

4.1 Dataset Composition

For each of its 5 periods, the Dataset of habitat suitability and richness of 285 bird species in China contains 285 .img files of habitat suitability and one .tif file of species richness. The .img files are named "XXXX.img", where XXXX is the species name. The species richness file is derived from the overlay of habitat suitability for all 285 bird species. With a temporal resolution of 5 years per period, the dataset therefore holds 286 files per period and 1,430 files in total. Both .img and .tif files can be processed in GIS software and relevant programming environments.
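The file count and the idea of an overlay can be checked with a short Python sketch. The file arithmetic follows directly from the text. The overlay function is only one plausible reading: the paper does not say how suitability was converted to richness, so the 0.5 threshold, the 0 to 1 suitability scale, the use of `None` for NoData, and the function names are all assumptions.

```python
PERIODS = (2000, 2005, 2010, 2015, 2020)
N_SPECIES = 285


def expected_file_count(n_species=N_SPECIES, periods=PERIODS):
    """One .img per species plus one richness .tif, for every period."""
    return len(periods) * (n_species + 1)


def richness_overlay(suitability_grids, threshold=0.5):
    """Count, per cell, the species whose suitability reaches the threshold.

    ASSUMPTIONS: suitability is scaled 0-1, None marks NoData, and a fixed
    threshold separates suitable from unsuitable cells.
    """
    n_rows = len(suitability_grids[0])
    n_cols = len(suitability_grids[0][0])
    richness = [[0] * n_cols for _ in range(n_rows)]
    for grid in suitability_grids:
        for r in range(n_rows):
            for c in range(n_cols):
                value = grid[r][c]
                if value is not None and value >= threshold:
                    richness[r][c] += 1
    return richness


print(expected_file_count())  # 5 periods x 286 files = 1430
```

The total of 1,430 matches the file count reported in the abstract.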

4.2 Data Products

The species richness of 285 bird species exhibited significant spatial heterogeneity, generally showing a decreasing gradient from the southeastern coastal areas to the northwestern inland regions (Figure 2). This distribution pattern is closely related to regional ecological and environmental differences. The coastal areas, characterized by a warm and humid monsoon climate, provide favorable conditions for vegetation growth and abundant food resources for birds. Additionally, coastal regions contain diverse ecosystems such as tidal flats, wetlands, and mangroves, offering rich habitats for many bird species. Due to its unique geographic location, the Qinghai-Xizang Plateau shows large internal variation in bird species richness. The southern part of the plateau benefits from better hydrothermal conditions, richer vegetation cover, and abundant food resources, which are more conducive to bird survival and reproduction. In contrast, the northern part of the plateau is hindered by topography and has less favorable hydrothermal conditions, resulting in lower bird species richness. Regions such as Xinjiang and Qinghai are dominated by deserts and semi-arid grasslands with low vegetation coverage and dry climates lacking water resources. The combined influence of vegetation and climate leads to low bird species richness in these areas. The northeastern region, especially Heilongjiang Province, experiences low temperatures and prolonged cold periods. This climate not only directly limits bird survival but also indirectly reduces food availability by affecting vegetation growth and insect activity, resulting in low bird species richness.

4.3 Data Validation

The Area Under the Curve (AUC) is an important metric for evaluating the performance of species distribution models, ranging from 0 to 1. A higher AUC value indicates better predictive ability of the model[23]. The True Skill Statistic (TSS = Sensitivity + Specificity − 1) is also frequently used to assess model performance, ranging from −1 to 1, with higher values indicating better model accuracy[24]. Since using AUC alone may lead to overfitting and inaccurate assessment of model accuracy, this study employed both AUC and TSS to evaluate the predictive performance of species distribution models[25,26].

Figure 2  Maps of bird species diversity in China

Figure 3  Comparison of the accuracy of Species Distribution Models

As shown in Figure 3, the GLM had a test set mean AUC of 0.953 and a mean TSS of 0.849. MaxEnt had a test set mean AUC of 0.930 and a mean TSS of 0.822. RF reached a test set mean AUC of 0.982 and a mean TSS of 0.920. The EM achieved the highest performance, with a test set mean AUC of 0.991 and a mean TSS of 0.923. All 4 models had mean AUC values above 0.9 and mean TSS values above 0.8, indicating excellent predictive performance. Among the single models, RF showed higher accuracy than MaxEnt and GLM, and the EM outperformed all individual models. Therefore, the EM was used as the final predictive model in this study.
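The two validation metrics defined in this section can be computed as in the following Python sketch. It is not the Biomod2 implementation: AUC is computed here by the rank (pairwise comparison) formulation, and TSS uses an illustrative fixed threshold of 0.5 to turn scores into presence/absence, which is an assumption. The toy labels and scores are invented for demonstration only.

```python
def auc(y_true, scores):
    """Probability that a random presence outscores a random absence.

    Ties count as half a win (rank-based formulation of the ROC AUC).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


def tss(y_true, scores, threshold=0.5):
    """True Skill Statistic: Sensitivity + Specificity - 1."""
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1


# Toy test set: 1 = presence, 0 = pseudo-absence (invented values).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
print(auc(labels, scores), tss(labels, scores))
```

AUC is threshold-independent while TSS depends on the chosen cutoff, which is one reason the two are reported together.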

5 Discussion and Conclusion

The rapid advancement of urbanization has contributed to declining eco-environmental quality, global climate change, and the destruction of biological habitats, all of which threaten bird survival[7,27]. The Dataset of habitat suitability and richness of 285 bird species in China is built on bird observation data from the eBird and GBIF platforms, sampled using R and ArcGIS. Based on climate data and the Biomod2 platform, the dataset provides habitat suitability data for 285 bird species across China from 2000 to 2020. It shows that habitat suitability and species richness exhibit significant spatial heterogeneity, generally decreasing from the southeastern coastal areas to the northwestern inland regions. Although the accuracy of the dataset meets the requirements for species distribution modeling, two limitations should be noted during its use. First, species observation data from citizen science platforms are not uniformly distributed in space, and observation locations are not fixed, which inevitably affects the results of species distribution modeling. Second, although the dataset employs ensemble models to improve accuracy, species distribution models simulate the potential rather than the actual distribution range of species, so the results may differ from real-world distributions.

 

Author Contributions

Zhang, R. contributed to the overall design of the dataset development; Lu, H. Y. and Jiang, L. L. collected and processed the bird distribution data; Lu, H. Y. designed the models and algorithms and conducted data validation; Lu, H. Y. and Zhang, R. wrote the data paper.

 

Conflict of Interest Statement

The authors declare no conflicts of interest.

 

References

[1]        Ding, Z. H., Cao, J. J., Wang, Y. The construction and optimization of habitat networks for urban-natural symbiosis: a case study of the main urban area of Nanjing [J]. Forests, 2023, 14(1): 18. 

[2]        Buxton, R. T., Pearson, A. L., Lin, H. Y., et al. Exploring the relationship between bird diversity and anxiety and mood disorder hospitalisation rates [J]. Geo-Geography and Environment, 2023, 10(2): 8. 

[3]        Lees, A. C., Haskell, L., Allinson, T., et al. State of the world's birds [J]. Annual Review of Environment and Resources, 2022, 47: 231‒260. 

[4]        Xu, C. L., Yu, Q., Wang, F., et al. Identifying and optimizing ecological spatial patterns based on the bird distribution in the Yellow River Basin, China [J]. Journal of Environmental Management, 2023, 348: 13. 

[5]        Lu, H. Y., Shang, Z. Y., Ruan, Y. L., et al. Study on urban expansion and population density changes based on the inverse S-shaped function [J]. Sustainability, 2023, 15(13): 19. 

[6]        Moller, A. P., Rubolini, D., Lehikoinen, E. Populations of migratory bird species that did not show a phenological response to climate change are declining [J]. Proceedings of the National Academy of Sciences of the United States of America, 2008, 105(42): 16195‒16200. 

[7]        Liu, Z. X., Zhang, W. W., Lu, H. Y., et al. Exploring evolution characteristics of eco-environment quality in the Yangtze River Basin based on remote sensing ecological index [J]. Heliyon, 2023, 9(12): 14. 

[8]        Zhu, B. R., Verhoeven, M. A., Velasco, N., et al. Current breeding distributions and predicted range shifts under climate change in two subspecies of Black-tailed Godwits in Asia [J]. Global Change Biology, 2022, 28(18): 5416‒5426. 

[9]        Virkkala, R., Rajasärkkä, A., Heikkinen, R. K., et al. Birds in boreal protected areas shift northwards in the warming climate but show different rates of population decline [J]. Biological Conservation, 2018, 226: 271‒279. 

[10]     Rousseau, J. S., Betts, M. G. Factors influencing transferability in species distribution models [J]. Ecography, 2022, 2022(7): 13. DOI: 10.1111/ecog.06060.

[11]     Gaul, W., Sadykova, D., White, H. J., et al. Data quantity is more important than its spatial bias for predictive species distribution modelling [J]. Peerj, 2020, 8: 27. 

[12]     Yu, H., Cooper, A. R., Infante, D. M. Improving species distribution model predictive accuracy using species abundance: application with boosted regression trees [J]. Ecological Modelling, 2020, 432: 11. 

[13]     Thuiller, W., Lafourcade, B., Engler, R., et al. BIOMOD – a platform for ensemble forecasting of species distributions [J]. Ecography, 2009, 32(3): 369‒373. 

[14]     Neate-Clegg, M. H. C., Horns, J. J., Adler, F. R., et al. Monitoring the world's bird populations with community science data [J]. Biological Conservation, 2020, 248: 7. 

[15]     Tejeda, I., Medrano, F. eBird as a tool to improve the knowledge of Chilean birds [J]. Revista Chilena de Ornitologia, 2018, 24(2): 85‒94. 

[16]     Lu, H. Y., Zhang, R., Jiang, L. L. Dataset of habitat suitability and richness of 285 bird species in China [J/DB/OL]. Digital Journal of Global Change Data Repository, 2025. https://doi.org/10.3974/geodb.2025.01.08.V1.

[17]     GCdataPR Editorial Office. GCdataPR data sharing policy [OL]. https://doi.org/10.3974/dp.policy.2014.05 (Updated 2017).

[18]     Peng, S. Z., Ding, Y. X., Wen, Z. M., et al. Spatiotemporal change and trend analysis of potential evapotranspiration over the Loess Plateau of China during 2011‒2100 [J]. Agricultural and Forest Meteorology, 2017, 233: 183‒194. 

[19]     Fick, S. E., Hijmans, R. J. WorldClim 2: new 1-km spatial resolution climate surfaces for global land areas [J]. International Journal of Climatology, 2017, 37(12): 4302‒4315. 

[20]     Peng, S. Z. 1-km monthly mean temperature dataset for China (1901‒2023) [DB/OL]. National Tibetan Plateau/Third Pole Environment Data Center, 2024. https://doi.org/10.11888/Meteoro.tpdc.270961.

[21]     Peng, S. Z. 1-km monthly precipitation dataset for China (1901‒2023) [DB/OL]. National Tibetan Plateau/Third Pole Environment Data Center, 2024. https://doi.org/10.5281/zenodo.3114194.

[22]     Peng, S. Z. 1-km monthly potential evapotranspiration dataset for China (1901‒2023) [DB/OL]. National Tibetan Plateau/Third Pole Environment Data Center, 2024. https://doi.org/10.11866/db.loess.2021.001.

[23]     Phillips, S. J., Anderson, R. P., Schapire, R. E. Maximum entropy modeling of species geographic distributions [J]. Ecological Modelling, 2006, 190(3‒4): 231‒259. 

[24]     Xu, L., Fan, Y., Zheng, J. H., et al. Impacts of climate change and human activity on the potential distribution of Aconitum leucostomum in China [J]. Science of the Total Environment, 2024, 912: 12. 

[25]     Adeyemo, S. M., Granger, J. J. Habitat suitability model and range shift analysis for American Chestnut (Castanea dentata) in the United States [J]. Trees Forests and People, 2023, 11: 13. 

[26]     Lobo, J. M., Jiménez-Valverde, A., Real, R. AUC: a misleading measure of the performance of predictive distribution models [J]. Global Ecology and Biogeography, 2008, 17(2): 145‒151. 

[27]     Zhang, W. W., Liu, Z. X., Qin, K., et al. Long-term dynamic monitoring and driving force analysis of eco-environmental quality in China [J]. Remote Sensing, 2024, 16(6): 22.
