Earthquake loss prediction based on random forest algorithm
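Abstract (translated from the Chinese):
Most existing studies based on observed earthquake damage are limited to a specific region and a single structure type, and rely on very small data samples. Using a random forest model, this paper draws on 378 037 records of actual building damage from the MW 9.0 Tohoku-Oki, Japan earthquake of March 11, 2011, and the earthquake damage classification standard issued by the Applied Technology Council (ATC-13), to predict the loss caused by earthquake damage to buildings and to analyze the feature importance of the factors affecting building loss. The results show that, after the synthetic minority over-sampling technique (SMOTE) is used to handle data imbalance and Bayesian optimization is used to tune the hyper-parameters, the random forest prediction model reaches a test-set accuracy of 68.8%, with recall rates of 65.0%, 53.6%, 74.8%, and 81.8% for minor damage, moderate damage, severe damage, and collapse, respectively. When the model is converted to a binary classification in view of life-safety performance, the accuracy rises further to 87.5%. This substantially alleviates problems in existing building loss prediction studies, such as the low accuracy for the most severe damage level caused by limited sample sizes and imbalanced data. The feature importance analysis shows that epicentral distance, peak ground acceleration, and vS30 are the features with the greatest influence on the model output.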
Abstract: Rapid and accurate assessment of building damage and its severity is crucial for pre-earthquake disaster prevention and mitigation, post-earthquake emergency response and relief, and rapid reconstruction. Most existing studies based on observed earthquake damage are limited to a specific region and a particular structure type, and use small data samples, so the resulting models generalize poorly. Many factors affect earthquake-induced building loss, and traditional methods cannot fully capture the complex mapping among these factors. A method that assesses building damage quickly and accurately is therefore needed. Machine learning offers a data-driven approach that can handle complex nonlinear relationships between input and output parameters by learning patterns from large datasets. This paper proposes an earthquake damage prediction model that combines Bayesian optimization, the synthetic minority over-sampling technique (SMOTE), and the random forest algorithm. Bayesian optimization incorporates prior knowledge and updates it iteratively until a near-optimal hyper-parameter combination is found, which addresses the inefficiency of traditional parameter tuning. SMOTE generates synthetic samples for the minority classes, which addresses the imbalanced distribution of data samples. The random forest model is built on 378 037 records of actual building damage from the March 11, 2011, MW 9.0 Tohoku-Oki, Japan earthquake. It takes into account multidimensional information such as ground motion, site conditions, and structural characteristics, and adopts the earthquake damage classification issued by the Applied Technology Council (ATC-13).
The model predicts the loss caused by earthquake damage to buildings and supports a feature importance analysis of the factors affecting building loss. The results show that, after SMOTE is used to handle data imbalance and Bayesian optimization is used to tune the hyper-parameters, the random forest prediction model reaches a test-set accuracy of 68.8%, with recall rates of 65.0%, 53.6%, 74.8%, and 81.8% for minor damage, moderate damage, severe damage, and collapse, respectively. When the model is converted to a binary classification in view of life-safety performance, the accuracy rises further to 87.5%. This substantially alleviates problems in existing building loss prediction research, such as limited data, lack of regional generalization, lack of diversity in building attributes, imprecise classification of damage levels, and low accuracy for the most severe damage state. The feature importance analysis shows that epicentral distance, PGA, and vS30 have the most significant influence on the model output. The earthquake damage assessment model established in this study can predict earthquake-induced building damage rapidly and with reasonable accuracy, which benefits pre-earthquake planning and timely rescue after an earthquake.
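The oversampling idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `smote`, the neighbour count `k`, and the toy two-feature samples are assumptions made for illustration. Each synthetic point is placed at a random position on the segment between a minority-class sample and one of its nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples interpolated between minority samples.

    Hypothetical helper for illustration; needs at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of the base sample (itself excluded)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # random position on the segment base -> mate
        synthetic.append([b + gap * (m - b) for b, m in zip(base, mate)])
    return synthetic


# Toy minority class with two made-up features per building
collapse_samples = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote(collapse_samples, n_new=6, k=2)
```

Because every synthetic point is a convex combination of two real samples, it stays inside the region already occupied by the minority class. Oversampling of this kind is normally applied to the training split only, so that the test set keeps the original class distribution.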
Keywords:
- building loss data
- random forest
- earthquake loss prediction
- feature importance
Figure 3. Learning curves of four hyper-parameters for the random forest model
(a) Number of estimators; (b) Maximum depth of estimators; (c) Minimum number of samples required at a leaf node; (d) Minimum number of samples required to split an internal node
Table 1 Statistics of the four damage levels classified according to ATC-13

| Damage level | Loss ratio (Dr) | Number of records |
| --- | --- | --- |
| Minor damage (class 0) | 5% < Dr ≤ 10% | 136 334 |
| Moderate damage (class 1) | 10% < Dr ≤ 30% | 178 594 |
| Severe damage (class 2) | 30% < Dr ≤ 60% | 33 079 |
| Collapse (class 3) | 60% < Dr ≤ 100% | 30 029 |
| Total | | 378 037 |
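The thresholds in Table 1 translate directly into a labelling rule. The sketch below shows one way to turn a loss ratio into a class label; the function name `damage_class` and the choice to express Dr as a fraction between 0 and 1 are assumptions for illustration.

```python
def damage_class(dr):
    """Map a loss ratio Dr (as a fraction, e.g. 0.25 for 25%) to the Table 1 class."""
    if not 0.05 < dr <= 1.0:
        # Table 1 only defines classes for 5% < Dr <= 100%
        raise ValueError("loss ratio outside the range covered by Table 1")
    if dr <= 0.10:
        return 0  # minor damage
    if dr <= 0.30:
        return 1  # moderate damage
    if dr <= 0.60:
        return 2  # severe damage
    return 3      # collapse


labels = [damage_class(dr) for dr in (0.08, 0.25, 0.45, 0.90)]
```

The upper bound of each interval is inclusive, matching the "≤" signs in the table, so a loss ratio of exactly 30% is labelled as moderate damage.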
Table 2 Input features of machine learning model for building datasets
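
| Category | Factor | Description | Calculation method or data source |
| --- | --- | --- | --- |
| Earthquake information | PGA | Peak ground acceleration | Equations of Zhao et al. (2016a, b) |
| | Epicentral distance | Ground distance from the earthquake epicenter to the building | Robusto (1957) formula for the distance between two latitude/longitude points |
| Building information | Number of stories | Number of stories of the building | Database |
| | Region code | 47 prefectures | |
| | Construction period | ① 1867–1910; ② 1911–1924; ③ 1925–1987; ④ 1988–2011 | |
| | Exterior wall material | ① concrete; ② autoclaved lightweight concrete; ③ block; ④ mortar; ⑤ plaster; ⑥ stone-faced brick; ⑦ metal sheet; ⑧ glass panel; ⑨ slate; ⑩ metal ceramic; ⑪ dozo (earthen storehouse construction); ⑫ wood siding; ⑬ wood board | |
| | Structural type | ① wood; ② mixed (wood and mortar); ③ dozo (earthen storehouse construction); ④ block; ⑤ masonry; ⑥ steel; ⑦ concrete; ⑧ other | |
| | Column material | ① concrete column; ② steel with fireproof coating; ③ steel; ④ wood frame; ⑤ two-material combination; ⑥ other | |
| | Roof material | ① concrete; ② metal sheet; ③ slate; ④ ceramic tile; ⑤ synthetic resin; ⑥ wood board; ⑦ thatch | |
| | Building use | ① residential; ② other | |
| Site information | vS30 | Weighted average shear-wave velocity of the soil in the top 30 m below the surface | USGS (2007) |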
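Table 2 states that the epicentral distance is computed from two latitude/longitude points with the formula of Robusto (1957). A minimal sketch follows, assuming this refers to the standard haversine great-circle distance. The mean Earth radius of 6371 km, the function name, and the example coordinates are assumptions and are not taken from the paper.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def epicentral_distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp against floating-point overshoot before taking the arcsine
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


# Illustrative coordinates only (placeholders, not values from the paper)
epicenter = (38.3, 142.4)
building = (38.27, 140.87)
distance = epicentral_distance_km(*epicenter, *building)
```

The haversine form stays numerically stable for the short distances typical of a building-by-building dataset, where the plain spherical law of cosines can lose precision.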
Table 3 Confusion matrix
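
| Confusion matrix | Predicted positive | Predicted negative |
| --- | --- | --- |
| Actual positive | True positive (TP) | False negative (FN) |
| Actual negative | False positive (FP) | True negative (TN) |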
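The accuracy and recall figures reported for the model follow from a confusion matrix like Table 3, extended to four classes. The sketch below computes both from label lists; the helper names and the toy labels are made up for illustration.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """matrix[t][p] counts samples of true class t predicted as class p."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    """Share of all samples on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


def recall_per_class(matrix):
    """For each class: correctly predicted samples / all samples truly in that class."""
    return [row[i] / sum(row) if sum(row) else float("nan")
            for i, row in enumerate(matrix)]


# Toy labels using the four classes of Table 1 (0 = minor ... 3 = collapse)
y_true = [0, 0, 1, 1, 1, 2, 3, 3]
y_pred = [0, 1, 1, 1, 0, 2, 3, 2]
cm = confusion_matrix(y_true, y_pred, n_classes=4)
overall = accuracy(cm)
recalls = recall_per_class(cm)
```

Reporting recall per class alongside overall accuracy matters for imbalanced data: a model can score well overall while still missing most samples of a rare class such as collapse.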
Table 4 Comparison of hyper-parameter optimization methods for random forest models
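
| Hyper-parameter optimization method | Number of trees | Maximum tree depth | Minimum samples per leaf node | Minimum samples to split a node | Model accuracy |
| --- | --- | --- | --- | --- | --- |
| Bayesian optimization | 654 | 48 | 1 | 2 | 68.8% |
| Learning-curve method | 550 | 20 | 2 | 8 | 65.9% |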
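The two hyper-parameter sets in Table 4 can be passed to scikit-learn's `RandomForestClassifier`, whose parameters `n_estimators`, `max_depth`, `min_samples_leaf`, and `min_samples_split` correspond to the four table columns. This is a sketch on synthetic data generated with `make_classification`, used here as a stand-in because the building dataset is not available; the scores it produces say nothing about the accuracies reported in the table.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hyper-parameter values copied from Table 4
BAYES_PARAMS = {"n_estimators": 654, "max_depth": 48,
                "min_samples_leaf": 1, "min_samples_split": 2}
CURVE_PARAMS = {"n_estimators": 550, "max_depth": 20,
                "min_samples_leaf": 2, "min_samples_split": 8}

# Synthetic four-class stand-in data (assumption: not the paper's dataset)
X, y = make_classification(n_samples=400, n_features=10, n_informative=6,
                           n_classes=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

scores = {}
for name, params in {"bayes": BAYES_PARAMS, "curve": CURVE_PARAMS}.items():
    model = RandomForestClassifier(random_state=0, n_jobs=-1, **params)
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # accuracy on the held-out split
```

After fitting, `model.feature_importances_` gives the impurity-based importance of each input column, which is one common way to obtain a feature ranking from a random forest.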
Table 5 Comparison of seismic damage prediction research results based on field survey data
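
| Source | Model type | Classification type | Study data | Accuracy for the most severe damage state | Model accuracy |
| --- | --- | --- | --- | --- | --- |
| Harirchian et al. (2021b) | Extremely randomized trees | Three-class | 2016 Ecuador earthquake, 172 damaged reinforced concrete buildings | 55.0% | 70.2% |
| | | Three-class | 2010 Haiti earthquake, 145 damaged reinforced concrete buildings | 41.2% | 58.5% |
| | | Four-class | 2017 Pohang, South Korea earthquake, 74 damaged buildings | 50.0% | 60.0% |
| Harirchian et al. (2020c) | Support vector machine | Three-class | 2016 Ecuador earthquake, 171 damaged reinforced concrete buildings | 54.0% | 60.0% |
| | | Three-class | 2010 Haiti earthquake, 142 damaged reinforced concrete buildings | 54.0% | 68.0% |
| | | Four-class | 2015 Nepal earthquake, 138 damaged reinforced concrete buildings | 33.0% | 67.0% |
| | | Four-class | 2017 Pohang, South Korea earthquake, 67 damaged buildings | 20.0% | 48.0% |
| Harirchian et al. (2020a) | Multilayer perceptron | Five-class | 1999 Düzce, Turkey earthquake, 484 damaged buildings | 71.4% | 52.0% |
| Harirchian et al. (2020b) | Support vector machine | Five-class | 1999 Düzce, Turkey earthquake, 484 damaged buildings | 54.5% | 52.0% |
| Mangalathu and Burton (2019) | Long short-term memory network | Three-class | 2014 South Napa, USA earthquake, 3 423 damaged buildings | 63.0% | 86.0% |
| Mangalathu et al. (2020) | Random forest | Three-class | 2014 South Napa, USA earthquake, 2 276 damaged buildings | 13.0% | 66.0% |
| Roeslin et al. (2020) | Random forest | Two-class | 2017 Puebla, Mexico earthquake, 237 damaged buildings | 78.0% | 67.0% |
| Stojadinović et al. (2022) | Random forest | Five-class | 2010 Kraljevo, Serbia earthquake, 1 979 damaged buildings | 30.0% | 85.0% |
| Ghimire et al. (2022) | Random forest | Three-class | 2015 Nepal earthquake, about 762 000 damaged buildings | 70.0% | 64.0% |
| This paper | Random forest | Four-class | 2011 Tohoku-Oki, Japan earthquake, about 378 000 damaged buildings | 81.8% | 68.8% |
| | | Two-class | | 88.0% | 87.5% |
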
Bao Y Q,Li H. 2019. Artificial intelligence for civil engineering[J]. China Civil Engineering Journal,52(5):1–11 (in Chinese).
Sun B T,Hu S Q. 2005. A method for earthquake damage prediction of building group based on existing earthquake damage matrix[J]. Earthquake Engineering and Engineering Vibration,25(6):102–108 (in Chinese).
Wang J F. 2012. Study on Speaker Recognition Based on Improved Grid Search Parameters Optimization Algorithm of SVM[D]. Harbin:Harbin Engineering University:25−26 (in Chinese).
Wang Z F,Park S,Lee S,Cui K. 2014. Quantification improvement of earthquake loss estimation[J]. Earthquake Engineering and Engineering Dynamics,34(4):110–114 (in Chinese).
Wei Y G,Jiang C S. 2021. Research progress of artificial intelligence technology in the application of earthquake disaster reduction[J]. Progress in Geophysics,36(2):516–524 (in Chinese). doi: 10.6038/pg2021EE0164
Yang X,Li Y H,Gai Z X. 2021. Machine learning and its application in seismology[J]. Reviews of Geophysics and Planetary Physics,52(1):76–88 (in Chinese).
Yang Y,Lu C B,Xu G H. 2017. A refined borderline-SMOTE method for imbalanced data set[J]. Journal of Fudan University (Natural Science),56(5):537–544 (in Chinese).
Yu H M,Xu J D,Zhang S L,Pan B. 2006. The statistical analysis of building vulnerability research on Jiji earthquake[J]. Journal of Institute of Disaster Prevention,8(4):17–20 (in Chinese). doi: 10.3969/j.issn.1673-8047.2006.04.004
Zhang F H,Xie L L,Fan L C. 2004. A study on disaster loss prediction caused by damaged structures under earthquake[J]. Earthquake Engineering and Engineering Vibration,24(3):12–20 (in Chinese). doi: 10.3969/j.issn.1000-1301.2004.03.002
Zhang G X,Sun B T. 2018. Seismic damage prediction for a single building based on a fuzzy analytical hierarchy approach[J]. Engineering Mechanics,35(12):185–193 (in Chinese).
Zhang H. 2018. Research of Automatic Feature Engineering and Parameter Adjustment Algorithm[D]. Chengdu:University of Electronic Science and Technology of China:26 (in Chinese).
Zhang T Y,Ding L X. 2021. A new resampling method based on SMOTE for imbalanced data set[J]. Computer Applications and Software,38(9):273–279 (in Chinese).
Zhao D K,Wang Z F,Liu Y,Tong W B. 2021. Earthquake loss uncertainty based on detailed loss data in New Zealand[J]. Earthquake Engineering and Engineering Dynamics,41(2):84–95 (in Chinese).
Applied Technology Council. 1985. Earthquake Damage Evaluation Data for California[M]. Redwood City:Applied Technology Council:167−219.
Bergstra J,Bengio Y. 2012. Random search for hyper-parameter optimization[J]. J Machine Learn Res,13:281–305.
Breiman L. 2001. Random forests[J]. Mach Learn,45(1):5–32. doi: 10.1023/A:1010933404324
Calvi G M,Pinho R,Magenes G,Bommer J J,Restrepo-Vélez L F,Crowley H. 2006. Development of seismic vulnerability assessment methodologies over the past 30 years[J]. ISET J Earthq Technol,43(3):75–104.
Chawla N V,Bowyer K W,Hall L O,Kegelmeyer W P. 2002. SMOTE:Synthetic minority over-sampling technique[J]. J Artif Intell Res,16:321–357. doi: 10.1613/jair.953
Ghimire S,Guéguen P,Giffard-Roisin S,Schorlemmer D. 2022. Testing machine learning models for seismic damage prediction at a regional scale using building-damage dataset compiled after the 2015 Gorkha Nepal earthquake[J]. Earthq Spectra,38(4):2970–2993. doi: 10.1177/87552930221106495
Harirchian E,Lahmer T,Rasulzade S. 2020a. Earthquake hazard safety assessment of existing buildings using optimized multi-layer perceptron neural network[J]. Energies,13(8):2060. doi: 10.3390/en13082060
Harirchian E,Lahmer T,Kumari V,Jadhav K. 2020b. Application of support vector machine modeling for the rapid seismic hazard safety evaluation of existing buildings[J]. Energies,13(13):3340. doi: 10.3390/en13133340
Harirchian E,Kumari V,Jadhav K,Raj Das R,Rasulzade S,Lahmer T. 2020c. A machine learning framework for assessing seismic hazard safety of reinforced concrete buildings[J]. Appl Sci,10(20):7153. doi: 10.3390/app10207153
Harirchian E,Hosseini S E A,Jadhav K,Kumari V,Rasulzade S,Işık E,Wasif M,Lahmer T. 2021a. A review on application of soft computing techniques for the rapid visual safety evaluation and damage classification of existing buildings[J]. J Build Eng,43:102536. doi: 10.1016/j.jobe.2021.102536
Harirchian E,Kumari V,Jadhav K,Rasulzade S,Lahmer T,Raj Das R. 2021b. A synthesized study based on machine learning approaches for rapid classifying earthquake damage grades to RC buildings[J]. Appl Sci,11(16):7540. doi: 10.3390/app11167540
Hwang S H,Mangalathu S,Shin J,Jeon J S. 2021. Machine learning-based approaches for seismic demand and collapse of ductile reinforced concrete building frames[J]. J Build Eng,34:101905. doi: 10.1016/j.jobe.2020.101905
Lerman P M. 1980. Fitting segmented regression models by grid search[J]. J R Stat Soc Series C Appl Stat,29(1):77–84.
Mangalathu S,Burton H V. 2019. Deep learning-based classification of earthquake-impacted buildings using textual damage descriptions[J]. Int J Disast Risk Reduct,36:101111. doi: 10.1016/j.ijdrr.2019.101111
Mangalathu S,Sun H,Nweke C C,Yi Z X,Burton H V. 2020. Classifying earthquake damage to buildings using machine learning[J]. Earthq Spectra,36(1):183–208. doi: 10.1177/8755293019878137
Mansourdehghan S,Dolatshahi K M,Asjodi A H. 2022. Data-driven damage assessment of reinforced concrete shear walls using visual features of damage[J]. J Build Eng,53:104509. doi: 10.1016/j.jobe.2022.104509
McCormack T C,Rad F N. 1997. An earthquake loss estimation methodology for buildings based on ATC-13 and ATC-21[J]. Earthq Spectra,13(4):605–621. doi: 10.1193/1.1585971
Miyakoshi J,Hayashi Y,Tamura K,Fukuwa N. 1997. Damage ratio functions of buildings using damage data of the 1995 Hyogo-Ken Nanbu earthquake[C]//Proceedings of the 7th International Conference on Structural Safety and Reliability. Kyoto:International Association for Structural Safety and Reliability:349−354.
Pedregosa F,Varoquaux G,Gramfort A,Michel V,Thirion B,Grisel O,Blondel M,Prettenhofer P,Weiss R,Dubourg V,Vanderplas J,Passos A,Cournapeau D,Brucher M,Perrot M,Duchesnay E. 2011. Scikit-learn:Machine learning in Python[J]. J Mach Learn Res,12:2825–2830.
Robusto C C. 1957. The Cosine-Haversine formula[J]. Am Math Mon,64(1):38–40.
Roeslin S,Ma Q,Juárez-Garcia H,Gómez-Bernal A,Wicker J,Wotherspoon L. 2020. A machine learning damage prediction model for the 2017 Puebla-Morelos,Mexico,earthquake[J]. Earthq Spectra,36(S2):314–339.
Shahriari B,Swersky K,Wang Z Y,Adams R P,De Freitas N. 2016. Taking the human out of the loop:A review of Bayesian optimization[J]. Proc IEEE,104(1):148–175. doi: 10.1109/JPROC.2015.2494218
Singhal A,Kiremidjian A S. 1996. Method for probabilistic evaluation of seismic structural damage[J]. J Structural Eng,122(12):1459–1467. doi: 10.1061/(ASCE)0733-9445(1996)122:12(1459)
Snoek J,Larochelle H,Adams R P. 2012. Practical Bayesian optimization of machine learning algorithms[C]//Proceedings of the 25th International Conference on Neural Information Processing Systems. Lake Tahoe:Curran Associates Inc.:2951−2959.
Stojadinović Z,Kovačević M,Marinković D,Stojadinović B. 2022. Rapid earthquake loss assessment based on machine learning and representative sampling[J]. Earthq Spectra,38(1):152–177. doi: 10.1177/87552930211042393
Suryanita R,Maizir H,Yuniarto E,Zulfakar M,Jingga H. 2017. Damage level prediction of reinforced concrete building based on earthquake time history using artificial neural network[C]//The 6th International Conference of Euro Asia Civil Engineering Forum. Seoul:Euro Asia Civil Engineering Forum,138:02024.
Tesfamariam S,Liu Z. 2010. Earthquake induced damage classification for reinforced concrete buildings[J]. Struct Saf,32(2):154–164. doi: 10.1016/j.strusafe.2009.10.002
USGS. 2007. vS30 models and data[DB/OL]. [2022-08-01]. https://earthquake.usgs.gov/data/vs30/.
Wald D J,Allen T I. 2007. Topographic slope as a proxy for seismic site conditions and amplification[J]. Bull Seismol Soc Am,97(5):1379–1395. doi: 10.1785/0120060267
Whitman R V,Reed J W,Hong S T. 1973. Earthquake damage probability matrices[C]//Proceedings of the Fifth World Conference on Earthquake Engineering. Rome:Palazzo dei Congressi (EUR):2531−2540.
Yuan X Z,Chen G D,Jiao P,Li L J,Han J,Zhang H B. 2022. A neural network-based multivariate seismic classifier for simultaneous post-earthquake fragility estimation and damage classification[J]. Eng Struct,255:113918. doi: 10.1016/j.engstruct.2022.113918
Zhao J X,Liang X,Jiang F,Xing H,Zhu M,Hou R B,Zhang Y B,Lan X W,Rhoades D A,Irikura K,Fukushima Y,Somerville P G. 2016a. Ground-motion prediction equations for subduction interface earthquakes in Japan using site class and simple geometric attenuation functions[J]. Bull Seismol Soc Am,106(4):1518–1534. doi: 10.1785/0120150034
Zhao J X,Zhou S L,Zhou J,Zhao C,Zhang H,Zhang Y B,Gao P J,Lan X W,Rhoades D,Fukushima Y,Somerville P G,Irikura K. 2016b. Ground‐motion prediction equations for shallow crustal and upper-mantle earthquakes in Japan using site class and simple geometric attenuation functions[J]. Bull Seismol Soc Am,106(4):1552–1569. doi: 10.1785/0120150063