Earthquake loss prediction based on random forest algorithm

Liang Zihao, Miao Pengyu, Wang Jianming, Wang Zifa

Citation: Liang Z H,Miao P Y,Wang J M,Wang Z F. 2024. Earthquake loss prediction based on random forest algorithm. Acta Seismologica Sinica,46(4):649−662. DOI: 10.11939/jass.20220182


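Funding: Jointly supported by the National Natural Science Foundation of China (51978634) and the Special Fund for Basic Scientific Research of the Institute of Engineering Mechanics, China Earthquake Administration (2021B09)

    About the authors:

    Liang Zihao, master's student, mainly engaged in research on earthquake loss prediction, e-mail: 1005233581@qq.com

    Corresponding author:

    Wang Zifa, Ph.D., research professor, mainly engaged in research on catastrophe risk, e-mail: zifa@iem.ac.cn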

  • CLC number: P315.9


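  • Summary:

    Most existing studies based on actual earthquake damage assessment are limited to a specific region and a particular structure type, and the data samples they use are very limited. Based on the random forest model, this paper uses 378 037 records of actual building damage from the MW9.0 East Japan (Tohoku-Oki) earthquake of March 11, 2011. It predicts the losses caused by earthquake damage to buildings according to the earthquake damage classification standard issued by the Applied Technology Council (ATC-13), and it analyzes the feature importance of the factors that affect building loss. The results show that after data imbalance is addressed with the synthetic minority over-sampling technique (SMOTE) and the hyper-parameters are tuned by Bayesian optimization, the random-forest prediction model reaches an accuracy of 68.8% on the test set. The recall rates for the four damage levels (minor damage, moderate damage, severe damage, and collapse) are 65.0%, 53.6%, 74.8%, and 81.8%, respectively. When life-safety performance is considered and the model is converted to a binary classification, its accuracy further increases to 87.5%. This greatly alleviates a weakness of existing building-loss prediction studies: low accuracy for the most severe damage level, caused by limited sample sizes and data imbalance. The feature-importance analysis of the random forest model shows that epicentral distance, peak ground acceleration (PGA), and vS30 are the features that most influence the model output.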

    Abstract:

    Rapid assessment of building damage and its severity after an earthquake is crucial for emergency response and recovery. Accurate earthquake damage assessment is equally important for pre-earthquake disaster prevention and mitigation, post-earthquake relief, and rapid reconstruction. Most existing studies based on actual earthquake damage are limited to a specific region and a particular structure type and use few data samples, so the resulting models generalize poorly. Many factors affect earthquake-induced building loss, and traditional methods cannot fully capture the complex mapping between these factors and the loss. A method that assesses building damage quickly and accurately is therefore needed. Machine learning offers a data-driven artificial intelligence approach: by learning the underlying patterns of large datasets, it can handle complex nonlinear relationships between input and output parameters. This paper proposes an earthquake damage prediction model that combines Bayesian optimization, the synthetic minority over-sampling technique (SMOTE), and the random forest algorithm. Bayesian optimization takes prior knowledge into account and updates iteratively until the optimal hyper-parameter combination is found, which overcomes the inefficiency of traditional parameter tuning. SMOTE generates synthetic samples for the minority classes, which alleviates the uneven distribution of data samples. The random forest model uses 378 037 records of actual building damage from the MW9.0 Tohoku-Oki, Japan, earthquake of March 11, 2011. It jointly considers multidimensional information, including ground motion, site conditions, and structural characteristics, and it adopts the earthquake damage classification issued by the Applied Technology Council (ATC-13). The model predicts earthquake-induced building damage, and its feature importance is used to analyze the factors that affect building damage. The results show that after SMOTE is used to address data imbalance and Bayesian optimization is used to tune the hyper-parameters, the random-forest prediction model achieves an accuracy of 68.8% on the test set. The recall rates for minor damage, moderate damage, severe damage, and collapse are 65.0%, 53.6%, 74.8%, and 81.8%, respectively. When life-safety performance is considered and the model is converted to a binary classification, its accuracy further increases to 87.5%. This substantially mitigates several problems of existing building-loss prediction studies: limited data, lack of regional generalization, lack of diversity in building attributes, imprecise classification of damage levels, and low accuracy for the most severe damage state. The feature-importance analysis of the random forest shows that epicentral distance, PGA, and vS30 have the greatest influence on the model output. The earthquake damage assessment model established in this study can predict earthquake-induced building damage rapidly and relatively accurately, which benefits pre-earthquake planning and timely post-earthquake rescue.
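SMOTE is summarized above as generating synthetic samples for the minority classes. The following minimal, dependency-free sketch illustrates that interpolation idea (after Chawla et al., 2002); it is not the implementation used in the paper. Each synthetic point is placed at a random position on the segment between a randomly chosen minority sample and one of that sample's k nearest minority-class neighbours, using Euclidean distance. The function name `smote`, the default k = 5 and the toy points are illustrative assumptions. A usual precaution is to resample only the training split, so that synthetic points never reach the test set.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic samples by interpolating between a randomly chosen
    minority-class sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbours of x among the other minority samples (Euclidean)
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )[:k]
        nn = minority[rng.choice(neighbours)]
        gap = rng.random()  # uniform in [0, 1): position along the segment x -> nn
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


# Usage: four hypothetical minority-class points with two features each
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_synthetic=10, k=2)
```

Every synthetic point lies between two real minority samples, so the minority region is densified rather than extended.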

  • Figure 1. Distribution of building damage resulting from the 2011 MW9.0 Tohoku-Oki earthquake

    The shade of the color represents the amount of building damage in each area: the darker the color, the more buildings were damaged.

    Figure 2. Confusion matrix of the initial model

    Figure 3. Learning curves of four hyper-parameters of the random forest model

    (a) Number of decision trees; (b) maximum depth of the decision trees; (c) minimum number of samples required at a leaf node; (d) minimum number of samples required to split an internal node
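Table 4 contrasts the learning-curve method with Bayesian optimization. This sketch assumes one plausible reading of the learning-curve method, a one-at-a-time sweep; the paper may have done it differently. Each hyper-parameter is varied over a grid while the others are held fixed, and the resulting curve of scores is recorded, as in each panel of Figure 3. The best value is kept before the next hyper-parameter is swept. `score` is a hypothetical callable that would return, for example, the cross-validated accuracy of a random forest, and `toy_score` is a made-up stand-in so the example runs. The parameter names mirror scikit-learn's `RandomForestClassifier` arguments.

```python
def tune_one_at_a_time(score, defaults, grids):
    """Sweep each hyper-parameter over its grid with the others held fixed,
    record the score curve, and keep the best value before moving on."""
    params = dict(defaults)
    curves = {}
    for name, grid in grids.items():
        curve = [(value, score({**params, name: value})) for value in grid]
        curves[name] = curve
        params[name] = max(curve, key=lambda pair: pair[1])[0]
    return params, curves


def toy_score(p):
    """Hypothetical stand-in for validation accuracy, peaking at (500, 20)."""
    return -abs(p["n_estimators"] - 500) / 1000 - abs(p["max_depth"] - 20) / 100


best, curves = tune_one_at_a_time(
    toy_score,
    defaults={"n_estimators": 100, "max_depth": 10},
    grids={"n_estimators": range(100, 1001, 100), "max_depth": range(5, 51, 5)},
)
print(best)  # {'n_estimators': 500, 'max_depth': 20}
```

A sweep of this kind ignores interactions between hyper-parameters. That is one reason a joint search can find a better combination.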

    Figure 4. Confusion matrix of the model on the test set

    Figure 5. Comparison of feature importance and permutation importance rankings
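Figure 5 compares the random forest's built-in (impurity-based) feature importance with permutation importance. Permutation importance measures how much a fitted model's score drops when the values of one feature are shuffled across samples, which breaks that feature's link to the target. This minimal sketch assumes any fitted classifier is available as a plain callable `model(row) -> label`. That interface is hypothetical and not the paper's code, and the data are invented.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows for which model(row) equals the true label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def model(row):
    """Toy classifier that looks only at feature 0."""
    return int(row[0] > 0.5)


X = [[i / 10, (7 * i % 10) / 10] for i in range(10)]
y = [model(row) for row in X]
importances = permutation_importance(model, X, y)
```

The toy model ignores feature 1, so shuffling that column leaves accuracy unchanged and its importance is exactly zero. Impurity-based importance, by contrast, is computed from the training splits and is known to favour features with many distinct values. This is one reason the two rankings can differ.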

    Figure 6. Confusion matrix of the binary classification
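The abstract states that the model was converted to a binary classification by considering life-safety performance, but it does not say which damage classes form each group. For illustration only, this sketch assumes that severe damage (class 2) and collapse (class 3) form the positive group. It collapses a 4 × 4 confusion matrix (rows = true class, columns = predicted class, as in Table 3) into a 2 × 2 one. The counts are invented.

```python
def collapse_confusion(matrix, positive):
    """Merge a k x k confusion matrix (rows = true class, columns = predicted)
    into [[TP, FN], [FP, TN]], treating the classes in `positive` as positive."""
    tp = fn = fp = tn = 0
    for t, row in enumerate(matrix):
        for p, n in enumerate(row):
            if t in positive:
                if p in positive:
                    tp += n
                else:
                    fn += n
            elif p in positive:
                fp += n
            else:
                tn += n
    return [[tp, fn], [fp, tn]]


def accuracy(matrix):
    """Share of all counts that lie on the diagonal."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


# Invented counts; ASSUMPTION: classes 2 (severe) and 3 (collapse) are "positive"
four_class = [[8, 2, 0, 0],
              [3, 6, 1, 0],
              [0, 2, 7, 1],
              [0, 0, 1, 9]]
binary = collapse_confusion(four_class, positive={2, 3})
print(accuracy(four_class), binary, accuracy(binary))  # 0.75 [[18, 2], [1, 19]] 0.925
```

Every diagonal (correct) count stays correct after merging, and some off-diagonal counts become correct. The binary accuracy can therefore never be lower than the four-class accuracy.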

    Table 1. Statistics of the four damage levels classified according to ATC-13

    Damage level | Loss ratio Dr | Number of records
    Minor damage (class 0) | 5% < Dr ≤ 10% | 136 334
    Moderate damage (class 1) | 10% < Dr ≤ 30% | 178 594
    Severe damage (class 2) | 30% < Dr ≤ 60% | 33 079
    Collapse (class 3) | 60% < Dr ≤ 100% | 30 029
    Total | | 378 037
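Table 1 assigns each record to a damage class from its loss ratio Dr, using intervals that are open below and closed above. This minimal sketch implements that mapping with Dr in percent. The function name is hypothetical. Returning None for Dr ≤ 5% (or above 100%) only reflects that such records do not appear in Table 1.

```python
def atc13_damage_class(dr):
    """Return the Table 1 damage class (0-3) for a loss ratio Dr given in percent."""
    for low, high, cls in ((5, 10, 0), (10, 30, 1), (30, 60, 2), (60, 100, 3)):
        if low < dr <= high:  # intervals are open below, closed above
            return cls
    return None  # Dr <= 5 % or Dr > 100 %: not covered by the four classes


print([atc13_damage_class(dr) for dr in (3, 7.5, 10, 10.1, 45, 100)])
# [None, 0, 0, 1, 2, 3]
```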

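    Table 2. Input features of the machine learning model for the building dataset

    Category | Influencing factor | Description | Calculation method or data source
    Seismic information | PGA | Peak ground acceleration | Ground-motion prediction equations of Zhao et al. (2016a, b)
    Seismic information | Epicentral distance | Distance on the ground surface from the epicenter to the building | Two-point latitude/longitude distance formula of Robusto (1957)
    Building information | Number of stories | Number of stories of the building | Database
    Building information | Region code | 47 prefectures |
    Building information | Construction period | ① 1867–1910; ② 1911–1924; ③ 1925–1987; ④ 1988–2011 |
    Building information | Exterior wall material | ① concrete; ② autoclaved lightweight concrete; ③ block; ④ mortar; ⑤ plaster; ⑥ stone-faced brick; ⑦ metal sheet; ⑧ glass panel; ⑨ slate; ⑩ metal-ceramic; ⑪ dozo (traditional earthen storehouse construction); ⑫ wooden siding; ⑬ wooden board |
    Building information | Structure type | ① wood structure; ② mixed structure (wood and mortar); ③ dozo; ④ block; ⑤ masonry structure; ⑥ steel structure; ⑦ concrete structure; ⑧ other |
    Building information | Column material | ① concrete column; ② steel with fire-resistant coating; ③ steel; ④ timber frame; ⑤ composite of two materials; ⑥ other |
    Building information | Roof material | ① concrete; ② metal sheet; ③ slate; ④ ceramic tile; ⑤ synthetic resin; ⑥ wooden board; ⑦ thatch |
    Building information | Building use | ① residential; ② other |
    Site information | vS30 | Weighted average shear-wave velocity of the soil within 30 m below the ground surface | USGS (2007)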
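Table 2 computes the epicentral distance from two latitude/longitude pairs following Robusto (1957). This sketch uses the standard haversine great-circle formula on a spherical Earth with an assumed mean radius of 6371 km. The exact formula variant and radius used in the paper may differ, and the function name is illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius of a spherical Earth


def epicentral_distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance (haversine formula) between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # min() guards against `a` slightly exceeding 1 through rounding
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


# One degree of latitude along a meridian is about 111.19 km on this sphere
print(round(epicentral_distance_km(0.0, 0.0, 1.0, 0.0), 2))  # 111.19
```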

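    Table 3. Confusion matrix

    Confusion matrix | Predicted positive | Predicted negative
    Actual positive | True positive (TP) | False negative (FN)
    Actual negative | False positive (FP) | True negative (TN)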
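Table 3 lays the confusion matrix out with rows = true class and columns = predicted class. In that layout, accuracy is the share of all records on the diagonal. The recall of a class is its diagonal count divided by its row total; the per-class recalls quoted in the abstract (65.0%, 53.6%, 74.8%, 81.8%) are of this kind. Precision uses the column total instead. This minimal sketch handles a k × k matrix; the example counts are invented.

```python
def overall_accuracy(matrix):
    """Correct predictions (diagonal) divided by all records."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


def per_class_recall(matrix):
    """Recall of class i: diagonal count over the row total (all true members of i)."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]


def per_class_precision(matrix):
    """Precision of class j: diagonal count over the column total (all predicted j)."""
    k = len(matrix)
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    return [matrix[j][j] / c if c else 0.0 for j, c in enumerate(col_totals)]


# Invented binary counts laid out as in Table 3: [[TP, FN], [FP, TN]]
cm = [[40, 10],
      [5, 45]]
print(overall_accuracy(cm))  # 0.85
print(per_class_recall(cm))  # [0.8, 0.9]
```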

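    Table 4. Comparison of hyper-parameter optimization methods for the random forest model

    Optimization method | Number of decision trees | Maximum tree depth | Minimum samples per leaf node | Minimum samples to split a node | Model accuracy
    Bayesian optimization | 654 | 48 | 1 | 2 | 68.8%
    Learning-curve method | 550 | 20 | 2 | 8 | 65.9%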
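Bayesian optimization found the better hyper-parameters in Table 4. It fits a probabilistic surrogate model to the scores observed so far and uses an acquisition function to choose the next point to evaluate. This sketch is a one-dimensional toy under stated assumptions, not the paper's four-dimensional search. It uses a Gaussian-process surrogate with a fixed RBF kernel (length scale 0.2, unit signal variance, small noise jitter) and the expected-improvement acquisition for maximization. Candidates come from a finite grid. `toy_objective` is a hypothetical stand-in for the cross-validated accuracy of a random forest.

```python
import math


def rbf(a, b, length=0.2):
    """Squared-exponential (RBF) kernel with unit signal variance."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(A):
    """Lower-triangular L with A = L L^T (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(max(s, 1e-12)) if i == j else s / L[j][j]
    return L


def solve_lower(L, b):
    """Solve L z = b by forward substitution."""
    z = []
    for i, bi in enumerate(b):
        z.append((bi - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z


def solve_upper_t(L, z):
    """Solve L^T x = z by back substitution."""
    n = len(z)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def fit_gp(X, y, noise=1e-6):
    """Return a function giving the GP posterior mean and std at a new x."""
    mean_y = sum(y) / len(y)
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, [v - mean_y for v in y]))

    def predict(x):
        k_star = [rbf(x, a) for a in X]
        mu = mean_y + sum(k * a for k, a in zip(k_star, alpha))
        v = solve_lower(L, k_star)
        var = max(1.0 - sum(t * t for t in v), 0.0)  # k(x, x) - k*^T K^-1 k*
        return mu, math.sqrt(var)

    return predict


def expected_improvement(mu, sigma, best, xi=0.01):
    """EI for maximization under a Gaussian predictive distribution."""
    if sigma < 1e-12:
        return 0.0
    improvement = mu - best - xi
    z = improvement / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return improvement * cdf + sigma * pdf


def bayes_opt(objective, candidates, init, n_iter=10):
    """Sequentially pick the candidate with the highest EI and evaluate it."""
    X = list(init)
    y = [objective(x) for x in X]
    for _ in range(n_iter):
        predict = fit_gp(X, y)
        best = max(y)
        pool = [c for c in candidates if c not in X]  # never re-evaluate a point
        if not pool:
            break
        x_next = max(pool, key=lambda c: expected_improvement(*predict(c), best))
        X.append(x_next)
        y.append(objective(x_next))
    i_best = max(range(len(y)), key=y.__getitem__)
    return X[i_best], y[i_best], list(zip(X, y))


def toy_objective(x):
    """Hypothetical stand-in for validation accuracy; maximum at x = 0.3."""
    return -(x - 0.3) ** 2


grid = [i / 100 for i in range(101)]
best_x, best_y, history = bayes_opt(toy_objective, grid, init=[0.0, 0.5, 1.0])
```

In a real search each candidate would be a hyper-parameter vector (number of trees, maximum depth, and so on), and the kernel hyper-parameters would normally be fitted rather than fixed.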

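    Table 5. Comparison of seismic damage prediction results based on field survey data

    Source | Model type | Classification | Data | Accuracy for the most severe damage state | Model accuracy
    Harirchian et al. (2021b) | Extremely randomized trees | 3 classes | 2016 Ecuador earthquake, 172 damaged reinforced concrete buildings | 55.0% | 70.2%
    Harirchian et al. (2021b) | Extremely randomized trees | 3 classes | 2010 Haiti earthquake, 145 damaged reinforced concrete buildings | 41.2% | 58.5%
    Harirchian et al. (2021b) | Extremely randomized trees | 4 classes | 2017 Pohang, South Korea earthquake, 74 damaged buildings | 50.0% | 60.0%
    Harirchian et al. (2020c) | Support vector machine | 3 classes | 2016 Ecuador earthquake, 171 damaged reinforced concrete buildings | 54.0% | 60.0%
    Harirchian et al. (2020c) | Support vector machine | 3 classes | 2010 Haiti earthquake, 142 damaged reinforced concrete buildings | 54.0% | 68.0%
    Harirchian et al. (2020c) | Support vector machine | 4 classes | 2015 Nepal earthquake, 138 damaged reinforced concrete buildings | 33.0% | 67.0%
    Harirchian et al. (2020c) | Support vector machine | 4 classes | 2017 Pohang, South Korea earthquake, 67 damaged buildings | 20.0% | 48.0%
    Harirchian et al. (2020a) | Multilayer perceptron | 5 classes | 1999 Düzce, Turkey earthquake, 484 damaged buildings | 71.4% | 52.0%
    Harirchian et al. (2020b) | Support vector machine | 5 classes | 1999 Düzce, Turkey earthquake, 484 damaged buildings | 54.5% | 52.0%
    Mangalathu and Burton (2019) | Long short-term memory network | 3 classes | 2014 South Napa, USA earthquake, 3 423 damaged buildings | 63.0% | 86.0%
    Mangalathu et al. (2020) | Random forest | 3 classes | 2014 South Napa, USA earthquake, 2 276 damaged buildings | 13.0% | 66.0%
    Roeslin et al. (2020) | Random forest | 2 classes | 2017 Puebla, Mexico earthquake, 237 damaged buildings | 78.0% | 67.0%
    Stojadinović et al. (2022) | Random forest | 5 classes | 2010 Kraljevo, Serbia earthquake, 1 979 damaged buildings | 30.0% | 85.0%
    Ghimire et al. (2022) | Random forest | 3 classes | 2015 Nepal earthquake, 762 thousand damaged buildings | 70.0% | 64.0%
    This study | Random forest | 4 classes | 2011 Great East Japan (Tohoku-Oki) earthquake, 378 thousand damaged buildings | 81.8% | 68.8%
    This study | Random forest | 2 classes | 2011 Great East Japan (Tohoku-Oki) earthquake, 378 thousand damaged buildings | 88.0% | 87.5%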
  • References:

    Bao Y Q,Li H. 2019. Artificial intelligence for civil engineering[J]. China Civil Engineering Journal,52(5):1–11 (in Chinese).

    Sun B T,Hu S Q. 2005. A method for earthquake damage prediction of building group based on existing earthquake damage matrix[J]. Earthquake Engineering and Engineering Vibration,25(6):102–108 (in Chinese).

    Wang J F. 2012. Study on Speaker Recognition Based on Improved Grid Search Parameters Optimization Algorithm of SVM[D]. Harbin:Harbin Engineering University:25−26 (in Chinese).

    Wang Z F,Park S,Lee S,Cui K. 2014. Quantification improvement of earthquake loss estimation[J]. Earthquake Engineering and Engineering Dynamics,34(4):110–114 (in Chinese).

    Wei Y G,Jiang C S. 2021. Research progress of artificial intelligence technology in the application of earthquake disaster reduction[J]. Progress in Geophysics,36(2):516–524 (in Chinese). doi: 10.6038/pg2021EE0164

    Yang X,Li Y H,Gai Z X. 2021. Machine learning and its application in seismology[J]. Reviews of Geophysics and Planetary Physics,52(1):76–88 (in Chinese).

    Yang Y,Lu C B,Xu G H. 2017. A refined borderline-SMOTE method for imbalanced data set[J]. Journal of Fudan University (Natural Science),56(5):537–544 (in Chinese).

    Yu H M,Xu J D,Zhang S L,Pan B. 2006. The statistical analysis of building vulnerability research on Jiji earthquake[J]. Journal of Institute of Disaster Prevention,8(4):17–20 (in Chinese). doi: 10.3969/j.issn.1673-8047.2006.04.004

    Zhang F H,Xie L L,Fan L C. 2004. A study on disaster loss prediction caused by damaged structures under earthquake[J]. Earthquake Engineering and Engineering Vibration,24(3):12–20 (in Chinese). doi: 10.3969/j.issn.1000-1301.2004.03.002

    Zhang G X,Sun B T. 2018. Seismic damage prediction for a single building based on a fuzzy analytical hierarchy approach[J]. Engineering Mechanics,35(12):185–193 (in Chinese).

    Zhang H. 2018. Research of Automatic Feature Engineering and Parameter Adjustment Algorithm[D]. Chengdu:University of Electronic Science and Technology of China:26 (in Chinese).

    Zhang T Y,Ding L X. 2021. A new resampling method based on SMOTE for imbalanced data set[J]. Computer Applications and Software,38(9):273–279 (in Chinese).

    Zhao D K,Wang Z F,Liu Y,Tong W B. 2021. Earthquake loss uncertainty based on detailed loss data in New Zealand[J]. Earthquake Engineering and Engineering Dynamics,41(2):84–95 (in Chinese).

    Applied Technology Council. 1985. Earthquake Damage Evaluation Data for California[M]. Redwood City:Applied Technology Council:167−219.

    Bergstra J,Bengio Y. 2012. Random search for hyper-parameter optimization[J]. J Mach Learn Res,13:281–305.

    Breiman L. 2001. Random forests[J]. Mach Learn,45(1):5–32. doi: 10.1023/A:1010933404324

    Calvi G M,Pinho R,Magenes G,Bommer J J,Restrepo-Vélez L F,Crowley H. 2006. Development of seismic vulnerability assessment methodologies over the past 30 years[J]. ISET J Earthq Technol,43(3):75–104.

    Chawla N V,Bowyer K W,Hall L O,Kegelmeyer W P. 2002. SMOTE:Synthetic minority over-sampling technique[J]. J Artif Intell Res,16:321–357. doi: 10.1613/jair.953

    Ghimire S,Guéguen P,Giffard-Roisin S,Schorlemmer D. 2022. Testing machine learning models for seismic damage prediction at a regional scale using building-damage dataset compiled after the 2015 Gorkha Nepal earthquake[J]. Earthq Spectra,38(4):2970–2993. doi: 10.1177/87552930221106495

    Harirchian E,Lahmer T,Rasulzade S. 2020a. Earthquake hazard safety assessment of existing buildings using optimized multi-layer perceptron neural network[J]. Energies,13(8):2060. doi: 10.3390/en13082060

    Harirchian E,Lahmer T,Kumari V,Jadhav K. 2020b. Application of support vector machine modeling for the rapid seismic hazard safety evaluation of existing buildings[J]. Energies,13(13):3340. doi: 10.3390/en13133340

    Harirchian E,Kumari V,Jadhav K,Raj Das R,Rasulzade S,Lahmer T. 2020c. A machine learning framework for assessing seismic hazard safety of reinforced concrete buildings[J]. Appl Sci,10(20):7153. doi: 10.3390/app10207153

    Harirchian E,Hosseini S E A,Jadhav K,Kumari V,Rasulzade S,Işık E,Wasif M,Lahmer T. 2021a. A review on application of soft computing techniques for the rapid visual safety evaluation and damage classification of existing buildings[J]. J Build Eng,43:102536. doi: 10.1016/j.jobe.2021.102536

    Harirchian E,Kumari V,Jadhav K,Rasulzade S,Lahmer T,Raj Das R. 2021b. A synthesized study based on machine learning approaches for rapid classifying earthquake damage grades to RC buildings[J]. Appl Sci,11(16):7540. doi: 10.3390/app11167540

    Hwang S H,Mangalathu S,Shin J,Jeon J S. 2021. Machine learning-based approaches for seismic demand and collapse of ductile reinforced concrete building frames[J]. J Build Eng,34:101905. doi: 10.1016/j.jobe.2020.101905

    Lerman P M. 1980. Fitting segmented regression models by grid search[J]. J R Stat Soc Series C Appl Stat,29(1):77–84.

    Mangalathu S,Burton H V. 2019. Deep learning-based classification of earthquake-impacted buildings using textual damage descriptions[J]. Int J Disast Risk Reduct,36:101111. doi: 10.1016/j.ijdrr.2019.101111

    Mangalathu S,Sun H,Nweke C C,Yi Z X,Burton H V. 2020. Classifying earthquake damage to buildings using machine learning[J]. Earthq Spectra,36(1):183–208. doi: 10.1177/8755293019878137

    Mansourdehghan S,Dolatshahi K M,Asjodi A H. 2022. Data-driven damage assessment of reinforced concrete shear walls using visual features of damage[J]. J Build Eng,53:104509. doi: 10.1016/j.jobe.2022.104509

    McCormack T C,Rad F N. 1997. An earthquake loss estimation methodology for buildings based on ATC-13 and ATC-21[J]. Earthq Spectra,13(4):605–621. doi: 10.1193/1.1585971

    Miyakoshi J,Hayashi Y,Tamura K,Fukuwa N. 1997. Damage ratio functions of buildings using damage data of the 1995 Hyogo-Ken Nanbu earthquake[C]//Proceedings of the 7th International Conference on Structural Safety and Reliability. Kyoto:International Association for Structural Safety and Reliability:349−354.

    Pedregosa F,Varoquaux G,Gramfort A,Michel V,Thirion B,Grisel O,Blondel M,Prettenhofer P,Weiss R,Dubourg V,Vanderplas J,Passos A,Cournapeau D,Brucher M,Perrot M,Duchesnay E. 2011. Scikit-learn:Machine learning in Python[J]. J Mach Learn Res,12:2825–2830.

    Robusto C C. 1957. The Cosine-Haversine formula[J]. Am Math Mon,64(1):38–40.

    Roeslin S,Ma Q,Juárez-Garcia H,Gómez-Bernal A,Wicker J,Wotherspoon L. 2020. A machine learning damage prediction model for the 2017 Puebla-Morelos,Mexico,earthquake[J]. Earthq Spectra,36(S2):314–339.

    Shahriari B,Swersky K,Wang Z Y,Adams R P,De Freitas N. 2016. Taking the human out of the loop:A review of Bayesian optimization[J]. Proc IEEE,104(1):148–175. doi: 10.1109/JPROC.2015.2494218

    Singhal A,Kiremidjian A S. 1996. Method for probabilistic evaluation of seismic structural damage[J]. J Structural Eng,122(12):1459–1467. doi: 10.1061/(ASCE)0733-9445(1996)122:12(1459)

    Snoek J,Larochelle H,Adams R P. 2012. Practical Bayesian optimization of machine learning algorithms[C]//Proceedings of the 25th International Conference on Neural Information Processing Systems. Lake Tahoe:Curran Associates Inc.:2951−2959.

    Stojadinović Z,Kovačević M,Marinković D,Stojadinović B. 2022. Rapid earthquake loss assessment based on machine learning and representative sampling[J]. Earthq Spectra,38(1):152–177. doi: 10.1177/87552930211042393

    Suryanita R,Maizir H,Yuniarto E,Zulfakar M,Jingga H. 2017. Damage level prediction of reinforced concrete building based on earthquake time history using artificial neural network[C]//The 6th International Conference of Euro Asia Civil Engineering Forum. Seoul:Euro Asia Civil Engineering Forum,138:02024.

    Tesfamariam S,Liu Z. 2010. Earthquake induced damage classification for reinforced concrete buildings[J]. Struct Saf,32(2):154–164. doi: 10.1016/j.strusafe.2009.10.002

    USGS. 2007. vS30 models and data[DB/OL]. [2022-08-01]. https://earthquake.usgs.gov/data/vs30/.

    Wald D J,Allen T I. 2007. Topographic slope as a proxy for seismic site conditions and amplification[J]. Bull Seismol Soc Am,97(5):1379–1395. doi: 10.1785/0120060267

    Whitman R V,Reed J W,Hong S T. 1973. Earthquake damage probability matrices[C]//Proceedings of the Fifth World Conference on Earthquake Engineering. Rome:Palazzo dei Congressi (EUR):2531−2540.

    Yuan X Z,Chen G D,Jiao P,Li L J,Han J,Zhang H B. 2022. A neural network-based multivariate seismic classifier for simultaneous post-earthquake fragility estimation and damage classification[J]. Eng Struct,255:113918. doi: 10.1016/j.engstruct.2022.113918

    Zhao J X,Liang X,Jiang F,Xing H,Zhu M,Hou R B,Zhang Y B,Lan X W,Rhoades D A,Irikura K,Fukushima Y,Somerville P G. 2016a. Ground-motion prediction equations for subduction interface earthquakes in Japan using site class and simple geometric attenuation functions[J]. Bull Seismol Soc Am,106(4):1518–1534. doi: 10.1785/0120150034

    Zhao J X,Zhou S L,Zhou J,Zhao C,Zhang H,Zhang Y B,Gao P J,Lan X W,Rhoades D,Fukushima Y,Somerville P G,Irikura K. 2016b. Ground‐motion prediction equations for shallow crustal and upper-mantle earthquakes in Japan using site class and simple geometric attenuation functions[J]. Bull Seismol Soc Am,106(4):1552–1569. doi: 10.1785/0120150063

Publication history
  • Received: 2022-09-28
  • Revised: 2022-11-29
  • Published online: 2023-09-27
  • Published: 2024-07-14
