Deep learning-based seismic event classification: A comparative study of data representations and model architectures

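  • Summary: Deep learning methods offer significant advantages for seismic event classification; however, the influence of data representation and signal-to-noise ratio (SNR) on model performance remains unclear. This paper evaluates the classification performance of four CNN architectures (ResNet18, VGG16, DenseNet121, and Inception V3) using waveform-image and spectrogram inputs under different SNR conditions. The results show that spectrogram inputs generally yield better classification performance than waveform images (F1 score up to 0.949 1±0.004 7), except under high-SNR conditions, where models trained on waveform images perform better (F1 score up to 0.974 9±0.014 5). Gradient-weighted class activation mapping (Grad-CAM) visualization of the models shows that different architectures focus on different temporal and frequency features, with ResNet18 showing the most consistent performance across the two data representations. This suggests that the data representation strategy should be adapted to signal quality: spectrograms are better suited to noisy data, and waveform images to high-quality signals.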

     

    Abstract:
    Deep learning methods, particularly convolutional neural networks (CNNs), have demonstrated significant advantages in seismic event classification. However, the effects of data representation and signal-to-noise ratio (SNR) on model performance remain unclear. This study evaluates four CNN architectures (ResNet18, VGG16, DenseNet121, and Inception V3) for classifying natural mine earthquakes and blast events, using both waveform and spectrogram inputs under varying SNR conditions. We use seismic data collected at a coal mine in Honggu, Gansu Province, China, where frequent mining-induced seismicity poses significant safety challenges. The microseismic monitoring system, equipped with 24 short-period vertical-component seismic detectors sampling at 500 Hz, recorded both natural mine earthquakes and controlled blasting events. We extracted 1 554 blast waveforms and 990 natural mine earthquake waveforms for classification, each sample containing 1 000 data points (2 seconds) starting from the event origin time. We implemented three data processing approaches: direct waveform input, waveform image conversion, and spectrogram generation. The direct waveform approach used the preprocessed time series as CNN input, padded to a consistent input size. The waveform image method plotted each preprocessed waveform as a 256×256 pixel image with black lines on a white background, with all axes and ticks removed to eliminate extraneous information. For the spectrogram approach, we applied the short-time Fourier transform (STFT) with a 256-sample Hanning window and 50% overlap, and converted the resulting time-frequency data into grayscale images. Our experimental results show that data representation significantly affects classification performance.
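The spectrogram step described above (STFT with a 256-sample Hanning window and 50% overlap on a 1 000-sample trace sampled at 500 Hz) can be sketched as follows. This is a minimal NumPy illustration, not the authors' code: the function names, the absence of edge padding, the log-magnitude scaling, and the min-max mapping to 8-bit grayscale are all assumptions, since the abstract does not specify them.

```python
import numpy as np

FS = 500          # sampling rate in Hz (from the text)
N_SAMPLES = 1000  # 2 s per trace (from the text)
WIN = 256         # STFT window length in samples (from the text)
HOP = WIN // 2    # 50% overlap (from the text)


def stft_magnitude(trace, win=WIN, hop=HOP):
    """Magnitude STFT with a Hanning window; returns an array (freq, time).

    Assumption: no padding at the trace edges, so only full windows are used.
    """
    window = np.hanning(win)
    n_frames = 1 + (len(trace) - win) // hop
    frames = np.stack(
        [trace[i * hop : i * hop + win] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)).T


def to_grayscale(mag, eps=1e-10):
    """Map a magnitude spectrogram to a uint8 grayscale image.

    Assumption: log (dB) scaling followed by min-max normalization.
    """
    log_mag = 20.0 * np.log10(mag + eps)
    lo, hi = log_mag.min(), log_mag.max()
    if hi > lo:
        scaled = (log_mag - lo) / (hi - lo)
    else:
        scaled = np.zeros_like(log_mag)
    return np.round(255.0 * scaled).astype(np.uint8)


if __name__ == "__main__":
    # Hypothetical test signal: a 30 Hz sine standing in for a real trace.
    t = np.arange(N_SAMPLES) / FS
    image = to_grayscale(stft_magnitude(np.sin(2 * np.pi * 30 * t)))
    print(image.shape, image.dtype)
```

With these settings each trace gives 129 frequency bins (0 to 250 Hz) and only a handful of time frames, so any resizing to the network's input size would be a further step that the abstract does not describe.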
For CNN-based architectures, image-based inputs (both waveform images and spectrograms) substantially outperformed direct waveform inputs, with classification accuracies exceeding 90% compared with less than 70% for direct waveforms. This advantage can be attributed to the design of CNNs, which were originally developed for computer vision tasks and are particularly adept at extracting spatial features from images through local receptive fields and weight sharing. Among image-based representations, spectrograms generally yielded better classification performance than waveform images for ResNet18, DenseNet121, and Inception V3. Several factors may explain this: spectrograms present the frequency distribution and energy characteristics of a signal simultaneously, so key patterns associated with each event type are easier to discriminate in a structured time-frequency space; they reduce dimensionality and concentrate features, giving a more compact two-dimensional representation; and different types of seismic events often have distinctive frequency patterns and energy distributions that appear more prominently in spectrograms. SNR also had a substantial impact on classification performance. After dividing the dataset into high-SNR and low-SNR groups, we observed that all models performed very well on high-SNR data, with waveform images outperforming spectrograms, a reversal of the trend observed under mixed-SNR conditions. Inception V3 achieved the best performance with waveform image inputs under high-SNR conditions (F1 score: 0.974 9±0.014 5), suggesting that waveform images better preserve critical temporal features when signal quality is high.
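The F1 scores quoted above are reported as a value ± a spread. A minimal sketch of how such a figure can be computed is given below. It assumes, since the abstract does not say, that the spread is the sample standard deviation over repeated training runs and that F1 is computed for a single positive class; the confusion counts are made-up placeholders, not results from the study.

```python
import statistics


def f1_score(tp, fp, fn):
    """F1 for one positive class from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


def summarize(scores):
    """Mean and sample standard deviation (assumed meaning of the '±' figure)."""
    return statistics.mean(scores), statistics.stdev(scores)


# Placeholder (tp, fp, fn) counts for three hypothetical runs.
runs = [(90, 5, 10), (92, 6, 8), (88, 4, 12)]
scores = [f1_score(*counts) for counts in runs]
mean, std = summarize(scores)
print(f"F1 = {mean:.4f} ± {std:.4f}")
```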
Conversely, for low-SNR data, all models showed marked performance degradation, but spectrogram inputs retained an advantage, with Inception V3 achieving the best results (F1 score: 0.844 5±0.028 2). This indicates that a spectral-domain representation is more robust in noisy environments, possibly because the spectral transformation partially separates signal from noise and makes feature extraction more reliable. To examine the decision-making mechanisms of the different CNN models, we used gradient-weighted class activation mapping (Grad-CAM) visualization. This technique localizes the regions most influential in a classification decision by computing the gradients of the target class score with respect to the feature maps. Our analysis showed that high-performing models typically focused on seismic phase arrival regions and their temporal context. ResNet18 showed large attention areas directly over the seismic waveform, particularly the P-wave arrival zone, where natural mine earthquakes and blast events differ significantly in waveform characteristics. This effective weight distribution contributed to ResNet18’s consistently strong performance across both waveform and spectrogram representations. In contrast, VGG16 showed smaller focus areas concentrated in blank regions of the images, which may explain its relatively poorer classification performance.
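The core Grad-CAM computation described above, weighting each feature map by the spatially averaged gradient of the class score and keeping only positive evidence, can be sketched in NumPy as follows. This is an illustration under assumptions, not the authors' implementation: in practice the feature maps and gradients would come from a chosen convolutional layer via a deep learning framework's autograd, and the final scaling to [0, 1] is a common display convention rather than something stated in the abstract. The toy arrays are hypothetical.

```python
import numpy as np


def grad_cam(feature_maps, gradients):
    """Grad-CAM heatmap from one layer's activations and gradients.

    feature_maps: array (K, H, W), activations A_k of the chosen layer.
    gradients:    array (K, H, W), d(class score)/dA_k for the target class.
    """
    # Channel weights: global average of the gradients over H and W.
    weights = gradients.mean(axis=(1, 2))
    # Weighted sum of feature maps over channels, giving an (H, W) map.
    cam = np.tensordot(weights, feature_maps, axes=1)
    # ReLU keeps only regions that raise the class score.
    cam = np.maximum(cam, 0.0)
    # Assumed display convention: scale to [0, 1] when the map is non-zero.
    peak = cam.max()
    return cam / peak if peak > 0 else cam


# Toy example with two 2x2 feature maps (hypothetical values).
A = np.array([[[1.0, 2.0], [3.0, 4.0]],
              [[4.0, 3.0], [2.0, 1.0]]])
G = np.stack([np.ones((2, 2)), -np.ones((2, 2))])  # channel weights +1 and -1
heatmap = grad_cam(A, G)
print(heatmap)
```

The resulting low-resolution map is normally upsampled to the input image size and overlaid on the waveform image or spectrogram to show which regions drove the decision.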
    This study demonstrates that the choice of data representation significantly affects model performance in seismic event classification. For CNN-based architectures, image-based inputs substantially outperform direct waveform inputs. Under mixed-SNR conditions, spectrograms generally outperform waveform images; however, this advantage reverses under high-SNR conditions, where waveform representations better preserve critical temporal features. Model architecture also plays a crucial role in classification robustness: ResNet18 performed well across both data representations, likely because its residual connections facilitate effective feature extraction. Finally, Grad-CAM visualization gives an intuitive view of the decision process, showing that the better-performing models typically focus on seismic phase arrival regions and the surrounding temporal context. This finding is consistent with traditional seismological expertise and supports the validity of the models’ learning strategies.

     
