Abstract:
Seismicity prediction remains a critical challenge in seismology, requiring a delicate balance between data-driven insights and domain-specific physical principles. Traditional statistical methods, such as the epidemic-type aftershock sequence (ETAS) model, have long served as fundamental tools for analyzing earthquake catalogs. However, these approaches struggle to fully exploit rapidly growing seismic data because of their reliance on simplified parametric assumptions and their limited adaptability to complex spatiotemporal patterns. Conversely, purely data-driven machine learning models, while capable of processing high-dimensional datasets, often produce predictions that lack physical interpretability and can violate established seismological laws. To bridge this gap, this study proposes TransSeisNet, a hybrid framework that combines the computational power of deep learning with the empirical rigor of statistical seismology. By directly embedding domain knowledge into the model architecture and optimization process, TransSeisNet achieves both high predictive accuracy and adherence to physical constraints, providing a robust solution for earthquake forecasting.
Methodological framework
The TransSeisNet architecture is based on the Transformer neural network paradigm, renowned for its ability to model long-term dependencies in sequential data through self-attention mechanisms. The model processes earthquake catalogs (continuous records of seismic events containing temporal, spatial, and magnitude information) to predict future seismic activity. Key innovations include: ① Physical constraint layer. A constraint layer integrated into the output stage enforces compliance with empirical seismological laws. For instance, the magnitude distribution of predicted events is normalized to follow the power-law relationship of the Gutenberg-Richter (GR) Law, ensuring that model outputs conform to observed frequency-magnitude scaling. Additionally, temporal clustering patterns, such as the rapid aftershock decay described by the Omori-Utsu Law, are explicitly encoded to prevent non-physical predictions. ② Knowledge-guided loss function. The training objective combines conventional negative log-likelihood terms with regularization terms derived from statistical seismology. For example, deviations from the GR Law's b-value or violations of Omori-Utsu decay parameters are penalized during optimization. This dual-objective approach ensures simultaneous optimization of data fidelity and physical consistency.
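The paper does not give the exact form of the knowledge-guided loss, but its structure (data term plus a GR-Law regularizer) can be sketched as follows. This minimal NumPy illustration uses Aki's maximum-likelihood b-value estimator; the function names (`knowledge_guided_loss`), the quadratic penalty form, and the weight `lam` are hypothetical choices for illustration, not taken from the paper.

```python
import numpy as np

def estimate_b_value(magnitudes, m_c):
    """Aki's maximum-likelihood estimate of the GR b-value above cutoff m_c:
    b = log10(e) / (mean(M) - m_c)."""
    m = magnitudes[magnitudes >= m_c]
    return np.log10(np.e) / (np.mean(m) - m_c)

def knowledge_guided_loss(nll, predicted_magnitudes, m_c=2.0,
                          b_target=1.0, lam=0.1):
    """Data negative log-likelihood plus a quadratic penalty on the deviation
    of the predicted catalog's b-value from a target (regional) b-value.
    (Illustrative sketch; the paper's actual loss terms are not specified.)"""
    b_pred = estimate_b_value(predicted_magnitudes, m_c)
    return nll + lam * (b_pred - b_target) ** 2
```

An analogous penalty on fitted Omori-Utsu decay parameters would complete the dual objective described above.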
Integration of domain knowledge
TransSeisNet systematically incorporates three pillars of statistical seismology: ① Gutenberg-Richter Law. It constrains the predicted event magnitude-frequency distribution to follow a power-law scaling, preventing unrealistic overprediction of large-magnitude events. ② ETAS model characteristics. The self-attention mechanism implicitly captures the ETAS model’s core premise — earthquakes can trigger subsequent events — by modeling temporal and spatial triggering probabilities. ③ Omori-Utsu decay. It regularizes the temporal decay of aftershock productivity to align with empirically observed trends, ensuring predicted aftershock sequences decay at rates consistent with historical observations.
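The two closed-form laws above are compact enough to state directly in code. The sketch below encodes both; the parameter values (K, c, p, a, b) are illustrative defaults, not values fitted in the paper.

```python
import numpy as np

def omori_utsu_rate(t, K=100.0, c=0.01, p=1.1):
    """Omori-Utsu aftershock rate n(t) = K / (c + t)^p,
    with t the elapsed time since the mainshock."""
    return K / (c + t) ** p

def gr_expected_count(m, a=5.0, b=1.0):
    """Gutenberg-Richter frequency-magnitude relation:
    log10 N(>= m) = a - b*m, i.e. N(>= m) = 10^(a - b*m)."""
    return 10.0 ** (a - b * m)
```

TransSeisNet's constraints amount to requiring that predicted catalogs, when aggregated, reproduce these two curves.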
Experimental results
TransSeisNet was rigorously evaluated on two real-world earthquake catalogs and one synthetic catalog: ① Southern California catalog, covering the San Jacinto fault zone (1981−2023) and containing over 12 000 events with magnitudes M ≥ 2.0; ② Japan Meteorological Agency catalog, covering the Tohoku and Kanto regions (1990−2023) and containing over 20 000 events, with emphasis on subduction zone seismicity; ③ Synthetic catalog, generated using ETAS parameters to validate the model's ability to recover known triggering dynamics. Performance highlights: ① Superior accuracy. TransSeisNet consistently outperformed the benchmark ETAS model in predicting the timing, location, and magnitude of seismic events across all catalogs. For instance, on the Southern California catalog, the model demonstrated a 30% improvement in likelihood-based evaluation metrics compared to ETAS. ② Enhanced stability. By incorporating physical constraints, TransSeisNet exhibited reduced sensitivity to data noise and outliers, producing stable predictions even during periods of high seismic activity (e.g., aftershock sequences following major earthquakes). ③ Generalization capability. The model maintained robust performance across diverse tectonic settings, including strike-slip fault systems (San Jacinto) and subduction zones (Japan), highlighting its adaptability to varying seismogenic regimes.
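A synthetic catalog of the kind described in ③ is commonly generated with Ogata's thinning algorithm for a temporal ETAS process, in which the conditional intensity is a background rate plus an Omori-type kernel weighted by each past event's productivity. The sketch below is a generic illustration of that procedure; the parameter values are chosen only to keep the branching ratio subcritical and are not the paper's.

```python
import numpy as np

def simulate_etas(mu=0.5, K=0.02, alpha=1.0, c=0.01, p=1.2,
                  m_min=2.0, b=1.0, t_end=100.0, seed=0):
    """Simulate a temporal ETAS catalog via Ogata's thinning algorithm.
    Intensity: lambda(t) = mu + sum_i K * exp(alpha*(m_i - m_min))
                                     / (t - t_i + c)^p."""
    rng = np.random.default_rng(seed)
    beta = b * np.log(10)          # GR magnitudes: m_min + Exponential(beta)
    times, mags = [], []

    def intensity(t):
        lam = mu
        for ti, mi in zip(times, mags):
            lam += K * np.exp(alpha * (mi - m_min)) / (t - ti + c) ** p
        return lam

    t = 0.0
    while t < t_end:
        # Between events the intensity is non-increasing, so the current
        # value is a valid upper bound for thinning.
        lam_bar = intensity(t)
        t += rng.exponential(1.0 / lam_bar)
        if t >= t_end:
            break
        if rng.uniform() <= intensity(t) / lam_bar:   # accept w.p. lambda/lam_bar
            times.append(t)
            mags.append(m_min + rng.exponential(1.0 / beta))
    return np.array(times), np.array(mags)
```

Fitting a model to such a catalog and checking whether it recovers the generating parameters is the validation strategy the synthetic experiment describes.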
Model optimization and analysis
Model optimization and analysis covered three aspects: ① Architectural depth. Comparative studies of model depth revealed that a 6-layer Transformer configuration achieved optimal performance, balancing computational efficiency and predictive power; shallower architectures (e.g., four layers) exhibited underfitting on complex sequences, while deeper models (more than eight layers) showed diminishing returns. ② Activation function selection. Experiments with ReLU, ELU, and Swish activations indicated minimal performance differences, though ELU marginally improved training stability due to its smooth gradient properties. ③ Comparative analysis with machine learning baselines. TransSeisNet outperformed alternative machine learning architectures, including LSTM and GRU networks, which struggled to simultaneously capture long-term dependencies and enforce physical constraints.
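For reference, the three activations compared above differ mainly in how they treat negative inputs: ReLU clips them to zero, while ELU and Swish remain smooth there, which is the gradient property credited with ELU's training stability. A NumPy sketch of their standard definitions:

```python
import numpy as np

def relu(x):
    """Rectified linear unit: max(0, x); zero gradient for x < 0."""
    return np.maximum(0.0, x)

def elu(x, alpha=1.0):
    """Exponential linear unit: smooth and bounded below for x < 0.
    (np.minimum avoids overflow in exp for large positive x.)"""
    return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))

def swish(x):
    """Swish (SiLU): x * sigmoid(x), smooth everywhere."""
    return x / (1.0 + np.exp(-x))
```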
Conclusion
TransSeisNet represents a significant advancement in seismicity prediction by unifying data-driven machine learning with empirical seismological principles. Its dual knowledge-data-driven framework addresses limitations of both traditional statistical methods (e.g., rigid parametric assumptions) and purely machine learning approaches (e.g., lack of interpretability). The model’s success underscores the value of integrating domain knowledge into neural network design, particularly in geophysical applications where physical plausibility is paramount. Future work will focus on extending the framework to incorporate real-time geodetic data (e.g., GNSS measurements) and multi-physics simulations, further enhancing its utility for operational earthquake forecasting and hazard assessment.