Publication

Accelerating Fokker-Planck Simulations by Substituting the Moment Closure with a GPU-Native Deep Neural Network

Roohi, Ehsan
Abstract
Particle-based Fokker-Planck (FP) models represent a high-fidelity method for simulating rarefied gas dynamics, but they suffer from a severe computational bottleneck: the “closure problem.” This step requires the expensive, cell-wise calculation of high-order moments and the solution of a 9 × 9 linear system at every simulation time step. This paper introduces a new computational methodology designed to eliminate this bottleneck by substituting the physics-based solver with a Deep Neural Network (DNN) surrogate deployed via a novel, high-performance strategy. Our workflow makes a critical distinction between a complex offline training phase (where a 16-256-256-256-256-9 DNN is trained) and a lightweight online inference phase. Crucially, for online deployment, we avoid all framework overhead and I/O bottlenecks by extracting the raw parameters (weights and biases) and executing the model’s forward pass as a simple, batched matrix-multiplication function written natively in CuPy, ensuring all operations remain on the GPU. We validate this approach through a rigorous, multi-stage test campaign. First, for 1D Couette flow, a model trained on a Knudsen number sweep (Kn ≈ 0.0015–0.3) demonstrates outstanding accuracy in both interpolation (Kn = 0.05 and Kn = 0.09) and significant extrapolation (Kn = 0.7). To test fundamental generalization, we deployed this 1D-trained model to the 2D cavity geometry. This test yielded excellent agreement for velocity and density structures but produced minor, localized errors in the temperature field, confirming that representative multi-dimensional data is required for full thermal accuracy. Consequently, a robust 2D cavity model, trained on a lid velocity sweep (50 m/s to 600 m/s), proves capable of extreme extrapolation, accurately predicting the complex, high-energy physics of a hyper-velocity 800 m/s case. The primary finding of this work is a fundamental shift in the computational paradigm for this method.
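The online inference step described in the abstract, a forward pass written as batched matrix multiplications over raw weights and biases, can be illustrated with a minimal sketch. Only the layer sizes (16-256-256-256-256-9) come from the abstract. The ReLU activation, the linear output layer, the random placeholder weights, and the names `init_params` and `forward` are assumptions made for illustration. NumPy is used here so the sketch runs without a GPU; the same `@` and `maximum` operations are available on CuPy arrays, so replacing the import with `import cupy as np` should keep the computation on the device.

```python
import numpy as np  # stand-in for CuPy; the calls used below have CuPy equivalents

# Layer widths taken from the abstract: 16 inputs, four hidden layers, 9 outputs.
LAYER_SIZES = [16, 256, 256, 256, 256, 9]


def init_params(layer_sizes, seed=0):
    """Build placeholder (W, b) pairs.

    Hypothetical helper: in practice the arrays would be the raw parameters
    extracted from the trained model, not random values.
    """
    rng = np.random.default_rng(seed)
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        W = rng.standard_normal((n_in, n_out)) * np.sqrt(2.0 / n_in)
        b = np.zeros(n_out)
        params.append((W, b))
    return params


def forward(x, params):
    """Batched forward pass: x is (n_cells, 16), result is (n_cells, 9)."""
    h = x
    for W, b in params[:-1]:
        # One matrix multiplication per layer for all cells at once.
        # ReLU is an assumed activation; the abstract does not name one.
        h = np.maximum(h @ W + b, 0.0)
    W, b = params[-1]
    return h @ W + b  # assumed linear output layer


params = init_params(LAYER_SIZES)
cell_features = np.ones((1000, 16))   # one row of 16 inputs per cell
closure = forward(cell_features, params)
print(closure.shape)                  # (1000, 9)
```

Because every cell is a row of the same input matrix, the whole closure evaluation reduces to five dense matrix products per time step, with no per-cell linear solve.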
Performance benchmarks show a 1.63× to 1.73× speedup, but critically, a strong-scaling analysis proves this acceleration reaches the theoretical maximum predicted by Amdahl’s Law. This result provides a definitive insight: the GPU-native surrogate is, for all practical purposes, “infinitely fast” (zero-cost) relative to the remaining tasks, and the true computational bottleneck has been decisively shifted from the physics solver to the particle moment-gathering process itself.
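The Amdahl's Law argument can be made concrete with a short calculation. If a fraction p of the runtime is accelerated by a factor s, the overall speedup is 1 / ((1 - p) + p / s), which tends to 1 / (1 - p) as s grows without bound. The sketch below inverts that limit to estimate the runtime fraction the replaced closure step would have occupied, assuming the reported speedups sit exactly at the Amdahl limit and that the closure step is the only part accelerated. The function names are illustrative, and the resulting fractions are an inference from the abstract's figures, not values reported by the authors.

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of runtime is made s times faster."""
    return 1.0 / ((1.0 - p) + p / s)


def max_speedup(p):
    """Limit of amdahl_speedup as s -> infinity (the accelerated part costs nothing)."""
    return 1.0 / (1.0 - p)


def implied_fraction(speedup_limit):
    """Runtime fraction p that would give this limiting speedup."""
    return 1.0 - 1.0 / speedup_limit


# Speedups quoted in the abstract, treated here as the Amdahl limit (assumption).
for reported in (1.63, 1.73):
    p = implied_fraction(reported)
    print(f"limit {reported:.2f}x -> accelerated fraction of about {p:.3f}")
```

Under these assumptions the original closure step would account for roughly 39% to 42% of the baseline runtime, and no further acceleration of the surrogate could raise the overall speedup beyond the quoted range.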
Type
Article
Date
2025-12-02