Emulation of CPU-demanding reactive transport models: a comparison of Gaussian processes, polynomial chaos expansion, and deep neural networks

Research output: peer-review


Simulating the fate and transport behavior of radionuclides and other reactive solutes in the vadose zone and aquifers requires reactive transport models (RTMs). These RTMs can be computationally demanding, and any task that requires many RTM runs may benefit from the construction of an emulator or “surrogate” model. Here we present a detailed benchmarking of three methods for the non-intrusive emulation of moderately low-dimensional (that is, 8- to 13-dimensional) CPU-intensive reactive transport models: Gaussian processes (GP), polynomial chaos expansion (PCE), and deep neural networks (DNNs). State-of-the-art open-source libraries are used for each emulation method, while the CPU time incurred by one forward run of the two considered RTMs varies from 1 h to between 1 h 30 min and 5 days. Even with distributed computing, these large computational demands limit the offline creation of training examples to at most 500 samples. Furthermore, we consider four emulation-based tasks: (1) direct or plain emulation, (2) global sensitivity analysis (GSA), (3) uncertainty propagation (UP), and (4) probabilistic calibration or inversion. Overall, our selected DNN is found to outperform GP and PCE for plain emulation, GSA, and UP, even though the training sets contain only 75 to 500 samples. Most surprisingly, despite its superior emulation capabilities, the chosen DNN is the worst-performing method for the considered synthetic inverse problem, which involves 1224 low-noise measurements. This is at least partially caused by the (very) small but complex deterministic noise that plagues the DNN-based predictions. This complicated bias can indeed drive the emulated solutions far away from the true solution when the available measurement data are of high quality. Among the three considered methods, only GP allows for finding emulated posterior solutions that simultaneously (1) fit the synthetic high-quality measurement data to the correct noise levels and (2) most closely approximate the true model parameter values.
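The point about small deterministic emulator errors and high-quality data can be illustrated with a toy calculation. The sketch below is not taken from the study: the signal, the noise level `sigma`, and the bias amplitude `bias_amp` are hypothetical values chosen for illustration, and only the number of measurements (1224) comes from the abstract. It compares the noise-weighted RMSE of an exact model against that of an "emulator" whose predictions differ from the truth by a small oscillating error.

```python
import math
import random


def weighted_rmse(data, prediction, sigma):
    """RMSE of the residuals after scaling by the noise standard deviation.

    A value close to 1 means the data are fitted to the noise level.
    """
    n = len(data)
    return math.sqrt(sum(((d - p) / sigma) ** 2 for d, p in zip(data, prediction)) / n)


random.seed(0)

n_obs = 1224      # number of measurements, as stated in the abstract
sigma = 0.01      # hypothetical (low) measurement noise standard deviation
bias_amp = 0.02   # hypothetical amplitude of the emulator's deterministic error

# Toy "true" model response and noisy synthetic measurements (assumptions).
t = [i / (n_obs - 1) for i in range(n_obs)]
truth = [math.sin(2.0 * math.pi * ti) for ti in t]
data = [y + random.gauss(0.0, sigma) for y in truth]

# Toy emulator evaluated at the true parameters: truth plus a small,
# rapidly oscillating deterministic error.
emulated = [y + bias_amp * math.sin(37.0 * ti) for y, ti in zip(truth, t)]

wrmse_exact = weighted_rmse(data, truth, sigma)
wrmse_emulated = weighted_rmse(data, emulated, sigma)

print(f"exact model    : weighted RMSE = {wrmse_exact:.2f}")
print(f"biased emulator: weighted RMSE = {wrmse_emulated:.2f}")
```

In this toy setting the exact model fits the data to the noise level, whereas the biased emulator cannot do so at the true parameter values. A sampler working with such an emulator would therefore tend to move away from the true parameters to compensate, and the effect grows as `sigma` shrinks relative to `bias_amp`.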
Original language: English
Pages (from-to): 1193-1215
Number of pages: 23
Journal: Computational Geosciences
Issue number: 5
State: Published - 1 Oct 2019
