2024

Z. Bastiani, R.M. Kirby, J. Hochhalter, S. Zhe.
**“Complexity-Aware Deep Symbolic Regression with Robust Risk-Seeking Policy Gradients,”** Subtitled **“arXiv:2406.06751,”** 2024.

This paper proposes a novel deep symbolic regression approach to enhance the robustness and interpretability of data-driven mathematical expression discovery. Despite the success of the state-of-the-art method, DSR, it is built on recurrent neural networks, purely guided by data fitness, and potentially meet tail barriers, which can zero out the policy gradient and cause inefficient model updates. To overcome these limitations, we use transformers in conjunction with breadth-first-search to improve the learning performance. We use Bayesian information criterion (BIC) as the reward function to explicitly account for the expression complexity and optimize the trade-off between interpretability and data fitness. We propose a modified risk-seeking policy that not only ensures the unbiasness of the gradient, but also removes the tail barriers, thus ensuring effective updates from top performers. Through a series of benchmarks and systematic experiments, we demonstrate the advantages of our approach.

M. Cooley, S. Zhe, R.M. Kirby, V. Shankar.
**“Polynomial-Augmented Neural Networks (PANNs) with Weak Orthogonality Constraints for Enhanced Function and PDE Approximation,”** Subtitled **“arXiv preprint arXiv:2406.02336,”** 2024.

We present polynomial-augmented neural networks (PANNs), a novel machine learning architecture that combines deep neural networks (DNNs) with a polynomial approximant. PANNs combine the strengths of DNNs (flexibility and efficiency in higher-dimensional approximation) with those of polynomial approximation (rapid convergence rates for smooth functions). To aid in both stable training and enhanced accuracy over a variety of problems, we present (1) a family of orthogonality constraints that impose mutual orthogonality between the polynomial and the DNN within a PANN; (2) a simple basis pruning approach to combat the curse of dimensionality introduced by the polynomial component; and (3) an adaptation of a polynomial preconditioning strategy to both DNNs and polynomials. We test the resulting architecture for its polynomial reproduction properties, ability to approximate both smooth functions and functions of limited smoothness, and as a method for the solution of partial differential equations (PDEs). Through these experiments, we demonstrate that PANNs offer superior approximation properties to DNNs for both regression and the numerical solution of PDEs, while also offering enhanced accuracy over both polynomial and DNN-based regression (each) when regressing functions with limited smoothness.

M. Penwarden, H. Owhadi, R.M. Kirby.
**“Kolmogorov n-Widths for Multitask Physics-Informed Machine Learning (PIML) Methods: Towards Robust Metrics,”** Subtitled **“arXiv preprint arXiv:2402.11126,”** 2024.

Physics-informed machine learning (PIML) as a means of solving partial differential equations (PDE) has garnered much attention in the Computational Science and Engineering (CS&E) world. This topic encompasses a broad array of methods and models aimed at solving a single or a collection of PDE problems, called multitask learning. PIML is characterized by the incorporation of physical laws into the training process of machine learning models in lieu of large data when solving PDE problems. Despite the overall success of this collection of methods, it remains incredibly difficult to analyze, benchmark, and generally compare one approach to another. Using Kolmogorov n-widths as a measure of effectiveness of approximating functions, we judiciously apply this metric in the comparison of various multitask PIML architectures. We compute lower accuracy bounds and analyze the model's learned basis functions on various PDE problems. This is the first objective metric for comparing multitask PIML architectures and helps remove uncertainty in model validation from selective sampling and overfitting. We also identify avenues of improvement for model architectures, such as the choice of activation function, which can drastically affect model generalization to "worst-case" scenarios, which is not observed when reporting task-specific errors. We also incorporate this metric into the optimization process through regularization, which improves the models' generalizability over the multitask PDE problem.

2023

B. Charoenwong, R.M. Kirby, J. Reiter.
**“Computer Science Abstractions To Help Reason About Decentralized Stablecoin Design,”** In *IEEE Access*, IEEE, 2023.

Computer science as a discipline is known for its penchant for using abstractions as a tool for reasoning. It is no surprise that computer science might have something valuable to lend to the world of decentralized stablecoin design, as it is in fact a “computing" problem. In this paper, we examine the possibility of a decentralized and capital-efficient stablecoin using smart contracts that algorithmically trade to maintain stability and study the potential new functionality that smart contracts enable. By exploiting traditional abstractions from computer science, we show that a capital-efficient algorithmic stablecoin cannot be provably stable. Additionally, we provide a formal exposition of the workings of Central Bank Digital Currencies, connecting this to the space of possible stablecoin designs. We then discuss several outstanding conjectures from both academics and practitioners and finally highlight the regulatory similarities between money-market funds and working stablecoins. Our work builds upon the current and growing interplay between the realms of engineering and financial services, and it also demonstrates how ways of thinking as a computer scientist can aid practitioners. We believe this research is vital for understanding and developing the future of financial technology.

H. Dai, M. Penwarden, R.M. Kirby, S. Joshi.
**“Neural Operator Learning for Ultrasound Tomography Inversion,”** Subtitled **“arXiv:2304.03297v1,”** 2023.

Neural operator learning as a means of mapping between complex function spaces has garnered significant attention in the field of computational science and engineering (CS&E). In this paper, we apply Neural operator learning to the time-of-flight ultrasound computed tomography (USCT) problem. We learn the mapping between time-of-flight (TOF) data and the heterogeneous sound speed field using a full-wave solver to generate the training data. This novel application of operator learning circumnavigates the need to solve the computationally intensive iterative inverse problem. The operator learns the non-linear mapping offline and predicts the heterogeneous sound field with a single forward pass through the model. This is the first time operator learning has been used for ultrasound tomography and is the first step in potential real-time predictions of soft tissue distribution for tumor identification in beast imaging.

S. Fang, S. Zhe, H.M. Lin, A.A. Azad, H. Fettke, E.M. Kwan, L. Horvath, B. Mak, T. Zheng, P. Du, S. Jia, R.M. Kirby, M. Kohli.
**“Multi-Omic Integration of Blood-Based Tumor-Associated Genomic and Lipidomic Profiles Using Machine Learning Models in Metastatic Prostate Cancer,”** In *Clinical Cancer Informatics*, 2023.

PURPOSE

To determine prognostic and predictive clinical outcomes in metastatic hormone-sensitive prostate cancer (mHSPC) and metastatic castrate-resistant prostate cancer (mCRPC) on the basis of a combination of plasma-derived genomic alterations and lipid features in a longitudinal cohort of patients with advanced prostate cancer.

METHODS

A multifeature classifier was constructed to predict clinical outcomes using plasma-based genomic alterations detected in 120 genes and 772 lipidomic species as informative features in a cohort of 71 patients with mHSPC and 144 patients with mCRPC. Outcomes of interest were collected over 11 years of follow-up. These included in mHSPC state early failure of androgen-deprivation therapy (ADT) and exceptional responders to ADT; early death (poor prognosis) and long-term survivors in mCRPC state. The approach was to build binary classification models that identified discriminative candidates with optimal weights to predict outcomes. To achieve this, we built multi-omic feature-based classifiers using traditional machine learning (ML) methods, including logistic regression with sparse regularization, multi-kernel Gaussian process regression, and support vector machines.

RESULTS

The levels of specific ceramides (d18:1/14:0 and d18:1/17:0), and the presence of CHEK2 mutations, AR amplification, and RB1 deletion were identified as the most crucial factors associated with clinical outcomes. Using ML models, the optimal multi-omics feature combination determined resulted in AUC scores of 0.751 for predicting mHSPC survival and 0.638 for predicting ADT failure; and in mCRPC state, 0.687 for prognostication and 0.727 for exceptional survival. The models were observed to be superior than using a limited candidate number of features for developing multi-omic prognostic and predictive signatures.

CONCLUSION

Using a ML approach that incorporates multiple omic features improves the prediction accuracy for metastatic prostate cancer outcomes significantly. Validation of these models will be needed in independent data sets in future.

S. Li, X. Yu, W. Xing, R.M. Kirby, A. Narayan, S. Zhe.
**“Multi-Resolution Active Learning of Fourier Neural Operators,”** Subtitled **“arXiv:2309.16971,”** 2023.

Fourier Neural Operator (FNO) is a popular operator learning framework. It not only achieves the state-of-the-art performance in many tasks, but also is highly efficient in training and prediction. However, collecting training data for the FNO can be a costly bottleneck in practice, because it often demands expensive physical simulations. To overcome this problem, we propose Multi-Resolution Active learning of FNO (MRA-FNO), which can dynamically select the input functions and resolutions to lower the data cost as much as possible while optimizing the learning efficiency. Specifically, we propose a probabilistic multi-resolution FNO and use ensemble Monte-Carlo to develop an effective posterior inference algorithm. To conduct active learning, we maximize a utility-cost ratio as the acquisition function to acquire new examples and resolutions at each step. We use moment matching and the matrix determinant lemma to enable tractable, efficient utility computation. Furthermore, we develop a cost annealing framework to avoid over-penalizing high-resolution queries at the early stage. The over-penalization is severe when the cost difference is significant between the resolutions, which renders active learning often stuck at low-resolution queries and inferior performance. Our method overcomes this problem and applies to general multi-fidelity active learning and optimization problems. We have shown the advantage of our method in several benchmark operator learning tasks.

D. Long, W.W. Xing, A.S. Krishnapriyan, R.M. Kirby, S. Zhe, M.W. Mahoney.
**“Equation Discovery with Bayesian Spike-and-Slab Priors and Efficient Kernels,”** Subtitled **“arXiv:2310.05387v1,”** 2023.

Discovering governing equations from data is important to many scientific and engineering applications. Despite promising successes, existing methods are still challenged by data sparsity as well as noise issues, both of which are ubiquitous in practice. Moreover, state-of-the-art methods lack uncertainty quantification and/or are costly in training. To overcome these limitations, we propose a novel equation discovery method based on Kernel learning and BAyesian Spike-and-Slab priors (KBASS). We use kernel regression to estimate the target function, which is flexible, expressive, and more robust to data sparsity and noises. We combine it with a Bayesian spike-and-slab prior — an ideal Bayesian sparse distribution — for effective operator selection and uncertainty quantification. We develop an expectation propagation expectation-maximization (EP-EM) algorithm for efficient posterior inference and function estimation. To overcome the computational challenge of kernel regression, we place the function values on a mesh and induce a Kronecker product construction, and we use tensor algebra methods to enable efficient computation and optimization. We show the significant advantages of KBASS on a list of benchmark ODE and PDE discovery tasks.

H. Oh, R. Amici, G. Bomarito, S. Zhe, R.M. Kirby, J. Hochhalter.
**“Inherently interpretable machine learning solutions to differential equations,”** In *Engineering with Computers*, 2023.

A machine learning method for the discovery of analytic solutions to differential equations is assessed. The method utilizes an inherently interpretable machine learning algorithm, genetic programming-based symbolic regression. An advantage of its interpretability is the output of symbolic expressions that can be used to assess error in algebraic terms, as opposed to purely numerical quantities. Therefore, models output by the developed method are verified by assessing its ability to recover known analytic solutions for two differential equations, as opposed to assessing numerical error. To demonstrate its improvement, the developed method is compared to a conventional, purely data-driven genetic programming-based symbolic regression algorithm. The reliability of successful evolution of the true solution, or an algebraic equivalent, is demonstrated.

T. A. J. Ouermi, R. M Kirby, M. Berzins.
**“HiPPIS A High-Order Positivity-Preserving Mapping Software for Structured Meshes,”** In *ACM Trans. Math. Softw*, ACM, Nov, 2023.

ISSN: 0098-3500

DOI: 10.1145/3632291

Polynomial interpolation is an important component of many computational problems. In several of these computational problems, failure to preserve positivity when using polynomials to approximate or map data values between meshes can lead to negative unphysical quantities. Currently, most polynomial-based methods for enforcing positivity are based on splines and polynomial rescaling. The spline-based approaches build interpolants that are positive over the intervals in which they are defined and may require solving a minimization problem and/or system of equations. The linear polynomial rescaling methods allow for high-degree polynomials but enforce positivity only at limited locations (e.g., quadrature nodes). This work introduces open-source software (HiPPIS) for high-order data-bounded interpolation (DBI) and positivity-preserving interpolation (PPI) that addresses the limitations of both the spline and polynomial rescaling methods. HiPPIS is suitable for approximating and mapping physical quantities such as mass, density, and concentration between meshes while preserving positivity. This work provides Fortran and Matlab implementations of the DBI and PPI methods, presents an analysis of the mapping error in the context of PDEs, and uses several 1D and 2D numerical examples to demonstrate the benefits and limitations of HiPPIS.

M. Penwarden, S. Zhe, A. Narayan, R.M. Kirby.
**“A Metalearning Approach for Physics-Informed Neural Networks (PINNs): Application to Parameterized PDEs,”** In *Journal of Computational Physics*, Elsevier, 2023.

DOI: https://doi.org/10.1016/j.jcp.2023.111912

Physics-informed neural networks (PINNs) as a means of discretizing partial differential equations (PDEs) are garnering much attention in the Computational Science and Engineering (CS&E) world. At least two challenges exist for PINNs at present: an understanding of accuracy and convergence characteristics with respect to tunable parameters and identification of optimization strategies that make PINNs as efficient as other computational science tools. The cost of PINNs training remains a major challenge of Physics-informed Machine Learning (PiML) – and, in fact, machine learning (ML) in general. This paper is meant to move towards addressing the latter through the study of PINNs on new tasks, for which parameterized PDEs provides a good testbed application as tasks can be easily defined in this context. Following the ML world, we introduce metalearning of PINNs with application to parameterized PDEs. By introducing metalearning and transfer learning concepts, we can greatly accelerate the PINNs optimization process. We present a survey of model-agnostic metalearning, and then discuss our model-aware metalearning applied to PINNs as well as implementation considerations and algorithmic complexity. We then test our approach on various canonical forward parameterized PDEs that have been presented in the emerging PINNs literature.

M. Penwarden, A.D. Jagtap, S. Zhe, G.E. Karniadakis, R.M. Kirby.
**“A unified scalable framework for causal sweeping strategies for Physics-Informed Neural Networks (PINNs) and their temporal decompositions,”** Subtitled **“arXiv:2302.14227v1,”** 2023.

Physics-informed neural networks (PINNs) as a means of solving partial differential equations (PDE) have garnered much attention in the Computational Science and Engineering (CS&E) world. However, a recent topic of interest is exploring various training (i.e., optimization) challenges – in particular, arriving at poor local minima in the optimization landscape results in a PINN approximation giving an inferior, and sometimes trivial, solution when solving forward time-dependent PDEs with no data. This problem is also found in, and in some sense more difficult, with domain decomposition strategies such as temporal decomposition using XPINNs. To address this problem, we first enable a general categorization for previous causality methods, from which we identify a gap (e.g., opportunity) in the previous approaches. We then furnish examples and explanations for different training challenges, their cause, and how they relate to information propagation and temporal decomposition. We propose a solution to fill this gap by reframing these causality concepts into a generalized information propagation framework in which any prior method or combination of methods can be described. This framework is easily modifiable via user parameters in the open-source code accompanying this paper. Our unified framework moves toward reducing the number of PINN methods to consider and the reimplementation and retuning cost for thorough comparisons rather than increasing it. Using the idea of information propagation, we propose a new stacked-decomposition method that bridges the gap between time-marching PINNs and XPINNs. We also introduce significant computational speed-ups by using transfer learning concepts to initialize subnetworks in the domain and loss tolerance-based propagation for the subdomains. Finally, we formulate a new time-sweeping collocation point algorithm inspired by the previous PINNs causality literature, which our framework can still describe, and provides a significant computational speed-up via reduced-cost collocation point segmentation. The proposed methods overcome training challenges in PINNs and XPINNs for time-dependent PDEs by respecting the causality in multiple forms and improving scalability by limiting the computation required per optimization iteration. Finally, we provide numerical results for these methods on baseline PDE problems for which unmodified PINNs and XPINNs struggle to train.

K. Shukla, V. Oommen, A. Peyvan, M. Penwarden, L. Bravo, A. Ghoshal, R.M. Kirby, G. Karniadakis.
**“Deep neural operators can serve as accurate surrogates for shape optimization: A case study for airfoils,”** Subtitled **“arXiv:2302.00807v1,”** 2023.

Deep neural operators, such as DeepONets, have changed the paradigm in high-dimensional nonlinear regression from function regression to (differential) operator regression, paving the way for significant changes in computational engineering applications. Here, we investigate the use of DeepONets to infer flow fields around unseen airfoils with the aim of shape optimization, an important design problem in aerodynamics that typically taxes computational resources heavily. We present results which display little to no degradation in prediction accuracy, while reducing the online optimization cost by orders of magnitude. We consider NACA airfoils as a test case for our proposed approach, as their shape can be easily defined by the four-digit parametrization. We successfully optimize the constrained NACA four-digit problem with respect to maximizing the lift-to-drag ratio and validate all results by comparing them to a high-order CFD solver. We find that DeepONets have low generalization error, making them ideal for generating solutions of unseen shapes. Specifically, pressure, density, and velocity fields are accurately inferred at a fraction of a second, hence enabling the use of general objective functions beyond the maximization of the lift-to-drag ratio considered in the current work.

K. Shukla, V. Oommen, A. Peyvan, M. Penwarden, N. Plewacki, L. Bravo, A. Ghoshal, R.M. Kirby, G. Karniadakis.
**“Deep neural operators as accurate surrogates for shape optimization,”** In *Engineering Applications of Artificial Intelligence*, Vol. 129, pp. 107615. 2023.

ISSN: 0952-1976

Deep neural operators, such as DeepONet, have changed the paradigm in high-dimensional nonlinear regression, paving the way for significant generalization and speed-up in computational engineering applications. Here, we investigate the use of DeepONet to infer flow fields around unseen airfoils with the aim of shape constrained optimization, an important design problem in aerodynamics that typically taxes computational resources heavily. We present results that display little to no degradation in prediction accuracy while reducing the online optimization cost by orders of magnitude. We consider NACA airfoils as a test case for our proposed approach, as the four-digit parameterization can easily define their shape. We successfully optimize the constrained NACA four-digit problem with respect to maximizing the lift-to-drag ratio and validate all results by comparing them to a high-order CFD solver. We find that DeepONets have a low generalization error, making them ideal for generating solutions of unseen shapes. Specifically, pressure, density, and velocity fields are accurately inferred at a fraction of a second, hence enabling the use of general objective functions beyond the maximization of the lift-to-drag ratio considered in the current work. Finally, we validate the ability of DeepONet to handle a complex 3D waverider geometry at hypersonic flight by inferring shear stress and heat flux distributions on its surface at unseen angles of attack. The main contribution of this paper is a modular integrated design framework that uses an over-parametrized neural operator as a surrogate model with good generalizability coupled seamlessly with multiple optimization solvers in a plug-and-play mode.

2022

S. Fang, A. Narayan, R.M. Kirby, S. Zhe.
**“Bayesian Continuous-Time Tucker Decomposition,”** In *Proceedings of the 39 th International Conference on Machine Learning*, 2022.

Tensor decomposition is a dominant framework for multiway data analysis and prediction. Although practical data often contains timestamps for the observed entries, existing tensor decomposition approaches overlook or under-use this valuable time information. They either drop the timestamps or bin them into crude steps and hence ignore the temporal dynamics within each step or use simple parametric time coefficients. To overcome these limitations, we propose Bayesian Continuous-Time Tucker Decomposition (BCTT). We model the tensor-core of the classical Tucker decomposition as a time-varying function, and place a Gaussian process prior to flexibly estimate all kinds of temporal dynamics. In this way, our model maintains the interpretability while is flexible enough to capture various complex temporal relationships between the tensor nodes. For efficient and high-quality posterior inference, we use the stochastic differential equation (SDE) representation of temporal GPs to build an equivalent state-space prior, which avoids huge kernel matrix computation and sparse/low-rank approximations. We then use Kalman filtering, RTS smoothing, and conditional moment matching to develop a scalable message-passing inference algorithm. We show the advantage of our method in simulation and several real-world applications.

J.D. Hogue, R.M. Kirby, A. Narayan.
**“Dimensionality Reduction in Deep Learning via Kronecker Multi-layer Architectures,”** Subtitled **“arXiv:2204.04273,”** 2022.

Deep learning using neural networks is an effective technique for generating models of complex data. However, training such models can be expensive when networks have large model capacity resulting from a large number of layers and nodes. For training in such a computationally prohibitive regime, dimensionality reduction techniques ease the computational burden, and allow implementations of more robust networks. We propose a novel type of such dimensionality reduction via a new deep learning architecture based on fast matrix multiplication of a Kronecker product decomposition; in particular our network construction can be viewed as a Kronecker product-induced sparsification of an "extended" fully connected network. Analysis and practical examples show that this architecture allows a neural network to be trained and implemented with a significant reduction in computational time and resources, while achieving a similar error level compared to a traditional feedforward neural network.

V. Keshavarzzadeh, R.M. Kirby, A. Narayan.
**“Variational Inference for Nonlinear Inverse Problems via Neural Net Kernels: Comparison to Bayesian Neural Networks, Application to Topology Optimization,”** Subtitled **“arXiv:2205.03681,”** 2022.

Inverse problems and, in particular, inferring unknown or latent parameters from data are ubiquitous in engineering simulations. A predominant viewpoint in identifying unknown parameters is Bayesian inference where both prior information about the parameters and the information from the observations via likelihood evaluations are incorporated into the inference process. In this paper, we adopt a similar viewpoint with a slightly different numerical procedure from standard inference approaches to provide insight about the localized behavior of unknown underlying parameters. We present a variational inference approach which mainly incorporates the observation data in a point-wise manner, i.e. we invert a limited number of observation data leveraging the gradient information of the forward map with respect to parameters, and find true individual samples of the latent parameters when the forward map is noise-free and one-to-one. For statistical calculations (as the ultimate goal in simulations), a large number of samples are generated from a trained neural network which serves as a transport map from the prior to posterior latent parameters. Our neural network machinery, developed as part of the inference framework and referred to as Neural Net Kernels (NNK), is based on hierarchical (deep) kernels which provide greater flexibility for training compared to standard neural networks. We showcase the effectiveness of our inference procedure in identifying bimodal and irregular distributions compared to a number of approaches including Markov Chain Monte Carlo sampling approaches and a Bayesian neural network approach.

S. Li, R.M. Kirby, S. Zhe.
**“Decomposing Temporal High-Order Interactions via Latent ODEs,”** In *Proceedings of the 39 th International Conference on Machine Learning*, 2022.

High-order interactions between multiple objects are common in real-world applications. Although tensor decomposition is a popular framework for high-order interaction analysis and prediction, most methods cannot well exploit the valuable timestamp information in data. The existent methods either discard the timestamps or convert them into discrete steps or use over-simplistic decomposition models. As a result, these methods might not be capable enough of capturing complex, finegrained temporal dynamics or making accurate predictions for long-term interaction results. To overcome these limitations, we propose a novel Temporal High-order Interaction decompoSition model based on Ordinary Differential Equations (THIS-ODE). We model the time-varying interaction result with a latent ODE. To capture the complex temporal dynamics, we use a neural network (NN) to learn the time derivative of the ODE state. We use the representation of the interaction objects to model the initial value of the ODE and to constitute a part of the NN input to compute the state. In this way, the temporal relationships of the participant objects can be estimated and encoded into their representations. For tractable and scalable inference, we use forward sensitivity analysis to efficiently compute the gradient of ODE state, based on which we use integral transform to develop a stochastic mini-batch learning algorithm. We demonstrate the advantage of our approach in simulation and four real-world applications.

S. Li, Z Wang, R.M. Kirby, S. Zhe.
**“Infinite-Fidelity Coregionalization for Physical Simulation,”** Subtitled **“arXiv:2207.00678,”** 2022.

Multi-fidelity modeling and learning are important in physical simulation-related applications. It can leverage both low-fidelity and high-fidelity examples for training so as to reduce the cost of data generation while still achieving good performance. While existing approaches only model finite, discrete fidelities, in practice, the fidelity choice is often continuous and infinite, which can correspond to a continuous mesh spacing or finite element length. In this paper, we propose Infinite Fidelity Coregionalization (IFC). Given the data, our method can extract and exploit rich information within continuous, infinite fidelities to bolster the prediction accuracy. Our model can interpolate and/or extrapolate the predictions to novel fidelities, which can be even higher than the fidelities of training data. Specifically, we introduce a low-dimensional latent output as a continuous function of the fidelity and input, and multiple it with a basis matrix to predict high-dimensional solution outputs. We model the latent output as a neural Ordinary Differential Equation (ODE) to capture the complex relationships within and integrate information throughout the continuous fidelities. We then use Gaussian processes or another ODE to estimate the fidelity-varying bases. For efficient inference, we reorganize the bases as a tensor, and use a tensor-Gaussian variational posterior to develop a scalable inference algorithm for massive outputs. We show the advantage of our method in several benchmark tasks in computational physics.

S. Li, J.M. Phillips, X. Yu, R.M. Kirby, S. Zhe.
**“Batch Multi-Fidelity Active Learning with Budget Constraints,”** Subtitled **“arXiv:2210.12704v1,”** 2022.

Learning functions with high-dimensional outputs is critical in many applications, such as physical simulation and engineering design. However, collecting training examples for these applications is often costly, e.g. by running numerical solvers. The recent work (Li et al., 2022) proposes the first multi-fidelity active learning approach for high-dimensional outputs, which can acquire examples at different fidelities to reduce the cost while improving the learning performance. However, this method only queries at one pair of fidelity and input at a time, and hence has a risk to bring in strongly correlated examples to reduce the learning efficiency. In this paper, we propose Batch Multi-Fidelity Active Learning with Budget Constraints (BMFAL-BC), which can promote the diversity of training examples to improve the benefit-cost ratio, while respecting a given budget constraint for batch queries. Hence, our method can be more practically useful. Specifically, we propose a novel batch acquisition function that measures the mutual information between a batch of multi-fidelity queries and the target function, so as to penalize highly correlated queries and encourages diversity. The optimization of the batch acquisition function is challenging in that it involves a combinatorial search over many fidelities while subject to the budget constraint. To address this challenge, we develop a weighted greedy algorithm that can sequentially identify each (fidelity, input) pair, while achieving a near -approximation of the optimum. We show the advantage of our method in several computational physics and engineering applications.