
Scientific Computing

Numerical simulation of real-world phenomena provides fertile ground for building interdisciplinary relationships. The SCI Institute has a long tradition of building these relationships in a win-win fashion – a win for the theoretical and algorithmic development of numerical modeling and simulation techniques and a win for the discipline-specific science of interest. High-order and adaptive methods, uncertainty quantification, complexity analysis, and parallelization are just some of the topics being investigated by SCI faculty. These areas of computing are being applied to a wide variety of engineering applications ranging from fluid mechanics and solid mechanics to bioelectricity.



Martin Berzins

Parallel Computing
GPUs

Mike Kirby

Finite Element Methods
Uncertainty Quantification
GPUs

Valerio Pascucci

Scientific Data Management

Chris Johnson

Problem Solving Environments

Amir Arzani

Scientific machine learning
Data-driven fluid flow modeling

Funded Research Projects:


Publications in Scientific Computing:


Adaptive Placement of Data Analysis Tasks For Staging Based In-Situ Processing
Z. Wang, P. Subedi, M. Dorier, P.E. Davis, M. Parashar. In 2021 IEEE 28th International Conference on High Performance Computing, Data, and Analytics (HiPC), pp. 242-251. 2021.
DOI: 10.1109/HiPC53243.2021.00038

In-situ processing addresses the gap between computing speeds and I/O capabilities by processing data close to the data source, i.e., on the same system as the data source (e.g., a simulation). However, effectively implementing in-situ processing workflows requires optimizing several design parameters, such as where on the system the workflow's data analysis/visualization (ana/vis) tasks are placed and how their execution, interactions, and data exchanges are coordinated. For example, in hybrid in-situ processing, interacting ana/vis tasks may be tightly or loosely coupled depending on their placement, which can lead to very different performance and scalability. A key challenge is deciding the most appropriate ana/vis placement, which depends on dynamic application, workflow, and system characteristics that may change at runtime. In this paper, we present a framework to support online adaptive data analysis placement during the execution of an in-situ workflow. Specifically, the paper presents a model and architecture, and explores several data analysis placement strategies. Evaluation results show that dynamically choosing appropriate data analysis placement strategies can balance the benefits and overheads of different data analysis placement patterns to reduce in-situ processing time.



MDSC: Modelling Distributed Stream Processing across the Edge-to-Cloud Continuum
D. Balouek-Thomert, P. Silva, K. Fauvel, A. Costan, G. Antoniu, M. Parashar. In DML-ICC 2021 workshop (held in conjunction with UCC 2021), December, 2021.

The growth of the Internet of Things is resulting in an explosion of data volumes at the Edge of the Internet. To reduce costs incurred due to data movement and centralized cloud-based processing, it is becoming increasingly important to process and analyze such data closer to the data sources. Exploiting Edge computing capabilities for stream-based processing is, however, challenging. It requires addressing the complex characteristics and constraints imposed by all the resources along the data path, as well as the large set of heterogeneous data processing and management frameworks. Consequently, the community needs tools that can facilitate the modeling of this complexity and can integrate the various components involved. In this work, we introduce MDSC, a hierarchical approach for modeling distributed stream-based applications on Edge-to-Cloud continuum infrastructures. We demonstrate how MDSC can be applied to a concrete, real-life ML-based application (early earthquake warning) to help answer questions such as: when is it worth decentralizing the classification load from the Cloud to the Edge, and how?



GP-HMAT: Scalable, $O(n\log(n))$ Gaussian Process Regression with Hierarchical Low-Rank Matrices
Subtitled “arXiv preprint arXiv:2201.00888,” V. Keshavarzzadeh, S. Zhe, R.M. Kirby, A. Narayan. 2021.

A Gaussian process (GP) is a powerful and widely used regression technique. The main building block of GP regression is the covariance kernel, which characterizes the relationship between pairs of points in the random field. The optimization to find the optimal kernel, however, requires several large-scale and often unstructured matrix inversions. We tackle this challenge by introducing a hierarchical matrix approach, named HMAT, which effectively decomposes the matrix structure, in a recursive manner, into significantly smaller matrices where a direct approach could be used for inversion. Our matrix partitioning uses a particular aggregation strategy for data points, which promotes the low-rank structure of off-diagonal blocks in the hierarchical kernel matrix. We employ a randomized linear algebra method for matrix reduction on the low-rank off-diagonal blocks without factorizing a large matrix. We provide analytical error and cost estimates for the inversion of the matrix, investigate them empirically with numerical computations, and demonstrate the application of our approach on three numerical examples involving GP regression for engineering problems and a large-scale real dataset. We provide the computer implementation of GP-HMAT, HMAT adapted for GP likelihood and derivative computations, and the implementation of the last numerical example on a real dataset. We demonstrate superior scalability of the HMAT approach compared to MATLAB's built-in backslash (\) operator for large-scale linear solves Ax=y via a repeatable and verifiable empirical study. An extension to hierarchical semiseparable (HSS) matrices is discussed as future research.
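The core idea behind compressing off-diagonal kernel blocks can be illustrated with a minimal NumPy sketch. Kernels evaluated between well-separated groups of points are numerically low-rank, so a randomized range finder can compress such a block using only products with a few random vectors. The sketch assumes a squared-exponential kernel and two hypothetical point clusters. The function names are illustrative only, and this is not the GP-HMAT implementation.

```python
import numpy as np

def rbf_kernel(X, Y, length_scale=1.0):
    # Squared-exponential covariance k(x, y) = exp(-||x - y||^2 / (2 l^2)).
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * length_scale ** 2))

def randomized_low_rank(A, rank, oversample=10, seed=0):
    """Randomized range finder: returns Q (m x k) and B (k x n) with A ~= Q @ B."""
    rng = np.random.default_rng(seed)
    k = min(rank + oversample, min(A.shape))
    omega = rng.standard_normal((A.shape[1], k))  # random test matrix
    Q, _ = np.linalg.qr(A @ omega)                # orthonormal basis for the sampled range
    B = Q.T @ A                                   # project the block onto that basis
    return Q, B

# Hypothetical example: the off-diagonal block between two separated clusters.
rng = np.random.default_rng(1)
X1 = rng.uniform(0.0, 1.0, size=(400, 2))
X2 = rng.uniform(3.0, 4.0, size=(300, 2))
K12 = rbf_kernel(X1, X2)
Q, B = randomized_low_rank(K12, rank=15)
rel_err = np.linalg.norm(K12 - Q @ B) / np.linalg.norm(K12)
print(Q.shape, B.shape, rel_err)
```

Storing `Q` and `B` takes 25 × (400 + 300) numbers instead of 400 × 300. In a hierarchical scheme, this kind of compression is applied recursively to off-diagonal blocks, while small diagonal blocks are handled directly.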



Evaluating policy-driven adaptation on the Edge-to-Cloud Continuum
D. Balouek-Thomert, I. Rodero, M. Parashar. In IEEE/ACM HPC for Urgent Decision Making (UrgentHPC), pp. 11-20. 2021.
DOI: 10.1109/UrgentHPC54802.2021.00007

Developing data-driven applications requires developers and service providers to orchestrate data-to-discovery pipelines across distributed data sources and computing units. Realizing such pipelines poses two major challenges: programming analytics that react at runtime to unforeseen events, and adapting the resources and computing paths between the edge and the cloud. While these concerns are interdependent, they must be separated during the design process of the application and the deployment operations of the infrastructure. This work proposes a system stack for the adaptation of distributed analytics across the computing continuum. We implemented this software stack to evaluate its ability to continually balance the cost of computation or data movement against the value of operations to the application objectives. Using a disaster response application, we observe that the system can select appropriate configurations while managing trade-offs between user-defined constraints, quality of results, and resource utilization. The evaluation shows that our model is able to adapt to variations in data input size, bandwidth, and CPU capacity with minimal deadline violations (close to 10%). These results are encouraging for facilitating the creation of ad-hoc computing paths for urgent science and time-critical decision-making.



An Adaptive Elasticity Policy For Staging Based In-Situ Processing
Z. Wang, M. Dorier, P. Subedi, P.E. Davis, M. Parashar. In IEEE Workshop on Workflows in Support of Large-Scale Science (WORKS), pp. 33-41. 2021.
DOI: 10.1109/WORKS54523.2021.00010

In-situ processing alleviates the gap between computation and I/O capabilities by performing data analysis close to the data source. With simulation data varying in size and content during workflow execution, it becomes necessary for in-situ processing to support resource elasticity, i.e., the ability to change resource configurations such as the number of computing nodes/processes during workflow execution. An elastic job may dynamically adjust its resource configuration; it may use few resources at the beginning and more resources towards the end of the job when interesting data appears. However, it is hard to predict a priori how many computing nodes/processes need to be added or removed during workflow execution to adapt to changing workflow needs. How to efficiently guide elasticity operations, such as growing or shrinking the number of processes used for in-situ analysis during workflow execution, is an open research question. In this paper, we present an adaptive elasticity policy that uses workflow runtime information collected online to predict how to trigger the addition and removal of processes in order to minimize in-situ processing overheads. We integrate the presented elasticity policy into a staging-based elastic workflow and evaluate its efficiency in multiple elasticity scenarios. The results indicate that an adaptive elasticity policy can reduce the overhead of finding a proper resource configuration compared with a static policy that uses a fixed number of processes for each rescaling operation. Finally, we discuss several open research opportunities in elastic in-situ processing from different perspectives.
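The contrast between a static rescaling rule and one driven by online measurements can be made concrete with a small sketch. Both policies below are hypothetical and are not the policy from the paper. The adaptive rule assumes perfect linear scaling and estimates per-process throughput from the last step. The field names, thresholds, and time budget are illustrative assumptions.

```python
import math
from dataclasses import dataclass

@dataclass
class StepStats:
    data_mb: float     # data staged for in-situ analysis in the last step (assumed metric)
    analysis_s: float  # measured analysis time for that step
    nprocs: int        # analysis processes used

def static_policy(stats, budget_s, step=2, min_procs=1):
    """Hypothetical static rule: add or remove a fixed number of processes per rescale."""
    if stats.analysis_s > budget_s:
        return stats.nprocs + step
    if stats.analysis_s < 0.5 * budget_s:
        return max(min_procs, stats.nprocs - step)
    return stats.nprocs

def adaptive_policy(stats, next_data_mb, budget_s, min_procs=1, max_procs=1024):
    """Hypothetical adaptive rule: predict the process count that meets the time budget.

    Assumes analysis time scales linearly with data size and inversely with nprocs.
    """
    mb_per_proc_s = stats.data_mb / (stats.analysis_s * stats.nprocs)  # online throughput estimate
    needed = math.ceil(next_data_mb / (mb_per_proc_s * budget_s))
    return min(max_procs, max(min_procs, needed))

# Data doubles and the last step overran a 10 s budget on 4 processes.
last = StepStats(data_mb=800.0, analysis_s=20.0, nprocs=4)
print(static_policy(last, budget_s=10.0), adaptive_policy(last, next_data_mb=1600.0, budget_s=10.0))
```

In this example, the static rule moves from 4 to 6 processes and would need several more rescaling operations to catch up. The measurement-driven rule jumps directly to its linear-scaling estimate, which illustrates the kind of overhead the abstract refers to.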



Toward Democratizing Access to Facilities Data: A Framework for Intelligent Data Discovery and Delivery
Subtitled “arXiv:2112.06479,” Y. Qin, I. Rodero, M. Parashar. 2021.

Data collected by large-scale instruments, observatories, and sensor networks are key enablers of scientific discoveries in many disciplines. However, ensuring that these data can be accessed, integrated, and analyzed in a democratized and timely manner remains a challenge. In this article, we explore how state-of-the-art techniques for data discovery and access can be adapted to facility data and develop a conceptual framework for intelligent data access and discovery.



Model Reduction of Linear Dynamical Systems via Balancing for Bayesian Inference
Subtitled “arXiv preprint arXiv:2111.13246,” E. Qian, J.M. Tabeart, C. Beattie, S. Gugercin, J. Jiang, P. Kramer, A. Narayan. 2021.

We consider the Bayesian approach to the linear Gaussian inference problem of inferring the initial condition of a linear dynamical system from noisy output measurements taken after the initial time. In practical applications, the large dimension of the dynamical system state poses a computational obstacle to computing the exact posterior distribution. Model reduction offers a variety of computational tools that seek to reduce this computational burden. In particular, balanced truncation is a system-theoretic approach to model reduction that obtains an efficient reduced-dimension dynamical system by projecting the system operators onto state directions that trade off their reachability and observability, as expressed through the associated Gramians. We introduce Gramian definitions relevant to the inference setting and propose a balanced truncation approach based on these inference Gramians that yields a reduced dynamical system that can be used to cheaply approximate the posterior mean and covariance. Our definitions exploit natural connections between (i) the reachability Gramian and the prior covariance and (ii) the observability Gramian and the Fisher information. The resulting reduced model then inherits stability properties and error bounds from system-theoretic considerations, and in some settings yields an optimal posterior covariance approximation. Numerical demonstrations on two benchmark problems in model reduction show that our method can yield near-optimal posterior covariance approximations with order-of-magnitude state dimension reduction.
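The balancing step at the heart of balanced truncation can be sketched with the standard square-root algorithm. Given any pair of symmetric positive definite Gramians, it builds projection matrices that make both reduced Gramians equal and diagonal. In the setting described above, the first Gramian would be played by the prior covariance and the second by the Fisher information. The random SPD matrices used here are placeholders, and the function names are hypothetical. This is not the paper's full posterior approximation.

```python
import numpy as np

def square_root_balancing(P, Q, r):
    """Square-root balanced truncation from SPD Gramians P (reachability) and Q (observability).

    Returns W, T (n x r) with W.T @ T = I and W.T @ P @ W = T.T @ Q @ T = diag(s[:r]).
    """
    LP = np.linalg.cholesky(P)           # P = LP @ LP.T
    LQ = np.linalg.cholesky(Q)           # Q = LQ @ LQ.T
    U, s, Vt = np.linalg.svd(LQ.T @ LP)  # singular values s, in descending order
    S_inv_half = np.diag(s[:r] ** -0.5)
    T = LP @ Vt[:r].T @ S_inv_half       # trial (right) projection basis
    W = LQ @ U[:, :r] @ S_inv_half       # test (left) projection basis
    return W, T, s

def reduce_system(A, C, W, T):
    """Petrov-Galerkin projection of x' = A x, y = C x onto r balanced states."""
    return W.T @ A @ T, C @ T

rng = np.random.default_rng(0)
n, r = 8, 3
M1, M2 = rng.standard_normal((n, n)), rng.standard_normal((n, n))
P = M1 @ M1.T + n * np.eye(n)  # placeholder for a prior covariance
Q = M2 @ M2.T + n * np.eye(n)  # placeholder for a Fisher information matrix
W, T, s = square_root_balancing(P, Q, r)
A = -np.eye(n) + 0.1 * rng.standard_normal((n, n))
C = rng.standard_normal((2, n))
Ar, Cr = reduce_system(A, C, W, T)
```

The singular values `s` rank state directions by how strongly they are jointly "excited" (P) and "observed" (Q). Truncation keeps the top `r` directions.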



Time Stepping with Space and Time Errors and Stability of the Material Point Method
M. Berzins. In Proceedings of the VII International Conference on Particle-Based Methods, PARTICLES 2021, Edited by P. Wriggers, M. Bischoff, E. Oñate, A. Düster & T. Zohdi, 2021.

The choice of the time step for the Material Point Method (MPM) is often addressed by using a simple stability criterion, such as the speed of sound or a CFL condition. Recently there have been several advances in understanding the stability of MPM, ranging from non-linear stability analysis through to von Neumann-type approaches. While such criteria work well in many instances, it is important to understand how they relate to the overall errors present in the method. Although it has been observed that spatial errors may dominate temporal ones at stable time steps, recent work has made the sources and forms of the different MPM errors more precise. This now makes it possible to understand how the different errors and the stability analysis are connected. At the same time, this also requires simple computable estimates of the different errors in the material point method. The use of such simple estimates makes it possible to connect some of the errors introduced with the stability criteria used. A number of simple computational experiments are used to illustrate the theoretical results.
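The kind of simple stability criterion mentioned above can be written down concretely. The sketch assumes a linear elastic material, uses the dilatational (P-wave) speed as the speed of sound, and adds the maximum particle speed. The CFL number of 0.5 and the material values are illustrative choices only. The sketch does not reproduce the paper's error estimates.

```python
import math

def p_wave_speed(E, nu, rho):
    """Dilatational wave speed of a linear elastic solid: sqrt((lambda + 2 mu) / rho)."""
    lam = E * nu / ((1.0 + nu) * (1.0 - 2.0 * nu))  # first Lame parameter
    mu = E / (2.0 * (1.0 + nu))                      # shear modulus
    return math.sqrt((lam + 2.0 * mu) / rho)

def mpm_time_step(dx, E, nu, rho, v_max=0.0, cfl=0.5):
    """CFL-style estimate: dt = cfl * dx / (c + |v|_max), with dx the background grid spacing."""
    return cfl * dx / (p_wave_speed(E, nu, rho) + v_max)

dt = mpm_time_step(dx=0.01, E=1.0e6, nu=0.3, rho=1000.0, v_max=1.0)
print(dt)
```

Because dt is proportional to dx, halving the grid spacing halves the allowed step. This coupling between spatial resolution and time step is one reason spatial and temporal errors have to be considered together.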



Material point method: Overview and challenges ahead (without videos)
W. T. Sołowski, M. Berzins, W. Coombs, J. Guilkey, M. Möller, Q. A. Tran, T. Adibaskoro, S. Seyedan, R. Tielen, K. Soga. In Advances in Applied Mechanics, 1, Vol. 14, Ch. 2, Elsevier, pp. 113-204. 2021.

The paper gives an overview of the Material Point Method and shows its evolution over the last 25 years. The Material Point Method developments followed a logical order. The article aims to identify this order and not only show the current state of the art, but also explain the drivers behind the developments and identify what is currently still missing. The paper explores modern implementations of both explicit and implicit Material Point Methods. It concentrates mainly on uses of the method in engineering, but also gives a short overview of Material Point Method applications in computer graphics and animation. Furthermore, the article gives an overview of errors in the material point method algorithms and identifies gaps in knowledge, filling which would hopefully lead to a much more efficient and accurate Material Point Method. The paper also briefly discusses algorithms related to contact and boundaries, coupling the Material Point Method with other numerical methods, and modeling of fractures. It also gives an overview of modeling of multi-phase continua with the Material Point Method. The paper closes with numerical examples, aiming at showing the capabilities of the Material Point Method in advanced simulations. Those include landslide modeling, multiphysics simulation of a shaped charge explosion, and simulations of granular material flow out of a silo undergoing changes from continuous to discontinuous and back to continuous behavior. The paper uniquely illustrates many of the developments not only with figures but also with videos, showing the whole extent of a simulation instead of just a time-stamped image.



Material point method: Overview and challenges ahead (with videos)
W. T. Sołowski, M. Berzins, W. Coombs, J. Guilkey, M. Möller, Q. A. Tran, T. Adibaskoro, S. Seyedan, R. Tielen, K. Soga. In Advances in Applied Mechanics, 1, Vol. 14, Ch. 2, Elsevier, pp. 113-204. 2021.
ISBN: 978-0-323-88519-5

The paper gives an overview of the Material Point Method and shows its evolution over the last 25 years. The Material Point Method developments followed a logical order. The article aims to identify this order and not only show the current state of the art, but also explain the drivers behind the developments and identify what is currently still missing. The paper explores modern implementations of both explicit and implicit Material Point Methods. It concentrates mainly on uses of the method in engineering, but also gives a short overview of Material Point Method applications in computer graphics and animation. Furthermore, the article gives an overview of errors in the material point method algorithms and identifies gaps in knowledge, filling which would hopefully lead to a much more efficient and accurate Material Point Method. The paper also briefly discusses algorithms related to contact and boundaries, coupling the Material Point Method with other numerical methods, and modeling of fractures. It also gives an overview of modeling of multi-phase continua with the Material Point Method. The paper closes with numerical examples, aiming at showing the capabilities of the Material Point Method in advanced simulations. Those include landslide modeling, multiphysics simulation of a shaped charge explosion, and simulations of granular material flow out of a silo undergoing changes from continuous to discontinuous and back to continuous behavior. The paper uniquely illustrates many of the developments not only with figures but also with videos, showing the whole extent of a simulation instead of just a time-stamped image.



PrefixRL: Optimization of Parallel Prefix Circuits using Deep Reinforcement Learning
R. Roy, J. Raiman, N. Kant, I. Elkin, R. Kirby, M. Siu, S. Oberman, S. Godil, B. Catanzaro. In 2021 58th ACM/IEEE Design Automation Conference (DAC), IEEE, pp. 853-858. 2021.
DOI: 10.1109/DAC18074.2021.9586094

In this work, we present a reinforcement learning (RL) based approach to designing parallel prefix circuits such as adders or priority encoders that are fundamental to high-performance digital design. Unlike prior methods, our approach designs solutions tabula rasa purely through learning with synthesis in the loop. We design a grid-based state-action representation and an RL environment for constructing legal prefix circuits. Deep Convolutional RL agents trained on this environment produce prefix adder circuits that Pareto-dominate existing baselines with up to 16.0% and 30.2% lower area for the same delay in the 32b and 64b settings respectively. We observe that agents trained with open-source synthesis tools and cell library can design adder circuits that achieve lower area and delay than commercial tool adders in an industrial cell library.
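To make "parallel prefix circuit" concrete, here is a small Python model of two classical prefix adders that sit at opposite ends of the area/delay trade-off: a serial (ripple) prefix network and a Kogge-Stone network. Both compute carries by combining (generate, propagate) pairs with an associative operator. This is a hypothetical illustration, not PrefixRL's grid representation or RL environment. Node count and logic depth are only rough proxies for the area and delay that the paper measures through synthesis.

```python
import random

def combine(hi, lo):
    """Prefix operator on (generate, propagate) pairs; hi covers the more significant bits."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)

def prefix_add(a, b, n, scheme="kogge_stone"):
    """Add two n-bit integers through an explicit prefix network.

    Returns (sum mod 2**n, prefix-node count, logic levels).
    """
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]  # bitwise propagate
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]  # bitwise generate
    pairs = list(zip(g, p))                            # pairs[i] spans bits i..i
    nodes = levels = 0
    if scheme == "serial":                             # ripple: n-1 nodes, n-1 levels
        for i in range(1, n):
            pairs[i] = combine(pairs[i], pairs[i - 1])
            nodes += 1
        levels = n - 1
    elif scheme == "kogge_stone":                      # log2(n) levels, many more nodes
        d = 1
        while d < n:
            pairs = [combine(pairs[i], pairs[i - d]) if i >= d else pairs[i]
                     for i in range(n)]
            nodes += n - d
            levels += 1
            d *= 2
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    # Carry into bit i is the group generate over bits i-1..0 (carry-in is 0).
    carries = [0] + [G for G, _ in pairs[:-1]]
    s = sum((p[i] ^ carries[i]) << i for i in range(n))
    return s, nodes, levels

rng = random.Random(0)
for scheme in ("serial", "kogge_stone"):
    a, b = rng.getrandbits(32), rng.getrandbits(32)
    s, nodes, levels = prefix_add(a, b, 32, scheme)
    print(scheme, s == (a + b) % 2**32, nodes, levels)
```

For 32 bits, the serial network uses 31 nodes over 31 levels, and Kogge-Stone uses 129 nodes over 5 levels. Designs between these extremes trade area for delay, which is the Pareto space the abstract refers to.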



Uintah+Hedgehog: Combining Parallelism Models for End-to-End Large-Scale Simulation Performance
J. K. Holmen, D. Sahasrabudhe, M. Berzins, A. Bardakoff, T. J. Blattner, . Keyrouz. Scientific Computing and Imaging Institute, 2021.

The complexity of heterogeneous nodes near and at exascale has increased the need for “heroic” programming efforts. To accommodate this complexity, significant investment is required for codes not yet optimized for low-level architecture features (e.g., wide vector units) and/or for running at large scale. This paper describes ongoing efforts to combine two codes, Hedgehog and Uintah, lying at both extremes, to ease programming efforts. The end goals of this effort are (1) to combine the two codes to make an asynchronous many-task runtime system specializing in both node-level and large-scale performance and (2) to further improve the accessibility of both with portable abstractions. A prototype adopting Hedgehog in Uintah and a prototype extending Hedgehog to support MPI+X hybrid parallelism are discussed. Results achieving ∼60% of NVIDIA V100 GPU peak performance for a distributed DGEMM problem are shown for a naive MPI+Hedgehog implementation before any attempt to optimize for performance.

Authors' note: This is a refereed but unpublished report that was submitted to, reviewed for, and accepted in revised form for a presentation of the same material at the HiPar Workshop at Supercomputing 21.



Enabling microservices management for Deep Learning applications across the Edge-Cloud Continuum
Z. Houmani, D. Balouek-Thomert, E. Caron, M. Parashar. In SBAC-PAD 2021 - IEEE 33rd International Symposium on Computer Architecture and High Performance Computing, October, 2021.

Deep Learning has shifted the focus of traditional batch workflows to data-driven feature engineering on streaming data. In particular, the execution of Deep Learning workflows presents expectations of near-real-time results with user-defined acceptable accuracy. Meeting the objectives of such applications across heterogeneous resources located at the edge of the network, the core, and in-between requires managing trade-offs between the accuracy and the urgency of the results. However, current data analysis rarely manages the entire Deep Learning pipeline along the data path, making it complex for developers to implement strategies in real-world deployments. Driven by an object detection use case, this paper presents an architecture for time-critical Deep Learning workflows that provides a data-driven scheduling approach to distribute the pipeline across Edge-to-Cloud resources. Furthermore, it adopts a data management strategy that reduces the resolution of incoming data when potential trade-off optimizations are available. We illustrate the system's viability through a performance evaluation of the object detection use case on the Grid'5000 testbed. We demonstrate that in a multiuser scenario, with a standard frame rate of 25 frames per second, the system speeds up data analysis by up to 54.4% compared to a Cloud-only scenario, while keeping analysis accuracy above a fixed threshold.



Physics-Informed Neural Networks (PINNs) for Parameterized PDEs: A Metalearning Approach
Subtitled “arXiv preprint arXiv:2110.13361,” M. Penwarden, S. Zhe, A. Narayan, R. M. Kirby. 2021.

Physics-informed neural networks (PINNs) as a means of discretizing partial differential equations (PDEs) are garnering much attention in the Computational Science and Engineering (CS&E) world. At least two challenges exist for PINNs at present: an understanding of accuracy and convergence characteristics with respect to tunable parameters and identification of optimization strategies that make PINNs as efficient as other computational science tools. The cost of PINN training remains a major challenge of Physics-informed Machine Learning (PiML), and, in fact, of machine learning (ML) in general. This paper is meant to move towards addressing the latter through the study of PINNs for parameterized PDEs. Following the ML world, we introduce metalearning of PINNs for parameterized PDEs. By introducing metalearning and transfer learning concepts, we can greatly accelerate the PINN optimization process. We present a survey of model-agnostic metalearning, and then discuss our model-aware metalearning applied to PINNs. We provide theoretically motivated and empirically backed assumptions that make our metalearning approach possible. We then test our approach on various canonical forward parameterized PDEs that have been presented in the emerging PINNs literature.



Exploring the Role of Machine Learning in Scientific Workflows: Opportunities and Challenges
Subtitled “arXiv preprint arXiv:2110.13999,” A. Nouri, P.E. Davis, P. Subedi, M. Parashar. 2021.

In this survey, we discuss the challenges of executing scientific workflows as well as existing Machine Learning (ML) techniques to alleviate those challenges. We provide the context and motivation for applying ML to each step of the execution of these workflows. Furthermore, we provide recommendations on how to extend ML techniques to unresolved challenges in the execution of scientific workflows. Moreover, we discuss the possibility of using ML techniques for in-situ operations. We explore the challenges of in-situ workflows and provide suggestions for improving the performance of their execution using ML techniques.



Meta-Learning with Adjoint Methods
Subtitled “arXiv preprint arXiv:2110.08432,” S. Li, Z. Wang, A. Narayan, R. Kirby, S. Zhe. 2021.

Model Agnostic Meta-Learning (MAML) is widely used to find a good initialization for a family of tasks. Despite its success, a critical challenge in MAML is to calculate the gradient w.r.t. the initialization of a long training trajectory for the sampled tasks, because the computation graph can rapidly explode and the computation becomes very expensive. To address this problem, we propose Adjoint MAML (A-MAML). We view gradient descent in the inner optimization as the evolution of an Ordinary Differential Equation (ODE). To efficiently compute the gradient of the validation loss w.r.t. the initialization, we use the adjoint method to construct a companion, backward ODE. To obtain the gradient w.r.t. the initialization, we only need to run the standard ODE solver twice: once forward in time, evolving a long trajectory of gradient flow for the sampled task, and once backward, solving the adjoint ODE. We need not create or expand any intermediate computational graphs, adopt aggressive approximations, or impose proximal regularizers in the training loss. Our approach is cheap, accurate, and adaptable to different trajectory lengths. We demonstrate the advantage of our approach in both synthetic and real-world meta-learning tasks.
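The adjoint idea can be checked on a toy problem where everything is known in closed form: gradient flow on a quadratic training loss, followed by a quadratic validation loss. The sketch below is an illustration built on several assumptions and is not the A-MAML code. It uses fixed-step RK4 and an explicit Hessian in the adjoint ODE. Rather than storing the forward trajectory, it re-integrates the parameters backward alongside the adjoint state. All names are hypothetical.

```python
import numpy as np

def rk4_step(f, y, h):
    """One classical Runge-Kutta step for y' = f(y); h may be negative."""
    k1 = f(y)
    k2 = f(y + 0.5 * h * k1)
    k3 = f(y + 0.5 * h * k2)
    k4 = f(y + h * k3)
    return y + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def adjoint_meta_gradient(theta0, grad_train, hess_train, grad_val, T=1.0, steps=200):
    """Gradient of L_val(theta(T)) w.r.t. theta0 under gradient flow theta' = -grad L_train."""
    h = T / steps
    flow = lambda th: -grad_train(th)
    theta = theta0.astype(float)
    for _ in range(steps):            # forward solve; no computation graph is kept
        theta = rk4_step(flow, theta, h)
    n = theta.size

    def augmented(y):
        th, a = y[:n], y[n:]
        # Flow Jacobian is -Hess(L_train), so the adjoint ODE a' = -J^T a becomes Hess^T a.
        return np.concatenate([flow(th), hess_train(th).T @ a])

    y = np.concatenate([theta, grad_val(theta)])  # a(T) = dL_val/dtheta(T)
    for _ in range(steps):            # backward solve from t = T to t = 0
        y = rk4_step(augmented, y, -h)
    return theta, y[n:]               # a(0) = dL_val/dtheta0

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
H = M @ M.T / 4 + np.eye(4)           # SPD Hessian of L_train = 0.5 (th - a)^T H (th - a)
a_star, b_star = rng.standard_normal(4), rng.standard_normal(4)
theta0 = rng.standard_normal(4)
theta_T, g = adjoint_meta_gradient(
    theta0,
    grad_train=lambda th: H @ (th - a_star),
    hess_train=lambda th: H,
    grad_val=lambda th: th - b_star,  # L_val = 0.5 ||th - b||^2
)
```

For this quadratic case, theta(T) = a + exp(-HT)(theta0 - a), so the exact meta-gradient is exp(-HT)(theta(T) - b). The two ODE solves reproduce this value without differentiating through the solver's steps.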



Scalable Graph Embedding Learning On A Single GPU
Subtitled “arXiv preprint arXiv:2110.06991,” A. Nouri, P.E. Davis, P. Subedi, M. Parashar. 2021.

Graph embedding techniques have attracted growing interest since they convert graph data into a continuous, low-dimensional space. Effective graph analytics provide users with a deeper understanding of what is behind the data and thus can benefit a variety of machine learning tasks. At the current scale of real-world applications, most graph analytic methods suffer from high computation and space costs. These methods and systems can process networks with thousands to a few million nodes. However, scaling to large-scale networks remains a challenge. The complexity of training graph embedding systems requires the use of existing accelerators such as GPUs. In this paper, we introduce a hybrid CPU-GPU framework that addresses the challenges of learning embeddings of large-scale graphs. The performance of our method is compared qualitatively and quantitatively with existing embedding systems on common benchmarks. We also show that our system can scale training to datasets an order of magnitude larger than a single machine's total memory capacity. The effectiveness of the learned embeddings is evaluated within multiple downstream applications. The experimental results indicate the effectiveness of the learned embeddings in terms of performance and accuracy.



RISE: Reducing I/O Contention in Staging-based Extreme-Scale In-situ Workflows
P. Subedi, P.E. Davis, M. Parashar. In 2021 IEEE International Conference on Cluster Computing (CLUSTER), pp. 146-156. 2021.

While in-situ workflow formulations have addressed some of the data-related challenges associated with extreme-scale scientific workflows, these workflows involve complex interactions and different modes of data exchange. In the context of increasing system complexity, such workflows present significant resource management challenges, requiring complex cost-performance tradeoffs. This paper presents RISE, an intelligent staging-based data management middleware, which builds on the DataSpaces framework and performs intelligent scheduling of data management operations to reduce I/O contention. In RISE, data are always written immediately to local buffers to reduce the impact of data transfers on application performance. RISE identifies applications’ data access patterns and moves data towards data consumers only when the network is expected to be idle, reducing the impact of asynchronous …



Structured Adaptive Mesh Refinement Adaptations to Retain Performance Portability With Increasing Heterogeneity
A. Dubey, M. Berzins, C. Burstedde, M.L. Norman, D. Unat, M. Wahib. In Computing in Science & Engineering, Vol. 23, No. 5, pp. 62-66. 2021.
ISSN: 1521-9615
DOI: 10.1109/MCSE.2021.3099603

Adaptive mesh refinement (AMR) is an important method that enables many mesh-based applications to run at effectively higher resolution within limited computing resources by allowing high resolution only where it is really needed. This advantage comes at a cost, however: greater complexity in the mesh management machinery and challenges with load distribution. With the current trend of increasing heterogeneity in hardware architecture, AMR presents an orthogonal axis of complexity. The usual techniques necessary to obtain reasonable performance, such as asynchronous communication and hierarchy management for parallelism and memory, are very challenging to reason about with AMR. Different groups working with AMR are bringing different approaches to this challenge. Here, we examine the design choices of several AMR codes and also the degree to which demands placed on them by their users influence these choices.



Characterizing possible failure modes in physics-informed neural networks
Subtitled “arXiv preprint arXiv:2109.01050,” A.S. Krishnapriyan, A. Gholami, S. Zhe, R.M. Kirby, M.W. Mahoney. 2021.

Recent work in scientific machine learning has developed so-called physics-informed neural network (PINN) models. The typical approach is to incorporate physical domain knowledge as soft constraints on an empirical loss function and use existing machine learning methodologies to train the model. We demonstrate that, while existing PINN methodologies can learn good models for relatively trivial problems, they can easily fail to learn relevant physical phenomena even for simple PDEs. In particular, we analyze several distinct situations of widespread physical interest, including learning differential equations with convection, reaction, and diffusion operators. We provide evidence that the soft regularization in PINNs, which involves differential operators, can introduce a number of subtle problems, including making the problem ill-conditioned. Importantly, we show that these possible failure modes are not due to the lack of expressivity in the NN architecture, but that the PINN's setup makes the loss landscape very hard to optimize. We then describe two promising solutions to address these failure modes. The first approach is to use curriculum regularization, where the PINN's loss term starts from a simple PDE regularization, and becomes progressively more complex as the NN gets trained. The second approach is to pose the problem as a sequence-to-sequence learning task, rather than learning to predict the entire space-time at once. Extensive testing shows that we can achieve up to 1-2 orders of magnitude lower error with these methods as compared to regular PINN training.