Designed especially for neurobiologists, FluoRender is an interactive tool for multi-channel fluorescence microscopy data visualization and analysis.
Deep brain stimulation
BrainStimulator is a set of networks that are used in SCIRun to perform simulations of brain stimulation such as transcranial direct current stimulation (tDCS) and magnetic transcranial stimulation (TMS).
Developing software tools for science has always been a central vision of the SCI Institute.

SCI Publications

2022


M. Berzins. “Energy conservation and accuracy of some MPM formulations,” In Computational Particle Mechanics, 2022.
DOI: 10.1007/s40571-021-00457-3

ABSTRACT

The success of the Material Point Method (MPM) in solving many challenging problems nevertheless raises some open questions regarding the fundamental properties of the method such as time integration accuracy and energy conservation. The traditional MPM time integration methods are often based upon the symplectic Euler method or staggered central differences. This raises the question of how to best ensure energy conservation in explicit time integration for MPM. Two approaches are used here, one is to extend the Symplectic Euler method (Cromer Euler) to provide better energy conservation and the second is to use a potentially more accurate symplectic methods, namely the widely-used Stormer-Verlet Method. The Stormer-Verlet method is shown to have locally third order time accuracy of energy conservation in time, in contrast to the second order accuracy in energy conservation of the symplectic Euler methods that are used in many MPM calculations. It is shown that there is an extension to the Symplectic Euler stress-last method that provides better energy conservation that is comparable with the Stormer-Verlet method. This extension is referred to as TRGIMP and also has third order accuracy in energy conservation. When the interactions between space and time errors are studied it is seen that spatial errors may dominate in computed quantities such as displacement and velocity. This connection between the local errors in space and time is made explicit mathematically and explains the observed results that displacement and velocity errors are very similar for both methods. The observed and theoretically predicted third-order energy conservation accuracy and computational costs are demonstrated on a standard MPM test example.



M. Berzins. “Computational Error Estimation for The Material Point Method,” 2022.

ABSTRACT

A common feature of many methods in computational mechanics is that there is often a way of estimating the error in the computed solution. The situation for computational mechanics codes based upon the Material Point Method is very different in that there has been comparatively little work on computable error estimates for these methods. This work is concerned with introducing such an approach for the Material Point Method. Although it has been observed that spatial errors may dominate temporal ones at stable time steps, recent work has made more precise the sources and forms of the different MPM errors. There is then a need to estimate these errors computationally through computable estimates of the different errors in the material point method. Estimates of the different spatial errors in the Material Point Method are constructed based upon nodal derivatives of the different physical variables in MPM. These derivatives are then estimated using standard difference approximations calculated on the background mesh. The use of these estimates of the spatial error makes it possible to measure the growth of errors over time. A number of computational experiments are used to illustrate the performance of the computed error estimates. As the key feature of the approach is the calculation of derivatives on the regularly spaced background mesh, the extension to calculating derivatives and hence to error estimates for higher dimensional problems is clearly possible.



J.K. Holmen, D. Sahasrabudhe, M. Berzins. “Porting Uintah to Heterogeneous Systems,” In Proceedings of the Platform for Advanced Scientific Computing Conference (PASC22) Best Paper Award, ACM, 2022.

ABSTRACT

The Uintah Computational Framework is being prepared to make portable use of forthcoming exascale systems, initially the DOE Aurora system through the Aurora Early Science Program. This paper describes the evolution of Uintah to be ready for such architectures. A key part of this preparation has been the adoption of the Kokkos performance portability layer in Uintah. The sheer size of the Uintah codebase has made it imperative to have a representative benchmark. The design of this benchmark and the use of Kokkos within it is discussed. This paper complements recent work with additional details and new scaling studies run 24x further than earlier studies. Results are shown for two benchmarks executing workloads representative of typical Uintah applications. These results demonstrate single-source portability across the DOE Summit and NSF Frontera systems with good strong-scaling characteristics. The challenge of extending this approach to anticipated exascale systems is also considered.



T.A.J. Ouermi, R.M. Kirby, M. Berzins. “ENO-Based High-Order Data-Bounded and Constrained Positivity-Preserving Interpolation,” Subtitled “https://arxiv.org/abs/2204.06168,” In Numerical Algorithms, 2022.

ABSTRACT

A number of key scientific computing applications that are based upon tensor-product grid constructions, such as numerical weather prediction (NWP) and combustion simulations, require property-preserving interpolation. Essentially Non-Oscillatory (ENO) interpolation is a classic example of such interpolation schemes. In the aforementioned application areas, property preservation often manifests itself as a requirement for either data boundedness or positivity preservation. For example, in NWP, one may have to interpolate between the grid on which the dynamics is calculated to a grid on which the physics is calculated (and back). Interpolating density or other key physical quantities without accounting for property preservation may lead to negative values that are nonphysical and result in inaccurate representations and/or interpretations of the physical data. Property-preserving interpolation is straightforward when used in the context of low-order numerical simulation methods. High-order property-preserving interpolation is, however, nontrivial, especially in the case where the interpolation points are not equispaced. In this paper, we demonstrate that it is possible to construct high-order interpolation methods that ensure either data boundedness or constrained positivity preservation. A novel feature of the algorithm is that the positivity-preserving interpolant is constrained; that is, the amount by which it exceeds the data values may be strictly controlled. The algorithm we have developed comes with theoretical estimates that provide sufficient conditions for data boundedness and constrained positivity preservation. We demonstrate the application of our algorithm on a collection of 1D and 2D numerical examples, and show that in all cases property preservation is respected.


2021


M. Berzins. “Symplectic Time Integration Methods for the Material Point Method, Experiments, Analysis and Order Reduction,” In WCCM-ECCOMAS2020 virtual Conference, January, 2021.

ABSTRACT

The provision of appropriate time integration methods for the Material Point Method (MPM) involves considering stability, accuracy and energy conservation. A class of methods that addresses many of these issues are the widely-used symplectic time integration methods. Such methods have good conservation properties and have the potential to achieve high accuracy. In this work we build on the work in [5] and consider high order methods for the time integration of the Material Point Method. The results of practical experiments show that while high order methods in both space and time have good accuracy initially, unless the problem has relatively little particle movement then the accuracy of the methods for later time is closer to that of low order methods. A theoretical analysis explains these results as being similar to the stage error found in Runge Kutta methods, though in this case the stage error arises from the MPM differentiations and interpolations from particles to grid and back again, particularly in cases in which there are many grid crossings.



M. Berzins. “Time Stepping with Space and Time Errors and Stability of the Material Point Method,” In Proceedings of VII International Conference on Particle-Based Methods, PARTICLES 2021, Edited by P. Wriggers, M. Bischoff, E. Onate, M. Bischoff, A. Duster & T. Zohdi, 2021.

ABSTRACT

The choice of the time step for the Material Point Method (MPM) is often addressed by using a simple stability criterion, such as the speed of sound or a CFL condition. Recently there have been several advances in understanding the stability of MPM. These range from non-linear stability analysis, through to Von Neumann type approaches. While in many instances this works well it is important to understand how this relates to the overall errors present in the method. Although it has been observed that spatial errors may dominate temporal ones at stable time steps, recent work has made more precise the sources and forms of the different MPM errors. This now makes it possible to understand how the different errors and the stability analysis are connected. At the same time this also requires simple computable estimates of the different errors in the material point method. The use of simple estimates of these errors makes it possible to connect some of the errors introduced with the stability criteria used. A number of simple computational experiments are used to illustrate the theoretical results.



A. Dubey, M. Berzins, C. Burstedde, M.l L. Norman, D. Unat, M. Wahib. “Structured Adaptive Mesh Refinement Adaptations to Retain Performance Portability With Increasing Heterogeneity,” In Computing in Science & Engineering, Vol. 23, No. 5, pp. 62-66. 2021.
ISSN: 1521-9615
DOI: 10.1109/MCSE.2021.3099603

ABSTRACT

Adaptive mesh refinement (AMR) is an important method that enables many mesh-based applications to run at effectively higher resolution within limited computing resources by allowing high resolution only where really needed. This advantage comes at a cost, however: greater complexity in the mesh management machinery and challenges with load distribution. With the current trend of increasing heterogeneity in hardware architecture, AMR presents an orthogonal axis of complexity. The usual techniques, such as asynchronous communication and hierarchy management for parallelism and memory that are necessary to obtain reasonable performance are very challenging to reason about with AMR. Different groups working with AMR are bringing different approaches to this challenge. Here, we examine the design choices of several AMR codes and also the degree to which demands placed on them by their users influence these choices.



J. K. Holmen, D. Sahasrabudhe, M. Berzins. “A Heterogeneous MPI+PPL Task Scheduling Approach for Asynchronous Many-Task Runtime Systems,” In Proceedings of the Practice and Experience in Advanced Research Computing 2021 on Sustainability, Success and Impact (PEARC21), ACM, 2021.

ABSTRACT

Asynchronous many-task runtime systems and MPI+X hybrid parallelism approaches have shown promise for helping manage the increasing complexity of nodes in current and emerging high performance computing (HPC) systems, including those for exascale. The increasing architectural diversity, however, poses challenges for large legacy runtime systems emphasizing broad support for major HPC systems. Performance portability layers (PPL) have shown promise for helping manage this diversity. This paper describes a heterogeneous MPI+PPL task scheduling approach for combining these promising solutions with additional consideration for parallel third party libraries facing similar challenges to help prepare such a runtime for the diverse heterogeneous systems accompanying exascale computing. This approach is demonstrated using a heterogeneous MPI+Kokkos task scheduler and the accompanying portable abstractions [15] implemented in the Uintah Computational Framework, an asynchronous many-task runtime system, with additional consideration for hypre, a parallel third party library. Results are shown for two challenging problems executing workloads representative of typical Uintah applications. These results show performance improvements up to 4.4x when using this scheduler and the accompanying portable abstractions [15] to port a previously MPI-Only problem to Kokkos::OpenMP and Kokkos::CUDA to improve multi-socket, multi-device node use. Good strong-scaling to 1,024 NVIDIA V100 GPUs and 512 IBM POWER9 processor are also shown using MPI+Kokkos::OpenMP+Kokkos::CUDA at scale.



J. K. Holmen, D. Sahasrabudhe, M. Berzins, A. Bardakoff, T. J. Blattner, . Keyrouz. “Uintah+Hedgehog: Combining Parallelism Models for End-to-End Large-Scale Simulation Performance,” Scientific Computing and Imaging Institute, 2021.

ABSTRACT

The complexity of heterogeneous nodes near and at exascale has increased the need for “heroic” programming efforts. To accommodate this complexity, significant investment is required for codes not yet optimizing for low-level architecture features (e.g., wide vector units) and/or running at large-scale. This paper describes ongoing efforts to combine two codes, Hedgehog and Uintah, lying at both extremes to ease programming efforts. The end goals of this effort are (1) to combine the two codes to make an asynchronous many-task runtime system specializing in both node-level and large-scale performance and (2) to further improve the accessibility of both with portable abstractions. A prototype adopting Hedgehog in Uintah and a prototype extending Hedgehog to support MPI+X hybrid parallelism are discussed. Results achieving ∼60% of NVIDIA V100 GPU peak performance for a distributed DGEMM problem are shown for a naive MPI+Hedgehog implementation before any attempt to optimize for performance.

Authors note: This is a refereed but unpublished report that was
submitted to, reviewed for and accepted in revised form for a presentation of the same material at the Hipar Workshop at Supercomputing 21



W. T. Sołowski, M. Berzins, W. Coombs, J. Guilkey, M. Möller, Q. A. Tran, T. Adibaskoro, S. Seyedan, R. Tielen, K. Soga. “Material point method: Overview and challenges ahead (with videos),” In Advances in Applied Mechanics, 1, Vol. 14, Ch. 2, Elsevier, pp. 113-204. 2021.
ISBN: 978-0-323-88519-5

ABSTRACT

The paper gives an overview of Material Point Method and shows its evolution over the last 25 years. The Material Point Method developments followed a logical order. The article aims at identifying this order and show not only the current state of the art, but explain the drivers behind the developments and identify what is currently still missing.The paper explores modern implementations of both explicit and implicit Material Point Method. It concentrates mainly on uses of the method in engineering, but also gives a short overview of Material Point Method application in computer graphics and animation. Furthermore, the article gives overview of errors in the material point method algorithms, as well as identify gaps in knowledge, filling which would hopefully lead to a much more efficient and accurate Material Point Method. The paper also briefly discusses algorithms related to contact and boundaries, coupling the Material Point Method with other numerical methods and modeling of fractures. It also gives an overview of modeling of multi-phase continua with Material Point Method. The paper closes with numerical examples, aiming at showing the capabilities of Material Point Method in advanced simulations. Those include landslide modeling, multiphysics simulation of shaped charge explosion and simulations of granular material flow out of a silo undergoing changes from continuous to discontinuous and back to continuous behavior.The paper uniquely illustrates many of the developments not only with figures but also with videos, giving the whole extend of simulation instead of just a timestamped image



W. T. Sołowski, M. Berzins, W. Coombs, J. Guilkey, M. Möller, Q. A. Tran, T. Adibaskoro, S. Seyedan, R. Tielen, K. Soga. “Material point method: Overview and challenges ahead (without videos),” In Advances in Applied Mechanics, 1, Vol. 14, Ch. 2, Elsevier, pp. 113-204. 2021.

ABSTRACT

The paper gives an overview of Material Point Method and shows its evolution over the last 25 years. The Material Point Method developments followed a logical order. The article aims at identifying this order and show not only the current state of the art, but explain the drivers behind the developments and identify what is currently still missing.The paper explores modern implementations of both explicit and implicit Material Point Method. It concentrates mainly on uses of the method in engineering, but also gives a short overview of Material Point Method application in computer graphics and animation. Furthermore, the article gives overview of errors in the material point method algorithms, as well as identify gaps in knowledge, filling which would hopefully lead to a much more efficient and accurate Material Point Method. The paper also briefly discusses algorithms related to contact and boundaries, coupling the Material Point Method with other numerical methods and modeling of fractures. It also gives an overview of modeling of multi-phase continua with Material Point Method. The paper closes with numerical examples, aiming at showing the capabilities of Material Point Method in advanced simulations. Those include landslide modeling, multiphysics simulation of shaped charge explosion and simulations of granular material flow out of a silo undergoing changes from continuous to discontinuous and back to continuous behavior.The paper uniquely illustrates many of the developments not only with figures but also with videos, giving the whole extend of simulation instead of just a timestamped image



R. Zambre, D. Sahasrabudhe, H. Zhou, M. Berzins, A. Chandramowlishwaran, P. Balaji. “Logically Parallel Communication for Fast MPI+Threads Communication,” In Proceedings of the Transactions on Parallel and Distributed Computing, IEEE, April, 2021.

ABSTRACT

Supercomputing applications are increasingly adopting the MPI+threads programming model over the traditional “MPI everywhere” approach to better handle the disproportionate increase in the number of cores compared with other on-node resources. In practice, however, most applications observe a slower performance with MPI+threads primarily because of poor communication performance. Recent research efforts on MPI libraries address this bottleneck by mapping logically parallel communication, that is, operations that are not subject to MPI’s ordering constraints to the underlying network parallelism. Domain scientists, however, typically do not expose such communication independence information because the existing MPI-3.1 standard’s semantics can be limiting. Researchers had initially proposed user-visible endpoints to combat this issue, but such a solution requires intrusive changes to the standard (new APIs). The upcoming MPI-4.0 standard, on the other hand, allows applications to relax unneeded semantics and provides them with many opportunities to express logical communication parallelism. In this paper, we show how MPI+threads applications can achieve high performance with logically parallel communication. Through application case studies, we compare the capabilities of the new MPI-4.0 standard with those of the existing one and user-visible endpoints (upper bound). Logical communication parallelism can boost the overall performance of an application by over 2x.


2020


T. A. J. Ouermi, R. M. Kirby, M. Berzins. “Numerical Testing of a New Positivity-Preserving Interpolation Algorithm,” Subtitled “arXiv,” 2020.

ABSTRACT

An important component of a number of computational modeling algorithms is an interpolation method that preserves the positivity of the function being interpolated. This report describes the numerical testing of a new positivity-preserving algorithm that is designed to be used when interpolating from a solution defined on one grid to different spatial grid. The motivating application is a numerical weather prediction (NWP) code that uses spectral elements as the discretization choice for its dynamics core and Cartesian product meshes for the evaluation of its physics routines. This combination of spectral elements, which use nonuniformly spaced quadrature/collocation points, and uniformly-spaced Cartesian meshes combined with the desire to maintain positivity when moving between these necessitates our work. This new approach is evaluated against several typical algorithms in use on a range of test problems in one or more space dimensions. The results obtained show that the new method is competitive in terms of observed accuracy while at the same time preserving the underlying positivity of the functions being interpolated.



D. Sahasrabudhe, M. Berzins. “Improving Performance of the Hypre Iterative Solver for Uintah Combustion Codes on Manycore Architectures Using MPI Endpoints and Kernel Consolidation,” In Computational Science -- ICCS 2020, 20th International Conference, Amsterdam, The Netherlands, June 3–5, 2020, Proceedings, Part I, Springer International Publishing, pp. 175--190. 2020.
ISBN: 978-3-030-50371-0

ABSTRACT

The solution of large-scale combustion problems with codes such as the Arches component of Uintah on next generation computer architectures requires the use of a many and multi-core threaded approach and/or GPUs to achieve performance. Such codes often use a low-Mach number approximation, that require the iterative solution of a large system of linear equations at every time step. While the discretization routines in such a code can be improved by the use of, say, OpenMP or Cuda Approaches, it is important that the linear solver be able to perform well too. For Uintah the Hypre iterative solver has proved to solve such systems in a scalable way. The use of Hypre with OpenMP leads to at least 2x slowdowns due to OpenMP overheads, however. This behavior is analyzed and a solution proposed by using the MPI Endpoints approach is implemented within Hypre, where each team of threads acts as a different MPI rank. This approach minimized OpenMP synchronization overhead, avoided slowdowns, performed as fast or (up to 1.5x) faster than Hypre’s MPI only version, and allowed the rest of Uintah to be optimized using OpenMP. Profiling of the GPU version of Hypre showed the bottleneck to be the launch overhead of thousands of micro-kernels. The GPU performance was improved by fusing these micro kernels and was further optimized by using Cuda-aware MPI. The overall speedup of 1.26x to 1.44x was observed compared to the baseline GPU implementation.



D. Sahasrabudhe, R. Zambre, A. Chandramowlishwaran, M. Berzins. “Optimizing the Hypre solver for manycore and GPU architectures,” In Journal of Computational Science, Springer International Publishing, pp. 101279. 2020.
ISBN: 978-3-030-50371-0
ISSN: 1877-7503
DOI: https://doi.org/10.1016/j.jocs.2020.101279

ABSTRACT

The solution of large-scale combustion problems with codes such as Uintah on modern computer architectures requires the use of multithreading and GPUs to achieve performance. Uintah uses a low-Mach number approximation that requires iteratively solving a large system of linear equations. The Hypre iterative solver has solved such systems in a scalable way for Uintah, but the use of OpenMP with Hypre leads to at least 2x slowdown due to OpenMP overheads. The proposed solution uses the MPI Endpoints within Hypre, where each team of threads acts as a different MPI rank. This approach minimizes OpenMP synchronization overhead and performs as fast or (up to 1.44x) faster than Hypre’s MPI-only version, and allows the rest of Uintah to be optimized using OpenMP. The profiling of the GPU version of Hypre shows the bottleneck to be the launch overhead of thousands of micro-kernels. The GPU performance was improved by fusing these micro-kernels and was further optimized by using Cuda-aware MPI, resulting in an overall speedup of 1.16–1.44x compared to the baseline GPU implementation.

The above optimization strategies were published in the International Conference on Computational Science 2020. This work extends the previously published research by carrying out the second phase of communication-centered optimizations in Hypre to improve its scalability on large-scale supercomputers. This includes an efficient non-blocking inter-thread communication scheme, communication-reducing patch assignment, and expression of logical communication parallelism to a new version of the MPICH library that utilizes the underlying network parallelism. The above optimizations avoid communication bottlenecks previously observed during strong scaling and improve performance by up to 2x on 256 nodes of Intel Knight’s Landing processor.


2019


M. Berzins. “Time Integration Errors and Energy Conservation Properties of the Stormer Verlet Method Applied to MPM,” In Proceedings of VI International Conference on Particle-based Methods – Fundamentals and Applications, Barcelona, Edited by E. O ̃ nate, M. Bischoff, D.R.J. Owen, P. Wriggers & T. Zohdi, PARTICLES 2019, pp. 555-566. October, 2019.
ISBN: 978-84-121101-1-1

ABSTRACT

The success of the Material Point Method (MPM) in solving many challenging problems nevertheless raises some open questions regarding the fundamental properties of the method such as the energy conservation since being addressed by Bardenhagen and by Love and Sulsky. Similarly while low order symplectic time integration techniques are used with MPM, higher order methods have not been used. For this reason the Stormer Verlet method, a popular and widely-used symplectic method is applied to MPM. Both the time integration error and the energy conservation properties of this method applied to MPM are considered. The method is shown to have locally third order accuracy of energy conservation in time. This is in contrast to the locally second order accuracy in energy conservation of the methods that are used in many MPM calculations. This third accuracy accuracy is demonstrated both locally and globally on a standard MPM test example.



J. K. Holmen, B. Peterson, A. Humphrey, D. Sunderland, O. H. Diaz-Ibarra, J. N. Thornock, M. Berzins. “Portably Improving Uintah's Readiness for Exascale Systems Through the Use of Kokkos,” SCI Institute, 2019.

ABSTRACT

Uncertainty and diversity in future HPC systems, including those for exascale, makes portable codebases desirable. To ease future ports, the Uintah Computational Framework has adopted the Kokkos C++ Performance Portability Library. This paper describes infrastructure advancements and performance improvements using partitioning functionality recently added to Kokkos within Uintah's MPI+Kokkos hybrid parallelism approach. Results are presented for two challenging calculations that have been refactored to support Kokkos::OpenMP and Kokkos::Cuda back-ends. These results demonstrate performance improvements up to (i) 2.66x when refactoring for portability, (ii) 81.59x when adding loop-level parallelism via Kokkos back-ends, and (iii) 2.63x when more eciently using a node. Good strong-scaling characteristics to 442,368 threads across 1728 Knights Landing processors are also shown. These improvements have been achieved with little added overhead (sub-millisecond, consuming up to 0.18% of per-timestep time). Kokkos adoption and refactoring lessons are also discussed.



J. K. Holmen, B. Peterson, M. Berzins. “An Approach for Indirectly Adopting a Performance Portability Layer in Large Legacy Codes,” In 2nd International Workshop on Performance, Portability, and Productivity in HPC (P3HPC), In conjunction with SC19, 2019.

ABSTRACT

Diversity among supported architectures in current and emerging high performance computing systems, including those for exascale, makes portable codebases desirable. Portability of a codebase can be improved using a performance portability layer to provide access to multiple underlying programming models through a single interface. Direct adoption of a performance portability layer, however, poses challenges for large pre-existing software frameworks that may need to preserve legacy code and/or adopt other programming models in the future. This paper describes an approach for indirect adoption that introduces a framework-specific portability layer between the application developer and the adopted performance portability layer to help improve legacy code support and long-term portability for future architectures and programming models. This intermediate layer uses loop-level, application-level, and build-level components to ease adoption of a performance portability layer in large legacy codebases. Results are shown for two challenging case studies using this approach to make portable use of OpenMP and CUDA via Kokkos in an asynchronous many-task runtime system, Uintah. These results show performance improvements up to 2.7x when refactoring for portability and 2.6x when more efficiently using a node. Good strong-scaling to 442,368 threads across 1,728 Knights Landing processors are also shown using MPI+Kokkos at scale.



A. Humphrey, M. Berzins. “An Evaluation of An Asynchronous Task Based Dataflow Approach For Uintah,” In 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), Vol. 2, pp. 652-657. July, 2019.
ISSN: 0730-3157
DOI: 10.1109/COMPSAC.2019.10282

ABSTRACT

The challenge of running complex physics code on the largest computers available has led to dataflow paradigms being explored. While such approaches are often applied at smaller scales, the challenge of extreme-scale data flow computing remains. The Uintah dataflow framework has consistently used dataflow computing at the largest scales on complex physics applications. At present Uintah contains two main dataflow models. Both are based upon asynchronous communication. One uses a static graph-based approach with asynchronous communication and the other uses a more dynamic approach that was introduced almost a decade ago. Subsequent changes within the Uintah runtime system combined with many more large scale experiments, has necessitated a reevaluation of these two approaches, comparing them in the context of large scale problems. While the static approach has worked well for some large-scale simulations, the dynamic approach is seen to offer performance improvements over the static case for a challenging fluid-structure interaction problem at large scale that involves fluid flow and a moving solid represented using particle method on an adaptive mesh.



D. Sahasrabudhe, M. Berzins, J. Schmidt. “Node failure resiliency for Uintah without checkpointing,” In Concurrency and Computation: Practice and Experience, pp. e5340. 2019.
DOI: doi:10.1002/cpe.5340

ABSTRACT

The frequency of failures in upcoming exascale supercomputers may well be greater than at present due to many-core architectures if component failure rates remain unchanged. This potential increase in failure frequency coupled with I/O challenges at exascale may prove problematic for current resiliency approaches such as checkpoint restarting, although the use of fast intermediate memory may help. Algorithm-Based Fault Tolerance (ABFT) using Adaptive Mesh Refinement (AMR) is one resiliency approach used to address these challenges. For adaptive mesh codes, a coarse mesh version of the solution may be used to restore the fine mesh solution. This paper addresses the implementation of the ABFT approach within the Uintah software framework: both at a software level within Uintah and in the data reconstruction method used for the recovery of lost data. This method has two problems: inaccuracies introduced during the reconstruction propagate forward in time, and the physical consistency of variables such as positivity or boundedness may be violated during interpolation. These challenges can be addressed by the combination of two techniques: 1. a fault-tolerant MPI implementation to recover from runtime node failures, and 2. high-order interpolation schemes to preserve the physical solution and reconstruct lost data. The approach considered here uses a "Limited Essentially Non-Oscillatory" (LENO) scheme along with AMR to rebuild the lost data without checkpointing using Uintah. Experiments were carried out using a fault-tolerant MPI - ULFM to recover from runtime failure, and LENO to recover data on patches belonging to failed ranks, while the simulation was continued to the end. Results show that this ABFT approach is up to 10x faster than the traditional checkpointing method. The new interpolation approach is more accurate than linear interpolation and not subject to the overshoots found in other interpolation methods.