SCI Publications

2022


J. Adams, N. Khan, A. Morris, S. Elhabian. “Spatiotemporal Cardiac Statistical Shape Modeling: A Data-Driven Approach,” Subtitled “arXiv preprint arXiv:2209.02736,” 2022.

ABSTRACT

Clinical investigations of anatomy’s structural changes over time could greatly benefit from population-level quantification of shape, or spatiotemporal statistical shape modeling (SSM). Such a tool enables characterizing patient organ cycles or disease progression in relation to a cohort of interest. Constructing shape models requires establishing a quantitative shape representation (e.g., corresponding landmarks). Particle-based shape modeling (PSM) is a data-driven SSM approach that captures population-level shape variations by optimizing landmark placement. However, it assumes cross-sectional study designs and hence has limited statistical power in representing shape changes over time. Existing methods for modeling spatiotemporal or longitudinal shape changes require predefined shape atlases and pre-built shape models that are typically constructed cross-sectionally. This paper proposes a data-driven approach inspired by the PSM method to learn population-level spatiotemporal shape changes directly from shape data. We introduce a novel SSM optimization scheme that produces landmarks that are in correspondence both across the population (inter-subject) and across time-series (intra-subject). We apply the proposed method to 4D cardiac data from atrial-fibrillation patients and demonstrate its efficacy in representing the dynamic change of the left atrium. Furthermore, we show that our method outperforms an image-based approach for spatiotemporal SSM with respect to a generative time-series model, the Linear Dynamical System (LDS). LDS fit using a spatiotemporal shape model optimized via our approach provides better generalization and specificity, indicating it accurately captures the underlying time-dependency.



M. Alirezaei, T. Tasdizen. “Adversarially Robust Classification by Conditional Generative Model Inversion,” Subtitled “arXiv preprint arXiv:2201.04733,” 2022.

ABSTRACT

Most adversarial attack defense methods rely on obfuscating gradients. These methods are successful in defending against gradient-based attacks; however, they are easily circumvented by attacks that either do not use the gradient or approximate and use the corrected gradient. Defenses that do not obfuscate gradients, such as adversarial training, exist, but these approaches generally make assumptions about the attack, such as its magnitude. We propose a classification model that does not obfuscate gradients and is robust by construction without assuming prior knowledge about the attack. Our method casts classification as an optimization problem where we "invert" a conditional generator trained on unperturbed, natural images to find the class that generates the closest sample to the query image. We hypothesize that a potential source of brittleness against adversarial attacks is the high-to-low-dimensional nature of feed-forward classifiers, which allows an adversary to find small perturbations in the input space that lead to large changes in the output space. On the other hand, a generative model is typically a low-to-high-dimensional mapping. While the method is related to Defense-GAN, the use of a conditional generative model and inversion in our model instead of the feed-forward classifier is a critical difference. Unlike Defense-GAN, which was shown to generate obfuscated gradients that are easily circumvented, we show that our method does not obfuscate gradients. We demonstrate that our model is extremely robust against black-box attacks and has improved robustness against white-box attacks compared to naturally trained, feed-forward classifiers.



T. M. Athawale, D. Maljovec, L. Yan, C. R. Johnson, V. Pascucci, B. Wang. “Uncertainty Visualization of 2D Morse Complex Ensembles Using Statistical Summary Maps,” In IEEE Transactions on Visualization and Computer Graphics, Vol. 28, No. 4, pp. 1955-1966. April, 2022.
ISSN: 1077-2626
DOI: 10.1109/TVCG.2020.3022359

ABSTRACT

Morse complexes are gradient-based topological descriptors with close connections to Morse theory. They are widely applicable in scientific visualization as they serve as important abstractions for gaining insights into the topology of scalar fields. Data uncertainty inherent to scalar fields due to randomness in their acquisition and processing, however, limits our understanding of Morse complexes as structural abstractions. We, therefore, explore uncertainty visualization of an ensemble of 2D Morse complexes that arises from scalar fields coupled with data uncertainty. We propose several statistical summary maps as new entities for quantifying structural variations and visualizing positional uncertainties of Morse complexes in ensembles. Specifically, we introduce three types of statistical summary maps – the probabilistic map, the significance map, and the survival map – to characterize the uncertain behaviors of gradient flows. We demonstrate the utility of our proposed approach using wind, flow, and ocean eddy simulation datasets.



J. Baker, E. Cherkaev, A. Narayan, B. Wang. “Learning POD of Complex Dynamics Using Heavy-ball Neural ODEs,” Subtitled “arXiv:2202.12373,” 2022.

ABSTRACT

Proper orthogonal decomposition (POD) allows reduced-order modeling of complex dynamical systems at a substantial level, while maintaining a high degree of accuracy in modeling the underlying dynamical systems. Advances in machine learning algorithms enable learning POD-based dynamics from data and making accurate and fast predictions of dynamical systems. In this paper, we leverage the recently proposed heavy-ball neural ODEs (HBNODEs) [Xia et al. NeurIPS, 2021] for learning data-driven reduced-order models (ROMs) in the POD context, in particular, for learning dynamics of time-varying coefficients generated by the POD analysis on training snapshots generated from solving full order models. HBNODE enjoys several practical advantages for learning POD-based ROMs with theoretical guarantees, including 1) HBNODE can learn long-term dependencies effectively from sequential observations and 2) HBNODE is computationally efficient in both training and testing. We compare HBNODE with other popular ROMs on several complex dynamical systems, including the von Kármán Street flow, the Kurganov-Petrova-Popov equation, and the one-dimensional Euler equations for fluids modeling.



J. Baker, H. Xia, Y. Wang, E. Cherkaev, A. Narayan, L. Chen, J. Xin, A. L. Bertozzi, S. J. Osher, B. Wang. “Proximal Implicit ODE Solvers for Accelerating Learning Neural ODEs,” Subtitled “arXiv preprint arXiv:2204.08621,” 2022.

ABSTRACT

Learning neural ODEs often requires solving very stiff ODE systems, primarily using explicit adaptive step size ODE solvers. These solvers are computationally expensive, requiring the use of tiny step sizes for numerical stability and accuracy guarantees. This paper considers learning neural ODEs using implicit ODE solvers of different orders leveraging proximal operators. The proximal implicit solver consists of inner-outer iterations: the inner iterations approximate each implicit update step using a fast optimization algorithm, and the outer iterations solve the ODE system over time. The proximal implicit ODE solver guarantees superiority over explicit solvers in numerical stability and computational efficiency. We validate the advantages of proximal implicit solvers over existing popular neural ODE solvers on various challenging benchmark tasks, including learning continuous-depth graph neural networks and continuous normalizing flows.
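
The inner-outer structure described above can be sketched on a toy problem. The following is a minimal illustration, not the paper's solver: it assumes a scalar gradient flow y' = -f'(y) with f(y) = λy²/2, for which one backward Euler step equals the proximal step argmin_z f(z) + (z - y)²/(2h). The function names, the stiffness value, and the plain gradient-descent inner loop are all assumptions made for illustration.

```python
def explicit_euler(grad_f, y0, h, steps):
    """Forward Euler for y' = -grad_f(y); unstable when h is too large for a stiff problem."""
    y = y0
    for _ in range(steps):
        y = y - h * grad_f(y)
    return y


def proximal_backward_euler(grad_f, y0, h, steps, inner_lr, inner_iters=50):
    """Backward Euler for y' = -grad_f(y), with each implicit step solved as a proximal problem."""
    y = y0
    for _ in range(steps):  # outer iterations: march the ODE forward in time
        z = y
        for _ in range(inner_iters):  # inner iterations: minimize f(z) + (z - y)^2 / (2h)
            z = z - inner_lr * (grad_f(z) + (z - y) / h)
        y = z
    return y


LAM = 1000.0  # stiffness of the toy problem (hypothetical value)
H = 0.01      # step size, far above the explicit stability limit 2 / LAM


def grad_f(y):
    """Gradient of f(y) = LAM * y^2 / 2."""
    return LAM * y


y_explicit = explicit_euler(grad_f, 1.0, H, 10)
# The inner objective has curvature LAM + 1/H, so a step of 0.5 / (LAM + 1/H) contracts safely.
y_implicit = proximal_backward_euler(grad_f, 1.0, H, 10, inner_lr=0.5 / (LAM + 1.0 / H))
```

With this step size the explicit iterate grows in magnitude at every step, while the proximal iterate decays toward zero like the true solution.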



W. Bangerth, C. R. Johnson, D. K. Njeru, B. van Bloemen Waanders. “Estimating and using information in inverse problems,” Subtitled “arXiv:2208.09095,” 2022.

ABSTRACT

For inverse problems one attempts to infer spatially variable functions from indirect measurements of a system. To practitioners of inverse problems, the concept of “information” is familiar when discussing key questions such as which parts of the function can be inferred accurately and which cannot. For example, it is generally understood that we can identify system parameters accurately only close to detectors, or along ray paths between sources and detectors, because we have “the most information” for these places.

Although referenced in many publications, the “information” that is invoked in such contexts is not a well understood and clearly defined quantity. Herein, we present a definition of information density that is based on the variance of coefficients as derived from a Bayesian reformulation of the inverse problem. We then discuss three areas in which this information density can be useful in practical algorithms for the solution of inverse problems, and illustrate the usefulness in one of these areas (how to choose the discretization mesh for the function to be reconstructed) using numerical experiments.



J. A. Bergquist, J. Coll-Font, B. Zenger, L. C. Rupp, W. W. Good, D. H. Brooks, R. S. MacLeod. “Reconstruction of cardiac position using body surface potentials,” In Computers in Biology and Medicine, Vol. 142, pp. 105174. 2022.
DOI: 10.1016/j.compbiomed.2021.105174

ABSTRACT

Electrocardiographic imaging (ECGI) is a noninvasive technique to assess the bioelectric activity of the heart which has been applied to aid in clinical diagnosis and management of cardiac dysfunction. ECGI is built on mathematical models that take into account several patient-specific factors, including the position of the heart within the torso. Errors in the localization of the heart within the torso, as might arise due to natural changes in heart position from respiration or changes in body position, contribute to errors in ECGI reconstructions of the cardiac activity, thereby reducing the clinical utility of ECGI. In this study we present a novel method for the reconstruction of cardiac geometry utilizing noninvasively acquired body surface potential measurements. Our geometric correction method simultaneously estimates the cardiac position over a series of heartbeats by leveraging an iterative approach which alternates between estimating the cardiac bioelectric source across all heartbeats and then estimating cardiac positions for each heartbeat. We demonstrate that our geometric correction method is able to reduce geometric error and improve ECGI accuracy in a wide range of testing scenarios. We examine the performance of our geometric correction method using different activation sequences, ranges of cardiac motion, and body surface electrode configurations. We find that, after geometric correction, the resulting ECGI solution accuracy is improved and the variability of the ECGI solutions between heartbeats is substantially reduced.



M. Berzins. “Energy conservation and accuracy of some MPM formulations,” In Computational Particle Mechanics, 2022.
DOI: 10.1007/s40571-021-00457-3

ABSTRACT

The success of the Material Point Method (MPM) in solving many challenging problems nevertheless raises some open questions regarding the fundamental properties of the method such as time integration accuracy and energy conservation. The traditional MPM time integration methods are often based upon the symplectic Euler method or staggered central differences. This raises the question of how to best ensure energy conservation in explicit time integration for MPM. Two approaches are used here: one is to extend the symplectic Euler method (Cromer Euler) to provide better energy conservation, and the second is to use a potentially more accurate symplectic method, namely the widely used Stormer-Verlet method. The Stormer-Verlet method is shown to have locally third-order accuracy of energy conservation in time, in contrast to the second-order accuracy in energy conservation of the symplectic Euler methods that are used in many MPM calculations. It is shown that there is an extension to the symplectic Euler stress-last method that provides better energy conservation that is comparable with the Stormer-Verlet method. This extension is referred to as TRGIMP and also has third-order accuracy in energy conservation. When the interactions between space and time errors are studied, it is seen that spatial errors may dominate in computed quantities such as displacement and velocity. This connection between the local errors in space and time is made explicit mathematically and explains the observed results that displacement and velocity errors are very similar for both methods. The observed and theoretically predicted third-order energy conservation accuracy and computational costs are demonstrated on a standard MPM test example.
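
The difference in energy behavior between the two integrator families can be seen on a problem far simpler than MPM. The sketch below assumes a unit-mass, unit-stiffness harmonic oscillator (an illustrative choice, not the paper's test case; no grid or material points are involved) and compares the symplectic (Cromer) Euler update with the Stormer-Verlet update. Function names are hypothetical. The locally third-order and second-order energy errors quoted above show up here as second-order and first-order errors over a fixed time interval.

```python
def energy(x, v, k=1.0, m=1.0):
    """Total energy of a harmonic oscillator."""
    return 0.5 * m * v * v + 0.5 * k * x * x


def symplectic_euler(x, v, dt, steps, k=1.0, m=1.0):
    """Cromer Euler: update velocity first, then position with the new velocity."""
    energies = []
    for _ in range(steps):
        v = v + dt * (-k * x / m)
        x = x + dt * v
        energies.append(energy(x, v, k, m))
    return energies


def stormer_verlet(x, v, dt, steps, k=1.0, m=1.0):
    """Stormer-Verlet in kick-drift-kick (velocity Verlet) form."""
    energies = []
    for _ in range(steps):
        v_half = v + 0.5 * dt * (-k * x / m)
        x = x + dt * v_half
        v = v_half + 0.5 * dt * (-k * x / m)
        energies.append(energy(x, v, k, m))
    return energies


def max_energy_error(integrator, dt, t_end=10.0):
    """Largest deviation from the initial energy over [0, t_end], starting at x=1, v=0."""
    steps = round(t_end / dt)
    e0 = energy(1.0, 0.0)
    return max(abs(e - e0) for e in integrator(1.0, 0.0, dt, steps))


euler_ratio = max_energy_error(symplectic_euler, 0.01) / max_energy_error(symplectic_euler, 0.005)
verlet_ratio = max_energy_error(stormer_verlet, 0.01) / max_energy_error(stormer_verlet, 0.005)
```

Halving the time step should roughly halve the energy error for the Euler variant and roughly quarter it for Stormer-Verlet, which is what the two ratios measure.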



M. Berzins. “Computational Error Estimation for The Material Point Method,” 2022.

ABSTRACT

A common feature of many methods in computational mechanics is that there is often a way of estimating the error in the computed solution. The situation for computational mechanics codes based upon the Material Point Method is very different in that there has been comparatively little work on computable error estimates for these methods. This work is concerned with introducing such an approach for the Material Point Method. Although it has been observed that spatial errors may dominate temporal ones at stable time steps, recent work has made more precise the sources and forms of the different MPM errors. There is then a need to estimate these errors computationally through computable estimates of the different errors in the material point method. Estimates of the different spatial errors in the Material Point Method are constructed based upon nodal derivatives of the different physical variables in MPM. These derivatives are then estimated using standard difference approximations calculated on the background mesh. The use of these estimates of the spatial error makes it possible to measure the growth of errors over time. A number of computational experiments are used to illustrate the performance of the computed error estimates. As the key feature of the approach is the calculation of derivatives on the regularly spaced background mesh, the extension to calculating derivatives and hence to error estimates for higher dimensional problems is clearly possible.



J.D. Blum, J. Beiriger, C. Kalmar, R.A. Avery, S. Lang, D.F. Villavisanis, L. Cheung, D.Y. Cho, W. Tao, R. Whitaker, S.P. Bartlett, J.A. Taylor, J.A. Goldstein, J.W. Swanson. “Relating Metopic Craniosynostosis Severity to Intracranial Pressure,” In The Journal of Craniofacial Surgery, 2022.
DOI: 10.1097/SCS.0000000000008748

ABSTRACT

Purpose:

A subset of patients with metopic craniosynostosis are noted to have elevated intracranial pressure (ICP). However, it is not known if the propensity for elevated ICP is influenced by the severity of metopic cranial dysmorphology.

Methods:

Children with nonsyndromic single-suture metopic synostosis were prospectively enrolled and underwent optical coherence tomography to measure optic nerve head morphology. Preoperative head computed tomography scans were assessed for endocranial bifrontal angle as well as scaled metopic synostosis severity score (MSS) and cranial morphology deviation score determined by CranioRate, an automated severity classifier.

Results:

Forty-seven subjects were enrolled between 2014 and 2019, at an average age of 8.5 months at preoperative computed tomography and 11.8 months at index procedure. Fourteen patients (29.7%) had elevated optical coherence tomography parameters suggestive of elevated ICP at the time of surgery. Ten patients (21.3%) had been diagnosed with developmental delay, eight of whom demonstrated elevated ICP. There were no significant associations between measures of metopic severity and ICP. Metopic synostosis severity score and endocranial bifrontal angle were inversely correlated, as expected (r=−0.545, P<0.001). A negative correlation was noted between MSS and formally diagnosed developmental delay (r=−0.387, P=0.008). Likewise, negative correlations between age at procedure and both MSS and cranial morphology deviation were observed (r=−0.573, P<0.001 and r=−0.312, P=0.025, respectively).

Conclusions:

Increased metopic severity was not associated with elevated ICP at the time of surgery. Patients who underwent later surgical correction showed milder phenotypic dysmorphology with an increased incidence of developmental delay.



M. K. Bruce, W. Tao, J. Beiriger, C. Christensen, M. J. Pfaff, R. Whitaker, J. A. Goldstein. “3D Photography to Quantify the Severity of Metopic Craniosynostosis,” In The Cleft Palate-Craniofacial Journal, SAGE Publications, 2022.

ABSTRACT

Objective

This study aims to determine the utility of 3D photography for evaluating the severity of metopic craniosynostosis (MCS) using a validated, supervised machine learning (ML) algorithm.

Design/Setting/Patients

This single-center retrospective cohort study included patients who were evaluated at our tertiary care center for MCS from 2016 to 2020 and underwent both head CT and 3D photography within a 2-month period.

Main Outcome Measures

The analysis method builds on our previously established ML algorithm for evaluating MCS severity using skull shape from CT scans. In this study, we regress the model to analyze 3D photographs and correlate the severity scores from both imaging modalities.

Results

14 patients met inclusion criteria, 64.3% male (n = 9). The mean age in years at 3D photography and CT imaging was 0.97 and 0.94, respectively. Ten patient images were obtained preoperatively, and 4 patients did not require surgery. The severity prediction of the ML algorithm correlates closely when comparing the 3D photographs to CT bone data (Spearman correlation coefficient [SCC] r = 0.75; Pearson correlation coefficient [PCC] r = 0.82).

Conclusion

The results of this study show that 3D photography is a valid alternative to CT for evaluation of head shape in MCS. Its use will provide an objective, quantifiable means of assessing outcomes in a rigorous manner while decreasing radiation exposure in this patient population.



N. Cheng, O.A. Malik, Y. Xu, S. Becker, A. Doostan, A. Narayan. “Quadrature Sampling of Parametric Models with Bi-fidelity Boosting,” Subtitled “arXiv:2209.05705v1,” 2022.

ABSTRACT

Least squares regression is a ubiquitous tool for building emulators (a.k.a. surrogate models) of problems across science and engineering for purposes such as design space exploration and uncertainty quantification. When the regression data are generated using an experimental design process (e.g., a quadrature grid) involving computationally expensive models, or when the data size is large, sketching techniques have shown promise to reduce the cost of the construction of the regression model while ensuring accuracy comparable to that of the full data. However, random sketching strategies, such as those based on leverage scores, lead to regression errors that are random and may exhibit large variability. To mitigate this issue, we present a novel boosting approach that leverages cheaper, lower-fidelity data of the problem at hand to identify the best sketch among a set of candidate sketches. This in turn specifies the sketch of the intended high-fidelity model and the associated data. We provide theoretical analyses of this bi-fidelity boosting (BFB) approach and discuss the conditions the low- and high-fidelity data must satisfy for a successful boosting. In doing so, we derive a bound on the residual norm of the BFB sketched solution relating it to its ideal, but computationally expensive, high-fidelity boosted counterpart. Empirical results on both manufactured and PDE data corroborate the theoretical analyses and illustrate the efficacy of the BFB solution in reducing the regression error, as compared to the non-boosted solution.



H. Dai, M. Bauer, P.T. Fletcher, S.C. Joshi. “Deep Learning the Shape of the Brain Connectome,” Subtitled “arXiv preprint arXiv:2203.06122,” 2022.

ABSTRACT

To statistically study the variability and differences between normal and abnormal brain connectomes, a mathematical model of the neural connections is required. In this paper, we represent the brain connectome as a Riemannian manifold, which allows us to model neural connections as geodesics. We show for the first time how one can leverage deep neural networks to estimate a Riemannian metric of the brain that can accommodate fiber crossings and is a natural modeling tool to infer the shape of the brain from DWMRI. Our method achieves excellent performance in geodesic-white-matter-pathway alignment and tackles the long-standing issue in previous methods: the inability to recover the crossing fibers with high fidelity.



M. Dorier, Z. Wang, U. Ayachit, S. Snyder, R. Ross, M. Parashar. “Colza: Enabling Elastic In Situ Visualization for High-performance Computing Simulations,” In 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS), IEEE, pp. 538-548. 2022.
DOI: 10.1109/IPDPS53621.2022.00059

ABSTRACT

In situ analysis and visualization have grown increasingly popular for enabling direct access to data from high-performance computing (HPC) simulations. As a simulation progresses and interesting physical phenomena emerge, however, the data produced may become increasingly complex, and users may need to dynamically change the type and scale of in situ analysis tasks being carried out and consequently adapt the amount of resources allocated to such tasks. To date, none of the production in situ analysis frameworks offer such an elasticity feature, and for good reason: the assumption that the number of processes could vary during run time would force developers to rethink software and algorithms at every level of the in situ analysis stack. In this paper we present Colza, a data staging service with elastic in situ visualization capabilities. Colza relies on the widely used ParaView Catalyst in situ visualization framework and enables elasticity by replacing MPI with a custom collective communication library based on the Mochi suite of libraries. To the best of our knowledge, this work is the first to enable elastic in situ visualization capabilities for HPC applications on top of existing production analysis tools.



S. Fang, A. Narayan, R.M. Kirby, S. Zhe. “Bayesian Continuous-Time Tucker Decomposition,” In Proceedings of the 39th International Conference on Machine Learning, 2022.

ABSTRACT

Tensor decomposition is a dominant framework for multiway data analysis and prediction. Although practical data often contains timestamps for the observed entries, existing tensor decomposition approaches overlook or under-use this valuable time information. They either drop the timestamps or bin them into crude steps and hence ignore the temporal dynamics within each step, or they use simple parametric time coefficients. To overcome these limitations, we propose Bayesian Continuous-Time Tucker Decomposition (BCTT). We model the tensor-core of the classical Tucker decomposition as a time-varying function, and place a Gaussian process prior on it to flexibly estimate all kinds of temporal dynamics. In this way, our model maintains interpretability while being flexible enough to capture various complex temporal relationships between the tensor nodes. For efficient and high-quality posterior inference, we use the stochastic differential equation (SDE) representation of temporal GPs to build an equivalent state-space prior, which avoids huge kernel matrix computation and sparse/low-rank approximations. We then use Kalman filtering, RTS smoothing, and conditional moment matching to develop a scalable message-passing inference algorithm. We show the advantage of our method in simulation and several real-world applications.



A. Ferrero, B. Knudsen, D. Sirohi, R. Whitaker. “A Pathologist-Informed Workflow for Classification of Prostate Glands in Histopathology,” In Medical Optical Imaging and Virtual Microscopy Image Analysis, Springer Nature Switzerland, pp. 53-62. 2022.
DOI: 10.1007/978-3-031-16961-8_6

ABSTRACT

Pathologists diagnose and grade prostate cancer by examining tissue from needle biopsies on glass slides. The cancer's severity and risk of metastasis are determined by the Gleason grade, a score based on the organization and morphology of prostate cancer glands. For diagnostic work-up, pathologists first locate glands in the whole biopsy core, and, if they detect cancer, they assign a Gleason grade. This time-consuming process is subject to errors and significant inter-observer variability, despite strict diagnostic criteria. This paper proposes an automated workflow that follows pathologists' modus operandi, isolating and classifying multi-scale patches of individual glands in whole slide images (WSI) of biopsy tissues using distinct steps: (1) two fully convolutional networks segment epithelium versus stroma and gland boundaries, respectively; (2) a classifier network separates benign from cancer glands at high magnification; and (3) an additional classifier predicts the grade of each cancer gland at low magnification. Altogether, this process provides a gland-specific approach for prostate cancer grading that we compare against other machine-learning-based grading methods.



M. Grant, M. R. Kunz, K. Iyer, L. I. Held, T. Tasdizen, J. A. Aguiar, P. P. Dholabhai. “Integrating atomistic simulations and machine learning to design multi-principal element alloys with superior elastic modulus,” In Journal of Materials Research, Springer International Publishing, pp. 1-16. 2022.

ABSTRACT

Multi-principal element, high entropy alloys (HEAs) are an emerging class of materials that have found applications across the board. Owing to the multitude of possible candidate alloys, exploration and compositional design of HEAs for targeted applications is challenging since it necessitates a rational approach to identify compositions exhibiting enriched performance. Here, we report an innovative framework that integrates molecular dynamics and machine learning to explore a large chemical-configurational space for evaluating elastic modulus of equiatomic and non-equiatomic HEAs along primary crystallographic directions. Vital thermodynamic properties and machine learning features have been incorporated to establish fundamental relationships correlating Young’s modulus with Gibbs free energy, valence electron concentration, and atomic size difference. In HEAs, as the number of elements increases …



J. Gu, P. Davis, G. Eisenhauer, W. Godoy, A. Huebl, S. Klasky, M. Parashar, N. Podhorszki, F. Poeschel, J. Vay, L. Wan, R. Wang, K. Wu. “Organizing Large Data Sets for Efficient Analyses on HPC Systems,” In Journal of Physics: Conference Series, Vol. 2224, No. 1, IOP Publishing, pp. 012042. 2022.

ABSTRACT

Upcoming exascale applications could introduce significant data management challenges due to their large sizes, dynamic work distribution, and involvement of accelerators such as graphical processing units, GPUs. In this work, we explore the performance of reading and writing operations involving one such scientific application on two different supercomputers. Our tests showed that the Adaptable Input and Output System, ADIOS, was able to achieve speeds over 1TB/s, a significant fraction of the peak I/O performance on Summit. We also demonstrated the querying functionality in ADIOS could effectively support common selective data analysis operations, such as conditional histograms. In tests, this query mechanism was able to reduce the execution time by a factor of five. More importantly, ADIOS data management framework allows us to achieve these performance improvements with only a minimal amount …



M. Han, S. Sane, C. R. Johnson. “Exploratory Lagrangian-Based Particle Tracing Using Deep Learning,” In Journal of Flow Visualization and Image Processing, Begell, 2022.
DOI: 10.1615/JFlowVisImageProc.2022041197

ABSTRACT

Time-varying vector fields produced by computational fluid dynamics simulations are often prohibitively large and pose challenges for accurate interactive analysis and exploration. To address these challenges, reduced Lagrangian representations have been increasingly researched as a means to improve scientific time-varying vector field exploration capabilities. This paper presents a novel deep neural network-based particle tracing method to explore time-varying vector fields represented by Lagrangian flow maps. In our workflow, in situ processing is first utilized to extract Lagrangian flow maps, and deep neural networks then use the extracted data to learn flow field behavior. Using a trained model to predict new particle trajectories offers a fixed small memory footprint and fast inference. To demonstrate and evaluate the proposed method, we perform an in-depth study of performance using a well-known analytical data set, the Double Gyre. Our study considers two flow map extraction strategies, the impact of the number of training samples and integration durations on efficacy, evaluates multiple sampling options for training and testing, and informs hyperparameter settings. Overall, we find our method requires a fixed memory footprint of 10.5 MB to encode a Lagrangian representation of a time-varying vector field while maintaining accuracy. For post hoc analysis, loading the trained model costs only two seconds, significantly reducing the burden of I/O when reading data for visualization. Moreover, our parallel implementation can infer one hundred locations for each of two thousand new pathlines in 1.3 seconds using one NVIDIA Titan RTX GPU.
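
For readers unfamiliar with the Double Gyre, the sketch below traces a single reference pathline through its analytic velocity field using fourth-order Runge-Kutta integration. It is not the paper's neural network method; it only illustrates the kind of trajectory such a model would be trained to predict. The parameter values (A = 0.1, ε = 0.25, ω = 2π/10) are the commonly used defaults and are an assumption here, as are the function names, step size, and seed location.

```python
import math

# Commonly used Double Gyre parameters (assumed; the abstract does not list them).
A, EPS, OMEGA = 0.1, 0.25, 2.0 * math.pi / 10.0


def velocity(x, y, t):
    """Analytic Double Gyre velocity on the domain [0, 2] x [0, 1]."""
    a = EPS * math.sin(OMEGA * t)
    b = 1.0 - 2.0 * a
    f = a * x * x + b * x
    dfdx = 2.0 * a * x + b
    u = -math.pi * A * math.sin(math.pi * f) * math.cos(math.pi * y)
    v = math.pi * A * math.cos(math.pi * f) * math.sin(math.pi * y) * dfdx
    return u, v


def rk4_step(x, y, t, dt):
    """Advance one particle by one classical Runge-Kutta step."""
    k1 = velocity(x, y, t)
    k2 = velocity(x + 0.5 * dt * k1[0], y + 0.5 * dt * k1[1], t + 0.5 * dt)
    k3 = velocity(x + 0.5 * dt * k2[0], y + 0.5 * dt * k2[1], t + 0.5 * dt)
    k4 = velocity(x + dt * k3[0], y + dt * k3[1], t + dt)
    x_new = x + dt / 6.0 * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0])
    y_new = y + dt / 6.0 * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1])
    return x_new, y_new


def trace_pathline(x0, y0, t0=0.0, dt=0.1, steps=200):
    """Return the list of positions visited by a particle seeded at (x0, y0)."""
    path = [(x0, y0)]
    x, y, t = x0, y0, t0
    for _ in range(steps):
        x, y = rk4_step(x, y, t, dt)
        t += dt
        path.append((x, y))
    return path


pathline = trace_pathline(0.5, 0.5)
```

Because the velocity normal to the domain boundary vanishes, a seeded particle remains inside [0, 2] x [0, 1] for the whole trace.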



J.D. Hogue, R.M. Kirby, A. Narayan. “Dimensionality Reduction in Deep Learning via Kronecker Multi-layer Architectures,” Subtitled “arXiv:2204.04273,” 2022.

ABSTRACT

Deep learning using neural networks is an effective technique for generating models of complex data. However, training such models can be expensive when networks have large model capacity resulting from a large number of layers and nodes. For training in such a computationally prohibitive regime, dimensionality reduction techniques ease the computational burden, and allow implementations of more robust networks. We propose a novel type of such dimensionality reduction via a new deep learning architecture based on fast matrix multiplication of a Kronecker product decomposition; in particular our network construction can be viewed as a Kronecker product-induced sparsification of an "extended" fully connected network. Analysis and practical examples show that this architecture allows a neural network to be trained and implemented with a significant reduction in computational time and resources, while achieving a similar error level compared to a traditional feedforward neural network.
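
The saving behind Kronecker-structured layers rests on a standard identity: applying A ⊗ B to a vector never requires forming the large matrix. The sketch below is a generic illustration of that identity in plain Python, not the paper's architecture; the helper names and the small integer matrices are assumptions chosen so the comparison with the explicitly formed product is exact. With row-major reshaping of x into a matrix X, (A ⊗ B) x equals the row-major flattening of A X Bᵀ.

```python
def matmul(P, Q):
    """Dense matrix product of two list-of-lists matrices."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def transpose(M):
    """Transpose of a list-of-lists matrix."""
    return [list(row) for row in zip(*M)]


def kron(A, B):
    """Explicit Kronecker product, used here only as a reference for checking."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def kron_matvec(A, B, x):
    """Compute (A kron B) x without forming the Kronecker product."""
    q, s = len(A[0]), len(B[0])
    X = [x[j * s:(j + 1) * s] for j in range(q)]  # row-major reshape of x to q x s
    Y = matmul(matmul(A, X), transpose(B))        # A X B^T
    return [entry for row in Y for entry in row]  # row-major flatten


A = [[1, 2], [3, 4], [5, 6]]    # 3 x 2 factor
B = [[1, 0, 2], [0, 1, -1]]     # 2 x 3 factor
x = [1, 2, 3, 4, 5, 6]          # length 2 * 3

fast = kron_matvec(A, B, x)
dense = [sum(w * xi for w, xi in zip(row, x)) for row in kron(A, B)]
```

For factors of size p × q and r × s, the structured layer stores pq + rs weights instead of the pqrs entries of the equivalent dense matrix.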