Publications

SCI Publications

2025

E. Ghelichkhan, T. Tasdizen. “A Comparison of Object Detection and Phrase Grounding Models in Chest X-ray Abnormality Localization using Eye-tracking Data,” Subtitled “arXiv:2503.01037,” 2025.

ABSTRACT

Chest diseases rank among the most prevalent and dangerous global health issues. Object detection and phrase grounding deep learning models interpret complex radiology data to assist healthcare professionals in diagnosis. Object detection locates abnormalities for classes, while phrase grounding locates abnormalities for textual descriptions. This paper investigates how text enhances abnormality localization in chest X-rays by comparing the performance and explainability of these two tasks. To establish an explainability benchmark, we proposed an automatic pipeline to generate image regions for report sentences using radiologists’ eye-tracking data. The better performance (mIoU = 36% vs. 20%) and explainability (containment ratio 48% vs. 26%) of the phrase grounding model indicate that text is effective in enhancing chest X-ray abnormality localization.
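
The mIoU figure quoted above averages an intersection-over-union score across predicted and reference regions. As a minimal sketch, assuming axis-aligned boxes in `(x_min, y_min, x_max, y_max)` form and one prediction per reference (the paper may compute IoU on masks or match regions differently), the metric could look like this. The function names `box_iou` and `mean_iou` are illustrative only.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle, if any.
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mean_iou(predictions, references):
    """Average IoU over paired predictions and references (assumed 1:1)."""
    pairs = list(zip(predictions, references))
    return sum(box_iou(p, r) for p, r in pairs) / len(pairs)
```

Two 2x2 boxes offset by one unit in each direction overlap in a 1x1 square, so their IoU is 1 / (4 + 4 - 1) = 1/7.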

B. Hunt, E. Kwan, J. Bergquist, J. Brundage, B. Orkild, J. Dong, E. Paccione, K. Yazaki, R.S. MacLeod, D. Dosdall, T. Tasdizen, R. Ranjan. “Contrastive Pretraining Improves Deep Learning Classification of Endocardial Electrograms in a Preclinical Model,” In Heart Rhythm O2, Elsevier, 2025.
ISSN: 2666-5018
DOI: https://doi.org/10.1016/j.hroo.2025.01.008

ABSTRACT

Background

Rotors and focal ectopies, or “drivers,” are hypothesized mechanisms of persistent atrial fibrillation (AF). Machine learning algorithms have been employed to identify these drivers, but the limited size of current driver datasets constrains their performance.

Objective

We proposed that pretraining using unsupervised learning on a substantial dataset of unlabeled electrograms could enhance classifier accuracy when applied to a smaller driver dataset.

Methods

We utilized a SimCLR-based framework to pretrain a residual neural network on 113,000 unlabeled 64-electrode measurements from a canine model of AF. The network was then fine-tuned to identify drivers from intra-cardiac electrograms. Various augmentations, including cropping, Gaussian blurring, and rotation, were applied during pretraining to improve the robustness of the learned representations.

Results

Pretraining significantly improved driver detection accuracy compared to a non-pretrained network (80.8% vs. 62.5%). The pretrained network also demonstrated greater resilience to reductions in training dataset size, maintaining higher accuracy even with a 30% reduction in data. Grad-CAM analysis revealed that the network’s attention aligned well with manually annotated driver regions, suggesting that the network learned meaningful features for driver detection.

Conclusion

This study demonstrates that contrastive pretraining can enhance the accuracy of driver detection algorithms in AF. The findings support the broader application of transfer learning to other electrogram-based tasks, potentially improving outcomes in clinical electrophysiology.
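
SimCLR trains an encoder by pulling together the embeddings of two augmented views of the same sample and pushing apart all other samples in the batch, using the NT-Xent loss. The sketch below is a NumPy illustration of that loss only; the temperature value, batch layout, and function name `nt_xent_loss` are assumptions, and the abstract does not say whether the authors' SimCLR-based framework uses this exact form.

```python
import numpy as np


def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent loss for paired embeddings z1, z2 of shape (N, D).

    Row i of z1 and row i of z2 are two augmented views of the same sample.
    """
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)  # a view is never its own negative
    # Index of the positive partner for each of the 2N rows.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable log-softmax over each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(
        np.exp(sim - row_max).sum(axis=1, keepdims=True)
    )
    return -log_prob[np.arange(2 * n), pos].mean()
```

In practice the two inputs would come from the encoder applied to augmented electrograms (the abstract mentions cropping, Gaussian blurring, and rotation); here they are plain arrays.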

2024

J.A. Bergquist, B. Zenger, J. Brundage, R.S. MacLeod, T.J. Bunch, R. Shah, X. Ye, A. Lyons, M. Torre, R. Ranjan, T. Tasdizen, B.A. Steinberg. “Performance of Off-the-Shelf Machine Learning Architectures and Biases in Low Left Ventricular Ejection Fraction Detection,” In Heart Rhythm O2, Vol. 5, No. 9, pp. 644-654. 2024.

ABSTRACT

Background

Artificial intelligence–machine learning (AI-ML) has demonstrated the ability to extract clinically useful information from electrocardiograms (ECGs) not available using traditional interpretation methods. There exists an extensive body of AI-ML research in fields outside of cardiology including several open-source AI-ML architectures that can be translated to new problems in an “off-the-shelf” manner.

Objective

We sought to address the limited investigation of which if any of these off-the-shelf architectures could be useful in ECG analysis as well as how and when these AI-ML approaches fail.

Methods

We applied 6 off-the-shelf AI-ML architectures to detect low left ventricular ejection fraction (LVEF) in a cohort of ECGs from 24,868 patients. We assessed LVEF classification and explored patient characteristics associated with inaccurate (false positive or false negative) LVEF prediction.

Results

We found that all of these network architectures produced LVEF detection area under the receiver-operating characteristic curve values above 0.9 (averaged over 5 instances per network), with the ResNet 18 network performing the highest (average area under the receiver-operating characteristic curve of 0.917). We also observed that some patient-specific characteristics such as race, sex, and presence of several comorbidities were associated with lower LVEF prediction performance.

Conclusions

This demonstrates the ability of off-the-shelf AI-ML architectures to detect clinically useful information from ECGs with performance matching contemporary custom-built AI-ML architectures. We also highlighted the presence of possible biases in these AI-ML approaches in the context of patient characteristics. These findings should be considered in the pursuit of efficient and equitable deployment of AI-ML technologies moving forward.
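
The results above report the area under the receiver-operating characteristic curve averaged over 5 instances per network. As a rough sketch of that bookkeeping, the AUC can be computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, then averaged across runs. The function names and the quadratic pairwise loop are illustrative choices, not the authors' evaluation code.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. labels are 0 or 1.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_auc(runs):
    """Average AUC over several trained instances.

    runs is a list of (labels, scores) tuples, one per instance.
    """
    return sum(roc_auc(y, s) for y, s in runs) / len(runs)
```

The pairwise form is easy to verify by hand but scales with the product of class sizes; a rank-based implementation would be preferable for a cohort of tens of thousands of ECGs.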

J.A. Bergquist, D. Dade, B. Zenger, R.S. MacLeod, X. Ye, R. Ranjan, T. Tasdizen, B.A. Steinberg. “Machine Learning Prediction of Blood Potassium at Different Time Cutoffs,” In Computing in Cardiology 2024, 2024.

ABSTRACT

Because serum potassium and ECG morphology changes exhibit a well-understood connection, and the timeline of ECG changes can be relatively quick, there is motivation to explore the sensitivity of ML based prediction of serum potassium using 12 lead ECG data with respect to the time between the ECG and potassium readings.

We trained a convolutional neural network to classify abnormal (serum potassium above 5 mEq/L) vs normal (serum potassium between 4 and 5 mEq/L) from the ECG alone. We compared training with ECGs and potassium measurements filtered to be within 1 hour, 30 minutes, and 15 minutes of each other. We explored scenarios that both leveraged all available data at each time cutoff as well as restricted data to match training set sizes across the time cutoffs. For each case, we trained five separate instances of our neural network to account for variability.

The 1 hour cutoff with all data resulted in an average area under the receiver operator curve (AUC) of 0.850 and a weighted accuracy of 76.3%, 15 minutes resulted in 0.814, 72.5%, and 30 minutes. Truncating the training sets to the same size as the 15 minute cutoff results in comparable average accuracy and AUC for all. Our future studies will continue to explore the performance of ML potassium predictions through investigations of failure cases, identification of biases, and explainability analyses.
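
The time-cutoff filtering described above can be sketched as follows: keep an ECG only if a potassium reading falls within the chosen window, and label it from that reading. This is a simplified illustration under stated assumptions: each ECG is paired with its nearest reading, boundary values of exactly 4 and 5 mEq/L are treated as normal, and the names `pairs_within` and `label_potassium` are hypothetical.

```python
from datetime import datetime, timedelta


def pairs_within(ecg_times, potassium_times, cutoff):
    """Pair each ECG with its nearest potassium reading, if within cutoff."""
    pairs = []
    for ecg in ecg_times:
        nearest = min(potassium_times, key=lambda k: abs(k - ecg), default=None)
        if nearest is not None and abs(nearest - ecg) <= cutoff:
            pairs.append((ecg, nearest))
    return pairs


def label_potassium(value_meq_per_l):
    """1 = abnormal (above 5), 0 = normal (4 to 5), None = excluded."""
    if value_meq_per_l > 5.0:
        return 1
    if value_meq_per_l >= 4.0:  # inclusive bounds are an assumption
        return 0
    return None


# Example: one ECG with a potassium draw 20 minutes later.
ecg = [datetime(2024, 1, 1, 12, 0)]
draws = [datetime(2024, 1, 1, 12, 20)]
kept_1h = pairs_within(ecg, draws, timedelta(hours=1))
kept_15m = pairs_within(ecg, draws, timedelta(minutes=15))
```

Tightening the cutoff shrinks the usable dataset, which is why the study also compares cutoffs with training sets truncated to the same size.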

A.M. Chalifoux, L. Gibb, K.N. Wurth, T. Tenner, T. Tasdizen, L. MacDonald. “Morphology of uranium oxides reduced from magnesium and sodium diuranate,” In Radiochimica Acta, Vol. 112, No. 2, pp. 73-84. 2024.

ABSTRACT

Morphological analysis of uranium materials has proven to be a key signature for nuclear forensic purposes. This study examines the morphological changes to magnesium diuranate (MDU) and sodium diuranate (SDU) during reduction in a 10 % hydrogen atmosphere with and without steam present. Impurity concentrations of the materials were also examined pre and post reduction using energy dispersive X-ray spectroscopy combined with scanning electron microscopy (SEM-EDX). The structures of the MDU, SDU, and UO_x samples were analyzed using powder X-ray diffraction (p-XRD). Using this method, UO_x from MDU was found to be a mixture of UO₂, U₄O₉, and MgU₂O₆ while UO_x from SDU were combinations of UO₂, U₄O₉, U₃O₈, and UO₃. By SEM, the MDU and UO_x from MDU had identical morphologies comprised of large agglomerates of rounded particles in an irregular pattern. SEM-EDX revealed pockets of high U and high Mg content distributed throughout the materials. The SDU and UO_x from SDU had slightly different morphologies. The SDU consisted of massive agglomerates of platy sheets with rough surfaces. The UO_x from SDU was comprised of massive agglomerates of acicular and sub-rounded particles that appeared slightly sintered. Backscatter images of SDU and related UO_x materials showed sub-rounded dark spots indicating areas of high Na content, especially in UO_x materials created in the presence of steam. SEM-EDX confirmed the presence of high sodium concentration spots in the SDU and UO_x from SDU. Elemental compositions were found to not change between pre and post reduction of MDU and SDU indicating that reduction with or without steam does not affect Mg or Na concentrations. The identification of Mg and Na impurities using SEM analysis presents a readily accessible tool in nuclear material analysis with high Mg and Na impurities likely indicating processing via MDU or SDU, respectively. 
Machine learning using convolutional neural networks (CNNs) found that the MDU and SDU had unique morphologies compared to previous publications and that there are distinguishing features between materials created with and without steam.

A. Ferrero, E. Ghelichkhan, H. Manoochehri, M.M. Ho, D.J. Albertson, B.J. Brintz, T. Tasdizen, R.T. Whitaker, B. Knudsen. “HistoEM: A Pathologist-Guided and Explainable Workflow Using Histogram Embedding for Gland Classification,” In Modern Pathology, Vol. 37, No. 4, 2024.

ABSTRACT

Pathologists have, over several decades, developed criteria for diagnosing and grading prostate cancer. However, this knowledge has not, so far, been included in the design of convolutional neural networks (CNN) for prostate cancer detection and grading. Further, it is not known whether the features learned by machine-learning algorithms coincide with diagnostic features used by pathologists. We propose a framework that enforces algorithms to learn the cellular and subcellular differences between benign and cancerous prostate glands in digital slides from hematoxylin and eosin–stained tissue sections. After accurate gland segmentation and exclusion of the stroma, the central component of the pipeline, named HistoEM, utilizes a histogram embedding of features from the latent space of the CNN encoder. Each gland is represented by 128 feature-wise histograms that provide the input into a second network for benign vs cancer classification of the whole gland. Cancer glands are further processed by a U-Net structured network to separate low-grade from high-grade cancer. Our model demonstrates similar performance compared with other state-of-the-art prostate cancer grading models with gland-level resolution. To understand the features learned by HistoEM, we first rank features based on the distance between benign and cancer histograms and visualize the tissue origins of the 2 most important features. A heatmap of pixel activation by each feature is generated using Grad-CAM and overlaid on nuclear segmentation outlines. We conclude that HistoEM, similar to pathologists, uses nuclear features for the detection of prostate cancer. Altogether, this novel approach can be broadly deployed to visualize computer-learned features in histopathology images.
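
To make the histogram embedding idea concrete, the sketch below turns per-pixel latent features of one gland into one normalized histogram per feature channel, then ranks channels by how far apart the benign and cancer histograms are. The bin count, value range, L1 distance, and both function names are assumptions made for illustration; the abstract specifies only that each gland is represented by 128 feature-wise histograms and that features are ranked by histogram distance.

```python
import numpy as np


def histogram_embedding(features, n_bins=16, value_range=(0.0, 1.0)):
    """Map (n_pixels, n_channels) features to (n_channels, n_bins) histograms.

    Each row is normalized to sum to 1, so glands of different sizes
    produce comparable embeddings.
    """
    n_channels = features.shape[1]
    hists = np.empty((n_channels, n_bins))
    for c in range(n_channels):
        counts, _ = np.histogram(features[:, c], bins=n_bins, range=value_range)
        hists[c] = counts / max(counts.sum(), 1)
    return hists


def rank_features(benign_hist, cancer_hist):
    """Order channels from most to least separated (L1 distance assumed)."""
    distance = np.abs(benign_hist - cancer_hist).sum(axis=1)
    return np.argsort(distance)[::-1]
```

A hard-binned histogram like this is not differentiable, so a network trained end to end would likely need a soft-binning variant.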

S. Ghorbany, M. Hu, S. Yao, C. Wang, Q.C. Nguyen, X. Yue, M. Alirezaei, T. Tasdizen, M. Sisk. “Examining the role of passive design indicators in energy burden reduction: Insights from a machine learning and deep learning approach,” In Building and Environment, Elsevier, 2024.

ABSTRACT

Passive design characteristics (PDC) play a pivotal role in reducing the energy burden on households without imposing additional financial constraints on project stakeholders. However, the scarcity of PDC data has posed a challenge in previous studies when assessing their energy-saving impact. To tackle this issue, this research introduces an innovative approach that combines deep learning-powered computer vision with machine learning techniques to examine the relationship between PDC and energy burden in residential buildings. In this study, we employ a convolutional neural network computer vision model to identify and measure key indicators, including window-to-wall ratio (WWR), external shading, and operable window types, using Google Street View images within the Chicago metropolitan area as our case study. Subsequently, we utilize the derived passive design features in conjunction with demographic characteristics to train and compare various machine learning methods. These methods encompass Decision Tree Regression, Random Forest Regression, and Support Vector Regression, culminating in the development of a comprehensive model for energy burden prediction. Our framework achieves a 74.2 % accuracy in forecasting the average energy burden. These results yield invaluable insights for policymakers and urban planners, paving the way toward the realization of smart and sustainable cities.

M.M. Ho, S. Dubey, Y. Chong, B. Knudsen, T. Tasdizen. “F2FLDM: Latent Diffusion Models with Histopathology Pre-Trained Embeddings for Unpaired Frozen Section to FFPE Translation,” Subtitled “arXiv:2404.12650v1,” 2024.

ABSTRACT

The Frozen Section (FS) technique is a rapid and efficient method, taking only 15-30 minutes to prepare slides for pathologists' evaluation during surgery, enabling immediate decisions on further surgical interventions. However, the FS process often introduces artifacts and distortions like folds and ice-crystal effects. In contrast, these artifacts and distortions are absent in the higher-quality formalin-fixed paraffin-embedded (FFPE) slides, which require 2-3 days to prepare. While Generative Adversarial Network (GAN)-based methods have been used to translate FS to FFPE images (F2F), they may leave morphological inaccuracies with remaining FS artifacts or introduce new artifacts, reducing the quality of these translations for clinical assessments. In this study, we benchmark recent generative models, focusing on GANs and Latent Diffusion Models (LDMs), to overcome these limitations. We introduce a novel approach that combines LDMs with Histopathology Pre-Trained Embeddings to enhance restoration of FS images. Our framework leverages LDMs conditioned by both text and pre-trained embeddings to learn meaningful features of FS and FFPE histopathology images. Through diffusion and denoising techniques, our approach not only preserves essential diagnostic attributes like color staining and tissue morphology but also proposes an embedding translation mechanism to better predict the targeted FFPE representation of input FS images. As a result, this work achieves a significant improvement in classification performance, with the Area Under the Curve rising from 81.99% to 94.64%, accompanied by an advantageous CaseFD. This work establishes a new benchmark for FS to FFPE image translation quality, promising enhanced reliability and accuracy in histopathology FS image analysis.

M.M. Ho, E. Ghelichkhan, Y. Chong, Y. Zhou, B.S. Knudsen, T. Tasdizen. “DISC: Latent Diffusion Models with Self-Distillation from Separated Conditions for Prostate Cancer Grading,” Subtitled “arXiv:2404.13097,” 2024.

ABSTRACT

Latent Diffusion Models (LDMs) can generate high-fidelity images from noise, offering a promising approach for augmenting histopathology images for training cancer grading models. While previous works successfully generated high-fidelity histopathology images using LDMs, the generation of image tiles to improve prostate cancer grading has not yet been explored. Additionally, LDMs face challenges in accurately generating admixtures of multiple cancer grades in a tile when conditioned by a tile mask. In this study, we train specific LDMs to generate synthetic tiles that contain multiple Gleason Grades (GGs) by leveraging pixel-wise annotations in input tiles. We introduce a novel framework named Self-Distillation from Separated Conditions (DISC) that generates GG patterns guided by GG masks. Finally, we deploy a training framework for pixel-level and slide-level prostate cancer grading, where synthetic tiles are effectively utilized to improve the cancer grading performance of existing models. As a result, this work surpasses previous works in two domains: 1) our LDMs enhanced with DISC produce more accurate tiles in terms of GG patterns, and 2) our training scheme, incorporating synthetic data, significantly improves the generalization of the baseline model for prostate cancer grading, particularly in challenging cases of rare GG5, demonstrating the potential of generative models to enhance cancer grading when data is limited.

J. Johnson, L. McDonald, T. Tasdizen. “Improving uranium oxide pathway discernment and generalizability using contrastive self-supervised learning,” In Computational Materials Science, Vol. 223, Elsevier, 2024.

ABSTRACT

In the field of Nuclear Forensics, there exists a plethora of different tools to aid investigators when performing analysis of unknown nuclear materials. Many of these tools offer visual representations of the uranium ore concentrate (UOC) materials that include complementary and contrasting information. In this paper, we present a novel technique drawing from state-of-the-art machine learning methods that allows information from scanning electron microscopy images (SEM) to be combined to create digital encodings of the material that can be used to determine the material’s processing route. Our technique can classify UOC processing routes with greater than 96% accuracy in a fraction of a second and can be adapted to unseen samples at similarly high accuracy. The technique’s high accuracy and speed allow forensic investigators to quickly get preliminary results, while generalization allows the model to be adapted to new materials or processing routes quickly without the need for complete retraining of the model.

V. Koppelmans, M.F.L. Ruitenberg, S.Y. Schaefer, J.B. King, J.M. Jacobo, B.P. Silvester, A.F. Mejia, J. van der Geest, J.M. Hoffman, T. Tasdizen, K. Duff. “Classification of Mild Cognitive Impairment and Alzheimer's Disease Using Manual Motor Measures,” In Neurodegener Dis, 2024.
DOI: 10.1159/000539800
PubMed ID: 38865972

ABSTRACT

Introduction: Manual motor problems have been reported in mild cognitive impairment (MCI) and Alzheimer's disease (AD), but the specific aspects that are affected, their neuropathology, and potential value for classification modeling are unknown. The current study examined if multiple measures of motor strength, dexterity, and speed are affected in MCI and AD, related to AD biomarkers, and are able to classify MCI or AD.

Methods: Fifty-three cognitively normal (CN), 33 amnestic MCI, and 28 AD subjects completed five manual motor measures: grip force, Trail Making Test A, spiral tracing, finger tapping, and a simulated feeding task. Analyses included: 1) group differences in manual performance; 2) associations between manual function and AD biomarkers (PET amyloid β, hippocampal volume, and APOE ε4 alleles); and 3) group classification accuracy of manual motor function using machine learning.

Results: Amnestic MCI and AD subjects exhibited slower psychomotor speed and AD subjects had weaker dominant hand grip strength than CN subjects. Performance on these measures was related to amyloid β deposition (both) and hippocampal volume (psychomotor speed only). Support vector classification well-discriminated control and AD subjects (area under the curve of 0.73 and 0.77 respectively), but poorly discriminated MCI from controls or AD.

Conclusion: Grip strength and spiral tracing appear preserved, while psychomotor speed is affected in amnestic MCI and AD. The association of motor performance with amyloid β deposition and atrophy could indicate that this is due to amyloid deposition in, and atrophy of, motor brain regions, which generally occurs later in the disease process. The promising discriminatory abilities of manual motor measures for AD emphasize their value alongside other cognitive and motor assessment outcomes in classification and prediction models, as well as potential enrichment of outcome variables in AD clinical trials.

X. Li, R. Mohammed, T. Mangin, S. Saha, K. Kelly, R.T. Whitaker, T. Tasdizen. “Joint Audio-Visual Idling Vehicle Detection with Streamlined Input Dependencies,” Subtitled “arXiv:2410.21170v1,” 2024.

ABSTRACT

Idling vehicle detection (IVD) can be helpful in monitoring and reducing unnecessary idling and can be integrated into real-time systems to address the resulting pollution and harmful products. The previous approach, a non-end-to-end model, requires extra user clicks to specify a part of the input, making system deployment more error-prone or even not feasible. In contrast, we introduce an end-to-end joint audio-visual IVD task designed to detect vehicles visually under three states: moving, idling and engine off. Unlike feature co-occurrence tasks such as audio-visual vehicle tracking, our IVD task addresses complementary features, where labels cannot be determined by a single modality alone. To this end, we propose AVIVD-Net, a novel network that integrates audio and visual features through a bidirectional attention mechanism. AVIVD-Net streamlines the input process by learning a joint feature space, reducing the deployment complexity of previous methods. Additionally, we introduce the AVIVD dataset, which is seven times larger than previous datasets, offering significantly more annotated samples to study the IVD problem. Our model achieves performance comparable to prior approaches, making it suitable for automated deployment. Furthermore, by evaluating AVIVD-Net on the feature co-occurrence public dataset MAVD, we demonstrate its potential for extension to self-driving vehicle video-camera setups.

H. Manoochehri, B. Zhang, B.S. Knudsen, T. Tasdizen. “PathMoCo: A Novel Framework to Improve Feature Embedding in Self-supervised Contrastive Learning for Histopathological Images,” Subtitled “arXiv:2410.17514,” 2024.

ABSTRACT

Self-supervised learning has become a cornerstone in various areas, particularly histopathological image analysis. Image augmentation plays a crucial role in self-supervised learning, as it generates variations in image samples. However, traditional image augmentation techniques often overlook the unique characteristics of histopathological images. In this paper, we propose a new histopathology-specific image augmentation method called stain reconstruction augmentation (SRA). We integrate our SRA with MoCo v3, a leading model in self-supervised contrastive learning, along with our additional contrastive loss terms, and call the new model SRA-MoCo v3. We demonstrate that our SRA-MoCo v3 always outperforms the standard MoCo v3 across various downstream tasks and achieves comparable or superior performance to other foundation models pre-trained on significantly larger histopathology datasets.

Q.C. Nguyen, T. Tasdizen, M. Alirezaei, H. Mane, X. Yue, J.S. Merchant, W. Yu, L. Drew, D. Li, T.T. Nguyen. “Neighborhood built environment, obesity, and diabetes: A Utah siblings study,” In SSM - Population Health, Vol. 26, 2024.

ABSTRACT

Background

This study utilizes innovative computer vision methods alongside Google Street View images to characterize neighborhood built environments across Utah.

Methods

Convolutional Neural Networks were used to create indicators of street greenness, crosswalks, and building type on 1.4 million Google Street View images. The demographic and medical profiles of Utah residents came from the Utah Population Database (UPDB). We implemented hierarchical linear models with individuals nested within zip codes to estimate associations between neighborhood built environment features and individual-level obesity and diabetes, controlling for individual- and zip code-level characteristics (n = 1,899,175 adults living in Utah in 2015). Sibling random effects models were implemented to account for shared family attributes among siblings (n = 972,150) and twins (n = 14,122).

Results

Consistent with prior neighborhood research, the variance partition coefficients (VPC) of our unadjusted models nesting individuals within zip codes were relatively small (0.5%–5.3%), except for HbA1c (VPC = 23%), suggesting a small percentage of the outcome variance is at the zip code-level. However, proportional change in variance (PCV) attributable to zip codes after the inclusion of neighborhood built environment variables and covariates ranged between 11% and 67%, suggesting that these characteristics account for a substantial portion of the zip code-level effects. Non-single-family homes (indicator of mixed land use), sidewalks (indicator of walkability), and green streets (indicator of neighborhood aesthetics) were associated with reduced diabetes and obesity. Zip codes in the third tertile for non-single-family homes were associated with a 15% reduction (PR: 0.85; 95% CI: 0.79, 0.91) in obesity and a 20% reduction (PR: 0.80; 95% CI: 0.70, 0.91) in diabetes. This tertile was also associated with a BMI reduction of −0.68 kg/m² (95% CI: −0.95, −0.40).

Conclusion

We observe associations between neighborhood characteristics and chronic diseases, accounting for biological, social, and cultural factors shared among siblings in this large population-based study.
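
The two variance summaries quoted in the results follow standard multilevel-model definitions: the VPC is the share of total variance that sits at the zip code level, and the PCV is the relative drop in zip code-level variance once predictors are added. A minimal sketch for a two-level linear model is below; the function names and the example variance values are made up for illustration and are not taken from the study.

```python
def variance_partition_coefficient(between_var, within_var):
    """Share of total variance attributable to the grouping level."""
    return between_var / (between_var + within_var)


def proportional_change_in_variance(null_between_var, adjusted_between_var):
    """Relative reduction in group-level variance after adding predictors."""
    return (null_between_var - adjusted_between_var) / null_between_var


# Hypothetical variance components from an unadjusted and an adjusted model.
vpc = variance_partition_coefficient(between_var=0.05, within_var=0.95)
pcv = proportional_change_in_variance(
    null_between_var=0.06, adjusted_between_var=0.02
)
```

A small VPC alongside a large PCV, as reported here, means zip codes explain little of the outcome overall, yet most of what they do explain is captured by the measured neighborhood characteristics.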

Q.C. Nguyen, M. Alirezaei, X. Yue, H. Mane, D. Li, L. Zhao, T.T. Nguyen, R. Patel, W. Yu, M. Hu, D. Quistberg, T. Tasdizen. “Leveraging computer vision for predicting collision risks: a cross-sectional analysis of 2019–2021 fatal collisions in the USA,” In Injury Prevention, BMJ, 2024.

ABSTRACT

Objective: The USA has higher rates of fatal motor vehicle collisions than most high-income countries. Previous studies examining the role of the built environment were generally limited to small geographic areas or single cities. This study aims to quantify associations between built environment characteristics and traffic collisions in the USA.

Methods: Built environment characteristics were derived from Google Street View images and summarised at the census tract level. Fatal traffic collisions were obtained from the 2019–2021 Fatality Analysis Reporting System. Fatal and non-fatal traffic collisions in Washington DC were obtained from the District Department of Transportation. Adjusted Poisson regression models examined whether built environment characteristics are related to motor vehicle collisions in the USA, controlling for census tract sociodemographic characteristics.

Results: Census tracts in the highest tertile of sidewalks, single-lane roads, streetlights and street greenness had 70%, 50%, 30% and 26% fewer fatal vehicle collisions compared with those in the lowest tertile. Street greenness and single-lane roads were associated with 37% and 38% fewer pedestrian-involved and cyclist-involved fatal collisions. Analyses with fatal and non-fatal collisions in Washington DC found streetlights and stop signs were associated with fewer pedestrian-involved and cyclist-involved vehicle collisions while road construction had an adverse association.

Conclusion: This study demonstrates the utility of using data algorithms that can automatically analyse street segments to create indicators of the built environment to enhance understanding of large-scale patterns and inform interventions to decrease road traffic injuries and fatalities.

D. Alex Quistberg, S.J. Mooney, T. Tasdizen, P. Arbelaez, Q.C. Nguyen. “Deep Learning-Methods to Amplify Epidemiological Data Collection and Analyses,” In American Journal of Epidemiology, Oxford University Press, 2024.

ABSTRACT

Deep learning is a subfield of artificial intelligence and machine learning based mostly on neural networks and often combined with attention algorithms that has been used to detect and identify objects in text, audio, images, and video. Serghiou and Rough (Am J Epidemiol. 0000;000(00):0000-0000) present a primer for epidemiologists on deep learning models. These models provide substantial opportunities for epidemiologists to expand and amplify their research in both data collection and analyses by increasing the geographic reach of studies, including more research subjects, and working with large or high dimensional data. The tools for implementing deep learning methods are not quite yet as straightforward or ubiquitous for epidemiologists as traditional regression methods found in standard statistical software, but there are exciting opportunities for interdisciplinary collaboration with deep learning experts, just as epidemiologists have with statisticians, healthcare providers, urban planners, and other professionals. Despite the novelty of these methods, epidemiological principles of assessing bias, study design, interpretation and others still apply when implementing deep learning methods or assessing the findings of studies that have used them.

X. Tang, B. Zhang, B.S. Knudsen, T. Tasdizen. “DuoFormer: Leveraging Hierarchical Visual Representations by Local and Global Attention,” Subtitled “arXiv:2407.13920,” 2024.

ABSTRACT

We here propose a novel hierarchical transformer model that adeptly integrates the feature extraction capabilities of Convolutional Neural Networks (CNNs) with the advanced representational potential of Vision Transformers (ViTs). Addressing the lack of inductive biases and dependence on extensive training datasets in ViTs, our model employs a CNN backbone to generate hierarchical visual representations. These representations are then adapted for transformer input through an innovative patch tokenization. We also introduce a ‘scale attention’ mechanism that captures cross-scale dependencies, complementing patch attention to enhance spatial understanding and preserve global perception. Our approach significantly outperforms baseline models on small and medium-sized medical datasets, demonstrating its efficiency and generalizability. The components are designed as plug-and-play for different CNN architectures and can be adapted for multiple applications.

X. Tang, J. Berquist, B.A. Steinberg, T. Tasdizen. “Hierarchical Transformer for Electrocardiogram Diagnosis,” Subtitled “arXiv:2411.00755,” 2024.

ABSTRACT

Transformers, originally prominent in NLP and computer vision, are now being adapted for ECG signal analysis. This paper introduces a novel hierarchical transformer architecture that segments the model into multiple stages by assessing the spatial size of the embeddings, thus eliminating the need for additional downsampling strategies or complex attention designs. A classification token aggregates information across feature scales, facilitating interactions between different stages of the transformer. By utilizing depth-wise convolutions in a six-layer convolutional encoder, our approach preserves the relationships between different ECG leads. Moreover, an attention gate mechanism learns associations among the leads prior to classification. This model adapts flexibly to various embedding networks and input sizes while enhancing the interpretability of transformers in ECG signal analysis.

T. Tasdizen. “VISTA: A Visual and Textual Attention Dataset for Interpreting Multimodal Models,” Subtitled “arXiv:2410.04609,” 2024.

ABSTRACT

The recent developments in deep learning (DL) led to the integration of natural language processing (NLP) with computer vision, resulting in powerful integrated Vision and Language Models (VLMs). Despite their remarkable capabilities, these models are frequently regarded as black boxes within the machine learning research community. This raises a critical question: which parts of an image correspond to specific segments of text, and how can we decipher these associations? Understanding these connections is essential for enhancing model transparency, interpretability, and trustworthiness. To answer this question, we present an image-text aligned human visual attention dataset that maps specific associations between image regions and corresponding text segments. We then compare the internal heatmaps generated by VL models with this dataset, allowing us to analyze and better understand the model’s decision-making process. This approach aims to enhance model transparency, interpretability, and trustworthiness by providing insights into how these models align visual and linguistic information. We conducted a comprehensive study on text-guided visual saliency detection in these VL models. This study aims to understand how different models prioritize and focus on specific visual elements in response to corresponding text segments, providing deeper insights into their internal mechanisms and improving our ability to interpret their outputs.

H.Y. Zewdie, O.L. Sarmiento, J.D. Pinzón, M.A. Wilches-Mogollon, P.A. Arbelaez, L. Baldovino-Chiquillo, D. Hidalgo, L. Guzman, S.J. Mooney, Q.C. Nguyen, T. Tasdizen, D.A. Quistberg. “Road Traffic Injuries and the Built Environment in Bogotá, Colombia, 2015–2019: A Cross-Sectional Analysis,” In Journal of Urban Health, Springer, 2024.

ABSTRACT

Nine in 10 road traffic deaths occur in low- and middle-income countries (LMICs). Despite this disproportionate burden, few studies have examined built environment correlates of road traffic injury in these settings, including in Latin America. We examined road traffic collisions in Bogotá, Colombia, occurring between 2015 and 2019, and assessed the association between neighborhood-level built environment features and pedestrian injury and death. We used descriptive statistics to characterize all police-reported road traffic collisions that occurred in Bogotá between 2015 and 2019. Cluster detection was used to identify spatial clustering of pedestrian collisions. Adjusted multivariate Poisson regression models were fit to examine associations between several neighborhood-built environment features and rate of pedestrian road traffic injury and death. A total of 173,443 police-reported traffic collisions occurred in Bogotá between 2015 and 2019. Pedestrians made up about 25% of road traffic injuries and 50% of road traffic deaths in Bogotá between 2015 and 2019. Pedestrian collisions were spatially clustered in the southwestern region of Bogotá. Neighborhoods with more street trees (RR, 0.90; 95% CI, 0.82–0.98), traffic signals (0.89, 0.81–0.99), and bus stops (0.89, 0.82–0.97) were associated with lower pedestrian road traffic deaths. Neighborhoods with greater density of large roads were associated with higher pedestrian injury. Our findings highlight the potential for pedestrian-friendly infrastructure to promote safer interactions between pedestrians and motorists in Bogotá and in similar urban contexts globally.

SCIENTIFIC COMPUTING AND IMAGING INSTITUTE at the University of Utah