Source author record

Hien Nguyen

Hien Nguyen appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

astro-ph.CO; Machine Learning; Artificial Intelligence; astro-ph.IM; eess.IV; Applications; astro-ph.GA; Computation; Computer Vision; Distributed, Parallel, and Cluster Computing; eess.SY; Information Retrieval; Methodology; physics.ins-det; Systems and Control

Catalog footprint

What is connected

14 works

15 topics

4 close collaborators

preprint, 2022, arXiv

CapsNet for Medical Image Segmentation

Convolutional Neural Networks (CNNs) have been successful in solving tasks in computer vision, including medical image segmentation, due to their ability to automatically extract features from unstructured data. However, CNNs are sensitive to rotation and affine transformation, and their success relies on huge-scale labeled datasets capturing various input variations. This network paradigm has posed challenges at scale because acquiring annotated data for medical segmentation is expensive and subject to strict privacy regulations. Furthermore, visual representation learning with CNNs has its own flaws, e.g., it is arguable that the pooling layer in traditional CNNs tends to discard positional information and that CNNs tend to fail on input images that differ in orientation and size. Capsule network (CapsNet) is a recent architecture that has achieved better robustness in representation learning by replacing pooling layers with dynamic routing and convolutional strides, and it has shown promising results on popular tasks such as classification, recognition, segmentation, and natural language processing. Unlike CNNs, which produce scalar outputs, CapsNet returns vector outputs, which aim to preserve part-whole relationships. In this work, we first introduce the limitations of CNNs and the fundamentals of CapsNet. We then present recent developments of CapsNet for the task of medical image segmentation. We finally discuss various effective network architectures to implement a CapsNet for both 2D image and 3D volumetric medical image segmentation.

preprint, 2022, arXiv

SPHERExLabTools (SLT): A Python Data Acquisition System for SPHEREx Characterization and Calibration

Selected as the next NASA Medium Class Explorer mission, SPHEREx, the Spectro-Photometer for the History of the Universe, Epoch of Reionization, and Ices Explorer is planned for launch in early 2025. SPHEREx calibration data products include detector spectral response, non-linearity, persistence, and telescope focus error measurements. To produce these calibration products, we have developed a dedicated data acquisition and instrument control system, SPHERExLabTools (SLT). SLT implements driver-level software for control of all testbed instrumentation, graphical interfaces for control of instruments and automated measurements, real-time data visualization, processing, and data archival tools for a variety of output file formats. This work outlines the architecture of the SLT software as a framework for general purpose laboratory data acquisition and instrument control. Initial SPHEREx calibration products acquired while using SLT are also presented.

preprint, 2021, arXiv

Explaining the data or explaining a model? Shapley values that uncover non-linear dependencies

Shapley values have become increasingly popular in the machine learning literature thanks to their attractive axiomatisation, flexibility, and uniqueness in satisfying certain notions of 'fairness'. The flexibility arises from the myriad potential forms of the Shapley value game formulation. Amongst the consequences of this flexibility is that there are now many types of Shapley values being discussed, with such variety being a source of potential misunderstanding. To the best of our knowledge, all existing game formulations in the machine learning and statistics literature fall into a category which we name the model-dependent category of game formulations. In this work, we consider an alternative and novel formulation which leads to the first instance of what we call model-independent Shapley values. These Shapley values use a (non-parametric) measure of non-linear dependence as the characteristic function. The strength of these Shapley values is in their ability to uncover and attribute non-linear dependencies amongst features. We introduce and demonstrate the use of the energy distance correlations, affine-invariant distance correlation, and Hilbert-Schmidt independence criterion as Shapley value characteristic functions. In particular, we demonstrate their potential value for exploratory data analysis and model diagnostics. We conclude with an interesting expository application to a classical medical survey data set.

preprint, 2021, arXiv

Shapley values for feature selection: The good, the bad, and the axioms

The Shapley value has become popular in the Explainable AI (XAI) literature, thanks, to a large extent, to a solid theoretical foundation, including four "favourable and fair" axioms for attribution in transferable utility games. The Shapley value is provably the only solution concept satisfying these axioms. In this paper, we introduce the Shapley value and draw attention to its recent uses as a feature selection tool. We call into question this use of the Shapley value, using simple, abstract "toy" counterexamples to illustrate that the axioms may work against the goals of feature selection. From this, we develop a number of insights that are then investigated in concrete simulation settings, with a variety of Shapley value formulations, including SHapley Additive exPlanations (SHAP) and Shapley Additive Global importancE (SAGE).

preprint, 2020, arXiv

DVNet: A Memory-Efficient Three-Dimensional CNN for Large-Scale Neurovascular Reconstruction

Maps of brain microarchitecture are important for understanding neurological function and behavior, including alterations caused by chronic conditions such as neurodegenerative disease. Techniques such as knife-edge scanning microscopy (KESM) provide the potential for whole organ imaging at sub-cellular resolution. However, multi-terabyte data sizes make manual annotation impractical and automatic segmentation challenging. Densely packed cells combined with interconnected microvascular networks are a challenge for current segmentation algorithms. The massive size of high-throughput microscopy data necessitates fast and largely unsupervised algorithms. In this paper, we investigate a fully-convolutional, deep, and densely-connected encoder-decoder for pixel-wise semantic segmentation. The excessive memory complexity often encountered with deep and dense networks is mitigated using skip connections, resulting in fewer parameters and enabling a significant performance increase over prior architectures. The proposed network provides superior performance for semantic segmentation problems applied to open-source benchmarks. We finally demonstrate our network for cellular and microvascular segmentation, enabling quantitative metrics for organ-scale neurovascular analysis.

preprint, 2020, arXiv

SARGDV: Efficient identification of groundwater-dependent vegetation using synthetic aperture radar

Groundwater depletion impacts the sustainability of numerous groundwater-dependent vegetation (GDV) communities globally, placing significant stress on their capacity to provide environmental and ecological support for flora, fauna, and anthropic benefits. Industries such as mining, agriculture, and plantations are heavily reliant on groundwater, the over-exploitation of which risks impacting groundwater regimes, quality, and accessibility for nearby GDVs. Cost-effective methods of GDV identification will enable strategic protection of these critical ecological systems, through improved and sustainable groundwater management by communities and industry. Recent application of synthetic aperture radar (SAR) earth observation data in Australia has demonstrated the utility of radar for identifying terrestrial groundwater-dependent ecosystems at scale. We propose a robust classification method to advance identification of GDVs at scale using processed SAR data products adapted from a recent method. The method includes the development of SARGDV, a binary classification model, which uses the extreme gradient boosting (XGBoost) algorithm in conjunction with three data cubes composed of Sentinel-1 SAR interferometric wide images. The images were collected as a one-year time series over Mount Gambier, a region in South Australia known to support GDVs. The SARGDV model demonstrated high performance for classifying GDVs with 77% precision, 76% true positive rate and 96% accuracy. This method may be used to support the protection of GDV communities globally by providing a long-term, cost-effective solution to identify GDVs over variable regions and climates, via the use of freely available, high-resolution, globally available Sentinel-1 SAR data sets.

preprint, 2020, arXiv

Shapley value confidence intervals for attributing variance explained

The coefficient of determination, the $R^2$, is often used to measure the variance explained by an affine combination of multiple explanatory covariates. An attribution of this explanatory contribution to each of the individual covariates is often sought in order to draw inference regarding the importance of each covariate with respect to the response phenomenon. A recent method for ascertaining such an attribution is via the game theoretic Shapley value decomposition of the coefficient of determination. Such a decomposition has the desirable efficiency, monotonicity, and equal treatment properties. Under a weak assumption that the joint distribution is pseudo-elliptical, we obtain the asymptotic normality of the Shapley values. We then utilize this result in order to construct confidence intervals and hypothesis tests regarding such quantities. Monte Carlo studies regarding our results are provided. We found that our asymptotic confidence intervals are computationally superior to competing bootstrap methods and are able to improve upon the performance of such intervals. In an expository application to Australian real estate price modelling, we employ Shapley value confidence intervals to identify significant differences between the explanatory contributions of covariates, between models, which otherwise share approximately the same $R^2$ value. These different models are based on real estate data from the same periods in 2019 and 2020, the latter covering the early stages of the arrival of the novel coronavirus, COVID-19.
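The Shapley decomposition of $R^2$ mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names `r_squared` and `shapley_r2` and the synthetic data are assumptions made for illustration, and the sketch covers only the exact point decomposition (it enumerates every subset of covariates), not the asymptotic confidence intervals the paper derives.

```python
from itertools import combinations
from math import factorial

import numpy as np


def r_squared(X, y, subset):
    """R^2 of an ordinary least-squares fit (with intercept) on the chosen columns."""
    if not subset:
        return 0.0  # intercept-only model explains no variance
    A = np.column_stack([np.ones(len(y)), X[:, list(subset)]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    total = y - y.mean()
    return 1.0 - (resid @ resid) / (total @ total)


def shapley_r2(X, y):
    """Exact Shapley attribution of R^2 to each covariate (hypothetical helper)."""
    d = X.shape[1]
    phi = np.zeros(d)
    for j in range(d):
        others = [k for k in range(d) if k != j]
        for size in range(d):
            # Shapley weight for coalitions of this size that exclude covariate j
            weight = factorial(size) * factorial(d - size - 1) / factorial(d)
            for S in combinations(others, size):
                phi[j] += weight * (r_squared(X, y, S + (j,)) - r_squared(X, y, S))
    return phi


# Synthetic data (an assumption for illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=500)

phi = shapley_r2(X, y)
full_r2 = r_squared(X, y, (0, 1, 2))
print(phi, phi.sum(), full_r2)
```

By the efficiency property named in the abstract, the attributions sum to the $R^2$ of the full model. The enumeration cost grows exponentially with the number of covariates, so this form is only practical for a handful of them.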

preprint, 2020, arXiv

Thermal Kinetic Inductance Detectors for millimeter-wave detection

Thermal Kinetic Inductance Detectors (TKIDs) combine the excellent noise performance of traditional bolometers with a radio frequency multiplexing architecture that enables the large detector counts needed for the next generation of millimeter-wave instruments. In this paper, we first discuss the expected noise sources in TKIDs and derive the limits where the phonon noise contribution dominates over the other detector noise terms: generation-recombination, amplifier, and two-level system (TLS) noise. Second, we characterize aluminum TKIDs in a dark environment. We present measurements of TKID resonators with quality factors of about $10^5$ at 80 mK. We also discuss the bolometer thermal conductance, heat capacity, and time constants. These were measured by the use of a resistor on the thermal island to excite the bolometers. These dark aluminum TKIDs demonstrate a noise equivalent power NEP = $2 \times 10^{-17}\,\mathrm{W}/\sqrt{\mathrm{Hz}}$, with a $1/f$ knee at 0.1 Hz, which provides background noise limited performance for ground-based telescopes observing at 150 GHz.

preprint, 2016, arXiv

Checkpointing to minimize completion time for Inter-dependent Parallel Processes on Volunteer Grids

Volunteer computing is being used successfully for large scale scientific computations. This research is in the context of Volpex, a programming framework that supports communicating parallel processes in a volunteer environment. Redundancy and checkpointing are combined to ensure consistent forward progress with Volpex in this unique execution environment characterized by heterogeneous failure prone nodes and interdependent replicated processes. An important parameter for optimizing performance with Volpex is the frequency of checkpointing. The paper presents a mathematical model to minimize the completion time for inter-dependent parallel processes running in a volunteer environment by finding a suitable checkpoint interval. Validation is performed with a sample real world application running on a pool of distributed volunteer nodes. The results indicate that the performance with our predicted checkpoint interval is fairly close to the best performance obtained empirically by varying the checkpoint interval.

preprint, 2016, arXiv

Science Impacts of the SPHEREx All-Sky Optical to Near-Infrared Spectral Survey: Report of a Community Workshop Examining Extragalactic, Galactic, Stellar and Planetary Science

SPHEREx is a proposed SMEX mission selected for Phase A. SPHEREx will carry out the first all-sky spectral survey and provide for every 6.2" pixel a spectrum between 0.75 and 4.18 $μ$m [with R$\sim$41.4] and 4.18 and 5.00 $μ$m [with R$\sim$135]. The SPHEREx team has proposed three specific science investigations to be carried out with this unique data set: cosmic inflation, interstellar and circumstellar ices, and the extra-galactic background light. It is readily apparent, however, that many other questions in astrophysics and planetary sciences could be addressed with the SPHEREx data. The SPHEREx team convened a community workshop in February 2016, with the intent of enlisting the aid of a larger group of scientists in defining these questions. This paper summarizes the rich and varied menu of investigations that was laid out. It includes studies of the composition of main belt and Trojan/Greek asteroids; mapping the zodiacal light with unprecedented spatial and spectral resolution; identifying and studying very low-metallicity stars; improving stellar parameters in order to better characterize transiting exoplanets; studying aliphatic and aromatic carbon-bearing molecules in the interstellar medium; mapping star formation rates in nearby galaxies; determining the redshift of clusters of galaxies; identifying high redshift quasars over the full sky; and providing a NIR spectrum for most eROSITA X-ray sources. All of these investigations, and others not listed here, can be carried out with the nominal all-sky spectra to be produced by SPHEREx. In addition, the workshop defined enhanced data products and user tools which would facilitate some of these scientific studies. Finally, the workshop noted the high degrees of synergy between SPHEREx and a number of other current or forthcoming programs, including JWST, WFIRST, Euclid, GAIA, K2/Kepler, TESS, eROSITA and LSST.

preprint, 2015, arXiv

Cosmology with the SPHEREX All-Sky Spectral Survey

SPHEREx (Spectro-Photometer for the History of the Universe, Epoch of Reionization, and Ices Explorer) (http://spherex.caltech.edu) is a proposed all-sky spectroscopic survey satellite designed to address all three science goals in NASA's Astrophysics Division: probe the origin and destiny of our Universe; explore whether planets around other stars could harbor life; and explore the origin and evolution of galaxies. SPHEREx will scan a series of Linear Variable Filters systematically across the entire sky. The SPHEREx data set will contain R=40 spectra for 0.75$<λ<$4.1$μ$m and R=150 spectra for 4.1$<λ<$4.8$μ$m for every 6.2 arc second pixel over the entire sky. In this paper, we detail the extra-galactic and cosmological studies SPHEREx will enable and present detailed systematic effect evaluations. We also outline the Ice and Galaxy Evolution Investigations.

preprint, 2013, arXiv

The Decision-Theoretic Interactive Video Advisor

The need to help people choose among large numbers of items and to filter through large amounts of information has led to a flood of research in construction of personal recommendation agents. One of the central issues in constructing such agents is the representation and elicitation of user preferences or interests. This topic has long been studied in Decision Theory, but surprisingly little work in the area of recommender systems has made use of formal decision-theoretic techniques. This paper describes DIVA, a decision-theoretic agent for recommending movies that contains a number of novel features. DIVA represents user preferences using pairwise comparisons among items, rather than numeric ratings. It uses a novel similarity measure based on the concept of the probability of conflict between two orderings of items. The system has a rich representation of preference, distinguishing between a user's general taste in movies and his immediate interests. It takes an incremental approach to preference elicitation in which the user can provide feedback if not satisfied with the recommendation list. We empirically evaluate the performance of the system using the EachMovie collaborative filtering database.

preprint, 2010, arXiv

A high signal to noise ratio map of the Sunyaev-Zel'dovich increment at 1.1 mm wavelength in Abell 1835

We present an analysis of an 8 arcminute diameter map of the area around the galaxy cluster Abell 1835 from jiggle map observations at a wavelength of 1.1 mm using the Bolometric Camera (Bolocam) mounted on the Caltech Submillimeter Observatory (CSO). The data is well described by a model including an extended Sunyaev-Zel'dovich (SZ) signal from the cluster gas plus emission from two bright background submm galaxies magnified by the gravitational lensing of the cluster. The best-fit values for the central Compton value for the cluster and the fluxes of the two main point sources in the field: SMM J140104+0252, and SMM J14009+0252 are found to be $y_{0}=(4.34\pm0.52\pm0.69)\times10^{-4}$, 6.5$\pm{2.0}\pm0.7$ mJy and 11.3$\pm{1.9}\pm1.1$ mJy, where the first error represents the statistical measurement error and the second error represents the estimated systematic error in the result. This measurement assumes the presence of dust emission from the cluster's central cD galaxy of $1.8\pm0.5$ mJy, based on higher frequency observations of Abell 1835. The cluster image represents one of the highest-significance SZ detections of a cluster in the positive region of the thermal SZ spectrum to date. The inferred central intensity is compared to other SZ measurements of Abell 1835 and this collection of results is used to obtain values for $y_{0} = (3.60\pm0.24)\times10^{-4}$ and the cluster peculiar velocity $v_{z} = -226\pm275$ km/s.

preprint, 2010, arXiv

The Herschel-SPIRE Legacy Survey (HSLS): the scientific goals of a shallow and wide submillimeter imaging survey with SPIRE

A large sub-mm survey with Herschel will enable many exciting science opportunities, especially in an era of wide-field optical and radio surveys and high resolution cosmic microwave background experiments. The Herschel-SPIRE Legacy Survey (HSLS) will lead to imaging data over 4000 sq. degrees at 250, 350, and 500 micron. Major goals of HSLS are: (a) Produce a catalog of 2.5 to 3 million galaxies down to 26, 27 and 33 mJy (50% completeness; 5 sigma confusion noise) at 250, 350 and 500 micron, respectively, in the southern hemisphere (3000 sq. degrees) and in an equatorial strip (1000 sq. degrees), areas which have extensive multi-wavelength coverage and are easily accessible from ALMA. Two thirds of the sources are expected to be at z > 1, one third at z > 2 and about 1000 at z > 5. (b) Remove point source confusion in secondary anisotropy studies with Planck and ground-based CMB data. (c) Find at least 1200 strongly lensed bright sub-mm sources leading to a 2% test of general relativity. (d) Identify 200 proto-cluster regions at z of 2 and perform an unbiased study of the environmental dependence of star formation. (e) Perform an unbiased survey for star formation and dust at high Galactic latitude and make a census of debris disks and dust around AGB stars and white dwarfs.