Source author record

Thomas Yu

Thomas Yu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

eess.IV physics.med-ph Applications Artificial Intelligence Computation and Language math.CO math.DG math.NA Numerical Analysis

Catalog footprint

What is connected

6 works

9 topics

4 close collaborators


Preprint, 2022, arXiv

Slice estimation in diffusion MRI of neonatal and fetal brains in image and spherical harmonics domains using autoencoders

Diffusion MRI (dMRI) of the developing brain can provide valuable insights into the white matter development. However, slice thickness in fetal dMRI is typically high (i.e., 3-5 mm) to freeze the in-plane motion, which reduces the sensitivity of the dMRI signal to the underlying anatomy. In this study, we aim at overcoming this problem by using autoencoders to learn unsupervised efficient representations of brain slices in a latent space, using raw dMRI signals and their spherical harmonics (SH) representation. We first learn and quantitatively validate the autoencoders on the developing Human Connectome Project pre-term newborn data, and further test the method on fetal data. Our results show that the autoencoder in the signal domain better synthesized the raw signal. Interestingly, the fractional anisotropy and, to a lesser extent, the mean diffusivity, are best recovered in missing slices by using the autoencoder trained with SH coefficients. A comparison was performed with the same maps reconstructed using an autoencoder trained with raw signals, as well as conventional interpolation methods of raw signals and SH coefficients. From these results, we conclude that the recovery of missing/corrupted slices should be performed in the signal domain if the raw signal is aimed to be recovered, and in the SH domain if diffusion tensor properties (i.e., fractional anisotropy) are targeted. Notably, the trained autoencoders were able to generalize to fetal dMRI data acquired using a much smaller number of diffusion gradients and a lower b-value, where we qualitatively show the consistency of the estimated diffusion tensor maps.

Preprint, 2022, arXiv

The NLP Sandbox: an efficient model-to-data system to enable federated and unbiased evaluation of clinical NLP models

Objective: The evaluation of natural language processing (NLP) models for clinical text de-identification relies on the availability of clinical notes, which is often restricted due to privacy concerns. The NLP Sandbox is an approach for alleviating the lack of data and evaluation frameworks for NLP models by adopting a federated, model-to-data approach. This enables unbiased federated model evaluation without the need for sharing sensitive data from multiple institutions. Materials and Methods: We leveraged the Synapse collaborative framework, containerization software, and OpenAPI generator to build the NLP Sandbox (nlpsandbox.io). We evaluated two state-of-the-art NLP de-identification focused annotation models, Philter and NeuroNER, using data from three institutions. We further validated model performance using data from an external validation site. Results: We demonstrated the usefulness of the NLP Sandbox through de-identification clinical model evaluation. The external developer was able to incorporate their model into the NLP Sandbox template and provide user experience feedback. Discussion: We demonstrated the feasibility of using the NLP Sandbox to conduct a multi-site evaluation of clinical text de-identification models without the sharing of data. Standardized model and data schemas enable smooth model transfer and implementation. To generalize the NLP Sandbox, work is required on the part of data owners and model developers to develop suitable and standardized schemas and to adapt their data or model to fit the schemas. Conclusions: The NLP Sandbox lowers the barrier to utilizing clinical data for NLP model evaluation and facilitates federated, multi-site, unbiased evaluation of NLP models.

Preprint, 2020, arXiv

Model-Informed Machine Learning for Multi-component T2 Relaxometry

Recovering the T2 distribution from multi-echo T2 magnetic resonance (MR) signals is challenging but has high potential as it provides biomarkers characterizing the tissue micro-structure, such as the myelin water fraction (MWF). In this work, we propose to combine machine learning and aspects of parametric (fitting from the MRI signal using biophysical models) and non-parametric (model-free fitting of the T2 distribution from the signal) approaches to T2 relaxometry in brain tissue by using a multi-layer perceptron (MLP) for the distribution reconstruction. For training our network, we construct an extensive synthetic dataset derived from biophysical models in order to constrain the outputs with a priori knowledge of in vivo distributions. The proposed approach, called Model-Informed Machine Learning (MIML), takes as input the MR signal and directly outputs the associated T2 distribution. We evaluate MIML in comparison to non-parametric and parametric approaches on synthetic data, an ex vivo scan, and high-resolution scans of healthy subjects and a subject with Multiple Sclerosis. In synthetic data, MIML provides more accurate and noise-robust distributions. In real data, MWF maps derived from MIML exhibit the greatest conformity to anatomical scans, have the highest correlation to a histological map of myelin volume, and the best unambiguous lesion visualization and localization, with superior contrast between lesions and normal appearing tissue. In whole-brain analysis, MIML is 22 to 4980 times faster than non-parametric and parametric methods, respectively.

Preprint, 2020, arXiv

Numerical Methods for Biomembranes: conforming subdivision methods versus non-conforming PL methods

The Canham-Helfrich-Evans models of biomembranes consist of a family of geometric constrained variational problems. In this article, we compare two classes of numerical methods for these variational problems based on piecewise linear (PL) and subdivision surfaces (SS). Since SS methods are based on spline approximation and can be viewed as higher order versions of PL methods, one may expect that the only difference between the two methods is in the accuracy order. In this paper, we prove that a numerical method based on minimizing any one of the 'PL Willmore energies' proposed in the literature would fail to converge to a solution of the continuous problem, whereas a method based on minimization of the bona fide Willmore energy, well-defined for SS but not PL surfaces, succeeds. Motivated by this analysis, we propose also a regularization method for the PL method based on techniques from conformal geometry. We address a number of implementation issues crucial for the efficiency of our solver. A software package called Wmincon accompanies this article and provides parallel implementations of all the relevant geometric functionals. When combined with a standard constrained optimization solver, the geometric variational problems can then be solved numerically. To this end, we realize that some of the available optimization algorithms/solvers are capable of preserving symmetry, while others manage to break symmetry; we explore the consequences of this observation.

Preprint, 2020, arXiv

On the Uniqueness of Clifford Torus with Prescribed Isoperimetric Ratio

The Marques-Neves theorem asserts that among all the toroidal (i.e. genus 1) closed surfaces, the Clifford torus has the minimal Willmore energy $\int H^2 \, dA$. Since the Willmore energy is invariant under Möbius transformations, it can be shown that there is a one-parameter family, up to homotheties, of genus 1 Willmore minimizers. It is then a natural conjecture that such a minimizer is unique if one prescribes its isoperimetric ratio. In this article, we show that this conjecture can be reduced to the positivity question of a polynomial recurrence.

Preprint, 2016, arXiv

Reducing overfitting in challenge-based competitions

Over-fitting is a dreaded foe in challenge-based competitions. Because participants rely on public leaderboards to evaluate and refine their models, there is always the danger they might over-fit to the holdout data supporting the leaderboard. The recently published Ladder algorithm aims to address this problem by preventing the participants from exploiting willingly or inadvertently minor fluctuations in public leaderboard scores during model refinement. In this paper, we report a vulnerability of the Ladder that induces severe over-fitting of the leaderboard when the sample size is small. To circumvent this attack, we propose a variation of the Ladder that releases a bootstrapped estimate of the public leaderboard score instead of providing participants with a direct measure of performance. We also extend the scope of the Ladder to arbitrary performance metrics by relying on a more broadly applicable testing procedure based on the Bayesian bootstrap. Our method makes it possible to use a leaderboard, with the technical and social advantages that it provides, even in cases where data is scant.
