Researcher profile

David Carlson

David Carlson contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 19 - Unverified | Verification L1 | Unclaimed author
5 works
0 followers
6 topics
4 close collaborators

Published work

5 published items

Preprint, 2022, arXiv

AugmentedPCA: A Python Package of Supervised and Adversarial Linear Factor Models

Deep autoencoders are often extended with a supervised or adversarial loss to learn latent representations with desirable properties, such as greater predictivity of labels and outcomes or fairness with respect to a sensitive variable. Despite the ubiquity of supervised and adversarial deep latent factor models, these methods should demonstrate improvement over simpler linear approaches to be preferred in practice. This necessitates a reproducible linear analog that still adheres to an augmenting supervised or adversarial objective. We address this methodological gap by presenting methods that augment the principal component analysis (PCA) objective with either a supervised or an adversarial objective and provide analytic and reproducible solutions. We implement these methods in an open-source Python package, AugmentedPCA, that can produce excellent real-world baselines. We demonstrate the utility of these factor models on an open-source RNA-seq cancer gene expression dataset, showing that augmenting with a supervised objective results in improved downstream classification performance, produces principal components with greater class fidelity, and facilitates identification of genes aligned with the principal axes of data variance, with implications for the development of specific types of cancer.
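As a rough illustration of what "augmenting the PCA objective with a supervised objective" can look like, the sketch below uses a simple joint reconstruction objective, ||X - ZW||^2 + mu * ||Y - ZD||^2, whose minimizer follows analytically from a truncated SVD of the stacked matrix [X, sqrt(mu) Y]. This is an assumption made for illustration only: it is not the AugmentedPCA API, and it is not necessarily the objective the package solves. The function name `joint_factor_scores` and the weight `mu` are hypothetical.

```python
import numpy as np


def joint_factor_scores(X, Y, n_components, mu):
    """Hypothetical helper: scores of a label-augmented linear factor model.

    Minimizes ||Xc - Z W||_F^2 + mu * ||Yc - Z D||_F^2 over Z, W, D, where
    Xc and Yc are the column-centered inputs. By the Eckart-Young theorem
    the minimizer is the truncated SVD of the stacked matrix
    [Xc, sqrt(mu) * Yc]. With mu = 0 the label block is all zeros and the
    result is ordinary PCA scores.
    """
    if mu < 0:
        raise ValueError("mu must be non-negative")
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Stack features and (weighted) labels side by side.
    A = np.hstack([Xc, np.sqrt(mu) * Yc])
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    # Scores are the leading left singular vectors scaled by singular values.
    return U[:, :n_components] * s[:n_components]
```

Increasing `mu` pulls the leading components toward directions that also reconstruct the labels, which is the qualitative behavior the abstract describes as greater class fidelity. Note that this simple variant needs labels to compute scores, so it is a training-time illustration rather than a predictive model.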

Preprint, 2022, arXiv

Multiple Domain Causal Networks

Observational studies are regarded as economic alternatives to randomized trials, often used in their stead to investigate and determine treatment efficacy. Due to lack of sample size, observational studies commonly combine data from multiple sources or different sites/centers. Despite the benefits of an increased sample size, a naive combination of multicenter data may result in incongruities stemming from center-specific protocols for generating cohorts or reactions towards treatments distinct to a given center, among other things. These issues arise in a variety of other contexts, including capturing a treatment effect related to an individual's unique biological characteristics. Existing methods for estimating heterogeneous treatment effects have not adequately addressed the multicenter context, but rather treat it simply as a means to obtain sufficient sample size. Additionally, previous approaches to estimating treatment effects do not straightforwardly generalize to the multicenter design, especially when required to provide treatment insights for patients from a new, unobserved center. To address these shortcomings, we propose Multiple Domain Causal Networks (MDCN), an approach that simultaneously strengthens the information sharing between similar centers while addressing the selection bias in treatment assignment through learning of a new feature embedding. In empirical evaluations, MDCN is consistently more accurate when estimating the heterogeneous treatment effect in new centers compared to benchmarks that adjust solely based on treatment imbalance or general center differences. Finally, we justify our approach by providing theoretical analyses that demonstrate that MDCN improves on the generalization bound of the new, unobserved target center.

Preprint, 2022, arXiv

Supervising the Decoder of Variational Autoencoders to Improve Scientific Utility

Probabilistic generative models are attractive for scientific modeling because their inferred parameters can be used to generate hypotheses and design experiments. This requires that the learned model provide an accurate representation of the input data and yield a latent space that effectively predicts outcomes relevant to the scientific question. Supervised Variational Autoencoders (SVAEs) have previously been used for this purpose, where a carefully designed decoder can be used as an interpretable generative model while the supervised objective ensures a predictive latent representation. Unfortunately, the supervised objective forces the encoder to learn a biased approximation to the generative posterior distribution, which renders the generative parameters unreliable when used in scientific models. This issue has remained undetected because the reconstruction losses commonly used to evaluate model performance do not detect bias in the encoder. We address this previously unreported issue by developing a second-order supervision framework (SOS-VAE) that influences the decoder to induce a predictive latent representation. This ensures that the associated encoder maintains a reliable generative interpretation. We extend this technique to allow the user to trade off some bias in the generative parameters for improved predictive performance, acting as an intermediate option between SVAEs and our new SOS-VAE. We also use this methodology to address missing data issues that often arise when combining recordings from multiple scientific experiments. We demonstrate the effectiveness of these developments using synthetic data and electrophysiological recordings, with an emphasis on how our learned representations can be used to design scientific experiments.

Preprint, 2022, arXiv

Universal visible emitters in nanoscale integrated photonics

Visible wavelengths of light control the quantum matter of atoms and molecules and are foundational for quantum technologies, including computers, sensors, and clocks. The development of visible integrated photonics opens the possibility for scalable circuits with complex functionalities, advancing both the scientific and technological frontiers. We experimentally demonstrate an inverse design approach based on superposition of guided-mode sources, allowing the generation and full control of free-space radiation directly from within a single 150 nm layer of Ta2O5, showing low loss across visible and near-infrared spectra. We generate diverging circularly polarized beams at the challenging 461 nm wavelength that can be directly used for magneto-optical traps of strontium atoms, constituting a fundamental building block for a range of atomic-physics-based quantum technologies. Our generated topological vortex beams and spatially varying polarization emitters could open unexplored light-matter interaction pathways, enabling a broad new photonic-atomic paradigm. Our platform highlights the generalizability of nanoscale devices for visible-laser emission and will be critical for scaling quantum technologies.

Preprint, 2020, arXiv

Mid-infrared frequency combs at 10 GHz

We demonstrate mid-infrared (MIR) frequency combs at 10 GHz repetition rate via intra-pulse difference-frequency generation (DFG) in quasi-phase-matched nonlinear media. Few-cycle pump pulses ($\lesssim$15 fs, 100 pJ) from a near-infrared (NIR) electro-optic frequency comb are provided via nonlinear soliton-like compression in photonic-chip silicon-nitride waveguides. Subsequent intra-pulse DFG in periodically poled lithium niobate waveguides yields MIR frequency combs in the 3.1--4.1 $\mu$m region, while orientation-patterned gallium phosphide provides coverage across 7--11 $\mu$m. Cascaded second-order nonlinearities simultaneously provide access to the carrier-envelope-offset frequency of the pump source via in-line f-2f nonlinear interferometry. The high-repetition-rate MIR frequency combs introduced here can be used for condensed-phase spectroscopy and applications such as laser heterodyne radiometry.
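To put the quoted numbers in perspective, the short calculation below converts the two stated wavelength bands to optical frequency and estimates how many comb lines fit in each band at a 10 GHz line spacing. It assumes the band edges are exactly as quoted and that lines fill each band uniformly at the repetition rate; the function name `comb_line_estimate` is hypothetical and not taken from the paper.

```python
C_M_PER_S = 299_792_458.0  # speed of light in vacuum (exact SI value)


def comb_line_estimate(lambda_min_m, lambda_max_m, rep_rate_hz):
    """Return (frequency span in Hz, approximate number of comb lines).

    Uses f = c / lambda for each band edge; adjacent comb lines are
    separated by the repetition rate.
    """
    if not 0 < lambda_min_m < lambda_max_m:
        raise ValueError("expected 0 < lambda_min_m < lambda_max_m")
    f_high = C_M_PER_S / lambda_min_m  # shorter wavelength, higher frequency
    f_low = C_M_PER_S / lambda_max_m
    span_hz = f_high - f_low
    return span_hz, int(span_hz // rep_rate_hz)


for label, lo, hi in [("3.1-4.1 um", 3.1e-6, 4.1e-6), ("7-11 um", 7e-6, 11e-6)]:
    span, lines = comb_line_estimate(lo, hi, 10e9)
    print(f"{label}: span {span / 1e12:.1f} THz, about {lines} lines")
```

Under these assumptions each band spans tens of terahertz, which corresponds to a few thousand comb lines at 10 GHz spacing.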