Researcher profile

David Smith

David Smith contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Trust 21 (Emerging) · Verification L1 · Unclaimed author
9 works
0 followers
14 topics
4 close collaborators

Published work

9 published items

Preprint · 2026 · arXiv

A saturation-absorption rubidium magnetometer with multilevel optical Bloch-equation modeling for intermediate-to-high fields

We present SASHMAG (Saturated Absorption Spectroscopy High-field MAGnetometer), an atomic sensor designed for precision magnetic-field measurements in the intermediate-to-high field regime ($>0.2\,\text{T}$) using rubidium-87 ($^{87}\text{Rb}$). The sensor operates in the hyperfine Paschen-Back regime, where the hyperfine and Zeeman interactions decouple, and uses a counter-propagating pump-probe configuration in Faraday geometry to resolve isolated, Doppler-free Zeeman transitions. To interpret the resulting spectra in this strongly field-dependent regime, we developed a comprehensive multilevel optical Bloch-equation model solved explicitly in the uncoupled $\ket{m_I, m_J}$ basis, capturing state mixing and nonlinear saturation dynamics. This model reproduces measured spectra at sub-Doppler resolution and is consistent with analytical expectations for power broadening and thermal Doppler scaling. The magnetic field is estimated with a physics-constrained optimization routine that minimizes the residual between experimentally extracted line centers and transition frequencies calculated from the field-dependent Hamiltonian. We demonstrate magnetic-field retrieval from $0.2\,\text{T}$ to $0.4\,\text{T}$ with a precision of $\pm 0.0017\,\text{T}$. Furthermore, the validated simulation establishes a foundation for generating synthetic training datasets, paving the way for autonomous, machine-learning-enhanced magnetometry in applications ranging from MRI to fusion reactors.
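The field-retrieval step described above (choose the field whose predicted line centers best match the measured ones) can be illustrated with a minimal sketch. It does not use the paper's Hamiltonian: it assumes a hypothetical linearised model in which each resolved line shifts linearly with the field, with placeholder offsets and slopes, and it minimises the summed squared residual by golden-section search, which is an assumption that the residual is unimodal over the search interval.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical linearised model: each resolved line is assumed to shift
# linearly with B. These offsets (GHz) and slopes (GHz/T) are illustrative
# placeholders, not values from the paper.
OFFSETS_GHZ = [-2.0, 0.5, 3.1]
SLOPES_GHZ_PER_T = [14.0, 18.7, 23.3]


def toy_line_centers(field_t: float) -> List[float]:
    """Predicted line centers (GHz) for a field in tesla under the toy model."""
    return [o + s * field_t for o, s in zip(OFFSETS_GHZ, SLOPES_GHZ_PER_T)]


def estimate_field(
    measured: Sequence[float],
    model: Callable[[float], Sequence[float]],
    lo: float,
    hi: float,
    tol: float = 1e-9,
) -> float:
    """Return the field in [lo, hi] minimising the squared line-center residual.

    Golden-section search; assumes the residual is unimodal on [lo, hi].
    """

    def cost(field_t: float) -> float:
        return sum((m - p) ** 2 for m, p in zip(measured, model(field_t)))

    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = cost(c), cost(d)
    while b - a > tol:
        if fc < fd:
            # Minimum lies in [a, d]; old c becomes the new upper probe.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = cost(c)
        else:
            # Minimum lies in [c, b]; old d becomes the new lower probe.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = cost(d)
    return 0.5 * (a + b)


if __name__ == "__main__":
    # Synthetic, noise-free "measured" centers generated at 0.3 T.
    measured = toy_line_centers(0.3)
    print(estimate_field(measured, toy_line_centers, 0.2, 0.4))  # close to 0.3
```

With the full field-dependent Hamiltonian the predicted frequencies are nonlinear in B, which is why a generic scalar minimiser is used here instead of a closed-form linear least-squares solution.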

Preprint · 2022 · arXiv

Private Graph Data Release: A Survey

The application of graph analytics to various domains has yielded tremendous societal and economic benefits in recent years. However, the increasingly widespread adoption of graph analytics comes with a commensurate increase in the need to protect private information in graph data, especially in light of the many privacy breaches in real-world graph data that were supposed to preserve sensitive information. This paper provides a comprehensive survey of private graph data release algorithms that seek to achieve the fine balance between privacy and utility, with a specific focus on provably private mechanisms. Many of these mechanisms are natural extensions of the Differential Privacy framework to graph data, but we also investigate more general privacy formulations like Pufferfish Privacy that address some of the limitations of Differential Privacy. We also provide a wide-ranging survey of the applications of private graph data release mechanisms to social networks, finance, supply chains, and health care. This survey paper and the taxonomy it provides should benefit practitioners and researchers alike in the increasingly important area of private analytics and data release.

Preprint · 2022 · arXiv

The Fellowship of the Authors: Disambiguating Names from Social Network Context

Most NLP approaches to entity linking and coreference resolution focus on retrieving similar mentions using sparse or dense text representations. The common "Wikification" task, for instance, retrieves candidate Wikipedia articles for each entity mention. For many domains, such as bibliographic citations, authority lists with extensive textual descriptions for each entity are lacking and ambiguous named entities mostly occur in the context of other named entities. Unlike prior work, therefore, we seek to leverage the information that can be gained from looking at association networks of individuals derived from textual evidence in order to disambiguate names. We combine BERT-based mention representations with a variety of graph induction strategies and experiment with supervised and unsupervised cluster inference methods. We experiment with data consisting of lists of names from two domains: bibliographic citations from CrossRef and chains of transmission (isnads) from classical Arabic histories. We find that in-domain language model pretraining can significantly improve mention representations, especially for larger corpora, and that the availability of bibliographic information, such as publication venue or title, can also increase performance on this task. We also present a novel supervised cluster inference model which gives competitive performance for little computational effort, making it ideal for situations where individuals must be identified without relying on an exhaustive authority list.

Preprint · 2022 · arXiv

Tradeoffs in Resampling and Filtering for Imbalanced Classification

Imbalanced classification problems are extremely common in natural language processing and are solved using a variety of resampling and filtering techniques, which often involve deciding how to select training data or which test examples the model should label. We examine the tradeoffs in model performance involved in choosing training samples and filtering training and test data in heavily imbalanced token classification tasks, and we examine the relationship between the magnitude of these tradeoffs and the base rate of the phenomenon of interest. In experiments on sequence tagging to detect rare phenomena in English and Arabic texts, we find that different methods of selecting training data bring tradeoffs in effectiveness and efficiency. We also see that in highly imbalanced cases, filtering test data using first-pass retrieval models is as important for model performance as selecting training data. The base rate of a rare positive class has a clear effect on the magnitude of the changes in performance caused by the selection of training or test data. As the base rate increases, the differences brought about by those choices decrease.

Preprint · 2021 · arXiv

The art of coarse Stokes: Richardson extrapolation improves the accuracy and efficiency of the method of regularized stokeslets

The method of regularised stokeslets is widely used in microscale biological fluid dynamics due to its ease of implementation, natural treatment of complex moving geometries, and removal of singular functions to integrate. The standard implementation of the method is subject to high computational cost due to the coupling of the linear system size to the numerical resolution required to resolve the rapidly varying regularised stokeslet kernel. Here we show how Richardson extrapolation with coarse values of the regularisation parameter is ideally suited to reduce the quadrature error, hence dramatically reducing the storage and solution costs without loss of accuracy. Numerical experiments on the resistance and mobility problems in Stokes flow support the analysis, confirming several orders of magnitude improvement in accuracy and/or efficiency.
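The general idea of Richardson extrapolation in the regularisation parameter can be shown with a small, generic sketch. It assumes that a computed quantity behaves like $f(\varepsilon) = f_0 + c_1\varepsilon + c_2\varepsilon^2 + \dots$ and removes the error terms one order at a time from values computed at $\varepsilon, \varepsilon/2, \varepsilon/4, \dots$. The function `toy_resistance` is a hypothetical stand-in for a coarse regularised-stokeslet solve, not the authors' code, and the exact error expansion used in the paper may differ.

```python
from typing import Callable, List


def richardson_table(
    f: Callable[[float], float],
    eps0: float,
    levels: int,
    ratio: float = 2.0,
) -> List[List[float]]:
    """Richardson extrapolation table for a quantity computed at parameter eps.

    Assumes f(eps) = f0 + c1*eps + c2*eps**2 + ... (integer powers of eps).
    Row i evaluates f at eps0 / ratio**i; column j has the eps**1..eps**j
    error terms eliminated.
    """
    table: List[List[float]] = []
    for i in range(levels):
        row = [f(eps0 / ratio**i)]
        for j in range(1, i + 1):
            # Combine with the coarser row to cancel the eps**j term.
            factor = ratio**j - 1.0
            row.append(row[j - 1] + (row[j - 1] - table[i - 1][j - 1]) / factor)
        table.append(row)
    return table


def toy_resistance(eps: float) -> float:
    """Hypothetical stand-in for a coarse regularised-stokeslet computation."""
    return 1.0 + 0.3 * eps + 0.05 * eps**2


table = richardson_table(toy_resistance, eps0=0.1, levels=3)
print(toy_resistance(0.025))  # finest raw value, still off by about 7.5e-3
print(table[-1][-1])  # extrapolated value, essentially 1.0 for this toy
```

Because the toy function is exactly quadratic in $\varepsilon$, two elimination steps recover $f_0$ up to rounding; in a real solver the gain depends on how well the assumed expansion holds at the coarse values used.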

Preprint · 2020 · arXiv

Designing Environments Conducive to Interpretable Robot Behavior

Designing robots capable of generating interpretable behavior is a prerequisite for achieving effective human-robot collaboration. This means that the robots need to be capable of generating behavior that aligns with human expectations and, when required, provide explanations to the humans in the loop. However, exhibiting such behavior in arbitrary environments could be quite expensive for robots, and in some cases, the robot may not even be able to exhibit the expected behavior. Given structured environments (like warehouses and restaurants), it may be possible to design the environment so as to boost the interpretability of the robot's behavior or to shape the human's expectations of the robot's behavior. In this paper, we investigate the opportunities and limitations of environment design as a tool to promote a type of interpretable behavior -- known in the literature as explicable behavior. We formulate a novel environment design framework that considers design over multiple tasks and over a time horizon. In addition, we explore the longitudinal aspect of explicable behavior and the trade-off that arises between the cost of design and the cost of generating explicable behavior over a time horizon.

Preprint · 2020 · arXiv

Peer-to-Peer Trading in Electricity Networks: An Overview

Peer-to-peer trading is a next-generation energy management technique that economically benefits proactive consumers (prosumers) transacting their energy as goods and services. At the same time, peer-to-peer energy trading is also expected to help the grid by reducing peak demand, lowering reserve requirements, and curtailing network losses. However, large-scale deployment of peer-to-peer trading in electricity networks poses a number of challenges in modeling transactions in both the virtual and physical layers of the network. As such, this article provides a comprehensive review of the state of the art in research on peer-to-peer energy trading techniques. In doing so, we provide an overview of the key features of peer-to-peer trading and its benefits to the grid and to prosumers. Then, we systematically classify the existing research in terms of the challenges that the studies address in the virtual and the physical layers. We then further identify and discuss those technical approaches that have been extensively used to address the challenges in peer-to-peer transactions. Finally, the paper concludes with potential future research directions.

Preprint · 2020 · arXiv

The Cost of Privacy in Asynchronous Differentially-Private Machine Learning

We consider training machine learning models using training data located on multiple private and geographically scattered servers with different privacy settings. Due to the distributed nature of the data, communicating with all collaborating private data owners simultaneously may prove challenging or altogether impossible. In this paper, we develop differentially-private asynchronous algorithms for collaboratively training machine-learning models on multiple private datasets. The asynchronous nature of the algorithms implies that a central learner interacts with the private data owners one-on-one whenever they are available for communication, without needing to aggregate query responses to construct gradients of the entire fitness function. Therefore, the algorithm efficiently scales to many data owners. We define the cost of privacy as the difference between the fitness of a privacy-preserving machine-learning model and the fitness of a machine-learning model trained in the absence of privacy concerns. We prove that we can forecast the performance of the proposed privacy-preserving asynchronous algorithms. We demonstrate that the cost of privacy has an upper bound that is inversely proportional to the combined size of the training datasets squared and the sum of the privacy budgets squared. We validate the theoretical results with experiments on financial and medical datasets. The experiments illustrate that collaboration among more than 10 data owners with at least 10,000 records with privacy budgets greater than or equal to 1 results in a superior machine-learning model in comparison to a model trained in isolation on only one of the datasets, illustrating the value of collaboration and the cost of privacy. The number of collaborating datasets can be lowered if the privacy budget is higher.

Preprint · 2019 · arXiv

Multi-stage Antenna Selection for Adaptive Beamforming in MIMO Arrays

Increasing the number of transmit and receive elements in multiple-input multiple-output (MIMO) antenna arrays imposes a substantial increase in hardware and computational costs. We mitigate this problem by employing a reconfigurable MIMO array where large transmit and receive arrays are multiplexed into a smaller set of k baseband signals. We consider four stages for the MIMO array configuration and propose four different selection strategies to offer dimensionality reduction in post-processing and achieve hardware cost reduction in digital signal processing (DSP) and radio-frequency (RF) stages. We define the problem as a determinant maximization and develop a unified formulation to decouple the joint problem and select antennas/elements in various stages in one integrated problem. We then analyze the performance of the proposed selection approaches and prove that, in terms of the output SINR, a joint transmit-receive selection method performs best, followed by matched-filter, hybrid and factored selection methods. The theoretical results are validated numerically, demonstrating that all methods allow an excellent trade-off between performance and cost.
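To make "selection as determinant maximization" concrete, here is a minimal sketch of one common heuristic for that kind of objective: greedily adding, one at a time, the element that most increases the determinant of the selected principal submatrix of a positive semidefinite matrix. The matrix `gram` (standing in for, say, a Gram matrix of candidate element responses) and the greedy rule are assumptions for illustration; this is not the paper's unified formulation or any of its four selection strategies.

```python
from typing import List, Sequence


def det(matrix: Sequence[Sequence[float]]) -> float:
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [[float(x) for x in row] for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-15:
            return 0.0  # numerically singular
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for j in range(col, n):
                a[r][j] -= factor * a[col][j]
    return result


def greedy_select(gram: Sequence[Sequence[float]], k: int) -> List[int]:
    """Hypothetical greedy rule: pick k indices, each step maximising the
    determinant of the principal submatrix of the chosen indices."""
    chosen: List[int] = []
    remaining = list(range(len(gram)))
    for _ in range(k):
        best = max(
            remaining,
            key=lambda i: det([[gram[r][c] for c in chosen + [i]] for r in chosen + [i]]),
        )
        chosen.append(best)
        remaining.remove(best)
    return sorted(chosen)


# Elements 0 and 1 are almost redundant (correlation 0.99), so after picking
# element 0 the greedy step prefers the weaker but independent element 2.
gram = [[1.0, 0.99, 0.0], [0.99, 1.0, 0.0], [0.0, 0.0, 0.9]]
print(greedy_select(gram, 2))  # [0, 2]
```

The determinant rewards selections that are both strong and mutually independent, which is why near-duplicate elements are penalised; greedy selection is cheap but is not guaranteed to find the optimal subset.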