Source author record

Joseph Wang

Joseph Wang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Machine Learning, Biological Physics, cond-mat.soft, cond-mat.dis-nn, Molecular Networks, Social and Information Networks, Artificial Intelligence, Computation and Language, Computer Vision, Data Structures and Algorithms, Discrete Mathematics, eess.AS, Other Quantitative Biology, physics.flu-dyn, Sound

Catalog footprint

What is connected

15 works

15 topics

4 close collaborators

Preprint, 2022, arXiv

Wakeword Detection under Distribution Shifts

We propose a novel approach for semi-supervised learning (SSL) designed to overcome distribution shifts between training and real-world data arising in the keyword spotting (KWS) task. Shifts from the training data distribution are a key challenge for real-world KWS tasks: when a new model is deployed on device, the gating of the accepted data undergoes a shift in distribution, making the problem of timely updates via subsequent deployments hard. Despite the shift, we assume that the marginal distributions on labels do not change. We utilize a modified teacher/student training framework, where labeled training data is augmented with unlabeled data. Note that the teacher does not have access to the new distribution either. To train effectively with a mix of human- and teacher-labeled data, we develop a teacher labeling strategy based on confidence heuristics to reduce entropy on the label distribution from the teacher model; the data is then sampled to match the marginal distribution on the labels. Large-scale experimental results show that a convolutional neural network (CNN) trained on far-field audio, and evaluated on far-field audio drawn from a different distribution, obtains a 14.3% relative improvement in false discovery rate (FDR) at equal false reject rate (FRR), while yielding a 5% improvement in FDR under no distribution shift. Under a more severe distribution shift from far-field to near-field audio with a smaller fully connected network (FCN), our approach achieves a 52% relative improvement in FDR at equal FRR, while yielding a 20% relative improvement in FDR on the original distribution.
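The two-step data preparation described in this abstract (confident teacher labels, then sampling to match the label marginal) might look roughly like the sketch below. It is only an illustration under stated assumptions: the function names, the thresholds `low` and `high`, and the binary wakeword/non-wakeword setup are hypothetical and are not taken from the paper.

```python
import random


def pseudo_label(scores, low=0.1, high=0.9):
    """Keep only confident teacher scores; return (index, label) pairs.

    `low` and `high` are hypothetical confidence thresholds.
    """
    labeled = []
    for i, s in enumerate(scores):
        if s >= high:
            labeled.append((i, 1))   # confident positive
        elif s <= low:
            labeled.append((i, 0))   # confident negative
        # scores in between are discarded as too uncertain
    return labeled


def match_marginal(labeled, positive_rate, seed=0):
    """Subsample so the share of positives is approximately `positive_rate`."""
    if not 0.0 < positive_rate < 1.0:
        raise ValueError("positive_rate must be strictly between 0 and 1")
    rng = random.Random(seed)
    pos = [x for x in labeled if x[1] == 1]
    neg = [x for x in labeled if x[1] == 0]
    # Largest total size that both classes can support at the target rate.
    total = min(len(pos) / positive_rate, len(neg) / (1.0 - positive_rate))
    n_pos = min(len(pos), int(total * positive_rate))
    n_neg = min(len(neg), int(total * (1.0 - positive_rate)))
    return rng.sample(pos, n_pos) + rng.sample(neg, n_neg)


# Hypothetical teacher scores for ten unlabeled utterances.
scores = [0.95, 0.97, 0.5, 0.05, 0.02, 0.01, 0.6, 0.03, 0.99, 0.04]
confident = pseudo_label(scores)
balanced = match_marginal(confident, positive_rate=0.5)
```

Discarding mid-range scores is one simple way to lower the entropy of the teacher's labels; the paper's actual heuristics may differ.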

Preprint, 2016, arXiv

Optimally Pruning Decision Tree Ensembles With Feature Cost

We consider the problem of learning decision rules for prediction under a feature budget constraint. In particular, we are interested in pruning an ensemble of decision trees to reduce expected feature cost while maintaining high prediction accuracy for any test example. We propose a novel 0-1 integer program formulation for ensemble pruning. Our pruning formulation is general: it takes any ensemble of decision trees as input. By explicitly accounting for feature-sharing across trees together with the accuracy/cost trade-off, our method is able to significantly reduce feature cost by pruning subtrees that introduce more loss in terms of feature cost than benefit in terms of prediction accuracy gain. Theoretically, we prove that a linear programming relaxation produces the exact solution of the original integer program. This allows us to use efficient convex optimization tools to obtain an optimally pruned ensemble for any given budget. Empirically, we see that our pruning algorithm significantly improves the performance of the state-of-the-art ensemble method BudgetRF.
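To make the feature-sharing idea concrete, the sketch below computes the feature cost of one test example across an ensemble, assuming a feature acquired for one tree is then free for every other tree. That charging rule, the function names, and the example data are assumptions for illustration; this is not the paper's integer program.

```python
def ensemble_feature_cost(paths, cost):
    """Cost of classifying one example when shared features are paid for once."""
    used = set()
    for path_features in paths:
        used |= set(path_features)
    return sum(cost[f] for f in used)


def naive_feature_cost(paths, cost):
    """Cost if every tree paid separately for each feature on its path."""
    return sum(cost[f] for path_features in paths for f in set(path_features))


# Hypothetical example: two trees both test "bp" on this example's path.
paths = [{"age", "bp"}, {"bp", "mri"}]
cost = {"age": 1.0, "bp": 2.0, "mri": 10.0}
print(ensemble_feature_cost(paths, cost))  # 13.0
print(naive_feature_cost(paths, cost))     # 15.0
```

Under this rule, pruning a subtree saves cost only when the features it uses are not already required elsewhere in the ensemble.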

Preprint, 2016, arXiv

Pruning Random Forests for Prediction on a Budget

We propose to prune a random forest (RF) for resource-constrained prediction. We first construct an RF and then prune it to optimize expected feature cost and accuracy. We pose pruning RFs as a novel 0-1 integer program with linear constraints that encourages feature re-use. We establish total unimodularity of the constraint set to prove that the corresponding LP relaxation solves the original integer program. We then exploit connections to combinatorial optimization and develop an efficient primal-dual algorithm, scalable to large datasets. In contrast to our bottom-up approach, which benefits from good RF initialization, conventional methods are top-down, acquiring features based on their utility value, and are generally intractable, requiring heuristics. Empirically, our pruning algorithm outperforms existing state-of-the-art resource-constrained algorithms.

Preprint, 2016, arXiv

Resource Constrained Structured Prediction

We study the problem of structured prediction under test-time budget constraints. We propose a novel approach applicable to a wide range of structured prediction problems in computer vision and natural language processing. Our approach seeks to adaptively generate computationally costly features during test-time in order to reduce the computational cost of prediction while maintaining prediction performance. We show that training the adaptive feature generation system can be reduced to a series of structured learning problems, resulting in efficient training using existing structured learning algorithms. This framework provides theoretical justification for several existing heuristic approaches found in the literature. We evaluate our proposed adaptive system on two structured prediction tasks, optical character recognition (OCR) and dependency parsing, and show strong performance in reduction of the feature costs without degrading accuracy.

Preprint, 2015, arXiv

Efficient Learning by Directed Acyclic Graph For Resource Constrained Prediction

We study the problem of reducing test-time acquisition costs in classification systems. Our goal is to learn decision rules that adaptively select sensors for each example as necessary to make a confident prediction. We model our system as a directed acyclic graph (DAG) where internal nodes correspond to sensor subsets and decision functions at each node choose whether to acquire a new sensor or classify using the available measurements. This problem can be naturally posed as an empirical risk minimization over training data. Rather than jointly optimizing such a highly coupled and non-convex problem over all decision nodes, we propose an efficient algorithm motivated by dynamic programming. We learn node policies in the DAG by reducing the global objective to a series of cost-sensitive learning problems. Our approach is computationally efficient and has proven guarantees of convergence to the optimal system for a fixed architecture. In addition, we present an extension to map other budgeted learning problems with a large number of sensors to our DAG architecture and demonstrate empirical performance exceeding state-of-the-art algorithms for data composed of both few and many sensors.

Preprint, 2015, arXiv

Feature-Budgeted Random Forest

We seek decision rules for prediction-time cost reduction, where complete data is available for training, but during prediction-time, each feature can only be acquired for an additional cost. We propose a novel random forest algorithm to minimize prediction error for a user-specified average feature acquisition budget. While random forests yield strong generalization performance, they do not explicitly account for feature costs and furthermore require low correlation among trees, which amplifies costs. Our random forest grows trees with low acquisition cost and high strength based on greedy minimax cost-weighted-impurity splits. Theoretically, we establish near-optimal acquisition cost guarantees for our algorithm. Empirically, on a number of benchmark datasets we demonstrate superior accuracy-cost curves against state-of-the-art prediction-time algorithms.
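One plausible reading of a "greedy minimax cost-weighted-impurity split" is sketched below: for each candidate split, divide the feature's cost by the smallest impurity reduction over the two children, and pick the split with the lowest ratio. The pairs impurity, the threshold handling, and all names here are assumptions chosen for illustration and may not match the paper's exact definitions.

```python
from collections import Counter


def pairs_impurity(labels):
    """Number of example pairs that carry different labels."""
    n = len(labels)
    same = sum(c * (c - 1) // 2 for c in Counter(labels).values())
    return n * (n - 1) // 2 - same


def split_risk(rows, labels, feature, threshold, cost):
    """Feature cost divided by the worst-case (smallest) impurity reduction."""
    left = [y for x, y in zip(rows, labels) if x[feature] <= threshold]
    right = [y for x, y in zip(rows, labels) if x[feature] > threshold]
    parent = pairs_impurity(labels)
    worst_gain = min(parent - pairs_impurity(left), parent - pairs_impurity(right))
    return float("inf") if worst_gain <= 0 else cost[feature] / worst_gain


def best_split(rows, labels, cost):
    """Return (risk, feature, threshold) with the lowest risk, or None."""
    candidates = []
    for f in cost:
        # Skip the largest value: splitting there leaves the right child empty.
        for t in sorted({x[f] for x in rows})[:-1]:
            candidates.append((split_risk(rows, labels, f, t, cost), f, t))
    return min(candidates) if candidates else None


# Hypothetical data: "a" separates the classes perfectly, "b" does not.
rows = [{"a": 0, "b": 0}, {"a": 0, "b": 1}, {"a": 1, "b": 0}, {"a": 1, "b": 1}]
labels = [0, 0, 1, 1]
equal_cost = best_split(rows, labels, {"a": 1.0, "b": 1.0})
pricey_a = best_split(rows, labels, {"a": 5.0, "b": 1.0})
```

With equal costs the informative feature `a` wins; once `a` costs five times as much, the cheaper but weaker feature `b` has the lower ratio, which is the accuracy-cost trade-off the abstract refers to.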

Preprint, 2015, arXiv

Max-Cost Discrete Function Evaluation Problem under a Budget

We propose novel methods for the max-cost Discrete Function Evaluation Problem (DFEP) under budget constraints. We are motivated by applications such as clinical diagnosis where a patient is subjected to a sequence of (possibly expensive) tests before a decision is made. Our goal is to develop strategies for minimizing max-costs. The problem is known to be NP-hard, and greedy methods based on specialized impurity functions have been proposed. We develop a broad class of admissible impurity functions that admit monomials, classes of polynomials, and hinge-loss functions that allow for flexible impurity design with provably optimal approximation bounds. This flexibility is important for datasets where max-cost can be overly sensitive to "outliers." Outliers bias max-cost to a few examples that require a large number of tests for classification. We design admissible functions that allow for accuracy-cost trade-off and result in $O(\log n)$ guarantees of the optimal cost among trees with corresponding classification accuracy levels.

Preprint, 2015, arXiv

Sensor Selection by Linear Programming

We learn sensor trees from training data to minimize sensor acquisition costs during test time. Our system adaptively selects sensors at each stage if necessary to make a confident classification. We pose the problem as empirical risk minimization over the choice of trees and node decision rules. We decompose the problem, which is known to be intractable, into combinatorial (tree structures) and continuous (node decision rules) parts and propose to solve them separately. Using training data, we greedily solve for the combinatorial tree structures; for the continuous part, which is a non-convex multilinear objective function, we derive convex surrogate loss functions that are piecewise linear. The resulting problem can be cast as a linear program and has the advantage of guaranteed convergence, global optimality, repeatability and computational efficiency. We show that our proposed approach outperforms the state of the art on a number of benchmark datasets.

Preprint, 2011, arXiv

Biomolecular Filters for Improved Separation of Output Signals in Enzyme Logic Systems Applied to Biomedical Analysis

Biomolecular logic systems processing biochemical input signals and producing "digital" outputs in the form of YES/NO were developed for analysis of physiological conditions characteristic of liver injury, soft tissue injury and abdominal trauma. Injury biomarkers were used as input signals for activating the logic systems. Their normal physiological concentrations were defined as logic-0 level, while their pathologically elevated concentrations were defined as logic-1 values. Since the input concentrations applied as logic 0 and 1 values were not sufficiently different, the low and high output signals (0, 1 outputs) were separated by only a narrow gap, making their discrimination difficult. Coupled enzymatic reactions functioning as a biomolecular signal processing system with a built-in filter property were developed. The filter process involves a partial back-conversion of the optical-output-signal-yielding product, but only at its low concentrations, thus allowing the proper discrimination between 0 and 1 output values.

Preprint, 2011, arXiv

High-Speed Propulsion of Flexible Nanowire Motors: Theory and Experiments

Micro/nano-scale propulsion has attracted considerable recent attention due to its promise for biomedical applications such as targeted drug delivery. In this paper, we report on a new experimental design and theoretical modelling of high-speed fuel-free magnetically-driven propellers which exploit the flexibility of nanowires for propulsion. These readily prepared nanomotors display both high dimensional propulsion velocities (up to ~21 micrometers per second) and high dimensionless speeds (in body lengths per revolution) when compared with natural microorganisms and other artificial propellers. Their propulsion characteristics are studied theoretically using an elastohydrodynamic model which takes into account the elasticity of the nanowire and its hydrodynamic interaction with the fluid medium. The critical role of flexibility in this mode of propulsion is illustrated by simple physical arguments, and is quantitatively investigated with the help of an asymptotic analysis for small-amplitude swimming. The theoretical predictions are then compared with experimental measurements and we obtain good agreement. Finally, we demonstrate the operation of these nanomotors in a real biological environment (human serum), emphasizing the robustness of their propulsion performance and their promise for biomedical applications.

Preprint, 2011, arXiv

Structural Similarity and Distance in Learning

We propose a novel method of introducing structure into existing machine learning techniques by developing structure-based similarity and distance measures. To learn structural information, low-dimensional structure of the data is captured by solving a non-linear, low-rank representation problem. We show that this low-rank representation can be kernelized, has a closed-form solution, allows for separation of independent manifolds, and is robust to noise. From this representation, similarity between observations based on non-linear structure is computed and can be incorporated into existing feature transformations, dimensionality reduction techniques, and machine learning methods. Experimental results on both synthetic and real data sets show performance improvements for clustering and anomaly detection through the use of structural similarity.

Preprint, 2010, arXiv

Enzymatic AND Logic Gates Operated Under Conditions Characteristic of Biomedical Applications

Experimental and theoretical analyses of the lactate dehydrogenase and glutathione reductase based enzymatic AND logic gates in which the enzymes and their substrates serve as logic inputs are performed. These two systems are examples of the novel, previously unexplored, class of biochemical logic gates that illustrate potential biomedical applications of biochemical logic. They are characterized by input concentrations at logic 0 and 1 states corresponding to normal and abnormal physiological conditions. Our analysis shows that the logic gates under investigation have similar noise characteristics. Both significantly amplify random noise present in inputs; however, we establish that for realistic widths of the input noise distributions, it is still possible to differentiate between the logic 0 and 1 states of the output. This indicates that reliable detection of abnormal biomedical conditions is indeed possible with such enzyme-logic systems.

Preprint, 2009, arXiv

Towards Biosensing Strategies Based on Biochemical Logic Systems

Recent advances in biochemical computing, i.e., information processing with cascades of primarily enzymatic reactions realizing computing gates, such as AND, OR, etc., as well as progress in networking these gates and coupling of the resulting systems to smart/responsive electrodes for output readout, have opened new biosensing opportunities. Here we survey existing enabling research results, as well as ideas and research avenues for future development of a new paradigm of digitally operating biosensors logically processing multiple biochemical signals through Boolean logic networks composed of biomolecular reactions, yielding the final output signals as YES/NO responses. Such systems can lead to high-fidelity biosensing compared to common single or parallel sensing devices.

Preprint, 2002, arXiv

A steady state model for graph power laws

Power law distribution seems to be an important characteristic of web graphs. Several existing web graph models generate power law graphs by adding new vertices and non-uniform edge connectivities to existing graphs. Researchers have conjectured that preferential connectivity and incremental growth are both required for the power law distribution. In this paper, we propose a different web graph model with power law distribution that does not require incremental growth. We also provide a comparison of our model with several others in their ability to predict web graph clustering behavior.

Preprint, 2000, arXiv

Fast Approximation of Centrality

Social studies researchers use graphs to model group activities in social networks. An important property in this context is the centrality of a vertex: the inverse of the average distance to each other vertex. We describe a randomized approximation algorithm for centrality in weighted graphs. For graphs exhibiting the small world phenomenon, our method estimates the centrality of all vertices with high probability within a (1+epsilon) factor in near-linear time.
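A sampling-based estimator in the spirit of this abstract can be sketched as follows: run a shortest-path search from a few randomly chosen pivot vertices and rescale the summed distances into an estimate of each vertex's average distance. The sketch assumes a connected, undirected graph with non-negative weights stored as an adjacency dictionary; the function names and the rescaling constant are illustrative assumptions, and the choice of how many pivots to sample is left to the caller.

```python
import heapq
import random


def dijkstra(adj, source):
    """Shortest-path distances from `source` in a non-negatively weighted graph."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def sample_pivots(adj, k, seed=0):
    """Choose k pivot vertices uniformly at random, with replacement."""
    rng = random.Random(seed)
    nodes = list(adj)
    return [rng.choice(nodes) for _ in range(k)]


def estimate_centrality(adj, pivots):
    """Estimate centrality (inverse average distance) from the given pivots."""
    n, k = len(adj), len(pivots)
    total = {u: 0.0 for u in adj}
    for p in pivots:
        dist = dijkstra(adj, p)
        for u in adj:
            total[u] += dist[u]  # undirected graph: d(p, u) == d(u, p)
    # Estimated average distance is n / (k * (n - 1)) * total[u]; invert it.
    return {u: (k * (n - 1)) / (n * t) if t > 0 else float("inf")
            for u, t in total.items()}


# Hypothetical path graph 0 - 1 - 2 with unit edge weights.
adj = {0: [(1, 1.0)], 1: [(0, 1.0), (2, 1.0)], 2: [(1, 1.0)]}
exact = estimate_centrality(adj, [0, 1, 2])      # every vertex used once
approx = estimate_centrality(adj, sample_pivots(adj, k=2))
```

Using every vertex exactly once as a pivot recovers the exact values, which gives a simple sanity check; with fewer pivots the cost drops to one shortest-path search per pivot.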
