Source author record

Fanny Yang

Fanny Yang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Machine Learning, Information Theory, math.IT, math.ST, Statistics Theory, Computer Vision, Cryptography and Security, Methodology, Robotics, Sound

Catalog footprint

What is connected

14 works

10 topics

4 close collaborators


Preprint, 2026, arXiv

Counterfactual Spaces

We mathematically axiomatise the stochastics of counterfactuals, by introducing two related frameworks, called counterfactual probability spaces and counterfactual causal spaces, which we collectively term counterfactual spaces. They are, respectively, probability and causal spaces whose underlying measurable spaces are products of world-specific measurable spaces. In contrast to more familiar accounts of counterfactuals founded on causal models, we do not view interventions as a necessary component of a theory of counterfactuals. As an alternative to Pearl's celebrated ladder of causation, we view counterfactuals and interventions as orthogonal concepts, respectively mathematised in counterfactual probability spaces and causal spaces. The two concepts are then combined to form counterfactual causal spaces. At the heart of our theory is the notion of shared information between the worlds, encoded completely within the probability measure and causal kernels, and whose extremes are characterised by independence and synchronisation of worlds. Compared to existing frameworks, counterfactual spaces enable the mathematical treatment of a strictly broader spectrum of counterfactuals.

Preprint, 2026, arXiv

WavFlow: Audio Generation in Waveform Space

Modern audio generation predominantly relies on latent-space compression, introducing additional complexity and potential information loss. In this work, we challenge this paradigm with WavFlow, a framework that generates high-fidelity audio directly in raw waveform space without intermediate representations. To overcome the inherent difficulties of modeling high-dimensional and low-energy signals, we reshape audio into 2D token grids through waveform patchify and introduce amplitude lifting to align signal scales, enabling stable optimization via direct x-prediction in flow matching. To capture complex semantic alignment and temporal synchronization, we leverage an automated data pipeline to curate 5 million high-quality video-text-audio triplets, allowing the model to learn fine-grained acoustic patterns from scratch. Experimental results show that WavFlow achieves competitive performance on the video-to-audio benchmark VGGSound (FD_PaSST: 59.98, IS_PANNs: 17.40, DeSync: 0.44) and the text-to-audio benchmark AudioCaps (FD_PANNs: 10.63, IS_PANNs: 12.62), matching or exceeding the performance of established latent-based methods. Our work demonstrates that intermediate compression is not a prerequisite for high-quality synthesis, offering a simpler and more scalable alternative for multimodal audio generation.

Preprint, 2023, arXiv

Tight bounds for maximum $\ell_1$-margin classifiers

Popular iterative algorithms such as boosting methods and coordinate descent on linear models converge to the maximum $\ell_1$-margin classifier, a.k.a. sparse hard-margin SVM, in high dimensional regimes where the data is linearly separable. Previous works consistently show that many estimators relying on the $\ell_1$-norm achieve improved statistical rates for hard sparse ground truths. We show that, surprisingly, this adaptivity does not apply to the maximum $\ell_1$-margin classifier for a standard discriminative setting. In particular, for the noiseless setting, we prove tight upper and lower bounds for the prediction error that match existing rates of order $\frac{\|w^*\|_1^{2/3}}{n^{1/3}}$ for general ground truths. To complete the picture, we show that when interpolating noisy observations, the error vanishes at a rate of order $\frac{1}{\sqrt{\log(d/n)}}$. We are therefore the first to show benign overfitting for the maximum $\ell_1$-margin classifier.
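
For reference, the maximum $\ell_1$-margin classifier named in this abstract is commonly defined as below. The notation (training pairs $(x_i, y_i)$ with labels $y_i \in \{-1, +1\}$ for $i = 1, \dots, n$) is assumed here for illustration and is not taken from the catalog record.

$$\hat{w} \in \arg\max_{\|w\|_1 \le 1} \; \min_{1 \le i \le n} \; y_i \langle x_i, w \rangle$$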

Preprint, 2022, arXiv

Provable concept learning for interpretable predictions using variational autoencoders

In safety-critical applications, practitioners are reluctant to trust neural networks when no interpretable explanations are available. Many attempts to provide such explanations revolve around pixel-based attributions or use previously known concepts. In this paper we aim to provide explanations by provably identifying high-level, previously unknown ground-truth concepts. To this end, we propose a probabilistic modeling framework to derive (C)oncept (L)earning and (P)rediction (CLAP), a VAE-based classifier that uses visually interpretable concepts as predictors for a simple classifier. Assuming a generative model for the ground-truth concepts, we prove that CLAP is able to identify them while attaining optimal classification accuracy. Our experiments verify that CLAP identifies distinct ground-truth concepts on synthetic datasets and yields promising results on the medical Chest X-Ray dataset.

Preprint, 2022, arXiv

Self-supervised Reinforcement Learning with Independently Controllable Subgoals

To successfully tackle challenging manipulation tasks, autonomous agents must learn a diverse set of skills and how to combine them. Recently, self-supervised agents that set their own abstract goals by exploiting the discovered structure in the environment were shown to perform well on many different tasks. In particular, some of them were applied to learn basic manipulation skills in compositional multi-object environments. However, these methods learn skills without taking the dependencies between objects into account. Thus, the learned skills are difficult to combine in realistic environments. We propose a novel self-supervised agent that estimates relations between environment components and uses them to independently control different parts of the environment state. In addition, the estimated relations between objects can be used to decompose a complex goal into a compatible sequence of subgoals. We show that, by using this framework, an agent can efficiently and automatically learn manipulation tasks in multi-object environments with different relations between objects.

Preprint, 2022, arXiv

Semi-supervised novelty detection using ensembles with regularized disagreement

Deep neural networks often predict samples with high confidence even when they come from unseen classes and should instead be flagged for expert evaluation. Current novelty detection algorithms cannot reliably identify such near OOD points unless they have access to labeled data that is similar to these novel samples. In this paper, we develop a new ensemble-based procedure for semi-supervised novelty detection (SSND) that successfully leverages a mixture of unlabeled ID and novel-class samples to achieve good detection performance. In particular, we show how to achieve disagreement only on OOD data using early stopping regularization. While we prove this fact for a simple data distribution, our extensive experiments suggest that it holds true for more complex scenarios: our approach significantly outperforms state-of-the-art SSND methods on standard image data sets (SVHN/CIFAR-10/CIFAR-100) and medical image data sets with only a negligible increase in computation cost.

Preprint, 2022, arXiv

Tight bounds for minimum l1-norm interpolation of noisy data

We provide matching upper and lower bounds of order $\sigma^2/\log(d/n)$ for the prediction error of the minimum $\ell_1$-norm interpolator, a.k.a. basis pursuit. Our result is tight up to negligible terms when $d \gg n$, and is the first to imply asymptotic consistency of noisy minimum-norm interpolation for isotropic features and sparse ground truths. Our work complements the literature on "benign overfitting" for minimum $\ell_2$-norm interpolation, where asymptotic consistency can be achieved only when the features are effectively low-dimensional.
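
For reference, the minimum $\ell_1$-norm interpolator (basis pursuit) is commonly written as below. The symbols $X \in \mathbb{R}^{n \times d}$ for the feature matrix and $y \in \mathbb{R}^n$ for the noisy observations are assumed notation, not taken from the catalog record.

$$\hat{w} \in \arg\min_{w \in \mathbb{R}^d} \|w\|_1 \quad \text{subject to} \quad Xw = y$$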

Preprint, 2022, arXiv

Why adversarial training can hurt robust accuracy

Machine learning classifiers with high test accuracy often perform poorly under adversarial attacks. It is commonly believed that adversarial training alleviates this issue. In this paper, we demonstrate that, surprisingly, the opposite may be true: even though adversarial training helps when enough data is available, it may hurt robust generalization in the small sample size regime. We first prove this phenomenon for a high-dimensional linear classification setting with noiseless observations. Our proof provides explanatory insights that may also transfer to feature learning models. Further, we observe in experiments on standard image datasets that the same behavior occurs for perceptible attacks that effectively reduce class information such as mask attacks and object corruptions.

Preprint, 2020, arXiv

Understanding and Mitigating the Tradeoff Between Robustness and Accuracy

Adversarial training augments the training set with perturbations to improve the robust error (over worst-case perturbations), but it often leads to an increase in the standard error (on unperturbed test inputs). Previous explanations for this tradeoff rely on the assumption that no predictor in the hypothesis class has low standard and robust error. In this work, we precisely characterize the effect of augmentation on the standard error in linear regression when the optimal linear predictor has zero standard and robust error. In particular, we show that the standard error could increase even when the augmented perturbations have noiseless observations from the optimal linear predictor. We then prove that the recently proposed robust self-training (RST) estimator improves robust error without sacrificing standard error for noiseless linear regression. Empirically, for neural networks, we find that RST with different adversarial training methods improves both standard and robust error for random and adversarial rotations and adversarial $\ell_\infty$ perturbations in CIFAR-10.

Preprint, 2019, arXiv

Regularized Learning for Domain Adaptation under Label Shifts

We propose Regularized Learning under Label shifts (RLLS), a principled and practical domain-adaptation algorithm to correct for shifts in the label distribution between a source and a target domain. We first estimate importance weights using labeled source data and unlabeled target data, and then train a classifier on the weighted source samples. We derive a generalization bound for the classifier on the target domain which is independent of the (ambient) data dimensions, and instead only depends on the complexity of the function class. To the best of our knowledge, this is the first generalization bound for the label-shift problem where the labels in the target domain are not available. Based on this bound, we propose a regularized estimator for the small-sample regime which accounts for the uncertainty in the estimated weights. Experiments on the CIFAR-10 and MNIST datasets show that RLLS improves classification accuracy, especially in the low sample and large-shift regimes, compared to previous methods.
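
The weight-estimation step described in this abstract can be illustrated with a minimal sketch. It assumes the standard label-shift identity C w = q, where C[i][j] is the source joint probability of predicting class i with true label j, q[i] is the target frequency of predicting class i, and w[j] is the importance weight for label j. The ridge-style penalty on theta = w - 1 is a simplified stand-in and is not claimed to be the paper's exact estimator. The function names and the `reg` parameter are hypothetical.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(map(float, matrix[i])) + [float(rhs[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[row][c] -= factor * aug[col][c]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][c] * x[c] for c in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def estimate_importance_weights(confusion, target_pred_freq, reg=0.0):
    """Estimate label-shift importance weights w = 1 + theta (hypothetical helper).

    Minimises ||C theta - b||^2 + reg * ||theta||^2 with b = q - C @ 1,
    which is an assumed ridge-style simplification for illustration.
    """
    k = len(confusion)
    # b is the gap between target and source prediction frequencies.
    b = [target_pred_freq[i] - sum(confusion[i]) for i in range(k)]
    # Normal equations: (C^T C + reg * I) theta = C^T b.
    gram = [
        [
            sum(confusion[r][i] * confusion[r][j] for r in range(k))
            + (reg if i == j else 0.0)
            for j in range(k)
        ]
        for i in range(k)
    ]
    ctb = [sum(confusion[r][i] * b[r] for r in range(k)) for i in range(k)]
    theta = solve_linear_system(gram, ctb)
    return [1.0 + t for t in theta]


# A perfect classifier on a balanced source, with a target skewed to class 0.
weights = estimate_importance_weights([[0.5, 0.0], [0.0, 0.5]], [0.8, 0.2])
```

With `reg=0` this reduces to solving C w = q directly; a positive `reg` shrinks the weights toward 1, which mirrors the abstract's motivation of accounting for uncertainty in the estimated weights when samples are few. The weights would then multiply the per-sample loss when training on the source data.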

Preprint, 2015, arXiv

Statistical and Computational Guarantees for the Baum-Welch Algorithm

The Hidden Markov Model (HMM) is one of the mainstays of statistical modeling of discrete time series, with applications including speech recognition, computational biology, computer vision and econometrics. Estimating an HMM from its observation process is often addressed via the Baum-Welch algorithm, which is known to be susceptible to local optima. In this paper, we first give a general characterization of the basin of attraction associated with any global optimum of the population likelihood. By exploiting this characterization, we provide non-asymptotic finite sample guarantees on the Baum-Welch updates, guaranteeing geometric convergence to a small ball of radius on the order of the minimax rate around a global optimum. As a concrete example, we prove a linear rate of convergence for a hidden Markov mixture of two isotropic Gaussians given a suitable mean separation and an initialization within a ball of large radius around (one of) the true parameters. To our knowledge, these are the first rigorous local convergence guarantees to global optima for the Baum-Welch algorithm in a setting where the likelihood function is nonconvex. We complement our theoretical results with thorough numerical simulations studying the convergence of the Baum-Welch algorithm and illustrating the accuracy of our predictions.
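
As background for the algorithm this abstract analyses, the following is a minimal sketch of a single Baum-Welch (EM) update. It assumes a discrete-emission HMM and a single short observation sequence, purely for illustration; the paper's concrete example uses Gaussian emissions. The function names and the unscaled forward-backward recursion are choices made here, and the unscaled form would underflow on long sequences.

```python
def forward_backward(obs, pi, A, B):
    """Return unscaled forward/backward tables and the sequence likelihood."""
    T, K = len(obs), len(pi)
    alpha = [[0.0] * K for _ in range(T)]
    beta = [[1.0] * K for _ in range(T)]
    for k in range(K):
        alpha[0][k] = pi[k] * B[k][obs[0]]
    for t in range(1, T):
        for j in range(K):
            alpha[t][j] = B[j][obs[t]] * sum(
                alpha[t - 1][i] * A[i][j] for i in range(K)
            )
    for t in range(T - 2, -1, -1):
        for i in range(K):
            beta[t][i] = sum(
                A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(K)
            )
    return alpha, beta, sum(alpha[T - 1])


def baum_welch_step(obs, pi, A, B):
    """One EM update; also returns the likelihood under the input parameters."""
    T, K, M = len(obs), len(pi), len(B[0])
    alpha, beta, lik = forward_backward(obs, pi, A, B)
    # gamma[t][k]: posterior probability of being in state k at time t.
    gamma = [[alpha[t][k] * beta[t][k] / lik for k in range(K)] for t in range(T)]
    new_pi = gamma[0][:]
    new_A = []
    for i in range(K):
        visits = sum(gamma[t][i] for t in range(T - 1))
        row = []
        for j in range(K):
            # Expected number of i -> j transitions under the posterior.
            expected = sum(
                alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                for t in range(T - 1)
            ) / lik
            row.append(expected / visits)
        new_A.append(row)
    new_B = []
    for k in range(K):
        total = sum(gamma[t][k] for t in range(T))
        new_B.append(
            [sum(gamma[t][k] for t in range(T) if obs[t] == m) / total
             for m in range(M)]
        )
    return new_pi, new_A, new_B, lik


# Illustrative two-state, two-symbol model (all numbers are made up).
obs = [0, 1, 0, 0, 1, 1, 0, 1]
pi0 = [0.6, 0.4]
A0 = [[0.7, 0.3], [0.4, 0.6]]
B0 = [[0.9, 0.1], [0.2, 0.8]]
pi1, A1, B1, lik0 = baum_welch_step(obs, pi0, A0, B0)
```

Each update cannot decrease the likelihood of the observed sequence, but the iterates may stall at a local optimum, which is the susceptibility the abstract refers to.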

Preprint, 2014, arXiv

Phase retrieval from low-rate samples

The paper considers the phase retrieval problem in N-dimensional complex vector spaces. It provides two sets of deterministic measurement vectors which guarantee signal recovery for all signals, excluding only a specific subspace and a union of subspaces, respectively. A stable analytic reconstruction procedure of low complexity is given. Additionally it is proven that signal recovery from these measurements can be solved exactly via a semidefinite program. A practical implementation with 4 deterministic diffraction patterns is provided and some numerical experiments with noisy measurements complement the analytic approach.

Preprint, 2014, arXiv

Phaseless Signal Recovery in Infinite Dimensional Spaces using Structured Modulations

This paper considers the recovery of continuous signals in infinite dimensional spaces from the magnitude of their frequency samples. It proposes a sampling scheme which involves a combination of oversampling and modulations with complex exponentials. Sufficient conditions are given such that almost every signal with compact support can be reconstructed up to a unimodular constant using only its magnitude samples in the frequency domain. Finally it is shown that an average sampling rate of four times the Nyquist rate is enough to reconstruct almost every time-limited signal.

Preprint, 2013, arXiv

Phase Retrieval via Structured Modulations in Paley-Wiener Spaces

This paper considers the recovery of continuous-time signals from the magnitude of their samples. It uses a combination of structured modulation and oversampling and provides sufficient conditions on the signal and the sampling system such that signal recovery is possible. In particular, it is shown that an average sampling rate of four times the Nyquist rate is sufficient to reconstruct a signal from its magnitude measurements.
