Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
14works
0followers
16topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

14 published item(s)

preprint2026arXiv

On the Limits of Latent Reuse in Diffusion Models

Diffusion models are often trained in low-dimensional latent spaces, which are then reused for related but shifted datasets. In this work, we study when such latent reuse remains reliable under distribution shift. We consider a source-target setting in which both datasets are approximately low-dimensional but may lie near different subspaces. We show that freezing and reusing a source latent space induces a target-domain score error governed by two quantities: the principal-angle misalignment between the source and target subspaces, and the target ambient noise amplified by the diffusion time scale. Motivated by these limits, we further study mixed source-target training and characterize how the required shared latent dimension depends on the relative geometry of the two distributions. Our results provide theoretical guidance on when latent reuse is reliable and when learning a shared representation may be necessary.

preprint2022arXiv

Adversarial Robustness of Visual Dialog

Adversarial robustness evaluates the worst-case performance scenario of a machine learning model to ensure its safety and reliability. This study is the first to investigate the robustness of visually grounded dialog models towards textual attacks. These attacks represent a worst-case scenario where the input question contains a synonym which causes the previously correct model to return a wrong answer. Using this scenario, we first aim to understand how multimodal input components contribute to model robustness. Our results show that models which encode dialog history are more robust, and when launching an attack on history, model prediction becomes more uncertain. This is in contrast to prior work which finds that dialog history is negligible for model performance on this task. We also evaluate how to generate adversarial test examples which successfully fool the model but remain undetected by the user/software designer. We find that the textual, as well as the visual context are important to generate plausible worst-case scenarios.

preprint2022arXiv

Continually Learning Self-Supervised Representations with Projected Functional Regularization

Recent self-supervised learning methods are able to learn high-quality image representations and are closing the gap with supervised approaches. However, these methods are unable to acquire new knowledge incrementally -- they are, in fact, mostly used only as a pre-training phase over IID data. In this work we investigate self-supervised methods in continual learning regimes without any replay mechanism. We show that naive functional regularization, also known as feature distillation, leads to lower plasticity and limits continual learning performance. Instead, we propose Projected Functional Regularization in which a separate temporal projection network ensures that the newly learned feature space preserves information of the previous one, while at the same time allowing for the learning of new features. This prevents forgetting while maintaining the plasticity of the learner. Comparison with other incremental learning approaches applied to self-supervision demonstrates that our method obtains competitive performance in different scenarios and on multiple datasets.

preprint2022arXiv

eX-ViT: A Novel eXplainable Vision Transformer for Weakly Supervised Semantic Segmentation

Recently vision transformer models have become prominent models for a range of vision tasks. These models, however, are usually opaque with weak feature interpretability. Moreover, there is no method currently built for an intrinsically interpretable transformer, which is able to explain its reasoning process and provide a faithful explanation. To close these crucial gaps, we propose a novel vision transformer dubbed the eXplainable Vision Transformer (eX-ViT), an intrinsically interpretable transformer model that is able to jointly discover robust interpretable features and perform the prediction. Specifically, eX-ViT is composed of the Explainable Multi-Head Attention (E-MHA) module, the Attribute-guided Explainer (AttE) module and the self-supervised attribute-guided loss. The E-MHA tailors explainable attention weights that are able to learn semantically interpretable representations from local patches in terms of model decisions with noise robustness. Meanwhile, AttE is proposed to encode discriminative attribute features for the target object through diverse attribute discovery, which constitutes faithful evidence for the model's predictions. In addition, a self-supervised attribute-guided loss is developed for our eX-ViT, which aims at learning enhanced representations through the attribute discriminability mechanism and attribute diversity mechanism, to localize diverse and discriminative attributes and generate more robust explanations. As a result, we can uncover faithful and robust interpretations with diverse attributes through the proposed eX-ViT.

preprint2022arXiv

Mirror Descent Strikes Again: Optimal Stochastic Convex Optimization under Infinite Noise Variance

We study stochastic convex optimization under infinite noise variance. Specifically, when the stochastic gradient is unbiased and has uniformly bounded $(1+κ)$-th moment, for some $κ\in (0,1]$, we quantify the convergence rate of the Stochastic Mirror Descent algorithm with a particular class of uniformly convex mirror maps, in terms of the number of iterations, dimensionality and related geometric parameters of the optimization problem. Interestingly this algorithm does not require any explicit gradient clipping or normalization, which have been extensively used in several recent empirical and theoretical works. We complement our convergence results with information-theoretic lower bounds showing that no other algorithm using only stochastic first-order oracles can achieve improved rates. Our results have several interesting consequences for devising online/streaming stochastic approximation algorithms for problems arising in robust statistics and machine learning.

preprint2022arXiv

Q-LIC: Quantizing Learned Image Compression with Channel Splitting

Learned image compression (LIC) has reached a comparable coding gain with traditional hand-crafted methods such as VVC intra. However, the large network complexity prohibits the usage of LIC on resource-limited embedded systems. Network quantization is an efficient way to reduce the network burden. This paper presents a quantized LIC (QLIC) by channel splitting. First, we explore that the influence of quantization error to the reconstruction error is different for various channels. Second, we split the channels whose quantization has larger influence to the reconstruction error. After the splitting, the dynamic range of channels is reduced so that the quantization error can be reduced. Finally, we prune several channels to keep the number of overall channels as origin. By using the proposal, in the case of 8-bit quantization for weight and activation of both main and hyper path, we can reduce the BD-rate by 0.61%-4.74% compared with the previous QLIC. Besides, we can reach better coding gain compared with the state-of-the-art network quantization method when quantizing MS-SSIM models. Moreover, our proposal can be combined with other network quantization methods to further improve the coding gain. The moderate coding loss caused by the quantization validates the feasibility of the hardware implementation for QLIC in the future.

preprint2022arXiv

Self-Training for Class-Incremental Semantic Segmentation

In class-incremental semantic segmentation, we have no access to the labeled data of previous tasks. Therefore, when incrementally learning new classes, deep neural networks suffer from catastrophic forgetting of previously learned knowledge. To address this problem, we propose to apply a self-training approach that leverages unlabeled data, which is used for the rehearsal of previous knowledge. Specifically, we first learn a temporary model for the current task, and then pseudo labels for the unlabeled data are computed by fusing information from the old model of the previous task and the current temporary model. Additionally, conflict reduction is proposed to resolve the conflicts of pseudo labels generated from both the old and temporary models. We show that maximizing self-entropy can further improve results by smoothing the overconfident predictions. Interestingly, in the experiments we show that the auxiliary data can be different from the training data and that even general-purpose but diverse auxiliary data can lead to large performance gains. The experiments demonstrate state-of-the-art results: obtaining a relative gain of up to 114% on Pascal-VOC 2012 and 8.5% on the more challenging ADE20K compared to previous state-of-the-art methods.

preprint2021arXiv

False Discovery Rates in Biological Networks

The increasing availability of data has generated unprecedented prospects for network analyses in many biological fields, such as neuroscience (e.g., brain networks), genomics (e.g., gene-gene interaction networks), and ecology (e.g., species interaction networks). A powerful statistical framework for estimating such networks is Gaussian graphical models, but standard estimators for the corresponding graphs are prone to large numbers of false discoveries. In this paper, we introduce a novel graph estimator based on knockoffs that imitate the partial correlation structures of unconnected nodes. We show that this new estimator guarantees accurate control of the false discovery rate in theory, simulations, and biological applications, and we provide easy-to-use R code.

preprint2020arXiv

Addressing Class-Imbalance Problem in Personalized Ranking

Pairwise ranking models have been widely used to address recommendation problems. The basic idea is to learn the rank of users' preferred items through separating items into \emph{positive} samples if user-item interactions exist, and \emph{negative} samples otherwise. Due to the limited number of observable interactions, pairwise ranking models face serious \emph{class-imbalance} issues. Our theoretical analysis shows that current sampling-based methods cause the vertex-level imbalance problem, which makes the norm of learned item embeddings towards infinite after a certain training iterations, and consequently results in vanishing gradient and affects the model inference results. We thus propose an efficient \emph{\underline{Vi}tal \underline{N}egative \underline{S}ampler} (VINS) to alleviate the class-imbalance issue for pairwise ranking model, in particular for deep learning models optimized by gradient methods. The core of VINS is a bias sampler with reject probability that will tend to accept a negative candidate with a larger degree weight than the given positive item. Evaluation results on several real datasets demonstrate that the proposed sampling method speeds up the training procedure 30\% to 50\% for ranking models ranging from shallow to deep, while maintaining and even improving the quality of ranking results in top-N item recommendation.

preprint2020arXiv

An Analysis of Constant Step Size SGD in the Non-convex Regime: Asymptotic Normality and Bias

Structured non-convex learning problems, for which critical points have favorable statistical properties, arise frequently in statistical machine learning. Algorithmic convergence and statistical estimation rates are well-understood for such problems. However, quantifying the uncertainty associated with the underlying training algorithm is not well-studied in the non-convex setting. In order to address this shortcoming, in this work, we establish an asymptotic normality result for the constant step size stochastic gradient descent (SGD) algorithm--a widely used algorithm in practice. Specifically, based on the relationship between SGD and Markov Chains [DDB19], we show that the average of SGD iterates is asymptotically normally distributed around the expected value of their unique invariant distribution, as long as the non-convex and non-smooth objective function satisfies a dissipativity property. We also characterize the bias between this expected value and the critical points of the objective function under various local regularity conditions. Together, the above two results could be leveraged to construct confidence intervals for non-convex problems that are trained using the SGD algorithm.

preprint2020arXiv

Compressing Facial Makeup Transfer Networks by Collaborative Distillation and Kernel Decomposition

Although the facial makeup transfer network has achieved high-quality performance in generating perceptually pleasing makeup images, its capability is still restricted by the massive computation and storage of the network architecture. We address this issue by compressing facial makeup transfer networks with collaborative distillation and kernel decomposition. The main idea of collaborative distillation is underpinned by a finding that the encoder-decoder pairs construct an exclusive collaborative relationship, which is regarded as a new kind of knowledge for low-level vision tasks. For kernel decomposition, we apply the depth-wise separation of convolutional kernels to build a light-weighted Convolutional Neural Network (CNN) from the original network. Extensive experiments show the effectiveness of the compression method when applied to the state-of-the-art facial makeup transfer network -- BeautyGAN.

preprint2020arXiv

Leidenfrost drop impact on inclined superheated substrates

In real applications, drops always impact on solid walls with various inclinations. For the oblique impact of a Leidenfrost drop, which has a vapor layer under its bottom surface to prevent its direct contact with the superheated substrate, the drop can nearly frictionlessly slide along the substrate accompanied by the spreading and the retracting. To individually study these processes, we experimentally observe ethanol drops impact on superheated inclined substrates using high-speed imaging from two different views synchronously. We first study the dynamic Leidenfrost temperature, which mainly depends on the normal Weber number ${We}_\perp $. Then the substrate temperature is set to be high enough to study the Leidenfrost drop behaviors. During the spreading process, drops always keep uniform. And the maximum spreading factor $D_m/D_0$ follows a power-law dependence on the large normal Weber number ${We}_\perp $ as $D_m/D_0 = \sqrt{We_\perp /12+2}$ for $We_\perp \geq 30$. During the retracting process, drops with low impact velocities become non-uniform due to the gravity effect. For the sliding process, the residence time of all studied drops is nearly a constant, which is not affected by the inclination and $We$ number. The frictionless vapor layer results in the dimensionless sliding distance $L/D_0$ follows a power-law dependence on the parallel Weber number $We_\parallel$ as $L/D_0 \propto We_\parallel^{1/2}$. Without direct contact with the substrate, the behaviors of drops can be separately determined by ${We}_\perp $ and $We_\parallel$. When the impact velocity is too high, the drop fragments into many tiny droplets, which is called the splashing phenomenon. The critical splashing criterion is found to be $We_\perp ^*\simeq$ 120 or $K_\perp = We_\perp Re_\perp^{1/2} \simeq$ 5300 in the current parameter regime.

preprint2020arXiv

Protocol Proxy: An FTE-based Covert Channel

In a hostile network environment, users must communicate without being detected. This involves blending in with the existing traffic. In some cases, a higher degree of secrecy is required. We present a proof-of-concept format transforming encryption (FTE)-based covert channel for tunneling TCP traffic through protected static protocols. Protected static protocols are UDP-based protocols with variable fields that cannot be blocked without collateral damage, such as power grid failures. We (1) convert TCP traffic to UDP traffic, (2) introduce observation-based FTE, and (3) model interpacket timing with a deterministic Hidden Markov Model (HMM). The resulting Protocol Proxy has a very low probability of detection and is an alternative to current covert channels. We tunnel a TCP session through a UDP protocol and guarantee delivery. Observation-based FTE ensures traffic cannot be detected by traditional rule-based analysis or DPI. A deterministic HMM ensures the Protocol Proxy accurately models interpacket timing to avoid detection by side-channel analysis. Finally, the choice of a protected static protocol foils stateful protocol analysis and causes collateral damage with false positives.

preprint2020arXiv

Semantic Drift Compensation for Class-Incremental Learning

Class-incremental learning of deep networks sequentially increases the number of classes to be classified. During training, the network has only access to data of one task at a time, where each task contains several classes. In this setting, networks suffer from catastrophic forgetting which refers to the drastic drop in performance on previous tasks. The vast majority of methods have studied this scenario for classification networks, where for each new task the classification layer of the network must be augmented with additional weights to make room for the newly added classes. Embedding networks have the advantage that new classes can be naturally included into the network without adding new weights. Therefore, we study incremental learning for embedding networks. In addition, we propose a new method to estimate the drift, called semantic drift, of features and compensate for it without the need of any exemplars. We approximate the drift of previous tasks based on the drift that is experienced by current task data. We perform experiments on fine-grained datasets, CIFAR100 and ImageNet-Subset. We demonstrate that embedding networks suffer significantly less from catastrophic forgetting. We outperform existing methods which do not require exemplars and obtain competitive results compared to methods which store exemplars. Furthermore, we show that our proposed SDC when combined with existing methods to prevent forgetting consistently improves results.