Source author record

Yiwei Zhang

Yiwei Zhang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


math.DS, Information Theory, math.IT, Computer Vision, Machine Learning, Artificial Intelligence, math-ph, math.CO, math.MP, Cryptography and Security, eess.AS, math.GR, math.OC, math.PR, Multimedia, Robotics, Social and Information Networks, Sound

Catalog footprint

20 works, 18 topics, 4 close collaborators


preprint, 2026, arXiv

Information Theoretic Adversarial Training of Large Language Models

Large language models (LLMs) remain vulnerable to adversarial prompting despite advances in alignment and safety, often exhibiting harmful behaviors under novel attack strategies. While adversarial training can improve robustness, existing approaches are computationally expensive and difficult to scale. Recent continuous adversarial training methods, such as Continuous Adversarial Training (CAT) and Continuous Adversarial Preference Optimization (CAPO), address this challenge by leveraging gradient-based perturbations in the embedding space, enabling more efficient and expressive attacks. Building on this paradigm, we propose WARDEN, a distributionally robust adversarial training framework for LLMs that dynamically reweights adversarial examples through an $f$-divergence ambiguity set around the empirical training distribution. Our method optimizes the worst-case adversarial loss within a divergence ball around the empirical data distribution, automatically emphasizing harder adversarial examples. Using the convex dual formulation, the objective reduces to a log-sum-exp form under the KL divergence, with a dynamical parameter controlling the strength of reweighting. This study leads to a new class of information-theoretic objectives that significantly reduce attack success rates while maintaining model utility. Across multiple LLMs and attack settings, WARDEN substantially reduces attack success rates with computational and utility costs comparable to CAT-, CAPO-, and MixAT-based baselines, making it a practical approach for scalable robust alignment.
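The log-sum-exp form mentioned in the abstract above matches the standard convex dual of a KL-constrained worst-case expectation. The following is a minimal sketch of that generic objective, not the WARDEN implementation: the function names and the temperature `lam` are hypothetical, and the abstract does not say how the reweighting parameter is set or updated.

```python
import math

def kl_dro_objective(losses, lam):
    """Log-sum-exp surrogate: lam * log(mean(exp(loss / lam))).

    `lam` (> 0) is a hypothetical temperature; smaller values put more
    weight on the hardest examples.
    """
    n = len(losses)
    m = max(losses)  # subtract the max for numerical stability
    s = sum(math.exp((l - m) / lam) for l in losses)
    return m + lam * math.log(s / n)

def kl_dro_weights(losses, lam):
    """Per-example weights implied by the objective (softmax of loss / lam)."""
    m = max(losses)
    e = [math.exp((l - m) / lam) for l in losses]
    z = sum(e)
    return [x / z for x in e]

# Illustrative per-example adversarial losses (made-up numbers).
losses = [0.2, 1.0, 3.0]
print(kl_dro_objective(losses, lam=1.0))
print(kl_dro_weights(losses, lam=1.0))
```

The objective always lies between the mean loss and the maximum loss: a large `lam` recovers plain averaging, while a small `lam` approaches the worst-case example.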

preprint, 2026, arXiv

Integrating Diverse Assignment Strategies into DETRs

Label assignment is a critical component in object detectors, particularly within DETR-style frameworks where the one-to-one matching strategy, despite its end-to-end elegance, suffers from slow convergence due to sparse supervision. While recent works have explored one-to-many assignments to enrich supervisory signals, they often introduce complex, architecture-specific modifications and typically focus on a single auxiliary strategy, lacking a unified and scalable design. In this paper, we first systematically investigate the effects of "one-to-many" supervision and reveal a surprising insight that performance gains are driven not by the sheer quantity of supervision, but by the diversity of the assignment strategies employed. This finding suggests that a more elegant, parameter-efficient approach is attainable. Building on this insight, we propose LoRA-DETR, a flexible and lightweight framework that seamlessly integrates diverse assignment strategies into any DETR-style detector. Our method augments the primary network with multiple Low-Rank Adaptation (LoRA) branches during training, each instantiating a different one-to-many assignment rule. These branches act as auxiliary modules that inject rich, varied supervisory gradients into the main model and are discarded during inference, thus incurring no additional computational cost. This design promotes robust joint optimization while maintaining the architectural simplicity of the original detector. Extensive experiments on different baselines validate the effectiveness of our approach. Our work presents a new paradigm for enhancing detectors, demonstrating that diverse "one-to-many" supervision can be integrated to achieve state-of-the-art results without compromising model elegance.

preprint, 2026, arXiv

SoLA-Vision: Fine-grained Layer-wise Linear Softmax Hybrid Attention

Standard softmax self-attention excels in vision tasks but incurs quadratic complexity O(N^2), limiting high-resolution deployment. Linear attention reduces the cost to O(N), yet its compressed state representations can impair modeling capacity and accuracy. We present an analytical study that contrasts linear and softmax attention for visual representation learning from a layer-stacking perspective. We further conduct systematic experiments on layer-wise hybridization patterns of linear and softmax attention. Our results show that, compared with rigid intra-block hybrid designs, fine-grained layer-wise hybridization can match or surpass performance while requiring fewer softmax layers. Building on these findings, we propose SoLA-Vision (Softmax-Linear Attention Vision), a flexible layer-wise hybrid attention backbone that enables fine-grained control over how linear and softmax attention are integrated. By strategically inserting a small number of global softmax layers, SoLA-Vision achieves a strong trade-off between accuracy and computational cost. On ImageNet-1K, SoLA-Vision outperforms purely linear and other hybrid attention models. On dense prediction tasks, it consistently surpasses strong baselines by a considerable margin. Code will be released.
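To make the O(N^2) versus O(N) contrast in the abstract above concrete, here is a minimal NumPy sketch of the two attention types. It is not the SoLA-Vision code: the feature map `elu(x) + 1` is an assumed, commonly used choice for kernelized linear attention, and all function names are hypothetical.

```python
import numpy as np

def softmax_attention(q, k, v):
    """Standard attention: builds an N x N score matrix, so cost is O(N^2)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def feature_map(x):
    """Positive feature map elu(x) + 1 (an assumed choice, not from the paper)."""
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(q, k, v):
    """Kernelized attention: keys and values are summarized in a d x d state, O(N)."""
    fq, fk = feature_map(q), feature_map(k)
    state = fk.T @ v              # (d, d_v) compressed state
    norm = fq @ fk.sum(axis=0)    # (N,) per-query normalizer
    return (fq @ state) / norm[:, None]

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((8, 4)) for _ in range(3))
print(softmax_attention(q, k, v).shape, linear_attention(q, k, v).shape)
```

The linear variant never forms the N x N matrix; everything it knows about the keys and values passes through the fixed-size `state`, which is the compressed representation the abstract identifies as a possible limit on modeling capacity.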

preprint, 2022, arXiv

Coding schemes for locally balanced constraints

Motivated by applications in DNA-based storage, we study explicit encoding and decoding schemes of binary strings satisfying locally balanced constraints, where the $(\ell,δ)$-locally balanced constraint requires that the weight of any consecutive substring of length $\ell$ is between $\frac{\ell}{2}-δ$ and $\frac{\ell}{2}+δ$. In this paper we present coding schemes for the strongly locally balanced constraints and the locally balanced constraints, respectively. Moreover, we introduce an additional result on the linear recurrence formula of the number of binary strings which are $(6,1)$-locally balanced, as a further step toward both capacity characterization and new coding strategies for locally balanced constraints.
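The $(\ell,δ)$-locally balanced constraint defined in the abstract above can be checked directly from its definition. The sketch below is a brute-force illustration with hypothetical function names; it is not one of the paper's coding schemes, and the counts it prints come from exhaustive enumeration rather than from the paper's recurrence.

```python
from itertools import product

def is_locally_balanced(bits, ell, delta):
    """True if every length-`ell` window has weight within ell/2 +/- delta.

    Strings shorter than `ell` have no such window and pass vacuously.
    """
    for start in range(len(bits) - ell + 1):
        weight = sum(bits[start:start + ell])
        if abs(weight - ell / 2) > delta:
            return False
    return True

def count_locally_balanced(n, ell, delta):
    """Brute-force count over all 2^n binary strings (small n only)."""
    return sum(is_locally_balanced(s, ell, delta)
               for s in product((0, 1), repeat=n))

# Number of (6,1)-locally balanced strings of lengths 6 through 10.
print([count_locally_balanced(n, 6, 1) for n in range(6, 11)])
```

For length 6 the count is the number of strings of weight 2, 3 or 4, which gives a quick sanity check on the enumeration.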

preprint, 2022, arXiv

Neuro-Symbolic Learning: Principles and Applications in Ophthalmology

Neural networks have been rapidly expanding in recent years, with novel strategies and applications. However, challenges such as interpretability, explainability, robustness, safety, trust, and sensibility remain unsolved in neural network technologies, although they must be addressed for critical applications. Attempts have been made to overcome the challenges in neural network computing by representing and embedding domain knowledge in terms of symbolic representations. Thus, the neuro-symbolic learning (NeSyL) notion emerged, which incorporates aspects of symbolic representation and brings common sense into neural networks. In domains where interpretability, reasoning, and explainability are crucial, such as video and image captioning, question-answering and reasoning, health informatics, and genomics, NeSyL has shown promising outcomes. This review presents a comprehensive survey on the state-of-the-art NeSyL approaches, their principles, advances in machine and deep learning algorithms, applications such as ophthalmology, and most importantly, future perspectives of this emerging field.

preprint, 2020, arXiv

Look, Listen, and Act: Towards Audio-Visual Embodied Navigation

A crucial ability of mobile intelligent agents is to integrate the evidence from multiple sensory inputs in an environment and to make a sequence of actions to reach their goals. In this paper, we attempt to approach the problem of Audio-Visual Embodied Navigation, the task of planning the shortest path from a random starting location in a scene to the sound source in an indoor environment, given only raw egocentric visual and audio sensory data. To accomplish this task, the agent is required to learn from various modalities, i.e. relating the audio signal to the visual environment. Here we describe an approach to audio-visual embodied navigation that takes advantage of both visual and audio pieces of evidence. Our solution is based on three key ideas: a visual perception mapper module that constructs its spatial memory of the environment, a sound perception module that infers the relative location of the sound source from the agent, and a dynamic path planner that plans a sequence of actions based on the audio-visual observations and the spatial memory of the environment to navigate toward the goal. Experimental results on a newly collected Visual-Audio-Room dataset using the simulated multi-modal environment demonstrate the effectiveness of our approach over several competitive baselines.

preprint, 2020, arXiv

Measuring Similarity between Brands using Followers' Post in Social Media

In this paper, we propose a new measure to estimate the similarity between brands via posts of brands' followers on social network services (SNS). Our method was developed with the intention of exploring the brands that customers are likely to jointly purchase. Nowadays, brands use social media for targeted advertising because influencing users' preferences can greatly affect the trends in sales. We assume that data on SNS allows us to make quantitative comparisons between brands. Our proposed algorithm analyzes the daily photos and hashtags posted by each brand's followers. By clustering them and converting them to histograms, we can calculate the similarity between brands. We evaluated our proposed algorithm with purchase logs, credit card information, and answers to the questionnaires. The experimental results show that the purchase data maintained by a mall or a credit card company can predict the co-purchase very well, but not the customer's willingness to buy products of new brands. On the other hand, our method can predict the users' interest in brands with a correlation value over 0.53, which is fairly high considering that such interest in brands is highly subjective and individual-dependent.

preprint, 2020, arXiv

On Coupling Lemma and Stochastic Properties with Unbounded Observables for 1-d Expanding Maps

In this paper, we establish a coupling lemma for standard families in the setting of piecewise expanding interval maps with countably many branches. Our method merely requires that the expanding map satisfies Chernov's one-step expansion at $q$-scale and eventually covers a magnet interval. Therefore, our approach is particularly powerful for maps whose inverse Jacobian has low regularity and those that do not satisfy the big image property. The main ingredients of our coupling method are two crucial lemmas: the growth lemma in terms of the characteristic $\mathcal{Z}$ function and the covering ratio lemma over the magnet interval. We first prove the existence of an absolutely continuous invariant measure. More importantly, we further show that the growth lemma enables the liftability of the Lebesgue measure to the associated Hofbauer tower, and the resulting invariant measure on the tower admits a decomposition of Pesin-Sinai type. Furthermore, we obtain the exponential decay of correlations and the almost sure invariance principle (which is a functional version of the central limit theorem). For the first time, we are able to make a direct relation between the mixing rates and the $\mathcal{Z}$ function. The novelty of our results relies on establishing the regularity of invariant density, as well as verifying the stochastic properties for a large class of unbounded observables. Finally, we verify our assumptions for several well-known examples that were previously studied in the literature, and unify the results for these examples in our framework.

preprint, 2016, arXiv

Centralized coded caching schemes: A hypergraph theoretical approach

The centralized coded caching scheme is a technique proposed by Maddah-Ali and Niesen as a solution to reduce the network burden in peak times in a wireless system. Later Yan et al. reformulated the problem as designing a corresponding placement delivery array, and proposed two new schemes from this perspective. These schemes significantly reduce the transmission rate $R$, compared with the uncoded caching scheme. However, to implement the new schemes, each file should be cut into $F$ pieces, where $F$ increases exponentially with the number of users $K$. Such a constraint is obviously infeasible in the practical setting, especially when $K$ is large. Thus it is desirable to design caching schemes with constant rate $R$ (independent of $K$) as well as small $F$. In this paper we view the centralized coded caching problem from a hypergraph perspective and show that designing a feasible placement delivery array is equivalent to constructing a linear and (6, 3)-free 3-uniform 3-partite hypergraph. Several new results and constructions arise from our novel point of view. First, by using the famous (6, 3)-theorem in extremal combinatorics, we show that constant rate caching schemes with $F$ growing linearly with $K$ do not exist. Second, we present two infinite classes of centralized coded caching schemes, which include the schemes of Ali-Niesen and Yan et al. as special cases, respectively. Moreover, our constructions show that constant rate caching schemes with $F$ growing sub-exponentially with $K$ do exist.

preprint, 2016, arXiv

Invertible binary matrix with maximum number of $2$-by-$2$ invertible submatrices

The problem is related to all-or-nothing transforms (AONT) suggested by Rivest as a preprocessing for encrypting data with a block cipher. Since then there have been various applications of AONTs in cryptography and security. D'Arco, Esfahani and Stinson posed the problem on the constructions of binary matrices for which the desired properties of an AONT hold with the maximum probability. That is, for given integers $t\le s$, what is the maximum number of $t$-by-$t$ invertible submatrices in a binary matrix of order $s$? For the case $t=2$, let $R_2(s)$ denote the maximal proportion of 2-by-2 invertible submatrices. D'Arco, Esfahani and Stinson conjectured that the limit is between 0.492 and 0.625. We completely solve the case $t=2$ by showing that $\lim_{s\rightarrow\infty}R_2(s)=0.5$.
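The quantity in the abstract above, the proportion of 2-by-2 invertible submatrices of a binary matrix of order $s$, can be computed by direct enumeration. The sketch below assumes invertibility is taken over GF(2), that is, determinant 1 modulo 2; the abstract does not state the field explicitly. The function name and the example matrix are hypothetical.

```python
from itertools import combinations

def invertible_2x2_ratio(matrix):
    """Fraction of 2-by-2 submatrices with determinant 1 modulo 2.

    Assumes a square 0/1 matrix of order at least 2 and arithmetic over GF(2).
    """
    s = len(matrix)
    total = invertible = 0
    for r1, r2 in combinations(range(s), 2):
        for c1, c2 in combinations(range(s), 2):
            det = (matrix[r1][c1] * matrix[r2][c2]
                   - matrix[r1][c2] * matrix[r2][c1])
            total += 1
            invertible += det % 2
    return invertible / total

# An upper-triangular example, itself invertible over GF(2).
example = [[1, 1, 0],
           [0, 1, 1],
           [0, 0, 1]]
print(invertible_2x2_ratio(example))
```

Enumeration costs on the order of $s^4$ determinant evaluations, so it is only practical for small orders; it is meant for checking candidate matrices by hand, not for studying the limit.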

preprint, 2016, arXiv

New bounds of permutation codes under Hamming metric and Kendall's $τ$-metric

Permutation codes are widely studied objects due to their numerous applications in various areas, such as power line communications, block ciphers, and the rank modulation scheme for flash memories. Several kinds of metrics are considered for permutation codes according to their specific applications. This paper concerns some improvements on the bounds of permutation codes under Hamming metric and Kendall's $τ$-metric respectively, using mainly a graph coloring approach. Specifically, under Hamming metric, we improve the Gilbert-Varshamov bound asymptotically by a factor $n$, when the minimum Hamming distance $d$ is fixed and the code length $n$ goes to infinity. Under Kendall's $τ$-metric, we narrow the gap between the known lower bounds and upper bounds. Besides, we also obtain some sporadic results under Kendall's $τ$-metric for small parameters.
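The two metrics named in the abstract above have short direct definitions: the Hamming distance counts positions where two permutations differ, and Kendall's $τ$-distance counts pairs of elements that the two permutations order differently. The following sketch uses hypothetical function names and a quadratic-time pair count chosen for clarity.

```python
def hamming_distance(p, q):
    """Number of positions where the two permutations differ."""
    return sum(a != b for a, b in zip(p, q))

def kendall_tau_distance(p, q):
    """Number of pairs ordered differently by p and q.

    This equals the minimum number of adjacent transpositions turning p into q.
    """
    pos = {value: i for i, value in enumerate(q)}
    seq = [pos[value] for value in p]
    return sum(seq[i] > seq[j]
               for i in range(len(seq)) for j in range(i + 1, len(seq)))

p, q = (0, 1, 2, 3), (1, 0, 3, 2)
print(hamming_distance(p, q), kendall_tau_distance(p, q))  # prints "4 2"
```

The example shows how differently the metrics can behave: the two permutations disagree in every position, yet only two adjacent swaps separate them.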

preprint, 2016, arXiv

On private information retrieval array codes

Given a database, the private information retrieval (PIR) protocol allows a user to make queries to several servers and retrieve a certain item of the database from the responses, without revealing the identity of the specific item to any single server. Classical models of PIR protocols require that each server stores a whole copy of the database. Recently, new PIR models have been proposed with coding techniques arising from distributed storage systems. In these new models each server only stores a fraction $1/s$ of the whole database, where $s>1$ is a given rational number. PIR array codes were recently proposed by Fazeli, Vardy and Yaakobi to characterize the new models. Consider a PIR array code with $m$ servers and the $k$-PIR property (which indicates that these $m$ servers may emulate any efficient $k$-PIR protocol). The central problem is to design PIR array codes with optimal rate $k/m$. Our contribution to this problem is three-fold. First, for the case $1<s\le 2$, although PIR array codes with optimal rate have been constructed recently by Blackburn and Etzion, the number of servers in their construction is impractically large. We determine the minimum number of servers admitting the existence of a PIR array code with optimal rate for a certain range of parameters. Second, for the case $s>2$, we derive a new upper bound on the rate of a PIR array code. Finally, for the case $s>2$, we analyze a new construction by Blackburn and Etzion and show that its rate is better than all the other existing constructions.

preprint, 2015, arXiv

Ergodic optimization of prevalent super-continuous functions

Given a dynamical system, we say that a performance function has property P if its time averages along orbits are maximized at a periodic orbit. It is conjectured by several authors that for sufficiently hyperbolic dynamical systems, property P should be typical among sufficiently regular performance functions. In this paper we address this problem using a probabilistic notion of typicality that is suitable to infinite dimension: the concept of prevalence as introduced by Hunt, Sauer, and Yorke. For the one-sided shift on two symbols, we prove that property P is prevalent in spaces of functions with a strong modulus of regularity. Our proof uses Haar wavelets to approximate the ergodic optimization problem by a finite-dimensional one, which can be conveniently restated as a maximum cycle mean problem on a de Bruijn graph.
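A maximum cycle mean problem of the kind mentioned in the abstract above can be solved with Karp's classical algorithm. The sketch below is a generic illustration on a binary de Bruijn graph, not the paper's reduction: the function names are hypothetical, and the edge weights (the number of ones in each word) are a made-up stand-in for values of a performance function.

```python
def max_cycle_mean(n, edges):
    """Karp's algorithm for the maximum mean weight of a cycle.

    `edges` is a list of (u, v, weight); every node is assumed reachable
    from node 0, as in a strongly connected de Bruijn graph.
    """
    neg_inf = float("-inf")
    # best[k][v]: maximum weight of a walk with exactly k edges from 0 to v
    best = [[neg_inf] * n for _ in range(n + 1)]
    best[0][0] = 0.0
    for k in range(1, n + 1):
        for u, v, w in edges:
            if best[k - 1][u] > neg_inf:
                best[k][v] = max(best[k][v], best[k - 1][u] + w)
    result = neg_inf
    for v in range(n):
        if best[n][v] == neg_inf:
            continue
        ratios = [(best[n][v] - best[k][v]) / (n - k)
                  for k in range(n) if best[k][v] > neg_inf]
        result = max(result, min(ratios))
    return result

def de_bruijn_edges(order, weight):
    """Edges of the binary de Bruijn graph on words of length `order`.

    `weight` maps each (order + 1)-bit word, encoded as an int, to a number.
    """
    size = 1 << order
    return [(u, (2 * u + b) % size, weight(2 * u + b))
            for u in range(size) for b in (0, 1)]

# Hypothetical weights: the number of ones in each 2-bit word.
edges = de_bruijn_edges(1, lambda word: bin(word).count("1"))
print(max_cycle_mean(2, edges))
```

With these weights the best cycle is the self-loop at the all-ones word, which corresponds to a periodic orbit (here a fixed point) attaining the maximal time average.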

preprint, 2015, arXiv

On the mixing properties of piecewise expanding maps under composition with permutations

We consider the effect on the mixing properties of a piecewise smooth interval map $f$ when its domain is divided into $N$ equal subintervals and $f$ is composed with a permutation of these. The case of the stretch-and-fold map $f(x)=mx \bmod 1$ for integers $m \geq 2$ is examined in detail. We give a combinatorial description of those permutations $σ$ for which $σ\circ f$ is still (topologically) mixing, and show that the proportion of such permutations tends to $1$ as $N \to \infty$. We then investigate the mixing rate of $σ\circ f$ (as measured by the modulus of the second largest eigenvalue of the transfer operator). In contrast to the situation for continuous time diffusive systems, we show that composition with a permutation cannot improve the mixing rate of $f$, but typically makes it worse. Under some mild assumptions on $m$ and $N$, we obtain a precise value for the worst mixing rate as $σ$ ranges through all permutations; this can be made arbitrarily close to $1$ as $N \to \infty$ (with $m$ fixed). We illustrate the geometric distribution of the second largest eigenvalues in the complex plane for small $m$ and $N$, and propose a conjecture concerning their location in general. Finally, we give examples of other interval maps $f$ for which composition with permutations produces different behaviour than that obtained from the stretch-and-fold map.
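For the stretch-and-fold map in the abstract above, the $N$ equal subintervals form a Markov partition for $σ\circ f$, so mixing can be explored on a 0/1 transition matrix. The sketch below assumes $m \le N$ and uses primitivity of that matrix (some power being strictly positive) as the test for topological mixing; this is the standard criterion for expanding Markov maps, and the paper's own combinatorial description may be phrased differently. Function names are hypothetical.

```python
def transition_matrix(m, N, sigma):
    """0/1 matrix: entry [i][j] is 1 if sigma∘f maps subinterval i over j.

    f(x) = m*x mod 1 sends subinterval i onto subintervals
    m*i, ..., m*i + m - 1 (mod N); sigma then relabels each of them.
    """
    A = [[0] * N for _ in range(N)]
    for i in range(N):
        for k in range(m):
            A[i][sigma[(m * i + k) % N]] = 1
    return A

def is_primitive(A):
    """True if some power of A is strictly positive (up to Wielandt's bound)."""
    N = len(A)
    power = [row[:] for row in A]
    for _ in range((N - 1) ** 2 + 1):
        if all(all(row) for row in power):
            return True
        # Boolean matrix product power * A
        power = [[int(any(power[i][k] and A[k][j] for k in range(N)))
                  for j in range(N)] for i in range(N)]
    return False

identity = [0, 1, 2, 3]
swap_middle = [0, 2, 1, 3]  # exchanges subintervals 1 and 2
print(is_primitive(transition_matrix(2, 4, identity)))     # True
print(is_primitive(transition_matrix(2, 4, swap_middle)))  # False
```

In the second example the subintervals 0 and 2 map only onto each other, so the composed map has an invariant proper subset and cannot be mixing.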

preprint, 2015, arXiv

On the mixing properties of piecewise expanding maps under composition with permutations, II: Maps of non-constant orientation

For an integer $m \geq 2$, let $\mathcal{P}_m$ be the partition of the unit interval $I$ into $m$ equal subintervals, and let $\mathcal{F}_m$ be the class of piecewise linear maps on $I$ with constant slope $\pm m$ on each element of $\mathcal{P}_m$. We investigate the effect on mixing properties when $f \in \mathcal{F}_m$ is composed with the interval exchange map given by a permutation $σ\in S_N$ interchanging the $N$ subintervals of $\mathcal{P}_N$. This extends the work in a previous paper [N.P. Byott, M. Holland and Y. Zhang, DCDS, 33 (2013) 3365--3390], where we considered only the "stretch-and-fold" map $f_{sf}(x)=mx \bmod 1$.

preprint, 2015, arXiv

Snake-in-the-Box Codes for Rank Modulation under Kendall's $τ$-Metric

For a Gray code in the scheme of rank modulation for flash memories, the codewords are permutations and two consecutive codewords are obtained using a push-to-the-top operation. We consider snake-in-the-box codes under Kendall's $τ$-metric, which are Gray codes capable of detecting one Kendall's $τ$-error. We answer two open problems posed by Horovitz and Etzion. Firstly, we prove the validity of a construction given by them, resulting in a snake of size $M_{2n+1}=\frac{(2n+1)!}{2}-2n+1$. Secondly, we come up with a different construction aiming at a longer snake of size $M_{2n+1}=\frac{(2n+1)!}{2}-2n+3$. The construction is applied successfully to $S_7$.
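As a small worked illustration of the abstract above, the sketch below implements the push-to-the-top operation and evaluates the two stated size formulas, reading each with standard operator precedence. The function names are hypothetical, and the code only does the arithmetic; it does not construct a snake.

```python
from math import factorial

def push_to_top(perm, i):
    """Move the element at index i to the front, shifting the others down."""
    return (perm[i],) + perm[:i] + perm[i + 1:]

def snake_size_proved(n):
    """(2n+1)!/2 - 2n + 1, the size stated for the Horovitz-Etzion construction."""
    return factorial(2 * n + 1) // 2 - 2 * n + 1

def snake_size_new(n):
    """(2n+1)!/2 - 2n + 3, the size targeted by the second construction."""
    return factorial(2 * n + 1) // 2 - 2 * n + 3

print(push_to_top((1, 2, 3, 4, 5), 2))         # (3, 1, 2, 4, 5)
print(snake_size_proved(3), snake_size_new(3))  # 2515 2517
```

For $S_7$ (the case $n=3$), half of the $7! = 5040$ permutations is 2520, so the two formulas give 2515 and 2517 codewords respectively.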

preprint, 2015, arXiv

Thermodynamic formalism of interval maps for upper semi-continuous potentials: Makarov-Smirnov's formalism

In this paper, we study the thermodynamic formalism of interval maps $f$ with sufficient regularity, for a subclass $\mathcal{U}$ composed of upper semi-continuous potentials which includes both Hölder and geometric potentials. We show that for a given $u\in \mathcal{U}$ and negative values of $t$, the pressure function $P(f,-tu)$ can be calculated in terms of the corresponding hidden pressure function $\widetilde{P}(f,-tu)$. The set of values $t\in(-\infty,0)$ at which $P(f,-tu)\neq \widetilde{P}(f,-tu)$ is also characterized explicitly. When restricting to the Hölder continuous potentials, our result recovers Theorem B in [Li & Rivera-Letelier 2013] for maps with non-flat critical points. When restricting to the geometric potentials, we develop a real version of Makarov-Smirnov's formalism, in parallel to the complex version shown in [Makarov & Smirnov 2000, Theo A,B]. Moreover, our results also provide a new and simpler proof (using [Ruelle 1992, Coro 6.3]) of the original Makarov-Smirnov's formalism in the complex setting, under an additional assumption about non-exceptionality, i.e., [Makarov & Smirnov 2000, Theo 3.1].

preprint, 2015, arXiv

Tree pressure for hyperbolic and non-exceptional upper semi-continuous potentials

In this note, we investigate the tree pressure for multi-modal interval maps with a certain class of hyperbolic and non-exceptional upper semi-continuous functions. In particular, we obtain a generalized version of Corollary 2.2 in the paper [LRL14] by Li and Rivera-Letelier. This property will be used to prove the existence of a conformal measure for the geometric potential in the negative spectrum.

preprint, 2012, arXiv

Dimension results for inhomogeneous Moran set constructions

We compute the Hausdorff, upper box and packing dimensions for certain inhomogeneous Moran set constructions. These constructions are beyond the classical theory of iterated function systems, as different nonlinear contraction transformations are applied at each step. Moreover, we also allow the contractions to be weakly conformal and consider situations where the contraction rates have an infimum of zero. In addition, the basic sets of the construction are allowed to have a complicated topology such as having fractal boundaries. Using techniques from thermodynamic formalism we calculate the fractal dimension of the limit set of the construction. As a main application we consider dimension results for stochastic inhomogeneous Moran set constructions, where chaotic dynamical systems are used to control the contraction factors at each step of the construction.

preprint, 2011, arXiv

Invariant Measures with Bounded Variation Densities for Piecewise Area Preserving Maps

We investigate the properties of absolutely continuous invariant probability measures (ACIPs), especially those measures with bounded variation densities, for piecewise area preserving maps (PAPs) on $\mathbb{R}^d$. This class of maps unifies piecewise isometries (PWIs) and piecewise hyperbolic maps where Lebesgue measure is locally preserved. Using a functional analytic approach, we first explore the relationship between topological transitivity and uniqueness of ACIPs, and then give an approach to construct invariant measures with bounded variation densities for PWIs. Our results "partially" answer one of the fundamental questions posed in [Goetz03]: to determine all invariant non-atomic probability Borel measures in piecewise rotations. When restricting PAPs to interval exchange transformations (IETs), our results imply that for non-uniquely ergodic IETs with two or more ACIPs, these ACIPs have very irregular densities, i.e., they have unbounded variation.
