Source author record

Lifeng Lai

Lifeng Lai appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher; unclaimed source record

Information Theory; math.IT; Machine Learning; Cryptography and Security; Artificial Intelligence; math.OC; Systems and Control; Distributed, Parallel, and Cluster Computing; eess.SP; math.PR; math.ST; Networking and Internet Architecture; Statistics Theory

Catalog footprint

What is connected

23 works

13 topics

4 close collaborators


preprint, 2026, arXiv

Efficient Preference Poisoning Attack on Offline RLHF

Offline Reinforcement Learning from Human Feedback (RLHF) pipelines such as Direct Preference Optimization (DPO) train on a pre-collected preference dataset, which makes them vulnerable to preference poisoning attacks. We study label flip attacks against log-linear DPO. We first illustrate that flipping one preference label induces a parameter-independent shift in the DPO gradient. Using this key property, we can then convert the targeted poisoning problem into a structured binary sparse approximation problem. To solve this problem, we develop two attack methods: Binary-Aware Lattice Attack (BAL-A) and Binary Matching Pursuit Attack (BMP-A). BAL-A embeds the binary flip selection problem into a binary-aware lattice and applies Lenstra-Lenstra-Lovász reduction and Babai's nearest plane algorithm; we provide sufficient conditions that enforce binary coefficients and recover the minimum-flip objective. BMP-A adapts binary matching pursuit to our non-normalized gradient dictionary and yields coherence-based recovery guarantees and robustness (impossibility) certificates for $K$-flip budgets. Experiments on synthetic dictionaries and the Stanford Human Preferences dataset validate the theory and highlight how dictionary geometry governs attack success.
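The parameter-independent shift can be checked numerically. The sketch below is not taken from the paper: it assumes a log-linear policy with some feature map phi, so that the log-probability gap between two responses is `theta . delta` with `delta = phi(x, y_preferred) - phi(x, y_rejected)`, and it folds the reference policy into a constant `ref_margin`. The names `dpo_grad`, `flip_shift`, `delta` and `ref_margin` are illustrative. Under those assumptions, flipping one label changes the per-sample gradient by `beta * delta`, whatever `theta` is.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dpo_grad(theta, delta, beta, ref_margin=0.0):
    """Gradient w.r.t. theta of -log sigmoid(beta * (theta . delta - ref_margin)).

    delta is the feature difference between the preferred and the rejected
    response; ref_margin is the same gap under the (fixed) reference policy.
    Both are assumptions of this sketch, not notation from the paper.
    """
    z = beta * (sum(t * d for t, d in zip(theta, delta)) - ref_margin)
    scale = -beta * sigmoid(-z)
    return [scale * d for d in delta]


def flip_shift(theta, delta, beta, ref_margin=0.0):
    """Change in the per-sample gradient when the preference label is flipped."""
    original = dpo_grad(theta, delta, beta, ref_margin)
    # Flipping the label swaps the two responses, so both gaps change sign.
    flipped = dpo_grad(theta, [-d for d in delta], beta, -ref_margin)
    return [f - o for f, o in zip(flipped, original)]
```

The reason is that the two gradients are `-beta * sigmoid(-z) * delta` and `+beta * sigmoid(z) * delta`, and `sigmoid(z) + sigmoid(-z) = 1`, so their difference is `beta * delta`.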

preprint, 2026, arXiv

Transformers Provably Implement In-Context Reinforcement Learning with Policy Improvement

We investigate the ability of transformers to perform in-context reinforcement learning (ICRL), where a model must infer and execute learning algorithms from trajectory data without parameter updates. We show that a linear self-attention transformer block can provably implement policy-improvement methods, including semi-gradient SARSA and actor-critic, via explicit parameter constructions. Beyond existence, we design a teacher-mimicking training procedure, analyze its gradient-flow dynamics, and establish the first convergence guarantee in the ICRL literature: under suitable richness conditions on the training MDP distribution, gradient flow converges locally and exponentially to an optimal parameter manifold corresponding to the desired RL update. Empirically, training transformers on randomly generated tabular MDPs confirms these predictions: the learned models recover the parameter structure of our explicit constructions and, when deployed on unseen MDPs, deliver strong in-context control performance. Together, these results illuminate how transformer architectures internalize and execute classical reinforcement learning algorithms in context, bridging mechanistic understanding and training dynamics in ICRL.

preprint, 2021, arXiv

Distributed Dual Coordinate Ascent in General Tree Networks and Communication Network Effect on Synchronous Machine Learning

Because datasets are large and the storage of a single computer or server is limited, data are often stored in a distributed manner. Thus, performing large-scale machine learning operations on distributed datasets over communication networks is often required. In this paper, we study the convergence rate of distributed dual coordinate ascent for distributed machine learning problems in a general tree-structured network. Since a tree network model can be understood as a generalization of a star network model, our algorithm can be thought of as a generalization of distributed dual coordinate ascent in a star network model. First, we provide the convergence rate of distributed dual coordinate ascent over a general tree network in a recursive manner and analyze the network effect on the convergence rate. Second, by considering network communication delays, we optimize the distributed dual coordinate ascent algorithm to maximize its convergence speed. From our analytical result, we can choose the optimal number of local iterations depending on the severity of the communication delay to achieve the fastest convergence speed. In numerical experiments, we consider machine learning scenarios over communication networks where local workers cannot directly reach a central node due to communication constraints, and demonstrate the usability of our distributed dual coordinate ascent algorithm in tree networks. Additionally, we show that adapting the number of local and global iterations to network communication delays in the distributed dual coordinate ascent algorithm can improve its convergence speed.

preprint, 2020, arXiv

Minimax Optimal Estimation of KL Divergence for Continuous Distributions

Estimating Kullback-Leibler divergence from independent and identically distributed samples is an important problem in various domains. One simple and effective estimator is based on the k nearest neighbor (kNN) distances between these samples. In this paper, we analyze the convergence rates of the bias and variance of this estimator. Furthermore, we derive a lower bound on the minimax mean square error and show that the kNN method is asymptotically rate optimal.
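For readers unfamiliar with nearest-neighbor divergence estimation, here is a minimal one-dimensional sketch. It assumes the commonly used form of the estimator, in which each sample from the first distribution contributes the log-ratio of its k-th nearest-neighbor distance in the second sample to its k-th nearest-neighbor distance in its own sample, plus a correction term log(m / (n - 1)). The abstract does not spell out the formula, so treat this form, and the helper names `kth_nearest_distance`, `knn_kl_divergence` and `gaussian_samples`, as assumptions.

```python
import math
import random


def kth_nearest_distance(point, samples, k):
    """Distance from `point` to its k-th nearest neighbour in `samples` (1-D)."""
    return sorted(abs(point - s) for s in samples)[k - 1]


def knn_kl_divergence(xs, ys, k=1):
    """kNN estimate of KL(P || Q) from xs ~ P and ys ~ Q, for scalar samples.

    Assumes continuous data without ties, so no distance is exactly zero.
    """
    n, m = len(xs), len(ys)
    total = 0.0
    for i, x in enumerate(xs):
        rho = kth_nearest_distance(x, xs[:i] + xs[i + 1:], k)  # within xs, excluding x
        nu = kth_nearest_distance(x, ys, k)                    # from x into ys
        total += math.log(nu / rho)
    # Dimension d = 1, so the usual d/n prefactor reduces to 1/n.
    return total / n + math.log(m / (n - 1))


def gaussian_samples(n, mean, seed):
    """Seeded unit-variance Gaussian samples, only used to try the estimator."""
    rng = random.Random(seed)
    return [rng.gauss(mean, 1.0) for _ in range(n)]
```

Calling `knn_kl_divergence(gaussian_samples(400, 0.0, 1), gaussian_samples(400, 2.0, 3), k=5)` estimates the divergence between N(0, 1) and N(2, 1), whose true value is 2. The brute-force neighbor search is quadratic in the sample size and is only meant for small experiments.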

preprint, 2020, arXiv

On the adversarial robustness of robust estimators

Motivated by recent data analytics applications, we study the adversarial robustness of robust estimators. Instead of assuming that only a fraction of the data points are outliers, as in the classic robust estimation setup, we consider an adversarial setup in which an attacker can observe the whole dataset and can modify all data samples in an adversarial manner so as to maximize the estimation error caused by the attack. We characterize the attacker's optimal attack strategy, and further introduce the adversarial influence function (AIF) to quantify an estimator's sensitivity to such adversarial attacks. We provide an approach to characterize the AIF for any given robust estimator, and then design the optimal estimator that minimizes the AIF, which implies that it is least sensitive to adversarial attacks and hence is most robust against them. From this characterization, we identify a tradeoff between the AIF (i.e., robustness against adversarial attacks) and the influence function, a quantity used in classic robust estimators to measure robustness against outliers, and design estimators that strike a desirable tradeoff between these two quantities.

preprint, 2019, arXiv

On the Adversarial Robustness of Subspace Learning

In this paper, we study the adversarial robustness of subspace learning problems. Different from the assumptions made in existing work on robust subspace learning where data samples are contaminated by gross sparse outliers or small dense noises, we consider a more powerful adversary who can first observe the data matrix and then intentionally modify the whole data matrix. We first characterize the optimal rank-one attack strategy that maximizes the subspace distance between the subspace learned from the original data matrix and that learned from the modified data matrix. We then generalize the study to the scenario without the rank constraint and characterize the corresponding optimal attack strategy. Our analysis shows that the optimal strategies depend on the singular values of the original data matrix and the adversary's energy budget. Finally, we provide numerical experiments and practical applications to demonstrate the efficiency of the attack strategies.

preprint, 2015, arXiv

Are Slepian-Wolf Rates Necessary for Distributed Parameter Estimation?

We consider a distributed parameter estimation problem, in which multiple terminals send messages related to their local observations using limited rates to a fusion center who will obtain an estimate of a parameter related to observations of all terminals. It is well known that if the transmission rates are in the Slepian-Wolf region, the fusion center can fully recover all observations and hence can construct an estimator having the same performance as that of the centralized case. One natural question is whether Slepian-Wolf rates are necessary to achieve the same estimation performance as that of the centralized case. In this paper, we show that the answer to this question is negative. We establish our result by explicitly constructing an asymptotically minimum variance unbiased estimator (MVUE) that has the same performance as that of the optimal estimator in the centralized case while requiring information rates less than the conditions required in the Slepian-Wolf rate region.

preprint, 2015, arXiv

Precise Phase Transition of Total Variation Minimization

Characterizing the phase transitions of convex optimizations in recovering structured signals or data is of central importance in compressed sensing, machine learning and statistics. The phase transitions of many convex optimization signal recovery methods such as $\ell_1$ minimization and nuclear norm minimization are well understood through research in recent years. However, rigorously characterizing the phase transition of total variation (TV) minimization in recovering sparse-gradient signals is still open. In this paper, we fully characterize the phase transition curve of TV minimization. Our proof builds on Donoho, Johnstone and Montanari's conjectured phase transition curve for the TV approximate message passing (AMP) algorithm, together with the linkage between the minimax mean square error of a denoising problem and the high-dimensional convex geometry for TV minimization.

preprint, 2014, arXiv

An Information Theoretic Approach to Secret Sharing

A novel information theoretic approach is proposed to solve the secret sharing problem, in which a dealer distributes one or multiple secrets among a set of participants such that, for each secret, only qualified sets of users can recover it by pooling their shares together, while non-qualified sets of users obtain no information about the secret even if they pool their shares together. While existing secret sharing systems (implicitly) assume that communications between the dealer and participants are noiseless, this paper takes a more practical assumption that the dealer delivers shares to the participants via a noisy broadcast channel. The proposed approach exploits the channel as an additional resource to achieve the secret sharing requirements. In this way, secret sharing problems can be reformulated as equivalent secure communication problems via wiretap channels, and can be solved by employing powerful information theoretic security techniques. This approach is first developed for the classic secret sharing problem, in which only one secret is to be shared. This classic problem is shown to be equivalent to a communication problem over a compound wiretap channel. The lower and upper bounds on the secrecy capacity of the compound channel provide the corresponding bounds on the secret sharing rate. The power of the approach is further demonstrated by a more general layered multi-secret sharing problem, which is shown to be equivalent to the degraded broadcast multiple-input multiple-output (MIMO) channel with layered decoding and secrecy constraints. The secrecy capacity region for the degraded MIMO broadcast channel is characterized, which provides the secret sharing capacity region. Furthermore, the secure encoding schemes that achieve the secrecy capacity region provide an information theoretic scheme for sharing the secrets.

preprint, 2014, arXiv

Bayesian Quickest Change Point Detection with Sampling Right Constraints

In this paper, Bayesian quickest change detection problems with sampling right constraints are considered. Specifically, there is a sequence of random variables whose probability density function will change at an unknown time. The goal is to detect this change in a way such that a linear combination of the average detection delay and the false alarm probability is minimized. Two types of sampling right constraints are discussed. The first one is a limited sampling right constraint, in which the observer can take at most $N$ observations from this random sequence. Under this setup, we show that the cost function can be written as a set of iterative functions, which can be solved by Markov optimal stopping theory. The optimal stopping rule is shown to be a threshold rule. An asymptotic upper bound on the average detection delay is developed as the false alarm probability goes to zero. This upper bound indicates that the performance of the limited sampling right problem is close to that of the classic Bayesian quickest detection for several scenarios of practical interest. The second constraint discussed in this paper is a stochastic sampling right constraint, in which sampling rights are consumed by taking observations and are replenished randomly. The observer cannot take observations if there are no sampling rights left. We characterize the optimal solution, which has a very complex structure. For practical applications, we propose a low complexity algorithm, in which the sampling rule is to take observations as long as the observer has sampling rights left and the detection scheme is a threshold rule. We show that this low complexity scheme is first order asymptotically optimal as the false alarm probability goes to zero.
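A threshold rule of this kind is usually applied to the posterior probability that the change has already happened. The sketch below illustrates that idea under assumptions that are not stated in the abstract: a geometric prior on the change time with parameter `rho`, a starting posterior of zero, and a slot in which no sample is taken being marked by `None`. The function names `posterior_update` and `shiryaev_stop` are illustrative.

```python
def posterior_update(p, rho, lr=None):
    """One-slot update of P(change has occurred | data so far).

    p   : posterior after the previous slot
    rho : per-slot change probability of the assumed geometric prior
    lr  : likelihood ratio f1(x) / f0(x) of the new sample, or None if
          no sample was taken in this slot (no sampling right was used)
    """
    prior = p + (1.0 - p) * rho  # the change may occur in this slot
    if lr is None:
        return prior             # no observation: only the prior evolves
    return prior * lr / (prior * lr + (1.0 - prior))


def shiryaev_stop(observations, likelihood_ratio, rho, threshold):
    """Return the first slot (1-based) where the posterior crosses `threshold`."""
    p = 0.0
    for t, x in enumerate(observations, start=1):
        lr = None if x is None else likelihood_ratio(x)
        p = posterior_update(p, rho, lr)
        if p >= threshold:
            return t
    return None
```

With no samples at all, the posterior is simply `1 - (1 - rho) ** t`, so the rule still stops eventually. Informative samples move the posterior faster, which is where the sampling rights matter.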

preprint, 2014, arXiv

Byzantine Fault Tolerant Distributed Quickest Change Detection

We introduce and solve the problem of Byzantine fault tolerant distributed quickest change detection in both continuous and discrete time setups. In this problem, multiple sensors sequentially observe random signals from the environment and send their observations to a control center that will determine whether there is a change in the statistical behavior of the observations. We assume that the signals are independent and identically distributed across sensors. An unknown subset of sensors is compromised and will send arbitrarily modified and even artificially generated signals to the control center. It is shown that the performance of the so-called CUSUM statistic, which is optimal when all sensors are honest, will be significantly degraded in the presence of even a single dishonest sensor. In particular, instead of growing logarithmically, the detection delay grows linearly with the average run length (ARL) to false alarm. To mitigate such a performance degradation, we propose a fully distributed low complexity detection scheme. We show that the proposed scheme can recover the log scaling. We also propose a centralized group-wise scheme that can further reduce the detection delay.

preprint, 2014, arXiv

On the Simulatability Condition in Key Generation Over a Non-authenticated Public Channel

The simulatability condition is a fundamental concept in studying key generation over a non-authenticated public channel, in which Eve is active and can intercept, modify and falsify messages exchanged over the non-authenticated public channel. Using this condition, Maurer and Wolf showed a remarkable "all or nothing" result: if the simulatability condition does not hold, the key capacity over the non-authenticated public channel will be the same as that of the case with a passive Eve, while the key capacity over the non-authenticated channel will be zero if the simulatability condition holds. However, two questions remain open so far: 1) For a given joint probability mass function (PMF), are there efficient (polynomial complexity) algorithms for checking whether the simulatability condition holds? 2) If the simulatability condition holds, are there efficient algorithms for finding the corresponding attack strategy? In this paper, we answer these two open questions affirmatively. In particular, for a given joint PMF, we construct a linear programming (LP) problem and show that the simulatability condition holds if and only if the optimal value obtained from the constructed LP is zero. Furthermore, we construct another LP and show that the minimizer of the newly constructed LP is a valid attack strategy. Both LPs can be solved with polynomial complexity.

preprint, 2014, arXiv

The Capacity Region of the Source-Type Model for Secret Key and Private Key Generation

The problem of simultaneously generating a secret key (SK) and private key (PK) pair among three terminals via public discussion is investigated, in which each terminal observes a component of correlated sources. All three terminals are required to generate a common secret key concealed from an eavesdropper that has access to public discussion, while two designated terminals are required to generate an extra private key concealed from both the eavesdropper and the remaining terminal. An outer bound on the SK-PK capacity region was established in [1], and was shown to be achievable for one case. In this paper, achievable schemes are designed to achieve the outer bound for the remaining two cases, and hence the SK-PK capacity region is established in general. The main technique lies in the novel design of a random binning-joint decoding scheme that achieves the existing outer bound.

preprint, 2013, arXiv

Compressed Hypothesis Testing: to Mix or Not to Mix?

In this paper, we study the hypothesis testing problem of determining, among $n$ random variables, the $k$ random variables whose probability distributions differ from those of the remaining $n-k$ random variables. Instead of using separate measurements of each individual random variable, we propose to use mixed measurements, which are functions of multiple random variables. It is demonstrated that $O\left(\frac{k \log(n)}{\min_{P_i, P_j} C(P_i, P_j)}\right)$ observations are sufficient for correctly identifying the $k$ anomalous random variables with high probability, where $C(P_i, P_j)$ is the Chernoff information between two possible distributions $P_i$ and $P_j$ for the proposed mixed observations. We characterize the Chernoff information under fixed time-invariant mixed observations, random time-varying mixed observations, and deterministic time-varying mixed observations, respectively; in our derivations, we introduce the inner and outer conditional Chernoff information for time-varying measurements. It is demonstrated that mixed observations can strictly improve the error exponent of hypothesis testing over separate observations of individual random variables. We also characterize the optimal mixed observations maximizing the error exponent, and derive an explicit construction of the optimal mixed observations for the case of Gaussian random variables. These results imply that mixed observations of random variables can reduce the number of required samples in hypothesis testing applications. Compared with compressed sensing problems, this paper considers random variables which are allowed to dramatically change values in different measurements.
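To make the role of the Chernoff information in that sample bound concrete, the sketch below computes it for two discrete distributions with full support, using the standard definition C(P, Q) = -min over lambda in [0, 1] of log sum_x P(x)^lambda Q(x)^(1 - lambda). The minimization uses a ternary search, which is valid because the objective is convex in lambda. The function names are illustrative, and `sample_order` only reproduces the k log(n) / C scaling with the unspecified constant set to one, which is an assumption made for illustration.

```python
import math


def chernoff_information(p, q, iterations=100):
    """Chernoff information between two pmfs with strictly positive entries."""

    def log_moment(lam):
        return math.log(sum(pi ** lam * qi ** (1.0 - lam) for pi, qi in zip(p, q)))

    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        # log_moment is convex in lam, so discard the third that cannot
        # contain the minimizer.
        if log_moment(m1) < log_moment(m2):
            hi = m2
        else:
            lo = m1
    return -log_moment((lo + hi) / 2.0)


def sample_order(k, n, min_chernoff):
    """k * log(n) / C scaling of the sufficient number of observations.

    The constant hidden in the O(.) notation is taken to be 1 here.
    """
    return k * math.log(n) / min_chernoff
```

A larger Chernoff information between the candidate observation distributions means fewer observations are needed, which is why the choice of mixing matters.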

preprint, 2013, arXiv

Non-Bayesian Quickest Detection with Stochastic Sample Right Constraints

In this paper, we study the design and analysis of optimal detection schemes for sensors that are deployed to monitor changes in the environment and are powered by energy harvested from the environment. In this type of application, detection delay is of paramount importance. We model this problem as a quickest change detection problem with a stochastic energy constraint. In particular, a wireless sensor powered by renewable energy takes observations from a random sequence, whose distribution will change at a certain unknown time. Such a change implies events of interest. The energy in the sensor is consumed by taking observations and is replenished randomly. The sensor cannot take observations if there is no energy left in the battery. Our goal is to design a power allocation scheme and a detection strategy to minimize the worst case detection delay, which is the difference between the time when an alarm is raised and the time when the change occurs. Two types of average run length (ARL) constraint, namely an algorithm level ARL constraint and a system level ARL constraint, are considered. We propose a low complexity scheme in which the energy allocation rule is to spend energy to take observations as long as the battery is not empty and the detection scheme is the Cumulative Sum test. We show that this scheme is optimal for the formulation with the algorithm level ARL constraint and is asymptotically optimal for the formulation with the system level ARL constraint.
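The low complexity scheme described above can be sketched in a few lines. This is a rough illustration rather than the paper's exact formulation: it assumes that each observation costs one unit of energy, that harvested energy arrives before the sampling decision in each slot, that the battery has a finite `capacity`, and that the CUSUM statistic is left unchanged in slots without a sample. The names `cusum_with_energy` and `gaussian_mean_shift_llr` are illustrative.

```python
def gaussian_mean_shift_llr(mu):
    """Log-likelihood ratio for a change from N(0, 1) to N(mu, 1)."""
    return lambda x: mu * x - mu * mu / 2.0


def cusum_with_energy(observations, energy_arrivals, llr, threshold, capacity):
    """Greedy sampling plus CUSUM; returns the alarm slot (1-based) or None.

    observations[t]    : sample available in slot t (used only if the sensor samples)
    energy_arrivals[t] : units of energy harvested in slot t
    """
    battery = 0
    stat = 0.0
    for t, (x, e) in enumerate(zip(observations, energy_arrivals), start=1):
        battery = min(capacity, battery + e)
        if battery > 0:
            # Greedy rule: sample whenever the battery is not empty.
            battery -= 1
            stat = max(0.0, stat + llr(x))  # CUSUM recursion
            if stat >= threshold:
                return t
        # With an empty battery the slot is skipped and the statistic is kept.
    return None
```

For example, with `llr = gaussian_mean_shift_llr(1.0)`, observations `[0, 0, 0, 2, 2, 2, 2]` and one unit of energy per slot, a threshold of 3 raises the alarm in slot 5. If energy arrives only in slots 1, 2, 3, 5 and 7, the alarm moves to slot 7, which shows how energy outages translate directly into detection delay.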

preprint, 2013, arXiv

Quickest Change Point Detection and Identification Across a Generic Sensor Array

In this paper, we consider the problem of quickest change point detection and identification over a linear array of $N$ sensors, where the change pattern could first reach any of these sensors, and then propagate to the other sensors. Our goal is not only to detect the presence of such a change as quickly as possible, but also to identify which sensor the change pattern reaches first. We jointly design two decision rules: a stopping rule, which determines when we should stop sampling and claim that a change has occurred, and a terminal decision rule, which decides which sensor the change pattern reaches first, with the objective of striking a balance among the detection delay, the false alarm probability, and the false identification probability. We show that this problem can be converted to a Markov optimal stopping time problem, from which some technical tools can be borrowed. Furthermore, to avoid the high implementation complexity of the optimal rules, we develop a scheme with a much simpler structure and certain performance guarantees.

preprint, 2013, arXiv

Quickest Search Over Multiple Sequences with Mixed Observations

The problem of sequentially finding an independent and identically distributed (i.i.d.) sequence that is drawn from a probability distribution $F_1$ by searching over multiple sequences, some of which are drawn from $F_1$ and the others of which are drawn from a different distribution $F_0$, is considered. The sensor is allowed to take one observation at a time. It has been shown in a recent work that if each observation comes from one sequence, the Cumulative Sum (CUSUM) test is optimal. In this paper, we propose a new approach in which each observation can be a linear combination of samples from multiple sequences. The test has two stages. In the first stage, the scanning stage, one takes a linear combination of a pair of sequences with the hope of scanning through sequences that are unlikely to be generated from $F_1$ and quickly identifying a pair of sequences such that at least one of them is highly likely to be generated by $F_1$. In the second stage, the refinement stage, one examines the pair identified in the first stage more closely and picks one sequence to be the final sequence. The problem under this setup belongs to a class of multiple stopping time problems. In particular, it is an ordered two concatenated Markov stopping time problem. We obtain the optimal solution using tools from multiple stopping time theory. Numerical simulation results show that this search strategy can significantly reduce the search time, especially when $F_1$ is rare.

preprint, 2010, arXiv

Combating False Reports for Secure Networked Control in Smart Grid via Trustiness Evaluation

The smart grid, equipped with modern communication infrastructure, is subject to possible cyber attacks. In particular, false report attacks, which replace sensor reports with fraudulent ones, may cause instability of the whole power grid or even result in a large-area blackout. In this paper, a trustiness system is introduced to the controller, which computes the trustiness of different sensors by comparing its prediction of the system state, obtained from Kalman filtering, with the reports from the sensors. The trustiness mechanism is discussed and analyzed for the Linear Quadratic Regulation (LQR) controller. Numerical simulations show that the trustiness system can effectively combat cyber attacks on the smart grid.

preprint, 2010, arXiv

Decoding the 'Nature Encoded' Messages for Distributed Energy Generation Control in Microgrid

The communication for the control of distributed energy generation (DEG) in a microgrid is discussed. Due to the requirement of real-time transmission, weak or no explicit channel coding is used for the messages carrying the system state. To protect the reliability of the uncoded or weakly encoded messages, the system dynamics are considered as a 'nature encoding' similar to a convolutional code, due to their redundancy in time. For systems with or without explicit channel coding, two decoding procedures based on Kalman filtering and Pearl's Belief Propagation, in a similar manner to Turbo processing in traditional data communication systems, are proposed. Numerical simulations demonstrate the validity of the schemes, using a linear model of the electric generator dynamic system.

preprint, 2008, arXiv

Interference Alignment for Secrecy

This paper studies the frequency/time selective $K$-user Gaussian interference channel with secrecy constraints. Two distinct models, namely the interference channel with confidential messages and the one with an external eavesdropper, are analyzed. The key difference between the two models is the lack of channel state information (CSI) about the external eavesdropper. Using interference alignment along with secrecy pre-coding, it is shown that each user can achieve non-zero secure Degrees of Freedom (DoF) for both cases. More precisely, the proposed coding scheme achieves $\frac{K-2}{2K-2}$ secure DoF with probability one per user in the confidential messages model. For the external eavesdropper scenario, on the other hand, it is shown that each user can achieve $\frac{K-2}{2K}$ secure DoF in the ergodic setting. Remarkably, these results establish the positive impact of interference on the secrecy capacity region of wireless networks.
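The two per-user expressions are easy to tabulate. The short sketch below evaluates them exactly with rational arithmetic; the function names are illustrative, and the restriction to at least two users is an assumption made here so that both denominators are positive.

```python
from fractions import Fraction


def secure_dof_confidential(k):
    """Per-user secure DoF (K-2)/(2K-2) in the confidential messages model."""
    if k < 2:
        raise ValueError("need at least 2 users")
    return Fraction(k - 2, 2 * k - 2)


def secure_dof_external_eavesdropper(k):
    """Per-user secure DoF (K-2)/(2K) with an external eavesdropper."""
    if k < 2:
        raise ValueError("need at least 2 users")
    return Fraction(k - 2, 2 * k)


def dof_table(max_users):
    """Rows of (K, confidential DoF, external-eavesdropper DoF) for K = 3..max_users."""
    return [
        (k, secure_dof_confidential(k), secure_dof_external_eavesdropper(k))
        for k in range(3, max_users + 1)
    ]
```

For K = 3 the values are 1/4 and 1/6, and for K = 4 they are 1/3 and 1/4. Both expressions are zero at K = 2 and approach 1/2 as K grows, with the external eavesdropper model always slightly below the confidential messages model.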

preprint, 2008, arXiv

On the Secure Degrees of Freedom in the K-User Gaussian Interference Channel

This paper studies the K-user Gaussian interference channel with secrecy constraints. Two distinct network models, namely the interference channel with confidential messages and the one with an external eavesdropper, are analyzed. Using interference alignment along with secrecy pre-coding at each transmitter, it is shown that each user in the network can achieve non-zero secure Degrees of Freedom (DoF) in both scenarios. In particular, the proposed coding scheme achieves (K-2)/(2K-2) secure DoF for each user in the interference channel with confidential messages model, and (K-2)/(2K) secure DoF in the case of an external eavesdropper. The fundamental difference between the two scenarios stems from the lack of channel state information (CSI) about the external eavesdropper. Remarkably, the results establish the positive impact of interference on the secrecy capacity of wireless networks.

preprint, 2008, arXiv

Optimal Medium Access Control in Cognitive Radios: A Sequential Design Approach

The design of medium access control protocols for a cognitive user wishing to opportunistically exploit frequency bands within parts of the radio spectrum having multiple bands is considered. In the scenario under consideration, the availability probability of each channel is unknown a priori to the cognitive user. Hence efficient medium access strategies must strike a balance between exploring the availability of channels and exploiting the opportunities identified thus far. Using a sequential design approach, an optimal medium access strategy is derived. To avoid the prohibitive computational complexity of this optimal strategy, a low complexity asymptotically optimal strategy is also developed. The proposed strategy does not require any prior statistical knowledge about the traffic pattern on the different channels.

preprint, 2007, arXiv

The Wiretap Channel with Feedback: Encryption over the Channel

In this work, the critical role of noisy feedback in enhancing the secrecy capacity of the wiretap channel is established. Unlike previous works, where a noiseless public discussion channel is used for feedback, the feed-forward and feedback signals share the same noisy channel in the present model. Quite interestingly, this noisy feedback model is shown to be more advantageous in the current setting. More specifically, the discrete memoryless modulo-additive channel with a full-duplex destination node is considered first, and it is shown that the judicious use of feedback increases the perfect secrecy capacity to the capacity of the source-destination channel in the absence of the wiretapper. In the achievability scheme, the feedback signal corresponds to a private key, known only to the destination. In the half-duplex scheme, a novel feedback technique that always achieves a positive perfect secrecy rate (even when the source-wiretapper channel is less noisy than the source-destination channel) is proposed. These results hinge on the modulo-additive property of the channel, which is exploited by the destination to perform encryption over the channel without revealing its key to the source. Finally, this scheme is extended to the continuous real-valued modulo-$\Lambda$ channel, where it is shown that the perfect secrecy capacity with feedback is also equal to the capacity in the absence of the wiretapper.
