Source author record

Negar Kiyavash

Negar Kiyavash appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Machine Learning, Information Theory, math.IT, Cryptography and Security, Artificial Intelligence, Discrete Mathematics, math.OC, math.CO, Methodology, math.NT, Multiagent Systems, Multimedia, Networking and Internet Architecture, Social and Information Networks

Catalog footprint

What is connected

34 works

14 topics

4 close collaborators

preprint, 2026, arXiv

Active Context Selection Improves Simple Regret in Contextual Bandits

We study the contextual multi-armed bandit problem with a finite context space (a.k.a. subpopulations), where the learner recommends a best action for each context and is evaluated by context-weighted simple regret. Our guarantees are worst-case over the reward distributions, while remaining instance-dependent with respect to the context distribution vector $p$. Akin to experimental design problems where the population of interest is fixed but the sampled subpopulation can be controlled, we allow the learner to actively choose which context to sample from. For a known $p$, we characterize tight regret rates: passive sampling where contexts are randomly revealed achieves regret of order $\sqrt{n/T \, \lVert p \rVert_{1/2}}$, whereas active sampling with allocation $q_j \propto p_j^{2/3}$ achieves the tight rate $\sqrt{n/T} \, \lVert p \rVert_{2/3}$. The resulting improvement can be as large as $Θ(k^{1/4})$, where $k$ is the number of contexts. We further extend the analysis to budgeted active sampling, characterize the corresponding tight rate, and identify when a limited active budget suffices to recover the fully active rate. When $p$ is unknown, we propose the Explore-Explore-Then-Commit (EETC) algorithm, which optimally balances estimating the context distribution and the time to switch to active allocation, such that for large horizons, it matches the known-$p$ active rate up to constants. Experiments on synthetic and real-world data support our theoretical findings.
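The two rates quoted in this abstract can be compared numerically. The sketch below is a minimal illustration, assuming that $n$ is the number of arms and $T$ the sampling horizon (the abstract does not define them) and that $\lVert p \rVert_r = (\sum_j p_j^r)^{1/r}$. The function names are hypothetical and constants are ignored.

```python
import math


def quasi_norm(p, r):
    """Return (sum_j p_j^r)^(1/r), the l_r quasi-norm for 0 < r < 1."""
    return sum(x ** r for x in p) ** (1.0 / r)


def passive_rate(p, n, T):
    """Passive-sampling rate from the abstract: sqrt(n/T * ||p||_{1/2})."""
    return math.sqrt(n / T * quasi_norm(p, 0.5))


def active_allocation(p):
    """Sampling proportions q_j proportional to p_j^(2/3)."""
    weights = [x ** (2.0 / 3.0) for x in p]
    total = sum(weights)
    return [w / total for w in weights]


def active_rate(p, n, T):
    """Active-sampling rate from the abstract: sqrt(n/T) * ||p||_{2/3}."""
    return math.sqrt(n / T) * quasi_norm(p, 2.0 / 3.0)


if __name__ == "__main__":
    # Hypothetical skewed context distribution: one common context, many rare ones.
    k = 1000
    p = [0.5] + [0.5 / (k - 1)] * (k - 1)
    print("passive:", passive_rate(p, n=5, T=100_000))
    print("active: ", active_rate(p, n=5, T=100_000))
```

For a uniform $p$ the two expressions coincide, so any gain from active sampling comes from skew in the context distribution.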

preprint, 2026, arXiv

Inference Time Causal Probing in LLMs

Causal probing methods aim to test and control how internal representations influence the behavior of generative models. In causal probing, an intervention modifies hidden states so that a property takes on a different value. Most existing approaches define such interventions by training an auxiliary probe classifier, which ties the method to a specific task or model and risks misalignment with the model's predictive geometry. We propose Hidden-state Driven Margin Intervention (HDMI), a probe-free, gradient-based technique that directly steers hidden states using the model's native output. HDMI applies a margin objective that increases the probability of a target continuation while decreasing that of the source, without relying on probe classifiers. We further introduce a lookahead variant (LA-HDMI) for text editing that backpropagates through the softmax embeddings, modifying the current hidden state so that the likelihood of user-specified tokens increases in subsequent token generations while preserving fluency. To evaluate interventions, we measure completeness (whether the targeted property changes as intended) and selectivity (whether unrelated properties are preserved), and report their harmonic mean as an overall measure of reliability. HDMI consistently achieves higher reliability than prior methods on the LGD agreement corpus and the CausalGym benchmark, across Meta-Llama-3-8B-Instruct and Pythia-70M.
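The reliability score described in this abstract is the harmonic mean of completeness and selectivity. A minimal sketch, assuming both scores lie in [0, 1]; the function name is hypothetical:

```python
def reliability(completeness, selectivity):
    """Harmonic mean of completeness and selectivity (both assumed in [0, 1])."""
    total = completeness + selectivity
    if total == 0:
        return 0.0  # both scores are zero, so the intervention is unreliable
    return 2.0 * completeness * selectivity / total
```

The harmonic mean is dominated by the smaller of the two scores, so an intervention that changes the target property but also disturbs unrelated properties scores low.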

preprint, 2026, arXiv

Select-then-differentiate: Solving Bilevel Optimization with Manifold Lower-level Solution Sets

We study optimistic bilevel optimization when the lower-level problem has a non-isolated manifold of minimizers. In this setting, the hyper-objective may be non-differentiable because the upper-level criterion must choose among multiple lower-level solutions. Under a local Polyak--Łojasiewicz (PŁ) condition, we show that differentiability does not require the lower-level solution set to be a singleton: uniqueness of the optimistic selection is sufficient. This yields an explicit pseudoinverse-based hyper-gradient formula extending the classical singleton-minimizer result. We further characterize the regularity of the hyper-objective: non-degeneracy of the selected minimizer along the solution manifold yields local smoothness, while failure of uniqueness can create many non-differentiable points and failure of non-degeneracy can destroy all positive Hölder regularity of the hyper-gradient. Motivated by this theory, we propose HG-MS, a select-then-differentiate method combining explicit optimistic selection with efficient pseudoinverse-based hyper-gradient computation. Despite the nonconvex nature of optimistic selection over the lower-level solution manifold, we show that HG-MS converges to a stationary point of the optimistic objective with complexity governed by the intrinsic dimension of the solution manifold rather than its ambient dimension. Empirically, we test a practical variant of HG-MS for matched-budget LLM source reweighting. This variant preserves the select-then-differentiate principle and obtains the best GSM8K/MATH scores across the tested backbones, along with competitive or best MT-Bench instruction-following results.

preprint, 2024, arXiv

s-ID: Causal Effect Identification in a Sub-Population

Causal inference in a sub-population involves identifying the causal effect of an intervention on a specific subgroup, which is distinguished from the whole population through the influence of systematic biases in the sampling process. However, ignoring the subtleties introduced by sub-populations can either lead to erroneous inference or limit the applicability of existing methods. We introduce and advocate for a causal inference problem in sub-populations (henceforth called s-ID), in which we merely have access to observational data of the targeted sub-population (as opposed to the entire population). Existing inference problems in sub-populations operate on the premise that the given data distributions originate from the entire population, thus, cannot tackle the s-ID problem. To address this gap, we provide necessary and sufficient conditions that must hold in the causal graph for a causal effect in a sub-population to be identifiable from the observational distribution of that sub-population. Given these conditions, we present a sound and complete algorithm for the s-ID problem.

preprint, 2022, arXiv

Causal Effect Identification with Context-specific Independence Relations of Control Variables

We study the problem of causal effect identification from the observational distribution given the causal graph and some context-specific independence (CSI) relations. It was recently shown that this problem is NP-hard, and while a sound algorithm to learn the causal effects is proposed in Tikka et al. (2019), no complete algorithm for the task exists. In this work, we propose a sound and complete algorithm for the setting when the CSI relations are limited to observed nodes with no parents in the causal graph. One limitation of the state of the art in terms of its applicability is that the CSI relations among all variables, even unobserved ones, must be given (as opposed to learned). Instead, we introduce a set of graphical constraints under which the CSI relations can be learned from the observational distribution alone. This expands the set of identifiable causal effects beyond the state of the art.

preprint, 2022, arXiv

Novel Ordering-based Approaches for Causal Structure Learning in the Presence of Unobserved Variables

We propose ordering-based approaches for learning the maximal ancestral graph (MAG) of a structural equation model (SEM) up to its Markov equivalence class (MEC) in the presence of unobserved variables. Existing ordering-based methods in the literature recover a graph through learning a causal order (c-order). We advocate for a novel type of order, called removable order (r-order), since r-orders are advantageous over c-orders for structure learning. This is because r-orders are the minimizers of an appropriately defined optimization problem that could be either solved exactly (using a reinforcement learning approach) or approximately (using a hill-climbing search). Moreover, the r-orders (unlike c-orders) are invariant among all the graphs in an MEC and include c-orders as a subset. Given that the set of r-orders is often significantly larger than the set of c-orders, it is easier for the optimization problem to find an r-order instead of a c-order. We evaluate the performance and the scalability of our proposed approaches on both real-world and randomly generated networks.

preprint, 2022, arXiv

Revisiting the General Identifiability Problem

We revisit the problem of general identifiability originally introduced in [Lee et al., 2019] for causal inference and note that it is necessary to add a positivity assumption on the observational distribution to the original definition of the problem. We show that without such an assumption the rules of do-calculus, and consequently the proposed algorithm in [Lee et al., 2019], are not sound. Moreover, adding the assumption will cause the completeness proof in [Lee et al., 2019] to fail. Under the positivity assumption, we present a new algorithm that is provably both sound and complete. A nice property of this new algorithm is that it establishes a connection between general identifiability and classical identifiability by Pearl [1995] through decomposing the general identifiability problem into a series of classical identifiability sub-problems.

preprint, 2021, arXiv

Impact of Data Processing on Fairness in Supervised Learning

We study the impact of pre- and post-processing for reducing discrimination in data-driven decision makers. We first analyze the fundamental trade-off between fairness and accuracy in a pre-processing approach, and propose a design for a pre-processing module based on a convex optimization program, which can be added before the original classifier. This leads to a fundamental lower bound on attainable discrimination, given any acceptable distortion in the outcome. Furthermore, we reformulate an existing post-processing method in terms of our accuracy and fairness measures, which allows comparing post-processing and pre-processing approaches. We show that under some mild conditions, pre-processing outperforms post-processing. Finally, we show that by appropriate choice of the discrimination measure, the optimization problem for both pre- and post-processing approaches will reduce to a linear program and hence can be solved efficiently.

preprint, 2020, arXiv

Characterizing Distribution Equivalence and Structure Learning for Cyclic and Acyclic Directed Graphs

The main approach to defining equivalence among acyclic directed causal graphical models is based on the conditional independence relationships in the distributions that the causal models can generate, in terms of the Markov equivalence. However, it is known that when cycles are allowed in the causal structure, conditional independence may not be a suitable notion for equivalence of two structures, as it does not reflect all the information in the distribution that is useful for identification of the underlying structure. In this paper, we present a general, unified notion of equivalence for linear Gaussian causal directed graphical models, whether they are cyclic or acyclic. In our proposed definition of equivalence, two structures are equivalent if they can generate the same set of data distributions. We also propose a weaker notion of equivalence called quasi-equivalence, which we show is the extent of identifiability from observational data. We propose analytic as well as graphical methods for characterizing the equivalence of two structures. Additionally, we propose a score-based method for learning the structure from observational data, which successfully deals with both acyclic and cyclic structures.

preprint, 2020, arXiv

Global Convergence and Variance-Reduced Optimization for a Class of Nonconvex-Nonconcave Minimax Problems

Nonconvex minimax problems appear frequently in emerging machine learning applications, such as generative adversarial networks and adversarial learning. Simple algorithms such as gradient descent ascent (GDA) are the common practice for solving these nonconvex games and have enjoyed considerable empirical success. Yet, it is known that these vanilla GDA algorithms with constant step size can potentially diverge even in the convex setting. In this work, we show that for a subclass of nonconvex-nonconcave objectives satisfying a so-called two-sided Polyak-Łojasiewicz inequality, the alternating gradient descent ascent (AGDA) algorithm converges globally at a linear rate and the stochastic AGDA achieves a sublinear rate. We further develop a variance reduced algorithm that attains a provably faster rate than AGDA when the problem has the finite-sum structure.
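Two claims in this abstract can be illustrated on toy problems. The sketch below is not the paper's experiment: the objectives, step sizes, and function names are assumptions chosen for illustration. It first runs simultaneous GDA with a constant step on the bilinear game $f(x, y) = xy$, where the iterates spiral outward, and then runs AGDA on $f(x, y) = \tfrac{1}{2}x^2 + 2xy - \tfrac{1}{2}y^2$, a strongly-convex-strongly-concave function and hence a simple instance of the two-sided Polyak-Łojasiewicz class.

```python
import math


def simultaneous_gda_bilinear(x, y, step, iters):
    """Simultaneous GDA on f(x, y) = x * y with a constant step size."""
    for _ in range(iters):
        grad_x, grad_y = y, x  # partial derivatives of x * y
        # Both players update from the same (old) iterate.
        x, y = x - step * grad_x, y + step * grad_y
    return x, y


def agda(grad_x, grad_y, x, y, step_x, step_y, iters):
    """Alternating GDA: the ascent step uses the freshly updated x."""
    for _ in range(iters):
        x = x - step_x * grad_x(x, y)
        y = y + step_y * grad_y(x, y)
    return x, y


if __name__ == "__main__":
    print("GDA on x*y:", simultaneous_gda_bilinear(1.0, 1.0, 0.1, 200))
    # f(x, y) = 0.5*x^2 + 2*x*y - 0.5*y^2, saddle point at (0, 0).
    print("AGDA:", agda(lambda x, y: x + 2 * y, lambda x, y: 2 * x - y,
                        1.0, 1.0, 0.1, 0.1, 200))
```

On the bilinear game each simultaneous step multiplies the squared distance to the origin by $1 + \eta^2$, which is why the constant-step iterates move away from the saddle point.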

preprint, 2020, arXiv

LazyIter: A Fast Algorithm for Counting Markov Equivalent DAGs and Designing Experiments

The causal relationships among a set of random variables are commonly represented by a Directed Acyclic Graph (DAG), where there is a directed edge from variable $X$ to variable $Y$ if $X$ is a direct cause of $Y$. From the purely observational data, the true causal graph can be identified up to a Markov Equivalence Class (MEC), which is a set of DAGs with the same conditional independencies between the variables. The size of an MEC is a measure of complexity for recovering the true causal graph by performing interventions. We propose a method for efficient iteration over possible MECs given intervention results. We utilize the proposed method for computing MEC sizes and experiment design in active and passive learning settings. Compared to previous work for computing the size of MEC, our proposed algorithm reduces the time complexity by a factor of $O(n)$ for sparse graphs where $n$ is the number of variables in the system. Additionally, integrating our approach with dynamic programming, we design an optimal algorithm for passive experiment design. Experimental results show that our proposed algorithms for both computing the size of MEC and experiment design outperform the state of the art.
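To make the notion of MEC size concrete, here is a brute-force counter for very small graphs. It is not the LazyIter algorithm: it enumerates every orientation of the skeleton and keeps the acyclic ones with the same v-structures, using the standard characterization that two DAGs are Markov equivalent exactly when they share a skeleton and v-structures. The function names are hypothetical, node labels are assumed to be sortable, and the running time is exponential in the number of edges.

```python
from itertools import product


def is_acyclic(nodes, edges):
    """Kahn's algorithm: True if the directed graph has no cycle."""
    indegree = {v: 0 for v in nodes}
    for _, head in edges:
        indegree[head] += 1
    ready = [v for v in nodes if indegree[v] == 0]
    removed = 0
    while ready:
        v = ready.pop()
        removed += 1
        for tail, head in edges:
            if tail == v:
                indegree[head] -= 1
                if indegree[head] == 0:
                    ready.append(head)
    return removed == len(nodes)


def v_structures(edges):
    """Set of colliders a -> c <- b where a and b are not adjacent."""
    adjacent = {frozenset(e) for e in edges}
    found = set()
    for a, c in edges:
        for b, c2 in edges:
            if c2 == c and a < b and frozenset((a, b)) not in adjacent:
                found.add((a, c, b))
    return found


def mec_size(nodes, edges):
    """Count DAGs sharing the skeleton and v-structures of the given DAG."""
    skeleton = sorted({tuple(sorted(e)) for e in edges})
    target = v_structures(edges)
    count = 0
    for flips in product((False, True), repeat=len(skeleton)):
        candidate = [(b, a) if flip else (a, b)
                     for (a, b), flip in zip(skeleton, flips)]
        if is_acyclic(nodes, candidate) and v_structures(candidate) == target:
            count += 1
    return count
```

For example, the chain X → Y → Z shares its class with X ← Y ← Z and X ← Y → Z, while the collider X → Y ← Z is alone in its class.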

preprint, 2020, arXiv

Model-Augmented Estimation of Conditional Mutual Information for Feature Selection

Markov blanket feature selection, while theoretically optimal, is generally challenging to implement. This is due to the shortcomings of existing approaches to conditional independence (CI) testing, which tend to struggle either with the curse of dimensionality or computational complexity. We propose a novel two-step approach which facilitates Markov blanket feature selection in high dimensions. First, neural networks are used to map features to low-dimensional representations. In the second step, CI testing is performed by applying the $k$-NN conditional mutual information estimator to the learned feature maps. The mappings are designed to ensure that mapped samples both preserve information and share similar information about the target variable if and only if they are close in Euclidean distance. We show that these properties boost the performance of the $k$-NN estimator in the second step. The performance of the proposed method is evaluated on both synthetic and real data.

preprint, 2020, arXiv

Toward Optimal Adversarial Policies in the Multiplicative Learning System with a Malicious Expert

We consider a learning system based on the conventional multiplicative weight (MW) rule that combines experts' advice to predict a sequence of true outcomes. It is assumed that one of the experts is malicious and aims to impose the maximum loss on the system. The loss of the system is naturally defined to be the aggregate absolute difference between the sequence of predicted outcomes and the true outcomes. We consider this problem under both offline and online settings. In the offline setting where the malicious expert must choose its entire sequence of decisions a priori, we show somewhat surprisingly that a simple greedy policy of always reporting false prediction is asymptotically optimal with an approximation ratio of $1+O(\sqrt{\frac{\ln N}{N}})$, where $N$ is the total number of prediction stages. In particular, we describe a policy that closely resembles the structure of the optimal offline policy. For the online setting where the malicious expert can adaptively make its decisions, we show that the optimal online policy can be efficiently computed by solving a dynamic program in $O(N^3)$. Our results provide a new direction for vulnerability assessment of commonly used learning algorithms to adversarial attacks where the threat is an integral part of the system.
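The setting can be simulated in a few lines. The sketch below is an illustration under stated assumptions rather than the paper's model: outcomes and expert predictions are binary, the system forecasts the weight-averaged prediction, each expert's weight is multiplied by $(1-\epsilon)^{|\text{prediction} - \text{outcome}|}$ after every stage, and the malicious expert follows the greedy policy of always reporting the wrong outcome. The function name and the parameter `eps` are hypothetical.

```python
def run_multiplicative_weights(truth, expert_preds, eps):
    """Run a multiplicative-weight forecaster.

    truth: list of 0/1 outcomes.
    expert_preds: one list of 0/1 predictions per expert.
    Returns (total absolute loss of the system, final expert weights).
    """
    weights = [1.0] * len(expert_preds)
    total_loss = 0.0
    for t, outcome in enumerate(truth):
        preds = [expert[t] for expert in expert_preds]
        # System forecast: weight-averaged expert prediction.
        forecast = sum(w * p for w, p in zip(weights, preds)) / sum(weights)
        total_loss += abs(forecast - outcome)
        # Penalize each expert in proportion to its own error.
        weights = [w * (1.0 - eps) ** abs(p - outcome)
                   for w, p in zip(weights, preds)]
    return total_loss, weights


if __name__ == "__main__":
    truth = [1, 0, 1, 1, 0]
    honest = list(truth)                 # an expert that is always right
    malicious = [1 - y for y in truth]   # greedy policy: always wrong
    print(run_multiplicative_weights(truth, [honest, malicious], eps=0.5))
```

Under the greedy policy the malicious expert's weight decays geometrically, so its influence on the forecast shrinks at every stage.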

preprint, 2016, arXiv

Improved Achievability and Converse Bounds for Erdős-Rényi Graph Matching

We consider the problem of perfectly recovering the vertex correspondence between two correlated Erdős-Rényi (ER) graphs. For a pair of correlated graphs on the same vertex set, the correspondence between the vertices can be obscured by randomly permuting the vertex labels of one of the graphs. In some cases, the structural information in the graphs allows this correspondence to be recovered. We investigate the information-theoretic threshold for exact recovery, i.e., the conditions under which the entire vertex correspondence can be correctly recovered given unbounded computational resources. Pedarsani and Grossglauser provided an achievability result of this type. Their result establishes the scaling dependence of the threshold on the number of vertices. We improve on their achievability bound. We also provide a converse bound, establishing conditions under which exact recovery is impossible. Together, these establish the scaling dependence of the threshold on the level of correlation between the two graphs. The converse and achievability bounds differ by a factor of two for sparse, significantly correlated graphs.

preprint, 2016, arXiv

Learning Network of Multivariate Hawkes Processes: A Time Series Approach

Learning the influence structure of multiple time series data is of great interest to many disciplines. This paper studies the problem of recovering the causal structure in network of multivariate linear Hawkes processes. In such processes, the occurrence of an event in one process affects the probability of occurrence of new events in some other processes. Thus, a natural notion of causality exists between such processes captured by the support of the excitation matrix. We show that the resulting causal influence network is equivalent to the Directed Information graph (DIG) of the processes, which encodes the causal factorization of the joint distribution of the processes. Furthermore, we present an algorithm for learning the support of excitation matrix (or equivalently the DIG). The performance of the algorithm is evaluated on synthesized multivariate Hawkes networks as well as a stock market and MemeTracker real-world dataset.

preprint, 2016, arXiv

On the Simultaneous Preservation of Privacy and Community Structure in Anonymized Networks

We consider the problem of performing community detection on a network, while maintaining privacy, assuming that the adversary has access to an auxiliary correlated network. We ask the question "Does there exist a regime where the network cannot be deanonymized perfectly, yet the community structure could be learned?" To answer this question, we derive information theoretic converses for the perfect deanonymization problem using the Stochastic Block Model and edge sub-sampling. We also provide an almost tight achievability result for perfect deanonymization. We also evaluate the performance of a percolation-based deanonymization algorithm on Stochastic Block Model datasets that satisfy the conditions of our converse. Although our converse applies to exact deanonymization, the algorithm fails drastically when the conditions of the converse are met. Additionally, we study the effect of edge sub-sampling on the community structure of a real world dataset. Results show that the dataset falls under the purview of the idea of this paper. These results suggest that it may be possible to prove stronger partial deanonymizability converses, which would enable better privacy guarantees.

preprint, 2016, arXiv

On the Vulnerability of Digital Fingerprinting Systems to Finite Alphabet Collusion Attacks

This paper proposes a novel, non-linear collusion attack on digital fingerprinting systems. The attack is proposed for fingerprinting systems with a finite alphabet but can be extended to continuous alphabets. We analyze the error probability of the attack for some classes of proposed random and deterministic schemes and obtain a bound on the number of colluders necessary to correctly estimate the host signal. That is, the attack requires fewer colluders to defeat the fingerprinting scheme. Our simulation results show that our attack is more powerful in practice than predicted by the theoretical bound.

preprint, 2015, arXiv

Bounded Degree Approximations of Stochastic Networks

We propose algorithms to approximate directed information graphs. Directed information graphs are probabilistic graphical models that depict causal dependencies between stochastic processes in a network. The proposed algorithms identify optimal and near-optimal approximations in terms of Kullback-Leibler divergence. The user-chosen sparsity trades off the quality of the approximation against visual conciseness and computational tractability. One class of approximations contains graphs with specified in-degrees. Another class additionally requires that the graph is connected. For both classes, we propose algorithms to identify the optimal approximations and also near-optimal approximations, using a novel relaxation of submodularity. We also propose algorithms to identify the r-best approximations among these classes, enabling robust decision making.

preprint, 2015, arXiv

Directed Information Graphs

We propose a graphical model for representing networks of stochastic processes, the minimal generative model graph. It is based on reduced factorizations of the joint distribution over time. We show that under appropriate conditions, it is unique and consistent with another type of graphical model, the directed information graph, which is based on a generalization of Granger causality. We demonstrate how directed information quantifies Granger causality in a particular sequential prediction setting. We also develop efficient methods to estimate the topological structure from data that obviate estimating the joint statistics. One algorithm assumes upper-bounds on the degrees and uses the minimal dimension statistics necessary. In the event that the upper-bounds are not valid, the resulting graph is nonetheless an optimal approximation. Another algorithm uses near-minimal dimension statistics when no bounds are known but the distribution satisfies a certain criterion. Analogous to how structure learning algorithms for undirected graphical models use mutual information estimates, these algorithms use directed information estimates. We characterize the sample-complexity of two plug-in directed information estimators and obtain confidence intervals. For the setting when point estimates are unreliable, we propose an algorithm that uses confidence intervals to identify the best approximation that is robust to estimation error. Lastly, we demonstrate the effectiveness of the proposed algorithms through analysis of both synthetic data and real data from the Twitter network. In the latter case, we identify which news sources influence users in the network by merely analyzing tweet times.

preprint, 2015, arXiv

Efficient Neighborhood Selection for Gaussian Graphical Models

This paper addresses the problem of neighborhood selection for Gaussian graphical models. We present two heuristic algorithms: a forward-backward greedy algorithm for general Gaussian graphical models based on a mutual information test, and a threshold-based algorithm for walk-summable Gaussian graphical models. Both algorithms are shown to be structurally consistent and efficient. Numerical results show that both algorithms work very well.

preprint, 2015, arXiv

Generalized sphere-packing and sphere-covering bounds on the size of codes for combinatorial channels

Many of the classic problems of coding theory are highly symmetric, which makes it easy to derive sphere-packing upper bounds and sphere-covering lower bounds on the size of codes. We discuss the generalizations of sphere-packing and sphere-covering bounds to arbitrary error models. These generalizations become especially important when the sizes of the error spheres are nonuniform. The best possible sphere-packing and sphere-covering bounds are solutions to linear programs. We derive a series of bounds from approximations to packing and covering problems and study the relationships and trade-offs between them. We compare sphere-covering lower bounds with other graph theoretic lower bounds such as Turán's theorem. We show how to obtain upper bounds by optimizing across a family of channels that admit the same codes. We present a generalization of the local degree bound of Kulkarni and Kiyavash and use it to improve the best known upper bounds on the sizes of single deletion correcting codes and single grain error correcting codes.

preprint, 2014, arXiv

Quantifying the Information Leakage in Timing Side Channels in Deterministic Work-Conserving Schedulers

When multiple job processes are served by a single scheduler, the queueing delays of one process are often affected by the others, resulting in a timing side channel that leaks the arrival pattern of one process to the others. In this work, we study such a timing side channel between a regular user and a malicious attacker. Utilizing Shannon's mutual information as a measure of information leakage between the user and attacker, we analyze privacy-preserving behaviors of common work-conserving schedulers. We find that the attacker can always learn perfectly the user's arrival process in a longest-queue-first (LQF) scheduler. When the user's job arrival rate is very low (near zero), first-come-first-serve (FCFS) and round robin schedulers both completely reveal the user's arrival pattern. The near-complete information leakage in the low-rate traffic region is proven to be reduced by half in a work-conserving version of the TDMA (WC-TDMA) scheduler, which turns out to be privacy-optimal in the class of deterministic work-conserving (det-WC) schedulers, according to a universal lower bound on information leakage we derive for all det-WC schedulers.

preprint, 2013, arXiv

An Improvement to Levenshtein's Upper Bound on the Cardinality of Deletion Correcting Codes

We consider deletion correcting codes over a q-ary alphabet. It is well known that any code capable of correcting s deletions can also correct any combination of s total insertions and deletions. To obtain asymptotic upper bounds on code size, we apply a packing argument to channels that perform different mixtures of insertions and deletions. Even though the set of codes is identical for all of these channels, the bounds that we obtain vary. Prior to this work, only the bounds corresponding to the all insertion case and the all deletion case were known. We recover these as special cases. The bound from the all deletion case, due to Levenshtein, has been the best known for more than forty five years. Our generalized bound is better than Levenshtein's bound whenever the number of deletions to be corrected is larger than the alphabet size.

preprint, 2013, arXiv

An Information Theoretic Study of Timing Side Channels in Two-user Schedulers

Timing side channels in two-user schedulers are studied. When two users share a scheduler, one user may learn the other user's behavior from patterns of service timings. We measure the information leakage of the resulting timing side channel in schedulers serving a legitimate user and a malicious attacker, using a privacy metric defined as the Shannon equivocation of the user's job density. We show that the commonly used first-come-first-serve (FCFS) scheduler provides no privacy as the attacker is able to learn the user's job pattern completely. Furthermore, we introduce a scheduling policy, the accumulate-and-serve scheduler, which services jobs from the user and attacker in batches after buffering them. The information leakage in this scheduler is mitigated at the price of service delays, and the maximum privacy is achievable when large delays are added.

preprint, 2013, arXiv

Invisible Flow Watermarks for Channels with Dependent Substitution, Deletion, and Bursty Insertion Errors

Flow watermarks efficiently link packet flows in a network in order to thwart various attacks such as stepping stones. We study the problem of designing good flow watermarks. Earlier flow watermarking schemes mostly considered substitution errors, neglecting the effects of packet insertions and deletions that commonly happen within a network. More recent schemes consider packet deletions but often at the expense of the watermark visibility. We present an invisible flow watermarking scheme capable of enduring a large number of packet losses and insertions. To maintain invisibility, our scheme uses quantization index modulation (QIM) to embed the watermark into inter-packet delays, as opposed to time intervals including many packets. As the watermark is injected within individual packets, packet losses and insertions may lead to watermark desynchronization and substitution errors. To address this issue, we add a layer of error-correction coding to our scheme. Experimental results on both synthetic and real network traces demonstrate that our scheme is robust to network jitter, packet drops and splits, while remaining invisible to an attacker.
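The embedding step mentioned in this abstract, quantization index modulation on inter-packet delays, can be sketched with textbook scalar QIM. This is not the paper's scheme: the step size and function names are assumptions, error-correction coding is omitted, and the sketch ignores the practical constraint that a watermarker can only add delay to packets.

```python
def qim_embed(delay, bit, step):
    """Quantize a delay to the nearest point of the lattice that encodes `bit`.

    Bit 0 uses multiples of `step`; bit 1 uses the same lattice shifted by step/2.
    """
    offset = bit * step / 2.0
    return round((delay - offset) / step) * step + offset


def qim_decode(delay, step):
    """Return the bit whose lattice lies closest to the observed delay."""
    dist0 = abs(delay - qim_embed(delay, 0, step))
    dist1 = abs(delay - qim_embed(delay, 1, step))
    return 0 if dist0 <= dist1 else 1


if __name__ == "__main__":
    step = 0.01  # hypothetical quantization step in seconds
    marked = qim_embed(0.013, 1, step)
    print(marked, qim_decode(marked + 0.002, step))
```

Each delay moves by at most half a step during embedding, and decoding tolerates jitter smaller than a quarter step, so the step size trades invisibility against robustness.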

preprint, 2013, arXiv

Mitigating Timing Side Channel in Shared Schedulers

In this work, we study information leakage in timing side channels that arise in the context of shared event schedulers. Consider two processes, one of them an innocuous process (referred to as Alice) and the other a malicious one (referred to as Bob), using a common scheduler to process their jobs. Based on when his jobs get processed, Bob wishes to learn about the pattern (size and timing) of jobs of Alice. Depending on the context, knowledge of this pattern could have serious implications on Alice's privacy and security. For instance, shared routers can reveal traffic patterns, shared memory access can reveal cloud usage patterns, and suchlike. We present a formal framework to study the information leakage in shared resource schedulers using the pattern estimation error as a performance metric. The first-come-first-serve (FCFS) scheduling policy and time-division-multiple-access (TDMA) are identified as two extreme policies on the privacy metric: FCFS offers the least privacy and TDMA the highest. However, on performance-based metrics, such as throughput and delay, it is well known that FCFS significantly outperforms TDMA. We then derive two parametrized policies, accumulate and serve, and proportional TDMA, which take two different approaches to offer a tunable trade-off between privacy and performance.

preprint, 2012, arXiv

Multi-Flow Attacks Against Network Flow Watermarks: Analysis and Countermeasures

In this paper, we analyze several recent schemes for watermarking network flows that are based on splitting the flow into timing intervals. We show that this approach creates time-dependent correlations that enable an attack that combines multiple watermarked flows. Such an attack can easily be mounted in nearly all applications of network flow watermarking, both in anonymous communication and stepping stone detection. The attack can be used to detect the presence of a watermark, recover the secret parameters, and remove the watermark from a flow. The attack can be effective even if different flows are marked with different values of a watermark. We analyze the efficacy of our attack using a probabilistic model and a Markov-Modulated Poisson Process (MMPP) model of interactive traffic. We also implement our attack and test it using both synthetic and real-world traces, showing that our attack is effective with as few as 10 watermarked flows. Finally, we propose possible countermeasures to defeat the multi-flow attack.

preprint2012arXiv

Non-asymptotic Upper Bounds for Deletion Correcting Codes

Explicit non-asymptotic upper bounds on the sizes of multiple-deletion correcting codes are presented. In particular, the largest single-deletion correcting code for a $q$-ary alphabet and string length $n$ is shown to be of size at most $\frac{q^n-q}{(q-1)(n-1)}$. An improved bound on the asymptotic rate function is obtained as a corollary. Upper bounds are also derived on sizes of codes for a constrained source that does not necessarily consist of all strings of a particular length, and this idea is demonstrated by application to sets of run-length limited strings. The problem of finding the largest deletion correcting code is modeled as a matching problem on a hypergraph. This problem is formulated as an integer linear program. The upper bound is obtained by the construction of a feasible point for the dual of the linear programming relaxation of this integer linear program. The non-asymptotic bounds derived imply the known asymptotic bounds of Levenshtein and Tenengolts and improve on known non-asymptotic bounds. Numerical results support the conjecture that in the binary case, the Varshamov-Tenengolts codes are the largest single-deletion correcting codes.
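As a small numerical illustration, the bound above can be compared against the binary Varshamov-Tenengolts code $VT_0(n)$, taken here with its standard definition (binary strings $x$ with $\sum_i i\,x_i \equiv 0 \pmod{n+1}$). The sketch below is a brute-force check for short lengths only, not the paper's linear-programming argument, and the function names are hypothetical. Because a code size is an integer, the bound is rounded down.

```python
from itertools import product

def single_deletion_bound(q, n):
    """Floor of (q^n - q) / ((q - 1)(n - 1)), the bound quoted in the abstract."""
    return (q**n - q) // ((q - 1) * (n - 1))

def vt_code(n, a=0):
    """Binary VT_a(n): strings whose position-weighted sum is a mod (n + 1)."""
    return [x for x in product((0, 1), repeat=n)
            if sum(i * bit for i, bit in enumerate(x, start=1)) % (n + 1) == a]

def deletion_ball(x):
    """All strings obtained from x by deleting exactly one symbol."""
    return {x[:i] + x[i + 1:] for i in range(len(x))}

def corrects_single_deletion(code):
    """True if the single-deletion balls of the codewords are pairwise disjoint."""
    seen = set()
    for x in code:
        ball = deletion_ball(x)
        if seen & ball:
            return False
        seen |= ball
    return True

for n in range(2, 9):
    code = vt_code(n)
    print(n, len(code), single_deletion_bound(2, n), corrects_single_deletion(code))
```

Each printed row gives the length, the size of $VT_0(n)$, the rounded bound, and whether the brute-force disjointness check passes.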

preprint2012arXiv

Non-blind watermarking of network flows

Linking network flows is an important problem in intrusion detection as well as anonymity. Passive traffic analysis can link flows but requires long periods of observation to reduce errors. Active traffic analysis, also known as flow watermarking, allows for better precision and is more scalable. Previous flow watermarks introduce significant delays to the traffic flow as a side effect of using a blind detection scheme; this enables attacks that detect and remove the watermark, while at the same time slowing down legitimate traffic. We propose the first non-blind approach for flow watermarking, called RAINBOW, that improves watermark invisibility by inserting delays hundreds of times smaller than previous blind watermarks, and hence reduces the watermark's interference with network flows. We derive and analyze the optimum detectors for RAINBOW as well as for passive traffic analysis under different traffic models by using hypothesis testing. Comparing the detection performance of RAINBOW and the passive approach, we observe that both perform similarly well in the case of uncorrelated traffic; however, the RAINBOW detector drastically outperforms the optimum passive detector in the case of correlated network flows. This justifies the use of non-blind watermarks over passive traffic analysis even though both approaches have similar scalability constraints. We confirm our analysis by simulating the detectors and testing them against large traces of real network flows.

preprint2012arXiv

Two Approaches to the Construction of Deletion Correcting Codes: Weight Partitioning and Optimal Colorings

We consider the problem of constructing deletion correcting codes over a binary alphabet and take a graph theoretic view. An $n$-bit $s$-deletion correcting code is an independent set in a particular graph. We propose constructing such a code by taking the union of many constant Hamming weight codes. This results in codes that have additional structure. Searching for codes in constant Hamming weight induced subgraphs is computationally easier than searching the original graph. We prove a lower bound on the size of a codebook constructed this way for any number of deletions and show that it is only a small factor below the corresponding lower bound on unrestricted codes. In the single deletion case, we find optimal colorings of the constant Hamming weight induced subgraphs. We show that the resulting code is asymptotically optimal. We discuss the relationship between codes and colorings and observe that the VT codes are optimal in a coloring sense. We prove a new lower bound on the chromatic number of the deletion channel graphs. Colorings of the deletion channel graphs that match this bound do not necessarily produce asymptotically optimal codes.

preprint2011arXiv

Causal Dependence Tree Approximations of Joint Distributions for Multiple Random Processes

We investigate approximating joint distributions of random processes with causal dependence tree distributions. Such distributions are particularly useful in providing parsimonious representations when there exist causal dynamics among processes. By extending the results of Chow and Liu on dependence tree approximations, we show that the best causal dependence tree approximation is the one that maximizes the sum of directed informations on its edges, where best is defined in terms of minimizing the KL-divergence between the original and the approximate distribution. Moreover, we describe a low-complexity algorithm to efficiently pick this approximate distribution.
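The selection criterion can be made concrete on a tiny example. The sketch below assumes the pairwise directed informations have already been estimated (the numbers are made up for illustration) and searches exhaustively over rooted directed spanning trees for the one with the largest edge-weight sum. Exhaustive search is swapped in here purely for clarity and is only feasible for a handful of processes; it is not the low-complexity algorithm the paper describes. All names are hypothetical.

```python
from itertools import product

def _reaches_root(v, parent, root):
    """Follow parent pointers from v; True if the root is reached (no cycle)."""
    for _ in range(len(parent) + 1):
        if v == root:
            return True
        v = parent[v]
    return False

def best_causal_tree(weights, n):
    """Brute-force maximum-weight rooted directed spanning tree.

    weights[(i, j)] is an assumed estimate of the directed information
    from process i to process j.
    """
    best_score, best_edges = float("-inf"), None
    for root in range(n):
        others = [v for v in range(n) if v != root]
        for choice in product(range(n), repeat=len(others)):
            parent = dict(zip(others, choice))
            if any(child == par for child, par in parent.items()):
                continue  # no self-loops
            if not all(_reaches_root(v, parent, root) for v in others):
                continue  # contains a cycle, so not a tree
            score = sum(weights[(par, child)] for child, par in parent.items())
            if score > best_score:
                best_score = score
                best_edges = sorted((par, child) for child, par in parent.items())
    return best_score, best_edges

# Hypothetical directed-information estimates for three processes 0, 1, 2.
di = {(0, 1): 0.9, (1, 0): 0.1, (1, 2): 0.7,
      (2, 1): 0.2, (0, 2): 0.3, (2, 0): 0.05}
score, edges = best_causal_tree(di, 3)
print(score, edges)
```

For these made-up weights the selected tree is the chain 0 → 1 → 2.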

preprint2011arXiv

Fingerprinting with Equiangular Tight Frames

Digital fingerprinting is a framework for marking media files, such as images, music, or movies, with user-specific signatures to deter illegal distribution. Multiple users can collude to produce a forgery that can potentially overcome a fingerprinting system. This paper proposes an equiangular tight frame fingerprint design which is robust to such collusion attacks. We motivate this design by considering digital fingerprinting in terms of compressed sensing. The attack is modeled as linear averaging of multiple marked copies before adding a Gaussian noise vector. The content owner can then determine guilt by exploiting correlation between each user's fingerprint and the forged copy. The worst-case error probability of this detection scheme is analyzed and bounded. Simulation results demonstrate the average-case performance is similar to the performance of orthogonal and simplex fingerprint designs, while accommodating several times as many users.
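The attack and detection model described above can be sketched in a few lines. The example below uses the regular simplex as a stand-in fingerprint design; the simplex is itself a simple equiangular tight frame, but it is not the larger ETF construction the paper proposes. It also assumes the content owner has already subtracted the original host signal, and it accuses every user whose correlation score exceeds an illustrative threshold of zero. The function names, noise level, and threshold are assumptions made for this sketch.

```python
import math
import random

def simplex_fingerprints(m):
    """m unit-norm vectors in R^m with pairwise inner product -1/(m - 1)."""
    scale = math.sqrt(m / (m - 1))
    return [[scale * ((1.0 if i == j else 0.0) - 1.0 / m) for j in range(m)]
            for i in range(m)]

def forge(fingerprints, colluders, sigma, rng):
    """Linear averaging attack: mean of the colluders' marks plus Gaussian noise."""
    dim = len(fingerprints[0])
    avg = [sum(fingerprints[u][j] for u in colluders) / len(colluders)
           for j in range(dim)]
    return [a + rng.gauss(0.0, sigma) for a in avg]

def correlation_scores(fingerprints, forgery):
    """Inner product of each user's fingerprint with the forged mark."""
    return [sum(f * y for f, y in zip(fp, forgery)) for fp in fingerprints]

rng = random.Random(0)
marks = simplex_fingerprints(8)
colluders = [1, 4, 6]
scores = correlation_scores(marks, forge(marks, colluders, sigma=0.01, rng=rng))
accused = [u for u, s in enumerate(scores) if s > 0.0]  # illustrative threshold
print(accused)
```

Without noise, each of the three colluders scores about 5/21 and every innocent user scores -1/7, so at this low noise level the zero threshold separates them cleanly.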

preprint2011arXiv

Website Detection Using Remote Traffic Analysis

Recent work in traffic analysis has shown that traffic patterns leaked through side channels can be used to recover important semantic information. For instance, attackers can find out which website, or which page on a website, a user is accessing simply by monitoring the packet size distribution. We show that traffic analysis is an even greater threat to privacy than previously thought by introducing a new attack that can be carried out remotely. In particular, we show that, to perform traffic analysis, adversaries do not need to directly observe the traffic patterns. Instead, they can gain sufficient information by sending probes from a far-off vantage point that exploits a queuing side channel in routers. To demonstrate the threat of such remote traffic analysis, we study a remote website detection attack that works against home broadband users. Because the remotely observed traffic patterns are noisier than those obtained using previous schemes based on direct local traffic monitoring, we take a dynamic time warping (DTW) based approach to detecting fingerprints from the same website. As a new twist on website fingerprinting, we consider a website detection attack, where the attacker aims to find out whether a user browses a particular website, and its privacy implications. We show experimentally that, although the success of the attack is highly variable and depends on the target site, it achieves very low error rates for some sites. We also show how such website detection can be used to deanonymize message board users.

preprint2005arXiv

On the Minimal Pseudo-Codewords of Codes from Finite Geometries

In order to understand the performance of a code under maximum-likelihood (ML) decoding, it is crucial to know the minimal codewords. In the context of linear programming (LP) decoding, it turns out to be necessary to know the minimal pseudo-codewords. This paper studies the minimal codewords and minimal pseudo-codewords of some families of codes derived from projective and Euclidean planes. Although our numerical results are only for codes of very modest length, they suggest that these code families exhibit an interesting property. Namely, all minimal pseudo-codewords that are not multiples of a minimal codeword have an AWGNC pseudo-weight that is strictly larger than the minimum Hamming weight of the code. This observation has positive consequences not only for LP decoding but also for iterative decoding.

34 published item(s)

Active Context Selection Improves Simple Regret in Contextual Bandits

Inference Time Causal Probing in LLMs

Select-then-differentiate: Solving Bilevel Optimization with Manifold Lower-level Solution Sets

s-ID: Causal Effect Identification in a Sub-Population

Causal Effect Identification with Context-specific Independence Relations of Control Variables

Novel Ordering-based Approaches for Causal Structure Learning in the Presence of Unobserved Variables

Revisiting the General Identifiability Problem

Impact of Data Processing on Fairness in Supervised Learning

Characterizing Distribution Equivalence and Structure Learning for Cyclic and Acyclic Directed Graphs

Global Convergence and Variance-Reduced Optimization for a Class of Nonconvex-Nonconcave Minimax Problems

LazyIter: A Fast Algorithm for Counting Markov Equivalent DAGs and Designing Experiments

Model-Augmented Estimation of Conditional Mutual Information for Feature Selection

Toward Optimal Adversarial Policies in the Multiplicative Learning System with a Malicious Expert

Improved Achievability and Converse Bounds for Erdős-Rényi Graph Matching

Learning Network of Multivariate Hawkes Processes: A Time Series Approach

On the Simultaneous Preservation of Privacy and Community Structure in Anonymized Networks

On the Vulnerability of Digital Fingerprinting Systems to Finite Alphabet Collusion Attacks

Bounded Degree Approximations of Stochastic Networks

Directed Information Graphs

Efficient Neighborhood Selection for Gaussian Graphical Models

Generalized sphere-packing and sphere-covering bounds on the size of codes for combinatorial channels

Quantifying the Information Leakage in Timing Side Channels in Deterministic Work-Conserving Schedulers

An Improvement to Levenshtein's Upper Bound on the Cardinality of Deletion Correcting Codes

An Information Theoretic Study of Timing Side Channels in Two-user Schedulers

Invisible Flow Watermarks for Channels with Dependent Substitution, Deletion, and Bursty Insertion Errors

Mitigating Timing Side Channel in Shared Schedulers

Multi-Flow Attacks Against Network Flow Watermarks: Analysis and Countermeasures

Non-asymptotic Upper Bounds for Deletion Correcting Codes

Non-blind watermarking of network flows

Two Approaches to the Construction of Deletion Correcting Codes: Weight Partitioning and Optimal Colorings

Causal Dependence Tree Approximations of Joint Distributions for Multiple Random Processes

Fingerprinting with Equiangular Tight Frames

Website Detection Using Remote Traffic Analysis

On the Minimal Pseudo-Codewords of Codes from Finite Geometries