Source author record

Weina Wang

Weina Wang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Cryptography and Security Performance Information Theory math.IT math.PR Databases Machine Learning Applications Computer Science and Game Theory Distributed, Parallel, and Cluster Computing eess.SY math.OC math.ST Statistics Theory Systems and Control

Catalog footprint

What is connected

11works

15topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Projection-based Lyapunov method for fully heterogeneous weakly-coupled MDPs

Heterogeneity poses a fundamental challenge for many real-world large-scale decision-making problems but remains largely understudied. In this paper, we study the fully heterogeneous setting of a prominent class of such problems, known as weakly-coupled Markov decision processes (WCMDPs). Each WCMDP consists of $N$ arms (or subproblems), which have distinct model parameters in the fully heterogeneous setting, leading to the curse of dimensionality when $N$ is large. We show that, under mild assumptions, an efficiently computable policy achieves an $O(1/\sqrt{N})$ optimality gap in the long-run average reward per arm for fully heterogeneous WCMDPs as $N$ becomes large. This is the first asymptotic optimality result for fully heterogeneous average-reward WCMDPs. Our main technical innovation is the construction of projection-based Lyapunov functions that certify the convergence of rewards and costs to an optimal region, even under full heterogeneity.

preprint2022arXiv

Calibration with Privacy in Peer Review

Reviewers in peer review are often miscalibrated: they may be strict, lenient, extreme, moderate, etc. A number of algorithms have previously been proposed to calibrate reviews. Such attempts of calibration can however leak sensitive information about which reviewer reviewed which paper. In this paper, we identify this problem of calibration with privacy, and provide a foundational building block to address it. Specifically, we present a theoretical study of this problem under a simplified-yet-challenging model involving two reviewers, two papers, and an MAP-computing adversary. Our main results establish the Pareto frontier of the tradeoff between privacy (preventing the adversary from inferring reviewer identity) and utility (accepting better papers), and design explicit computationally-efficient algorithms that we prove are Pareto optimal.

preprint2022arXiv

On Low-Complexity Quickest Intervention of Mutated Diffusion Processes Through Local Approximation

We consider the problem of controlling a mutated diffusion process with an unknown mutation time. The problem is formulated as the quickest intervention problem with the mutation modeled by a change-point, which is a generalization of the quickest change-point detection (QCD). Our goal is to intervene in the mutated process as soon as possible while maintaining a low intervention cost with optimally chosen intervention actions. This model and the proposed algorithms can be applied to pandemic prevention (such as Covid-19) or misinformation containment. We formulate the problem as a partially observed Markov decision process (POMDP) and convert it to an MDP through the belief state of the change-point. We first propose a grid approximation approach to calculate the optimal intervention policy, whose computational complexity could be very high when the number of grids is large. In order to reduce the computational complexity, we further propose a low-complexity threshold-based policy through the analysis of the first-order approximation of the value functions in the ``local intervention'' regime. Simulation results show the low-complexity algorithm has a similar performance as the grid approximation and both perform much better than the QCD-based algorithms.

preprint2022arXiv

Tackling Heterogeneous Traffic in Multi-access Systems via Erasure Coded Servers

Most data generated by modern applications is stored in the cloud, and there is an exponential growth in the volume of jobs to access these data and perform computations using them. The volume of data access or computing jobs can be heterogeneous across different job types and can unpredictably change over time. Cloud service providers cope with this demand heterogeneity and unpredictability by over-provisioning the number of servers hosting each job type. In this paper, we propose the addition of erasure-coded servers that can flexibly serve multiple job types without additional storage cost. We analyze the service capacity region and the response time of such erasure-coded systems and compare them with standard uncoded replication-based systems currently used in the cloud. We show that coding expands the service capacity region, thus enabling the system to handle variability in demand for different data types. Moreover, we characterize the response time of the coded system in various arrival rate regimes. This analysis reveals that adding even a small number of coded servers can significantly reduce the mean response time, with a drastic reduction in regimes where the demand is skewed across different job types.

preprint2021arXiv

Heavy-Traffic Insensitive Bounds for Weighted Proportionally Fair Bandwidth Sharing Policies

We consider a connection-level model proposed by Massoulié and Roberts for bandwidth sharing among file transfer flows in a communication network. We study weighted proportionally fair sharing policies and establish explicit-form bounds on the weighted sum of the expected numbers of flows on different routes in heavy traffic. The bounds are linear in the number of critically loaded links in the network, and they hold for a class of phase-type file-size distributions; i.e., the bounds are heavy-traffic insensitive to the distributions in this class. Our approach is Lyapunov-drift based, which is different from the widely used diffusion approximation approach. A key technique we develop is to construct a novel inner product in the state space, which then allows us to obtain a multiplicative type of state-space collapse in steady state. Furthermore, this state-space collapse result implies the interchange of limits as a by-product for the diffusion approximation of the equal-weight case under phase-type file-size distributions, demonstrating the heavy-traffic insensitivity of the stationary distribution.

preprint2021arXiv

Zero Queueing for Multi-Server Jobs

Cloud computing today is dominated by multi-server jobs. These are jobs that request multiple servers simultaneously and hold onto all of these servers for the duration of the job. Multi-server jobs add a lot of complexity to the traditional one-job-per-server model: an arrival might not "fit" into the available servers and might have to queue, blocking later arrivals and leaving servers idle. From a queueing perspective, almost nothing is understood about multi-server job queueing systems; even understanding the exact stability region is a very hard problem. In this paper, we investigate a multi-server job queueing model under scaling regimes where the number of servers in the system grows. Specifically, we consider a system with multiple classes of jobs, where jobs from different classes can request different numbers of servers and have different service time distributions, and jobs are served in first-come-first-served order. The multi-server job model opens up new scaling regimes where both the number of servers that a job needs and the system load scale with the total number of servers. Within these scaling regimes, we derive the first results on stability, queueing probability, and the transient analysis of the number of jobs in the system for each class. In particular we derive sufficient conditions for zero queueing. Our analysis introduces a novel way of extracting information from the Lyapunov drift, which can be applicable to a broader scope of problems in queueing systems.

preprint2020arXiv

On the Privacy-Utility Tradeoff in Peer-Review Data Analysis

A major impediment to research on improving peer review is the unavailability of peer-review data, since any release of such data must grapple with the sensitivity of the peer review data in terms of protecting identities of reviewers from authors. We posit the need to develop techniques to release peer-review data in a privacy-preserving manner. Identifying this problem, in this paper we propose a framework for privacy-preserving release of certain conference peer-review data -- distributions of ratings, miscalibration, and subjectivity -- with an emphasis on the accuracy (or utility) of the released data. The crux of the framework lies in recognizing that a part of the data pertaining to the reviews is already available in public, and we use this information to post-process the data released by any privacy mechanism in a manner that improves the accuracy (utility) of the data while retaining the privacy guarantees. Our framework works with any privacy-preserving mechanism that operates via releasing perturbed data. We present several positive and negative theoretical results, including a polynomial-time algorithm for improving on the privacy-utility tradeoff.

preprint2020arXiv

Optimal Resource Allocation for Elastic and Inelastic Jobs

Modern data centers are tasked with processing heterogeneous workloads consisting of various classes of jobs. These classes differ in their arrival rates, size distributions, and job parallelizability. With respect to paralellizability, some jobs are elastic, meaning they can parallelize linearly across many servers. Other jobs are inelastic, meaning they can only run on a single server. Although job classes can differ drastically, they are typically forced to share a single cluster. When sharing a cluster among heterogeneous jobs, one must decide how to allocate servers to each job at every moment in time. In this paper, we design and analyze allocation policies which aim to minimize the mean response time across jobs, where a job's response time is the time from when it arrives until it completes. We model this problem in a stochastic setting where each job may be elastic or inelastic. Job sizes are drawn from exponential distributions, but are unknown to the system. We show that, in the common case where elastic jobs are larger on average than inelastic jobs, the optimal allocation policy is Inelastic-First, giving inelastic jobs preemptive priority over elastic jobs. We obtain this result by introducing a novel sample path argument. We also show that there exist cases where Elastic-First (giving priority to elastic jobs) performs better than Inelastic-First. We then provide the first analysis of mean response time under both Elastic-First and Inelastic-First by leveraging recent techniques for solving high-dimensional Markov chains.

preprint2016arXiv

The Value of Privacy: Strategic Data Subjects, Incentive Mechanisms and Fundamental Limits

We study the value of data privacy in a game-theoretic model of trading private data, where a data collector purchases private data from strategic data subjects (individuals) through an incentive mechanism. The private data of each individual represents her knowledge about an underlying state, which is the information that the data collector desires to learn. Different from most of the existing work on privacy-aware surveys, our model does not assume the data collector to be trustworthy. Then, an individual takes full control of its own data privacy and reports only a privacy-preserving version of her data. In this paper, the value of $ε$ units of privacy is measured by the minimum payment of all nonnegative payment mechanisms, under which an individual's best response at a Nash equilibrium is to report the data with a privacy level of $ε$. The higher $ε$ is, the less private the reported data is. We derive lower and upper bounds on the value of privacy which are asymptotically tight as the number of data subjects becomes large. Specifically, the lower bound assures that it is impossible to use less amount of payment to buy $ε$ units of privacy, and the upper bound is given by an achievable payment mechanism that we designed. Based on these fundamental limits, we further derive lower and upper bounds on the minimum total payment for the data collector to achieve a given learning accuracy target, and show that the total payment of the designed mechanism is at most one individual's payment away from the minimum.

preprint2015arXiv

On the Relation Between Identifiability, Differential Privacy and Mutual-Information Privacy

This paper investigates the relation between three different notions of privacy: identifiability, differential privacy and mutual-information privacy. Under a unified privacy-distortion framework, where the distortion is defined to be the Hamming distance of the input and output databases, we establish some fundamental connections between these three privacy notions. Given a distortion level $D$, define $ε_{\mathrm{i}}^*(D)$ to be the smallest (best) identifiability level, and $ε_{\mathrm{d}}^*(D)$ to be the smallest differential privacy level. We characterize $ε_{\mathrm{i}}^*(D)$ and $ε_{\mathrm{d}}^*(D)$, and prove that $ε_{\mathrm{i}}^*(D)-ε_X\leε_{\mathrm{d}}^*(D)\leε_{\mathrm{i}}^*(D)$ for $D$ in some range, where $ε_X$ is a constant depending on the distribution of the original database $X$, and diminishes to zero when the distribution of $X$ is uniform. Furthermore, we show that identifiability and mutual-information privacy are consistent in the sense that given distortion level $D$, the mechanism that optimizes the mutual-information privacy also minimizes the identifiability level.

preprint2014arXiv

A Minimax Distortion View of Differentially Private Query Release

We consider the problem of differentially private query release through a synthetic database approach. Departing from the existing approaches that require the query set to be specified in advance, we advocate to devise query-set independent mechanisms, with an ambitious goal of providing accurate answers, while meeting the privacy constraints, for all queries in a general query class. Specifically, a differentially private mechanism is constructed to "encode" rich stochastic structure into the synthetic database, and "customized" companion estimators are then derived to provide accurate answers by making use of all available information, including the mechanism (which is public information) and the query functions. Accordingly, the distortion under the best of this kind of mechanisms at the worst-case query in a general query class, so called the minimax distortion, provides a fundamental characterization of differentially private query release. For the general class of statistical queries, we prove that with the squared-error distortion measure, the minimax distortion is $O(1/n)$ by deriving asymptotically tight upper and lower bounds in the regime that the database size $n$ goes to infinity. The upper bound is achievable by a mechanism $\mathcal{E}$ and its corresponding companion estimators, which points directly to the feasibility of the proposed approach in large databases. We further evaluate the mechanism $\mathcal{E}$ and the companion estimators through experiments on real datasets from Netflix and Facebook. Experimental results show improvement over the state-of-art MWEM algorithm and verify the scaling behavior $O(1/n)$ of the minimax distortion.

Weina Wang

What is connected

Connect this record

See the researcher in context

Building this map preview

11 published item(s)

Projection-based Lyapunov method for fully heterogeneous weakly-coupled MDPs

Calibration with Privacy in Peer Review

On Low-Complexity Quickest Intervention of Mutated Diffusion Processes Through Local Approximation

Tackling Heterogeneous Traffic in Multi-access Systems via Erasure Coded Servers

Heavy-Traffic Insensitive Bounds for Weighted Proportionally Fair Bandwidth Sharing Policies

Zero Queueing for Multi-Server Jobs

On the Privacy-Utility Tradeoff in Peer-Review Data Analysis

Optimal Resource Allocation for Elastic and Inelastic Jobs

The Value of Privacy: Strategic Data Subjects, Incentive Mechanisms and Fundamental Limits

On the Relation Between Identifiability, Differential Privacy and Mutual-Information Privacy

A Minimax Distortion View of Differentially Private Query Release