Researcher profile

Hua Sun

Hua Sun contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
16 works
0 followers
7 topics
4 close collaborators

Published work

16 published items

preprint · 2026 · arXiv

Optimal Communication and Key Rate Region for Hierarchical Secure Aggregation with User Collusion

Secure aggregation is concerned with the task of securely uploading the inputs of multiple users to an aggregation server without letting the server know the inputs beyond their summation. It finds broad applications in distributed machine learning paradigms such as federated learning (FL) where multiple clients, each having access to a proprietary dataset, periodically upload their locally trained models (abstracted as inputs) to a parameter server which then generates an aggregate (e.g., averaged) model that is sent back to the clients as an initializing point for a new round of local training. To enhance the data privacy of the clients, secure aggregation protocols are developed using techniques from cryptography to ensure that the server infers no more information of the users' inputs beyond the desired aggregated input, even if the server can collude with some users. Although laying the ground for understanding the fundamental utility-security trade-off in secure aggregation, the simple star client-server architecture cannot capture more complex network architectures used in practical systems. Motivated by hierarchical federated learning, we investigate the secure aggregation problem in a $3$-layer hierarchical network consisting of clustered users connecting to an aggregation server through an intermediate layer of relays. Besides the conventional server security which requires that the server learns nothing beyond the desired sum of inputs, relay security is also imposed so that the relays infer nothing about the users' inputs and remain oblivious. For such a hierarchical secure aggregation (HSA) problem, we characterize the optimal multifaceted trade-off between communication (in terms of user-to-relay and relay-to-server communication rates) and secret key generation efficiency (in terms of individual key and source key rates).

preprint · 2026 · arXiv

Optimal Rate Region for Multi-server Secure Aggregation with User Collusion

Secure aggregation is a fundamental primitive in privacy-preserving distributed learning systems, where an aggregator aims to compute the sum of users' inputs without revealing individual data. In this paper, we study a multi-server secure aggregation problem in a two-hop network consisting of multiple aggregation servers and multiple users per server, under the presence of user collusion. Each user communicates only with its associated server, while the servers exchange messages to jointly recover the global sum. We adopt an information-theoretic security framework, allowing up to $T$ users to collude with any server. We characterize the complete optimal rate region in terms of user-to-server communication rate, server-to-server communication rate, individual key rate, and source key rate. Our main result shows that the minimum communication and individual key rates are all one symbol per input symbol, while the optimal source key rate is given by $\min\{U+V+T-2,\, UV-1\}$, where $U$ denotes the number of servers and $V$ the number of users per server. The achievability is established via a linear key construction that ensures correctness and security against colluding users, while the converse proof relies on tight entropy bounds derived from correctness and security constraints. The results reveal a fundamental tradeoff between security and key efficiency and demonstrate that the multi-server architecture can significantly reduce the required key randomness compared to single-server secure aggregation. Our findings provide a complete information-theoretic characterization of secure aggregation in multi-server systems with user collusion.
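The closed-form source key rate quoted in this abstract can be evaluated directly. The following is a minimal sketch, not the authors' construction: the function name `optimal_source_key_rate` and the input checks are illustrative assumptions, and the abstract does not state the exact parameter ranges for which the formula holds.

```python
def optimal_source_key_rate(U: int, V: int, T: int) -> int:
    """Evaluate min{U + V + T - 2, U*V - 1} from the abstract.

    U: number of servers, V: users per server, T: colluding users.
    The validity checks below are assumptions made for this sketch only.
    """
    if U < 1 or V < 1 or T < 0:
        raise ValueError("expected U >= 1, V >= 1, T >= 0")
    return min(U + V + T - 2, U * V - 1)


# With few colluding users the first term is the smaller one;
# with many colluding users the rate saturates at U*V - 1.
small_T = optimal_source_key_rate(U=3, V=4, T=1)
large_T = optimal_source_key_rate(U=2, V=2, T=5)
```

The communication rates and the individual key rate are stated in the abstract to be one symbol per input symbol, so only the source key rate depends on the parameters.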

preprint · 2023 · arXiv

A Shannon-Theoretic Approach to the Storage-Retrieval Tradeoff in PIR Systems

We consider the storage-retrieval rate tradeoff in private information retrieval (PIR) systems using a Shannon-theoretic approach. Our focus is mostly on the canonical two-message two-database case, for which a coding scheme based on random codebook generation and the binning technique is proposed. This coding scheme reveals a hidden connection between PIR and the classic multiple description source coding problem. We first show that when the retrieval rate is kept optimal, the proposed non-linear scheme can achieve better performance than any linear scheme. Moreover, a non-trivial storage-retrieval rate tradeoff can be achieved beyond space-sharing between this extreme point and the other optimal extreme point, achieved by the retrieve-everything strategy. We further show that with a method akin to the expurgation technique, one can extract a zero-error PIR code from the random code. Outer bounds are also studied and compared to establish the superiority of the non-linear codes over linear codes.

preprint · 2022 · arXiv

On Extremal Rates of Secure Storage over Graphs

A secure storage code maps $K$ source symbols, each of $L_w$ bits, to $N$ coded symbols, each of $L_v$ bits, such that each coded symbol is stored in a node of a graph. Each edge of the graph is either associated with $D$ of the $K$ source symbols such that from the pair of nodes connected by the edge, we can decode the $D$ source symbols and learn no information about the remaining $K-D$ source symbols; or the edge is associated with no source symbols such that from the pair of nodes connected by the edge, nothing about the $K$ source symbols is revealed. The ratio $L_w/L_v$ is called the symbol rate of a secure storage code and the highest possible symbol rate is called the capacity. We characterize all graphs over which the capacity of a secure storage code is equal to $1$, when $D = 1$. This result is generalized to $D> 1$, i.e., we characterize all graphs over which the capacity of a secure storage code is equal to $1/D$ under a mild condition that for any node, the source symbols associated with each of its connected edges do not include a common element. Further, we characterize all graphs over which the capacity of a secure storage code is equal to $2/D$.

preprint · 2022 · arXiv

On the Fundamental Limits of Device-to-Device Private Caching under Uncoded Cache Placement and User Collusion

In the coded caching problem, as originally formulated by Maddah-Ali and Niesen, a server communicates via a noiseless shared broadcast link to multiple users that have local storage capability. In order for a user to decode its demanded file from the coded multicast transmission, the demands of all the users must be globally known, which may violate the privacy of the users. To overcome this privacy problem, Wan and Caire recently proposed several schemes that attain coded multicasting gain while simultaneously guaranteeing information-theoretic privacy of the users' demands. In Device-to-Device (D2D) networks, the demand privacy problem is further exacerbated by the fact that each user is also a transmitter, which appears to need knowledge of the files demanded by the remaining users in order to form its coded multicast transmission. This paper shows how to solve this seemingly infeasible problem. The main contribution of this paper is the development of novel achievable and converse bounds for D2D coded caching that are within a constant factor of one another when privacy of the users' demands must be guaranteed even in the presence of colluding users.

preprint · 2022 · arXiv

Secure Summation: Capacity Region, Groupwise Key, and Feasibility

The secure summation problem is considered, where $K$ users, each holding an input, wish to compute the sum of their inputs at a server securely, i.e., without revealing any information beyond the sum even if the server may collude with any set of up to $T$ users. First, we prove a folklore result for secure summation: to compute $1$ bit of the sum securely, each user needs to send at least $1$ bit to the server, each user needs to hold a key of at least $1$ bit, and all users need to hold collectively some key variables of at least $K-1$ bits. Next, we focus on the symmetric groupwise key setting, where every group of $G$ users share an independent key. We show that for symmetric groupwise keys with group size $G$, when $G > K-T$, the secure summation problem is not feasible; when $G \leq K-T$, to compute $1$ bit of the sum securely, each user needs to send at least $1$ bit to the server and the size of each groupwise key is at least $(K-T-1)/\binom{K-T}{G}$ bits. Finally, we relax the symmetry assumption on the groupwise keys and the colluding user sets; we allow any arbitrary group of users to share an independent key and any arbitrary group of users to collude with the server. For such a general groupwise key and colluding user setting, we show that secure summation is feasible if and only if the hypergraph, where each node is a user and each edge is a group of users sharing the same key, is connected after removing the nodes corresponding to any colluding set of users and their incident edges.
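The folklore bound in this abstract (one key symbol per user, $K-1$ key symbols in total) is met by the standard zero-sum masking idea, sketched below. This is an illustration under stated assumptions, not the paper's groupwise-key scheme: the helper names `zero_sum_keys`, `mask`, and `aggregate`, the modulus `q`, and the sample inputs are all hypothetical, and key distribution is assumed to happen through some trusted setup that is not modeled here.

```python
import secrets


def zero_sum_keys(K: int, q: int) -> list[int]:
    """Draw K - 1 independent uniform keys mod q; the last key makes the sum zero."""
    keys = [secrets.randbelow(q) for _ in range(K - 1)]
    keys.append((-sum(keys)) % q)
    return keys


def mask(inputs: list[int], keys: list[int], q: int) -> list[int]:
    """Each user sends its input plus its key, one symbol per input symbol."""
    return [(x + z) % q for x, z in zip(inputs, keys)]


def aggregate(messages: list[int], q: int) -> int:
    """The server adds the masked messages; the keys cancel, leaving the sum."""
    return sum(messages) % q


q = 2**16                 # hypothetical modulus, large enough for the true sum
inputs = [5, 11, 42, 7]   # hypothetical user inputs
keys = zero_sum_keys(len(inputs), q)
messages = mask(inputs, keys, q)
total = aggregate(messages, q)
```

Each masked message on its own is uniformly distributed, so the server sees nothing beyond the recovered sum; only $K-1$ of the keys carry fresh randomness, which matches the total key size in the bound.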

preprint · 2021 · arXiv

A New Design of Cache-aided Multiuser Private Information Retrieval with Uncoded Prefetching

In the problem of cache-aided multiuser private information retrieval (MuPIR), a set of $K_{\rm u}$ cache-equipped users wish to privately download a set of messages from $N$ distributed databases each holding a library of $K$ messages. The system works in two phases: the cache placement (prefetching) phase, in which the users fill up their cache memory, and the private delivery phase, in which the users' demands are revealed and they download an answer from each database so that their desired messages can be recovered while each individual database learns nothing about the identities of the requested messages. The goal is to design the placement and the private delivery phases such that the load, which is defined as the total number of downloaded bits normalized by the message size, is minimized given any user memory size. This paper considers the MuPIR problem with two messages and an arbitrary number of users and databases, where uncoded prefetching is assumed, i.e., the users directly copy some bits from the library as their cached contents. We propose a novel MuPIR scheme inspired by the Maddah-Ali and Niesen (MAN) coded caching scheme. The proposed scheme achieves a lower load than any existing scheme, especially the product design (PD), and is shown to be optimal within a factor of $8$ in general and exactly optimal in the very high or low memory regimes.

preprint · 2021 · arXiv

Information Theoretic Secure Aggregation with User Dropouts

In the robust secure aggregation problem, a server wishes to learn and only learn the sum of the inputs of a number of users while some users may drop out (i.e., may not respond). The identity of the dropped users is not known a priori and the server needs to securely recover the sum of the remaining surviving users. We consider the following minimal two-round model of secure aggregation. Over the first round, any set of no fewer than $U$ users out of $K$ users respond to the server and the server wants to learn the sum of the inputs of all responding users. The remaining users are viewed as dropped. Over the second round, any set of no fewer than $U$ users of the surviving users respond (i.e., dropouts are still possible over the second round) and from the information obtained from the surviving users over the two rounds, the server can decode the desired sum. The security constraint is that even if the server colludes with any $T$ users and the messages from the dropped users are received by the server (e.g., delayed packets), the server is not able to infer any additional information beyond the sum in the information theoretic sense. For this information theoretic secure aggregation problem, we characterize the optimal communication cost. When $U \leq T$, secure aggregation is not feasible, and when $U > T$, to securely compute one symbol of the sum, the minimum number of symbols sent from each user to the server is $1$ over the first round, and $1/(U-T)$ over the second round.
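The feasibility condition and per-round costs stated at the end of this abstract can be written as a small helper. This is a sketch only: the function name `dropout_communication_cost` and the choice to return `None` for the infeasible case are assumptions made for illustration.

```python
from fractions import Fraction


def dropout_communication_cost(U: int, T: int):
    """Per-user symbols sent in (round 1, round 2) per symbol of the sum.

    U: minimum number of responding users, T: number of colluding users.
    Returns None when U <= T, where the abstract states that secure
    aggregation is not feasible.
    """
    if U <= T:
        return None
    return Fraction(1), Fraction(1, U - T)


# With U = 5 responders and T = 2 colluders, each user sends one symbol
# in the first round and one third of a symbol in the second.
cost = dropout_communication_cost(U=5, T=2)
infeasible = dropout_communication_cost(U=3, T=3)
```

Exact fractions are used so that the second-round cost $1/(U-T)$ is not rounded.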

preprint · 2021 · arXiv

On Secure Distributed Linearly Separable Computation

Distributed linearly separable computation, where a user asks some distributed servers to compute a linearly separable function, was recently formulated by the same authors and aims to alleviate the bottlenecks of stragglers and communication cost in distributed computation. For this purpose, the data center assigns a subset of input datasets to each server, and each server computes some coded packets on the assigned datasets, which are then sent to the user. The user should recover the task function from the answers of a subset of servers, so that the effect of stragglers can be tolerated. In this paper, we formulate a novel secure framework for this distributed linearly separable computation, where we aim to let the user only retrieve the desired task function without obtaining any other information about the input datasets, even if it receives the answers of all servers. In order to preserve the security of the input datasets, some common randomness variable independent of the datasets should be introduced into the transmission. We show that any non-secure linear-coding based computing scheme for the original distributed linearly separable computation problem can be made secure without increasing the communication cost. Then we focus on the case where the computation cost of each server is minimum and aim to minimize the size of the randomness variable introduced in the system while achieving the optimal communication cost. We first propose an information theoretic converse bound on the randomness size. We then propose secure computing schemes based on two well-known data assignments, namely fractional repetition assignment and cyclic assignment. We then propose a computing scheme with a novel assignment, which strictly outperforms the above two schemes. Some additional optimality results are also obtained.

preprint · 2020 · arXiv

Capacity-Achieving Private Information Retrieval Codes from MDS-Coded Databases with Minimum Message Size

We consider constructing capacity-achieving linear codes with minimum message size for private information retrieval (PIR) from $N$ non-colluding databases, where each message is coded using maximum distance separable (MDS) codes, such that it can be recovered from accessing the contents of any $T$ databases. It is shown that the minimum message size (sometimes also referred to as the sub-packetization factor) is significantly, in fact exponentially, lower than previously believed. More precisely, when $K>T/\gcd(N,T)$, where $K$ is the total number of messages in the system and $\gcd(\cdot,\cdot)$ denotes the greatest common divisor, we establish, by providing both novel code constructions and a matching converse, the minimum message size as $\operatorname{lcm}(N-T,T)$, where $\operatorname{lcm}(\cdot,\cdot)$ denotes the least common multiple. On the other hand, when $K$ is small, we show that it is in fact possible to design codes with a message size even smaller than $\operatorname{lcm}(N-T,T)$.
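The threshold and the resulting message size from this abstract are easy to evaluate numerically. The sketch below is illustrative: the function name `min_message_size` and the parameter checks are assumptions, and for small $K$ it returns `None` because the abstract only says that smaller sizes are possible, without giving a formula.

```python
from math import gcd


def min_message_size(N: int, T: int, K: int):
    """Return lcm(N - T, T) when K > T / gcd(N, T), else None.

    N: number of databases, T: MDS recovery threshold, K: number of messages.
    The range check 0 < T < N is an assumption made for this sketch.
    """
    if not (0 < T < N) or K < 1:
        raise ValueError("expected 0 < T < N and K >= 1")
    if K * gcd(N, T) > T:  # integer form of K > T / gcd(N, T)
        a, b = N - T, T
        return a * b // gcd(a, b)  # least common multiple of N - T and T
    return None  # small-K regime: no closed form is given in the abstract


size_a = min_message_size(N=5, T=2, K=3)
size_b = min_message_size(N=6, T=4, K=3)
small_k = min_message_size(N=6, T=4, K=2)
```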

preprint · 2020 · arXiv

Compound Secure Groupcast: Key Assignment for Selected Broadcasting

The compound secure groupcast problem is considered, where the key variables at $K$ receivers are designed so that a transmitter can securely groupcast a message to any $N$ out of the $K$ receivers through a noiseless broadcast channel. The metric is the information theoretic tradeoff between key storage $α$, i.e., the number of bits of the key variable per message bit, and broadcast bandwidth $β$, i.e., the number of bits of the broadcast information per message bit. We have three main results. First, when broadcast bandwidth is minimized, i.e., when $β= 1$, we show that the minimum key storage is $α= N$. Second, when key storage is minimized, i.e., when $α= 1$, we show that broadcast bandwidth $β= \min(N, K-N+1)$ is achievable and is optimal (minimum) if $N=2$ or $K-1$. Third, when $N=2$, the optimal key storage and broadcast bandwidth tradeoff is characterized as $α+β\geq 3, α\geq 1, β\geq 1$.
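The three results in this abstract can be restated as simple checks. This is a sketch for illustration; the function names are hypothetical, and the region test applies only to the $N=2$ case, which is the only case the abstract characterizes in full.

```python
def min_key_storage_at_min_bandwidth(N: int) -> int:
    """First result: with bandwidth beta = 1, the minimum key storage is alpha = N."""
    return N


def achievable_bandwidth_at_min_key(N: int, K: int) -> int:
    """Second result: with alpha = 1, beta = min(N, K - N + 1) is achievable."""
    if not (1 <= N <= K):
        raise ValueError("expected 1 <= N <= K")
    return min(N, K - N + 1)


def in_tradeoff_region_n2(alpha: float, beta: float) -> bool:
    """Third result (N = 2): alpha + beta >= 3, alpha >= 1, beta >= 1."""
    return alpha + beta >= 3 and alpha >= 1 and beta >= 1


# Both extreme points for N = 2 lie on the boundary alpha + beta = 3.
corner_min_key = (1, achievable_bandwidth_at_min_key(N=2, K=5))
corner_min_bw = (min_key_storage_at_min_bandwidth(2), 1)
```

The two corner points agree with the $N=2$ region, which is a quick consistency check on the three statements.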

preprint · 2020 · arXiv

Conditional Disclosure of Secrets: A Noise and Signal Alignment Approach

In the conditional disclosure of secrets (CDS) problem, Alice and Bob (each holding an input and a common secret) wish to disclose, as efficiently as possible, the secret to Carol if and only if their inputs satisfy some function. The capacity of CDS is the maximum number of bits of the secret that can be securely disclosed per bit of total communication. We characterize the necessary and sufficient condition for the extreme case where the capacity of CDS is the highest and is equal to 1/2. For the simplest instance where the capacity is smaller than 1/2, we show that the linear capacity is 2/5.

preprint · 2020 · arXiv

On Optimal Load-Memory Tradeoff of Cache-Aided Scalar Linear Function Retrieval

Coded caching has the potential to greatly reduce network traffic by leveraging the cheap and abundant storage available in end-user devices so as to create multicast opportunities in the delivery phase. In the seminal work by Maddah-Ali and Niesen (MAN), the shared-link coded caching problem was formulated, where each user demands one file (i.e., single file retrieval). This paper generalizes the MAN problem so as to allow users to request scalar linear functions of the files. This paper proposes a novel coded delivery scheme that, based on MAN uncoded cache placement, is shown to allow for the decoding of arbitrary scalar linear functions of the files (over arbitrary finite fields). Interestingly, and quite surprisingly, it is shown that the load for cache-aided scalar linear function retrieval depends on the number of linearly independent functions that are demanded, akin to the cache-aided single-file retrieval problem where the load depends on the number of distinct file requests. The proposed scheme is optimal under the constraint of uncoded cache placement, in terms of worst-case load, and within a factor of 2 otherwise. The key idea of this paper can be extended to all scenarios to which the original MAN scheme has been extended, including demand-private and/or device-to-device settings.

preprint · 2020 · arXiv

Secure Groupcast with Shared Keys

We consider a transmitter and $K$ receivers, each of which shares a key variable with the transmitter. Through a noiseless broadcast channel, the transmitter wishes to send a common message $W$ securely to $N$ out of the $K$ receivers while the remaining $K-N$ receivers learn no information about $W$. We are interested in the maximum message rate, i.e., the maximum number of bits of $W$ that can be securely groupcast to the legitimate receivers per key block and the minimum broadcast bandwidth, i.e., the minimum number of bits of the broadcast information required to securely groupcast the message bits. We focus on the setting of combinatorial keys, where every subset of the $K$ receivers share an independent key of arbitrary size. Under this combinatorial key setting, the maximum message rate is characterized for the following scenarios - 1) $N=1$ or $N=K-1$, i.e., secure unicast to 1 receiver with $K-1$ eavesdroppers or secure groupcast to $K-1$ receivers with $1$ eavesdropper, 2) $N=2, K=4$, i.e., secure groupcast to $2$ out of 4 receivers, and 3) the symmetric setting where the key size for any subset of the same cardinality is equal for any $N,K$. Further, for the latter two cases, the minimum broadcast bandwidth for the maximum message rate is characterized.

preprint · 2020 · arXiv

Secure Groupcast: Extra-Entropic Structure and Linear Feasibility

In the secure groupcast problem, a transmitter wants to securely groupcast a message with the maximum rate to the first $N$ of $K$ receivers by broadcasting with the minimum bandwidth, where the $K$ receivers are each equipped with a key variable from a known joint distribution. Examples are provided to prove that different instances of secure groupcast that have the same entropic structure, i.e., the same entropy for all subsets of the key variables, can have different maximum groupcast rates and different minimum broadcast bandwidths. Thus, extra-entropic structure matters for secure groupcast. Next, the maximum groupcast rate is explored when the key variables are generic linear combinations of a basis set of independent key symbols, i.e., the keys lie in generic subspaces. The maximum groupcast rate is characterized when the dimension of each key subspace is either small or large, i.e., the extreme regimes. For the intermediate regime, various interference alignment schemes originating from wireless interference networks, such as eigenvector based and asymptotic schemes, are shown to be useful.

preprint · 2020 · arXiv

Structure, examples and classification for generalized near-group fusion categories

We describe the structure of a generalized near-group fusion category and present an example of this class of fusion categories which arises from the extension of a Fibonacci category. We then classify slightly degenerate generalized near-group fusion categories. We also prove a structure result for braided generalized Tambara-Yamagami fusion categories.