Source author record

Chenggang Wu

Chenggang Wu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Distributed, Parallel, and Cluster Computing; Computational Complexity; Artificial Intelligence; Computer Science and Game Theory; Cryptography and Security; Databases; Machine Learning

Catalog footprint

What is connected

7 works

7 topics

4 close collaborators

Preprint, 2022, arXiv

FuncFooler: A Practical Black-box Attack Against Learning-based Binary Code Similarity Detection Methods

Binary code similarity detection (BCSD) measures the similarity of two binary executables. Recently, learning-based BCSD methods have achieved great success, outperforming traditional BCSD in detection accuracy and efficiency. However, existing studies on the adversarial vulnerability of learning-based BCSD methods are rather sparse, and this vulnerability poses hazards in security-related applications. To evaluate adversarial robustness, this paper designs an efficient black-box adversarial code generation algorithm, namely FuncFooler. FuncFooler constrains the adversarial code 1) to keep the program's control flow graph (CFG) unchanged, and 2) to preserve the same semantic meaning. Specifically, FuncFooler consecutively 1) determines vulnerable candidates in the malicious code, 2) chooses and inserts adversarial instructions from the benign code, and 3) corrects the semantic side effects of the adversarial code to meet the constraints. Empirically, FuncFooler can successfully attack three learning-based BCSD models, SAFE, Asm2Vec, and jTrans, which calls into question whether learning-based BCSD is desirable.

Preprint, 2020, arXiv

A Fault-Tolerance Shim for Serverless Computing

Serverless computing has grown in popularity in recent years, with an increasing number of applications being built on Functions-as-a-Service (FaaS) platforms. By default, FaaS platforms support retry-based fault tolerance, but this is insufficient for programs that modify shared state, as they can unwittingly persist partial sets of updates in case of failures. To address this challenge, we would like atomic visibility of the updates made by a FaaS application. In this paper, we present AFT, an atomic fault tolerance shim for serverless applications. AFT interposes between a commodity FaaS platform and storage engine and ensures atomic visibility of updates by enforcing the read atomic isolation guarantee. AFT supports new protocols to guarantee read atomic isolation in the serverless setting. We demonstrate that AFT introduces minimal overhead relative to existing storage engines and scales smoothly to thousands of requests per second, while preventing a significant number of consistency anomalies.

Preprint, 2020, arXiv

Cloudburst: Stateful Functions-as-a-Service

Function-as-a-Service (FaaS) platforms and "serverless" cloud computing are becoming increasingly popular. Current FaaS offerings are targeted at stateless functions that do minimal I/O and communication. We argue that the benefits of serverless computing can be extended to a broader range of applications and algorithms. We present the design and implementation of Cloudburst, a stateful FaaS platform that provides familiar Python programming with low-latency mutable state and communication, while maintaining the autoscaling benefits of serverless computing. Cloudburst accomplishes this by leveraging Anna, an autoscaling key-value store, for state sharing and overlay routing combined with mutable caches co-located with function executors for data locality. Performant cache consistency emerges as a key challenge in this architecture. To this end, Cloudburst provides a combination of lattice-encapsulated state and new definitions and protocols for distributed session consistency. Empirical results on benchmarks and diverse applications show that Cloudburst makes stateful functions practical, reducing the state-management overheads of current FaaS platforms by orders of magnitude while also improving the state of the art in serverless consistency.
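As a rough illustration of what "lattice-encapsulated state" can mean, the sketch below defines a last-writer-wins register whose merge is associative, commutative, and idempotent, so replicas converge regardless of the order in which updates arrive. The class name `LWWRegister` and its fields are hypothetical and are not taken from Cloudburst or Anna.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LWWRegister:
    """Hypothetical last-writer-wins register; not Cloudburst's or Anna's API."""

    timestamp: int
    value: str

    def merge(self, other: "LWWRegister") -> "LWWRegister":
        # Keep the entry with the larger (timestamp, value) pair. Breaking
        # timestamp ties on the value keeps merge commutative.
        mine = (self.timestamp, self.value)
        theirs = (other.timestamp, other.value)
        return self if mine >= theirs else other


a = LWWRegister(1, "draft")
b = LWWRegister(2, "final")
c = LWWRegister(2, "alt")

# Merging in any order or grouping yields the same register.
merged = a.merge(b).merge(c)
```

Because merge only ever moves "upward" in the ordering, applying the same update twice or out of order cannot change the final state, which is the property a cache needs in order to reconcile without coordination.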

Preprint, 2020, arXiv

Optimizing Prediction Serving on Low-Latency Serverless Dataflow

Prediction serving systems are designed to provide large volumes of low-latency inferences from machine learning models. These systems mix data processing and computationally intensive model inference and benefit from multiple heterogeneous processors and distributed computing resources. In this paper, we argue that a familiar dataflow API is well-suited to this latency-sensitive task, and amenable to optimization even with unmodified black-box ML models. We present the design of Cloudflow, a system that provides this API and realizes it on an autoscaling serverless backend. Cloudflow transparently implements performance-critical optimizations including operator fusion and competitive execution. Our evaluation shows that Cloudflow's optimizations yield significant performance improvements on synthetic workloads and that Cloudflow outperforms state-of-the-art prediction serving systems by as much as 2x on real-world prediction pipelines, meeting latency goals of demanding applications like real-time video analysis.
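The two optimizations named in this abstract can be sketched generically, under the assumption that a pipeline is a chain of single-argument Python functions. Here `fuse` collapses the chain into one callable (so intermediate results never leave the process), and `compete` launches several replicas of a call and returns the first to finish. All names (`fuse`, `compete`, `scale`, `mean`, `label`) are hypothetical and do not reflect Cloudflow's actual API.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from functools import reduce


def fuse(*stages):
    """Fuse a chain of operators into one callable applied left to right."""
    return lambda value: reduce(lambda acc, stage: stage(acc), stages, value)


def compete(fn, value, replicas=3):
    """Run several copies of fn(value) and return whichever finishes first."""
    pool = ThreadPoolExecutor(max_workers=replicas)
    futures = [pool.submit(fn, value) for _ in range(replicas)]
    done, _pending = wait(futures, return_when=FIRST_COMPLETED)
    pool.shutdown(wait=False)  # do not block on the slower replicas
    return next(iter(done)).result()


# Hypothetical three-stage pipeline standing in for preprocessing and inference.
def scale(pixels):
    return [p / 255 for p in pixels]


def mean(values):
    return sum(values) / len(values)


def label(score):
    return "bright" if score > 0.5 else "dark"


pipeline = fuse(scale, mean, label)
result = compete(pipeline, [255, 255, 0, 255])
```

In a real serving system the replicas would run on separate executors so that a slow machine does not set the latency; threads are used here only to keep the sketch self-contained.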

Preprint, 2016, arXiv

Approximation of barter exchanges with cycle length constraints

We explore the clearing problem in the barter exchange market. The problem, described in the terminology of graph theory, is to find a set of vertex-disjoint, length-restricted cycles that maximizes the total weight in a weighted digraph. The problem has previously been shown to be NP-hard. We advance the understanding of this problem with the following contributions. We prove three constant inapproximability results for this problem. For weighted graphs, we prove that it is NP-hard to approximate the clearing problem within a factor of 14/13 under general length constraints and within a factor of 434/433 when the cycle length is at most 3. For unweighted graphs, we prove that this problem is NP-hard to approximate within a factor of 698/697. For unweighted graphs with cycle length at most 3, we design and implement two simple and practical algorithms. Experiments on simulated data suggest that these algorithms yield excellent performance.
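To make the clearing problem concrete, the following sketch solves a tiny instance exhaustively: it enumerates every directed cycle with at most `max_len` vertices and then searches for the vertex-disjoint subset of maximum total weight. This is a brute-force illustration that runs in exponential time, not one of the paper's algorithms; the function names and the example graph are assumptions made for this example.

```python
from itertools import permutations


def short_cycles(weights, max_len):
    """List simple directed cycles with 2..max_len vertices and their weights.

    `weights` maps a directed edge (u, v) to its weight. Each cycle appears
    once, written as a tuple that starts at its smallest vertex.
    """
    vertices = sorted({v for edge in weights for v in edge})
    cycles = []
    for length in range(2, max_len + 1):
        for perm in permutations(vertices, length):
            if perm[0] != min(perm):
                continue  # skip rotations of a cycle already considered
            edges = list(zip(perm, perm[1:] + perm[:1]))
            if all(edge in weights for edge in edges):
                cycles.append((perm, sum(weights[edge] for edge in edges)))
    return cycles


def best_clearing(weights, max_len):
    """Return (total weight, cycles) of a best vertex-disjoint cycle set."""
    cycles = short_cycles(weights, max_len)

    def search(index, used):
        if index == len(cycles):
            return 0, []
        # Option 1: leave this cycle out.
        best_weight, best_set = search(index + 1, used)
        cycle, weight = cycles[index]
        # Option 2: take it, if it shares no vertex with earlier choices.
        if used.isdisjoint(cycle):
            sub_weight, sub_set = search(index + 1, used | set(cycle))
            if weight + sub_weight > best_weight:
                best_weight, best_set = weight + sub_weight, [cycle] + sub_set
        return best_weight, best_set

    return search(0, frozenset())


# Example: two 2-cycles (0,1) and (2,3) compete with the 3-cycle (1,2,3).
unit = {(0, 1): 1, (1, 0): 1, (1, 2): 1, (2, 3): 1, (3, 1): 1, (3, 2): 1}
total, chosen = best_clearing(unit, max_len=3)
```

With unit weights the two 2-cycles together cover four vertices and beat the single 3-cycle; raising the weight of edge (3, 1) enough flips the choice, which shows why a greedy pick of the heaviest cycle is not always optimal.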

Preprint, 2014, arXiv

Hardness of robust graph isomorphism, Lasserre gaps, and asymmetry of random graphs

Building on work of Cai, Fürer, and Immerman [CFI92], we show two hardness results for the Graph Isomorphism problem. First, we show that there are pairs of nonisomorphic $n$-vertex graphs $G$ and $H$ such that any sum-of-squares (SOS) proof of nonisomorphism requires degree $\Omega(n)$. In other words, we show an $\Omega(n)$-round integrality gap for the Lasserre SDP relaxation. In fact, we show this for pairs $G$ and $H$ which are not even $(1-10^{-14})$-isomorphic. (Here we say that two $n$-vertex, $m$-edge graphs $G$ and $H$ are $\alpha$-isomorphic if there is a bijection between their vertices which preserves at least $\alpha m$ edges.) Our second result is that under the R3XOR Hypothesis [Fei02] (and also any of a class of hypotheses which generalize the R3XOR Hypothesis), the robust Graph Isomorphism problem is hard. That is, for every $\epsilon > 0$, there is no efficient algorithm which can distinguish graph pairs which are $(1-\epsilon)$-isomorphic from pairs which are not even $(1-\epsilon_0)$-isomorphic for some universal constant $\epsilon_0$. Along the way we prove a robust asymmetry result for random graphs and hypergraphs which may be of independent interest.
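The notion of $\alpha$-isomorphism defined in this abstract can be checked directly on very small graphs. The sketch below tries every bijection between the vertex sets and reports the largest fraction of edges of $G$ that land on edges of $H$; two graphs are then $\alpha$-isomorphic for every $\alpha$ up to that fraction. It is a factorial-time illustration only, and the function name `best_edge_agreement` and the example graphs are assumptions made for this example.

```python
from itertools import permutations


def best_edge_agreement(n, edges_g, edges_h):
    """Largest fraction of G's edges that some bijection maps onto edges of H.

    Both graphs are undirected, have vertices 0..n-1, and are assumed to have
    the same number of edges, as in the definition of alpha-isomorphism.
    """
    g = {frozenset(edge) for edge in edges_g}
    h = {frozenset(edge) for edge in edges_h}
    best = 0
    for perm in permutations(range(n)):
        # perm[v] is the image of vertex v under this bijection.
        preserved = sum(1 for edge in g if frozenset(perm[v] for v in edge) in h)
        best = max(best, preserved)
    return best / len(g)


# A path and a star on four vertices, three edges each.
path = [(0, 1), (1, 2), (2, 3)]
star = [(0, 1), (0, 2), (0, 3)]
agreement = best_edge_agreement(4, path, star)
```

Here the best bijection sends an interior vertex of the path to the center of the star and preserves two of the three edges, so the pair is $\alpha$-isomorphic exactly when $\alpha \le 2/3$.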

Preprint, 2013, arXiv

Decision Trees, Protocols, and the Fourier Entropy-Influence Conjecture

Given $f:\{-1, 1\}^n \rightarrow \{-1, 1\}$, define the spectral distribution of $f$ to be the distribution on subsets of $[n]$ in which the set $S$ is sampled with probability $\widehat{f}(S)^2$. Then the Fourier Entropy-Influence (FEI) conjecture of Friedgut and Kalai (1996) states that there is some absolute constant $C$ such that $\operatorname{H}[\widehat{f}^2] \leq C\cdot\operatorname{Inf}[f]$. Here, $\operatorname{H}[\widehat{f}^2]$ denotes the Shannon entropy of $f$'s spectral distribution, and $\operatorname{Inf}[f]$ is the total influence of $f$. This conjecture is one of the major open problems in the analysis of Boolean functions, and settling it would have several interesting consequences. Previous results on the FEI conjecture have been largely through direct calculation. In this paper we study a natural interpretation of the conjecture, which states that there exists a communication protocol which, given a subset $S$ of $[n]$ distributed as $\widehat{f}^2$, can communicate the value of $S$ using at most $C\cdot\operatorname{Inf}[f]$ bits in expectation. Using this interpretation, we are able to show the following results: 1. First, if $f$ is computable by a read-$k$ decision tree, then $\operatorname{H}[\widehat{f}^2] \leq 9k\cdot \operatorname{Inf}[f]$. 2. Next, if $f$ has $\operatorname{Inf}[f] \geq 1$ and is computable by a decision tree with expected depth $d$, then $\operatorname{H}[\widehat{f}^2] \leq 12d\cdot \operatorname{Inf}[f]$. 3. Finally, we give a new proof of the main theorem of O'Donnell and Tan (ICALP 2013), i.e. that their FEI$^+$ conjecture composes. In addition, we show that natural improvements to our decision tree results would be sufficient to prove the FEI conjecture in its entirety. We believe that our methods give more illuminating proofs than previous results about the FEI conjecture.
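The two quantities compared by the FEI conjecture can be computed by brute force for small $n$. The sketch below evaluates every Fourier coefficient $\widehat{f}(S)$ as the average of $f(x)\prod_{i \in S} x_i$ over all inputs, then computes the entropy of the spectral distribution and the total influence. Two assumptions are made here that the abstract does not state: entropy is measured in bits (base-2 logarithm), and total influence is computed through the standard identity $\operatorname{Inf}[f] = \sum_S |S|\,\widehat{f}(S)^2$. The function names and the choice of 3-bit majority as the example are illustrative.

```python
from itertools import combinations, product
from math import log2, prod


def fourier_coefficients(f, n):
    """Map each subset S of {0..n-1} (as a tuple) to the coefficient of f on S."""
    points = list(product([-1, 1], repeat=n))
    coeffs = {}
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            # Average of f(x) times the parity character on `subset`.
            total = sum(f(x) * prod(x[i] for i in subset) for x in points)
            coeffs[subset] = total / len(points)
    return coeffs


def spectral_entropy(coeffs):
    """Shannon entropy (in bits) of the distribution S -> coefficient squared."""
    return sum(-(c * c) * log2(c * c) for c in coeffs.values() if c != 0)


def total_influence(coeffs):
    """Total influence via the identity Inf[f] = sum over S of |S| * coeff^2."""
    return sum(len(subset) * c * c for subset, c in coeffs.items())


def majority3(x):
    return 1 if sum(x) > 0 else -1


coeffs = fourier_coefficients(majority3, 3)
entropy = spectral_entropy(coeffs)
influence = total_influence(coeffs)
```

For 3-bit majority the spectral weight is split evenly over the three singletons and the full set, giving an entropy of 2 bits against a total influence of 1.5, so the ratio $\operatorname{H}[\widehat{f}^2] / \operatorname{Inf}[f]$ is 4/3 for this function.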
