Source author record

Chunming Wu

Chunming Wu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Computer Vision Cryptography and Security Databases Human-Computer Interaction math.OC Networking and Internet Architecture

Catalog footprint

What is connected

5works

7topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2024arXiv

AdvSQLi: Generating Adversarial SQL Injections against Real-world WAF-as-a-service

As the first defensive layer that attacks would hit, the web application firewall (WAF) plays an indispensable role in defending against malicious web attacks like SQL injection (SQLi). With the development of cloud computing, WAF-as-a-service, as one kind of Security-as-a-service, has been proposed to facilitate the deployment, configuration, and update of WAFs in the cloud. Despite its tremendous popularity, the security vulnerabilities of WAF-as-a-service are still largely unknown, which is highly concerning given its massive usage. In this paper, we propose a general and extendable attack framework, namely AdvSQLi, in which a minimal series of transformations are performed on the hierarchical tree representation of the original SQLi payload, such that the generated SQLi payloads can not only bypass WAF-as-a-service under black-box settings but also keep the same functionality and maliciousness as the original payload. With AdvSQLi, we make it feasible to inspect and understand the security vulnerabilities of WAFs automatically, helping vendors make products more secure. To evaluate the attack effectiveness and efficiency of AdvSQLi, we first employ two public datasets to generate adversarial SQLi payloads, leading to a maximum attack success rate of 100% against state-of-the-art ML-based SQLi detectors. Furthermore, to demonstrate the immediate security threats caused by AdvSQLi, we evaluate the attack effectiveness against 7 WAF-as-a-service solutions from mainstream vendors and find all of them are vulnerable to AdvSQLi. For instance, AdvSQLi achieves an attack success rate of over 79% against the F5 WAF. Through in-depth analysis of the evaluation results, we further condense out several general yet severe flaws of these vendors that cannot be easily patched.

preprint2022arXiv

Towards the Desirable Decision Boundary by Moderate-Margin Adversarial Training

Adversarial training, as one of the most effective defense methods against adversarial attacks, tends to learn an inclusive decision boundary to increase the robustness of deep learning models. However, due to the large and unnecessary increase in the margin along adversarial directions, adversarial training causes heavy cross-over between natural examples and adversarial examples, which is not conducive to balancing the trade-off between robustness and natural accuracy. In this paper, we propose a novel adversarial training scheme to achieve a better trade-off between robustness and natural accuracy. It aims to learn a moderate-inclusive decision boundary, which means that the margins of natural examples under the decision boundary are moderate. We call this scheme Moderate-Margin Adversarial Training (MMAT), which generates finer-grained adversarial examples to mitigate the cross-over problem. We also take advantage of logits from a teacher model that has been well-trained to guide the learning of our model. Finally, MMAT achieves high natural accuracy and robustness under both black-box and white-box attacks. On SVHN, for example, state-of-the-art robustness and natural accuracy are achieved.

preprint2022arXiv

Treating Crowdsourcing as Examination: How to Score Tasks and Online Workers?

Crowdsourcing is an online outsourcing mode which can solve the current machine learning algorithm's urge need for massive labeled data. Requester posts tasks on crowdsourcing platforms, which employ online workers over the Internet to complete tasks, then aggregate and return results to requester. How to model the interaction between different types of workers and tasks is a hot spot. In this paper, we try to model workers as four types based on their ability: expert, normal worker, sloppy worker and spammer, and divide tasks into hard, medium and easy task according to their difficulty. We believe that even experts struggle with difficult tasks while sloppy workers can get easy tasks right, and spammers always give out wrong answers deliberately. So, good examination tasks should have moderate degree of difficulty and discriminability to score workers more objectively. Thus, we first score workers' ability mainly on the medium difficult tasks, then reducing the weight of answers from sloppy workers and modifying the answers from spammers when inferring the tasks' ground truth. A probability graph model is adopted to simulate the task execution process, and an iterative method is adopted to calculate and update the ground truth, the ability of workers and the difficulty of the task successively. We verify the rightness and effectiveness of our algorithm both in simulated and real crowdsourcing scenes.

preprint2021arXiv

Machine Learning based Malicious Payload Identification in Software-Defined Networking

Deep packet inspection (DPI) has been extensively investigated in software-defined networking (SDN) as complicated attacks may intractably inject malicious payloads in the packets. Existing proprietary pattern-based or port-based third-party DPI tools can suffer from limitations in efficiently processing a large volume of data traffic. In this paper, a novel OpenFlow-enabled deep packet inspection (OFDPI) approach is proposed based on the SDN paradigm to provide adaptive and efficient packet inspection. First, OFDPI prescribes an early detection at the flow-level granularity by checking the IP addresses of each new flow via OpenFlow protocols. Then, OFDPI allows for deep packet inspection at the packet-level granularity: (i) for unencrypted packets, OFDPI extracts the features of accessible payloads, including tri-gram frequency based on Term Frequency and Inverted Document Frequency (TF-IDF) and linguistic features. These features are concatenated into a sparse matrix representation and are then applied to train a binary classifier with logistic regression rather than matching with specific pattern combinations. In order to balance the detection accuracy and performance bottleneck of the SDN controller, OFDPI introduces an adaptive packet sampling window based on the linear prediction; and (ii) for encrypted packets, OFDPI extracts notable features of packets and then trains a binary classifier with a decision tree, instead of decrypting the encrypted traffic to weaken user privacy. A prototype of OFDPI is implemented on the Ryu SDN controller and the Mininet platform. The performance and the overhead of the proposed sulotion are assessed using the real-world datasets through experiments. The numerical results indicate that OFDPI can provide a significant improvement in detection accuracy with acceptable overheads.

preprint2020arXiv

Generalized Transformation-based Gradient

The reparameterization trick has become one of the most useful tools in the field of variational inference. However, the reparameterization trick is based on the standardization transformation which restricts the scope of application of this method to distributions that have tractable inverse cumulative distribution functions or are expressible as deterministic transformations of such distributions. In this paper, we generalized the reparameterization trick by allowing a general transformation. We discover that the proposed model is a special case of control variate indicating that the proposed model can combine the advantages of CV and generalized reparameterization.