Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
7works
0followers
12topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

7 published item(s)

preprint2024arXiv

Open Set Dandelion Network for IoT Intrusion Detection

As IoT devices become widely, it is crucial to protect them from malicious intrusions. However, the data scarcity of IoT limits the applicability of traditional intrusion detection methods, which are highly data-dependent. To address this, in this paper we propose the Open-Set Dandelion Network (OSDN) based on unsupervised heterogeneous domain adaptation in an open-set manner. The OSDN model performs intrusion knowledge transfer from the knowledge-rich source network intrusion domain to facilitate more accurate intrusion detection for the data-scarce target IoT intrusion domain. Under the open-set setting, it can also detect newly-emerged target domain intrusions that are not observed in the source domain. To achieve this, the OSDN model forms the source domain into a dandelion-like feature space in which each intrusion category is compactly grouped and different intrusion categories are separated, i.e., simultaneously emphasising inter-category separability and intra-category compactness. The dandelion-based target membership mechanism then forms the target dandelion. Then, the dandelion angular separation mechanism achieves better inter-category separability, and the dandelion embedding alignment mechanism further aligns both dandelions in a finer manner. To promote intra-category compactness, the discriminating sampled dandelion mechanism is used. Assisted by the intrusion classifier trained using both known and generated unknown intrusion knowledge, a semantic dandelion correction mechanism emphasises easily-confused categories and guides better inter-category separability. Holistically, these mechanisms form the OSDN model that effectively performs intrusion knowledge transfer to benefit IoT intrusion detection. Comprehensive experiments on several intrusion datasets verify the effectiveness of the OSDN model, outperforming three state-of-the-art baseline methods by 16.9%.

preprint2022arXiv

A self-contained and self-explanatory DNA storage system

Current research on DNA storage usually focuses on the improvement of storage density by developing effective encoding and decoding schemes while lacking the consideration on the uncertainty in ultra-long-term data storage and retention. Consequently, the current DNA storage systems are often not self-contained, implying that they have to resort to external tools for the restoration of the stored DNA data. This may result in high risks in data loss since the required tools might not be available due to the high uncertainty in far future. To address this issue, we propose in this paper a self-contained DNA storage system that can bring self-explanatory to its stored data without relying on any external tool. To this end, we design a specific DNA file format whereby a separate storage scheme is developed to reduce the data redundancy while an effective indexing is designed for random read operations to the stored data file. We verified through experimental data that the proposed self-contained and self-explanatory method can not only get rid of the reliance on external tools for data restoration but also minimise the data redundancy brought about when the amount of data to be stored reaches a certain scale.

preprint2022arXiv

Learning to Rank with Small Set of Ground Truth Data

Over the past decades, researchers had put lots of effort investigating ranking techniques used to rank query results retrieved during information retrieval, or to rank the recommended products in recommender systems. In this project, we aim to investigate searching, ranking, as well as recommendation techniques to help to realize a university academia searching platform. Unlike the usual information retrieval scenarios where lots of ground truth ranking data is present, in our case, we have only limited ground truth knowledge regarding the academia ranking. For instance, given some search queries, we only know a few researchers who are highly relevant and thus should be ranked at the top, and for some other search queries, we have no knowledge about which researcher should be ranked at the top at all. The limited amount of ground truth data makes some of the conventional ranking techniques and evaluation metrics become infeasible, and this is a huge challenge we faced during this project. This project enhances the user's academia searching experience to a large extent, it helps to achieve an academic searching platform which includes researchers, publications and fields of study information, which will be beneficial not only to the university faculties but also to students' research experiences.

preprint2022arXiv

MIX-RS: A Multi-indexing System based on HDFS for Remote Sensing Data Storage

A large volume of remote sensing (RS) data has been generated with the deployment of satellite technologies. The data facilitates research in ecological monitoring, land management and desertification, etc. The characteristics of RS data (e.g., enormous volume, large single-file size and demanding requirement of fault tolerance) make the Hadoop Distributed File System (HDFS) an ideal choice for RS data storage as it is efficient, scalable and equipped with a data replication mechanism for failure resilience. To use RS data, one of the most important techniques is geospatial indexing. However, the large data volume makes it time-consuming to efficiently construct and leverage. Considering that most modern geospatial data centres are equipped with HDFS-based big data processing infrastructures, deploying multiple geospatial indices becomes natural to optimise the efficacy. Moreover, because of the reliability introduced by high-quality hardware and the infrequently modified property of the RS data, the use of multi-indexing will not cause large overhead. Therefore, we design a framework called Multi-IndeXing-RS (MIX-RS) that unifies the multi-indexing mechanism on top of the HDFS with data replication enabled for both fault tolerance and geospatial indexing efficiency. Given the fault tolerance provided by the HDFS, RS data is structurally stored inside for faster geospatial indexing. Additionally, multi-indexing enhances efficiency. The proposed technique naturally sits on top of the HDFS to form a holistic framework without incurring severe overhead or sophisticated system implementation efforts. The MIX-RS framework is implemented and evaluated using real remote sensing data provided by the Chinese Academy of Sciences, demonstrating excellent geospatial indexing performance.

preprint2022arXiv

PackCache: An Online Cost-driven Data Caching Algorithm in the Cloud

In this paper, we study a data caching problem in the cloud environment, where multiple frequently co-utilised data items could be packed as a single item being transferred to serve a sequence of data requests dynamically with reduced cost. To this end, we propose an online algorithm with respect to a homogeneous cost model, called PackCache, that can leverage the FP-Tree technique to mine those frequently co-utilised data items for packing whereby the incoming requests could be cost-effectively served online by exploiting the concept of anticipatory caching. We show the algorithm is 2αcompetitive, reaching the lower bound of the competitive ratio for any deterministic online algorithm on the studied caching problem, and also time and space efficient to serve the requests. Finally, we evaluate the performance of the algorithm via experimental studies to show its actual cost-effectiveness and scalability.

preprint2022arXiv

PECCO: A Profit and Cost-oriented Computation Offloading Scheme in Edge-Cloud Environment with Improved Moth-flame Optimisation

With the fast growing quantity of data generated by smart devices and the exponential surge of processing demand in the Internet of Things (IoT) era, the resource-rich cloud centres have been utilised to tackle these challenges. To relieve the burden on cloud centres, edge-cloud computation offloading becomes a promising solution since shortening the proximity between the data source and the computation by offloading computation tasks from the cloud to edge devices can improve performance and Quality of Service (QoS). Several optimisation models of edge-cloud computation offloading have been proposed that take computation costs and heterogeneous communication costs into account. However, several important factors are not jointly considered, such as heterogeneities of tasks, load balancing among nodes and the profit yielded by computation tasks, which lead to the profit and cost-oriented computation offloading optimisation model PECCO proposed in this paper. Considering that the model is hard in nature and the optimisation objective is not differentiable, we propose an improved Moth-flame optimiser PECCO-MFI which addresses some deficiencies of the original Moth-flame Optimiser and integrate it under the edge-cloud environment. Comprehensive experiments are conducted to verify the superior performance of the proposed method when optimising the proposed task offloading model under the edge-cloud environment.

preprint2022arXiv

Towards Fast Theta-join: A Prefiltering and Amalgamated Partitioning Approach

As one of the most useful online processing techniques, the theta-join operation has been utilized by many applications to fully excavate the relationships between data streams in various scenarios. As such, constant research efforts have been put to optimize its performance in the distributed environment, which is typically characterized by reducing the number of Cartesian products as much as possible. In this article, we design and implement a novel fast theta-join algorithm, called Prefap, by developing two distinct techniques - prefiltering and amalgamated partitioning-based on the state-of-the-art FastThetaJoin algorithm to optimize the efficiency of the theta-join operation. Firstly, we develop a prefiltering strategy before data streams are partitioned to reduce the amount of data to be involved and benefit a more fine-grained partitioning. Secondly, to avoid the data streams being partitioned in a coarse-grained isolated manner and improve the quality of the partition-level filtering, we introduce an amalgamated partitioning mechanism that can amalgamate the partitioning boundaries of two data streams to assist a fine-grained partitioning. With the integration of these two techniques into the existing FastThetaJoin algorithm, we design and implement a new framework to achieve a decreased number of Cartesian products and a higher theta-join efficiency. By comparing with existing algorithms, FastThetaJoin in particular, we evaluate the performance of Prefap on both synthetic and real data streams from two-way to multiway theta-join to demonstrate its superiority.