Source author record

Yu Huang

Yu Huang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Catalog footprint

What is connected

57 works

34 topics

4 close collaborators


preprint, 2026, arXiv

SAIL: Structure-Aware Interpretable Learning for Anatomy-Aligned Post-hoc Explanations in OCT

Optical coherence tomography (OCT), a commonly used retinal imaging modality, plays a central role in retinal disease diagnosis by providing high-resolution visualization of retinal layers. While deep learning (DL) has achieved expert-level accuracy in OCT-based retinal disease detection, its "black box" nature poses challenges for clinical adoption, where explainability is essential for clinical trust and regulatory approval. Existing post-hoc explainable AI (XAI) methods often struggle to delineate fine-grained lesion structures, respect anatomical boundaries, or suppress noise, limiting the trustworthiness of their explanations. To bridge these gaps, we propose a Structure-Aware Interpretable Learning (SAIL) framework that integrates retinal anatomical priors at the representation level and couples them with semantic features via a fusion design. Without modifying standard post-hoc explainability methods, this representation yields sharper and more anatomically aligned attribution maps. Comprehensive experiments on diverse OCT datasets demonstrate that our structure-aware method consistently enhances interpretability, producing clinically meaningful and anatomy-aware explanations. Ablation studies further show that strong interpretability requires both structural priors and semantic features, and that properly fusing the two is critical to achieve the best explanation quality. Together, these results highlight structure-aware representations as a key step toward reliable explainability in OCT.

preprint, 2024, arXiv

Applications of Large Scale Foundation Models for Autonomous Driving

Since the DARPA Grand Challenges (rural) in 2004/05 and the Urban Challenge in 2007, autonomous driving has been the most active field of AI applications. Recently, chat systems powered by large language models (LLMs), such as ChatGPT and PaLM, have emerged and rapidly become a promising direction toward artificial general intelligence (AGI) in natural language processing (NLP). It is natural to ask whether these abilities could be used to reformulate autonomous driving. By combining LLMs with foundation models, it is possible to use human knowledge, common sense and reasoning to rebuild autonomous driving systems and escape the current long-tailed AI dilemma. In this paper, we investigate the techniques of foundation models and LLMs applied to autonomous driving, categorized as simulation, world models, data annotation, and planning or end-to-end (E2E) solutions.

preprint, 2022, arXiv

A Deep-Learning Usability Expansion Model of Ocean Observations

Today's ocean numerical prediction skills depend on the availability of in-situ and remote ocean observations at the time of the predictions only. Because observations are scarce and discontinuous in time and space, numerical models are often unable to accurately model and predict real ocean dynamics, leading to a lack of fulfillment of a range of services that require reliable predictions at various temporal and spatial scales. The process of constraining free numerical models with observations is known as data assimilation. The primary objective is to minimize the misfit of model states with the observations while respecting the rules of physics. The caveat of this approach is that measurements are used only once, at the time of the prediction. The information contained in the history of the measurements and its role in the determinism of the prediction is, therefore, not accounted for. Consequently, historical measurements cannot be used in real-time forecasting systems. The research presented in this paper provides a novel approach rooted in artificial intelligence to expand the usability of observations made before the time of the prediction. Our approach repurposes an existing deep learning model, called U-Net, designed specifically for image segmentation analysis in the biomedical field. U-Net is used here to create a Transform Model that retains the temporal and spatial evolution of the differences between model and observations to produce a correction in the form of regression weights that evolves spatially and temporally with the model both forward and backward in time, beyond the observation period. Using virtual observations, we show that the usability of the observations can be extended up to one year before or after the observations.

preprint, 2022, arXiv

A Multi-Characteristic Learning Method with Micro-Doppler Signatures for Pedestrian Identification

The identification of pedestrians using radar micro-Doppler signatures has become a hot topic in recent years. In this paper, we propose a multi-characteristic learning (MCL) model with clusters to jointly learn discrepant pedestrian micro-Doppler signatures and fuse the knowledge learned from each cluster into final decisions. Time-Doppler spectrogram (TDS) and signal statistical features extracted from FMCW radar, as two categories of micro-Doppler signatures, are used in MCL to learn the micro-motion information inside pedestrians' free walking patterns. The experimental results show that our model achieves a higher accuracy rate and is more stable for pedestrian identification than other studies, which makes our model more practical.

preprint, 2022, arXiv

BSODA: A Bipartite Scalable Framework for Online Disease Diagnosis

A growing number of people are seeking healthcare advice online. Usually, they diagnose their medical conditions based on the symptoms they are experiencing, which is also known as self-diagnosis. From the machine learning perspective, online disease diagnosis is a sequential feature (symptom) selection and classification problem. Reinforcement learning (RL) methods are the standard approaches to this type of task. Generally, they perform well when the feature space is small, but frequently become inefficient in tasks with a large number of features, such as self-diagnosis. To address the challenge, we propose a non-RL Bipartite Scalable framework for Online Disease diAgnosis, called BSODA. BSODA is composed of two cooperative branches that handle symptom-inquiry and disease-diagnosis, respectively. The inquiry branch determines which symptom to collect next by an information-theoretic reward. We employ a Product-of-Experts encoder to significantly improve the handling of partial observations of a large number of features. Besides, we propose several approximation methods to substantially reduce the computational cost of the reward to a level that is acceptable for online services. Additionally, we leverage the diagnosis model to estimate the reward more precisely. For the diagnosis branch, we use a knowledge-guided self-attention model to perform predictions. In particular, BSODA determines when to stop inquiry and output predictions using both the inquiry and diagnosis models. We demonstrate that BSODA outperforms the state-of-the-art methods on several public datasets. Moreover, we propose a novel evaluation method to test the transferability of symptom checking methods from synthetic to real-world tasks. Compared to existing RL baselines, BSODA scales more effectively to large search spaces.

preprint, 2022, arXiv

Compositional Model Checking of Consensus Protocols Specified in TLA+ via Interaction-Preserving Abstraction

Consensus protocols are widely used in building reliable distributed software systems, and their correctness is of vital importance. TLA+ is a lightweight formal specification language which enables precise specification of system design and exhaustive checking of the design without any human effort. The features of TLA+ make it widely used in the specification and model checking of consensus protocols, both in academia and industry. However, the application of TLA+ is limited by the state explosion problem in model checking. Though compositional model checking is essential to tame the state explosion problem, existing compositional checking techniques do not sufficiently consider the characteristics of TLA+. In this work, we propose the Interaction-Preserving Abstraction (IPA) framework, which leverages the features of TLA+ and enables practical and efficient compositional model checking of consensus protocols specified in TLA+. In the IPA framework, the system specification is partitioned into multiple modules, and each module is divided into an internal part and an interaction part. The basic idea of the interaction-preserving abstraction is to omit the internal part of each module, such that another module cannot distinguish whether it is interacting with the original module or the coarsened abstract one. We apply the IPA framework to the compositional checking of the TLA+ specifications of two consensus protocols, Raft and ParallelRaft. Raft is a consensus protocol which was originally developed in academia and then widely used in industry. ParallelRaft is the replication protocol in PolarFS, the distributed file system for the commercial database Alibaba PolarDB. We demonstrate that the IPA framework is easy to use in realistic scenarios and at the same time significantly reduces the model checking cost.

preprint, 2022, arXiv

DeepGate: Learning Neural Representations of Logic Gates

Applying deep learning (DL) techniques in the electronic design automation (EDA) field has become a trending topic. Most solutions apply well-developed DL models to solve specific EDA problems. While demonstrating promising results, they require careful model tuning for every problem. The fundamental question of how to obtain a general and effective neural representation of circuits has not yet been answered. In this work, we take the first step towards solving this problem. We propose DeepGate, a novel representation learning solution that effectively embeds both logic function and structural information of a circuit as vectors on each gate. Specifically, we propose transforming circuits into a unified and-inverter graph (AIG) format for learning and using signal probabilities as the supervision task in DeepGate. We then introduce a novel graph neural network that uses strong inductive biases in practical circuits as learning priors for signal probability prediction. Our experimental results show the efficacy and generalization capability of DeepGate.
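To make the supervision target concrete, the sketch below computes the signal probability of every node in a tiny and-inverter graph, meaning the fraction of input assignments under which the node evaluates to 1. It is an illustration only and not DeepGate's pipeline: the function name `signal_probabilities`, the gate encoding, and the use of exhaustive enumeration (feasible only for a handful of inputs) are assumptions made here.

```python
from itertools import product


def signal_probabilities(num_inputs, and_gates):
    """Exact signal probabilities for a small and-inverter graph.

    Hypothetical encoding (not from the paper): nodes 0..num_inputs-1 are
    primary inputs; each entry of and_gates is ((src_a, inv_a), (src_b, inv_b))
    and defines the next node as AND of its two fanins, each optionally
    inverted. Gates must be listed in topological order.
    """
    n_nodes = num_inputs + len(and_gates)
    ones = [0] * n_nodes
    # Enumerate every input assignment; inputs are assumed uniform and independent.
    for bits in product((0, 1), repeat=num_inputs):
        values = list(bits)
        for (a, inv_a), (b, inv_b) in and_gates:
            va = values[a] ^ int(inv_a)
            vb = values[b] ^ int(inv_b)
            values.append(va & vb)
        for i, v in enumerate(values):
            ones[i] += v
    total = 2 ** num_inputs
    return [count / total for count in ones]


# XOR of two inputs built from three AND gates with inverted fanins.
xor_gates = [
    ((0, False), (1, False)),  # node 2 = x0 AND x1
    ((0, True), (1, True)),    # node 3 = NOT x0 AND NOT x1
    ((2, True), (3, True)),    # node 4 = NOT node2 AND NOT node3 = x0 XOR x1
]
probs = signal_probabilities(2, xor_gates)
```

For realistic circuit sizes, exhaustive enumeration is infeasible and such probabilities would have to be estimated, for example by simulating random input patterns.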

preprint, 2022, arXiv

DeepTPI: Test Point Insertion with Deep Reinforcement Learning

Test point insertion (TPI) is a widely used technique for testability enhancement, especially for logic built-in self-test (LBIST) due to its relatively low fault coverage. In this paper, we propose a novel TPI approach based on deep reinforcement learning (DRL), named DeepTPI. Unlike previous learning-based solutions that formulate the TPI task as a supervised-learning problem, we train a novel DRL agent, instantiated as the combination of a graph neural network (GNN) and a Deep Q-Learning network (DQN), to maximize the test coverage improvement. Specifically, we model circuits as directed graphs and design a graph-based value network to estimate the action values for inserting different test points. The policy of the DRL agent is defined as selecting the action with the maximum value. Moreover, we apply the general node embeddings from a pre-trained model to enhance node features, and propose a dedicated testability-aware attention mechanism for the value network. Experimental results on circuits with various scales show that DeepTPI significantly improves test coverage compared to the commercial DFT tool. The code of this work is available at https://github.com/cure-lab/DeepTPI.

preprint, 2022, arXiv

Detection Interval for Diffusion Molecular Communication: How Long is Enough?

Molecular communication has a key role to play in future medical applications, including detecting, analyzing, and addressing infectious disease outbreaks. Overcoming inter-symbol interference (ISI) is one of the key challenges in the design of molecular communication systems. In this paper, we propose to optimize the detection interval to minimize the impact of ISI while ensuring the accurate detection of the transmitted information symbol, which is suitable for the absorbing and passive receivers. For tractability, based on the signal-to-interference difference (SID) and signal-to-interference-and-noise amplitude ratio (SINAR), we propose a modified-SINAR (mSINAR) to measure the bit error rate (BER) performance for the molecular communication system with a variable detection interval. Besides, we derive the optimal detection interval in closed form. Using simulation results, we show that the BER performance of our proposed mSINAR scheme is superior to the competing schemes, and achieves similar performance to optimal intervals found by the exhaustive search.

preprint, 2022, arXiv

Invariance entropy for uncertain control systems

We introduce a notion of invariance entropy for uncertain control systems, which is, roughly speaking, the exponential growth rate of "branches" of "trees" that are formed by controls and are necessary to achieve invariance of controlled invariant subsets of the state space. This entropy extends the invariance entropy for deterministic control systems introduced by Colonius and Kawan (2009). We show that invariance feedback entropy, proposed by Tomar, Rungger, and Zamani (2020), is bounded from below by our invariance entropy. We generalize the formula for the calculation of entropy of invariant partitions obtained by Tomar, Kawan, and Zamani (2020) to quasi-invariant-partitions. Moreover, we also derive lower and upper bounds for entropy of a quasi-invariant-partition by spectral radii of its adjacency matrix and weighted adjacency matrix. With some reasonable assumptions, we obtain explicit formulas for computing invariance entropy for uncertain control systems and invariance feedback entropy for finite controlled invariant sets.

preprint, 2022, arXiv

LHNN: Lattice Hypergraph Neural Network for VLSI Congestion Prediction

Precise congestion prediction from a placement solution plays a crucial role in circuit placement. This work proposes the lattice hypergraph (LH-graph), a novel graph formulation for circuits, which preserves netlist data during the whole learning process and enables congestion information to be propagated geometrically and topologically. Based on this formulation, we further develop a heterogeneous graph neural network architecture, LHNN, which jointly performs routing demand regression to support congestion spot classification. LHNN consistently achieves more than 35% improvement in F1 score compared with U-Nets and Pix2Pix. We expect our work to highlight essential procedures for using machine learning in congestion prediction.

preprint, 2022, arXiv

Low Complexity First: Duration-Centric ISI Mitigation in Molecular Communication via Diffusion

In this paper, we propose a novel inter-symbol interference (ISI) mitigation scheme for molecular communication via diffusion (MCvD) systems with the optimal detection interval. Its rationale is to exploit the discarded duration (i.e., the symbol duration outside this optimal interval) to relieve ISI in the target system. Following this idea, we formulate an objective function to quantify the impact of the discarded time on bit error rate (BER) performance. Besides, an optimally reusable interval within the discarded duration is derived in closed form, which applies to both the absorbing and passive receivers. Finally, numerical results validate our analysis and show that for the considered MCvD system, significant BER improvements can be achieved by using the derived reusable duration.

preprint, 2022, arXiv

MET: Model Checking-Driven Explorative Testing of CRDT Designs and Implementations

Internet-scale distributed systems often replicate data at multiple geographic locations to provide low latency and high availability. The Conflict-free Replicated Data Type (CRDT) is a framework that provides a principled approach to maintaining eventual consistency among data replicas. CRDTs have been notoriously difficult to design and implement correctly. Subtle deep bugs lie in the complex and tedious handling of all possible cases of conflicting data updates. We argue that the CRDT design should be formally specified and model-checked to uncover deep bugs. The implementation further needs to be systematically tested. On the one hand, the testing needs to inherit the exhaustive nature of model checking and ensure test coverage. On the other hand, the testing is expected to find coding errors which cannot be detected by design-level verification. To address the challenges above, we propose the Model Checking-driven Explorative Testing (MET) framework. At the design level, MET uses TLA+ to specify and model check CRDT designs. At the implementation level, MET conducts model checking-driven explorative testing, in the sense that the test cases are automatically generated from the model checking traces. The system execution is controlled to proceed deterministically, following the model checking trace. The explorative testing systematically controls and permutes all nondeterministic message reorderings. We apply MET in our practical development of CRDTs. Bugs are found in both the designs and the implementations of CRDTs. For bugs which can be found by traditional testing techniques, MET greatly reduces the cost of fixing them. Moreover, MET can find subtle deep bugs which cannot be found by existing techniques at a reasonable cost. We further discuss how MET provides us with sufficient confidence in the correctness of our CRDT designs and implementations.

preprint, 2022, arXiv

Modality Competition: What Makes Joint Training of Multi-modal Network Fail in Deep Learning? (Provably)

Despite the remarkable success of deep multi-modal learning in practice, it has not been well-explained in theory. Recently, it has been observed that the best uni-modal network outperforms the jointly trained multi-modal network, which is counter-intuitive since multiple signals generally bring more information. This work provides a theoretical explanation for the emergence of such a performance gap in neural networks for the prevalent joint training framework. Based on a simplified data distribution that captures the realistic property of multi-modal data, we prove that for the multi-modal late-fusion network with (smoothed) ReLU activation trained jointly by gradient descent, different modalities will compete with each other. The encoder networks will learn only a subset of modalities. We refer to this phenomenon as modality competition. The losing modalities, which fail to be discovered, are the origin of the sub-optimality of joint training. Experimentally, we illustrate that modality competition matches the intrinsic behavior of late-fusion joint training.

preprint, 2022, arXiv

Provable Generalization of Overparameterized Meta-learning Trained with SGD

Despite the superior empirical success of deep meta-learning, theoretical understanding of overparameterized meta-learning is still limited. This paper studies the generalization of a widely used meta-learning approach, Model-Agnostic Meta-Learning (MAML), which aims to find a good initialization for fast adaptation to new tasks. Under a mixed linear regression model, we analyze the generalization properties of MAML trained with SGD in the overparameterized regime. We provide both upper and lower bounds for the excess risk of MAML, which captures how SGD dynamics affect these generalization bounds. With such sharp characterizations, we further explore how various learning parameters impact the generalization capability of overparameterized MAML, including explicitly identifying typical data and task distributions that can achieve diminishing generalization error with overparameterization, and characterizing the impact of adaptation learning rate on both excess risk and the early stopping time. Our theoretical findings are further validated by experiments.

preprint, 2022, arXiv

Radiologist-level Performance by Using Deep Learning for Segmentation of Breast Cancers on MRI Scans

Purpose: To develop a deep network architecture that would achieve fully automated radiologist-level segmentation of cancers at breast MRI. Materials and Methods: In this retrospective study, 38229 examinations (composed of 64063 individual breast scans from 14475 patients) were performed in female patients (age range, 12-94 years; mean age, 52 years +/- 10 [standard deviation]) who presented between 2002 and 2014 at a single clinical site. A total of 2555 breast cancers were selected that had been segmented on two-dimensional (2D) images by radiologists, as well as 60108 benign breasts that served as examples of noncancerous tissue; all these were used for model training. For testing, an additional 250 breast cancers were segmented independently on 2D images by four radiologists. Authors selected among several three-dimensional (3D) deep convolutional neural network architectures, input modalities, and harmonization methods. The outcome measure was the Dice score for 2D segmentation, which was compared between the network and radiologists by using the Wilcoxon signed rank test and the two one-sided test procedure. Results: The highest-performing network on the training set was a 3D U-Net with dynamic contrast-enhanced MRI as input and with intensity normalized for each examination. In the test set, the median Dice score of this network was 0.77 (interquartile range, 0.26). The performance of the network was equivalent to that of the radiologists (two one-sided test procedures with radiologist performance of 0.69-0.84 as equivalence bounds, P <= .001 for both; n = 250). Conclusion: When trained on a sufficiently large dataset, the developed 3D U-Net performed as well as fellowship-trained radiologists in detailed 2D segmentation of breast cancers at routine clinical MRI.
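The outcome measure named above, the Dice score, compares two binary segmentations as twice the overlap divided by the sum of the two mask sizes. The following is a minimal sketch of that standard formula, not the study's evaluation code; the function name `dice_score`, the flat 0/1 mask representation, and the convention of returning 1.0 when both masks are empty are assumptions made here.

```python
def dice_score(pred, truth):
    """Dice coefficient 2|P ∩ T| / (|P| + |T|) for two binary masks.

    Masks are equal-length flat sequences of 0/1 values (a 2D image would be
    flattened first). Returning 1.0 for two empty masks is a convention
    chosen for this sketch.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    size_sum = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if size_sum == 0:
        return 1.0
    return 2.0 * overlap / size_sum


# Two 2x2 masks, flattened: they share one foreground pixel out of two each.
example = dice_score([1, 1, 0, 0], [1, 0, 1, 0])
```

A score of 1.0 means the two masks agree exactly, and 0.0 means they share no foreground pixels.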

preprint, 2022, arXiv

Remove-Win: a Design Framework for Conflict-free Replicated Data Types

Distributed storage systems employ replication to improve performance and reliability. To provide low latency data access, replicas are often required to accept updates without coordination with each other, and the updates are then propagated asynchronously. This brings the critical challenge of conflict resolution among concurrent updates. Conflict-free Replicated Data Type (CRDT) is a principled approach to addressing this challenge. However, existing CRDT designs are tricky and hard to generalize to other data types. A design framework is greatly needed to guide the systematic design of new CRDTs. To address this challenge, we propose RWF -- the Remove-Win design Framework for CRDTs. RWF leverages the simple but powerful remove-win strategy to resolve conflicting updates, and provides a generic design for a variety of data container types. Two exemplar implementations following RWF are given over the Redis data type store, which demonstrate the effectiveness of RWF. Performance measurements of our implementations further show the efficiency of CRDT designs following RWF.
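The remove-win strategy can be illustrated on a replicated set: when an add and a remove of the same element are concurrent, the remove takes precedence. The sketch below encodes the textbook remove-wins set rule and is not the RWF design from the paper; the `Op` record, its `past` field (the ids of operations that happened before it), and the `contains` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Op:
    """One update, tagged with its causal past (hypothetical representation)."""
    op_id: str
    kind: str   # "add" or "rmv"
    elem: str
    past: frozenset = field(default_factory=frozenset)  # ids of ops that happened before


def contains(ops, elem):
    """Remove-wins lookup: elem is present iff it has at least one add and
    every remove of elem happened before some add of elem. A remove that is
    concurrent with (or later than) all adds therefore wins."""
    adds = [o for o in ops if o.kind == "add" and o.elem == elem]
    rmvs = [o for o in ops if o.kind == "rmv" and o.elem == elem]
    return bool(adds) and all(
        any(r.op_id in a.past for a in adds) for r in rmvs
    )


# a1 and r1 are concurrent (neither saw the other), so the remove wins.
a1 = Op("a1", "add", "x")
r1 = Op("r1", "rmv", "x")
# a2 was issued after both a1 and r1 were seen, so it re-adds x.
a2 = Op("a2", "add", "x", frozenset({"a1", "r1"}))
```

Because `contains` depends only on the set of operations and their causal pasts, replicas that have received the same operations give the same answer regardless of delivery order.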

preprint, 2022, arXiv

Verifying Transactional Consistency of MongoDB

MongoDB is a popular general-purpose, document-oriented, distributed NoSQL database. It supports transactions in three different deployments: single-document transactions utilizing the WiredTiger storage engine in a standalone node, multi-document transactions in a replica set which consists of a primary node and several secondary nodes, and distributed transactions in a sharded cluster which is a group of multiple replica sets, among which data is sharded. A natural and fundamental question about MongoDB transactions is: What transactional consistency guarantee do MongoDB transactions in each deployment provide? However, there is neither concise pseudocode of MongoDB transactions in each deployment nor a formal specification of the consistency guarantees that MongoDB claims to provide. In this work, we formally specify and verify the transactional consistency protocols of MongoDB. Specifically, we provide concise pseudocode for the transactional consistency protocols in each MongoDB deployment, namely WIREDTIGER, REPLICASET, and SHARDEDCLUSTER, based on the official documents and source code. We then prove that WIREDTIGER, REPLICASET, and SHARDEDCLUSTER satisfy different variants of snapshot isolation, namely Strong-SI, Realtime-SI, and Session-SI, respectively. We also propose and evaluate efficient white-box checking algorithms for MongoDB transaction protocols against their consistency guarantees, effectively circumventing the NP-hard obstacle in theory.

preprint, 2022, arXiv

ViSearch: Weak Consistency Measurement for Replicated Data Types

Large-scale replicated data type stores often resort to eventual consistency to guarantee low latency and high availability. It is widely accepted that programming over eventually consistent data stores is challenging, since arbitrary divergence among replicas is allowed. Moreover, pragmatic protocols actually achieve consistency guarantees stronger than eventual consistency, which can and need to be utilized to facilitate reasoning about and programming over replicated data types. To address the challenges above, we propose the ViSearch framework for precise measurement of eventual consistency semantics. ViSearch employs the visibility-arbitration specification methodology in concurrent programming, which extends the linearizability-based specification methodology with a dynamic visibility relation among operations, in addition to the standard dynamic happens-before and linearization relations. The consistency measurement using ViSearch is NP-hard in general. To enable practical and efficient consistency measurement in replicated data type stores, the ViSearch framework refactors the existing brute-force checking algorithm into a generic algorithm skeleton, which further enables efficient pruning of the search space and effective parallelization. We employ the ViSearch framework for consistency measurement in two replicated data type stores, Riak and CRDT-Redis. The experimental evaluation shows the usefulness and cost-effectiveness of consistency measurement based on the ViSearch framework in realistic scenarios.

preprint, 2021, arXiv

IEEE BigData 2021 Cup: Soft Sensing at Scale

IEEE BigData 2021 Cup: Soft Sensing at Scale is a data mining competition organized by Seagate Technology, in association with the IEEE BigData 2021 conference. The scope of this challenge is to tackle the task of classifying soft sensing data with machine learning techniques. In this paper we go into the details of the challenge and describe the data set provided to participants. We define the metrics of interest, baseline models, and describe approaches we found meaningful which may be a good starting point for further analysis. We discuss the results obtained with our approaches and give insights on what potential challenges participants may run into. Students, researchers, and anyone interested in working on a major industrial problem are welcome to participate in the challenge!

preprint, 2021, arXiv

Leaving My Fingerprints: Motivations and Challenges of Contributing to OSS for Social Good

When inspiring software developers to contribute to open source software, the act is often referenced as an opportunity to build tools to support the developer community. However, that is not the only charge that propels contributions -- growing interest in open source has also been attributed to software developers deciding to use their technical skills to benefit a common societal good. To understand how developers identify these projects, their motivations for contributing, and challenges they face, we conducted 21 semi-structured interviews with OSS for Social Good (OSS4SG) contributors. From our interview analysis, we identified themes of contribution styles that we wanted to understand at scale by deploying a survey to over 5765 OSS and Open Source Software for Social Good contributors. From our quantitative analysis of 517 responses, we find that the majority of contributors demonstrate a distinction between OSS4SG and OSS. Likewise, contributors described definitions based on what societal issue the project was to mitigate and who the outcomes of the project were going to benefit. In addition, we find that OSS4SG contributors focus less on benefiting themselves by padding their resume with new technology skills and are more interested in leaving their mark on society at statistically significant levels. We also find that OSS4SG contributors evaluate the owners of the project significantly more than OSS contributors. These findings inform implications to help contributors identify high societal impact projects, help project maintainers reduce barriers to entry, and help organizations understand why contributors are drawn to these projects to sustain active participation.

preprint, 2021, arXiv

Physics-informed Tensor-train ConvLSTM for Volumetric Velocity Forecasting of Loop Current

According to the National Academies, a weekly forecast of velocity, vertical structure, and duration of the Loop Current (LC) and its eddies is critical for understanding the oceanography and ecosystem, and for mitigating outcomes of anthropogenic and natural disasters in the Gulf of Mexico (GoM). However, this forecast is a challenging problem since the LC behaviour is dominated by long-range spatial connections across multiple timescales. In this paper, we extend spatiotemporal predictive learning, showing its effectiveness beyond video prediction, to a 4D model, i.e., a novel Physics-informed Tensor-train ConvLSTM (PITT-ConvLSTM) for forecasting temporal sequences of 3D geospatial data. Specifically, we propose 1) a novel 4D higher-order recurrent neural network with empirical orthogonal function analysis to capture the hidden uncorrelated patterns of each hierarchy, 2) a convolutional tensor-train decomposition to capture higher-order space-time correlations, and 3) the incorporation of prior physics knowledge provided by domain experts by informing the learning in latent space. The advantage of our proposed method is clear: constrained by physical laws, it simultaneously learns good representations for frame dependencies (both short-term and long-term high-level dependency) and inter-hierarchical relations within each time frame. Experiments on geospatial data collected from the GoM demonstrate that PITT-ConvLSTM outperforms the state-of-the-art methods in forecasting the volumetric velocity of the LC and its eddies for a period of over one week.

preprint, 2021, arXiv

Regions of Attraction Estimation using Level Set Method for Complex Network System

Many complex engineering systems network together functional elements and balance demand loads (e.g., information on data networks, electric power on grids). This allows load spikes to be shifted so as to avoid a local overload. In mobile wireless networks, base stations (BSs) receive data demand and shift high loads to neighbouring BSs to avoid outages. The stability of cascade load balancing is important because unstable networks can cause high inefficiency. The research challenge is to prove the stability conditions for any arbitrarily large, complex, and dynamic network topology, and for any balancing dynamic function. Our previous work has proven the conditions for stability for stationary networks near equilibrium for any load balancing dynamic and topology. Most current analyses of dynamic complex networks linearize the system around the fixed equilibrium solutions. This approach is insufficient for dynamic networks with changing equilibria, for which an estimate of the Region of Attraction (ROA) is needed. The novelty of this paper is that we compress this high-dimensional system and use Level Set Methods (LSM) to estimate the ROA. Our results show how we can control the ROA via network topology (local degree control) as a way to configure the mobility of transceivers to ensure the preservation of stable load balancing.

preprint2021arXiv

Soft Sensing Transformer: Hundreds of Sensors are Worth a Single Word

With the rapid development of AI technology in recent years, there have been many studies with deep learning models in the soft sensing area. However, while the models have become more complex, the data sets remain limited: researchers are fitting million-parameter models with hundreds of data samples, which is insufficient to demonstrate the effectiveness of their models, which thus often fail to perform when implemented in industrial applications. To solve this long-lasting problem, we are providing large-scale, high-dimensional time series manufacturing sensor data from Seagate Technology to the public. We demonstrate the challenges and effectiveness of modeling industrial big data by a Soft Sensing Transformer model on these data sets. The transformer is used because it has outperformed state-of-the-art techniques in Natural Language Processing, and since then has also performed well in direct application to computer vision without the introduction of image-specific inductive biases. We observe the similarity of a sentence structure to the sensor readings and process the multi-variable sensor readings in a time series in a manner similar to sentences in natural language. The high-dimensional time-series data is formatted into the same shape as embedded sentences and fed into the transformer model. The results show that the transformer model outperforms the benchmark models in the soft sensing field based on auto-encoder and long short-term memory (LSTM) models. To the best of our knowledge, we are the first team in academia or industry to benchmark the performance of the original transformer model with large-scale numerical soft sensing data.

preprint2020arXiv

A class of multiplicative splitting iterations for solving the continuous Sylvester equation

For solving the continuous Sylvester equation, a class of the multiplicative splitting iteration method is presented. We consider two symmetric positive definite splittings for each coefficient matrix of the continuous Sylvester equations and it can be equivalently written as two multiplicative splitting matrix equations. When both coefficient matrices in the continuous Sylvester equation are (non-symmetric) positive semi-definite, and at least one of them is positive definite; we can choose Hermitian and skew-Hermitian (HS) splittings of matrices $A$ and $B$, in the first equation, and the splitting of the Jacobi iterations for matrices $A$ and $B$, in the second equation in the multiplicative splitting iteration method. Convergence conditions of this method are studied and numerical experiments show the efficiency of this method.

preprint2020arXiv

Autonomous Driving with Deep Learning: A Survey of State-of-Art Technologies

Since the DARPA Grand Challenges (rural) in 2004/05 and Urban Challenges in 2007, autonomous driving has been the most active field of AI applications. Almost at the same time, deep learning has made breakthroughs through several pioneers; three of them (also called the fathers of deep learning), Hinton, Bengio and LeCun, won the ACM Turing Award in 2019. This is a survey of autonomous driving technologies with deep learning methods. We investigate the major fields of self-driving systems, such as perception, mapping and localization, prediction, planning and control, simulation, V2X and safety, etc. Due to the limited space, we focus the analysis on several key areas, i.e., 2D and 3D object detection in perception, depth estimation from cameras, multiple sensor fusion on the data, feature and task level respectively, and behavior modelling and prediction of vehicle driving and pedestrian trajectories.

preprint2020arXiv

Diversifying Anonymized Data with Diversity Constraints

Recently introduced privacy legislation has aimed to restrict and control the amount of personal data published by companies and shared with third parties. Much of this real data is not only sensitive, requiring anonymization, but also contains characteristic details from a variety of individuals. This diversity is desirable in many applications ranging from Web search to drug and product development. Unfortunately, data anonymization techniques have largely ignored diversity in their published results. This inadvertently propagates underlying bias in subsequent data analysis. We study the problem of finding a diverse anonymized data instance where diversity is measured via a set of diversity constraints. We formalize diversity constraints and study their foundations such as implication and satisfiability. We show that determining the existence of a diverse, anonymized instance can be done in PTIME, and we present a clustering-based algorithm. We conduct extensive experiments using real and synthetic data showing the effectiveness of our techniques, and improvement over existing baselines. Our work aligns with recent trends towards responsible data science by coupling diversity with privacy-preserving data publishing.

preprint2020arXiv

Equi-invariability, bounded invariance complexity and L-stability for control systems

In this paper, we introduce the notions of bounded invariance complexity, bounded invariance complexity in the mean and mean L-stability for control systems. Then we characterize these notions by introducing six types of equi-invariability. As a by-product, two new dichotomy theorems for control systems on control sets are established.

preprint2020arXiv

Evolutionary Architecture Search for Graph Neural Networks

Automated machine learning (AutoML) has seen a resurgence in interest with the boom of deep learning over the past decade. In particular, Neural Architecture Search (NAS) has seen significant attention throughout the AutoML research community, and has pushed forward the state-of-the-art in a number of neural models to address grid-like data such as texts and images. However, very little work has been done on Graph Neural Networks (GNN) learning on unstructured network data. Given the huge number of choices and combinations of components such as aggregator and activation function, determining the suitable GNN structure for a specific problem normally necessitates tremendous expert knowledge and laborious trials. In addition, a slight variation of hyperparameters such as learning rate and dropout rate could dramatically hurt the learning capacity of a GNN. In this paper, we propose a novel AutoML framework through the evolution of individual models in a large GNN architecture space involving both neural structures and learning parameters. Instead of optimizing only the model structures with fixed parameter settings as in existing work, an alternating evolution process is performed between GNN structures and learning parameters to dynamically find the best fit of each other. To the best of our knowledge, this is the first work to introduce and evaluate evolutionary architecture search for GNN models. Experiments and validations demonstrate that evolutionary NAS is capable of matching existing state-of-the-art reinforcement learning approaches for both the semi-supervised transductive and inductive node representation learning and classification.

preprint2020arXiv

Fine-grained Analysis on Fast Implementations of Distributed Multi-writer Atomic Registers

Distributed multi-writer atomic registers are at the heart of a large number of distributed algorithms. While enjoying the benefits of atomicity, researchers further explore fast implementations of atomic registers which are optimal in terms of data access latency. Though it is proved that multi-writer atomic register implementations are impossible when both read and write are required to be fast, it is still open whether implementations are impossible when only write or read is required to be fast. This work proves the impossibility of fast write implementations based on a series of chain arguments among indistinguishable executions. We also show the necessary and sufficient condition for fast read implementations by extending the results in the single-writer case. This work concludes a series of studies on fast implementations of distributed atomic registers.

preprint2020arXiv

Genetic Improvement @ ICSE 2020

Following Prof. Mark Harman of Facebook's keynote and formal presentations (which are recorded in the proceedings) there was a wide-ranging discussion at the eighth international Genetic Improvement workshop, GI-2020 @ ICSE (held as part of the 42nd ACM/IEEE International Conference on Software Engineering on Friday 3rd July 2020). Topics included industry take-up, human factors, explainability (explainability, justifiability, exploitability) and GI benchmarks. We also contrast various recent online approaches (e.g. SBST 2020) to holding virtual computer science conferences and workshops via the WWW on the Internet without face-2-face interaction. Finally we speculate on how the Coronavirus Covid-19 Pandemic will affect research next year and into the future.

preprint2020arXiv

GPU-Accelerated Compression and Visualization of Large-Scale Vessel Trajectories in Maritime IoT Industries

The automatic identification system (AIS), an automatic vessel-tracking system, has been widely adopted to perform intelligent traffic management and collision avoidance services in maritime Internet of Things (IoT) industries. With the rapid development of maritime transportation, a tremendous amount of AIS-based vessel trajectory data has been collected, which makes trajectory data compression imperative and challenging. This paper mainly focuses on the compression and visualization of large-scale vessel trajectories and their Graphics Processing Unit (GPU)-accelerated implementations. The visualization was implemented to investigate the influence of compression on vessel trajectory data quality. In particular, the Douglas-Peucker (DP) and Kernel Density Estimation (KDE) algorithms, respectively utilized for trajectory compression and visualization, were significantly accelerated through the massively parallel computation capabilities of GPU architecture. Comprehensive experiments on trajectory compression and visualization have been conducted on large-scale AIS data recording ship movements, collected from 3 different water areas, i.e., the South Channel of Yangtze River Estuary, the Chengshan Jiao Promontory, and the Zhoushan Islands. Experimental results illustrated that (1) the proposed GPU-based parallel implementation frameworks could significantly reduce the computational time for both trajectory compression and visualization; (2) the influence of compressed vessel trajectories on trajectory visualization could be negligible if the compression threshold was selected suitably; (3) the Gaussian kernel was capable of generating more appropriate KDE-based visualizations than seven other kernel functions.
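The Douglas-Peucker compression step named above can be sketched in a few lines. This is a minimal sequential (CPU) sketch, not the GPU-accelerated implementation the paper describes; the function names, the planar coordinates, and the sample `track` are illustrative assumptions, and real AIS positions would first need projecting from latitude/longitude.

```python
import math

def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b in planar coordinates."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def douglas_peucker(points, epsilon):
    """Keep only points that deviate more than epsilon from the simplified line."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        return [first, last]
    # Split at the farthest point and simplify both halves recursively.
    left = douglas_peucker(points[: index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right

# Hypothetical trajectory: a nearly straight leg, a sharp turn, another straight leg.
track = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7), (6, 8.1), (7, 9)]
compressed = douglas_peucker(track, epsilon=1.0)
print(compressed)
```

The threshold `epsilon` plays the role of the compression threshold in the abstract: a larger value discards more points, which is why its choice determines how much the compressed trajectories distort a later density visualization.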

preprint2020arXiv

Preflare very long-periodic pulsations observed in Halpha emission before the onset of a solar flare

Very long-periodic pulsations during preflare phases (preflare-VLPs) have been detected in the full-disk solar soft X-ray (SXR) flux (see Tan et al. 2016). They may be regarded as precursors to solar flares and may help us better understand the trigger mechanism of solar flares. In this letter, we report a preflare-VLP event before the onset of an M1.1 circular-ribbon flare on 2015 October 16. It was simultaneously observed in Halpha, SXR, and extreme ultraviolet (EUV) wavelengths, which were recorded by the NVST, GOES, EVE, and AIA, respectively. The preflare-VLP is identified as repeated, quasi-periodic pulses in the light curves during the preflare phase, which might be modulated by LRC-circuit oscillation in the current-carrying plasma loop. The quasi-period, about 9.3 minutes, is determined from the Fourier power spectrum with a Markov chain Monte Carlo (MCMC)-based Bayesian analysis (e.g., Liang et al. 2020). We present the first report of a preflare-VLP event in the local Halpha line and EUV wavelength, which could be considered a precursor of a solar flare. This finding should therefore prove useful for the prediction of solar flares, especially for powerful flares.

preprint2020arXiv

Privacy-Aware Data Cleaning-as-a-Service (Extended Version)

Data cleaning is a pervasive problem for organizations as they try to reap value from their data. Recent advances in networking and cloud computing technology have fueled a new computing paradigm called Database-as-a-Service, where data management tasks are outsourced to large service providers. In this paper, we consider a Data Cleaning-as-a-Service model that allows a client to interact with a data cleaning provider who hosts curated and sensitive data. We present PACAS: a Privacy-Aware data Cleaning-As-a-Service model that facilitates interaction between the parties with client query requests for data, and a service provider using a data pricing scheme that computes prices according to data sensitivity. We propose new extensions to the model to define generalized data repairs that obfuscate sensitive data to allow data sharing between the client and service provider. We present a new semantic distance measure to quantify the utility of such repairs, and we re-define the notion of consistency in the presence of generalized values. The PACAS model uses (X,Y,L)-anonymity that extends existing data publishing techniques to consider the semantics in the data while protecting sensitive values. Our evaluation over real data shows that PACAS safeguards semantically related sensitive values, and provides lower repair errors compared to existing privacy-aware cleaning techniques.

preprint2016arXiv

Gate-induced insulator to band-like transport transition in organolead halide perovskite

Understanding the intrinsic charge transport in organolead halide perovskites is essential for the development of high-efficiency photovoltaics and other optoelectronic devices. Despite the rapid advancement of the organolead halide perovskite in photovoltaic and optoelectronic applications, the intrinsic charge carrier transport in these materials remains elusive, partly due to the difficulty of fabricating electrical devices and obtaining good electrical contact. Here, we report the fabrication of organolead halide perovskite microplates with monolayer graphene as a low-barrier electrical contact. Systematic charge transport studies reveal an insulator to band-like transport transition. Our studies indicate that the insulator to band-like transport transition depends on the orthorhombic-to-tetragonal phase transition temperature and defect densities of the organolead halide perovskite microplates. Our findings are not only important for the fundamental understanding of charge transport behavior but also offer valuable practical implications for photovoltaics and optoelectronic applications based on the organolead halide perovskite.

preprint2016arXiv

Quasi-periodic pulsations with periods that change depending on whether the pulsations have thermal or nonthermal components

Context. Quasi-periodic pulsations (QPPs) typically display periodic and regular peaks in the light curves during the flare emissions. Sometimes, QPPs show multiple periods at the same wavelength. However, changing periods in various channels are rare. Aims. We report QPPs in a solar flare on 2014 October 27. They showed a period change that depended on whether thermal or nonthermal components were included. The flare was simultaneously observed by many instruments. Methods. Using the fast Fourier transform (FFT), we decomposed the light curves at multiple wavelengths into slowly varying and rapidly varying signals. Then we identified the QPPs as the regular and periodic peaks from the rapidly varying signals. The periods are derived with the wavelet method and confirmed based on the FFT spectra of the rapidly varying signals. Results. We find a period of 50 s from the thermal emissions during the impulsive phase of the flare, that is, in the soft X-ray bands. At the same time, a period of about 100 s is detected from the nonthermal emissions, such as hard X-ray and microwave channels. The period ratio is exactly 2.0, which might be due to the modulations of the magnetic reconnection rate by the fundamental and harmonic modes of magnetohydrodynamic waves. Our results further show that the 100 s period is present over a broad wavelength, such as hard X-rays, extreme-UV/UV, and microwave emissions, indicating the periodic magnetic reconnection in this flare. Conclusions. To our knowledge, this is the first report about period changes from thermal to nonthermal components in a single flare that occur at almost the same time. This new observational finding could be a challenge to the theory of flare QPPs.
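The period search described in the Methods can be illustrated on synthetic data. The sketch below is an assumption-laden toy, not the authors' pipeline: it uses a naive discrete Fourier transform instead of an FFT library, ignores spectral components slower than a hypothetical cutoff `max_period` as a stand-in for removing the slowly varying signal, and builds two artificial light curves whose oscillation periods (50 s and 100 s) are chosen by hand to mirror the reported values.

```python
import cmath
import math

def dft(x):
    """Naive O(n^2) discrete Fourier transform (adequate for a short demo series)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def dominant_period(signal, dt, max_period):
    """Return the period of the strongest spectral peak shorter than max_period."""
    n = len(signal)
    spectrum = dft(signal)
    best_k, best_power = None, 0.0
    for k in range(1, n // 2):
        period = n * dt / k
        if period > max_period:
            continue  # treat long periods as the slowly varying component
        power = abs(spectrum[k]) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return n * dt / best_k

# Synthetic light curves (assumed values): a smooth flare envelope plus an oscillation.
n, dt = 400, 1.0  # 400 samples at 1 s cadence
envelope = [10.0 * math.exp(-(((t - 200) / 120.0) ** 2)) for t in range(n)]
thermal = [envelope[t] + math.sin(2 * math.pi * t / 50.0) for t in range(n)]
nonthermal = [envelope[t] + math.sin(2 * math.pi * t / 100.0) for t in range(n)]

thermal_period = dominant_period(thermal, dt, max_period=150.0)
nonthermal_period = dominant_period(nonthermal, dt, max_period=150.0)
print(thermal_period, nonthermal_period, nonthermal_period / thermal_period)
```

In practice the abstract's analysis also uses a wavelet method to confirm the periods; that step is omitted here.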

preprint2015arXiv

Almost Strong Consistency: "Good Enough" in Distributed Storage Systems

A consistency/latency tradeoff arises as soon as a distributed storage system replicates data. For low latency, modern storage systems often settle for weak consistency conditions, which provide little, or even worse, no guarantee for data consistency. In this paper we propose the notion of almost strong consistency as a better balance option for the consistency/latency tradeoff. It provides both deterministically bounded staleness of data versions for each read and probabilistic quantification of the rate of "reading stale values", while achieving low latency. In the context of distributed storage systems, we investigate almost strong consistency in terms of 2-atomicity. Our 2AM (2-Atomicity Maintenance) algorithm completes both reads and writes in one communication round-trip, and guarantees that each read obtains a value within the latest 2 versions. To quantify the rate of "reading stale values", we decompose the so-called "old-new inversion" phenomenon into concurrency patterns and read-write patterns, and propose a stochastic queueing model and a "timed balls-into-bins model" to analyze them, respectively. The theoretical analysis not only demonstrates that "old-new inversions" rarely occur as expected, but also reveals that the read-write pattern dominates in guaranteeing such rare data inconsistencies. These are further confirmed by the experimental results, showing that 2-atomicity is "good enough" in distributed storage systems by achieving low latency, bounded staleness, and rare data inconsistencies.
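The bounded-staleness guarantee can be illustrated with a small trace checker. This is a deliberately simplified sketch, not the 2AM algorithm: it assumes a totally ordered trace with no concurrent operations and write versions numbered 1, 2, 3, and so on, whereas the paper analyzes concurrent executions. The trace format and the name `stale_reads` are hypothetical.

```python
def stale_reads(trace, k=2):
    """Return indices of reads that miss the latest k versions.

    trace is a list of ("w", version) and ("r", version) pairs in execution
    order; writes are assumed to carry consecutive versions 1, 2, 3, ...
    """
    latest = 0
    violations = []
    for i, (op, version) in enumerate(trace):
        if op == "w":
            latest = version
        elif op == "r":
            # A read is acceptable if it returns one of the k newest versions.
            if not (latest - k < version <= latest):
                violations.append(i)
    return violations

# Hypothetical trace: the read at index 3 is one version behind,
# the read at index 5 is two versions behind.
trace = [("w", 1), ("r", 1), ("w", 2), ("r", 1), ("w", 3), ("r", 1), ("r", 3)]
print(stale_reads(trace, k=2))  # bounded staleness with k = 2
print(stale_reads(trace, k=1))  # k = 1 demands the newest version on every read
```

With `k=2` the one-version-behind read is tolerated, which is the relaxation that lets reads complete without an extra round-trip; with `k=1` the same read counts as a violation.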

preprint2015arXiv

Electric field induced strong enhancement of electroluminescence in multi-Layer MoS2

The layered transition metal dichalcogenides (TMDs) have attracted considerable interest due to their unique electronic and optical properties. Here we report electric field induced strong electroluminescence in multi-layer MoS2 and WSe2. We show that GaN-Al2O3-MoS2 and GaN-Al2O3-MoS2-Al2O3-graphene vertical heterojunctions can be created with excellent rectification behaviour. Electroluminescence studies demonstrate prominent direct bandgap excitonic emission in multi-layer MoS2 over the entire vertical junction area. Importantly, the electroluminescence efficiency observed in multi-layer MoS2 is comparable to or even higher than that in monolayers, corresponding to a relative electroluminescence enhancement factor of >1000 in multi-layer MoS2 when compared to its photoluminescence. This striking enhancement of electroluminescence can be attributed to the high electric field induced carrier redistribution from low energy points (indirect bandgap) to high energy points (direct bandgap) of k-space, arising from the unique band structure of MoS2 with a much higher density of states at high energy points. The electric field induced electroluminescence is general for other TMDs including WSe2, and can provide a fundamental platform to probe the carrier injection, population and recombination in multi-layer TMDs and open up a new pathway toward TMD based optoelectronic devices.

preprint2015arXiv

High Current Density Vertical Tunneling Transistors from Graphene/Highly-Doped Silicon Heterostructures

Graphene/silicon heterostructures have attracted tremendous interest as a new platform for diverse electronic and photonic devices such as barristors, solar cells, optical modulators, and chemical sensors. The studies to date largely focus on junctions between graphene and lightly-doped silicon, where a Schottky barrier is believed to dominate the carrier transport process. Here we report a systematic investigation of carrier transport across the heterojunctions formed between graphene and highly-doped silicon. By varying the silicon doping level and the measurement temperature, we show that the carrier transport across the graphene/p++-Si heterojunction is dominated by the tunneling effect through the native oxide. We further demonstrate that the tunneling current can be effectively modulated by the external gate electric field, resulting in a vertical tunneling transistor. Benefiting from the large density of states of highly doped silicon, our tunneling transistors can deliver a current density over 20 A/cm2, about two orders of magnitude higher than previous graphene/insulator/graphene tunneling transistors at the same on/off ratio.

preprint2015arXiv

Understanding the Timed Distributed Trace of a Partially Synchronous System at Runtime

Understanding the timed distributed trace of a cyber-physical system at runtime has gained broad attention; it is often achieved by verifying properties over the observed trace of system execution. However, this verification faces severe challenges. First, in realistic settings, the computing entities only have imperfectly synchronized clocks. A proper timing model is essential to the interpretation of the trace of system execution. Second, the specification should be able to express properties with real-time constraints despite the asynchrony, and the semantics should be interpreted over the currently-observed and continuously-growing trace. To address these challenges, we propose PARO - the partially synchronous system observation framework, which i) adopts the partially synchronous model of time, and introduces the lattice and the timed automata theories to model the trace of system execution; ii) adopts a tailored subset of TCTL to specify temporal properties, and defines the 3-valued semantics to interpret the properties over the currently-observed finite trace; iii) constructs the timed automaton corresponding to the trace at runtime, and reduces the satisfaction of the 3-valued semantics over finite traces to that of the classical boolean semantics over infinite traces. PARO is implemented over MIPA - the open-source middleware we developed. Performance measurements show the cost-effectiveness of PARO in different settings of key environmental factors.

preprint2015arXiv

Wafer-scale growth of large arrays of perovskite microplate crystals for functional electronics and optoelectronics

Methylammonium lead iodide perovskite has attracted intensive interest for its diverse optoelectronic applications. However, most studies to date have been limited to bulk thin films that are difficult to implement for integrated device arrays because of their incompatibility with typical lithography processes. We report the first patterned growth of regular arrays of perovskite microplate crystals for functional electronics and optoelectronics. We show that large arrays of lead iodide microplates can be grown from an aqueous solution through a seeded growth process and can be further intercalated with methylammonium iodide to produce perovskite crystals. Structural and optical characterizations demonstrate that the resulting materials display excellent crystalline quality and optical properties. We further show that perovskite crystals can be selectively grown on prepatterned electrode arrays to create independently addressable photodetector arrays and functional field effect transistors. The ability to grow perovskite microplates and to precisely place them at specific locations offers a new material platform for the fundamental investigation of the electronic and optical properties of perovskite materials and opens a pathway for integrated electronic and optoelectronic systems.

preprint2014arXiv

Electroluminescence and photocurrent generation from atomically sharp WSe2/MoS2 heterojunction p-n diodes

The p-n diodes represent the most fundamental device building block for diverse optoelectronic functions, but are difficult to achieve in atomically thin transition metal dichalcogenides (TMDs) due to the inability to selectively dope them into p- or n-type semiconductors. Here we report the first demonstration of an atomically thin and atomically sharp heterojunction p-n diode by vertically stacking p-type monolayer tungsten diselenide (WSe2) and n-type few-layer molybdenum disulfide (MoS2). Electrical measurement demonstrates excellent diode characteristics with well-defined current rectification behaviour and an ideality factor of 1.2. Photocurrent mapping shows fast photoresponse over the entire overlapping region with a highest external quantum efficiency up to 12 %. Electroluminescence studies show prominent band edge excitonic emission and strikingly enhanced hot electron luminescence. A systematic investigation shows distinct layer-number dependent emission characteristics and reveals important insight about the origin of hot-electron luminescence and the nature of electron-orbital interaction in TMDs. We believe that these atomically thin heterojunction p-n diodes represent an interesting system for probing the fundamental electro-optical properties in TMDs, and can open up a new pathway to novel optoelectronic devices such as atomically thin photodetectors, photovoltaics, as well as spin-/valley-polarized light emitting diodes and on-chip lasers.

preprint2014arXiv

Exponential stability of nonhomogeneous matrix-valued Markovian chains

In this paper, we characterize the stability of matrix-valued Markovian chains by periodic data.

preprint2014arXiv

Regional study of the Archean to Proterozoic crust at the Sudbury Neutrino Observatory (SNO+), Ontario: Predicting the geoneutrino flux

The SNO+ detector, a new kiloton-scale liquid scintillator detector capable of recording geoneutrino events, will define the strength of the Earth's radiogenic heat. A detailed 3-D model of the regional crust, centered at SNO+ and based on compiled geological, geophysical and geochemical information, was used to characterize the physical and chemical attributes of the crust and assign uncertainties to its structure. Monte Carlo simulations were used to predict the U and Th abundances and uncertainties in crustal lithologies and to model the regional crustal geoneutrino signal at SNO+.

preprint2014arXiv

Reply to "Local Filtering Fundamentally Against Wide Spectrum"

After carefully studying the comment by Wang et al. (arXiv:1408.6420), we found it includes several mistakes and unjustified statements and Wang et al. lack very basic knowledge of dislocations. Moreover, there is clear evidence indicating that Wang et al. significantly misrepresented our method and claimed something that they actually did not implement.

preprint2013arXiv

A reference Earth model for the heat producing elements and associated geoneutrino flux

The recent geoneutrino experimental results from the KamLAND and Borexino detectors reveal the usefulness of analyzing the Earth's geoneutrino flux, as it provides a constraint on the strength of the radiogenic heat power and this, in turn, provides a test of compositional models of the bulk silicate Earth (BSE). This flux is dependent on the amount and distribution of heat producing elements (HPEs: U, Th and K) in the Earth's interior. We have developed a geophysically-based, three-dimensional global reference model for the abundances and distributions of HPEs in the BSE. The structure and composition of the outermost portion of the Earth, the crust and underlying lithospheric mantle, is detailed in the reference model; this portion of the Earth has the greatest influence on the geoneutrino fluxes. The reference model combines three existing geophysical models of the global crust and yields an average crustal thickness of 34.4 ± 4.1 km in the continents and 8.0 ± 2.7 km in the oceans. In situ seismic velocity provided by CRUST 2.0 allows us to estimate the average composition of the deep continental crust by using new and updated compositional databases for amphibolite and granulite facies rocks in combination with laboratory ultrasonic velocity measurements. An updated xenolithic peridotite database is used to represent the average composition of continental lithospheric mantle. Monte Carlo simulation is used to predict the geoneutrino flux at 16 selected locations and to track the asymmetrical uncertainties of radiogenic heat power due to the log-normal distributions of HPE concentrations in crustal rocks.

preprint2013arXiv

Chaotic Characteristic of Discrete-time Linear Inclusion Dynamical Systems

In this paper, we study the fiber-chaos of switched linear dynamical systems.

preprint2013arXiv

Chaotic Characteristics of Discrete-time Linear Inclusion Dynamical Systems

In this paper, we study the chaotic behavior of a discrete-time linear inclusion.

preprint2013arXiv

Enabling Context-awareness by Predicate Detection in Asynchronous Pervasive Computing Environments

Pervasive applications involve more and more autonomous computing and communicating devices, augmented with the abilities of sensing and controlling the logical / physical environment. To enable context-awareness for such applications, we are challenged by the intrinsic asynchrony among the context collecting devices. To this end, we introduce the predicate detection theory and propose the Predicate-Detection-based Context-Awareness (PD-CA) framework, in which: a) logical time is used to explicitly cope with the asynchrony; b) specification of predicates enables the applications to express contextual properties of their concerns; c) online and incremental predicate detection algorithms effectively enable context-awareness at runtime. Under the guidance of the PD-CA framework, we present the design and implementation of the MIPA middleware, which shields the applications from the burden of processing the asynchronous contexts. We also demonstrate how PD-CA simplifies the development of context-aware applications. Experimental evaluations show the performance of MIPA in supporting context-aware applications despite the asynchrony.

preprint2013arXiv

Guided Neuronal Growth on Arrays of Biofunctionalized GaAs/InGaAs Semiconductor Microtubes

We demonstrate embedded growth of cortical mouse neurons in dense arrays of semiconductor microtubes. The microtubes, fabricated from a strained GaAs/InGaAs heterostructure, guide axon growth through them and enable electrical and optical probing of propagating action potentials. The coaxial nature of the microtubes -- similar to myelin -- is expected to enhance the signal transduction along the axon. We present a technique of suppressing arsenic toxicity and prove the success of this technique by overgrowing neuronal mouse cells.

preprint2013arXiv

Pointwise Stabilization of Discrete-time Stationary Matrix-valued Markovian Chains

We study the pointwise stabilizability of a discrete-time, time-homogeneous, and stationary Markovian jump linear system. By using measure theory, ergodic theory and a splitting theorem of state space we show in a relatively simple way that if the system is essentially product-bounded, then it is pointwise convergent if and only if it is pointwise exponentially convergent.

preprint2013arXiv

Verifying PRAM Consistency over Read/Write Traces of Data Replicas

Data replication technologies enable efficient and highly-available data access, and are thus gaining more and more interest in both academia and industry. However, data replication introduces the problem of data consistency. Modern commercial data replication systems often provide weak consistency for high availability under certain failure scenarios. An important weak consistency model is Pipelined-RAM (PRAM) consistency. It allows different processes to hold different views of data. To determine whether a data replication system indeed provides PRAM consistency, we study the problem of Verifying PRAM Consistency over read/write traces (or VPC, for short). We first identify four variants of VPC according to a) whether there are Multiple shared variables (or one Single variable), and b) whether write operations can assign Duplicate values (or only Unique values) for each shared variable; the four variants are labeled VPC-SU, VPC-MU, VPC-SD, and VPC-MD. Second, we present a simple VPC-MU algorithm, called RW-CLOSURE. It constructs an operation graph $\mathcal{G}$ by iteratively adding edges according to three rules. Its time complexity is $O(n^5)$, where $n$ is the number of operations in the trace. Third, we present an improved VPC-MU algorithm, called READ-CENTRIC, with time complexity $O(n^4)$. Basically, it attempts to construct the operation graph $\mathcal{G}$ in an incremental and efficient way. Its correctness is based on that of RW-CLOSURE. Finally, we prove that VPC-SD (and so is VPC-MD) is $\sf{NP}$-complete by reducing the strongly $\sf{NP}$-complete problem 3-PARTITION to it.

preprint2012arXiv

Devaney's chaos revisited

In this note, we give several equivalent definitions of Devaney's chaos.

preprint2011arXiv

Design of a Sliding Window over Asynchronous Event Streams

The proliferation of sensing and monitoring applications motivates the adoption of the event stream model of computation. Although sliding windows are widely used to facilitate effective event stream processing, their use is greatly challenged when the event sources are distributed and asynchronous. To address this challenge, we first show that the snapshots of the asynchronous event streams within the sliding window form a convex distributive lattice (denoted by Lat-Win). Then we propose an algorithm to maintain Lat-Win at runtime. The Lat-Win maintenance algorithm is implemented and evaluated on the open-source context-aware middleware we developed. The evaluation results first show the necessity of adopting sliding windows over asynchronous event streams. Then they show the performance of detecting specified predicates within Lat-Win, even when faced with dynamic changes in the computing environment.
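The lattice structure mentioned above can be illustrated with a small sketch that enumerates the consistent snapshots (cuts) of a few vector-clock-stamped event streams and checks that they are closed under componentwise minimum and maximum, a standard property of consistent cuts. This is not the paper's Lat-Win maintenance algorithm. The function names, the encoding of a snapshot as a tuple of per-process event counts, and the modelling of the window as per-process bounds `lo` and `hi` are all assumptions made for illustration.

```python
from itertools import product


def consistent_cuts(clocks, lo=None, hi=None):
    """Enumerate consistent cuts of asynchronous event streams.

    clocks[i][k] is the vector clock of the (k+1)-th event on process i.
    A cut is a tuple of per-process event counts. It is consistent when the
    last included event of every process depends only on included events.
    lo and hi are hypothetical per-process window bounds on the counts.
    """
    n = len(clocks)
    lo = lo if lo is not None else [0] * n
    hi = hi if hi is not None else [len(c) for c in clocks]
    cuts = []
    for cut in product(*(range(lo[i], hi[i] + 1) for i in range(n))):
        ok = all(
            cut[i] == 0
            or all(clocks[i][cut[i] - 1][j] <= cut[j] for j in range(n))
            for i in range(n)
        )
        if ok:
            cuts.append(cut)
    return cuts


def is_lattice(cuts):
    """Check closure under componentwise min (meet) and max (join)."""
    s = set(cuts)
    return all(
        tuple(map(min, a, b)) in s and tuple(map(max, a, b)) in s
        for a in s for b in s
    )


# Process 0 has two events; the single event on process 1 depends on the
# first event of process 0 (for example, it receives a message sent by it).
clocks = [[(1, 0), (2, 0)], [(1, 1)]]
all_cuts = consistent_cuts(clocks)
windowed = consistent_cuts(clocks, lo=[1, 0])  # drop snapshots before event 1 on process 0
```

In this example the cut `(0, 1)` is excluded because it contains the receive without the matching send, and both the full set of snapshots and the windowed subset pass `is_lattice`.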

preprint2011arXiv

Extremal ergodic measures and the finiteness property of matrix semigroups

Let $\boldsymbol{S}=\{S_1,\dots,S_K\}$ be a finite set of complex $d\times d$ matrices and $\varSigma_{K}^+$ the compact space of all one-sided infinite sequences $i_{\boldsymbol{\cdot}}\colon\mathbb{N}\rightarrow\{1,\dots,K\}$. An ergodic probability $\mu_*$ of the Markov shift $\theta\colon\varSigma_{K}^+\rightarrow\varSigma_{K}^+;\ i_{\boldsymbol{\cdot}}\mapsto i_{\boldsymbol{\cdot}+1}$ is called "extremal" for $\boldsymbol{S}$ if $\rho(\boldsymbol{S})=\lim_{n\to\infty}\sqrt[n]{\|S_{i_1}\cdots S_{i_n}\|}$ holds for $\mu_*$-a.e. $i_{\boldsymbol{\cdot}}\in\varSigma_{K}^+$, where $\rho(\boldsymbol{S})$ denotes the generalized/joint spectral radius of $\boldsymbol{S}$. Using an extremal norm and the Kingman subadditive ergodic theorem, it is shown that $\boldsymbol{S}$ has the spectral finiteness property (i.e. $\rho(\boldsymbol{S})=\sqrt[n]{\rho(S_{i_1}\cdots S_{i_n})}$ for some finite-length word $(i_1,\dots,i_n)$) if and only if some extremal measure $\mu_*$ of $\boldsymbol{S}$ has at least one periodic density point $i_{\boldsymbol{\cdot}}\in\varSigma_{K}^+$.
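The finiteness property can be explored numerically through the standard two-sided bound, valid for every word length $n\ge 1$ and every submultiplicative matrix norm: the largest $\sqrt[n]{\rho(S_{i_1}\cdots S_{i_n})}$ over words of length $n$ is a lower bound on the joint spectral radius, and the largest $\sqrt[n]{\|S_{i_1}\cdots S_{i_n}\|}$ is an upper bound. The sketch below evaluates both bounds by exhaustive enumeration. It is restricted to $2\times 2$ matrices so that it needs only the standard library, and the function names and the choice of the maximum-row-sum norm are assumptions made for illustration, not part of the paper.

```python
import cmath
from itertools import product


def matmul2(a, b):
    """Product of two 2x2 matrices given as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def spectral_radius2(m):
    """Largest eigenvalue modulus of a 2x2 matrix via trace and determinant."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = cmath.sqrt(tr * tr - 4 * det)
    return max(abs((tr + disc) / 2), abs((tr - disc) / 2))


def norm_inf2(m):
    """Maximum absolute row sum, an induced (hence submultiplicative) norm."""
    return max(sum(abs(x) for x in row) for row in m)


def jsr_bounds(mats, n):
    """Lower and upper bounds on the joint spectral radius from all words of length n."""
    identity = ((1, 0), (0, 1))
    lower = upper = 0.0
    for word in product(mats, repeat=n):
        p = identity
        for m in word:
            p = matmul2(p, m)
        lower = max(lower, spectral_radius2(p) ** (1.0 / n))
        upper = max(upper, norm_inf2(p) ** (1.0 / n))
    return lower, upper


S1 = ((1, 1), (0, 1))
S2 = ((1, 0), (1, 1))
lower, upper = jsr_bounds([S1, S2], 2)
```

For this pair the length-2 word $S_1S_2$ already gives the lower bound $\sqrt{(3+\sqrt{5})/2}=(1+\sqrt{5})/2$, while the upper bound from the row-sum norm at $n=2$ is $\sqrt{3}$. The enumeration visits $K^n$ words, so it is only practical for short words.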

preprint2011arXiv

Stability Criteria via Common Non-strict Lyapunov Matrix for Discrete-time Linear Switched Systems

In this paper, we consider the stability of discrete-time linear switched systems with a common non-strict Lyapunov matrix.

preprint2011arXiv

Stability of time-varying nonlinear switching systems under perturbations

Using a Liao-type exponent, we study the stability of a time-varying nonlinear switching system.

57 published items

SAIL: Structure-Aware Interpretable Learning for Anatomy-Aligned Post-hoc Explanations in OCT

Applications of Large Scale Foundation Models for Autonomous Driving

A Deep-Learning Usability Expansion Model of Ocean Observations

A Multi-Characteristic Learning Method with Micro-Doppler Signatures for Pedestrian Identification

BSODA: A Bipartite Scalable Framework for Online Disease Diagnosis

Compositional Model Checking of Consensus Protocols Specified in TLA+ via Interaction-Preserving Abstraction

DeepGate: Learning Neural Representations of Logic Gates

DeepTPI: Test Point Insertion with Deep Reinforcement Learning

Detection Interval for Diffusion Molecular Communication: How Long is Enough?

Invariance entropy for uncertain control systems

LHNN: Lattice Hypergraph Neural Network for VLSI Congestion Prediction

Low Complexity First: Duration-Centric ISI Mitigation in Molecular Communication via Diffusion

MET: Model Checking-Driven Explorative Testing of CRDT Designs and Implementations

Modality Competition: What Makes Joint Training of Multi-modal Network Fail in Deep Learning? (Provably)

Provable Generalization of Overparameterized Meta-learning Trained with SGD

Radiologist-level Performance by Using Deep Learning for Segmentation of Breast Cancers on MRI Scans

Remove-Win: a Design Framework for Conflict-free Replicated Data Types

Verifying Transactional Consistency of MongoDB

ViSearch: Weak Consistency Measurement for Replicated Data Types

IEEE BigData 2021 Cup: Soft Sensing at Scale

Leaving My Fingerprints: Motivations and Challenges of Contributing to OSS for Social Good

Physics-informed Tensor-train ConvLSTM for Volumetric Velocity Forecasting of Loop Current

Regions of Attraction Estimation using Level Set Method for Complex Network System

Soft Sensing Transformer: Hundreds of Sensors are Worth a Single Word

A class of multiplicative splitting iterations for solving the continuous Sylvester equation

Autonomous Driving with Deep Learning: A Survey of State-of-Art Technologies

Diversifying Anonymized Data with Diversity Constraints

Equi-invariability, bounded invariance complexity and L-stability for control systems

Evolutionary Architecture Search for Graph Neural Networks

Fine-grained Analysis on Fast Implementations of Distributed Multi-writer Atomic Registers

Genetic Improvement @ ICSE 2020

GPU-Accelerated Compression and Visualization of Large-Scale Vessel Trajectories in Maritime IoT Industries

Preflare very long-periodic pulsations observed in Halpha emission before the onset of a solar flare

Privacy-Aware Data Cleaning-as-a-Service (Extended Version)

Gate-induced insulator to band-like transport transition in organolead halide perovskite

Quasi-periodic pulsations with periods that change depending on whether the pulsations have thermal or nonthermal components

Almost Strong Consistency: "Good Enough" in Distributed Storage Systems

Electric field induced strong enhancement of electroluminescence in multi-Layer MoS2

High Current Density Vertical Tunneling Transistors from Graphene/Highly-Doped Silicon Heterostructures

Understanding the Timed Distributed Trace of a Partially Synchronous System at Runtime

Wafer-scale growth of large arrays of perovskite microplate crystals for functional electronics and optoelectronics

Electroluminescence and photocurrent generation from atomically sharp WSe2/MoS2 heterojunction p-n diodes

Exponential stability of nonhomogeneous matrix-valued Markovian chains

Regional study of the Archean to Proterozoic crust at the Sudbury Neutrino Observatory (SNO+), Ontario: Predicting the geoneutrino flux

Reply to "Local Filtering Fundamentally Against Wide Spectrum"

A reference Earth model for the heat producing elements and associated geoneutrino flux

Chaotic Characteristic of Discrete-time Linear Inclusion Dynamical Systems

Chaotic Characteristics of Discrete-time Linear Inclusion Dynamical Systems

Enabling Context-awareness by Predicate Detection in Asynchronous Pervasive Computing Environments

Guided Neuronal Growth on Arrays of Biofunctionalized GaAs/InGaAs Semiconductor Microtubes

Pointwise Stabilization of Discrete-time Stationary Matrix-valued Markovian Chains

Verifying PRAM Consistency over Read/Write Traces of Data Replicas

Devaney's chaos revisited

Design of a Sliding Window over Asynchronous Event Streams

Extremal ergodic measures and the finiteness property of matrix semigroups

Stability Criteria via Common Non-strict Lyapunov Matrix for Discrete-time Linear Switched Systems

Stability of time-varying nonlinear switching systems under perturbations