Trust snapshot

Quick read

Trust 21 - Emerging, Verification L1, Unclaimed author
177 works
0 followers
57 topics
4 close collaborators

Published work

177 published item(s)

preprint, 2026, arXiv

Do Androids Dream of Breaking the Game? Systematically Auditing AI Agent Benchmarks with BenchJack

Agent benchmarks have become the de facto measure of frontier AI competence, guiding model selection, investment, and deployment. However, reward hacking, where agents maximize a score without performing the intended task, emerges spontaneously in frontier models without overfitting. We argue that benchmarks must be secure by design. From past incidents of reward hacks, we derive a taxonomy of eight recurring flaw patterns and compile them into the Agent-Eval Checklist for benchmark designers. We condense the insights into BenchJack, an automated red-teaming system that drives coding agents to audit benchmarks and identify possible reward-hacking exploits in a clairvoyant manner. Moreover, we extend BenchJack to an iterative generative-adversarial pipeline that discovers new flaws and patches them iteratively to improve benchmark robustness. We apply BenchJack to 10 popular agent benchmarks spanning software engineering, web navigation, desktop computing, and terminal operations. BenchJack synthesizes reward-hacking exploits that achieve near-perfect scores on most of the benchmarks without solving a single task, surfacing 219 distinct flaws across the eight classes. Moreover, BenchJack's extended pipeline reduces the hackable-task ratio from near 100% to under 10% on four benchmarks without fatal design flaws, fully patching WebArena and OSWorld within three iterations. Our results show that evaluation pipelines have not internalized an adversarial mindset, and that proactive auditing could help close the security gap for the fast-paced benchmarking space.

preprint, 2026, arXiv

MaMi-HOI: Harmonizing Global Kinematics and Local Geometry for Human-Object Interaction Generation

Generating realistic 3D Human-Object Interactions (HOI) is a fundamental task for applications ranging from embodied AI to virtual content creation, which requires harmonizing high-level semantic intent with strict low-level physical constraints. Existing methods excel at semantic alignment; however, they struggle to maintain precise object contact. We reveal a key finding termed Geometric Forgetting: as diffusion model depth increases, semantic features tend to overshadow object geometry features, causing the model to lose its perception of object geometry. To address this, we propose MaMi-HOI, a hierarchical framework reconciling Macro-level (Ma) kinematic fluidity with Micro-level (Mi) spatial precision. First, to counteract geometric forgetting, we introduce the Geometry-Aware Proximity Adapter (GAPA), which explicitly re-injects dense object details to perform residual snapping corrections for precise contact. Nevertheless, such aggressive local enforcement can disrupt global dynamics, leading to robotic stiffness. In response, we introduce the Kinematic Harmony Adapter (KHA), which proactively aligns whole-body posture with spatial objectives, ensuring the skeleton actively accommodates constraints without compromising naturalness. Extensive experiments validate that MaMi-HOI simultaneously achieves natural motion and precise contact. Crucially, it extends generation capabilities to long-term tasks with complex trajectories, effectively bridging the gap between global navigation and high-fidelity manipulation in 3D scenes. Code is available at https://github.com/DON738110198/MaMi-HOI.git

preprint, 2026, arXiv

Offline Policy Evaluation for Manipulation Policies via Discounted Liveness Formulation

Policy evaluation is a fundamental component of the development and deployment pipeline for robotic policies. In modern manipulation systems, this problem is particularly challenging: rewards are often sparse, the task progression of evaluation rollouts is often non-monotonic as the policies exhibit recovery behaviors, and evaluation rollouts are necessarily of finite length. This finite length introduces truncation bias, breaking the infinite-horizon assumptions underlying standard methods that rely on the Bellman equations or the principle of optimality. In this work, we propose a framework for offline policy evaluation from sparse rewards based on a liveness-based Bellman operator. Our formulation interprets policy evaluation as a task-completion problem and yields a conservative fixed-point value function that is robust to finite-horizon truncation. We analyze the theoretical properties of the proposed operator, including contraction guarantees, and show how it encodes task progression while mitigating truncation bias. We evaluate our method on two simulated manipulation tasks using both a Vision-Language-Action model and a diffusion policy, and on a cloth folding task using human demonstrations. Empirical results demonstrate that our approach more accurately reflects task progress and substantially reduces truncation bias, outperforming classical baselines such as TD(0) and Monte Carlo policy evaluation.

preprint, 2026, arXiv

Optimal Transport for LLM Reward Modeling from Noisy Preference

Reward models are fundamental to Reinforcement Learning from Human Feedback (RLHF), yet real-world datasets are inevitably corrupted by noisy preferences. Conventional training objectives tend to overfit these errors, while existing denoising approaches often rely on homogeneous noise assumptions that fail to capture the complexity of linguistic preferences. To handle these challenges, we propose SelectiveRM, a framework grounded in optimal transport. We first devise a Joint Consistency Discrepancy to align the distribution of model predictions with preference data. Furthermore, to address the limitation of strict mass conservation, which compels the model to fit outliers, we incorporate a Mass Relaxation mechanism via partial transport. This enables the autonomous exclusion of samples with noisy preferences that contradict semantic consistency. Theoretically, we demonstrate that SelectiveRM optimizes a tighter upper bound on the unobserved clean risk. Extensive experiments validate that our approach significantly outperforms state-of-the-art baselines across diverse benchmarks.

preprint, 2026, arXiv

Sparse Autoencoders as Plug-and-Play Firewalls for Adversarial Attack Detection in VLMs

Vision-language models (VLMs) have advanced rapidly and are increasingly deployed in real-world applications, especially with the rise of agent-based systems. However, their safety has received relatively limited attention. Even the latest proprietary and open-weight VLMs remain highly vulnerable to adversarial attacks, leaving downstream applications exposed to significant risks. In this work, we propose a novel and lightweight adversarial attack detection framework based on sparse autoencoders (SAEs), termed SAEgis. By inserting an SAE module into a pretrained VLM and training it with standard reconstruction objectives, we find that the learned sparse latent features naturally capture attack-relevant signals. These features enable reliable classification of whether an input image has been adversarially perturbed, even for previously unseen samples. Extensive experiments show that SAEgis achieves strong performance across in-domain, cross-domain, and cross-attack settings, with particularly large improvements in cross-domain generalization compared to existing baselines. In addition, combining signals from multiple layers further improves robustness and stability. To the best of our knowledge, this is the first work to explore SAE as a plug-and-play mechanism for adversarial attack detection in VLMs. Our method requires no additional adversarial training, introduces minimal overhead, and provides a practical approach for improving the safety of real-world VLM systems.

preprint, 2026, arXiv

StepCodeReasoner: Aligning Code Reasoning with Stepwise Execution Traces via Reinforcement Learning

Existing code reasoning methods primarily supervise final code outputs and ignore intermediate states, often leading to reward hacking where correct answers are obtained through inconsistent reasoning. We propose StepCodeReasoner, a framework that introduces explicit intermediate execution-state supervision. By automatically inserting structured print-based execution-trace anchors into code, the model is trained to predict runtime states at each step, transforming code reasoning into a verifiable, stepwise execution modeling problem. Building on this execution-aware method, we introduce Bi-Level GRPO, a reinforcement learning algorithm for structured credit assignment at two levels: inter-trajectory, comparing alternative execution paths, and intra-trajectory, rewarding intermediate accuracy based on its impact on downstream correctness. Extensive experiments demonstrate that StepCodeReasoner achieves SOTA performance in code reasoning. In particular, our 7B model achieves 91.1% on CRUXEval and 86.5% on LiveCodeBench, outperforming the CodeReasoner-7B baseline (86.0% and 77.7%) and GPT-4o (85.6% and 75.1%). Furthermore, on the execution-trace benchmark REval, our model scores 82.9%, outperforming the CodeReasoner-7B baseline (72.3%), its 14B counterpart (81.1%), and GPT-4o (77.3%). Our approach also improves code generation performance, demonstrating that explicit execution modeling enhances both code reasoning and code generation.

preprint, 2026, arXiv

TFZ-Tree: An Ultra-Lightweight Waveform Classification Framework for Resource-Constrained Devices

Under the trend of multi-waveform coexistence in 6G IoT, intelligent receivers must first identify physical-layer waveform types before performing correct demodulation and resource scheduling. However, existing signal identification research largely focuses on symbol-level modulation classification. Research directly targeting physical-layer waveform types (e.g., OFDM, OTFS, LoRa) is not only extremely scarce but also heavily reliant on deep neural networks and complex time-frequency transforms, making deployment on resource-constrained terminals difficult. Symbol modulation classification methods themselves cannot circumvent the prerequisite of "waveform identification first." To address this dual gap, we propose an ultra-lightweight waveform classification framework based on time-frequency multidimensional features with a cooperative Z-test tree (ZTree). The framework employs low-complexity time-domain feature extraction, and the classification backend adopts a ZTree optimized by Z-statistical testing, which uses hypothesis testing confidence to automatically control decision tree splitting and size, ensuring efficient execution on resource-limited processors. Tested on ten 6G candidate waveforms including OFDM, OTFS, DSSS, LoRa, and NB-IoT, the method achieves 99.5% average accuracy under AWGN and 87.4% under TDL-C multipath channels, with main confusion between OTFS and LoRa. Implemented in C on an x86 platform, single inference latency is under 4 ms. To the best of our knowledge, this is the first work achieving real-time recognition of ten IoT waveform types. Future work will target deployment acceleration on embedded MCUs. Code and dataset are open-sourced at: https://github.com/Einstein-sworder/IoT-wave.

preprint, 2026, arXiv

Zero-Shot Interpretable Image Steganalysis for Invertible Image Hiding

Image steganalysis, which aims at detecting secret information concealed within images, has become a critical countermeasure for assessing the security of steganography methods, especially the emerging invertible image hiding approaches. However, prior studies merely classify input images into two categories (i.e., stego or cover) and typically conduct steganalysis under the constraint that training and testing data must follow a similar distribution, thereby hindering their application in real-world scenarios. To overcome these shortcomings, we propose a novel interpretable image steganalysis framework tailored for invertible image hiding schemes under a challenging zero-shot setting. Specifically, we integrate image hiding, revealing, and steganalysis into a unified framework, endowing the steganalysis component with the capability to recover the secret information embedded in stego images. Additionally, we devise a simple yet effective residual augmentation strategy for generating stego images to further enhance the generalizability of the steganalyzer in cross-dataset and cross-architecture scenarios. Extensive experiments on benchmark datasets demonstrate that our proposed approach significantly outperforms existing steganalysis techniques for invertible image hiding schemes.

preprint, 2025, arXiv

Learning Coupled System Dynamics under Incomplete Physical Constraints and Missing Data

Advances in data acquisition and computational methods have accelerated the use of differential-equation-based modelling for complex systems. Such systems are often described by coupled variables (two or more), yet a governing equation is typically available for only one variable, while the remaining variables can be accessed only through data. This mismatch between known physics and observed data poses a fundamental challenge for existing physics-informed machine learning approaches, which generally assume either complete knowledge of the governing equations or full data availability across all variables. In this paper, we introduce MUSIC (Multitask Learning Under Sparse and Incomplete Constraints), a sparsity-induced multitask neural network framework that integrates partial physical constraints with data-driven learning to recover full-dimensional solutions of coupled systems when physics-constrained and data-informed variables are mutually exclusive. MUSIC employs mesh-free (random) sampling of training data and sparsity regularization, yielding highly compressed models with improved training and evaluation efficiency. We demonstrate that MUSIC accurately learns solutions (shock wave solutions, discontinuous solutions, pattern formation solutions) to complex coupled systems under data-scarce and noisy conditions, consistently outperforming non-sparse formulations. These results highlight MUSIC as a flexible and effective approach for modeling partially observed systems with incomplete physical knowledge.

preprint, 2024, arXiv

Pre-trained Recommender Systems: A Causal Debiasing Perspective

Recent studies on pre-trained vision/language models have demonstrated the practical benefit of a new, promising solution-building paradigm in AI where models can be pre-trained on broad data describing a generic task space and then adapted successfully to solve a wide range of downstream tasks, even when training data is severely limited (e.g., in zero- or few-shot learning scenarios). Inspired by such progress, we investigate in this paper the possibilities and challenges of adapting such a paradigm to the context of recommender systems, which has been less investigated from the perspective of pre-trained models. In particular, we propose to develop a generic recommender that captures universal interaction patterns by training on generic user-item interaction data extracted from different domains, which can then be quickly adapted to improve few-shot learning performance in unseen new domains (with limited data). However, unlike vision/language data, which share strong conformity in the semantic space, universal patterns underlying recommendation data collected across different domains (e.g., different countries or different e-commerce platforms) are often occluded by both in-domain and cross-domain biases implicitly imposed by the cultural differences in their user and item bases, as well as their uses of different e-commerce platforms. As shown in our experiments, such heterogeneous biases in the data tend to hinder the effectiveness of the pre-trained model. To address this challenge, we further introduce and formalize a causal debiasing perspective, which is substantiated via a hierarchical Bayesian deep learning model, named PreRec. Our empirical studies on real-world data show that the proposed model could significantly improve the recommendation performance in zero- and few-shot learning settings under both cross-market and cross-platform scenarios.

preprint, 2023, arXiv

Fair Recommendation by Geometric Interpretation and Analysis of Matrix Factorization

A matrix factorization-based recommender system is in effect an angle-preserving dimensionality reduction technique. Since the frequency of items follows a power-law distribution, most vectors in the original dimension of user feature vectors and item feature vectors lie on the same hyperplane. However, it is very difficult to reconstruct the embeddings in the original dimension analytically, so we reformulate the original angle-preserving dimensionality reduction problem into a distance-preserving dimensionality reduction problem. We show that the input data of a recommender system in its original higher dimension are distributed on concentric circles with interesting properties, and design a paraboloid-based matrix factorization named ParaMat to solve the recommendation problem. In the experiment section, we compare our algorithm with 8 other algorithms and prove that our new method is the fairest when compared with modern recommender systems such as ZeroMat and DotMat Hybrid.

preprint, 2023, arXiv

LARP: Language-Agent Role Play for Open-World Games

Language agents have shown impressive problem-solving skills within defined settings and brief timelines. Yet, with the ever-evolving complexities of open-world simulations, there's a pressing need for agents that can flexibly adapt to complex environments and consistently maintain a long-term memory to ensure coherent actions. To bridge the gap between language agents and open-world games, we introduce Language Agent for Role-Playing (LARP), which includes a cognitive architecture that encompasses memory processing and a decision-making assistant, an environment interaction module with a feedback-driven learnable action space, and a postprocessing method that promotes the alignment of various personalities. The LARP framework refines interactions between users and agents, predefined with unique backgrounds and personalities, ultimately enhancing the gaming experience in open-world contexts. Furthermore, it highlights the diverse uses of language models in a range of areas such as entertainment, education, and various simulation scenarios. The project page is released at https://miao-ai-lab.github.io/LARP/.

preprint, 2023, arXiv

Multi-stage feature decorrelation constraints for improving CNN classification performance

For a convolutional neural network (CNN) used for pattern classification, the training loss function is usually applied to the final output of the network, apart from some regularization constraints on the network parameters. However, as the number of network layers increases, the influence of the loss function on the front layers of the network gradually decreases, and the network parameters tend to fall into local optima. At the same time, it is found that the trained network has significant information redundancy at all stages of features, which reduces the effectiveness of feature mapping at all stages and hinders the subsequent parameters of the network from moving toward the optimum. Therefore, it is possible to obtain a better solution for the network and further improve its classification accuracy by designing a loss function that constrains the front-stage features and eliminates their information redundancy. For CNNs, this article proposes a multi-stage feature decorrelation loss (MFD Loss), which refines effective features and eliminates information redundancy by constraining the correlation of features at all stages. Considering that there are many layers in a CNN, and based on experimental comparison and analysis, MFD Loss acts on multiple front layers of the CNN, constrains the output features of each layer and each channel, and performs supervised training jointly with the classification loss function during network training. Compared with supervised learning using Softmax Loss alone, experiments on several commonly used datasets and several typical CNNs show that the classification performance of Softmax Loss+MFD Loss is significantly better. Meanwhile, comparison experiments before and after combining MFD Loss with some other typical loss functions verify its good universality.

preprint, 2023, arXiv

Real-Time High-Resolution Pedestrian Detection in Crowded Scenes via Parallel Edge Offloading

To identify dense and small-size pedestrians in surveillance systems, high-resolution cameras are widely deployed, where high-resolution images are captured and delivered to off-the-shelf pedestrian detection models. However, given the highly computation-intensive workload brought by the high resolution, the resource-constrained cameras fail to afford accurate inference in real time. To address that, we propose Hode, an offloaded video analytic framework that utilizes multiple edge nodes in proximity to expedite pedestrian detection with high-resolution inputs. Specifically, Hode can intelligently split high-resolution images into respective regions and then offload them to distributed edge nodes to perform pedestrian detection in parallel. A spatio-temporal flow filtering method is designed to enable context-aware region partitioning, as well as a DRL-based scheduling algorithm to allow accuracy-aware load balancing among heterogeneous edge nodes. Extensive evaluation results using realistic prototypes show that Hode can achieve up to 2.01× speedup with very mild accuracy loss.

preprint, 2023, arXiv

Switching between Numerical Black-box Optimization Algorithms with Warm-starting Policies

When solving optimization problems with black-box approaches, the algorithms gather valuable information about the problem instance during the optimization process. This information is used to adjust the distributions from which new solution candidates are sampled. In fact, a key objective in evolutionary computation is to identify the most effective ways to collect and exploit instance knowledge. However, while considerable work is devoted to adjusting hyper-parameters of black-box optimization algorithms on the fly or exchanging some of their modular components, we barely know how to effectively switch between different black-box optimization algorithms. In this work, we build on the recent study of Vermetten et al. [GECCO 2020], who presented a data-driven approach to investigate promising switches between pairs of algorithms for numerical black-box optimization. We replicate their approach with a portfolio of five algorithms and investigate whether the predicted performance gains are realized when executing the most promising switches. Our results suggest that with a single switch between two algorithms, we outperform the best static choice among the five algorithms on 48 out of the 120 considered problem instances, the 24 BBOB functions in five different dimensions. We also show that for switching between BFGS and CMA-ES, a proper warm-starting of the parameters is crucial to realize high-performance gains. Lastly, with a sensitivity analysis, we find the actual performance gain per run is largely affected by the switching point, and in some cases, the switching point yielding the best actual performance differs from the one computed from the theoretical gain.

preprint, 2023, arXiv

The Hypervolume Indicator Hessian Matrix: Analytical Expression, Computational Time Complexity, and Sparsity

The problem of approximating the Pareto front of a multiobjective optimization problem can be reformulated as the problem of finding a set that maximizes the hypervolume indicator. This paper establishes the analytical expression of the Hessian matrix of the mapping from a (fixed size) collection of $n$ points in the $d$-dimensional decision space (or $m$ dimensional objective space) to the scalar hypervolume indicator value. To define the Hessian matrix, the input set is vectorized, and the matrix is derived by analytical differentiation of the mapping from a vectorized set to the hypervolume indicator. The Hessian matrix plays a crucial role in second-order methods, such as the Newton-Raphson optimization method, and it can be used for the verification of local optimal sets. So far, the full analytical expression was only established and analyzed for the relatively simple bi-objective case. This paper will derive the full expression for arbitrary dimensions ($m\geq2$ objective functions). For the practically important three-dimensional case, we also provide an asymptotically efficient algorithm with time complexity in $O(n\log n)$ for the exact computation of the Hessian Matrix' non-zero entries. We establish a sharp bound of $12m-6$ for the number of non-zero entries. Also, for the general $m$-dimensional case, a compact recursive analytical expression is established, and its algorithmic implementation is discussed. Also, for the general case, some sparsity results can be established; these results are implied by the recursive expression. To validate and illustrate the analytically derived algorithms and results, we provide a few numerical examples using Python and Mathematica implementations. Open-source implementations of the algorithms and testing data are made available as a supplement to this paper.

preprint, 2022, arXiv

100 GHz Micrometer compact broadband Monolithic ITO Mach Zehnder Interferometer Modulator enabling 3500 times higher Packing Density

Electro-optic modulators provide a key function in optical transceivers and increasingly in photonic programmable Application Specific Integrated Circuits (ASICs) for machine learning and signal processing. However, both foundry-ready silicon-based modulators and conventional material-based devices utilizing lithium niobate fall short in simultaneously providing high chip packaging density and fast speed. Current-driven ITO-based modulators have the potential to achieve both, enabled by efficient light-matter interactions. Here, we introduce micrometer-compact Mach Zehnder Interferometer (MZI) based modulators capable of exceeding 100 GHz switching rates. Integrating ITO thin films atop a photonic waveguide yields a spectrally broadband and compact MZI phase shifter. Remarkably, this allows integrating more than 3500 of these modulators within the same chip area as only one single silicon MZI modulator. The modulator design introduced here features a holistic photonic, electronic, and RF-based optimization and includes an asymmetric MZI tuning step to optimize the Extinction Ratio (ER) to Insertion Loss (IL) and a dielectric thickness sweep to balance the tradeoffs between ER and speed. Driven by CMOS-compatible bias voltage levels, this device is the first to address next-generation modulator demands for processors of the machine intelligence revolution, in addition to edge and cloud computing demands as well as optical transceivers.

preprint, 2022, arXiv

3D inhomogeneous self-accelerating beams

We propose and generate a new class of structured light fulfilling quantum-like coherent states based on a set of circular Airy vortex modes. Such coherent-state wave packets possess strong focus with both radial and angular self-accelerations, which exploit more general 3D inhomogeneous velocity control with global spatial symmetry of multilayer rotation akin to galactic kinematics, as termed galaxy waves. Galaxy waves are endowed with new degrees of freedom to control strong focusing and acceleration of 3D structured light, promising numerous applications in optical trapping, manufacturing, and nonlinear optics.

preprint, 2022, arXiv

3D Path Planning and Obstacle Avoidance Algorithms for Obstacle-Overcoming Robots

This article introduces a multimodal motion planning (MMP) algorithm that combines three-dimensional (3-D) path planning and a DWA obstacle avoidance algorithm. The algorithms aim to plan the path and motion of obstacle-overcoming robots in complex unstructured scenes. A novel A-star algorithm is proposed to combine the characteristics of unstructured scenes and a strategy to switch it into a greedy best-first strategy algorithm. Meanwhile, the algorithm of path planning is integrated with the DWA algorithm so that the robot can perform local dynamic obstacle avoidance during the movement along the global planned path. Furthermore, when the proposed global path planning algorithm combines with the local obstacle avoidance algorithm, the robot can correct the path after obstacle avoidance and obstacle overcoming. The simulation experiments in a factory with several complex environments verified the feasibility and robustness of the algorithms. The algorithms can quickly generate a reasonable 3-D path for obstacle-overcoming robots and perform reliable local obstacle avoidance under the premise of considering the characteristics of the scene and motion obstacles.

preprint, 2022, arXiv

A Fisher-KPP model with a nonlocal weighted free boundary: analysis of how habitat boundaries expand, balance or shrink

In this paper, we propose a novel free boundary problem to model the movement of a single species with a range boundary. The spatial movement and birth/death processes of the species found within the range boundary are assumed to be governed by the classic Fisher-KPP reaction-diffusion equation, while the movement of a free boundary describing the range limit is assumed to be influenced by the weighted total population inside the range boundary and is described by an integro-differential equation. Our free boundary equation is a generalization of the classical Stefan problem that allows for nonlocal influences on the boundary movement so that range expansion and shrinkage are both possible. We prove that the new model is well posed and possesses a steady state. We show that the spreading speed of the range boundary is smaller than that for the equivalent problem with a Stefan condition. This implies that the nonlocal effect of the weighted total population on the boundary movement slows down the spreading speed of the population. While the classical Stefan condition categorizes asymptotic behavior via a spreading-vanishing dichotomy, the new model extends this dichotomy to a spreading-balancing-vanishing trichotomy. We specifically analyze how habitat boundaries expand, balance or shrink. When the model is extended to have two free boundaries, we observe the steady state scenario, asymmetric shifts, or even boundaries moving synchronously in the same direction. These are newly discovered phenomena in free boundary problems for animal movement.

preprint, 2022, arXiv

A hypothesis-free bridging of disease dynamics and non-pharmaceutical policies

Accurate prediction of the number of daily or weekly confirmed cases of COVID-19 is critical to the control of the pandemic. Existing mechanistic models nicely capture the disease dynamics. However, to forecast the future, they require the transmission rate to be known, limiting their prediction power. Typically, a hypothesis is made on the form of the transmission rate with respect to time. Yet the real form is too complex to be mechanistically modeled due to the unknown dynamics of many influential factors. We tackle this problem by using a hypothesis-free machine-learning algorithm to estimate the transmission rate from data on non-pharmaceutical policies, and in turn forecast the confirmed cases using a mechanistic disease model. More specifically, we build a hybrid model consisting of a mechanistic ordinary differential equation (ODE) model and a generalized boosting model (GBM). To calibrate the parameters, we develop an "inverse method" that obtains the transmission rate inversely in time from the other variables in the ODE model and then feeds it into the GBM to connect with the policy data. The resulting model forecasted the number of daily confirmed cases up to 35 days in the future in the United States with an averaged mean absolute percentage error of 27%. Being partly data-driven, the method is more accurate than typical mechanistic models and meanwhile more intuitive, and possibly more reliable, than purely data-based machine learning models. Moreover, it can identify the most informative predictive variables, which can be helpful in designing improved forecasters as well as informing policymakers.

preprint2022arXiv

A physical perspective to understand myelin. I. Peters quadrant mystery

In the development of oligodendrocytes in the central nervous system, the inner and outer tongues of the myelin sheath tend to be located within the same quadrant, a phenomenon known as the Peters quadrant mystery. In this study, we conduct in silico investigations to explore the possible mechanisms underlying the Peters quadrant mystery. A biophysically detailed model of oligodendrocytes was used to simulate the effect of the action potential-induced electric field across the myelin sheath. Our simulation suggests that the paranodal channel connecting the inner and outer tongues forms a low-impedance route, inducing two high-current zones in the areas around the inner and outer tongues. When the inner tongue and outer tongue are located within the same quadrant, the interaction of these two high-current zones induces a maximum amplitude and a polarity reversal of the voltage upon the inner tongue, resulting in the same-quadrant phenomenon. This model indicates that the growth of myelin follows a simple principle: an external negative or positive E-field can promote or inhibit the growth of the inner tongue, respectively.

preprint2022arXiv

A physical perspective to understand myelin. II. The physical origin of myelin development

The physical principle of myelin development is obtained from our previous study by explaining the Peters quadrant mystery: an externally applied negative or positive E-field can promote or inhibit the growth of the inner tongue of the myelin sheath, respectively. In this study, this principle is treated as a fundamental hypothesis, named Hypothesis-E, to systematically explain more phenomena in myelin development. Specifically, the g-ratio and the fate of the Schwann cell's differentiation are explained in terms of the E-field. Moreover, an experiment is proposed to validate this theory.

preprint2022arXiv

A Screening Strategy for Structured Optimization Involving Nonconvex $\ell_{q,p}$ Regularization

In this paper, we develop a simple yet effective screening rule strategy to improve the computational efficiency in solving structured optimization involving nonconvex $\ell_{q,p}$ regularization. Based on an iteratively reweighted $\ell_1$ (IRL1) framework, the proposed screening rule works like a preprocessing module that potentially removes the inactive groups before starting the subproblem solver, thereby reducing the computational time in total. This is mainly achieved by heuristically exploiting the dual subproblem information during each iteration. Moreover, we prove that our screening rule can remove all inactive variables in a finite number of iterations of the IRL1 method. Numerical experiments illustrate the efficiency of our screening rule strategy compared with several state-of-the-art algorithms.

preprint2022arXiv

ABG: A Multi-Party Mixed Protocol Framework for Privacy-Preserving Cooperative Learning

Cooperative learning, which enables two or more data owners to jointly train a model, has been widely adopted to solve the problem of insufficient training data in machine learning. Nowadays, there is an urgent need for institutions and organizations to train a model cooperatively while keeping each other's data private. To address the issue of privacy preservation in collaborative learning, secure outsourced computation and federated learning are two typical methods. Nevertheless, both methods have many drawbacks when they are applied to cooperative learning. For secure outsourced computation, semi-honest servers need to be introduced. Once the outsourced servers collude or perform other active attacks, the privacy of the data will be compromised. Federated learning is difficult to apply to scenarios where vertically partitioned data are distributed over multiple parties. In this work, we propose a multi-party mixed protocol framework, ABG$^n$, which effectively implements arbitrary conversion between Arithmetic sharing (A), Boolean sharing (B) and Garbled-Circuits sharing (G) for $n$-party scenarios. Based on ABG$^n$, we design a privacy-preserving multi-party cooperative learning system, which allows different data owners to cooperate in machine learning while preserving data security and privacy. Additionally, we design specific privacy-preserving computation protocols for some typical machine learning methods such as logistic regression and neural networks. Compared with previous work, the proposed method has a wider scope of application and does not need to rely on additional servers. Finally, we evaluate the performance of ABG$^n$ in the local setting and in the public cloud setting. The experiments indicate that ABG$^n$ has excellent performance, especially in network environments with low latency.

preprint2022arXiv

Accelerating Serverless Computing by Harvesting Idle Resources

Serverless computing automates fine-grained resource scaling and simplifies the development and deployment of online services with stateless functions. However, it is still non-trivial for users to allocate appropriate resources due to various function types, dependencies, and input sizes. Misconfiguration of resource allocations leaves functions either under-provisioned or over-provisioned and leads to continuously low resource utilization. This paper presents Freyr, a new resource manager (RM) for serverless platforms that maximizes resource efficiency by dynamically harvesting idle resources from over-provisioned functions and reassigning them to under-provisioned functions. Freyr monitors each function's resource utilization in real time, detects over-provisioning and under-provisioning, and learns to harvest idle resources safely and accelerate functions efficiently by applying deep reinforcement learning algorithms along with a safeguard mechanism. We have implemented and deployed a Freyr prototype in a 13-node Apache OpenWhisk cluster. Experimental results show that 38.8% of function invocations have idle resources harvested by Freyr, and 39.2% of invocations are accelerated by the harvested resources. Freyr reduces the 99th-percentile function response latency by 32.1% compared to the baseline RMs.

preprint2022arXiv

Adaptive 3D descattering with a dynamic synthesis network

Deep learning has been broadly applied to imaging in scattering applications. A common framework is to train a descattering network for image recovery by removing scattering artifacts. To achieve the best results on a broad spectrum of scattering conditions, individual "expert" networks need to be trained for each condition. However, the expert's performance sharply degrades when the testing condition differs from the training. An alternative brute-force approach is to train a "generalist" network using data from diverse scattering conditions. It generally requires a larger network to encapsulate the diversity in the data and a sufficiently large training set to avoid overfitting. Here, we propose an adaptive learning framework, termed dynamic synthesis network (DSN), which dynamically adjusts the model weights and adapts to different scattering conditions. The adaptability is achieved by a novel "mixture of experts" architecture that enables dynamically synthesizing a network by blending multiple experts using a gating network. We demonstrate the DSN in holographic 3D particle imaging for a variety of scattering conditions. We show in simulation that our DSN provides generalization across a continuum of scattering conditions. In addition, we show that by training the DSN entirely on simulated data, the network can generalize to experiments and achieve robust 3D descattering. We expect the same concept can find many other applications, such as denoising and imaging in scattering media. Broadly, our dynamic synthesis framework opens up a new paradigm for designing highly adaptive deep learning and computational imaging techniques.
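
The "mixture of experts" idea described above, synthesizing one network by blending several experts with coefficients from a gating network, can be sketched in a few lines. This is a toy illustration under stated assumptions: the blend is taken to be a softmax-weighted average of expert weight vectors, and the names `softmax` and `synthesize_weights` are hypothetical. The abstract does not specify the exact blending rule or the gating inputs.

```python
import math


def softmax(scores):
    """Convert raw gating scores into nonnegative coefficients that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def synthesize_weights(expert_weights, gate_scores):
    """Blend per-expert weight vectors into a single weight vector.

    expert_weights: one flat weight list per expert (all the same length).
    gate_scores: one raw score per expert, standing in for the output of a
    gating network (assumed here, not specified by the abstract).
    """
    coeffs = softmax(gate_scores)
    size = len(expert_weights[0])
    return [
        sum(c * w[i] for c, w in zip(coeffs, expert_weights))
        for i in range(size)
    ]


# Two toy "experts"; equal gate scores give an even blend.
experts = [[1.0, 0.0], [0.0, 1.0]]
print(synthesize_weights(experts, [0.0, 0.0]))
```

As the gate scores shift toward one expert, the synthesized weights move continuously toward that expert's weights, which is one way to read the abstract's claim of generalization across a continuum of conditions.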

preprint2022arXiv

Adversarial samples for deep monocular 6D object pose estimation

Estimating 6D object pose from an RGB image is important for many real-world applications such as autonomous driving and robotic grasping. Recent deep learning models have achieved significant progress on this task, but their robustness has received little research attention. In this work, for the first time, we study adversarial samples that can fool deep learning models with imperceptible perturbations to the input image. In particular, we propose a Unified 6D pose estimation Attack, namely U6DA, which can successfully attack several state-of-the-art (SOTA) deep learning models for 6D pose estimation. The key idea of our U6DA is to fool the models into predicting wrong results for object instance localization and shape, which are essential for correct 6D pose estimation. Specifically, we explore a transfer-based black-box attack on 6D pose estimation. We design the U6DA loss to guide the generation of adversarial examples; the loss aims to shift the segmentation attention map away from its original position. We show that the generated adversarial samples are not only effective against direct 6D pose estimation models, but are also able to attack two-stage models despite their robust RANSAC modules. Extensive experiments were conducted to demonstrate the effectiveness, transferability, and anti-defense capability of our U6DA on large-scale public benchmarks. We also introduce a new U6DA-Linemod dataset for robustness studies of the 6D pose estimation task. Our code and dataset will be available at https://github.com/cuge1995/U6DA.

preprint2022arXiv

Analyzing the Impact of Undersampling on the Benchmarking and Configuration of Evolutionary Algorithms

The stochastic nature of iterative optimization heuristics leads to inherently noisy performance measurements. Since these measurements are often gathered once and then used repeatedly, the number of collected samples will have a significant impact on the reliability of algorithm comparisons. We show that care should be taken when making decisions based on limited data. Particularly, we show that the number of runs used in many benchmarking studies, e.g., the default value of 15 suggested by the COCO environment, can be insufficient to reliably rank algorithms on well-known numerical optimization benchmarks. Additionally, methods for automated algorithm configuration are sensitive to insufficient sample sizes. This may result in the configurator choosing a 'lucky' but poor-performing configuration despite exploring better ones. We show that relying on mean performance values, as many configurators do, can require a large number of runs to provide accurate comparisons between the considered configurations. Common statistical tests can greatly improve the situation in most cases but not always. We show examples of performance losses of more than 20%, even when using statistical races to dynamically adjust the number of runs, as done by irace. Our results underline the importance of appropriately considering the statistical distribution of performance values.

preprint2022arXiv

Application of Color Block Code in Image Scaling

To address the high cost of embedding an annotation watermark in a small, narrow area and the information distortion caused by changes in the resolution of the annotation watermark image, this paper proposes a color block code technology. It uses location information and color codes to form recognizable graphics, which both simplifies the annotation graphics and preserves recognition efficiency. First, the constituent elements of the color block code are designed, and then the coding and decoding methods of the color block code are proposed. Experiments show that the color block code is highly resistant to scaling and interference, and can be widely used for labeling small object surfaces and low-resolution images.

preprint2022arXiv

Automated Configuration of Genetic Algorithms by Tuning for Anytime Performance

Finding the best configuration of algorithms' hyperparameters for a given optimization problem is an important task in evolutionary computation. We compare in this work the results of four different hyperparameter tuning approaches for a family of genetic algorithms on 25 diverse pseudo-Boolean optimization problems. More precisely, we compare previously obtained results from a grid search with those obtained from three automated configuration techniques: iterated racing, mixed-integer parallel efficient global optimization, and mixed-integer evolutionary strategies. Using two different cost metrics, expected running time and the area under the empirical cumulative distribution function curve, we find that in several cases the best configurations with respect to expected running time are obtained when using the area under the empirical cumulative distribution function curve as the cost metric during the configuration process. Our results suggest that even when interested in expected running time performance, it might be preferable to use anytime performance measures for the configuration task. We also observe that tuning for expected running time is much more sensitive with respect to the budget that is allocated to the target algorithms.

preprint2022arXiv

Beyond 3DMM: Learning to Capture High-fidelity 3D Face Shape

3D Morphable Model (3DMM) fitting has widely benefited face analysis due to its strong 3D prior. However, previously reconstructed 3D faces suffer from degraded visual verisimilitude due to the loss of fine-grained geometry, which is attributed to insufficient ground-truth 3D shapes, unreliable training strategies and the limited representation power of 3DMM. To alleviate this issue, this paper proposes a complete solution to capture the personalized shape so that the reconstructed shape looks identical to the corresponding person. Specifically, given a 2D image as the input, we virtually render the image in several calibrated views to normalize pose variations while preserving the original image geometry. A many-to-one hourglass network serves as the encoder-decoder to fuse multiview features and generate vertex displacements as the fine-grained geometry. In addition, the neural network is trained by directly optimizing the visual effect, where two 3D shapes are compared by measuring the similarity between the multiview images rendered from the shapes. Finally, we propose to generate the ground-truth 3D shapes by registering RGB-D images followed by pose and shape augmentation, providing sufficient data for network training. Experiments on several challenging protocols demonstrate the superior reconstruction accuracy of our proposal on the face shape.

preprint2022arXiv

Beyond Adult and COMPAS: Fairness in Multi-Class Prediction

We consider the problem of producing fair probabilistic classifiers for multi-class classification tasks. We formulate this problem in terms of "projecting" a pre-trained (and potentially unfair) classifier onto the set of models that satisfy target group-fairness requirements. The new, projected model is given by post-processing the outputs of the pre-trained classifier by a multiplicative factor. We provide a parallelizable iterative algorithm for computing the projected classifier and derive both sample complexity and convergence guarantees. Comprehensive numerical comparisons with state-of-the-art benchmarks demonstrate that our approach maintains competitive performance in terms of accuracy-fairness trade-off curves, while achieving favorable runtime on large datasets. We also evaluate our method at scale on an open dataset with multiple classes, multiple intersectional protected groups, and over 1M samples.
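
The post-processing step described above, adjusting a pre-trained classifier's outputs by a multiplicative factor, can be illustrated with a toy sketch. Everything specific here is an assumption: the function name `reweight` is hypothetical, the factors are hand-picked rather than computed by the paper's iterative algorithm, and renormalizing to a probability vector is a choice made for this example.

```python
def reweight(probs, factors):
    """Scale class probabilities by per-class factors, then renormalize."""
    if len(probs) != len(factors):
        raise ValueError("probs and factors must have the same length")
    scaled = [p * f for p, f in zip(probs, factors)]
    total = sum(scaled)
    if total <= 0:
        raise ValueError("scaled probabilities must have positive mass")
    return [s / total for s in scaled]


# Hypothetical 3-class output of a pre-trained classifier for one sample,
# and hypothetical factors (in practice these would depend on the fairness
# constraints and possibly on the sample's group).
probs = [0.7, 0.2, 0.1]
factors = [0.5, 1.0, 2.0]
print(reweight(probs, factors))
```

The adjusted output is still a valid probability vector, so downstream code that consumes the original classifier's outputs needs no changes.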

preprint2022arXiv

Causal Transportability for Visual Recognition

Visual representations underlie object recognition tasks, but they often contain both robust and non-robust features. Our main observation is that image classifiers may perform poorly on out-of-distribution samples because spurious correlations between non-robust features and labels can be changed in a new environment. By analyzing procedures for out-of-distribution generalization with a causal graph, we show that standard classifiers fail because the association between images and labels is not transportable across settings. However, we then show that the causal effect, which severs all sources of confounding, remains invariant across domains. This motivates us to develop an algorithm to estimate the causal effect for image classification, which is transportable (i.e., invariant) across source and target environments. Without observing additional variables, we show that we can derive an estimand for the causal effect under empirical assumptions using representations in deep models as proxies. Theoretical analysis, empirical results, and visualizations show that our approach captures causal invariances and improves overall generalization.

preprint2022arXiv

Colorful Optical Vortices with White Light Illumination

The orbital angular momentum (OAM) of light holds great promise for applications in optical communication, super-resolution imaging, and high-dimensional quantum computing. However, the spatio-temporal coherence of the light source has been essential for generating OAM beams, as incoherent ambient light would result in polychromatic and obscured OAM beams in the visible spectrum. Here, we extend the applications of OAM to ambient lighting conditions. By miniaturizing spiral phase plates and integrating them with structural color filters, we achieve spatio-temporal coherence using only an incoherent white light source. These optical elements act as building blocks that encode both color and OAM information in the form of colorful optical vortices. Thus, pairs of transparent substrates that contain matching positions of these vortices constitute a reciprocal optical lock and key system. Due to the multiple helical eigenstates of OAM, the pairwise coupling can be further extended to form a one-to-many matching and validation scheme. Generating and decoding colorful optical vortices with broadband white light could find potential applications in anti-counterfeiting, optical metrology, high-capacity optical encryption, and on-chip 3D photonic devices.

preprint2022arXiv

Constrained Optimization Involving Nonconvex $\ell_p$ Norms: Optimality Conditions, Algorithm and Convergence

This paper investigates the optimality conditions for characterizing the local minimizers of constrained optimization problems involving an $\ell_p$ norm ($0<p<1$) of the variables, which may appear in either the objective or the constraint. Problems of this kind are applicable to a wide range of areas, since the $\ell_p$ norm usually promotes sparse solutions. However, the nonsmooth and non-Lipschitz nature of the $\ell_p$ norm often makes these problems difficult to analyze and solve. We provide the calculation of the subgradients of the $\ell_p$ norm and the normal cones of the $\ell_p$ ball. For both problems, we derive the first-order necessary conditions under various constraint qualifications. We also derive the sequential optimality conditions for both problems and study the conditions under which these conditions imply the first-order necessary conditions. We point out that the sequential optimality conditions can be easily satisfied for iteratively reweighted algorithms and show that the global convergence can be easily derived using sequential optimality conditions.
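
As a reminder of why the $\ell_p$ term with $0<p<1$ is non-Lipschitz, the definitions can be written out as below. The symbols $f$, $g$, $\lambda$ and $r$ are notation assumed here for the two problem classes the abstract mentions (the $\ell_p$ term in the objective or in the constraint); the paper's own formulation may differ.

```latex
\|x\|_p^p = \sum_{i=1}^{n} |x_i|^p, \qquad 0 < p < 1,
\\[4pt]
\text{(objective)}\quad \min_{x}\ f(x) + \lambda \|x\|_p^p \ \ \text{s.t.}\ g(x) \le 0,
\qquad
\text{(constraint)}\quad \min_{x}\ f(x) \ \ \text{s.t.}\ \|x\|_p^p \le r,
\\[4pt]
\frac{d}{dt}\,|t|^p = p\,|t|^{p-1}\operatorname{sign}(t) \quad (t \neq 0),
\qquad
\bigl|p\,|t|^{p-1}\bigr| \to \infty \ \text{ as } t \to 0 .
```

Since $p - 1 < 0$, the slope of $|t|^p$ is unbounded near zero, so no Lipschitz constant exists on any neighborhood of a point with a zero component. Those are exactly the sparse points of interest.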

preprint2022arXiv

Context Uncertainty in Contextual Bandits with Applications to Recommender Systems

Recurrent neural networks have proven effective in modeling sequential user feedback for recommender systems. However, they usually focus solely on item relevance and fail to effectively explore diverse items for users, thereby harming system performance in the long run. To address this problem, we propose a new type of recurrent neural network, dubbed recurrent exploration networks (REN), to jointly perform representation learning and effective exploration in the latent space. REN tries to balance relevance and exploration while taking into account the uncertainty in the representations. Our theoretical analysis shows that REN can preserve the rate-optimal sublinear regret even when there exists uncertainty in the learned representations. Our empirical study demonstrates that REN can achieve satisfactory long-term rewards on both synthetic and real-world recommendation datasets, outperforming state-of-the-art models.

preprint2022arXiv

Coverage and Capacity Optimization in STAR-RISs Assisted Networks: A Machine Learning Approach

Coverage and capacity are important metrics for performance evaluation in wireless networks, but they have several conflicting relationships; e.g., high transmit power contributes to large coverage, but the resulting high inter-cell interference reduces capacity. Therefore, in order to strike a balance between coverage and capacity, a novel model is proposed for the coverage and capacity optimization of simultaneously transmitting and reflecting reconfigurable intelligent surfaces (STAR-RISs) assisted networks. To solve the coverage and capacity optimization (CCO) problem, a machine learning-based multi-objective optimization algorithm, i.e., the multi-objective proximal policy optimization (MO-PPO) algorithm, is proposed. The core of this algorithm is a loss function-based update strategy, which calculates weights for the coverage and capacity loss functions with a min-norm solver at each update. The numerical results demonstrate that the investigated update strategy outperforms fixed weight-based MO algorithms.
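
For two objectives, a min-norm problem of the kind mentioned above has a closed-form solution. The sketch below assumes the solver operates on the gradient vectors of the two losses and finds the convex combination with the smallest norm; the function name `min_norm_weights` and the toy gradients are hypothetical, and the paper's actual solver may be set up differently.

```python
def min_norm_weights(g_cov, g_cap):
    """Return (alpha, 1 - alpha) minimizing
    ||alpha * g_cov + (1 - alpha) * g_cap||^2 over alpha in [0, 1]."""
    diff = [a - b for a, b in zip(g_cov, g_cap)]
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        # Identical gradients: every weighting gives the same vector.
        return 0.5, 0.5
    # Unconstrained minimizer: (g_cap - g_cov) . g_cap / ||g_cov - g_cap||^2
    alpha = sum(-d * b for d, b in zip(diff, g_cap)) / denom
    # The objective is a convex quadratic in alpha, so clipping to [0, 1]
    # yields the constrained minimizer.
    alpha = min(1.0, max(0.0, alpha))
    return alpha, 1.0 - alpha


# Orthogonal toy gradients of equal length are weighted equally.
print(min_norm_weights([1.0, 0.0], [0.0, 1.0]))
# Aligned gradients: all weight goes to the shorter one.
print(min_norm_weights([1.0, 0.0], [2.0, 0.0]))
```

Recomputing the weights at every update, rather than fixing them in advance, is what distinguishes this strategy from the fixed-weight baselines the abstract compares against.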

preprint2022arXiv

Cross-Modal Graph with Meta Concepts for Video Captioning

Video captioning targets interpreting the complex visual contents as text descriptions, which requires the model to fully understand video scenes including objects and their interactions. Prevailing methods adopt off-the-shelf object detection networks to give object proposals and use the attention mechanism to model the relations between objects. They often miss some undefined semantic concepts of the pretrained model and fail to identify exact predicate relationships between objects. In this paper, we investigate an open research task of generating text descriptions for the given videos, and propose Cross-Modal Graph (CMG) with meta concepts for video captioning. Specifically, to cover the useful semantic concepts in video captions, we weakly learn the corresponding visual regions for text descriptions, where the associated visual regions and textual words are named cross-modal meta concepts. We further build meta concept graphs dynamically with the learned cross-modal meta concepts. We also construct holistic video-level and local frame-level video graphs with the predicted predicates to model video sequence structures. We validate the efficacy of our proposed techniques with extensive experiments and achieve state-of-the-art results on two public datasets.

preprint2022arXiv

Decomposing Generation Networks with Structure Prediction for Recipe Generation

Recipe generation from food images and ingredients is a challenging task, which requires the interpretation of the information from another modality. Different from the image captioning task, where the captions usually have one sentence, cooking instructions contain multiple sentences and have obvious structures. To help the model capture the recipe structure and avoid missing some cooking details, we propose a novel framework: Decomposing Generation Networks (DGN) with structure prediction, to get more structured and complete recipe generation outputs. Specifically, we split each cooking instruction into several phases, and assign different sub-generators to each phase. Our approach includes two novel ideas: (i) learning the recipe structures with the global structure prediction component and (ii) producing recipe phases in the sub-generator output component based on the predicted structure. Extensive experiments on the challenging large-scale Recipe1M dataset validate the effectiveness of our proposed model, which improves the performance over the state-of-the-art results.

preprint2022arXiv

Deep Learning Serves Traffic Safety Analysis: A Forward-looking Review

This paper explores Deep Learning (DL) methods that are used or have the potential to be used for traffic video analysis, emphasizing driving safety for both Autonomous Vehicles (AVs) and human-operated vehicles. We present a typical processing pipeline, which can be used to understand and interpret traffic videos by extracting operational safety metrics and providing general hints and guidelines to improve traffic safety. This processing framework includes several steps: video enhancement, video stabilization, semantic and incident segmentation, object detection and classification, trajectory extraction, speed estimation, event analysis, modeling and anomaly detection. Our main goal is to guide traffic analysts in developing their own custom-built processing frameworks by selecting the best choices for each step and offering new designs for the missing modules, supported by a comparative analysis of the most successful conventional and DL-based algorithms proposed for each step. We also review existing open-source tools and public datasets that can help train DL models. More specifically, we review exemplary traffic problems and mention the required steps for each problem. In addition, we investigate connections to the closely related research areas of drivers' cognition evaluation, crowd-sourcing-based monitoring systems, Edge Computing in roadside infrastructures, and Automated Driving Systems (ADS)-equipped vehicles, and highlight the gaps. Finally, we review commercial implementations of traffic monitoring systems, their future outlook, and the open problems and remaining challenges for widespread use of such systems.

preprint2022arXiv

Deep Reinforcement Learning for Optimal Power Flow with Renewables Using Graph Information

Renewable energy resources (RERs) have been increasingly integrated into large-scale distributed power systems. Considering uncertainties and voltage fluctuation issues introduced by RERs, in this paper, we propose a deep reinforcement learning (DRL)-based strategy leveraging spatial-temporal (ST) graphical information of power systems, to dynamically search for the optimal operation, i.e., optimal power flow (OPF), of power systems with a high uptake of RERs. Specifically, we formulate the OPF problem as a multi-objective optimization problem considering generation cost, voltage fluctuation, and transmission loss, and employ deep deterministic policy gradient (DDPG) to learn an optimal allocation strategy for OPF. Moreover, given that the nodes in power systems are self-correlated and interrelated in temporal and spatial views, we develop a multi-grained attention-based spatial-temporal graph convolution network (MG-ASTGCN) for extracting ST graphical correlations and features, aiming to provide prior knowledge of power systems for its sequential DDPG algorithm to more effectively solve OPF. We validate our algorithm on modified IEEE 33, 69, and 118-bus radial distribution systems and demonstrate that our algorithm outperforms other benchmark algorithms. Our experimental results also reveal that our MG-ASTGCN can significantly accelerate DDPG's training process and performance in solving OPF.

preprint2022arXiv

DH-GAN: A Physics-driven Untrained Generative Adversarial Network for 3D Microscopic Imaging using Digital Holography

Digital holography is a 3D imaging technique that emits a laser beam with a plane wavefront onto an object and measures the intensity of the diffracted waveform, called a hologram. The object's 3D shape can be obtained by numerical analysis of the captured holograms and recovery of the incurred phase. Recently, deep learning (DL) methods have been used for more accurate holographic processing. However, most supervised methods require large datasets to train the model, which are rarely available in most DH applications due to the scarcity of samples or privacy concerns. A few one-shot DL-based recovery methods exist that do not rely on large datasets of paired images. Still, most of these methods neglect the underlying physical law that governs wave propagation. These methods offer a black-box operation, which is not explainable, generalizable, or transferable to other samples and applications. In this work, we propose a new DL architecture based on generative adversarial networks that uses a discriminative network to realize a semantic measure of reconstruction quality while using a generative network as a function approximator to model the inverse of hologram formation. We impose smoothness on the background part of the recovered image using a progressive masking module powered by simulated annealing to enhance the reconstruction quality. The proposed method is one of a kind in exhibiting high transferability to similar samples, which facilitates its fast deployment in time-sensitive applications without the need for retraining the network. The results show a considerable improvement over competing methods in reconstruction quality (about 5 dB PSNR gain) and robustness to noise (about 50% reduction in PSNR vs noise increase rate).

preprint2022arXiv

DIP: Deep Inverse Patchmatch for High-Resolution Optical Flow

Recently, the dense correlation volume method has achieved state-of-the-art performance in optical flow. However, the correlation volume computation requires a lot of memory, which makes prediction difficult on high-resolution images. In this paper, we propose a novel Patchmatch-based framework for high-resolution optical flow estimation. Specifically, we introduce the first end-to-end Patchmatch-based deep learning method for optical flow. It obtains high-precision results with lower memory, benefiting from the propagation and local search of Patchmatch. Furthermore, a new inverse propagation is proposed to decouple the complex operations of propagation, which can significantly reduce calculations over multiple iterations. At the time of submission, our method ranks first on all the metrics on the popular KITTI2015 benchmark, and ranks second on EPE on the Sintel clean benchmark among published optical flow methods. Experiments show that our method has a strong cross-dataset generalization ability: F1-all reaches 13.73%, a 21% reduction from the best published result of 17.4% on KITTI2015. What's more, our method preserves details well on the high-resolution dataset DAVIS and consumes 2x less memory than RAFT.

preprint2022arXiv

Domain Adaptation for Time Series Forecasting via Attention Sharing

Recently, deep neural networks have gained increasing popularity in the field of time series forecasting. A primary reason for their success is their ability to effectively capture complex temporal dynamics across multiple related time series. The advantages of these deep forecasters only start to emerge in the presence of a sufficient amount of data. This poses a challenge for typical forecasting problems in practice, where there is a limited number of time series or observations per time series, or both. To cope with this data scarcity issue, we propose a novel domain adaptation framework, Domain Adaptation Forecaster (DAF). DAF leverages statistical strengths from a relevant domain with abundant data samples (source) to improve the performance on the domain of interest with limited data (target). In particular, we use an attention-based shared module with a domain discriminator across domains and private modules for individual domains. We induce domain-invariant latent features (queries and keys) and retrain domain-specific features (values) simultaneously to enable joint training of forecasters on source and target domains. A main insight is that our design of aligning keys allows the target domain to leverage source time series even with different characteristics. Extensive experiments on various domains demonstrate that our proposed method outperforms state-of-the-art baselines on synthetic and real-world datasets, and ablation studies verify the effectiveness of our design choices.

preprint2022arXiv

Domain Adaptation with Factorizable Joint Shift

Existing domain adaptation (DA) usually assumes the domain shift comes from either the covariates or the labels. However, in real-world applications, samples selected from different domains could have biases in both the covariates and the labels. In this paper, we propose a new assumption, Factorizable Joint Shift (FJS), to handle the co-existence of sampling bias in covariates and labels. Although allowing for the shift from both sides, FJS assumes the independence of the bias between the two factors. We provide theoretical and empirical understandings about when FJS degenerates to prior assumptions and when it is necessary. We further propose Joint Importance Aligning (JIA), a discriminative learning objective to obtain joint importance estimators for both supervised and unsupervised domain adaptation. Our method can be seamlessly incorporated with existing domain adaptation algorithms for better importance estimation and weighting on the training data. Experiments on a synthetic dataset demonstrate the advantage of our method.

preprint2022arXiv

DotMat: Solving Cold-start Problem and Alleviating Sparsity Problem for Recommender Systems

The cold-start and sparsity problems are two key intrinsic problems of recommender systems. During the past two decades, researchers and industrial practitioners have spent a considerable amount of effort trying to solve them. However, for the cold-start problem, most research relies on importing side information to transfer knowledge. A notable exception is ZeroMat, which uses no extra input data. Sparsity is a less-noticed problem. In this paper, we propose a new algorithm named DotMat that relies on no extra input data, but is capable of solving the cold-start and sparsity problems. In experiments, we show that, like ZeroMat, DotMat can achieve results competitive with recommender systems that use full data, such as the classic matrix factorization algorithm.

preprint2022arXiv

Efficient Data-Plane Memory Scheduling for In-Network Aggregation

As the scale of distributed training grows, communication becomes a bottleneck. To accelerate the communication, recent works introduce In-Network Aggregation (INA), which moves the gradient summation into network middle-boxes, e.g., programmable switches, to reduce the traffic volume. However, switch memory is scarce compared to the volume of gradients transmitted in distributed training. Although the literature applies methods like pool-based streaming or dynamic sharing to tackle the mismatch, switch memory is still a potential performance bottleneck. Furthermore, we observe the under-utilization of switch memory due to the synchronization requirement for aggregator deallocation in recent works. To improve the switch memory utilization, we propose ESA, an $\underline{E}$fficient Switch Memory $\underline{S}$cheduler for In-Network $\underline{A}$ggregation. At its core, ESA enforces the preemptive aggregator allocation primitive and introduces priority scheduling at the data-plane, which improves the switch memory utilization and average job completion time (JCT). Experiments show that ESA can improve the average JCT by up to $1.35\times$.

preprint2022arXiv

Enabling Harmonious Human-Machine Interaction with Visual-Context Augmented Dialogue System: A Review

The intelligent dialogue system, which aims to communicate with humans harmoniously in natural language, is promising for advancing human-machine interaction in the era of artificial intelligence. With increasingly complex human-computer interaction requirements (e.g., multimodal inputs, time sensitivity), it is difficult for traditional text-based dialogue systems to meet the demands for more vivid and convenient interaction. Consequently, the Visual Context Augmented Dialogue System (VAD), which has the potential to communicate with humans by perceiving and understanding multimodal information (i.e., visual context in images or videos, textual dialogue history), has become a predominant research paradigm. Benefiting from the consistency and complementarity between visual and textual context, VAD possesses the potential to generate engaging and context-aware responses. To depict the development of VAD, we first characterize the concepts and unique features of VAD, and then present its generic system architecture to illustrate the system workflow. Subsequently, several research challenges and representative works are investigated in detail, followed by a summary of authoritative benchmarks. We conclude this paper by putting forward some open issues and promising research trends for VAD, e.g., the cognitive mechanisms of human-machine dialogue under cross-modal dialogue context, and knowledge-enhanced cross-modal semantic interaction.

preprint2022arXiv

Energy and Spectrum Efficient Federated Learning via High-Precision Over-the-Air Computation

Federated learning (FL) enables mobile devices to collaboratively learn a shared prediction model while keeping data local. However, there are two major research challenges in practically deploying FL over mobile devices: (i) frequent wireless updates of huge gradients vs. limited spectrum resources, and (ii) energy-hungry FL communication and local computing during training vs. battery-constrained mobile devices. To address those challenges, in this paper, we propose a novel multi-bit over-the-air computation (M-AirComp) approach for spectrum-efficient aggregation of local model updates in FL and further present an energy-efficient FL design for mobile devices. Specifically, a high-precision digital modulation scheme is designed and incorporated in the M-AirComp, allowing mobile devices to upload model updates at the selected positions simultaneously in the multi-access channel. Moreover, we theoretically analyze the convergence property of our FL algorithm. Guided by FL convergence analysis, we formulate a joint transmission probability and local computing control optimization, aiming to minimize the overall energy consumption (i.e., iterative local computing + multi-round communications) of mobile devices in FL. Extensive simulation results show that our proposed scheme outperforms existing ones in terms of spectrum utilization, energy efficiency, and learning accuracy.

preprint2022arXiv

ESCM$^2$: Entire Space Counterfactual Multi-Task Model for Post-Click Conversion Rate Estimation

Accurate estimation of post-click conversion rate is critical for building recommender systems, which has long been confronted with sample selection bias and data sparsity issues. Methods in the Entire Space Multi-task Model (ESMM) family leverage the sequential pattern of user actions, i.e. $impression\rightarrow click \rightarrow conversion$, to address the data sparsity issue. However, they still fail to ensure the unbiasedness of CVR estimates. In this paper, we theoretically demonstrate that ESMM suffers from the following two problems: (1) Inherent Estimation Bias (IEB), where the estimated CVR of ESMM is inherently higher than the ground truth; (2) Potential Independence Priority (PIP) for CTCVR estimation, where there is a risk that the ESMM overlooks the causality from click to conversion. To this end, we devise a principled approach named Entire Space Counterfactual Multi-task Modelling (ESCM$^2$), which employs a counterfactual risk minimizer as a regularizer in ESMM to address both IEB and PIP issues simultaneously. Extensive experiments on offline datasets and online environments demonstrate that our proposed ESCM$^2$ can largely mitigate the inherent IEB and PIP issues and achieve better performance than baseline models.
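For readers unfamiliar with the ESMM family, the sequential pattern mentioned in this abstract is usually expressed as a product of probabilities: the click-and-convert rate (CTCVR) equals the click-through rate (CTR) times the post-click conversion rate (CVR). The following is a minimal sketch of that factorization only, not of ESCM$^2$ itself; the function names are illustrative assumptions and do not come from the paper.

```python
def ctcvr(p_ctr: float, p_cvr: float) -> float:
    """p(click and conversion | impression) = p(click | impression) * p(conversion | click)."""
    return p_ctr * p_cvr


def implied_cvr(p_ctr: float, p_ctcvr: float) -> float:
    """Recover the post-click conversion rate implied by CTR and CTCVR estimates."""
    if p_ctr <= 0.0:
        raise ValueError("CTR must be positive to recover CVR")
    return p_ctcvr / p_ctr


if __name__ == "__main__":
    # Hypothetical rates chosen only for illustration.
    print(ctcvr(0.10, 0.20))
    print(implied_cvr(0.10, 0.02))
```

Because both CTR and CTCVR are defined over all impressions, this factorization lets a CVR estimate be trained without restricting the data to clicked samples, which is the sparsity argument the abstract refers to.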

preprint2022arXiv

Evidence for mechanical softening-hardening dual anomaly in transition metals from shock compressed vanadium

Solids usually become harder and tougher under compression, and turn softer at elevated temperature. Recently, a compression-induced softening and heating-induced hardening (CISHIH) dual anomaly was predicted in group VB elements such as vanadium. Here, the evidence for this counterintuitive phenomenon is reported. By using accurate high-temperature high-pressure sound velocities measured at Hugoniot states generated by shock-waves, together with first-principles calculations, we observe not only the prominent compression-induced sound velocity reduction, but also strong heating-induced sound velocity enhancement, in shocked vanadium. The former corresponds to the softening in shear modulus by compression, whereas the latter reflects the reverse hardening by heat. These experiments also unveil another anomaly in Young's modulus that wasn't reported before. Based on the experimental and theoretical data, we infer that vanadium might transition from BCC into two different rhombohedral (RH1 and RH2) phases at about 79GPa and 116GPa along the Hugoniot, respectively, which implies a dramatic difference in static and dynamic loading, as well as the significance of deviatoric stress and rate-relevant effects in high-pressure phase transition dynamics.

preprint2022arXiv

Extremal GloVe: Theoretically Accurate Distributed Word Embedding by Tail Inference

Distributed word embeddings such as Word2Vec and GloVe have been widely adopted in industrial settings. Major technical applications of GloVe include recommender systems and natural language processing. The fundamental theory behind GloVe relies on the selection of a weighting function in the weighted least squares formulation that computes the powered ratio of word occurrence count and the maximum word count in the corpus. However, the initial formulation of GloVe is not theoretically sound in two aspects, namely the selection of the weighting function and its power exponent is ad-hoc. In this paper, we utilize the theory of extreme value analysis and propose a theoretically accurate version of GloVe. By reformulating the weighted least squares loss function as the expected loss function and accurately choosing the power exponent, we create a theoretically accurate version of GloVe. We demonstrate the competitiveness of our algorithm and show that the initial formulation of GloVe with the suggested optimal parameter can be viewed as a special case of our paradigm.
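As background for the weighting function this abstract discusses, the original GloVe formulation is commonly written as f(x) = (x / x_max)^alpha for x < x_max and f(x) = 1 otherwise. The sketch below shows only that baseline weighting, not the extreme-value variant proposed here. The defaults x_max = 100 and alpha = 0.75 are the values usually cited for the original GloVe and are not stated in this abstract; the function name is an illustrative assumption.

```python
def glove_weight(count: float, x_max: float = 100.0, alpha: float = 0.75) -> float:
    """Weight given to a co-occurrence count in the weighted least squares loss.

    Counts below the cutoff x_max are down-weighted by a power law;
    counts at or above the cutoff all receive weight 1.
    """
    if count <= 0:
        return 0.0  # pairs that never co-occur contribute nothing to the loss
    if count < x_max:
        return (count / x_max) ** alpha
    return 1.0


if __name__ == "__main__":
    for c in (1, 10, 50, 100, 1000):
        print(c, glove_weight(c))
```

The exponent alpha is the "power exponent" the abstract describes as ad hoc in the initial formulation.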

preprint2022arXiv

Fairness without Imputation: A Decision Tree Approach for Fair Prediction with Missing Values

We investigate the fairness concerns of training a machine learning model using data with missing values. Even though there are a number of fairness intervention methods in the literature, most of them require a complete training set as input. In practice, data can have missing values, and data missing patterns can depend on group attributes (e.g. gender or race). Simply applying off-the-shelf fair learning algorithms to an imputed dataset may lead to an unfair model. In this paper, we first theoretically analyze different sources of discrimination risks when training with an imputed dataset. Then, we propose an integrated approach based on decision trees that does not require a separate process of imputation and learning. Instead, we train a tree with missing incorporated as attribute (MIA), which does not require explicit imputation, and we optimize a fairness-regularized objective function. We demonstrate that our approach outperforms existing fairness intervention methods applied to an imputed dataset, through several experiments on real-world datasets.

preprint2022arXiv

Federated Learning for Personalized Humor Recognition

Computational understanding of humor is an important topic under creative language understanding and modeling. It can play a key role in complex human-AI interactions. The challenge here is that human perception of humorous content is highly subjective. The same joke may receive different funniness ratings from different readers. This makes it highly challenging for humor recognition models to achieve personalization in practical scenarios. Existing approaches are generally designed based on the assumption that users have a consensus on whether a given text is humorous or not. Thus, they cannot handle diverse humor preferences well. In this paper, we propose the FedHumor approach for the recognition of humorous content in a personalized manner through Federated Learning (FL). Extending a pre-trained language model, FedHumor guides the fine-tuning process by considering diverse distributions of humor preferences from individuals. It incorporates a diversity adaptation strategy into the FL paradigm to train a personalized humor recognition model. To the best of our knowledge, FedHumor is the first text-based personalized humor recognition model through federated learning. Extensive experiments demonstrate the advantage of FedHumor in recognizing humorous texts compared to nine state-of-the-art humor recognition approaches with superior capability for handling the diversity in humor labels produced by users with diverse preferences.

preprint2022arXiv

From PHY to QoE: A Parameterized Framework Design

The rapid development of 5G communication technology has given birth to various real-time broadband communication services, such as augmented reality (AR), virtual reality (VR) and cloud games. Compared with traditional services, consumers tend to focus more on their subjective experience when utilizing these services. In the meantime, the problem of power consumption is particularly prominent in 5G and beyond. The traditional design of physical layer (PHY) receiver is based on maximizing spectrum efficiency or minimizing error, but this will no longer be the best after considering energy efficiency and these new-coming services. Therefore, this paper uses quality of experience (QoE) as the optimization criterion of the PHY algorithm. In order to establish the relationship between PHY and QoE, this paper models the end-to-end transmission from UE perspective and proposes a five-layer framework based on hierarchical analysis method, which includes system-level model, bitstream model, packet model, service quality model and experience quality model. Real data in 5G network is used to train the parameters of the involved models for each type of services, respectively. The results show that the PHY algorithms can be simplified in perspective of QoE.

preprint2022arXiv

From policy to prediction: Forecasting COVID-19 dynamics under imperfect vaccination

Understanding the joint impact of vaccination and non-pharmaceutical interventions on COVID-19 development is important for making public health decisions that control the pandemic. Recently, we developed a method for forecasting the daily number of confirmed cases of infectious diseases by combining a mechanistic ordinary differential equation (ODE) model for infectious classes and a generalized boosting machine learning model (GBM) for predicting how public health policies and mobility data affect the transmission rate in the ODE model [WWR+]. In this paper, we extend the method to the post-vaccination period, accordingly obtain a retrospective forecast of COVID-19 daily confirmed cases in the US, and identify the relative influence of the policies used as the predictor variables. In particular, our ODE model contains both partially and fully vaccinated compartments and accounts for breakthrough cases, that is, vaccinated individuals can still get infected. Our results indicate that the inclusion of data on non-pharmaceutical interventions can significantly improve the accuracy of the predictions. With the use of policy data, the model predicts the number of daily infected cases up to 35 days in the future, with an average mean absolute percentage error of 34%, which is further improved to 21% if combined with human mobility data. Moreover, similar to the pre-vaccination study, the most influential predictor variable remains the policy of restrictions on gatherings. The modeling approach used in this work can help policymakers design control measures as variant strains threaten public health in the future.
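The accuracy figures in this abstract are mean absolute percentage errors (MAPE). A minimal sketch of the standard MAPE definition follows; the sample numbers are made up for illustration and are not data from the paper, and how the authors average over forecast horizons is not specified here.

```python
def mape(actual: list[float], forecast: list[float]) -> float:
    """Mean absolute percentage error, in percent, using the standard definition."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


if __name__ == "__main__":
    # Hypothetical daily case counts and forecasts.
    observed = [100.0, 200.0, 400.0]
    predicted = [110.0, 180.0, 400.0]
    print(f"MAPE = {mape(observed, predicted):.2f}%")
```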

preprint2022arXiv

Full RGB Just Noticeable Difference (JND) Modelling

Just Noticeable Difference (JND) has many applications in multimedia signal processing, especially in visual data processing. It is generally defined as the minimum change in visual content that humans can perceive, and it has been studied for decades. However, most of the existing methods only focus on the luminance component of JND modelling and simply regard chrominance components as scaled versions of luminance. In this paper, we propose a JND model to generate the JND by taking the characteristics of full RGB channels into account, termed the RGB-JND. To this end, an RGB-JND-NET is proposed, where the visual content in full RGB channels is used to extract features for JND generation. To supervise the JND generation, an adaptive image quality assessment combination (AIC) is developed. Besides, the RGB-JND-NET also takes visual attention into account by automatically mining the underlying relationship between visual attention and the JND, which is further used to constrain the JND spatial distribution. To the best of our knowledge, this is the first work to carefully investigate JND modelling for the full-color space. Experimental results demonstrate that the RGB-JND-NET model outperforms the relevant state-of-the-art JND models. Besides, the JND of the red and blue channels is larger than that of the green one according to the experimental results of the proposed model, which demonstrates that more changes can be tolerated in the red and blue channels, in line with the well-known fact that the human visual system is more sensitive to the green channel in comparison with the red and blue ones.

preprint2022arXiv

Gas permeation through graphdiyne-based nanoporous membranes

Nanoporous membranes based on two-dimensional materials are predicted to provide highly selective gas transport in combination with extreme permeability. Here we investigate membranes made from multilayer graphdiyne, a graphene-like crystal with a larger unit cell. Despite being nearly a hundred nanometers thick, the membranes allow fast, Knudsen-type permeation of light gases such as helium and hydrogen, whereas heavy noble gases like xenon exhibit strongly suppressed flows. Using isotope and cryogenic temperature measurements, the seemingly conflicting characteristics are explained by a high density of straight-through holes (direct porosity of ~0.1%), in which heavy atoms are adsorbed on the walls, partially blocking Knudsen flows. Our work offers important insights into intricate transport mechanisms playing a role at the nanoscale.
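For orientation, in ideal Knudsen flow the molar flux of a gas scales as one over the square root of its molar mass, so the expected flux ratio between two gases is the square root of the inverse mass ratio. The sketch below computes that textbook baseline only. It does not model the adsorption effect reported in this abstract, which suppresses xenon flow beyond the Knudsen prediction. The molar masses are standard approximate values and the function name is an illustrative assumption.

```python
import math

# Approximate standard molar masses in g/mol.
MOLAR_MASS = {"H2": 2.016, "He": 4.0026, "Xe": 131.29}


def knudsen_flux_ratio(light: str, heavy: str) -> float:
    """Ideal Knudsen flux ratio J_light / J_heavy = sqrt(M_heavy / M_light)."""
    return math.sqrt(MOLAR_MASS[heavy] / MOLAR_MASS[light])


if __name__ == "__main__":
    print(f"He/Xe ideal Knudsen ratio: {knudsen_flux_ratio('He', 'Xe'):.2f}")
    print(f"H2/Xe ideal Knudsen ratio: {knudsen_flux_ratio('H2', 'Xe'):.2f}")
```

A measured light-to-heavy flux ratio well above this baseline is the kind of deviation that would point to an additional mechanism such as wall adsorption.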

preprint2022arXiv

Gaussian Process Constraint Learning for Scalable Chance-Constrained Motion Planning from Demonstrations

We propose a method for learning constraints represented as Gaussian processes (GPs) from locally-optimal demonstrations. Our approach uses the Karush-Kuhn-Tucker (KKT) optimality conditions to determine where on the demonstrations the constraint is tight, and a scaling of the constraint gradient at those states. We then train a GP representation of the constraint which is consistent with and which generalizes this information. We further show that the GP uncertainty can be used within a kinodynamic RRT to plan probabilistically-safe trajectories, and that we can exploit the GP structure within the planner to exactly achieve a specified safety probability. We demonstrate our method can learn complex, nonlinear constraints demonstrated on a 5D nonholonomic car, a 12D quadrotor, and a 3-link planar arm, all while requiring minimal prior information on the constraint. Our results suggest the learned GP constraint is accurate, outperforming previous constraint learning methods that require more a priori knowledge.

preprint2022arXiv

HEGrid: A High Efficient Multi-Channel Radio Astronomical Data Gridding Framework in Heterogeneous Computing Environments

The challenge to fully exploit the potential of existing and upcoming scientific instruments like large single-dish radio telescopes is to process the collected massive data effectively and efficiently. As a "quasi 2D stencil computation" with the "Moore neighborhood pattern," gridding is the most computationally intensive step in data reduction pipeline for radio astronomy studies, enabling astronomers to create correct sky images for further analysis. However, the existing gridding frameworks can either only run on multi-core CPU architecture or do not support high-concurrency, multi-channel data gridding. Their performance is then limited, and there are emerging needs for innovative gridding frameworks to process data from large single-dish radio telescopes like the Five-hundred-meter Aperture Spherical Telescope (FAST). To address those challenges, we developed a High Efficient Gridding framework, HEGrid, by overcoming the above limitations. Specifically, we propose and construct the gridding pipeline in heterogeneous computing environments and achieve multi-pipeline concurrency for high performance multi-channel processing. Furthermore, we propose pipeline-based co-optimization to alleviate the potential negative performance impact of possible intra- and inter-pipeline low computation and I/O utilization, including component share-based redundancy elimination, thread-level data reuse and overlapping I/O and computation. Our experiments are based on both simulated datasets and actual FAST observational datasets. The results show that HEGrid outperforms other state-of-the-art gridding frameworks by up to 5.5x and has robust hardware portability, including AMD Radeon Instinct GPU and NVIDIA GPU.

preprint2022arXiv

High Dimensional Bayesian Optimization with Kernel Principal Component Analysis

Bayesian Optimization (BO) is a surrogate-based global optimization strategy that relies on a Gaussian Process regression (GPR) model to approximate the objective function and an acquisition function to suggest candidate points. It is well-known that BO does not scale well for high-dimensional problems because the GPR model requires substantially more data points to achieve sufficient accuracy and acquisition optimization becomes computationally expensive in high dimensions. Several recent works aim at addressing these issues, e.g., methods that implement online variable selection or conduct the search on a lower-dimensional sub-manifold of the original search space. Advancing our previous work of PCA-BO that learns a linear sub-manifold, this paper proposes a novel kernel PCA-assisted BO (KPCA-BO) algorithm, which embeds a non-linear sub-manifold in the search space and performs BO on this sub-manifold. Intuitively, constructing the GPR model on a lower-dimensional sub-manifold helps improve the modeling accuracy without requiring much more data from the objective function. Also, our approach defines the acquisition function on the lower-dimensional sub-manifold, making the acquisition optimization more manageable. We compare the performance of KPCA-BO to a vanilla BO and to PCA-BO on the multi-modal problems of the COCO/BBOB benchmark suite. Empirical results show that KPCA-BO outperforms BO in terms of convergence speed on most test problems, and this benefit becomes more significant when the dimensionality increases. For the 60D functions, KPCA-BO achieves better results than PCA-BO for many test cases. Compared to the vanilla BO, it efficiently reduces the CPU time required to train the GPR model and to optimize the acquisition function.

preprint2022arXiv

High-fidelity intensity diffraction tomography with a non-paraxial multiple-scattering model

We propose a novel intensity diffraction tomography (IDT) reconstruction algorithm based on the split-step non-paraxial (SSNP) model for recovering the 3D refractive index (RI) distribution of multiple-scattering biological samples. High-quality IDT reconstruction requires high-angle illumination to encode both low- and high- spatial frequency information of the 3D biological sample. We show that our SSNP model can more accurately compute multiple scattering from high-angle illumination compared to paraxial approximation-based multiple-scattering models. We apply this SSNP model to both sequential and multiplexed IDT techniques. We develop a unified reconstruction algorithm for both IDT modalities that is highly computationally efficient and is implemented by a modular automatic differentiation framework. We demonstrate the capability of our reconstruction algorithm on both weakly scattering buccal epithelial cells and strongly scattering live $\textit{C. elegans}$ worms and live $\textit{C. elegans}$ embryos.

preprint2022arXiv

High-order Photonic Cavity Modes Enabled 3D Structural Color

It remains a challenge to directly print three-dimensional arbitrary shapes that exhibit structural colors at the micrometer scale. Woodpile photonic crystals (WPCs) fabricated via two-photon lithography (TPL) are promising as building blocks to produce 3D geometries that generate structural colors due to their ability to exhibit either omnidirectional or anisotropic photonic stopbands. However, existing approaches have focused on achieving structural colors when illuminating WPCs from the top, which necessitates print resolutions beyond the limit of commercial TPL and/or post-processing techniques. Here, we devised a new strategy to support high-order photonic cavity modes upon side-illumination on WPCs that surprisingly generate large reflectance peaks in the visible spectrum. Based on that, we demonstrate one-step printing of 3D photonic structural colors without requiring post-processing or subwavelength features. Vivid colors with reflectance peaks exhibiting a full width at half maximum of ~25 nm, a maximum reflectance of 50%, gamut of ~85% of sRGB, and large viewing angles, were achieved. In addition, we also demonstrated voxel-level manipulation and control of colors in arbitrary-shaped 3D objects constituted with WPCs as unit cells, which has great potential for applications in dynamic color displays, colorimetric sensing, anti-counterfeiting, and light-matter interaction platforms.

preprint2022arXiv

Hyperparameter-free and Explainable Whole Graph Embedding

Graphs can be used to describe complex systems. Recently, whole graph embedding (graph representation learning) can compress a graph into a compact lower-dimension vector while preserving intrinsic properties, earning much attention. However, most graph embedding methods have problems such as tedious parameter tuning or poor explanation. This paper presents a simple and hyperparameter-free whole graph embedding method based on the DHC (Degree, H-index, and Coreness) theorem and Shannon Entropy (E), abbreviated as DHC-E. The DHC-E can provide a trade-off between simplicity and quality for supervised classification learning tasks involving molecular, social, and brain networks. Moreover, it performs well in lower-dimensional graph visualization. Overall, the DHC-E is simple, hyperparameter-free, and explainable for whole graph embedding with promising potential for exploring graph classification and lower-dimensional graph visualization.
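The DHC theorem named in this abstract relates three node statistics: starting from node degrees and repeatedly replacing each node's value with the H-index of its neighbors' values yields a sequence that converges to the nodes' coreness. The sketch below illustrates only that iteration on a tiny hand-made graph. How DHC-E turns the sequence into an entropy-based embedding is not described in this abstract and is not attempted here; the function names and the example graph are illustrative assumptions.

```python
def h_index(values: list[int]) -> int:
    """Largest h such that at least h of the values are >= h."""
    h = 0
    for rank, value in enumerate(sorted(values, reverse=True), start=1):
        if value >= rank:
            h = rank
        else:
            break
    return h


def h_index_sequence(adj: dict[str, list[str]]) -> list[dict[str, int]]:
    """Iterate the H-index operator from degrees until it stops changing.

    The first entry holds node degrees; the last entry holds the fixed point,
    which the DHC theorem identifies with node coreness.
    """
    current = {node: len(neighbors) for node, neighbors in adj.items()}
    sequence = [current]
    while True:
        updated = {
            node: h_index([current[nb] for nb in neighbors])
            for node, neighbors in adj.items()
        }
        if updated == current:
            return sequence
        sequence.append(updated)
        current = updated


if __name__ == "__main__":
    # A triangle (a, b, c) with one pendant node d attached to a.
    graph = {"a": ["b", "c", "d"], "b": ["a", "c"], "c": ["a", "b"], "d": ["a"]}
    for step, values in enumerate(h_index_sequence(graph)):
        print(step, values)
```

On this graph the iteration stops after one update: the triangle nodes settle at 2 and the pendant node at 1, matching their coreness.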

preprint2022arXiv

Image Quality Assessment with Gradient Siamese Network

In this work, we introduce the Gradient Siamese Network (GSN) for image quality assessment. The proposed method captures the gradient features between distorted images and reference images in the full-reference image quality assessment (IQA) task. We utilize Central Differential Convolution to obtain both semantic features and detail differences hidden in an image pair. Furthermore, spatial attention guides the network to concentrate on regions related to image detail. For the low-level, mid-level and high-level features extracted by the network, we design a multi-level fusion method to improve the efficiency of feature utilization. In addition to the common mean square error supervision, we further consider the relative distance among batch samples and successfully apply KL divergence loss to the image quality assessment task. We evaluated the proposed algorithm GSN on several publicly available datasets and demonstrated its superior performance. Our network won second place in the NTIRE 2022 Perceptual Image Quality Assessment Challenge track 1 Full-Reference.

preprint2022arXiv

IMCI: Integrate Multi-view Contextual Information for Fact Extraction and Verification

With the rapid development of automatic fake news detection technology, fact extraction and verification (FEVER) has been attracting more attention. The task aims to extract the most related fact evidences from millions of open-domain Wikipedia documents and then verify the credibility of corresponding claims. Although several strong models have been proposed for the task and they have made great progress, we argue that they fail to utilize multi-view contextual information and thus cannot obtain better performance. In this paper, we propose to integrate multi-view contextual information (IMCI) for fact extraction and verification. For each evidence sentence, we define two kinds of context, i.e. intra-document context and inter-document context. Intra-document context consists of the document title and all the other sentences from the same document. Inter-document context consists of all other evidences which may come from different documents. Then we integrate the multi-view contextual information to encode the evidence sentences to handle the task. Our experimental results on the FEVER 1.0 shared task show that our IMCI framework makes great progress on both fact extraction and verification, and achieves state-of-the-art performance with a winning FEVER score of 72.97% and label accuracy of 75.84% on the online blind test set. We also conduct an ablation study to detect the impact of multi-view contextual information. Our codes will be released at https://github.com/phoenixsecularbird/IMCI.

preprint2022arXiv

Improved normal-boundary intersection algorithm: a method for energy optimization strategy in smart buildings

With the widespread use of distributed energy sources, the advantages of smart buildings over traditional buildings are becoming increasingly obvious. Subsequently, its energy optimal scheduling and multi-objective optimization have become more and more complex and need to be solved urgently. This paper presents a novel method to optimize energy utilization in smart buildings. Firstly, multiple transfer-retention ratio (TRR) parameters are added to the evaluation of distributed renewable energy. Secondly, the normal-boundary intersection (NBI) algorithm is improved by the adaptive weight sum, the adjust uniform axes method, and Mahalanobis distance to form the improved normal-boundary intersection (INBI) algorithm. The multi-objective optimization problem in smart buildings is solved by the parameter TRR and INBI algorithm to improve the regulation efficiency. In response to the needs of decision-makers with evaluation indicators, the average deviation is reduced by 60% compared with the previous case. Numerical examples show that the proposed method is superior to the existing technologies in terms of three optimization objectives. The objectives include 8.2% reduction in equipment costs, 7.6% reduction in power supply costs, and 1.6% improvement in occupants' comfort.

preprint2022arXiv

IOHanalyzer: Detailed Performance Analyses for Iterative Optimization Heuristics

Benchmarking and performance analysis play an important role in understanding the behaviour of iterative optimization heuristics (IOHs) such as local search algorithms, genetic and evolutionary algorithms, Bayesian optimization algorithms, etc. This task, however, involves manual setup, execution, and analysis of the experiment on an individual basis, which is laborious and can be mitigated by a generic and well-designed platform. For this purpose, we propose IOHanalyzer, a new user-friendly tool for the analysis, comparison, and visualization of performance data of IOHs. Implemented in R and C++, IOHanalyzer is fully open source. It is available on CRAN and GitHub. IOHanalyzer provides detailed statistics about fixed-target running times and about fixed-budget performance of the benchmarked algorithms with a real-valued codomain, single-objective optimization tasks. Performance aggregation over several benchmark problems is possible, for example in the form of empirical cumulative distribution functions. Key advantages of IOHanalyzer over other performance analysis packages are its highly interactive design, which allows users to specify the performance measures, ranges, and granularity that are most useful for their experiments, and the possibility to analyze not only performance traces, but also the evolution of dynamic state parameters. IOHanalyzer can directly process performance data from the main benchmarking platforms, including the COCO platform, Nevergrad, the SOS platform, and IOHexperimenter. An R programming interface is provided for users preferring to have a finer control over the implemented functionalities.
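The abstract mentions aggregating fixed-target running times into empirical cumulative distribution functions (ECDFs). As a rough illustration of the idea, and not of IOHanalyzer's actual R interface, the sketch below computes the fraction of runs that reached a target within each evaluation budget. The function name and the sample hitting times are hypothetical; runs that never reached the target are encoded as infinity.

```python
import math


def runtime_ecdf(hitting_times: list[float], budgets: list[float]) -> list[float]:
    """Fraction of runs that hit the target within each budget.

    hitting_times: evaluations each run needed to reach the target
                   (math.inf for runs that never reached it).
    budgets:       evaluation budgets at which to evaluate the ECDF.
    """
    if not hitting_times:
        raise ValueError("at least one run is required")
    n_runs = len(hitting_times)
    return [sum(1 for t in hitting_times if t <= b) / n_runs for b in budgets]


if __name__ == "__main__":
    # Four hypothetical runs of one algorithm on one problem and target.
    times = [10, 20, math.inf, 40]
    print(runtime_ecdf(times, budgets=[5, 20, 100]))
```

Aggregating over several problems or targets amounts to pooling their hitting times before computing the same fractions.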

preprint2022arXiv

IOHexperimenter: Benchmarking Platform for Iterative Optimization Heuristics

We present IOHexperimenter, the experimentation module of the IOHprofiler project, which aims at providing an easy-to-use and highly customizable toolbox for benchmarking iterative optimization heuristics such as local search, evolutionary and genetic algorithms, Bayesian optimization techniques, etc. IOHexperimenter can be used as a stand-alone tool or as part of a benchmarking pipeline that uses other components of IOHprofiler such as IOHanalyzer, the module for interactive performance analysis and visualization. IOHexperimenter provides an efficient interface between optimization problems and their solvers while allowing for granular logging of the optimization process. These logs are fully compatible with existing tools for interactive data analysis, which significantly speeds up the deployment of a benchmarking pipeline. The main components of IOHexperimenter are the environment to build customized problem suites and the various logging options that allow users to steer the granularity of the data records.

preprint2022arXiv

jTrans: Jump-Aware Transformer for Binary Code Similarity

Binary code similarity detection (BCSD) has important applications in various fields such as vulnerability detection, software component analysis, and reverse engineering. Recent studies have shown that deep neural networks (DNNs) can comprehend instructions or control-flow graphs (CFG) of binary code and support BCSD. In this study, we propose a novel Transformer-based approach, namely jTrans, to learn representations of binary code. It is the first solution that embeds control flow information of binary code into Transformer-based language models, by using a novel jump-aware representation of the analyzed binaries and a newly-designed pre-training task. Additionally, we release to the community a newly-created large dataset of binaries, BinaryCorp, which is the most diverse to date. Evaluation results show that jTrans outperforms state-of-the-art (SOTA) approaches on this more challenging dataset by 30.5% (i.e., from 32.0% to 62.5%). In a real-world task of known vulnerability searching, jTrans achieves a recall that is 2X higher than existing SOTA baselines.

preprint2022arXiv

KL-Mat : Fair Recommender System via Information Geometry

Recommender systems have intrinsic problems such as sparsity and fairness. Although they have been widely adopted for decades, research on the fairness of recommendation algorithms was largely neglected until recently. One important paradigm for resolving the issue is regularization. However, researchers have not been able to agree on a regularization term comparable to the regularization frameworks in other fields, such as Lasso or Ridge Regression. In this paper, we borrow concepts from information geometry and propose a new regularization-based fair algorithm called KL-Mat. The algorithmic technique leads to more robust accuracy performance on metrics such as MAE. More importantly, the algorithm produces much fairer results than the vanilla matrix factorization approach. KL-Mat is fast, easy to implement, and explainable.

preprint2022arXiv

Knowledge Mining with Scene Text for Fine-Grained Recognition

Recently, the semantics of scene text has been proven to be essential in fine-grained image classification. However, the existing methods mainly exploit the literal meaning of scene text for fine-grained recognition, which might be irrelevant when it is not significantly related to objects/scenes. We propose an end-to-end trainable network that mines implicit contextual knowledge behind scene text images and enhances the semantics and correlation to fine-tune the image representation. Unlike the existing methods, our model integrates three modalities: visual feature extraction, text semantics extraction, and correlating background knowledge to fine-grained image classification. Specifically, we employ KnowBert to retrieve relevant knowledge for semantic representation and combine it with image features for fine-grained classification. Experiments on two benchmark datasets, Con-Text and Drink Bottle, show that our method outperforms the state-of-the-art by 3.72% mAP and 5.39% mAP, respectively. To further validate the effectiveness of the proposed method, we create a new dataset on crowd activity recognition for the evaluation. The source code and new dataset of this work are available at https://github.com/lanfeng4659/KnowledgeMiningWithSceneText.

preprint2022arXiv

Landscape Learning for Neural Network Inversion

Many machine learning methods operate by inverting a neural network at inference time, which has become a popular technique for solving inverse problems in computer vision, robotics, and graphics. However, these methods often involve gradient descent through a highly non-convex loss landscape, causing the optimization process to be unstable and slow. We introduce a method that learns a loss landscape where gradient descent is efficient, bringing massive improvement and acceleration to the inversion process. We demonstrate this advantage on a number of methods for both generative and discriminative tasks, including GAN inversion, adversarial defense, and 3D human pose reconstruction.

preprint2022arXiv

Learning Structural Representations for Recipe Generation and Food Retrieval

Food is significant to human daily life. In this paper, we are interested in learning structural representations for lengthy recipes, that can benefit the recipe generation and food cross-modal retrieval tasks. Different from the common vision-language data, here the food images contain mixed ingredients and target recipes are lengthy paragraphs, where we do not have annotations on structure information. To address the above limitations, we propose a novel method to unsupervisedly learn the sentence-level tree structures for the cooking recipes. Our approach brings together several novel ideas in a systematic framework: (1) exploiting an unsupervised learning approach to obtain the sentence-level tree structure labels before training; (2) generating trees of target recipes from images with the supervision of tree structure labels learned from (1); and (3) integrating the learned tree structures into the recipe generation and food cross-modal retrieval procedure. Our proposed model can produce good-quality sentence-level tree structures and coherent recipes. We achieve the state-of-the-art recipe generation and food cross-modal retrieval performance on the benchmark Recipe1M dataset.

preprint2022arXiv

MovieMat: Context-aware Movie Recommendation with Matrix Factorization by Matrix Fitting

Movie recommender systems are widely applied in commercial environments such as Netflix and Tubi. Classic recommender models utilize technologies such as collaborative filtering, learning to rank, matrix factorization and deep learning models to achieve lower marketing expenses and higher revenues. However, movie audiences rate the same movie differently in different contexts. Important movie watching contexts include audience mood, location, weather, etc. Being able to take advantage of contextual information is of great benefit to recommender builders. However, popular techniques such as tensor factorization consume an impractical amount of storage, which greatly reduces their feasibility in real-world environments. In this paper, we take advantage of the MatMat framework, which factorizes matrices by matrix fitting, to build a context-aware movie recommender system that is superior to classic matrix factorization and comparable in the fairness metric.

preprint2022arXiv

Mutations make pandemics worse or better: modeling SARS-CoV-2 variants and imperfect vaccination

Since December 2020, variants of COVID-19 (especially Delta and Omicron) with different characteristics that influence death and transmissibility have emerged around the world. To address the novel dynamics of the disease, we propose a two-strain (native and mutant) transmission model with mutation and imperfect vaccination. It is also assumed that individuals who have recovered from the native strain can be infected with the mutant strain through direct contact with infected individuals or with contaminated surfaces or aerosols. We compute the basic reproduction number for each strain independently and take the maximum for $R_0$. We prove the nonexistence of backward bifurcation using center manifold theory, and the global stability of the disease-free equilibrium when the basic reproduction number $R_0<1$. An intermediate mutation rate $ν_1$ leads to oscillations. When $ν_1$ increases over a threshold, the system regains its stability and exhibits an interesting dynamic called an endemic bubble. An analytical expression for vaccine-induced herd immunity is derived. The model is parameterized using the Indian data of the cumulative number of confirmed cases and deaths of COVID-19 from March 1 to September 27 in 2021, using the MCMC method. The cumulative cases and deaths can be reduced by increasing the vaccine efficacies against both native and mutant strains. We observe that with a vaccine efficacy of 90% against the native strain, the cumulative cases and deaths would be reduced by 3.27% and 5.2%, respectively; and with a vaccine efficacy of 90% against the mutant strain, the cumulative cases and deaths would be reduced by 0.9% and 2.5%, respectively. Our study demonstrates that the COVID-19 pandemic may be worse due to the occurrence of oscillations for certain mutation rates but better due to stability at a lower infection level with a larger mutation rate.

preprint2022arXiv

Network-level Safety Metrics for Overall Traffic Safety Assessment: A Case Study

Driving safety analysis has recently experienced unprecedented improvements thanks to technological advances in precise positioning sensors, artificial intelligence (AI)-based safety features, autonomous driving systems, connected vehicles, high-throughput computing, and edge computing servers. Particularly, deep learning (DL) methods empowered volume video processing to extract safety-related features from massive videos captured by roadside units (RSU). Safety metrics are commonly used measures to investigate crashes and near-conflict events. However, these metrics provide limited insight into the overall network-level traffic management. On the other hand, some safety assessment efforts are devoted to processing crash reports and identifying spatial and temporal patterns of crashes that correlate with road geometry, traffic volume, and weather conditions. This approach relies merely on crash reports and ignores the rich information of traffic videos that can help identify the role of safety violations in crashes. To bridge these two perspectives, we define a new set of network-level safety metrics (NSM) to assess the overall safety profile of traffic flow by processing imagery taken by RSU cameras. Our analysis suggests that NSMs show significant statistical associations with crash rates. This approach is different than simply generalizing the results of individual crash analyses, since all vehicles contribute to calculating NSMs, not only the ones involved in crash incidents. This perspective considers the traffic flow as a complex dynamic system where actions of some nodes can propagate through the network and influence the crash risk for other nodes. We also provide a comprehensive review of surrogate safety metrics (SSM) in Appendix A.

preprint2022arXiv

Non-invasive improvement of machining by reversible electrochemical doping: a proof of principle with computational modeling

We propose that the machinability of hard ceramics can be improved by reversible electrochemical doping. On the example of TiO2, we show in a combined density functional theory-molecular dynamics computational study that a small amount of intercalated lithium, which preserves the host structure and can be introduced reversibly, leads to a lowering of the strength of work materials and the cutting force. This is in spite of the fact that there are no significant modifications of the elastic constants at room temperature, i.e. the effect is mostly on plastic properties. This approach is expected to be applicable to a class of ceramics exhibiting similar mechanisms of host-dopant interactions and presents a reversible and non-destructive way of modifying mechanical properties.

preprint2022arXiv

On Multi-Domain Long-Tailed Recognition, Imbalanced Domain Generalization and Beyond

Real-world data often exhibit imbalanced label distributions. Existing studies on data imbalance focus on single-domain settings, i.e., samples are from the same data distribution. However, natural data can originate from distinct domains, where a minority class in one domain could have abundant instances from other domains. We formalize the task of Multi-Domain Long-Tailed Recognition (MDLT), which learns from multi-domain imbalanced data, addresses label imbalance, domain shift, and divergent label distributions across domains, and generalizes to all domain-class pairs. We first develop the domain-class transferability graph, and show that such transferability governs the success of learning in MDLT. We then propose BoDA, a theoretically grounded learning strategy that tracks the upper bound of transferability statistics, and ensures balanced alignment and calibration across imbalanced domain-class distributions. We curate five MDLT benchmarks based on widely-used multi-domain datasets, and compare BoDA to twenty algorithms that span different learning strategies. Extensive and rigorous experiments verify the superior performance of BoDA. Further, as a byproduct, BoDA establishes new state-of-the-art on Domain Generalization benchmarks, highlighting the importance of addressing data imbalance across domains, which can be crucial for improving generalization to unseen domains. Code and data are available at: https://github.com/YyzHarry/multi-domain-imbalance.

preprint2022arXiv

Online Learning Based NLOS Ranging Error Mitigation in 5G Positioning

The fifth-generation (5G) wireless communication is useful for positioning due to its large bandwidth and low cost. However, the presence of obstacles that block the line-of-sight (LOS) path between devices would affect localization accuracy severely. In this paper, we propose an online learning approach to mitigate ranging error directly in non-line-of-sight (NLOS) channels. The distribution of NLOS ranging error is learned from received raw signals, where a network with neural processes regressor (NPR) is utilized to learn the environment and range-related information precisely. The network can be implemented for online learning free from retraining the network, which is computationally efficient. Simulation results show that the proposed approach outperforms conventional techniques in terms of NLOS ranging error mitigation.

preprint2022arXiv

OrphicX: A Causality-Inspired Latent Variable Model for Interpreting Graph Neural Networks

This paper proposes a new eXplanation framework, called OrphicX, for generating causal explanations for any graph neural networks (GNNs) based on learned latent causal factors. Specifically, we construct a distinct generative model and design an objective function that encourages the generative model to produce causal, compact, and faithful explanations. This is achieved by isolating the causal factors in the latent space of graphs by maximizing the information flow measurements. We theoretically analyze the cause-effect relationships in the proposed causal graph, identify node attributes as confounders between graphs and GNN predictions, and circumvent such confounder effect by leveraging the backdoor adjustment formula. Our framework is compatible with any GNNs, and it does not require access to the process by which the target GNN produces its predictions. In addition, it does not rely on the linear-independence assumption of the explained features, nor require prior knowledge on the graph learning tasks. We show a proof-of-concept of OrphicX on canonical classification problems on graph data. In particular, we analyze the explanatory subgraphs obtained from explanations for molecular graphs (i.e., Mutag) and quantitatively evaluate the explanation performance with frequently occurring subgraph patterns. Empirically, we show that OrphicX can effectively identify the causal semantics for generating causal explanations, significantly outperforming its alternatives.

preprint2022arXiv

Overcoming Van der Waals Forces in reconfigurable nanostructures

Reconfigurable metamaterials require constituent nanostructures to demonstrate switching of shapes with external stimuli. For generality, such nanostructures would touch and stick to other surfaces in one of their configurations. Yet, a longstanding challenge is in overcoming this stiction caused by Van der Waals forces, which impedes shape recovery. Here, we introduce a stiff yet self-recovering material system based on acrylic acid, and test it in high-aspect ratio structures, where recovery is weak. This designer material has a storage modulus of ~5.2 GPa at room temperature and ~90 MPa in the rubbery state at 150 °C, an order of magnitude higher than previous reports. A high-resolution resin for two-photon lithography was developed based on this polymer system, enabling 3D printing of nanopillars with diameters of ~400 nm and aspect ratio as high as ~10. Experimentally, we observed self-recovery as collapsed and touching structures overcome stiction to stand back up. We developed a theoretical model to explain the recoverability of these sub-micron structures. Reconfigurable structural colour prints and holograms were demonstrated, indicating potential applications of the material system as a shape memory polymer suitable for sub-micron reconfigurable metamaterials.

preprint2022arXiv

Paired Cross-Modal Data Augmentation for Fine-Grained Image-to-Text Retrieval

This paper investigates an open research problem of generating text-image pairs to improve the training of fine-grained image-to-text cross-modal retrieval task, and proposes a novel framework for paired data augmentation by uncovering the hidden semantic information of StyleGAN2 model. Specifically, we first train a StyleGAN2 model on the given dataset. We then project the real images back to the latent space of StyleGAN2 to obtain the latent codes. To make the generated images manipulatable, we further introduce a latent space alignment module to learn the alignment between StyleGAN2 latent codes and the corresponding textual caption features. When we do online paired data augmentation, we first generate augmented text through random token replacement, then pass the augmented text into the latent space alignment module to output the latent codes, which are finally fed to StyleGAN2 to generate the augmented images. We evaluate the efficacy of our augmented data approach on two public cross-modal retrieval datasets, in which the promising experimental results demonstrate the augmented text-image pair data can be trained together with the original data to boost the image-to-text cross-modal retrieval performance.

preprint2022arXiv

PC$^2$-PU: Patch Correlation and Point Correlation for Effective Point Cloud Upsampling

Point cloud upsampling aims to densify a sparse point set acquired from 3D sensors, providing a denser representation for the underlying surface. Existing methods divide the input points into small patches and upsample each patch separately, ignoring the global spatial consistency between patches. In this paper, we present a novel method, PC$^2$-PU, which explores patch-to-patch and point-to-point correlations for more effective and robust point cloud upsampling. Specifically, our network has two appealing designs: (i) We take adjacent patches as supplementary inputs to compensate for the loss of structure information within a single patch and introduce a Patch Correlation Module to capture the difference and similarity between patches. (ii) After augmenting each patch's geometry, we further introduce a Point Correlation Module to reveal the relationship of points inside each patch to maintain the local spatial consistency. Extensive experiments on both synthetic and real scanned datasets demonstrate that our method surpasses previous upsampling methods, particularly with noisy inputs. The code and data are at https://github.com/chenlongwhu/PC2-PU.git.

preprint2022arXiv

Per-run Algorithm Selection with Warm-starting using Trajectory-based Features

Per-instance algorithm selection seeks to recommend, for a given problem instance and a given performance criterion, one or several suitable algorithms that are expected to perform well for the particular setting. The selection is classically done offline, using openly available information about the problem instance or features that are extracted from the instance during a dedicated feature extraction step. This ignores valuable information that the algorithms accumulate during the optimization process. In this work, we propose an alternative, online algorithm selection scheme which we coin per-run algorithm selection. In our approach, we start the optimization with a default algorithm, and, after a certain number of iterations, extract instance features from the observed trajectory of this initial optimizer to determine whether to switch to another optimizer. We test this approach using the CMA-ES as the default solver, and a portfolio of six different optimizers as potential algorithms to switch to. In contrast to other recent work on online per-run algorithm selection, we warm-start the second optimizer using information accumulated during the first optimization phase. We show that our approach outperforms static per-instance algorithm selection. We also compare two different feature extraction principles, based on exploratory landscape analysis and time series analysis of the internal state variables of the CMA-ES, respectively. We show that a combination of both feature sets provides the most accurate recommendations for our test cases, taken from the BBOB function suite from the COCO platform and the YABBOB suite from the Nevergrad platform.

preprint2022arXiv

Phase transitions and topological properties of the 5/2 quantum Hall states with strong Landau-level mixing

We numerically study a 5/2 fractional quantum Hall system with even number of electrons using the exact diagonalization where both the strong Landau level (LL) mixing and a finite width of the quantum well have been considered and adapted into a screened Coulomb interaction. With the principal component analysis, we are able to recognize a compressible-incompressible phase transition in the parameter space made of the magnetic field and the quantum well width by the competition between the first two leading components of the ground states wave functions, which is consistent with the low-lying spectral feature and previous works in the odd-electron system. In addition, the presence of the subdominant third component suggests an incompressible transition occurring as the LL-mixing strength grows into a certain parameter region associated with the ZnO experiments. We further investigate the strongly LL-mixed phase in this emerging region with the Hall viscosity, wave function overlaps, and the entanglement spectra. Results show it can be well described as a particle-hole symmetrized Pfaffian state with the dual topological properties of the Pfaffian and the anti-Pfaffian states.

preprint2022arXiv

Preference Enhanced Social Influence Modeling for Network-Aware Cascade Prediction

Network-aware cascade size prediction aims to predict the final repost count of user-generated information by modeling the propagation process in social networks. Estimating the user's reposting probability by social influence, namely state activation, plays an important role in the information diffusion process. Therefore, Graph Neural Networks (GNNs), which can simulate the information interaction between nodes, have been shown to be an effective scheme for this prediction task. However, existing studies, including GNN-based models, usually neglect a vital factor, the user's preference, which deeply influences state activation. To that end, we propose a novel framework to promote cascade size prediction by enhancing the user preference modeling in three stages, i.e., preference topics generation, preference shift modeling, and social influence activation. Our end-to-end method makes the user activation process of information diffusion more adaptive and accurate. Extensive experiments on two large-scale real-world datasets have clearly demonstrated the effectiveness of our proposed model compared to state-of-the-art baselines.

preprint2022arXiv

Privacy protection based on mask template

Powerful recognition algorithms are widely used on the Internet and in important medical systems, which poses a serious threat to personal privacy. The law provides various protections, e.g., the General Data Protection Regulation (GDPR) in Europe and Articles 1032 to 1039 of the Civil Code in China. However, as an important privacy disclosure event, the leakage of biometric data is often hidden, which makes it difficult for the owner to detect and trace to the source. Human biometrics generally exist in images. In order to avoid the disclosure of personal privacy, we should prevent unauthorized recognition algorithms from acquiring the real features of the original image.

preprint2022arXiv

Progressive Domain-Independent Feature Decomposition Network for Zero-Shot Sketch-Based Image Retrieval

Zero-shot sketch-based image retrieval (ZS-SBIR) is a specific cross-modal retrieval task for searching natural images given free-hand sketches under the zero-shot scenario. Most existing methods solve this problem by simultaneously projecting visual features and semantic supervision into a low-dimensional common space for efficient retrieval. However, such low-dimensional projection destroys the completeness of semantic knowledge in original semantic space, so that it is unable to transfer useful knowledge well when learning semantic from different modalities. Moreover, the domain information and semantic information are entangled in visual features, which is not conducive for cross-modal matching since it will hinder the reduction of domain gap between sketch and image. In this paper, we propose a Progressive Domain-independent Feature Decomposition (PDFD) network for ZS-SBIR. Specifically, with the supervision of original semantic knowledge, PDFD decomposes visual features into domain features and semantic ones, and then the semantic features are projected into common space as retrieval features for ZS-SBIR. The progressive projection strategy maintains strong semantic supervision. Besides, to guarantee the retrieval features to capture clean and complete semantic information, the cross-reconstruction loss is introduced to encourage that any combinations of retrieval features and domain features can reconstruct the visual features. Extensive experiments demonstrate the superiority of our PDFD over state-of-the-art competitors.

preprint2022arXiv

QCluster: Clustering Packets for Flow Scheduling

Flow scheduling is crucial in data centers, as it directly influences user experience of applications. According to different assumptions and design goals, there are four typical flow scheduling problems/solutions: SRPT, LAS, Fair Queueing, and Deadline-Aware scheduling. When implementing these solutions in commodity switches with limited number of queues, they need to set static parameters by measuring traffic in advance, while optimal parameters vary across time and space. This paper proposes a generic framework, namely QCluster, to adapt all scheduling problems for limited number of queues. The key idea of QCluster is to cluster packets with similar weights/properties into the same queue. QCluster is implemented in Tofino switches, and can cluster packets at a speed of 3.2 Tbps. To the best of our knowledge, QCluster is the fastest clustering algorithm. Experimental results in testbed with programmable switches and ns-2 show that QCluster reduces the average flow completion time (FCT) for short flows up to 56.6%, and reduces the overall average FCT up to 21.7% over state-of-the-art. All the source code in ns-2 is available in Github without.

preprint2022arXiv

Quantum-Inspired Solvers on Mixed-Integer Linear Programming Problem

Mixed-integer linear programming (MILP) plays a crucial role in artificial intelligence, biochemistry, finance, cryptography, etc. Despite decades of popularity, research on MILP solvers is still limited by the resource consumption caused by complexity and the failure of Moore's Law. Quantum-inspired Ising machines, as a new computing paradigm, can be used to solve integer programming problems by reducing them into Ising models. Therefore, it is necessary to understand the technical evolution of quantum-inspired solvers to break the bottleneck. In this paper, the concept and traditional algorithms for MILP are introduced. Then, focusing on the Ising model, the principle and implementations of annealers and coherent Ising machines are summarized. Finally, the paper discusses the challenges and opportunities of miniaturized solvers in the future.

preprint2022arXiv

RankMat : Matrix Factorization with Calibrated Distributed Embedding and Fairness Enhancement

Matrix Factorization is a widely adopted technique in the field of recommender systems. Matrix Factorization techniques include SVD, LDA, pLSA, SVD++, MatRec, Zipf Matrix Factorization and Item2Vec. In recent years, distributed word embeddings have inspired innovation in the area of recommender systems. Word2vec and GloVe have been especially emphasized in many industrial application scenarios such as Xiaomi's recommender system. In this paper, we propose a new matrix factorization method inspired by the theory of power law and GloVe. Instead of the exponential nature of the GloVe model, we take advantage of the Pareto distribution to model our loss function. Our method is explainable in theory and easy to implement in practice. In the experiment section, we show that our approach is superior to the vanilla matrix factorization technique and comparable with the GloVe-based model in both accuracy and fairness metrics.

preprint2022arXiv

Residual-Aided End-to-End Learning of Communication System without Known Channel

Leveraging powerful deep learning techniques, the end-to-end (E2E) learning of communication system is able to outperform the classical communication system. Unfortunately, this communication system cannot be trained by deep learning without known channel. To deal with this problem, a generative adversarial network (GAN) based training scheme has been recently proposed to imitate the real channel. However, the gradient vanishing and overfitting problems of GAN will result in the serious performance degradation of E2E learning of communication system. To mitigate these two problems, we propose a residual aided GAN (RA-GAN) based training scheme in this paper. Particularly, inspired by the idea of residual learning, we propose a residual generator to mitigate the gradient vanishing problem by realizing a more robust gradient backpropagation. Moreover, to cope with the overfitting problem, we reconstruct the loss function for training by adding a regularizer, which limits the representation ability of RA-GAN. Simulation results show that the trained residual generator has better generation performance than the conventional generator, and the proposed RA-GAN based training scheme can achieve the near-optimal block error rate (BLER) performance with a negligible computational complexity increase in both the theoretical channel model and the ray-tracing based channel dataset.

preprint2022arXiv

Self-Powered Broadband Photodetector Based on MoS2/Sb2Te3 Heterojunctions: A promising approach for highly sensitive detection

Topological insulators have shown great potential for future optoelectronic technology due to their extraordinary optical and electrical properties. Photodetectors, as one of the most widely used optoelectronic devices, are crucial for sensing, imaging, communication, and optical computing systems to convert optical signals to electrical signals. Here we experimentally show a novel combination of topological insulators (TIs) and transition metal chalcogenides (TMDs) based self-powered photodetectors with ultra-low dark current and high sensitivity. The photodetector formed by a MoS2/Sb2Te3 heterogeneous junction exhibits a low dark current of 2.4 pA at zero bias and 1.2 nA at 1 V. It shows a high photoresponsivity of > 150 mA W⁻¹ at zero bias and rectification of 3 times at an externally applied bias voltage of 1 V. The excellent performance of the proposed photodetector with its innovative material combination of TMDs and TIs paves the way for the development of novel high-performance optoelectronic devices. The TIs/TMDs transfer used to form the heterojunction is simple to incorporate into on-chip waveguide systems, enabling future applications on highly integrated photonic circuits.

preprint2022arXiv

Self-recoverable Adversarial Examples: A New Effective Protection Mechanism in Social Networks

Malicious intelligent algorithms greatly threaten the privacy of social network users by detecting and analyzing the photos uploaded to social network platforms. The damage that adversarial attacks do to DNNs raises the possibility that adversarial examples could serve as a new protection mechanism for privacy in social networks. However, existing adversarial examples lack the recoverability needed to serve as an effective protection mechanism. To address this issue, we propose a recoverable generative adversarial network to generate self-recoverable adversarial examples. By modeling the adversarial attack and recovery as a united task, our method can minimize the error of the recovered examples while maximizing the attack ability, resulting in better recoverability of adversarial examples. To further boost the recoverability of these examples, we exploit a dimension reducer to optimize the distribution of adversarial perturbation. The experimental results show that the adversarial examples generated by the proposed method present superior recoverability, attack ability, and robustness on different datasets and network architectures, which ensures their effectiveness as a protection mechanism in social networks.

preprint2022arXiv

Semi-Supervised Wide-Angle Portraits Correction by Multi-Scale Transformer

We propose a semi-supervised network for wide-angle portraits correction. Wide-angle images often suffer from skew and distortion caused by perspective projection, which is especially noticeable in the face regions. Previous deep learning based approaches need ground-truth correction flow maps for training guidance. However, such labels are expensive, as they can only be obtained manually. In this work, we design a semi-supervised scheme and build a high-quality unlabeled dataset with rich scenarios, allowing us to simultaneously use labeled and unlabeled data to improve performance. Specifically, our semi-supervised scheme takes advantage of the consistency mechanism, with several novel components such as direction and range consistency (DRC) and regression consistency (RC). Furthermore, different from the existing methods, we propose the Multi-Scale Swin-Unet (MS-Unet) based on the multi-scale swin transformer block (MSTB), which can simultaneously learn short-distance and long-distance information to avoid artifacts. Extensive experiments demonstrate that the proposed method is superior to the state-of-the-art methods and other representative baselines. The source code and dataset are available at: https://github.com/megvii-research/Portraits_Correction.

preprint2022arXiv

SNCSE: Contrastive Learning for Unsupervised Sentence Embedding with Soft Negative Samples

Unsupervised sentence embedding aims to obtain the most appropriate embedding for a sentence to reflect its semantics. Contrastive learning has been attracting growing attention. For a sentence, current models use diverse data augmentation methods to generate positive samples, while considering other independent sentences as negative samples. They then adopt the InfoNCE loss to pull the embeddings of positive pairs together and push those of negative pairs apart. Although these models have made great progress on sentence embedding, we argue that they may suffer from feature suppression. The models fail to distinguish and decouple textual similarity and semantic similarity, and they may overestimate the semantic similarity of any pair with similar text regardless of the actual semantic difference between them. This is because positive pairs in unsupervised contrastive learning come with similar or even identical text through data augmentation. To alleviate feature suppression, we propose contrastive learning for unsupervised sentence embedding with soft negative samples (SNCSE). Soft negative samples share highly similar text with the original samples but have clearly different semantics. Specifically, we take the negation of original sentences as soft negative samples, and propose Bidirectional Margin Loss (BML) to introduce them into the traditional contrastive learning framework, which merely involves positive and negative samples. Our experimental results show that SNCSE can obtain state-of-the-art performance on the semantic textual similarity (STS) task, with an average Spearman's correlation coefficient of 78.97% on BERT-base and 79.23% on RoBERTa-base. In addition, we adopt a rank-based error analysis method to detect the weaknesses of SNCSE for future study.
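
For readers unfamiliar with the InfoNCE loss mentioned in this abstract, the following is a minimal sketch of the standard loss for a single anchor embedding. It is not the SNCSE implementation and does not include the Bidirectional Margin Loss; the function names and the temperature value of 0.05 are assumptions made for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def info_nce(anchor, positive, negatives, temperature=0.05):
    """Standard InfoNCE loss for one anchor (hypothetical helper).

    The loss is the negative log-softmax of the positive pair's scaled
    similarity against the scaled similarities of all pairs.
    The temperature default is an assumed value, not taken from the abstract.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]

# The loss is small when the positive is close to the anchor and the
# negative is far away, and large in the opposite case.
aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
swapped = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

Minimizing this quantity pulls positive pairs together and pushes negative pairs apart, which is the behavior the abstract describes.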

preprint2022arXiv

Soft MIMO Detection Using Marginal Posterior Probability Statistics

Soft demodulation of received symbols into bit log-likelihood ratios (LLRs) is at the very heart of multiple-input-multiple-output (MIMO) detection. However, the optimal maximum a posteriori (MAP) detector is complicated and infeasible to use in a practical system. In this paper, we propose a soft MIMO detection algorithm based on marginal posterior probability statistics (MPPS). With the help of optimal transport theory and order statistics theory, we transform the posterior probability distribution of each layer into a Gaussian distribution. Then the full sampling paths can be implicitly restored from the first- and second-order moment statistics of the transformed distribution. A lightweight network is designed to learn to recover the log-MAP LLRs from the moment statistics with low complexity. Simulation results show that the proposed algorithm can improve the performance significantly with reduced samples under fading and correlated channels.

preprint2022arXiv

Soliton walls paired by polar surface interactions in a ferroelectric nematic liquid crystal

Surface interactions are responsible for many properties of condensed matter, ranging from crystal faceting to the kinetics of phase transitions. Usually, these interactions are polar along the normal to the interface and apolar within the interface. Here, we demonstrate that polar in-plane surface interactions of a ferroelectric nematic NF produce polar monodomains in micron-thin planar cells and stripes of an alternating electric polarization, separated by 180 degree domain walls, in thicker slabs. The surface polarity binds together pairs of these walls, yielding a total polarization rotation by 360 degrees. The polar contribution to the total surface anchoring strength is on the order of 10%. The domain walls involve splay, bend, and twist of the polarization. The structure suggests that the splay elastic constant is larger than the bend modulus. The 360 degree pairs resemble domain walls in cosmology models with biased vacuums and ferromagnets in an external magnetic field.

preprint2022arXiv

Subgraph Frequency Distribution Estimation using Graph Neural Networks

Small subgraphs (graphlets) are important features for describing the fundamental units of a large network. The calculation of subgraph frequency distributions has wide application in multiple domains, including biology and engineering. Unfortunately, due to the inherent complexity of this task, most existing methods are computationally intensive and inefficient. In this work, we propose GNNS, a novel representational learning framework that utilizes graph neural networks to sample subgraphs efficiently for estimating their frequency distribution. Our framework includes an inference model and a generative model that learns hierarchical embeddings of nodes, subgraphs, and graph types. With the learned model and embeddings, subgraphs are sampled in a highly scalable and parallel way, and the frequency distribution estimation is then performed based on these sampled subgraphs. Our method achieves comparable accuracy and a significant speedup of three orders of magnitude compared to existing methods.

preprint2022arXiv

Temporal Logic Guided Motion Primitives for Complex Manipulation Tasks with User Preferences

Dynamic movement primitives (DMPs) are a flexible trajectory learning scheme widely used in motion generation of robotic systems. However, existing DMP-based methods mainly focus on simple go-to-goal tasks. Motivated to handle tasks beyond point-to-point motion planning, this work presents temporal logic guided optimization of motion primitives, namely the PIBB-TL algorithm, for complex manipulation tasks with user preferences. In particular, weighted truncated linear temporal logic (wTLTL) is incorporated in the PIBB-TL algorithm, which not only enables the encoding of complex tasks that involve a sequence of logically organized action plans with user preferences, but also provides a convenient and efficient means to design the cost function. The black-box optimization is then adapted to identify optimal shape parameters of DMPs to enable motion planning of robotic systems. The effectiveness of the PIBB-TL algorithm is demonstrated via simulation and experime

preprint2022arXiv

Testing dark energy after pre-recombination early dark energy

In studies of pre-recombination early dark energy (EDE), the evolution of the Universe after recombination is usually regarded as $\Lambda$CDM-like, which corresponds to the equation of state of the dark energy responsible for the current accelerated expansion being $w=-1$. However, in realistic models, $w$ might be evolving. We consider parametrizations of $w$ with respect to the redshift $z$ in Axion-like EDE and AdS-EDE models, respectively. We performed a Markov chain Monte Carlo analysis with recent cosmological data, and found that the best-fit $w(z)$ is compatible with $w_0=-1,w_a=0$ (the cosmological constant) and that the evolution of $w$ is only marginally favored, and thus has little effect on lifting the best-fit value of $H_0$.

preprint2022arXiv

Theoretically Accurate Regularization Technique for Matrix Factorization based Recommender Systems

Regularization is a popular technique to solve the overfitting problem of machine learning algorithms. Most regularization techniques rely on parameter selection of the regularization coefficient. The plug-in method and the cross-validation approach are the two most common parameter selection approaches for regression methods such as Ridge Regression, Lasso Regression and Kernel Regression. Matrix factorization based recommendation systems also rely heavily on the regularization technique. Most people select a single scalar value to regularize the user feature vector and item feature vector independently or collectively. In this paper, we prove that such an approach to selecting the regularization coefficient is invalid, and we provide a theoretically accurate method that outperforms the most widely used approach in both accuracy and fairness metrics.

preprint2022arXiv

To Split or Not to Split: The Impact of Disparate Treatment in Classification

Disparate treatment occurs when a machine learning model yields different decisions for individuals based on a sensitive attribute (e.g., age, sex). In domains where prediction accuracy is paramount, it could potentially be acceptable to fit a model which exhibits disparate treatment. To evaluate the effect of disparate treatment, we compare the performance of split classifiers (i.e., classifiers trained and deployed separately on each group) with group-blind classifiers (i.e., classifiers which do not use a sensitive attribute). We introduce the benefit-of-splitting for quantifying the performance improvement by splitting classifiers. Computing the benefit-of-splitting directly from its definition could be intractable since it involves solving optimization problems over an infinite-dimensional functional space. Under different performance measures, we (i) prove an equivalent expression for the benefit-of-splitting which can be efficiently computed by solving small-scale convex programs; (ii) provide sharp upper and lower bounds for the benefit-of-splitting which reveal precise conditions where a group-blind classifier will always suffer from a non-trivial performance gap from the split classifiers. In the finite sample regime, splitting is not necessarily beneficial and we provide data-dependent bounds to understand this effect. Finally, we validate our theoretical results through numerical experiments on both synthetic and real-world datasets.

preprint2022arXiv

Training-Free Uncertainty Estimation for Dense Regression: Sensitivity as a Surrogate

Uncertainty estimation is an essential step in the evaluation of the robustness for deep learning models in computer vision, especially when applied in risk-sensitive areas. However, most state-of-the-art deep learning models either fail to obtain uncertainty estimation or need significant modification (e.g., formulating a proper Bayesian treatment) to obtain it. Most previous methods are not able to take an arbitrary model off the shelf and generate uncertainty estimation without retraining or redesigning it. To address this gap, we perform a systematic exploration into training-free uncertainty estimation for dense regression, an unrecognized yet important problem, and provide a theoretical construction justifying such estimations. We propose three simple and scalable methods to analyze the variance of outputs from a trained network under tolerable perturbations: infer-transformation, infer-noise, and infer-dropout. They operate solely during the inference, without the need to re-train, re-design, or fine-tune the models, as typically required by state-of-the-art uncertainty estimation methods. Surprisingly, even without involving such perturbations in training, our methods produce comparable or even better uncertainty estimation when compared to training-required state-of-the-art methods.

preprint2022arXiv

Tri-Functional Metasurface for Phase, Amplitude, and Luminescence Control

In optical anti-counterfeiting, several distinct optically variable devices (OVDs) are often concurrently employed to compensate for the insufficient security level of constituent OVDs. Alternatively, metasurfaces that exhibit multiple optical responses effectively combine multiple OVDs into one, thus significantly enhancing their security and hindering fraudulent replication. This work demonstrates the simultaneous control of three separate optical responses, i.e., phase, amplitude, and luminescence, using anisotropic gap-plasmon metasurfaces. Due to the incorporated geometric anisotropy, the designed structure exhibits distinct responses under x- and y-polarized light, revealing either a color image, or a holographic projection in the far field. Furthermore, inserting upconversion nanoparticles (UCNPs) into the dielectric gaps of the structures, the designed metasurface is able to generate a third luminescent image upon illumination with the near-infrared light. The stochastic distribution of the UCNPs constitutes a unique fingerprint, achieving a physically unclonable function (PUF) layer. Crucially, our triple-mode metasurface requires only readily attainable equipment such as a macro-lens/camera and a laser pointer to read most of the channels, thus paving the way towards highly secure and easy-to-authenticate metasurface-driven OVDs (mOVDs).

preprint2022arXiv

Two improvements of the foliation based quad meshing method

Quadrilateral meshes with high-level structure and feature-preserving properties benefit industrial applications the most. Generation of such quad meshes remains a challenge. Quad meshes generated using surface foliation have the highest-level structure; however, they lack feature-preserving ability. In this paper, we analyze the boundary curvature with the Gauss-Bonnet theorem to determine whether a foliation based method that preserves boundary rectangle corners exists. When it exists, we adopt a modified double cover technique together with the surface foliation method to generate a corner-feature-preserving quad mesh. The experiments demonstrate the efficacy of our algorithm.

preprint2022arXiv

Unsupervised strategies for identifying optimal parameters in Quantum Approximate Optimization Algorithm

As combinatorial optimization is one of the main quantum computing applications, many methods based on parameterized quantum circuits are being developed. In general, a set of parameters is tweaked to optimize a cost function computed from the quantum circuit output. One of these algorithms, the Quantum Approximate Optimization Algorithm (QAOA), stands out as a promising approach to tackling combinatorial problems. However, finding the appropriate parameters is a difficult task. Although QAOA exhibits concentration properties, they can depend on instance characteristics that may not be easy to identify, but may nonetheless offer useful information for finding good parameters. In this work, we study unsupervised Machine Learning approaches for setting these parameters without optimization. We perform clustering with the angle values but also with instance encodings (using instance features or the output of a variational graph autoencoder), and compare different approaches. These angle-finding strategies can be used to reduce calls to quantum circuits when leveraging QAOA as a subroutine. We showcase them within Recursive-QAOA up to depth $3$ where the number of QAOA parameters used per iteration is limited to $3$, achieving a median approximation ratio of $0.94$ for MaxCut over $200$ Erdős-Rényi graphs. We obtain performance similar to the case where we extensively optimize the angles, hence saving numerous circuit calls.

preprint2022arXiv

Winning the CVPR'2022 AQTC Challenge: A Two-stage Function-centric Approach

Affordance-centric Question-driven Task Completion for Egocentric Assistant (AQTC) is a novel task which helps an AI assistant learn from instructional videos and scripts and guide the user step by step. In this paper, we address AQTC via a two-stage Function-centric approach, which consists of a Question2Function Module to ground the question with the related function and a Function2Answer Module to predict the action based on the historical steps. We evaluated several possible solutions in each module and obtained significant gains compared to the given baselines. Our code is available at https://github.com/starsholic/LOVEU-CVPR22-AQTC.

preprint2021arXiv

A self-supervised learning-based 6-DOF grasp planning method for manipulator

To realize a robust robotic grasping system for unknown objects in an unstructured environment, large amounts of grasp data and 3D model data for the object are required, the sizes of which directly affect the rate of successful grasps. To reduce the time cost of data acquisition and labeling and increase the rate of successful grasps, we developed a self-supervised learning mechanism to control grasp tasks performed by manipulators. First, a manipulator automatically collects the point cloud for the objects from multiple perspectives to increase the efficiency of data acquisition. The complete point cloud for the objects is obtained by utilizing the hand-eye vision of the manipulator, and the TSDF algorithm. Then, the point cloud data for the objects is used to generate a series of six-degrees-of-freedom grasp poses, and the force-closure decision algorithm is used to add the grasp quality label to each grasp pose to realize the automatic labeling of grasp data. Finally, the point cloud in the gripper closing area corresponding to each grasp pose is obtained; it is then used to train the grasp-quality classification model for the manipulator. The results of data acquisition experiments demonstrate that the proposed method allows high-quality data to be obtained. The simulated results prove the effectiveness of the proposed grasp-data acquisition method. The results of performing actual grasping experiments demonstrate that the proposed self-supervised learning method can increase the rate of successful grasps for the manipulator.

preprint2021arXiv

A Survey on Bayesian Deep Learning

A comprehensive artificial intelligence system needs to not only perceive the environment with different 'senses' (e.g., seeing and hearing) but also infer the world's conditional (or even causal) relations and corresponding uncertainty. The past decade has seen major advances in many perception tasks such as visual object recognition and speech recognition using deep learning models. For higher-level inference, however, probabilistic graphical models with their Bayesian nature are still more powerful and flexible. In recent years, Bayesian deep learning has emerged as a unified probabilistic framework to tightly integrate deep learning and Bayesian models. In this general framework, the perception of text or images using deep learning can boost the performance of higher-level inference and in turn, the feedback from the inference process is able to enhance the perception of text or images. This survey provides a comprehensive introduction to Bayesian deep learning and reviews its recent applications on recommender systems, topic models, control, etc. Besides, we also discuss the relationship and differences between Bayesian deep learning and other related topics such as Bayesian treatment of neural networks. For a constantly updating project page, please refer to https://github.com/js05212/BayesianDeepLearning-Survey.

preprint2021arXiv

An Extrapolated Iteratively Reweighted l1 Method with Complexity Analysis

The iteratively reweighted l1 algorithm is a widely used method for solving various regularization problems, which generally minimize a differentiable loss function combined with a nonconvex regularizer to induce sparsity in the solution. However, the convergence and the complexity of iteratively reweighted l1 algorithms are generally difficult to analyze, especially for non-Lipschitz differentiable regularizers such as nonconvex lp norm regularization. In this paper, we propose, analyze and test a reweighted l1 algorithm combined with the extrapolation technique under the assumption of the Kurdyka-Lojasiewicz (KL) property on the objective. Unlike existing iteratively reweighted l1 algorithms with extrapolation, our method requires neither Lipschitz differentiability of the regularizers nor that the smoothing parameters in the weights be bounded away from zero. We show that the proposed algorithm converges uniquely to a stationary point of the regularization problem and has local linear complexity, a much stronger result than existing ones. Our numerical experiments show the efficiency of our proposed method.

preprint2021arXiv

Back to Prior Knowledge: Joint Event Causality Extraction via Convolutional Semantic Infusion

Joint event and causality extraction is a challenging yet essential task in information retrieval and data mining. Recently, pre-trained language models (e.g., BERT) yield state-of-the-art results and dominate in a variety of NLP tasks. However, these models are incapable of imposing external knowledge in domain-specific extraction. Considering the prior knowledge of frequent n-grams that represent cause/effect events may benefit both event and causality extraction, in this paper, we propose convolutional knowledge infusion for frequent n-grams with different windows of length within a joint extraction framework. Knowledge infusion during convolutional filter initialization not only helps the model capture both intra-event (i.e., features in an event cluster) and inter-event (i.e., associations across event clusters) features but also boosts training convergence. Experimental results on the benchmark datasets show that our model significantly outperforms the strong BERT+CSNN baseline.

preprint2021arXiv

Benchmarking Discrete Optimization Heuristics with IOHprofiler

Automated benchmarking environments aim to support researchers in understanding how different algorithms perform on different types of optimization problems. Such comparisons provide insights into the strengths and weaknesses of different approaches, which can be leveraged in designing new algorithms and in the automation of algorithm selection and configuration. With the ultimate goal of creating a meaningful benchmark set for iterative optimization heuristics, we have recently released IOHprofiler, a software tool built to create detailed performance comparisons between iterative optimization heuristics. With this present work we demonstrate that IOHprofiler provides a suitable environment for automated benchmarking. We compile and assess a selection of 23 discrete optimization problems that subscribe to different types of fitness landscapes. For each selected problem we compare the performance of twelve different heuristics, which are as of now available as baseline algorithms in IOHprofiler. We also provide a new module for IOHprofiler which extends the fixed-target and fixed-budget results for the individual problems by ECDF results, which allows one to derive aggregated performance statistics for groups of problems.

preprint2021arXiv

CARE: Commonsense-Aware Emotional Response Generation with Latent Concepts

Rationality and emotion are two fundamental elements of humans. Endowing agents with rationality and emotion has been one of the major milestones in AI. However, in the field of conversational AI, most existing models only specialize in one aspect and neglect the other, which often leads to dull or unrelated responses. In this paper, we hypothesize that combining rationality and emotion into conversational agents can improve response quality. To test the hypothesis, we focus on one fundamental aspect of rationality, i.e., commonsense, and propose CARE, a novel model for commonsense-aware emotional response generation. Specifically, we first propose a framework to learn and construct commonsense-aware emotional latent concepts of the response given an input message and a desired emotion. We then propose three methods to collaboratively incorporate the latent concepts into response generation. Experimental results on two large-scale datasets support our hypothesis and show that our model can produce more accurate and commonsense-aware emotional responses and achieve better human ratings than state-of-the-art models that only specialize in one aspect.

preprint2021arXiv

Consensus-Based Decentralized Energy Trading for Distributed Energy Resources

In smart grids, distributed energy resources (DERs) have penetrated residential zones to provide a new form of electricity supply, mainly from renewable energy. Residential households and commercial buildings with DERs have become prosumers in the local grids, since they can sell surplus power to others. Research has been initiated to integrate and utilize DERs through better control and communication strategies. With the advances in the Internet of Things (IoT) technology, unprecedented coordination among DERs can be achieved to facilitate energy trading and transactive energy management. However, preventing leakage of users' information during the optimization process remains a challenge for researchers, which drives them to develop privacy-preserving energy management systems. In this paper, we develop a fully decentralized transactive energy management strategy using a consensus-based algorithm. Specifically, we design a virtual pool for prosumers to trade energy and exchange information with the support of IoT technologies. The consensus-based algorithm enables prosumers to obtain the optimal energy schedule independently in a coordinated manner without revealing any personal data. We use real-world data to perform simulations and validate our developed algorithm. The results show that our consensus-based decentralized transactive energy management strategy is feasible and can significantly reduce the overall system cost.

preprint2021arXiv

Convergence Rate Analysis of Proximal Iteratively Reweighted $\ell_1$ Methods for $\ell_p$ Regularization Problems

In this paper, we focus on the local convergence rate analysis of the proximal iteratively reweighted $\ell_1$ algorithms for solving $\ell_p$ regularization problems, which are widely applied for inducing sparse solutions. We show that if the Kurdyka-Lojasiewicz (KL) property is satisfied, the algorithm converges to a unique first-order stationary point; furthermore, the algorithm has local linear convergence or local sublinear convergence. The theoretical results we derived are much stronger than the existing results for iteratively reweighted $\ell_1$ algorithms.
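
As an illustration of the kind of algorithm this abstract analyzes, the following is a minimal sketch of a proximal iteratively reweighted $\ell_1$ iteration for an $\ell_p$-regularized problem with $0 < p < 1$. It is a generic textbook-style version, not the authors' exact scheme: the function names, the fixed smoothing constant `eps`, the step size, and the toy denoising problem are all assumptions made for illustration.

```python
import math

def soft_threshold(v, t):
    """Proximal operator of t * |.|: shrink v toward zero by t."""
    return math.copysign(max(abs(v) - t, 0.0), v)

def reweighted_l1(grad, x0, lam, p=0.5, eps=1e-3, step=1.0, iters=100):
    """Proximal iteratively reweighted l1 sketch (hypothetical helper).

    Approximately minimizes f(x) + lam * sum(|x_i| ** p), where grad(x)
    returns the gradient of the smooth part f. Each iteration linearizes
    (|x_i| + eps) ** p around the current point, which yields a weighted
    l1 subproblem solved in closed form by soft thresholding.
    eps is held fixed here as a simplifying assumption.
    """
    x = list(x0)
    for _ in range(iters):
        # Weights from the derivative of (|x_i| + eps) ** p.
        w = [p * (abs(xi) + eps) ** (p - 1) for xi in x]
        g = grad(x)
        x = [soft_threshold(xi - step * gi, step * lam * wi)
             for xi, gi, wi in zip(x, g, w)]
    return x

# Toy problem (assumed): f(x) = 0.5 * ||x - b||^2, so grad f(x) = x - b.
b = [3.0, 0.05, -2.0, 0.0]
x_hat = reweighted_l1(lambda x: [xi - bi for xi, bi in zip(x, b)], b, lam=0.1)
```

On this toy problem the small entry of `b` is driven exactly to zero while the large entries are only slightly shrunk, which is the sparsity-inducing behavior that motivates $\ell_p$ regularization.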

preprint2021arXiv

Experimental Validation of Eco-Driving and Eco-Heating Strategies for Connected and Automated HEVs

This paper presents experimental results that validate eco-driving and eco-heating strategies developed for connected and automated vehicles (CAVs). By exploiting vehicle-to-infrastructure (V2I) communications, traffic signal timing, and queue length estimations, optimized and smoothed speed profiles for the ego-vehicle are generated to reduce energy consumption. Next, the planned eco-trajectories are incorporated into a real-time predictive optimization framework that coordinates the cabin thermal load (in cold weather) with the speed preview, i.e., eco-heating. To enable eco-heating, the engine coolant (as the only heat source for cabin heating) and the cabin air are leveraged as two thermal energy storages. Our eco-heating strategy stores thermal energy in the engine coolant and cabin air while the vehicle is driving at high speeds, and releases the stored energy slowly for cabin heating during vehicle stops, without forcing the engine to idle to provide the heating source. To test and validate these solutions, a power-split hybrid electric vehicle (HEV) has been instrumented for cabin thermal management, allowing the heating, ventilation, and air conditioning (HVAC) system inputs (cabin temperature setpoint and blower flow rate) to be regulated in real time. Experiments were conducted to demonstrate the energy-saving benefits of eco-driving and eco-heating strategies over real-world city driving cycles at different cold ambient temperatures. The data confirmed average fuel savings of 14.5% and 4.7% achieved by eco-driving and eco-heating, respectively, offering a combined energy saving of more than 19% compared to the baseline vehicle driven by a human driver with a constant-heating strategy.

preprint2021arXiv

Exploring Blockchain for The Coordination of Distributed Energy Resources

The fast growth of distributed energy resources (DERs), such as distributed renewables (e.g., rooftop PV panels), energy storage systems, electric vehicles, and controllable appliances, drives the power system toward a decentralized system with bidirectional power flow. The coordination of DERs through an aggregator, such as a utility, system operator, or a third-party coordinator, emerges as a promising paradigm. However, it is not well understood how to enable trust between the aggregator and DERs to integrate DERs efficiently. In this paper, we develop a trustable and distributed coordination system for DERs using blockchain technology. We model various DERs and formulate a cost minimization problem for DERs to optimize their energy trading, scheduling, and demand response. We use the alternating direction method of multipliers (ADMM) to solve the problem in a distributed fashion. To implement the distributed algorithm in a trustable way, we design a smart contract to update multipliers and communicate with DERs in a blockchain network. We validate our design by experiments using real-world data, and the simulation results demonstrate the effectiveness of our algorithm.

preprint2021arXiv

Keyword-Guided Neural Conversational Model

We study the problem of imposing conversational goals/keywords on open-domain conversational agents, where the agent is required to lead the conversation to a target keyword smoothly and fast. Solving this problem enables the application of conversational agents in many real-world scenarios, e.g., recommendation and psychotherapy. The dominant paradigm for tackling this problem is to 1) train a next-turn keyword classifier, and 2) train a keyword-augmented response retrieval model. However, existing approaches in this paradigm have two limitations: 1) the training and evaluation datasets for next-turn keyword classification are directly extracted from conversations without human annotations, thus, they are noisy and have low correlation with human judgements, and 2) during keyword transition, the agents solely rely on the similarities between word embeddings to move closer to the target keyword, which may not reflect how humans converse. In this paper, we assume that human conversations are grounded on commonsense and propose a keyword-guided neural conversational model that can leverage external commonsense knowledge graphs (CKG) for both keyword transition and response retrieval. Automatic evaluations suggest that commonsense improves the performance of both next-turn keyword prediction and keyword-augmented response retrieval. In addition, both self-play and human evaluations show that our model produces responses with smoother keyword transition and reaches the target keyword faster than competitive baselines.

preprint2021arXiv

Learning Guided Electron Microscopy with Active Acquisition

Single-beam scanning electron microscopes (SEM) are widely used to acquire massive data sets for biomedical study, material analysis, and fabrication inspection. Datasets are typically acquired with uniform acquisition: applying the electron beam with the same power and duration to all image pixels, even if there is great variety in the pixels' importance for eventual use. Many SEMs are now able to move the beam to any pixel in the field of view without delay, enabling them, in principle, to invest their time budget more effectively with non-uniform imaging. In this paper, we show how to use deep learning to accelerate and optimize single-beam SEM acquisition of images. Our algorithm rapidly collects an information-lossy image (e.g. low resolution) and then applies a novel learning method to identify a small subset of pixels to be collected at higher resolution based on a trade-off between the saliency and spatial diversity. We demonstrate the efficacy of this novel technique for active acquisition by speeding up the task of collecting connectomic datasets for neurobiology by up to an order of magnitude.

preprint2021arXiv

Multimessenger parameter estimation of GW170817: from jet structure to the Hubble constant

The electromagnetic radiation that followed the neutron star merger event GW170817 revealed that gamma-ray burst afterglows from jets misaligned with our line of sight exhibit a light curve with slowly rising flux. The slope of the rising light curve depends sensitively on the angle of the observer with respect to the jet axis, which is likely to be perpendicular to the merger plane of the neutron star binary. Therefore, the afterglow emission can be used to constrain the inclination of the merging system. Here, we calculate the gamma-ray burst afterglow emission based on the realistic jet structure derived from general-relativistic magnetohydrodynamical simulations of a black hole torus system for the central engine of the gamma-ray burst. Combined with gravitational wave parameter estimation, we fit the multi-epoch afterglow emission of GW170817. We show that with such a jet model, the observing angle can be tightly constrained by multimessenger observations. The best-fit observing angle of GW170817 is $θ_{\rm v} = 0.38\pm 0.02$ rad. With such a constraint, we can break the degeneracy between inclination angle and luminosity distance in gravitational wave parameter estimation, and substantially increase the precision with which the Hubble constant is constrained by the standard siren method. Our estimation of the distance is $D_{\rm L}=43.4\pm 1\ \rm Mpc$ and the Hubble constant constraint is $69.5\pm 4\ \mathrm{km\ s^{-1}\ Mpc^{-1}}$. As a result, multimessenger observations of short-duration gamma-ray bursts, combined with a good theoretical understanding of the jet structure, can be powerful probes of cosmological parameters.

preprint2021arXiv

Network Clustering for Multi-task Learning

The Multi-Task Learning (MTL) technique has been widely studied by researchers worldwide. The majority of current MTL studies adopt the hard parameter sharing structure, where hard layers tend to learn general representations over all tasks and specific layers tend to learn specific representations for each task. Since the specific layers directly follow the hard layers, the MTL model needs to estimate this direct change (from general to specific) as well. To alleviate this problem, we introduce the novel cluster layer, which groups tasks into clusters during training. In a cluster layer, the tasks in the same cluster are further required to share the same network. In this way, the cluster layer produces a general representation for tasks within the same cluster, while producing relatively specific representations for different clusters. The cluster layers are used as transitions between the hard layers and the specific layers. The MTL model thus moves from general representations to specific representations gradually. We evaluate our model on MTL document classification, and the results demonstrate that the cluster layer is quite efficient in MTL.

preprint2021arXiv

Potential Advantages of Peak Picking Multi-Voltage Threshold Digitizer in Energy Determination in Radiation Measurement

The Multi-voltage Threshold (MVT) method, which samples the signal at certain reference voltages, has been well developed and adopted in pre-clinical and clinical digital positron emission tomography (PET) systems. To improve its energy measurement performance, we propose a Peak Picking MVT (PP-MVT) Digitizer in this paper. First, a sampled Peak Point (the highest point in the pulse signal), which carries the amplitude feature voltage and the amplitude arrival time, is added to traditional MVT with a simple peak sampling circuit. Second, an amplitude deviation statistical analysis, which compares the energy deviation of various reconstruction models, is used to select adaptive reconstruction models for signal pulses with different amplitudes. After processing 30,000 randomly chosen pulses sampled by the oscilloscope with a 22Na point source, our method achieves an energy resolution of 17.50% within a 450-650 keV energy window, which is 2.44% better than the result of traditional MVT with the same thresholds; and we obtain a count number of 15225 in the same energy window, while the result of MVT is 14678. When the PP-MVT involves fewer thresholds than traditional MVT, the advantages of better energy resolution and larger count number are still maintained, which shows the robustness and flexibility of the PP-MVT Digitizer. This improved method indicates that adding feature peak information could improve the performance of signal sampling and reconstruction, as evidenced by the better performance in energy determination in radiation measurement.

preprint2021arXiv

Prediction of Stable Ground-State Binary Sodium-Potassium Interalkalis under High Pressures

The complex structures and electronic properties of alkali metals and their alloys provide a natural laboratory for studying the interelectronic interactions of metals under compression. A recent theoretical study (J. Phys. Chem. Lett. 2019, 10, 3006) predicted an interesting pressure-induced decomposition-recombination behavior of the Na2K compound over a pressure range of 10 - 500 GPa. However, a subsequent experiment (Phys. Rev. B 2020, 101, 224108) reported the formation of NaK rather than Na2K at pressures above 5.9 GPa. To address this discordance, we study the chemical stability of different stoichiometries of NaxK (x = 1/4, 1/3, 1/2, 2/3, 3/4, 4/3, 3/2 and 1 - 4) using an effective structure searching method combined with first-principles calculations. Na2K is calculated to be unstable at 5 - 35 GPa due to the decomposition reaction Na2K -> NaK + Na, coinciding well with the experiment. NaK undergoes a combination-decomposition-recombination process accompanied by an opposite charge-transfer behavior between Na and K with pressure. Besides NaK, two hitherto unknown compounds NaK3 and Na3K2 are uncovered. NaK3 is a typical metallic alloy, while Na3K2 is an electride with strong interstitial electron localization.

preprint2021arXiv

Secure Blockchain Platform for Industrial IoT with Trusted Computing Hardware

As a disruptive technology that originates from cryptocurrency, blockchain provides a trusted platform to facilitate industrial IoT (IIoT) applications. However, implementing a blockchain platform in IIoT scenarios confronts various security challenges due to the rigorous deployment conditions. To this end, we present a novel design of secure blockchain based on trusted computing hardware for IIoT applications. Specifically, we employ the trusted execution environment (TEE) module and a customized security chip to safeguard the blockchain against different attack vectors. Furthermore, we implement the proposed secure IIoT blockchain on an ARM-based embedded device and build a small-scale IIoT network to evaluate its performance. Our experimental results show that the secure blockchain platform achieves a high throughput (150 TPS) with low transaction confirmation delay (below 66 ms), demonstrating its feasibility in practical IIoT scenarios. Finally, we outline the open challenges and future research directions.

preprint2021arXiv

Towards Efficient Local Causal Structure Learning

Local causal structure learning aims to discover and distinguish direct causes (parents) and direct effects (children) of a variable of interest from data. While emerging successes have been made, existing methods need to search a large space to distinguish direct causes from direct effects of a target variable T. To tackle this issue, we propose a novel Efficient Local Causal Structure learning algorithm, named ELCS. Specifically, we first propose the concept of N-structures, then design an efficient Markov Blanket (MB) discovery subroutine to integrate MB learning with N-structures to learn the MB of T and simultaneously distinguish direct causes from direct effects of T. With the proposed MB subroutine, ELCS starts from the target variable, sequentially finds MBs of variables connected to the target variable and simultaneously constructs local causal structures over MBs until the direct causes and direct effects of the target variable have been distinguished. Using eight Bayesian networks the extensive experiments have validated that ELCS achieves better accuracy and efficiency than the state-of-the-art algorithms.

preprint2021arXiv

Towards Natural and Controllable Cross-Lingual Voice Conversion Based on Neural TTS Model and Phonetic Posteriorgram

Cross-lingual voice conversion (VC) is an important and challenging problem due to significant mismatches of the phonetic set and the speech prosody of different languages. In this paper, we build upon the neural text-to-speech (TTS) model, i.e., FastSpeech, and LPCNet neural vocoder to design a new cross-lingual VC framework named FastSpeech-VC. We address the mismatches of the phonetic set and the speech prosody by applying Phonetic PosteriorGrams (PPGs), which have been proved to bridge across speaker and language boundaries. Moreover, we add normalized logarithm-scale fundamental frequency (Log-F0) to further compensate for the prosodic mismatches and significantly improve naturalness. Our experiments on English and Mandarin languages demonstrate that with only mono-lingual corpus, the proposed FastSpeech-VC can achieve high quality converted speech with mean opinion score (MOS) close to the professional records while maintaining good speaker similarity. Compared to the baselines using Tacotron2 and Transformer TTS models, the FastSpeech-VC can achieve controllable converted speech rate and much faster inference speed. More importantly, the FastSpeech-VC can easily be adapted to a speaker with limited training utterances.

preprint2021arXiv

Tuning as a Means of Assessing the Benefits of New Ideas in Interplay with Existing Algorithmic Modules

Introducing new algorithmic ideas is a key part of the continuous improvement of existing optimization algorithms. However, when introducing a new component into an existing algorithm, assessing its potential benefits is a challenging task. Often, the component is added to a default implementation of the underlying algorithm and compared against a limited set of other variants. This assessment ignores any potential interplay with other algorithmic ideas that share the same base algorithm, which is critical in understanding the exact contributions being made. We introduce a more extensive procedure, which uses hyperparameter tuning as a means of assessing the benefits of new algorithmic components. This allows for a more robust analysis by not only focusing on the impact on performance, but also by investigating how this performance is achieved. We implement our suggestion in the context of the Modular CMA-ES framework, which was redesigned and extended to include some new modules and several new options for existing modules, mostly focused on the step-size adaptation method. Our analysis highlights the differences between these new modules, and identifies the situations in which they have the largest contribution.

preprint2020arXiv

A Modular Hybridization of Particle Swarm Optimization and Differential Evolution

In swarm intelligence, Particle Swarm Optimization (PSO) and Differential Evolution (DE) have been successfully applied in many optimization tasks, and a large number of variants, where novel algorithm operators or components are implemented, has been introduced to boost the empirical performance. In this paper, we first propose to combine the variants of PSO or DE by modularizing each algorithm and incorporating the variants thereof as different options of the corresponding modules. Then, considering the similarity between the inner workings of PSO and DE, we hybridize the algorithms by creating two populations with variation operators of PSO and DE respectively, and selecting individuals from those two populations. The resulting novel hybridization, called PSODE, encompasses most up-to-date variants from both sides, and more importantly gives rise to an enormous number of unseen swarm algorithms via different instantiations of the modules therein. In detail, we consider 16 different variation operators originating from existing PSO- and DE algorithms, which, combined with 4 different selection operators, allow the hybridization framework to generate 800 novel algorithms. The resulting set of hybrid algorithms, along with the combined 30 PSO- and DE algorithms that can be generated with the considered operators, is tested on the 24 problems from the well-known COCO/BBOB benchmark suite, across multiple function groups and dimensionalities.

preprint2020arXiv

A Multilayer Neural Network Merging Image Preprocessing and Pattern Recognition by Integrating Diffusion and Drift Memristors

With the development of research on novel memristor models and devices, neural networks that integrate various memristor models have recently become a hot research topic. However, state-of-the-art works still build such neural networks using drift memristors only. Furthermore, some other related works are only applied to a few individual applications including pattern recognition and edge detection. In this paper, a novel kind of multilayer neural network is proposed, in which diffusion and drift memristor models are applied to construct a system merging image preprocessing and pattern recognition. Specifically, the entire network consists of two diffusion memristive cellular layers for image preprocessing and one drift memristive feedforward layer for pattern recognition. Experimental results show that good recognition accuracy on noisy MNIST is obtained due to the fusion of image preprocessing and pattern recognition. Moreover, owing to high-efficiency in-memory computing and brief spiking encoding methods, the entire network achieves high processing speed and high throughput with few hardware resources.

preprint2020arXiv

A Posteriori Error Estimates for Adaptive QM/MM Coupling Methods

Hybrid quantum/molecular mechanics models (QM/MM methods) are widely used in material and molecular simulations when MM models do not provide sufficient accuracy but pure QM models are computationally prohibitive. Adaptive QM/MM coupling methods feature on-the-fly classification of atoms during the simulation, allowing the QM and MM subsystems to be updated as needed. In this work, we propose such an adaptive QM/MM method for material defect simulations based on a new residual-based a posteriori error estimator, which provides both lower and upper bounds for the true error. We validate the analysis and illustrate the effectiveness of the new scheme on numerical simulations for material defects.

preprint2020arXiv

A Study of Geometry in Anisotropic Quantum Hall States by Principal Component Analysis

In the presence of mass anisotropy, anisotropic interaction, or in-plane magnetic field, quantum Hall droplets can exhibit shape deformation and internal geometrical degree of freedom. We characterize the geometry of quantum Hall states by principal component analysis, which is a statistical technique that emphasizes variation in a dataset. We first test the method in an integer quantum Hall droplet with dipole-dipole interaction in disk geometry. In the subsequent application to fractional quantum Hall systems with anisotropic Coulomb interaction in torus geometry, we demonstrate that the principal component analysis can quantify the metric degree of freedom and predict the collapse of a $ν= 1/3$ state. We also calculate the metric response to interaction anisotropy at filling fractions $ν= 1/5$ and $2/5$ and show that the response is roughly the same within the same Jain sequence, but can differ at large anisotropy for different sequences.
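
As a generic illustration of how principal component analysis can quantify shape anisotropy, the sketch below applies PCA to a two-dimensional point cloud. It is not the paper's procedure for quantum Hall states: the function name `pca_2d`, the use of a plain point cloud, and the elliptical test data are assumptions made for illustration only. The eigenvalues of the 2x2 covariance matrix give the variance along the principal axes, and the square root of their ratio gives an aspect ratio.

```python
import math


def pca_2d(points):
    """PCA of 2D points via the closed-form eigen-decomposition of the
    2x2 covariance matrix.

    Returns (largest variance, smallest variance, angle of the major
    principal axis in radians).
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Eigenvalues of [[sxx, sxy], [sxy, syy]].
    centre = 0.5 * (sxx + syy)
    radius = math.hypot(0.5 * (sxx - syy), sxy)
    # Orientation of the eigenvector belonging to the larger eigenvalue.
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return centre + radius, centre - radius, angle


def ellipse(a, b, phi, samples=12):
    """Hypothetical test data: points on an ellipse with semi-axes a and b,
    rotated by phi."""
    pts = []
    for k in range(samples):
        t = 2.0 * math.pi * k / samples
        x, y = a * math.cos(t), b * math.sin(t)
        pts.append((x * math.cos(phi) - y * math.sin(phi),
                    x * math.sin(phi) + y * math.cos(phi)))
    return pts


if __name__ == "__main__":
    lam1, lam2, angle = pca_2d(ellipse(2.0, 1.0, math.pi / 6))
    print(math.sqrt(lam1 / lam2), angle)
```

The closed form avoids a linear-algebra dependency; for higher-dimensional data a library eigen-solver would be the usual choice.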

preprint2020arXiv

A Two-Stream Meticulous Processing Network for Retinal Vessel Segmentation

Vessel segmentation in fundus images is a key diagnostic capability in ophthalmology, and various challenges remain in this essential task. Early approaches indicate that it is often difficult to obtain desirable segmentation performance on thin vessels and boundary areas due to the imbalance of vessel pixels with different thickness levels. In this paper, we propose a novel two-stream Meticulous-Processing Network (MP-Net) for tackling this problem. To pay more attention to the thin vessels and boundary areas, we first propose an efficient hierarchical model that automatically stratifies the ground-truth masks into different thickness levels. Then a novel two-stream adversarial network is introduced to use the stratification results with a balanced loss function and an integration operation to achieve better performance, especially in detecting thin vessels and boundary areas. Our model is shown to outperform state-of-the-art methods on the DRIVE, STARE, and CHASE_DB1 datasets.

preprint2020arXiv

ASGN: An Active Semi-supervised Graph Neural Network for Molecular Property Prediction

Molecular property prediction (e.g., energy) is an essential problem in chemistry and biology. Unfortunately, many supervised learning methods suffer from the scarcity of labeled molecules in the chemical space, where such property labels are generally obtained by Density Functional Theory (DFT) calculation, which is extremely computationally costly. An effective solution is to incorporate the unlabeled molecules in a semi-supervised fashion. However, learning semi-supervised representations for large amounts of molecules is challenging, with issues including the joint representation of both molecular essence and structure, and the conflict between representation and property learning. Here we propose a novel framework called Active Semi-supervised Graph Neural Network (ASGN) that incorporates both labeled and unlabeled molecules. Specifically, ASGN adopts a teacher-student framework. In the teacher model, we propose a novel semi-supervised learning method to learn a general representation that jointly exploits information from molecular structure and molecular distribution. Then in the student model, we target the property prediction task to deal with the learning loss conflict. Finally, we propose a novel active learning strategy in terms of molecular diversities to select informative data during the whole framework learning. We conduct extensive experiments on several public datasets. Experimental results show the remarkable performance of our ASGN framework.

preprint2020arXiv

Authenticating On-Body IoT Devices: An Adversarial Learning Approach

By adding users as a new dimension to connectivity, on-body Internet-of-Things (IoT) devices have gained considerable momentum in recent years, while raising serious privacy and safety issues. Existing approaches to authenticate these devices limit themselves to dedicated sensors or specified user motions, undermining their widespread acceptance. This paper overcomes these limitations with a general authentication solution by integrating wireless physical layer (PHY) signatures with upper-layer protocols. The key enabling techniques are constructing representative radio propagation profiles from received signals, and developing an adversarial multi-player neural network to accurately recognize underlying radio propagation patterns and facilitate on-body device authentication. Once hearing a suspicious transmission, our system triggers a PHY-based challenge-response protocol to defend in depth against active attacks. We prove that at equilibrium, our adversarial model can extract all information about propagation patterns and eliminate any irrelevant information caused by motion variances and environment changes. We build a prototype of our system using Universal Software Radio Peripheral (USRP) devices and conduct extensive experiments with various static and dynamic body motions in typical indoor and outdoor environments. The experimental results show that our system achieves an average authentication accuracy of 91.6%, with a high area under the receiver operating characteristic curve (AUROC) of 0.96 and a better generalization performance compared with the conventional non-adversarial approach.

preprint2020arXiv

Benchmarking a $(μ+λ)$ Genetic Algorithm with Configurable Crossover Probability

We investigate a family of $(μ+λ)$ Genetic Algorithms (GAs) which creates offspring either from mutation or by recombining two randomly chosen parents. By scaling the crossover probability, we can thus interpolate from a fully mutation-only algorithm towards a fully crossover-based GA. We analyze, by empirical means, how the performance depends on the interplay of population size and the crossover probability. Our comparison on 25 pseudo-Boolean optimization problems reveals an advantage of crossover-based configurations on several easy optimization tasks, whereas the picture for more complex optimization problems is rather mixed. Moreover, we observe that the "fast" mutation scheme with its power-law distributed mutation strengths outperforms standard bit mutation on complex optimization tasks when it is combined with crossover, but performs worse in the absence of crossover. We then take a closer look at the surprisingly good performance of the crossover-based $(μ+λ)$ GAs on the well-known LeadingOnes benchmark problem. We observe that the optimal crossover probability increases with increasing population size $μ$. At the same time, it decreases with increasing problem dimension, indicating that the advantages of the crossover are not visible in the asymptotic view classically applied in runtime analysis. We therefore argue that a mathematical investigation for fixed dimensions might help us observe effects which are not visible when focusing exclusively on asymptotic performance bounds.
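
The family of algorithms described in the abstract above can be sketched as follows. This is a minimal illustration, not the benchmarked implementation: uniform crossover, standard bit mutation with rate 1/n, and the function names (`leading_ones`, `mu_plus_lambda_ga`) are assumptions. The parameter `p_c` is the crossover probability, so `p_c = 0` gives a mutation-only algorithm and `p_c = 1` a crossover-only one.

```python
import random


def leading_ones(bits):
    """LeadingOnes fitness: number of consecutive 1s from the left."""
    count = 0
    for b in bits:
        if b != 1:
            break
        count += 1
    return count


def mu_plus_lambda_ga(n, mu, lam, p_c, generations, seed=0):
    """(mu+lambda) GA where each offspring comes from crossover with
    probability p_c and from mutation otherwise."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(mu)]
    history = []
    for _ in range(generations):
        offspring = []
        for _ in range(lam):
            if rng.random() < p_c:
                # Uniform crossover of two randomly chosen parents (assumed operator).
                p1, p2 = rng.choice(pop), rng.choice(pop)
                child = [rng.choice((a, b)) for a, b in zip(p1, p2)]
            else:
                # Standard bit mutation: flip each bit independently with rate 1/n.
                parent = rng.choice(pop)
                child = [1 - b if rng.random() < 1.0 / n else b for b in parent]
            offspring.append(child)
        # Plus selection: keep the best mu among parents and offspring.
        pop = sorted(pop + offspring, key=leading_ones, reverse=True)[:mu]
        history.append(leading_ones(pop[0]))
    return pop[0], history


if __name__ == "__main__":
    best, history = mu_plus_lambda_ga(n=20, mu=5, lam=5, p_c=0.5, generations=200)
    print(leading_ones(best), history[-1])
```

Because plus selection never discards the current best individual, the recorded best fitness is non-decreasing across generations, which makes runs with different `p_c` values easy to compare.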

preprint2020arXiv

Black Hole Mass Function of Coalescing Neutron Star-Black Hole Binary Systems: The Prospect of Reconstruction with the Gravitational Wave Observations

The discovery of gravitational waves from the coalescence of compact objects opens a brand-new window to observe the universe. With more events being detected in the future, statistical examinations will be essential to better understand the underlying astrophysical processes. In this work we investigate the prospect of measuring the mass function of black holes that are merging with neutron stars. Applying Bayesian parameter estimation to hundreds of simulated neutron star$-$black hole (NSBH) mergers, we find that the parameters for most of the injected events can be well recovered. We also use a Bayesian hierarchical model to reconstruct the population properties of the masses of black holes. In the presence of a low mass gap, both the mass gap and the power-law index ($α$) of the black hole mass function can be well measured, and thus we can reveal whether $α$ is different for binary black hole (BBH) and NSBH systems. In the absence of a low mass gap, the gravitational wave data as well as the electromagnetic data can be used to pin down the nature of the merger event and then measure the mass of these very light black holes. However, as a result of the misclassification of BBH into NSBH, the measurement of $α$ is more challenging and further dedicated efforts are needed.

preprint2020arXiv

Blockchain-based Privacy Preservation for 5G-enabled Drone Communications

5G-enabled drones have potential applications in a variety of both military and civilian settings (e.g., monitoring and tracking of individuals in demonstrations and/or enforcing of social / physical distancing during pandemics such as COVID-19). Such applications generally involve the collection and dissemination of (massive) data from the drones to remote data centres for storage and analysis, for example via 5G networks. Consequently, there are security and privacy considerations underpinning 5G-enabled drone communications. We posit the potential of leveraging blockchain to facilitate privacy preservation, and therefore in this article we will review existing blockchain-based solutions after introducing the architecture for 5G-enabled drone communications and blockchain. We will also review existing legislation and data privacy regulations that need to be considered in the design of blockchain-based solutions, as well as identifying potential challenges and open issues which will hopefully inform future research agenda.

preprint2020arXiv

Causal Discovery from Incomplete Data: A Deep Learning Approach

As systems are getting more autonomous with the development of artificial intelligence, it is important to discover causal knowledge from observational sensory inputs. By encoding a series of cause-effect relations between events, causal networks can facilitate the prediction of effects from a given action and analyze their underlying data generation mechanism. However, missing data are ubiquitous in practical scenarios. Directly applying existing causal discovery algorithms to partially observed data may lead to incorrect inference. To alleviate this issue, we propose a deep learning framework, dubbed Imputated Causal Learning (ICL), to perform iterative missing data imputation and causal structure discovery. Through extensive simulations on both synthetic and real data, we show that ICL can outperform state-of-the-art methods under different missing data mechanisms.

preprint2020arXiv

Continuously Indexed Domain Adaptation

Existing domain adaptation focuses on transferring knowledge between domains with categorical indices (e.g., between datasets A and B). However, many tasks involve continuously indexed domains. For example, in medical applications, one often needs to transfer disease analysis and prediction across patients of different ages, where age acts as a continuous domain index. Such tasks are challenging for prior domain adaptation methods since they ignore the underlying relation among domains. In this paper, we propose the first method for continuously indexed domain adaptation. Our approach combines traditional adversarial adaptation with a novel discriminator that models the encoding-conditioned domain index distribution. Our theoretical analysis demonstrates the value of leveraging the domain index to generate invariant features across a continuous range of domains. Our empirical results show that our approach outperforms the state-of-the-art domain adaptation methods on both synthetic and real-world medical datasets.

preprint2020arXiv

Deep Graph Random Process for Relational-Thinking-Based Speech Recognition

Lying at the core of human intelligence, relational thinking is characterized by initially relying on innumerable unconscious percepts pertaining to relations between new sensory signals and prior knowledge, consequently becoming a recognizable concept or object through coupling and transformation of these percepts. Such mental processes are difficult to model in real-world problems such as in conversational automatic speech recognition (ASR), as the percepts (if they are modelled as graphs indicating relationships among utterances) are supposed to be innumerable and not directly observable. In this paper, we present a Bayesian nonparametric deep learning method called deep graph random process (DGP) that can generate an infinite number of probabilistic graphs representing percepts. We further provide a closed-form solution for coupling and transformation of these percept graphs for acoustic modeling. Our approach is able to successfully infer relations among utterances without using any relational data during training. Experimental evaluations on ASR tasks including CHiME-2 and CHiME-5 demonstrate the effectiveness and benefits of our method.

preprint2020arXiv

Deep Hierarchical Classification for Category Prediction in E-commerce System

In e-commerce systems, category prediction is the task of automatically predicting the categories of given texts. Different from traditional classification where there are no relations between classes, category prediction is regarded as a standard hierarchical classification problem since categories are usually organized as a hierarchical tree. In this paper, we address hierarchical category prediction. We propose a Deep Hierarchical Classification framework, which incorporates the multi-scale hierarchical information in neural networks and introduces a representation sharing strategy according to the category tree. We also define a novel combined loss function to penalize hierarchical prediction losses. The evaluation shows that the proposed approach outperforms existing approaches in accuracy.

preprint2020arXiv

Deep Learning based Denoise Network for CSI Feedback in FDD Massive MIMO Systems

Channel state information (CSI) feedback is critical for frequency division duplex (FDD) massive multi-input multi-output (MIMO) systems. Most conventional algorithms are based on compressive sensing (CS) and are highly dependent on the level of channel sparsity. To address this issue, a recent approach adopts deep learning (DL) to compress CSI into a codeword with low dimensionality, which has shown much better performance than the CS algorithms when the feedback link is perfect. In practical scenarios, however, there exist various interference and non-linear effects. In this article, we design a DL-based denoise network, called DNNet, to improve the performance of channel feedback. Numerical results show that the DL-based feedback algorithm with the proposed DNNet has superior performance over the existing algorithms, especially at low signal-to-noise ratio (SNR).

preprint2020arXiv

Detection Defense Against Adversarial Attacks with Saliency Map

It is well established that neural networks are vulnerable to adversarial examples, which are almost imperceptible to human vision and can cause deep models to misbehave. This phenomenon may lead to severe, inestimable consequences in safety- and security-critical applications. Existing defenses tend to harden the robustness of models against adversarial attacks, e.g., through adversarial training. However, these are usually intractable to implement due to the high cost of re-training and the cumbersome operations of altering the model architecture or parameters. In this paper, we discuss the saliency map method from the view of enhancing model interpretability; it is similar to introducing an attention mechanism into the model, so as to comprehend the process of object identification by deep networks. We then propose a novel method that combines additional noise with an inconsistency strategy to detect adversarial examples. Our experimental results for some representative adversarial attacks on common datasets, including ImageNet, and popular models show that our method can detect all the attacks with a high detection success rate. We compare it with the existing state-of-the-art technique, and the experiments indicate that our method is more general.

preprint2020arXiv

Does "Fans Economy" Work for Chinese Pop Music Industry?

China has become one of the largest entertainment markets in the world in recent years. Due to the success of Xiaomi, many Chinese pop music industry entrepreneurs believe "Fans Economy" works in the pop music industry. "Fans Economy" is based on the assumption that pop music consumer market could be segmented based on artists. Each music artist has its own exclusive loyal fans. In this paper, we provide an insightful study of the pop music artists and fans social network. Particularly, we segment the pop music consumer market and pop music artists respectively. Our results show that due to the Matthew Effect and limited diversity of consumer market, "Fans Economy" does not work for the Chinese pop music industry.

preprint2020arXiv

Domain-specific Communication Optimization for Distributed DNN Training

Communication overhead poses an important obstacle to distributed DNN training and has drawn increasing attention in recent years. Despite continuous efforts, prior solutions such as gradient compression/reduction, compute/communication overlapping and layer-wise flow scheduling are still coarse-grained and insufficient for efficient distributed training, especially when the network is under pressure. We present DLCP, a novel solution exploiting the domain-specific properties of deep learning to optimize communication overhead of DNN training in a fine-grained manner. At its heart, DLCP comprises several key innovations beyond prior work: e.g., it exploits the bounded loss tolerance of SGD-based training to improve tail communication latency, which cannot be avoided purely through gradient compression. It then performs fine-grained packet-level prioritization and dropping, as opposed to flow-level scheduling, based on layers and magnitudes of gradients to further speed up model convergence without affecting accuracy. In addition, it leverages inter-packet order-independency to perform per-packet load balancing without causing classical re-ordering issues. DLCP works with both Parameter Server and collective communication routines. We have implemented DLCP with commodity switches, integrated it with various training frameworks including TensorFlow, MXNet and PyTorch, and deployed it in our small-scale testbed with 10 Nvidia V100 GPUs. Our testbed experiments and large-scale simulations show that DLCP delivers up to $84.3\%$ additional training acceleration over the best existing solutions.

preprint2020arXiv

Engine and Aftertreatment Co-Optimization of Connected HEVs via Multi-Range Vehicle Speed Planning and Prediction

Connected vehicles (CVs) have situational awareness that can be exploited for control and optimization of the powertrain system. While extensive studies have been carried out for energy efficiency improvement of CVs via eco-driving and planning, the implication of such technologies on the thermal responses of CVs has not been fully investigated. One of the key challenges in leveraging connectivity for optimization-based thermal management of CVs is the relatively slow thermal dynamics, which necessitate the use of a long prediction horizon to achieve the best performance. Long-term prediction of the CV speed, unlike the V2V/V2I-based short-range prediction, is difficult and error-prone. The multiple timescales inherent to power and thermal systems call for a variable timescale optimization framework with access to short- and long-term vehicle speed preview. To this end, a model predictive controller (MPC) with a multi-range speed preview for integrated power and thermal management (iPTM) of connected hybrid electric vehicles (HEVs) is presented in this paper. The MPC is formulated to manage the power-split between the engine and the battery while enforcing the power and thermal (engine coolant and catalytic converter temperatures) constraints. The MPC exploits prediction and optimization over a shorter receding horizon and longer shrinking horizon. Over the longer shrinking horizon, the vehicle speed estimation is based on the data collected from the connected vehicles traveling on the same route as the ego-vehicle. Simulation results of applying the MPC over real-world urban driving cycles in Ann Arbor, MI are presented to demonstrate the effectiveness and fuel-saving potentials of the proposed iPTM strategy under the uncertainty associated with long-term predictions of the CV's speed.

preprint2020arXiv

Extinction and quasi-stationarity for discrete-time, endemic SIS and SIR models

Stochastic discrete-time SIS and SIR models of endemic diseases are introduced and analyzed. For the deterministic, mean-field model, the basic reproductive number $R_0$ determines their global dynamics. If $R_0\le 1$, then the frequency of infected individuals asymptotically converges to zero. If $R_0>1$, then the infectious class uniformly persists for all time; conditions for a globally stable, endemic equilibrium are given. In contrast, the infection goes extinct in finite time with probability one in the stochastic models for all $R_0$ values. To understand the length of the transient prior to extinction as well as the behavior of the transients, the quasi-stationary distributions and the associated mean time to extinction are analyzed using large deviation methods. When $R_0>1$, these mean times to extinction are shown to increase exponentially with the population size $N$. Moreover, as $N$ approaches $\infty$, the quasi-stationary distributions are supported by a compact set bounded away from extinction; sufficient conditions for convergence to a Dirac measure at the endemic equilibrium of the deterministic model are also given. In contrast, when $R_0<1$, the mean times to extinction are bounded above by $1/(1-α)$ where $α<1$ is the geometric rate of decrease of the infection when rare; as $N$ approaches $\infty$, the quasi-stationary distributions converge to a Dirac measure at the disease-free equilibrium for the deterministic model. For several special cases, explicit formulas for approximating the quasi-stationary distribution and the associated mean extinction are given. These formulas illustrate how for arbitrarily small $R_0$ values, the mean time to extinction can be arbitrarily large, and how for arbitrarily large $R_0$ values, the mean time to extinction can be arbitrarily large.

preprint2020arXiv

FashionBERT: Text and Image Matching with Adaptive Loss for Cross-modal Retrieval

In this paper, we address text and image matching in cross-modal retrieval for the fashion industry. Unlike matching in the general domain, fashion matching must pay much more attention to the fine-grained information in fashion images and texts. Pioneering approaches detect regions of interest (RoIs) in images and use the RoI embeddings as image representations. In general, RoIs tend to represent the "object-level" information in the fashion images, while fashion texts tend to describe more detailed information, e.g., styles and attributes. RoIs are thus not fine-grained enough for fashion text and image matching. To this end, we propose FashionBERT, which leverages patches as image features. With the pre-trained BERT model as the backbone network, FashionBERT learns high-level representations of texts and images. Meanwhile, we propose an adaptive loss to trade off multitask learning in the FashionBERT modeling. Two tasks (i.e., text and image matching and cross-modal retrieval) are incorporated to evaluate FashionBERT. On the public dataset, experiments demonstrate that FashionBERT achieves significant performance improvements over the baseline and state-of-the-art approaches. In practice, FashionBERT is applied in a concrete cross-modal retrieval application. We provide a detailed analysis of matching performance and inference efficiency.

preprint2020arXiv

Fully Memristive Spiking-Neuron Learning Framework and its Applications on Pattern Recognition and Edge Detection

Fully memristive spiking-neuron learning frameworks, which use drift and diffusion memristor models as axon and dendrite respectively, have recently become a hot topic with the development of memristor devices. Recent work on fully memristive learning frameworks normally still requires other devices such as resistors or capacitors. Theoretically, however, one neuron needs only an axon and a dendrite, which makes the technical process simpler and the learning framework more similar to the biological brain. In this paper, a fully memristive spiking-neuron learning framework is introduced, in which a neuron structure is built from just one drift and one diffusion memristive model. To verify its merits, a feedforward neural network for pattern recognition and a cellular neural network for edge detection are designed. Experimental results show that, compared to other memristive neural networks, our framework's processing speed is much faster and hardware resources are saved in pattern recognition due to its simple structure. Furthermore, due to the dynamic filtering function of the diffusion memristor model in our learning framework, its peak signal-to-noise ratio (PSNR) is much higher than that of traditional algorithms in edge detection.

preprint2020arXiv

Game-Theoretical Analysis of Mining Strategy for Bitcoin-NG Blockchain Protocol

Bitcoin-NG, a scalable blockchain protocol, divides each block into a key block and many micro blocks to effectively improve the transaction processing capacity. Bitcoin-NG has a special incentive mechanism (i.e. splitting transaction fees to the current and the next leader) to maintain its security. However, this design of the incentive mechanism ignores the joint effect of transaction fees, mint coins and mining duration lengths on the expected mining reward. In this paper, we identify the advanced mining attack that deliberately ignores micro blocks to enlarge the mining duration length to increase the likelihood of winning the mining race. We first show that an advanced mining attacker can maximize its expected reward by optimizing its mining duration length. We then formulate a game-theoretical model in which multiple mining players perform advanced mining to compete with each other. We analyze the Nash equilibrium for the mining game. Our analytical and simulation results indicate that all mining players in the mining game converge to having advanced mining at the equilibrium and have no incentives for deviating from the equilibrium; the transaction processing capability of the Bitcoin-NG network at the equilibrium is decreased by advanced mining. Therefore, we conclude that the Bitcoin-NG blockchain protocol is vulnerable to advanced mining attack. We discuss how to reduce the negative impact of advanced mining for Bitcoin-NG.

preprint2020arXiv

HCGrid: A Convolution-based Gridding Framework for Radio Astronomy in Hybrid Computing Environments

Gridding operation, which is to map non-uniform data samples onto a uniformly distributed grid, is one of the key steps in radio astronomical data reduction process. One of the main bottlenecks of gridding is the poor computing performance, and a typical solution for such performance issue is the implementation of multi-core CPU platforms. Although such a method could usually achieve good results, in many cases, the performance of gridding is still restricted to an extent due to the limitations of CPU, since the main workload of gridding is a combination of a large number of single instruction, multi-data-stream operations, which is more suitable for GPU, rather than CPU implementations. To meet the challenge of massive data gridding for the modern large single-dish radio telescopes, e.g., the Five-hundred-meter Aperture Spherical radio Telescope (FAST), inspired by existing multi-core CPU gridding algorithms such as Cygrid, here we present an easy-to-install, high-performance, and open-source convolutional gridding framework, HCGrid, in CPU-GPU heterogeneous platforms. It optimises data search by employing multi-threading on CPU, and accelerates the convolution process by utilising massive parallelisation of GPU. In order to make HCGrid a more adaptive solution, we also propose the strategies of thread organisation and coarsening, as well as optimal parameter settings under various GPU architectures. A thorough analysis of computing time and performance gain with several GPU parallel optimisation strategies show that it can lead to excellent performance in hybrid computing environments.

preprint2020arXiv

High Dimensional Bayesian Optimization Assisted by Principal Component Analysis

Bayesian Optimization (BO) is a surrogate-assisted global optimization technique that has been successfully applied in various fields, e.g., automated machine learning and design optimization. Built upon a so-called infill-criterion and Gaussian Process regression (GPR), the BO technique suffers from a substantial computational complexity and hampered convergence rate as the dimension of the search spaces increases. Scaling up BO for high-dimensional optimization problems remains a challenging task. In this paper, we propose to tackle the scalability of BO by hybridizing it with a Principal Component Analysis (PCA), resulting in a novel PCA-assisted BO (PCA-BO) algorithm. Specifically, the PCA procedure learns a linear transformation from all the evaluated points during the run and selects dimensions in the transformed space according to the variability of evaluated points. We then construct the GPR model, and the infill-criterion in the space spanned by the selected dimensions. We assess the performance of our PCA-BO in terms of the empirical convergence rate and CPU time on multi-modal problems from the COCO benchmark framework. The experimental results show that PCA-BO can effectively reduce the CPU time incurred on high-dimensional problems, and maintains the convergence rate on problems with an adequate global structure. PCA-BO therefore provides a satisfactory trade-off between the convergence rate and computational efficiency opening new ways to benefit from the strength of BO approaches in high dimensional numerical optimization.

preprint2020arXiv

History-dependent percolation on multiplex networks

The structure of interconnected systems and its impact on the system dynamics is a much-studied cross-disciplinary topic. Although various critical phenomena have been found in different models, the study on the connections between different percolation transitions is still lacking. Here we propose a unified framework to study the origins of the discontinuous transitions of the percolation process on interacting networks. The model evolves in generations with the result of the present percolation depending on the previous state and thus is history-dependent. Both theoretical analysis and Monte Carlo simulations reveal that the nature of the transition remains the same at finite generations but exhibits an abrupt change for the infinite generation. We use brain functional correlation and morphological similarity data to show that our model also provides a general method to explore the network structure and can contribute to many practical applications, such as detecting the abnormal structures of human brain networks.

preprint2020arXiv

Inexact Sequential Quadratic Optimization with Penalty Parameter Updates Within the QP Solve: Extended Version

This paper focuses on the design of sequential quadratic optimization (commonly known as SQP) methods for solving large-scale nonlinear optimization problems. The most computationally demanding aspect of such an approach is the computation of the search direction during each iteration, for which we consider the use of matrix-free methods. In particular, we develop a method that requires an inexact solve of a single QP subproblem to establish the convergence of the overall SQP method. It is known that SQP methods can be plagued by poor behavior of the global convergence mechanism. To confront this issue, we propose the use of an exact penalty function with a dynamic penalty parameter updating strategy to be employed within the subproblem solver in such a way that the resulting search direction predicts progress toward both feasibility and optimality. We present our parameter updating strategy and prove that, under reasonable assumptions, the strategy does not modify the penalty parameter unnecessarily. We also discuss a matrix-free subproblem solver in which our updating strategy can be incorporated. We close the paper with a discussion of the results of numerical experiments that illustrate the benefits of our proposed techniques.

preprint2020arXiv

Integrated Power and Thermal Management of Connected HEVs via Multi-Horizon MPC

In this paper, a multi-horizon model predictive controller (MH-MPC) is developed for integrated power and thermal management (iPTM) of a power-split hybrid electric vehicle (HEV). The proposed MH-MPC leverages an accurate short-horizon vehicle speed preview and an approximate forecast over a longer shrinking horizon till the end of the driving cycle. This multiple-horizon scheme is developed to cope with fast and slow dynamics associated with power and thermal responses. The main objective of the proposed MH-MPC is to minimize fuel consumption and enforce the power and thermal constraints on the battery state-of-charge and engine coolant temperature, while meeting the driving (traction) and cabin air conditioning (heating) demands. The proposed MH-MPC allows for exploiting the engine coolant as thermal energy storage, providing more flexibility for the HEV energy flow optimization. The simulation results show that the proposed MH-MPC provides near-optimal results in reference to the Dynamic Programming (DP) solution with an affordable computational cost. Moreover, compared with a more conventional MPC strategy, the MH-MPC can leverage the speed previews with different resolutions effectively to achieve the desired performance with satisfactory robustness.

preprint2020arXiv

Intrinsic Valley Polarization and Anomalous Valley Hall Effect in Single-Layer 2H-FeCl2

Valley, as a new degree of freedom for electrons, has drawn considerable attention due to its significant potential for encoding and storing information. Lifting the energy degeneracy to achieve valley polarization is necessary for realizing valleytronic devices. Here, on the basis of first-principles calculations, we show that single-layer FeCl2 exhibits a large spontaneous valley polarization (~101 meV) arising from the broken time-reversal symmetry and spin-orbital coupling, which can be continuously tuned by varying the direction of magnetic crystalline. By employing the perturbation theory, the underlying physical mechanism is unveiled. Moreover, the coupling between valley degree of freedom and ferromagnetic order could generate a spin- and valley-polarized anomalous Hall current in the presence of the in-plane electric field, facilitating its experimental exploration and practical applications.

preprint2020arXiv

Learning Spatial Attention for Face Super-Resolution

General image super-resolution techniques have difficulties in recovering detailed face structures when applied to low resolution face images. Recent deep learning based methods tailored for face images have achieved improved performance by being jointly trained with additional tasks such as face parsing and landmark prediction. However, multi-task learning requires extra manually labeled data. Besides, most of the existing works can only generate relatively low resolution face images (e.g., $128\times128$), and their applications are therefore limited. In this paper, we introduce a novel SPatial Attention Residual Network (SPARNet) built on our newly proposed Face Attention Units (FAUs) for face super-resolution. Specifically, we introduce a spatial attention mechanism to the vanilla residual blocks. This enables the convolutional layers to adaptively bootstrap features related to the key face structures and pay less attention to less feature-rich regions. This makes the training more effective and efficient, as the key face structures only account for a very small portion of the face image. Visualization of the attention maps shows that our spatial attention network can capture the key face structures well even for very low resolution faces (e.g., $16\times16$). Quantitative comparisons on various kinds of metrics (including PSNR, SSIM, identity similarity, and landmark detection) demonstrate the superiority of our method over the current state of the art. We further extend SPARNet with multi-scale discriminators, named SPARNetHD, to produce high resolution results (i.e., $512\times512$). We show that SPARNetHD trained with synthetic data can not only produce high quality and high resolution outputs for synthetically degraded face images, but also shows good generalization ability to real world low quality face images.

preprint2020arXiv

Learning-based Computer-aided Prescription Model for Parkinson's Disease: A Data-driven Perspective

In this paper, we study a novel problem: "automatic prescription recommendation for PD patients." To realize this goal, we first build a dataset by collecting 1) symptoms of PD patients, and 2) their prescription drugs provided by neurologists. Then, we build a novel computer-aided prescription model by learning the relation between observed symptoms and prescription drugs. Finally, for newly arriving patients, we can recommend (predict) suitable prescription drugs based on their observed symptoms using our prescription model. In terms of methodology, our proposed model, namely Prescription viA Learning lAtent Symptoms (PALAS), can recommend prescriptions using the multi-modality representation of the data. In PALAS, a latent symptom space is learned to better model the relationship between symptoms and prescription drugs, as there is a large semantic gap between them. Moreover, we present an efficient alternating optimization method for PALAS. We evaluated our method using data collected from 136 PD patients at Nanjing Brain Hospital, which can be regarded as a large dataset in the PD research community. The experimental results demonstrate the effectiveness and clinical potential of our method in this recommendation task compared with other competing methods.

preprint2020arXiv

Liquid crystal phases with unusual structures and physical properties formed by acute-angle bent-core molecules

Liquid crystals formed by acute-angle bent-core (ABC) molecules with a 1,7 naphthalene central core show an intriguing phase behavior with the nematic phase accompanied by poorly understood additional phases. In this work, we characterize the physical properties of an ABC material, such as birefringence, dielectric permittivities, elastic constants, and surface alignment, and present X-ray diffraction and transmission electron microscopy studies of their ordering. The ABC molecular shape resembling the letter $λ$ yields a very small splay elastic constant in the uniaxial nematic phase and results in the formation of a tetragonal positionally ordered columnar phase consisting of molecular columns with a uniform uniaxial director that can be bent but not splayed.

preprint2020arXiv

Medusa: Blockchain Powered Log Storage System

Blockchain is one of the most heavily invested technologies in recent years. Due to its tamper-proof and decentralization properties, blockchain has become an ideal utility for data storage that is applicable in many real-world industrial scenarios. One important scenario is web logs, which are treated as sources of technical significance and commercial revenue in major internet companies. In this paper, we illustrate our design of a web log storage system based on HyperLedger. HyperLedger yields higher throughput and lower latency compared with other blockchain systems. Alongside its efficiency advantages, HyperLedger is a permissioned blockchain, which is an ideal fit for enterprise software design scenarios.

preprint2020arXiv

On the Robustness of Information-Theoretic Privacy Measures and Mechanisms

Consider a data publishing setting for a dataset composed of both private and non-private features. The publisher uses an empirical distribution, estimated from $n$ i.i.d. samples, to design a privacy mechanism which is applied to fresh samples afterward. In this paper, we study the discrepancy between the privacy-utility guarantees for the empirical distribution, used to design the privacy mechanism, and those for the true distribution, experienced by the privacy mechanism in practice. We first show that, for any privacy mechanism, these discrepancies vanish at speed $O(1/\sqrt{n})$ with high probability. These bounds follow from our main technical results regarding the Lipschitz continuity of the considered information leakage measures. Then we prove that the optimal privacy mechanisms for the empirical distribution approach the corresponding mechanisms for the true distribution as the sample size $n$ increases, thereby establishing the statistical consistency of the optimal privacy mechanisms. Finally, we introduce and study uniform privacy mechanisms which, by construction, provide privacy to all the distributions within a neighborhood of the estimated distribution and, thereby, guarantee privacy for the true distribution with high probability.

preprint2020arXiv

Polarization Transfer from the Twisted Light to an Atom

When polarized light is absorbed by an atom, the excited atomic system carries information about the initial polarization of light. For the light that carries an orbital angular momentum, or the twisted light, the polarization states are described by eight independent parameters, as opposed to three Stokes parameters for plane waves. We use a parameterization of the spin-density matrix of the twisted light in terms of vector and tensor polarization, in analogy with massive spin-1 particles, and derive formulae that define atom's response to specific polarization components of the twisted light. It is shown that for dipole ($S\to P$) atomic transitions, the atom's polarization is in one-to-one correspondence with polarization of the incident light; this relation is violated, however, for the transitions of higher multipolarity ($S\to D$, $S\to F$, etc.). We pay special attention to contributions of the longitudinal electric field into the matrix elements of atomic transitions.

preprint2020arXiv

Privacy with Estimation Guarantees

We study the central problem in data privacy: how to share data with an analyst while providing both privacy and utility guarantees to the user that owns the data. In this setting, we present an estimation-theoretic analysis of the privacy-utility trade-off (PUT). Here, an analyst is allowed to reconstruct (in a mean-squared error sense) certain functions of the data (utility), while other private functions should not be reconstructed with distortion below a certain threshold (privacy). We demonstrate how chi-square information captures the fundamental PUT in this case and provide bounds for the best PUT. We propose a convex program to compute privacy-assuring mappings when the functions to be disclosed and hidden are known a priori and the data distribution is known. We derive lower bounds on the minimum mean-squared error of estimating a target function from the disclosed data and evaluate the robustness of our approach when an empirical distribution is used to compute the privacy-assuring mappings instead of the true data distribution. We illustrate the proposed approach through two numerical experiments.

preprint2020arXiv

Robust and Precise Vehicle Localization based on Multi-sensor Fusion in Diverse City Scenes

We present a robust and precise localization system that achieves centimeter-level localization accuracy in disparate city scenes. Our system adaptively uses information from complementary sensors such as GNSS, LiDAR, and IMU to achieve high localization accuracy and resilience in challenging scenes, such as urban downtown, highways, and tunnels. Rather than relying only on LiDAR intensity or 3D geometry, we make innovative use of LiDAR intensity and altitude cues to significantly improve localization system accuracy and robustness. Our GNSS RTK module utilizes the help of the multi-sensor fusion framework and achieves a better ambiguity resolution success rate. An error-state Kalman filter is applied to fuse the localization measurements from different sources with novel uncertainty estimation. We validate, in detail, the effectiveness of our approaches, achieving 5-10cm RMS accuracy and outperforming previous state-of-the-art systems. Importantly, our system, while deployed in a large autonomous driving fleet, made our vehicles fully autonomous in crowded city streets despite road construction that occurred from time to time. A dataset including more than 60 km real traffic driving in various urban roads is used to comprehensively test our system.

preprint2020arXiv

Secret Sharing based Secure Regressions with Applications

Nowadays, the utilization of the ever-expanding amount of data has made a huge impact on web technologies while also causing various types of security concerns. On one hand, potential gains are highly anticipated if different organizations could somehow collaboratively share their data for technological improvements. On the other hand, data security concerns may arise for both data holders and data providers due to commercial or sociological concerns. To strike a balance between technical improvements and security limitations, we implement secure and scalable protocols for multiple data holders to train linear regression and logistic regression models. We build our protocols based on the secret sharing scheme, which is scalable and efficient in applications. Moreover, our proposed paradigm can be generalized to any secure multiparty training scenario where only matrix summation and matrix multiplication are used. We demonstrate our approach through experiments that show the scalability and efficiency of our proposed protocols, and finally present its real-world applications.

preprint2020arXiv

Sequential vs. Integrated Algorithm Selection and Configuration: A Case Study for the Modular CMA-ES

When faced with a specific optimization problem, choosing which algorithm to use is always a tough task. Not only is there a vast variety of algorithms to select from, but these algorithms often are controlled by many hyperparameters, which need to be tuned in order to achieve the best performance possible. Usually, this problem is separated into two parts: algorithm selection and algorithm configuration. With the significant advances made in Machine Learning, however, these problems can be integrated into a combined algorithm selection and hyperparameter optimization task, commonly known as the CASH problem. In this work we compare sequential and integrated algorithm selection and configuration approaches for the case of selecting and tuning the best out of 4608 variants of the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) tested on the Black Box Optimization Benchmark (BBOB) suite. We first show that the ranking of the modular CMA-ES variants depends to a large extent on the quality of the hyperparameters. This implies that even a sequential approach based on complete enumeration of the algorithm space will likely result in sub-optimal solutions. In fact, we show that the integrated approach manages to provide competitive results at a much smaller computational cost. We also compare two different mixed-integer algorithm configuration techniques, called irace and Mixed-Integer Parallel Efficient Global Optimization (MIP-EGO). While we show that the two methods differ significantly in their treatment of the exploration-exploitation balance, their overall performances are very similar.

preprint2020arXiv

Structural Multi-Colour Invisible Inks with Submicron 4D Printing of Shape Memory Polymers

Four-dimensional (4D) printing of shape memory polymer (SMP) imparts time responsive properties to 3D structures. Here, we explore 4D printing of an SMP in the submicron length scale, extending its applications to nanophotonics. We report a new SMP photoresist based on Vero Clear achieving print features at a resolution of ~300 nm half pitch using two-photon polymerization lithography (TPL). Prints consisting of grids with size-tunable multi-colours enabled the study of shape memory effects to achieve large visual shifts through nanoscale structure deformation. As the nanostructures are flattened, the colours and printed information become invisible. Remarkably, the shape memory effect recovers the original surface morphology of the nanostructures along with its structural colour within seconds of heating above its glass transition temperature. The high-resolution printing and excellent reversibility in both microtopography and optical properties promise a platform for temperature-sensitive labels, information hiding for anti-counterfeiting, and tunable photonic devices.

preprint2020arXiv

Structure-Aware Generation Network for Recipe Generation from Images

Sharing food has become very popular with the development of social media. For many real-world applications, people are keen to know the underlying recipes of a food item. In this paper, we are interested in automatically generating cooking instructions for food. We investigate an open research task of generating cooking instructions based on only food images and ingredients, which is similar to the image captioning task. However, compared with image captioning datasets, the target recipes are long paragraphs and do not have annotations on structure information. To address the above limitations, we propose a novel framework of Structure-aware Generation Network (SGN) to tackle the food recipe generation task. Our approach brings together several novel ideas in a systematic framework: (1) exploiting an unsupervised learning approach to obtain the sentence-level tree structure labels before training; (2) generating trees of target recipes from images with the supervision of tree structure labels learned from (1); and (3) integrating the inferred tree structures with the recipe generation procedure. Our proposed model can produce high-quality and coherent recipes, and achieves state-of-the-art performance on the benchmark Recipe1M dataset.

preprint2020arXiv

Task-agnostic Temporally Consistent Facial Video Editing

Recent research has witnessed the advances in facial image editing tasks. For video editing, however, previous methods either simply apply transformations frame by frame or utilize multiple frames in a concatenated or iterative fashion, which leads to noticeable visual flickers. In addition, these methods are confined to dealing with one specific task at a time without any extensibility. In this paper, we propose a task-agnostic temporally consistent facial video editing framework. Based on a 3D reconstruction model, our framework is designed to handle several editing tasks in a more unified and disentangled manner. The core design includes a dynamic training sample selection mechanism and a novel 3D temporal loss constraint that fully exploits both image and video datasets and enforces temporal consistency. Compared with the state-of-the-art facial image editing methods, our framework generates video portraits that are more photo-realistic and temporally smooth.

preprint2020arXiv

Towards Dynamic Algorithm Selection for Numerical Black-Box Optimization: Investigating BBOB as a Use Case

One of the most challenging problems in evolutionary computation is to select from its family of diverse solvers one that performs well on a given problem. This algorithm selection problem is complicated by the fact that different phases of the optimization process require different search behavior. While this can partly be controlled by the algorithm itself, there exist large differences between algorithm performance. It can therefore be beneficial to swap the configuration or even the entire algorithm during the run. Long deemed impractical, recent advances in Machine Learning and in exploratory landscape analysis give hope that this dynamic algorithm configuration (dynAC) can eventually be solved by automatically trained configuration schedules. With this work we aim at promoting research on dynAC, by introducing a simpler variant that focuses only on switching between different algorithms, not configurations. Using the rich data from the Black Box Optimization Benchmark (BBOB) platform, we show that even single-switch dynamic algorithm selection (dynAS) can potentially result in significant performance gains. We also discuss key challenges in dynAS, and argue that the BBOB-framework can become a useful tool in overcoming these.

preprint2020arXiv

Towards information-rich, logical text generation with knowledge-enhanced neural models

Text generation systems have made massive, promising progress thanks to deep learning techniques and have been widely applied in daily life. However, existing end-to-end neural models tend to generate uninformative and generic text because they cannot ground the input context in background knowledge. To solve this problem, many researchers have begun to consider combining external knowledge with text generation systems, an approach known as knowledge-enhanced text generation. The challenges of knowledge-enhanced text generation include how to select the appropriate knowledge from large-scale knowledge bases, how to read and understand the extracted knowledge, and how to integrate knowledge into the generation process. This survey gives a comprehensive review of knowledge-enhanced text generation systems, summarizes research progress toward solving these challenges, and proposes some open issues and research directions.

preprint2020arXiv

U-net Based Direct-path Dominance Test for Robust Direction-of-arrival Estimation

It has been noted that identifying the time-frequency bins dominated by the contribution from the direct propagation of the target speaker can significantly improve the robustness of direction-of-arrival estimation. However, the correct extraction of the direct-path sound is challenging, especially in adverse environments. In this paper, a U-net based direct-path dominance test method is proposed. Exploiting the efficient segmentation capability of the U-net architecture, the direct-path information can be effectively retrieved from a dedicated multi-task neural network. Moreover, the training and inference of the neural network only need the input of a single microphone, circumventing the problem of array-structure dependence faced by common end-to-end deep learning based methods. Simulations demonstrate that significantly higher estimation accuracy can be achieved in highly reverberant and low signal-to-noise ratio environments.