Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
68 works
0 followers
30 topics
4 close collaborators

Published work

68 published item(s)

preprint · 2026 · arXiv

RankQ: Offline-to-Online Reinforcement Learning via Self-Supervised Action Ranking

Offline-to-online reinforcement learning (RL) improves sample efficiency by leveraging pre-collected datasets prior to online interaction. A key challenge, however, is learning an accurate critic in large state-action spaces with limited dataset coverage. To mitigate harmful updates from value overestimation, prior methods impose pessimism by down-weighting out-of-distribution (OOD) actions relative to dataset actions. While effective, this essentially acts as a behavior cloning anchor and can hinder downstream online policy improvement when dataset actions are suboptimal. We propose RankQ, an offline-to-online Q-learning objective that augments temporal-difference learning with a self-supervised multi-term ranking loss to enforce structured action ordering. By learning relative action preferences rather than uniformly penalizing unseen actions, RankQ shapes the Q-function such that action gradients are directed toward higher-quality behaviors. Across sparse-reward D4RL benchmarks, RankQ achieves performance competitive with or superior to seven prior methods. In vision-based robot learning, RankQ enables effective offline-to-online fine-tuning of a pretrained vision-language-action (VLA) model in a low-data regime, achieving on average a 42.7% higher simulation success rate than the next best method. In a high-data setting, RankQ improves simulation performance by 13.7% over the next best method and achieves strong sim-to-real transfer, increasing real-world cube stacking success from 43.1% to 84.7% relative to the VLA's initial performance.

preprint · 2024 · arXiv

New research paradigms and agenda of human factors science in the intelligence era

This paper proposes the innovative concept of "human factors science" to characterize engineering psychology, human factors engineering, human-computer interaction, and other similar fields. Although the perspectives in these fields differ, they share a common approach: "human-centered design." In the AI era, the human-machine relationship presents a trans-era evolution to "human-AI teaming." The change has raised challenges for human factors science, compelling us to re-examine current research paradigms and agendas. Based on our previous work, this paper proposes three research paradigms: (1) human-AI joint cognitive systems: this regards an intelligent agent as a cognitive agent with a certain level of cognitive capabilities. A human-AI system can be characterized as a joint cognitive system in which humans and intelligent agents work as teammates for collaboration; (2) human-AI joint cognitive ecosystems: an intelligent ecosystem with multiple human-AI systems can be represented as a human-AI joint cognitive ecosystem. The overall performance of the ecosystem depends on optimal collaboration and design across the multiple human-AI systems; (3) intelligent sociotechnical systems (iSTS): human-AI systems are designed, developed, and deployed in an iSTS environment. The successful design, development, and deployment of a human-AI system within an iSTS environment depends on the synergistic optimization between the subsystems. This paper looks forward to the future research agenda of human factors science from three aspects: human-AI interaction, intelligent human-machine interface, and human-AI teaming. Analyses show that the three new research paradigms will benefit future research in human factors science. We believe the proposed research paradigms and the future research agenda will mutually promote each other, further advancing human factors science in the AI era.

preprint · 2024 · arXiv

Unified Diffusion-Based Rigid and Non-Rigid Editing with Text and Image Guidance

Existing text-to-image editing methods tend to excel at either rigid or non-rigid editing but encounter challenges when combining both, resulting in outputs that are misaligned with the provided text prompts. In addition, integrating reference images for control remains challenging. To address these issues, we present a versatile image editing framework capable of executing both rigid and non-rigid edits, guided by either textual prompts or reference images. We leverage a dual-path injection scheme to handle diverse editing scenarios and introduce an integrated self-attention mechanism for fusion of appearance and structural information. To mitigate potential visual artifacts, we further employ latent fusion techniques to adjust intermediate latents. Compared to previous work, our approach represents a significant advance in achieving precise and versatile image editing. Comprehensive experiments validate the efficacy of our method, showcasing competitive or superior results in text-based editing and appearance transfer tasks, encompassing both rigid and non-rigid settings.

preprint · 2023 · arXiv

Secure Communication for Spatially Correlated Massive MIMO with Low-Resolution DACs

In this paper, the performance of a secure massive multiple-input multiple-output (MIMO) system adopting low-resolution digital-to-analog converters (DACs) is analyzed over spatially correlated wireless channels. A tight lower bound for the achievable secrecy rate is derived with artificial noise (AN) transmitted in the null space of the user channels. Using the analytical results, the impact of spatial correlation on the secrecy rate is explicitly evaluated in the presence of low-resolution DACs. The analytical observations reveal that using low-resolution DACs can be beneficial to the secrecy performance compared with ideal DACs, when the channels are strongly correlated and optimal power allocation is not employed.

preprint · 2023 · arXiv

Secure Communication for Spatially Correlated RIS-Aided Multiuser Massive MIMO Systems: Analysis and Optimization

This letter investigates the secure communication in a reconfigurable intelligent surface (RIS)-aided multiuser massive multiple-input multiple-output (MIMO) system exploiting artificial noise (AN). We first derive a closed-form expression of the ergodic secrecy rate under spatially correlated MIMO channels. By using this derived result, we further optimize the power fraction of AN in closed form and the RIS phase shifts by developing a gradient-based algorithm, which requires only statistical channel state information (CSI). Our analysis shows that spatial correlation at the RIS provides an additional dimension for optimizing the RIS phase shifts. Numerical simulations validate the analytical results which show the insightful interplay among the system parameters and the degradation of secrecy performance due to high spatial correlation at the RIS.

preprint · 2022 · arXiv

A Deep Finite Difference Emulator for the Fast Simulation of Coupled Viscous Burgers' Equation

This work proposes a deep learning-based emulator for the efficient computation of the coupled viscous Burgers' equation with random initial conditions. In a departure from traditional data-driven deep learning approaches, the proposed emulator does not require a classical numerical solver to collect training data. Instead, it makes direct use of the problem's physics. Specifically, the model emulates a second-order finite difference solver, i.e., the Crank-Nicolson scheme in learning dynamics. A systematic case study is conducted to examine the model's prediction performance, generalization ability, and computational efficiency. The computed results are graphically represented and compared to those of state-of-the-art numerical solvers.
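
As a reference point for the solver that the emulator mimics, the generic Crank-Nicolson update for a semi-discretized system du/dt = F(u) can be written as follows. The symbols F, the time step Δt, and the time index n are notation assumed here for illustration; the abstract does not give the paper's exact discretization of the coupled Burgers' system.

```latex
\frac{u^{n+1} - u^{n}}{\Delta t}
  = \frac{1}{2}\left[ F\!\left(u^{n+1}\right) + F\!\left(u^{n}\right) \right]
```

The scheme averages the right-hand side over the old and new time levels, which makes it implicit in the new state and second-order accurate in time.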

preprint · 2022 · arXiv

An End-to-End Transformer Model for Crowd Localization

Crowd localization, predicting head positions, is a more practical and high-level task than simply counting. Existing methods employ pseudo-bounding boxes or pre-designed localization maps, relying on complex post-processing to obtain the head positions. In this paper, we propose an elegant, end-to-end Crowd Localization Transformer named CLTR that solves the task in the regression-based paradigm. The proposed method views the crowd localization as a direct set prediction problem, taking extracted features and trainable embeddings as input of the transformer-decoder. To reduce the ambiguous points and generate more reasonable matching results, we introduce a KMO-based Hungarian matcher, which adopts the nearby context as the auxiliary matching cost. Extensive experiments conducted on five datasets in various data settings show the effectiveness of our method. In particular, the proposed method achieves the best localization performance on the NWPU-Crowd, UCF-QNRF, and ShanghaiTech Part A datasets.

preprint · 2022 · arXiv

Cooperative Reflection and Synchronization Design for Distributed Multiple-RIS Communications

To reap the promised gain achieved by distributed reconfigurable intelligent surfaces (RISs)-enhanced communications in a wireless network, timing synchronization among these metasurfaces is an essential prerequisite in practice. This paper proposes a unified framework for the joint estimation of the unknown timing offsets and the RIS channel parameters, as well as the design of a cooperative reflection and synchronization algorithm for distributed multiple-RIS communication. Considering that an RIS is usually a passive device with limited signal processing capability, the individual timing offset and channel gains of each hop of the RIS links cannot be directly estimated. To make the estimation tractable, we propose to estimate the cascaded channels and timing offsets jointly by deriving a maximum likelihood estimator. Furthermore, we theoretically characterize the Cramér-Rao lower bound (CRLB) to evaluate the accuracy of this estimator. By using the proposed estimator and the derived CRLBs, an efficient resynchronization algorithm is devised jointly at the RISs and the destination to compensate for the multiple timing offsets. Based on the majorization-minimization framework, the proposed algorithm admits semi-closed and closed-form solutions for the RIS reflection matrices and the timing offset equalizer, respectively. Simulation results verify that our theoretical analysis closely matches the numerical tests and validate the effectiveness of the proposed resynchronization algorithm.

preprint · 2022 · arXiv

Data Augmentation Empowered Neural Precoding for Multiuser MIMO with MMSE Model

Precoding design exploiting deep learning methods has been widely studied for multiuser multiple-input multiple-output (MU-MIMO) systems. However, conventional neural precoding design applies black-box-based neural networks which are less interpretable. In this paper, we propose a deep learning-based precoding method based on an interpretable design of a neural precoding network, namely iPNet. In particular, the iPNet mimics the classic minimum mean-squared error (MMSE) precoding and approximates the matrix inversion in the design of the neural network architecture. Specifically, the proposed iPNet consists of a model-driven component network, responsible for augmenting the input channel state information (CSI), and a data-driven sub-network, responsible for precoding calculation from this augmented CSI. The latter data-driven module is explicitly interpreted as an unsupervised learner of the MMSE precoder. Simulation results show that by exploiting the augmented CSI, the proposed iPNet achieves noticeable performance gain over existing black-box designs and also exhibits enhanced generalizability against CSI mismatches.

preprint · 2022 · arXiv

Deep CSI Compression for Massive MIMO: A Self-information Model-driven Neural Network

To fully exploit the advantages of massive multiple-input multiple-output (mMIMO), it is critical for the transmitter to accurately acquire the channel state information (CSI). Deep learning (DL)-based methods have been proposed for CSI compression and feedback to the transmitter. Although most existing DL-based methods consider the CSI matrix as an image, structural features of the CSI image are rarely exploited in neural network design. As such, we propose a model of self-information that dynamically measures the amount of information contained in each patch of a CSI image from the perspective of structural features. Then, by applying the self-information model, we propose a model-and-data-driven network for CSI compression and feedback, namely IdasNet. The IdasNet includes the design of a module of self-information deletion and selection (IDAS), an encoder of informative feature compression (IFC), and a decoder of informative feature recovery (IFR). In particular, the model-driven module of IDAS pre-compresses the CSI image by removing informative redundancy in terms of the self-information. The encoder of IFC then conducts feature compression on the pre-compressed CSI image and generates a feature codeword which contains two components, i.e., codeword values and position indices of the codeword values. Subsequently, the IFR decoder decouples the codeword values as well as position indices to recover the CSI image. Experimental results verify that the proposed IdasNet noticeably outperforms existing DL-based networks under various compression ratios while reducing the number of network parameters by orders of magnitude compared with various existing methods.

preprint · 2022 · arXiv

Deployment of long distance multi-moving robots for underground pipe inspection

Blueprint of an in-pipe climbing robot that works with sharp transmissions to study complex line relationships. Standard wheeled/happening pipe climbing robots tend to slide when exploring pipe turns. Instruments help achieve a very distinct delay sequence in which the robot slides and drags as it progresses. The proposed transmission joins the farthest ground plane of the standard two-output transmission. This opens up a substantial time for 3 output transmissions. This instrument takes into account the force exerted on each track within the line relation to specifically alter the robot's track speed, unlocking the key to fine control. Deflection of the robot across pipe networks with different bearings and non-slip pipe bends demonstrate the integrity of the proposed structure.

preprint · 2022 · arXiv

Distributed Neural Precoding for Hybrid mmWave MIMO Communications with Limited Feedback

Hybrid precoding is a cost-efficient technique for millimeter wave (mmWave) massive multiple-input multiple-output (MIMO) communications. This paper proposes a deep learning approach that uses a distributed neural network for hybrid analog-and-digital precoding design with limited feedback. The proposed distributed neural precoding network, called DNet, has two objectives. First, the DNet realizes channel state information (CSI) compression with a distributed architecture of neural networks, which enables practical deployment on multiple users. Specifically, this neural network is composed of multiple independent sub-networks with the same structure and parameters, which reduces both the number of training parameters and network complexity. Second, DNet learns to calculate the hybrid precoding from CSI reconstructed from limited feedback. Different from existing black-box neural network designs, the DNet is specifically designed according to the data form of the matrix calculation of hybrid precoding. Simulation results show that the proposed DNet improves performance by up to nearly 50% compared to traditional limited feedback precoding methods in tests with various CSI compression ratios.

preprint · 2022 · arXiv

Do You Need the Entropy Reward (in Practice)?

Maximum entropy (MaxEnt) RL maximizes a combination of the original task reward and an entropy reward. It is believed that the regularization imposed by entropy, on both policy improvement and policy evaluation, together contributes to good exploration, training convergence, and robustness of learned policies. This paper takes a closer look at entropy as an intrinsic reward by conducting various ablation studies on soft actor-critic (SAC), a popular representative of MaxEnt RL. Our findings reveal that, in general, entropy rewards should be applied with caution to policy evaluation. On the one hand, the entropy reward, like any other intrinsic reward, could obscure the main task reward if it is not properly managed. We identify some failure cases of the entropy reward, especially in episodic Markov decision processes (MDPs), where it could cause the policy to be overly optimistic or pessimistic. On the other hand, our large-scale empirical study shows that using entropy regularization in policy improvement alone leads to comparable or even better performance and robustness than using it in both policy improvement and policy evaluation. Based on these observations, we recommend either normalizing the entropy reward to a zero mean (SACZero) or simply removing it from policy evaluation (SACLite) for better practical results.
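
The difference between the three policy-evaluation variants named in this abstract can be sketched with the critic's one-step TD target. This is a minimal illustration, not the authors' implementation: the function name `td_targets`, the variant strings, and the use of a per-batch mean for the zero-mean normalization are all assumptions made here, and the paper may normalize differently.

```python
from statistics import fmean


def td_targets(rewards, next_q, next_logp, gamma=0.99, alpha=0.2, variant="sac"):
    """Return one TD target per transition (hypothetical helper).

    rewards   -- task rewards r
    next_q    -- critic estimates Q(s', a') for sampled next actions
    next_logp -- log pi(a' | s') for those sampled actions
    """
    # Entropy reward of each sampled next action: -alpha * log pi(a'|s').
    entropy_reward = [-alpha * lp for lp in next_logp]

    if variant == "sac":
        # Standard SAC: the entropy reward enters policy evaluation as is.
        bonus = entropy_reward
    elif variant == "saclite":
        # SACLite: the entropy reward is removed from policy evaluation.
        bonus = [0.0] * len(entropy_reward)
    elif variant == "saczero":
        # SACZero: the entropy reward is shifted to zero mean.
        # Assumption: the mean is taken over the current batch.
        mean = fmean(entropy_reward)
        bonus = [e - mean for e in entropy_reward]
    else:
        raise ValueError(f"unknown variant: {variant}")

    return [r + gamma * (q + b) for r, q, b in zip(rewards, next_q, bonus)]
```

In all three variants the policy-improvement step would still use the entropy term; only the critic's target changes.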

preprint · 2022 · arXiv

Efficient and Probabilistic Adaptive Voxel Mapping for Accurate Online LiDAR Odometry

This paper proposes an efficient and probabilistic adaptive voxel mapping method for LiDAR odometry. The map is a collection of voxels; each contains one plane (or edge) feature that enables the probabilistic representation of the environment and accurate registration of a new LiDAR scan. We further analyze the need for coarse-to-fine voxel mapping and then use a novel voxel map organized by a hash table and octrees to build and update the map efficiently. We apply the proposed voxel map to an iterated extended Kalman filter and construct a maximum a posteriori probability problem for pose estimation. Experiments on the open KITTI dataset show the high accuracy and efficiency of our method compared to other state-of-the-art methods. Outdoor experiments on unstructured environments with non-repetitive scanning LiDARs further verify the adaptability of our mapping method to different environments and LiDAR scanning patterns. Our code and dataset are open-sourced on GitHub.
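
The hash-table part of such a voxel map can be sketched in a few lines. This is only an illustration of coarse voxel hashing under assumptions made here: the names `voxel_key`, `build_voxel_map`, and `centroid` are hypothetical, and the octree subdivision, plane fitting, and uncertainty modelling described in the abstract are omitted.

```python
import math
from collections import defaultdict


def voxel_key(point, voxel_size):
    """Integer grid index of the voxel that contains a 3D point."""
    return tuple(math.floor(c / voxel_size) for c in point)


def build_voxel_map(points, voxel_size=1.0):
    """Group points by voxel using a dict as the hash table."""
    voxels = defaultdict(list)
    for p in points:
        voxels[voxel_key(p, voxel_size)].append(p)
    return dict(voxels)


def centroid(points):
    """Mean position of the points stored in one voxel."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


if __name__ == "__main__":
    scan = [(0.2, 0.3, 0.1), (0.8, 0.9, 0.5), (-0.1, 0.0, 0.0), (1.5, 0.2, 0.2)]
    for key, pts in build_voxel_map(scan).items():
        print(key, len(pts), centroid(pts))
```

Using `math.floor` rather than `int` keeps negative coordinates in the correct voxel, and the dict lookup gives constant-time access to a voxel when a new scan is registered.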

preprint · 2022 · arXiv

Energy Efficient Beamforming Optimization for Integrated Sensing and Communication

This paper investigates the optimization of beamforming design in a system with integrated sensing and communication (ISAC), where the base station (BS) sends signals for simultaneous multiuser communication and radar sensing. We aim to maximize the energy efficiency (EE) of the multiuser communication while guaranteeing the sensing requirement in terms of individual radar beampattern gains. The problem is a complicated nonconvex fractional program that is challenging to solve. By appropriately reformulating the problem and then applying the techniques of successive convex approximation (SCA) and semidefinite relaxation (SDR), we propose an iterative algorithm to address this problem. In theory, we prove that the introduced relaxation of the SDR is rigorously tight. Numerical results validate the effectiveness of the proposed algorithm.

preprint · 2022 · arXiv

Extracting a Knowledge Base of COVID-19 Events from Social Media

In this paper, we present a manually annotated corpus of 10,000 tweets containing public reports of five COVID-19 events, including positive and negative tests, deaths, denied access to testing, claimed cures and preventions. We designed slot-filling questions for each event type and annotated a total of 31 fine-grained slots, such as the location of events, recent travel, and close contacts. We show that our corpus can support fine-tuning BERT-based classifiers to automatically extract publicly reported events and help track the spread of a new disease. We also demonstrate that, by aggregating events extracted from millions of tweets, we achieve surprisingly high precision when answering complex queries, such as "Which organizations have employees that tested positive in Philadelphia?" We will release our corpus (with user-information removed), automatic extraction models, and the corresponding knowledge base to the research community.

preprint · 2022 · arXiv

FAST-LIVO: Fast and Tightly-coupled Sparse-Direct LiDAR-Inertial-Visual Odometry

To achieve accurate and robust pose estimation in simultaneous localization and mapping (SLAM) tasks, multi-sensor fusion has proven to be an effective solution and thus offers great potential in robotic applications. This paper proposes FAST-LIVO, a fast LiDAR-Inertial-Visual Odometry system, which builds on two tightly-coupled and direct odometry subsystems: a VIO subsystem and a LIO subsystem. The LIO subsystem registers raw points (instead of feature points on, e.g., edges or planes) of a new scan to an incrementally-built point cloud map. The map points are additionally attached with image patches, which are then used in the VIO subsystem to align a new image by minimizing the direct photometric errors without extracting any visual features (e.g., ORB or FAST corner features). To further improve the VIO robustness and accuracy, a novel outlier rejection method is proposed to reject unstable map points that lie on edges or are occluded in the image view. Experiments on both open data sequences and our customized device data are conducted. The results show that our proposed system outperforms other counterparts and can handle challenging environments at reduced computation cost. The system supports both multi-line spinning LiDARs and emerging solid-state LiDARs with completely different scanning patterns, and can run in real time on both Intel and ARM processors. We open-source the code and dataset of this work on GitHub to benefit the robotics community.

preprint · 2022 · arXiv

Focal Inverse Distance Transform Maps for Crowd Localization

In this paper, we focus on the crowd localization task, a crucial topic of crowd analysis. Most regression-based methods utilize convolutional neural networks (CNNs) to regress a density map, which cannot accurately locate instances in extremely dense scenes, for two crucial reasons: 1) the density map consists of a series of blurry Gaussian blobs, 2) severe overlaps exist in the dense region of the density map. To tackle this issue, we propose a novel Focal Inverse Distance Transform (FIDT) map for the crowd localization task. Compared with the density maps, the FIDT maps accurately describe the persons' locations without overlapping in dense regions. Based on the FIDT maps, a Local-Maxima-Detection-Strategy (LMDS) is derived to effectively extract the center point for each individual. Furthermore, we introduce an Independent SSIM (I-SSIM) loss to make the model tend to learn the local structural information, better recognizing local maxima. Extensive experiments demonstrate that the proposed method reports state-of-the-art localization performance on six crowd datasets and one vehicle dataset. Additionally, we find that the proposed method shows superior robustness on the negative and extremely dense scenes, which further verifies the effectiveness of the FIDT maps. The code and model will be available at https://github.com/dk-liang/FIDTM.
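
To make the idea of an inverse-distance map with sharp, non-overlapping peaks concrete, the sketch below builds such a map on a small grid and then recovers the annotated points as local maxima. The abstract does not state the formula, so everything specific here is an assumption: the form 1 / (d^(alpha*d + beta) + c), the constants alpha=0.02, beta=0.75, c=1, the fixed threshold, and the function names `fidt_map` and `local_maxima` are illustrative choices rather than a confirmed reproduction of the paper or its repository.

```python
import math


def fidt_map(height, width, heads, alpha=0.02, beta=0.75, c=1.0):
    """Inverse-distance map on a grid; heads is a non-empty list of (row, col).

    d is the Euclidean distance to the nearest head, so the value is
    1.0 exactly at a head and decays quickly away from it.
    """
    grid = []
    for r in range(height):
        row = []
        for col in range(width):
            d = min(math.hypot(r - hr, col - hc) for hr, hc in heads)
            row.append(1.0 / (d ** (alpha * d + beta) + c))
        grid.append(row)
    return grid


def local_maxima(grid, threshold=0.6):
    """Return (row, col) of pixels that are the maximum of their 3x3
    neighbourhood and at least `threshold` (a simplified peak detector)."""
    h, w = len(grid), len(grid[0])
    peaks = []
    for r in range(h):
        for col in range(w):
            v = grid[r][col]
            if v < threshold:
                continue
            neighbourhood = [
                grid[rr][cc]
                for rr in range(max(0, r - 1), min(h, r + 2))
                for cc in range(max(0, col - 1), min(w, col + 2))
            ]
            if v >= max(neighbourhood):
                peaks.append((r, col))
    return peaks


if __name__ == "__main__":
    heads = [(2, 2), (5, 7)]
    print(local_maxima(fidt_map(8, 10, heads)))
```

Because the value at distance 1 is already half the peak value, neighbouring heads stay separable, which is the property the abstract contrasts with blurry Gaussian density maps.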

preprint · 2022 · arXiv

Generative Planning for Temporally Coordinated Exploration in Reinforcement Learning

Standard model-free reinforcement learning algorithms optimize a policy that generates the action to be taken in the current time step in order to maximize expected future return. While flexible, this approach faces difficulties arising from inefficient exploration due to its single-step nature. In this work, we present the Generative Planning Method (GPM), which can generate actions not only for the current step, but also for a number of future steps (thus termed generative planning). This brings several benefits to GPM. First, since GPM is trained by maximizing value, the plans generated from it can be regarded as intentional action sequences for reaching high-value regions. GPM can therefore leverage its generated multi-step plans for temporally coordinated exploration towards high-value regions, which is potentially more effective than a sequence of actions generated by perturbing each action at the single-step level, whose consistent movement decays exponentially with the number of exploration steps. Second, starting from a crude initial plan generator, GPM can refine it to be adaptive to the task, which, in return, benefits future explorations. This is potentially more effective than the commonly used action-repeat strategy, which is non-adaptive in its form of plans. Additionally, since the multi-step plan can be interpreted as the intent of the agent from now to a span of time into the future, it offers a more informative and intuitive signal for interpretation. Experiments are conducted on several benchmark environments, and the results demonstrate its effectiveness compared with several baseline methods.

preprint · 2022 · arXiv

Hierarchical Reinforcement Learning By Discovering Intrinsic Options

We propose a hierarchical reinforcement learning method, HIDIO, that can learn task-agnostic options in a self-supervised manner while jointly learning to utilize them to solve sparse-reward tasks. Unlike current hierarchical RL approaches that tend to formulate goal-reaching low-level tasks or pre-define ad hoc lower-level policies, HIDIO encourages lower-level option learning that is independent of the task at hand, requiring few assumptions or little knowledge about the task structure. These options are learned through an intrinsic entropy minimization objective conditioned on the option sub-trajectories. The learned options are diverse and task-agnostic. In experiments on sparse-reward robotic manipulation and navigation tasks, HIDIO achieves higher success rates with greater sample efficiency than regular RL baselines and two state-of-the-art hierarchical RL methods.

preprint · 2022 · arXiv

HMRNet: High and Multi-Resolution Network with Bidirectional Feature Calibration for Brain Structure Segmentation in Radiotherapy

Accurate segmentation of Anatomical brain Barriers to Cancer spread (ABCs) plays an important role in automatic delineation of the Clinical Target Volume (CTV) of brain tumors in radiotherapy. Although variants of U-Net are state-of-the-art segmentation models, they have limited performance when dealing with ABCs structures with various shapes and sizes, especially thin structures (e.g., the falx cerebri) that span only a few slices. To deal with this problem, we propose a High and Multi-Resolution Network (HMRNet) that consists of a multi-scale feature learning branch and a high-resolution branch, which can maintain the high-resolution contextual information and extract more robust representations of anatomical structures with various scales. We further design a Bidirectional Feature Calibration (BFC) block to enable the two branches to generate spatial attention maps for mutual feature calibration. Considering the different sizes and positions of ABCs structures, our network was applied after a rough localization of each structure to obtain fine segmentation results. Experiments on the MICCAI 2020 ABCs challenge dataset showed that: 1) our proposed two-stage segmentation strategy largely outperformed methods segmenting all the structures in just one stage; 2) the proposed HMRNet with two branches can maintain high-resolution representations and is effective in improving the performance on thin structures; 3) the proposed BFC block outperformed existing attention methods using monodirectional feature calibration. Our method won second place in the ABCs 2020 challenge and has potential for more accurate and reasonable delineation of the CTV of brain tumors.

preprint · 2022 · arXiv

Intelligent MIMO Detection Using Meta Learning

In a K-best detector for multiple-input multiple-output (MIMO) systems, the value of K needs to be sufficiently large to achieve near-maximum-likelihood (ML) performance. By treating K as a variable that can be adjusted according to a fitting function of some learnable coefficients, an intelligent MIMO detection network based on deep neural networks (DNNs) is proposed to reduce the complexity of the detection algorithm with little performance degradation. In particular, the proposed intelligent detection algorithm uses meta learning to learn the coefficients of the fitting function for K to circumvent the problem of learning K directly. The idea of network fusion is used to combine the learning results of the meta learning component networks. Simulation results show that the proposed scheme achieves near-ML detection performance while its complexity is close to that of linear detectors. It also exhibits a strong capability for fast training.

preprint · 2022 · arXiv

Learning to Optimize Resource Assignment for Task Offloading in Mobile Edge Computing

In this paper, we consider a multiuser mobile edge computing (MEC) system, where a mixed-integer offloading strategy is used to assist the resource assignment for task offloading. Although the conventional branch and bound (BnB) approach can be applied to solve this problem, its heavy computational complexity limits its application. To address this issue, we propose an intelligent BnB (IBnB) approach which applies deep learning (DL) to learn the pruning strategy of the BnB approach. With this learning scheme, the structure of the BnB approach ensures near-optimal performance while the DL-based pruning strategy significantly reduces the complexity. Numerical results verify that the proposed IBnB approach achieves optimal performance with complexity reduced by over 80%.

preprint · 2022 · arXiv

Nuclear phase retrieval spectroscopy using resonant x-ray scattering

Light-matter interaction is exploited in spectroscopic techniques to access information about molecular, atomic or nuclear constituents of the sample of interest. While scattered light carries both amplitude and phase information of the electromagnetic field, most of the time the latter is lost in intensity measurements. However, often the phase information is paramount to reconstruct the desired information of the target, as it is well known from coherent x-ray imaging. Here we introduce a new phase retrieval algorithm which allows us to reconstruct the field phase information from two-dimensional time- and energy-resolved spectra. We apply this method to the particular case of x-ray scattering off Mössbauer nuclei at a synchrotron radiation source. Knowledge of the phase allows also for an excellent reconstruction of the energy spectra from experimental data, which could not be achieved with this resolution otherwise. Our approach provides an efficient novel data analysis tool which will benefit x-ray quantum optics and Mössbauer spectroscopy with synchrotron radiation alike.

preprint · 2022 · arXiv

PNM: Pixel Null Model for General Image Segmentation

A major challenge in image segmentation is classifying object boundaries. Recent efforts propose to refine the segmentation result with boundary masks. However, models are still prone to misclassifying boundary pixels even when they correctly capture the object contours. In such cases, even a perfect boundary map is unhelpful for segmentation refinement. In this paper, we argue that assigning proper prior weights to error-prone pixels such as object boundaries can significantly improve the segmentation quality. Specifically, we present the pixel null model (PNM), a prior model that weights each pixel according to its probability of being correctly classified by a random segmenter. Empirical analysis shows that PNM captures the misclassification distribution of different state-of-the-art (SOTA) segmenters. Extensive experiments on semantic, instance, and panoptic segmentation tasks over three datasets (Cityscapes, ADE20K, MS COCO) confirm that PNM consistently improves the segmentation quality of most SOTA methods (including the vision transformers) and outperforms boundary-based methods by a large margin. We also observe that the widely-used mean IoU (mIoU) metric is insensitive to boundaries of different sharpness. As a byproduct, we propose a new metric, PNM IoU, which perceives the boundary sharpness and better reflects the model segmentation performance in error-prone regions.

preprint2022arXiv

Pre-train or Annotate? Domain Adaptation with a Constrained Budget

Recent work has demonstrated that pre-training in-domain language models can boost performance when adapting to a new domain. However, the costs associated with pre-training raise an important question: given a fixed budget, what steps should an NLP practitioner take to maximize performance? In this paper, we view domain adaptation with a constrained budget as a consumer choice problem, where the goal is to select an optimal combination of data annotation and pre-training. We measure annotation costs of three procedural text datasets, along with the pre-training costs of several in-domain language models. The utility of different combinations of pre-training and data annotation are evaluated under varying budget constraints to assess which combination strategy works best. We find that for small budgets, spending all funds on annotation leads to the best performance; once the budget becomes large enough, however, a combination of data annotation and in-domain pre-training yields better performance. Our experiments suggest task-specific data annotation should be part of an economical strategy when adapting an NLP model to a new domain.

preprint2022arXiv

Pressure-induced mixed states caused by spin-elastic interactions during first-order spin phase transition in spin crossover compounds

The possibility of exploiting the phenomenon of spin transition (ST) has recently been intensively investigated, so it is particularly important to study the behavior of ST under various stimuli. Here, the shape and content of the intermediate phase of ST in Hoffmann-like compounds [Fe(Fpz)2M(CN)4] (M = Pt, Pd) under external stimuli are studied. For this purpose, magnetic and Raman spectroscopy measurements were carried out. In pressure-induced spin transition (PIST), a mixture of high-spin and low-spin states appears, while in temperature-induced spin transition (TIST), a homogeneous state occurs. The first-order ST induced by pressure is hysteretic but not abrupt, whereas the temperature-induced spin transition at ambient pressure is both hysteretic and abrupt. To investigate this difference, we use a thermodynamic model that considers elastic interactions, showing that the slope of the hysteresis loop is related to the appearance of internal pressure, which in turn is related to the difference in sample compressibility between the high-spin and low-spin states.

preprint2022arXiv

RIS-Assisted Quasi-Static Broad Coverage for Wideband mmWave Massive MIMO Systems

Reconfigurable intelligent surfaces (RISs) can establish favorable wireless environments to combat the severe attenuation and blockages in millimeter-wave (mmWave) bands. However, to achieve the optimal enhancement of performance, the instantaneous channel state information (CSI) needs to be estimated at the cost of a large overhead that scales with the number of RIS elements and the number of users. In this paper, we design a quasi-static broad coverage at the RIS with reduced overhead based on the statistical CSI. We propose a design framework to synthesize the power pattern reflected by the RIS that meets the customized requirements of broad coverage. For the communication of broadcast channels, we generalize the broad coverage of the single transmit stream to the scenario of multiple streams. Moreover, we employ the quasi-static broad coverage for a multiuser orthogonal frequency division multiple access (OFDMA) system, and derive the analytical expression of the downlink rate, which is proved to increase logarithmically with the power gain reflected by the RIS. By taking into account the overhead of channel estimation, the proposed quasi-static broad coverage even outperforms the design method that optimizes the RIS phases using the instantaneous CSI. Numerical simulations are conducted to verify these observations.

preprint2022arXiv

Testing gravitational redshift based on microwave frequency links onboard China Space Station

In 2022, the China Space Station (CSS) will be equipped with atomic clocks and optical clocks with stabilities of $2 \times 10^{-16}$ and $8 \times 10^{-18}$, respectively, which provides an excellent opportunity to test gravitational redshift (GR) with higher accuracy than previous results. Based on high-precision frequency links between the CSS and a ground station, we formulate a model and carry out simulation experiments to test GR. Simulation results suggest that this method could test GR at the accuracy level of $(0.27 \pm 2.15) \times 10^{-7}$, more than two orders of magnitude better than the result of the hydrogen clock experiment on board a flying rocket more than 40 years ago.

preprint2022arXiv

TransCrowd: weakly-supervised crowd counting with transformers

The mainstream crowd counting methods usually utilize the convolutional neural network (CNN) to regress a density map, requiring point-level annotations. However, annotating each person with a point is an expensive and laborious process. During the testing phase, the point-level annotations are not considered to evaluate the counting accuracy, which means the point-level annotations are redundant. Hence, it is desirable to develop weakly-supervised counting methods that just rely on count-level annotations, a more economical way of labeling. Current weakly-supervised counting methods adopt the CNN to regress a total count of the crowd by an image-to-count paradigm. However, having limited receptive fields for context modeling is an intrinsic limitation of these weakly-supervised CNN-based methods. These methods thus cannot achieve satisfactory performance, with limited applications in the real world. The transformer is a popular sequence-to-sequence prediction model in natural language processing (NLP), which contains a global receptive field. In this paper, we propose TransCrowd, which reformulates the weakly-supervised crowd counting problem from the perspective of sequence-to-count based on transformers. We observe that the proposed TransCrowd can effectively extract the semantic crowd information by using the self-attention mechanism of the transformer. To the best of our knowledge, this is the first work to adopt a pure transformer for crowd counting research. Experiments on five benchmark datasets demonstrate that the proposed TransCrowd achieves superior performance compared with all the weakly-supervised CNN-based counting methods and gains highly competitive counting performance compared with some popular fully-supervised counting methods.

preprint2022arXiv

Worst-case Design for RIS-aided Over-the-air Computation with Imperfect CSI

Over-the-air computation (AirComp) enables fast wireless data aggregation at the receiver through concurrent transmission by sensors in the application of Internet-of-Things (IoT). To further improve the performance of AirComp under unfavorable propagation channel conditions, we consider the problem of computation distortion minimization in a reconfigurable intelligent surface (RIS)-aided AirComp system. In particular, we take into account an additive bounded uncertainty of the channel state information (CSI) and the total power constraint, and jointly optimize the transceiver (Tx-Rx) and the RIS phase design from the perspective of worst-case robustness by minimizing the mean squared error (MSE) of the computation. To solve this intractable nonconvex problem, we develop an efficient alternating algorithm where both solutions to the robust sub-problem and to the joint design of Tx-Rx and RIS are obtained in closed forms. Simulation results demonstrate the effectiveness of the proposed method.

preprint2021arXiv

Analysis and Optimization for RIS-Aided Multi-Pair Communications Relying on Statistical CSI

In this paper, we investigate a reconfigurable intelligent surface (RIS) aided multi-pair communication system, in which multi-pair users exchange information via an RIS. We derive an approximate expression of the achievable rate by assuming that statistical channel state information (CSI) is available. A genetic algorithm (GA) to solve the rate maximization problem is proposed as well. In particular, we consider implementations of RISs with continuous phase shifts (CPSs) and discrete phase shifts (DPSs). Simulation results verify the correctness of the obtained results and show that the proposed GA method has almost the same performance as the globally optimal solution. In addition, numerical results show that three quantization bits can achieve a large portion of the sum achievable rate for the CPSs setup.
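
As a rough illustration of why a few quantization bits can be enough, the sketch below rounds ideal continuous phase shifts to 2^b uniform levels and measures how much coherent combining power survives. This is a toy model, not the paper's genetic algorithm or its rate expression: the function names, the random ideal phases, and the simple coherent-sum power metric are all assumptions made for illustration.

```python
import cmath
import math
import random


def quantize_phase(theta, bits):
    """Round a phase in radians to the nearest of 2**bits uniform levels."""
    step = 2 * math.pi / (2 ** bits)
    return (round(theta / step) * step) % (2 * math.pi)


def retained_power_fraction(num_elements, bits, seed=0):
    """Coherent-sum power with quantized phases, relative to ideal phases.

    Hypothetical toy model: each element ideally applies phase theta_n so
    that all reflected paths add in phase. Quantization leaves a residual
    error theta_n - q(theta_n) on every element.
    """
    rng = random.Random(seed)
    total = 0j
    for _ in range(num_elements):
        theta = rng.uniform(0, 2 * math.pi)
        error = theta - quantize_phase(theta, bits)
        total += cmath.exp(1j * error)
    # Ideal phases would give |sum|^2 = num_elements^2.
    return abs(total) ** 2 / num_elements ** 2


if __name__ == "__main__":
    for b in (1, 2, 3):
        print(b, round(retained_power_fraction(20000, b), 3))
```

In this toy model the retained fraction approaches (sin(π/2^b) / (π/2^b))^2 for many elements, which is roughly 0.95 at three bits.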

preprint2021arXiv

Avoiding dynamic small obstacles with onboard sensing and computating on aerial robots

In practical applications, autonomous quadrotors are still facing significant challenges, such as the detection and avoidance of very small and even dynamic obstacles (e.g., tree branches, power lines). In this paper, we propose a compact, integrated, and fully autonomous quadrotor system, which can fly safely in cluttered environments while avoiding dynamic small obstacles. Our quadrotor platform is equipped with a forward-looking three-dimensional (3D) light detection and ranging (lidar) sensor to perceive the environment and an onboard embedded computer to perform all the estimation, mapping, and planning tasks. Specifically, the computer estimates the current pose of the UAV, maintains a local map (time-accumulated point clouds KD-Trees), and computes a safe trajectory using kinodynamic A* search to the goal point. The whole perception and planning system can run onboard at 50 Hz with careful optimization. Various indoor and outdoor experiments show that the system can avoid dynamic small obstacles (down to a 20 mm diameter bar) while flying at 2 m/s in cluttered environments. Our code and hardware design are open-sourced on GitHub.

preprint2021arXiv

Deep Reinforcement Learning Based Dynamic Trajectory Control for UAV-assisted Mobile Edge Computing

In this paper, we consider a platform of flying mobile edge computing (F-MEC), where unmanned aerial vehicles (UAVs) serve as equipment providing computation resources, and they enable task offloading from user equipment (UE). We aim to minimize energy consumption of all the UEs via optimizing the user association, resource allocation and the trajectory of UAVs. To this end, we first propose a Convex optimizAtion based Trajectory control algorithm (CAT), which solves the problem in an iterative way by using the block coordinate descent (BCD) method. Then, to make real-time decisions while taking into account the dynamics of the environment (i.e., UAVs may take off from different locations), we propose a deep Reinforcement leArning based Trajectory control algorithm (RAT). In RAT, we apply the Prioritized Experience Replay (PER) to improve the convergence of the training procedure. Different from the convex optimization based algorithm, which may be susceptible to the initial points and requires iterations, RAT can be adapted to any take-off points of the UAVs and can obtain the solution more rapidly than CAT once the training process has been completed. Simulation results show that the proposed CAT and RAT achieve similar performance and both outperform traditional algorithms.

preprint2021arXiv

ikd-Tree: An Incremental K-D Tree for Robotic Applications

This paper proposes an efficient data structure, ikd-Tree, for dynamic space partition. The ikd-Tree incrementally updates a k-d tree with newly arriving points only, leading to much lower computation time than existing static k-d trees. Besides point-wise operations, the ikd-Tree supports several features such as box-wise operations and down-sampling that are practically useful in robotic applications. In parallel to the incremental operations (i.e., insert, re-insert, and delete), ikd-Tree actively monitors the tree structure and partially re-balances the tree, which enables efficient nearest point search in later stages. The ikd-Tree is carefully engineered and supports multi-thread parallel computing to maximize the overall efficiency. We validate the ikd-Tree in both theory and practical experiments. On the theory level, a complete time complexity analysis is presented to prove the high efficiency. On the experiment level, the ikd-Tree is tested on both randomized datasets and real-world LiDAR point data in a LiDAR-inertial odometry and mapping application. In all tests, ikd-Tree consumes only 4% of the running time of a static k-d tree.
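
To make the idea of updating a k-d tree point by point concrete, here is a minimal sketch of a k-d tree with incremental insertion and nearest-point search. It is not the ikd-Tree implementation: the class and method names are hypothetical, and it deliberately omits the re-balancing, deletion, box-wise operations, down-sampling, and multi-threading described in the abstract.

```python
import math


class Node:
    def __init__(self, point, axis):
        self.point = point
        self.axis = axis  # splitting dimension used at this node
        self.left = None
        self.right = None


class IncrementalKDTree:
    """Toy k-d tree: insert points one at a time, query the nearest point."""

    def __init__(self, dim):
        self.dim = dim
        self.root = None
        self.size = 0

    def insert(self, point):
        """Add one point without rebuilding the tree (no re-balancing)."""
        point = tuple(point)
        self.size += 1
        if self.root is None:
            self.root = Node(point, 0)
            return
        node = self.root
        while True:
            go_left = point[node.axis] < node.point[node.axis]
            child = node.left if go_left else node.right
            if child is None:
                new_node = Node(point, (node.axis + 1) % self.dim)
                if go_left:
                    node.left = new_node
                else:
                    node.right = new_node
                return
            node = child

    def nearest(self, query):
        """Return the stored point closest to query, or None if empty."""
        best = [None, math.inf]  # [point, squared distance]

        def visit(node):
            if node is None:
                return
            d2 = sum((a - b) ** 2 for a, b in zip(node.point, query))
            if d2 < best[1]:
                best[0], best[1] = node.point, d2
            diff = query[node.axis] - node.point[node.axis]
            near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
            visit(near)
            # Only cross the splitting plane if a closer point could lie beyond it.
            if diff * diff < best[1]:
                visit(far)

        visit(self.root)
        return best[0]
```

Because this sketch never re-balances, a sorted insertion order degrades it toward a linked list, which is the kind of problem the partial re-balancing in the abstract is meant to address.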

preprint2021arXiv

MetaView: Few-shot Active Object Recognition

In robot sensing scenarios, instead of passively utilizing human captured views, an agent should be able to actively choose informative viewpoints of a 3D object as discriminative evidence to boost the recognition accuracy. This task is referred to as active object recognition. Recent works on this task rely on a massive amount of training examples to train an optimal view selection policy. But in realistic robot sensing scenarios, the large-scale training data may not exist and whether the intelligent view selection policy can be still learned from few object samples remains unclear. In this paper, we study this new problem which is extremely challenging but very meaningful in robot sensing -- Few-shot Active Object Recognition, i.e., to learn view selection policies from few object samples, which has not been considered and addressed before. We solve the proposed problem by adopting the framework of meta learning and name our method "MetaView". Extensive experiments on both category-level and instance-level classification tasks demonstrate that the proposed method can efficiently resolve issues that are hard for state-of-the-art active object recognition methods to handle, and outperform several baselines by large margins.

preprint2021arXiv

R2LIVE: A Robust, Real-time, LiDAR-Inertial-Visual tightly-coupled state Estimator and mapping

In this letter, we propose a robust, real-time tightly-coupled multi-sensor fusion framework, which fuses measurements from a LiDAR, an inertial sensor, and a visual camera to achieve robust and accurate state estimation. Our proposed framework is composed of two parts: the filter-based odometry and factor graph optimization. To guarantee real-time performance, we estimate the state within the framework of an error-state iterated Kalman filter, and further improve the overall precision with our factor graph optimization. Taking advantage of measurements from all individual sensors, our algorithm is robust to various visual-failure and LiDAR-degenerated scenarios, and is able to run in real time on an on-board computation platform, as shown by extensive experiments conducted in indoor, outdoor, and mixed environments of different scales. Moreover, the results show that our proposed framework can improve the accuracy of state-of-the-art LiDAR-inertial or visual-inertial odometry. To share our findings and to make contributions to the community, we open source our code on GitHub.

preprint2021arXiv

The solution space structure of planted constraint satisfaction problems with growing domains

Planting a solution into the random RB model, which is a prototype of random constraint satisfaction problem (CSP) with growing domains, can generate very hard satisfiable CSP benchmarks. We study the solution space structure of the planted RB model. With constraint density growing, we find that this model goes through four phase transitions. In the replica symmetric phase, what we call the independent phase transition occurs, after which the planted cluster (the cluster containing the planted solution) is separated from the giant cluster. Then the solutions other than those in the planted cluster go through the same clustering phase transition and the same satisfiability phase transition as the random RB model. The planted cluster goes through the isolated phase transition, after which the planted cluster contains only one solution. This phase diagram provides strong evidence that this model can generate very hard satisfiable CSP benchmarks. For over-constrained instances (where the constraint density is very large), we find that the configuration space has only a single energy valley, which makes the instances tractable. Experiments using Belief Propagation confirm the locations of the clustering, satisfiability (by configurations outside the planted cluster), and isolated phase transition points.

preprint2020arXiv

A Bayes Factor Approach with Informative Prior for Rare Genetic Variant Analysis from Next Generation Sequencing Data

The discovery of rare genetic variants through Next Generation Sequencing is a very challenging issue in the field of human genetics. We propose a novel region-based statistical approach based on a Bayes Factor (BF) to assess evidence of association between a set of rare variants (RVs) located on the same genomic region and a disease outcome in the context of case-control design. Marginal likelihoods are computed under the null and alternative hypotheses assuming a binomial distribution for the RV count in the region and a beta or mixture of Dirac and beta prior distribution for the probability of RV. We derive the theoretical null distribution of the BF under our prior setting and show that a Bayesian control of the False Discovery Rate (BFDR) can be obtained for genome-wide inference. Informative priors are introduced using prior evidence of association from a Kolmogorov-Smirnov test statistic. We use our simulation program, sim1000G, to generate RV data similar to the 1,000 genomes sequencing project. Our simulation studies showed that the new BF statistic outperforms standard methods (SKAT, SKAT-O, Burden test) in case-control studies with moderate sample sizes and is equivalent to them under large sample size scenarios. Our real data application to a lung cancer case-control study found enrichment for RVs in known and novel cancer genes. It also suggests that using the BF with informative prior improves the overall gene discovery compared to the BF with non-informative prior.
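
The binomial likelihood with a beta prior mentioned above has a closed-form marginal likelihood, which the following sketch uses to compute a simple Bayes factor. It is a generic beta-binomial comparison under stated assumptions, not the paper's statistic: the hypotheses (one shared variant probability versus separate probabilities for cases and controls), the function names, and the default Beta(1, 1) prior are all illustrative choices, and the Dirac-beta mixture prior and informative priors are not modeled.

```python
from math import lgamma


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_bayes_factor(k_case, n_case, k_ctrl, n_ctrl, a=1.0, b=1.0):
    """log BF of H1 (cases and controls have different rates) vs H0 (one rate).

    k_* are rare-variant counts and n_* the number of trials in each group.
    Each rate gets a Beta(a, b) prior. The binomial coefficients are the
    same under both hypotheses and cancel, so they are left out.
    """
    # H1: independent rates, so the marginal likelihood factorizes.
    log_m1 = (log_beta(k_case + a, n_case - k_case + b)
              + log_beta(k_ctrl + a, n_ctrl - k_ctrl + b)
              - 2 * log_beta(a, b))
    # H0: a single rate shared by the pooled data.
    log_m0 = (log_beta(k_case + k_ctrl + a,
                       n_case + n_ctrl - k_case - k_ctrl + b)
              - log_beta(a, b))
    return log_m1 - log_m0


if __name__ == "__main__":
    print(log_bayes_factor(40, 1000, 10, 1000))  # counts differ between groups
    print(log_bayes_factor(25, 1000, 25, 1000))  # counts identical
```

A positive log Bayes factor favors different rates in cases and controls, while a negative value favors a single shared rate.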

preprint2020arXiv

An averaging principle for fractional stochastic differential equations with Lévy noise

This paper is devoted to the study of an averaging principle for fractional stochastic differential equations in $\mathbb{R}^n$ with Lévy motion, using an integral transform method. We obtain a time-averaged equation under suitable assumptions. Furthermore, we show that the solutions of the averaged equation approach the solutions of the original equation. Our results in this paper provide better understanding for effective approximation of fractional dynamical systems with non-Gaussian Lévy noise.

preprint2020arXiv

Analog Versus Hybrid Precoding for Multiuser Massive MIMO with Quantized CSI Feedback

In this letter, we study the performance of a downlink multiuser massive multiple-input multiple-output (MIMO) system with sub-connected structure over limited feedback channels. Tight rate approximations are theoretically analyzed for the system with pure analog precoding and hybrid precoding. The effect of quantized analog and digital precoding is characterized in the derived expressions. Furthermore, it is revealed that the pure analog precoding outperforms the hybrid precoding using maximal-ratio transmission (MRT) or zero forcing (ZF) under certain conditions, and we theoretically characterize the conditions in closed form with respect to signal-to-noise ratio (SNR), the number of users and the number of feedback bits. Numerical results verify the derived conclusions on both Rayleigh channels and mmWave channels.

preprint2020arXiv

AnciNet: An Efficient Deep Learning Approach for Feedback Compression of Estimated CSI in Massive MIMO Systems

Accurate channel state information (CSI) feedback plays a vital role in improving the performance gain of massive multiple-input multiple-output (m-MIMO) systems, where the dilemma is excessive CSI overhead versus limited feedback bandwidth. By considering the noisy CSI due to imperfect channel estimation, we propose a novel deep neural network architecture, namely AnciNet, to conduct the CSI feedback with limited bandwidth. AnciNet extracts noise-free features from the noisy CSI samples to achieve effective CSI compression for the feedback. Experimental results verify that the proposed AnciNet approach outperforms the existing techniques under various conditions.

preprint2020arXiv

Asymptotic Results for Heavy-tailed Lévy Processes and their Exponential Functionals

In this paper we first provide several conditional limit theorems for Lévy processes with negative drift and regularly varying tail. Then we apply them to study the asymptotic behavior of expectations of some exponential functionals of heavy-tailed Lévy processes. As the key point, we observe that the asymptotics mainly depends on the sample paths with early arrival large jump. Both the polynomial decay rate and the exact expression of the limit coefficients are given. As an application, we give an exact description for the extinction speed of continuous-state branching processes in heavy-tailed Lévy random environment with stable branching mechanism.

preprint2020arXiv

Attacking Optical Character Recognition (OCR) Systems with Adversarial Watermarks

Optical character recognition (OCR) is widely applied in real applications serving as a key preprocessing tool. The adoption of deep neural network (DNN) in OCR results in the vulnerability against adversarial examples which are crafted to mislead the output of the threat model. Different from vanilla colorful images, images of printed text have clear backgrounds usually. However, adversarial examples generated by most of the existing adversarial attacks are unnatural and pollute the background severely. To address this issue, we propose a watermark attack method to produce natural distortion that is in the disguise of watermarks and evade human eyes' detection. Experimental results show that watermark attacks can yield a set of natural adversarial examples attached with watermarks and attain similar attack performance to the state-of-the-art methods in different attack scenarios.

preprint2020arXiv

Chimbuko: A Workflow-Level Scalable Performance Trace Analysis Tool

Because of the limits input/output systems currently impose on high-performance computing systems, a new generation of workflows that include online data reduction and analysis is emerging. Diagnosing their performance requires sophisticated performance analysis capabilities due to the complexity of execution patterns and underlying hardware, and no tool could handle the voluminous performance trace data needed to detect potential problems. This work introduces Chimbuko, a performance analysis framework that provides real-time, distributed, in situ anomaly detection. Data volumes are reduced for human-level processing without losing necessary details. Chimbuko supports online performance monitoring via a visualization module that presents the overall workflow anomaly distribution, call stacks, and timelines. Chimbuko also supports the capture and reduction of performance provenance. To the best of our knowledge, Chimbuko is the first online, distributed, and scalable workflow-level performance trace analysis framework, and we demonstrate the tool's usefulness on Oak Ridge National Laboratory's Summit system.

preprint2020arXiv

Determining geopotential difference via relativistic precise point positioning time comparison: A case study using simulated observations

According to general relativity theory (GRT), the geopotential difference (GD) can be determined by comparing the change in time difference between precise clocks using the precise point positioning (PPP) time transfer technique, referred to as the relativistic PPP time comparison approach. We focused on high-precision time comparison between two precise clocks for determining the GD using the relativistic PPP time transfer, and conducted simulation experiments to validate the approach. In the experiments, we consider three cases to evaluate the performance of the approach using clocks with different stabilities, namely, the frequency stabilities of the clocks equipped at three selected ground stations are respectively (Case 1), (Case 2), and (Case 3) at time period. Conclusions are drawn from the experimental results. First, high-precision clocks can significantly improve the accuracy for PPP time transfer, but the improvement is limited by measurement noises. Compared to Case 1, the long-term stabilities of OPMT-BRUX as well as PTBB-BRUX are improved in Cases 2 and 3. The frequency stabilities of Cases 1-3 are approximately $4.28 \times 10^{-16}$, $4.00 \times 10^{-17}$, and $3.22 \times 10^{-17}$ at 10-day averaging time for OPMT-BRUX, respectively, and for PTBB-BRUX, these values are approximately $3.73 \times 10^{-16}$, $8.17 \times 10^{-17}$, and $4.64 \times 10^{-17}$. Second, the geopotential difference between any two stations can be determined at the decimeter level, with its accuracy being consistent with the stabilities of the time links in Cases 1-3. In Case 3, the determined geopotential differences between OPMT and BRUX deviate from the EIGEN-6C4 model values by -0.64 m$^2$/s$^2$ with an uncertainty of 1.11 m$^2$/s$^2$, whereas the deviation error between PTBB and BRUX is 0.76 m$^2$/s$^2$ with an uncertainty of 1.79 m$^2$/s$^2$. The approach proposed in this study can be also applied to testing GRT.
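
The link between clock comparison and geopotential can be illustrated with the standard first-order gravitational redshift relation, in which the fractional frequency difference between two clocks equals their geopotential difference divided by c^2. The sketch below applies only that relation. It is not the relativistic PPP time comparison procedure of the paper, and the function names and the use of standard gravity for the height conversion are assumptions for illustration.

```python
C = 299_792_458.0  # speed of light in m/s (exact by SI definition)
G0 = 9.80665       # standard gravity in m/s^2, used only for a rough height scale


def geopotential_difference(fractional_frequency_shift):
    """First-order relation: delta_U = c^2 * (delta_f / f), in m^2/s^2."""
    return C ** 2 * fractional_frequency_shift


def equivalent_height(geopotential_diff, gravity=G0):
    """Rough height difference in metres for a geopotential difference."""
    return geopotential_diff / gravity


if __name__ == "__main__":
    for shift in (1e-16, 1e-17, 1e-18):
        du = geopotential_difference(shift)
        print(f"{shift:.0e}: {du:.4f} m^2/s^2, about {equivalent_height(du) * 100:.2f} cm")
```

Under this relation a fractional frequency difference of 1e-18 corresponds to about 0.09 m$^2$/s$^2$, or roughly one centimetre of height near the Earth's surface.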

preprint2020arXiv

Discourse Level Factors for Sentence Deletion in Text Simplification

This paper presents a data-driven study focusing on analyzing and predicting sentence deletion -- a prevalent but understudied phenomenon in document simplification -- on a large English text simplification corpus. We inspect various document and discourse factors associated with sentence deletion, using a new manually annotated sentence alignment corpus we collected. We reveal that professional editors utilize different strategies to meet readability standards of elementary and middle schools. To predict whether a sentence will be deleted during simplification to a certain level, we harness automatically aligned data to train a classification model. Evaluated on our manually annotated data, our best models reached F1 scores of 65.2 and 59.7 for this task at the levels of elementary and middle school, respectively. We find that discourse level factors contribute to the challenging task of predicting sentence deletion for simplification.

preprint2020arXiv

Distributed IRS with Statistical Passive Beamforming for MISO Communications

Intelligent reflecting surface (IRS) has recently been identified as a prominent technology with the ability of enhancing wireless communication by dynamically manipulating the propagation environment. This paper investigates a multiple-input single-output (MISO) system deploying distributed IRSs. For practical considerations, we propose an efficient design of passive reflecting beamforming for the IRSs to exploit statistical channel state information (CSI) and analyze the achievable rate of the network taking into account the impact of CSI estimation error. The ergodic achievable rate is derived in a closed form, which provides insightful system design guidelines. Numerical results confirm the accuracy of the derived results and unveil the performance superiority of the proposed distributed IRS deployment over the conventional centralized deployment.

preprint2020arXiv

Energy-Efficient Wireless Communications with Distributed Reconfigurable Intelligent Surfaces

This paper investigates the problem of resource allocation for a wireless communication network with distributed reconfigurable intelligent surfaces (RISs). In this network, multiple RISs are spatially distributed to serve wireless users and the energy efficiency of the network is maximized by dynamically controlling the on-off status of each RIS as well as optimizing the reflection coefficients matrix of the RISs. This problem is posed as a joint optimization problem of transmit beamforming and RIS control, whose goal is to maximize the energy efficiency under minimum rate constraints of the users. To solve this problem, two iterative algorithms are proposed for the single-user case and multi-user case. For the single-user case, the phase optimization problem is solved by using a successive convex approximation method, which admits a closed-form solution at each step. Moreover, the optimal RIS on-off status is obtained by using the dual method. For the multi-user case, a low-complexity greedy searching method is proposed to solve the RIS on-off optimization problem. Simulation results show that the proposed scheme achieves up to 33% and 68% gains in terms of the energy efficiency in both single-user and multi-user cases compared to the conventional RIS scheme and amplify-and-forward relay scheme, respectively.

preprint2020arXiv

Feature Statistics Guided Efficient Filter Pruning

Building compact convolutional neural networks (CNNs) with reliable performance is a critical but challenging task, especially when deploying them in real-world applications. As a common approach to reduce the size of CNNs, pruning methods delete part of the CNN filters according to some metrics such as $\ell_1$-norm. However, previous methods hardly leverage the information variance in a single feature map and the similarity characteristics among feature maps. In this paper, we propose a novel filter pruning method, which incorporates two kinds of feature map selections: diversity-aware selection (DFS) and similarity-aware selection (SFS). DFS aims to discover features with low information diversity while SFS removes features that have high similarities with others. We conduct extensive empirical experiments with various CNN architectures on publicly available datasets. The experimental results demonstrate that our model obtains up to 91.6% parameter decrease and 83.7% FLOPs reduction with almost no accuracy loss.

preprint2020arXiv

Generalizing Natural Language Analysis through Span-relation Representations

Natural language processing covers a wide variety of tasks predicting syntax, semantics, and information content, and usually each type of output is generated with specially designed architectures. In this paper, we provide the simple insight that a great variety of tasks can be represented in a single unified format consisting of labeling spans and relations between spans, thus a single task-independent model can be used across different tasks. We perform extensive experiments to test this insight on 10 disparate tasks spanning dependency parsing (syntax), semantic role labeling (semantics), relation extraction (information content), aspect based sentiment analysis (sentiment), and many others, achieving performance comparable to state-of-the-art specialized models. We further demonstrate benefits of multi-task learning, and also show that the proposed method makes it easy to analyze differences and similarities in how the model handles different tasks. Finally, we convert these datasets into a unified format to build a benchmark, which provides a holistic testbed for evaluating future models for generalized natural language analysis.

preprint2020arXiv

Hybrid Transceiver Optimization for Multi-Hop Communications

Multi-hop communication with the aid of large-scale antenna arrays will play a vital role in future emergence communication systems. In this paper, we investigate amplify-and-forward based and multiple-input multiple-output assisted multi-hop communication, in which all nodes employ hybrid transceivers. Moreover, channel errors are taken into account in our hybrid transceiver design. Based on the matrix-monotonic optimization framework, the optimal structures of the robust hybrid transceivers are derived. By utilizing these optimal structures, the optimizations of analog transceivers and digital transceivers can be separated without loss of optimality. This fact greatly simplifies the joint optimization of analog and digital transceivers. Since the optimization of analog transceivers under unit-modulus constraints is non-convex, a projection type algorithm is proposed for analog transceiver optimization to overcome this difficulty. Based on the derived analog transceivers, the optimal digital transceivers can then be derived using matrix-monotonic optimization. Numerical results demonstrate the performance advantages of the proposed hybrid transceiver designs over other existing solutions.

preprint2020arXiv

Implicit Generative Modeling for Efficient Exploration

Efficient exploration remains a challenging problem in reinforcement learning, especially for those tasks where rewards from environments are sparse. A commonly used approach for exploring such environments is to introduce some "intrinsic" reward. In this work, we focus on model uncertainty estimation as an intrinsic reward for efficient exploration. In particular, we introduce an implicit generative modeling approach to estimate a Bayesian uncertainty of the agent's belief of the environment dynamics. Each random draw from our generative model is a neural network that instantiates the dynamic function, hence multiple draws would approximate the posterior, and the variance in the future prediction based on this posterior is used as an intrinsic reward for exploration. We design a training algorithm for our generative model based on the amortized Stein Variational Gradient Descent. In experiments, we compare our implementation with state-of-the-art intrinsic reward-based exploration approaches, including two recent approaches based on an ensemble of dynamic models. In challenging exploration tasks, our implicit generative model consistently outperforms competing approaches regarding data efficiency in exploration.
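The idea of using prediction variance as an intrinsic reward can be sketched in a few lines. The example below assumes each draw from the generative model is a callable mapping a state and an action to a predicted next state; the function names and the toy linear models are hypothetical stand-ins, not the authors' implementation, and the amortized Stein Variational Gradient Descent training step is omitted.

```python
from statistics import pvariance


def intrinsic_reward(models, state, action):
    """Mean per-dimension variance of next-state predictions.

    `models` is a list of sampled dynamics functions (assumed interface:
    model(state, action) -> list of floats). High disagreement among the
    samples yields a high exploration bonus.
    """
    predictions = [model(state, action) for model in models]
    # Group the predictions by state dimension.
    per_dimension = zip(*predictions)
    variances = [pvariance(values) for values in per_dimension]
    return sum(variances) / len(variances)


def make_linear_model(gain):
    """Hypothetical stand-in for one draw from the generative model."""
    return lambda state, action: [s + gain * action for s in state]


models = [make_linear_model(g) for g in (0.9, 1.0, 1.1)]

# The sampled models disagree only when the action is non-zero.
reward_moving = intrinsic_reward(models, [0.0, 1.0], 2.0)
reward_still = intrinsic_reward(models, [0.0, 1.0], 0.0)
```

Here the bonus is zero where all sampled models agree and grows with their disagreement, which is the behavior the abstract describes for the variance-based reward.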

preprint2020arXiv

Interactive Visual Study of Multiple Attributes Learning Model of X-Ray Scattering Images

Existing interactive visualization tools for deep learning are mostly applied to the training, debugging, and refinement of neural network models working on natural images. However, visual analytics tools are lacking for the specific application of x-ray image classification with multiple structural attributes. In this paper, we present an interactive system for domain scientists to visually study the multiple attributes learning models applied to x-ray scattering images. It allows domain scientists to interactively explore this important type of scientific image in embedded spaces that are defined on the model prediction output, the actual labels, and the discovered feature space of neural networks. Users can flexibly select instance images and their clusters, and compare them regarding the specified visual representation of attributes. The exploration is guided by the manifestation of model performance related to mutual relationships among attributes, which often affect the learning accuracy and effectiveness. The system thus helps domain scientists improve the training dataset and model, find questionable attribute labels, and identify outlier images or spurious data clusters. Case studies and scientists' feedback demonstrate its functionalities and usefulness.

preprint2020arXiv

Interpreting Galaxy Deblender GAN from the Discriminator's Perspective

Generative adversarial networks (GANs) are well known for their unsupervised learning capabilities. A recent success in the field of astronomy is deblending two overlapping galaxy images via a branched GAN model. However, it remains a significant challenge to comprehend how the network works, which is particularly difficult for non-expert users. This research focuses on behaviors of one of the network's major components, the Discriminator, which plays a vital role but is often overlooked. Specifically, we enhance the Layer-wise Relevance Propagation (LRP) scheme to generate a heatmap-based visualization. We call this technique Polarized-LRP, and it consists of two parts, i.e., positive contribution heatmaps for ground truth images and negative contribution heatmaps for generated images. Using the Galaxy Zoo dataset, we demonstrate that our method clearly reveals attention areas of the Discriminator when differentiating generated galaxy images from ground truth images. To connect the Discriminator's impact on the Generator, we visualize the gradual changes of the Generator across the training process. An interesting result we have achieved there is the detection of a problematic data augmentation procedure that would otherwise have remained hidden. We find that our proposed method serves as a useful visual analytical tool for a deeper understanding of GAN models.

preprint2020arXiv

Joint Transmit Power and Placement Optimization for URLLC-enabled UAV Relay Systems

This letter considers an unmanned aerial vehicle (UAV)-enabled relay communication system for delivering latency-critical messages with ultra-high reliability, where the relay operates in amplify-and-forward (AF) mode. We aim to jointly optimize the UAV location and power to minimize the decoding error probability while guaranteeing the latency constraints. Both the free-space channel model and the three-dimensional (3-D) channel model are considered. For the first model, we propose a low-complexity iterative algorithm to solve the problem, while a globally optimal solution is derived for the case when the signal-to-noise ratio (SNR) is extremely high. For the second model, we also propose a low-complexity iterative algorithm to solve the problem. Simulation results confirm the performance advantages of our proposed algorithms.

preprint2020arXiv

Multi-cell Edge Coverage Enhancement Using Mobile UAV-Relay

Unmanned aerial vehicle (UAV)-assisted communication is a promising technology in future wireless communication networks. UAVs can not only help offload data traffic from ground base stations (GBSs), but also improve the quality of service of cell-edge users (CEUs). In this paper, we consider the enhancement of cell-edge communications through a mobile relay, i.e., a UAV, in multi-cell networks. During each transmission period, GBSs first send data to the UAV, and then the UAV forwards its received data to CEUs according to a certain association strategy. In order to maximize the sum rate of all CEUs, we jointly optimize the UAV mobility management, including trajectory, velocity, and acceleration, and the association strategy of CEUs to the UAV, subject to minimum rate requirements of CEUs, mobility constraints of the UAV, and causal buffer constraints in practice. To address the mixed-integer nonconvex problem, we transform it into two convex subproblems by applying tight bounds and relaxations. An iterative algorithm is proposed to solve the two subproblems in an alternating manner. Numerical results show that the proposed algorithm achieves higher CEU rates than existing benchmark schemes.

preprint2020arXiv

Multi-hop Reading Comprehension across Documents with Path-based Graph Convolutional Network

Multi-hop reading comprehension across multiple documents has attracted much attention recently. In this paper, we propose a novel approach to tackle this multi-hop reading comprehension problem. Inspired by human reasoning processes, we construct a path-based reasoning graph from supporting documents. This graph combines the ideas of the graph-based and path-based approaches, so it is better suited to multi-hop reasoning. Meanwhile, we propose Gated-RGCN to accumulate evidence on the path-based reasoning graph, which contains a new question-aware gating mechanism to regulate the usefulness of information propagating across documents and to add question information during reasoning. We evaluate our approach on the WikiHop dataset, where it achieves state-of-the-art accuracy against previously published approaches. In particular, our ensemble model surpasses human performance by 4.2%.

preprint2020arXiv

Multicell MIMO Communications Relying on Intelligent Reflecting Surface

Intelligent reflecting surfaces (IRSs) constitute a disruptive wireless communication technique capable of creating a controllable propagation environment. In this paper, we propose to invoke an IRS at the cell boundary of multiple cells to assist the downlink transmission to cell-edge users, whilst mitigating the inter-cell interference, which is a crucial issue in multicell communication systems. We aim to maximize the weighted sum rate (WSR) of all users by jointly optimizing the active precoding matrices at the base stations (BSs) and the phase shifts at the IRS, subject to each BS's power constraint and the unit modulus constraint. Both the BSs and the users are equipped with multiple antennas, which enhances the spectral efficiency by exploiting the spatial multiplexing gain. Due to the non-convexity of the problem, we first reformulate it into an equivalent one, which is solved by using the block coordinate descent (BCD) algorithm, where the precoding matrices and phase shifts are alternately optimized. The optimal precoding matrices can be obtained in closed form when fixing the phase shifts. A pair of efficient algorithms are proposed for solving the phase shift optimization problem, namely the Majorization-Minimization (MM) Algorithm and the Complex Circle Manifold (CCM) Method. Both algorithms are guaranteed to converge to at least locally optimal solutions. We also extend the proposed algorithms to the more general multiple-IRS and network MIMO scenarios. Finally, our simulation results confirm the advantages of introducing IRSs in enhancing the cell-edge user performance.

preprint2020arXiv

Mutual Information-based State-Control for Intrinsically Motivated Reinforcement Learning

In reinforcement learning, an agent learns to reach a set of goals by means of an external reward signal. In the natural world, intelligent organisms learn from internal drives, bypassing the need for external signals, which is beneficial for a wide range of tasks. Motivated by this observation, we propose to formulate an intrinsic objective as the mutual information between the goal states and the controllable states. This objective encourages the agent to take control of its environment. Subsequently, we derive a surrogate objective of the proposed reward function, which can be optimized efficiently. Lastly, we evaluate the developed framework in different robotic manipulation and navigation tasks and demonstrate the efficacy of our approach. A video showing experimental results is available at https://youtu.be/CT4CKMWBYz0

preprint2020arXiv

Numerical Analysis of History-dependent Variational-hemivariational Inequalities

In this paper, numerical analysis is carried out for a class of history-dependent variational-hemivariational inequalities arising in contact problems. Three different numerical treatments for temporal discretization are proposed to approximate the continuous model. Fixed-point iteration algorithms are employed to implement the implicit scheme and the convergence is proved with a convergence rate independent of the time step-size and mesh grid-size. A special temporal discretization is introduced for the history-dependent operator, leading to numerical schemes for which the unique solvability and error bounds for the temporally discrete systems can be proved without any restriction on the time step-size. As for spatial approximation, the finite element method is applied and an optimal order error estimate for the linear element solutions is provided under appropriate regularity assumptions. Numerical examples are presented to illustrate the theoretical results.

preprint2020arXiv

Octopus: Privacy-Preserving Collaborative Evaluation of Loan Stacking

With the rise of online lenders, the loan stacking problem has become a significant issue in the financial industry. One of the key steps in the fight against it is the querying of the loan history of a borrower from peer lenders. This is especially important in markets without a trusted credit bureau. To protect participants' privacy and business interests, we want to hide borrower identities and lenders' data from the loan originator, while simultaneously verifying that the borrower authorizes the query. In this paper, we propose Octopus, a distributed system to execute the query while meeting all the above security requirements. Theoretically, Octopus is sound. Practically, it integrates multiple optimizations to reduce communication and computation overhead. Evaluation shows that Octopus can run on 800 geographically distributed servers and can perform a query within about 0.5 seconds on average.

preprint2020arXiv

On Uplink Performance of Multiuser Massive MIMO Relay Network With Limited RF Chains

This paper considers a multiuser massive multiple-input multiple-output uplink with the help of an analog amplify-and-forward relay. The base station is equipped with a large array of $N_d$ antennas but is supported by a far smaller number of radio-frequency chains. By first deriving new results for a cascaded phase-aligned two-hop channel, we obtain a tight bound for the ergodic rate in closed form for both perfect and quantized channel phase information. The rate is characterized as a function of a scaled equivalent signal-to-noise ratio of the two-hop channel. It implies that the source and relay powers can be respectively scaled down as $1/N_d^a$ and $1/N_d^{1-a}~ (0\!\leq\!a\!\leq\!1)$ for an asymptotically unchanged sum rate. Then, for rate maximization, the power allocation problem is solved with closed-form solutions. Simulation results verify the observations from our derived results.

preprint2020arXiv

PrivPy: Enabling Scalable and General Privacy-Preserving Machine Learning

We introduce PrivPy, a practical privacy-preserving collaborative computation framework, especially optimized for machine learning tasks. PrivPy provides an easy-to-use and highly compatible Python programming front-end which supports high-level array operations and different secure computation engines to allow for security assumptions and performance trade-offs. With PrivPy, programmers can write modern machine learning algorithms conveniently and efficiently in Python. We also design and implement a new efficient computation engine, with which people can use competing cloud providers to efficiently perform general arithmetic over real numbers. We demonstrate the usability and scalability of PrivPy using common machine learning models (e.g., logistic regression and convolutional neural networks) and real-world datasets (including a 5000-by-1-million matrix).

preprint2020arXiv

Spectral and Energy Efficiency of IRS-Assisted MISO Communication with Hardware Impairments

In this letter, we analyze the spectral and energy efficiency of an intelligent reflecting surface (IRS)-assisted multiple-input single-output (MISO) downlink system with hardware impairments. An extended error vector magnitude (EEVM) model is utilized to characterize the impact of radio-frequency (RF) impairments at the access point (AP) and phase noise is considered for the imperfect IRS. We show that the spectral efficiency is limited due to the hardware impairments even when the numbers of AP antennas and IRS elements grow infinitely large, which is in contrast with the conventional case with ideal hardware. Moreover, the performance degradation at high SNR is shown to be mainly affected by the AP hardware impairments rather than the phase noise of IRS. We further obtain the optimal transmit power in closed form for energy efficiency maximization. Simulation results are provided to verify these results.

preprint2019arXiv

Optimal Multi-View Video Transmission in Multiuser Wireless Networks by Exploiting Natural and View Synthesis-Enabled Multicast Opportunities

Multi-view videos (MVVs) provide immersive viewing experience, at the cost of traffic load increase for wireless networks. In this paper, we would like to optimize MVV transmission in a multiuser wireless network by exploiting both natural multicast opportunities and view synthesis-enabled multicast opportunities. Specifically, we first establish a mathematical model to specify view synthesis at the server and each user, and characterize its impact on multicast opportunities. This model is highly nontrivial and fundamentally enables the optimization of view synthesis-based multicast opportunities. For given video quality requirements of all users, we consider the optimization of view selection, transmission time and power allocation to minimize the average weighted sum energy consumption for view transmission and synthesis. In addition, under the energy consumption constraints at the server and each user respectively, we consider the optimization of view selection, transmission time and power allocation and video quality selection to maximize the total utility. These two optimization problems are challenging mixed discrete-continuous optimization problems. For the first problem, we propose an algorithm to obtain an optimal solution with reduced computational complexity by exploiting optimality properties. For each problem, to reduce computational complexity, we also propose a low-complexity algorithm to obtain a suboptimal solution, using Difference of Convex (DC) programming. Finally, numerical results show the advantage of the proposed solutions over existing ones, and demonstrate the importance of the optimization of view synthesis-enabled multicast opportunities in MVV transmission.

preprint2019arXiv

Optimal Multi-View Video Transmission in OFDMA Systems

In this letter, we study the transmission of a multi-view video (MVV) to multiple users in an Orthogonal Frequency Division Multiple Access (OFDMA) system. To maximally improve transmission efficiency, we exploit both natural multicast opportunities and view synthesis-enabled multicast opportunities. First, we establish a communication model for transmission of an MVV to multiple users in an OFDMA system. Then, we formulate the minimization problem of the average weighted sum energy consumption for view transmission and synthesis with respect to view selection, transmission power, and subcarrier allocation. The optimization problem is a challenging mixed discrete-continuous optimization problem with huge numbers of variables and constraints. A low-complexity algorithm is proposed to obtain a suboptimal solution. Finally, numerical results further demonstrate the value of view synthesis-enabled multicast opportunities for MVV transmission in OFDMA systems.

preprint2019arXiv

Secrecy Rate Maximization for Intelligent Reflecting Surface Assisted Multi-Antenna Communications

We investigate transmission optimization for intelligent reflecting surface (IRS) assisted multi-antenna systems from the physical-layer security perspective. The design goal is to maximize the system secrecy rate subject to the source transmit power constraint and the unit modulus constraints imposed on phase shifts at the IRS. To solve this complicated non-convex problem, we develop an efficient alternating algorithm where the solutions to the transmit covariance of the source and the phase shift matrix of the IRS are achieved in closed form and semi-closed form, respectively. The convergence of the proposed algorithm is guaranteed theoretically. Simulation results validate the performance advantage of the proposed optimized design.