Source author record

Dong Liu

Dong Liu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Catalog footprint

What is connected

74 works

38 topics

4 close collaborators

preprint, 2026, arXiv

A Proof-of-Concept Study of Multitask Learning for Cranial Synthetic CT Generation Across Heterogeneous MRI Field Strengths

Accurate synthesis of computed tomography (CT) images from magnetic resonance imaging (MRI) is clinically valuable for cranial applications such as attenuation correction, radiotherapy planning, and image-guided interventions. However, heterogeneity across MRI field strengths and acquisition protocols limits the generalizability of existing methods. In this study, we formulate cranial CT synthesis as a modular, structurally coupled problem and propose a deep learning framework to improve robustness across heterogeneous MRI conditions. The model is designed to adapt to variations in field strength and imaging protocols while preserving anatomical consistency. Experiments on multi-site datasets demonstrate improved performance and generalization compared with conventional approaches. The proposed method enables reliable CT synthesis across heterogeneous MRI settings, supporting broader clinical translation.

preprint, 2025, arXiv

Ultrahigh-Energy Gamma-ray Emission Associated with Black Hole-Jet Systems

Black holes (BH), one of the most intriguing objects in the universe, can manifest themselves through electromagnetic radiation initiated by the accretion flow. Some stellar-mass BHs drive relativistic jets when accreting matter from their companion stars, forming microquasars. Non-thermal emission from the radio to tera-electronvolt (TeV) gamma-ray band has been observed from microquasars, indicating the acceleration of relativistic particles. Here we report the detection of four microquasars (SS 433, V4641 Sgr, GRS 1915+105, MAXI J1820+070) with spectra extending to the ultrahigh-energy (UHE; photon energy $E>100$ TeV) band and one microquasar (Cygnus X-1) with a spectrum approaching 100 TeV, using the Large High Altitude Air Shower Observatory (LHAASO). Notably, the total emission associated with SS 433 cannot be interpreted with a single leptonic component. In the UHE band, its emission is in spatial coincidence with a giant atomic cloud, which is consistent with a hadronic origin. An elongated source is discovered from V4641 Sgr with the spectrum continuing up to 800 TeV. The detection of UHE gamma rays demonstrates that accreting BHs and their environments can operate as extremely efficient accelerators of particles out of 1 peta-electronvolt (PeV), suggesting microquasars to be important contributors to Galactic cosmic rays especially around the 'knee' region.

preprint, 2024, arXiv

Rotating black hole mimicker surrounded by the string cloud

Traversable wormholes and regular black holes usually represent completely different scenarios. But in the black bounce spacetime they can be described by the same line element, which is very attractive. Furthermore, the black hole photos taken by EHT show that black holes have spin, so spin is an indispensable intrinsic property of black holes in the actual universe. In this work, we derive a rotating black hole mimicker surrounded by the string cloud (SC), which can be interpolated to represent regular black hole spacetime and traversable wormhole spacetime. We investigate the effect of the spin $a$ and SC parameter $L$ on the observables (shadow radius $R_s$ and distortion $δ_s$) and energy emission rate of the black hole mimicker surrounded by the SC. We find that the shadow for this spacetime is very sensitive to $L$, i.e., the SC parameter can significantly increase the boundary of the shadow.

preprint, 2022, arXiv

A nonlinear weighted anisotropic total variation regularization for electrical impedance tomography

This paper proposes a nonlinear weighted anisotropic total variation (NWATV) regularization technique for electrical impedance tomography (EIT). The key idea is to incorporate the internal inhomogeneity information (e.g., edges of the detected objects) into the EIT reconstruction process, aiming to preserve the conductivity profiles (to be detected). We study the NWATV image reconstruction by employing a novel soft thresholding based reformulation included in the alternating direction method of multipliers (ADMM). To evaluate the proposed approach, 2D and 3D numerical experiments and human EIT lung imaging are carried out. It is demonstrated that the properties of the internal inhomogeneity are well preserved and improved with the proposed regularization approach, in comparison to traditional total variation (TV) and recently proposed fidelity embedded regularization approaches. Owing to the simplicity of the proposed method, the computational cost is significantly decreased compared with the well established primal-dual algorithm. Meanwhile, it was found that the proposed regularization method is quite robust to the measurement noise, which is one of the main uncertainties in EIT.
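The abstract above refers to a soft-thresholding step inside ADMM. As a rough illustration only, the standard elementwise soft-thresholding (shrinkage) operator can be sketched as follows. The function name `soft_threshold` and the sample values are hypothetical, and this is the generic operator, not the paper's NWATV reformulation.

```python
def soft_threshold(values, threshold):
    """Elementwise soft-thresholding: S_t(x) = sign(x) * max(|x| - t, 0).

    Generic shrinkage operator often used in ADMM-type solvers for
    total-variation-like penalties (illustrative, not the paper's code).
    """
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    result = []
    for x in values:
        magnitude = abs(x) - threshold
        if magnitude <= 0:
            # Entries smaller than the threshold are set to zero.
            result.append(0.0)
        elif x > 0:
            result.append(magnitude)
        else:
            result.append(-magnitude)
    return result


# Hypothetical sample input: large entries shrink, small ones vanish.
shrunk = soft_threshold([3.0, -0.5, 0.2, -2.0], 1.0)
print(shrunk)
```

Entries with magnitude below the threshold are zeroed and the rest are pulled toward zero, which is why the operator tends to keep sharp edges while suppressing small fluctuations.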

preprint, 2022, arXiv

Attribute Artifacts Removal for Geometry-based Point Cloud Compression

Geometry-based point cloud compression (G-PCC) can achieve remarkable compression efficiency for point clouds. However, it still leads to serious attribute compression artifacts, especially under low bitrate scenarios. In this paper, we propose a Multi-Scale Graph Attention Network (MS-GAT) to remove the artifacts of point cloud attributes compressed by G-PCC. We first construct a graph based on point cloud geometry coordinates and then use the Chebyshev graph convolutions to extract features of point cloud attributes. Considering that one point may be correlated with points both near and far away from it, we propose a multi-scale scheme to capture the short- and long-range correlations between the current point and its neighboring and distant points. To address the problem that various points may have different degrees of artifacts caused by adaptive quantization, we introduce the quantization step per point as an extra input to the proposed network. We also incorporate a weighted graph attentional layer into the network to pay special attention to the points with more attribute artifacts. To the best of our knowledge, this is the first attribute artifacts removal method for G-PCC. We validate the effectiveness of our method over various point clouds. Objective comparison results show that our proposed method achieves an average of 9.74% BD-rate reduction compared with Predlift and 10.13% BD-rate reduction compared with RAHT. Subjective comparison results present that visual artifacts such as color shifting, blurring, and quantization noise are reduced.

preprint, 2022, arXiv

CERL: A Unified Optimization Framework for Light Enhancement with Realistic Noise

Low-light images captured in the real world are inevitably corrupted by sensor noise. Such noise is spatially variant and highly dependent on the underlying pixel intensity, deviating from the oversimplified assumptions in conventional denoising. Existing light enhancement methods either overlook the important impact of real-world noise during enhancement, or treat noise removal as a separate pre- or post-processing step. We present Coordinated Enhancement for Real-world Low-light Noisy Images (CERL), which seamlessly integrates light enhancement and noise suppression parts into a unified and physics-grounded optimization framework. For the real low-light noise removal part, we customize a self-supervised denoising model that can easily be adapted without referring to clean ground-truth images. For the light enhancement part, we also improve the design of a state-of-the-art backbone. The two parts are then jointly formulated into one principled plug-and-play optimization. Our approach is compared against state-of-the-art low-light enhancement methods both qualitatively and quantitatively. Besides standard benchmarks, we further collect and test on a new realistic low-light mobile photography dataset (RLMP), whose mobile-captured photos display heavier realistic noise than those taken by high-quality cameras. CERL consistently produces the most visually pleasing and artifact-free results across all experiments. Our RLMP dataset and codes are available at: https://github.com/VITA-Group/CERL.

preprint, 2022, arXiv

Design, Uncertainty Analysis and Measurement of a Silicon-based Platelet THz Corrugated Horn

Platelet corrugated horns are a promising technology because they scale to large corrugated horn arrays. In this paper, we present the design, fabrication, measurement and uncertainty analysis of a wideband 170-320 GHz platelet corrugated horn that features low sidelobes across the band (<-30 dB). We also propose, for the first time, an accurate and universal method to analyze the axial misalignment of the platelets. It is based on the mode matching (MM) method with a closed-form solution to off-axis circular waveguide discontinuities obtained by using Graf's addition theorem for the Bessel functions. The uncertainties introduced in the fabrication have been quantitatively analyzed using the Monte Carlo method. The analysis shows that the cross-polarization of the corrugated horn degrades significantly with the axial misalignment. This explains well the discrepancy between the designed and the measured cross-polarization of platelet corrugated horns fabricated in the THz band. The method can be used to determine the fabrication tolerance needed for other THz corrugated horns and evaluate the impact of the corrugated horn for astronomical observations.

preprint, 2022, arXiv

Flow-Guided Transformer for Video Inpainting

We propose a flow-guided transformer, which innovatively leverages the motion discrepancy exposed by optical flows to instruct the attention retrieval in the transformer for high-fidelity video inpainting. More specifically, we design a novel flow completion network to complete the corrupted flows by exploiting the relevant flow features in a local temporal window. With the completed flows, we propagate the content across video frames, and adopt the flow-guided transformer to synthesize the remaining corrupted regions. We decouple transformers along the temporal and spatial dimensions, so that we can easily integrate the locally relevant completed flows to instruct spatial attention only. Furthermore, we design a flow-reweight module to precisely control the impact of completed flows on each spatial transformer. For the sake of efficiency, we introduce a window partition strategy to both spatial and temporal transformers. In the spatial transformer in particular, we design a dual perspective spatial MHSA, which integrates the global tokens into the window-based attention. Extensive experiments demonstrate the effectiveness of the proposed method qualitatively and quantitatively. Codes are available at https://github.com/hitachinsk/FGT.

preprint, 2022, arXiv

Local discontinuous Galerkin method for the Backward Feynman-Kac Equation

Anomalous diffusions are ubiquitous in nature, whose functional distributions are governed by the backward Feynman-Kac equation. In this paper, the local discontinuous Galerkin (LDG) method is used to solve the 2D backward Feynman-Kac equation in a rectangular domain. The spatial semi-discrete LDG scheme of the equivalent form (obtained by Laplace transform) of the original equation is established. After discussing the properties of the fractional substantial calculus, the stability and optimal convergence rates $O(h^{k+1})$ of the semi-discrete scheme are proved by choosing an appropriate generalized numerical flux. The $L1$ scheme on the graded meshes is used to deal with the weak singularity of the solution near the initial time. Based on the theoretical results of a semi-discrete scheme, we investigate the stability and convergence of the fully discrete scheme, which shows the optimal convergence rates $O(h^{k+1}+τ^{\min\{2-α,γδ\}})$. Numerical experiments are carried out to show the efficiency and accuracy of the proposed scheme. In addition, we also verify the effect of the central numerical flux on the convergence rates and the condition number of the coefficient matrix.
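The abstract above mentions an $L1$ scheme on graded meshes to handle the weak singularity near the initial time. As a hedged sketch, a commonly used graded time mesh is $t_j = T\,(j/N)^r$, which clusters points near $t = 0$. The function name `graded_mesh` and the grading exponent `grading` (the $r$ here) are assumptions for illustration; the abstract does not state the paper's exact mesh.

```python
def graded_mesh(final_time, num_steps, grading):
    """Return time points t_j = T * (j / N) ** r for j = 0..N.

    grading = 1 gives a uniform mesh; grading > 1 concentrates points
    near t = 0, where the solution is weakly singular (assumed form).
    """
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    if grading < 1:
        raise ValueError("grading must be at least 1")
    return [final_time * (j / num_steps) ** grading
            for j in range(num_steps + 1)]


# Hypothetical parameters: T = 1, N = 4, r = 2.
mesh = graded_mesh(1.0, 4, 2.0)
steps = [later - earlier for earlier, later in zip(mesh, mesh[1:])]
print(mesh)
print(steps)
```

With a grading exponent above 1 the step sizes grow with the index, so the smallest steps sit next to the initial time.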

preprint, 2022, arXiv

Motion-Focused Contrastive Learning of Video Representations

Motion, as the most distinct phenomenon in a video to involve the changes over time, has been unique and critical to the development of video representation learning. In this paper, we ask the question: how important is motion, particularly for self-supervised video representation learning? To this end, we compose a duet of exploiting the motion for data augmentation and feature learning in the regime of contrastive learning. Specifically, we present a Motion-focused Contrastive Learning (MCL) method that regards such duet as the foundation. On one hand, MCL capitalizes on optical flow of each frame in a video to temporally and spatially sample the tubelets (i.e., sequences of associated frame patches across time) as data augmentations. On the other hand, MCL further aligns gradient maps of the convolutional layers to optical flow maps from spatial, temporal and spatio-temporal perspectives, in order to ground motion information in feature learning. Extensive experiments conducted on R(2+1)D backbone demonstrate the effectiveness of our MCL. On UCF101, the linear classifier trained on the representations learnt by MCL achieves 81.91% top-1 accuracy, outperforming ImageNet supervised pre-training by 6.78%. On Kinetics-400, MCL achieves 66.62% top-1 accuracy under the linear protocol. Code is available at https://github.com/YihengZhang-CV/MCL-Motion-Focused-Contrastive-Learning.

preprint, 2022, arXiv

Multiple-Objective Packet Routing Optimization for Aeronautical ad-hoc Networks

Providing Internet service above the clouds is of ever-increasing interest and in this context aeronautical ad-hoc networking (AANET) constitutes a promising solution. However, the optimization of packet routing in large ad hoc networks is quite challenging. In this paper, we develop a discrete $ε$ multi-objective genetic algorithm ($ε$-DMOGA) for jointly optimizing the end-to-end latency, the end-to-end spectral efficiency (SE), and the path expiration time (PET) that specifies how long the routing path can be relied on without re-optimizing the path. More specifically, a distance-based adaptive coding and modulation (ACM) scheme specifically designed for aeronautical communications is exploited for quantifying each link's achievable SE. Furthermore, the queueing delay at each node is also incorporated into the multiple-objective optimization metric. Our $ε$-DMOGA assisted multiple-objective routing optimization is validated by real historical flight data collected over the Australian airspace on two selected representative dates.

preprint, 2022, arXiv

Neural Compression-Based Feature Learning for Video Restoration

How to efficiently utilize the temporal features is crucial, yet challenging, for video restoration. The temporal features usually contain various noisy and uncorrelated information, and they may interfere with the restoration of the current frame. This paper proposes learning noise-robust feature representations to help video restoration. We are inspired by the observation that a neural codec is a natural denoiser. In a neural codec, the noisy and uncorrelated contents, which are hard to predict but cost many bits, are more likely to be discarded for bitrate saving. Therefore, we design a neural compression module to filter the noise and keep the most useful information in features for video restoration. To achieve robustness to noise, our compression module adopts a spatial channel-wise quantization mechanism to adaptively determine the quantization step size for each position in the latent. Experiments show that our method can significantly boost the performance on video denoising, where we obtain 0.13 dB improvement over BasicVSR++ with only 0.23x FLOPs. Meanwhile, our method also obtains SOTA results on video deraining and dehazing.

preprint, 2022, arXiv

Recurrent Dynamic Embedding for Video Object Segmentation

Space-time memory (STM) based video object segmentation (VOS) networks usually keep enlarging the memory bank every several frames, which yields excellent performance. However, 1) the hardware cannot withstand the ever-increasing memory requirements as the video length increases, and 2) storing lots of information inevitably introduces lots of noise, which is not conducive to reading the most important information from the memory bank. In this paper, we propose a Recurrent Dynamic Embedding (RDE) to build a memory bank of constant size. Specifically, we explicitly generate and update RDE by the proposed Spatio-temporal Aggregation Module (SAM), which exploits the cue of historical information. To avoid error accumulation owing to the recurrent usage of SAM, we propose an unbiased guidance loss during the training stage, which makes SAM more robust in long videos. Moreover, the predicted masks in the memory bank are inaccurate due to the inaccurate network inference, which affects the segmentation of the query frame. To address this problem, we design a novel self-correction strategy so that the network can repair the embeddings of masks with different qualities in the memory bank. Extensive experiments show our method achieves the best tradeoff between performance and speed. Code is available at https://github.com/Limingxing00/RDE-VOS-CVPR2022.

preprint, 2022, arXiv

Retinal Vessel Segmentation with Pixel-wise Adaptive Filters

Accurate retinal vessel segmentation is challenging because of the complex texture of retinal vessels and low imaging contrast. Previous methods generally refine segmentation results by cascading multiple deep networks, which are time-consuming and inefficient. In this paper, we propose two novel methods to address these challenges. First, we devise a light-weight module, named multi-scale residual similarity gathering (MRSG), to generate pixel-wise adaptive filters (PA-Filters). Different from cascading multiple deep networks, only one PA-Filter layer can improve the segmentation results. Second, we introduce a response cue erasing (RCE) strategy to enhance the segmentation accuracy. Experimental results on the DRIVE, CHASE_DB1, and STARE datasets demonstrate that our proposed method outperforms state-of-the-art methods while maintaining a compact structure. Code is available at https://github.com/Limingxing00/Retinal-Vessel-Segmentation-ISBI20222.

preprint, 2022, arXiv

The ringing of quantum corrected Schwarzschild black hole with GUP

Schwarzschild black holes with quantum corrections are studied under scalar field perturbations and electromagnetic field perturbations to analyze the effect of the correction term on the potential function and quasinormal mode (QNM). In classical general relativity, spacetime is continuous and there is no so-called minimal length. The introduction of the correction terms of the generalized uncertainty principle (GUP), the parameter $β$, can change the singularity structure of the black hole gauge and may lead to discretization in time and space. We apply the sixth-order WKB method to approximate the QNM of Schwarzschild black holes with quantum corrections and perform numerical analysis to derive the results of the method. Also, we find that the effective potential and QNM in scalar fields are larger than those in electromagnetic fields.

preprint, 2022, arXiv

Towards Hybrid-Optimization Video Coding

Video coding is essentially a mathematical optimization problem of rate and distortion. To solve this complex optimization problem, two popular video coding frameworks have been developed: block-based hybrid video coding and end-to-end learned video coding. If we rethink video coding from the perspective of optimization, we find that the existing two frameworks represent two directions of optimization solutions. Block-based hybrid coding represents the discrete optimization solution because those irrelevant coding modes are discrete in mathematics. It searches for the best one among multiple starting points (i.e. modes). However, the search is not efficient enough. On the other hand, end-to-end learned coding represents the continuous optimization solution because the gradient descent is based on a continuous function. It optimizes a group of model parameters efficiently by the numerical algorithm. However, limited by only one starting point, it is easy to fall into the local optimum. To better solve the optimization problem, we propose to regard video coding as a hybrid of the discrete and continuous optimization problem, and use both search and numerical algorithm to solve it. Our idea is to provide multiple discrete starting points in the global space and optimize the local optimum around each point by numerical algorithm efficiently. Finally, we search for the global optimum among those local optima. Guided by the hybrid optimization idea, we design a hybrid optimization video coding framework, which is built on continuous deep networks entirely and also contains some discrete modes. We conduct a comprehensive set of experiments. Compared to the continuous optimization framework, our method outperforms pure learned video coding methods. Meanwhile, compared to the discrete optimization framework, our method achieves comparable performance to HEVC reference software HM16.10 in PSNR.
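The hybrid idea in the abstract above (several discrete starting points, a numerical local optimization around each, then a search over the results) can be illustrated with a toy multi-start gradient descent. This is a minimal sketch under loud assumptions: the one-dimensional cost `objective`, the step size, and the starting points are all hypothetical, and nothing here models a video codec or a rate-distortion cost.

```python
def objective(x):
    # Toy non-convex cost with two local minima (hypothetical stand-in).
    return x ** 4 - 3 * x ** 2 + x


def gradient(x):
    # Analytic derivative of the toy cost above.
    return 4 * x ** 3 - 6 * x + 1


def descend(start, step=0.01, iterations=2000):
    """Continuous part: plain gradient descent from one starting point."""
    x = start
    for _ in range(iterations):
        x -= step * gradient(x)
    return x


def hybrid_search(starts):
    """Discrete part: refine every start locally, then keep the best."""
    candidates = [descend(s) for s in starts]
    return min(candidates, key=objective)


# Hypothetical discrete starting points playing the role of "modes".
best = hybrid_search([-2.0, 0.0, 2.0])
print(best, objective(best))
```

A single start at 2.0 settles in the shallower minimum on the right, while the multi-start search also finds the deeper one on the left, which is the motivation the abstract gives for combining search with numerical optimization.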

preprint, 2021, arXiv

Engineered Raman Lasing in Photonic Integrated Chalcogenide Microresonators

Chalcogenide glass (ChG) is an attractive material for integrated nonlinear photonics due to its wide transparency and high nonlinearity, and its capability of being directly deposited and patterned on silicon wafer substrates. It has a singular Raman effect among amorphous materials. Yet, the Raman lasing performance in high-quality, chip-integrated ChG microresonators remains unexplored. Here, we demonstrate an engineered Raman lasing dynamic based on home-developed photonic integrated high-Q ChG microresonators. With a quality factor above $10^6$, we achieve a record-low lasing threshold of 3.25 mW among integrated planar photonic platforms. Both single-mode Raman lasers and a broadband Raman-Kerr comb are observed and characterized, which depend on the dispersion of our flexible photonic platform and are engineered via tuning the waveguide geometric size. The tunability of such a chip-scale Raman laser is also demonstrated through tuning the pump wavelength and tuning the operating temperature on the chip. This allows for access to single-mode lasing at arbitrary wavelengths in the range 1615-1755 nm. Our results may contribute to the understanding of rich Raman and Kerr nonlinear interactions in dissipative and nonlinear microresonators and, on the application side, may pave the way to efficient chip-scale Raman lasers that are highly desired in spectroscopic applications in the infrared.

preprint, 2021, arXiv

Marangoni Convection-Driven Laser Fountains and Waves on Free Surfaces of Liquids

It is well accepted that an outward Marangoni convection from a low surface tension region will make the surface depressed. Here, we report that this established perception is only valid for thin liquid films. Using surface laser heating, we show that in deep liquids a laser beam actually pulls up the fluid above the free surface, generating fountains with different shapes. With decreasing liquid depth, however, a transition from fountain to indentation, with fountain-in-indentation, is observed. Further, high-speed imaging reveals a transient surface process before steady elevation is formed, and this dynamic deformation is subsequently utilized to resonantly excite giant surface waves by a modulated laser beam. Computational fluid dynamics models reveal the underlying flow patterns and quantify the depth-dependent and time-resolved surface deformations. Our discoveries and techniques have upended the century-old perception and opened up a new regime of interdisciplinary research and applications of Marangoni-induced interface phenomena and optocapillary fluidic surfaces: the control of fluids with light.

preprint, 2021, arXiv

Robust Classification using Hidden Markov Models and Mixtures of Normalizing Flows

We test the robustness of a maximum-likelihood (ML) based classifier where sequential data as observation is corrupted by noise. The hypothesis is that a generative model, that combines the state transitions of a hidden Markov model (HMM) and the neural network based probability distributions for the hidden states of the HMM, can provide a robust classification performance. The combined model is called normalizing-flow mixture model based HMM (NMM-HMM). It can be trained using a combination of expectation-maximization (EM) and backpropagation. We verify the improved robustness of NMM-HMM classifiers in an application to speech recognition.

preprint, 2021, arXiv

Soft magnetic microrobot doped with porous silica for stability-enhanced multimodal locomotion in nonideal environment

In the emerging field of robotics, magnetic-field-controlled soft microrobots have broad application prospects owing to their flexibility, locomotion diversity and remote controllability. Magnetic soft microrobots can perform multimodal locomotion under the control of a magnetic field, which may have potential applications in precision medicine. However, previous research mainly focuses on new locomotion in a relatively ideal environment and lacks exploration of the ability of magnetic microrobot locomotion to resist external disturbances and proceed in a nonideal environment. Here, a porous silica-doped soft magnetic microrobot is constructed for enhanced stability of multimodal locomotion in the nonideal biological environment. Porous silica spheres are doped into the NdFeB-silicone elastomer base, improving adhesion properties as well as refining the comprehensive mechanical properties of the microrobot. Multimodal locomotion is achieved, and the influence of porous silica doping on the stability of each locomotion mode in a nonideal environment is explored in depth. Motions in nonideal circumstances such as climbing, loading, current rushing, wind blowing, and obstacle hindering are conducted successfully with porous silica doping. Such a stability-enhanced multimodal locomotion system can be used in biocatalysis as well as thrombus removal, and its prospect for precision medicine is highlighted by in vivo demonstration of multimodal locomotion with nonideal disturbance.

preprint, 2021, arXiv

Structural engineering from an inverse problems perspective

The field of structural engineering is vast, spanning areas from the design of new infrastructure to the assessment of existing infrastructure. From the onset, traditional entry-level university courses teach students to analyse structural response given data including external forces, geometry, member sizes, restraint, etc. -- characterising a forward problem (structural causalities $\to$ structural response). Shortly thereafter, junior engineers are introduced to structural design where they aim to, for example, select an appropriate structural form for members based on design criteria, which is the inverse of what they previously learned. Similar inverse realisations also hold true in structural health monitoring and a number of structural engineering sub-fields (response $\to$ structural causalities). In this light, we aim to demonstrate that many structural engineering sub-fields may be fundamentally or partially viewed as inverse problems and thus benefit via the rich and established methodologies from the inverse problems community. To this end, we conclude that the future of inverse problems in structural engineering is inexorably linked to engineering education and machine learning developments.

preprint, 2021, arXiv

Synergy Between Semantic Segmentation and Image Denoising via Alternate Boosting

The capability of image semantic segmentation may deteriorate when the input image is noisy, in which case image denoising prior to segmentation helps. Both image denoising and semantic segmentation have been developed significantly with the advance of deep learning. Thus, we are interested in the synergy between them by using a holistic deep model. We observe that not only does denoising help combat the drop in segmentation accuracy due to noise, but pixel-wise semantic information also boosts the capability of denoising. We then propose a boosting network to perform denoising and segmentation alternately. The proposed network is composed of multiple segmentation and denoising blocks (SDBs), each of which estimates a semantic map and then uses the map to regularize denoising. Experimental results show that the denoised image quality is improved substantially and the segmentation accuracy is improved to close to that of clean images. Our code and models will be made publicly available.

preprint, 2020, arXiv

$α$ Belief Propagation for Approximate Inference

The belief propagation (BP) algorithm is a widely used message-passing method for inference in graphical models. BP on loop-free graphs converges in linear time. But for graphs with loops, BP's performance is uncertain, and the understanding of its solution is limited. To gain a better understanding of BP in general graphs, we derive an interpretable belief propagation algorithm that is motivated by minimization of a localized $α$-divergence. We term this algorithm $α$ belief propagation ($α$-BP). It turns out that $α$-BP generalizes standard BP. In addition, this work studies the convergence properties of $α$-BP. We prove and offer the convergence conditions for $α$-BP. Experimental simulations on random graphs validate our theoretical results. The application of $α$-BP to practical problems is also demonstrated.

preprint, 2020, arXiv

2-Local derivations on the Super Virasoro algebra and Super W(2,2) algebra

The present paper is devoted to the study of 2-local superderivations on the super Virasoro algebra and the super W(2,2) algebra. We prove that all 2-local superderivations on the super Virasoro algebra as well as the super W(2,2) algebra are (global) superderivations.

preprint, 2020, arXiv

A Game Theoretic Analysis of LQG Control under Adversarial Attack

Motivated by recent works addressing adversarial attacks on deep reinforcement learning, a deception attack on linear quadratic Gaussian control is studied in this paper. In the considered attack model, the adversary can manipulate the observation of the agent subject to a mutual information constraint. The adversarial problem is formulated as a novel dynamic cheap talk game to capture the strategic interaction between the adversary and the agent, the asymmetry of information availability, and the system dynamics. Necessary and sufficient conditions are provided for subgame perfect equilibria to exist in pure strategies and in behavioral strategies; and characteristics of the equilibria and the resulting control rewards are given. The results show that pure strategy equilibria are informative, while only babbling equilibria exist in behavioral strategies. Numerical results are shown to illustrate the impact of strategic adversarial interaction.

preprint, 2020, arXiv

Bottom-Up Human Pose Estimation by Ranking Heatmap-Guided Adaptive Keypoint Estimates

The typical bottom-up human pose estimation framework includes two stages, keypoint detection and grouping. Most existing works focus on developing grouping algorithms, e.g., associative embedding, and pixel-wise keypoint regression that we adopt in our approach. We present several schemes that are rarely or unthoroughly studied before for improving keypoint detection and grouping (keypoint regression) performance. First, we exploit the keypoint heatmaps for pixel-wise keypoint regression instead of separating them for improving keypoint regression. Second, we adopt a pixel-wise spatial transformer network to learn adaptive representations for handling the scale and orientation variance to further improve keypoint regression quality. Last, we present a joint shape and heatvalue scoring scheme to promote the estimated poses that are more likely to be true poses. Together with the tradeoff heatmap estimation loss for balancing the background and keypoint pixels and thus improving heatmap estimation quality, we get the state-of-the-art bottom-up human pose estimation result. Code is available at https://github.com/HRNet/HRNet-Bottom-up-Pose-Estimation.

preprint2020arXiv

Classification of simple Harish-Chandra modules over the N=1 Ramond algebra

In this paper, we give a new approach to classify all simple Harish-Chandra modules for the N=1 Ramond algebra based on the so-called A-cover theory developed in \cite{BF}.

preprint2020arXiv

Deep High-Resolution Representation Learning for Visual Recognition

High-resolution representations are essential for position-sensitive vision problems, such as human pose estimation, semantic segmentation, and object detection. Existing state-of-the-art frameworks first encode the input image as a low-resolution representation through a subnetwork that is formed by connecting high-to-low resolution convolutions in series (e.g., ResNet, VGGNet), and then recover the high-resolution representation from the encoded low-resolution representation. Instead, our proposed network, named High-Resolution Network (HRNet), maintains high-resolution representations through the whole process. There are two key characteristics: (i) the high-to-low resolution convolution streams are connected in parallel; (ii) information is repeatedly exchanged across resolutions. The benefit is that the resulting representation is semantically richer and spatially more precise. We show the superiority of the proposed HRNet in a wide range of applications, including human pose estimation, semantic segmentation, and object detection, suggesting that the HRNet is a stronger backbone for computer vision problems. All the code is available at https://github.com/HRNet.

preprint2020arXiv

Dual Temporal Memory Network for Efficient Video Object Segmentation

Video Object Segmentation (VOS) is typically formulated in a semi-supervised setting. Given the ground-truth segmentation mask on the first frame, the task of VOS is to track and segment the single or multiple objects of interest in the remaining frames of the video at the pixel level. One of the fundamental challenges in VOS is how to make the best use of the temporal information to boost the performance. We present an end-to-end network which stores short- and long-term video sequence information preceding the current frame as the temporal memories to address the temporal modeling in VOS. Our network consists of two temporal sub-networks: a short-term memory sub-network and a long-term memory sub-network. The short-term memory sub-network models the fine-grained spatial-temporal interactions between local regions across neighboring frames in video via a graph-based learning framework, which can well preserve the visual consistency of local regions over time. The long-term memory sub-network models the long-range evolution of objects via a Simplified-Gated Recurrent Unit (S-GRU), making the segmentation robust against occlusions and drift errors. In our experiments, we show that our proposed method achieves a favorable and competitive performance on three frequently-used VOS datasets, including DAVIS 2016, DAVIS 2017 and Youtube-VOS in terms of both speed and accuracy.

preprint2020arXiv

Dual-Path Transformer Network: Direct Context-Aware Modeling for End-to-End Monaural Speech Separation

The dominant speech separation models are based on complex recurrent or convolutional neural networks that model speech sequences indirectly conditioning on context, such as passing information through many intermediate states in a recurrent neural network, leading to suboptimal separation performance. In this paper, we propose a dual-path transformer network (DPTNet) for end-to-end speech separation, which introduces direct context-awareness in the modeling of speech sequences. By introducing an improved transformer, elements in speech sequences can interact directly, which enables DPTNet to model the speech sequences with direct context-awareness. The improved transformer in our approach learns the order information of the speech sequences without positional encodings by incorporating a recurrent neural network into the original transformer. In addition, the structure of dual paths makes our model efficient for extremely long speech sequence modeling. Extensive experiments on benchmark datasets show that our approach outperforms the current state of the art (20.6 dB SDR on the public WSJ0-2mix data corpus).

preprint2020arXiv

Efficient Integer-Arithmetic-Only Convolutional Neural Networks

Integer-arithmetic-only networks have been shown to be effective in reducing computational cost and ensuring cross-platform consistency. However, previous works usually report a decline in the inference accuracy when converting well-trained floating-point-number (FPN) networks into integer networks. We analyze this phenomenon and find that the decline is due to activation quantization. Specifically, when we replace conventional ReLU with Bounded ReLU, how to set the bound for each neuron is a key problem. Considering the tradeoff between activation quantization error and network learning ability, we set an empirical rule to tune the bound of each Bounded ReLU. We also design a mechanism to handle the cases of feature map addition and feature map concatenation. Based on the proposed method, our trained 8-bit integer ResNet outperforms the 8-bit networks of Google's TensorFlow and NVIDIA's TensorRT for image recognition. We also experiment on VDSR for image super-resolution and on VRCNN for compression artifact reduction, both of which serve for regression tasks that natively require high inference accuracy. Our integer networks achieve performance equivalent to that of the corresponding FPN networks, but have only 1/4 memory cost and run 2x faster on modern GPUs. Our code and models can be found at github.com/HengRuiZ/brelu.
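
As a rough illustration of the idea in this abstract, the sketch below clips an activation to a bound (Bounded ReLU) and maps the result to an unsigned 8-bit code. The function names, the uniform rounding scheme and the example bound are assumptions made for illustration; they are not taken from the paper or its repository, and the paper's empirical rule for choosing the bound is not reproduced here.

```python
def bounded_relu(x: float, bound: float) -> float:
    """Clip x to the interval [0, bound] (assumed form of Bounded ReLU)."""
    if bound <= 0:
        raise ValueError("bound must be positive")
    return min(max(x, 0.0), bound)


def quantize_activation(x: float, bound: float, bits: int = 8) -> int:
    """Map a clipped activation to an integer code in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1  # 255 for 8 bits
    return round(bounded_relu(x, bound) / bound * levels)


def dequantize_activation(q: int, bound: float, bits: int = 8) -> float:
    """Recover the approximate real-valued activation from its integer code."""
    levels = (1 << bits) - 1
    return q * bound / levels


if __name__ == "__main__":
    bound = 6.0  # hypothetical bound, chosen only for this demo
    for value in (-1.0, 0.7, 2.0, 10.0):
        code = quantize_activation(value, bound)
        print(value, code, dequantize_activation(code, bound))
```

With this uniform scheme the quantization step is bound / 255, so a smaller bound gives finer resolution inside the range but clips more activations, which is one way to read the tradeoff the abstract mentions.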

preprint2020arXiv

Foreground-Background Imbalance Problem in Deep Object Detectors: A Review

Recent years have witnessed the remarkable developments made by deep learning techniques for object detection, a fundamentally challenging problem of computer vision. Nevertheless, there are still difficulties in training accurate deep object detectors, one of which is owing to the foreground-background imbalance problem. In this paper, we survey the recent advances about the solutions to the imbalance problem. First, we analyze the characteristics of the imbalance problem in different kinds of deep detectors, including one-stage and two-stage ones. Second, we divide the existing solutions into two categories: sampling heuristics and non-sampling schemes, and review them in detail. Third, we experimentally compare the performance of some state-of-the-art solutions on the COCO benchmark. Promising directions for future work are also discussed.

preprint2020arXiv

Graph Neural Networks for Massive MIMO Detection

In this paper, we innovatively use graph neural networks (GNNs) to learn a message-passing solution for the inference task of massive multiple-input multiple-output (MIMO) detection in wireless communication. We adopt a graphical model based on the Markov random field (MRF) where belief propagation (BP) yields poor results when it assumes a uniform prior over the transmitted symbols. Numerical simulations show that, under the uniform prior assumption, our GNN-based MIMO detection solution outperforms the minimum mean-squared error (MMSE) baseline detector, in contrast to BP. Furthermore, experiments demonstrate that the performance of the algorithm slightly improves by incorporating MMSE information into the prior.

preprint2020arXiv

Is There Tradeoff between Spatial and Temporal in Video Super-Resolution?

Recent advances in deep learning have led to great success of image and video super-resolution (SR) methods that are based on convolutional neural networks (CNN). For video SR, advanced algorithms have been proposed to exploit the temporal correlation between low-resolution (LR) video frames, and/or to super-resolve a frame with multiple LR frames. These methods pursue higher quality of super-resolved frames, where the quality is usually measured frame by frame, e.g., in PSNR. However, frame-wise quality may not reveal the consistency between frames. If an algorithm is applied to each frame independently (which is the case of most previous methods), the algorithm may cause temporal inconsistency, which can be observed as flickering. It is a natural requirement to improve both frame-wise fidelity and between-frame consistency, which are termed spatial quality and temporal quality, respectively. Then we may ask, is a method optimized for spatial quality also optimized for temporal quality? Can we optimize the two quality metrics jointly?
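
To make the distinction in this abstract concrete, the sketch below computes frame-wise PSNR and, next to it, a simple between-frame measure. PSNR follows its standard definition. The second function is only a hypothetical stand-in for temporal quality (it compares frame-to-frame differences of the output with those of the reference) and is not the metric used in the paper. Frames are assumed to be flat lists of pixel values in [0, 255].

```python
import math
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equally sized frames."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("frames must be non-empty and of equal size")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def psnr(reference: Sequence[float], restored: Sequence[float], peak: float = 255.0) -> float:
    """Frame-wise PSNR in dB; infinite when the frames are identical."""
    err = mse(reference, restored)
    return math.inf if err == 0 else 10.0 * math.log10(peak ** 2 / err)


def temporal_difference_error(ref_frames, out_frames) -> float:
    """Hypothetical temporal measure: mean MSE between the consecutive-frame
    differences of the reference and those of the output (lower is smoother)."""
    if len(ref_frames) != len(out_frames) or len(ref_frames) < 2:
        raise ValueError("need two or more frames in both sequences")
    errors = []
    for t in range(1, len(ref_frames)):
        ref_delta = [c - p for p, c in zip(ref_frames[t - 1], ref_frames[t])]
        out_delta = [c - p for p, c in zip(out_frames[t - 1], out_frames[t])]
        errors.append(mse(ref_delta, out_delta))
    return sum(errors) / len(errors)
```

For example, an output that is offset from the reference by the same constant in every frame has finite PSNR but zero temporal difference error under this stand-in, which shows that the two kinds of measure need not move together.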

preprint2020arXiv

Learning Trailer Moments in Full-Length Movies

A movie's key moments stand out of the screenplay to grab an audience's attention and make movie browsing efficient. But a lack of annotations makes the existing approaches not applicable to movie key moment detection. To get rid of human annotations, we leverage the officially-released trailers as the weak supervision to learn a model that can detect the key moments from full-length movies. We introduce a novel ranking network that utilizes the Co-Attention between movies and trailers as guidance to generate the training pairs, where the moments highly correlated with trailers are expected to be scored higher than the uncorrelated moments. Additionally, we propose a Contrastive Attention module to enhance the feature representations such that the comparative contrast between features of the key and non-key moments is maximized. We construct the first movie-trailer dataset, and the proposed Co-Attention assisted ranking network shows superior performance even over the supervised approach. The effectiveness of our Contrastive Attention module is also demonstrated by the performance improvement over the state-of-the-art on the public benchmarks.

preprint2020arXiv

Neural Network based Explicit Mixture Models and Expectation-maximization based Learning

We propose two neural network based mixture models in this article. The proposed mixture models are explicit in nature. The explicit models have analytical forms with the advantages of computing likelihood and efficiency of generating samples. Computation of likelihood is an important aspect of our models. Expectation-maximization based algorithms are developed for learning parameters of the proposed models. We provide sufficient conditions to realize the expectation-maximization based learning. The main requirements are invertibility of neural networks that are used as generators and Jacobian computation of functional form of the neural networks. The requirements are practically realized using a flow-based neural network. In our first mixture model, we use multiple flow-based neural networks as generators. Naturally the model is complex. A single latent variable is used as the common input to all the neural networks. The second mixture model uses a single flow-based neural network as a generator to reduce complexity. The single generator has a latent variable input that follows a Gaussian mixture distribution. We demonstrate efficiency of proposed mixture models through extensive experiments for generating samples and maximum likelihood based classification.

preprint2020arXiv

On Dominant Interference in Random Networks and Communication Reliability

In this paper, we study the characteristics of dominant interference power with directional reception in a random network modelled by a Poisson Point Process. Additionally, the Laplace functional of cumulative interference excluding the $n$ dominant interferers is also derived, which turns out to be a generalization of omni-directional reception and complete accumulative interference. As an application of these results, we study the impact of directional receivers in random networks in terms of outage probability and error probability with queue length constraint.

preprint2020arXiv

Optimizing electrode positions in 2D Electrical Impedance Tomography using deep learning

Electrical Impedance Tomography (EIT) is a powerful tool for non-destructive evaluation, state estimation, and process tomography - among numerous other use cases. For these applications, and in order to reliably reconstruct images of a given process using EIT, we must obtain high-quality voltage measurements from the target of interest. As such, it is obvious that the locations of electrodes used for measuring play a key role in this task. Yet, to date, methods for optimally placing electrodes either require knowledge of the EIT target (which is, in practice, never fully known) or are computationally difficult to implement numerically. In this paper, we circumvent these challenges and present a straightforward deep learning based approach for optimizing electrode positions. It is found that the optimized electrode positions outperformed "standard" uniformly-distributed electrode layouts in all test cases. Further, it is found that the use of optimized electrode positions computed using the approach derived herein can reduce errors in EIT reconstructions as well as improve the distinguishability of EIT measurements.

preprint2020arXiv

Optimizing Wireless Systems Using Unsupervised and Reinforced-Unsupervised Deep Learning

Resource allocation and transceivers in wireless networks are usually designed by solving optimization problems subject to specific constraints, which can be formulated as variable or functional optimization. If the objective and constraint functions of a variable optimization problem can be derived, standard numerical algorithms can be applied for finding the optimal solution, which however incur high computational cost when the dimension of the variable is high. To reduce the on-line computational complexity, learning the optimal solution as a function of the environment's status by deep neural networks (DNNs) is an effective approach. DNNs can be trained under the supervision of optimal solutions, which, however, is not applicable to the scenarios without models or for functional optimization where the optimal solutions are hard to obtain. If the objective and constraint functions are unavailable, reinforcement learning can be applied to find the solution of a functional optimization problem, which is however not tailored to optimization problems in wireless networks. In this article, we introduce unsupervised and reinforced-unsupervised learning frameworks for solving both variable and functional optimization problems without the supervision of the optimal solutions. When the mathematical model of the environment is completely known and the distribution of the environment's status is known or unknown, we can invoke an unsupervised learning algorithm. When the mathematical model of the environment is incomplete, we introduce reinforced-unsupervised learning algorithms that learn the model by interacting with the environment. Our simulation results confirm the applicability of these learning frameworks by taking a user association problem as an example.

preprint2020arXiv

Powering Hidden Markov Model by Neural Network based Generative Models

The hidden Markov model (HMM) has been successfully used for sequential data modeling problems. In this work, we propose to power the modeling capacity of HMM by bringing in neural network based generative models. The proposed model is termed GenHMM. In the proposed GenHMM, each HMM hidden state is associated with a neural network based generative model that has tractability of exact likelihood and provides efficient likelihood computation. A generative model in GenHMM consists of a mixture of generators that are realized by flow models. A learning algorithm for GenHMM is proposed in the expectation-maximization framework. The convergence of GenHMM learning is analyzed. We demonstrate the efficiency of GenHMM by classification tasks on practical sequential data. Code available at https://github.com/FirstHandScientist/genhmm.

preprint2020arXiv

Propagation of a plane-strain hydraulic fracture accounting for a rough cohesive zone

The quasi-brittle nature of rocks challenges the basic assumptions of linear hydraulic fracture mechanics (LHFM): linear elastic fracture mechanics and smooth parallel plates lubrication fluid flow. We relax these hypotheses and investigate the growth of a plane-strain hydraulic fracture in an impermeable medium accounting for a rough cohesive zone and a fluid lag. In addition to a dimensionless toughness and the time-scale of coalescence of the fluid and fracture fronts as in the LHFM case, the solution now also depends on the in-situ-to-cohesive stress ratio and the intensity of the flow deviation induced by aperture roughness. The solution is appropriately described by a nucleation time-scale, which delineates the fracture growth into a nucleation phase, an intermediate stage and a late time stage where convergence toward LHFM predictions finally occurs. A highly non-linear hydro-mechanical coupling takes place as the fluid front enters the rough cohesive zone which itself evolves during the nucleation and intermediate stages. This coupling leads to significant additional viscous flow dissipation. As a result, the fracture evolution deviates from LHFM solutions with shorter fracture lengths, larger widths and net pressures. These deviations ultimately decrease at late times as the lag and cohesive zone fractions both become smaller. The deviations increase with larger dimensionless toughness and in-situ-to-cohesive stress ratio, as both further localize viscous dissipation near the fluid front located in the rough cohesive zone. The convergence toward LHFM can occur at very late time for realistic values of in-situ-to-cohesive stress ratio encountered at depth. The impact of a rough cohesive zone appears to be prominent for laboratory experiments and short in-situ injections in quasi-brittle rocks with ultimately a larger energy demand compared to LHFM predictions.

preprint2020arXiv

Region-based Energy Neural Network for Approximate Inference

Region-based free energy was originally proposed for generalized belief propagation (GBP) to improve loopy belief propagation (loopy BP). In this paper, we propose a neural network based energy model for inference in general Markov random fields (MRFs), which directly minimizes the region-based free energy defined on region graphs. We term our model Region-based Energy Neural Network (RENN). Unlike message-passing algorithms, RENN avoids iterative message propagation and is faster. Also different from recent deep neural network based models, inference by RENN does not require sampling, and RENN works on general MRFs. RENN can also be employed for MRF learning. Our experiments on marginal distribution estimation, partition function estimation, and learning of MRFs show that RENN outperforms the mean field method, loopy BP, GBP, and the state-of-the-art neural network based model.

preprint2020arXiv

SSFN -- Self Size-estimating Feed-forward Network with Low Complexity, Limited Need for Human Intervention, and Consistent Behaviour across Trials

We design a self size-estimating feed-forward network (SSFN) using a joint optimization approach for estimation of the number of layers, the number of nodes and learning of weight matrices. The learning algorithm has a low computational complexity, preferably running within a few minutes on a laptop. In addition, the algorithm has a limited need for human intervention to tune parameters. SSFN grows from a small-size network to a large-size network, guaranteeing a monotonically non-increasing cost with addition of nodes and layers. The learning approach uses a judicious combination of the 'lossless flow property' of some activation functions, convex optimization and an instance of a random matrix. Consistent performance -- low variation across Monte-Carlo trials -- is found for inference performance (classification accuracy) and estimation of network size.

preprint2020arXiv

Time-lapse reconstruction of the fracture front from diffracted waves arrivals in laboratory hydraulic fracture experiments

4D acoustic imaging via an array of 32 sources / 32 receivers is used to monitor a hydraulic fracture propagating in a 250 mm cubic specimen under a true-triaxial state of stress. We present a method based on the arrivals of diffracted waves to reconstruct the fracture geometry (and fluid front when distinct from the fracture front). Using Bayesian model selection, we rank different possible fracture geometries (radial, elliptical, tilted or not) and estimate model error. The imaging is repeated every 4 seconds and provides a quantitative measurement of the growth of these low velocity fractures. We test the proposed method on two experiments performed in two different rocks (marble and gabbro) under experimental conditions characteristic respectively of the fluid lag-viscosity (marble) and toughness (gabbro) dominated hydraulic fracture propagation regimes. In both experiments, about 150 to 200 source-receiver combinations exhibit clear diffracted wave arrivals. The results of the inversion indicate a radial geometry evolving slightly into an ellipse towards the end of the experiment when the fractures feel the specimen boundaries. The estimated modelling error with all models is of the order of the wave arrival picking error. Posterior estimates indicate an uncertainty of the order of a millimeter on the fracture front location for a given acquisition sequence. The reconstructed fracture evolution from diffracted waves is shown to be consistent with the analysis of $90^{\circ}$ incidence transmitted waves across the growing fracture.

preprint2020arXiv

Transferring and Regularizing Prediction for Semantic Segmentation

Semantic segmentation often requires a large set of images with pixel-level annotations. In view of extremely expensive expert labeling, recent research has shown that the models trained on photo-realistic synthetic data (e.g., computer games) with computer-generated annotations can be adapted to real images. Despite this progress, without constraining the prediction on real images, the models will easily overfit on synthetic data due to severe domain mismatch. In this paper, we exploit, in a novel way, the intrinsic properties of semantic segmentation to alleviate this problem for model transfer. Specifically, we present a Regularizer of Prediction Transfer (RPT) that imposes the intrinsic properties as constraints to regularize model transfer in an unsupervised fashion. These constraints include patch-level, cluster-level and context-level semantic prediction consistencies at different levels of image formation. As the transfer is label-free and data-driven, the robustness of prediction is addressed by selectively involving a subset of image regions for model regularization. Extensive experiments are conducted to verify the proposal of RPT on the transfer of models trained on GTA5 and SYNTHIA (synthetic data) to Cityscapes dataset (urban street scenes). RPT shows consistent improvements when injecting the constraints on several neural networks for semantic segmentation. More remarkably, when integrating RPT into the adversarial-based segmentation framework, we report the best results to date: mIoU of 53.2%/51.7% when transferring from GTA5/SYNTHIA to Cityscapes, respectively.

preprint2020arXiv

Will Scale-free Popularity Develop Scale-free Geo-social Networks?

Empirical results show that spatial factors such as distance, population density and communication range affect our social activities, also reflected by the development of ties in social networks. This motivates the need for social network models that take these spatial factors into account. Therefore, in this paper we propose a gravity-law-based geo-social network model, where connections develop according to the popularity of the individuals, but are constrained through their geographic distance and the surrounding population density. Specifically, we consider a power-law distributed popularity, and random node positions governed by a Poisson point process. We evaluate the characteristics of the emerging networks, considering the degree distribution, the average degree of neighbors and the local clustering coefficient. These local metrics reflect the robustness of the network, the information dissemination speed and the communication locality. We show that unless the communication range is strictly limited, the emerging networks are scale-free, with a rank exponent affected by the spatial factors. Even the average neighbor degree and the local clustering coefficient show tendencies known in non-geographic scale-free networks, at least when considering individuals with low popularity. At high-popularity values, however, the spatial constraints lead to popularity-independent average neighbor degrees and clustering coefficients.

preprint2019arXiv

A Comprehensive Benchmark for Single Image Compression Artifacts Reduction

We present a comprehensive study and evaluation of existing single image compression artifacts removal algorithms, using a new 4K resolution benchmark including diversified foreground objects and background scenes with rich structures, called Large-scale Ideal Ultra high definition 4K (LIU4K) benchmark. Compression artifacts removal, as a common post-processing technique, aims at alleviating undesirable artifacts such as blockiness, ringing, and banding caused by quantization and approximation in the compression process. In this work, a systematic listing of the reviewed methods is presented based on their basic models (handcrafted models and deep networks). The main contributions and novelties of these methods are highlighted, and the main development directions, including architectures, multi-domain sources, signal structures, and new targeted units, are summarized. Furthermore, based on a unified deep learning configuration (i.e. same training data, loss function, optimization algorithm, etc.), we evaluate recent deep learning-based methods based on diversified evaluation measures. The experimental results show the state-of-the-art performance comparison of existing methods based on full-reference, non-reference and task-driven metrics. Our survey would give a comprehensive reference source for future research on single image compression artifacts removal and inspire new directions of the related fields.

preprint2019arXiv

Deep Learning-Based Video Coding: A Review and A Case Study

The past decade has witnessed great success of deep learning technology in many disciplines, especially in computer vision and image processing. However, deep learning-based video coding remains in its infancy. This paper reviews the representative works about using deep learning for image/video coding, which has been an actively developing research area since 2015. We divide the related works into two categories: new coding schemes that are built primarily upon deep networks (deep schemes), and deep network-based coding tools (deep tools) that shall be used within traditional coding schemes or together with traditional coding tools. For deep schemes, pixel probability modeling and auto-encoder are the two approaches, which can be viewed as predictive coding scheme and transform coding scheme, respectively. For deep tools, there have been several proposed techniques using deep learning to perform intra-picture prediction, inter-picture prediction, cross-channel prediction, probability distribution prediction, transform, post- or in-loop filtering, down- and up-sampling, as well as encoding optimizations. In the hope of advocating the research of deep learning-based video coding, we present a case study of our developed prototype video codec, namely Deep Learning Video Coding (DLVC). DLVC features two deep tools that are both based on convolutional neural network (CNN), namely CNN-based in-loop filter (CNN-ILF) and CNN-based block adaptive resolution coding (CNN-BARC). Both tools help improve the compression efficiency by a significant margin. With the two deep tools as well as other non-deep coding tools, DLVC is able to achieve on average 39.6% and 33.0% bit savings over HEVC, under random-access and low-delay configurations, respectively. The source code of DLVC has been released for future research.

preprint2019arXiv

On The Classification-Distortion-Perception Tradeoff

Signal degradation is ubiquitous and computational restoration of degraded signal has been investigated for many years. Recently, it is reported that the capability of signal restoration is fundamentally limited by the perception-distortion tradeoff, i.e. the distortion and the perceptual difference between the restored signal and the ideal 'original' signal cannot be made both minimal simultaneously. Distortion corresponds to signal fidelity and perceptual difference corresponds to perceptual naturalness, both of which are important metrics in practice. Besides, there is another dimension worthy of consideration, namely the semantic quality or the utility for recognition purpose, of the restored signal. In this paper, we extend the previous perception-distortion tradeoff to the case of classification-distortion-perception (CDP) tradeoff, where we introduce the classification error rate of the restored signal in addition to distortion and perceptual difference. Two versions of the CDP tradeoff are considered, one using a predefined classifier and the other dealing with the optimal classifier for the restored signal. For both versions, we can rigorously prove the existence of the CDP tradeoff, i.e. the distortion, perceptual difference, and classification error rate cannot be made all minimal simultaneously. Our findings can be useful especially for computer vision research where some low-level vision tasks (signal restoration) serve for high-level vision tasks (visual understanding).

preprint2019arXiv

Photoluminescence mapping and time-domain thermo-photoluminescence for rapid imaging and measurement of thermal conductivity of boron arsenide

Cubic boron arsenide (BAs) is attracting greater attention due to the recent experimental demonstration of ultrahigh thermal conductivity κ above 1000 W/mK. However, its bandgap has not been settled and a simple yet effective method to probe its crystal quality is missing. Furthermore, traditional κ measurement methods are destructive and time consuming, thus they cannot meet the urgent demand for fast screening of high κ materials. After we experimentally established 1.82 eV as the indirect bandgap of BAs and observed room-temperature band-edge photoluminescence, we developed two new optical techniques that can provide rapid and non-destructive characterization of κ with little sample preparation: photoluminescence mapping (PL-mapping) and time-domain thermo-photoluminescence (TDTP). PL-mapping provides nearly real-time image of crystal quality and κ over mm-sized crystal surfaces; while TDTP allows us to pick up any spot on the sample surface and measure its κ using nanosecond laser pulses. These new techniques reveal that the apparent single crystals are not only non-uniform in κ, but also are made of domains of very distinct κ. Because PL-mapping and TDTP are based on the band-edge PL and its dependence on temperature, they can be applied to other semiconductors, thus paving the way for rapid identification and development of high-κ semiconducting materials.

preprint2019arXiv

Two-Stream Action Recognition-Oriented Video Super-Resolution

We study the video super-resolution (SR) problem for facilitating video analytics tasks, e.g. action recognition, instead of for visual quality. The popular action recognition methods based on convolutional networks, exemplified by two-stream networks, are not directly applicable on video of low spatial resolution. This can be remedied by performing video SR prior to recognition, which motivates us to improve the SR procedure for recognition accuracy. Tailored for two-stream action recognition networks, we propose two video SR methods for the spatial and temporal streams respectively. On the one hand, we observe that regions with action are more important to recognition, and we propose an optical-flow guided weighted mean-squared-error loss for our spatial-oriented SR (SoSR) network to emphasize the reconstruction of moving objects. On the other hand, we observe that existing video SR methods incur temporal discontinuity between frames, which also worsens the recognition accuracy, and we propose a siamese network for our temporal-oriented SR (ToSR) training that emphasizes the temporal continuity between consecutive frames. We perform experiments using two state-of-the-art action recognition networks and two well-known datasets--UCF101 and HMDB51. Results demonstrate the effectiveness of our proposed SoSR and ToSR in improving recognition accuracy.
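
The optical-flow guided weighted mean-squared-error loss mentioned in this abstract can be pictured with the sketch below, which gives larger weights to pixels with larger flow magnitude. The weighting rule (1 + alpha * |flow|), the normalization by the sum of weights, and the names used are assumptions for illustration only; the abstract does not give the exact form of the loss.

```python
import math
from typing import Sequence, Tuple


def flow_weighted_mse(
    target: Sequence[float],
    output: Sequence[float],
    flow: Sequence[Tuple[float, float]],
    alpha: float = 1.0,
) -> float:
    """Weighted MSE over flat pixel lists; flow holds one (dx, dy) per pixel.

    Assumed weighting: w = 1 + alpha * |flow|, so moving pixels count more.
    """
    if not (len(target) == len(output) == len(flow)) or len(target) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    weights = [1.0 + alpha * math.hypot(dx, dy) for dx, dy in flow]
    weighted_error = sum(w * (t - o) ** 2 for w, t, o in zip(weights, target, output))
    return weighted_error / sum(weights)
```

With zero flow everywhere the function reduces to the ordinary mean squared error, and an error located on a moving pixel raises the loss more than the same error on a static pixel.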

preprint2016arXiv

Cache-enabled Heterogeneous Cellular Networks: Comparison and Tradeoffs

Caching popular contents at base stations (BSs) is a promising way to unleash the potential of cellular heterogeneous networks (HetNets), where backhaul has become a bottleneck. In this paper, we compare a cache-enabled HetNet where a tier of multi-antenna macro BSs is overlaid by a tier of helper nodes having caches but no backhaul with a conventional HetNet where the macro BS tier is overlaid by a tier of pico BSs with limited-capacity backhaul. We resort to stochastic geometry theory to derive the area spectral efficiencies (ASEs) of these two kinds of HetNets and obtain the closed-form expressions under a special case. We use numerical results to show that the helper density needs to be only 1/4 of the pico BS density to achieve the same target ASE, and the helper density can be further reduced by increasing cache capacity. With a given total cache capacity within an area, there exists an optimal helper node density that maximizes the ASE.

preprint2016arXiv

Energy Efficiency of Downlink Networks with Caching at Base Stations

Caching popular contents at base stations (BSs) can reduce the backhaul cost and improve the network throughput. Yet whether locally caching at the BSs can improve the energy efficiency (EE), a major goal for 5th generation cellular networks, remains unclear. Due to the entangled impact of various factors on EE such as interference level, backhaul capacity, BS density, power consumption parameters, BS sleeping, content popularity and cache capacity, another important question is which key factors contribute more to the EE gain from caching. In this paper, we attempt to explore the potential of EE of the cache-enabled wireless access networks and identify the key factors. By deriving a closed-form expression of the approximated EE, we provide the condition when the EE can benefit from caching, find the optimal cache capacity that maximizes the network EE, and analyze the maximal EE gain brought by caching. We show that caching at the BSs can improve the network EE when power efficient cache hardware is used. When local caching has EE gain over not caching, caching more contents at the BSs may not provide higher EE. Numerical and simulation results show that the caching EE gain is large when the backhaul capacity is stringent, interference level is low, content popularity is skewed, and when caching at pico BSs instead of macro BSs.

preprint2016arXiv

Gravitational wave astronomy: the current status

In the centenary year of Einstein's General Theory of Relativity, this paper reviews the current status of gravitational wave astronomy across a spectrum which stretches from attohertz to kilohertz frequencies. Sect. 1 of this paper reviews the historical development of gravitational wave astronomy from Einstein's first prediction to our current understanding of the spectrum. It is shown that detection of signals in the audio frequency spectrum can be expected very soon, and that a north-south pair of next generation detectors would provide large scientific benefits. Sect. 2 reviews the theory of gravitational waves and the principles of detection using laser interferometry. The state of the art Advanced LIGO detectors are then described. These detectors have a high chance of detecting the first events in the near future. Sect. 3 reviews the KAGRA detector currently under development in Japan, which will be the first laser interferometer detector to use cryogenic test masses. Sect. 4 of this paper reviews gravitational wave detection in the nanohertz frequency band using the technique of pulsar timing. Sect. 5 reviews the status of gravitational wave detection in the attohertz frequency band, detectable in the polarisation of the cosmic microwave background, and discusses the prospects for detection of primordial waves from the big bang. The techniques described in sects. 1-5 have already placed significant limits on the strength of gravitational wave sources. Sects. 6 and 7 review ambitious plans for future space based gravitational wave detectors in the millihertz frequency band. Sect. 6 presents a roadmap for development of space based gravitational wave detectors by China while sect. 7 discusses a key enabling technology for space interferometry known as time delay interferometry.

preprint2016arXiv

Insulating nature of strongly correlated massless Dirac fermions in an organic crystal

Through resistivity measurements of an organic crystal hosting massless Dirac fermions with a charge-ordering instability, we reveal the effect of interactions among Dirac fermions on the charge transport. A low-temperature resistivity upturn appears robustly irrespective of pressure and is enhanced while approaching the critical pressure of charge ordering, indicating that the insulating behavior originates from short-range Coulomb interactions. Observation of an apparently vanishing gap in the charge-ordered phase accords with the theoretical prediction of the non-topological edge states.

preprint2016arXiv

Optimal Content Placement for Offloading in Cache-enabled Heterogeneous Wireless Networks

Caching at base stations (BSs) is a promising way to offload traffic and eliminate backhaul bottleneck in heterogeneous networks (HetNets). In this paper, we investigate the optimal content placement maximizing the successful offloading probability in a cache-enabled HetNet where a tier of multi-antenna macro BSs (MBSs) is overlaid with a tier of helpers with caches. Based on a probabilistic caching framework, we resort to stochastic geometry theory to derive the closed-form successful offloading probability and formulate the caching probability optimization problem, which is not concave in general. In two extreme cases with high and low user-to-helper density ratios, we obtain the optimal caching probability and analyze the impacts of BS density and transmit power of the two tiers and the signal-to-interference-plus-noise ratio (SINR) threshold. In the general case, we obtain the optimal caching probability that maximizes the lower bound of successful offloading probability and analyze the impact of user density. Simulation and numerical results show that when the ratios of MBS-to-helper density, MBS-to-helper transmit power and user-to-helper density, and the SINR threshold are large, the optimal caching policy tends to cache the most popular files everywhere.

preprint2016arXiv

Spin excitations in the quasi-two-dimensional charge-ordered insulator $α$-(BEDT-TTF)$_2$I$_3$ probed via $^{13}$C NMR

The spin excitations from the nonmagnetic charge-ordered insulating state of $α$-(BEDT-TTF)$_2$I$_3$ at ambient pressure have been investigated by probing the static and low-frequency dynamic spin susceptibilities via site-selective nuclear magnetic resonance at $^{13}$C sites. The site-dependent values of the shift and the spin-lattice relaxation rate $1/T_1$ below the charge-ordering transition temperature ($T_{CO} \approx$ 135 K) demonstrate a spin density imbalance in the unit cell, in accord with the charge-density ratio reported earlier. The shift and $1/T_1$ show activated temperature dependence with a static (shift) gap $Δ_S \approx$ 47-52 meV and a dynamic ($1/T_1$) gap $Δ_R \approx$ 40 meV. The sizes of the gaps are well described in terms of a localized spin model, where spin one-half antiferromagnetic dimer chains are weakly coupled with each other.

preprint2016arXiv

The next detectors for gravitational wave astronomy

This paper focuses on the next detectors for gravitational wave astronomy which will be required after the current ground based detectors have completed their initial observations, and probably achieved the first direct detection of gravitational waves. The next detectors will need to have greater sensitivity, while also enabling the world array of detectors to have improved angular resolution to allow localisation of signal sources. Sect. 1 of this paper begins by reviewing proposals for the next ground based detectors, and presents an analysis of the sensitivity of an 8 km armlength detector, which is proposed as a safe and cost-effective means to attain a 4-fold improvement in sensitivity. The scientific benefits of creating a pair of such detectors in China and Australia are emphasised. Sect. 2 of this paper discusses the high performance suspension systems for test masses that will be an essential component for future detectors, while sect. 3 discusses solutions to the problem of Newtonian noise, which arises from fluctuations in gravity gradient forces acting on test masses. Such gravitational perturbations cannot be shielded, and set limits to low frequency sensitivity unless measured and suppressed. Sects. 4 and 5 address critical operational technologies that will be ongoing issues in future detectors. Sect. 4 addresses the design of thermal compensation systems needed in all high optical power interferometers operating at room temperature. Parametric instability control is addressed in sect. 5. Only recently proven to occur in Advanced LIGO, the parametric instability phenomenon brings both risks and opportunities for future detectors. The path to future enhancements of detectors will come from quantum measurement technologies. Sect. 6 focuses on the use of optomechanical devices for obtaining enhanced sensitivity, while sect. 7 reviews a range of quantum measurement options.

preprint2015arXiv

Classification of Harish-Chandra modules over some Lie algebras related to the Virasoro algebra

In this paper, we provide a uniform method to thoroughly classify all Harish-Chandra modules over some Lie algebras related to the Virasoro algebra. We first classify such modules over the Lie algebra $W(\varrho)[s]$ for $s=0,\frac12$. With this result and method, we can also carry out such classifications for some Lie algebras and superconformal algebras related to the Virasoro algebra, including several kinds of Schrödinger-Virasoro Lie algebras, for which the problem has been open until now.

preprint2015arXiv

EventNet: A Large Scale Structured Concept Library for Complex Event Detection in Video

Event-specific concepts are the semantic concepts designed for the events of interest, which can be used as a mid-level representation of complex events in videos. Existing methods only focus on defining event-specific concepts for a small number of predefined events, but cannot handle novel unseen events. This motivates us to build a large scale event-specific concept library that covers as many real-world events and their concepts as possible. Specifically, we choose WikiHow, an online forum containing a large number of how-to articles on human daily life events. We perform a coarse-to-fine event discovery process and discover 500 events from WikiHow articles. Then we use each event name as a query to search YouTube and discover event-specific concepts from the tags of returned videos. After an automatic filter process, we end up with 95,321 videos and 4,490 concepts. We train a Convolutional Neural Network (CNN) model on the 95,321 videos over the 500 events, and use the model to extract deep learning features from video content. With the learned deep learning features, we train 4,490 binary SVM classifiers as the event-specific concept library. The concepts and events are further organized in a hierarchical structure defined by WikiHow, and the resultant concept library is called EventNet. Finally, the EventNet concept library is used to generate concept based representation of event videos. To the best of our knowledge, EventNet represents the first video event ontology that organizes events and their concepts into a semantic structure. It offers great potential for event retrieval and browsing. Extensive experiments over the zero-shot event retrieval task when no training samples are available show that the EventNet concept library consistently and significantly outperforms the state-of-the-art (such as the 20K ImageNet concepts trained with CNN) by a large margin up to 207%.

preprint2015arXiv

Representations of the affine-Virasoro algebra of type $A_1$

In this paper, we classify all irreducible weight modules with finite-dimensional weight spaces over the affine-Virasoro Lie algebra of type $A_1$.

preprint2014arXiv

Building A Large Concept Bank for Representing Events in Video

Concept-based video representation has proven to be effective in complex event detection. However, existing methods either manually design concepts or directly adopt concept libraries not specifically designed for events. In this paper, we propose to build Concept Bank, the largest concept library consisting of 4,876 concepts specifically designed to cover 631 real-world events. To construct the Concept Bank, we first gather a comprehensive event collection from WikiHow, a collaborative writing project that aims to build the world's largest manual for any possible How-To event. For each event, we then search Flickr and discover relevant concepts from the tags of the returned images. We train a Multiple Kernel Linear SVM for each discovered concept as a concept detector in Concept Bank. We organize the concepts into a five-layer tree structure, in which the higher-level nodes correspond to the event categories while the leaf nodes are the event-specific concepts discovered for each event. Based on such tree ontology, we develop a semantic matching method to select relevant concepts for each textual event query, and then apply the corresponding concept detectors to generate concept-based video representations. We use TRECVID Multimedia Event Detection 2013 and Columbia Consumer Video open source event definitions and videos as our test sets and show very promising results on two video event detection tasks: event modeling over concept space and zero-shot event retrieval. To the best of our knowledge, this is the largest concept library covering the largest number of real-world events.

preprint2014arXiv

Estimation of conductivity changes in a region of interest with electrical impedance tomography

This paper proposes a novel approach to reconstruct changes in a target conductivity from electrical impedance tomography measurements. As in the conventional difference imaging, the reconstruction of the conductivity change is based on electrical potential measurements from the exterior boundary of the target before and after the change. In this paper, however, images of the conductivity before and after the change are reconstructed simultaneously based on the two data sets. The key feature of the approach is that the conductivity after the change is parameterized as a linear combination of the initial state and the change. This allows for modeling independently the spatial characteristics of the background conductivity and the change of the conductivity - by separate regularization functionals. The approach also allows in a straightforward way the restriction of the conductivity change to a localized region of interest inside the domain. While conventional difference imaging reconstruction is based on a global linearization of the observation model, the proposed approach amounts to solving a non-linear inverse problem. The feasibility of the proposed reconstruction method is tested experimentally and with a simulation which demonstrates a potential new medical application of electrical impedance tomography: imaging of vocal folds in voice loading studies.

preprint2014arXiv

Field and long-term demonstration of a wide area quantum key distribution network

A wide area quantum key distribution (QKD) network deployed on communication infrastructures provided by China Mobile Ltd. is demonstrated. Three cities and two metropolitan area QKD networks were linked up to form the Hefei-Chaohu-Wuhu wide area QKD network with over 150 kilometers coverage area, in which Hefei metropolitan area QKD network was a typical full-mesh core network to offer all-to-all interconnections, and Wuhu metropolitan area QKD network was a representative quantum access network with point-to-multipoint configuration. The whole wide area QKD network ran for more than 5000 hours, from 21 December 2011 to 19 July 2012, and part of the network stopped until last December. To adapt to the complex and volatile field environment, the Faraday-Michelson QKD system with several stability measures was adopted when we designed QKD devices. Through standardized design of QKD devices, resolution of symmetry problem of QKD devices, and seamless switching in dynamic QKD network, we realized the effective integration between point-to-point QKD techniques and networking schemes.

preprint2013arXiv

$\propto$SVM for learning with label proportions

We study the problem of learning with label proportions in which the training data is provided in groups and only the proportion of each class in each group is known. We propose a new method called proportion-SVM, or $\propto$SVM, which explicitly models the latent unknown instance labels together with the known group label proportions in a large-margin framework. Unlike the existing works, our approach avoids making restrictive assumptions about the data. The $\propto$SVM model leads to a non-convex integer programming problem. In order to solve it efficiently, we propose two algorithms: one based on simple alternating optimization and the other based on a convex relaxation. Extensive experiments on standard datasets show that $\propto$SVM outperforms the state-of-the-art, especially for larger group sizes.

preprint2012arXiv

Field test of the wavelength-saving quantum key distribution network

We propose a wavelength-saving topology of quantum key distribution (QKD) network based on passive optical elements, and report the field test of this network on commercial telecom optical fiber. In this network, 5 nodes are supported with 2 wavelengths, and every two nodes can share secure keys directly at the same time. All QKD links in the network operate at the frequency of 20 MHz. We also characterized the insertion loss and crosstalk effects on the point-to-point QKD system after introducing this QKD network.

preprint2012arXiv

Lie bialgebra structures on the twisted Heisenberg-Virasoro algebra

In this paper we investigate Lie bialgebra structures on the twisted Heisenberg-Virasoro algebra. Using the classification of Lie bialgebra structures on the Virasoro algebra, we determine such structures on the twisted Heisenberg-Virasoro algebra. Moreover, some general and useful results are obtained. With our methods and results we can also easily determine such structures on some Lie algebras related to the twisted Heisenberg-Virasoro algebra.

preprint2011arXiv

Attacking practical quantum key distribution system with wavelength dependent beam splitter and multi-wavelength sources

Unconditional security of quantum key distribution protocols can be guaranteed by the basic properties of quantum mechanics. Unfortunately, practical quantum key distribution systems always have some imperfections, and a practical system may be attacked if the imperfection can be controlled by the eavesdropper Eve. Exploiting the fatal security loophole introduced by the imperfect beam splitter's wavelength dependent optical property, we propose a wavelength-dependent attacking model, which can be applied to almost all practical quantum key distribution systems with passive state modulation and photon state detection after the practical beam splitter. Utilizing our attacking model, we experimentally demonstrate the attacking system based on a practical polarization encoding quantum key distribution system with almost 100% success probability. Our results demonstrate that all practical devices require tightened security inspection for avoiding side channel attacks in practical quantum key distribution experimental realizations.

preprint2011arXiv

Leibniz superalgebras graded by finite root systems

The structure of Lie algebras, Lie superalgebras and Leibniz algebras graded by finite root systems has been studied by several researchers since 1992. In this paper, we study the structure of Leibniz superalgebras graded by finite root systems, which gives an approach to study various classes of Leibniz superalgebras.

preprint2011arXiv

Lie superbialgebra structures on the N=2 superconformal Neveu-Schwarz algebra

In this paper, Lie superbialgebra structures on the N=2 superconformal Neveu-Schwarz algebra are considered by a very simple method. We prove that every Lie superbialgebra structure on the algebra is triangular coboundary.

preprint2009arXiv

Leibniz Algebras Graded by Finite Root Systems

There have been several studies of Lie algebras and Lie superalgebras graded by finite root systems. In this paper, we study Leibniz algebras graded by finite root systems and obtain some results in simply-laced cases.

preprint2009arXiv

Whittaker Modules for the twisted Heisenberg-Virasoro Algebra

We define Whittaker modules for the twisted Heisenberg-Virasoro algebra and obtain analogues to several results from the classical setting, including a classification of simple Whittaker modules by central characters.

preprint2008arXiv

Classification of irreducible weight modules over $W$-algebra W(2,2)

We show that the support of an irreducible weight module over the $W$-algebra $W(2, 2)$, which has an infinite dimensional weight space, coincides with the weight lattice and that all nontrivial weight spaces of such a module are infinite dimensional. As a corollary, we obtain that every irreducible weight module over the $W$-algebra $W(2, 2)$, having a nontrivial finite dimensional weight space, is a Harish-Chandra module (and hence is either an irreducible highest or lowest weight module or an irreducible module of the intermediate series).

preprint2008arXiv

Harish-Chandra Modules Over the Twisted Heisenberg-Virasoro Algebra

In this paper, we classify all indecomposable Harish-Chandra modules of the intermediate series over the twisted Heisenberg-Virasoro algebra. Meanwhile, some bosonic modules are also studied.

74 published item(s)

A Proof-of-Concept Study of Multitask Learning for Cranial Synthetic CT Generation Across Heterogeneous MRI Field Strengths

Ultrahigh-Energy Gamma-ray Emission Associated with Black Hole-Jet Systems

Rotating black hole mimicker surrounded by the string cloud

A nonlinear weighted anisotropic total variation regularization for electrical impedance tomography

Attribute Artifacts Removal for Geometry-based Point Cloud Compression

CERL: A Unified Optimization Framework for Light Enhancement with Realistic Noise

Design, Uncertainty Analysis and Measurement of a Silicon-based Platelet THz Corrugated Horn

Flow-Guided Transformer for Video Inpainting

Local discontinuous Galerkin method for the Backward Feynman-Kac Equation

Motion-Focused Contrastive Learning of Video Representations

Multiple-Objective Packet Routing Optimization for Aeronautical ad-hoc Networks

Neural Compression-Based Feature Learning for Video Restoration

Recurrent Dynamic Embedding for Video Object Segmentation

Retinal Vessel Segmentation with Pixel-wise Adaptive Filters

The ringing of quantum corrected Schwarzschild black hole with GUP

Towards Hybrid-Optimization Video Coding

Engineered Raman Lasing in Photonic Integrated Chalcogenide Microresonators

Marangoni Convection-Driven Laser Fountains and Waves on Free Surfaces of Liquids

Robust Classification using Hidden Markov Models and Mixtures of Normalizing Flows

Soft magnetic microrobot doped with porous silica for stability-enhanced multimodal locomotion in nonideal environment

Structural engineering from an inverse problems perspective

Synergy Between Semantic Segmentation and Image Denoising via Alternate Boosting

$α$ Belief Propagation for Approximate Inference

2-Local derivations on the Super Virasoro algebra and Super W(2,2) algebra

A Game Theoretic Analysis of LQG Control under Adversarial Attack

Bottom-Up Human Pose Estimation by Ranking Heatmap-Guided Adaptive Keypoint Estimates

Classification of simple Harish-Chandra modules over the N=1 Ramond algebra

Deep High-Resolution Representation Learning for Visual Recognition

Dual Temporal Memory Network for Efficient Video Object Segmentation

Dual-Path Transformer Network: Direct Context-Aware Modeling for End-to-End Monaural Speech Separation

Efficient Integer-Arithmetic-Only Convolutional Neural Networks

Foreground-Background Imbalance Problem in Deep Object Detectors: A Review

Graph Neural Networks for Massive MIMO Detection

Is There Tradeoff between Spatial and Temporal in Video Super-Resolution?

Learning Trailer Moments in Full-Length Movies

Neural Network based Explicit Mixture Models and Expectation-maximization based Learning

On Dominant Interference in Random Networks and Communication Reliability

Optimizing electrode positions in 2D Electrical Impedance Tomography using deep learning

Optimizing Wireless Systems Using Unsupervised and Reinforced-Unsupervised Deep Learning

Powering Hidden Markov Model by Neural Network based Generative Models

Propagation of a plane-strain hydraulic fracture accounting for a rough cohesive zone

Region-based Energy Neural Network for Approximate Inference

SSFN -- Self Size-estimating Feed-forward Network with Low Complexity, Limited Need for Human Intervention, and Consistent Behaviour across Trials

Time-lapse reconstruction of the fracture front from diffracted waves arrivals in laboratory hydraulic fracture experiments

Transferring and Regularizing Prediction for Semantic Segmentation

Will Scale-free Popularity Develop Scale-free Geo-social Networks?

A Comprehensive Benchmark for Single Image Compression Artifacts Reduction

Deep Learning-Based Video Coding: A Review and A Case Study

On The Classification-Distortion-Perception Tradeoff

Photoluminescence mapping and time-domain thermo-photoluminescence for rapid imaging and measurement of thermal conductivity of boron arsenide

Two-Stream Action Recognition-Oriented Video Super-Resolution

Cache-enabled Heterogeneous Cellular Networks: Comparison and Tradeoffs

Energy Efficiency of Downlink Networks with Caching at Base Stations

Gravitational wave astronomy: the current status

Insulating nature of strongly correlated massless Dirac fermions in an organic crystal

Optimal Content Placement for Offloading in Cache-enabled Heterogeneous Wireless Networks

Spin excitations in the quasi-two-dimensional charge-ordered insulator $α$-(BEDT-TTF)$_2$I$_3$ probed via $^{13}$C NMR

The next detectors for gravitational wave astronomy

Classification of Harish-Chandra modules over some Lie algebras related to the Virasoro algebra

EventNet: A Large Scale Structured Concept Library for Complex Event Detection in Video

Representations of the affine-Virasoro algebra of type $A_1$

Building A Large Concept Bank for Representing Events in Video

Estimation of conductivity changes in a region of interest with electrical impedance tomography

Field and long-term demonstration of a wide area quantum key distribution network

$\propto$SVM for learning with label proportions

Field test of the wavelength-saving quantum key distribution network

Lie bialgebra structures on the twisted Heisenberg-Virasoro algebra

Attacking practical quantum key distribution system with wavelength dependent beam splitter and multi-wavelength sources

Leibniz superalgebras graded by finite root systems

Lie superbialgebra structures on the N=2 superconformal Neveu-Schwarz algebra

Leibniz Algebras Graded by Finite Root Systems

Whittaker Modules for the twisted Heisenberg-Virasoro Algebra

Classification of irreducible weight modules over $W$-algebra W(2,2)

Harish-Chandra Modules Over the Twisted Heisenberg-Virasoro Algebra