Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
45 works
0 followers
28 topics
4 close collaborators


Published work

45 published item(s)

preprint · 2026 · arXiv

Caracal: Causal Architecture via Spectral Mixing

The scalability of Large Language Models to long sequences is hindered by the quadratic cost of attention and the limitations of positional encodings. To address these, we introduce Caracal, a novel architecture that replaces attention with a parameter-efficient, O(L log(L)) Multi-Head Fourier (MHF) module. Our contributions are threefold: (1) We leverage the Fast Fourier Transform (FFT) for sequence mixing, inherently addressing both bottlenecks mentioned above. (2) We apply a frequency-domain causal masking technique that enforces autoregressive capabilities via asymmetric padding and truncation, overcoming a critical barrier for Fourier-based generative models. (3) Unlike efficient models relying on hardware-specific implementations (e.g., Mamba), we use standard library operators. This ensures robust portability, eliminating common deployment barriers. Evaluations demonstrate that Caracal performs competitively with Transformer and SSM baselines, offering a scalable and simple pathway for efficient long-sequence modeling. Code is available in the Appendix.
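The padding-and-truncation idea in the abstract above can be illustrated with a minimal sketch. This is not the paper's Multi-Head Fourier module: the function names `fft` and `causal_fft_mix`, the single real-valued channel, and the fixed mixing kernel are all assumptions made for illustration. It only shows why zero-padding on one side and keeping the first L outputs makes FFT-based mixing causal.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalised); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out

def causal_fft_mix(x, kernel):
    """Hypothetical helper: y[t] = sum over s <= t of kernel[t - s] * x[s], via FFT."""
    L = len(x)
    n = 1
    while n < 2 * L:  # pad to at least 2L so circular wrap-around cannot occur
        n *= 2
    # Zero-pad on the right only: later positions never feed earlier outputs.
    xp = list(x) + [0.0] * (n - len(x))
    kp = list(kernel) + [0.0] * (n - len(kernel))
    X, K = fft(xp), fft(kp)
    y = fft([a * b for a, b in zip(X, K)], inverse=True)
    # Keep the first L outputs and undo the unnormalised inverse transform.
    return [v.real / n for v in y[:L]]
```

Changing the last input token leaves all earlier outputs unchanged, which is the autoregressive property the abstract refers to. The cost is dominated by the three transforms, each O(n log n).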

preprint · 2023 · arXiv

Understanding Hyperdimensional Computing for Parallel Single-Pass Learning

Hyperdimensional computing (HDC) is an emerging learning paradigm that computes with high-dimensional binary vectors. It is attractive because of its energy efficiency and low latency, especially on emerging hardware, but HDC suffers from low model accuracy, with little theoretical understanding of what limits its performance. We propose a new theoretical analysis of the limits of HDC via a consideration of what similarity matrices can be "expressed" by binary vectors, and we show how the limits of HDC can be approached using random Fourier features (RFF). We extend our analysis to the more general class of vector symbolic architectures (VSA), which compute with high-dimensional vectors (hypervectors) that are not necessarily binary. We propose a new class of VSAs, finite group VSAs, which surpass the limits of HDC. Using representation theory, we characterize which similarity matrices can be "expressed" by finite group VSA hypervectors, and we show how these VSAs can be constructed. Experimental results show that our RFF method and group VSA can both outperform the state-of-the-art HDC model by up to 7.6% while maintaining hardware efficiency.
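For readers unfamiliar with hypervectors, the sketch below shows the basic objects the abstract above talks about. It assumes the common bipolar (+1/-1) encoding of binary hypervectors, with elementwise multiplication as the binding operation and a normalised dot product as similarity. The function names are hypothetical, and none of this reproduces the paper's RFF or finite group VSA constructions.

```python
import random

def random_hypervector(dim, rng):
    """Draw a bipolar hypervector with independent +1/-1 entries."""
    return [rng.choice((-1, 1)) for _ in range(dim)]

def bind(a, b):
    """Bind two hypervectors by elementwise multiplication (self-inverse)."""
    return [x * y for x, y in zip(a, b)]

def similarity(a, b):
    """Normalised dot product in [-1, 1]."""
    return sum(x * y for x, y in zip(a, b)) / len(a)

def similarity_matrix(vectors):
    """Pairwise similarities: the kind of matrix a set of hypervectors 'expresses'."""
    return [[similarity(u, v) for v in vectors] for u in vectors]

rng = random.Random(0)
DIM = 10_000
a = random_hypervector(DIM, rng)
b = random_hypervector(DIM, rng)
```

Independent random hypervectors are nearly orthogonal in high dimension, so `similarity(a, b)` is close to zero while `similarity(a, a)` is exactly one. Binding with `b` twice recovers `a`.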

preprint · 2022 · arXiv

Asymptotic Uncertainty of False Discovery Proportion

Multiple testing has been a popular topic in statistical research. Although much work has been done, controlling false discoveries remains a challenging task when the corresponding test statistics are dependent. Various methods have been proposed to estimate the false discovery proportion (FDP) under arbitrary dependence among the test statistics. One of the main ideas is to reduce arbitrary dependence to weak dependence and then to establish theoretically the strong consistency of the FDP and false discovery rate (FDR) under weak dependence. As a consequence, FDPs share the same asymptotic limit in the framework of weak dependence. We observe that the asymptotic variance of the FDP, however, may rely heavily on the dependence structure of the corresponding test statistics even when they are only weakly dependent; and it is of great practical value to quantify this variability, as it can serve as an indicator of the quality of the FDP estimate from the given data. As far as we are aware, research in this respect is still limited in the literature. In this paper, we first derive the asymptotic expansion of FDP under mild regularity conditions and then examine how the asymptotic variance of FDP varies under different dependence structures both theoretically and numerically. Based on the observations in this study, we recommend that when multiple testing is performed by an FDP procedure, both the mean and the variance estimates of FDP be reported to enrich the study outcome.
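The quantity at the centre of the abstract above can be made concrete with a short sketch. The definition used here, false rejections divided by total rejections (taken as zero when nothing is rejected), is the standard one. The simulation is an assumption for illustration only: it uses independent one-sided z-tests with a fixed threshold, whereas the paper studies dependent statistics, and the function names and default parameters are hypothetical.

```python
import random
import statistics

def fdp(rejected, is_null):
    """False discovery proportion: false rejections / max(total rejections, 1)."""
    total = sum(rejected)
    false = sum(1 for rej, null in zip(rejected, is_null) if rej and null)
    return false / max(total, 1)

def simulate_fdp(m=200, m1=40, shift=3.0, threshold=2.5, reps=500, seed=0):
    """Monte Carlo mean and variance of FDP for independent one-sided z-tests.

    The first m1 hypotheses are true signals with mean `shift`; the rest are nulls.
    """
    rng = random.Random(seed)
    is_null = [i >= m1 for i in range(m)]
    values = []
    for _ in range(reps):
        z = [rng.gauss(0.0 if null else shift, 1.0) for null in is_null]
        values.append(fdp([zi > threshold for zi in z], is_null))
    return statistics.mean(values), statistics.variance(values)
```

Reporting both numbers returned by `simulate_fdp` mirrors the abstract's recommendation to accompany the mean FDP with a variance estimate.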

preprint · 2022 · arXiv

Context-Based MEC Platform for Augmented-Reality Services in 5G Networks

Augmented reality (AR) has drawn great attention in recent years. However, current AR devices have drawbacks, e.g., weak computation ability and large power consumption. To solve the problem, mobile edge computing (MEC) can be introduced as a key technology to offload data and computation from AR devices to MEC servers via 5th Generation Mobile Communication Technology (5G) networks. To this end, a context-based MEC platform for AR services in 5G networks is proposed in this paper. On the platform, MEC is employed as a data processing center while AR devices are simplified as universal input/output devices, which overcomes their limitations and achieves better user experience. Moreover, the proof-of-concept (PoC) hardware prototype of the platform, and two typical use cases providing AR services of navigation and face recognition respectively are implemented to demonstrate the feasibility and effectiveness of the platform. Finally, the performance of the platform is also numerically evaluated, and the results validate the system design and agree well with the design expectations.

preprint · 2022 · arXiv

DC-SPP-YOLO: Dense Connection and Spatial Pyramid Pooling Based YOLO for Object Detection

Although the YOLOv2 method is extremely fast at object detection, its detection accuracy is restricted by the low performance of its backbone network and the underutilization of multi-scale region features. Therefore, a dense connection (DC) and spatial pyramid pooling (SPP) based YOLO (DC-SPP-YOLO) method for improving the object detection accuracy of YOLOv2 is proposed in this paper. Specifically, the dense connection of convolution layers is employed in the backbone network of YOLOv2 to strengthen the feature extraction and alleviate the vanishing-gradient problem. Moreover, an improved spatial pyramid pooling is introduced to pool and concatenate the multi-scale region features, so that the network can learn the object features more comprehensively. The DC-SPP-YOLO model is established and trained based on a new loss function composed of MSE (mean square error) loss and cross-entropy loss. The experimental results indicate that the mAP (mean Average Precision) of DC-SPP-YOLO is higher than that of YOLOv2 on the PASCAL VOC datasets and the UA-DETRAC datasets. The effectiveness of the proposed DC-SPP-YOLO method is demonstrated.

preprint · 2022 · arXiv

DeepCloth: Neural Garment Representation for Shape and Style Editing

Garment representation, editing and animation are challenging topics in the area of computer vision and graphics. It remains difficult for existing garment representations to achieve smooth and plausible transitions between different shapes and topologies. In this work, we introduce DeepCloth, a unified framework for garment representation, reconstruction, animation and editing. Our unified framework contains three components: First, we represent the garment geometry with a "topology-aware UV-position map", which allows for the unified description of various garments with different shapes and topologies by introducing an additional topology-aware UV-mask for the UV-position map. Second, to further enable garment reconstruction and editing, we contribute a method to embed the UV-based representations into a continuous feature space, which enables garment shape reconstruction and editing by optimization and control in the latent space, respectively. Finally, we propose a garment animation method by unifying our neural garment representation with body shape and pose, which achieves plausible garment animation results leveraging the dynamic information encoded by our shape and style representation, even under drastic garment editing operations. To conclude, with DeepCloth, we move a step forward in establishing a more flexible and general 3D garment digitization framework. Experiments demonstrate that our method can achieve state-of-the-art garment representation performance compared with previous methods.

preprint · 2022 · arXiv

Design and Analysis of Robust Resilient Diffusion over Multi-Task Networks Against Byzantine Attacks

This paper studies distributed diffusion adaptation over clustered multi-task networks in the presence of impulsive interferences and Byzantine attacks. We develop a robust resilient diffusion least mean Geman-McClure-estimation (RDLMG) algorithm based on the cost function used by the Geman-McClure estimator, which can reduce the sensitivity to large outliers and make the algorithm robust under impulsive interferences. Moreover, the mean sub-sequence reduced method, in which each node discards the extreme value information of cost contributions received from its neighbors, can make the network resilient against Byzantine attacks. In this regard, the proposed RDLMG algorithm ensures that all normal nodes converge to their ideal states with cooperation among nodes. A statistical analysis of the RDLMG algorithm is also carried out in terms of mean and mean-square performances. Numerical results evaluate the proposed RDLMG algorithm in applications to multi-target localization and multi-task spectrum sensing.
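The two ingredients named in the abstract above can be sketched in a few lines. Both pieces are assumptions for illustration rather than the RDLMG algorithm itself: the cost uses one common form of the Geman-McClure function, e^2 / (sigma^2 + e^2), and the resilient step is shown as a plain trimmed mean over scalar values received from neighbors. All function names and the `sigma` and `f` parameters are hypothetical.

```python
def geman_mcclure(e, sigma=1.0):
    """One common form of the Geman-McClure cost; bounded above by 1."""
    return e * e / (sigma * sigma + e * e)

def geman_mcclure_weight(e, sigma=1.0):
    """Gradient of the cost divided by 2e: decays quickly for large |e|."""
    s2 = sigma * sigma
    return s2 / (s2 + e * e) ** 2

def trimmed_mean(values, f):
    """Discard the f smallest and f largest values, then average the rest."""
    if len(values) <= 2 * f:
        raise ValueError("need more than 2*f values")
    kept = sorted(values)[f:len(values) - f]
    return sum(kept) / len(kept)
```

Because the cost saturates, a single impulsive error of huge magnitude contributes at most 1, and its weight in a gradient update is close to zero. Dropping the extreme values before averaging likewise limits how far a few corrupted neighbor messages can move the result.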

preprint · 2022 · arXiv

DoubleField: Bridging the Neural Surface and Radiance Fields for High-fidelity Human Reconstruction and Rendering

We introduce DoubleField, a novel framework combining the merits of both surface field and radiance field for high-fidelity human reconstruction and rendering. Within DoubleField, the surface field and radiance field are associated together by a shared feature embedding and a surface-guided sampling strategy. Moreover, a view-to-view transformer is introduced to fuse multi-view features and learn view-dependent features directly from high-resolution inputs. With the modeling power of DoubleField and the view-to-view transformer, our method significantly improves the reconstruction quality of both geometry and appearance, while supporting direct inference, scene-specific high-resolution finetuning, and fast rendering. The efficacy of DoubleField is validated by the quantitative evaluations on several datasets and the qualitative results in a real-world sparse multi-view system, showing its superior capability for high-quality human model reconstruction and photo-realistic free-viewpoint human rendering. Data and source code will be made public for the research purpose. Please refer to our project page: http://www.liuyebin.com/dbfield/dbfield.html.

preprint · 2022 · arXiv

DYLE: Dynamic Latent Extraction for Abstractive Long-Input Summarization

Transformer-based models have achieved state-of-the-art performance on short-input summarization. However, they still struggle with summarizing longer text. In this paper, we present DYLE, a novel dynamic latent extraction approach for abstractive long-input summarization. DYLE jointly trains an extractor and a generator and treats the extracted text snippets as the latent variable, allowing dynamic snippet-level attention weights during decoding. To provide adequate supervision, we propose simple yet effective heuristics for oracle extraction as well as a consistency loss term, which encourages the extractor to approximate the averaged dynamic weights predicted by the generator. We evaluate our method on different long-document and long-dialogue summarization tasks: GovReport, QMSum, and arXiv. Experimental results show that DYLE outperforms all existing methods on GovReport and QMSum, with gains up to 6.1 ROUGE, while yielding strong results on arXiv. Further analysis shows that the proposed dynamic weights provide interpretability of our generation process.

preprint · 2022 · arXiv

FaceVerse: a Fine-grained and Detail-controllable 3D Face Morphable Model from a Hybrid Dataset

We present FaceVerse, a fine-grained 3D Neural Face Model, which is built from hybrid East Asian face datasets containing 60K fused RGB-D images and 2K high-fidelity 3D head scan models. A novel coarse-to-fine structure is proposed to take better advantage of our hybrid dataset. In the coarse module, we generate a base parametric model from large-scale RGB-D images, which is able to predict accurate rough 3D face models across different genders, ages, etc. Then in the fine module, a conditional StyleGAN architecture trained with high-fidelity scan models is introduced to enrich elaborate facial geometric and texture details. Unlike in previous methods, our base and detailed modules are both changeable, which enables an innovative application of adjusting both the basic attributes and the facial details of 3D face models. Furthermore, we propose a single-image fitting framework based on differentiable rendering. Extensive experiments show that our method outperforms the state-of-the-art methods.

preprint · 2022 · arXiv

Full-Duplex Aerial Communication System for Multiple UAVs with Directional Antennas

UAV-based wireless systems, such as wireless relay and remote sensing, have attracted great attention from academia and industry. To realize them, a high-performance wireless aerial communication system, which bridges UAVs and ground stations, is one of the key enablers. However, there are still issues hindering its development, such as the severe co-channel interference among UAVs, and the limited payload/battery-life of UAVs. To address the challenges, we propose an aerial communication system which enables system-level full-duplex communication of multiple UAVs with lower hardware complexities than ideal full-duplex communication systems. In the proposed system, each channel is re-assigned to the uplink and downlink of a pair of UAVs, and each UAV employs a pair of separated channels for its uplink and downlink. The co-channel interference between UAVs that reuse the same channels is eliminated by exploiting the advantages of UAVs' maneuverability and the high-gain directional antennas equipped in UAVs and ground stations, so that dedicated cancellers are not necessary in the proposed system. The system design and performance analysis are given, and the simulation results agree well with the designs.

preprint · 2022 · arXiv

Geometry-aware Single-image Full-body Human Relighting

Single-image human relighting aims to relight a target human under new lighting conditions by decomposing the input image into albedo, shape and lighting. Although plausible relighting results can be achieved, previous methods suffer from both the entanglement between albedo and lighting and the lack of hard shadows, which significantly decrease the realism. To tackle these two problems, we propose a geometry-aware single-image human relighting framework that leverages single-image geometry reconstruction for joint deployment of traditional graphics rendering and neural rendering techniques. For the de-lighting, we explore the shortcomings of UNet architecture and propose a modified HRNet, achieving better disentanglement between albedo and lighting. For the relighting, we introduce a ray tracing-based per-pixel lighting representation that explicitly models high-frequency shadows and propose a learning-based shading refinement module to restore realistic shadows (including hard cast shadows) from the ray-traced shading maps. Our framework is able to generate photo-realistic high-frequency shadows such as cast shadows under challenging lighting conditions. Extensive experiments demonstrate that our proposed method outperforms previous methods on both synthetic and real images.

preprint · 2022 · arXiv

Giant Microwave Sensitivity of Magnetic Array by Long-Range Chiral Interaction Driven Skin Effect

The non-Hermitian skin effect was observed in one-dimensional systems with short-range chiral interaction. Long-range chiral interaction mediated by traveling waves also favors the accumulation of energy, but has not yet shown non-Hermitian topology. Here we find that the strong interference brought by the wave propagation is detrimental to accumulation. By suppression of interference via the damping of traveling waves, we predict the non-Hermitian skin effect of magnetic excitation in a periodic array of magnetic nanowires that are coupled chirally via spin waves of thin magnetic films. The local excitation of a wire at one edge by weak microwaves of magnitude $\sim \mu\mathrm{T}$ leads to a considerable spin-wave amplitude at the other edge, i.e., a remarkable functionality useful for sensitive, non-local, and non-reciprocal detection of microwaves.

preprint · 2022 · arXiv

GIMO: Gaze-Informed Human Motion Prediction in Context

Predicting human motion is critical for assistive robots and AR/VR applications, where the interaction with humans needs to be safe and comfortable. Meanwhile, an accurate prediction depends on understanding both the scene context and human intentions. Even though many works study scene-aware human motion prediction, the latter is largely underexplored due to the lack of ego-centric views that disclose human intent and the limited diversity in motion and scenes. To reduce the gap, we propose a large-scale human motion dataset that delivers high-quality body pose sequences, scene scans, as well as ego-centric views with the eye gaze that serves as a surrogate for inferring human intent. By employing inertial sensors for motion capture, our data collection is not tied to specific scenes, which further boosts the motion dynamics observed from our subjects. We perform an extensive study of the benefits of leveraging the eye gaze for ego-centric human motion prediction with various state-of-the-art architectures. Moreover, to realize the full potential of the gaze, we propose a novel network architecture that enables bidirectional communication between the gaze and motion branches. Our network achieves the top performance in human motion prediction on the proposed dataset, thanks to the intent information from eye gaze and the denoised gaze feature modulated by the motion. Code and data can be found at https://github.com/y-zheng18/GIMO.

preprint · 2022 · arXiv

Ground Experiment of Full-Duplex Multi-UAV System Enabled by Directional Antennas

A high-performance multi-UAV communication system, which bridges multiple UAVs and a ground station, is one of the key enablers to realize a variety of UAV-based systems. To address issues such as the low spectrum efficiency caused by the co-channel interference, we have proposed a spectrum-efficient full-duplex multi-UAV communication system with low hardware complexity. In this paper, on-ground experiments are conducted to confirm the feasibility and effectiveness of the key feature of the proposed system, i.e., co-channel interference cancellation among UAVs by directional antennas and UAV position control, instead of energy-consuming dedicated self-interference cancellers on UAVs in traditional full-duplex systems. Channel power of the interference link between a pair of UAVs reusing the same channel is measured, and the achievable channel capacity is also measured by a prototype system implemented by software-defined radio devices. The results of different antennas and different antenna heights are also compared. The experimental results agree well with the designs and confirm the feasibility and effectiveness of the proposed system. This ground experiment is a work in progress to provide preliminary results for the multi-UAV-based experiments in the air in the future.

preprint · 2022 · arXiv

HVTR: Hybrid Volumetric-Textural Rendering for Human Avatars

We propose a novel neural rendering pipeline, Hybrid Volumetric-Textural Rendering (HVTR), which synthesizes virtual human avatars from arbitrary poses efficiently and at high quality. First, we learn to encode articulated human motions on a dense UV manifold of the human body surface. To handle complicated motions (e.g., self-occlusions), we then leverage the encoded information on the UV manifold to construct a 3D volumetric representation based on a dynamic pose-conditioned neural radiance field. While this allows us to represent 3D geometry with changing topology, volumetric rendering is computationally heavy. Hence we employ only a rough volumetric representation using a pose-conditioned downsampled neural radiance field (PD-NeRF), which we can render efficiently at low resolutions. In addition, we learn 2D textural features that are fused with rendered volumetric features in image space. The key advantage of our approach is that we can then convert the fused features into a high-resolution, high-quality avatar by a fast GAN-based textural renderer. We demonstrate that hybrid rendering enables HVTR to handle complicated motions, render high-quality avatars under user-controlled poses/shapes and even loose clothing, and most importantly, be efficient at inference time. Our experimental results also demonstrate state-of-the-art quantitative results.

preprint · 2022 · arXiv

Interacting Attention Graph for Single Image Two-Hand Reconstruction

Graph convolutional networks (GCNs) have achieved great success in the single-hand reconstruction task, while interacting two-hand reconstruction by GCN remains unexplored. In this paper, we present Interacting Attention Graph Hand (IntagHand), the first graph convolution based network that reconstructs two interacting hands from a single RGB image. To solve occlusion and interaction challenges of two-hand reconstruction, we introduce two novel attention based modules in each upsampling step of the original GCN. The first module is the pyramid image feature attention (PIFA) module, which utilizes multiresolution features to implicitly obtain vertex-to-image alignment. The second module is the cross hand attention (CHA) module that encodes the coherence of interacting hands by building dense cross-attention between two hand vertices. As a result, our model outperforms all existing two-hand reconstruction methods by a large margin on the InterHand2.6M benchmark. Moreover, ablation studies verify the effectiveness of both PIFA and CHA modules for improving the reconstruction accuracy. Results on in-the-wild images and live video streams further demonstrate the generalization ability of our network. Our code is available at https://github.com/Dw1010/IntagHand.

preprint · 2022 · arXiv

MCTensor: A High-Precision Deep Learning Library with Multi-Component Floating-Point

In this paper, we introduce MCTensor, a library based on PyTorch for providing general-purpose and high-precision arithmetic for DL training. MCTensor is used in the same way as PyTorch Tensor: we implement multiple basic, matrix-level computation operators and NN modules for MCTensor with an identical PyTorch interface. Our algorithms achieve high-precision computation and also benefit from heavily optimized PyTorch floating-point arithmetic. We evaluate MCTensor arithmetic against PyTorch native arithmetic for a series of tasks, where models using MCTensor in float16 would match or outperform the PyTorch model with float32 or float64 precision.
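Multi-component floating-point arithmetic represents a value as an unevaluated sum of several ordinary floats. The sketch below shows the classic building block, Knuth's error-free Two-Sum, on plain Python floats (float64). It does not use or reproduce the MCTensor API: `two_sum` and `mc_sum` are hypothetical names, and the two-component accumulator is a simplification chosen for illustration.

```python
def two_sum(a, b):
    """Knuth's Two-Sum: return (s, e) with s = fl(a + b) and s + e == a + b exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e

def mc_sum(values):
    """Sum floats into a two-component (hi, lo) pair, keeping the rounding errors."""
    hi, lo = 0.0, 0.0
    for v in values:
        hi, err = two_sum(hi, v)
        lo += err  # errors are tiny, so a plain float accumulator suffices here
    return two_sum(hi, lo)

values = [1.0] + [1e-17] * 1000
naive = sum(values)      # every 1e-17 is rounded away
hi, lo = mc_sum(values)  # the small terms survive in the low component
```

Here the naive sum stays at exactly 1.0 because each addend is below half a unit in the last place, while the two-component result recovers the accumulated 1e-14. The same principle lets low-precision components stand in for a higher-precision value.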

preprint · 2022 · arXiv

NL2INTERFACE: Interactive Visualization Interface Generation from Natural Language Queries

We develop NL2INTERFACE to explore the potential of generating usable interactive multi-visualization interfaces from natural language queries. With NL2INTERFACE, users can directly write natural language queries to automatically generate a fully interactive multi-visualization interface without any extra effort of learning a tool or programming language. Further, users can interact with the interfaces to easily transform the data and quickly see the results in the visualizations.

preprint · 2022 · arXiv

OPAL: Occlusion Pattern Aware Loss for Unsupervised Light Field Disparity Estimation

Light field disparity estimation is an essential task in computer vision with various applications. Although supervised learning-based methods have achieved both higher accuracy and efficiency than traditional optimization-based methods, the dependency on ground-truth disparity for training limits the overall generalization performance, let alone in real-world scenarios where the ground-truth disparity is hard to capture. In this paper, we argue that unsupervised methods can achieve comparable accuracy, but, more importantly, much higher generalization capacity and efficiency than supervised methods. Specifically, we present the Occlusion Pattern Aware Loss, named OPAL, which successfully extracts and encodes the general occlusion patterns inherent in the light field for loss calculation. OPAL enables: i) accurate and robust estimation by effectively handling occlusions without using any ground-truth information for training and ii) much more efficient performance by significantly reducing the network parameters required for accurate inference. Besides, a transformer-based network and a refinement module are proposed for achieving even more accurate results. Extensive experiments demonstrate that our method not only significantly improves the accuracy compared with the SOTA unsupervised methods, but also possesses strong generalization capacity, even for real-world data, compared with supervised methods. Our code will be made publicly available.

preprint · 2022 · arXiv

ProbNVS: Fast Novel View Synthesis with Learned Probability-Guided Sampling

Existing state-of-the-art novel view synthesis methods rely on either fairly accurate 3D geometry estimation or sampling of the entire space for neural volumetric rendering, which limits the overall efficiency. In order to improve the rendering efficiency by reducing sampling points without sacrificing rendering quality, we propose to build a novel view synthesis framework based on learned MVS priors that enables general, fast and photo-realistic view synthesis simultaneously. Specifically, fewer but important points are sampled under the guidance of depth probability distributions extracted from the learned MVS architecture. Based on the learned probability-guided sampling, a neural volume rendering module is elaborately devised to fully aggregate source view information as well as the learned scene structures to synthesize photorealistic target view images. Finally, the rendering results in uncertain, occluded and unreferenced regions can be further improved by incorporating a confidence-aware refinement module. Experiments show that our method achieves 15 to 40 times faster rendering compared to state-of-the-art baselines, with strong generalization capacity and comparable high-quality novel view synthesis performance.

preprint · 2022 · arXiv

Salvaging Federated Learning by Local Adaptation

Federated learning (FL) is a heavily promoted approach for training ML models on sensitive data, e.g., text typed by users on their smartphones. FL is expressly designed for training on data that are unbalanced and non-iid across the participants. To ensure privacy and integrity of the federated model, the latest FL approaches use differential privacy or robust aggregation. We look at FL from the local viewpoint of an individual participant and ask: (1) do participants have an incentive to participate in FL? (2) how can participants individually improve the quality of their local models, without re-designing the FL framework and/or involving other participants? First, we show that on standard tasks such as next-word prediction, many participants gain no benefit from FL because the federated model is less accurate on their data than the models they can train locally on their own. Second, we show that differential privacy and robust aggregation make this problem worse by further destroying the accuracy of the federated model for many participants. Then, we evaluate three techniques for local adaptation of federated models: fine-tuning, multi-task learning, and knowledge distillation. We analyze where each is applicable and demonstrate that all participants benefit from local adaptation. Participants whose local models are poor obtain big accuracy improvements over conventional FL. Participants whose local models are better than the federated model (and who have no incentive to participate in FL today) improve less, but sufficiently to make the adapted federated model better than their local models.

preprint · 2022 · arXiv

Spectrum Sharing between Directional-Antenna-Equipped UAV System and Terrestrial Systems

Unmanned aerial vehicle (UAV)-based applications, such as surveillance systems and wireless relays, are attracting increasing attention from academia and industry. A high-performance aerial communication system is one of the key enablers for them. However, due to the low attenuation of radio waves in the air-to-ground channels, the interference between aerial and terrestrial communication systems would significantly deteriorate their communication performance and greatly limit the potential UAV applications. To address the problem, this paper proposes a spectrum sharing strategy between terrestrial systems and a multiple-UAV communication system in which both the UAVs and the ground station (GS) are equipped with directional antennas. The GS position is selected and the flyable areas of the UAVs using certain spectrum resources are defined in advance, using prior knowledge from spectrum monitoring on terrestrial communication systems, to minimize interference and maximize the flyable areas of the UAVs, instead of relying on inefficient dynamic channel sensing and allocation for interference elimination. The simulations are conducted through a case study of the spectrum sharing between a multi-UAV video transmission system and the terrestrial wireless local area network (WLAN) system in the 5.7 GHz band. The simulation results show that, thanks to the proposed system, the entire area can be enabled for UAV flight.

preprint · 2022 · arXiv

Structured Local Radiance Fields for Human Avatar Modeling

It is extremely challenging to create an animatable clothed human avatar from RGB videos, especially for loose clothes due to the difficulties in motion modeling. To address this problem, we introduce a novel representation on the basis of recent neural scene rendering techniques. The core of our representation is a set of structured local radiance fields, which are anchored to the pre-defined nodes sampled on a statistical human body template. These local radiance fields not only leverage the flexibility of implicit representation in shape and appearance modeling, but also factorize cloth deformations into skeleton motions, node residual translations and the dynamic detail variations inside each individual radiance field. To learn our representation from RGB data and facilitate pose generalization, we propose to learn the node translations and the detail variations in a conditional generative latent space. Overall, our method enables automatic construction of animatable human avatars for various types of clothes without the need for scanning subject-specific templates, and can generate realistic images with dynamic details for novel poses. Experiments show that our method outperforms state-of-the-art methods both qualitatively and quantitatively.

preprint · 2022 · arXiv

SwinIQA: Learned Swin Distance for Compressed Image Quality Assessment

Image compression has raised widespread interest recently due to its significant importance for multimedia storage and transmission. Meanwhile, a reliable image quality assessment (IQA) for compressed images can not only help to verify the performance of various compression algorithms but also help to guide the compression optimization in turn. In this paper, we design a full-reference image quality assessment metric, SwinIQA, to measure the perceptual quality of compressed images in a learned Swin distance space. It is known that compression artifacts are usually non-uniformly distributed with diverse distortion types and degrees. To warp the compressed images into the shared representation space while maintaining the complex distortion information, we extract the hierarchical feature representations from each stage of the Swin Transformer. In addition, we utilize a cross-attention operation to map the extracted feature representations into a learned Swin distance space. Experimental results show that the proposed metric achieves higher consistency with human perceptual judgment than both traditional methods and learning-based methods on the CLIC datasets.

preprint2021arXiv

Back-n White Neutron Source at CSNS and its Applications

Back-streaming neutrons from the spallation target of the China Spallation Neutron Source (CSNS) that emit through the incoming proton channel were exploited to build a white neutron beam facility (the so-called Back-n white neutron source), which was completed in March 2018. The Back-n neutron beam is very intense, at approximately 2*10^7 n/cm^2/s at 55 m from the target, and has a nominal proton beam with a power of 100 kW in the CSNS-I phase and a kinetic energy of 1.6 GeV and a thick tungsten target in multiple slices with modest moderation from the cooling water through the slices. In addition, the excellent energy spectrum spanning from 0.5 eV to 200 MeV, and a good time resolution related to the time-of-flight measurements make it a typical white neutron source for nuclear data measurements; its overall performance is among that of the best white neutron sources in the world. Equipped with advanced spectrometers, detectors, and application utilities, the Back-n facility can serve wide applications, with a focus on neutron-induced cross-section measurements. This article presents an overview of the neutron beam characteristics, the experimental setups, and the ongoing applications at Back-n.

preprint2021arXiv

Learning Omni-frequency Region-adaptive Representations for Real Image Super-Resolution

Traditional single image super-resolution (SISR) methods that focus on solving a single, uniform degradation (i.e., bicubic down-sampling) typically suffer from poor performance when applied to real-world low-resolution (LR) images due to the complicated realistic degradations. The key to solving this more challenging real image super-resolution (RealSR) problem lies in learning feature representations that are both informative and content-aware. In this paper, we propose an Omni-frequency Region-adaptive Network (ORNet) to address both challenges. Here we call the features of all low, middle and high frequencies omni-frequency features. Specifically, we start from the frequency perspective and design a Frequency Decomposition (FD) module to separate different frequency components and comprehensively compensate for the information lost in real LR images. Then, considering that different regions of a real LR image lose different frequency information, we further design a Region-adaptive Frequency Aggregation (RFA) module that leverages dynamic convolution and spatial attention to adaptively restore frequency components for different regions. Extensive experiments endorse the effective and scenario-agnostic nature of our ORNet for RealSR.

preprint2020arXiv

4D Association Graph for Realtime Multi-person Motion Capture Using Multiple Video Cameras

This paper contributes a novel realtime multi-person motion capture algorithm using multiview video inputs. Due to the heavy occlusions in each view, joint optimization on the multiview images and multiple temporal frames is indispensable, which brings up the essential challenge of realtime efficiency. To this end, for the first time, we unify per-view parsing, cross-view matching, and temporal tracking into a single optimization framework, i.e., a 4D association graph in which each dimension (image space, viewpoint and time) can be treated equally and simultaneously. To solve the 4D association graph efficiently, we further contribute the idea of 4D limb bundle parsing based on heuristic searching, followed by limb bundle assembling using a proposed bundle Kruskal's algorithm. Our method enables a realtime online motion capture system running at 30 fps using 5 cameras on a 5-person scene. Benefiting from the unified parsing, matching and tracking constraints, our method is robust to noisy detection and achieves high-quality online pose reconstruction. The proposed method outperforms the state-of-the-art method quantitatively without using high-level appearance information. We also contribute a multiview video dataset synchronized with a marker-based motion capture system for scientific evaluation.
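For orientation, the classic Kruskal procedure that the "bundle Kruskal's algorithm" above takes its name from can be sketched as follows. This is the plain textbook algorithm on an ordinary weighted graph, not the paper's limb-bundle variant; the function name `kruskal_forest` and the `(weight, u, v)` edge layout are assumptions made for illustration only.

```python
from typing import List, Tuple

Edge = Tuple[float, int, int]  # (weight, node_u, node_v); hypothetical layout


def kruskal_forest(num_nodes: int, edges: List[Edge]) -> List[Edge]:
    """Textbook Kruskal: greedily keep the cheapest edges that do not close a cycle."""
    parent = list(range(num_nodes))  # union-find forest, one tree per node at start

    def find(node: int) -> int:
        # Walk to the root, halving the path as we go.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    chosen: List[Edge] = []
    for weight, u, v in sorted(edges):  # ascending weight
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # edge joins two separate components
            parent[root_u] = root_v
            chosen.append((weight, u, v))
    return chosen


if __name__ == "__main__":
    demo_edges = [(1.0, 0, 1), (2.0, 1, 2), (3.0, 0, 2), (4.0, 2, 3)]
    print(kruskal_forest(4, demo_edges))
```

In the demo graph the edge of weight 3.0 is skipped because nodes 0 and 2 are already connected through node 1, which is the cycle test that makes the greedy selection produce a spanning forest.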

preprint2020arXiv

An Analytical Framework for Delay Optimal Mobile Edge Deployment in Wireless Networks

The emerging edge caching provides an effective way to reduce service delay for mobile users. However, due to the high deployment cost of edge hosts, a practical problem is how to achieve minimum delay under a proper edge deployment strategy. In this letter, we provide an analytical framework for delay optimal mobile edge deployment in a partially connected wireless network, where the requested files can be cached at the edge hosts and cooperatively transmitted through multiple base stations. In order to deal with the heterogeneous transmission requirements, we separate the entire transmission into backhaul and wireless phases, and propose average user normalized delivery time (AUNDT) as the performance metric. On top of that, we characterize the trade-off relations between the proposed AUNDT and other network deployment parameters. Using the proposed analytical framework, we are able to provide the optimal mobile edge deployment strategy in terms of AUNDT, which provides a useful guideline for future mobile edge deployment.

preprint2020arXiv

Circulating cavity magnon polaritons

We predict magnon polariton states circulating unidirectionally in a microwave cavity when loaded by a number of magnets on special lines. Realistic finite-element numerical simulations, including dielectric, time-dependent and non-linear effects, confirm the validity of the approximations of a fully analytical input-output model. We find that a phased antenna array can focus all power into a coherent microwave beam with controlled direction and an intensity that scales with the number of magnets.

preprint2020arXiv

Distributed Networked Controller Design for Large-scale Systems under Round-Robin Communication Protocol

This paper studies the distributed L2-gain control problem for continuous-time large-scale systems under a Round-Robin communication protocol. In this protocol, each sub-controller obtains its own subsystem's state information continuously, while communicating with neighbors periodically at discrete-time instants. Distributed controllers are designed such that the closed-loop system is exponentially stable and the prescribed L2-gain is satisfied. The design condition is obtained based on a time-delay approach and given in terms of linear matrix inequalities. Finally, three numerical examples are presented to illustrate the efficiency of the proposed scheme.

preprint2020arXiv

Learning Disentangled Feature Representation for Hybrid-distorted Image Restoration

Hybrid-distorted image restoration (HD-IR) is dedicated to restoring real distorted images that are degraded by multiple distortions. Existing HD-IR approaches usually ignore the inherent interference among hybrid distortions, which compromises the restoration performance. To decompose such interference, we introduce the concept of Disentangled Feature Learning to achieve a feature-level divide-and-conquer of hybrid distortions. Specifically, we propose the feature disentanglement module (FDM) to distribute feature representations of different distortions into different channels by revising gain-control-based normalization. We also propose a feature aggregation module (FAM) with channel-wise attention to adaptively filter out the distortion representations and aggregate useful content information from different channels for the construction of the raw image. The effectiveness of the proposed scheme is verified by visualizing the correlation matrix of features and the channel responses of different distortions. Extensive experimental results also demonstrate the superior performance of our approach compared with the latest HD-IR schemes.

preprint2020arXiv

Magnon Trap by Chiral Spin Pumping

Chiral spin pumping is the generation of a unidirectional spin current in half of ferromagnetic films or conductors by dynamic dipolar stray fields from close-by nanomagnets. We formulate a general theory of long-range chiral interactions between magnets mediated by unidirectional traveling waves, e.g., spin waves in a magnetic film or microwaves in a waveguide. The traveling waves emitted by an excited magnet can be perfectly trapped by a second, initially passive, magnet by a dynamical interference effect. When both magnets are excited by a uniform microwave, the chiral interaction between them creates a large imbalance in their magnon numbers.

preprint2020arXiv

Measurement of the neutron beam profile of the Back-n white neutron facility at CSNS with a Micromegas detector

The Back-n white neutron beam line, which uses back-streaming white neutrons from the spallation target of the China Spallation Neutron Source, is used for nuclear data measurements. A Micromegas-based neutron detector with two variants was specially developed to measure the beam spot distribution for this beam line. In this article, the design, fabrication, and characterization of the detector are described. The results of the detector performance tests are presented, which include the relative electron transparency, the gain and the gain uniformity, and the neutron beam profile reconstruction capability. The result of the first measurement of the Back-n neutron beam spot distribution is also presented.

preprint2020arXiv

Non-Contact Spin Pumping by Microwave Evanescent Fields

The angular momentum of evanescent light fields has been studied in nano-optics and plasmonics, but not in the microwave regime. Here we predict non-contact pumping of electron spin currents in conductors by the evanescent stray fields of excited magnetic nanostructures. The coherent transfer of the photon to the electron spin is proportional to the $g$-factor, which is large in narrow-gap semiconductors and surface states of topological insulators. The spin pumping current is chiral when the spin susceptibility displays singularities that indicate collective states. However, 1D systems with linear dispersion at the Fermi energy such as metallic carbon nanotubes are an exception since spin pumping is chiral even without interactions.

preprint2020arXiv

NormalGAN: Learning Detailed 3D Human from a Single RGB-D Image

We propose NormalGAN, a fast adversarial learning-based method to reconstruct a complete and detailed 3D human from a single RGB-D image. Given a single front-view RGB-D image, NormalGAN performs two steps: front-view RGB-D rectification and back-view RGB-D inference. The final model is then generated by simply combining the front-view and back-view RGB-D information. However, inferring a back-view RGB-D image with high-quality geometric details and plausible texture is not trivial. Our key observation is that normal maps generally encode much more information about 3D surface details than RGB and depth images. Therefore, learning geometric details from normal maps is superior to learning them from other representations. In NormalGAN, an adversarial learning framework conditioned on normal maps is introduced, which is used not only to improve the front-view depth denoising performance, but also to infer the back-view depth image with surprising geometric detail. Moreover, for texture recovery, we remove shading information from the front-view RGB image based on the refined normal map, which further improves the quality of the back-view color inference. Results and experiments on both a testing data set and real captured data demonstrate the superior performance of our approach. Given a consumer RGB-D sensor, NormalGAN can generate complete and detailed 3D human reconstruction results at 20 fps, which further enables convenient interactive experiences in telepresence, AR/VR and gaming scenarios.

preprint2020arXiv

Proof-of-Concept of Uncompressed 4K Video Transmission from Drone through mmWave

Drones are attracting increasing attention in a variety of research fields because of their flexibility and are expected to be applied to a wide range of potential applications, among which super-high-resolution video surveillance systems using drones have especially gained the authors' research attention. Surveillance systems using cameras with fixed locations always suffer from blind spots due to blockage or inappropriate deployment. Instead, by using drones equipped with cameras, the surveillance performance can be drastically improved due to their high mobility. The video quality is also a key factor in surveillance performance. In face recognition, one of the most important surveillance applications, uncompressed video can greatly improve the detection accuracy, but it is difficult to transmit uncompressed video in real time due to the huge data sizes. To address the issue, we propose to use ultra-high-speed mmWave communication for the video transmission from drones. Moreover, due to the limited battery energy and computing power of drones, we introduce edge computing and propose to offload all the computation from the drones to the ground station. In addition, a proof-of-concept hardware prototype of the proposed uncompressed 4K video transmission system from drones through mmWave is developed, and the experimental results are consistent with the system design expectations.

preprint2020arXiv

Quantum traces and embeddings of stated skein algebras into quantum tori

The stated skein algebra of a punctured bordered surface (or equivalently, a marked surface) is a generalization of the well-known Kauffman bracket skein algebra of unmarked surfaces and can be considered as an extension of the quantum special linear group $\mathcal{O}_{q^2}(SL_2)$ from a bigon to general surfaces. We show that the stated skein algebra of a punctured bordered surface with non-empty boundary can be embedded into quantum tori in two different ways. The first embedding can be considered as a quantization of the map expressing the trace of a closed curve in terms of the shear coordinates of the enhanced Teichmüller space, and is a lift of Bonahon-Wong's quantum trace map. The second embedding can be considered as a quantization of the map expressing the trace of a closed curve in terms of the lambda length coordinates of the decorated Teichmüller space, and is an extension of Muller's quantum trace map. We explain the relation between the two quantum trace maps. We also show that the quantum cluster algebra of Muller is equal to a reduced version of the stated skein algebra. As applications, we show that the stated skein algebra is an orderly finitely generated Noetherian domain and calculate its Gelfand-Kirillov dimension.

preprint2020arXiv

Robust 3D Self-portraits in Seconds

In this paper, we propose an efficient method for robust 3D self-portraits using a single RGBD camera. Benefiting from the proposed PIFusion and lightweight bundle adjustment algorithm, our method can generate detailed 3D self-portraits in seconds and shows the ability to handle subjects wearing extremely loose clothes. To achieve highly efficient and robust reconstruction, we propose PIFusion, which combines learning-based 3D recovery with volumetric non-rigid fusion to generate accurate sparse partial scans of the subject. Moreover, a non-rigid volumetric deformation method is proposed to continuously refine the learned shape prior. Finally, a lightweight bundle adjustment algorithm is proposed to guarantee that all the partial scans can not only "loop" with each other but also remain consistent with the selected live key observations. The results and experiments show that the proposed method achieves more robust and efficient 3D self-portraits compared with state-of-the-art methods.

preprint2020arXiv

Stated skein modules of marked 3-manifolds/surfaces, a survey

We give a survey of some old and new results about the stated skein modules/algebras of 3-manifolds/surfaces. For generic quantum parameter, we discuss the splitting homomorphism for the 3-manifold case, general structures of the stated skein algebras of marked surfaces (or bordered punctured surfaces) and their embeddings into quantum tori. For roots of 1 quantum parameter, we discuss the Frobenius homomorphism (for both marked 3-manifolds and marked surfaces), describe the center of the skein algebra of marked surfaces, the dimension of the skein algebra over the center, and the representation theory of the skein algebra. In particular, we show that the skein algebra of a non-closed marked surface at any root of 1 is a maximal order. We give a full description of the Azumaya locus of the skein algebra of the punctured torus and give partial results for closed surfaces.

preprint2020arXiv

Unidirectional Pumping of Phonons by Magnetization Dynamics

We propose a method to control surface phonon transport by weak magnetic fields based on the pumping of surface acoustic waves (SAWs) by magnetostriction. We predict that the magnetization dynamics of a nanowire on top of a dielectric film injects SAWs with opposite angular momenta into opposite directions. Two parallel nanowires form a phononic cavity that at magnetic resonances pumps a unidirectional SAW current into half of the substrate.

preprint2019arXiv

Chiral coupling of magnons in waveguides

We theoretically investigate the collective excitation of multiple (sub)millimeter-sized ferromagnets mediated by waveguide photons. By the position of the magnets in the waveguide, the magnon-photon coupling can be tuned to be chiral, i.e., magnons only couple with photons propagating in one direction, leading to asymmetric transfer of angular momentum and energy between the magnets. A large imbalance in the magnon number distribution over the magnets can be achieved with a long chain of magnets, which concentrate at one edge. The chain also supports standing waves with low radiation efficiency that is inert to the chirality.

preprint2019arXiv

Measurements of differential and angle-integrated cross sections for the $^{10}$B($n, α$)$^{7}$Li reaction in the neutron energy range from 1.0 eV to 2.5 MeV

Differential and angle-integrated cross sections for the $^{10}$B($n, α$)$^{7}$Li, $^{10}$B($n, α$$_{0}$)$^{7}$Li and $^{10}$B($n, α$$_{1}$)$^{7}$Li$^{*}$ reactions have been measured at the CSNS Back-n white neutron source. Two enriched (90%) $^{10}$B samples, each 5.0 cm in diameter and ~85.0 $μ$g/cm$^{2}$ in thickness with an aluminum backing, were prepared and mounted back-to-back on the sample holder. The charged particles were detected using the silicon-detector array of the Light-charged Particle Detector Array (LPDA) system. The neutron energy E$_{n}$ was determined by the time-of-flight (TOF) method, and the valid $α$ events were extracted from the E$_{n}$-Amplitude two-dimensional spectrum. With 15 silicon detectors, the differential cross sections of $α$-particles were measured from 19.2° to 160.8°. Fitted with the Legendre polynomial series, the ($n, α$) cross sections were obtained through integration. The absolute cross sections were normalized using the standard cross sections of the $^{10}$B($n, α$)$^{7}$Li reaction in the 0.3 - 0.5 MeV neutron energy region. The measured neutron energy range for the $^{10}$B($n, α$)$^{7}$Li reaction is 1.0 eV $\le$ E$_{n}$ < 2.5 MeV (67 energy points), and for the $^{10}$B($n, α$$_{0}$)$^{7}$Li and $^{10}$B($n, α$$_{1}$)$^{7}$Li$^{*}$ reactions it is 1.0 eV $\le$ E$_{n}$ < 1.0 MeV (59 energy points). The present results have been analyzed by the resonance reaction mechanism and the level structure of the $^{11}$B compound system, and compared with existing measurements and evaluations.
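The two generic steps named above, converting a time of flight into a neutron energy and integrating a Legendre series over the full solid angle, can be sketched as below. This is a minimal illustration using standard kinematics and the orthogonality of Legendre polynomials, not the authors' analysis code; the function names, the 55 m flight path in the demo and the coefficient values are assumptions chosen for illustration.

```python
import math

NEUTRON_REST_ENERGY_MEV = 939.56542     # m_n c^2
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def neutron_kinetic_energy_mev(flight_path_m: float, tof_s: float) -> float:
    """Relativistic kinetic energy E = (gamma - 1) m c^2 from path length and flight time."""
    beta = flight_path_m / (tof_s * SPEED_OF_LIGHT_M_PER_S)
    if not 0.0 < beta < 1.0:
        raise ValueError("flight time implies a non-physical speed")
    s = math.sqrt(1.0 - beta * beta)
    # gamma - 1 = beta^2 / (s * (1 + s)); this form avoids cancellation for slow neutrons.
    return NEUTRON_REST_ENERGY_MEV * beta * beta / (s * (1.0 + s))


def legendre_series(coeffs, x: float) -> float:
    """Evaluate sum_l coeffs[l] * P_l(x) with the three-term recurrence."""
    total, p_prev, p_curr = 0.0, 1.0, x
    for order, a in enumerate(coeffs):
        if order == 0:
            p = 1.0
        elif order == 1:
            p = x
        else:
            p = ((2 * order - 1) * x * p_curr - (order - 1) * p_prev) / order
            p_prev, p_curr = p_curr, p
        total += a * p
    return total


def angle_integrated(coeffs) -> float:
    """Integrate the series over 4*pi: only the l = 0 term survives, giving 4*pi*a_0."""
    return 4.0 * math.pi * coeffs[0]


if __name__ == "__main__":
    # Hypothetical inputs: 55 m flight path, 4 ms flight time, made-up coefficients.
    print(neutron_kinetic_energy_mev(55.0, 4.0e-3))
    print(angle_integrated([2.0, 0.5, 0.3]))
```

Because every Legendre polynomial of order one or higher integrates to zero over cos θ from -1 to 1, the angle-integrated value depends only on the fitted zeroth-order coefficient, which is why a fit over a limited angular range (here 19.2° to 160.8°) can still yield a full-solid-angle cross section.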

preprint2019arXiv

Microscopic mechanism of level attraction

The emerging level attraction from dissipative light-matter coupling converges the typical Rabi-splitting feature from coherent coupling and shows potential for topological information processing. However, the underlying microscopic quantum mechanism of dissipative coupling still remains unclear, which brings difficulties in quantifying and manipulating coherence-dissipation competition and thereby the flexible control of level attraction. Here, by coupling magnons to a cavity supporting both standing and travelling waves, we identify the travelling-wave state to be responsible for magnon-photon dissipative coupling. By characterizing the radiative broadening of the magnon linewidth, we quantify the coherent and dissipative coupling strengths and their competition. The effective magnon-photon coupling strength, as a net result of competition, is analytically presented in quantum theory and shows good agreement with measurements. In this manner, we extend the control dimension of level attraction by tuning the field torque on the magnetization or the global cavity geometry. Our finding opens new routes to engineer coupled harmonic oscillator systems.