Researcher profile

Wenjun Zhang

Wenjun Zhang contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging | Verification L1 | Unclaimed author
25 works
0 followers
15 topics
4 close collaborators

Published work

25 published items

preprint | 2026 | arXiv

DiT-JSCC: Rethinking Deep JSCC with Diffusion Transformers and Semantic Representations

Generative joint source-channel coding (GJSCC) has emerged as a new Deep JSCC paradigm for achieving high-fidelity and robust image transmission under extreme wireless channel conditions, such as ultra-low bandwidth and low signal-to-noise ratio. Recent studies commonly adopt diffusion models as generative decoders, but they frequently produce visually realistic results with limited semantic consistency. This limitation stems from a fundamental mismatch between reconstruction-oriented JSCC encoders and generative decoders, as the former lack explicit semantic discriminability and fail to provide reliable conditional cues. In this paper, we propose DiT-JSCC, a novel GJSCC backbone that jointly learns a semantics-prioritized representation encoder and a diffusion transformer (DiT) based generative decoder. Our open-source project aims to promote future research in GJSCC. Specifically, we design a semantics-detail dual-branch encoder that aligns naturally with a coarse-to-fine conditional DiT decoder, prioritizing semantic consistency under extreme channel conditions. Moreover, a training-free adaptive bandwidth allocation strategy inspired by Kolmogorov complexity is introduced to further improve transmission efficiency, thereby redefining the notion of information value in the era of generative decoding. Extensive experiments demonstrate that DiT-JSCC consistently outperforms existing JSCC methods in both semantic consistency and visual quality, particularly in extreme regimes.

preprint | 2026 | arXiv

Domain-Adaptive Communication-Rate Optimization for Sim-to-Real Humanoid-Robot Wireless XR Teleoperation

Wireless extended reality (XR) teleoperation provides embodied interaction capability for collecting humanoid robot demonstrations, but large-scale adoption is restricted by the overhead of high-frequency motion transmission. This paper develops a system framework that integrates sampling, transmission, interpolation, and reconstruction, and formulates a communication-rate optimization that aims to minimize communication energy while maintaining the reconstruction accuracy of robot motion trajectories through dimension-wise sampling-rate control. Since acquiring real-time feedback from physical robots is limited by hardware costs, the problem has to be solved through simulator interaction with offline real-domain data correction. To guide sim-to-real adaptation, we provide a PAC-Bayes generalization characterization that reveals the effects of latent density-ratio estimation, finite-sample deviation, and encoder bias. Building on this analysis, we propose a proximal policy optimization (PPO) method with density-ratio weighting and trust-region regularization. Experiments on a public humanoid teleoperation dataset show that the proposed method improves the tradeoff between reconstruction error and communication energy consumption under sim-to-real distribution shift. We further analyze the effectiveness of the proposed algorithm across various wireless channels and dynamic motion trajectories.
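
The sampling, interpolation, and reconstruction chain described in this abstract can be illustrated with a toy example. The sketch below is not the paper's method: it assumes uniform subsampling and linear interpolation, and the helper names (`downsample`, `reconstruct`, `mse`) and the two synthetic trajectory dimensions are hypothetical. It only shows why a per-dimension sampling rate is a sensible control variable: a slowly varying dimension tolerates aggressive subsampling, while a fast one does not.

```python
from typing import List, Sequence


def downsample(signal: Sequence[float], step: int) -> List[int]:
    """Indices kept when sending every `step`-th sample (the last sample is always kept).

    Assumes a non-empty signal and step >= 1.
    """
    idx = list(range(0, len(signal), step))
    if idx[-1] != len(signal) - 1:
        idx.append(len(signal) - 1)
    return idx


def reconstruct(signal: Sequence[float], kept: Sequence[int]) -> List[float]:
    """Rebuild the full trajectory by linear interpolation between kept samples."""
    out = [float(signal[kept[0]])] * len(signal)
    for a, b in zip(kept, kept[1:]):
        for t in range(a, b + 1):
            w = (t - a) / (b - a)
            out[t] = (1 - w) * signal[a] + w * signal[b]
    return out


def mse(x: Sequence[float], y: Sequence[float]) -> float:
    """Mean squared reconstruction error."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


# Two hypothetical motion dimensions: a slow ramp and a fast oscillation.
slow = [0.1 * t for t in range(9)]
fast = [(-1.0) ** t for t in range(9)]

for name, sig in (("slow", slow), ("fast", fast)):
    for step in (1, 2, 4):
        kept = downsample(sig, step)
        err = mse(sig, reconstruct(sig, kept))
        print(f"{name} step={step} sent={len(kept)} mse={err:.4f}")
```

In this toy setting the ramp is recovered almost exactly at every step size, whereas the oscillation loses all of its odd-indexed samples as soon as `step` is 2, which is the kind of tradeoff a learned per-dimension policy would have to balance against transmission energy.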

preprint | 2026 | arXiv

When Wires Can't Keep Up: Reconfigurable AI Data Centers Empowered by Terahertz Wireless Communications

The explosive growth of artificial intelligence (AI) workloads in modern data centers demands a radical transformation of interconnect architectures. Traditional copper and optical wiring face fundamental challenges in latency, power consumption, and rigidity, constraining the scalability of distributed AI clusters. This article introduces a vision for a Terahertz (THz) Wireless Data Center (THz-WDC) that combines ultra-broadband capacity, one-hop low-latency communication, and energy efficiency in the short-to-medium range (1-100 m). Performance and technical requirements are first articulated, including up to 1 Tbps per link, aggregate throughput up to 10 Tbps via spatial multiplexing, sub-50 ns single-hop latency, and sub-10 pJ/bit energy efficiency over 20 m. To achieve these ambitious goals, key enabling technologies are explored, including digital-twin-based orchestration, low-complexity beam manipulation technologies, all-silicon THz transceivers, and low-complexity analog baseband architectures. Moreover, as future data centers shift toward quantum and chiplet-based modular architectures, THz wireless links provide a flexible mechanism for interconnecting, testing, and reconfiguring these modules. Finally, numerical analysis is presented on the latency and power regimes of THz versus optical and copper interconnects, identifying the specific distance and throughput domains where THz links can surpass conventional wired solutions. The article concludes with a roadmap toward wireless-defined, reconfigurable, and sustainable AI data centers.
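
To put the stated targets in perspective, the energy-per-bit and data-rate figures can be combined into a link power budget, since power equals energy per bit times bits per second. The snippet below is a simple arithmetic sketch using the abstract's headline numbers as upper bounds; the function name `link_power_watts` is hypothetical and the result says nothing about how the article itself models power.

```python
def link_power_watts(energy_per_bit_pj: float, rate_tbps: float) -> float:
    """Link power in watts = (energy per bit in joules) x (bit rate in bits per second)."""
    joules_per_bit = energy_per_bit_pj * 1e-12   # pJ/bit -> J/bit
    bits_per_second = rate_tbps * 1e12           # Tbps -> bit/s
    return joules_per_bit * bits_per_second


# 10 pJ/bit at 1 Tbps per link, and at 10 Tbps aggregate throughput.
print(f"per link:  {link_power_watts(10.0, 1.0):.1f} W")
print(f"aggregate: {link_power_watts(10.0, 10.0):.1f} W")
```

At the 10 pJ/bit bound, a 1 Tbps link therefore corresponds to roughly 10 W, and 10 Tbps of aggregate throughput to roughly 100 W.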

preprint | 2025 | arXiv

Context Video Semantic Transmission with Variable Length and Rate Coding over MIMO Channels

The evolution of semantic communications has profoundly impacted wireless video transmission, whose applications are the dominant driver of modern bandwidth consumption. However, most existing schemes are predominantly optimized for simple additive white Gaussian noise or Rayleigh fading channels, neglecting the ubiquitous multiple-input multiple-output (MIMO) environments, which critically hinders practical deployment. To bridge this gap, we propose the context video semantic transmission (CVST) framework under MIMO channels. Building upon an efficient contextual video transmission backbone, CVST effectively learns a context-channel correlation map to explicitly formulate the relationships between feature groups and MIMO subchannels. Leveraging these channel-aware features, we design a multi-reference entropy coding mechanism, enabling channel state-aware variable length coding. Furthermore, CVST incorporates a checkerboard-based feature modulation strategy to achieve multiple rate points within a single trained model, thereby enhancing deployment flexibility. These innovations constitute our multi-reference variable length and rate coding (MR-VLRC) scheme. By integrating contextual transmission with MR-VLRC, CVST demonstrates substantial performance gains over various standardized separated coding methods and recent wireless video semantic communication approaches. The code is available at https://github.com/xie233333/CVST.

preprint | 2025 | arXiv

Structure-Guided Allocation of 2D Gaussians for Image Representation and Compression

Recent advances in 2D Gaussian Splatting (2DGS) have demonstrated its potential as a compact image representation with millisecond-level decoding. However, existing 2DGS-based pipelines allocate representation capacity and parameter precision largely oblivious to image structure, limiting their rate-distortion (RD) efficiency at low bitrates. To address this, we propose a structure-guided allocation principle for 2DGS, which explicitly couples image structure with both representation capacity and quantization precision, while preserving native decoding speed. First, we introduce a structure-guided initialization that assigns 2D Gaussians according to spatial structural priors inherent in natural images, yielding a localized and semantically meaningful distribution. Second, during quantization-aware fine-tuning, we propose adaptive bitwidth quantization of covariance parameters, which grants higher precision to small-scale Gaussians in complex regions and lower precision elsewhere, enabling RD-aware optimization, thereby reducing redundancy without degrading edge quality. Third, we impose a geometry-consistent regularization that aligns Gaussian orientations with local gradient directions to better preserve structural details. Extensive experiments demonstrate that our approach substantially improves both the representational power and the RD performance of 2DGS while maintaining over 1000 FPS decoding. Compared with the baseline GSImage, we reduce BD-rate by 43.44% on Kodak and 29.91% on DIV2K.

preprint | 2023 | arXiv

Excess Distortion Exponent Analysis for Semantic-Aware MIMO Communication Systems

This paper presents an analysis of the excess distortion exponent for joint source-channel coding (JSCC) in semantic-aware communication systems. By introducing an unobservable semantic source, we extend the classical results of Csiszár to semantic-aware communication systems. Both upper and lower bounds on the exponent for the discrete memoryless source-channel pair are established. Moreover, an extended achievable bound on the excess distortion exponent for MIMO systems is derived. Further analysis explores how block fading and the number of antennas influence the exponent of semantic-aware MIMO systems. Our results offer theoretical bounds on error decay performance and can guide future semantic communication systems that use joint source-channel coding.

preprint | 2022 | arXiv

Collaborative Perception for Autonomous Driving: Current Status and Future Trend

Perception is one of the crucial modules of an autonomous driving system and has made great progress recently. However, the limited ability of individual vehicles creates a bottleneck for improving perception performance. To break through the limits of individual perception, collaborative perception has been proposed, which enables vehicles to share information and perceive environments beyond line-of-sight and field-of-view. In this paper, we review related work on this promising technology, introducing the fundamental concepts, generalizing the collaboration modes, and summarizing the key ingredients and applications of collaborative perception. Finally, we discuss the open challenges and issues of this research area and suggest some potential future directions.

preprint | 2022 | arXiv

Latency-Aware Collaborative Perception

Collaborative perception has recently shown great potential to improve perception capabilities over single-agent perception. Existing collaborative perception methods usually consider an ideal communication environment. However, in practice, the communication system inevitably suffers from latency issues, causing potential performance degradation and high risks in safety-critical applications, such as autonomous driving. To mitigate the effect of this inevitable latency, from a machine learning perspective, we present the first latency-aware collaborative perception system, which actively adapts asynchronous perceptual features from multiple agents to the same time stamp, promoting the robustness and effectiveness of collaboration. To achieve such feature-level synchronization, we propose a novel latency compensation module, called SyncNet, which leverages feature-attention symbiotic estimation and time modulation techniques. Experimental results show that the proposed latency-aware collaborative perception system with SyncNet outperforms the state-of-the-art collaborative perception method by 15.6% in the communication latency scenario and keeps collaborative perception superior to single-agent perception under severe latency.

preprint | 2022 | arXiv

Learning Distilled Collaboration Graph for Multi-Agent Perception

To promote better performance-bandwidth trade-off for multi-agent perception, we propose a novel distilled collaboration graph (DiscoGraph) to model trainable, pose-aware, and adaptive collaboration among agents. Our key novelties lie in two aspects. First, we propose a teacher-student framework to train DiscoGraph via knowledge distillation. The teacher model employs an early collaboration with holistic-view inputs; the student model is based on intermediate collaboration with single-view inputs. Our framework trains DiscoGraph by constraining post-collaboration feature maps in the student model to match the correspondences in the teacher model. Second, we propose a matrix-valued edge weight in DiscoGraph. In such a matrix, each element reflects the inter-agent attention at a specific spatial region, allowing an agent to adaptively highlight the informative regions. During inference, we only need to use the student model named as the distilled collaboration network (DiscoNet). Attributed to the teacher-student framework, multiple agents with the shared DiscoNet could collaboratively approach the performance of a hypothetical teacher model with a holistic view. Our approach is validated on V2X-Sim 1.0, a large-scale multi-agent perception dataset that we synthesized using CARLA and SUMO co-simulation. Our quantitative and qualitative experiments in multi-agent 3D object detection show that DiscoNet could not only achieve a better performance-bandwidth trade-off than the state-of-the-art collaborative perception methods, but also bring more straightforward design rationale. Our code is available on https://github.com/ai4ce/DiscoNet.
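
The idea of a matrix-valued edge weight, where each element is an attention value for one spatial region, can be sketched as a per-cell softmax over agents followed by a weighted sum of their feature maps. This is only an illustration under strong assumptions: features are single-channel 2D grids, the attention scores are given rather than learned, and the function name `fuse` is hypothetical. It is not the DiscoNet implementation, which is available in the repository linked in the abstract.

```python
import math
from typing import List

Grid = List[List[float]]


def fuse(features: List[Grid], scores: List[Grid]) -> Grid:
    """Fuse per-agent feature grids with spatially varying attention.

    features[a][r][c] is agent a's feature at cell (r, c).
    scores[a][r][c] is an unnormalised attention score for the same cell.
    For each cell, scores are softmax-normalised across agents and used
    as weights, so different agents can dominate different regions.
    """
    rows, cols = len(features[0]), len(features[0][0])
    fused = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            cell_scores = [s[r][c] for s in scores]
            m = max(cell_scores)  # subtract the max for numerical stability
            exps = [math.exp(v - m) for v in cell_scores]
            z = sum(exps)
            fused[r][c] = sum(e / z * f[r][c] for e, f in zip(exps, features))
    return fused


# Two hypothetical agents observing a 1 x 2 grid.
feats = [[[1.0, 0.0]], [[3.0, 4.0]]]
attn = [[[0.0, 0.0]], [[0.0, 50.0]]]
print(fuse(feats, attn))
```

In the first cell both agents receive equal weight, so the fused value is their average; in the second cell the second agent's score dominates and its feature is passed through almost unchanged.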

preprint | 2022 | arXiv

LEO Satellite-Enabled Grant-Free Random Access with MIMO-OTFS

This paper investigates joint channel estimation and device activity detection in LEO satellite-enabled grant-free random access systems with large differential delay and Doppler shift. In addition, multiple-input multiple-output (MIMO) with orthogonal time-frequency space (OTFS) modulation is utilized to combat the dynamics of the terrestrial-satellite link. To simplify the computation, we estimate the channel tensor in parallel along the delay dimension. Then, deep learning and expectation-maximization approaches are integrated into generalized approximate message passing with a cross-correlation-based Gaussian prior to capture the channel sparsity in the delay-Doppler-angle domain and learn the hyperparameters. Finally, active devices are detected by computing the energy of the estimated channel. Simulation results demonstrate that the proposed algorithms outperform conventional methods.
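
The final step, detecting active devices from the energy of the estimated channel, amounts to a threshold test. The following sketch assumes each device's channel estimate is available as a flat list of complex coefficients and that a fixed threshold is chosen by the caller; the function name `detect_active`, the device labels, and the threshold value are all hypothetical and not taken from the paper.

```python
from typing import Dict, List


def detect_active(est_channels: Dict[str, List[complex]], threshold: float) -> List[str]:
    """Declare a device active when the energy of its estimated channel exceeds a threshold.

    Energy is the sum of squared magnitudes of the estimated coefficients.
    """
    active = []
    for device, h in est_channels.items():
        energy = sum(abs(x) ** 2 for x in h)
        if energy > threshold:
            active.append(device)
    return active


# Hypothetical estimates: dev1 is inactive, so its estimate is close to zero.
estimates = {
    "dev0": [0.9 + 0.1j, -0.7j],
    "dev1": [0.01, 0.02j],
    "dev2": [0.5, 0.5],
}
print(detect_active(estimates, threshold=0.1))
```

How the threshold is set in practice depends on the noise level and the estimator, which this sketch leaves open.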

preprint | 2022 | arXiv

Linear MIMO Precoders Design for Finite Alphabet Inputs via Model-Free Training

This paper investigates a novel method for designing linear precoders with finite alphabet inputs based on autoencoders (AE) without knowledge of the channel model. By model-free training of the autoencoder in a multiple-input multiple-output (MIMO) system, the proposed method can effectively solve the optimization problem of designing precoders that maximize the mutual information between the channel inputs and outputs when only the input-output information of the channel can be observed. Specifically, the proposed method regards the receiver and the precoder as two independent parameterized functions in the AE and alternately trains them using the exact and approximated gradient, respectively. Compared with previous precoder design methods, it removes the requirement that the explicit channel model be known. Simulation results show that the proposed method performs as well as methods that assume known channel models in terms of maximizing the mutual information and reducing the bit error rate.

preprint | 2022 | arXiv

Massive Unsourced Random Access: Exploiting Angular Domain Sparsity

This paper investigates the unsourced random access (URA) scheme to accommodate numerous machine-type users communicating with a base station equipped with multiple antennas. Existing works adopt a slotted transmission strategy to reduce system complexity; they operate under the framework of coupled compressed sensing (CCS), which concatenates an outer tree code to an inner compressed sensing code for slot-wise message stitching. We suggest that by exploiting the MIMO channel information in the angular domain, the redundancies required by the tree encoder/decoder in CCS can be removed to improve spectral efficiency, and an uncoupled transmission protocol is devised accordingly. To perform activity detection and channel estimation, we propose an expectation-maximization-aided generalized approximate message passing algorithm with a Markov random field support structure, which captures the inherent clustered sparsity structure of the angular domain channel. Then, message reconstruction in the form of a clustering decoder is performed by recognizing slot-distributed channels of each active user based on similarity. We put forward the slot-balanced K-means algorithm as the kernel of the clustering decoder, resolving constraints and collisions specific to the application scenario. Extensive simulations reveal that the proposed scheme achieves better error performance at high spectral efficiency than the CCS-based URA schemes.

preprint | 2022 | arXiv

Random Access with Massive MIMO-OTFS in LEO Satellite Communications

This paper considers joint channel estimation and device activity detection in grant-free random access systems, where a large number of Internet-of-Things devices intend to communicate with a low-earth orbit satellite in a sporadic way. In addition, massive multiple-input multiple-output (MIMO) with orthogonal time-frequency space (OTFS) modulation is adopted to combat the dynamics of the terrestrial-satellite link. We first analyze the input-output relationship of single-input single-output OTFS when both large delay and Doppler shift exist, and then extend it to grant-free random access with massive MIMO-OTFS. Next, by exploring the sparsity of the channel in the delay-Doppler-angle domain, a two-dimensional pattern coupled hierarchical prior with the sparse Bayesian learning and covariance-free method (TDSBL-CF) is developed for channel estimation. Then, the active devices are detected by computing the energy of the estimated channel. Finally, the generalized approximate message passing algorithm combined with sparse Bayesian learning and two-dimensional convolution (ConvSBL-GAMP) is proposed to reduce the computational cost of the TDSBL-CF algorithm. Simulation results demonstrate that the proposed algorithms outperform conventional methods.

preprint | 2022 | arXiv

Representation-Agnostic Shape Fields

3D shape analysis has been widely explored in the era of deep learning. Numerous models have been developed for various 3D data representation formats, e.g., MeshCNN for meshes, PointNet for point clouds and VoxNet for voxels. In this study, we present Representation-Agnostic Shape Fields (RASF), a generalizable and computation-efficient shape embedding module for 3D deep learning. RASF is implemented with a learnable 3D grid with multiple channels to store local geometry. Based on RASF, shape embeddings for various 3D shape representations (point clouds, meshes and voxels) are retrieved by coordinate indexing. While there are multiple ways to optimize the learnable parameters of RASF, we provide two effective schemes for RASF pre-training in this paper: shape reconstruction and normal estimation. Once trained, RASF becomes a plug-and-play performance booster with negligible cost. Extensive experiments on diverse 3D representation formats, networks and applications validate the universal effectiveness of the proposed RASF. Code and pre-trained models are publicly available at https://github.com/seanywang0408/RASF

preprint | 2022 | arXiv

SPARC-LDPC Coding for MIMO Massive Unsourced Random Access

A joint sparse-regression-code (SPARC) and low-density-parity-check (LDPC) coding scheme for multiple-input multiple-output (MIMO) massive unsourced random access (URA) is proposed in this paper. Different from the state-of-the-art covariance-based maximum likelihood (CB-ML) detection scheme, we first split users' messages into two parts. The former part is encoded by SPARCs and is used to recover part of the messages, the corresponding channel coefficients, and the interleaving patterns by compressed sensing. The latter part is coded by LDPC codes and then interleaved by the interleave-division multiple access (IDMA) scheme. The decoding of the latter part is based on belief propagation (BP) combined with successive interference cancellation (SIC). Numerical results show that our scheme outperforms the CB-ML scheme when the number of antennas at the base station is smaller than the number of active users. The complexity of our scheme is of order $\mathcal{O}\left(2^{B_p}ML+\widehat{K}ML\right)$, which is lower than that of the CB-ML scheme. Moreover, our scheme has higher spectral efficiency (nearly $15$ times larger) than CB-ML, as we only split messages into two parts.

preprint | 2021 | arXiv

Evidence of Potts-Nematic Superfluidity in a Hexagonal $sp^2$ Optical Lattice

As in between liquid and crystal phases lies a nematic liquid crystal, which breaks rotation with preservation of translation symmetry, there is a nematic superfluid phase bridging a superfluid and a supersolid. The nematic order also emerges in interacting electrons and has been found to largely intertwine with multi-orbital correlation in high-temperature superconductivity, where Ising nematicity arises from a four-fold rotation symmetry $C_4$ broken down to $C_2$. Here we report an observation of a three-state ($\mathbb{Z}_3$) quantum nematic order, dubbed "Potts-nematicity", in a system of cold atoms loaded in an excited band of a hexagonal optical lattice described by an $sp^2$-orbital hybridized model. This Potts-nematic quantum state spontaneously breaks a three-fold rotation symmetry of the lattice, qualitatively distinct from the Ising nematicity. Our field theory analysis shows that the Potts-nematic order is stabilized by intricate renormalization effects enabled by strong inter-orbital mixing present in the hexagonal lattice. This discovery paves a way to investigate quantum vestigial orders in multi-orbital atomic superfluids.

preprint | 2020 | arXiv

Chinese cities' air quality pattern and correlation

Air quality impacts people's health and daily life, affects sensitive ecosystems, and can even restrain a country's development. By collecting and processing Air Quality Index (AQI) time series for 363 cities in China from Jan. 2015 to Mar. 2019, we characterize the universal patterns, clustering, and correlation of air quality across cities using complex network and time series analysis methods. The main results are as follows: 1) The Air Quality Network of China (AQNC) is constructed using the Planar Maximally Filtered Graph (PMFG) method. The effect of geographical distance on the correlation of air quality between cities is studied, and 100 km is found to be a critical distance for strong correlation. 2) Seven communities of the AQNC are detected, and their patterns are analyzed by taking into account the Hurst exponent and the climate environment. The seven communities are shown to be reasonable and significantly influenced by climate factors such as monsoon, precipitation, and geographical region. 3) The motifs of the air quality time series of the seven communities are investigated using the visibility graph. For some communities, the evolutionary patterns of the motifs are fairly stable and show long-term memory effects, while for others there are no stable patterns.
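
The visibility graph mentioned in result 3) maps a time series to a network whose nodes are the time points. The abstract does not say which variant is used; the sketch below assumes the natural visibility criterion, under which two samples are linked when the straight line between them passes above every sample in between. The function name `visibility_edges` and the sample values are hypothetical.

```python
from typing import Sequence, Set, Tuple


def visibility_edges(series: Sequence[float]) -> Set[Tuple[int, int]]:
    """Edges (i, j), i < j, of the natural visibility graph of a time series.

    Samples i and j are connected when every intermediate sample k lies
    strictly below the line joining (i, series[i]) and (j, series[j]).
    """
    edges = set()
    n = len(series)
    for i in range(n):
        for j in range(i + 1, n):
            visible = all(
                series[k] < series[i] + (series[j] - series[i]) * (k - i) / (j - i)
                for k in range(i + 1, j)
            )
            if visible:
                edges.add((i, j))
    return edges


# A dip lets the endpoints see each other; a peak blocks them.
print(sorted(visibility_edges([3.0, 1.0, 2.0])))
print(sorted(visibility_edges([1.0, 3.0, 1.0])))
```

Adjacent samples are always connected because there is nothing between them to block the view. Motif statistics would then be computed on the resulting graph, a step this sketch does not cover.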

preprint | 2020 | arXiv

Cross-domain Detection via Graph-induced Prototype Alignment

Applying the knowledge of an object detector trained on a specific domain directly to a new domain is risky, as the gap between the two domains can severely degrade the model's performance. Furthermore, since different instances commonly embody distinct modal information in object detection scenarios, feature alignment between the source and target domains is hard to realize. To mitigate these problems, we propose a Graph-induced Prototype Alignment (GPA) framework to seek category-level domain alignment via elaborate prototype representations. In a nutshell, more precise instance-level features are obtained through graph-based information propagation among region proposals, and, on this basis, the prototype representation of each class is derived for category-level domain alignment. In addition, to alleviate the negative effect of class imbalance on domain adaptation, we design a Class-reweighted Contrastive Loss to harmonize the adaptation training process. Combined with Faster R-CNN, the proposed framework conducts feature alignment in a two-stage manner. Comprehensive results on various cross-domain detection tasks demonstrate that our approach outperforms existing methods by a remarkable margin. Our code is available at https://github.com/ChrisAllenMing/GPA-detection.

preprint | 2020 | arXiv

Energy-efficiency of Massive Random Access with Individual Codebook

Massive machine-type communication is one of the most representative services for future wireless networks. It aims to support massive connectivity of user equipments (UEs) that sporadically transmit small packets. In this work, we assume the number of UEs grows linearly and unboundedly with blocklength and each UE has an individual codebook. Among all UEs, an unknown subset is active and transmits a fixed number of data bits to a base station over a shared-spectrum radio link. Under these settings, we derive achievability and converse bounds on the minimum energy-per-bit for reliable random access over quasi-static fading channels with and without channel state information (CSI) at the receiver. These bounds provide energy-efficiency guidance for new schemes suited for massive random access. Simulation results indicate that the orthogonalization scheme TDMA is energy-inefficient for large values of UE density $\mu$. Besides, the multi-user interference can be perfectly cancelled when $\mu$ is below a critical threshold. In the no-CSI case, the energy-per-bit for random access is only slightly higher than that with knowledge of UE activity.

preprint | 2020 | arXiv

Implementation of a double-path multimode interferometer using a spinor Bose-Einstein condensate

We realize a double-path multimode matter wave interferometer with a spinor Bose-Einstein condensate and observe clear spatial interference fringes as well as a periodic change of the visibility in the time domain, which we refer to as time domain interference and which differs from the traditional double-path interferometer. By changing the relative phase of the two paths, we find that the spatial fringes first lose coherence and then recover. As the number of modes increases, a time domain interference signal with narrower peaks is observed, which improves the resolution of the phase measurement. We also investigate the influence of the initial phase configuration and the phase evolution rate between different modes in the two paths. With enhanced resolution, the sensitivity of interferometric measurements of physical observables can also be improved by properly assigning measurable quantities to the relative phase between the two paths.

preprint | 2020 | arXiv

Learning to Combine: Knowledge Aggregation for Multi-Source Domain Adaptation

Transferring knowledge learned from multiple source domains to a target domain is a more practical and challenging task than conventional single-source domain adaptation. Furthermore, the increase in modalities makes it more difficult to align feature distributions among multiple domains. To mitigate these problems, we propose a Learning to Combine for Multi-Source Domain Adaptation (LtC-MSDA) framework that explores interactions among domains. In a nutshell, a knowledge graph is constructed on the prototypes of various domains to realize information propagation among semantically adjacent representations. On this basis, a graph model is learned to predict query samples under the guidance of correlated prototypes. In addition, we design a Relation Alignment Loss (RAL) to facilitate the consistency of categories' relational interdependency and the compactness of features, which boosts features' intra-class invariance and inter-class separability. Comprehensive results on public benchmark datasets demonstrate that our approach outperforms existing methods by a remarkable margin. Our code is available at https://github.com/ChrisAllenMing/LtC-MSDA

preprint | 2020 | arXiv

Massive Unsourced Random Access for Massive MIMO Correlated Channels

This paper investigates massive random access for a huge number of user devices served by a base station (BS) equipped with a massive number of antennas. We consider a grant-free unsourced random access (U-RA) scheme where all users possess the same codebook and the BS aims at declaring a list of transmitted codewords and recovering the messages sent by active users. Most existing works concentrate on applying U-RA in oversimplified independent and identically distributed (i.i.d.) channels. In this paper, we consider a fairly general joint-correlated MIMO channel model with line-of-sight components for realistic outdoor wireless propagation environments. We conduct activity detection for the emitted codewords by performing an improved coordinate descent approach with a Bayesian learning automaton to solve a covariance-based maximum likelihood estimation problem. The proposed algorithm exhibits a faster convergence rate than traditional descent approaches. We further employ a coupled coding scheme to resolve the issue that the dimensions of the common codebook expand exponentially with user payload size in practical massive machine-type communication scenarios. Our simulations reveal that to achieve an error probability of 0.05 for reliable communications in correlated channels, one must pay a 0.9 to 1.3 dB penalty compared to the minimum signal-to-noise ratio needed in i.i.d. channels, provided that the BS is equipped with a sufficient number of receiving antennas.

preprint | 2020 | arXiv

Polar Coding and Sparse Spreading for Massive Unsourced Random Access

In this paper, we propose a new polar coding scheme for the unsourced, uncoordinated Gaussian random access channel. Our scheme is based on sparse spreading, treating interference as noise, and successive interference cancellation (SIC). On the transmitter side, each user randomly picks a code length and a transmit power from multiple choices according to some probability distribution to encode its message, and an interleaver to spread its encoded codeword bits across the entire transmission block. The encoding configuration of each user is transmitted by compressive sensing, similar to some previous works. On the receiver side, after recovering the encoding configurations of all users, the receiver applies single-user polar decoding and SIC to recover the message list. Numerical results show that our scheme outperforms all previous schemes for active user number $K_a\geq 250$, and provides competitive performance for $K_a\leq 225$. Moreover, our scheme has much lower complexity compared to other schemes as we only use single-user polar coding.

preprint | 2020 | arXiv

Pricing variance swaps with stochastic volatility and stochastic interest rate under full correlation structure

This paper considers the pricing of discretely-sampled variance swaps under the class of equity-interest rate hybridization. Our modeling framework consists of an equity that follows the dynamics of the Heston stochastic volatility model and a stochastic interest rate driven by the Cox-Ingersoll-Ross (CIR) process, with a full correlation structure imposed among the state variables. This full correlation structure prevents a fully analytical pricing formula for hybrid models of variance swaps, owing to the non-affine property embedded in the model itself. We address this issue by obtaining an efficient semi-closed form pricing formula of variance swaps for an approximation of the hybrid model via the derivation of characteristic functions. Subsequently, we implement numerical experiments to evaluate the accuracy of our pricing formula. Our findings confirm that the impact of the correlation between the underlying and the interest rate is significant for pricing discretely-sampled variance swaps.

preprint | 2020 | arXiv

Toward Better Understanding of Saliency Prediction in Augmented 360 Degree Videos

Augmented reality (AR) overlays digital content onto reality. In an AR system, correct and precise estimates of the user's visual fixations and head movements can enhance the quality of experience by allocating more computation resources to the areas of interest. However, there is inadequate research on understanding the visual exploration of users when using an AR system or on modeling AR visual attention. To bridge the gap between saliency prediction on real-world scenes and on scenes augmented by virtual information, we construct the ARVR saliency dataset with 12 diverse videos viewed by 20 people. The virtual reality (VR) technique is employed to simulate the real world. Annotations of object recognition and tracking are blended into the omnidirectional videos as augmented content. The saliency annotations of head and eye movements for both original and augmented videos are collected and together constitute the ARVR dataset. We also design a model that is capable of solving the saliency prediction problem in AR. Local block images are extracted to simulate the viewport and offset the projection distortion. Conspicuous visual cues in local viewports are extracted to constitute the spatial features. The optical flow information is estimated as the important temporal feature. We also consider the interplay between virtual information and reality. The composition of the augmentation information is distinguished, and the joint effects of adversarial augmentation and complementary augmentation are estimated. We generate a graph by taking each block image as one node. Both the visual saliency mechanism and the characteristics of viewing behaviors are considered in the computation of edge weights on the graph, which are interpreted as Markov chains. The fraction of the visual attention that is diverted to each block image is estimated through the equilibrium distribution of this chain.
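
The last step, reading attention fractions off the equilibrium distribution of a Markov chain built from edge weights, can be sketched as follows. This is a minimal illustration, assuming the edge weights are non-negative, every node has at least one positive outgoing weight, and the chain is irreducible and aperiodic so that power iteration converges. The function name `stationary_distribution` and the 3-node weight matrix are hypothetical and unrelated to the paper's actual features.

```python
from typing import List


def stationary_distribution(
    weights: List[List[float]], iters: int = 1000, tol: float = 1e-12
) -> List[float]:
    """Equilibrium distribution of the Markov chain induced by graph edge weights.

    Each row of `weights` is normalised into transition probabilities, then
    the distribution is propagated (pi <- pi P) until it stops changing.
    Assumes every row has a positive sum.
    """
    n = len(weights)
    P = [[w / sum(row) for w in row] for row in weights]
    pi = [1.0 / n] * n
    for _ in range(iters):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


# Hypothetical symmetric edge weights between three block images.
w = [
    [1.0, 2.0, 1.0],
    [2.0, 1.0, 3.0],
    [1.0, 3.0, 2.0],
]
print([round(p, 4) for p in stationary_distribution(w)])
```

For symmetric weights like these, the equilibrium probability of a node is proportional to its total edge weight, which gives a quick sanity check on the result: the row sums are 4, 6, and 6, so the fractions should approach 0.25, 0.375, and 0.375.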