Source author record

Bin Xia

Bin Xia appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Information Theory, math.IT, Computer Vision, cond-mat.mes-hall, Networking and Internet Architecture, Artificial Intelligence, eess.AS, eess.IV, Machine Learning, Sound

Catalog footprint

What is connected

22 works

10 topics

4 close collaborators

preprint · 2026 · arXiv

Divide and Conquer: Decoupled Representation Alignment for Multimodal World Models

Emerging multi-modal world models attempt to jointly generate videos across diverse modalities (e.g., RGB, depth, and mask), yet they fail to fully exploit the rich priors of existing foundation models. We propose $M^2$-REPA, the first representation alignment method tailored for multi-modal video generation. Our key insight is that foundation models trained on different modality spaces naturally capture distinct domain-specific priors, acting as complementary "experts." Specifically, we first decouple modality-specific features from the diffusion model's intermediate representations, then align each with its corresponding expert foundation model. To this end, we design two synergistic objectives: a multi-modal representation alignment loss that enforces feature-to-expert matching, and a modality-specific decoupling regularization that encourages complementarity across different modalities. This design enables joint optimization, fully exploiting priors from multiple foundation models. Extensive experiments demonstrate that our method significantly outperforms baselines in visual quality and long-term consistency.

preprint · 2025 · arXiv

Index-ASR Technical Report

Automatic speech recognition (ASR) has witnessed remarkable progress in recent years, largely driven by the emergence of the LLM-based ASR paradigm. Despite their strong performance on a variety of open-source benchmarks, existing LLM-based ASR systems still suffer from two critical limitations. First, they are prone to hallucination errors, often generating excessively long and repetitive outputs that are not well grounded in the acoustic input. Second, they provide limited support for flexible and fine-grained contextual customization. To address these challenges, we propose Index-ASR, a large-scale LLM-based ASR system designed to simultaneously enhance robustness and support customizable hotword recognition. The core idea of Index-ASR lies in the integration of an LLM with large-scale training data enriched with background noise and contextual information. Experimental results show that Index-ASR achieves strong performance on both open-source benchmarks and in-house test sets, highlighting its robustness and practicality for real-world ASR applications.

preprint · 2022 · arXiv

Age of Information-based Scheduling for Wireless D2D Systems with a Deep Learning Approach

Device-to-device (D2D) link scheduling that avoids excessive interference is critical to the success of wireless D2D communications. Most traditional scheduling schemes consider only the maximum throughput or fairness of the system and ignore the freshness of information. In this paper, we propose a novel D2D link scheduling scheme that jointly optimizes age of information (AoI) and throughput when D2D links transmit packets under the last-come-first-serve policy with packet replacement (LCFS-PR). It is motivated by the fact that maximum-throughput scheduling may reduce the activation probability of links with poor channel conditions, which results in poor AoI performance. Specifically, we derive expressions for the overall average AoI and throughput of the network under the spatio-temporal interfering queue dynamics with the mean-field assumption. Moreover, a neural network structure is proposed to learn the mapping from the geographic location to the optimal scheduling parameters under a stationary randomized policy, so that, once the neural network is well trained, the scheduling decision can be made without estimating the channel state information (CSI). To overcome the problem that implicit loss functions cannot be back-propagated, we derive a numerical solution of the gradient. Finally, numerical results reveal that the performance of the deep learning approach is close to that of a locally optimal algorithm with higher computational complexity. The trade-off curve of AoI and throughput is also obtained, where the AoI tends to infinity when throughput is maximized.

preprint · 2022 · arXiv

Coarse-to-Fine Embedded PatchMatch and Multi-Scale Dynamic Aggregation for Reference-based Super-Resolution

Reference-based super-resolution (RefSR) has made significant progress in producing realistic textures using an external reference (Ref) image. However, existing RefSR methods obtain high-quality correspondence matchings at a computational cost that is quadratic in the input size, limiting their application. Moreover, these approaches usually suffer from scale misalignments between the low-resolution (LR) image and the Ref image. In this paper, we propose an Accelerated Multi-Scale Aggregation network (AMSA) for Reference-based Super-Resolution, including Coarse-to-Fine Embedded PatchMatch (CFE-PatchMatch) and a Multi-Scale Dynamic Aggregation (MSDA) module. To improve matching efficiency, we design a novel Embedded PatchMatch scheme with random sample propagation, which supports end-to-end training with a computational cost that is asymptotically linear in the input size. To further reduce computational cost and speed up convergence, we apply the coarse-to-fine strategy to Embedded PatchMatch, constituting CFE-PatchMatch. To fully leverage reference information across multiple scales and enhance robustness to scale misalignment, we develop the MSDA module, consisting of Dynamic Aggregation and Multi-Scale Aggregation. The Dynamic Aggregation corrects minor scale misalignment by dynamically aggregating features, and the Multi-Scale Aggregation brings robustness to large scale misalignment by fusing multi-scale information. Experimental results show that the proposed AMSA achieves superior performance over state-of-the-art approaches on both quantitative and qualitative evaluations.

preprint · 2022 · arXiv

Efficient Non-Local Contrastive Attention for Image Super-Resolution

Non-Local Attention (NLA) brings significant improvement for Single Image Super-Resolution (SISR) by leveraging intrinsic feature correlation in natural images. However, NLA gives noisy information large weights and consumes computation resources that grow quadratically with the input size, limiting its performance and application. In this paper, we propose a novel Efficient Non-Local Contrastive Attention (ENLCA) to perform long-range visual modeling and leverage more relevant non-local features. Specifically, ENLCA consists of two parts, Efficient Non-Local Attention (ENLA) and Sparse Aggregation. ENLA adopts the kernel method to approximate the exponential function and obtains linear computational complexity. For Sparse Aggregation, we multiply inputs by an amplification factor to focus on informative features, yet the variance of the approximation increases exponentially. Therefore, contrastive learning is applied to further separate relevant and irrelevant features. To demonstrate the effectiveness of ENLCA, we build an architecture called Efficient Non-Local Contrastive Network (ENLCN) by adding a few of our modules to a simple backbone. Extensive experimental results show that ENLCN reaches superior performance over state-of-the-art approaches on both quantitative and qualitative evaluations.
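To make the "kernel method" step concrete, here is a minimal sketch of how an exponential attention kernel can be approximated with random features so that the cost grows linearly with the number of keys. It uses positive random features, for which E[exp(w·q - |q|²/2) · exp(w·k - |k|²/2)] = exp(q·k) when w is standard Gaussian. This is an assumed stand-in, not necessarily the feature map used in ENLA; the function names, `num_features` and `seed` are all hypothetical.

```python
import math
import random

def exact_attention(queries, keys, values):
    """Reference attention over scalar values: O(n^2) evaluations of exp(q.k)."""
    out = []
    for q in queries:
        w = [math.exp(sum(a * b for a, b in zip(q, k))) for k in keys]
        out.append(sum(wi * v for wi, v in zip(w, values)) / sum(w))
    return out

def feature_map(x, projections):
    """Positive random features: phi(x)_j = exp(w_j.x - |x|^2 / 2) / sqrt(m)."""
    half_sq = 0.5 * sum(a * a for a in x)
    scale = 1.0 / math.sqrt(len(projections))
    return [scale * math.exp(sum(a * b for a, b in zip(w, x)) - half_sq)
            for w in projections]

def linear_attention(queries, keys, values, num_features=2000, seed=0):
    """Approximate attention; num_features and seed are illustrative choices."""
    rng = random.Random(seed)
    dim = len(queries[0])
    projections = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                   for _ in range(num_features)]
    # Key-side summaries are accumulated once, so the cost is linear in len(keys).
    kv = [0.0] * num_features
    ksum = [0.0] * num_features
    for k, v in zip(keys, values):
        for j, f in enumerate(feature_map(k, projections)):
            kv[j] += f * v
            ksum[j] += f
    out = []
    for q in queries:
        phi_q = feature_map(q, projections)
        num = sum(f * s for f, s in zip(phi_q, kv))
        den = sum(f * s for f, s in zip(phi_q, ksum))
        out.append(num / den)
    return out
```

The estimator's variance grows quickly with the norms of the inputs, which is consistent with the abstract's remark that amplifying the inputs makes the approximation variance increase exponentially.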

preprint · 2022 · arXiv

Polar Transformation Based Multiple Instance Learning Assisting Weakly Supervised Image Segmentation With Loose Bounding Box Annotations

This study investigates weakly supervised image segmentation using loose bounding box supervision. It presents a multiple instance learning strategy based on polar transformation to assist image segmentation when loose bounding boxes are employed as supervision. In this strategy, a weighted smooth maximum approximation is introduced to incorporate the observation that pixels closer to the origin of the polar transformation are more likely to belong to the object in the bounding box. The proposed approach was evaluated on a public medical dataset using the Dice coefficient. The results demonstrate its superior performance. The code is available at https://github.com/wangjuan313/wsis-polartransform.
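As an illustration of what a weighted smooth maximum can look like, the sketch below implements a weighted alpha-softmax over the scores in one bag (for example, pixel predictions along one ray of the polar image). The abstract does not give the exact formula, so the functional form, the name `weighted_alpha_softmax`, and the idea of passing larger weights for pixels nearer the origin are assumptions made for illustration.

```python
import math

def weighted_alpha_softmax(scores, weights, alpha):
    """Smooth maximum: sum_i w_i x_i e^(alpha x_i) / sum_i w_i e^(alpha x_i).

    alpha = 0 gives the weighted mean; large alpha approaches the maximum
    score among entries with positive weight.
    """
    shift = max(scores)  # subtract the max so the exponentials cannot overflow
    terms = [w * math.exp(alpha * (x - shift)) for x, w in zip(scores, weights)]
    return sum(t * x for t, x in zip(terms, scores)) / sum(terms)

# Hypothetical bag of pixel scores, with weights that decay away from the origin.
bag = [0.1, 0.9, 0.4]
near_origin_weights = [1.0, 0.2, 0.6]
smooth_max = weighted_alpha_softmax(bag, near_origin_weights, alpha=5.0)
```

Unlike a hard maximum, this expression is differentiable in every score, so all pixels in the bag receive a gradient during training.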

preprint · 2022 · arXiv

SCS-Co: Self-Consistent Style Contrastive Learning for Image Harmonization

Image harmonization aims to achieve visual consistency in composite images by adapting a foreground to make it compatible with a background. However, existing methods use only the real image as the positive sample to guide the training, and at most introduce the corresponding composite image as a single negative sample for an auxiliary constraint. This provides limited distortion knowledge and leaves the solution space too large, so the generated harmonized image is distorted. Moreover, none of them jointly constrains the foreground self-style and the foreground-background style consistency, which exacerbates the problem. In addition, recent region-aware adaptive instance normalization achieves great success but considers only the global background feature distribution, which biases the aligned foreground feature distribution. To address these issues, we propose a self-consistent style contrastive learning scheme (SCS-Co). By dynamically generating multiple negative samples, SCS-Co can learn more distortion knowledge and regularize the generated harmonized image in the style representation space from two aspects, the foreground self-style and the foreground-background style consistency, leading to a more photorealistic visual result. In addition, we propose a background-attentional adaptive instance normalization (BAIN) to achieve an attention-weighted background feature distribution according to the foreground-background feature similarity. Experiments demonstrate the superiority of our method over other state-of-the-art methods in both quantitative comparison and visual analysis.

preprint · 2016 · arXiv

Cache-enabled Uplink Transmission in Wireless Small Cell Networks

In the era of social networking, people increasingly produce and upload user-generated content to the Internet via wireless networks, placing a significant burden on wireless uplink networks. In this paper, we contribute to the design and theoretical understanding of wireless cache-enabled upload transmission in a delay-tolerant small cell network to relieve this burden, and then propose the corresponding scheduling policies for the small base station (SBS) under infinite and finite cache sizes. Specifically, the caching ability of the SBS enables it to eliminate the redundancy among the contents uploaded by users. This strategy not only alleviates the wireless backhaul traffic congestion from the SBS to a macro base station (MBS), but also improves the transmission efficiency of the SBS. We then investigate the scheduling schemes of the SBS to offload more data traffic under a cache size constraint. Moreover, two operational regions for the wireless cache-enabled upload network, namely the delay-limited region and the cache-limited region, are established to reveal the fundamental tradeoff between the delay tolerance and the caching ability. Finally, numerical results are provided to demonstrate the significant performance gains of the proposed wireless cache-enabled upload network.

preprint · 2016 · arXiv

Interference Cancellation at Receivers in Cache-Enabled Wireless Networks

In this paper, we propose to exploit the limited cached packets as side information to cancel incoming interference at the receiver side. We consider a stochastic network where the random locations of base stations and users are modeled using Poisson point processes. Caching schemes that reap both the local caching gain and the interference cancellation gain for the users are developed based on two factors: the density of different user subsets and the packets cached in the corresponding subsets. The packet loss rate (PLR) is analyzed, which depends on both the cached packets and the channel state information (CSI) available at the receiver. Theoretical results reveal the tradeoff between caching resources and wireless resources. The performance of different caching schemes is analyzed and the minimum achievable PLR for distributed caching is derived.

preprint · 2016 · arXiv

Modeling and Analysis for Cache-enabled Cognitive D2D Communications in Cellular Networks

This paper focuses on applying cognition to cache-enabled device-to-device (D2D) communication underlaying a multi-channel cellular network. D2D pairs communicate directly by sensing the available cellular channels, bypassing the base station (BS). Dynamic service is considered and the network performance is evaluated using stochastic geometry. Node locations are first modeled as mutually independent Poisson Point Processes, and the service queueing process is formulated. Then the corresponding tier association and cognitive access protocol are developed. The delay and the queue length at the BS and the D2D transmitter are further elaborated by modeling the traffic dynamics of request arrivals and departures as a discrete-time multiserver queue with priorities. Moreover, the impacts of the physical layer and content-centric features on the system performance are jointly investigated to provide valuable insight.

preprint · 2016 · arXiv

Modeling and Analysis for Cache-Enabled Networks with Dynamic Traffic

Instead of assuming fully loaded cells in the stochastic-geometry analysis of cache-enabled networks, we focus on dynamic traffic in this letter. By modeling the traffic dynamics of request arrivals and departures, the probabilities of full-, free-, and modest-load cells in the large-scale cache-enabled network are elaborated based on the traffic queue state. Moreover, we propose to exploit the packets cached at cache-enabled users as side information to cancel the incoming interference. Then the packet loss rates for both the cache-enabled and cache-untenable users are investigated. The simulation results verify our analysis.

preprint · 2016 · arXiv

Opportunistic Channel Sharing in Stochastic Networks with Dynamic Traffic

In this paper, we consider a stochastic network with dynamic traffic. The spatial distributions of access points (APs) and users are first modeled as mutually independent Poisson point processes (PPPs). Unlike most previous literature, which assumes that all the APs are fully loaded, we account for the fact that APs with no data to transmit do not generate interference to users. The APs opportunistically share the channel according to whether a packet is waiting to be transmitted and the proposed interference suppression strategy. In the interference suppression region, only one AP can be active at a time to transmit a packet on the channel, and the other adjacent APs keep silent to reduce serious interference. The idle probability of any AP, influenced by the traffic load and the availability of the channels, is analyzed. The density of simultaneously active APs in the network is obtained and the packet loss rate is further elaborated. We reveal the impacts of network features (e.g., AP density, user density and channel state) and service features (e.g., user request, packet size) on the network performance. Simulation results validate our proposed model.

preprint · 2016 · arXiv

Performance Analysis for Training-Based Multi-Pair Two-Way Full-Duplex Relaying with Massive Antennas

This paper considers a multi-pair two-way amplify-and-forward relaying system, where multiple pairs of full-duplex users are served via a full-duplex relay with massive antennas, and the relay adopts maximum-ratio combining/maximum-ratio transmission (MRC/MRT) processing. The orthogonal pilot scheme and the least squares method are first used to estimate the channel state information (CSI). When the number of relay antennas is finite, we derive an approximate sum rate expression which is shown to be a good predictor of the ergodic sum rate, especially with a large number of antennas. Then the corresponding achievable rate expression is obtained by adopting another pilot scheme, which estimates the composite CSI for each user pair to reduce the pilot overhead of channel estimation. We analyze the achievable rates of the two pilot schemes and then show the relative merits of the two methods. Furthermore, power allocation strategies for users and the relay are proposed based on sum rate maximization and the max-min fairness criterion, respectively. Finally, numerical results verify the accuracy of the analytical results and show the performance gains achieved by the proposed power allocation.

preprint · 2015 · arXiv

Analysis on Cache-enabled Wireless Heterogeneous Networks

Caching popular multimedia content is a promising way to unleash the ultimate potential of wireless networks. In this paper, we propose and analyze cache-based content delivery in a three-tier heterogeneous network (HetNet) that includes base stations (BSs), relays and device-to-device (D2D) pairs. We advocate proactively caching the popular contents in the relays and in the subset of users with caching ability when the network is off-peak. The cached contents can be reused for frequent access to offload the cellular network traffic. The node locations are first modeled as mutually independent Poisson Point Processes (PPPs) and the corresponding content access protocol is developed. The average ergodic rate and outage probability in the downlink are then analyzed theoretically. We further derive the throughput and the delay based on the multiclass processor-sharing queue model and the continuous-time Markov process. According to the critical condition of the steady state in the HetNet, the maximum traffic load and the global throughput gain are investigated. Moreover, the impacts of some key network characteristics on the system performance, e.g., the heterogeneity of multimedia contents, node densities and the limited caching capacities, are elaborated to provide valuable insight.

preprint · 2015 · arXiv

Iterative detection and decoding for SCMA systems with LDPC codes

Sparse code multiple access (SCMA) is a promising multiplexing approach to achieve high system capacity. In this paper, we develop a novel iterative detection and decoding scheme for SCMA systems combined with low-density parity-check (LDPC) decoding. In particular, we decompose the output of the message passing algorithm (MPA) based SCMA multiuser detection into an intrinsic part and a prior part. Then we design a joint detection and decoding scheme which iteratively exchanges the intrinsic information between the detector and the decoder, yielding a satisfactory performance gain. Moreover, the proposed scheme has almost the same complexity as the traditional receiver for LDPC-coded SCMA systems. As numerical results demonstrate, the proposed scheme has a substantial gain over the traditional SCMA receiver on AWGN channels and Rayleigh fading channels.

preprint · 2015 · arXiv

Optimal Caching Placement for D2D Assisted Wireless Caching Networks

In this paper, we devise the optimal caching placement to maximize the offloading probability for a two-tier wireless caching system, where the helpers and a subset of users have caching ability. The offloading comes from local caching, D2D sharing and helper transmission. In particular, to maximize the offloading probability, we reformulate the caching placement problem for users and helpers as a difference of convex (DC) problem, which can be effectively solved by DC programming. Moreover, we analyze the two extreme cases where there is only a helper-tier caching network or only a user-tier one. Specifically, the placement problem for the helper-tier caching network is reduced to a convex problem and can be effectively solved by the classical water-filling method. We notice that users and helpers prefer to cache popular contents under low node density and prefer to cache different contents evenly under high node density. Simulation results indicate a great performance gain of the proposed caching placement over existing approaches.
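For readers unfamiliar with water-filling, the sketch below shows the classical form of the method on the textbook power allocation problem: maximize the sum of log(1 + p_i / n_i) subject to a total budget. It is not the paper's caching placement objective, which the abstract does not spell out; `noise_levels`, `total_power` and the bisection search on the water level are illustrative choices.

```python
def water_filling(noise_levels, total_power, iterations=100):
    """Classical water-filling: p_i = max(0, level - n_i) with sum(p_i) = total_power.

    The water level is found by bisection, since the power used is a
    non-decreasing function of the level.
    """
    lo = min(noise_levels)                 # at this level no power is used
    hi = max(noise_levels) + total_power   # at this level at least the budget is used
    for _ in range(iterations):
        level = 0.5 * (lo + hi)
        used = sum(max(0.0, level - n) for n in noise_levels)
        if used > total_power:
            hi = level
        else:
            lo = level
    level = 0.5 * (lo + hi)
    return [max(0.0, level - n) for n in noise_levels]

# With noise levels 1, 2, 3 and a budget of 2, the water level settles at 2.5,
# so the noisiest channel receives nothing.
allocation = water_filling([1.0, 2.0, 3.0], 2.0)
```

The same structure, a single threshold above which resources are assigned, is what makes the method attractive once a placement problem has been reduced to a convex one.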

preprint · 2015 · arXiv

Simplified Multiuser Detection for SCMA with Sum-Product Algorithm

Sparse code multiple access (SCMA) is a novel non-orthogonal multiple access technique, which fully exploits the shaping gain of multi-dimensional codewords. However, the lack of a simplified multiuser detection algorithm prevents further implementation because of the inherently high computational complexity. In this paper, general SCMA detection algorithms based on the sum-product algorithm are elaborated. Then two improved algorithms are proposed, which simplify the detection structure and reduce the number of exponent operations in the logarithm domain. Furthermore, to analyze these detection algorithms fairly, we derive a theoretical expression for the average mutual information (AMI) of SCMA (SCMA-AMI), and employ a statistical method to calculate SCMA-AMI based on a specific detection algorithm. Simulation results show that the performance is almost as good as that of the basic message passing algorithm in terms of both BER and AMI, while the complexity is significantly decreased compared to the traditional Max-Log approximation method.
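The Max-Log approximation named as the baseline can be illustrated in a few lines. In the logarithm domain, combining two messages requires log(e^a + e^b); the Jacobian logarithm computes this exactly as a maximum plus a correction term, and Max-Log simply drops the correction. The sketch below shows only this standard identity; it does not reproduce the two improved algorithms proposed in the paper, and the function names are hypothetical.

```python
import math

def max_star(a, b):
    """Exact log(e^a + e^b) via the Jacobian logarithm."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

def max_log(a, b):
    """Max-Log approximation: keep the maximum, drop the correction term."""
    return max(a, b)

# The approximation error is the dropped term, which lies in (0, log 2]
# and is largest when the two inputs are equal.
error = max_star(1.0, 2.0) - max_log(1.0, 2.0)
```

Each exact combination costs one exponential and one logarithm, which is why reducing exponent operations matters for detector complexity.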

preprint · 2015 · arXiv

When ICN Meets C-RAN for HetNets: An SDN Approach

In this paper, we propose and elaborate on the integration of ICN, C-RAN and SDN for the HetNet to achieve a win-win situation. The vision of the proposed system is presented, followed by its advantages and challenges. We further present the hybrid system with a large-scale wireless heterogeneous campus network.

preprint · 2013 · arXiv

Beta-Ag2Te: A topological insulator with strong anisotropy

We present evidence of topological surface states in beta-Ag2Te through first-principles calculations and a periodic quantum interference effect in single-crystalline nanoribbons. Our first-principles calculations show that beta-Ag2Te is a topological insulator with a gapless, strongly anisotropic Dirac cone. To experimentally probe the topological surface state, we synthesized high-quality beta-Ag2Te nanoribbons and performed electron transport measurements. The coexistence of pronounced Aharonov-Bohm oscillations and weak Altshuler-Aronov-Spivak oscillations clearly demonstrates coherent electron transport around the perimeter of the beta-Ag2Te nanoribbon and therefore the existence of metallic surface states, which is further supported by the temperature dependence of resistivity for beta-Ag2Te nanoribbons with different cross-sectional areas. The highly anisotropic topological surface state of beta-Ag2Te makes it a promising material for fundamental study and future spintronic devices.

preprint · 2013 · arXiv

Surface dominated transport in single crystalline nanoflake devices of topological insulator Bi1.5Sb0.5Te1.8Se1.2

We report experimental evidence of surface dominated transport in single crystalline nanoflake devices of topological insulator Bi1.5Sb0.5Te1.8Se1.2. The resistivity measurements show a dramatic difference between the nanoflake devices and the bulk single crystal. The resistivity and Hall analysis based on a two-channel model indicates that a surface transport contribution of ~99% can be realized in 200 nm thick BSTS nanoflake devices. Using a standard bottom gate with SiO2 as the dielectric layer, a pronounced ambipolar electric field effect was observed in devices fabricated with flakes 100 to 200 nm thick. Moreover, the angle-dependent magnetoresistances of a nanoflake device with a thickness of 596 nm are fitted to a universal curve for the perpendicular component of the applied magnetic field. The value of the phase coherence length obtained from 2D weak antilocalization fitting further confirms the surface dominated transport. Our results open a path to the realization of novel electric and spintronic devices based on the topological helical surface states.

preprint · 2013 · arXiv

Wireless Information and Power Transfer in Two-Way Amplify-and-Forward Relaying Channels

The spread of wireless networks has made ambient radio frequency signals available around the world. Wireless information and power transfer enables devices to recycle energy from these ambient radio frequency signals and process information simultaneously. In this paper, we develop a wireless information and power transfer protocol in two-way amplify-and-forward relaying channels, where two sources exchange information via an energy harvesting relay node. The relay node collects energy from the received signals and uses it to supply the transmission power for forwarding the received signals. We analytically derive the exact expressions of the outage probability, the ergodic capacity and the finite-SNR diversity-multiplexing trade-off (DMT). Furthermore, tight closed-form upper and lower bounds of the outage probability and the ergodic capacity are developed. Moreover, the impact of the power splitting ratio is evaluated and analyzed. Finally, we show that, compared to the non-cooperative relaying scheme, the proposed protocol is a green solution that offers a higher transmission rate and more reliable communication without consuming additional resources.

preprint · 2012 · arXiv

Temperature-dependent terahertz conductivity of topological insulator Bi$_{1.5}$Sb$_{0.5}$Te$_{1.8}$Se$_{1.2}$

Using Terahertz Time-Domain Spectroscopy, we study the temperature-dependent complex optical conductivity of the topological insulator, Bi$_{1.5}$Sb$_{0.5}$Te$_{1.8}$Se$_{1.2}$ single-crystal from 5 K to 150 K in the terahertz regime (0.4 -- 3.0 THz). We analyze our experimental results using the Drude-Lorentz model, with the Drude component representing the metallic surface state and the Lorentz term representing the bulk insulating state. We find the conductivity to be dominated by the Drude contribution, suggesting the presence of metallic surface states. The low-frequency real conductivity follows a thermally-activated behavior. Its origin is also discussed.
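The abstract does not state the fitted expression. For orientation, one common way to write a Drude-Lorentz complex conductivity is shown below, assuming the $e^{-i\omega t}$ time convention and neglecting the high-frequency background term; the symbols ($\omega_{pD}$ and $\gamma_D$ for the Drude plasma frequency and scattering rate, $\omega_{pj}$, $\omega_{0j}$ and $\gamma_j$ for the strength, centre frequency and linewidth of the $j$-th Lorentz oscillator) are assumptions and may differ from the paper's notation.

$$
\sigma(\omega) = \epsilon_0 \left[ \frac{\omega_{pD}^2}{\gamma_D - i\omega} + \sum_j \frac{\omega_{pj}^2\,\omega}{i\left(\omega_{0j}^2 - \omega^2\right) + \omega\gamma_j} \right]
$$

In this form the first term is the free-carrier (Drude) response that the abstract attributes to the metallic surface state, and the sum collects the bound (Lorentz) response attributed to the insulating bulk.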
