Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
24 works
0 followers
13 topics
4 close collaborators

Published work

24 published item(s)

preprint, 2026, arXiv

Efficient Beam Selection for ISAC in Cell-Free Massive MIMO via Digital Twin-Assisted Deep Reinforcement Learning

Beamforming enhances signal strength and quality by focusing energy in specific directions. This capability is particularly crucial in cell-free integrated sensing and communication (ISAC) systems, where multiple distributed access points (APs) collaborate to provide both communication and sensing services. In this work, we first derive the distribution of joint target detection probabilities across multiple receiving APs under false alarm rate constraints, and then formulate the beam selection procedure as a Markov decision process (MDP). We establish a deep reinforcement learning (DRL) framework, in which reward shaping and sinusoidal embedding are introduced to facilitate agent learning. To eliminate the high costs and associated risks of real-time agent-environment interactions, we further propose a novel digital twin (DT)-assisted offline DRL approach. Different from traditional online DRL, a conditional generative adversarial network (cGAN)-based DT module, operating as a replica of the real world, is meticulously designed to generate virtual state-action transition pairs and enrich data diversity, enabling offline adjustment of the agent's policy. Additionally, we address the out-of-distribution issue by incorporating an extra penalty term into the loss function design. The convergence of agent-DT interaction and the upper bound of the Q-error function are theoretically derived. Numerical results demonstrate the remarkable performance of our proposed approach, which significantly reduces online interaction overhead while maintaining effective beam selection across diverse conditions including strict false alarm control, low signal-to-noise ratios, and high target velocities.

preprint, 2024, arXiv

Entropy-based Probing Beam Selection and Beam Prediction via Deep Learning

Hierarchical beam search in mmWave communications incurs substantial training overhead, necessitating deep learning-enabled beam predictions to effectively leverage channel priors and mitigate this overhead. In this study, we introduce a comprehensive probabilistic model of power distribution in beamspace, and formulate the joint optimization problem of probing beam selection and probabilistic beam prediction as an entropy minimization problem. Then, we propose a greedy scheme to iteratively and alternately solve this problem, where a transformer-based beam predictor is trained to estimate the conditional power distribution based on the probing beams and user location within each iteration, and the trained predictor selects an unmeasured beam that minimizes the entropy of remaining beams. To further reduce the number of interactions and the computational complexity of the iterative scheme, we propose a two-stage probing beam selection scheme. Firstly, probing beams are selected from a location-specific codebook designed by an entropy-based criterion, and predictions are made with corresponding feedback. Secondly, the optimal beam is identified using additional probing beams with the highest predicted power values. Simulation results demonstrate the superiority of the proposed schemes compared to hierarchical beam search and beam prediction with uniform probing beams.

preprint, 2023, arXiv

Secure Communication for Spatially Correlated RIS-Aided Multiuser Massive MIMO Systems: Analysis and Optimization

This letter investigates the secure communication in a reconfigurable intelligent surface (RIS)-aided multiuser massive multiple-input multiple-output (MIMO) system exploiting artificial noise (AN). We first derive a closed-form expression of the ergodic secrecy rate under spatially correlated MIMO channels. By using this derived result, we further optimize the power fraction of AN in closed form and the RIS phase shifts by developing a gradient-based algorithm, which requires only statistical channel state information (CSI). Our analysis shows that spatial correlation at the RIS provides an additional dimension for optimizing the RIS phase shifts. Numerical simulations validate the analytical results which show the insightful interplay among the system parameters and the degradation of secrecy performance due to high spatial correlation at the RIS.

preprint, 2022, arXiv

An Unsupervised Deep Unrolling Framework for Constrained Optimization Problems in Wireless Networks

In wireless networks, optimization problems generally have complex constraints and are usually solved with traditional optimization methods that have high computational complexity and must be executed repeatedly as the network environment changes. In this paper, to overcome these shortcomings, an unsupervised deep unrolling framework based on projected gradient descent, i.e., the unrolled PGD network (UPGDNet), is designed to solve a family of constrained optimization problems. The set of constraints is divided into two categories according to the coupling relations among optimization variables and the convexity of constraints. One category includes convex constraints with decoupling among optimization variables, and the other includes non-convex or convex constraints with coupling among optimization variables. Then, the first category of constraints is directly projected onto the feasible region, while the second category of constraints is projected onto the feasible region using a neural network. Finally, an unrolled sum rate maximization network (USRMNet) is designed based on UPGDNet to solve the weighted sum rate (SR) maximization problem for the multiuser ultra-reliable low latency communication system. Numerical results show that USRMNet achieves comparable performance with low computational complexity and an acceptable generalization ability in terms of the user distribution.
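As a rough illustration of the classical iteration that a network such as UPGDNet unrolls, the sketch below runs plain projected gradient descent on a toy problem: minimize ||x - c||^2 subject to ||x||_2 <= 1. The objective, the Euclidean-ball constraint, the step size, and the function names are all assumptions chosen for illustration. None of them come from the paper, and no neural network or learned projection is involved.

```python
import numpy as np

def project_onto_ball(x, radius):
    """Euclidean projection onto {x : ||x||_2 <= radius}."""
    norm = np.linalg.norm(x)
    if norm <= radius:
        return x
    return x * (radius / norm)

def projected_gradient_descent(grad, project, x0, step, num_iters):
    """Repeat: take a gradient step, then project back onto the feasible set."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        x = project(x - step * grad(x))
    return x

# Toy problem (hypothetical): minimize ||x - c||^2 subject to ||x||_2 <= 1.
c = np.array([3.0, 4.0])
x_star = projected_gradient_descent(
    grad=lambda x: 2.0 * (x - c),
    project=lambda x: project_onto_ball(x, 1.0),
    x0=np.zeros(2),
    step=0.1,
    num_iters=200,
)
```

Because c lies outside the unit ball, the constrained minimizer is the point of the ball closest to c, namely c / ||c||. In an unrolled variant, each loop iteration would become one network layer with its own trainable step size.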

preprint, 2022, arXiv

Belief-selective Propagation Detection for MIMO Systems

Compared to linear MIMO detectors, the Belief Propagation (BP) detector has shown greater capability in achieving near-optimal performance and is better suited to cooperating iteratively with channel decoders. Aiming at real applications, recent works mainly reduce the complexity through simplified calculations, at the expense of performance. However, the complexity remains unsatisfactory, as it either grows exponentially or requires exponentiation operations. Furthermore, due to the inherent loopy structure, existing BP detectors persistently encounter an error floor in the high signal-to-noise ratio (SNR) region, which becomes even worse with calculation approximation. This work proposes a revised BP detector, named the Belief-selective Propagation (BsP) detector, which selectively uses the trusted incoming messages with sufficiently large a priori probabilities for updates. Two proposed strategies, symbol-based truncation (ST) and edge-based simplification (ES), reduce the complexity (to orders of magnitude lower than the Original-BP) while greatly relieving the error floor issue over a wide range of antenna and modulation combinations. For the $16$-QAM $8 \times 4$ MIMO system, the $\mathcal{B}(1,1)$ BsP detector achieves more than $4$ dB performance gain (at $\text{BER}=10^{-4}$) with roughly $4$ orders of magnitude lower complexity than the Original-BP detector. The trade-off between performance and complexity for different application requirements can be conveniently obtained by configuring the ST and ES parameters.

preprint, 2022, arXiv

Cross-Layer Optimization: Joint User Scheduling and Beamforming Design With QoS Support in Joint Transmission Networks

User scheduling and beamforming design are two crucial yet coupled topics for wireless communication systems. They are usually optimized separately with conventional optimization methods. In this paper, a novel cross-layer optimization problem is considered, in which user scheduling and beamforming are jointly optimized subject to per-user quality of service (QoS) requirements and the maximum allowable transmit power for multicell multiuser joint transmission networks. To this end, a mixed discrete-continuous combinatorial optimization problem is investigated with the aim of maximizing the sum rate of the communication system. To circumvent the original non-convex problem with a dynamic solution space, we first transform it into an optimization problem with 0-1 integer and continuous variables, and then obtain a tractable form with continuous variables by exploiting the characteristics of the 0-1 constraint. Finally, the scheduled users and the optimized beamforming vectors are simultaneously calculated by an alternating optimization algorithm. We also theoretically prove that the base stations allocate zero power to the unscheduled users. Furthermore, two heuristic optimization algorithms are proposed, based respectively on brute-force search and greedy search. Numerical results validate the effectiveness of our proposed methods, and the optimization approach gets relatively balanced results compared with the other two approaches.

preprint, 2022, arXiv

Efficient Joint DOA and TOA Estimation for Indoor Positioning with 5G Picocell Base Stations

The ubiquity, large bandwidth, and spatial diversity of the fifth generation (5G) cellular signal render it a promising candidate for accurate positioning in indoor environments where the global navigation satellite system (GNSS) signal is absent. In this paper, a joint angle and delay estimation (JADE) scheme is designed for 5G picocell base stations (gNBs) which addresses two crucial issues to make it both effective and efficient in realistic indoor environments. Firstly, the direction-dependence of the array modeling error for picocell gNBs as well as its impact on JADE is revealed. This error is mitigated by fitting the array response measurements to a vector-valued function and pre-calibrating the ideal steering vector with the fitted function. Secondly, based on the deployment reality that 5G picocell gNBs only have a small-scale antenna array but have a large signal bandwidth, the proposed scheme decouples the estimation of time-of-arrival (TOA) and direction-of-arrival (DOA) to reduce the huge complexity induced by two-dimensional joint processing. It employs the iterative-adaptive-approach (IAA) to resolve multipath signals in the TOA domain, followed by a conventional beamformer (CBF) to retrieve the desired line-of-sight DOA. By further exploiting a dimension-reducing pre-processing module and accelerating spectrum computing by fast Fourier transforms, an efficient implementation is achieved for real-time JADE. Numerical simulations demonstrate the superiority of the proposed method in terms of DOA estimation accuracy. Field tests show that a triangulation positioning error of 0.44 m is achieved in 90% of cases using only DOAs estimated at two separated receiving points.

preprint, 2022, arXiv

Energy Efficient Beamforming Optimization for Integrated Sensing and Communication

This paper investigates the optimization of beamforming design in a system with integrated sensing and communication (ISAC), where the base station (BS) sends signals for simultaneous multiuser communication and radar sensing. We aim at maximizing the energy efficiency (EE) of the multiuser communication while guaranteeing the sensing requirement in terms of individual radar beampattern gains. The problem is a complicated nonconvex fractional program that is challenging to solve. By appropriately reformulating the problem and then applying the techniques of successive convex approximation (SCA) and semidefinite relaxation (SDR), we propose an iterative algorithm to address this problem. We rigorously prove that the relaxation introduced by the SDR is tight. Numerical results validate the effectiveness of the proposed algorithm.

preprint, 2022, arXiv

Experimental Performance Evaluation of Cell-free Massive MIMO Systems Using COTS RRU with OTA Reciprocity Calibration and Phase Synchronization

Downlink coherent multiuser transmission is an essential technique for cell-free massive multiple-input multiple-output (MIMO) systems, and the availability of channel state information (CSI) at the transmitter is a basic requirement. To avoid CSI feedback in a time-division duplex system, the uplink channel parameters should be calibrated to obtain the downlink CSI due to the radio frequency circuit mismatch of the transceiver. In this paper, a design of a reference signal for over-the-air reciprocity calibration is proposed. The reference signals, generated in the frequency domain, can make full use of the flexible frame structure of the fifth generation (5G) new radio and can be completely transparent to commercial off-the-shelf (COTS) remote radio units (RRUs) and commercial user equipment. To further calibrate multiple RRUs, an interleaved RRU grouping with a genetic algorithm is proposed, and an averaged Argos calibration algorithm is also presented. We develop a cell-free massive MIMO prototype system with COTS RRUs, demonstrate the statistical characteristics of the calibration error and the effectiveness of the calibration algorithm, and evaluate the impact of the calibration delay on the different cooperative transmission schemes.

preprint, 2022, arXiv

Joint DoA-Range Estimation Using Space-Frequency Virtual Difference Coarray

In this paper, we address the problem of joint direction-of-arrival (DoA) and range estimation using a frequency diverse coprime array (FDCA). By incorporating the coprime array structure and coprime frequency offsets, a two-dimensional space-frequency virtual difference coarray corresponding to a uniform array and uniform frequency offset is considered to increase the number of degrees-of-freedom (DoFs). However, the reconstruction of the doubly-Toeplitz covariance matrix is computationally prohibitive. To solve this problem, we propose an interpolation algorithm based on decoupled atomic norm minimization (DANM), which converts the coarray signal to a simple matrix form. On this basis, a relaxation-based optimization problem is formulated to achieve joint DoA-range estimation with enhanced DoFs. The reconstructed coarray signal enables the application of existing subspace-based spectral estimation methods. The proposed DANM problem is further reformulated as an equivalent rank-minimization problem, which is solved by cyclic rank minimization. This approach avoids the approximation errors introduced in the nuclear norm-based approach, thereby achieving a superior root-mean-square error that is closer to the Cramér-Rao bound. The effectiveness of the proposed method is confirmed by theoretical analyses and numerical simulations.

preprint, 2022, arXiv

Learnable Model-Driven Performance Prediction and Optimization for Imperfect MIMO System: Framework and Application

State-of-the-art schemes for performance analysis and optimization of multiple-input multiple-output systems generally experience degradation or even become invalid in dynamic complex scenarios with unknown interference and channel state information (CSI) uncertainty. To adapt to these challenging settings and better accomplish these network auto-tuning tasks, we propose a generic learnable model-driven framework in this paper. To explain how the proposed framework works, we consider regularized zero-forcing precoding as a usage instance and design a light-weight neural network for refined prediction of sum rate and detection error based on coarse model-driven approximations. Then, we estimate the CSI uncertainty on the learned predictor in an iterative manner and, on this basis, optimize the transmit regularization term and subsequent receive power scaling factors. A deep unfolded projected gradient descent based algorithm is proposed for power scaling, which achieves a favorable trade-off between convergence rate and robustness.
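For readers unfamiliar with the usage instance named above, the following is a minimal sketch of the textbook regularized zero-forcing (RZF) precoder, W = H^H (H H^H + alpha I)^{-1}, scaled to a total power budget. It is not the paper's learnable framework: the function name `rzf_precoder`, the parameter names `alpha` and `total_power`, the channel dimensions, and the random channel are all assumptions made for illustration.

```python
import numpy as np

def rzf_precoder(H, alpha, total_power=1.0):
    """Regularized zero-forcing precoder for a K x M downlink channel H.

    Textbook form: W = H^H (H H^H + alpha I)^{-1}, rescaled so that
    ||W||_F^2 equals total_power. alpha = 0 reduces to plain zero-forcing.
    """
    K = H.shape[0]
    gram = H @ H.conj().T + alpha * np.eye(K)
    W = H.conj().T @ np.linalg.inv(gram)
    W *= np.sqrt(total_power) / np.linalg.norm(W)  # Frobenius norm
    return W

# Hypothetical setup: 4 single-antenna users, 8 transmit antennas.
rng = np.random.default_rng(0)
H = (rng.standard_normal((4, 8)) + 1j * rng.standard_normal((4, 8))) / np.sqrt(2)

W_zf = rzf_precoder(H, alpha=0.0)
W_rzf = rzf_precoder(H, alpha=0.5)

# Effective channel H @ W: diagonal for ZF, residual interference for RZF.
G_zf = H @ W_zf
G_rzf = H @ W_rzf
```

The regularization term `alpha` is the kind of scalar the abstract describes optimizing: larger values tolerate some inter-user interference in exchange for better conditioning of the inverse.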

preprint, 2022, arXiv

Learning-Aided Beam Prediction in mmWave MU-MIMO Systems for High-Speed Railway

The problem of beam alignment and tracking in high mobility scenarios such as high-speed railway (HSR) becomes extremely challenging, since large overhead cost and significant time delay are introduced for fast time-varying channel estimation. To tackle this challenge, we propose a learning-aided beam prediction scheme for HSR networks, which predicts the beam directions and the channel amplitudes within a period of future time with fine time granularity, using a group of observations. Concretely, we transform the problem of high-dimensional beam prediction into a two-stage task, i.e., a low-dimensional parameter estimation and a cascaded hybrid beamforming operation. In the first stage, the location and speed of a certain terminal are estimated by the maximum likelihood criterion, and a data-driven data fusion module is designed to improve the final estimation accuracy and robustness. Then, the probable future beam directions and channel amplitudes are predicted, based on the HSR scenario priors including deterministic trajectory, motion model, and channel model. Furthermore, we incorporate a learnable non-linear mapping module into the overall beam prediction to allow non-linear tracks. Both of the proposed learnable modules are model-based and have good interpretability. Compared to the existing beam management scheme, the proposed beam prediction has (near) zero overhead cost and time delay. Simulation results verify the effectiveness of the proposed scheme.

preprint, 2022, arXiv

Representation Learning of Knowledge Graph for Wireless Communication Networks

With the application of fifth-generation wireless communication technologies, more smart terminals are being used and are generating huge amounts of data, which has prompted extensive research on how to handle and utilize these wireless data. Researchers currently focus on upper-layer application data or on intelligent transmission methods for a specific problem, based on large amounts of data generated by Monte Carlo simulations. This article aims to understand the endogenous relationship of wireless data by constructing a knowledge graph according to the wireless communication protocols and domain expert knowledge, and to further investigate wireless endogenous intelligence. We first construct a knowledge graph of the endogenous factors of wireless core network data collected via a 5G/B5G testing network. Then, a novel model based on graph convolutional neural networks is designed to learn the representation of the graph, which is used to classify graph nodes and simulate the relation prediction. The proposed model realizes automatic node classification and network anomaly cause tracing. It is also applied to public datasets in an unsupervised manner. Finally, the results show that the classification accuracy of the proposed model is better than that of existing unsupervised graph neural network models, such as VGAE and ARVGE.

preprint, 2022, arXiv

Spatiotemporal 2-D Channel Coding for Very Low Latency Reliable MIMO Transmission

To fully support vertical industries, 5G and its corresponding channel coding are expected to meet the requirements of different applications. However, for applications of 5G and beyond 5G (B5G) such as URLLC, the transmission latency is required to be much shorter than that in eMBB. Therefore, the resulting channel code length reduces drastically. In this case, traditional 1-D channel coding suffers severe performance degradation and fails to deliver strong reliability with very low latency. To remove this bottleneck, a new channel coding scheme beyond the existing 1-D one is urgently needed. By making full use of the spatial freedom of massive MIMO systems, this paper proposes a spatiotemporal 2-D channel coding scheme for very low latency reliable transmission. For a very short time-domain code length $N^{\text{time}}=16$, a $64 \times 128$ MIMO system employing the proposed spatiotemporal 2-D coding scheme shows more than $3$ dB performance gain at $\text{FER}=10^{-3}$, compared to the 1-D time-domain channel coding. It is noted that the proposed coding scheme is suitable for different channel codes and enjoys high flexibility to adapt to different scenarios. By appropriately selecting the code rate, code length, and the number of codewords in the time and space domains, the proposed coding scheme can achieve a good trade-off between the transmission latency and reliability.

preprint, 2022, arXiv

Unsupervised Recurrent Federated Learning for Edge Popularity Prediction in Privacy-Preserving Mobile Edge Computing Networks

Nowadays wireless communication is rapidly reshaping entire industry sectors. In particular, mobile edge computing (MEC) as an enabling technology for industrial Internet of things (IIoT) brings powerful computing/storage infrastructure closer to the mobile terminals and, thereby, significantly lowers the response latency. To reap the benefit of proactive caching at the network edge, precise knowledge of the popularity pattern among the end devices is essential. However, the complex and dynamic nature of the content popularity over space and time as well as the data-privacy requirements in many IIoT scenarios pose tough challenges to its acquisition. In this article, we propose an unsupervised and privacy-preserving popularity prediction framework for MEC-enabled IIoT. The concepts of local and global popularities are introduced and the time-varying popularity of each user is modelled as a model-free Markov chain. On this basis, a novel unsupervised recurrent federated learning (URFL) algorithm is proposed to predict the distributed popularity while achieving privacy preservation and unsupervised training. Simulations indicate that the proposed framework can enhance the prediction accuracy in terms of a reduced root-mean-squared error by up to $60.5\%-68.7\%$. Additionally, manual labeling and violation of users' data privacy are both avoided.

preprint, 2021, arXiv

Distributed Reinforcement Learning for Privacy-Preserving Dynamic Edge Caching

Mobile edge computing (MEC) is a prominent computing paradigm which expands the application fields of wireless communication. Due to the limitation of the capacities of user equipments and MEC servers, edge caching (EC) optimization is crucial to the effective utilization of the caching resources in MEC-enabled wireless networks. However, the dynamics and complexities of content popularities over space and time as well as the privacy preservation of users pose significant challenges to EC optimization. In this paper, a privacy-preserving distributed deep deterministic policy gradient (P2D3PG) algorithm is proposed to maximize the cache hit rates of devices in the MEC networks. Specifically, we consider the fact that content popularities are dynamic, complicated and unobservable, and formulate the maximization of cache hit rates on devices as distributed problems under the constraints of privacy preservation. In particular, we convert the distributed optimizations into distributed model-free Markov decision process problems and then introduce a privacy-preserving federated learning method for popularity prediction. Subsequently, a P2D3PG algorithm is developed based on distributed reinforcement learning to solve the distributed problems. Simulation results demonstrate the superiority of the proposed approach in improving EC hit rate over the baseline methods while preserving user privacy.

preprint, 2021, arXiv

Hybrid Policy Learning for Energy-Latency Tradeoff in MEC-Assisted VR Video Service

Virtual reality (VR) promises to fundamentally transform a broad spectrum of industry sectors and the way humans interact with virtual content. However, despite unprecedented progress, current networking and computing infrastructures are unable to unlock VR's full potential. In this paper, we consider delivering the wireless multi-tile VR video service over a mobile edge computing (MEC) network. The primary goal is to minimize the system latency/energy consumption and to arrive at a tradeoff thereof. To this end, we first cast the time-varying view popularity as a model-free Markov chain to effectively capture its dynamic characteristics. After jointly assessing the caching and computing capacities on both the MEC server and the VR playback device, a hybrid policy is then implemented to coordinate the dynamic caching replacement and the deterministic offloading, so as to fully utilize the system resources. The underlying multi-objective problem is reformulated as a partially observable Markov decision process, and a deep deterministic policy gradient algorithm is proposed to iteratively learn its solution, where a long short-term memory neural network is embedded to continuously predict the dynamics of the unobservable popularity. Simulation results demonstrate the superiority of the proposed scheme in achieving a trade-off between the energy efficiency and the latency reduction over the baseline methods.

preprint, 2021, arXiv

Learning Rate Optimization for Federated Learning Exploiting Over-the-air Computation

Federated learning (FL) as a promising edge-learning framework can effectively address the latency and privacy issues by featuring distributed learning at the devices and model aggregation in the central server. In order to enable efficient wireless data aggregation, over-the-air computation (AirComp) has recently been proposed and attracted immediate attention. However, fading of wireless channels can produce aggregate distortions in an AirComp-based FL scheme. To combat this effect, the concept of dynamic learning rate (DLR) is proposed in this work. We begin our discussion by considering the multiple-input-single-output (MISO) scenario, since the underlying optimization problem is convex and has a closed-form solution. We then extend our studies to the more general multiple-input-multiple-output (MIMO) case, for which an iterative method is derived. Extensive simulation results demonstrate the effectiveness of the proposed scheme in reducing the aggregate distortion and guaranteeing the testing accuracy using the MNIST and CIFAR10 datasets. In addition, we present the asymptotic analysis and give a near-optimal receive beamforming design solution in closed form, which is verified by numerical simulations.

preprint, 2021, arXiv

Rank Minimization-based Toeplitz Reconstruction for DoA Estimation Using Coprime Array

In this paper, we address the problem of direction finding using a coprime array, which is one of the most preferred sparse array configurations. Motivated by the fact that non-uniform element spacing hinders full utilization of the underlying information in the received signals, we propose a direction-of-arrival (DoA) estimation algorithm based on low-rank reconstruction of the Toeplitz covariance matrix. The atomic-norm representation of the measurements from the interpolated virtual array is considered, and the equivalent dual-variable rank minimization problem is formulated and solved using a cyclic optimization approach. The recovered covariance matrix enables the application of conventional subspace-based spectral estimation algorithms, such as MUSIC, to achieve enhanced DoA estimation performance. The estimation performance of the proposed approach, in terms of the degrees-of-freedom and spatial resolution, is examined. We also show the superiority of the proposed method over competing approaches in the root-mean-square error sense.
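To make the final subspace step concrete, here is a minimal sketch of conventional MUSIC applied to a covariance matrix. It assumes a plain uniform linear array with half-wavelength spacing and an exactly known covariance, so it does not reproduce the coprime-array interpolation or the rank-minimization reconstruction described above. The function names, array size, source angles, and noise level are all hypothetical.

```python
import numpy as np

def steering_vector(theta_deg, num_sensors, spacing=0.5):
    """ULA steering vector; spacing is given in wavelengths."""
    n = np.arange(num_sensors)
    return np.exp(2j * np.pi * spacing * n * np.sin(np.deg2rad(theta_deg)))

def music_doa(R, num_sources, grid_deg):
    """Return the num_sources grid angles with the largest MUSIC peaks."""
    num_sensors = R.shape[0]
    # eigh sorts eigenvalues in ascending order, so the first
    # (num_sensors - num_sources) eigenvectors span the noise subspace.
    _, eigvecs = np.linalg.eigh(R)
    noise = eigvecs[:, : num_sensors - num_sources]
    A = np.stack([steering_vector(t, num_sensors) for t in grid_deg], axis=1)
    denom = np.sum(np.abs(noise.conj().T @ A) ** 2, axis=0)
    spectrum = 1.0 / (denom + 1e-12)  # small constant avoids division by zero
    # Keep local maxima only, then pick the strongest ones.
    interior = np.arange(1, len(grid_deg) - 1)
    is_peak = (spectrum[interior] > spectrum[interior - 1]) & (
        spectrum[interior] >= spectrum[interior + 1]
    )
    peaks = interior[is_peak]
    best = peaks[np.argsort(spectrum[peaks])[-num_sources:]]
    return np.sort(grid_deg[best])

# Hypothetical scenario: 8-sensor ULA, two uncorrelated unit-power sources,
# and the exact covariance (signal plus white noise), not a sample estimate.
true_doas = np.array([-20.0, 30.0])
M = 8
A_true = np.stack([steering_vector(t, M) for t in true_doas], axis=1)
R = A_true @ A_true.conj().T + 0.1 * np.eye(M)

grid = np.linspace(-90.0, 90.0, 1801)
estimated = music_doa(R, num_sources=2, grid_deg=grid)
```

With a sample covariance from finitely many snapshots, the peaks would be finite and slightly shifted. In the paper's setting, the matrix passed to a routine like `music_doa` would be the reconstructed Toeplitz covariance of the interpolated virtual array.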

preprint, 2021, arXiv

True-data Testbed for 5G/B5G Intelligent Network

Future beyond fifth-generation (B5G) and sixth-generation (6G) mobile communications will shift from facilitating interpersonal communications to supporting the Internet of Everything (IoE), where intelligent communications with full integration of big data and artificial intelligence (AI) will play an important role in improving network efficiency and providing high-quality service. As a rapidly evolving paradigm, AI-empowered mobile communications demand large amounts of data acquired from real network environments for systematic test and verification. Hence, we build the world's first true-data testbed for 5G/B5G intelligent network (TTIN), which comprises 5G/B5G on-site experimental networks, data acquisition & data warehouse, and AI engine & network optimization. In the TTIN, true network data acquisition, storage, standardization, and analysis are available, which enable system-level online verification of B5G/6G-oriented key technologies and support data-driven network optimization through the closed-loop control mechanism. This paper elaborates on the system architecture and module design of TTIN. Detailed technical specifications and some of the established use cases are also showcased.

preprint, 2020, arXiv

Achievable Rate Region of MISO Interference Channel Aided by Intelligent Reflecting Surface

This paper investigates the achievable rate region of the multiple-input single-output (MISO) interference channel aided by intelligent reflecting surfaces (IRSs). We exploit the additional design degree of freedom provided by the coordinated IRSs to enhance the desired signal and suppress interference so as to enlarge the achievable rate region of the interference channel. To this end, we jointly optimize the active transmit beamforming at the transmitters and passive reflective beamforming at the IRSs, subject to the constant modulus constraints of reflective beamforming vectors. To address the non-convex optimization problem, we propose an iterative algorithm to optimize the transmit beamforming via second-order cone program (SOCP) and the reflective beamforming via the semi-definite relaxation (SDR). Numerical results demonstrate that the performance of the IRS-aided interference channel with the proposed algorithm can significantly outperform the conventional interference channel without IRS.

preprint, 2020, arXiv

Attention Mechanism Enhanced Kernel Prediction Networks for Denoising of Burst Images

Deep learning based image denoising methods have been extensively investigated. In this paper, attention mechanism enhanced kernel prediction networks (AME-KPNs) are proposed for burst image denoising, in which nearly cost-free attention modules are adopted to first refine the feature maps and then make full use of the inter-frame and intra-frame redundancies within the whole image burst. The proposed AME-KPNs output per-pixel spatially-adaptive kernels, residual maps and corresponding weight maps, in which the predicted kernels roughly restore clean pixels at their corresponding locations via an adaptive convolution operation, and subsequently, residuals are weighted and summed to compensate for the limited receptive field of the predicted kernels. Simulations and real-world experiments are conducted to illustrate the robustness of the proposed AME-KPNs in burst image denoising.

preprint, 2020, arXiv

BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation

Instance segmentation is one of the fundamental vision tasks. Recently, fully convolutional instance segmentation methods have drawn much attention as they are often simpler and more efficient than two-stage approaches like Mask R-CNN. To date, almost all such approaches fall behind the two-stage Mask R-CNN method in mask precision when models have similar computation complexity, leaving great room for improvement. In this work, we achieve improved mask prediction by effectively combining instance-level information with semantic information with lower-level fine-granularity. Our main contribution is a blender module which draws inspiration from both top-down and bottom-up instance segmentation approaches. The proposed BlendMask can effectively predict dense per-pixel position-sensitive instance features with very few channels, and learn attention maps for each instance with merely one convolution layer, thus being fast in inference. BlendMask can be easily incorporated with the state-of-the-art one-stage detection frameworks and outperforms Mask R-CNN under the same training schedule while being 20% faster. A light-weight version of BlendMask achieves 34.2% mAP at 25 FPS evaluated on a single 1080Ti GPU card. Because of its simplicity and efficacy, we hope that our BlendMask could serve as a simple yet strong baseline for a wide range of instance-wise prediction tasks. Code is available at https://git.io/AdelaiDet