Source author record

Lu Yang

Lu Yang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Computer Vision, Information Theory, math.IT, Symbolic Computation, Applications, Artificial Intelligence, eess.SP, Machine Learning, math.AP, math.DS, Neurons and Cognition

Catalog footprint

What is connected

21 works

11 topics

4 close collaborators

preprint, 2026, arXiv

Controllable Generation with Text-to-Image Diffusion Models: A Survey

In the rapidly advancing realm of visual generation, diffusion models have revolutionized the landscape, marking a significant shift in capabilities with their impressive text-guided generative functions. However, relying solely on text for conditioning these models does not fully cater to the varied and complex requirements of different applications and scenarios. Acknowledging this shortfall, a variety of studies aim to control pre-trained text-to-image (T2I) models to support novel conditions. In this survey, we undertake a thorough review of the literature on controllable generation with T2I diffusion models, covering both the theoretical foundations and practical advancements in this domain. Our review begins with a brief introduction to the basics of denoising diffusion probabilistic models (DDPMs) and widely used T2I diffusion models. We then reveal the controlling mechanisms of diffusion models, theoretically analyzing how novel conditions are introduced into the denoising process for conditional generation. Additionally, we offer a detailed overview of research in this area, organizing it into distinct categories from the condition perspective: generation with specific conditions, generation with multiple conditions, and universal controllable generation. For an exhaustive list of the controllable generation literature surveyed, please refer to our curated repository at https://github.com/PRIV-Creation/Awesome-Controllable-T2I-Diffusion-Models.

preprint, 2025, arXiv

FitControler: Toward Fit-Aware Virtual Try-On

Realistic virtual try-on (VTON) concerns not only faithful rendering of garment details but also coordination of the style. Prior art typically pursues the former, but neglects a key factor that shapes the holistic style -- garment fit. Garment fit delineates how a garment aligns with the body of a wearer and is a fundamental element in fashion design. In this work, we introduce fit-aware VTON and present FitControler, a learnable plug-in that can seamlessly integrate into modern VTON models to enable customized fit control. To achieve this, we highlight two challenges: i) how to delineate layouts of different fits and ii) how to render the garment that matches the layout. FitControler first features a fit-aware layout generator to redraw the body-garment layout conditioned on a set of delicately processed garment-agnostic representations, and a multi-scale fit injector is then used to deliver layout cues to enable layout-driven VTON. In particular, we build a fit-aware VTON dataset termed Fit4Men, including 13,000 body-garment pairs of different fits, covering both tops and bottoms, and featuring varying camera distances and body poses. Two fit consistency metrics are also introduced to assess the fitness of generations. Extensive experiments show that FitControler can work with various VTON models and achieve accurate fit control. Code and data will be released.

preprint, 2022, arXiv

A Survey on Long-Tailed Visual Recognition

The heavy reliance on data is one of the major reasons that currently limit the development of deep learning. Data quality directly dominates the effect of deep learning models, and the long-tailed distribution is one of the factors affecting data quality. The long-tailed phenomenon is prevalent due to the prevalence of power law in nature. In this case, the performance of deep learning models is often dominated by the head classes while the learning of the tail classes is severely underdeveloped. In order to learn adequately for all classes, many researchers have studied and preliminarily addressed the long-tailed problem. In this survey, we focus on the problems caused by long-tailed data distribution, sort out the representative long-tailed visual recognition datasets and summarize some mainstream long-tailed studies. Specifically, we summarize these studies into ten categories from the perspective of representation learning, and outline the highlights and limitations of each category. Besides, we have studied four quantitative metrics for evaluating the imbalance, and suggest using the Gini coefficient to evaluate the long-tailedness of a dataset. Based on the Gini coefficient, we quantitatively study 20 widely-used and large-scale visual datasets proposed in the last decade, and find that the long-tailed phenomenon is widespread and has not been fully studied. Finally, we provide several future directions for the development of long-tailed learning to provide more ideas for readers.
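The abstract above suggests the Gini coefficient as a measure of a dataset's long-tailedness. As a minimal sketch, assuming the standard Gini formula applied to per-class sample counts (the survey's exact definition may differ, and the function name `gini_coefficient` is hypothetical):

```python
from typing import Sequence


def gini_coefficient(class_counts: Sequence[float]) -> float:
    """Gini coefficient of per-class sample counts.

    Returns 0.0 for a perfectly balanced dataset and approaches 1.0
    as the samples concentrate in a few head classes.
    Assumption: this is the textbook formula, not necessarily the
    exact variant used in the survey.
    """
    counts = sorted(class_counts)  # ascending order
    n = len(counts)
    total = sum(counts)
    if n == 0 or total == 0:
        raise ValueError("need at least one class with a non-zero count")
    # G = sum_i (2i - n - 1) * x_i / (n * sum_i x_i), with i starting at 1
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(counts, start=1))
    return weighted / (n * total)


if __name__ == "__main__":
    print(gini_coefficient([10, 10, 10, 10]))  # balanced classes
    print(gini_coefficient([1, 2, 5, 50, 500]))  # long-tailed classes
```

A larger value indicates a more imbalanced class distribution, which is how the coefficient can rank datasets by long-tailedness.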

preprint, 2022, arXiv

CGN: A Capacity-Guaranteed Network Architecture for Future Ultra-Dense Wireless Systems

The sixth generation (6G) era is envisioned to be a fully intelligent and autonomous era, with physical and digital lifestyles merged together. Future wireless network architectures should provide solid support for such new lifestyles. A key question thus arises: what kind of network architecture is suitable for 6G? In this paper, we propose a capacity-guaranteed network (CGN) architecture, which provides high capacity for wireless devices densely distributed everywhere, and ensures superior scalability with low signaling overhead and computation complexity simultaneously. Our theorem proves that the essence of a CGN architecture is to decompose the whole network into non-overlapping clusters with equal cluster sum capacity. Simulation results reveal that in terms of the minimum cluster sum capacity, the proposed CGN can achieve at least 30% performance gain compared with existing base station clustering (BS-clustering) architectures. In addition, our theorem is sufficiently general and can be applied for networks with different distributions of BSs and users.

preprint, 2022, arXiv

Locality-Aware Inter-and Intra-Video Reconstruction for Self-Supervised Correspondence Learning

Our target is to learn visual correspondence from unlabeled videos. We develop LIIR, a locality-aware inter-and intra-video reconstruction framework that fills in three missing pieces, i.e., instance discrimination, location awareness, and spatial compactness, of the self-supervised correspondence learning puzzle. First, instead of most existing efforts focusing on intra-video self-supervision only, we exploit cross video affinities as extra negative samples within a unified, inter-and intra-video reconstruction scheme. This enables instance discriminative representation learning by contrasting desired intra-video pixel association against negative inter-video correspondence. Second, we merge position information into correspondence matching, and design a position shifting strategy to remove the side-effect of position encoding during inter-video affinity computation, making our LIIR location-sensitive. Third, to make full use of the spatial continuity nature of video data, we impose a compactness-based constraint on correspondence matching, yielding more sparse and reliable solutions. The learned representation surpasses self-supervised state-of-the-art methods on label propagation tasks including objects, semantic parts, and keypoints.

preprint, 2022, arXiv

Multi-Domain Joint Training for Person Re-Identification

Deep learning-based person Re-IDentification (ReID) often requires a large amount of training data to achieve good performance. Thus it appears that collecting more training data from diverse environments tends to improve the ReID performance. This paper re-examines this common belief and makes a somewhat surprising observation: using more samples, i.e., training with samples from multiple datasets, does not necessarily lead to better performance when using the popular ReID models. In some cases, training with more samples may even hurt the performance if the evaluation is carried out in one of those datasets. We postulate that this phenomenon is due to the incapability of the standard network in adapting to diverse environments. To overcome this issue, we propose an approach called Domain-Camera-Sample Dynamic network (DCSD) whose parameters can be adaptive to various factors. Specifically, we consider the internal domain-related factor that can be identified from the input features, and external domain-related factors, such as domain information or camera information. Our discovery is that training with such an adaptive model can better benefit from more training samples. Experimental results show that our DCSD can greatly boost the performance (up to 12.3%) while joint training in multiple datasets.

preprint, 2022, arXiv

The Moment Passing Method for Wireless Channel Capacity Estimation

Wireless network capacity can be regarded as the most important performance metric for wireless communication systems. With the fast development of wireless communication technology, future wireless systems will become more and more complicated. As a result, the channel gain matrix will become a large-dimensional random matrix, leading to an extremely high computational cost to obtain the capacity. In this paper, we propose a moment passing method (MPM) to realize the fast and accurate capacity estimation for future ultra-dense wireless systems. It can determine the capacity with quadratic complexity, which is optimal considering that the cost of a single matrix operation is not less than quadratic complexity. Moreover, it has high accuracy. The simulation results show that the estimation error of this method is below 2 percent. Finally, our method is highly general, as it is independent of the distributions of BSs and users, and the shape of network areas. More importantly, it can be applied not only to the conventional multi-user multiple input and multiple output (MU-MIMO) networks, but also to the capacity-centric networks designed for B5G/6G.

preprint, 2022, arXiv

TOSE: A Fast Capacity Estimation Algorithm Based on Spike Approximations

Capacity is one of the most important performance metrics for wireless communication networks. It describes the maximum rate at which information can be transmitted over a wireless communication system. To support the growing demand for wireless traffic, wireless networks are becoming more dense and complicated, making the capacity harder to derive. Unfortunately, most existing methods for the capacity calculation take a polynomial time complexity. This will become unaffordable for future ultra-dense networks, where both the number of base stations (BSs) and the number of users are extremely large. In this paper, we propose a fast algorithm TOSE to estimate the capacity for ultra-dense wireless networks. Based on the spiked model of random matrix theory (RMT), our algorithm can avoid the exact eigenvalue derivations of large dimensional matrices, which are complicated and inevitable in conventional capacity calculation methods. Instead, fast eigenvalue estimations can be realized based on the spike approximations in our TOSE algorithm. Our simulation results show that TOSE is an accurate and fast capacity approximation algorithm. Its estimation error is below 5%, and it runs in linear time, which is much lower than the polynomial time complexity of existing methods. In addition, TOSE has superior generality, since it is independent of the distributions of BSs and users, and the shape of network areas.

preprint, 2021, arXiv

On unified framework for nonlinear grey system models: an integro-differential equation perspective

Nonlinear grey system models, used for time series forecasting, are extensively applied in diverse areas of science and engineering. However, most research concerns improving classical models and developing novel models, while relatively limited attention has been paid to the relationship among diverse models and the modelling mechanism. The current paper proposes a unified framework and reconstructs the unified model from an integro-differential equation perspective. First, we propose a methodological framework that subsumes various nonlinear grey system models as special cases, providing a cumulative sum series-orientated modelling paradigm. Then, by introducing an integral operator, the unified model is reduced to an equivalent integro-differential equation; on this basis, the structural parameters and initial value are estimated simultaneously via the integral matching approach. The modelling procedure comparison further indicates that the integral matching-based integro-differential equation provides a direct modelling paradigm. Next, large-scale Monte Carlo simulations are conducted to compare the finite sample performance, and the results show that the reduced model has higher accuracy and robustness to noise. Applications of forecasting the municipal sewage discharge and water consumption in the Yangtze River Delta of China further illustrate the effectiveness of the reconstructed nonlinear grey models.

preprint, 2021, arXiv

Pluggable Weakly-Supervised Cross-View Learning for Accurate Vehicle Re-Identification

Learning cross-view consistent feature representation is the key for accurate vehicle Re-identification (ReID), since the visual appearance of vehicles changes significantly under different viewpoints. To this end, most existing approaches resort to supervised cross-view learning using extensive extra viewpoint annotations, which, however, is difficult to deploy in real applications due to the expensive labelling cost and the continuous viewpoint variation that makes it hard to define discrete viewpoint labels. In this study, we present a pluggable Weakly-supervised Cross-View Learning (WCVL) module for vehicle ReID. Through hallucinating the cross-view samples as the hardest positive counterparts in feature domain, we can learn the consistent feature representation via minimizing the cross-view feature distance based on vehicle IDs only without using any viewpoint annotation. More importantly, the proposed method can be seamlessly plugged into most existing vehicle ReID baselines for cross-view learning without re-training the baselines. To demonstrate its efficacy, we plug the proposed method into a bunch of off-the-shelf baselines and obtain significant performance improvement on four public benchmark datasets, i.e., VeRi-776, VehicleID, VRIC and VRAI.

preprint, 2020, arXiv

Efficient Scene Text Detection with Textual Attention Tower

Scene text detection has received attention for years and achieved an impressive performance across various benchmarks. In this work, we propose an efficient and accurate approach to detect multi-oriented text in scene images. The proposed feature fusion mechanism allows us to use a shallower network to reduce the computational complexity. A self-attention mechanism is adopted to suppress false positive detections. Experiments on public benchmarks including ICDAR 2013, ICDAR 2015 and MSRA-TD500 show that our proposed approach can achieve better or comparable performances with fewer parameters and less computational cost.

preprint, 2020, arXiv

Recurrent Solutions of a Nonautonomous Modified Swift-Hohenberg Equation

We consider recurrent solutions of the nonautonomous modified Swift-Hohenberg equation $$u_t+\Delta^2u+2\Delta u+au+b|\nabla u|^2+u^3=g(t,x).$$ We employ Conley index theory to show that, if the forcing $g:\mathbb{R}\rightarrow L^2(\Omega)$ is a recurrent function, then there are at least two recurrent solutions in $H_0^2(\Omega)$ under appropriate assumptions on the parameters $a$, $b$ and $g$.

preprint, 2020, arXiv

Renovating Parsing R-CNN for Accurate Multiple Human Parsing

Multiple human parsing aims to segment various human parts and associate each part with the corresponding instance simultaneously. This is a very challenging task due to the diverse human appearance, semantic ambiguity of different body parts, and complex background. Through analysis of the multiple human parsing task, we observe that human-centric global perception and accurate instance-level parsing scoring are crucial for obtaining high-quality results. However, most state-of-the-art methods have not paid enough attention to these issues. To reverse this phenomenon, we present Renovating Parsing R-CNN (RP R-CNN), which introduces a global semantic enhanced feature pyramid network and a parsing re-scoring network into the existing high-performance pipeline. The proposed RP R-CNN adopts global semantic representation to enhance multi-scale features for generating human parsing maps, and regresses a confidence score to represent its quality. Extensive experiments show that RP R-CNN performs favorably against state-of-the-art methods on CIHP and MHP-v2 datasets. Code and models are available at https://github.com/soeaver/RP-R-CNN.

preprint, 2016, arXiv

Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications

Although the latest high-end smartphone has powerful CPU and GPU, running deeper convolutional neural networks (CNNs) for complex tasks such as ImageNet classification on mobile devices is challenging. To deploy deep CNNs on mobile devices, we present a simple and effective scheme to compress the entire CNN, which we call one-shot whole network compression. The proposed scheme consists of three steps: (1) rank selection with variational Bayesian matrix factorization, (2) Tucker decomposition on kernel tensor, and (3) fine-tuning to recover accumulated loss of accuracy, and each step can be easily implemented using publicly available tools. We demonstrate the effectiveness of the proposed scheme by testing the performance of various compressed CNNs (AlexNet, VGGS, GoogLeNet, and VGG-16) on the smartphone. Significant reductions in model size, runtime, and energy consumption are obtained, at the cost of small loss in accuracy. In addition, we address the important implementation level issue on 1×1 convolution, which is a key operation of the inception module of GoogLeNet as well as CNNs compressed by our proposed scheme.

preprint, 2016, arXiv

Finding best possible constant for a polynomial inequality

Given a multivariate polynomial inequality with a parameter, how to find the best possible value of this parameter that satisfies the inequality? For instance, find the greatest number $k$ that satisfies $ a^3+b^3+c^3+ k(a^2b+b^2c+c^2a)-(k+1)(ab^2+bc^2+ca^2)\geq 0 $ for all nonnegative real numbers $ a,b,c $. Analogous problems often appeared in studies of inequalities and were dealt with by various methods. In this paper, a general algorithm is proposed for finding the required best possible constant. The algorithm can be easily implemented by computer algebra tools such as Maple.
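To make the example inequality concrete, the following is a minimal numerical sanity check, not the paper's algorithm (which is symbolic and not described here). It evaluates the cubic form from the abstract on a grid of nonnegative points; because the form is homogeneous of degree 3, sampling the unit cube is enough up to scaling. The names `cyclic_form` and `holds_on_grid` are hypothetical, and a grid check can only refute a candidate $k$, never prove it.

```python
from itertools import product


def cyclic_form(a: float, b: float, c: float, k: float) -> float:
    """Left-hand side of the example inequality from the abstract."""
    return (a**3 + b**3 + c**3
            + k * (a * a * b + b * b * c + c * c * a)
            - (k + 1) * (a * b * b + b * c * c + c * a * a))


def holds_on_grid(k: float, steps: int = 20, tol: float = 1e-12) -> bool:
    """True if the form is nonnegative (up to tol) on a grid in [0, 1]^3.

    Passing this check is only evidence, not a proof; failing it is a
    genuine counterexample to the inequality for that k.
    """
    grid = [i / steps for i in range(steps + 1)]
    return all(cyclic_form(a, b, c, k) >= -tol
               for a, b, c in product(grid, repeat=3))


if __name__ == "__main__":
    for k in (0, 1, 10):
        print(k, holds_on_grid(k))
```

Note that the form vanishes at $a=b=c$ for every $k$, so equality cases exist regardless of the parameter; the interesting question is how large $k$ can be before some other point turns the form negative.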

preprint, 2016, arXiv

Multi-User Millimeter Wave MIMO with Full-Dimensional Lens Antenna Array

Millimeter wave (mmWave) communication by utilizing lens antenna arrays is a promising technique for realizing cost-effective 5G wireless systems with large MIMO (multiple-input multiple-output) but only limited radio frequency (RF) chains. This paper studies an uplink multi-user mmWave single-sided lens MIMO system, where only the base station (BS) is equipped with a full-dimensional (FD) lens antenna array with both elevation and azimuth angle resolution capabilities, and each mobile station (MS) employs the conventional uniform planar array (UPA) without the lens. By exploiting the angle-dependent energy focusing property of the lens antenna array at the BS as well as the multi-path sparsity of mmWave channels, we propose a low-complexity path division multiple access (PDMA) scheme, which enables virtually interference-free multi-user communications when the angles of arrival (AoAs) of all MS multi-path signals are sufficiently separable at the BS. To this end, a new technique called path delay compensation is proposed at the BS to effectively transform the multi-user frequency-selective MIMO channels to parallel frequency-flat small-size MIMO channels for different MSs, for each of which the low-complexity single-carrier (SC) transmission is applied. For general scenarios with insufficient AoA separations, analog beamforming at the MSs and digital combining at the BS are jointly designed to maximize the achievable sum-rate of the MSs based on their effective MIMO channels resulting from path delay compensation. In addition, we propose a new and efficient channel estimation scheme tailored for PDMA, which requires negligible training overhead in practical mmWave systems and yet leads to performance comparable to that based on perfect channel state information (CSI).

preprint, 2015, arXiv

Index reduction of differential algebraic equations by differential algebraic elimination

High index differential algebraic equations (DAEs) are ordinary differential equations (ODEs) with constraints and arise frequently from many mathematical models of physical phenomena and engineering fields. In this paper, we generalize the idea of differential elimination with Dixon resultant to polynomially nonlinear DAEs. We propose a new algorithm for index reduction of DAEs and establish the notion of differential algebraic elimination, which can provide the differential algebraic resultant of the enlarged system of original equations. To make use of the structure of DAEs, a variable pencil technique is given to determine the termination of differentiation. Moreover, we also provide a heuristic method for removing the extraneous factors from the differential algebraic resultant. The experimentation shows that the proposed algorithm outperforms existing ones for many examples taken from the literature.

preprint, 2014, arXiv

A Successive Resultant Projection for Cylindrical Algebraic Decomposition

This note shows the equivalence of two projection operators which both can be used in cylindrical algebraic decomposition (CAD). One is known as Brown's Projection (C. W. Brown (2001)); the other was proposed by Lu Yang in his earlier work (L. Yang and S. H. Xia (2000)) that is sketched as follows: given a polynomial $f$ in $x_1,\,x_2,\,\cdots$, by $f_1$ denote the resultant of $f$ and its partial derivative with respect to $x_1$ (removing the multiple factors), by $f_2$ denote the resultant of $f_1$ and its partial derivative with respect to $x_2$ (removing the multiple factors), $\cdots$, repeat this procedure successively until the last resultant becomes a univariate polynomial. Making use of an identity, the equivalence of these two projection operators is evident.

preprint, 2014, arXiv

Synaptotagmin 7 Functions as a Ca2+-sensor for Synaptic Vesicle Replenishment

Synaptotagmin (syt) 7 is one of three syt isoforms found in all metazoans; it is ubiquitously expressed, yet its function in neurons remains obscure. Here, we resolved Ca2+-dependent and Ca2+-independent synaptic vesicle (SV) replenishment pathways, and found that syt 7 plays a selective and critical role in the Ca2+-dependent pathway. Mutations that disrupt Ca2+-binding to syt 7 abolish this function, suggesting that syt 7 functions as a Ca2+-sensor for replenishment. The Ca2+-binding protein calmodulin (CaM) has also been implicated in SV replenishment, and we found that loss of syt 7 was phenocopied by a CaM antagonist. Moreover, we discovered that syt 7 binds to CaM in a highly specific and Ca2+-dependent manner; this interaction requires intact Ca2+-binding sites within syt 7. Together, these data indicate that a complex of two conserved Ca2+-binding proteins, syt 7 and CaM, serve as a key regulator of SV replenishment in presynaptic nerve terminals.

preprint, 2013, arXiv

Deciding Nonnegativity of Polynomials by MAPLE

There have been some effective tools for solving (constant/parametric) semi-algebraic systems in Maple's library RegularChains since Maple 13. By using the functions of the library, e.g., RealRootClassification, one can prove and discover polynomial inequalities. This paper is more or less a user guide on using RealRootClassification to prove the nonnegativity of polynomials. We show by examples how to use this powerful tool to prove a polynomial is nonnegative under some polynomial inequality and/or equation constraints. Some tricks for using the tool are also provided.

preprint, 2012, arXiv

On Achievable Degrees of Freedom for MIMO X Channels

In this paper, the achievable DoF of MIMO X channels for constant channel coefficients with $M_t$ antennas at transmitter $t$ and $N_r$ antennas at receiver $r$ ($t,r=1,2$) is studied. A spatial interference alignment and cancelation scheme is proposed to achieve the maximum DoF of the MIMO X channels. The scenario of $M_1\geq M_2\geq N_1\geq N_2$ is first considered and divided into 3 cases, $3N_2<M_1+M_2<2N_1+N_2$ (Case $A$), $M_1+M_2\geq2N_1+N_2$ (Case $B$), and $M_1+M_2\leq3N_2$ (Case $C$). With the proposed scheme, it is shown that in Case $A$, the outer-bound $\frac{M_1+M_2+N_2}{2}$ is achievable; in Case $B$, the achievable DoF equals the outer-bound $N_1+N_2$ if $M_2>N_1$, otherwise it is 1/2 or 1 less than the outer-bound; in Case $C$, the achievable DoF is equal to the outer-bound $\frac{2}{3}(M_1+M_2)$ if $(3N_2-M_1-M_2)\bmod 3=0$, and it is 1/3 or 1/6 less than the outer-bound if $(3N_2-M_1-M_2)\bmod 3=1 \text{ or } 2$. In the scenario of $M_t\leq N_r$, the exact symmetrical results of DoF can be obtained.
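The case split in the abstract above can be written down directly. The sketch below classifies an antenna configuration into Case A, B, or C and returns the corresponding outer bound exactly as stated in the abstract. It does not encode which configurations actually achieve the bound, and the function name `dof_outer_bound` is hypothetical.

```python
from fractions import Fraction
from typing import Tuple


def dof_outer_bound(m1: int, m2: int, n1: int, n2: int) -> Tuple[str, Fraction]:
    """Case label and DoF outer bound for M1 >= M2 >= N1 >= N2.

    Only the outer bounds quoted in the abstract are encoded; the gaps
    between achievable DoF and the bound are not modelled here.
    """
    if not (m1 >= m2 >= n1 >= n2 >= 1):
        raise ValueError("requires M1 >= M2 >= N1 >= N2 >= 1")
    total = m1 + m2
    if total <= 3 * n2:            # Case C
        return "C", Fraction(2 * total, 3)
    if total >= 2 * n1 + n2:       # Case B
        return "B", Fraction(n1 + n2)
    return "A", Fraction(total + n2, 2)  # Case A: 3*N2 < total < 2*N1 + N2


if __name__ == "__main__":
    for config in [(5, 4, 4, 2), (4, 4, 3, 2), (3, 3, 3, 3)]:
        print(config, dof_outer_bound(*config))
```

When $N_1=N_2$ and $M_1+M_2=3N_2$, the conditions for Cases B and C hold simultaneously, but both bounds equal $2N_2$, so the order in which the cases are tested does not change the returned value.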
