Researcher profile

Yanan Li

Yanan Li contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 (Emerging) · Verification L1 · Unclaimed author
8 works
0 followers
7 topics
4 close collaborators

Published work

8 published items

preprint · 2026 · arXiv

Crowded Video Individual Counting Informed by Social Grouping and Spatial-Temporal Displacement Priors

Video Individual Counting (VIC) is a recently introduced task aiming to estimate pedestrian flux from a video. It extends Video Crowd Counting (VCC) beyond the per-frame pedestrian count. In contrast to VCC that learns to count pedestrians across frames, VIC must identify co-existent pedestrians between frames, which turns out to be a correspondence problem. Existing VIC approaches, however, can underperform in congested scenes such as metro commuting. To address this, we build WuhanMetroCrowd, one of the first VIC datasets that characterize crowded, dynamic pedestrian flows. It features sparse-to-dense density levels, short-to-long video clips, slow-to-fast flow variations, front-to-back appearance changes, and light-to-heavy occlusions. To better adapt VIC approaches to crowds, we rethink the nature of VIC and recognize two informative priors: i) the social grouping prior, which indicates that pedestrians tend to gather in groups, and ii) the spatial-temporal displacement prior, which states that an individual cannot physically teleport. The former inspires us to relax the standard one-to-one (O2O) matching used by VIC to one-to-many (O2M) matching, implemented by an implicit context generator and an O2M matcher; the latter facilitates the design of a displacement prior injector, which strengthens not only O2M matching but also feature extraction and model training. These designs jointly form a novel and strong VIC baseline, OMAN++. Extensive experiments show that OMAN++ not only outperforms state-of-the-art VIC baselines on the standard SenseCrowd, CroHD, and MovingDroneCrowd benchmarks, but also shows a clear advantage in crowded scenes, with a 38.12% error reduction on our WuhanMetroCrowd dataset. Code, data, and pretrained models are available at https://github.com/tiny-smart/OMAN.

preprint · 2026 · arXiv

DepthCropSeg++: Scaling a Crop Segmentation Foundation Model With Depth-Labeled Data

DepthCropSeg++: a foundation model for crop segmentation, capable of segmenting different crop species in open in-field environments. Crop segmentation is a fundamental task for modern agriculture, which closely relates to many downstream tasks such as plant phenotyping, density estimation, and weed control. In the era of foundation models, a number of generic large language and vision models have been developed. These models have demonstrated remarkable real-world generalization due to significant model capacity and large-scale datasets. However, current crop segmentation models mostly learn from limited data due to the expensive cost of pixel-level labelling, often performing well only for specific crop types or in controlled environments. In this work, we follow the vein of our previous work DepthCropSeg, an almost unsupervised approach to crop segmentation, to scale up a cross-species and cross-scene crop segmentation dataset, with 28,406 images across 30+ species and 15 environmental conditions. We also build upon the state-of-the-art semantic segmentation architecture ViT-Adapter, enhance it with dynamic upsampling for improved detail awareness, and train the model with a two-stage self-training pipeline. To systematically validate model performance, we conduct comprehensive experiments to justify the effectiveness and generalization capabilities across multiple crop datasets. Results demonstrate that DepthCropSeg++ achieves 93.11% mIoU on a comprehensive testing set, outperforming both supervised baselines and general-purpose vision foundation models like the Segment Anything Model (SAM) by significant margins (+0.36% and +48.57% respectively). The model particularly excels in challenging scenarios including night-time environments (86.90% mIoU), high-density canopies (90.09% mIoU), and unseen crop varieties (90.09% mIoU), indicating a new state of the art for crop segmentation.
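The abstract above reports results as mIoU (mean intersection-over-union). As a reading aid, here is a minimal sketch of how that metric is commonly computed on flattened label masks. The function name `mean_iou`, the toy masks, and the choice to skip classes absent from both masks are assumptions made for illustration; the paper's own evaluation protocol is not described here and may differ.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean intersection-over-union, averaged over classes present in pred or target."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = []
    for c in range(num_classes):
        # Pixels labelled c in both masks (intersection) or in either mask (union).
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # assumption: classes absent from both masks are skipped
            ious.append(inter / union)
    if not ious:
        raise ValueError("no class from range(num_classes) appears in either mask")
    return sum(ious) / len(ious)


# Toy flattened masks (hypothetical labels): 0 = background, 1 = crop.
pred = [0, 1, 1, 0, 1, 0]
target = [0, 1, 0, 0, 1, 1]
score = mean_iou(pred, target, num_classes=2)
print(f"mIoU = {score:.2f}")
```

In this toy case each class has an intersection of 2 pixels and a union of 4, so both per-class IoU values are 0.5 and so is their mean.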

preprint · 2025 · arXiv

FitControler: Toward Fit-Aware Virtual Try-On

Realistic virtual try-on (VTON) concerns not only faithful rendering of garment details but also coordination of the style. Prior art typically pursues the former, but neglects a key factor that shapes the holistic style -- garment fit. Garment fit delineates how a garment aligns with the body of a wearer and is a fundamental element in fashion design. In this work, we introduce fit-aware VTON and present FitControler, a learnable plug-in that can seamlessly integrate into modern VTON models to enable customized fit control. To achieve this, we highlight two challenges: i) how to delineate layouts of different fits and ii) how to render the garment that matches the layout. FitControler first features a fit-aware layout generator to redraw the body-garment layout conditioned on a set of delicately processed garment-agnostic representations, and a multi-scale fit injector is then used to deliver layout cues to enable layout-driven VTON. In particular, we build a fit-aware VTON dataset termed Fit4Men, including 13,000 body-garment pairs of different fits, covering both tops and bottoms, and featuring varying camera distances and body poses. Two fit consistency metrics are also introduced to assess the fitness of generations. Extensive experiments show that FitControler can work with various VTON models and achieve accurate fit control. Code and data will be released.

preprint · 2022 · arXiv

Continuity of the attractors in time-dependent spaces and applications

In this paper, we investigate the continuity of the attractors in time-dependent phase spaces. (i) We establish two abstract criteria on the upper semicontinuity and the residual continuity of the pullback $\mathscr D$-attractor with respect to perturbations, and an equivalence criterion between their continuity and the pullback equi-attraction, which generalize the continuity theory of attractors developed recently in [27,28] to time-dependent spaces. (ii) We propose the notion of a pullback $\mathscr D$-exponential attractor, which includes the notion of a time-dependent exponential attractor [33] as a special case, and establish its existence and Hölder continuity criterion via the quasi-stability method introduced originally by Chueshov and Lasiecka [12,13]. (iii) We apply the above-mentioned criteria to the semilinear damped wave equations with perturbed time-dependent speed of propagation: $\varepsilon\rho(t) u_{tt}+\alpha u_t-\Delta u+f(u)=g$, with perturbation parameter $\varepsilon\in(0, 1]$, to realize the above-mentioned continuity of pullback $\mathscr D$-attractors and $\mathscr D$-exponential attractors in time-dependent phase spaces; the method developed here makes it possible to overcome the difficulty caused by the hyperbolicity of the model. These results deepen and extend the recent theory of attractors in time-dependent spaces in the literature [15,20,19].

preprint · 2022 · arXiv

Dual Path Structural Contrastive Embeddings for Learning Novel Objects

Learning novel classes from very few labeled samples has attracted increasing attention in machine learning. Recent research on both meta-learning-based and transfer-learning-based paradigms demonstrates that learning a good feature space can be an effective way to achieve favorable performance on few-shot tasks. In this paper, we propose a simple but effective paradigm that decouples the tasks of learning feature representations and classifiers and only learns the feature embedding architecture from base classes via the typical transfer-learning training strategy. To maintain both the generalization ability across base and novel classes and the discrimination ability within each class, we propose a dual path feature learning scheme that effectively combines structural similarity with contrastive feature construction. In this way, both inner-class alignment and inter-class uniformity can be well balanced, resulting in improved performance. Experiments on three popular benchmarks show that when incorporated with a simple prototype-based classifier, our method can still achieve promising results for both standard and generalized few-shot problems in either an inductive or transductive inference setting.
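The abstract above mentions pairing the learned embedding with "a simple prototype based classifier". A minimal sketch of one common form of such a classifier follows: each class prototype is the mean of its support embeddings, and a query is assigned to the nearest prototype by Euclidean distance. The function names, the 2-D toy embeddings, and the distance choice are assumptions for illustration, not the paper's implementation.

```python
import math
from typing import Dict, List, Sequence


def class_prototypes(support: Dict[str, List[Sequence[float]]]) -> Dict[str, List[float]]:
    """Average the support embeddings of each class into one prototype vector."""
    protos = {}
    for label, vecs in support.items():
        dim = len(vecs[0])
        protos[label] = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
    return protos


def classify(query: Sequence[float], protos: Dict[str, List[float]]) -> str:
    """Return the label whose prototype is closest to the query (Euclidean distance)."""
    return min(protos, key=lambda label: math.dist(query, protos[label]))


# Hypothetical 2-shot support set with 2-D embeddings from some frozen feature extractor.
support = {
    "cat": [[1.0, 0.0], [0.8, 0.2]],
    "dog": [[0.0, 1.0], [0.2, 0.8]],
}
protos = class_prototypes(support)
print(classify([0.7, 0.3], protos))
```

Because the classifier has no trainable parameters, classification quality in this setup depends entirely on the embedding, which is consistent with the abstract's emphasis on feature learning.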

preprint · 2022 · arXiv

Towards Efficient and Stable K-Asynchronous Federated Learning with Unbounded Stale Gradients on Non-IID Data

Federated learning (FL) is an emerging privacy-preserving paradigm that enables multiple participants to collaboratively train a global model without uploading raw data. Considering the heterogeneous computing and communication capabilities of different participants, asynchronous FL can avoid the straggler effect in synchronous FL and adapts to scenarios with a vast number of participants. Both staleness and non-IID data in asynchronous FL reduce the model utility. However, there exists an inherent contradiction between the solutions to the two problems. That is, mitigating the staleness requires selecting fewer but consistent gradients, while coping with non-IID data demands more comprehensive gradients. To address the dilemma, this paper proposes a two-stage weighted $K$-asynchronous FL with adaptive learning rate (WKAFL). By selecting consistent gradients and adjusting the learning rate adaptively, WKAFL utilizes stale gradients and mitigates the impact of non-IID data, which can achieve multifaceted enhancement in training speed, prediction accuracy and training stability. We also present the convergence analysis for WKAFL under the assumption of unbounded staleness to understand the impact of staleness and non-IID data. Experiments implemented on both benchmark and synthetic FL datasets show that WKAFL has better overall performance compared to existing algorithms.

preprint · 2020 · arXiv

Improving Tracking through Human-Robot Sensory Augmentation

This paper introduces human-robot sensory augmentation and illustrates it on a tracking task, where performance can be improved by the exchange of sensory information between the robot and its human user. It was recently found that during interaction between humans, the partners use each other's sensory information to improve their own sensing, thus also their performance and learning. In this paper, we develop a computational model of this unique human ability, and use it to build a novel control framework for human-robot interaction. The human partner's control is formulated as a feedback control with unknown control gains and desired trajectory. A Kalman filter is used to estimate first the control gains and then the desired trajectory. The estimated human partner's desired trajectory is used as augmented sensory information about the system and combined with the robot's measurement to estimate an uncertain target trajectory. Simulations and an implementation of the presented framework on a robotic interface validate the proposed observer-predictor pair for a tracking task. The results obtained using this robot demonstrate how the human user's control can be identified, and exhibit similar benefits of this sensory augmentation as was observed between interacting humans.
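The abstract above describes using a Kalman filter to estimate unknown control gains. As a loose illustration of that idea only, the sketch below runs a scalar Kalman filter that treats a single constant gain `k` as the hidden state and observes it through a simplified control law `u = k * e + noise`. The control law, its sign convention, the noise level, and all names are assumptions made for this example; the paper's actual model and observer are not reproduced here.

```python
import random


def estimate_gain(errors, controls, meas_var=0.01, init_gain=0.0, init_var=1.0):
    """Scalar Kalman filter for a constant gain k in u = k * e + noise (assumed model)."""
    k_hat, p = init_gain, init_var
    for e, u in zip(errors, controls):
        # The gain is modelled as constant, so there is no prediction step.
        # Measurement update with time-varying observation coefficient h = e.
        innovation_var = e * p * e + meas_var
        kalman_gain = p * e / innovation_var
        k_hat += kalman_gain * (u - e * k_hat)
        p = (1.0 - kalman_gain * e) * p
    return k_hat, p


# Simulated data with a fixed seed; true_gain is a hypothetical value.
rng = random.Random(0)
true_gain = 2.5
errors = [rng.uniform(-1.0, 1.0) for _ in range(200)]
controls = [true_gain * e + rng.gauss(0.0, 0.1) for e in errors]

gain_estimate, gain_var = estimate_gain(errors, controls)
print(f"estimated gain = {gain_estimate:.3f}, posterior variance = {gain_var:.2e}")
```

With informative tracking errors the posterior variance shrinks at every step, so the estimate settles near the simulated gain. Estimating a desired trajectory afterwards, as the abstract describes, would need a richer state model than this one-parameter sketch.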

preprint · 2020 · arXiv

Possible Lattice and Charge Order in CuxBi2Te2Se

Metal intercalation into layered topological insulator materials such as the binary chalcogenide Bi2X3 (X = Te or Se) has yielded novel two-dimensional electron-gas physics, phase transitions to superconductivity, as well as interesting magnetic ground states. Of recent interest is the intercalation-driven interplay between lattice distortions, density wave ordering, and the emergence of new phenomena in the vicinity of instabilities induced by intercalation. Here, we examine the effects of Cu-intercalation on the ternary chalcogenide Bi2Te2Se. We report the discovery, in Cu0.3Bi2Te2Se, of a periodic lattice distortion at room temperature, together with a charge density wave transition around Td = 220 K. We also report, for the first time, a complete study of the CuxBi2Te2Se system, and the effect of Cu-intercalation on crystal structure, phonon structure, and electronic properties for $0.0 \le x \le 0.5$. Our electron diffraction studies reveal strong Bragg spots at reciprocal lattice positions forbidden by ABC stacking, possibly resulting from stacking faults, or a superlattice. The c-axis lattice parameter varies monotonically with $x$ for $0 < x < 0.2$, but drops precipitously for higher $x$. Similarly, Raman phonon modes $A^2_{1g}$ and $E_g$ soften monotonically for $0 < x < 0.2$ but harden sharply for $x > 0.2$. This indicates that Cu likely intercalates up to $x \sim 0.2$, followed by partial site-substitutions at higher values. The resulting strain makes the $0.2 < x < 0.3$ region susceptible to instabilities and distortions. Our results point toward the presence of an incommensurate CDW above Td = 220 K. This work strengthens prevalent thought that intercalation contributes significantly to instabilities in the lattice and charge degrees of freedom in layered chalcogenides.