Source author record

Jia Wan

Jia Wan appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision, Computational Complexity, Data Structures and Algorithms, Discrete Mathematics, Machine Learning, math.CO

Catalog footprint

What is connected

4 works

6 topics

4 close collaborators

Preprint · 2026 · arXiv

Video Individual Counting and Tracking from Moving Drones: A Benchmark and Methods

Counting and tracking dense crowds in large-scale scenes is highly challenging, yet existing methods mainly rely on datasets captured by fixed cameras, which provide limited spatial coverage and are inadequate for large-scale dense crowd analysis. To address this limitation, we propose a flexible solution using moving drones to capture videos and perform video-level crowd counting and tracking of unique pedestrians across entire scenes. We introduce MovingDroneCrowd++, the largest video-level dataset for dense crowd counting and tracking captured by moving drones, covering diverse and complex conditions with varying flight altitudes, camera angles, and illumination. Existing methods fail to achieve satisfactory performance on this dataset. To this end, we propose GD3A (Global Density Map Decomposition via Descriptor Association), a density map-based video individual counting method that avoids explicit localization. GD3A establishes pixel-level correspondences between pedestrian descriptors across consecutive frames via optimal transport with an adaptive dustbin score, enabling the decomposition of global density maps into shared, inflow, and outflow components. Building on this framework, we further introduce DVTrack, which converts descriptor-level matching into instance-level associations through a descriptor voting mechanism for pedestrian tracking. Experimental results show that our methods significantly outperform existing approaches under dense crowds and complex motion, reducing counting error by 47.4 percent and improving tracking performance by 39.2 percent.
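The matching step described above resembles entropic optimal transport with an extra "dustbin" row and column, a construction popularized by SuperGlue. The sketch below is not GD3A. It assumes a fixed scalar `dustbin` score in place of the learned adaptive one, treats every descriptor as one unit of density mass, and runs a plain log-domain Sinkhorn loop. `sinkhorn_with_dustbin` and `decompose` are hypothetical names. Mass matched between frames is "shared", frame-t mass sent to the dustbin column is "outflow", and frame-(t+1) mass drawn from the dustbin row is "inflow".

```python
import math

def logsumexp(xs):
    """Numerically stable log(sum(exp(x))) over a list."""
    top = max(xs)
    return top + math.log(sum(math.exp(x - top) for x in xs))

def sinkhorn_with_dustbin(scores, dustbin, iters=300):
    """Entropic OT between m descriptors (frame t) and n descriptors (frame t+1).

    scores: m x n similarity matrix.
    dustbin: scalar "no match" score (a fixed stand-in for an adaptive score).
    Returns the (m+1) x (n+1) transport plan; the last row/column are dustbins.
    """
    m, n = len(scores), len(scores[0])
    z = [list(row) + [dustbin] for row in scores] + [[dustbin] * (n + 1)]
    # Each real descriptor carries unit mass; each dustbin can absorb the whole other side.
    log_mu = [0.0] * m + [math.log(n)]
    log_nu = [0.0] * n + [math.log(m)]
    u, v = [0.0] * (m + 1), [0.0] * (n + 1)
    for _ in range(iters):
        # Alternate row and column scaling in the log domain.
        u = [log_mu[i] - logsumexp([z[i][j] + v[j] for j in range(n + 1)]) for i in range(m + 1)]
        v = [log_nu[j] - logsumexp([z[i][j] + u[i] for i in range(m + 1)]) for j in range(n + 1)]
    return [[math.exp(z[i][j] + u[i] + v[j]) for j in range(n + 1)] for i in range(m + 1)]

def decompose(plan):
    """Split the transported mass into shared, inflow and outflow parts."""
    m, n = len(plan) - 1, len(plan[0]) - 1
    shared = sum(plan[i][j] for i in range(m) for j in range(n))
    outflow = sum(plan[i][n] for i in range(m))  # frame-t mass sent to the dustbin: leaving
    inflow = sum(plan[m][j] for j in range(n))   # frame-(t+1) mass taken from the dustbin: entering
    return shared, inflow, outflow

scores = [[10.0, -10.0, -10.0], [-10.0, 10.0, -10.0], [-10.0, -10.0, -10.0]]
shared, inflow, outflow = decompose(sinkhorn_with_dustbin(scores, dustbin=0.0))
```

In the toy scores, the first two descriptors of each frame match each other. The third descriptor of each frame scores low against everything and should therefore be routed through the dustbins. Because the transport is entropic, a little mass always leaks into the dustbins, and that leak shrinks as the score margin grows.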

Preprint · 2023 · arXiv

Internal Closedness and von Neumann-Morgenstern Stability in Matching Theory: Structures and Complexity

Let $G$ be a graph and suppose we are given, for each $v \in V(G)$, a strict ordering of the neighbors of $v$. A set of matchings ${\cal M}$ of $G$ is called internally stable if there are no matchings $M,M' \in {\cal M}$ such that an edge of $M$ blocks $M'$. The sets of stable (à la Gale and Shapley) matchings and of von Neumann-Morgenstern stable matchings are examples of internally stable sets of matchings. In this paper, we study, in both the marriage and the roommate case, inclusionwise maximal internally stable sets of matchings. We call those sets internally closed. By building on known and newly developed algebraic structures associated with sets of matchings, we investigate the complexity of deciding if a set of matchings is internally closed or von Neumann-Morgenstern stable, and of finding sets with those properties.
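On small instances, the definitions above can be checked by brute force. The sketch uses the standard Gale–Shapley notion of blocking: an edge $uv \notin M$ blocks $M$ if each of $u$ and $v$ is unmatched in $M$ or strictly prefers the other to its partner in $M$. It assumes preferences are given as ranked neighbor lists `prefs[v]` and that matchings are sets of undirected edges. All helper names are hypothetical. The exhaustive check is only an illustration of the definitions, not the algorithms studied in the paper.

```python
from itertools import product

def matching(*pairs):
    """Build a matching as a frozenset of undirected edges (2-element frozensets)."""
    return frozenset(frozenset(p) for p in pairs)

def partner(m, v):
    """Return v's partner in matching m, or None if v is unmatched."""
    for e in m:
        if v in e:
            (w,) = e - {v}
            return w
    return None

def prefers(prefs, x, y, m):
    """True if x is unmatched in m or ranks y strictly above its partner in m."""
    p = partner(m, x)
    return p is None or prefs[x].index(y) < prefs[x].index(p)

def blocks(edge, m, prefs):
    """An edge uv not in m blocks m if u and v both prefer each other."""
    if edge in m:
        return False
    u, v = tuple(edge)
    return prefers(prefs, u, v, m) and prefers(prefs, v, u, m)

def internally_stable(family, prefs):
    """No edge of any M in the family blocks any M' in the family."""
    return not any(blocks(e, m2, prefs) for m1, m2 in product(family, repeat=2) for e in m1)

def internally_closed(family, all_matchings, prefs):
    """Inclusionwise maximal internally stable family within all_matchings.

    Internal stability is a pairwise condition, so any subset of a stable
    family is stable; it therefore suffices to try adding one matching at a time.
    """
    family = set(family)
    if not internally_stable(family, prefs):
        return False
    return all(not internally_stable(family | {m}, prefs)
               for m in all_matchings if m not in family)
```

For example, in a marriage instance with `prefs = {"a": ["x", "y"], "b": ["y", "x"], "x": ["b", "a"], "y": ["a", "b"]}`, the two stable matchings {ax, by} and {ay, bx} form an internally stable family. Adding the matching {ay} breaks stability, because edge ax blocks it.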

Preprint · 2022 · arXiv

Weight Rescaling: Effective and Robust Regularization for Deep Neural Networks with Batch Normalization

Weight decay is often used to ensure good generalization when training deep neural networks with batch normalization (BN-DNNs), in which some convolution layers are invariant to weight rescaling because of the normalization. In this paper, we show that the practical use of weight decay still has unresolved problems, despite existing theoretical work explaining its effect in BN-DNNs. On the one hand, when an optimizer with a non-adaptive learning rate, e.g., SGD with momentum (SGDM), is used, the effective learning rate keeps increasing even after the initial training stage, which leads to overfitting in many neural architectures. On the other hand, with both SGDM and adaptive-learning-rate optimizers such as Adam, the effect of weight decay on generalization is quite sensitive to its hyperparameter, so finding an optimal weight decay coefficient requires an extensive hyperparameter search. To address these weaknesses, we propose to regularize the weight norm with a simple yet effective weight rescaling (WRS) scheme as an alternative to weight decay. WRS controls the weight norm by explicitly rescaling the weights to unit norm, which prevents a large increase in the gradient while also ensuring a sufficiently large effective learning rate to improve generalization. On a variety of computer vision applications, including image classification, object detection, semantic segmentation and crowd counting, we show the effectiveness and robustness of WRS compared with weight decay, implicit weight rescaling (weight standardization) and gradient projection (AdamP).
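For a weight vector $w$ that feeds a batch-normalized layer, the loss is unchanged under $w \to c\,w$ for $c > 0$, and the gradient scales like $1/\lVert w \rVert$. The effective step size on the direction $w/\lVert w \rVert$ therefore behaves like $\eta/\lVert w \rVert^2$. Pinning $\lVert w \rVert = 1$ after every update fixes that ratio to the nominal learning rate. The sketch below is a minimal illustration under several assumptions, and it is not the authors' implementation. It rescales per output filter, uses plain SGD without momentum, and adds no weight-decay term. The functions `sgd_step_with_wrs` and `batch_norm` (a toy BN without affine parameters) are hypothetical.

```python
import math

def l2_norm(w):
    """Euclidean norm of a flat weight vector."""
    return math.sqrt(sum(x * x for x in w))

def rescale_to_unit_norm(filters):
    """Assumed WRS step: rescale each filter (one output channel) to unit L2 norm."""
    return [[x / l2_norm(f) for x in f] for f in filters]

def sgd_step_with_wrs(filters, grads, lr):
    """Plain SGD update with no weight-decay term, followed by explicit rescaling."""
    updated = [[w - lr * g for w, g in zip(f, g_f)] for f, g_f in zip(filters, grads)]
    return rescale_to_unit_norm(updated)

def batch_norm(values, eps=1e-5):
    """Normalize one channel's pre-activations over a batch (no affine parameters)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]
```

Up to `eps`, the output of `batch_norm` is unchanged when a filter is multiplied by a positive constant. The rescaling therefore alters only the optimization dynamics, not the function the network computes.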

Preprint · 2020 · arXiv

Fine-Grained Crowd Counting

Current crowd counting algorithms are concerned only with the number of people in an image and lack low-level, fine-grained information about the crowd. For many practical applications, the total number of people in an image is not as useful as the number of people in each sub-category. For example, knowing the number of people waiting in line or browsing can help retail stores; knowing the number of people standing or sitting can help restaurants and cafeterias; and knowing the number of violent or non-violent people can help police with crowd management. In this paper, we propose fine-grained crowd counting, which differentiates a crowd into categories based on the low-level behavior attributes of individuals (e.g., standing/sitting or violent behavior) and then counts the number of people in each category. To enable research in this area, we construct a new dataset of four real-world fine-grained counting tasks: traveling direction on a sidewalk, standing or sitting, waiting in line or not, and exhibiting violent behavior or not. Since the appearance features of different crowd categories are similar, the challenge of fine-grained crowd counting is to effectively use contextual information to distinguish between categories. We propose a two-branch architecture consisting of a density map estimation branch and a semantic segmentation branch. We also propose two refinement strategies for improving the predictions of the two branches. First, to encode contextual information, we propose feature propagation guided by the density map prediction, which eliminates the effect of background features during propagation. Second, we propose a complementary attention model to share information between the two branches. Experimental results confirm the effectiveness of our method.
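A two-branch design like this one produces a density map, which sums to the total count, and per-pixel category probabilities. One plausible way to turn these into per-category counts is to weight the density by each category's probability and sum over pixels. This combination rule, the nested-list layout and the name `fine_grained_counts` are assumptions for illustration. The abstract does not confirm how the paper's model fuses the two outputs.

```python
def fine_grained_counts(density, seg_probs):
    """Split a density map into per-category counts.

    density: H x W nested list whose sum is the total count.
    seg_probs: dict mapping category name -> H x W per-pixel probabilities
    (assumed to sum to 1 over categories at every pixel).
    """
    counts = {}
    for category, probs in seg_probs.items():
        # Weight each pixel's density by the probability that it belongs to this category.
        counts[category] = sum(
            d * p
            for d_row, p_row in zip(density, probs)
            for d, p in zip(d_row, p_row)
        )
    return counts
```

When the category probabilities sum to 1 at every pixel, the per-category counts add up to the total count from the density map. The split then refines the overall count rather than competing with it.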