Source author record

Jiawei Ge

Jiawei Ge appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Computer Vision, Artificial Intelligence, physics.ins-det

Catalog footprint

What is connected

3 works

3 topics

4 close collaborators

Preprint, 2026, arXiv

ViewSAM: Learning View-aware Cross-modal Semantics for Weakly Supervised Cross-view Referring Multi-Object Tracking

Cross-view Referring Multi-Object Tracking (CRMOT) aims to track multiple objects specified by natural language across multiple camera views, with globally consistent identities. Despite recent progress, existing methods rely heavily on costly frame-level spatial annotations and cross-view identity supervision. To reduce such reliance, we explore CRMOT under weak supervision by leveraging the capabilities of foundation models. However, our empirical study shows that directly applying foundation models such as SAM2 and SAM3, even with task-specific modifications, fails to accurately understand referring expressions and maintain consistent identities across views. Yet, they remain effective at producing reliable object tracklets that can serve as pseudo supervision. We therefore repurpose foundation models as pseudo-label generators and propose a two-stage framework for weakly supervised CRMOT, using only object category labels as coarse-grained supervision. In the first stage, we design an Affinity-guided Cross-view Re-prompting strategy to refine and associate SAM3-generated tracklets across cameras, producing reliable cross-view pseudo labels for subsequent training. In the second stage, we introduce ViewSAM, a CRMOT model built upon SAM2 that explicitly models view-aware cross-modal semantics. By formulating view-induced variations as learnable conditions, ViewSAM bridges the gap between view-variant visual observations and view-invariant textual expressions, enabling robust cross-view referring tracking with only approximately 10% additional parameters. Extensive experiments demonstrate that ViewSAM achieves SOTA performance under weak supervision and remains competitive with fully supervised methods.

Preprint, 2024, arXiv

Query-Based Knowledge Sharing for Open-Vocabulary Multi-Label Classification

Identifying labels that did not appear during training, known as multi-label zero-shot learning, is a non-trivial task in computer vision. To this end, recent studies have attempted to explore the multi-modal knowledge of vision-language pre-training (VLP) models by knowledge distillation, allowing unseen labels to be recognized in an open-vocabulary manner. However, experimental evidence shows that knowledge distillation is suboptimal and provides limited performance gain in unseen label prediction. In this paper, a novel query-based knowledge sharing paradigm is proposed to explore the multi-modal knowledge from the pretrained VLP model for open-vocabulary multi-label classification. Specifically, a set of learnable label-agnostic query tokens is trained to extract critical vision knowledge from the input image and is then shared across all labels, allowing each label to select tokens of interest as visual clues for recognition. In addition, we propose an effective prompt pool for robust label embedding, and reformulate the standard ranking learning into a form of classification so that the magnitude of feature vectors can contribute to matching; both changes significantly benefit label recognition. Experimental results show that our framework significantly outperforms state-of-the-art methods on the zero-shot task by 5.9% and 4.5% in mAP on NUS-WIDE and Open Images, respectively.

Preprint, 2021, arXiv

Positive Pressure Testing Booths Development and Deployment In Response To The COVID-19 Outbreak

The COVID-19 pandemic had an unprecedented impact on public health, resulting in thousands of deaths in the US alone. Nationwide testing plans were initiated to control the spread. Drive-through testing is currently the dominant approach, but it exhausts personal protective equipment (PPE) supplies and is unfriendly to individuals who do not own a vehicle. Walk-up testing booths are a safe alternative, but are priced too high on the market to allow for nationwide deployment. In this paper, we present an accessible, mobile, affordable, and safe version of a positive-pressure COVID-19 testing booth. The booths are manufactured using primarily off-the-shelf components from US vendors with minimized customization. The booths' mobility allows them to be easily transported within local communities to test a larger subset of the population with fewer transportation options. Moreover, the final bill of materials does not surpass USD 3,900, which is about half of the market price. The booths are air conditioned and HEPA filtered to offer healthcare providers a safe and comfortable working environment. The prototype passed required pressure and air exchange tests, and was positively reviewed by two healthcare professionals. Currently, five booths are deployed and used at the Johns Hopkins University School of Nursing, Baltimore City Health Department, and two community health centers in Baltimore. Our design facilitates walk-up testing in the US, as it decreases PPE consumption, reduces the risk of infection, and is accessible to lower-income communities and non-drivers.