Source author record

Haoyu Qin

Haoyu Qin appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Vision Artificial Intelligence Computation and Language Machine Learning physics.optics Robotics

Catalog footprint

What is connected

4works

6topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Microwave vortex beam lasing via photonic time crystals

Microwave lasing carrying orbital angular momentum (OAM) holds significant potential for advanced applications in fields such as high-capacity communications, precision sensing, and radar imaging. However, conventional approaches to masers fail to produce emission with embedded OAM. The recent emergence of photonic time crystals (PTCs)-artificially structured media with periodically varying electromagnetic properties in time-offers a paradigm shift toward resonance-free lasing without the need for gain media. Yet, pioneering PTC designs have been based on three-dimensional bulk structures, which lack a surface-emitting configuration, and do not possess the capability to modulate OAM, thus hindering the realization of surface-emitted PTC masing that carries OAM. Here, we report the first experimental demonstration of non-resonant, gain medium-free, and surface-emitted microwave vortex beam lasing OAM using ring-shaped PTCs. By developing a multiplier-driven time-varying metamaterial that achieves over 100% equivalent permittivity modulation depth, we establish momentum bandgaps (k gaps) with sufficient bandwidth to overcome intrinsic losses and enable self-sustained coherent microwave amplification. Furthermore, space-time modulation induces non-reciprocity between clockwise and counterclockwise k gap modes within the circularly symmetric PTC structure, facilitating the selective generation of microwave lasing carrying OAM-a capability beyond the reach of conventional maser technologies. Our work bridges PTC physics with coherent OAM-carrying microwave emission, establishing a transformative platform for next-generation wireless communications, advanced sensing systems, and OAM-based technologies.

preprint2022arXiv

GLPanoDepth: Global-to-Local Panoramic Depth Estimation

In this paper, we propose a learning-based method for predicting dense depth values of a scene from a monocular omnidirectional image. An omnidirectional image has a full field-of-view, providing much more complete descriptions of the scene than perspective images. However, fully-convolutional networks that most current solutions rely on fail to capture rich global contexts from the panorama. To address this issue and also the distortion of equirectangular projection in the panorama, we propose Cubemap Vision Transformers (CViT), a new transformer-based architecture that can model long-range dependencies and extract distortion-free global features from the panorama. We show that cubemap vision transformers have a global receptive field at every stage and can provide globally coherent predictions for spherical signals. To preserve important local features, we further design a convolution-based branch in our pipeline (dubbed GLPanoDepth) and fuse global features from cubemap vision transformers at multiple scales. This global-to-local strategy allows us to fully exploit useful global and local features in the panorama, achieving state-of-the-art performance in panoramic depth estimation.

preprint2020arXiv

Analogical Reasoning for Visually Grounded Language Acquisition

Children acquire language subconsciously by observing the surrounding world and listening to descriptions. They can discover the meaning of words even without explicit language knowledge, and generalize to novel compositions effortlessly. In this paper, we bring this ability to AI, by studying the task of Visually grounded Language Acquisition (VLA). We propose a multimodal transformer model augmented with a novel mechanism for analogical reasoning, which approximates novel compositions by learning semantic mapping and reasoning operations from previously seen compositions. Our proposed method, Analogical Reasoning Transformer Networks (ARTNet), is trained on raw multimedia data (video frames and transcripts), and after observing a set of compositions such as "washing apple" or "cutting carrot", it can generalize and recognize new compositions in new video frames, such as "washing carrot" or "cutting apple". To this end, ARTNet refers to relevant instances in the training data and uses their visual features and captions to establish analogies with the query image. Then it chooses the suitable verb and noun to create a new composition that describes the new image best. Extensive experiments on an instructional video dataset demonstrate that the proposed method achieves significantly better generalization capability and recognition accuracy compared to state-of-the-art transformer models.

preprint2020arXiv

Asymmetric Rejection Loss for Fairer Face Recognition

Face recognition performance has seen a tremendous gain in recent years, mostly due to the availability of large-scale face images dataset that can be exploited by deep neural networks to learn powerful face representations. However, recent research has shown differences in face recognition performance across different ethnic groups mostly due to the racial imbalance in the training datasets where Caucasian identities largely dominate other ethnicities. This is actually symptomatic of the under-representation of non-Caucasian ethnic groups in the celebdom from which face datasets are usually gathered, rendering the acquisition of labeled data of the under-represented groups challenging. In this paper, we propose an Asymmetric Rejection Loss, which aims at making full use of unlabeled images of those under-represented groups, to reduce the racial bias of face recognition models. We view each unlabeled image as a unique class, however as we cannot guarantee that two unlabeled samples are from a distinct class we exploit both labeled and unlabeled data in an asymmetric manner in our loss formalism. Extensive experiments show our method's strength in mitigating racial bias, outperforming state-of-the-art semi-supervision methods. Performance on the under-represented ethnicity groups increases while that on the well-represented group is nearly unchanged.

Haoyu Qin

What is connected

Connect this record

See the researcher in context

Building this map preview

4 published item(s)

Microwave vortex beam lasing via photonic time crystals

GLPanoDepth: Global-to-Local Panoramic Depth Estimation

Analogical Reasoning for Visually Grounded Language Acquisition

Asymmetric Rejection Loss for Fairer Face Recognition