Source author record

Yipeng Zhang

Yipeng Zhang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Computer Vision, Artificial Intelligence, Information Retrieval, Machine Learning, Multimedia, physics.optics, Social and Information Networks

Catalog footprint

What is connected

7 works

7 topics

4 close collaborators


Preprint, 2022, arXiv

Action-conditioned On-demand Motion Generation

We propose a novel framework, On-Demand MOtion Generation (ODMO), for generating realistic and diverse long-term 3D human motion sequences conditioned only on action types, with an additional capability of customization. ODMO shows improvements over SOTA approaches on all traditional motion evaluation metrics when evaluated on three public datasets (HumanAct12, UESTC, and MoCap). Furthermore, we provide both qualitative evaluations and quantitative metrics demonstrating several first-known customization capabilities afforded by our framework, including mode discovery, interpolation, and trajectory customization. These capabilities significantly widen the spectrum of potential applications of such motion generation models. The novel on-demand generative capabilities are enabled by innovations in both the encoder and decoder architectures: (i) Encoder: utilizing contrastive learning in a low-dimensional latent space to create a hierarchical embedding of motion sequences, where not only do the codes of different action types form different groups, but within an action type, codes of similar inherent patterns (motion styles) cluster together, making them readily discoverable; (ii) Decoder: using a hierarchical decoding strategy where the motion trajectory is reconstructed first and then used to reconstruct the whole motion sequence. Such an architecture enables effective trajectory control. Our code is released on GitHub: https://github.com/roychowdhuryresearch/ODMO

Preprint, 2022, arXiv

Disentangling Transfer and Interference in Multi-Domain Learning

Humans are incredibly good at transferring knowledge from one domain to another, enabling rapid learning of new tasks. Likewise, transfer learning has enabled enormous success in many computer vision problems using pretraining. However, the benefits of transfer in multi-domain learning, where a network learns multiple tasks defined by different datasets, have not been adequately studied. Learning multiple domains could be beneficial, or these domains could interfere with each other given limited network capacity. Understanding how deep neural networks of varied capacity facilitate transfer across inputs from different distributions is a critical step towards open-world learning. In this work, we decipher the conditions under which interference and knowledge transfer occur in multi-domain learning. We propose new metrics disentangling interference and transfer, set up experimental protocols, and examine the roles of network capacity, task grouping, and dynamic loss weighting in reducing interference and facilitating transfer.

Preprint, 2022, arXiv

Fine-Grained Visual Entailment

Visual entailment is a recently proposed multimodal reasoning task where the goal is to predict the logical relationship of a piece of text to an image. In this paper, we propose an extension of this task, where the goal is to predict the logical relationship of fine-grained knowledge elements within a piece of text to an image. Unlike prior work, our method is inherently explainable and makes logical predictions at different levels of granularity. Because we lack fine-grained labels to train our method, we propose a novel multi-instance learning approach which learns a fine-grained labeling using only sample-level supervision. We also impose novel semantic structural constraints which ensure that fine-grained predictions are internally semantically consistent. We evaluate our method on a new dataset of manually annotated knowledge elements and show that our method achieves 68.18% accuracy on this challenging task while significantly outperforming several strong baselines. Finally, we present extensive qualitative results illustrating our method's predictions and the visual evidence our method relied on. Our code and annotated dataset can be found here: https://github.com/SkrighYZ/FGVE.
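As a rough illustration of how fine-grained predictions can be tied to a single sample-level label, the sketch below aggregates per-element labels into one label for the whole text. The aggregation rule and the names `aggregate_sample_label`, `ENTAILMENT`, `NEUTRAL` and `CONTRADICTION` are assumptions made for this example; the abstract does not specify the rule the authors use.

```python
from typing import List

# Label names are assumptions for this sketch, not taken from the paper's code.
ENTAILMENT = "entailment"
NEUTRAL = "neutral"
CONTRADICTION = "contradiction"


def aggregate_sample_label(element_labels: List[str]) -> str:
    """Derive a sample-level label from fine-grained (per-element) labels.

    Hypothetical rule: any contradicted element makes the sample a
    contradiction; the sample is entailed only if every element is
    entailed; otherwise it is neutral.
    """
    if not element_labels:
        raise ValueError("need at least one knowledge element")
    if CONTRADICTION in element_labels:
        return CONTRADICTION
    if all(label == ENTAILMENT for label in element_labels):
        return ENTAILMENT
    return NEUTRAL
```

Under a rule of this kind, a sample-level label constrains which combinations of fine-grained labels are admissible, which is one way sample-level supervision could guide fine-grained predictions.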

Preprint, 2020, arXiv

A Smartphone-based System for Real-time Early Childhood Caries Diagnosis

Early childhood caries (ECC) is the most common, yet preventable, chronic disease in children under the age of 6. Treatment of severe ECC is extremely expensive and unaffordable for socioeconomically disadvantaged families. The identification of ECC at an early stage usually requires expertise in the field, and hence is often overlooked by parents. Therefore, early prevention strategies and easy-to-adopt diagnosis techniques are desired. In this study, we propose a multistage deep learning-based system for cavity detection. We create a dataset containing RGB oral images labeled manually by dental practitioners. We then investigate the effectiveness of different deep learning models on the dataset. Furthermore, we integrate the deep learning system into an easy-to-use mobile application that can diagnose ECC at an early stage and provide real-time results to untrained users.

Preprint, 2020, arXiv

Actor-Action Video Classification CSC 249/449 Spring 2020 Challenge Report

This technical report summarizes and compiles submissions from the Actor-Action video classification challenge, held as a final project in the CSC 249/449 Machine Vision course (Spring 2020) at the University of Rochester.

Preprint, 2020, arXiv

Generations of high efficiency, high purity, and broadband Laguerre-Gaussian modes from a Janus optical parametric oscillator

Laguerre-Gaussian (LG) modes, carrying orbital angular momentum of light, are critical for important applications such as high-capacity optical communications, super-resolution imaging, and multi-dimensional quantum entanglement. Advanced developments in these applications strongly demand reliable and tunable LG mode laser sources, which, however, do not yet exist. Here, we experimentally demonstrate highly efficient, highly pure, broadly tunable, and topological-charge-controllable LG modes from a Janus optical parametric oscillator (OPO). The Janus OPO, featuring a two-face cavity mode, is designed to guarantee an efficient evolution from a Gaussian-shaped fundamental pumping mode to a desired LG parametric mode. The output LG mode has a tunable wavelength between 1.5 μm and 1.6 μm with a conversion efficiency above 15%, a topological charge switchable from -4 to 4, and a mode purity as high as 97%, which provides a high-performance solid-state light source for high-end demands in multi-dimensional multiplexing/demultiplexing, control of spin-orbital coupling between light and atoms, and so on.

Preprint, 2020, arXiv

Monitoring Depression Trend on Twitter during the COVID-19 Pandemic

The COVID-19 pandemic has severely affected people's daily lives and caused tremendous economic loss worldwide. However, its influence on people's mental health conditions has not received as much attention. To study this subject, we choose social media as our main data resource and create by far the largest English Twitter depression dataset containing 2,575 distinct identified depression users with their past tweets. To examine the effect of depression on people's Twitter language, we train three transformer-based depression classification models on the dataset, evaluate their performance with progressively increased training sizes, and compare the models' "tweet chunk"-level and user-level performances. Furthermore, inspired by psychological studies, we create a fusion classifier that combines deep learning model scores with psychological text features and users' demographic information and investigate these features' relations to depression signals. Finally, we demonstrate our model's capability of monitoring both group-level and population-level depression trends by presenting two of its applications during the COVID-19 pandemic. We hope this study can raise awareness among researchers and the general public of COVID-19's impact on people's mental health.
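To make the distinction between "tweet chunk"-level and user-level predictions concrete, here is a minimal sketch that pools per-chunk classifier scores into one score per user. Mean pooling, the 0.5 threshold, and the function names `user_level_scores` and `classify_users` are assumptions for illustration only; the abstract does not say how the authors aggregate chunk scores.

```python
from statistics import mean
from typing import Dict, List


def user_level_scores(chunk_scores: Dict[str, List[float]]) -> Dict[str, float]:
    """Average each user's chunk-level scores (assumed pooling rule).

    Users with no chunks are skipped rather than assigned a score.
    """
    return {user: mean(scores) for user, scores in chunk_scores.items() if scores}


def classify_users(chunk_scores: Dict[str, List[float]],
                   threshold: float = 0.5) -> Dict[str, bool]:
    """Flag users whose pooled score reaches the (assumed) threshold."""
    pooled = user_level_scores(chunk_scores)
    return {user: score >= threshold for user, score in pooled.items()}
```

For example, `classify_users({"user_a": [0.9, 0.7], "user_b": [0.2, 0.4]})` flags `user_a` and not `user_b`, since their pooled scores fall on opposite sides of the threshold.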
