Source author record

Jiyang Gao

Jiyang Gao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Computer Vision math.CO Machine Learning math.AC math.AG Robotics

Catalog footprint

What is connected

9 works

6 topics

4 close collaborators

Preprint, 2022, arXiv

Acyclic Orientations and the Chromatic Polynomial of Signed Graphs

We present a new correspondence between acyclic orientations and colorings of a signed graph (symmetric graph). Goodall et al. introduced a bivariate chromatic polynomial $\chi_G(k,l)$ that counts the number of signed colorings using colors $0,\pm1,\dots,\pm k$ along with $l-1$ symmetric colors $0_1,\dots,0_{l-1}$. We show that the evaluation $|\chi_G(-1,2)|$ of the bivariate chromatic polynomial is equal to the number of acyclic orientations of the signed graph modulo the equivalence relation generated by swapping sources and sinks. We present three proofs of this fact: a proof using toric hyperplane arrangements, a proof using deletion-contraction, and a direct proof.
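As background for the abstract above, the classical unsigned analogue (Stanley's theorem) says that an ordinary graph has exactly $|\chi_G(-1)|$ acyclic orientations. The sketch below checks that statement by brute force on small graphs. It does not implement the signed, bivariate result of the paper; the function names and the graph encoding (a frozenset of vertices, a set of two-element frozensets for edges) are assumptions made only for this illustration.

```python
from itertools import product


def chromatic_value(vertices, edges, q):
    """Evaluate the chromatic polynomial of a simple graph at q
    using deletion-contraction: chi(G) = chi(G - e) - chi(G / e)."""
    edges = frozenset(edges)
    if not edges:
        # An edgeless graph on n vertices has q**n colorings.
        return q ** len(vertices)
    e = next(iter(edges))
    u, v = tuple(e)
    deleted = edges - {e}
    # Contract e by merging v into u; parallel edges collapse
    # automatically because edges are stored in a set.
    contracted = set()
    for f in deleted:
        g = frozenset(u if x == v else x for x in f)
        if len(g) == 2:
            contracted.add(g)
    return (chromatic_value(vertices, deleted, q)
            - chromatic_value(vertices - {v}, contracted, q))


def is_acyclic(vertices, arcs):
    """Kahn-style check: repeatedly remove vertices with in-degree 0."""
    indeg = {x: 0 for x in vertices}
    for _, b in arcs:
        indeg[b] += 1
    stack = [x for x in vertices if indeg[x] == 0]
    removed = 0
    while stack:
        x = stack.pop()
        removed += 1
        for a, b in arcs:
            if a == x:
                indeg[b] -= 1
                if indeg[b] == 0:
                    stack.append(b)
    return removed == len(vertices)


def count_acyclic_orientations(vertices, edges):
    """Enumerate all 2**m orientations and count the acyclic ones."""
    edge_list = [tuple(sorted(e)) for e in edges]
    count = 0
    for flips in product([False, True], repeat=len(edge_list)):
        arcs = [(b, a) if flip else (a, b)
                for (a, b), flip in zip(edge_list, flips)]
        if is_acyclic(vertices, arcs):
            count += 1
    return count


# Triangle: chi(q) = q(q-1)(q-2), so |chi(-1)| = 6.
triangle_vertices = frozenset({0, 1, 2})
triangle_edges = {frozenset(p) for p in [(0, 1), (1, 2), (0, 2)]}
print(count_acyclic_orientations(triangle_vertices, triangle_edges))
print(abs(chromatic_value(triangle_vertices, triangle_edges, -1)))
```

Both approaches are exponential in the number of edges, so this is only suitable for sanity checks on very small graphs.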

Preprint, 2022, arXiv

Balanced shifted tableaux

We introduce balanced shifted tableaux, as an analogue of balanced tableaux of Edelman and Greene, from the perspective of root systems of type B and C. We show that they are equinumerous to standard Young tableaux of the corresponding shifted shape by presenting an explicit bijection.

Preprint, 2020, arXiv

CPARR: Category-based Proposal Analysis for Referring Relationships

The task of referring relationships is to localize subject and object entities in an image satisfying a relationship query, which is given in the form of \texttt{<subject, predicate, object>}. This requires simultaneous localization of the subject and object entities in a specified relationship. We introduce a simple yet effective proposal-based method for referring relationships. Unlike existing methods such as SSAS, our method can generate a high-resolution result while reducing its complexity and ambiguity. Our method is composed of two modules: a category-based proposal generation module to select the proposals related to the entities and a predicate analysis module to score the compatibility of pairs of selected proposals. We show state-of-the-art performance on the referring relationship task on two public datasets: Visual Relationship Detection and Visual Genome.

Preprint, 2020, arXiv

STINet: Spatio-Temporal-Interactive Network for Pedestrian Detection and Trajectory Prediction

Detecting pedestrians and predicting their future trajectories are critical tasks for numerous applications, such as autonomous driving. Previous methods either treat detection and prediction as separate tasks or simply add a trajectory regression head on top of a detector. In this work, we present a novel end-to-end two-stage network: Spatio-Temporal-Interactive Network (STINet). In addition to 3D geometry modeling of pedestrians, we model the temporal information for each of the pedestrians. To do so, our method predicts both current and past locations in the first stage, so that each pedestrian can be linked across frames and the comprehensive spatio-temporal information can be captured in the second stage. We also model the interaction among objects with an interaction graph to gather information from neighboring objects. Comprehensive experiments on the Lyft Dataset and the recently released large-scale Waymo Open Dataset for both object detection and future trajectory prediction validate the effectiveness of the proposed method. For the Waymo Open Dataset, we achieve a bird's-eye-view (BEV) detection AP of 80.73 and a trajectory prediction average displacement error (ADE) of 33.67 cm for pedestrians, which establish the state of the art for both tasks.
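The ADE figure quoted above is an average of point-wise distances between a predicted and a true trajectory. A minimal sketch of that metric for a single agent follows. The function name and the list-of-tuples input format are assumptions for illustration; the abstract does not specify the evaluation protocol (prediction horizon, sampling rate, or how errors are aggregated across pedestrians).

```python
import math


def average_displacement_error(predicted, ground_truth):
    """Mean Euclidean distance between matched waypoints of two trajectories.

    Both arguments are sequences of (x, y) points sampled at the same
    time steps; units of the result match the units of the inputs.
    """
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("trajectories must be non-empty and of equal length")
    total = sum(math.dist(p, g) for p, g in zip(predicted, ground_truth))
    return total / len(predicted)


# Hypothetical two-step trajectories, in meters.
predicted = [(0.0, 0.0), (1.0, 1.0)]
ground_truth = [(3.0, 4.0), (1.0, 1.0)]
print(average_displacement_error(predicted, ground_truth))  # 2.5
```

A dataset-level ADE would typically average this quantity over many agents and scenes; that aggregation is left out here.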

Preprint, 2020, arXiv

TNT: Target-driveN Trajectory Prediction

Predicting the future behavior of moving agents is essential for real-world applications. It is challenging as the intent of the agent and the corresponding behavior are unknown and intrinsically multimodal. Our key insight is that for prediction within a moderate time horizon, the future modes can be effectively captured by a set of target states. This leads to our target-driven trajectory prediction (TNT) framework. TNT has three stages which are trained end-to-end. It first predicts an agent's potential target states $T$ steps into the future, by encoding its interactions with the environment and the other agents. TNT then generates trajectory state sequences conditioned on targets. A final stage estimates trajectory likelihoods and a final compact set of trajectory predictions is selected. This is in contrast to previous work which models agent intents as latent variables, and relies on test-time sampling to generate diverse trajectories. We benchmark TNT on trajectory prediction of vehicles and pedestrians, where we outperform the state of the art on Argoverse Forecasting, INTERACTION, Stanford Drone and an in-house Pedestrian-at-Intersection dataset.

Preprint, 2020, arXiv

VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation

Behavior prediction in dynamic, multi-agent systems is an important problem in the context of self-driving cars, due to the complex representations and interactions of road components, including moving agents (e.g. pedestrians and vehicles) and road context information (e.g. lanes, traffic lights). This paper introduces VectorNet, a hierarchical graph neural network that first exploits the spatial locality of individual road components represented by vectors and then models the high-order interactions among all components. In contrast to most recent approaches, which render trajectories of moving agents and road context information as bird's-eye images and encode them with convolutional neural networks (ConvNets), our approach operates on a vector representation. By operating on the vectorized high-definition (HD) maps and agent trajectories, we avoid lossy rendering and computationally intensive ConvNet encoding steps. To further boost VectorNet's capability in learning context features, we propose a novel auxiliary task to recover the randomly masked out map entities and agent trajectories based on their context. We evaluate VectorNet on our in-house behavior prediction benchmark and the recently released Argoverse forecasting dataset. Our method achieves performance on par with or better than the competitive rendering approach on both benchmarks while saving over 70% of the model parameters with an order of magnitude reduction in FLOPs. It also outperforms the state of the art on the Argoverse dataset.
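To make the idea of a vector representation concrete, the sketch below turns a polyline (a lane centerline or an agent trajectory given as ordered points) into a list of vectors, one per consecutive point pair. The tuple layout `(x_start, y_start, x_end, y_end, polyline_id)` and the function name are assumptions for illustration only; the abstract does not specify the feature layout, and the graph neural network itself is not shown.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Vector = Tuple[float, float, float, float, int]


def polyline_to_vectors(points: List[Point], polyline_id: int) -> List[Vector]:
    """Convert an ordered polyline into one vector per consecutive point pair.

    Each vector keeps its start point, end point, and the id of the
    polyline it belongs to, so vectors can later be grouped per component.
    """
    return [
        (x0, y0, x1, y1, polyline_id)
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    ]


# Hypothetical lane centerline with three sample points.
lane = [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0)]
for vector in polyline_to_vectors(lane, polyline_id=7):
    print(vector)
```

Keeping the polyline id on every vector is one simple way to let a downstream model aggregate vectors per road component before modeling interactions among components.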

Preprint, 2020, arXiv

Virtual Complete Intersections in $\mathbb{P}^1 \times \mathbb{P}^1$

The minimal free resolution of the coordinate ring of a complete intersection in projective space is a Koszul complex on a regular sequence. In the product of projective spaces $\mathbb{P}^1 \times \mathbb{P}^1$, we investigate which sets of points have a virtual resolution that is a Koszul complex on a regular sequence. This paper provides conditions on sets of points, some of which guarantee that the points have this property and some of which guarantee that they do not.
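For orientation, the Koszul complex on a regular sequence of length two can be written out explicitly. The notation here is illustrative and not taken from the abstract: $S$ is a graded polynomial ring and $f_1, f_2$ are homogeneous elements of degrees $d_1, d_2$ forming a regular sequence.

$$
0 \longrightarrow S(-d_1-d_2)
\xrightarrow{\begin{pmatrix} -f_2 \\ f_1 \end{pmatrix}}
S(-d_1) \oplus S(-d_2)
\xrightarrow{\begin{pmatrix} f_1 & f_2 \end{pmatrix}}
S \longrightarrow S/(f_1, f_2) \longrightarrow 0
$$

The composite of the two matrices is $-f_1 f_2 + f_2 f_1 = 0$, and the sequence is exact because $f_1, f_2$ is regular. In the $\mathbb{P}^1 \times \mathbb{P}^1$ setting the ring is bigraded, so the twists would presumably be bidegrees rather than single integers.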

Preprint, 2016, arXiv

ACD: Action Concept Discovery from Image-Sentence Corpora

Action classification in still images is an important task in computer vision. It is challenging as the appearances of actions may vary depending on their context (e.g. associated objects). Manual labeling of context information would be time-consuming and difficult to scale up. To address this challenge, we propose a method to automatically discover and cluster action concepts, and learn their classifiers from weakly supervised image-sentence corpora. It obtains candidate action concepts by extracting verb-object pairs from sentences and verifies their visualness with the associated images. Candidate action concepts are then clustered by using a multi-modal representation with image embeddings from deep convolutional networks and text embeddings from word2vec. More than one hundred human action concept classifiers are learned from the Flickr30k dataset with no additional human effort and promising classification results are obtained. We further apply the AdaBoost algorithm to automatically select and combine relevant action concepts given an action query. Promising results have been shown on the PASCAL VOC 2012 action classification benchmark, which has zero overlap with Flickr30k.

Preprint, 2016, arXiv

Learning Action Concept Trees and Semantic Alignment Networks from Image-Description Data

Action classification in still images has been a popular research topic in computer vision. Labelling large-scale datasets for action classification requires tremendous manual work, which is hard to scale up. Besides, the action categories in such datasets are pre-defined and vocabularies are fixed. However, humans may describe the same action with different phrases, which makes vocabulary expansion difficult for traditional fully-supervised methods. We observe that large amounts of images with sentence descriptions are readily available on the Internet. The sentence descriptions can be regarded as weak labels for the images, which contain rich information and could be used to learn flexible expressions of action categories. We propose a method to learn an Action Concept Tree (ACT) and an Action Semantic Alignment (ASA) model for classification from image-description data via a two-stage learning process. A new dataset for the task of learning actions from descriptions is built. Experimental results show that our method outperforms several baseline methods significantly.
