Researcher profile

Zesheng Ye

Zesheng Ye contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 15 - UnverifiedVerification L1Unclaimed author
3works
0followers
5topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

3 published item(s)

preprint2022arXiv

Contrastive Conditional Neural Processes

Conditional Neural Processes~(CNPs) bridge neural networks with probabilistic inference to approximate functions of Stochastic Processes under meta-learning settings. Given a batch of non-{\it i.i.d} function instantiations, CNPs are jointly optimized for in-instantiation observation prediction and cross-instantiation meta-representation adaptation within a generative reconstruction pipeline. There can be a challenge in tying together such two targets when the distribution of function observations scales to high-dimensional and noisy spaces. Instead, noise contrastive estimation might be able to provide more robust representations by learning distributional matching objectives to combat such inherent limitation of generative models. In light of this, we propose to equip CNPs by 1) aligning prediction with encoded ground-truth observation, and 2) decoupling meta-representation adaptation from generative reconstruction. Specifically, two auxiliary contrastive branches are set up hierarchically, namely in-instantiation temporal contrastive learning~({\tt TCL}) and cross-instantiation function contrastive learning~({\tt FCL}), to facilitate local predictive alignment and global function consistency, respectively. We empirically show that {\tt TCL} captures high-level abstraction of observations, whereas {\tt FCL} helps identify underlying functions, which in turn provides more efficient representations. Our model outperforms other CNPs variants when evaluating function distribution reconstruction and parameter identification across 1D, 2D and high-dimensional time-series.

preprint2022arXiv

IDNP: Interest Dynamics Modeling using Generative Neural Processes for Sequential Recommendation

Recent sequential recommendation models rely increasingly on consecutive short-term user-item interaction sequences to model user interests. These approaches have, however, raised concerns about both short- and long-term interests. (1) {\it short-term}: interaction sequences may not result from a monolithic interest, but rather from several intertwined interests, even within a short period of time, resulting in their failures to model skip behaviors; (2) {\it long-term}: interaction sequences are primarily observed sparsely at discrete intervals, other than consecutively over the long run. This renders difficulty in inferring long-term interests, since only discrete interest representations can be derived, without taking into account interest dynamics across sequences. In this study, we address these concerns by learning (1) multi-scale representations of short-term interests; and (2) dynamics-aware representations of long-term interests. To this end, we present an \textbf{I}nterest \textbf{D}ynamics modeling framework using generative \textbf{N}eural \textbf{P}rocesses, coined IDNP, to model user interests from a functional perspective. IDNP learns a global interest function family to define each user's long-term interest as a function instantiation, manifesting interest dynamics through function continuity. Specifically, IDNP first encodes each user's short-term interactions into multi-scale representations, which are then summarized as user context. By combining latent global interest with user context, IDNP then reconstructs long-term user interest functions and predicts interactions at upcoming query timestep. Moreover, IDNP can model such interest functions even when interaction sequences are limited and non-consecutive. Extensive experiments on four real-world datasets demonstrate that our model outperforms state-of-the-arts on various evaluation metrics.

preprint2022arXiv

See What You See: Self-supervised Cross-modal Retrieval of Visual Stimuli from Brain Activity

Recent studies demonstrate the use of a two-stage supervised framework to generate images that depict human perception to visual stimuli from EEG, referring to EEG-visual reconstruction. They are, however, unable to reproduce the exact visual stimulus, since it is the human-specified annotation of images, not their data, that determines what the synthesized images are. Moreover, synthesized images often suffer from noisy EEG encodings and unstable training of generative models, making them hard to recognize. Instead, we present a single-stage EEG-visual retrieval paradigm where data of two modalities are correlated, as opposed to their annotations, allowing us to recover the exact visual stimulus for an EEG clip. We maximize the mutual information between the EEG encoding and associated visual stimulus through optimization of a contrastive self-supervised objective, leading to two additional benefits. One, it enables EEG encodings to handle visual classes beyond seen ones during training, since learning is not directed at class annotations. In addition, the model is no longer required to generate every detail of the visual stimulus, but rather focuses on cross-modal alignment and retrieves images at the instance level, ensuring distinguishable model output. Empirical studies are conducted on the largest single-subject EEG dataset that measures brain activities evoked by image stimuli. We demonstrate the proposed approach completes an instance-level EEG-visual retrieval task which existing methods cannot. We also examine the implications of a range of EEG and visual encoder structures. Furthermore, for a mostly studied semantic-level EEG-visual classification task, despite not using class annotations, the proposed method outperforms state-of-the-art supervised EEG-visual reconstruction approaches, particularly on the capability of open class recognition.