Source author record

Jianyuan Zhong

Jianyuan Zhong appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Topics: Machine Learning, Computation and Language, eess.AS, Sound, eess.SP, math.GT, math.QA

Catalog footprint

What is connected

5 works

7 topics

4 close collaborators

preprint, 2026, arXiv

FAME: Forecasting Academic Impact via Continuous-Time Manifold Evolution

Large Language Models (LLMs) are increasingly used to brainstorm and evaluate research ideas, yet assessing such judgments is fundamentally difficult because the true impact of a new idea may take years to emerge. We address this challenge by using the impact forecasting of human-authored manuscripts as a verifiable proxy task. In a prospective forecasting study, we find that frontier LLMs fail to reliably distinguish high-impact papers from ordinary publications, suggesting that static text-based judging is insufficient for scientific evaluation. To address this limitation, we propose FAME (Forecasting Academic Impact via Continuous-Time Manifold Evolution), a spatiotemporal framework for modeling the dynamic trajectories of scientific topics. FAME projects papers into a dynamic latent space informed by textual features and a verified knowledge-flow graph, learning geometric constraints that align impactful manuscripts with the forward momentum of their fields. Experiments on 3,200 arXiv papers across three fast-evolving subfields show that FAME consistently and substantially outperforms state-of-the-art LLM evaluators in prospective multidimensional impact forecasting. Furthermore, integrating FAME's dynamic geometric signals into LLMs significantly improves their forecasting performance. These results support manuscript impact forecasting as a useful, measurable proxy benchmark and position FAME as a strong, trajectory-aware foundation for automated scientific evaluation.

preprint, 2021, arXiv

Attention is All You Need in Speech Separation

Recurrent Neural Networks (RNNs) have long been the dominant architecture in sequence-to-sequence learning. RNNs, however, are inherently sequential models that do not allow parallelization of their computations. Transformers are emerging as a natural alternative to standard RNNs, replacing recurrent computations with a multi-head attention mechanism. In this paper, we propose the SepFormer, a novel RNN-free Transformer-based neural network for speech separation. The SepFormer learns short- and long-term dependencies with a multi-scale approach that employs transformers. The proposed model achieves state-of-the-art (SOTA) performance on the standard WSJ0-2/3mix datasets. It reaches an SI-SNRi of 22.3 dB on WSJ0-2mix and an SI-SNRi of 19.5 dB on WSJ0-3mix. The SepFormer inherits the parallelization advantages of Transformers and achieves competitive performance even when downsampling the encoded representation by a factor of 8. It is thus significantly faster and less memory-demanding than the latest speech separation systems with comparable performance.
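The SI-SNRi figures quoted in this abstract are improvements in scale-invariant signal-to-noise ratio, measured against the unprocessed mixture. The sketch below shows one common way to compute that metric in plain Python. It is not taken from the paper: the function names, the zero-mean normalization, and the `eps` stabilizer are assumptions made for illustration, and published evaluation code may differ in detail.

```python
import math


def _zero_mean(signal):
    """Return a copy of the signal with its mean removed."""
    mean = sum(signal) / len(signal)
    return [v - mean for v in signal]


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant SNR in dB (illustrative definition, assumed here)."""
    est = _zero_mean(estimate)
    ref = _zero_mean(reference)
    # Project the estimate onto the reference to get the target component.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref) + eps
    scale = dot / ref_energy
    target = [scale * r for r in ref]
    # Everything left over after the projection is treated as noise.
    noise = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(n * n for n in noise) + eps
    return 10.0 * math.log10(target_energy / noise_energy + eps)


def si_snr_improvement(estimate, mixture, reference):
    """SI-SNRi: gain of the separated estimate over the raw mixture."""
    return si_snr(estimate, reference) - si_snr(mixture, reference)
```

Because the target is obtained by projection, rescaling the estimate leaves the score essentially unchanged, which is why the metric is called scale-invariant.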

preprint, 2020, arXiv

Multi-task self-supervised learning for Robust Speech Recognition

Despite the growing interest in unsupervised learning, extracting meaningful knowledge from unlabelled audio remains an open challenge. To take a step in this direction, we recently proposed a problem-agnostic speech encoder (PASE) that combines a convolutional encoder followed by multiple neural networks, called workers, tasked to solve self-supervised problems (i.e., ones that do not require manual annotations as ground truth). PASE was shown to capture relevant speech information, including speaker voice-print and phonemes. This paper proposes PASE+, an improved version of PASE for robust speech recognition in noisy and reverberant environments. To this end, we employ an online speech distortion module that contaminates the input signals with a variety of random disturbances. We then propose a revised encoder that better learns short- and long-term speech dynamics with an efficient combination of recurrent and convolutional networks. Finally, we refine the set of workers used in self-supervision to encourage better cooperation. Results on TIMIT, DIRHA and CHiME-5 show that PASE+ significantly outperforms both the previous version of PASE and common acoustic features. Interestingly, PASE+ learns transferable representations suitable for highly mismatched acoustic conditions.
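As a rough illustration of what contaminating an input signal on the fly can look like, the sketch below mixes Gaussian noise into a clean signal at a randomly drawn signal-to-noise ratio. This is an assumption for illustration only: the abstract does not say which disturbances PASE+ applies, and the function names, the Gaussian noise source, and the 0 to 20 dB range are all hypothetical.

```python
import math
import random


def mean_power(signal):
    """Average squared amplitude of a signal."""
    return sum(v * v for v in signal) / len(signal)


def add_noise_at_snr(clean, noise, snr_db):
    """Scale the noise so the mixture has the requested SNR in dB."""
    noise_power = mean_power(noise)
    if noise_power == 0.0:
        return list(clean)  # nothing to add
    target_noise_power = mean_power(clean) / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_noise_power / noise_power)
    return [c + gain * n for c, n in zip(clean, noise)]


def contaminate(clean, rng, snr_range_db=(0.0, 20.0)):
    """Draw a random SNR and fresh Gaussian noise (hypothetical disturbance)."""
    snr_db = rng.uniform(*snr_range_db)
    noise = [rng.gauss(0.0, 1.0) for _ in clean]
    return add_noise_at_snr(clean, noise, snr_db), snr_db
```

Calling `contaminate(clean, random.Random(0))` returns the distorted signal together with the SNR that was drawn, so a different disturbance can be produced on each training step.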

preprint, 2019, arXiv

UR-FUNNY: A Multimodal Language Dataset for Understanding Humor

Humor is a unique and creative communicative behavior displayed during social interactions. It is produced in a multimodal manner, through the usage of words (text), gestures (vision) and prosodic cues (acoustic). Understanding humor from these three modalities falls within the boundaries of multimodal language, a recent research trend in natural language processing that models natural language as it happens in face-to-face communication. Although humor detection is an established research area in NLP, it remains understudied in a multimodal context. This paper presents a diverse multimodal dataset, called UR-FUNNY, to open the door to understanding multimodal language used in expressing humor. The dataset and accompanying studies present a framework for multimodal humor detection for the natural language processing community. UR-FUNNY is publicly available for research.

preprint, 2000, arXiv

On The Homflypt Skein Module of S^1 x S^2

Let $k$ be a subring of the field of rational functions in $x, v, s$ which contains $x^{\pm 1}, v^{\pm 1}, s^{\pm 1}$. If $M$ is an oriented 3-manifold, let $S(M)$ denote the Homflypt skein module of $M$ over $k$. This is the free $k$-module generated by isotopy classes of framed oriented links in $M$ quotiented by the Homflypt skein relations: (1) $x^{-1}L_{+}-xL_{-}=(s-s^{-1})L_{0}$; (2) $L$ with a positive twist $=(xv^{-1})L$; (3) $L\sqcup O=(\frac{v-v^{-1}}{s-s^{-1}})L$ where $O$ is the unknot. We give two bases for the relative Homflypt skein module of the solid torus with 2 points in the boundary. The first basis is related to the basis of $S(S^1\times D^2)$ given by V. Turaev and also J. Hoste and M. Kidwell; the second basis is related to a Young idempotent basis for $S(S^1\times D^2)$ based on the work of A. Aiston, H. Morton and C. Blanchet. We prove that if the elements $s^{2n}-1$, for $n$ a nonzero integer, and the elements $s^{2m}-v^{2}$, for any integer $m$, are invertible in $k$, then $S(S^{1} \times S^2)$ is the direct sum of a $k$-torsion module and $k$. Here the free part is generated by the empty link $\phi$. In addition, if the elements $s^{2m}-v^{4}$, for $m$ an integer, are invertible in $k$, then $S(S^{1} \times S^2)$ has no torsion. We also obtain some results for more general $k$.