Source author record

Fei Tao

Fei Tao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

eess.AS · Machine Learning · Computer Vision · cond-mat.mtrl-sci · math.AP · Sound

Catalog footprint

What is connected

5 works

6 topics

4 close collaborators

Preprint · 2026 · arXiv

CASHEW: Stabilizing Multimodal Reasoning via Iterative Trajectory Aggregation

Vision-language models achieve strong performance across a wide range of multimodal understanding and reasoning tasks, yet their multi-step reasoning remains unstable. Repeated sampling over the same input often produces divergent reasoning trajectories and inconsistent final predictions. To address this, we introduce two complementary approaches inspired by test-time scaling: (1) CASHEW, an inference-time framework that stabilizes reasoning by iteratively aggregating multiple candidate trajectories into higher-quality reasoning traces, with explicit visual verification filtering hallucinated steps and grounding reasoning in visual evidence, and (2) CASHEW-RL, a learned variant that internalizes this aggregation behavior within a single model. CASHEW-RL is trained using Group Sequence Policy Optimization (GSPO) with a composite reward that encourages correct answers grounded in minimal yet sufficient visual evidence, while adaptively allocating reasoning effort based on task difficulty. This training objective enables robust self-aggregation at inference. Extensive experiments on 13 image understanding, video understanding, and video reasoning benchmarks show significant performance improvements, including gains of up to +23.6 percentage points on ScienceQA and +8.1 percentage points on EgoSchema.

Preprint · 2022 · arXiv

Almost global smooth solutions of the 3D quasilinear Klein-Gordon equations on the product space $\mathbb{R}^{2}\times \mathbb{T}$

In this paper, for the 3D quasilinear Klein-Gordon equation with small initial data posed on the product space $\mathbb{R}^{2}\times \mathbb{T}$, we focus on the lower bound of the lifespan of the smooth solution. When the size of the initial data is bounded by $\varepsilon_0>0$, we show by the space-time resonance method that the smooth solution exists up to time $e^{c_{0}/\varepsilon_{0}^2}$, provided $\varepsilon_0$ is sufficiently small, where $c_0>0$ is a suitable constant.

Preprint · 2022 · arXiv

Crystallographic effects on transgranular chloride-induced stress corrosion crack propagation of arc welded austenitic stainless steel

The effect of crystallography on transgranular chloride-induced stress corrosion cracking (TGCISCC) of arc welded 304L austenitic stainless steel is studied on more than 300 grains along crack paths. Schmid and Taylor factor mismatches across grain boundaries (GBs) reveal that cracks propagate either from a hard grain to a soft grain, which can be explained by mechanical arguments alone, or from a soft grain to a hard grain. In the latter case, finite element analysis reveals that TGCISCC will arrest at GBs without sufficient mechanical stress, favorable crystallographic orientations, or crack tip corrosion. GB type does not play a significant role in determining TGCISCC cracking behavior or susceptibility. TGCISCC crack behaviors at GBs are discussed in the context of the competition between mechanical, crystallographic, and corrosion factors.

Preprint · 2020 · arXiv

Improving Embedding Extraction for Speaker Verification with Ladder Network

Speaker verification (SV) is an established yet challenging task in speech processing and an active research area. Recent SV systems rely on deep neural networks to extract high-level embeddings that characterize users' voices. Most studies have focused on improving the discriminability of these networks to extract better embeddings and thereby improve performance; only a few have focused on improving generalization. In this paper, we propose applying the ladder network framework, which combines supervised and unsupervised learning, to SV systems. The ladder network helps the system learn better high-level embeddings by balancing the trade-off between keeping as much useful information and discarding as much useless information as possible. We evaluated the framework on two state-of-the-art SV systems, d-vector and x-vector, which suit different use cases. In the experiments, the proposed approach improved performance by up to 10% relative, without additional parameters or augmented data.

Preprint · 2020 · arXiv

Multi-Task Siamese Neural Network for Improving Replay Attack Detection

Automatic speaker verification systems are vulnerable to audio replay attacks, which bypass security by replaying recordings of authorized speakers. Replay attack (RA) detection systems built on Residual Neural Networks (ResNets) have yielded strong results on the public ASVspoof 2019 Physical Access challenge benchmark. However, because most teams use finely tuned feature extraction pipelines and model architectures, the generalizability of such systems remains questionable. In this work, we analyse the effect that discriminative feature learning in a multi-task learning (MTL) setting can have on the generalizability and discriminability of RA detection systems. We use a popular ResNet architecture optimized with the cross-entropy criterion as our baseline and compare it to the same architecture optimized with MTL using Siamese Neural Networks (SNNs). We show that the SNN outperforms the baseline by a relative 26.8% in Equal Error Rate (EER). We further enhance the model's architecture and demonstrate that an SNN with an additional reconstruction loss yields another significant relative EER improvement of 13.8%.
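The replay-attack abstract reports its gains as relative reductions in Equal Error Rate (EER). As a generic illustration only, not the authors' evaluation code and not the official ASVspoof scoring tool, the sketch below approximates EER by sweeping each observed score as a decision threshold and taking the point where the false acceptance rate (FAR) and false rejection rate (FRR) are closest. The function names, the "higher score means bona fide" convention, and the toy scores are assumptions made for this example.

```python
from typing import Sequence


def equal_error_rate(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Approximate EER by sweeping every observed score as a threshold.

    Assumption: higher scores mean "more likely bona fide"; label 1 = bona fide,
    label 0 = replayed (spoof). Returns the mean of FAR and FRR at the
    threshold where the two rates are closest.
    """
    bona_fide = [s for s, y in zip(scores, labels) if y == 1]
    spoof = [s for s, y in zip(scores, labels) if y == 0]
    best_gap = float("inf")
    eer = 1.0
    for t in sorted(set(scores)):
        # False acceptance: spoofed trials scored at or above the threshold.
        far = sum(s >= t for s in spoof) / len(spoof)
        # False rejection: bona fide trials scored below the threshold.
        frr = sum(s < t for s in bona_fide) / len(bona_fide)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


def relative_reduction(baseline_eer: float, new_eer: float) -> float:
    """Relative EER reduction, e.g. 0.268 means 26.8% lower than the baseline."""
    return (baseline_eer - new_eer) / baseline_eer


if __name__ == "__main__":
    # Hypothetical toy scores, not data from the paper.
    scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.1]
    labels = [1, 1, 1, 0, 0, 0]
    print(equal_error_rate(scores, labels))
    print(relative_reduction(0.05, 0.04))
```

In this reading, a "relative 26.8%" improvement means the new EER is 26.8% lower than the baseline EER, not 26.8 percentage points lower; for instance, a hypothetical drop from 5.0% to 4.0% EER is a 20% relative reduction.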
