Source author record

Yuchen Wang

Yuchen Wang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher: unclaimed source record

Computer Vision, Artificial Intelligence, Machine Learning, physics.optics, math.AP, math.DG, math.DS, Multiagent Systems, Multimedia, Networking and Internet Architecture, physics.app-ph, physics.soc-ph, Populations and Evolution, quant-ph

Catalog footprint

What is connected

16 works

14 topics

4 close collaborators

preprint, 2026, arXiv

1-GHz VIS-to-MIR frequency combs enabled by CMOS-compatible nanophotonic waveguides

A fully stabilized frequency comb is essential for precision metrology and coherent optical synthesis. However, fully stabilized frequency combs generally require separate stages for supercontinuum generation (SCG) and self-referencing, which largely limits their compactness. Here, enabled by low-threshold multi-octave supercontinuum generation and concurrent third-harmonic generation in low-loss silicon nitride waveguides, we present a novel approach to a self-referenced frequency comb source at 1 GHz repetition rate spanning from the full visible (VIS) to the mid-infrared (MIR). Our coherent comb is seeded by an all-polarization-maintaining ultrafast fiber laser at 1556 nm, with a pulse duration of 73 fs at 1 GHz repetition rate. With an injected energy of merely 110 pJ, the pulses propagate through dispersion-engineered Si3N4 waveguides, generating a supercontinuum spanning over three octaves from 350 nm to 3280 nm, i.e., 0.76 PHz of coherent bandwidth. Moreover, the on-chip third-harmonic generation provides a carrier-envelope offset beat note via f-3f with a signal-to-noise ratio of 43 dB. Fueled by evolving photonic integration, which offers the possibility of on-chip filtering and photodetectors, this approach to single-chip self-referencing of high-repetition-rate frequency combs paves the way for ultrabroadband comb sources with unprecedented compactness and field-readiness.
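The bandwidth and octave figures quoted in this abstract can be checked with a few lines of arithmetic. The sketch below is not taken from the paper: it simply converts the stated wavelength limits to optical frequencies using the vacuum speed of light. The average-power line assumes that the 110 pJ figure is the per-pulse energy delivered at the 1 GHz repetition rate.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s


def optical_frequency_hz(wavelength_nm: float) -> float:
    """Convert a vacuum wavelength in nanometres to a frequency in hertz."""
    return C / (wavelength_nm * 1e-9)


# Wavelength limits quoted in the abstract.
short_nm, long_nm = 350.0, 3280.0

# Bandwidth is the difference between the highest and lowest optical frequency.
bandwidth_hz = optical_frequency_hz(short_nm) - optical_frequency_hz(long_nm)

# One octave is a factor of two in frequency (or wavelength).
octaves = math.log2(long_nm / short_nm)

# Assumption: 110 pJ is the energy per pulse, with one pulse per 1 ns period.
pulse_energy_j = 110e-12
rep_rate_hz = 1e9
average_power_w = pulse_energy_j * rep_rate_hz

print(f"bandwidth: {bandwidth_hz / 1e15:.3f} PHz")
print(f"octaves:   {octaves:.2f}")
print(f"average power: {average_power_w * 1e3:.0f} mW")
```

The computed bandwidth falls between 0.76 and 0.77 PHz and the span is slightly more than three octaves, consistent with the abstract's figures.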

preprint, 2026, arXiv

Exploring Data-Free LoRA Transferability for Video Diffusion Models

Video diffusion models leveraging step distillation or causal distillation have achieved remarkable performance. However, adapting existing LoRAs to these variants remains a critical challenge due to weight space mismatches. We observe that direct application leads to style degradation and structural collapse, yet the underlying mechanisms remain poorly understood. To fill this gap, we delve into the weight space and identify that the incompatibility stems from spectral interference within shared functional clusters defined over singular subspaces. Specifically, our analysis reveals that while both paradigms respect spectral rigidity, they establish conflicting routing pathways that clash through constructive overload or destructive cancellation. To address this issue, we propose Cluster-Aware Spectral Arbitration (CASA), a data-free framework that dynamically arbitrates between safeguarding the target's manifold and restoring LoRA alignment based on spectral density. Extensive experiments demonstrate that CASA effectively mitigates artifacts and revives LoRA functionality. Our code is available at https://github.com/Noahwangyuchen/CASA

preprint, 2026, arXiv

InfiAgent: An Infinite-Horizon Framework for General-Purpose Autonomous Agents

LLM agents can reason and use tools, but they often break down on long-horizon tasks due to unbounded context growth and accumulated errors. Common remedies such as context compression or retrieval-augmented prompting introduce trade-offs between information fidelity and reasoning stability. We present InfiAgent, a general-purpose framework that keeps the agent's reasoning context strictly bounded regardless of task duration by externalizing persistent state into a file-centric state abstraction. At each step, the agent reconstructs context from a workspace state snapshot plus a fixed window of recent actions. Experiments on DeepResearch and an 80-paper literature review task show that, without task-specific fine-tuning, InfiAgent with a 20B open-source model is competitive with larger proprietary systems and maintains substantially higher long-horizon coverage than context-centric baselines. These results support explicit state externalization as a practical foundation for stable long-horizon agents. GitHub repo: https://github.com/ChenglinPoly/infiAgent
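The idea of rebuilding context from a state snapshot plus a fixed window of recent actions can be illustrated with a minimal sketch. This is not InfiAgent's implementation: the class `BoundedAgentContext`, its methods, and the in-memory dictionary standing in for files on disk are all hypothetical names chosen for illustration.

```python
from collections import deque


class BoundedAgentContext:
    """Hypothetical illustration: persistent state lives outside the prompt."""

    def __init__(self, window_size: int = 3):
        # Stand-in for a file-based workspace (path -> latest content).
        self.workspace = {}
        # Only the most recent `window_size` actions are ever kept.
        self.recent_actions = deque(maxlen=window_size)

    def record(self, action: str, path: str, content: str) -> None:
        """Persist the result of an action and remember the action itself."""
        self.workspace[path] = content
        self.recent_actions.append(action)

    def build_context(self) -> str:
        """Rebuild the prompt from the snapshot plus the action window."""
        snapshot = "\n".join(
            f"{path}: {content}" for path, content in sorted(self.workspace.items())
        )
        actions = "\n".join(self.recent_actions)
        return f"[workspace snapshot]\n{snapshot}\n[recent actions]\n{actions}"


ctx = BoundedAgentContext(window_size=3)
for step in range(100):
    ctx.record(
        f"step-{step:03d}: update notes",
        "notes.md",
        f"summary after step {step:03d}",
    )
prompt = ctx.build_context()
print(prompt)
```

In this toy setup the prompt after 100 steps contains only the latest file content and the last three actions, so its size does not grow with the number of steps taken.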

preprint, 2026, arXiv

Lens: A Knowledge-Guided Foundation Model for Network Traffic

Network traffic refers to the amount of data being sent and received over the Internet or any system that connects computers. Analyzing network traffic is vital for security and management, yet remains challenging due to the heterogeneity of plain-text packet headers and encrypted payloads. To capture the latent semantics of traffic, recent studies have adopted Transformer-based pretraining techniques to learn network representations from massive traffic data. However, these methods pretrain on data-driven tasks but overlook network knowledge, for example by masking partial digits of indivisible network port numbers for prediction, thereby limiting semantic understanding. In addition, they struggle to extend classification to new classes during finetuning due to the distribution shift. Motivated by these limitations, we propose Lens, a unified knowledge-guided foundation model for both network traffic classification and generation. In pretraining, we propose a Knowledge-Guided Mask Span Prediction method with textual context for learning knowledge-enriched representations. For extending to new classes in finetuning, we reframe traffic classification as a closed-ended generation task and introduce context-aware finetuning to adapt to the distribution shift. Evaluation results across various benchmark datasets demonstrate that the proposed Lens achieves superior performance on both classification and generation tasks. For traffic classification, Lens outperforms competitive baselines substantially on 8 out of 12 tasks with an average accuracy of 96.33% and extends to novel classes with significantly better performance. For traffic generation, Lens generates higher-fidelity network traffic for network simulation, gaining up to 30.46% and 33.3% better accuracy and F1, respectively, in fuzzing tests. We will open-source the code upon publication.
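The point about indivisible port numbers can be made concrete with a small sketch of field-level masking, where a whole header field is hidden rather than a few of its digits. This is only an illustration of the general idea, not the Knowledge-Guided Mask Span Prediction method itself: the function `mask_whole_fields`, the `<mask>` token, and the example header are hypothetical.

```python
import random

MASK = "<mask>"  # hypothetical mask token


def mask_whole_fields(fields, mask_ratio=0.3, seed=0):
    """Mask entire header fields so a value such as a port is never split.

    fields: list of (name, value) pairs with unique names.
    Returns the masked field list and a dict of the hidden target values.
    """
    rng = random.Random(seed)
    n_mask = max(1, round(len(fields) * mask_ratio))
    chosen = set(rng.sample(range(len(fields)), n_mask))

    masked, targets = [], {}
    for index, (name, value) in enumerate(fields):
        if index in chosen:
            masked.append((name, MASK))   # hide the whole value
            targets[name] = value         # the model would predict this
        else:
            masked.append((name, value))  # leave the field untouched
    return masked, targets


# Hypothetical packet header fields.
header = [("src_port", "443"), ("dst_port", "51234"), ("protocol", "TCP"), ("ttl", "64")]
masked, targets = mask_whole_fields(header)
print(masked)
print(targets)
```

Because masking operates on whole fields, a prediction target is always a complete value such as `51234`, never a fragment of it.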

preprint, 2024, arXiv

Degenerate bifurcations of two-fold doubly-connected uniformly rotating vortex patches

In this paper, we obtain families of two-fold doubly-connected uniformly rotating vortex patches of the 2-D incompressible Euler equations emanating from some specific annuli. The main difficulty comes from the strong degeneracy of the problem: the kernel of the linearization is not one-dimensional, and the transversality condition does not hold. To overcome this, we make a detailed analysis of the nonlinear functional, and the bifurcation curves are obtained by perturbing real algebraic varieties defined by truncated polynomials. In addition, our result partially answers a problem proposed by Hmidi and Mateu (Adv. Math. 302 (2016), 799-850).

preprint, 2023, arXiv

A Monolithic Graphene-Functionalized Microlaser for Multispecies Gas Detection

Optical microcavity enhanced light-matter interaction offers a powerful tool to develop fast and precise sensing techniques, spurring applications in the detection of biochemical targets ranging from cells and nanoparticles to large molecules. However, the intrinsic inertness of such pristine microresonators limits their spread into new fields such as gas detection. Here, a functionalized microlaser sensor is realized by depositing graphene in an erbium-doped over-modal microsphere. By using a 980 nm pump, multiple laser lines excited in different mode families of the microresonator are co-generated in a single device. The interference between these splitting mode lasers produces beat notes in the electrical domain (0.2-1.1 MHz) with sub-kHz accuracy, thanks to the graphene-induced intracavity backward scattering. This allows for multispecies gas identification from a mixture, and ultrasensitive gas detection down to individual molecules.

preprint, 2023, arXiv

Robustness-enhanced Uplift Modeling with Adversarial Feature Desensitization

Uplift modeling has shown very promising results in online marketing. However, most existing works are prone to the robustness challenge in some practical applications. In this paper, we first present a possible explanation for the above phenomenon. We verify that there is a feature sensitivity problem in online marketing using different real-world datasets, where the perturbation of some key features will seriously affect the performance of the uplift model and even cause the opposite trend. To solve the above problem, we propose a novel robustness-enhanced uplift modeling framework with adversarial feature desensitization (RUAD). Specifically, our RUAD can more effectively alleviate the feature sensitivity of the uplift model through two customized modules, including a feature selection module with joint multi-label modeling to identify a key subset from the input features and an adversarial feature desensitization module using adversarial training and soft interpolation operations to enhance the robustness of the model against this selected subset of features. Finally, we conduct extensive experiments on a public dataset and a real product dataset to verify the effectiveness of our RUAD in online marketing. In addition, we also demonstrate the robustness of our RUAD to the feature sensitivity, as well as the compatibility with different uplift models.

preprint, 2022, arXiv

Ego4D: Around the World in 3,000 Hours of Egocentric Video

We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household, outdoor, workplace, leisure, etc.) captured by 931 unique camera wearers from 74 worldwide locations and 9 different countries. The approach to collection is designed to uphold rigorous privacy and ethics standards with consenting participants and robust de-identification procedures where relevant. Ego4D dramatically expands the volume of diverse egocentric video footage publicly available to the research community. Portions of the video are accompanied by audio, 3D meshes of the environment, eye gaze, stereo, and/or synchronized videos from multiple egocentric cameras at the same event. Furthermore, we present a host of new benchmark challenges centered around understanding the first-person visual experience in the past (querying an episodic memory), present (analyzing hand-object manipulation, audio-visual conversation, and social interactions), and future (forecasting activities). By publicly sharing this massive annotated dataset and benchmark suite, we aim to push the frontier of first-person perception. Project page: https://ego4d-data.org/

preprint, 2022, arXiv

Human-centric Spatio-Temporal Video Grounding via the Combination of Mutual Matching Network and TubeDETR

In this technical report, we present our solution for the Human-centric Spatio-Temporal Video Grounding (HC-STVG) track of the 4th Person in Context (PIC) workshop and challenge. Our solution is built on TubeDETR and Mutual Matching Network (MMN). Specifically, TubeDETR exploits a video-text encoder and a space-time decoder to predict the starting time, the ending time and the tube of the target person. MMN detects persons in images, links them as tubes, extracts features of person tubes and the text description, and predicts the similarities between them to choose the most likely person tube as the grounding result. Our solution finally refines the results by combining the spatial localization of MMN with the temporal localization of TubeDETR. In the HC-STVG track of the 4th PIC challenge, our solution achieves third place.

preprint, 2022, arXiv

Key-frame Guided Network for Thyroid Nodule Recognition using Ultrasound Videos

Ultrasound examination is widely used in the clinical diagnosis of thyroid nodules (benign/malignant). However, the accuracy relies heavily on radiologist experience. Although deep learning techniques have been investigated for thyroid nodule recognition, current solutions are mainly based on static ultrasound images, use limited temporal information, and are inconsistent with clinical diagnosis. This paper proposes a novel method for the automated recognition of thyroid nodules through an exhaustive exploration of ultrasound videos and key-frames. We first propose a detection-localization framework to automatically identify the clinical key-frame with a typical nodule in each ultrasound video. Based on the localized key-frame, we develop a key-frame guided video classification model for thyroid nodule recognition. In addition, we introduce a motion attention module to help the network focus on significant frames in an ultrasound video, which is consistent with clinical diagnosis. The proposed thyroid nodule recognition framework is validated on clinically collected ultrasound videos, demonstrating superior performance compared with other state-of-the-art methods.

preprint, 2022, arXiv

Multiplicity of non-contractible closed geodesics on Finsler compact space forms

Let $M=S^n/\Gamma$ and $h$ be a nontrivial element of finite order $p$ in $\pi_1(M)$, where the integers $n, p\geq 2$, $\Gamma$ is a finite abelian group which acts freely and isometrically on the $n$-sphere and therefore $M$ is diffeomorphic to a compact space form. In this paper, we prove that for every irreversible Finsler compact space form $(M,F)$ with reversibility $\lambda$ and flag curvature $K$ satisfying \[ \frac{4p^2}{(p+1)^2} \Big(\frac{\lambda}{\lambda+1}\Big)^2 < K \leq 1,\;\;\lambda < \frac{p+1}{p-1}, \] there exist at least $n-1$ non-contractible closed geodesics of class $[h]$. In addition, if the metric $F$ is bumpy and \[ \Big(\frac{4p}{2p+1}\Big)^2 \Big(\frac{\lambda}{\lambda+1}\Big)^2 < K \leq 1,\;\;\lambda < \frac{2p+1}{2p-1}, \] then there exist at least $2[\frac{n+1}{2}]$ non-contractible closed geodesics of class $[h]$, which is the optimal lower bound due to Katok's example. For $C^4$-generic Finsler metrics, there are infinitely many non-contractible closed geodesics of class $[h]$ on $(M, F)$ if $\frac{\lambda^2}{(\lambda+1)^2} < K \leq 1$ with $n$ being odd, or $\frac{\lambda^2}{(\lambda+1)^2}\frac{4}{(n-1)^2} < K \leq 1$ with $n$ being even.

preprint, 2022, arXiv

Stepwise Goal-Driven Networks for Trajectory Prediction

We propose to predict the future trajectories of observed agents (e.g., pedestrians or vehicles) by estimating and using their goals at multiple time scales. We argue that the goal of a moving agent may change over time, and modeling goals continuously provides more accurate and detailed information for future trajectory estimation. To this end, we present a recurrent network for trajectory prediction, called Stepwise Goal-Driven Network (SGNet). Unlike prior work that models only a single, long-term goal, SGNet estimates and uses goals at multiple temporal scales. In particular, it incorporates an encoder that captures historical information, a stepwise goal estimator that predicts successive goals into the future, and a decoder that predicts future trajectory. We evaluate our model on three first-person traffic datasets (HEV-I, JAAD, and PIE) as well as on three bird's eye view datasets (NuScenes, ETH, and UCY), and show that our model achieves state-of-the-art results on all datasets. Code has been made available at: https://github.com/ChuhuaW/SGNet.pytorch.

preprint, 2021, arXiv

A Flexible Rolling Regression Framework for Time-Varying SIRD models: Application to COVID-19

The present paper introduces a data-driven framework for describing the time-varying nature of an SIRD model in the context of COVID-19. By embedding a rolling regression in a mixed integer bilevel nonlinear programming problem, our aim is to provide the research community with a model that accurately reproduces the observed changes in the number of infected, recovered, and death cases, while providing information about the time dependency of the parameters that govern the SIRD model. We propose this optimization model and a genetic algorithm to tackle its solution. Moreover, we test this algorithm with 2020 COVID-19 data from the state of Minnesota and find that our results are consistent both qualitatively and quantitatively, thus proving that the proposed framework is an effective and flexible tool to describe the dynamics of a pandemic.
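For readers unfamiliar with SIRD models, the sketch below steps a textbook discrete-time SIRD system whose parameters change over time. It is not the paper's rolling-regression or bilevel optimization method: the function names, the daily Euler step, the initial population, and the piecewise parameter values are all hypothetical choices made for illustration.

```python
def sird_step(state, beta, gamma, mu, n_total):
    """Advance a textbook SIRD model by one day (forward Euler, dt = 1)."""
    s, i, r, d = state
    new_infections = beta * s * i / n_total  # susceptible -> infected
    new_recoveries = gamma * i               # infected -> recovered
    new_deaths = mu * i                      # infected -> dead
    return (
        s - new_infections,
        i + new_infections - new_recoveries - new_deaths,
        r + new_recoveries,
        d + new_deaths,
    )


def simulate(initial, params_at, days):
    """Run the model with day-dependent parameters (beta, gamma, mu)."""
    n_total = sum(initial)
    states = [initial]
    for day in range(days):
        beta, gamma, mu = params_at(day)
        states.append(sird_step(states[-1], beta, gamma, mu, n_total))
    return states


def params_at(day):
    # Hypothetical values: transmission drops after day 30.
    beta = 0.30 if day < 30 else 0.12
    return beta, 0.10, 0.01


# Hypothetical initial condition: (susceptible, infected, recovered, dead).
states = simulate((999_900.0, 100.0, 0.0, 0.0), params_at, days=60)
print(states[-1])
```

Letting `params_at` depend on the day is the simplest way to express time-varying parameters; a fitting procedure would estimate those values from observed case counts instead of fixing them by hand.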

preprint, 2021, arXiv

Boosting the SiN nonlinear photonic platform with transition metal dichalcogenide monolayers

In the past few years, we have witnessed an increased interest in the use of 2D materials for the realization of hybrid photonic nonlinear waveguides. Although graphene has attracted most of the attention, other families of 2D materials such as transition metal dichalcogenides have also shown promising nonlinear performance. In this work, we propose a strategy for designing silicon nitride waveguide structures embedded with molybdenum disulfide for nonlinear applications. The transverse geometry of the hybrid waveguide structure is optimized for high third-order nonlinear effects using optogeometrical engineering and multiple layers of molybdenum disulfide. Stacking multiple monolayers results in an improvement of 2 orders of magnitude in comparison with standard silicon nitride waveguides. The performance of the hybrid waveguides is then investigated in terms of four-wave mixing enhancement in microring resonator configurations. A 6.3 dB signal-idler conversion efficiency is reached around 1550 nm wavelength for a 5 mW pumping level.

preprint, 2021, arXiv

Forecasting Black Sigatoka Infection Risks with Latent Neural ODEs

Black Sigatoka disease severely decreases global banana production, and climate change aggravates the problem by altering fungal species distributions. Due to the heavy financial burden of managing this infectious disease, farmers in developing countries face significant banana crop losses. Though scientists have produced mathematical models of infectious diseases, adapting these models to incorporate climate effects is difficult. We present MR. NODE (Multiple predictoR Neural ODE), a neural network that models the dynamics of black Sigatoka infection learnt directly from data via Neural Ordinary Differential Equations. Our method encodes external predictor factors into the latent space in addition to the variable that we infer, and it can also predict the infection risk at an arbitrary point in time. Empirically, we demonstrate on historical climate data that our method has superior generalization performance on time points up to one month in the future and unseen irregularities. We believe that our method can be a useful tool to control the spread of black Sigatoka.

preprint, 2021, arXiv

Statistical Approach to Quantum Phase Estimation

We introduce a new statistical and variational approach to the phase estimation algorithm (PEA). Unlike the traditional and iterative PEAs which return only an eigenphase estimate, the proposed method can determine any unknown eigenstate-eigenphase pair from a given unitary matrix utilizing a simplified version of the hardware intended for the Iterative PEA (IPEA). This is achieved by treating the probabilistic output of an IPEA-like circuit as an eigenstate-eigenphase proximity metric, using this metric to estimate the proximity of the input state and input phase to the nearest eigenstate-eigenphase pair and approaching this pair via a variational process on the input state and phase. This method may search over the entire computational space, or can efficiently search for eigenphases (eigenstates) within some specified range (directions), allowing those with some prior knowledge of their system to search for particular solutions. We show the simulation results of the method with the Qiskit package on the IBM Q platform and on a local computer.
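To see how a measurement probability can act as a proximity metric, consider a standard single-ancilla circuit (Hadamard, controlled-U, a phase rotation by an input phase, Hadamard, measure). The sketch below is an assumption about the general shape of such a circuit, not the paper's construction. For an input state with weight |c_k|^2 on the eigenstate with eigenphase phi_k, the probability of measuring the ancilla in state 0 is the weighted sum of cos^2((phi_k - theta) / 2). The function name and the example eigenphases are hypothetical.

```python
import math


def prob_ancilla_zero(weights, eigenphases, theta):
    """P(ancilla = 0) for a single-ancilla phase-kickback circuit.

    weights:     |c_k|^2 of the input state on each eigenstate of U
    eigenphases: phi_k with U|u_k> = exp(i * phi_k)|u_k>
    theta:       input phase applied to the ancilla before the final Hadamard
    """
    total = sum(weights)  # normalise in case the weights do not sum to one
    return sum(
        (w / total) * math.cos((phi - theta) / 2.0) ** 2
        for w, phi in zip(weights, eigenphases)
    )


eigenphases = [0.4, 2.1]  # hypothetical eigenphases of a 2x2 unitary

# Input is exactly the first eigenstate and theta matches its eigenphase.
on_target = prob_ancilla_zero([1.0, 0.0], eigenphases, theta=0.4)

# Same theta, but the input is an equal mixture of both eigenstates.
mixed = prob_ancilla_zero([0.5, 0.5], eigenphases, theta=0.4)

# Correct eigenstate, but theta is off by pi.
off_phase = prob_ancilla_zero([1.0, 0.0], eigenphases, theta=0.4 + math.pi)

print(on_target, mixed, off_phase)
```

The probability reaches 1 only when the input state is an eigenstate and the input phase equals its eigenphase, which is what makes it usable as an objective for a variational search over both state and phase.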
