Source author record

Jingyi Lu

Jingyi Lu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Artificial Intelligence, Computer Vision, Computation and Language, eess.SY, Information Retrieval, Machine Learning, Systems and Control

Catalog footprint

What is connected

4 works

7 topics

4 close collaborators


preprint · 2026 · arXiv

Semantic-Enriched Latent Visual Reasoning

Multimodal latent-space reasoning aims to replace explicit thinking with images by performing visual reasoning directly in a compact latent space. However, existing approaches largely rely on visual supervision and produce latent representations that lack sufficient semantic richness, limiting their ability to support diverse region-level reasoning tasks. In this work, we introduce Semantic-Enriched Latent Visual Reasoning (SLVR), a two-stage learning framework that enriches latent representations with attribute-level visual semantics and aligns them with diverse reasoning objectives. In the first stage, SLVR learns semantically enriched region-centric latents under fine-grained attribute supervision. In the second stage, we design Multi-query Group Relative Policy Optimization (M-GRPO) to align latent representations across multiple queries grounded in the same region. To support this framework, we construct SLV-Set, comprising approximately 400K region-level attribute annotations and 800K multi-query question answering samples, and introduce SV-QA, a benchmark that evaluates latent reasoning under semantic variation. Experiments demonstrate that SLVR improves the robustness and semantic consistency of latent visual reasoning compared to existing baselines.
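The abstract does not spell out the M-GRPO objective. As background only, standard GRPO scores each sampled response relative to the other responses in its group, using the group's mean and standard deviation. The sketch below shows just that normalization step. The function name, the reward values, and the choice of population standard deviation are assumptions for illustration, and how M-GRPO pools groups across multiple queries on the same region is not modelled.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group's mean and spread.

    This is the generic group-relative step used by GRPO-style methods,
    not the paper's M-GRPO objective (which the abstract does not give).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations vary
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to one region-level query.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Responses that score above their group's average receive a positive advantage and those below receive a negative one, so no separate value network is needed to form a baseline.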

preprint · 2026 · arXiv

VAGS: Velocity Adaptive Guidance Scale for Image Editing and Generation

Classifier-free guidance (CFG) is the primary control over how strongly text semantics move a flow-based sampler, yet standard practice holds its scale fixed across the entire ODE trajectory. This is a fundamental mismatch: early steps are noise-dominated and carry weak semantic signal, while late steps commit image structure and demand stronger directional commitment; more critically, the value of any guidance strength depends on whether the guided velocity is consistent with the model's current dynamics or working against them. We propose Velocity-Adaptive Guidance Scale (VAGS), a training-free replacement that multiplies the nominal scale by a bounded factor combining a temporal signal-level term with the cosine similarity between task-relevant velocity fields. For inversion-free editing, VAGS measures the alignment between source- and target-guided velocities, so edit strength at each step reflects local compatibility between preservation and transformation. For generation, VAGS-Gen uses the alignment between unconditional and conditional velocities as the analogous signal. Neither variant requires fine-tuning, auxiliary networks, or extra forward passes, and fixed CFG is recovered as a special case. On PIE-Bench and DIV2K for editing, and COCO17, CUB-200, and Flickr30K for generation, VAGS consistently improves structural fidelity and generation quality over fixed CFG and recent training-free guidance variants. The code is publicly available at https://github.com/Harvard-AI-and-Robotics-Lab/Velocity_Adaptive_Guidance_Scale.
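The abstract describes the shape of the rule (nominal scale times a bounded factor built from a temporal term and a cosine similarity) but not the exact formula. The sketch below is one hypothetical instance of that shape, not the authors' implementation: the linear `progress` term, the mapping of cosine similarity to [0, 1], and the bounds `lo` and `hi` are all assumptions. The two velocity vectors stand in for whichever pair is compared (source and target for editing, unconditional and conditional for generation).

```python
import math


def cosine_similarity(u, v, eps=1e-12):
    """Cosine of the angle between two flattened velocity vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def adaptive_scale(base_scale, progress, v_a, v_b, lo=0.5, hi=1.5):
    """Hypothetical velocity-adaptive guidance scale.

    progress: fraction of sampling completed, in [0, 1] (assumed temporal term).
    v_a, v_b: the two velocity fields being compared, as flat lists.
    lo, hi:   assumed bounds on the multiplier; lo == hi == 1 gives fixed CFG.
    """
    alignment = 0.5 * (1.0 + cosine_similarity(v_a, v_b))  # [-1, 1] -> [0, 1]
    factor = lo + (hi - lo) * progress * alignment          # stays in [lo, hi]
    return base_scale * factor


# Aligned velocities late in sampling push the scale toward the upper bound;
# opposed velocities keep it near the lower bound.
late_aligned = adaptive_scale(7.5, 1.0, [1.0, 0.0], [2.0, 0.0])
late_opposed = adaptive_scale(7.5, 1.0, [1.0, 0.0], [-1.0, 0.0])
fixed_cfg = adaptive_scale(7.5, 0.3, [1.0, 0.0], [0.0, 1.0], lo=1.0, hi=1.0)
```

Because both inputs are velocities the sampler already computes at each step, a rule of this form adds no forward passes, which is consistent with the training-free claim in the abstract.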

preprint · 2022 · arXiv

A Jointly Optimal Design of Control and Scheduling in Networked Systems under Denial-of-Service Attacks

We consider the joint design of control and scheduling under stochastic Denial-of-Service (DoS) attacks in the context of networked control systems. A sensor takes measurements of the system output and forwards its dynamic state estimates to a remote controller over a packet-dropping link. The controller determines the optimal control law for the process using the estimates it receives. An attacker aims at degrading the control performance by increasing the packet-dropout rate with a DoS attack towards the sensor-controller channel. We assume both the controller and the attacker are rational in a game-theoretic sense and establish a partially observable stochastic game to derive the optimal joint design of scheduling and control. Using dynamic programming we prove that the control and scheduling policies can be designed separately without sacrificing optimality, making the problem equivalent to a complete information game. We employ Nash Q-learning to solve the problem and prove that the solution is guaranteed to constitute an ε-Nash equilibrium. Numerical examples are provided to illustrate the tradeoffs between control performance and communication cost.
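To make the effect of a higher packet-dropout rate concrete, the sketch below tracks the controller's estimation-error variance over a lossy link. It is a simplified illustration under stated assumptions, not the paper's model: the scalar dynamics, the parameters `a` and `q`, the reset of the error to zero on every successful delivery, and treating the attack as nothing more than a larger Bernoulli drop rate are all hypothetical. No game is solved and no Nash Q-learning is performed here.

```python
import random


def average_error_variance(drop_rate, a=1.2, q=1.0, steps=5000, seed=0):
    """Average error variance of the remote estimate over a packet-dropping link.

    Assumed scalar process: x[k+1] = a * x[k] + w[k], with Var(w) = q.
    On delivery the controller adopts the sensor's estimate (error variance
    reset to 0 for simplicity); on a drop it predicts open-loop, so the
    variance grows as a*a*P + q.
    """
    rng = random.Random(seed)
    p_err = 0.0
    total = 0.0
    for _ in range(steps):
        if rng.random() < drop_rate:   # packet lost (e.g. under DoS)
            p_err = a * a * p_err + q
        else:                          # packet delivered
            p_err = 0.0
        total += p_err
    return total / steps


nominal = average_error_variance(0.1)       # assumed baseline dropout rate
under_attack = average_error_variance(0.6)  # assumed dropout rate under DoS
```

Consecutive drops compound the error through the `a * a` factor, which is why the attacker gains from raising the dropout rate and why the defender's scheduling choice trades communication cost against control performance.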

preprint · 2022 · arXiv

FORCE: A Framework of Rule-Based Conversational Recommender System

Conversational recommender systems (CRSs) have received extensive attention in recent years. However, most existing work focuses on deep learning models, which are limited by their need for large-scale human-annotated datasets. Such methods cannot handle the cold-start scenarios found in industrial products. To alleviate this problem, we propose FORCE, a Framework Of Rule-based Conversational Recommender system that helps developers quickly build CRS bots through simple configuration. We conduct experiments on two datasets in different languages and domains to verify its effectiveness and usability.
