Source author record

Yu Su

Yu Su appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computation and Language Artificial Intelligence Machine Learning Computer Vision physics.chem-ph Biological Physics cs.CY Distributed, Parallel, and Cluster Computing eess.SP eess.SY Information Retrieval Performance quant-ph Robotics Systems and Control

Catalog footprint

What is connected

14works

15topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Intelligent Multimodal Multi-Sensor Fusion-Based UAV Identification, Localization, and Countermeasures for Safeguarding Low-Altitude Economy

The development of the low-altitude economy has led to a growing prominence of uncrewed aerial vehicle (UAV) safety management issues. Therefore, accurate identification, real-time localization, and effective countermeasures have become core challenges in airspace security assurance. This paper introduces an integrated UAV management and control system based on deep learning, which integrates multimodal multi-sensor fusion perception, precise positioning, and collaborative countermeasures. By incorporating deep learning methods, the system combines radio frequency (RF) spectral feature analysis, radar detection, electro-optical identification, and other methods at the detection level to achieve the identification and classification of UAVs. At the localization level, the system relies on multi-sensor data fusion and the air-space-ground integrated communication network to conduct real-time tracking and prediction of UAV flight status, providing support for early warning and decision-making. At the countermeasure level, it adopts comprehensive measures that integrate ``soft kill'' and ``hard kill'', including technologies such as electromagnetic signal jamming, navigation spoofing, and physical interception, to form a closed-loop management and control process from early warning to final disposal, which significantly enhances the response efficiency and disposal accuracy of low-altitude UAV management.

preprint2026arXiv

Leveraging Latent Visual Reasoning in Silence

Latent visual reasoning involves visual evidence more directly in multimodal reasoning by inserting continuous latent tokens before textual generation. However, the necessity of these latent tokens at inference remains ambiguous. We show that replacing latent tokens with random noise or removing them completely causes little performance degradation across spatial reasoning benchmarks. Reinforcement learning further diminishes the latent generation behavior after post-training. These observations raise a central question: Is latent visual reasoning still meaningful? We argue that its value should be measured by how effectively latent tokens guide learning, rather than whether they persist as an inference-time format. Our analysis shows that latent reasoning is unevenly favorable across question types, yet hard task-level routing for applying latent generation is brittle. Motivated by these findings, we propose an attention-based reward that encourages generated latent tokens to interact with later text tokens during RL. This reward promotes latent utilization when the latent mode is activated while preserving the flexibility to use pure-text reasoning. Experiments show that our method improves performance across perception and visual reasoning benchmarks, even when latent tokens are rarely generated after post-training. Our results highlight that, without explicit expression at inference, latent visual reasoning can shape better visual grounding and more accurate textual reasoning in silence. Our code and trained models are publicly available at \href{https://github.com/ddydyd32/silent-lvr/tree/master}{GitHub} and \href{https://huggingface.co/collections/cornuHGF/silent-lvr}{Hugging Face}.

preprint2022arXiv

Bootstrapping a User-Centered Task-Oriented Dialogue System

We present TacoBot, a task-oriented dialogue system built for the inaugural Alexa Prize TaskBot Challenge, which assists users in completing multi-step cooking and home improvement tasks. TacoBot is designed with a user-centered principle and aspires to deliver a collaborative and accessible dialogue experience. Towards that end, it is equipped with accurate language understanding, flexible dialogue management, and engaging response generation. Furthermore, TacoBot is backed by a strong search engine and an automated end-to-end test suite. In bootstrapping the development of TacoBot, we explore a series of data augmentation strategies to train advanced neural language processing models and continuously improve the dialogue experience with collected real conversations. At the end of the semifinals, TacoBot achieved an average rating of 3.55/5.0.

preprint2022arXiv

Build Smart Grids on Artificial Intelligence -- A Real-world Example

Power grid data are going big with the deployment of various sensors. The big data in power grids creates huge opportunities for applying artificial intelligence technologies to improve resilience and reliability. This paper introduces multiple real-world applications based on artificial intelligence to improve power grid situational awareness and resilience. These applications include event identification, inertia estimation, event location and magnitude estimation, data authentication, control, and stability assessment. These applications are operating on a real-world system called FNET-GridEye, which is a wide-area measurement network and arguably the world-largest cyber-physical system that collects power grid big data. These applications showed much better performance compared with conventional approaches and accomplished new tasks that are impossible to realized using conventional technologies. These encouraging results demonstrate that combining power grid big data and artificial intelligence can uncover and capture the non-linear correlation between power grid data and its stabilities indices and will potentially enable many advanced applications that can significantly improve power grid resilience.

preprint2022arXiv

Coherent excitation energy transfer in model photosynthetic reaction center: Effects of non-Markovian quantum environment

Excitation energy transfer (EET) and electron transfer (ET) are crucially involved in photosynthetic processes. In reality, the photosynthetic reaction center constitutes an open quantum system of EET and ET, which manifests an interplay of pigments, solar light and phonon baths. So far theoretical studies have been mainly based on master equation approaches in the Markovian condition. The non-Markovian environmental effect, which may play a crucial role, has not been sufficiently considered. In this work, we propose a mixed dynamic approach to investigate this open system. The influence of phonon bath is treated via the exact dissipaton equation of motion (DEOM) while that of photon bath is via the Lindblad master equation. Specifically, we explore the effect of non-Markovian quantum phonon bath on the coherent transfer dynamics and its manipulation on the current-voltage behavior. Distinguished from the results of completely Markovian Lindblad equation and those adopting classical environment description, the mixed DEOM-Lindblad simulations exhibit transfer coherence up to a few hundreds femtoseconds and the related environmental manipulation effect on current. These non-Markovian quantum coherent effects be extended tomore complex and realistic systems and be helpful to the design of organic photovoltaic devices.

preprint2022arXiv

Electron transfer under the Floquet modulation in donor-bridge-acceptor systems

Electron transfer (ET) processes are of broad interest in modern chemistry. With the advancements of experimental techniques, one may modulate the ET via such as the light-matter interactions. In this work, we study the ET under a Floquet modulation occurring in the donor-bridge-acceptor systems, with the rate kernels projected out from the exact disspaton equation of motion formalism. This together with the Floquet theorem enables us to investigate the interplay between the intrinsic non-Markovianity and the driving periodicity. The observed rate kernel exhibits a Herzberg-Teller-like mechanism induced by the bridge fluctuation subject to effective modulation.

preprint2022arXiv

One Step at a Time: Long-Horizon Vision-and-Language Navigation with Milestones

We study the problem of developing autonomous agents that can follow human instructions to infer and perform a sequence of actions to complete the underlying task. Significant progress has been made in recent years, especially for tasks with short horizons. However, when it comes to long-horizon tasks with extended sequences of actions, an agent can easily ignore some instructions or get stuck in the middle of the long instructions and eventually fail the task. To address this challenge, we propose a model-agnostic milestone-based task tracker (M-TRACK) to guide the agent and monitor its progress. Specifically, we propose a milestone builder that tags the instructions with navigation and interaction milestones which the agent needs to complete step by step, and a milestone checker that systemically checks the agent's progress in its current milestone and determines when to proceed to the next. On the challenging ALFRED dataset, our M-TRACK leads to a notable 33% and 52% relative improvement in unseen success rate over two competitive base models.

preprint2021arXiv

Beyond I.I.D.: Three Levels of Generalization for Question Answering on Knowledge Bases

Existing studies on question answering on knowledge bases (KBQA) mainly operate with the standard i.i.d assumption, i.e., training distribution over questions is the same as the test distribution. However, i.i.d may be neither reasonably achievable nor desirable on large-scale KBs because 1) true user distribution is hard to capture and 2) randomly sample training examples from the enormous space would be highly data-inefficient. Instead, we suggest that KBQA models should have three levels of built-in generalization: i.i.d, compositional, and zero-shot. To facilitate the development of KBQA models with stronger generalization, we construct and release a new large-scale, high-quality dataset with 64,331 questions, GrailQA, and provide evaluation settings for all three levels of generalization. In addition, we propose a novel BERT-based KBQA model. The combination of our dataset and model enables us to thoroughly examine and demonstrate, for the first time, the key role of pre-trained contextual embeddings like BERT in the generalization of KBQA.

preprint2021arXiv

Quality meets Diversity: A Model-Agnostic Framework for Computerized Adaptive Testing

Computerized Adaptive Testing (CAT) is emerging as a promising testing application in many scenarios, such as education, game and recruitment, which targets at diagnosing the knowledge mastery levels of examinees on required concepts. It shows the advantage of tailoring a personalized testing procedure for each examinee, which selects questions step by step, depending on her performance. While there are many efforts on developing CAT systems, existing solutions generally follow an inflexible model-specific fashion. That is, they need to observe a specific cognitive model which can estimate examinee's knowledge levels and design the selection strategy according to the model estimation. In this paper, we study a novel model-agnostic CAT problem, where we aim to propose a flexible framework that can adapt to different cognitive models. Meanwhile, this work also figures out CAT solution with addressing the problem of how to generate both high-quality and diverse questions simultaneously, which can give a comprehensive knowledge diagnosis for each examinee. Inspired by Active Learning, we propose a novel framework, namely Model-Agnostic Adaptive Testing (MAAT) for CAT solution, where we design three sophisticated modules including Quality Module, Diversity Module and Importance Module. Extensive experimental results on two real-world datasets clearly demonstrate that our MAAT can support CAT with guaranteeing both quality and diversity perspectives.

preprint2021arXiv

Task-Oriented Dialogue as Dataflow Synthesis

We describe an approach to task-oriented dialogue in which dialogue state is represented as a dataflow graph. A dialogue agent maps each user utterance to a program that extends this graph. Programs include metacomputation operators for reference and revision that reuse dataflow fragments from previous turns. Our graph-based state enables the expression and manipulation of complex user intents, and explicit metacomputation makes these intents easier for learned models to predict. We introduce a new dataset, SMCalFlow, featuring complex dialogues about events, weather, places, and people. Experiments show that dataflow graphs and metacomputation substantially improve representability and predictability in these natural dialogues. Additional experiments on the MultiWOZ dataset show that our dataflow representation enables an otherwise off-the-shelf sequence-to-sequence model to match the best existing task-specific state tracking model. The SMCalFlow dataset and code for replicating experiments are available at https://www.microsoft.com/en-us/research/project/dataflow-based-dialogue-semantic-machines.

preprint2020arXiv

Communication-Aware Scheduling of Precedence-Constrained Tasks on Related Machines

Scheduling precedence-constrained tasks is a classical problem that has been studied for more than fifty years. However, little progress has been made in the setting where there are communication delays between tasks. Results for the case of identical machines were derived nearly thirty years ago, and yet no results for related machines have followed. In this work, we propose a new scheduler, Generalized Earliest Time First (GETF), and provide the first provable, worst-case approximation guarantees for the goals of minimizing both the makespan and total weighted completion time of tasks with precedence constraints on related machines with machine-dependent communication times.

preprint2020arXiv

Document Classification for COVID-19 Literature

The global pandemic has made it more important than ever to quickly and accurately retrieve relevant scientific literature for effective consumption by researchers in a wide range of fields. We provide an analysis of several multi-label document classification models on the LitCovid dataset, a growing collection of 23,000 research papers regarding the novel 2019 coronavirus. We find that pre-trained language models fine-tuned on this dataset outperform all other baselines and that BioBERT surpasses the others by a small margin with micro-F1 and accuracy scores of around 86% and 75% respectively on the test set. We evaluate the data efficiency and generalizability of these models as essential features of any system prepared to deal with an urgent situation like the current health crisis. Finally, we explore 50 errors made by the best performing models on LitCovid documents and find that they often (1) correlate certain labels too closely together and (2) fail to focus on discriminative sections of the articles; both of which are important issues to address in future work. Both data and code are available on GitHub.

preprint2020arXiv

Logical Natural Language Generation from Open-Domain Tables

Neural natural language generation (NLG) models have recently shown remarkable progress in fluency and coherence. However, existing studies on neural NLG are primarily focused on surface-level realizations with limited emphasis on logical inference, an important aspect of human thinking and language. In this paper, we suggest a new NLG task where a model is tasked with generating natural language statements that can be \emph{logically entailed} by the facts in an open-domain semi-structured table. To facilitate the study of the proposed logical NLG problem, we use the existing TabFact dataset \cite{chen2019tabfact} featured with a wide range of logical/symbolic inferences as our testbed, and propose new automatic metrics to evaluate the fidelity of generation models w.r.t.\ logical inference. The new task poses challenges to the existing monotonic generation frameworks due to the mismatch between sequence order and logical order. In our experiments, we comprehensively survey different generation architectures (LSTM, Transformer, Pre-Trained LM) trained with different algorithms (RL, Adversarial Training, Coarse-to-Fine) on the dataset and made following observations: 1) Pre-Trained LM can significantly boost both the fluency and logical fidelity metrics, 2) RL and Adversarial Training are trading fluency for fidelity, 3) Coarse-to-Fine generation can help partially alleviate the fidelity issue while maintaining high language fluency. The code and data are available at \url{https://github.com/wenhuchen/LogicNLG}.

preprint2010arXiv

Visual-hint Boundary to Segment Algorithm for Image Segmentation

Image segmentation has been a very active research topic in image analysis area. Currently, most of the image segmentation algorithms are designed based on the idea that images are partitioned into a set of regions preserving homogeneous intra-regions and inhomogeneous inter-regions. However, human visual intuition does not always follow this pattern. A new image segmentation method named Visual-Hint Boundary to Segment (VHBS) is introduced, which is more consistent with human perceptions. VHBS abides by two visual hint rules based on human perceptions: (i) the global scale boundaries tend to be the real boundaries of the objects; (ii) two adjacent regions with quite different colors or textures tend to result in the real boundaries between them. It has been demonstrated by experiments that, compared with traditional image segmentation method, VHBS has better performance and also preserves higher computational efficiency.

Yu Su

What is connected

Connect this record

See the researcher in context

Building this map preview

14 published item(s)

Intelligent Multimodal Multi-Sensor Fusion-Based UAV Identification, Localization, and Countermeasures for Safeguarding Low-Altitude Economy

Leveraging Latent Visual Reasoning in Silence

Bootstrapping a User-Centered Task-Oriented Dialogue System

Build Smart Grids on Artificial Intelligence -- A Real-world Example

Coherent excitation energy transfer in model photosynthetic reaction center: Effects of non-Markovian quantum environment

Electron transfer under the Floquet modulation in donor-bridge-acceptor systems

One Step at a Time: Long-Horizon Vision-and-Language Navigation with Milestones

Beyond I.I.D.: Three Levels of Generalization for Question Answering on Knowledge Bases

Quality meets Diversity: A Model-Agnostic Framework for Computerized Adaptive Testing

Task-Oriented Dialogue as Dataflow Synthesis

Communication-Aware Scheduling of Precedence-Constrained Tasks on Related Machines

Document Classification for COVID-19 Literature

Logical Natural Language Generation from Open-Domain Tables

Visual-hint Boundary to Segment Algorithm for Image Segmentation