Source author record

Fang Zhao

Fang Zhao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Artificial Intelligence, Computer Vision, Machine Learning, Networking and Internet Architecture, quant-ph, Computation and Language, Cryptography and Security, cs.CY, Information Theory, math.IT, Robotics

Catalog footprint

What is connected

13 works

11 topics

4 close collaborators


Preprint, 2026, arXiv

Lexicalized Constituency Parsing for Middle Dutch: Low-resource Training and Cross-Domain Generalization

Recent years have seen growing interest in applying neural networks and contextualized word embeddings to the parsing of historical languages. However, most advances have focused on dependency parsing, while constituency parsing for low-resource historical languages like Middle Dutch has received little attention. In this paper, we adapt a transformer-based constituency parser to Middle Dutch, a highly heterogeneous and low-resource language, and investigate methods to improve both its in-domain and cross-domain performance. We show that joint training with higher-resource auxiliary languages increases F1 scores by up to 0.73, with the greatest gains achieved from languages that are geographically and temporally closer to Middle Dutch. We further evaluate strategies for leveraging newly annotated data from additional domains, finding that fine-tuning and data combination yield comparable improvements, and our neural parser consistently outperforms the currently used PCFG-based parser for Middle Dutch. We further explore feature-separation techniques for domain adaptation and demonstrate that a minimum threshold of approximately 200 examples per domain is needed to effectively enhance cross-domain performance.

Preprint, 2026, arXiv

MediaClaw: Multimodal Intelligent-Agent Platform Technical Report

MediaClaw is a multimodal agent platform built on the OpenClaw ecosystem. Its core design follows a three-layer architecture of unified abstraction, pluginized extension, and workflow orchestration. The system is intended to address practical deployment pain points in AIGC adoption, including fragmented capabilities, heterogeneous interfaces, disconnected production processes, and limited reuse of high-quality production workflows. MediaClaw abstracts full-category AIGC capabilities into a unified invocation model, uses plugins to support hot-pluggable capability expansion, and uses task-oriented Skills to turn complex production processes into reusable workflow assets. This report focuses on the architectural design philosophy of MediaClaw, the design logic of its core capability model, and the key engineering trade-offs in implementation. It aims to provide a reusable practical reference for building multimodal capability platforms.

Preprint, 2025, arXiv

OpenGround: Active Cognition-based Reasoning for Open-World 3D Visual Grounding

3D visual grounding aims to locate objects based on natural language descriptions in 3D scenes. Existing methods rely on a pre-defined Object Lookup Table (OLT) to query Visual Language Models (VLMs) for reasoning about object locations, which limits the applications in scenarios with undefined or unforeseen targets. To address this problem, we present OpenGround, a novel zero-shot framework for open-world 3D visual grounding. Central to OpenGround is the Active Cognition-based Reasoning (ACR) module, which is designed to overcome the fundamental limitation of pre-defined OLTs by progressively augmenting the cognitive scope of VLMs. The ACR module performs human-like perception of the target via a cognitive task chain and actively reasons about contextually relevant objects, thereby extending VLM cognition through a dynamically updated OLT. This allows OpenGround to function with both pre-defined and open-world categories. We also propose a new dataset named OpenTarget, which contains over 7000 object-description pairs to evaluate our method in open-world scenarios. Extensive experiments demonstrate that OpenGround achieves competitive performance on Nr3D, state-of-the-art on ScanRefer, and delivers a substantial 17.6% improvement on OpenTarget. Project Page at https://why-102.github.io/openground.io/.

Preprint, 2022, arXiv

Efficient measurement of the time-dependent cavity field through compressed sensing

We propose a method based on compressed sensing (CS) to measure the evolution processes of the states of a driven cavity quantum electrodynamics system. In precisely reconstructing the coherent cavity field amplitudes, we have to prepare the same states repetitively and each time perform one measurement with short sampling intervals considering the quantum nature of measurement and the Nyquist-Shannon sampling theorem. However, with the help of CS, the number of measurements can be exponentially reduced without loss of the recovery accuracy. We use largely detuned atoms and control their interactions with the cavity field to modulate coherent state amplitudes according to the scheme encoded in the sensing matrix. The simulation results show that the CS method efficiently recovers the amplitudes of the coherent cavity field even in the presence of noise.

Preprint, 2022, arXiv

Fine-Grained Trajectory-based Travel Time Estimation for Multi-city Scenarios Based on Deep Meta-Learning

Travel Time Estimation (TTE) is indispensable in intelligent transportation systems (ITS). It is important to achieve fine-grained Trajectory-based Travel Time Estimation (TTTE) for multi-city scenarios, namely to accurately estimate the travel time of a given trajectory in multiple city scenarios. However, this faces great challenges due to complex factors including dynamic temporal dependencies and fine-grained spatial dependencies. To tackle these challenges, we propose a meta-learning-based framework, MetaTTE, to continuously provide accurate travel time estimation over time by leveraging a well-designed deep neural network model called DED, which consists of a Data preprocessing module and an Encoder-Decoder network module. By introducing meta-learning techniques, the generalization ability of MetaTTE is enhanced using a small number of examples, which opens up new opportunities to achieve consistent performance on TTTE when traffic conditions and road networks change over time in the future. The DED model adopts an encoder-decoder network to capture fine-grained spatial and temporal representations. Extensive experiments on two real-world datasets confirm that MetaTTE outperforms six state-of-the-art baselines, improving accuracy by 29.35% and 25.93% over the best baseline on the Chengdu and Porto datasets, respectively.

Preprint, 2022, arXiv

Spatio-Temporal meets Wavelet: Disentangled Traffic Flow Forecasting via Efficient Spectral Graph Attention Network

Traffic forecasting is crucial for public safety and resource optimization, yet it is very challenging for three reasons: i) existing works mostly exploit intricate temporal patterns (e.g., the short-term thunderstorm and long-term daily trends) within a single method, which fails to accurately capture spatio-temporal dependencies under different schemas; ii) the under-exploration of graph positional encoding limits the extraction of spatial information in the commonly used full graph attention network; iii) the quadratic complexity of full graph attention introduces heavy computational needs. To achieve effective traffic flow forecasting, we propose an efficient spectral graph attention network with disentangled traffic sequences. Specifically, the discrete wavelet transform is leveraged to obtain the low- and high-frequency components of traffic sequences, and a dual-channel encoder is designed to accurately capture the spatio-temporal dependencies under long- and short-term schemas of the low- and high-frequency components. Moreover, a novel wavelet-based graph positional encoding and a query sampling strategy are introduced in our spectral graph attention to effectively guide message passing and efficiently calculate the attention. Extensive experiments on four real-world datasets show the superiority of our model, i.e., higher traffic forecasting precision with lower computational cost.

Preprint, 2022, arXiv

The Charging Performance of Su-Schrieffer-Heeger Quantum Battery

The Su-Schrieffer-Heeger (SSH) model has recently received considerable attention in condensed matter physics because it describes a typical one-dimensional system with topological edge states. Here, we investigate SSH-based charging protocols of quantum batteries (QB) with N quantum cells. In the SSH QB, the ground-state splitting induced by the hopping interaction causes the dimerization parameter to affect the QB differently in different quantum phase regions. In the non-splitting region, the dimerization parameter has little influence on the QB. In the fully splitting region, by contrast, the dimerization parameter gives a significant quantum advantage in energy and ergotropy, with the dimerized spin pairs having larger occupations than the other spins. Although the dimerization parameter enhances energy and ergotropy, the QB's capacity decreases.

Preprint, 2016, arXiv

Robust LSTM-Autoencoders for Face De-Occlusion in the Wild

Face recognition techniques have developed significantly in recent years. However, recognizing faces with partial occlusion is still challenging for existing face recognizers, although this capability is highly desired in real-world applications concerning surveillance and security. Although much research effort has been devoted to developing face de-occlusion methods, most of them work well only under constrained conditions, such as when all the faces are from a pre-defined closed set. In this paper, we propose a robust LSTM-Autoencoders (RLA) model to effectively restore partially occluded faces even in the wild. The RLA model consists of two LSTM components, which aim at occlusion-robust face encoding and recurrent occlusion removal respectively. The first one, named the multi-scale spatial LSTM encoder, reads facial patches of various scales sequentially to output a latent representation, and occlusion-robustness is achieved because the occlusion influences only some of the patches. Receiving the representation learned by the encoder, the LSTM decoder with a dual-channel architecture reconstructs the overall face and detects occlusion simultaneously, and by virtue of the LSTM, the decoder breaks down the task of face de-occlusion into restoring the occluded part step by step. Moreover, to minimize identity information loss and guarantee face recognition accuracy over recovered faces, we introduce an identity-preserving adversarial training scheme to further improve RLA. Extensive experiments on both synthetic and real datasets of faces with occlusion clearly demonstrate the effectiveness of our proposed RLA in removing different types of facial occlusion at various locations. The proposed method also provides a significantly larger performance gain than other de-occlusion methods in promoting recognition performance over partially occluded faces.

Preprint, 2015, arXiv

Activity recognition for a smartphone and web based travel survey

In transport modeling and prediction, trip purposes play an important role since mobility choices (e.g. modes, routes, departure times) are made in order to carry out specific activities. Activity-based models, which have been gaining popularity in recent years, are built from a large number of observed trips and their purposes. However, data acquired through traditional interview-based travel surveys lack the accuracy and quantity required by such models. Smartphones and interactive web interfaces have emerged as an attractive alternative to conventional travel surveys. A smartphone-based travel survey, Future Mobility Survey (FMS), was developed and field-tested in Singapore and collected travel data from more than 1000 participants for multiple days. To provide a more intelligent interface, inferring the activities of a user at a certain location is a crucial challenge. This paper presents a learning model that infers the most likely activity associated with a certain visited place. The data collected in FMS contain errors or noise due to various reasons, so a robust approach via ensemble learning is used to improve generalization performance. Our model takes advantage of cross-user historical data as well as user-specific information, including socio-demographics. Our empirical results using FMS data demonstrate that the proposed method contributes significantly to our travel survey application.

Preprint, 2015, arXiv

Deep Semantic Ranking Based Hashing for Multi-Label Image Retrieval

With the rapid growth of web images, hashing has received increasing interest in large-scale image retrieval. Research efforts have been devoted to learning compact binary codes that preserve semantic similarity based on labels. However, most of these hashing methods are designed to handle simple binary similarity. The complex multilevel semantic structure of images associated with multiple labels has not yet been well explored. Here we propose a deep semantic ranking based method for learning hash functions that preserve multilevel semantic similarity between multi-label images. In our approach, a deep convolutional neural network is incorporated into hash functions to jointly learn feature representations and mappings from them to hash codes, which avoids the limited semantic representation power of hand-crafted features. Meanwhile, a ranking list that encodes the multilevel similarity information is employed to guide the learning of such deep hash functions. An effective scheme based on surrogate loss is used to solve the intractable optimization problem of nonsmooth and multivariate ranking measures involved in the learning procedure. Experimental results show the superiority of our proposed approach over several state-of-the-art hashing methods in terms of ranking evaluation metrics when tested on multi-label image datasets.

Preprint, 2012, arXiv

A GPS Pseudorange Based Cooperative Vehicular Distance Measurement Technique

Accurate vehicular localization is important for various cooperative vehicle safety (CVS) applications such as collision avoidance and turning assistance. In this paper, we propose a cooperative vehicular distance measurement technique based on the sharing of GPS pseudorange measurements and a weighted least squares method. The classic double difference pseudorange solution, which was originally designed for high-end survey-level GPS systems, is adapted to low-end navigation-level GPS receivers because of their wide availability in ground vehicles. The Carrier to Noise Ratio (CNR) of raw pseudorange measurements is taken into account for noise mitigation. We present a Dedicated Short Range Communications (DSRC) based mechanism to implement the exchange of pseudorange information among neighboring vehicles. As demonstrated in field tests, our proposed technique increases the accuracy of the distance measurement significantly compared with the distance obtained from the GPS fixes.

Preprint, 2009, arXiv

Network Coding for Multi-Resolution Multicast

Multi-resolution codes enable multicast at different rates to different receivers, a setup that is often desirable for graphics or video streaming. We propose a simple, distributed, two-stage message passing algorithm to generate network codes for single-source multicast of multi-resolution codes. The goal of this "pushback algorithm" is to maximize the total rate achieved by all receivers, while guaranteeing decodability of the base layer at each receiver. By conducting pushback and code generation stages, this algorithm takes advantage of inter-layer as well as intra-layer coding. Numerical simulations show that in terms of total rate achieved, the pushback algorithm outperforms routing and intra-layer coding schemes, even with codeword sizes as small as 10 bits. In addition, the performance gap widens as the number of receivers and the number of nodes in the network increase. We also observe that naive inter-layer coding schemes may perform worse than intra-layer schemes under certain network conditions.

Preprint, 2009, arXiv

On Counteracting Byzantine Attacks in Network Coded Peer-to-Peer Networks

Random linear network coding can be used in peer-to-peer networks to increase the efficiency of content distribution and distributed storage. However, these systems are particularly susceptible to Byzantine attacks. We quantify the impact of Byzantine attacks on the coded system by evaluating the probability that a receiver node fails to correctly recover a file. We show that even for a small probability of attack, the system fails with overwhelming probability. We then propose a novel signature scheme that allows packet-level Byzantine detection. This scheme allows one-hop containment of the contamination, and saves bandwidth by allowing nodes to detect and drop the contaminated packets. We compare the net cost of our signature scheme with various other Byzantine schemes, and show that when the probability of Byzantine attacks is high, our scheme is the most bandwidth efficient.
