Source author record

Chen Luo

Chen Luo appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Topics: Databases, Information Retrieval, Artificial Intelligence, Computation and Language, Social and Information Networks, Computer Vision, cond-mat.mes-hall, cond-mat.mtrl-sci, eess.SP, Hardware Architecture, Machine Learning, Multiagent Systems, Performance, physics.acc-ph, physics.soc-ph, Populations and Evolution, Software Engineering

Catalog footprint

What is connected

14 works

17 topics

4 close collaborators


Preprint, 2026, arXiv

Firefly: Illuminating Large-Scale Verified Tool-Call Data Generation from Real APIs

Training tool-calling agents requires large-scale trajectory data with verifiable labels, yet existing approaches either synthesize environments that diverge from real API behavior or generate tasks without ground-truth outcomes for verification. We present FireFly, a pipeline for generating verified tool-call data from real-world MCP servers. Our key insight is to invert the standard synthesis pipeline: rather than generating tasks and hoping they are solvable, we first let a strong LLM explore real APIs along graph-guided DAG structures, then synthesize tasks backward from observed outcomes, guaranteeing label correctness by construction. To handle the scale of real-world tool spaces (${\sim}$1,000 tools), we build a pairwise tool graph and sample sub-DAGs to focus exploration on semantically coherent workflows. To address environment drift in live APIs, we construct a retrieval-augmented simulator that caches all exploration results and replays them during training and evaluation, enabling fully offline and reproducible RL. Applying this pipeline yields 5,144 verified tasks spanning 240 servers and 993 tools. A 4B-parameter model trained with GRPO on FireFly matches Claude Sonnet 4.6 on our held-out test set and shows improvements on multiple tool-calling benchmarks including Tau2-Bench, MCPMark, and MCP-Atlas.

Preprint, 2026, arXiv

Intention Knowledge Graph Construction for User Intention Relation Modeling

Understanding user intentions is challenging for online platforms. Recent work on intention knowledge graphs addresses this but often lacks focus on connecting intentions, which is crucial for modeling user behavior and predicting future actions. This paper introduces a framework to automatically generate an intention knowledge graph, capturing connections between user intentions. Using the Amazon m2 dataset, we construct an intention graph with 351 million edges, demonstrating high plausibility and acceptance. Our model effectively predicts new session intentions and enhances product recommendations, outperforming previous state-of-the-art methods and showcasing the approach's practical utility.

Preprint, 2026, arXiv

Subspace Alignment for Vision-Language Model Test-time Adaptation

Vision-language models (VLMs), despite their extraordinary zero-shot capabilities, are vulnerable to distribution shifts. Test-time adaptation (TTA) emerges as a predominant strategy to adapt VLMs to unlabeled test data on the fly. However, existing TTA methods heavily rely on zero-shot predictions as pseudo-labels for self-training, which can be unreliable under distribution shifts and misguide adaptation due to two fundamental limitations. First (Modality Gap), distribution shifts induce gaps between visual and textual modalities, making cross-modal relations inaccurate. Second (Visual Nuisance), visual embeddings encode rich but task-irrelevant noise that often overwhelms task-specific semantics under distribution shifts. To address these limitations, we propose SubTTA, which aligns the semantic subspaces of both modalities to enhance zero-shot predictions to better guide the TTA process. To bridge the modality gap, SubTTA extracts the principal subspaces of both modalities and aligns the visual manifold to the textual semantic anchor by minimizing their chordal distance. To eliminate visual nuisance, SubTTA projects the aligned visual features onto the task-specific textual subspace, which filters out task-irrelevant noise by constraining visual embeddings within the valid semantic span, and standard TTA is further performed on the purified space to refine the decision boundaries. Extensive experiments on various benchmarks and VLM architectures demonstrate the effectiveness of SubTTA, yielding an average improvement of 2.24% over state-of-the-art TTA methods.
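For reference, the chordal distance mentioned above has a standard closed form. The symbols here are assumptions for illustration and are not taken from the abstract: $U, V \in \mathbb{R}^{d \times k}$ are orthonormal bases of the visual and textual principal subspaces, and $\theta_1, \dots, \theta_k$ are the principal angles between them. SubTTA's exact objective may differ.

$$
d_c(U, V) = \sqrt{\sum_{i=1}^{k} \sin^2 \theta_i} = \sqrt{k - \lVert U^\top V \rVert_F^2} = \frac{1}{\sqrt{2}} \lVert U U^\top - V V^\top \rVert_F
$$

Under the same assumed notation, projecting a visual embedding $x$ onto the textual subspace is the orthogonal projection $\hat{x} = V V^\top x$, which discards every component of $x$ outside the span of $V$.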

Preprint, 2022, arXiv

CERES: Pretraining of Graph-Conditioned Transformer for Semi-Structured Session Data

User sessions empower many search and recommendation tasks on a daily basis. Such session data are semi-structured: they encode heterogeneous relations between queries and products, and each item is described by unstructured text. Despite recent advances in self-supervised learning for text or graphs, there is a lack of self-supervised learning models that can effectively capture both intra-item semantics and inter-item interactions for semi-structured sessions. To fill this gap, we propose CERES, a graph-based transformer model for semi-structured session data. CERES learns representations that capture both inter- and intra-item semantics with (1) a graph-conditioned masked language pretraining task that jointly learns from item text and item-item relations; and (2) a graph-conditioned transformer architecture that propagates inter-item contexts to item-level representations. We pretrained CERES using ~468 million Amazon sessions and find that CERES outperforms strong pretraining baselines by up to 9% in three session search and entity linking tasks.

Preprint, 2020, arXiv

Breaking Down Memory Walls: Adaptive Memory Management in LSM-based Storage Systems (Extended Version)

Log-Structured Merge-trees (LSM-trees) have been widely used in modern NoSQL systems. Due to their out-of-place update design, LSM-trees have introduced memory walls among the memory components of multiple LSM-trees and between the write memory and the buffer cache. Optimal memory allocation among these regions is non-trivial because it is highly workload-dependent. Existing LSM-tree implementations instead adopt static memory allocation schemes due to their simplicity and robustness, sacrificing performance. In this paper, we attempt to break down these memory walls in LSM-based storage systems. We first present a memory management architecture that enables adaptive memory management. We then present a partitioned memory component structure with new flush policies to better exploit the write memory to minimize the write cost. To break down the memory wall between the write memory and the buffer cache, we further introduce a memory tuner that tunes the memory allocation between these two regions. We have conducted extensive experiments in the context of Apache AsterixDB using the YCSB and TPC-C benchmarks and we present the results here.

Preprint, 2020, arXiv

Efficiently Reclaiming Space in a Log Structured Store

A log structured store uses a single write I/O for a number of diverse and non-contiguous pages within a large buffer instead of using a write I/O for each page separately. This requires that pages be relocated on every write, because pages are never updated in place. Instead, pages are dynamically remapped on every write. Log structuring was invented for and used initially in file systems. Today, a form of log structuring is used in SSD controllers because an SSD requires the erasure of a large block of pages before flash storage can be reused. No update-in-place requires that the storage for out-of-date pages be reclaimed (garbage collected or "cleaned"). We analyze cleaning performance and introduce a cleaning strategy that uses a new way to prioritize the order in which stale pages are garbage collected. Our cleaning strategy approximates an "optimal cleaning strategy". Simulation studies confirm the results of the analysis. This strategy is a significant improvement over previous cleaning strategies.

Preprint, 2020, arXiv

On Performance Stability in LSM-based Storage Systems (Extended Version)

The Log-Structured Merge-Tree (LSM-tree) has been widely adopted for use in modern NoSQL systems for its superior write performance. Despite the popularity of LSM-trees, they have been criticized for suffering from write stalls and large performance variances due to the inherent mismatch between their fast in-memory writes and slow background I/O operations. In this paper, we use a simple yet effective two-phase experimental approach to evaluate write stalls for various LSM-tree designs. We further explore the design choices of LSM merge schedulers to minimize write stalls given an I/O bandwidth budget. We have conducted extensive experiments in the context of the Apache AsterixDB system and we present the results here.

Preprint, 2020, arXiv

Simulation of Real-time Routing for UAS traffic Management with Communication and Airspace Safety Considerations

Small Unmanned Aircraft Systems (sUAS) will be an important component of the smart city and intelligent transportation environments of the near future. The demand for sUAS-related applications, such as commercial delivery and land surveying, is expected to grow rapidly in the next few years. In general, sUAS traffic routing and management functions are needed to coordinate the launching of sUAS from different launch sites and determine their trajectories to avoid conflicts while considering several other constraints such as expected arrival time, minimum flight energy, and availability of communication resources. However, as the airborne sUAS density grows in a certain area, it is difficult to foresee the potential airspace and communications resource conflicts and make immediate decisions to avoid them. To address this challenge, we present a temporal and spatial routing algorithm and simulation platform for sUAS trajectory management in a high-density urban area that plans sUAS movements in a spatial and temporal maze, taking into account obstacles that are either static or dynamic in time. The routing allows the sUAS to avoid static no-fly areas (i.e., static obstacles) or other in-flight sUAS and areas that have congested communication resources (i.e., dynamic obstacles). The algorithm is evaluated using an agent-based simulation platform. The simulation results show that the proposed algorithm outperforms other route management algorithms in many areas, especially in processing speed and memory efficiency. Detailed comparisons are provided for the sUAS flight time, the overall throughput, conflict rate and communication resource utilization. The results demonstrate that our proposed algorithm can be used to address the airspace and communication resource utilization needs for a next-generation smart city and smart transportation.

Preprint, 2020, arXiv

Structural Temporal Graph Neural Networks for Anomaly Detection in Dynamic Graphs

Detecting anomalies in dynamic graphs is a vital task, with numerous practical applications in areas such as security, finance, and social media. Previous network-embedding-based methods have mostly focused on learning good node representations, while largely ignoring the subgraph structural changes related to the target nodes in dynamic graphs. In this paper, we propose StrGNN, an end-to-end structural temporal Graph Neural Network model for detecting anomalous edges in dynamic graphs. In particular, we first extract the $h$-hop enclosing subgraph centered on the target edge and propose a node labeling function to identify the role of each node in the subgraph. Then, we leverage a graph convolution operation and a Sortpooling layer to extract a fixed-size feature from each snapshot/timestamp. Based on the extracted features, we utilize gated recurrent units (GRUs) to capture the temporal information for anomaly detection. Extensive experiments on six benchmark datasets and a real enterprise security system demonstrate the effectiveness of StrGNN.
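The first step described above, extracting the $h$-hop enclosing subgraph around a target edge, can be sketched as a multi-source breadth-first search. This is a minimal illustration and not the StrGNN implementation: the function names, the adjacency-dict graph representation, and the choice to treat the graph as undirected are all assumptions, and the node labeling function is omitted.

```python
from collections import deque


def enclosing_subgraph(adj, u, v, h):
    """Return the nodes within h hops of either endpoint of edge (u, v).

    `adj` is a hypothetical adjacency dict {node: iterable of neighbors}
    for an undirected graph snapshot.
    """
    dist = {u: 0, v: 0}          # both endpoints are BFS sources at distance 0
    queue = deque([u, v])
    while queue:
        node = queue.popleft()
        if dist[node] == h:      # do not expand beyond h hops
            continue
        for nb in adj.get(node, ()):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return set(dist)


def induced_edges(adj, nodes):
    """Return the undirected edges of `adj` with both endpoints in `nodes`."""
    return {(a, b) for a in nodes for b in adj.get(a, ()) if b in nodes and a < b}


# Path graph 0-1-2-3-4-5 with target edge (2, 3).
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
nodes = enclosing_subgraph(path, 2, 3, h=1)
print(sorted(nodes))                       # [1, 2, 3, 4]
print(sorted(induced_edges(path, nodes)))  # [(1, 2), (2, 3), (3, 4)]
```

In a dynamic-graph setting, the same extraction would be repeated per snapshot so that each timestamp yields its own subgraph for the downstream layers.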

Preprint, 2020, arXiv

Using Reports of Own and Others' Symptoms and Diagnosis on Social Media to Predict COVID-19 Case Counts: Observational Infoveillance Study in Mainland China

Can public social media data be harnessed to predict COVID-19 case counts? We analyzed approximately 15 million COVID-19 related posts on Weibo, a popular Twitter-like social media platform in China, from November 1, 2019 to March 31, 2020. We developed a machine learning classifier to identify "sick posts," which are reports of one's own and other people's symptoms and diagnosis related to COVID-19. We then modeled the predictive power of sick posts and other COVID-19 posts on daily case counts. We found that reports of symptoms and diagnosis of COVID-19 significantly predicted daily case counts, up to 14 days ahead of official statistics. But other COVID-19 posts did not have similar predictive power. For a subset of geotagged posts (3.10% of all retrieved posts), we found that the predictive pattern held true for both Hubei province and the rest of mainland China, regardless of unequal distribution of healthcare resources and outbreak timeline. Researchers and disease control agencies should pay close attention to the social media infosphere regarding COVID-19. On top of monitoring overall search and posting activities, it is crucial to sift through the contents and efficiently identify true signals from noise.

Preprint, 2019, arXiv

Anomalous and topological Hall effects in epitaxial thin films of the noncollinear antiferromagnet Mn$_{3}$Sn

Noncollinear antiferromagnets with a D0$_{19}$ (space group = 194, P6$_{3}$/mmc) hexagonal structure have garnered much attention for their potential applications in topological spintronics. Here, we report the deposition of continuous epitaxial thin films of such a material, Mn$_{3}$Sn, and characterize their crystal structure using a combination of x-ray diffraction and transmission electron microscopy. Growth of Mn$_{3}$Sn films with both (0001) c-axis orientation and (40$\bar{4}$3) texture is achieved. In the latter case, the thin films exhibit a small uncompensated Mn moment in the basal plane, quantified via magnetometry and x-ray magnetic circular dichroism experiments. This cannot account for the large anomalous Hall effect simultaneously observed in these films, even at room temperature, with magnitude $\sigma_{\mathrm{xy}}$ ($\mu_{0}H$ = 0 T) = 21 $\mathrm{\Omega}^{-1}\mathrm{cm}^{-1}$ and coercive field $\mu_{0}H_{\mathrm{C}}$ = 1.3 T. We attribute the origin of this anomalous Hall effect to momentum-space Berry curvature arising from the symmetry-breaking inverse triangular spin structure of Mn$_{3}$Sn. Upon cooling through the transition to a glassy ferromagnetic state at around 50 K, a peak in the Hall resistivity close to the coercive field indicates the onset of a topological Hall effect contribution, due to the emergence of a scalar spin chirality generating a real-space Berry phase. We demonstrate that the polarity of this topological Hall effect, and hence the chiral nature of the noncoplanar magnetic structure driving it, can be controlled using different field cooling conditions.

Preprint, 2016, arXiv

SSH (Sketch, Shingle, & Hash) for Indexing Massive-Scale Time Series

Similarity search on time series is a frequent operation in large-scale data-driven applications. Sophisticated similarity measures are standard for time series matching, as the series are usually misaligned. Dynamic Time Warping (DTW) is the most widely used similarity measure for time series because it combines alignment and matching at the same time. However, the alignment makes DTW slow. To speed up the expensive similarity search with DTW, branch and bound based pruning strategies are adopted. However, branch and bound based pruning is only useful for very short queries (low dimensional time series), and the bounds are quite weak for longer queries. Due to the loose bounds, the branch and bound pruning strategy boils down to a brute-force search. To circumvent this issue, we design SSH (Sketch, Shingle, & Hashing), an efficient and approximate hashing scheme which is much faster than the state-of-the-art branch and bound searching technique: the UCR suite. SSH uses a novel combination of sketching, shingling and hashing techniques to produce (probabilistic) indexes which align (near perfectly) with the DTW similarity measure. The generated indexes are then used to create hash buckets for sub-linear search. Our results show that SSH is very effective for longer time sequences and prunes around 95% of candidates, leading to a massive speedup in search with DTW. Empirical results on two large-scale benchmark time series datasets show that our proposed method can be around 20 times faster than the state-of-the-art package (UCR suite) without any significant loss in accuracy.
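The three stages named in the abstract (sketch, shingle, hash) can be illustrated with a small standard-library example. This is a sketch under stated assumptions rather than the paper's implementation: the function names, the window, step and shingle sizes, and the use of plain MinHash over a set of shingles are all choices made here for illustration, and the paper's exact hashing scheme may differ.

```python
import math
import random


def sketch(series, window, step, seed=0):
    """Slide a fixed random filter over the series and keep the sign of each dot product."""
    rng = random.Random(seed)
    filt = [rng.gauss(0.0, 1.0) for _ in range(window)]
    bits = []
    for start in range(0, len(series) - window + 1, step):
        dot = sum(f * x for f, x in zip(filt, series[start:start + window]))
        bits.append(1 if dot >= 0 else 0)
    return bits


def shingle(bits, n):
    """Collect the set of length-n bit patterns (shingles) in the bit string."""
    return {tuple(bits[i:i + n]) for i in range(len(bits) - n + 1)}


def minhash(shingles, num_hashes, seed=1):
    """Plain MinHash: for each salted hash function, keep the minimum hash value."""
    if not shingles:
        raise ValueError("series too short for the chosen window and shingle size")
    rng = random.Random(seed)
    salts = [rng.getrandbits(32) for _ in range(num_hashes)]
    # hash() on tuples of ints is used purely for illustration.
    return tuple(min(hash((salt, s)) for s in shingles) for salt in salts)


def ssh_signature(series, window=16, step=4, n=8, num_hashes=16):
    """Compose the three stages into one signature usable as hash-bucket keys."""
    return minhash(shingle(sketch(series, window, step), n), num_hashes)


series = [math.sin(0.1 * i) for i in range(200)]
signature = ssh_signature(series)
print(len(signature))  # 16
```

Series whose shingle sets overlap heavily tend to agree on many signature entries, so grouping series by (parts of) the signature gives candidate buckets that can be checked with exact DTW afterwards.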

Preprint, 2014, arXiv

Hete-CF: Social-Based Collaborative Filtering Recommendation using Heterogeneous Relations

Collaborative filtering algorithms have been widely used in recommender systems. However, they often suffer from the data sparsity and cold start problems. With the increasing popularity of social media, these problems may be solved by using social-based recommendation. Social-based recommendation, as an emerging research area, uses social information to help mitigate the data sparsity and cold start problems, and it has been demonstrated that social-based recommendation algorithms can efficiently improve recommendation performance. However, few of the existing algorithms have considered using multiple types of relations within one social network. In this paper, we investigate social-based recommendation algorithms on heterogeneous social networks and propose Hete-CF, a Social Collaborative Filtering algorithm using heterogeneous relations. Distinct from the existing methods, Hete-CF can effectively utilize multiple types of relations in a heterogeneous social network. In addition, Hete-CF is a general approach and can be used in arbitrary social networks, including event based social networks, location based social networks, and any other types of heterogeneous information networks associated with social information. The experimental results on two real-world data sets, DBLP (a typical heterogeneous information network) and Meetup (a typical event based social network), show the effectiveness and efficiency of our algorithm.

Preprint, 2014, arXiv

Studies of LL-type 500MHz 5-cell superconducting cavity at SINAP

A low loss (LL) type 500 MHz 5-cell superconducting niobium prototype cavity with large beam aperture has been developed successfully, including the optimization, deep drawing and electron beam welding, surface treatment, and vertical testing. The performance of the fundamental mode was optimized and the higher order modes were damped by adopting an enlarged beam pipe for propagation. Surface treatment, including mechanical polishing, buffered chemical polishing, and high-pressure rinsing with ultra-pure water, was carried out carefully to ensure a high-quality inner surface condition. The vertical testing results show that an accelerating voltage higher than 7.5 MV was obtained while the quality factor was better than 1E9 at 4.2 K. No obvious multipacting or field emission was found during the test. However, a quench occurred when the field was raised slightly above 7.5 MV, which at present limits the cavity performance.
