Source author record

Yanyan Shen

Yanyan Shen appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning; Databases; Artificial Intelligence; eess.IV; Information Retrieval; Distributed, Parallel, and Cluster Computing; Information Theory; math.IT; Networking and Internet Architecture; Operating Systems; Social and Information Networks

Catalog footprint

What is connected

11 works

11 topics

4 close collaborators


Preprint · 2025 · arXiv

RelServe: Fast LLM Inference Serving on Relational Data

The use of Large Language Models (LLMs) for querying relational data has given rise to relQuery, a workload pattern that applies templated LLM calls to structured tables. As relQuery services become more widely adopted in applications such as AI-powered spreadsheets, fast response times under concurrent query loads are increasingly important. Unfortunately, current LLM engines face severe latency bottlenecks from Head-of-Line (HoL) blocking across three comparable inference phases: waiting, core running, and tail running. Existing static priority scheduling methods only address HoL blocking during the waiting phase, leaving two critical problems unsolved. First, the absence of a priority update mechanism causes inaccurate prioritization and continued HoL blocking during core execution. Second, suboptimal prefill-decode batching exacerbates HoL blocking in tail execution and worsens latency trade-offs between running and waiting relQueries. To address these problems, we propose RelServe, an optimized LLM engine for low-latency relQuery serving. RelServe features two core innovations: a Dynamic Priority Updater that continuously adjusts priorities while minimizing overhead via statistical approximations, and an Adaptive Batch Arranger that quantitatively evaluates candidate prefill and decode batches to minimize projected average latency. Extensive experiments on four real-world datasets using LLMs ranging from 13B to 70B parameters show that RelServe reduces average serving latency by up to 3.1x compared to vLLM.

Preprint · 2022 · arXiv

Dynamic Community Detection via Adversarial Temporal Graph Representation Learning

Dynamic community detection has emerged as a powerful tool for quantifying changes in dynamic brain network connectivity patterns by identifying strongly connected sets of nodes. However, as network science problems and the network data to be processed grow more sophisticated, better methods are needed to efficiently learn low-dimensional representations from dynamic network data and to reveal the latent functions of the brain network that change over time. In this work, an adversarial temporal graph representation learning (ATGRL) framework is proposed to detect dynamic communities from a small sample of brain network data. It adopts a novel temporal graph attention network as an encoder, using attention in both the spatial and temporal dimensions to capture more effective spatio-temporal features. In addition, the framework employs adversarial training to guide the learning of temporal graph representations and optimizes a measurable modularity loss to maximize community modularity. Experiments on real-world brain network datasets demonstrate the effectiveness of this new method.
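
The abstract does not spell out the exact form of the ATGRL modularity loss. As an assumption, the sketch below uses the standard Newman modularity with a soft community-assignment matrix C, Q = (1/2m) Σ_ij B_ij Σ_c C_ic C_jc with B_ij = A_ij - k_i k_j / (2m), and treats -Q as the loss to minimize. The function names are hypothetical and the code is illustrative, not the authors' implementation.

```python
def soft_modularity(adj, assign):
    """Modularity of a (soft) community assignment.

    adj:    symmetric adjacency matrix as a list of lists (undirected graph)
    assign: n x c matrix; row i holds node i's membership weights over c communities
    """
    n = len(adj)
    degrees = [sum(row) for row in adj]
    two_m = sum(degrees)  # 2m: every undirected edge is counted twice
    num_comms = len(assign[0])
    q = 0.0
    for i in range(n):
        for j in range(n):
            # B_ij = A_ij - k_i * k_j / (2m): observed minus expected edge weight
            b_ij = adj[i][j] - degrees[i] * degrees[j] / two_m
            # sum_c C_ic * C_jc reduces to delta(c_i, c_j) for one-hot rows
            same = sum(assign[i][c] * assign[j][c] for c in range(num_comms))
            q += b_ij * same
    return q / two_m


def modularity_loss(adj, assign):
    # Minimizing -Q is equivalent to maximizing modularity (assumed loss form).
    return -soft_modularity(adj, assign)


# Two disjoint triangles {0, 1, 2} and {3, 4, 5}, each placed in its own community.
adj = [[0, 1, 1, 0, 0, 0],
       [1, 0, 1, 0, 0, 0],
       [1, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 1, 1],
       [0, 0, 0, 1, 0, 1],
       [0, 0, 0, 1, 1, 0]]
assign = [[1, 0]] * 3 + [[0, 1]] * 3
print(round(soft_modularity(adj, assign), 6))  # 0.5
```

Putting every node into a single community gives Q = 0, so the loss rewards partitions that keep edges inside communities more often than the degree-based expectation would predict.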

Preprint · 2020 · arXiv

Adaptive Factorization Network: Learning Adaptive-Order Feature Interactions

Various factorization-based methods have been proposed to leverage second-order or higher-order cross features for boosting the performance of predictive models. They generally enumerate all cross features up to a predefined maximum order and then identify useful feature interactions through model training, an approach that suffers from two drawbacks. First, they must trade off the expressiveness of higher-order cross features against computational cost, resulting in suboptimal predictions. Second, enumerating all cross features, including irrelevant ones, may introduce noisy feature combinations that degrade model performance. In this work, we propose the Adaptive Factorization Network (AFN), a new model that learns arbitrary-order cross features adaptively from data. The core of AFN is a logarithmic transformation layer that converts the power of each feature in a feature combination into a coefficient to be learned. Experimental results on four real datasets demonstrate the superior predictive performance of AFN over state-of-the-art methods.
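
The key identity behind a logarithmic transformation layer is exp(Σ_i w_i ln x_i) = Π_i x_i^{w_i}: after taking logarithms, each feature's power becomes a linear coefficient that can be learned. The sketch below shows one such neuron applied element-wise to embedding vectors. Taking absolute values and clipping at a small `eps` so that ln is defined is an assumption made here, and `logarithmic_neuron` is a hypothetical name. This is not AFN's published code.

```python
import math


def logarithmic_neuron(embeddings, weights, eps=1e-7):
    """One hypothetical logarithmic neuron over a feature combination.

    embeddings: one embedding vector per feature (all of equal length)
    weights:    one learnable exponent per feature
    Returns, element-wise, prod_i |e_i|^w_i computed as exp(sum_i w_i * ln|e_i|).
    """
    dim = len(embeddings[0])
    output = []
    for d in range(dim):
        # ln turns each power into a linear coefficient; abs + eps keeps ln defined (assumption)
        log_sum = sum(w * math.log(max(abs(e[d]), eps)) for e, w in zip(embeddings, weights))
        output.append(math.exp(log_sum))
    return output


# Exponents 2.0 and 0.5 yield the cross term e1^2 * e2^0.5 in each dimension.
out = logarithmic_neuron([[2.0, 1.0], [3.0, 4.0]], [2.0, 0.5])
print([round(v, 4) for v in out])  # [6.9282, 2.0]
```

Because the exponents are learned rather than enumerated, the interaction order is not capped in advance. A weight of zero simply removes a feature from the combination.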

Preprint · 2020 · arXiv

Bidirectional Mapping Generative Adversarial Networks for Brain MR to PET Synthesis

Fusing multi-modality medical images, such as MR and PET, can provide various anatomical or functional information about the human body. However, PET data are often unavailable for reasons such as cost, radiation, or other limitations. In this paper, we propose a 3D end-to-end synthesis network, called Bidirectional Mapping Generative Adversarial Networks (BMGAN), in which image contexts and a latent vector are effectively used and jointly optimized for brain MR-to-PET synthesis. Concretely, a bidirectional mapping mechanism is designed to embed the semantic information of PET images into a high-dimensional latent space. A 3D DenseU-Net generator architecture and extensive objective functions are further used to improve the visual quality of the synthetic results. Most notably, the proposed method can synthesize perceptually realistic PET images while preserving the diverse brain structures of different subjects. Experimental results show that the proposed method outperforms other competitive cross-modality synthesis methods in terms of quantitative measures, qualitative displays, and classification evaluation.

Preprint · 2020 · arXiv

DDSL: Efficient Subgraph Listing on Distributed and Dynamic Graphs

Subgraph listing is a fundamental problem in graph theory and has wide applications in areas such as sociology, chemistry, and social networks. Modern graphs are often both large-scale and highly dynamic, which challenges the efficiency of existing subgraph listing algorithms. Recent works have shown the benefits of partitioning and processing big graphs in a distributed system; however, little work targets subgraph listing on dynamic graphs in a distributed environment. In this paper, we propose an efficient approach, called Distributed and Dynamic Subgraph Listing (DDSL), which incrementally updates the results instead of recomputing them from scratch. DDSL follows a general distributed join framework. In this framework, we use a Neighbor-Preserved storage for data graphs, which takes bounded extra space and supports dynamic updates. We then propose a comprehensive cost model to estimate the I/O cost of listing subgraphs and, based on this cost model, develop an algorithm to find the optimal join tree for a given pattern. To handle dynamic graphs, we propose an efficient left-deep join algorithm that incrementally updates the join results. Extensive experiments on real-world datasets show that DDSL outperforms existing methods on both static and dynamic graphs in terms of response time.
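
The incremental idea can be illustrated on the simplest pattern, the triangle. When an edge (u, v) is inserted, the only new matches are those that use the new edge, namely one triangle per common neighbor of u and v, so there is no need to re-list the whole graph. The single-machine sketch below shows only that delta computation. DDSL's distributed join framework, Neighbor-Preserved storage, cost model, and left-deep join algorithm are not reproduced, and the function name is hypothetical.

```python
def add_edge_and_list_new_triangles(adj, u, v):
    """Insert undirected edge (u, v) and return only the triangles it creates.

    adj: dict mapping each vertex to the set of its neighbors (mutated in place)
    """
    if u == v or v in adj.get(u, set()):
        return []  # self-loops and duplicate edges create no new triangle
    # Delta for the triangle pattern: any common neighbor w closes (u, v, w).
    common = adj.get(u, set()) & adj.get(v, set())
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
    return sorted(tuple(sorted((u, v, w))) for w in common)


# 4-cycle 1-2-3-4-1; adding the chord (1, 3) creates exactly two triangles.
graph = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
print(add_edge_and_list_new_triangles(graph, 1, 3))  # [(1, 2, 3), (1, 3, 4)]
```

For larger patterns the same principle applies, but the delta becomes a join seeded by the inserted edge, which is where a cost model for choosing the join order starts to matter.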

Preprint · 2020 · arXiv

Differentiable Neural Input Search for Recommender Systems

Latent factor models are the driving force of state-of-the-art recommender systems, with the important insight of vectorizing raw input features into dense embeddings. The dimensions of different feature embeddings are often set empirically to the same value, which limits the predictive performance of latent factor models. Existing works have proposed heuristic or reinforcement learning-based methods to search for mixed feature embedding dimensions. For efficiency, these methods typically choose embedding dimensions from a restricted set of candidate dimensions. However, this restriction hurts the flexibility of dimension selection, leading to suboptimal search results. In this paper, we propose Differentiable Neural Input Search (DNIS), a method that searches for mixed feature embedding dimensions in a more flexible space through continuous relaxation and differentiable optimization. The key idea is to introduce a soft selection layer that controls the significance of each embedding dimension, and to optimize this layer according to the model's validation performance. DNIS is model-agnostic and can thus be seamlessly incorporated into existing latent factor models for recommendation. We conduct experiments with various architectures of latent factor models on three public real-world datasets for rating prediction, click-through rate (CTR) prediction, and top-k item recommendation. The results demonstrate that our method achieves the best predictive performance among existing neural input search approaches, with fewer embedding parameters and less time cost.
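
A soft selection layer can be pictured as a per-dimension weight vector that scales an embedding element-wise. Dimensions whose weights shrink toward zero during search contribute little and can be pruned afterwards, which yields a mixed embedding dimension. The sketch below shows only the forward scaling and a threshold-based pruning step. The gradient-based optimization of the weights on validation loss is omitted, and the weight values, the `threshold` parameter, and both function names are hypothetical.

```python
def soft_select(embedding, alpha):
    """Scale each embedding dimension by its (learned) significance weight."""
    return [e * a for e, a in zip(embedding, alpha)]


def kept_dimensions(alpha, threshold):
    """After search, keep only dimensions whose weight magnitude reaches the threshold."""
    return [d for d, a in enumerate(alpha) if abs(a) >= threshold]


embedding = [0.5, -2.0, 4.0, 1.5]
alpha = [1.0, 0.5, 0.25, 0.0]  # hypothetical weights, as if already trained
print(soft_select(embedding, alpha))  # [0.5, -1.0, 1.0, 0.0]
print(kept_dimensions(alpha, 0.3))    # [0, 1]
```

Because the weights are continuous, the search space covers every dimension count rather than a small candidate set, which is the flexibility argument the abstract makes.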

Preprint · 2020 · arXiv

Tensorizing GAN with High-Order Pooling for Alzheimer's Disease Assessment

Applying deep learning to the early diagnosis of Alzheimer's Disease (AD) is of great significance. In this work, a novel tensorizing GAN with high-order pooling is proposed to assess Mild Cognitive Impairment (MCI) and AD. By tensorizing a three-player cooperative game-based framework, the proposed model can benefit from the structural information of the brain. By incorporating a high-order pooling scheme into the classifier, the proposed model can make full use of the second-order statistics of holistic Magnetic Resonance Imaging (MRI) images. To the best of our knowledge, the proposed Tensor-train, High-pooling and Semi-supervised learning based GAN (THS-GAN) is the first work to deal with classification on MRI images for AD diagnosis. Extensive experiments on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset demonstrate that the proposed THS-GAN achieves superior performance compared with existing methods, and show that both tensor-train and high-order pooling can enhance classification performance. Visualization of the generated samples also shows that the proposed model can generate plausible samples for semi-supervised learning.

Preprint · 2016 · arXiv

An Evaluation of Coarse-Grained Locking for Multicore Microkernels

The trade-off between coarse- and fine-grained locking is a well-understood issue in operating systems. Coarse-grained locking provides lower overhead under low contention, while fine-grained locking provides higher scalability under contention, though at the expense of implementation complexity and reduced best-case performance. We revisit this trade-off in the context of microkernels and tightly coupled cores with shared caches and low inter-core migration latencies. We evaluate performance on two architectures, x86 and ARM MPCore, in the former case also utilising transactional memory (Intel TSX). Our thesis is that on such hardware, a well-designed microkernel with short system calls can take advantage of coarse-grained locking, avoid the run-time and complexity cost of multiple locks, enable formal verification, and still achieve scalability comparable to fine-grained locking.

Preprint · 2016 · arXiv

Energy Efficient Resource Allocation for Time-Varying OFDMA Relay Systems with Hybrid Energy Supplies

This paper investigates energy-efficient resource allocation for orthogonal frequency division multiple access (OFDMA) relay systems, where the system is supplied by the conventional utility grid and by a renewable energy generator equipped with a storage device. The optimal usage of radio resources depends on the characteristics of the renewable energy generation and the mobile traffic, which exhibit both temporal and spatial diversity. The Lyapunov optimization method is used to decompose the problem into joint flow control, radio resource allocation, and energy management without a priori knowledge of system statistics. It is proven that the proposed algorithm achieves close-to-optimal performance with a capacity-limited data buffer and storage device. Simulation results show that a flexible trade-off between system utility and conventional energy consumption can be achieved. Compared with other schemes, the proposed algorithm demonstrates better performance.

Preprint · 2012 · arXiv

CDAS: A Crowdsourcing Data Analytics System

Some complex problems, such as image tagging and natural language processing, are very challenging for computers, and even state-of-the-art technology is not yet able to provide satisfactory accuracy. Therefore, rather than relying solely on developing new and better algorithms to handle such tasks, we look to crowdsourcing (employing human participation) to make good the shortfall in current technology. Crowdsourcing is a good supplement to many computer tasks: a complex job may be divided into computer-oriented tasks and human-oriented tasks, which are then assigned to machines and humans respectively. To leverage the power of crowdsourcing, we design and implement a Crowdsourcing Data Analytics System, CDAS. CDAS is a framework designed to support the deployment of various crowdsourcing applications. The core of CDAS is a quality-sensitive answering model, which guides the crowdsourcing engine to process and monitor the human tasks. In this paper, we introduce the principles of our quality-sensitive model. To satisfy the user's required accuracy, the model guides the crowdsourcing query engine in the design and processing of the corresponding crowdsourcing jobs. It provides an estimated accuracy for each generated result based on the human workers' historical performance. When verifying the quality of the result, the model employs an online strategy to reduce waiting time. To show the effectiveness of the model, we implement and deploy two analytics jobs on CDAS: a Twitter sentiment analytics job and an image tagging job. We use real Twitter and Flickr data as our queries, respectively. We compare our approaches with state-of-the-art classification and image annotation techniques. The results show that the human-assisted methods can indeed achieve much higher accuracy. By embedding the quality-sensitive model into the crowdsourcing query engine, we effectiv...[truncated].

Preprint · 2012 · arXiv

Efficient Processing of k Nearest Neighbor Joins using MapReduce

The k nearest neighbor join (kNN join), which finds the k nearest neighbors in a dataset S for every object in another dataset R, is a primitive operation widely adopted by many data mining applications. As a combination of the k nearest neighbor query and the join operation, kNN join is expensive. Given the increasing volume of data, it is difficult to perform a kNN join efficiently on a centralized machine. In this paper, we investigate how to perform kNN join using MapReduce, a well-accepted framework for data-intensive applications over clusters of computers. In brief, the mappers cluster objects into groups, and the reducers perform the kNN join on each group of objects separately. We design an effective mapping mechanism that exploits pruning rules for distance filtering, and hence reduces both the shuffling and computational costs. To reduce the shuffling cost, we propose two approximate algorithms that minimize the number of replicas. Extensive experiments on our in-house cluster demonstrate that our proposed methods are efficient, robust and scalable.
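
The map/reduce split described above can be simulated in memory. In the sketch below, the map phase assigns each R object to the group of its nearest pivot, and each reduce call computes exact kNN for its group's R objects. To stay exact without the paper's distance-based pruning rules, the sketch naively replicates all of S to every group. That replication is precisely the shuffling cost the paper's mapping mechanism is designed to cut. Pivot choice and all function names are hypothetical.

```python
import math
from collections import defaultdict


def map_phase(R, S, pivots):
    """Group R objects by nearest pivot; naively send every S object to every group."""
    groups = defaultdict(lambda: {"R": [], "S": []})
    for r in R:
        gid = min(range(len(pivots)), key=lambda i: math.dist(r, pivots[i]))
        groups[gid]["R"].append(r)
    for gid in groups:
        groups[gid]["S"] = list(S)  # no pruning: full replication of S (illustrative only)
    return groups


def reduce_phase(group, k):
    """Exact kNN for every R object in one group, using only that group's S objects."""
    return {r: sorted(group["S"], key=lambda s: math.dist(r, s))[:k] for r in group["R"]}


def knn_join(R, S, pivots, k):
    result = {}
    for group in map_phase(R, S, pivots).values():
        result.update(reduce_phase(group, k))
    return result


R = [(0, 0), (10, 10)]
S = [(1, 0), (0, 2), (9, 10), (10, 12)]
print(knn_join(R, S, pivots=[(0, 0), (10, 10)], k=1))
# {(0, 0): [(1, 0)], (10, 10): [(9, 10)]}
```

With pruning, each group would receive only those S objects that can lie within its R objects' kNN distance bounds, which shrinks both the shuffle and the per-reducer work.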
