Source author record

Minxian Xu

Minxian Xu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Distributed, Parallel, and Cluster Computing; Artificial Intelligence

Catalog footprint

What is connected

13 works

2 topics

4 close collaborators

preprint, 2025, arXiv

BucketServe: Bucket-Based Dynamic Batching for Smart and Efficient LLM Inference Serving

Large language models (LLMs) have become increasingly popular in various areas, with traditional businesses gradually shifting from rule-based systems to LLM-based solutions. However, the inference of LLMs is resource-intensive or latency-sensitive, posing significant challenges for serving systems. Existing LLM serving systems often use static or continuous batching strategies, which can lead to inefficient GPU memory utilization and increased latency, especially under heterogeneous workloads. These methods may also struggle to adapt to dynamic workload fluctuations, resulting in suboptimal throughput and potential service level objective (SLO) violations. In this paper, we introduce BucketServe, a bucket-based dynamic batching framework designed to optimize LLM inference performance. By grouping requests into size-homogeneous buckets based on sequence length, BucketServe minimizes padding overhead and optimizes GPU memory usage through real-time batch size adjustments, preventing out-of-memory (OOM) errors. It introduces adaptive bucket splitting/merging and priority-aware scheduling to mitigate resource fragmentation and ensure SLO compliance. Experiments show that BucketServe significantly outperforms UELLM in throughput, achieving up to 3.58x improvement. It can also handle 1.93x more request load under an SLO attainment of 80% compared with DistServe and demonstrates 1.975x higher system load capacity compared to UELLM.
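
The idea of grouping requests into length-homogeneous buckets to cut padding can be illustrated with a minimal sketch. It is not BucketServe's implementation: the function names, the fixed bucket boundaries (32 and 512 tokens), and the sample request lengths are all assumptions made for illustration, and the adaptive splitting/merging and priority-aware scheduling described in the abstract are not modeled.

```python
from collections import defaultdict


def bucket_by_length(lengths, boundaries):
    """Assign each request length to the first bucket whose upper bound fits it."""
    ordered = sorted(boundaries)
    buckets = defaultdict(list)
    for n in lengths:
        for upper in ordered:
            if n <= upper:
                buckets[upper].append(n)
                break
        else:
            raise ValueError(f"length {n} exceeds the largest bucket boundary")
    return dict(buckets)


def padded_tokens(batch):
    """Padding tokens needed to pad every sequence to the longest one in the batch."""
    longest = max(batch)
    return sum(longest - n for n in batch)


# Hypothetical request lengths (in tokens): a mix of short and long prompts.
lengths = [12, 500, 18, 480, 20, 510]

# One mixed batch pads every short request up to the longest request.
single_batch_padding = padded_tokens(lengths)

# Bucketing keeps short and long requests apart (boundaries are assumed values).
buckets = bucket_by_length(lengths, boundaries=[32, 512])
bucketed_padding = sum(padded_tokens(b) for b in buckets.values())

print(single_batch_padding, bucketed_padding)  # 1520 50
```

With these sample lengths, the mixed batch wastes 1520 padding tokens while the two buckets together waste 50, which is the effect the abstract refers to as minimizing padding overhead.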

preprint, 2025, arXiv

CloudNativeSim: a toolkit for modeling and simulation of cloud-native applications

Cloud-native applications are becoming increasingly popular in modern software design. Employing a microservice-based architecture in these applications is a prevalent strategy that enhances system availability and flexibility. However, cloud-native applications also introduce new challenges, such as frequent inter-service communication and the complexity of managing heterogeneous codebases and hardware, resulting in unpredictable complexity and dynamism. Furthermore, as applications scale, only a limited number of research teams or enterprises possess the resources for large-scale deployment and testing, which impedes progress in the cloud-native domain. To address these challenges, we propose CloudNativeSim, a simulator for cloud-native applications with a microservice-based architecture. CloudNativeSim offers several key benefits: (i) comprehensive and dynamic modeling for cloud-native applications, (ii) an extended simulation framework with new policy interfaces for scheduling cloud-native applications, and (iii) support for customized application scenarios and user feedback based on Quality of Service (QoS) metrics. CloudNativeSim can be easily deployed on standard computers to manage a high volume of requests and services. Its performance was validated through a case study, demonstrating higher than 94.5% accuracy in terms of response time. The study further highlights the feasibility of CloudNativeSim by illustrating the effects of various scaling policies.

preprint, 2024, arXiv

StatuScale: Status-aware and Elastic Scaling Strategy for Microservice Applications

Microservice architecture has transformed traditional monolithic applications into lightweight components. Scaling these lightweight microservices is more efficient than scaling servers. However, scaling microservices still faces challenges resulting from unexpected spikes or bursts of requests, which are difficult to detect and can degrade performance instantaneously. To address this challenge and ensure the performance of microservice-based applications, we propose a status-aware and elastic scaling framework called StatuScale, which is based on a load status detector that can select appropriate elastic scaling strategies for differentiated resource scheduling in vertical scaling. Additionally, StatuScale employs a horizontal scaling controller that utilizes comprehensive evaluation and resource reduction to manage the number of replicas for each microservice. We also present a novel metric named correlation factor to evaluate the resource usage efficiency. Finally, we use Kubernetes, an open-source container orchestration and management platform, and realistic traces from Alibaba to validate our approach. The experimental results demonstrate that the proposed framework can reduce the average response time in the Sock-Shop application by 8.59% to 12.34%, and in the Hotel-Reservation application by 7.30% to 11.97%, decrease service level objective violations, and offer better performance in resource usage compared to baselines.

preprint, 2022, arXiv

AI for Next Generation Computing: Emerging Trends and Future Directions

Autonomic computing investigates how systems can achieve (user) specified control outcomes on their own, without the intervention of a human operator. Autonomic computing fundamentals have been substantially influenced by those of control theory for closed and open-loop systems. In practice, complex systems may exhibit a number of concurrent and inter-dependent control loops. Despite research into autonomic models for managing computer resources, ranging from individual resources (e.g., web servers) to a resource ensemble (e.g., multiple resources within a data center), research into integrating Artificial Intelligence (AI) and Machine Learning (ML) to improve resource autonomy and performance at scale continues to be a fundamental challenge. The integration of AI/ML to achieve such autonomic and self-management of systems can be achieved at different levels of granularity, from full to human-in-the-loop automation. In this article, leading academics, researchers, practitioners, engineers, and scientists in the fields of cloud computing, AI/ML, and quantum computing join to discuss current research and potential future directions for these fields. Further, we discuss challenges and opportunities for leveraging AI and ML in next generation computing for emerging computing paradigms, including cloud, fog, edge, serverless and quantum computing environments.

preprint, 2022, arXiv

EsDNN: Deep Neural Network based Multivariate Workload Prediction Approach in Cloud Environment

Cloud computing has been regarded as a successful paradigm for the IT industry by providing benefits for both service providers and customers. In spite of the advantages, cloud computing also suffers from distinct challenges, and one of them is the inefficient resource provisioning for dynamic workloads. Accurate workload predictions for cloud computing can support efficient resource provisioning and avoid resource wastage. However, due to the high-dimensional and highly variable features of cloud workloads, it is difficult to predict the workloads effectively and accurately. The current dominant work for cloud workload prediction is based on regression approaches or recurrent neural networks, which fail to capture the long-term variance of workloads. To address the challenges and overcome the limitations of existing works, we propose an efficient supervised learning-based Deep Neural Network (esDNN) approach for cloud workload prediction. First, we utilize a sliding window to convert the multivariate data into a supervised learning time series that deep learning can process. Then we apply a revised Gated Recurrent Unit (GRU) to achieve accurate prediction. To show the effectiveness of esDNN, we also conduct comprehensive experiments based on realistic traces derived from Alibaba and Google cloud data centers. The experimental results demonstrate that esDNN can accurately and efficiently predict cloud workloads. Compared with the state-of-the-art baselines, esDNN can reduce the mean square errors significantly, e.g., by 15% compared with the approach using GRU only. We also apply esDNN to machine auto-scaling, which illustrates that esDNN can reduce the number of active hosts efficiently, so that the costs of service providers can be optimized.
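
The sliding-window step mentioned in the abstract can be sketched as follows. This is a generic illustration of turning a multivariate series into supervised (inputs, target) pairs, not the esDNN code: the function name, the window size of 2, and the two-feature sample data (imagined here as CPU and memory utilization) are assumptions, and the revised GRU model itself is not shown.

```python
def sliding_window(series, window):
    """Turn a multivariate series into (inputs, target) pairs.

    series: list of rows, each row holding the feature values of one time step.
    window: number of past time steps used as input for each sample.
    """
    if window < 1 or window >= len(series):
        raise ValueError("window must be between 1 and len(series) - 1")
    samples = []
    for t in range(window, len(series)):
        inputs = series[t - window:t]  # the previous `window` rows
        target = series[t]             # the row the model should predict
        samples.append((inputs, target))
    return samples


# Hypothetical trace: each row is [cpu_utilization, memory_utilization].
usage = [[0.2, 0.5], [0.3, 0.6], [0.4, 0.4], [0.6, 0.7]]

samples = sliding_window(usage, window=2)
for inputs, target in samples:
    print(inputs, "->", target)
```

Each sample pairs the previous two time steps with the next one, so four rows yield two training samples.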

preprint, 2020, arXiv

A Self-adaptive Approach for Managing Applications and Harnessing Renewable Energy for Sustainable Cloud Computing

The rapid adoption of cloud computing for hosting services and its success are primarily attributed to its attractive features such as elasticity, availability and a pay-as-you-go pricing model. However, the huge amount of energy consumed by cloud data centers makes them one of the fastest growing sources of carbon emissions. Approaches for improving energy efficiency include enhancing resource utilization to reduce resource wastage and using renewable energy as the energy supply. This work aims to reduce the carbon footprint of data centers by reducing the usage of brown energy and maximizing the usage of renewable energy. Taking advantage of microservices and renewable energy, we propose a self-adaptive approach for the resource management of interactive workloads and batch workloads. To ensure the quality of service of workloads, a brownout-based algorithm for interactive workloads and a deferring algorithm for batch workloads are proposed. We have implemented the proposed approach in a prototype system and evaluated it with web services under real traces. The results illustrate that our approach can reduce brown energy usage by 21% and improve renewable energy usage by 10%.

preprint, 2020, arXiv

Energy Efficient Algorithms based on VM Consolidation for Cloud Computing: Comparisons and Evaluations

The cloud computing paradigm has revolutionized the IT industry and is able to offer computing as the fifth utility. With the pay-as-you-go model, cloud computing enables resources to be offered dynamically to customers at any time. Drawing attention from both academia and industry, cloud computing is viewed as one of the backbones of the modern economy. However, the high energy consumption of cloud data centers contributes to high operational costs and carbon emissions to the environment. Therefore, green cloud computing is required to ensure energy efficiency and sustainability, which can be achieved via energy efficient techniques. One of the dominant approaches is to apply energy efficient algorithms to optimize resource usage and energy consumption. Currently, various virtual machine consolidation-based energy efficient algorithms have been proposed to reduce the energy consumption of cloud computing environments. However, most of them are not compared comprehensively under the same scenario, and their performance is not evaluated with the same experimental settings. This makes it hard for users to select the appropriate algorithm for their objectives. To provide insights into existing energy efficient algorithms and help researchers choose the most suitable algorithm, in this paper we compare several state-of-the-art energy efficient algorithms in depth from multiple perspectives, including architecture, modelling and metrics. In addition, we implement and evaluate these algorithms with the same experimental settings in the CloudSim toolkit. The experimental results show the performance comparison of these algorithms with comprehensive results. Finally, detailed discussions of these algorithms are provided.

preprint, 2020, arXiv

Green-aware Mobile Edge Computing for IoT: Challenges, Solutions and Future Directions

The development of Internet of Things (IoT) technology enables the rapid growth of connected smart devices and mobile applications. However, due to constrained resources and limited battery capacity, there are bottlenecks when utilizing smart devices. Mobile edge computing (MEC) offers an attractive paradigm to handle this challenge. In this work, we concentrate on the MEC application for IoT and deal with the energy saving objective via offloading workloads between cloud and edge. In this regard, we first identify the energy-related challenges in MEC. Then we present a green-aware framework for MEC to address the energy-related challenges, and provide a generic model formulation for green MEC. We also discuss some state-of-the-art workload offloading approaches to achieve green IoT and compare them from comprehensive perspectives. Finally, some future research directions related to energy efficiency in MEC are given.

preprint, 2019, arXiv

Transformative effects of IoT, Blockchain and Artificial Intelligence on cloud computing: Evolution, vision, trends and open challenges

Cloud computing plays a critical role in modern society and enables a range of applications from infrastructure to social media. Such systems must cope with varying load and evolving usage reflecting society's interaction with and dependency on automated computing systems whilst satisfying Quality of Service (QoS) guarantees. Enabling these systems is a cohort of conceptual technologies, synthesized to meet the demand of evolving computing applications. In order to understand current and future challenges of such systems, there is a need to identify key technologies enabling future applications. In this study, we aim to explore how three emerging paradigms (Blockchain, IoT and Artificial Intelligence) will influence future cloud computing systems. Further, we identify several technologies driving these paradigms and invite international experts to discuss the current status and future directions of cloud computing. Finally, we propose a conceptual model for cloud futurology to explore the influence of emerging paradigms and technologies on the evolution of cloud computing.

preprint, 2015, arXiv

CloudSimNFV: Modeling and Simulation of Energy-Efficient NFV in Cloud Data Centers

Network Function Virtualization (NFV) takes advantage of hardware virtualization to undertake software processing for various functions, and compensates for the drawbacks of traditional network technology. To speed up NFV-related research, we need a user-friendly and easy-to-use research tool that supports data center simulation, the implementation and extension of scheduling algorithms, and energy consumption simulation. As a cloud simulation toolkit, CloudSim has strong extensibility and can be extended to simulate NFV environments. This paper introduces an NFV cloud framework based on CloudSim and an energy consumption model based on multi-dimensional extension, implements a toolkit named CloudSimNFV to simulate the NFV scenario, and proposes several scheduling algorithms for NFV applications. The toolkit validation and algorithm performance comparison are also given.

preprint, 2015, arXiv

FlexCloud: A Flexible and Extendible Simulator for Performance Evaluation of Virtual Machine Allocation

Cloud data centers aim to provide reliable, sustainable and scalable services for all kinds of applications. Resource scheduling is one of the keys to cloud services. To model and evaluate different scheduling policies and algorithms, we propose FlexCloud, a flexible and scalable simulator that enables users to simulate the process of initializing cloud data centers, allocating virtual machine requests and providing performance evaluation for various scheduling algorithms. FlexCloud can be run on a single computer with a JVM to simulate large-scale cloud environments with a focus on infrastructure as a service; adopts agile design patterns to ensure flexibility and extensibility; models virtual machine migrations, which are lacking in existing tools; and provides user-friendly interfaces for customized configurations and replaying. Compared to existing simulators, FlexCloud combines features for supporting public cloud providers, load-balance and energy-efficiency scheduling. FlexCloud has an advantage in computing time and memory consumption to support large-scale simulations. The detailed design of FlexCloud is introduced and a performance evaluation is provided.

preprint, 2015, arXiv

Open-Source Simulators for Cloud Computing: Comparative Study and Challenging Issues

Resource scheduling in infrastructure as a service (IaaS) is one of the keys for large-scale cloud applications. Extensive research on all issues in a real environment is extremely difficult because it requires developers to consider network infrastructure and the environment, which may be beyond their control. In addition, the network conditions cannot be controlled or predicted. Performance evaluations of workload models and cloud provisioning algorithms in a repeatable manner under different configurations are difficult. Therefore, simulators are developed. To better understand and apply state-of-the-art cloud computing simulators, and to improve them, we study four known open-source simulators. They are compared in terms of architecture, modeling elements, simulation process, performance metrics and scalability in performance. Finally, a few challenging issues are outlined as future research trends.

preprint, 2015, arXiv

Prepartition: Paradigm for the Load Balance of Virtual Machine Allocation in Data Centers

It is important to apply a load-balancing strategy to improve the performance and reliability of resources in data centers. One of the challenging scheduling problems in cloud data centers is to take the allocation and migration of reconfigurable virtual machines (VMs) as well as the integrated features of hosting physical machines (PMs) into consideration. In the reservation model, the workload of data centers has fixed process interval characteristics. In general, load-balance scheduling is an NP-hard problem, as proved in the open literature. Traditionally, for offline load balance without migration, one of the best approaches is LPT (Longest Process Time first), which is well known to have an approximation ratio of 4/3. With virtualization, reactive (post) migration of VMs after allocation is one popular way to achieve load balance and traffic consolidation. However, reactive migration has difficulty reaching predefined load balance objectives, and may cause interruption and instability of service and other associated costs. In view of this, we propose a new paradigm, called Prepartition, which proactively sets a process-time bound for each request on each PM and prepares in advance to migrate VMs to achieve the predefined balance goal. Prepartition can reduce process time by preparing VM migration in advance and therefore reduce instability and achieve better load balance as desired. We also apply Prepartition to online (PrepartitionOn) load balance and compare it with existing online scheduling algorithms. Both theoretical and experimental results are provided.
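
The LPT baseline named in the abstract can be sketched in a few lines. This illustrates only the classic LPT heuristic for offline load balancing without migration, not Prepartition itself: the function name, the sample process times, and the choice of two machines are assumptions made for illustration.

```python
import heapq


def lpt_schedule(process_times, machines):
    """Longest Process Time first: place each job, longest first, on the least-loaded machine."""
    if machines < 1:
        raise ValueError("at least one machine is required")
    heap = [(0, m) for m in range(machines)]  # (current load, machine index)
    heapq.heapify(heap)
    assignment = [[] for _ in range(machines)]
    for job in sorted(process_times, reverse=True):
        load, m = heapq.heappop(heap)          # least-loaded machine so far
        assignment[m].append(job)
        heapq.heappush(heap, (load + job, m))
    makespan = max(sum(jobs) for jobs in assignment)
    return assignment, makespan


# Hypothetical VM request process times scheduled on two physical machines.
assignment, makespan = lpt_schedule([7, 5, 4, 3, 3, 2], machines=2)
print(assignment, makespan)  # [[7, 3, 2], [5, 4, 3]] 12
```

In this small example both machines end up with a load of 12, which is also the best possible split of the total load of 24. LPT does not always reach the optimum, which is why the abstract quotes an approximation ratio and not an exactness guarantee.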
