Source author record

Trever Schirmer

Trever Schirmer appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Distributed, Parallel, and Cluster Computing Machine Learning

Catalog footprint

What is connected

6works

2topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

FaaSMoE: A Serverless Framework for Multi-Tenant Mixture-of-Experts Serving

Mixture-of-Experts (MoE) models offer high capacity with efficient inference cost by activating a small subset of expert models per input. However, deploying MoE models requires all experts to reside in memory, creating a gap between the resource used by activated experts and the provisioned resources. This underutilization is further pronounced in multi-tenant scenarios. In this paper, we propose FaaSMoE, a multi-tenant MoE serving architecture built on Function-as-a-Service (FaaS) platforms. FaaSMoE decouples the control and execution planes of MoE by deploying experts as stateless FaaS functions, enabling on-demand and scale-to-zero expert invocation across tenants. FaaSMoE further supports configurable expert granularity within functions, trading off per-expert elasticity for reduced invocation overhead. We implement a prototype with an open-source edge-oriented FaaS platform and evaluate it using Qwen1.5-moe-2.7B under multi-tenant workloads. Compared to a full-model baseline, FaaSMoE uses less than one third of the resources, demonstrating a practical and resource-efficient path towards scalable MoE serving in a multi-tenant environment.

preprint2026arXiv

Konflux: Optimized Function Fusion for Serverless Applications

Function-as-a-Service (FaaS) has become a central paradigm in serverless cloud computing, yet optimizing FaaS deployments remains challenging. Using function fusion, multiple functions can be combined into a single deployment unit, which can be used to reduce cost and latency of complex serverless applications comprising multiple functions. Even in small-scale applications, the number of possible fusion configurations is vast, making brute-force benchmarking in production both cost- and time-prohibitive. In this paper, we present a system that can analyze every possible fusion setup of complex applications. By emulating the FaaS platform, our system enables local experimentation, eliminating the need to reconfigure the live platform and significantly reducing associated cost and time. We evaluate all fusion configurations across a number of example FaaS applications and resource limits. Our results reveal that, when analyzing cost and latency trade-offs, only a limited set of fusion configurations represent optimal solutions, which are strongly influenced by the specific pricing model in use.

preprint2022arXiv

Fusionize: Improving Serverless Application Performance through Feedback-Driven Function Fusion

Serverless computing increases developer productivity by removing operational concerns such as managing hardware or software runtimes. Developers, however, still need to partition their application into functions, which can be error-prone and adds complexity: Using a small function size where only the smallest logical unit of an application is inside a function maximizes flexibility and reusability. Yet, having small functions leads to invocation overheads, additional cold starts, and may increase cost due to busy waiting. In this paper we present Fusionize, a framework that removes these concerns from developers by automatically fusing the application code into a multi-function orchestration with varying function size. Developers only need to write the application code following a lightweight programming model and do not need to worry how the application is turned into functions. Our framework automatically fuses different parts of the application into functions and manages their interactions. Leveraging monitoring data, the framework optimizes the distribution of application parts to functions to optimize deployment goals such as end-to-end latency and cost. Using two example applications, we show that Fusionize can automatically and iteratively improve the deployment artifacts of the application.

preprint2022arXiv

Hardless: A Generalized Serverless Compute Architecture for Hardware Processing Accelerators

The increasing use of hardware processing accelerators tailored for specific applications, such as the Vision Processing Unit (VPU) for image recognition, further increases developers' configuration, development, and management overhead. Developers have successfully used fully automated elastic cloud services such as serverless computing to counter these additional efforts and shorten development cycles for applications running on CPUs. Unfortunately, current cloud solutions do not yet provide these simplifications for applications that require hardware acceleration. However, as the development of specialized hardware acceleration continues to provide performance and cost improvements, it will become increasingly important to enable ease of use in the cloud. In this paper, we present an initial design and implementation of Hardless, an extensible and generalized serverless computing architecture that can support workloads for arbitrary hardware accelerators. We show how Hardless can scale across different commodity hardware accelerators and support a variety of workloads using the same execution and programming model common in serverless computing today.

preprint2022arXiv

Streaming vs. Functions: A Cost Perspective on Cloud Event Processing

In cloud event processing, data generated at the edge is processed in real-time by cloud resources. Both distributed stream processing (DSP) and Function-as-a-Service (FaaS) have been proposed to implement such event processing applications. FaaS emphasizes fast development and easy operation, while DSP emphasizes efficient handling of large data volumes. Despite their architectural differences, both can be used to model and implement loosely-coupled job graphs. In this paper, we consider the selection of FaaS and DSP from a cost perspective. We implement stateless and stateful workflows from the Theodolite benchmarking suite using cloud FaaS and DSP. In an extensive evaluation, we show how application type, cloud service provider, and runtime environment can influence the cost of application deployments and derive decision guidelines for cloud engineers.

preprint2022arXiv

Towards Distributed Coordination for Fog Platforms

Distributed fog and edge applications communicate over unreliable networks and are subject to high communication delays. This makes using existing distributed coordination technologies from cloud applications infeasible, as they are built on the assumption of a highly reliable, low-latency datacenter network to achieve strict consistency with low overheads. To help implement configuration and state management for fog platforms and applications, we propose a novel decentralized approach that lets systems specify coordination strategies and membership for different sets of coordination data.

Trever Schirmer

What is connected

Connect this record

See the researcher in context

Building this map preview

6 published item(s)

FaaSMoE: A Serverless Framework for Multi-Tenant Mixture-of-Experts Serving

Konflux: Optimized Function Fusion for Serverless Applications

Fusionize: Improving Serverless Application Performance through Feedback-Driven Function Fusion

Hardless: A Generalized Serverless Compute Architecture for Hardware Processing Accelerators

Streaming vs. Functions: A Cost Perspective on Cloud Event Processing

Towards Distributed Coordination for Fog Platforms