Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
16 works
0 followers
16 topics
4 close collaborators

Published work

16 published items

preprint · 2026 · arXiv

KVBuffer: IO-aware Serving for Linear Attention

Linear attention has recently gained significant attention for long-context inference due to its constant decoding cost with respect to context length. However, existing serving systems typically serve linear attention by recurrently computing and updating a large linear attention state in every decoding step. Since the state is much larger than the per-token key and value, recurrent decoding incurs substantial memory access and becomes inefficient for serving linear attention. In this paper, we propose KVBuffer, an IO-aware serving mechanism for linear attention. By buffering recent keys and values, KVBuffer enables serving systems to compute linear attention outputs in more flexible and memory-efficient ways. For decoding, KVBuffer enables chunkwise computation, which reduces average memory access and decoding latency by deferring state updates and applying them in batch. For speculative decoding, KVBuffer verifies draft tokens in parallel and avoids storing temporary states. For short contexts, KVBuffer computes attention outputs directly from buffered keys and values, without creating or updating the linear attention state. We implement KVBuffer in SGLang for Qwen3-Next. Our evaluations show that KVBuffer can reduce linear attention decoding latency by up to 45.17% and increase the maximum number of serving requests by 5x for speculative decoding when verifying four draft tokens.
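The buffering idea in this abstract can be illustrated with a minimal sketch. It assumes plain, ungated, unnormalized linear attention (state S accumulating k v^T, output q^T S), which is simpler than what a production model uses, and it is not the KVBuffer implementation. All function names are hypothetical. The sketch only shows that deferring state updates and reading recent tokens from a key/value buffer gives the same outputs as per-step recurrent updates, while writing the state once per chunk.

```python
import random


def add_outer(state, k, v):
    """Apply one state update in place: S += k v^T."""
    for i, k_i in enumerate(k):
        for j, v_j in enumerate(v):
            state[i][j] += k_i * v_j


def read_state(state, q):
    """Return q^T S, the contribution of all tokens already folded into S."""
    return [sum(q[i] * state[i][j] for i in range(len(q)))
            for j in range(len(state[0]))]


def decode_recurrent(qs, ks, vs):
    """Baseline: rewrite the full d_k x d_v state at every decoding step."""
    state = [[0.0] * len(vs[0]) for _ in range(len(ks[0]))]
    outputs = []
    for q, k, v in zip(qs, ks, vs):
        add_outer(state, k, v)
        outputs.append(read_state(state, q))
    return outputs


def decode_buffered(qs, ks, vs, chunk=4):
    """Buffered variant: hold recent (k, v) pairs and flush them once per chunk."""
    state = [[0.0] * len(vs[0]) for _ in range(len(ks[0]))]
    buffer, outputs, flushes = [], [], 0
    for q, k, v in zip(qs, ks, vs):
        buffer.append((k, v))
        out = read_state(state, q)
        # Tokens not yet in the state contribute (q . k_i) * v_i directly.
        for b_k, b_v in buffer:
            weight = sum(a * b for a, b in zip(q, b_k))
            out = [o + weight * x for o, x in zip(out, b_v)]
        outputs.append(out)
        if len(buffer) == chunk:
            # Deferred update: fold the whole chunk into the state at once.
            for b_k, b_v in buffer:
                add_outer(state, b_k, b_v)
            buffer.clear()
            flushes += 1
    return outputs, flushes


def random_vectors(rng, n, dim):
    """Hypothetical helper: n random vectors of length dim for the demo."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]
```

With ten tokens and `chunk=4`, the buffered variant writes the state twice instead of ten times. The sketch makes no claim about latency, which depends on the memory hierarchy and kernels of a real serving system.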

preprint · 2020 · arXiv

Privacy Adversarial Network: Representation Learning for Mobile Data Privacy

The remarkable success of machine learning has fostered a growing number of cloud-based intelligent services for mobile users. Such a service requires a user to send data, e.g., images, voice, and video, to the provider, which presents a serious challenge to user privacy. To address this, prior works either obfuscate the data, e.g., add noise and remove identity information, or send representations extracted from the data, e.g., anonymized features. They struggle to balance service utility and data privacy because obfuscated data reduces utility and extracted representations may still reveal sensitive information. This work departs from prior works in methodology: we leverage adversarial learning to strike a better balance between privacy and utility. We design a representation encoder that generates feature representations optimized against the privacy disclosure risk of sensitive information (a measure of privacy) posed by the privacy adversaries, and concurrently optimized for the task inference accuracy (a measure of utility) of the utility discriminator. The result is the privacy adversarial network (PAN), a novel deep model with a new training algorithm that can automatically learn representations from the raw data. Intuitively, PAN adversarially forces the extracted representations to convey only the information required by the target task. Surprisingly, this constitutes an implicit regularization that actually improves task accuracy. As a result, PAN achieves better utility and better privacy at the same time. We report extensive experiments on six popular datasets and demonstrate the superiority of PAN compared with alternative methods reported in prior work.

preprint · 2012 · arXiv

Guadalupe: a browser design for heterogeneous hardware

Mobile systems are embracing heterogeneous architectures by adding more types of cores and more specialized cores, which allows applications to be faster and more efficient. We aim to exploit this hardware heterogeneity from the browser without requiring any changes to either the OS or the web applications. Our design, Guadalupe, can use hardware processing units with different degrees of capability for matched browser services. It starts with a weak hardware unit, determines if and when a strong unit is needed, and seamlessly migrates to the strong one when necessary. Guadalupe not only makes more computing resources available to mobile web browsing but also improves its energy proportionality. Based on Chrome for Android and TI OMAP4, we provide a prototype browser implementation for resource loading and rendering. Compared to Chrome for Android, we show that the Guadalupe browser for rendering can increase other 3D applications' frame rate by up to 767% and save 4.7% of the entire system's energy consumption. More importantly, by using the two cases, we demonstrate that Guadalupe creates a great opportunity for many browser services to achieve better resource utilization and energy proportionality by exploiting hardware heterogeneity.

preprint · 2012 · arXiv

Practical Context Awareness: Measuring and Utilizing the Context Dependency of Mobile Usage

Context information brings new opportunities for efficient and effective applications and services on mobile devices. A wide range of research has exploited context dependency, i.e., the relations between context(s) and the outcome, to achieve significant, quantified performance gains for a variety of applications. These works often have to deal with the challenge of multiple sources of context, which can lead to a sparse training data set, and the challenge of energy-hungry context sensors. Often, they address these challenges in an application-specific and ad hoc manner. We liberate mobile application designers and researchers from these burdens by providing a methodical approach to these challenges. In particular, we 1) define and measure the context dependency of three fundamental types of mobile usage in an application-agnostic yet practical manner, which can provide clear insight into the performance of potential applications; 2) address the challenge of data sparseness when dealing with multiple and different sources of context in a systematic manner; and 3) present SmartContext to address the energy challenge by automatically selecting among context sources while ensuring that the minimum accuracy for each estimation event is met. Our analysis and findings are based on usage and context traces collected in real-life settings from 24 iPhone users over a period of one year. We present findings regarding the context dependency of the three principal types of mobile usage: visited websites, phone calls, and app usage. Yet our methodology and the lessons we learn can be readily extended to other context-dependent mobile usage and system resources as well. Our findings guide the development of context-aware systems and highlight the challenges and expectations regarding the context dependency of mobile usage.
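The selection problem described for SmartContext (choose context sources that meet a minimum accuracy at the lowest energy cost) can be sketched as follows. This is an illustrative exhaustive search, not the paper's algorithm. The source names, energy costs, per-source accuracies, and the rule for combining accuracies are all hypothetical assumptions made for the example.

```python
from itertools import combinations


def combined_accuracy(subset, per_source):
    """Assumed model: each source independently catches a share of the misses."""
    miss = 1.0
    for name in subset:
        miss *= 1.0 - per_source[name]
    return 1.0 - miss


def select_sources(energy, accuracy_of, min_accuracy):
    """Return (subset, cost) for the cheapest subset meeting min_accuracy, or None."""
    best = None
    names = sorted(energy)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            if accuracy_of(subset) < min_accuracy:
                continue  # this subset does not meet the accuracy floor
            cost = sum(energy[name] for name in subset)
            if best is None or cost < best[1]:
                best = (subset, cost)
    return best


# Hypothetical energy costs (arbitrary units) and per-source accuracies.
ENERGY = {"time": 1, "cell_id": 2, "accelerometer": 10, "gps": 100}
ACCURACY = {"time": 0.4, "cell_id": 0.5, "accelerometer": 0.3, "gps": 0.8}


def estimate(subset):
    return combined_accuracy(subset, ACCURACY)
```

Under these made-up numbers, `select_sources(ENERGY, estimate, 0.75)` picks the three cheap sources rather than GPS alone, because together they clear the accuracy floor at a fraction of the energy. Exhaustive search is only practical for a handful of sources.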

preprint · 2012 · arXiv

SportSense: Real-Time Detection of NFL Game Events from Twitter

We report our experience in building a working system, SportSense (http://www.sportsense.us), which exploits Twitter users as human sensors of the physical world to detect events in real time. Using US National Football League (NFL) games as a case study, we report in-depth measurement studies of the delay and post rate of tweets and their dependence on other properties. We subsequently develop a novel event detection method based on these findings and demonstrate that it can effectively and accurately extract game events using open-access Twitter data. SportSense evolved during the 2010-11 and 2011-12 NFL seasons and is able to recognize big plays in NFL games within 30 to 90 seconds, with a 98% true positive rate and a 9% false positive rate. Using a smart electronic TV program guide, we show that SportSense can utilize human sensors to empower novel services.

preprint · 2011 · arXiv

How Far Can Client-Only Solutions Go for Mobile Browser Speed?

Mobile browsers are known to be slow because of the bottleneck in resource loading. Client-only solutions to improve resource loading are attractive because they are immediately deployable, scalable, and secure. We present the first publicly known treatment of client-only solutions to understand how much they can improve mobile browser speed without infrastructure support. Leveraging an unprecedented set of web usage data collected from 24 iPhone users continuously over one year, we examine the three fundamental, orthogonal approaches a client-only solution can take: caching, prefetching, and speculative loading, the last of which is first proposed and studied in this work. Speculative loading predicts and speculatively loads the subresources needed to open a web page once its URL is given. We show that while caching and prefetching are highly limited for mobile browsing, speculative loading can be significantly more effective. Empirically, we show that client-only solutions can improve browser speed by about 1.4 seconds on average for websites visited by the 24 iPhone users. We also report the design, realization, and evaluation of speculative loading in a WebKit-based browser called Tempo. On average, Tempo can reduce browser delay by 1 second (~20%).
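As a rough illustration of speculative loading, the sketch below predicts the subresources of a page from what earlier visits to the same URL loaded. The abstract does not describe how Tempo makes its predictions, so the frequency rule, the `min_fraction` threshold, and the class name `SubresourcePredictor` are assumptions made for this example only.

```python
from collections import Counter, defaultdict


class SubresourcePredictor:
    """Predict a page's subresources from earlier visits to the same URL."""

    def __init__(self, min_fraction=0.5):
        self.min_fraction = min_fraction
        self.visits = Counter()           # URL -> number of recorded visits
        self.seen = defaultdict(Counter)  # URL -> subresource -> visits that used it

    def record_visit(self, url, subresources):
        """Remember which subresources one page load actually fetched."""
        self.visits[url] += 1
        for resource in set(subresources):
            self.seen[url][resource] += 1

    def predict(self, url):
        """Return subresources used in at least min_fraction of past visits."""
        total = self.visits[url]
        if total == 0:
            return []  # never visited: nothing to load speculatively
        return sorted(resource for resource, count in self.seen[url].items()
                      if count / total >= self.min_fraction)
```

A browser could call `predict(url)` as soon as the URL is known and start fetching the returned subresources before the main document has been parsed. A misprediction costs wasted bandwidth, so the threshold trades coverage against waste.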

preprint · 2011 · arXiv

Human as Real-Time Sensors of Social and Physical Events: A Case Study of Twitter and Sports Games

In this work, we study how Twitter can be used as a sensor to detect frequent and diverse social and physical events in real time. We devise efficient data collection and event recognition solutions that work despite various limits on free access to Twitter data. We describe a web service implementation of our solution and report our experience with the 2010-2011 US National Football League (NFL) games. The service was able to recognize NFL game events within 40 seconds and with accuracy up to 90%. This capability will be very useful not only for real-time electronic program guides for live broadcast programs but also for refined auctions of advertisement slots. More importantly, it demonstrates for the first time the feasibility of using Twitter for real-time social and physical event detection for ubiquitous computing.

preprint · 2011 · arXiv

In Situ Imaging of the Conducting Filament in a Silicon Oxide Resistive Switch

The nature of the conducting filaments in many resistive switching systems has been elusive. Through in situ transmission electron microscopy, we image the real-time formation and evolution of the filament in a silicon oxide resistive switch. The electroforming process is revealed to involve the local enrichment of silicon from the silicon oxide matrix. Semi-metallic silicon nanocrystals with structural variations from the conventional diamond cubic form of silicon are observed, which likely account for the conduction in the filament. The growth and shrinkage of the silicon nanocrystals in response to different electrical stimuli show energetically viable transition processes in the silicon forms, offering evidence for the switching mechanism. The study also provides insights into the electrical breakdown process in silicon oxide layers, which are ubiquitous in a host of electronic devices.

preprint · 2011 · arXiv

Opportunistic Content Search of Smartphone Photos

Photos taken by smartphone users can accidentally contain content that is timely and valuable to others, often in real time. We report the system design and evaluation of a distributed search system, Theia, for crowd-sourced real-time content search of smartphone photos. Because smartphones are resource-constrained, Theia incorporates two key innovations to control search cost and improve search efficiency. Incremental Search expands search scope incrementally and exploits user feedback. Partitioned Search leverages the cloud to reduce the energy consumption of search on smartphones. Through user studies, measurement studies, and field studies, we show that Theia reduces the cost per relevant photo by an average of 59%. It reduces the energy consumption of search by up to 55% and 81% compared to the alternative strategies of executing entirely locally or entirely in the cloud, respectively. Search results from smartphones are obtained in seconds. Our experiments also suggest approaches to further improve these results.

preprint · 2011 · arXiv

Tales of 34 iPhone Users: How they change and why they are different

We present results from a longitudinal study of 34 iPhone 3GS users, called LiveLab. LiveLab collected unprecedented usage data through an in-device, programmable logger and several structured interviews with the participants throughout the study. We have four objectives in writing this paper: (i) share the findings with the research community; (ii) provide insights guiding the design of smartphone systems and applications; (iii) demonstrate the power of prudently designed longitudinal field studies and the power of advanced research methods; and (iv) raise important questions that the research community can help answer in a collaborative, multidisciplinary manner. We show how smartphone usage changes over the year and why the users are different (and similar) in their usage. In particular, our findings highlight application and web usage dynamics, the influence of socioeconomic status (SES) on usage, and the shortcomings of the iPhone 3GS and its ecosystem. We further show that distinct classes of usage patterns exist, and these classes are best served by different phone designs instead of the one-size-fits-all phone Apple provides. Our findings are significant not only for understanding smartphone users but also for guiding device and application development and optimizations. While we present novel results that can only be produced by a study of this nature, we also raise new research questions to be investigated by the mobile research community.

preprint · 2011 · arXiv

Transparent Programming of Heterogeneous Smartphones for Sensing

Sensing on smartphones is known to be power-hungry. It has been shown that this problem can be solved by adding an ultra-low-power processor to execute simple, frequent sensor data processing. While very effective in saving energy, the resulting heterogeneous, distributed architecture poses a significant challenge to application development. We present Reflex, a suite of runtime and compilation techniques to conceal the heterogeneous, distributed nature from developers. Reflex automatically transforms the developer's code for distributed execution with the help of the Reflex runtime. To create a unified system illusion, Reflex features a novel software distributed shared memory (DSM) design that leverages the extreme architectural asymmetry between the low-power processor and the powerful central processor to achieve both energy efficiency and performance. We report a complete realization of Reflex for heterogeneous smartphones with Maemo/Linux as the central kernel. Using a tri-processor hardware prototype and sensing applications reported in recent literature, we evaluate the Reflex realization for programming transparency, energy efficiency, and performance. We show that Reflex supports a programming style that is very close to contemporary smartphone programming. It allows existing sensing applications to be ported with minor source code changes. Reflex reduces the system power in sensing by up to 83%, and its runtime system consumes only 10% of local memory on a typical ultra-low-power processor.

preprint · 2010 · arXiv

A Longitudinal Study of Non-Voice Mobile Phone Usage by Teens from an Underserved Urban Community

We report a user study of over four months on the non-voice usage of mobile phones by teens from an underserved urban community in the USA where a community-wide, open-access Wi-Fi network exists. We instrumented the phones to record quantitative information regarding their usage and location in a privacy-respecting manner. We conducted focus group meetings and interviewed participants regularly for qualitative data. We present our findings on what applications our participants used and how their usage changed over time. The findings highlight the challenges of evaluating the usability of mobile systems and the value of long-term methodologies. Based on our findings, we analyze the unique value of mobile phones as a platform technology. Our study shows that the usage is highly mobile, location-dependent, and serves multiple social purposes for the participants. Furthermore, we present concrete findings on how to perform and analyze similar user studies on mobile phones, including four contributing factors to usage evolution, and provide guidelines for their design and evaluation.

preprint · 2010 · arXiv

Beamsteering on Mobile Devices: Network Capacity and Client Efficiency

Current and emerging mobile devices are omnidirectional in wireless communication. Such omnidirectionality not only limits device energy efficiency but also poses a significant challenge to the capacity of wireless networks through inter-link interference. In this work, we seek to make mobile clients directional with beamsteering. We first demonstrate that beamsteering is already feasible for mobile devices such as netbooks and eBook readers in terms of form factor, power efficiency, and device mobility. We further reveal that beamsteering mobile clients face a unique challenge in balancing client efficiency and network capacity. There is an optimal operating point for a beamsteering mobile client, in terms of the number of antennas and transmit power, that achieves the required capacity with the lowest power. Finally, we provide a distributed algorithm called BeamAdapt that allows each client to closely approach its optimal point iteratively without central coordination. We also offer a cellular system realization of BeamAdapt. Using Qualnet-based simulation, we show that BeamAdapt with four antennas can reduce client power consumption by 55% while maintaining a required network throughput for a large-scale network, compared to the same network with omnidirectional mobile clients.

preprint · 2010 · arXiv

Chameleon: A Color-Adaptive Web Browser for Mobile OLED Displays

Displays based on organic light-emitting diode (OLED) technology are appearing on many mobile devices. Unlike liquid crystal displays (LCD), OLED displays consume dramatically different power for showing different colors. In particular, OLED displays are inefficient for showing bright colors. This has made them undesirable for mobile devices because much web content uses bright colors. To tackle this problem, we present the motivational studies, design, and realization of Chameleon, a color-adaptive web browser that renders web pages with power-optimized color schemes under user-supplied constraints. Driven by the findings from our motivational studies, Chameleon provides end users with important options, offloads tasks that are not absolutely needed in real time, and accomplishes real-time tasks by carefully enhancing the codebase of a browser engine. According to measurements with OLED smartphones, Chameleon is able to reduce average system power consumption for web browsing by 41% and reduce display power consumption by 64% without introducing any noticeable delay.

preprint · 2010 · arXiv

Seamless Flow Migration on Smartphones without Network Support

This paper addresses the following question: Is it possible to migrate TCP/IP flows between different networks on modern mobile devices without infrastructure support or protocol changes? To answer this question, we make three research contributions. (i) We report a comprehensive characterization of IP traffic on smartphones using traces collected from 27 iPhone 3GS users for three months. (ii) Driven by the findings from the characterization, we devise two novel system mechanisms for mobile devices to support seamless flow migration without network support, and extensively evaluate their effectiveness using our field-collected traces of real-life usage. Wait-n-Migrate leverages the fact that most flows are short-lived. It establishes new flows on newly available networks but allows pre-existing flows on the old network to terminate naturally, effectively decreasing, or even eliminating, connectivity gaps during network switches. Resumption Agent takes advantage of the functionality integrated into many modern protocols to securely resume flows without application intervention. When combined, Wait-n-Migrate and Resumption Agent provide an unprecedented opportunity to immediately deploy policies that leverage multiple networks to improve the performance, efficiency, and connectivity of mobile devices. (iii) Finally, we report an iPhone 3GS-based implementation of these two system mechanisms and show that their overhead is negligible. Furthermore, we employ an example network switching policy, called AutoSwitch, to demonstrate their performance. AutoSwitch improves the Wi-Fi user experience by intelligently migrating TCP flows between Wi-Fi and cellular networks. Through traces and field measurements, we show that AutoSwitch reduces the number of user disruptions by an order of magnitude.
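The Wait-n-Migrate behavior described above (new flows go to the new network, existing flows finish on the old one) can be sketched as simple bookkeeping. This is a toy model under stated assumptions, not the paper's implementation: it tracks flow identifiers rather than real sockets or interfaces, and the class name `FlowManager` and its methods are hypothetical.

```python
class FlowManager:
    """Toy model: bind each flow to the network that was active when it opened."""

    def __init__(self, network):
        self.active_network = network
        self.flows = {}   # flow id -> network the flow was opened on
        self.next_id = 0

    def open_flow(self):
        """Open a new flow on the currently active network."""
        flow_id = self.next_id
        self.next_id += 1
        self.flows[flow_id] = self.active_network
        return flow_id

    def close_flow(self, flow_id):
        """Record that a flow terminated on its own."""
        self.flows.pop(flow_id, None)

    def switch_to(self, network):
        """New flows use `network`; existing flows are left untouched."""
        self.active_network = network

    def networks_in_use(self):
        """Networks that must stay up: the active one plus any with live flows."""
        return {self.active_network} | set(self.flows.values())
```

In this model the old network can be released once it no longer appears in `networks_in_use()`. Because most flows are short-lived, that should happen soon after a switch; how long to wait for stragglers is a policy decision the sketch leaves open.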

preprint · 2010 · arXiv

Sesame: Self-Constructive System Energy Modeling for Battery-Powered Mobile Systems

System energy models are important for energy optimization and management in mobile systems. However, existing system energy models are built in the lab with the help of a second computer. Not only are they labor-intensive, but they also do not adequately account for the great diversity in the hardware and usage of mobile systems. Moreover, existing system energy models are intended for energy estimation over time intervals of one second or longer; they do not provide the rate required for fine-grained uses such as per-application energy accounting. In this work, we study a self-modeling paradigm in which a mobile system automatically generates its energy model without any external assistance. Our solution, Sesame, leverages the possibility of self power measurement through the smart battery interface and employs a suite of novel techniques to achieve accuracy and rate much higher than those of the smart battery interface. We report the implementation and evaluation of Sesame on a laptop and a smartphone. The experimental results show that Sesame generates system energy models of 95% accuracy at one estimation per second and 88% accuracy at one estimation per 10 ms, without any external assistance. A five-day field study with four laptop and four smartphone users further demonstrates the effectiveness, efficiency, and noninvasiveness of Sesame.