PerCache: Predictive Hierarchical Cache for RAG Applications on Mobile Devices

Retrieval-augmented generation (RAG) has become the de facto paradigm for large language model (LLM)-driven applications on mobile devices, such as mobile assistants that draw on personal emails or meeting records. However, because of lengthy prompts and tight resource constraints, mobile RAG systems suffer from high response latency. One promising remedy is to reuse intermediate computational results across queries and thereby eliminate redundant computation. Most existing approaches, such as KV cache reuse and semantic cache reuse, are designed for cloud settings and perform poorly on mobile devices because they overlook the distinctive characteristics of mobile RAG. We propose PerCache, a novel hierarchical cache solution designed to reduce the end-to-end latency of personalized RAG applications on mobile platforms. PerCache adopts a hierarchical architecture that progressively matches similar queries and QKV cache to maximize the reuse of intermediate results at different computing stages. To improve the cache hit rate, PerCache applies a predictive method that populates the cache with queries likely to be raised in the future. In addition, PerCache adapts its configuration to dynamic system loads, aiming to maximize caching utility with minimal resource consumption. We implement PerCache on top of an existing mobile LLM inference engine on commodity mobile phones. Extensive evaluations show that PerCache outperforms the best-performing baseline by 34.4% in latency reduction across various applications and maintains optimal latency performance under dynamic resource changes.
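
The hierarchical matching described in the abstract can be pictured with a small sketch. This is not PerCache's implementation: the class `TwoTierCache`, the 0.9 similarity threshold, the toy two-dimensional embeddings, and the string placeholders standing in for cached tensors are all assumptions made for illustration. It only shows the general idea of trying a query-level match first and falling back to reuse of per-chunk intermediate state.

```python
import math
from dataclasses import dataclass, field


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class TwoTierCache:
    # Hypothetical illustration only; names and threshold are assumptions.
    threshold: float = 0.9
    answers: list = field(default_factory=list)      # tier 1: (query embedding, answer)
    chunk_state: dict = field(default_factory=dict)  # tier 2: chunk id -> cached state

    def lookup(self, query_vec, chunk_ids):
        # Tier 1: a sufficiently similar earlier query lets us reuse its answer.
        for vec, answer in self.answers:
            if cosine(query_vec, vec) >= self.threshold:
                return "query_hit", answer
        # Tier 2: otherwise reuse intermediate state for any known chunks.
        reused = [c for c in chunk_ids if c in self.chunk_state]
        if reused:
            return "chunk_hit", reused
        return "miss", None

    def store(self, query_vec, chunk_ids, answer):
        self.answers.append((query_vec, answer))
        for c in chunk_ids:
            # A string placeholder stands in for real cached tensors.
            self.chunk_state.setdefault(c, f"state:{c}")


cache = TwoTierCache()
cache.store([1.0, 0.0], ["email_17"], "Meeting is at 3 pm.")
print(cache.lookup([0.99, 0.05], ["email_17"])[0])  # similar query
print(cache.lookup([0.0, 1.0], ["email_17"])[0])    # new query, known chunk
print(cache.lookup([0.0, 1.0], ["email_42"])[0])    # nothing reusable
```

In this sketch, predictive population would amount to calling `store` ahead of time with queries a predictor expects the user to ask; how PerCache actually predicts queries and adapts to system load is not specified in the abstract.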

Preprint · 2025 · arXiv · Open access
