Source author record

Melih Bastopcu

Melih Bastopcu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Information Theory, math.IT, eess.SP, Networking and Internet Architecture, Artificial Intelligence, Machine Learning, math.OC, Social and Information Networks

Catalog footprint

What is connected

7 works

8 topics

3 close collaborators

preprint · 2026 · arXiv

Queueing-Aware Optimization of Reasoning Tokens for Accuracy-Latency Trade-offs in LLM Servers

We consider a single large language model (LLM) server that serves a heterogeneous stream of queries belonging to $N$ distinct task types. Queries arrive according to a Poisson process, and each type occurs with a known prior probability. For each task type, the server allocates a fixed number of internal thinking tokens, which determines the computational effort devoted to that query. The token allocation induces an accuracy-latency trade-off: the service time follows an approximately affine function of the allocated tokens, while the probability of a correct response exhibits diminishing returns. Under a first-in, first-out (FIFO) service discipline, the system operates as an $M/G/1$ queue, and the mean system time depends on the first and second moments of the resulting service-time distribution. We formulate a constrained optimization problem that maximizes a weighted average accuracy objective penalized by the mean system time, subject to architectural token-budget constraints and queue-stability conditions. The objective function is shown to be strictly concave over the stability region, which ensures existence and uniqueness of the optimal token allocation. The first-order optimality conditions yield a coupled projected fixed-point characterization of the optimum, together with an iterative solution and an explicit sufficient condition for contraction. Moreover, a projected gradient method with a computable global step-size bound is developed to guarantee convergence beyond the contractive regime. Finally, integer-valued token allocations are attained via rounding of the continuous solution, and the resulting performance loss is evaluated in simulation results.
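The mean system time mentioned in the abstract above can be illustrated with the standard Pollaczek-Khinchine formula for an M/G/1 queue, E[T] = E[S] + λ E[S²] / (2 (1 − λ E[S])). The sketch below is not taken from the paper. It assumes that each task type has a deterministic service time that is exactly affine in its token allocation, and the names `base_time` and `time_per_token` are hypothetical parameters introduced only for this illustration.

```python
def mean_system_time(arrival_rate, priors, tokens, base_time, time_per_token):
    """Mean system time of an M/G/1 FIFO queue (Pollaczek-Khinchine).

    Assumption (illustrative only): a query of type i has a deterministic
    service time base_time + time_per_token * tokens[i].
    """
    # Per-type service times under the assumed affine model.
    service = [base_time + time_per_token * t for t in tokens]

    # First and second moments of the service-time mixture over task types.
    m1 = sum(p * s for p, s in zip(priors, service))
    m2 = sum(p * s * s for p, s in zip(priors, service))

    # Queue stability requires the server utilization to stay below 1.
    rho = arrival_rate * m1
    if rho >= 1.0:
        raise ValueError("unstable queue: arrival_rate * E[S] must be < 1")

    # Mean service time plus mean waiting time in the queue.
    return m1 + arrival_rate * m2 / (2.0 * (1.0 - rho))


if __name__ == "__main__":
    # Two task types with hypothetical priors and token allocations.
    print(mean_system_time(0.5, [0.7, 0.3], [100, 400], 0.2, 0.002))
```

Raising the token allocation of any type increases both moments, so the latency penalty grows faster than linearly as the utilization approaches 1. This is the effect that the accuracy gain has to be traded against.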

preprint · 2022 · arXiv

Using Timeliness in Tracking Infections

We consider real-time timely tracking of infection status (e.g., COVID-19) of individuals in a population. In this work, a health care provider wants to detect infected people as well as people who have recovered from the disease as quickly as possible. In order to measure the timeliness of the tracking process, we use the long-term average difference between the actual infection status of the people and their real-time estimate by the health care provider based on the most recent test results. We first find an analytical expression for this average difference for given test rates, infection rates and recovery rates of people. Next, we propose an alternating minimization based algorithm to find the test rates that minimize the average difference. We observe that if the total test rate is limited, instead of testing all members of the population equally, only a portion of the population may be tested at unequal rates calculated based on their infection and recovery rates. Next, we characterize the average difference when the test measurements are erroneous (i.e., noisy). Further, we consider the case where the infection status of individuals may be dependent, which happens when an infected person spreads the disease to another person if they are not detected and isolated by the health care provider. Then, we consider an age of incorrect information based error metric where the staleness metric increases linearly over time as long as the health care provider does not detect the changes in the infection status of the people. In numerical results, we observe that an increased population size increases the diversity of people with different infection and recovery rates, which may be exploited to spend testing capacity more efficiently. Depending on the health care provider's preferences, test rate allocation can be adjusted to detect either the infected people or the recovered people more quickly.

preprint · 2020 · arXiv

Age of Information for Updates with Distortion: Constant and Age-Dependent Distortion Constraints

We consider an information update system where an information receiver requests updates from an information provider in order to minimize its age of information. The updates are generated at the information provider (transmitter) as a result of completing a set of tasks such as collecting data and performing computations. We refer to this as the update generation process. We model the quality of an update as an increasing function of the processing time spent while generating the update at the transmitter. In particular, we use distortion as a proxy for quality, and model distortion as a decreasing function of processing time. Processing longer at the transmitter results in a better quality (lower distortion) update, but it causes the update to age. We determine the age-optimal policies for the update request times at the receiver and the update processing times at the transmitter subject to a minimum required quality (maximum allowed distortion) constraint on the updates. For the required quality constraint, we consider the cases of constant maximum allowed distortion constraints, as well as age-dependent maximum allowed distortion constraints.

preprint · 2020 · arXiv

Optimal Selective Encoding for Timely Updates

We consider a system in which an information source generates independent and identically distributed status update packets from an observed phenomenon that takes $n$ possible values based on a given pmf. These update packets are encoded at the transmitter node to be sent to the receiver node. Instead of encoding all $n$ possible realizations, the transmitter node only encodes the most probable $k$ realizations and disregards any realization that falls among the remaining $n-k$ values. We find the average age and determine the age-optimal real codeword lengths such that the average age at the receiver node is minimized. Through numerical evaluations for arbitrary pmfs, we show that this selective encoding policy results in a lower average age than encoding every realization and find the age-optimal $k$. We also analyze a randomized selective encoding policy in which the remaining $n-k$ realizations are encoded and sent with a certain probability to further inform the receiver at the expense of longer codewords for the selected $k$ realizations.

preprint · 2020 · arXiv

Partial Updates: Losing Information for Freshness

We consider an information updating system where a source produces updates as requested by a transmitter. The transmitter further processes these updates in order to generate partial updates, which carry less information than the original updates, to be sent to a receiver. We study the problem of generating partial updates, and finding their corresponding real-valued codeword lengths, in order to minimize the average age experienced by the receiver, while maintaining a desired level of mutual information between the original and partial updates. This problem is NP-hard. We relax the problem and develop an alternating minimization based iterative algorithm that generates a pmf for the partial updates, and the corresponding age-optimal real-valued codeword length for each update. We observe that there is a tradeoff between the attained average age and the mutual information between the original and partial updates.

preprint · 2020 · arXiv

Selective Encoding Policies for Maximizing Information Freshness

An information source generates independent and identically distributed status update messages from an observed random phenomenon which takes $n$ distinct values based on a given pmf. These update packets are encoded at the transmitter node to be sent to a receiver node which wants to track the observed random variable with as little age as possible. The transmitter node implements a selective $k$ encoding policy such that rather than encoding all possible $n$ realizations, the transmitter node encodes the most probable $k$ realizations. We consider three different policies regarding the remaining $n-k$ less probable realizations: highest $k$ selective encoding, which disregards any realization that falls among the remaining $n-k$ values; randomized selective encoding, which encodes and sends the remaining $n-k$ realizations with a certain probability to further inform the receiver node at the expense of longer codewords for the selected $k$ realizations; and highest $k$ selective encoding with an empty symbol, which sends a designated empty symbol when one of the remaining $n-k$ realizations occurs. For all three encoding schemes, we find the average age and determine the age-optimal real codeword lengths, including the codeword length for the empty symbol in the case of the last scheme, such that the average age at the receiver node is minimized. Through numerical evaluations for arbitrary pmfs, we show that these selective encoding policies result in a lower average age than encoding every realization, and find the corresponding age-optimal $k$ values.

preprint · 2020 · arXiv

Who Should Google Scholar Update More Often?

We consider a resource-constrained updater, such as Google Scholar, which wishes to update the citation records of a group of researchers, who have different mean citation rates (and optionally, different importance coefficients), in such a way as to keep the overall citation index as up to date as possible. The updater is resource-constrained and cannot update citations of all researchers all the time. In particular, it is subject to a total update rate constraint that it needs to distribute among individual researchers. We use a metric similar to the age of information: the long-term average difference between the actual citation numbers and the citation numbers according to the latest updates. We show that, in order to minimize this difference metric, the updater should allocate its total update capacity to researchers in proportion to the square roots of their mean citation rates. That is, more prolific researchers should be updated more often, but there are diminishing returns due to the concavity of the square root function. More generally, our paper addresses the problem of optimal operation of a resource-constrained sampler that wishes to track multiple independent counting processes in a way that is as up to date as possible.
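The square-root allocation rule stated in this abstract can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: it ignores the optional importance coefficients, and the function name `sqrt_allocation` and its arguments are hypothetical.

```python
import math


def sqrt_allocation(mean_rates, total_rate):
    """Split total_rate across researchers in proportion to the square
    roots of their mean citation rates (importance coefficients ignored).
    """
    roots = [math.sqrt(r) for r in mean_rates]
    norm = sum(roots)
    if norm == 0.0:
        raise ValueError("at least one mean citation rate must be positive")
    # Each researcher receives a share proportional to sqrt(mean rate).
    return [total_rate * r / norm for r in roots]


if __name__ == "__main__":
    # Hypothetical mean citation rates for three researchers.
    print(sqrt_allocation([1.0, 4.0, 9.0], 12.0))
```

With mean rates 1, 4 and 9, the square roots are 1, 2 and 3, so a researcher cited nine times as often receives only three times the update rate. This is the diminishing-returns effect described in the abstract.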
