Source author record

Mahdi Haghifam

Mahdi Haghifam appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Information Theory · Machine Learning · math.IT

Catalog footprint

What is connected

6 works

3 topics

4 close collaborators


preprint · 2026 · arXiv

Adaptive Generate-Rank-Verify: Inference-Time Search with Costly Verification

Many inference-time language-model pipelines combine a cheap reward signal with an expensive verifier, such as exact answer checking in mathematical reasoning or hidden-test execution in code generation. We formalize this setting using a learning-theoretic lens as generative active search: a cost-sensitive first-positive search problem in which a policy adaptively samples candidates from an unknown distribution, observes cheap scores, and pays for verifier labels until it finds a positive example. For a fixed prompt, the generator and reward model induce two unknown objects: a distribution over reward scores and a score-conditioned success function. When these quantities are known, we characterize the distribution-aware optimal policy using a dynamic programming approach. In the realistic and practical setting where both the score distribution and success function are unknown, we propose ADAP, a shellwise adaptive generate-rank-verify algorithm that progressively increases the number of sampled responses and top-ranked verifications. Under the monotonicity assumption that higher reward scores are no less likely to pass verification, we show that ADAP achieves expected cost within a constant factor of the distribution-aware optimum. We complement this result with learning-theoretic lower bounds, based on a centered star number, showing that structural assumptions on the score-label relationship are necessary. Experiments on mathematical reasoning and competitive programming validate the predicted advantage over both fixed non-adaptive policies and difficulty-adaptive baselines.
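The generate-rank-verify loop described in this abstract can be illustrated with a toy sketch. This is not the authors' ADAP algorithm: the doubling sample schedule, the linear verification budget, the cost constants, and the names `generate`, `score`, and `verify` are all assumptions made here for illustration only.

```python
from typing import Callable, Optional, Tuple, TypeVar

T = TypeVar("T")


def generate_rank_verify(
    generate: Callable[[], T],      # hypothetical cheap candidate sampler
    score: Callable[[T], float],    # hypothetical cheap reward score
    verify: Callable[[T], bool],    # hypothetical expensive verifier
    max_rounds: int = 10,
    gen_cost: float = 1.0,          # assumed cost per sampled candidate
    verify_cost: float = 10.0,      # assumed cost per verifier call
) -> Tuple[Optional[T], float]:
    """Return the first verified candidate and the total cost paid.

    Each round samples more candidates and verifies more of the
    top-ranked ones than the previous round. The schedules below are
    illustrative assumptions, not the schedule used by ADAP.
    """
    total_cost = 0.0
    for k in range(max_rounds):
        n_samples = 2 ** k   # assumed doubling sample schedule
        n_verify = k + 1     # assumed growing verification budget

        candidates = [generate() for _ in range(n_samples)]
        total_cost += gen_cost * n_samples

        # Rank by the cheap score, highest first, and pay for verifier
        # labels only on the top-ranked candidates.
        ranked = sorted(candidates, key=score, reverse=True)
        for cand in ranked[:n_verify]:
            total_cost += verify_cost
            if verify(cand):
                return cand, total_cost

    return None, total_cost
```

With a sampler that yields 0.2, then 0.5 and 0.95, an identity score, and a verifier that accepts values above 0.9, the first round verifies and rejects 0.2, and the second round ranks 0.95 first and accepts it.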

preprint · 2022 · arXiv

Understanding Generalization via Leave-One-Out Conditional Mutual Information

We study the mutual information between (certain summaries of) the output of a learning algorithm and its $n$ training data, conditional on a supersample of $n+1$ i.i.d. data from which the training data is chosen at random without replacement. These leave-one-out variants of the conditional mutual information (CMI) of an algorithm (Steinke and Zakynthinou, 2020) are also seen to control the mean generalization error of learning algorithms with bounded loss functions. For learning algorithms achieving zero empirical risk under 0-1 loss (i.e., interpolating algorithms), we provide an explicit connection between leave-one-out CMI and the classical leave-one-out error estimate of the risk. Using this connection, we obtain upper and lower bounds on risk in terms of the (evaluated) leave-one-out CMI. When the limiting risk is constant or decays polynomially, the bounds converge to within a constant factor of two. As an application, we analyze the population risk of the one-inclusion graph algorithm, a general-purpose transductive learning algorithm for VC classes in the realizable setting. Using leave-one-out CMI, we match the optimal bound for learning VC classes in the realizable setting, answering an open challenge raised by Steinke and Zakynthinou (2020). Finally, in order to understand the role of leave-one-out CMI in studying generalization, we place leave-one-out CMI in a hierarchy of measures, with a novel unconditional mutual information at the root. For 0-1 loss and interpolating learning algorithms, this mutual information is observed to be precisely the risk.

preprint · 2021 · arXiv

Sequential Classification with Empirically Observed Statistics

Motivated by real-world machine learning applications, we consider a statistical classification task in a sequential setting where test samples arrive sequentially. In addition, the generating distributions are unknown and only a set of empirically sampled sequences is available to a decision maker. The decision maker is tasked with classifying a test sequence which is known to be generated according to one of the distributions. In particular, for the binary case, the decision maker wishes to perform the classification task with the minimum number of test samples, so, at each step, she declares that either hypothesis 1 is true, hypothesis 2 is true, or she requests an additional test sample. We propose a classifier and analyze the type-I and type-II error probabilities. We demonstrate the significant advantage of our sequential scheme compared to an existing non-sequential classifier proposed by Gutman. Finally, we extend our setup and results to the multi-class classification scenario and again demonstrate that the variable-length nature of the problem affords significant advantages as one can achieve the same set of exponents as Gutman's fixed-length setting but without having the rejection option.
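The three-way decision at each step (declare hypothesis 1, declare hypothesis 2, or request another test sample) can be sketched as a simple stopping loop. This is not the classifier proposed in the paper: the total variation distance, the `margin` threshold, the `min_samples` warm-up, and all function names are assumptions chosen here to keep the example small.

```python
from collections import Counter
from typing import Hashable, Iterable, Optional, Sequence, Tuple


def total_variation(p: Counter, q: Counter) -> float:
    """Total variation distance between two non-empty empirical count vectors."""
    n_p, n_q = sum(p.values()), sum(q.values())
    support = set(p) | set(q)
    return 0.5 * sum(abs(p[s] / n_p - q[s] / n_q) for s in support)


def sequential_classify(
    train1: Sequence[Hashable],      # training sequence from distribution 1
    train2: Sequence[Hashable],      # training sequence from distribution 2
    test_stream: Iterable[Hashable],
    margin: float = 0.2,             # assumed decision threshold
    min_samples: int = 5,            # assumed warm-up before deciding
) -> Tuple[Optional[int], int]:
    """Return (decision, number of test samples consumed).

    The decision is 1 or 2, or None if the stream ends before the
    stopping rule fires. Only empirical statistics are used; the true
    generating distributions are never accessed.
    """
    ref1, ref2 = Counter(train1), Counter(train2)
    seen: Counter = Counter()
    n = 0
    for x in test_stream:
        seen[x] += 1
        n += 1
        if n < min_samples:
            continue  # request another sample
        d1 = total_variation(seen, ref1)
        d2 = total_variation(seen, ref2)
        if d2 - d1 >= margin:
            return 1, n  # test samples look much closer to train1
        if d1 - d2 >= margin:
            return 2, n  # test samples look much closer to train2
        # otherwise: request another sample
    return None, n
```

For example, with `train1 = "aaab"` and `train2 = "abbb"`, a test stream of repeated `"a"` symbols is assigned to hypothesis 1 as soon as the warm-up of five samples is complete.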

preprint · 2020 · arXiv

Information-Theoretic Generalization Bounds for SGLD via Data-Dependent Estimates

In this work, we improve upon the stepwise analysis of noisy iterative learning algorithms initiated by Pensia, Jog, and Loh (2018) and recently extended by Bu, Zou, and Veeravalli (2019). Our main contributions are significantly improved mutual information bounds for Stochastic Gradient Langevin Dynamics via data-dependent estimates. Our approach is based on the variational characterization of mutual information and the use of data-dependent priors that forecast the mini-batch gradient based on a subset of the training samples. Our approach is broadly applicable within the information-theoretic framework of Russo and Zou (2015) and Xu and Raginsky (2017). Our bound can be tied to a measure of flatness of the empirical risk surface. As compared with other bounds that depend on the squared norms of gradients, empirical investigations show that the terms in our bounds are orders of magnitude smaller.

preprint · 2016 · arXiv

On Wireless Energy and Information Transfer in Relay Networks

This paper investigates the outage probability and the throughput of relay networks with wireless information and energy transfer where the relays harvest energy from the transmitted radio-frequency signal of the source. Considering different power consumption models, we derive the outage probability for both adaptive and non-adaptive power allocations at the relay. With a total energy consumption constraint at the source, we provide closed-form expressions for the optimal time sharing and power allocation between the source energy and information transfer signals as well as the optimal relay positioning such that the outage probability is minimized. Finally, we extend our analysis to multi-relay networks. We show that, with perfect channel state information (CSI) available at the relays and $N$ relays, the opportunistic relaying scheme achieves a diversity order of $\frac{N+1}{2}$. We also analyze opportunistic relaying with partial CSI, where either the source-relay or the relay-destination CSI is provided at its corresponding transmit terminal, and prove that relay selection based on the source-relay CSI outperforms relay selection based on the relay-destination CSI in terms of outage probability. The analytical and simulation results demonstrate the efficiency of wireless energy and information transfer systems in different conditions.

preprint · 2016 · arXiv

Wireless-powered relaying with finite block-length codes

This paper studies the outage probability and the throughput of amplify-and-forward relay networks with wireless information and energy transfer. We use some recent results on finite block-length codes to analyze the system performance in the cases with short codewords. Specifically, the time switching relaying and the power splitting relaying protocols are considered for energy and information transfer. We derive tight approximations for the outage probability/throughput. Then, we analyze the outage probability in asymptotically high signal-to-noise ratios. Finally, we use numerical results to confirm the accuracy of our analysis and to evaluate the system performance in different scenarios. Our results indicate that, in delay-constrained scenarios, the codeword length affects the outage probability/throughput of the joint energy and information transfer systems considerably.
