Researcher profile

Sina Aghaei

Sina Aghaei contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 13 - Unverified
Verification L1
Unclaimed author
2 works
0 followers
3 topics
4 close collaborators

Published work

2 published items

Preprint · 2026 · arXiv

Large Language Models for Sequential Decision-Making: Improving In-Context Learning via Supervised Fine-Tuning

Large language models (LLMs) have shown remarkable in-context learning (ICL) capabilities, yet their potential for sequential decision-making remains underexplored. In this paper, we study the ICL capabilities of LLMs in sequential decision-making settings, including Markov Decision Processes (MDPs), Partially Observable MDPs (POMDPs), and Ambiguous POMDPs (APOMDPs). We fine-tune pretrained LLMs to perform few-shot decision-making directly from offline, oracle-labeled trajectories. Our framework enables flexible imitation of policies through supervised fine-tuning (SFT). Theoretically, we focus on linear MDPs and interpret a fine-tuned attention layer as implicitly estimating optimal Q-functions from in-context data. Building on this interpretation, we derive an end-to-end suboptimality bound for the induced policy that separates the in-context estimation error from the training-length bias. Empirically, across synthetic MDP, POMDP, and APOMDP settings, we find that fine-tuned LLMs achieve substantially smaller optimality gaps than in-context-only and random baselines, with especially large gains in longer-horizon, partially observed, and model-ambiguous environments. Together, these results show that supervised fine-tuning provides an effective route to endowing pretrained LLMs with sequential decision-making capabilities from offline data, which is an important advantage in domains such as healthcare where offline data are abundant.
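The abstract's theoretical reading, that a fine-tuned attention layer in a linear MDP can be interpreted as estimating optimal Q-functions from in-context data, can be illustrated with a much simpler stand-in. The Python sketch below is not the authors' method and involves no LLM or attention layer. It assumes Q*(s, a) = phi(s, a) · w for an unknown weight vector w, assumes the context supplies (feature, Q-target) pairs, fits w by ridge regression, acts greedily, and reports the optimality gap of the chosen action. All names (estimate_q_weights, greedy_action, demo) and the toy data are hypothetical.

```python
# Hypothetical sketch, not the authors' method: in-context Q-estimation in a
# linear MDP, assuming Q*(s, a) = phi(s, a) . w for an unknown w and a context
# of (feature, Q-target) pairs. No LLM or attention layer is involved.
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def estimate_q_weights(context, ridge=1e-6):
    """Ridge regression: solve (sum phi phi^T + ridge * I) w = sum phi * y."""
    d = len(context[0][0])
    A = [[ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
    b = [0.0] * d
    for phi, y in context:
        for i in range(d):
            b[i] += phi[i] * y
            for j in range(d):
                A[i][j] += phi[i] * phi[j]
    return solve(A, b)


def q_value(phi, w):
    """Linear Q-value phi . w."""
    return sum(p * wi for p, wi in zip(phi, w))


def greedy_action(features_by_action, w):
    """Return the action with the largest estimated Q-value."""
    return max(features_by_action, key=lambda a: q_value(features_by_action[a], w))


def demo(n_context=20, seed=0):
    """Fit w from a noiseless toy context, act greedily, report the optimality gap."""
    rng = random.Random(seed)
    true_w = [1.0, -2.0, 0.5]  # made-up ground truth for the toy example
    context = []
    for _ in range(n_context):
        phi = [rng.uniform(-1.0, 1.0) for _ in range(len(true_w))]
        context.append((phi, q_value(phi, true_w)))
    w_hat = estimate_q_weights(context)

    # One decision point with three actions and made-up features.
    actions = {"a0": [1.0, 0.0, 0.0], "a1": [0.0, 1.0, 0.0], "a2": [0.0, 0.0, 1.0]}
    chosen = greedy_action(actions, w_hat)
    best = max(q_value(f, true_w) for f in actions.values())
    gap = best - q_value(actions[chosen], true_w)  # optimality gap of the choice
    return w_hat, chosen, gap


if __name__ == "__main__":
    w_hat, chosen, gap = demo()
    print([round(v, 3) for v in w_hat], chosen, gap)
```

With noiseless targets the estimate recovers w closely, so the greedy action has zero gap; noisy or too-short contexts would widen the gap, which is the kind of in-context estimation error the abstract's bound separates out.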

Preprint · 2020 · arXiv

Learning Optimal Classification Trees: Strong Max-Flow Formulations

We consider the problem of learning optimal binary classification trees. Literature on the topic has burgeoned in recent years, motivated both by the empirical suboptimality of heuristic approaches and the tremendous improvements in mixed-integer programming (MIP) technology. Yet, existing approaches from the literature do not leverage the power of MIP to its full extent. Indeed, they rely on weak formulations, resulting in slow convergence and large optimality gaps. To fill this gap in the literature, we propose a flow-based MIP formulation for optimal binary classification trees that has a stronger linear programming relaxation. Our formulation presents an attractive decomposable structure. We exploit this structure and max-flow/min-cut duality to derive a Benders' decomposition method, which scales to larger instances. We conduct extensive computational experiments on standard benchmark datasets on which we show that our proposed approaches are 50 times faster than state-of-the-art MIP-based techniques and improve out-of-sample performance by up to 13.8%.
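A hedged reading of the flow-based idea: for a fixed tree, each datapoint can be viewed as one unit of flow that enters at the root, follows its branching decisions, and reaches a sink only at a node whose predicted class matches its label, so the total flow equals the number of correctly classified points. The Python sketch below assumes that reading and only evaluates this count for a given tree. It is not the MIP formulation, does not search over trees, and omits the Benders' decomposition. The Node class, the threshold convention and the toy data are hypothetical.

```python
# Hypothetical sketch: the flow-counting objective for a FIXED binary tree.
# Each datapoint sends one unit of flow into the root, follows the left or
# right arc at each branching node, and reaches the sink only if the leaf's
# prediction matches its label. This evaluates a given tree; it optimizes nothing.
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    feature: Optional[int] = None     # branching feature index (None for a leaf)
    threshold: float = 0.0            # go left if x[feature] <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    prediction: Optional[int] = None  # class predicted at a leaf


def flow_of_point(root: Node, x: Sequence[float], y: int) -> int:
    """Return 1 if this point's unit of flow reaches the sink, else 0."""
    node = root
    while node.feature is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return 1 if node.prediction == y else 0


def total_flow(root: Node, X, Y) -> int:
    """Objective value: total flow reaching the sink over all datapoints."""
    return sum(flow_of_point(root, x, y) for x, y in zip(X, Y))


if __name__ == "__main__":
    # Depth-1 tree on binary features: split on feature 0.
    tree = Node(feature=0, threshold=0.5,
                left=Node(prediction=0), right=Node(prediction=1))
    X = [[0, 1], [1, 0], [1, 1], [0, 0]]
    Y = [0, 1, 0, 0]
    print(total_flow(tree, X, Y))
```

Here three of the four points are routed to a leaf whose prediction matches their label, so the total flow is 3.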