Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
14 works
0 followers
17 topics
4 close collaborators

Published work

14 published items

preprint, 2026, arXiv

Data-Driven Dynamic Factor Modeling via Manifold Learning

We introduce a data-driven dynamic factor framework for modeling the joint evolution of high-dimensional covariates and responses without parametric assumptions. Standard factor models applied to covariates alone often lose explanatory power for responses. Our approach uses anisotropic diffusion maps, a manifold learning technique, to learn low-dimensional embeddings that preserve both the intrinsic geometry of the covariates and the predictive relationship with responses. For time series arising from Langevin diffusions in Euclidean space, we show that the associated graph Laplacian converges to the generator of the underlying diffusion. We further establish a bound on the approximation error between the diffusion map coordinates and linear diffusion processes, and we show that ergodic averages in the embedding space converge under standard spectral assumptions. These results justify using Kalman filtering in diffusion-map coordinates for predicting joint covariate-response evolution. We apply this methodology to equity-portfolio stress testing using macroeconomic and financial variables from Federal Reserve supervisory scenarios, achieving mean absolute error improvements of up to 55% over classical scenario analysis and 39% over principal component analysis benchmarks.
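
As a rough illustration of the embedding step, here is a minimal sketch of a standard density-normalized diffusion map (the Coifman-Lafon construction with normalization exponent `alpha`). It is not the paper's anisotropic, response-aware construction: the function name `diffusion_map`, the Gaussian kernel, the bandwidth `epsilon`, and the circle test data are all assumptions made for this example.

```python
import numpy as np

def diffusion_map(X, epsilon, n_coords=2, alpha=1.0):
    """Return the leading nontrivial eigenvalues and diffusion coordinates.

    X is an (n_samples, n_features) array; epsilon is the kernel bandwidth.
    All names and defaults here are illustrative assumptions.
    """
    # Pairwise squared Euclidean distances (clipped to avoid tiny negatives).
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)

    # Gaussian kernel, then alpha-normalization to reduce the influence
    # of the sampling density.
    K = np.exp(-d2 / epsilon)
    q = K.sum(axis=1)
    K = K / np.outer(q**alpha, q**alpha)

    # The Markov matrix P = D^{-1} K is similar to the symmetric matrix
    # S = D^{-1/2} K D^{-1/2}, which can be diagonalized with eigh.
    d = K.sum(axis=1)
    S = K / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(S)

    # eigh returns ascending eigenvalues; sort in descending order.
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]

    # Right eigenvectors of P; drop the trivial constant one (eigenvalue 1).
    psi = vecs / np.sqrt(d)[:, None]
    lam = vals[1:n_coords + 1]
    return lam, psi[:, 1:n_coords + 1] * lam

# Usage on hypothetical data: evenly spaced points on the unit circle.
theta = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
X = np.column_stack([np.cos(theta), np.sin(theta)])
lam, coords = diffusion_map(X, epsilon=0.5)
```

In a pipeline like the one the abstract describes, coordinates of this kind would be the state on which a linear filter is run; that step is not shown here.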

preprint, 2026, arXiv

PREFER: Personalized Review Summarization with Online Preference Learning

Product reviews significantly influence purchasing decisions on e-commerce platforms. However, the sheer volume of reviews can overwhelm users, obscuring the information most relevant to their specific needs. Current e-commerce summarization systems typically produce generic, static summaries that fail to account for the fact that (i) different users care about different product characteristics, and (ii) these preferences may evolve with interactions. To address the challenge of unknown latent preferences, we propose an online learning framework that generates personalized summaries for each user. Our system iteratively refines its understanding of user preferences by incorporating feedback directly from the generated summaries over time. We provide a case study using the Amazon Reviews'23 dataset, showing in controlled simulations that online preference learning improves alignment with target user interests while maintaining summary quality.

preprint, 2026, arXiv

SmartEval: A Benchmark for Evaluating LLM-Generated Smart Contracts from Natural Language Specifications

We introduce SmartEval, a benchmark for systematically evaluating the quality of Solidity smart contracts generated by large language models (LLMs) from natural language specifications. SmartEval provides (i) a corpus of 9,000 generated contracts paired with expert-written ground-truth implementations drawn from the FSMSCG dataset, (ii) a five-dimensional evaluation rubric covering functional completeness, variable fidelity, state-machine correctness, business-logic fidelity, and code quality, and (iii) a reproducible generation-and-evaluation pipeline. To validate the benchmark's reliability, we conduct three independent empirical studies: a five-condition ablation study (N=300 per condition) isolating the contribution of each pipeline component; a human expert evaluation by three Columbia University PhD researchers confirming that automated scores align with expert judgment to within 0.34 points; and an external security analysis via the Slither static analyzer confirming 79.4% agreement between the LLM auditor and a non-LLM rule-based tool. Systematic analysis of the 9,000 generated contracts reveals characteristic failure modes (logic omissions at 35.3%, state transition errors at 23.4%, and complexity-driven degradation) and quantifies a +8.29 composite-score advantage of generated contracts over ground-truth implementations, attributable to LLMs' literal specification-following behavior. SmartEval establishes a reproducible, validated foundation for empirical research on LLM smart contract synthesis quality, with all data, evaluation code, and generated contracts publicly released.

preprint, 2023, arXiv

Causal Inference (C-inf) -- asymmetric scenario of typical phase transitions

In this paper, we revisit and further explore the mathematically rigorous connection between Causal inference (C-inf) and Low-rank recovery (LRR) established in [10]. Leveraging the Random duality - Free probability theory (RDT-FPT) connection, we obtain the exact explicit typical C-inf asymmetric phase transitions (PT). We uncover a doubling low-rankness phenomenon: exactly twice the low rankness is allowed in asymmetric scenarios compared to the symmetric worst case ones considered in [10]. The final PT mathematical expressions are as elegant as those obtained in [10], and highlight direct relations between the targeted C-inf matrix low rankness and the time of treatment. Our results have strong implications for applications, where C-inf matrices are not necessarily symmetric.

preprint, 2023, arXiv

Causal Inference (C-inf) -- closed form worst case typical phase transitions

In this paper we establish a mathematically rigorous connection between Causal inference (C-inf) and low-rank recovery (LRR). Using Random Duality Theory (RDT) concepts developed in [46,48,50] and novel mathematical strategies related to free probability theory, we obtain the exact explicit typical (and achievable) worst case phase transitions (PT). These PT precisely separate scenarios where causal inference via LRR is possible from those where it is not. We supplement our mathematical analysis with numerical experiments that confirm the theoretical predictions of PT phenomena, and further show that the two closely match for fairly small sample sizes. We obtain simple closed form representations for the resulting PTs, which highlight direct relations between the low rankness of the target C-inf matrix and the time of the treatment. Hence, our results can be used to determine the range of C-inf's typical applicability.

preprint, 2023, arXiv

Exact Error in Matrix Completion: Approximately Low-Rank Structures and Missing Blocks

We study the completion of approximately low rank matrices with entries missing not at random (MNAR). In the context of typical large-dimensional statistical settings, we establish a framework for the performance analysis of the nuclear norm minimization ($\ell_1^*$) algorithm. Our framework produces exact estimates of the worst-case residual root mean squared error (RMSE) and the associated phase transitions (PT), with both exhibiting remarkably simple characterizations. Our results make it possible to quantify precisely the impact of key system parameters, including data heterogeneity, size of the missing block, and deviation from ideal low rankness, on the accuracy of $\ell_1^*$-based matrix completion. To validate our theoretical worst-case RMSE estimates, we conduct numerical simulations, demonstrating close agreement with their numerical counterparts.
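
To make the missing-block setting concrete, here is a small sketch of the idealized case only. It does not implement nuclear norm minimization and does not reproduce the paper's error analysis. It uses a standard linear-algebra identity instead: if a matrix is exactly low rank and its observed top-left block has the same rank as the whole matrix, the missing bottom-right block equals `M21 @ pinv(M11) @ M12`. The function name, the tolerance `rank_tol`, and the synthetic rank-2 data are assumptions for this example.

```python
import numpy as np

def complete_missing_block(M11, M12, M21, rank_tol=1e-10):
    """Recover the missing block M22 of [[M11, M12], [M21, M22]].

    Valid only under the assumption that the full matrix is exactly low rank
    and rank(M11) equals the rank of the full matrix.
    """
    # rank_tol discards singular values of M11 that are pure rounding noise.
    return M21 @ np.linalg.pinv(M11, rcond=rank_tol) @ M12

# Synthetic exactly rank-2 matrix with a 5 x 5 block treated as unobserved.
rng = np.random.default_rng(0)
n, r, b = 30, 2, 5
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))

M11, M12 = M[:-b, :-b], M[:-b, -b:]
M21, M22 = M[-b:, :-b], M[-b:, -b:]

M22_hat = complete_missing_block(M11, M12, M21)
max_abs_error = np.max(np.abs(M22_hat - M22))
```

For approximately low-rank data, which is the case the abstract studies, this identity no longer holds exactly, and that is where an estimator such as nuclear norm minimization and an analysis of its residual error become relevant.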

preprint, 2022, arXiv

Large Sample Mean-Field Stochastic Optimization

We study a class of sampled stochastic optimization problems, where the underlying state process has diffusive dynamics of the mean-field type. We establish the existence of optimal relaxed controls when the sample set has finite size. The core of our paper is to prove, via $\Gamma$-convergence, that the minimizer of the finite sample relaxed problem converges to that of the limiting optimization problem. We connect the limit of the sampled objective functional to the unique solution, in the trajectory sense, of a nonlinear Fokker-Planck-Kolmogorov (FPK) equation in a random environment. We highlight the connection between the minimizers of our optimization problems and the optimal training weights of a deep residual neural network.

preprint, 2022, arXiv

Power Forward Performance in Semimartingale Markets with Stochastic Integrated Factors

We study the forward investment performance process (FIPP) in an incomplete semimartingale market model with closed and convex portfolio constraints, when the investor's risk preferences are of the power form. We provide necessary and sufficient conditions for the construction of such a performance process, and show that it can be recovered as the unique solution of an infinite horizon quadratic backward stochastic differential equation (BSDE) with a nonmonotone driver. In an integrated stochastic factor model, we relate the factor representation of the BSDE solution to the smooth solution of an ill-posed partial integro-differential Hamilton-Jacobi-Bellman (HJB) equation. We provide an explicit construction of the BSDE solution for the class of time-monotone FIPPs, generalizing existing results from Brownian to semimartingale market models.

preprint, 2022, arXiv

Power Forward Performance in Semimartingale Markets with Stochastic Integrated Factors

We study the forward investment performance process (FIPP) in an incomplete semimartingale market model with closed and convex portfolio constraints, when the investor's risk preferences are of the power form. We provide necessary and sufficient conditions for the existence of such an FIPP. In a semimartingale factor model, we show that the FIPP can be recovered as a triplet of processes which admit an integral representation with respect to semimartingales. Using an integrated stochastic factor model, we relate the factor representation of the triplet of processes to the smooth solution of an ill-posed partial integro-differential Hamilton-Jacobi-Bellman (HJB) equation. We develop explicit constructions for the class of time-monotone FIPPs, generalizing existing results from Brownian to semimartingale market models.

preprint, 2022, arXiv

The Evolution of Blockchain: from Lit to Dark

Transactions submitted through the blockchain peer-to-peer (P2P) network may leak out exploitable information. We study the economic incentives behind the adoption of blockchain dark venues, where users' transactions are observable only by miners on these venues. We show that miners may not fully adopt dark venues to preserve rents extracted from arbitrageurs, hence creating execution risk for users. The dark venue neither eliminates frontrunning risk nor reduces transaction costs. It strictly increases the payoff of miners, weakly increases the payoff of users, and weakly reduces arbitrageurs' profits. We provide empirical support for our main implications, and show that they are economically significant. A 1% increase in the probability of being frontrun raises users' adoption rate of the dark venue by 0.6%. Arbitrageurs' cost-to-revenue ratio increases by a third with a dark venue.

preprint, 2021, arXiv

Market Making with Stochastic Liquidity Demand: Simultaneous Order Arrival and Price Change Forecasts

We provide an explicit characterization of the optimal market making strategy in a discrete-time Limit Order Book (LOB). In our model, the number of filled orders during each period depends linearly on the distance between the fundamental price and the market maker's limit order quotes, with random slope and intercept coefficients. The high-frequency market maker (HFM) incurs an end-of-the-day liquidation cost resulting from linear price impact. The optimal placement strategy incorporates in a novel and parsimonious way forecasts about future changes in the asset's fundamental price. We show that the randomness in the demand slope reduces the inventory management motive, and that a positive correlation between demand slope and investors' reservation prices leads to wider spreads. Our analysis reveals that the simultaneous arrival of buy and sell market orders (i) reduces the shadow cost of inventory, (ii) leads the HFM to reduce price pressures to execute larger flows, and (iii) introduces patterns of nonlinearity in the intraday dynamics of bid and ask spreads. Our empirical study shows that the market making strategy outperforms those which ignore randomness in demand, simultaneous arrival of buy and sell market orders, and local drift in the fundamental price.

preprint, 2020, arXiv

Robust XVA

We introduce an arbitrage-free framework for robust valuation adjustments. An investor trades a credit default swap portfolio with a risky counterparty, and hedges credit risk by taking a position in defaultable bonds. The investor does not know the return rate of her counterparty's bond, but is confident that it lies within an uncertainty interval. We derive both upper and lower bounds for the XVA process of the portfolio, and show that these bounds may be recovered as solutions of nonlinear ordinary differential equations. The presence of collateralization and closeout payoffs leads to important differences with respect to classical credit risk valuation. The value of the super-replicating portfolio cannot be directly obtained by plugging one of the extremes of the uncertainty interval in the valuation equation, but rather depends on the relation between the XVA replicating portfolio and the close-out value throughout the life of the transaction. Our comparative statics analysis indicates that credit contagion has a nonlinear effect on the replication strategies and on the XVA.

preprint, 2019, arXiv

Robo-advising: Learning Investors' Risk Preferences via Portfolio Choices

We introduce a reinforcement learning framework for retail robo-advising. The robo-advisor does not know the investor's risk preference, but learns it over time by observing her portfolio choices in different market environments. We develop an exploration-exploitation algorithm which trades off costly solicitations of portfolio choices by the investor with autonomous trading decisions based on stale estimates of the investor's risk aversion. We show that the algorithm's value function converges to the optimal value function of an omniscient robo-advisor over a number of periods that is polynomial in the state and action space. By correcting for the investor's mistakes, the robo-advisor may outperform a stand-alone investor, regardless of the investor's opportunity cost for making portfolio decisions.

preprint, 2016, arXiv

Arbitrage-Free XVA

We develop a framework for computing the total valuation adjustment (XVA) of a European claim accounting for funding costs, counterparty credit risk, and collateralization. Based on no-arbitrage arguments, we derive backward stochastic differential equations (BSDEs) associated with the replicating portfolios of long and short positions in the claim. This leads to the definition of buyer's and seller's XVA, which in turn identify a no-arbitrage interval. In the case that borrowing and lending rates coincide, we provide a fully explicit expression for the unique XVA, expressed as a percentage of the price of the traded claim, and for the corresponding replication strategies. In the general case of asymmetric funding, repo and collateral rates, we study the semilinear partial differential equations (PDEs) characterizing buyer's and seller's XVA and show the existence of a unique classical solution to them. To illustrate our results, we conduct a numerical study demonstrating how funding costs, repo rates, and counterparty risk contribute to determining the total valuation adjustment.