Source author record

Alexander Moreno

Alexander Moreno appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Machine Learning, cond-mat.str-el, Artificial Intelligence, Computation and Language, Methodology

Catalog footprint

What is connected

6 works

5 topics

4 close collaborators

preprint, 2026, arXiv

GQA-μP: The maximal parameterization update for grouped query attention

Hyperparameter transfer across model architectures dramatically reduces the amount of compute necessary for tuning large language models (LLMs). The maximal update parameterization (μP) ensures transfer through principled mathematical analysis but can be challenging to derive for new model architectures. Building on the spectral feature-learning view of Yang et al. (2023a), we make two advances. First, we promote spectral norm conditions on the weights from a heuristic to the definition of feature learning, and as a consequence arrive at the Complete-P depth and weight-decay scalings without recourse to lazy-learning. Second, we consider a modified spectral norm that preserves the valid scaling law of network weights when weight matrices are not full rank. This enables (to our knowledge, the first) derivation of μP scalings for grouped-query attention (GQA). We demonstrate the efficacy of our theoretical derivations by showing learning rate transfer across the GQA repetition hyperparameter as well as experiments regarding transfer over weight decay.

preprint, 2021, arXiv

Transformers for prompt-level EMA non-response prediction

Ecological Momentary Assessments (EMAs) are an important psychological data source for measuring current cognitive states, affect, behavior, and environmental factors from participants in mobile health (mHealth) studies and treatment programs. Non-response, in which participants fail to respond to EMA prompts, is an endemic problem. The ability to accurately predict non-response could be utilized to improve EMA delivery and develop compliance interventions. Prior work has explored classical machine learning models for predicting non-response. However, as increasingly large EMA datasets become available, there is the potential to leverage deep learning models that have been effective in other fields. Recently, transformer models have shown state-of-the-art performance in NLP and other domains. This work is the first to explore the use of transformers for EMA data analysis. We address three key questions in applying transformers to EMA data: (1) input representation, (2) encoding of temporal information, and (3) the utility of pre-training for improving downstream prediction performance. The transformer model achieves a non-response prediction AUC of 0.77 and is significantly better than classical ML and LSTM-based deep learning models. We will make a predictive model trained on a corpus of 40K EMA samples freely available to the research community, in order to facilitate the development of future transformer-based EMA analysis works.

preprint, 2020, arXiv

A Robust Functional EM Algorithm for Incomplete Panel Count Data

Panel count data describes aggregated counts of recurrent events observed at discrete time points. To understand dynamics of health behaviors, the field of quantitative behavioral research has evolved to increasingly rely upon panel count data collected via multiple self-reports, for example, about frequencies of smoking using in-the-moment surveys on mobile devices. However, missing reports are common and present a major barrier to downstream statistical learning. As a first step, under a missing completely at random (MCAR) assumption, we propose a simple yet widely applicable functional EM algorithm to estimate the counting process mean function, which is of central interest to behavioral scientists. The proposed approach wraps several popular panel count inference methods, seamlessly deals with incomplete counts and is robust to misspecification of the Poisson process assumption. Theoretical analysis of the proposed algorithm provides finite-sample guarantees by expanding parametric EM theory to our general non-parametric setting. We illustrate the utility of the proposed algorithm through numerical experiments and an analysis of smoking cessation data. We also discuss useful extensions to address deviations from the MCAR assumption and covariate effects.

preprint, 2016, arXiv

Automatic Variational ABC

Approximate Bayesian Computation (ABC) is a framework for performing likelihood-free posterior inference for simulation models. Stochastic variational inference (SVI) is an appealing alternative to the inefficient sampling approaches commonly used in ABC. However, SVI is highly sensitive to the variance of the gradient estimators, and this problem is exacerbated by approximating the likelihood. We draw upon recent advances in variance reduction for SVI and likelihood-free inference using deterministic simulations to produce low-variance gradient estimators of the variational lower bound. By then exploiting automatic differentiation libraries we can avoid nearly all model-specific derivations. We demonstrate performance on three problems and compare to existing SVI algorithms. Our results demonstrate the correctness and efficiency of our algorithm.

preprint, 2013, arXiv

Transport through two interacting resonant levels connected by a Fermi sea

We study transport at finite bias, i.e. beyond the linear regime, through two interacting resonant levels connected by a Fermi sea, by means of time-dependent density matrix renormalization group. We first consider methodological issues, like the protocol that leads to a current-carrying state and the characterization of the steady state. At finite sizes both the current and the occupations of the interacting levels oscillate as a function of time. We determine the amplitude and period of such oscillations as a function of bias. We find that the occupations on the two dots oscillate with a relative phase which depends on the distance between the impurities and on the Fermi momentum of the Fermi sea, as expected for RKKY interactions. Also the approximant to the steady-state current displays oscillations as a function of the distance between the impurities. Such behavior can be explained by resonances in the free case. We then discuss the effect of interactions on this behavior. We conclude by showing the effect of the bias on the current, making connection with the one-impurity case.

preprint, 2011, arXiv

Ground-State Phase Diagram of the 1D t-J model

We examine the ground-state phase diagram of the t-J model in one dimension by means of the Density Matrix Renormalization Group. This model is characterized by a rich phase diagram as a function of the exchange interaction J and the density n, displaying Luttinger-liquid (LL) behavior of both repulsive and attractive (i.e. superconducting) nature, a spin-gap phase, and phase separation. The phase boundaries separating the repulsive from the attractive LL phase as J is increased, and also the boundaries of the spin-gap region at low densities, and phase separation at even larger J, are determined on the basis of correlation functions and energy gaps. In particular, we shed light on a contradiction between variational and renormalization-group (RG) results about the extent of the spin-gap phase, which we find to be larger than the variational estimate but smaller than the RG one. Furthermore, we show that the spin gap can reach a sizable value (~ 0.1 t) at low enough filling, such that preformed pairs should be observable at temperatures below these energy scales. No evidence for a phase with clustering of more than two particles is found on approaching phase separation.