Source author record

Annie Qu

Annie Qu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Methodology, Machine Learning, math.ST, Statistics Theory, Artificial Intelligence, Robotics

Catalog footprint

What is connected

7 works

6 topics

4 close collaborators


Preprint · 2026 · arXiv

TOPPO: Rethinking PPO for Multi-Task Reinforcement Learning with Critic Balancing

Soft Actor-Critic (SAC) and its variants dominate Multi-Task Reinforcement Learning (MTRL) due to their off-policy sample efficiency, while on-policy methods such as Proximal Policy Optimization (PPO) remain underexplored. We diagnose that PPO in MTRL suffers from a previously overlooked issue: critic-side gradient ill-conditioning, which may cause tail tasks to stall while easy tasks dominate the value function's updates. To address this, we propose TOPPO (Tail-Optimized PPO), a reformulation of PPO via Critic Balancing, a set of modules that improve gradient conditioning and balance learning dynamics across tasks. Unlike prior approaches that rely on modular architectures or large models, TOPPO targets the optimization bottleneck within PPO itself. Empirically, TOPPO achieves stronger mean and tail-task performance than published SAC-family and ARS-family baselines while using substantially fewer parameters and environment steps on the Meta-World+ benchmark. Notably, TOPPO matches or surpasses strong SAC baselines early in training and maintains superior performance at full budget. Ablations confirm the effectiveness of each module in TOPPO and provide insights into their interactions. Our results demonstrate that, with proper optimization, on-policy methods can rival or exceed off-policy approaches in MTRL, challenging the prevailing reliance on SAC and highlighting critic-side gradient conditioning as the central bottleneck.
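For readers less familiar with PPO, the sketch below shows the standard clipped surrogate objective that TOPPO starts from, plus a per-task critic-error breakdown of the kind one might monitor when looking for the imbalance the abstract describes. The abstract does not specify the Critic Balancing modules, so they are not reproduced here; `per_task_value_loss` is a hypothetical diagnostic helper, not part of TOPPO.

```python
import numpy as np


def ppo_clipped_surrogate(ratio, advantage, clip_eps=0.2):
    """Standard PPO clipped surrogate objective (to be maximized).

    ratio: pi_new(a|s) / pi_old(a|s) for each sampled action.
    advantage: advantage estimate for each sample.
    """
    ratio = np.asarray(ratio, dtype=float)
    advantage = np.asarray(advantage, dtype=float)
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
    # Pessimistic bound: elementwise minimum, then average over samples.
    return float(np.mean(np.minimum(unclipped, clipped)))


def per_task_value_loss(values, returns, task_ids):
    """Hypothetical diagnostic: mean squared critic error for each task.

    In MTRL the critic is typically shared, so a task with a much larger
    error scale can dominate the summed value-loss gradient.
    """
    values = np.asarray(values, dtype=float)
    returns = np.asarray(returns, dtype=float)
    task_ids = np.asarray(task_ids)
    return {
        int(t): float(np.mean((values[task_ids == t] - returns[task_ids == t]) ** 2))
        for t in np.unique(task_ids)
    }
```

For example, `ppo_clipped_surrogate([1.5], [1.0])` is clipped to 1.2, because the ratio exceeds `1 + clip_eps`.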

Preprint · 2022 · arXiv

Semi-standard partial covariance variable selection when irrepresentable conditions fail

Traditional variable selection methods can fail to be sign consistent when irrepresentable conditions are violated. This is especially critical in high-dimensional settings where the number of predictors exceeds the sample size. In this paper, we propose a new semi-standard partial covariance (SPAC) approach which is capable of reducing correlation effects from other covariates while fully capturing the magnitude of coefficients. The proposed SPAC is effective in choosing covariates which have direct effects on the response variable, while eliminating the predictors which are not directly associated with the response but are highly correlated with the relevant predictors. We show that the proposed SPAC method with the Lasso penalty or the smoothly clipped absolute deviation (SCAD) penalty possesses strong sign consistency in high-dimensional settings. Numerical studies and a post-traumatic stress disorder data application also confirm that the proposed method outperforms the existing Lasso, adaptive Lasso, SCAD, Peter-Clark-simple algorithm, and factor-adjusted regularized model selection methods when the irrepresentable conditions fail.
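To make the failure mode concrete, the sketch below evaluates the Lasso irrepresentable condition from the general literature (Zhao and Yu): with true support S, it requires every entry of Σ_{S^c,S} Σ_{S,S}^{-1} sign(β_S) to be below 1 in absolute value. This is background illustration only; SPAC itself is not implemented here, and the covariance matrices in the usage note are made-up examples.

```python
import numpy as np


def irrepresentable_margin(sigma, support, beta_sign):
    """Return max over j not in S of |Sigma_{S^c,S} Sigma_{S,S}^{-1} sign(beta_S)|_j.

    The Lasso irrepresentable condition requires this value to be < 1
    (the strong version asks for <= 1 - eta with eta > 0).
    """
    sigma = np.asarray(sigma, dtype=float)
    support = list(support)
    off_support = [j for j in range(sigma.shape[0]) if j not in support]
    s_ss = sigma[np.ix_(support, support)]
    s_cs = sigma[np.ix_(off_support, support)]
    v = s_cs @ np.linalg.solve(s_ss, np.asarray(beta_sign, dtype=float))
    return float(np.max(np.abs(v)))


def make_sigma(rho):
    # X0 and X1 are relevant and uncorrelated; X2 is irrelevant but
    # correlated (rho) with both relevant predictors.
    return [[1.0, 0.0, rho], [0.0, 1.0, rho], [rho, rho, 1.0]]
```

With `rho = 0.7`, `irrepresentable_margin(make_sigma(0.7), [0, 1], [1, 1])` is 1.4, so the condition fails even though the irrelevant predictor has no direct effect; with `rho = 0.3` it is 0.6 and the condition holds. This is exactly the "highly correlated but not directly associated" scenario the abstract targets.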

Preprint · 2020 · arXiv

Dynamic Tensor Recommender Systems

Recommender systems have been extensively used by the entertainment industry, business marketing and the biomedical industry. In addition to providing preference-based recommendations as an unsupervised learning methodology, they have also proven useful in sales forecasting, product introduction and other production-related businesses. Since some consumers and companies need recommendations or predictions for future budget, labor and supply chain coordination, dynamic recommender systems for precise forecasting have become essential. In this article, we propose a new recommendation method, namely the dynamic tensor recommender system (DTRS), which aims particularly at forecasting future recommendations. The proposed method utilizes a tensor-valued function of time to integrate time and contextual information, and creates a time-varying coefficient model for temporal tensor factorization through a polynomial spline approximation. Major advantages of the proposed method include competitive future recommendation predictions and effective prediction interval estimations. In theory, we establish the convergence rate of the proposed tensor factorization and asymptotic normality of the spline coefficient estimator. The proposed method is applied to simulations and IRI marketing data. Numerical studies demonstrate that the proposed method outperforms existing methods in terms of future time forecasting.
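The idea of a time-varying factorization through a polynomial spline can be sketched as follows. This is a hypothetical simplification, not the DTRS estimator: it assumes a rank-R, CP-style model in which the rating of user i for item j at time t is Σ_r U[i,r] V[j,r] w_r(t), with each time weight w_r(t) expanded in a truncated power spline basis with coefficients `theta`. All names are illustrative, and fitting the factors and spline coefficients is not shown.

```python
import numpy as np


def truncated_power_basis(t, knots, degree=1):
    """Polynomial spline basis: 1, t, ..., t^degree, (t - k)_+^degree per knot."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    cols = [t ** d for d in range(degree + 1)]
    cols += [np.maximum(t - k, 0.0) ** degree for k in knots]
    return np.column_stack(cols)


def predict_rating(U, V, theta, i, j, t, knots, degree=1):
    """Hypothetical rank-R prediction sum_r U[i,r] V[j,r] w_r(t).

    theta has shape (n_basis, R); column r holds the spline coefficients
    of the time weight w_r(t) = B(t) @ theta[:, r].
    """
    U = np.asarray(U, dtype=float)
    V = np.asarray(V, dtype=float)
    w = truncated_power_basis(t, knots, degree) @ np.asarray(theta, dtype=float)
    return (U[i] * V[j] * w).sum(axis=1)  # one prediction per time point
```

For instance, with `U = [[1, 2]]`, `V = [[1, 1]]`, knots `[0.5]` and `theta = [[1, 0], [0, 1], [0, 0]]` (so w_1(t) = 1 and w_2(t) = t), predictions at t = 0.25 and t = 1 are 1.5 and 3.0. Because w_r(t) is a smooth function of time, it can be evaluated at future t, which is what makes forecasting possible in this kind of model.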

Preprint · 2020 · arXiv

Integrating multi-source block-wise missing data in model selection

For multi-source data, blocks of variable information from certain sources are likely to be missing. Existing methods for handling missing data do not take the structure of block-wise missing data into account. In this paper, we propose a Multiple Block-wise Imputation (MBI) approach, which incorporates imputations based on both complete and incomplete observations. Specifically, for a given missing pattern group, the imputations in MBI incorporate more samples from groups with fewer observed variables in addition to the group with complete observations. We propose to construct estimating equations based on all available information, and optimally integrate informative estimating functions to achieve efficient estimators. We show that the proposed method has estimation and model selection consistency under both fixed-dimensional and high-dimensional settings. Moreover, the proposed estimator is asymptotically more efficient than the estimator based on a single imputation from complete observations only. In addition, the proposed method is not restricted to data missing completely at random. Numerical studies and an ADNI data application confirm that the proposed method outperforms existing variable selection methods under various missing mechanisms.
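The notion of a "missing pattern group" can be illustrated with a short sketch: samples are grouped by which source blocks they observe, and for a given source one can list every group that observes it. Such groups include the complete-observation group and possibly groups with fewer observed variables. The helper names and the block layout are assumptions for illustration; MBI's imputation models and estimating equations are not shown.

```python
from collections import defaultdict

import numpy as np


def missing_pattern_groups(observed, blocks):
    """Group samples by which source blocks are fully observed.

    observed: boolean array (n_samples, n_variables), True where observed.
    blocks: dict mapping a source name to the column indices it contributes.
    Returns a dict: frozenset of observed sources -> list of row indices.
    """
    observed = np.asarray(observed, dtype=bool)
    groups = defaultdict(list)
    for row in range(observed.shape[0]):
        pattern = frozenset(
            name for name, cols in blocks.items() if observed[row, cols].all()
        )
        groups[pattern].append(row)
    return dict(groups)


def groups_observing(groups, source):
    """Hypothetical helper: pattern groups that observe `source`."""
    return [pattern for pattern in groups if source in pattern]


blocks = {"A": [0, 1], "B": [2], "C": [3]}
observed = [
    [True, True, True, True],    # all three sources observed
    [True, True, True, False],   # source C missing
    [True, True, False, True],   # source B missing
    [True, True, True, True],    # all three sources observed
]
```

Here `missing_pattern_groups(observed, blocks)` yields three groups, and both the complete group {A, B, C} and the incomplete group {A, C} observe source C, so both could contribute information about it.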

Preprint · 2020 · arXiv

Multicategory Angle-based Learning for Estimating Optimal Dynamic Treatment Regimes with Censored Data

An optimal dynamic treatment regime (DTR) consists of a sequence of decision rules that maximizes long-term benefits, and is applicable to chronic diseases such as HIV infection or cancer. In this paper, we develop a novel angle-based approach to search for the optimal DTR under a multicategory treatment framework for survival data. The proposed method targets maximization of the conditional survival function of patients following a DTR. In contrast to most existing approaches, which are designed to maximize the expected survival time under a binary treatment framework, the proposed method solves the multicategory treatment problem given multiple stages for censored data. Specifically, the proposed method obtains the optimal DTR by integrating estimations of decision rules at multiple stages into a single multicategory classification algorithm without imposing additional constraints, which is also more computationally efficient and robust. In theory, we establish Fisher consistency of the proposed method under regularity conditions. Our numerical studies show that the proposed method outperforms competing methods in terms of maximizing the conditional survival function. We apply the proposed method to two real datasets: Framingham heart study data and acquired immunodeficiency syndrome (AIDS) clinical data.
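Angle-based multicategory classification, which the method builds on, represents K treatments as the vertices of a centered regular simplex in R^{K-1} and assigns the treatment whose vertex forms the smallest angle with a learned function f(x). The sketch below constructs those vertices using the standard construction from the angle-based classification literature and applies the decision rule. How f is estimated from censored, multi-stage data in this paper is not shown; `f_x` is assumed to be given.

```python
import numpy as np


def simplex_vertices(k):
    """Vertices W_1..W_k of a centered regular simplex in R^(k-1).

    Each vertex has unit norm and every pair has inner product -1/(k-1).
    """
    if k < 2:
        raise ValueError("need at least two treatments")
    w = np.empty((k, k - 1))
    w[0] = np.ones(k - 1) / np.sqrt(k - 1)
    for j in range(1, k):
        e = np.zeros(k - 1)
        e[j - 1] = 1.0
        w[j] = (-(1 + np.sqrt(k)) / (k - 1) ** 1.5) * np.ones(k - 1) \
            + np.sqrt(k / (k - 1)) * e
    return w


def assign_treatment(f_x, w):
    """Choose the treatment whose vertex has the largest inner product with
    f(x), i.e. the smallest angle (all vertices have unit norm)."""
    return int(np.argmax(w @ np.asarray(f_x, dtype=float)))
```

Because only K-1 coordinates are needed for K treatments, no sum-to-zero constraint has to be imposed on f, which is consistent with the abstract's remark about avoiding additional constraints.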

Preprint · 2016 · arXiv

Weak Signal Identification and Inference in Penalized Model Selection

Weak signal identification and inference are very important in the area of penalized model selection, yet they are underdeveloped and not well studied. Existing inference procedures for penalized estimators mainly focus on strong signals. In this paper, we propose an identification procedure for weak signals in finite samples, and provide a transition phase between noise and strong signal strengths. We also introduce a new two-step inferential method to construct better confidence intervals for the identified weak signals. Our theory development assumes that variables are orthogonally designed. Both theory and numerical studies indicate that the proposed method leads to better confidence coverage for weak signals, compared with those using asymptotic inference. In addition, the proposed method outperforms the perturbation and bootstrap resampling approaches. We illustrate our method on HIV antiretroviral drug susceptibility data to identify genetic mutations associated with HIV drug resistance.
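Under the orthogonal design assumed in the theory, the Lasso has a closed form that shows why weak signals are hard to handle. With objective (1/2)||y - Xβ||² + λ||β||₁ and XᵀX = I, each Lasso coefficient is the least squares coefficient soft-thresholded at λ. The sketch below shows only this standard fact; the paper's identification thresholds and two-step intervals are not reproduced.

```python
import numpy as np


def lasso_orthonormal(beta_ols, lam):
    """Lasso solution when X^T X = I: soft-threshold each OLS coefficient.

    Coefficients with |beta_ols| <= lam are set exactly to zero; larger
    ones are shrunk toward zero by lam.
    """
    beta_ols = np.asarray(beta_ols, dtype=float)
    return np.sign(beta_ols) * np.maximum(np.abs(beta_ols) - lam, 0.0)
```

With λ = 1, least squares estimates (3, 0.5, -2) become (2, 0, -1). A true weak signal whose estimate lands near λ may be either dropped or heavily shrunk, so its sampling distribution is far from the normal approximation used for strong signals.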

Preprint · 2014 · arXiv

Estimation and model selection in generalized additive partial linear models for correlated data with diverging number of covariates

We propose generalized additive partial linear models for complex data which allow one to capture nonlinear patterns of some covariates, in the presence of linear components. The proposed method improves estimation efficiency and increases statistical power for correlated data by incorporating the correlation information. A unique feature of the proposed method is its capability of handling model selection in cases where it is difficult to specify the likelihood function. We derive the quadratic inference function-based estimators for the linear coefficients and the nonparametric functions when the dimension of covariates diverges, and establish asymptotic normality for the linear coefficient estimators and the rates of convergence for the nonparametric function estimators for both finite and high-dimensional cases. The proposed method and theoretical development are challenging because the numbers of linear covariates and nonlinear components both increase with the sample size. We also propose a doubly penalized procedure for variable selection which can simultaneously identify nonzero linear and nonparametric components, and which has an asymptotic oracle property. Extensive Monte Carlo studies show that the proposed procedure works effectively even with moderate sample sizes. The proposed method is illustrated with a pharmacokinetics study on renal cancer data.
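The quadratic inference function (QIF) mentioned above combines per-subject "extended score" vectors gᵢ into a single objective, Q = n ḡᵀ C⁻¹ ḡ, where ḡ is the mean of the gᵢ and C = (1/n) Σ gᵢgᵢᵀ; estimators minimize Q over the parameters. The sketch below only evaluates Q for a given matrix of score vectors. Constructing gᵢ (from the model, the working-correlation basis matrices and the spline terms), minimizing Q, and the double penalty are assumptions left out of this illustration.

```python
import numpy as np


def quadratic_inference_function(g):
    """Evaluate Q = n * gbar^T C^{-1} gbar.

    g: array (n_subjects, q) whose i-th row is the extended score vector
    g_i evaluated at a candidate parameter value.
    """
    g = np.asarray(g, dtype=float)
    n = g.shape[0]
    gbar = g.mean(axis=0)
    c = g.T @ g / n                      # sample second-moment matrix of g_i
    # Solve C x = gbar instead of forming the explicit inverse.
    return float(n * gbar @ np.linalg.solve(c, gbar))
```

For score vectors (1, 0), (0, 1) and (1, 1), ḡ = (2/3, 2/3), and Q works out to 8/3. Because C weights the moment conditions by their estimated covariance, no full likelihood is required, which matches the abstract's point about settings where the likelihood is hard to specify.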
