Researcher profile

Lin Xiao

Lin Xiao contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging | Verification L1 | Unclaimed author
11 works
0 followers
12 topics
4 close collaborators

Published work

11 published items

preprint | 2022 | arXiv

A Mid-infrared Study of Superluminous Supernovae

We present the mid-infrared (MIR) light curves (LCs) of 10 superluminous supernovae (SLSNe) at $z<0.12$ based on WISE data at 3.4 and 4.6 $\mu$m. Three of them, PS15br, SN 2017ens, and SN 2017err, show rebrightening that started at 200--400 days and ended at 600--1000 days, indicating the presence of dust. In four of the remaining seven SLSNe, dust emission was detected with monochromatic luminosities of $10^7\sim10^8\ L_\odot$ at epochs of 100--500 days based on MIR colors $W1-W2\sim1$. Among the three SLSNe that show rebrightening, we further analysed PS15br and SN 2017ens. We modeled the SEDs at 500--700 days, which gives dust temperatures of 600--1100 K, dust masses of $\gtrsim 10^{-2}\ M_\odot$, and luminosities of $10^8\sim10^9$ $L_\odot$. Considering the time delay and the huge amount of energy released, the emitting dust can hardly be pre-existing dust heated either collisionally by shocks or radiatively by the peak SLSN luminosity or shock emission. Instead, it can be newly formed dust additionally heated by interaction with the circumstellar medium, as indicated by features in their spectra and slowly declining bolometric LCs. The dust masses appear to be ten times greater than those formed in normal core-collapse supernovae at similar epochs. Combined with the analysis of SN 2018bsz by Chen et al. (2022), we suggest that SLSNe have higher dust formation efficiency, although future observations are required to reach a final conclusion.

preprint | 2022 | arXiv

Chemical variations across the TMC-1 boundary: molecular tracers from translucent phase to dense phase

We investigated the chemical evolution of gas-phase and grain-surface species across the Taurus molecular cloud-1 (TMC-1) filament from translucent phase to dense phase. By comparing observations with modeling results from an up-to-date chemical network, we examined the conversion processes for the carbon-, oxygen-, nitrogen- and sulfur-bearing species, i.e., from their initial atomic form to their main molecular reservoir form both in the gas phase and on the grain surface. The conversion processes were found to depend on the species and A$_V$. The effect on the chemistry of the initial carbon-to-oxygen elemental abundance ratio (C/O), varied through the O abundance, was explored, and an initial carbon elemental abundance of 2.5 $\times$ 10$^{-4}$ and a C/O ratio of 0.5 could best reproduce the abundances of most observed molecules at TMC-1 CP, where more than 90 molecules have been identified. Based on the TMC-1 condition, we predicted a varied grain ice composition during the evolution of molecular clouds, with H$_2$O ice as the dominant ice composition at A$_V$ $>$ 4 mag, CO$_2$ ice as the dominant ice composition at A$_V$ $<$ 4 mag, while CO ice decreased sharply at A$_V$ around 4--5 mag.

preprint | 2022 | arXiv

Federated Learning with Partial Model Personalization

We consider two federated learning algorithms for training partially personalized models, where the shared and personal parameters are updated either simultaneously or alternately on the devices. Both algorithms have been proposed in the literature, but their convergence properties are not fully understood, especially for the alternating variant. We provide convergence analyses of both algorithms in the general nonconvex setting with partial participation and delineate the regime where one dominates the other. Our experiments on real-world image, text, and speech datasets demonstrate that (a) partial personalization can obtain most of the benefits of full model personalization with a small fraction of personal parameters, and (b) the alternating update algorithm often outperforms the simultaneous update algorithm by a small but consistent margin.
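The difference between the simultaneous and alternating updates can be illustrated on a toy problem. The sketch below is not the paper's implementation: it assumes a scalar linear model y = u * x + v with a shared slope u and a per-device intercept v, noise-free synthetic data, and hypothetical helper names (`local_grads`, `local_update`, `train`). The ordering in the alternating branch (personal parameter first, then shared) is also an assumption.

```python
import random

def local_grads(u, v, data):
    """Gradients of 0.5 * mean squared error for the model y = u * x + v."""
    gu = gv = 0.0
    for x, y in data:
        r = u * x + v - y
        gu += r * x
        gv += r
    n = len(data)
    return gu / n, gv / n

def local_update(u, v, data, lr, steps, alternating):
    """Run local gradient steps on one device and return its (shared, personal) parameters."""
    if alternating:
        # Assumed ordering: update the personal parameter with the shared one fixed...
        for _ in range(steps):
            _, gv = local_grads(u, v, data)
            v -= lr * gv
        # ...then update the shared parameter with the new personal one fixed.
        for _ in range(steps):
            gu, _ = local_grads(u, v, data)
            u -= lr * gu
    else:
        # Simultaneous variant: both blocks move using gradients at the same point.
        for _ in range(steps):
            gu, gv = local_grads(u, v, data)
            u, v = u - lr * gu, v - lr * gv
    return u, v

def train(devices, alternating, rounds=200, lr=0.1, steps=5, sample=2, seed=0):
    rng = random.Random(seed)
    u = 0.0                      # shared parameter, held by the server
    v = [0.0] * len(devices)     # personal parameters, one per device
    for _ in range(rounds):
        chosen = rng.sample(range(len(devices)), sample)  # partial participation
        shared_updates = []
        for i in chosen:
            ui, v[i] = local_update(u, v[i], devices[i], lr, steps, alternating)
            shared_updates.append(ui)
        u = sum(shared_updates) / len(shared_updates)  # server averages only the shared part
    return u, v

# Synthetic devices: same slope (2.0), different intercepts.
INTERCEPTS = [-1.0, 0.0, 1.0, 3.0]
XS = [-1.0, -0.5, 0.0, 0.5, 1.5]
devices = [[(x, 2.0 * x + b) for x in XS] for b in INTERCEPTS]

u_sim, v_sim = train(devices, alternating=False)
u_alt, v_alt = train(devices, alternating=True)
```

On this toy problem both variants recover the shared slope and the per-device intercepts. It says nothing about which variant performs better on real data, which is the question the abstract addresses.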

preprint | 2022 | arXiv

Grad-GradaGrad? A Non-Monotone Adaptive Stochastic Gradient Method

The classical AdaGrad method adapts the learning rate by dividing by the square root of a sum of squared gradients. Because this sum in the denominator is increasing, the method can only decrease step sizes over time, and requires a learning rate scaling hyper-parameter to be carefully tuned. To overcome this restriction, we introduce GradaGrad, a method in the same family that naturally grows or shrinks the learning rate based on a different accumulation in the denominator, one that can both increase and decrease. We show that it obeys a convergence rate similar to that of AdaGrad and demonstrate its non-monotone adaptation capability with experiments.
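The monotone behaviour of classical AdaGrad described above can be seen in a few lines. The sketch below implements the standard diagonal AdaGrad update only; it does not implement GradaGrad, whose accumulation rule is not given in the abstract. The test function, the `eps` constant, and all names are illustrative assumptions.

```python
import math

def adagrad(grad_fn, x0, lr=1.0, steps=100, eps=1e-8):
    """Diagonal AdaGrad. Returns the final point and the history of the
    effective step size for coordinate 0."""
    x = list(x0)
    accum = [0.0] * len(x)  # running sum of squared gradients, per coordinate
    step_sizes = []
    for _ in range(steps):
        g = grad_fn(x)
        for i, gi in enumerate(g):
            accum[i] += gi * gi
        # Effective step size: lr divided by the root of the accumulated sum.
        # The sum never shrinks, so this step size can never grow.
        eff = [lr / (math.sqrt(a) + eps) for a in accum]
        x = [xi - e * gi for xi, e, gi in zip(x, eff, g)]
        step_sizes.append(eff[0])
    return x, step_sizes

def quadratic_grad(x):
    # Gradient of the assumed test function f(x) = 0.5 * (x[0]**2 + 10 * x[1]**2).
    return [x[0], 10.0 * x[1]]

x_final, step_sizes = adagrad(quadratic_grad, [3.0, -2.0], lr=1.0, steps=200)
```

Here `step_sizes` is non-increasing by construction, which is the restriction the abstract says GradaGrad is designed to lift.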

preprint | 2022 | arXiv

On Continual Model Refinement in Out-of-Distribution Data Streams

Real-world natural language processing (NLP) models need to be continually updated to fix the prediction errors in out-of-distribution (OOD) data streams while overcoming catastrophic forgetting. However, existing continual learning (CL) problem setups cannot cover such a realistic and complex scenario. In response to this, we propose a new CL problem formulation dubbed continual model refinement (CMR). Compared to prior CL settings, CMR is more practical and introduces unique challenges (boundary-agnostic and non-stationary distribution shift, diverse mixtures of multiple OOD data clusters, error-centric streams, etc.). We extend several existing CL approaches to the CMR setting and evaluate them extensively. For benchmarking and analysis, we propose a general sampling algorithm to obtain dynamic OOD data streams with controllable non-stationarity, as well as a suite of metrics measuring various aspects of online performance. Our experiments and detailed analysis reveal the promise and challenges of the CMR problem, supporting that studying CMR in dynamic OOD streams can benefit the longevity of deployed NLP models in production.

preprint | 2022 | arXiv

On the Convergence Rates of Policy Gradient Methods

We consider infinite-horizon discounted Markov decision problems with finite state and action spaces and study the convergence rates of the projected policy gradient method and a general class of policy mirror descent methods, all with direct parametrization in the policy space. First, we develop a theory of weak gradient-mapping dominance and use it to prove a sharper sublinear convergence rate of the projected policy gradient method. Then we show that with geometrically increasing step sizes, a general class of policy mirror descent methods, including the natural policy gradient method and a projected Q-descent method, all enjoy a linear rate of convergence without relying on entropy or other strongly convex regularization. Finally, we also analyze the convergence rate of an inexact policy mirror descent method and estimate its sample complexity under a simple generative model.
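A minimal sketch of policy mirror descent with a geometrically increasing step size is given below. It uses the KL-divergence form of the update (the natural policy gradient case), in which each state's policy is multiplied by exp(eta * Q) and renormalized. Several choices are assumptions made for illustration and are not taken from the paper: rewards are maximized, the MDP is a small random one, Q-values are computed exactly by iterating the Bellman equation, and the schedule is eta_{k+1} = eta_k / gamma.

```python
import math
import random

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    w = [math.exp(l - m) for l in logits]
    z = sum(w)
    return [x / z for x in w]

def backup(P, R, V, gamma, s, a):
    """One-step lookahead value of taking action a in state s."""
    return R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))

def evaluate(P, R, pi, gamma, sweeps=200):
    """Return (V, Q) for policy pi by iterating the Bellman expectation equation."""
    nS, nA = len(R), len(R[0])
    V = [0.0] * nS
    for _ in range(sweeps):
        V = [sum(pi[s][a] * backup(P, R, V, gamma, s, a) for a in range(nA))
             for s in range(nS)]
    Q = [[backup(P, R, V, gamma, s, a) for a in range(nA)] for s in range(nS)]
    return V, Q

def optimal_values(P, R, gamma, sweeps=200):
    """Reference optimal values from plain value iteration."""
    nS, nA = len(R), len(R[0])
    V = [0.0] * nS
    for _ in range(sweeps):
        V = [max(backup(P, R, V, gamma, s, a) for a in range(nA)) for s in range(nS)]
    return V

def policy_mirror_descent(P, R, gamma, iters=50, eta0=1.0):
    nS, nA = len(R), len(R[0])
    logits = [[0.0] * nA for _ in range(nS)]  # uniform initial policy
    eta = eta0
    for _ in range(iters):
        pi = [softmax(row) for row in logits]
        _, Q = evaluate(P, R, pi, gamma)
        # Adding eta * Q to the logits is the same as pi_new ∝ pi * exp(eta * Q).
        for s in range(nS):
            for a in range(nA):
                logits[s][a] += eta * Q[s][a]
        eta /= gamma  # geometrically increasing step size (assumed schedule)
    pi = [softmax(row) for row in logits]
    V, _ = evaluate(P, R, pi, gamma)
    return pi, V

def random_mdp(nS, nA, seed=0):
    """Hypothetical test MDP with random transitions and rewards in [0, 1)."""
    rng = random.Random(seed)
    P = []
    for _ in range(nS):
        rows = []
        for _ in range(nA):
            w = [rng.random() for _ in range(nS)]
            z = sum(w)
            rows.append([x / z for x in w])
        P.append(rows)
    R = [[rng.random() for _ in range(nA)] for _ in range(nS)]
    return P, R

GAMMA = 0.8
P, R = random_mdp(4, 3)
pi, V = policy_mirror_descent(P, R, GAMMA)
V_star = optimal_values(P, R, GAMMA)
gap = max(vs - v for vs, v in zip(V_star, V))  # worst-case suboptimality over states
```

Note that no entropy term appears anywhere in the update, in line with the abstract's claim that the linear rate does not rely on regularization. The exact policy evaluation here stands in for the inexact, sample-based setting the paper also analyzes.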

preprint | 2021 | arXiv

Does Head Label Help for Long-Tailed Multi-Label Text Classification

Multi-label text classification (MLTC) aims to annotate documents with the most relevant labels from a number of candidate labels. In real applications, the distribution of label frequency often exhibits a long tail, i.e., a few labels are associated with a large number of documents (a.k.a. head labels), while a large fraction of labels are associated with a small number of documents (a.k.a. tail labels). To address the challenge of insufficient training data on tail label classification, we propose a Head-to-Tail Network (HTTN) to transfer the meta-knowledge from the data-rich head labels to data-poor tail labels. The meta-knowledge is the mapping from few-shot network parameters to many-shot network parameters, which aims to promote the generalizability of tail classifiers. Extensive experimental results on three benchmark datasets demonstrate that HTTN consistently outperforms the state-of-the-art methods. The code and hyper-parameter settings are released for reproducibility.

preprint | 2020 | arXiv

Jointly Modeling Intra- and Inter-transaction Dependencies with Hierarchical Attentive Transaction Embeddings for Next-item Recommendation

A transaction-based recommender system (TBRS) aims to predict the next item by modeling dependencies in transactional data. Generally, two kinds of dependencies are considered: intra-transaction dependency and inter-transaction dependency. Most existing TBRSs recommend the next item by modeling only the intra-transaction dependency within the current transaction, ignoring the inter-transaction dependency with recent transactions that may also affect the next item. However, as not all recent transactions are relevant to the current and next items, the relevant ones should be identified and prioritized. In this paper, we propose a novel hierarchical attentive transaction embedding (HATE) model to tackle these issues. Specifically, a two-level attention mechanism integrates both item embedding and transaction embedding to build an attentive context representation that incorporates both intra- and inter-transaction dependencies. With the learned context representation, HATE then recommends the next item. Experimental evaluations on two real-world transaction datasets show that HATE significantly outperforms the state-of-the-art methods in terms of recommendation accuracy.

preprint | 2020 | arXiv

Statistical Adaptive Stochastic Gradient Methods

We propose a statistical adaptive procedure called SALSA for automatically scheduling the learning rate (step size) in stochastic gradient methods. SALSA first uses a smoothed stochastic line-search procedure to gradually increase the learning rate, then automatically switches to a statistical method to decrease the learning rate. The line search procedure "warms up" the optimization process, reducing the need for expensive trial and error in setting an initial learning rate. The method for decreasing the learning rate is based on a new statistical test for detecting stationarity when using a constant step size. Unlike in prior work, our test applies to a broad class of stochastic gradient algorithms without modification. The combined method is highly robust and autonomous, and it matches the performance of the best hand-tuned learning rate schedules in our experiments on several deep learning tasks.

preprint | 2020 | arXiv

Statistically Preconditioned Accelerated Gradient Method for Distributed Optimization

We consider the setting of distributed empirical risk minimization where multiple machines compute the gradients in parallel and a centralized server updates the model parameters. In order to reduce the number of communications required to reach a given accuracy, we propose a preconditioned accelerated gradient method where the preconditioning is done by solving a local optimization problem over a subsampled dataset at the server. The convergence rate of the method depends on the square root of the relative condition number between the global and local loss functions. We estimate the relative condition number for linear prediction models by studying uniform concentration of the Hessians over a bounded domain, which allows us to derive improved convergence rates for existing preconditioned gradient methods and our accelerated method. Experiments on real-world datasets illustrate the benefits of acceleration in the ill-conditioned regime.

preprint | 2019 | arXiv

Joint Computation and Communication Design for UAV-Assisted Mobile Edge Computing in IoT

The unmanned aerial vehicle (UAV)-assisted mobile edge computing (MEC) system is a prominent concept, where a UAV equipped with an MEC server is deployed to serve a number of terminal devices (TDs) of the Internet of Things (IoT) in a finite period. In this paper, each TD has a certain latency-critical computation task to complete in each time slot. Three computation strategies are available to each TD. First, each TD can perform local computing by itself. Second, each TD can partially offload task bits to the UAV for computing. Third, each TD can choose to offload task bits to the access point (AP) via UAV relaying. We propose a new optimization problem formulation that aims to minimize the total energy consumption, including communication-related energy, computation-related energy and the UAV's flight energy, by optimizing the bit allocation, time slot scheduling and power allocation as well as the UAV trajectory design. As the formulated problem is non-convex and its optimal solution is difficult to find, we solve the problem in two parts and obtain a near-optimal solution within a dozen iterations. Finally, numerical results are given to validate the proposed algorithm, which is verified to be efficient and superior to the other benchmark cases.