Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
24 works
0 followers
16 topics
4 close collaborators

Published work

24 published items

preprint, 2026, arXiv

Accumulation of Sub-Sampling Matrices with Applications to Statistical Computation

With appropriately chosen sampling probabilities, sampling-based random projection can be used to implement large-scale statistical methods, substantially reducing computational cost while maintaining low statistical error. However, computing optimal sampling probabilities is often itself expensive, and in practice one typically resorts to suboptimal schemes. This generally leads to increased time and space costs, as more subsamples are required and the resulting projection matrices become larger, thereby making the inference procedure more computationally demanding. In this paper, we extend the framework of sampling-based random projection and propose a new projection method, accumulative sub-sampling. By carefully accumulating multiple such projections, accumulative sub-sampling improves statistical efficiency while controlling the effective matrix size throughout the statistical computation. On the theoretical side, we quantify how the quality of the subsampling scheme affects the error in approximating matrix products and positive semidefinite matrices, and show how the proposed accumulation strategy mitigates this effect. Moreover, we apply our method to statistical models involving intensive matrix operations, such as eigendecomposition in spectral clustering and matrix inversion in kernel ridge regression, and demonstrate that reducing the effective matrix size leads to substantial computational savings. Numerical experiments across a range of problems further show that our approach consistently improves computational efficiency compared to existing random projection baselines under suboptimal sampling schemes.
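
As a point of reference, the starting point named in this abstract, sampling-based approximation of a matrix product, can be sketched as follows. This is a minimal illustration of the classical row-sampling estimator of A^T B only. The accumulation scheme itself is not described in the abstract and is not reproduced here. The function names (`norm_product_probs`, `sampled_product`) are hypothetical, and the norm-product probabilities are one classical choice, assumed here for illustration.

```python
import math
import random

def norm_product_probs(A, B):
    """Sampling probabilities proportional to ||A[i]|| * ||B[i]|| (row norms)."""
    weights = [
        math.sqrt(sum(a * a for a in row_a)) * math.sqrt(sum(b * b for b in row_b))
        for row_a, row_b in zip(A, B)
    ]
    total = sum(weights)
    return [w / total for w in weights]

def sampled_product(A, B, probs, m, rng):
    """Unbiased estimate of A^T B from m rows drawn i.i.d. with probabilities probs."""
    n, da, db = len(A), len(A[0]), len(B[0])
    est = [[0.0] * db for _ in range(da)]
    for i in rng.choices(range(n), weights=probs, k=m):
        # Rescale each sampled outer product by 1 / (m * p_i) to keep the estimate unbiased.
        scale = 1.0 / (m * probs[i])
        for r in range(da):
            for c in range(db):
                est[r][c] += scale * A[i][r] * B[i][c]
    return est

# Tiny illustrative inputs (hypothetical data): A and B are 3-by-1.
A = [[1.0], [2.0], [3.0]]
B = [[2.0], [1.0], [4.0]]
probs = norm_product_probs(A, B)
estimate = sampled_product(A, B, probs, m=5, rng=random.Random(0))
```

In this one-column example with positive entries, every sampled term equals the exact product divided by m, so the estimate has zero variance. With poorer probabilities the variance grows, which is the cost of suboptimal sampling that the abstract refers to.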

preprint, 2026, arXiv

Toward Scalable Terminal Task Synthesis via Skill Graphs

Terminal agents have demonstrated strong potential for autonomous command-line execution, yet their training remains constrained by the scarcity of high-quality and diverse execution trajectories. Existing approaches mitigate this bottleneck by synthesizing large-scale terminal task instances for trajectory sampling. However, they primarily focus on scaling the number of tasks while providing limited control over the diversity of execution trajectories that agents actually experience during training. In this paper, we present SkillSynth, an automated framework for terminal task synthesis built on a scenario-mediated skill graph. SkillSynth first constructs a large-scale skill graph, where scenarios serve as intermediate transition nodes that connect diverse command-line skills. It then samples paths from this graph as abstractions of real-world workflows, and uses a multi-agent harness to instantiate them into executable task instances. By grounding task synthesis in graph-sampled workflow paths, SkillSynth explicitly controls the diversity of minimal execution trajectories required to solve the synthesized tasks. Experiments on Terminal-Bench demonstrate the effectiveness of SkillSynth. Moreover, task instances synthesized by SkillSynth have been adopted to train Hy3 Preview, contributing to its enhanced agentic capabilities in terminal-based settings.

preprint, 2024, arXiv

A Practical Beamforming Design for Active RIS-assisted MU-MISO Systems

Reconfigurable Intelligent Surfaces (RIS) have been proposed as a revolutionary technology with the potential to address several critical requirements of 6G communication systems. Despite their powerful ability to reconfigure the radio environment, the "double fading" effect limits practical performance gains because of the significant path loss. A new active RIS architecture has recently been proposed to overcome this challenge. However, existing active RIS studies rely on an ideal amplification model that ignores the practical hardware limitations of amplifiers, and such inaccurate modeling may cause performance degradation. Motivated by this fact, in this paper we first investigate the amplification principle of a typical active RIS and propose a more accurate amplification model based on amplifier hardware characteristics. Then, based on the new amplification model, we propose a novel joint transmit beamforming and RIS reflection beamforming design that considers the incident signal power on a practical active RIS for multiuser multi-input single-output (MU-MISO) communication systems. Fractional programming (FP), majorization minimization (MM) and block coordinate descent (BCD) methods are used to solve the resulting complex problem. Simulation results indicate the importance of considering practical amplifier hardware characteristics in joint beamforming designs and demonstrate the effectiveness of the proposed algorithm compared to other benchmarks.

preprint, 2023, arXiv

Estimating Distributions with Low-dimensional Structures Using Mixtures of Generative Models

There has been a growing interest in statistical inference from data satisfying the so-called manifold hypothesis, assuming data points in the high-dimensional ambient space to lie in close vicinity of a submanifold of much lower dimension. In machine learning, encoder-decoder pair based generative modelling approaches have been successful in learning complicated high-dimensional distributions such as those over images and texts by explicitly imposing the low-dimensional manifold structure. In this work, we introduce a new approach for estimating distributions on unknown submanifolds via mixtures of generative models. We show that conventional generative modeling approaches using a single encoder-decoder pair are generally unable to capture data distributions under the manifold hypothesis, unless the underlying manifold admits a global parametrization; however, this issue can be solved by using a collection of encoder-decoder pairs for learning different local patches of the data supporting manifold. A rigorous theoretical analysis is developed to demonstrate that the proposed estimator attains the minimax-optimal rate of convergence for the implicit estimation of data distributions with manifold structures. Our experiments show that, by utilizing parameter sharing, the proposed method can significantly improve the performance of conventional auto-encoder based generative modelling approaches with minimal additional computational efforts.

preprint, 2022, arXiv

Cost-effective Land Cover Classification for Remote Sensing Images

Land cover maps, which are mainly generated by remote sensing image classification techniques, are of vital importance to various fields such as land use policy development, ecosystem services, urban planning and agriculture monitoring. Traditional land cover classification usually needs tremendous computational resources, which often becomes a huge burden to the remote sensing community. Cloud computing is undoubtedly one of the best choices for land cover classification; however, if not managed properly, the computation cost on the cloud can be surprisingly high. Recently, cutting the unnecessary computation long tail has become a promising solution for saving cost in the cloud. For land cover classification, it is generally not necessary to achieve the best accuracy, and 85% can be regarded as a reliable land cover classification. Therefore, in this paper, we propose a framework for cost-effective remote sensing classification. Given the desired accuracy, the clustering algorithm can stop early to save cost whilst achieving sufficient accuracy for land cover image classification. Experimental results show that achieving 85%-99.9% accuracy needs only 27.34%-60.83% of the total cloud computation cost of achieving 100% accuracy. To put this into perspective, for the US land cover classification example, the proposed approach can save over $1,593,490.18 for the government in each single use when the desired accuracy is 90%.

preprint, 2022, arXiv

High-Dimensional Linear Regression via Implicit Regularization

Many statistical estimators for high-dimensional linear regression are M-estimators, formed through minimizing a data-dependent square loss function plus a regularizer. This work considers a new class of estimators implicitly defined through a discretized gradient dynamic system under overparameterization. We show that under suitable restricted isometry conditions, overparameterization leads to implicit regularization: if we directly apply gradient descent to the residual sum of squares with sufficiently small initial values, then under some proper early stopping rule, the iterates converge to a nearly sparse rate-optimal solution that improves over explicitly regularized approaches. In particular, the resulting estimator does not suffer from extra bias due to explicit penalties, and can achieve the parametric root-n rate when the signal-to-noise ratio is sufficiently high. We also perform simulations to compare our methods with high dimensional linear regression with explicit regularization. Our results illustrate the advantages of using implicit regularization via gradient descent after overparameterization in sparse vector estimation.
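
The mechanism described in this abstract can be illustrated with a small sketch: gradient descent on the residual sum of squares after overparameterizing the coefficient vector, started from small initial values and stopped early. The particular parameterization used below, beta = u*u - v*v elementwise, is one common choice and is an assumption here, since the abstract does not state the exact form. The function name `hadamard_gd` and the default step size, initialization scale and stopping time are likewise hypothetical.

```python
def hadamard_gd(X, y, alpha=1e-3, eta=0.1, steps=100):
    """Gradient descent on (1/2n)||X beta - y||^2 with beta = u*u - v*v (elementwise).

    alpha is the small initial value of every entry of u and v;
    steps acts as the early stopping time.
    """
    n, p = len(X), len(X[0])
    u = [alpha] * p
    v = [alpha] * p
    for _ in range(steps):
        beta = [u[j] ** 2 - v[j] ** 2 for j in range(p)]
        resid = [sum(X[i][j] * beta[j] for j in range(p)) - y[i] for i in range(n)]
        # Gradient of the loss with respect to beta.
        grad = [sum(X[i][j] * resid[i] for i in range(n)) / n for j in range(p)]
        # Chain rule: d beta_j / d u_j = 2 u_j and d beta_j / d v_j = -2 v_j.
        u = [u[j] * (1.0 - 2.0 * eta * grad[j]) for j in range(p)]
        v = [v[j] * (1.0 + 2.0 * eta * grad[j]) for j in range(p)]
    return [u[j] ** 2 - v[j] ** 2 for j in range(p)]

# Hypothetical toy problem with an orthogonal design (X^T X / n = I),
# chosen only so that each coordinate evolves independently.
n = 6
X = [[(n ** 0.5 if i == j else 0.0) for j in range(n)] for i in range(n)]
truth = [1.0, 0.0, 0.0, -1.0, 0.0, 0.0]
noise = [0.02, -0.02, 0.02, 0.02, -0.02, 0.02]
y = [n ** 0.5 * (truth[i] + noise[i]) for i in range(n)]
estimate = hadamard_gd(X, y)
```

Because the updates are multiplicative, coordinates with a strong signal grow quickly from the small initial value while noise coordinates grow slowly, so stopping early leaves the latter close to zero. Running for many more steps would eventually fit the noise as well, which is why the stopping rule matters.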

preprint, 2022, arXiv

Hypernetwork Dismantling via Deep Reinforcement Learning

Network dismantling aims to degrade the connectivity of a network by removing an optimal set of nodes. It has been widely adopted in many real-world applications such as epidemic control and rumor containment. However, conventional methods usually focus on simple network modeling with only pairwise interactions, while group-wise interactions modeled by hypernetwork are ubiquitous and critical. In this work, we formulate the hypernetwork dismantling problem as a node sequence decision problem and propose a deep reinforcement learning (DRL)-based hypernetwork dismantling framework. Besides, we design a novel inductive hypernetwork embedding method to ensure the transferability to various real-world hypernetworks. Our framework first generates small-scale synthetic hypernetworks and embeds the nodes and hypernetworks into a low dimensional vector space to represent the action and state space in DRL, respectively. Then trial-and-error dismantling tasks are conducted by an agent on these synthetic hypernetworks, and the dismantling strategy is continuously optimized. Finally, the well-optimized strategy is applied to real-world hypernetwork dismantling tasks. Experimental results on five real-world hypernetworks demonstrate the effectiveness of our proposed framework.
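
To make the dismantling objective concrete, the sketch below removes nodes from a small hypernetwork until its largest connected component falls to a target size. It uses a simple greedy highest-degree heuristic, which is a stand-in baseline and not the deep reinforcement learning framework proposed in the paper. The function names and the toy hypernetwork are hypothetical.

```python
def largest_component(nodes, hyperedges):
    """Size of the largest connected component among the remaining nodes."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for edge in hyperedges:
        # A hyperedge connects all of its members that are still present.
        members = [v for v in edge if v in parent]
        for v in members[1:]:
            parent[find(v)] = find(members[0])
    sizes = {}
    for v in nodes:
        root = find(v)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values(), default=0)

def greedy_dismantle(nodes, hyperedges, target):
    """Remove highest-degree nodes until the largest component has size <= target."""
    remaining = set(nodes)
    removed = []
    while largest_component(remaining, hyperedges) > target:
        def degree(v):
            # Count hyperedges that still link v to at least one other remaining node.
            return sum(1 for e in hyperedges if v in e and len(e & remaining) >= 2)
        # Ties are broken in favor of the smallest node id.
        victim = max(sorted(remaining), key=degree)
        remaining.remove(victim)
        removed.append(victim)
    return removed

# Hypothetical hypernetwork: three hyperedges chained through nodes 2 and 4.
edges = [{0, 1, 2}, {2, 3, 4}, {4, 5, 6}]
removal_order = greedy_dismantle(range(7), edges, target=2)
```

A learned strategy would replace the `degree` scoring with a policy over node embeddings. The stopping criterion based on the largest component is an assumption made here for illustration.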

preprint, 2022, arXiv

Learning Topic Models: Identifiability and Finite-Sample Analysis

Topic models provide a useful text-mining tool for learning, extracting, and discovering latent structures in large text corpora. Although a plethora of methods have been proposed for topic modeling, lacking in the literature is a formal theoretical investigation of the statistical identifiability and accuracy of latent topic estimation. In this paper, we propose a maximum likelihood estimator (MLE) of latent topics based on a specific integrated likelihood that is naturally connected to the concept, in computational geometry, of volume minimization. Our theory introduces a new set of geometric conditions for topic model identifiability, conditions that are weaker than conventional separability conditions, which typically rely on the existence of pure topic documents or of anchor words. Weaker conditions allow a wider and thus potentially more fruitful investigation. We conduct finite-sample error analysis for the proposed estimator and discuss connections between our results and those of previous investigations. We conclude with empirical studies employing both simulated and real datasets.

preprint, 2022, arXiv

Mean-Field Nonparametric Estimation of Interacting Particle Systems

This paper concerns the nonparametric estimation problem of the distribution-state dependent drift vector field in an interacting $N$-particle system. Observing single-trajectory data for each particle, we derive the mean-field rate of convergence for the maximum likelihood estimator (MLE), which depends on both Gaussian complexity and Rademacher complexity of the function class. In particular, when the function class contains $\alpha$-smooth Hölder functions, our rate of convergence is minimax optimal on the order of $N^{-\frac{\alpha}{d+2\alpha}}$. Combining with a Fourier analytical deconvolution argument, we derive the consistency of MLE for the external force and interaction kernel in the McKean-Vlasov equation.

preprint, 2022, arXiv

Minimax Rate of Distribution Estimation on Unknown Submanifold under Adversarial Losses

Statistical inference from high-dimensional data with low-dimensional structures has recently attracted lots of attention. In machine learning, deep generative modeling approaches implicitly estimate distributions of complex objects by creating new samples from the underlying distribution, and have achieved great success in generating synthetic realistic-looking images and texts. A key step in these approaches is the extraction of latent features or representations (encoding) that can be used for accurately reconstructing the original data (decoding). In other words, low-dimensional manifold structure is implicitly assumed and utilized in the distribution modeling and estimation. To understand the benefit of low-dimensional manifold structure in generative modeling, we build a general minimax framework for distribution estimation on unknown submanifold under adversarial losses, with suitable smoothness assumptions on the target distribution and the manifold. The established minimax rate elucidates how various problem characteristics, including intrinsic dimensionality of the data and smoothness levels of the target distribution and the manifold, affect the fundamental limit of high-dimensional distribution estimation. To prove the minimax upper bound, we construct an estimator based on a mixture of locally fitted generative models, which is motivated by the partition of unity technique from differential geometry and is necessary to cover cases where the underlying data manifold does not admit a global parametrization. We also propose a data-driven adaptive estimator that is shown to simultaneously attain within a logarithmic factor of the optimal rate over a large collection of distribution classes.

preprint, 2022, arXiv

N-Cloth: Predicting 3D Cloth Deformation with Mesh-Based Networks

We present a novel mesh-based learning approach (N-Cloth) for plausible 3D cloth deformation prediction. Our approach is general and can handle cloth or obstacles represented by triangle meshes with arbitrary topologies. We use graph convolution to transform the cloth and object meshes into a latent space to reduce the non-linearity in the mesh space. Our network can predict the target 3D cloth mesh deformation based on the initial state of the cloth mesh template and the target obstacle mesh. Our approach can handle complex cloth meshes with up to 100K triangles and scenes with various objects corresponding to SMPL humans, non-SMPL humans or rigid bodies. In practice, our approach can be used to generate plausible cloth simulation at 30-45 fps on an NVIDIA GeForce RTX 3090 GPU. We highlight its benefits over prior learning-based methods and physically-based cloth simulators.

preprint, 2022, arXiv

Sketch-and-Lift: Scalable Subsampled Semidefinite Program for $K$-means Clustering

Semidefinite programming (SDP) is a powerful tool for tackling a wide range of computationally hard problems such as clustering. Despite the high accuracy, semidefinite programs are often too slow in practice with poor scalability on large (or even moderate) datasets. In this paper, we introduce a linear time complexity algorithm for approximating an SDP relaxed $K$-means clustering. The proposed sketch-and-lift (SL) approach solves an SDP on a subsampled dataset and then propagates the solution to all data points by a nearest-centroid rounding procedure. It is shown that the SL approach enjoys a similar exact recovery threshold as the $K$-means SDP on the full dataset, which is known to be information-theoretically tight under the Gaussian mixture model. The SL method can be made adaptive with enhanced theoretic properties when the cluster sizes are unbalanced. Our simulation experiments demonstrate that the statistical accuracy of the proposed method outperforms state-of-the-art fast clustering algorithms without sacrificing too much computational efficiency, and is comparable to the original $K$-means SDP with substantially reduced runtime.
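
The two-stage structure described in this abstract (solve on a subsample, then propagate to all points by nearest-centroid rounding) can be sketched as follows. The SDP step is replaced here by plain Lloyd's K-means with farthest-point seeding, purely as a stand-in so that the example runs without a solver. The function names and the toy data are hypothetical, and `sub_idx` would normally be drawn uniformly at random.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(p, centroids):
    """Index of the centroid closest to p."""
    return min(range(len(centroids)), key=lambda j: dist2(p, centroids[j]))

def mean(points):
    d = len(points[0])
    return tuple(sum(p[j] for p in points) / len(points) for j in range(d))

def cluster_subsample(sub, k, iters=20):
    """Stand-in for the SDP step: Lloyd's K-means with farthest-point seeding."""
    centroids = [sub[0]]
    while len(centroids) < k:
        centroids.append(max(sub, key=lambda p: min(dist2(p, c) for c in centroids)))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in sub:
            groups[nearest(p, centroids)].append(p)
        # Keep the old centroid if a group happens to be empty.
        centroids = [mean(g) if g else centroids[j] for j, g in enumerate(groups)]
    return centroids

def sketch_and_lift(points, k, sub_idx):
    """Cluster the subsample (sketch), then label every point by nearest centroid (lift)."""
    sub = [points[i] for i in sub_idx]
    centroids = cluster_subsample(sub, k)
    return [nearest(p, centroids) for p in points]

# Hypothetical data: two well-separated groups of 50 points each.
points = [(0.1 * i, 0.0) for i in range(50)] + [(10.0 + 0.1 * i, 0.0) for i in range(50)]
labels = sketch_and_lift(points, k=2, sub_idx=[0, 3, 7, 52, 60, 99])
```

Only the six subsampled points enter the expensive clustering step, while the lift is a single pass over the data. This is where the overall cost becomes linear in the number of points.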

preprint, 2021, arXiv

Class Knowledge Overlay to Visual Feature Learning for Zero-Shot Image Classification

New categories can be discovered by transforming semantic features into synthesized visual features without corresponding training samples in zero-shot image classification. Although significant progress has been made in generating high-quality synthesized visual features using generative adversarial networks, guaranteeing semantic consistency between the semantic features and visual features remains very challenging. In this paper, we propose a novel zero-shot learning approach, GAN-CST, based on class knowledge to visual feature learning to tackle the problem. The approach consists of three parts, class knowledge overlay, semi-supervised learning and triplet loss. It applies class knowledge overlay (CKO) to obtain knowledge not only from the corresponding class but also from other classes that have the knowledge overlay. It ensures that the knowledge-to-visual learning process has adequate information to generate synthesized visual features. The approach also applies a semi-supervised learning process to re-train knowledge-to-visual model. It contributes to reinforcing synthesized visual features generation as well as new category prediction. We tabulate results on a number of benchmark datasets demonstrating that the proposed model delivers superior performance over state-of-the-art approaches.

preprint, 2021, arXiv

Cross Knowledge-based Generative Zero-Shot Learning Approach with Taxonomy Regularization

Although zero-shot learning (ZSL) has an inferential capability of recognizing new classes that have never been seen before, it faces two fundamental challenges: the cross-modality and cross-domain challenges. In order to alleviate these problems, we develop a generative network-based ZSL approach equipped with the proposed Cross Knowledge Learning (CKL) scheme and Taxonomy Regularization (TR). In our approach, the semantic features are taken as inputs, and the output is the synthesized visual features generated from the corresponding semantic features. CKL enables more relevant semantic features to be trained for semantic-to-visual feature embedding in ZSL, while Taxonomy Regularization (TR) significantly improves the intersections with unseen images with more generalized visual features generated from the generative network. Extensive experiments on several benchmark datasets (i.e., AwA1, AwA2, CUB, NAB and aPY) show that our approach is superior to state-of-the-art methods in terms of ZSL image classification and retrieval.

preprint, 2021, arXiv

Distributed Estimation for Principal Component Analysis: an Enlarged Eigenspace Analysis

The growing size of modern data sets brings many challenges to the existing statistical estimation approaches, which calls for new distributed methodologies. This paper studies distributed estimation for a fundamental statistical machine learning problem, principal component analysis (PCA). Despite the massive literature on top eigenvector estimation, much less is presented for the top-$L$-dim ($L>1$) eigenspace estimation, especially in a distributed manner. We propose a novel multi-round algorithm for constructing top-$L$-dim eigenspace for distributed data. Our algorithm takes advantage of shift-and-invert preconditioning and convex optimization. Our estimator is communication-efficient and achieves a fast convergence rate. In contrast to the existing divide-and-conquer algorithm, our approach has no restriction on the number of machines. Theoretically, the traditional Davis-Kahan theorem requires the explicit eigengap assumption to estimate the top-$L$-dim eigenspace. To abandon this eigengap assumption, we consider a new route in our analysis: instead of exactly identifying the top-$L$-dim eigenspace, we show that our estimator is able to cover the targeted top-$L$-dim population eigenspace. Our distributed algorithm can be applied to a wide range of statistical problems based on PCA, such as principal component regression and single index model. Finally, we provide simulation studies to demonstrate the performance of the proposed distributed estimator.

preprint, 2021, arXiv

EdgeWorkflowReal: An Edge Computing based Workflow Execution Engine for Smart Systems

Current cloud-based smart systems suffer from weaknesses such as high response latency, limited network bandwidth and the restricted computing power of smart end devices which seriously affect the system's QoS (Quality of Service). Recently, given its advantages of low latency, high bandwidth and location awareness, edge computing has become a promising solution for smart systems. However, the development of edge computing based smart systems is a very challenging job for software developers who do not have the skills for the creation of edge computing environments. The management of edge computing resources and computing tasks is also very challenging. Workflow technology has been widely used in smart systems to automate task and resource management, but there does not yet exist a real-world deployable edge computing based workflow execution engine. To fill this gap, we present EdgeWorkflowReal, an edge computing based workflow execution engine for smart systems. EdgeWorkflowReal supports: 1) automatic creation of a real edge computing environment according to user settings; 2) visualized modelling of edge workflow applications; and 3) automatic deployment, monitoring and performance evaluation of edge workflow applications in a smart system.

preprint, 2021, arXiv

Fast Statistical Leverage Score Approximation in Kernel Ridge Regression

Nyström approximation is a fast randomized method that rapidly solves kernel ridge regression (KRR) problems through sub-sampling the n-by-n empirical kernel matrix appearing in the objective function. However, the performance of such a sub-sampling method heavily relies on correctly estimating the statistical leverage scores for forming the sampling distribution, which can be as costly as solving the original KRR. In this work, we propose a linear time (modulo poly-log terms) algorithm to accurately approximate the statistical leverage scores in the stationary-kernel-based KRR with theoretical guarantees. Particularly, by analyzing the first-order condition of the KRR objective, we derive an analytic formula, which depends on both the input distribution and the spectral density of stationary kernels, for capturing the non-uniformity of the statistical leverage scores. Numerical experiments demonstrate that with the same prediction accuracy our method is orders of magnitude more efficient than existing methods in selecting the representative sub-samples in the Nyström approximation.
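
For orientation, the quantity being approximated can be written down directly. The sketch below computes exact statistical (ridge) leverage scores as the diagonal of K (K + n*lam*I)^{-1}, the usual definition in the KRR literature. It relies on a dense solve, which is precisely the expensive step the paper aims to avoid, so it is only a reference computation for small n. The Gaussian kernel, the bandwidth and all function names are assumptions made for illustration.

```python
import math

def gaussian_kernel(xs, bandwidth=1.0):
    """Kernel matrix of a Gaussian (stationary) kernel on scalar inputs."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2)) for b in xs] for a in xs]

def solve(M, R):
    """Solve M Z = R by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[i]) + list(R[i]) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def ridge_leverage_scores(K, lam):
    """Diagonal of (K + n*lam*I)^{-1} K, which equals the diagonal of K (K + n*lam*I)^{-1}."""
    n = len(K)
    M = [[K[i][j] + (n * lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    Z = solve(M, K)
    return [Z[i][i] for i in range(n)]

# Two identical inputs share their leverage; two distant inputs do not.
scores_duplicate = ridge_leverage_scores(gaussian_kernel([0.0, 0.0]), lam=0.5)
scores_far_apart = ridge_leverage_scores(gaussian_kernel([0.0, 100.0]), lam=0.5)
```

Duplicated inputs split a fixed amount of leverage between them, whereas isolated inputs each carry more. This is the non-uniformity that a leverage-based sampling distribution is meant to capture when choosing Nyström landmarks.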

preprint, 2021, arXiv

Multi-Knowledge Fusion for New Feature Generation in Generalized Zero-Shot Learning

Suffering from the semantic insufficiency and domain-shift problems, most existing state-of-the-art methods fail to achieve satisfactory results for Zero-Shot Learning (ZSL). In order to alleviate these problems, we propose a novel generative ZSL method to learn more generalized features from multi-knowledge with continuously generated new semantics in semantic-to-visual embedding. In our approach, the proposed Multi-Knowledge Fusion Network (MKFNet) takes different semantic features from multi-knowledge as input, which enables more relevant semantic features to be trained for semantic-to-visual embedding, and finally generates more generalized visual features by adaptively fusing visual features from different knowledge domains. The proposed New Feature Generator (NFG) with adaptive genetic strategy is used to enrich semantic information on the one hand, and on the other hand it greatly improves the intersection of visual features generated by MKFNet and unseen visual features. Empirically, we show that our approach can achieve significantly better performance compared to existing state-of-the-art methods on a large number of benchmarks for several ZSL tasks, including traditional ZSL, generalized ZSL and zero-shot retrieval.

preprint, 2020, arXiv

Adversarial Camouflage: Hiding Physical-World Attacks with Natural Styles

Deep neural networks (DNNs) are known to be vulnerable to adversarial examples. Existing works have mostly focused on either digital adversarial examples created via small and imperceptible perturbations, or physical-world adversarial examples created with large and less realistic distortions that are easily identified by human observers. In this paper, we propose a novel approach, called Adversarial Camouflage (AdvCam), to craft and camouflage physical-world adversarial examples into natural styles that appear legitimate to human observers. Specifically, AdvCam transfers large adversarial perturbations into customized styles, which are then "hidden" on-target object or off-target background. Experimental evaluation shows that, in both digital and physical-world scenarios, adversarial examples crafted by AdvCam are well camouflaged and highly stealthy, while remaining effective in fooling state-of-the-art DNN image classifiers. Hence, AdvCam is a flexible approach that can help craft stealthy attacks to evaluate the robustness of DNNs. AdvCam can also be used to protect private information from being detected by deep learning systems.

preprint, 2020, arXiv

Diffusion $K$-means clustering on manifolds: provable exact recovery via semidefinite relaxations

We introduce the diffusion $K$-means clustering method on Riemannian submanifolds, which maximizes the within-cluster connectedness based on the diffusion distance. The diffusion $K$-means constructs a random walk on the similarity graph with vertices as data points randomly sampled on the manifolds and edges as similarities given by a kernel that captures the local geometry of manifolds. The diffusion $K$-means is a multi-scale clustering tool that is suitable for data with non-linear and non-Euclidean geometric features in mixed dimensions. Given the number of clusters, we propose a polynomial-time convex relaxation algorithm via the semidefinite programming (SDP) to solve the diffusion $K$-means. In addition, we also propose a nuclear norm regularized SDP that is adaptive to the number of clusters. In both cases, we show that exact recovery of the SDPs for diffusion $K$-means can be achieved under suitable between-cluster separability and within-cluster connectedness of the submanifolds, which together quantify the hardness of the manifold clustering problem. We further propose the localized diffusion $K$-means by using the local adaptive bandwidth estimated from the nearest neighbors. We show that exact recovery of the localized diffusion $K$-means is fully adaptive to the local probability density and geometric structures of the underlying submanifolds.

preprint, 2020, arXiv

Entropy rigidity for 3D conservative Anosov flows and dispersing billiards

Given an integer $k \geq 5$, and a $C^k$ Anosov flow $Φ$ on some compact connected $3$-manifold preserving a smooth volume, we show that the measure of maximal entropy (MME) is the volume measure if and only if $Φ$ is $C^{k-\varepsilon}$-conjugate to an algebraic flow, for $\varepsilon>0$ arbitrarily small. Besides the rigidity, we also study the entropy flexibility, and show that the metric entropy with respect to the volume measure and the topological entropy of suspension flows over Anosov diffeomorphisms on the $2$-torus achieve all possible values subject to natural normalizations. Moreover, in the case of dispersing billiards, we show that if the measure of maximal entropy is the volume measure, then the Birkhoff Normal Form of regular periodic orbits with a homoclinic intersection is linear.

preprint, 2020, arXiv

Hanson-Wright inequality in Hilbert spaces with application to $K$-means clustering for non-Euclidean data

We derive a dimension-free Hanson-Wright inequality for quadratic forms of independent sub-gaussian random variables in a separable Hilbert space. Our inequality is an infinite-dimensional generalization of the classical Hanson-Wright inequality for finite-dimensional Euclidean random vectors. We illustrate an application to the generalized $K$-means clustering problem for non-Euclidean data. Specifically, we establish the exponential rate of convergence for a semidefinite relaxation of the generalized $K$-means, which together with a simple rounding algorithm imply the exact recovery of the true clustering structure.
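
For readers unfamiliar with the result being generalized, the classical finite-dimensional Hanson-Wright inequality is usually stated as below. This is the standard textbook form, given here for context and not taken from the paper. The symbols $K$ (a bound on the sub-gaussian norms), $c$ (an absolute constant) and $A$ (a fixed $n \times n$ matrix) follow the usual convention, and $K$ here is unrelated to the number of clusters.

```latex
% X = (X_1, \dots, X_n) has independent, mean-zero coordinates
% with sub-gaussian norms \|X_i\|_{\psi_2} \le K; A is a fixed n x n matrix.
\[
  \mathbb{P}\left( \left| X^{\top} A X - \mathbb{E}\, X^{\top} A X \right| > t \right)
  \;\le\;
  2 \exp\left( -c \, \min\left\{ \frac{t^{2}}{K^{4} \, \|A\|_{\mathrm{HS}}^{2}},\;
                                 \frac{t}{K^{2} \, \|A\|_{\mathrm{op}}} \right\} \right),
  \qquad t \ge 0 .
\]
```

The bound depends on $A$ only through its Hilbert-Schmidt and operator norms and not on $n$ directly, which is consistent with the dimension-free extension to Hilbert spaces described in the abstract.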

preprint, 2020, arXiv

Hyperspectral Images Classification Based on Multi-scale Residual Network

Because hyperspectral remote sensing images contain a lot of redundant information and the data structure is highly non-linear, traditional machine learning methods achieve low classification accuracy. The latest research shows that hyperspectral image classification based on deep convolutional neural networks has high accuracy. However, when a small amount of data is used for training, the classification accuracy of deep learning methods is greatly reduced. In order to solve the problem of low classification accuracy of existing algorithms on small samples of hyperspectral images, a multi-scale residual network is proposed. The multi-scale extraction and fusion of spatial and spectral features is realized by adding a branch structure into the residual block and using convolution kernels of different sizes in the branch. The spatial and spectral information contained in hyperspectral images is fully utilized to improve the classification accuracy. In addition, in order to improve the speed and prevent overfitting, the model uses dynamic learning rate, batch normalization (BN) and Dropout strategies. The experimental results show that the overall classification accuracy of this method is 99.07% and 99.96% on the Indian Pines and Pavia University data sets, respectively, which is better than other algorithms.

preprint, 2020, arXiv

MFL_COVID19: Quantifying Country-based Factors affecting Case Fatality Rate in Early Phase of COVID-19 Epidemic via Regularised Multi-task Feature Learning

The recent outbreak of COVID-19 has led to rapid global spread. Many countries have implemented timely, intensive suppression to minimize infections, but still saw a high case fatality rate (CFR) due to critical demand on health resources. Other country-based factors, such as sociocultural issues and ageing populations, have also influenced the practical effectiveness of interventions to reduce mortality in the early phase. Better understanding the relationship between these factors across countries and the COVID-19 CFR is of primary importance in preparing for a potential second wave of COVID-19 infections. In this paper, we propose a novel regularized multi-task learning based factor analysis approach for quantifying country-based factors affecting CFR in the early phase of the COVID-19 epidemic. We formulate the prediction of CFR progression as a machine learning (ML) regression problem with observed CFR and other country-based factors. In this formulation, all CFR-related factors are categorized into 6 sectors with 27 indicators. We propose a hybrid feature selection method combining filter, wrapper and tree-based models to calibrate initial factors for a preliminary feature interaction. We then adopt two typical single-task models (Ridge and Lasso regression) and one state-of-the-art multi-task feature learning (MTFL) method (fused sparse group lasso) in our formulation. The fused sparse group lasso (FSGL) method allows the simultaneous selection of a common set of country-based factors for multiple time points of the COVID-19 epidemic and also enables incorporating temporal smoothness of each factor over the whole early-phase period. Finally, we propose a novel temporal voting feature selection scheme to balance the weight instability of multiple factors in our MTFL model.