Source author record

Yuhong Yang

Yuhong Yang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

math.ST, Statistics Theory, Machine Learning, Methodology, q-fin.EC, Applications, Computer Vision, eess.AS, eess.SP, math.CO, Robotics, Sound

Catalog footprint

What is connected

15 works

12 topics

4 close collaborators


preprint, 2023, arXiv

Selector-Enhancer: Learning Dynamic Selection of Local and Non-local Attention Operation for Speech Enhancement

Attention mechanisms, such as local and non-local attention, play a fundamental role in recent deep learning based speech enhancement (SE) systems. However, natural speech contains many fast-changing and relatively brief acoustic events, so capturing the most informative speech features by indiscriminately using local and non-local attention is challenging. We observe that the noise type and speech features vary within a speech sequence, and that local and non-local operations extract different features from corrupted speech. To leverage this, we propose Selector-Enhancer, a dual-attention based convolutional neural network (CNN) with a feature-filter that can dynamically select regions from low-resolution speech features and feed them to local or non-local attention operations. In particular, the proposed feature-filter is trained by using reinforcement learning (RL) with a developed difficulty-regulated reward that is related to network performance, model complexity, and "the difficulty of the SE task". The results show that our method achieves comparable or superior performance to existing approaches. In particular, Selector-Enhancer is potentially effective for real-world denoising, where the number and types of noise vary within a single noisy mixture.

preprint, 2022, arXiv

Combining Predictions of Auto Insurance Claims

This paper aims to better predict highly skewed auto insurance claims by combining candidate predictions. We analyze a version of the Kangaroo Auto Insurance company data and study the effects of combining different methods using five measures of prediction accuracy. The results show the following. First, when there is an outstanding (in terms of Gini Index) prediction among the candidates, the "forecast combination puzzle" phenomenon disappears. The simple average method performs much worse than the more sophisticated model combination methods, indicating that combining different methods could help us avoid performance degradation. Second, the choice of the prediction accuracy measure is crucial in defining the best candidate prediction for "low frequency and high severity" (LFHS) data. For example, mean square error (MSE) does not distinguish well between model combination methods, as the values are close. Third, the performances of different model combination methods can differ drastically. We propose using a new model combination method, named ARM-Tweedie, for such LFHS data; it benefits from an optimal rate of convergence and exhibits a desirable performance in several measures for the Kangaroo data. Fourth, overall, model combination methods improve the prediction accuracy for auto insurance claim costs. In particular, Adaptive Regression by Mixing (ARM), ARM-Tweedie, and constrained Linear Regression can improve forecast performance when there are only weak learners or when no dominant learner exists.

preprint, 2022, arXiv

Is a Classification Procedure Good Enough? A Goodness-of-Fit Assessment Tool for Classification Learning

In recent years, many non-traditional classification methods, such as Random Forest, Boosting, and neural networks, have been widely used in applications. Their performance is typically measured in terms of classification accuracy. While the classification error rate and the like are important, they do not address a fundamental question: Is the classification method underfitted? To the best of our knowledge, there is no existing method that can assess the goodness-of-fit of a general classification procedure. Indeed, the lack of a parametric assumption makes it challenging to construct proper tests. To overcome this difficulty, we propose a methodology called BAGofT that splits the data into a training set and a validation set. First, the classification procedure to assess is applied to the training set, which is also used to adaptively find a data grouping that reveals the most severe regions of underfitting. Then, based on this grouping, we calculate a test statistic by comparing the estimated success probabilities and the actual observed responses from the validation set. The data splitting guarantees that the size of the test is controlled under the null hypothesis, and the power of the test goes to one as the sample size increases under the alternative hypothesis. For testing parametric classification models, the BAGofT has a broader scope than the existing methods since it is not restricted to specific parametric models (e.g., logistic regression). Extensive simulation studies show the utility of the BAGofT when assessing general classification procedures and its strengths over some existing methods when testing parametric classification models.

preprint, 2022, arXiv

On joins of a clique and a co-clique as star complements in regular graphs

In this paper we consider $r$-regular graphs $G$ that admit a vertex set partition such that one of the induced subgraphs is the join of an $s$-vertex clique and a $t$-vertex co-clique and represents a star complement for an eigenvalue $μ$ of $G$. The cases in which one of the parameters $s, t$ is less than 2 or $μ=r$ are already resolved. It is conjectured in [J. Wang, X. Yuan, L. Liu, Regular graphs with a prescribed complete multipartite graph as a star complement, Linear Algebra Appl.~579 (2019) 302--319] that if $s, t\geq 2$ and $μ\neq r$, then $μ=-2, t=2$ and $G=\overline{(s+1)K_2}$. For $μ=-t$ we verify this conjecture to be true. We further study the case in which $μ\neq-t$ and confirm the conjecture provided $t^2-4μ^2t-4μ^3=0$. For the remaining possibility we determine the structure of a putative counterexample and relate its existence to the existence of a particular 2-class block design. It turns out that the smallest counterexample would have 1265 vertices.

preprint, 2022, arXiv

Targeted Cross-Validation

In many applications, we have access to the complete dataset but are only interested in the prediction of a particular region of predictor variables. A standard approach is to find the globally best modeling method from a set of candidate methods. However, it is perhaps rare in reality that one candidate method is uniformly better than the others. A natural approach for this scenario is to apply a weighted $L_2$ loss in performance assessment to reflect the region-specific interest. We propose a targeted cross-validation (TCV) to select models or procedures based on a general weighted $L_2$ loss. We show that the TCV is consistent in selecting the best performing candidate under the weighted $L_2$ loss. Experimental studies are used to demonstrate the use of TCV and its potential advantage over the global CV or the approach of using only local data for modeling a local region. Previous investigations on CV have relied on the condition that when the sample size is large enough, the ranking of two candidates stays the same. However, in many applications with the setup of changing data-generating processes or highly adaptive modeling methods, the relative performance of the methods is not static as the sample size varies. Even with a fixed data-generating process, it is possible that the ranking of two methods switches infinitely many times. In this work, we broaden the concept of selection consistency by allowing the best candidate to switch as the sample size varies, and then establish the consistency of the TCV. This flexible framework can be applied to high-dimensional and complex machine learning scenarios where the relative performances of modeling procedures are dynamic.
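The role of the weighted $L_2$ loss can be illustrated with a minimal sketch. It is not the TCV procedure itself: it assumes the candidates are already fitted and scores them on a single held-out set, whereas the paper selects among modeling procedures through data splitting. The names `weighted_l2`, `targeted_select`, and the toy candidates `line` and `flat` are hypothetical and chosen only for illustration.

```python
def weighted_l2(y_true, y_pred, weights):
    """Weighted mean squared error; the weights encode the region of interest."""
    total = sum(weights)
    return sum(w * (yt - yp) ** 2
               for w, yt, yp in zip(weights, y_true, y_pred)) / total


def targeted_select(candidates, x_val, y_val, weight_fn):
    """Return the candidate with the smallest weighted L2 loss on held-out data.

    candidates: dict mapping a name to an already fitted predictor f(x).
    weight_fn:  nonnegative weight function over the predictor space (assumed).
    """
    weights = [weight_fn(x) for x in x_val]
    losses = {name: weighted_l2(y_val, [f(x) for x in x_val], weights)
              for name, f in candidates.items()}
    return min(losses, key=losses.get), losses


# Toy held-out data: y follows the line y = x below 0.5 and is flat at 1 above.
x_val = [i / 10 for i in range(11)]
y_val = [x if x < 0.5 else 1.0 for x in x_val]
candidates = {"line": lambda x: x, "flat": lambda x: 1.0}

# Uniform weights correspond to a global comparison.
global_best, _ = targeted_select(candidates, x_val, y_val, lambda x: 1.0)
# Weights concentrated on x >= 0.5 express interest in that region only.
target_best, _ = targeted_select(candidates, x_val, y_val,
                                 lambda x: 1.0 if x >= 0.5 else 0.0)
```

In this toy setup the global comparison and the targeted comparison prefer different candidates, which is the situation the abstract describes: no candidate is uniformly best, so the choice depends on where predictions are needed.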

preprint, 2021, arXiv

When Face Recognition Meets Occlusion: A New Benchmark

The existing face recognition datasets usually lack occlusion samples, which hinders the development of face recognition. Especially during the COVID-19 coronavirus epidemic, wearing a mask has become an effective means of preventing the spread of the virus. Traditional CNN-based face recognition models trained on existing datasets are almost ineffective for heavy occlusion. To this end, we pioneer a simulated occlusion face recognition dataset. In particular, we first collect a variety of glasses and masks as occlusion, and randomly combine the occlusion attributes (occlusion objects, textures, and colors) to achieve a large number of more realistic occlusion types. We then place them at the proper position on the face image, following normal occlusion habits. Furthermore, we reasonably combine original normal face images and occluded face images to form our final dataset, termed Webface-OCC. It covers 804,704 face images of 10,575 subjects, with diverse occlusion types to ensure its diversity and stability. Extensive experiments on public datasets show that ArcFace retrained on our dataset significantly outperforms the state of the art. Webface-OCC is available at https://github.com/Baojin-Huang/Webface-OCC.

preprint, 2020, arXiv

Error Model of Radio Fingerprint and PDR Fusion Indoor Localization

Multi-source fusion positioning is one of the technical frameworks for obtaining sufficient indoor positioning accuracy. In order to evaluate the effect of multi-source fusion positioning, it is necessary to establish a fusion error model. In this paper, we first use the least squares method to fuse the radio fingerprint and the PDR positioning, and then apply the variance propagation laws to calculate the error distribution of indoor multi-source localization methods. Based on the fusion error model, we developed an indoor positioning simulation system. The system can give a better positioning source layout scheme under a given condition, and can evaluate the signal strength distribution and the error distribution.
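As a rough illustration of least-squares fusion with variance propagation, the sketch below fuses two independent, unbiased estimates of one coordinate by inverse-variance weighting. The independence assumption, the one-dimensional setting, and the function name `fuse_estimates` are assumptions made for this example; the paper's actual error model may differ.

```python
def fuse_estimates(est_a, var_a, est_b, var_b):
    """Weighted least-squares fusion of two independent estimates of one coordinate.

    Each estimate is weighted by the inverse of its error variance. Under the
    independence assumption, variance propagation gives the fused variance
    1 / (1/var_a + 1/var_b), which is never larger than either input variance.
    """
    w_a, w_b = 1.0 / var_a, 1.0 / var_b
    fused = (w_a * est_a + w_b * est_b) / (w_a + w_b)
    fused_var = 1.0 / (w_a + w_b)
    return fused, fused_var


# Hypothetical numbers: a fingerprint fix at 10 m (variance 4 m^2)
# and a PDR fix at 12 m (variance 1 m^2) along the same axis.
position, variance = fuse_estimates(10.0, 4.0, 12.0, 1.0)
```

The fused position lies closer to the more precise source, and the fused variance is smaller than both inputs, which is why combining sources can improve accuracy even when one source is noticeably worse.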

preprint, 2020, arXiv

To update or not to update? Delayed Nonparametric Bandits with Randomized Allocation

The delayed rewards problem in contextual bandits has been of interest in various practical settings. We study randomized allocation strategies and provide an understanding of how the exploration-exploitation tradeoff is affected by delays in observing the rewards. In randomized strategies, the extent of exploration-exploitation is controlled by a user-determined exploration probability sequence. In the presence of delayed rewards, one may choose between using the original exploration sequence that updates at every time point and updating the sequence only when a new reward is observed, leading to two competing strategies. In this work, we show that while both strategies may lead to strong consistency in allocation, the property holds for a wider scope of situations for the latter. However, for finite sample performance, we illustrate that both strategies have their own advantages and disadvantages, depending on the severity of the delay and the underlying reward generating mechanisms.

preprint, 2016, arXiv

Bridging AIC and BIC: a new criterion for autoregression

We introduce a new criterion to determine the order of an autoregressive model fitted to time series data. It has the benefits of the two well-known model selection techniques, the Akaike information criterion and the Bayesian information criterion. When the data is generated from a finite order autoregression, the Bayesian information criterion is known to be consistent, and so is the new criterion. When the true order is infinity or suitably high with respect to the sample size, the Akaike information criterion is known to be efficient in the sense that its prediction performance is asymptotically equivalent to the best offered by the candidate models; in this case, the new criterion behaves in a similar manner. Different from the two classical criteria, the proposed criterion adaptively achieves either consistency or efficiency depending on the underlying true model. In practice where the observed time series is given without any prior information about the model specification, the proposed order selection criterion is more flexible and robust compared with classical approaches. Numerical results are presented demonstrating the adaptivity of the proposed technique when applied to various datasets.
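For orientation, the sketch below computes the two classical criteria that the abstract compares, not the proposed bridging criterion. It assumes Yule-Walker fitting via the Levinson-Durbin recursion and the common forms AIC(k) = n log(v_k) + 2k and BIC(k) = n log(v_k) + k log(n), where v_k is the estimated innovation variance of the AR(k) fit. All function names are hypothetical.

```python
import math
import random


def autocovariances(x, max_lag):
    """Biased sample autocovariances r[0..max_lag] (divisor n)."""
    n = len(x)
    mean = sum(x) / n
    return [sum((x[t] - mean) * (x[t + h] - mean) for t in range(n - h)) / n
            for h in range(max_lag + 1)]


def innovation_variances(r, max_order):
    """Innovation variance of the Yule-Walker AR(k) fit for k = 0..max_order."""
    phi, v = [], r[0]
    variances = [v]
    for k in range(1, max_order + 1):
        # Reflection coefficient (partial autocorrelation at lag k).
        kappa = (r[k] - sum(phi[j] * r[k - 1 - j] for j in range(k - 1))) / v
        phi = [phi[j] - kappa * phi[k - 2 - j] for j in range(k - 1)] + [kappa]
        v *= 1.0 - kappa * kappa
        variances.append(v)
    return variances


def select_order(x, max_order):
    """Return (order chosen by AIC, order chosen by BIC)."""
    n = len(x)
    variances = innovation_variances(autocovariances(x, max_order), max_order)
    aic = [n * math.log(v) + 2 * k for k, v in enumerate(variances)]
    bic = [n * math.log(v) + k * math.log(n) for k, v in enumerate(variances)]
    return aic.index(min(aic)), bic.index(min(bic))


def simulate_ar1(n, phi, seed=0):
    """Simulate a Gaussian AR(1) series (toy data for the example)."""
    rng = random.Random(seed)
    x, prev = [], 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0.0, 1.0)
        x.append(prev)
    return x


series = simulate_ar1(500, 0.6)
aic_order, bic_order = select_order(series, max_order=6)
```

Because the BIC penalty per parameter, log(n), exceeds the AIC penalty of 2 once n is at least 8, the order chosen by BIC is never larger than the one chosen by AIC. That gap between the two is what an adaptive criterion has to bridge.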

preprint, 2016, arXiv

Sparsity Oriented Importance Learning for High-dimensional Linear Regression

With now well-recognized non-negligible model selection uncertainty, data analysts should no longer be satisfied with the output of a single final model from a model selection process, regardless of its sophistication. To improve reliability and reproducibility in model choice, one constructive approach is to make good use of a sound variable importance measure. Although interesting importance measures are available and increasingly used in data analysis, little theoretical justification has been provided. In this paper, we propose a new variable importance measure, sparsity oriented importance learning (SOIL), for high-dimensional regression from a sparse linear modeling perspective by taking into account the variable selection uncertainty via the use of a sensible model weighting. The SOIL method is theoretically shown to have the inclusion/exclusion property: When the model weights are properly concentrated around the true model, the SOIL importance can well separate the variables in the true model from the rest. In particular, even if the signal is weak, SOIL rarely gives variables not in the true model significantly higher importance values than those in the true model. Extensive simulations in several illustrative settings and real data examples with guided simulations show desirable properties of the SOIL importance in contrast to other importance measures.

preprint, 2015, arXiv

Forecast Combination Under Heavy-Tailed Errors

Forecast combination has been proven to be a very important technique to obtain accurate predictions. In many applications, forecast errors exhibit heavy-tailed behavior for various reasons. Unfortunately, to our knowledge, little has been done to deal with forecast combination for such situations. The familiar forecast combination methods, such as simple average, least squares regression, or those based on variance-covariance of the forecasts, may perform very poorly. In this paper, we propose two nonparametric forecast combination methods to address the problem. One is specially proposed for situations in which the forecast errors are strongly believed to have heavy tails that can be modeled by a scaled Student's t-distribution; the other is designed for relatively more general situations when there is a lack of strong or consistent evidence on the tail behaviors of the forecast errors due to shortage of data and/or an evolving data generating process. Adaptive risk bounds of both methods are developed. Simulations and a real example show superior performance of the new methods.

preprint, 2015, arXiv

On the Forecast Combination Puzzle

It is often reported in forecast combination literature that a simple average of candidate forecasts is more robust than sophisticated combining methods. This phenomenon is usually referred to as the "forecast combination puzzle". Motivated by this puzzle, we explore its possible explanations including estimation error, invalid weighting formulas and model screening. We show that existing understanding of the puzzle should be complemented by the distinction of different forecast combination scenarios known as combining for adaptation and combining for improvement. Applying combining methods without consideration of the underlying scenario can itself cause the puzzle. Based on our new understandings, both simulations and real data evaluations are conducted to illustrate the causes of the puzzle. We further propose a multi-level AFTER strategy that can integrate the strengths of different combining methods and adapt intelligently to the underlying scenario. In particular, by treating the simple average as a candidate forecast, the proposed strategy is shown to avoid the heavy cost of estimation error and, to a large extent, solve the forecast combination puzzle.
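The idea of treating the simple average as one more candidate can be sketched with a basic exponential weighting scheme driven by past squared errors. This is only in the spirit of AFTER-type combining and is not the multi-level AFTER strategy of the paper: it assumes a fixed, common error variance (the `variance` parameter), and the function names are hypothetical.

```python
import math


def exp_weights(cum_sq_errors, variance=1.0):
    """Weights proportional to exp(-cumulative squared error / (2 * variance))."""
    base = min(cum_sq_errors)  # subtract the minimum for numerical stability
    raw = [math.exp(-(e - base) / (2.0 * variance)) for e in cum_sq_errors]
    total = sum(raw)
    return [r / total for r in raw]


def combine_sequentially(forecasts, outcomes, variance=1.0):
    """Combine candidate forecasts period by period.

    forecasts: list over time of lists of candidate forecasts.
    The simple average of the candidates is appended as an extra candidate,
    so it can win the weighting when no single forecast dominates.
    """
    n_cands = len(forecasts[0]) + 1
    cum = [0.0] * n_cands
    combined = []
    for f_t, y_t in zip(forecasts, outcomes):
        cands = list(f_t) + [sum(f_t) / len(f_t)]
        w = exp_weights(cum, variance)      # weights use past errors only
        combined.append(sum(wi * ci for wi, ci in zip(w, cands)))
        cum = [c + (y_t - ci) ** 2 for c, ci in zip(cum, cands)]
    return combined, exp_weights(cum, variance)


# Toy data: the first candidate is exact, the second is always off by 2.
outcomes = [1.0, 2.0, 3.0, 4.0]
forecasts = [[1.0, 3.0], [2.0, 4.0], [3.0, 5.0], [4.0, 6.0]]
combined, final_weights = combine_sequentially(forecasts, outcomes)
```

With no history, all candidates start with equal weight, so the first combined forecast equals the simple average. As errors accumulate, the weight shifts toward whichever candidate, possibly the simple average itself, has performed best.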

preprint, 2012, arXiv

Adaptive Minimax Estimation over Sparse $\ell_q$-Hulls

Given a dictionary of $M_n$ initial estimates of the unknown true regression function, we aim to construct linearly aggregated estimators that target the best performance among all the linear combinations under a sparse $q$-norm ($0 \leq q \leq 1$) constraint on the linear coefficients. Besides identifying the optimal rates of aggregation for these $\ell_q$-aggregation problems, our multi-directional (or universal) aggregation strategies by model mixing or model selection achieve the optimal rates simultaneously over the full range of $0\leq q \leq 1$ for general $M_n$ and upper bound $t_n$ of the $q$-norm. Both random and fixed designs, with known or unknown error variance, are handled, and the $\ell_q$-aggregations examined in this work cover major types of aggregation problems previously studied in the literature. Consequences on minimax-rate adaptive regression under $\ell_q$-constrained true coefficients ($0 \leq q \leq 1$) are also provided. Our results show that the minimax rate of $\ell_q$-aggregation ($0 \leq q \leq 1$) is basically determined by an effective model size, which is a sparsity index that depends on $q$, $t_n$, $M_n$, and the sample size $n$ in an easily interpretable way based on a classical model selection theory that deals with a large number of models. In addition, in the fixed design case, the model selection approach is seen to yield optimal rates of convergence not only in expectation but also with exponential decay of deviation probability. In contrast, the model mixing approach can have leading constant one in front of the target risk in the oracle inequality while not offering optimality in deviation probability.

preprint, 2012, arXiv

Parametric or nonparametric? A parametricness index for model selection

In the model selection literature, two classes of criteria perform well asymptotically in different situations: the Bayesian information criterion (BIC) (as a representative) is consistent in selection when the true model is finite dimensional (parametric scenario); Akaike's information criterion (AIC) is asymptotically efficient when the true model is infinite dimensional (nonparametric scenario). But there is little work that addresses whether and how one can detect which situation a specific model selection problem is in. In this work, we differentiate the two scenarios theoretically under some conditions. We develop a measure, the parametricness index (PI), to assess whether a model selected by a potentially consistent procedure can be practically treated as the true model, which also hints at whether AIC or BIC is better suited to the data for the goal of estimating the regression function. A consequence is that by switching between AIC and BIC based on the PI, the resulting regression estimator is simultaneously asymptotically efficient for both parametric and nonparametric scenarios. In addition, we systematically investigate the behaviors of PI in simulation and real data and show its usefulness.

preprint, 2010, arXiv

Maximum L$q$-likelihood estimation

In this paper, the maximum L$q$-likelihood estimator (ML$q$E), a new parameter estimator based on nonextensive entropy [Kibernetika 3 (1967) 30--35] is introduced. The properties of the ML$q$E are studied via asymptotic analysis and computer simulations. The behavior of the ML$q$E is characterized by the degree of distortion $q$ applied to the assumed model. When $q$ is properly chosen for small and moderate sample sizes, the ML$q$E can successfully trade bias for precision, resulting in a substantial reduction of the mean squared error. When the sample size is large and $q$ tends to 1, a necessary and sufficient condition to ensure a proper asymptotic normality and efficiency of ML$q$E is established.
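A small sketch may help make the role of $q$ concrete. It uses the deformed logarithm $L_q(u) = (u^{1-q} - 1)/(1-q)$, which reduces to $\log u$ as $q \to 1$, and estimates a normal mean with known scale by maximizing $\sum_i L_q(f(x_i; \mu))$. The fixed-point iteration, the known-scale assumption, and the function names are choices made for this example rather than the paper's procedure.

```python
import math


def lq(u, q):
    """Deformed logarithm L_q(u); equals log(u) when q = 1."""
    if u <= 0:
        raise ValueError("u must be positive")
    if abs(q - 1.0) < 1e-12:
        return math.log(u)
    return (u ** (1.0 - q) - 1.0) / (1.0 - q)


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mlqe_normal_mean(data, q, sigma=1.0, iters=200):
    """Estimate a normal mean (known sigma) by maximizing the Lq-likelihood.

    Setting the derivative of sum_i L_q(f(x_i; mu)) to zero gives a weighted
    mean with weights f(x_i; mu)^(1 - q); the loop iterates that equation.
    For q = 1 all weights are 1 and the result is the sample mean (the MLE).
    """
    mu = sum(data) / len(data)
    for _ in range(iters):
        w = [normal_pdf(x, mu, sigma) ** (1.0 - q) for x in data]
        mu = sum(wi * xi for wi, xi in zip(w, data)) / sum(w)
    return mu


# Toy sample: five points near 0 and one gross outlier.
sample = [0.1, -0.2, 0.0, 0.3, -0.1, 8.0]
mle = mlqe_normal_mean(sample, q=1.0)
mlqe = mlqe_normal_mean(sample, q=0.8)
```

For $q < 1$, observations with low density under the current fit receive small weights, so the estimate stays near the bulk of the data. This downweighting is the distortion that lets the estimator trade a little bias for lower variance.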
