Researcher profile

Yuya Sasaki

Yuya Sasaki contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
22 works
0 followers
11 topics
4 close collaborators

Published work

22 published item(s)

preprint · 2026 · arXiv

Extremal Quantiles under Two-Way Clustering

This paper studies extremal quantiles under two-way clustered dependence. We show that the limiting distribution of unconditional intermediate-order tail quantiles is Gaussian. This result is notable because two-way clustering typically leads to non-Gaussian limiting behavior. Remarkably, extremal quantiles remain asymptotically Gaussian even in degenerate cases. Building on this insight, we extend our analysis to extremal quantile regression at intermediate orders. Simulation results corroborate our theoretical findings. Finally, we provide an empirical application to growth-at-risk, showing that earlier empirical conclusions remain robust even after accounting for two-way clustered dependence in panel data and the focus on extreme quantiles.

preprint · 2026 · arXiv

High-Dimensional Tail Index Regression

Motivated by the empirical observation of power-law distributions in the credits (e.g., "likes") of viral posts in social media, we introduce a high-dimensional tail index regression model and propose methods for estimation and inference of its parameters. First, we propose a regularized estimator, establish its consistency, and derive its convergence rate. Second, we debias the regularized estimator to facilitate inference and prove its asymptotic normality. Simulation studies corroborate our theoretical findings. We apply these methods to the text analysis of viral posts on X (formerly Twitter).

preprint · 2022 · arXiv

An Empirical Study of Personalized Federated Learning

Federated learning is a distributed machine learning approach in which a single server and multiple clients collaboratively build machine learning models without sharing datasets on clients. A challenging issue of federated learning is data heterogeneity (i.e., data distributions may differ across clients). To cope with this issue, numerous federated learning methods aim at personalized federated learning and build optimized models for clients. Whereas existing studies empirically evaluated their own methods, the experimental settings (e.g., comparison methods, datasets, and client settings) in these studies differ from each other, and it is unclear which personalized federated learning method achieves the best performance and how much progress can be made by using these methods instead of standard (i.e., non-personalized) federated learning. In this paper, we benchmark the performance of existing personalized federated learning methods through comprehensive experiments to evaluate the characteristics of each method. Our experimental study shows that (1) there are no champion methods, (2) large data heterogeneity often leads to highly accurate predictions, and (3) standard federated learning methods (e.g., FedAvg) with fine-tuning often outperform personalized federated learning methods. We release our benchmark tool, FedBench, for researchers to conduct experimental studies with various experimental settings.
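
For readers unfamiliar with the non-personalized baseline named in the abstract, the following is a minimal sketch of the server-side averaging step commonly associated with FedAvg. It is not taken from FedBench: the function name `fedavg_aggregate`, the toy parameter vectors and the sample counts are hypothetical, and the client-side training and fine-tuning steps are omitted.

```python
from typing import List, Sequence


def fedavg_aggregate(client_params: Sequence[Sequence[float]],
                     client_sizes: Sequence[int]) -> List[float]:
    """Average client parameter vectors, weighted by local sample counts."""
    if len(client_params) != len(client_sizes) or not client_params:
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    dim = len(client_params[0])
    averaged = [0.0] * dim
    for params, size in zip(client_params, client_sizes):
        weight = size / total  # clients with more data count for more
        for j in range(dim):
            averaged[j] += weight * params[j]
    return averaged


# Hypothetical toy round: three clients, two parameters each.
global_params = fedavg_aggregate(
    [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]],
    [10, 10, 20],
)
print(global_params)
```

Under data heterogeneity, a single averaged vector like this may fit no client particularly well, which is the gap that personalization or per-client fine-tuning is meant to close.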

preprint · 2022 · arXiv

Capital and Labor Income Pareto Exponents in the United States, 1916-2019

Accurately estimating income Pareto exponents is challenging due to limitations in data availability and the applicability of statistical methods. Using tabulated summaries of incomes from tax authorities and a recent estimation method, we estimate income Pareto exponents in the U.S. for 1916-2019. We find that during the past three decades, the capital and labor income Pareto exponents have been stable at around 1.2 and 2, respectively. Our findings suggest that top-tail income and wealth inequality is higher, and that wealthy agents have twice as large an impact on the aggregate economy, as previously thought, but there is no clear trend post-1985.
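
To make the notion of a Pareto exponent concrete, here is a small sketch that simulates Pareto-distributed incomes and recovers the exponent with the classical Hill estimator. This is a swapped-in textbook technique on individual-level data, not the tabulation-based method the abstract refers to. The value `true_alpha = 1.5`, the sample size and the choice of `k` are hypothetical.

```python
import math
import random
from typing import List


def hill_estimator(sample: List[float], k: int) -> float:
    """Hill estimate of the Pareto exponent from the k largest observations."""
    if not 0 < k < len(sample):
        raise ValueError("k must satisfy 0 < k < len(sample)")
    ordered = sorted(sample, reverse=True)
    threshold = ordered[k]  # the (k+1)-th largest value
    mean_log_excess = sum(math.log(x / threshold) for x in ordered[:k]) / k
    return 1.0 / mean_log_excess


rng = random.Random(0)
true_alpha = 1.5  # hypothetical exponent used only for this simulation
# Inverse-CDF draws from a Pareto distribution with minimum value 1.
incomes = [(1.0 - rng.random()) ** (-1.0 / true_alpha) for _ in range(20000)]
alpha_hat = hill_estimator(incomes, k=2000)
print(round(alpha_hat, 2))
```

A smaller exponent means a heavier upper tail, so an exponent near 1.2 indicates more concentration at the top than an exponent near 2.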

preprint · 2022 · arXiv

Estimation of Average Derivatives of Latent Regressors: With an Application to Inference on Buffer-Stock Saving

This paper proposes a density-weighted average derivative estimator based on two noisy measures of a latent regressor. Both measures have classical errors with possibly asymmetric distributions. We show that the proposed estimator achieves the root-n rate of convergence, and derive its asymptotic normal distribution for statistical inference. Simulation studies demonstrate excellent small-sample performance supporting the root-n asymptotic normality. Based on the proposed estimator, we construct a formal test on the sub-unity of the marginal propensity to consume out of permanent income (MPCP) under a nonparametric consumption model and a permanent-transitory model of income dynamics with nonparametric distribution. Applying the test to four recent waves of the U.S. Panel Study of Income Dynamics (PSID), we reject the null hypothesis of the unit MPCP in favor of a sub-unit MPCP, supporting the buffer-stock model of saving.

preprint · 2022 · arXiv

Fixed-k Tail Regression: New Evidence on Tax and Wealth Inequality from Forbes 400

We develop a novel fixed-k tail regression method that accommodates the unique feature in the Forbes 400 data that observations are truncated from below at the 400th largest order statistic. Applying this method, we find that higher maximum marginal income tax rates induce higher wealth Pareto exponents. Setting the maximum tax rate to 30-40% (as in the U.S. currently) leads to a Pareto exponent of 1.5-1.8, while counterfactually setting it to 80% (as suggested by Piketty, 2014) would lead to a Pareto exponent of 2.6. We present a simple economic model that explains these findings and discuss the welfare implications of taxation.

preprint · 2022 · arXiv

GNN Transformation Framework for Improving Efficiency and Scalability

We propose a framework that automatically transforms non-scalable GNNs into precomputation-based GNNs which are efficient and scalable for large-scale graphs. The advantages of our framework are twofold: (1) it transforms various non-scalable GNNs to scale well to large-scale graphs by separating local feature aggregation from weight learning in their graph convolution, and (2) it efficiently executes precomputation on GPU for large-scale graphs by decomposing their edges into small disjoint and balanced sets. Through extensive experiments with large-scale graphs, we demonstrate that the transformed GNNs train faster than existing GNNs while achieving accuracy competitive with state-of-the-art GNNs. Consequently, our transformation framework provides simple and efficient baselines for future research on scalable GNNs.

preprint · 2022 · arXiv

Inference in high-dimensional regression models without the exact or $L^p$ sparsity

This paper proposes a new method of inference in high-dimensional regression models and high-dimensional IV regression models. Estimation is based on a combined use of the orthogonal greedy algorithm, high-dimensional Akaike information criterion, and double/debiased machine learning. The method of inference for any low-dimensional subvector of high-dimensional parameters is based on a root-$N$ asymptotic normality, which is shown to hold without requiring the exact sparsity condition or the $L^p$ sparsity condition. Simulation studies demonstrate superior finite-sample performance of this proposed method over those based on the LASSO or the random forest, especially under less sparse models. We illustrate an application to production analysis with a panel of Chilean firms.

preprint · 2022 · arXiv

Predicting Parking Lot Availability by Graph-to-Sequence Model: A Case Study with SmartSantander

To improve services and urban livability, multiple smart city initiatives are being carried out throughout the world. SmartSantander is a smart city project in Santander, Spain, which has relied on wireless sensor network technologies to deploy heterogeneous sensors within the city to measure multiple parameters, including outdoor parking information. In this paper, we study the prediction of parking lot availability using historical data from more than 300 outdoor parking sensors in SmartSantander. We design a graph-to-sequence model to capture the periodical fluctuation and geographical proximity of parking lots. For developing and evaluating our model, we use a 3-year dataset of parking lot availability in the city of Santander. Our model achieves higher accuracy than existing sequence-to-sequence models and is accurate enough to provide a parking information service in the city. We apply our model to a smartphone application to be widely used by citizens and tourists.

preprint · 2022 · arXiv

Scaling Private Deep Learning with Low-Rank and Sparse Gradients

Applying Differentially Private Stochastic Gradient Descent (DPSGD) to training modern, large-scale neural networks such as transformer-based models is a challenging task, as the magnitude of noise added to the gradients at each iteration scales with model dimension, hindering the learning capability significantly. We propose a unified framework, LSG, that fully exploits the low-rank and sparse structure of neural networks to reduce the dimension of gradient updates, and hence alleviate the negative impacts of DPSGD. The gradient updates are first approximated with a pair of low-rank matrices. Then, a novel strategy is utilized to sparsify the gradients, resulting in low-dimensional, less noisy updates that are yet capable of retaining the performance of neural networks. Empirical evaluation on natural language processing and computer vision tasks shows that our method outperforms other state-of-the-art baselines.
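
The dimension dependence mentioned in the abstract can be seen in a bare-bones sketch of one DPSGD gradient computation: each per-example gradient is clipped, the clipped gradients are summed, and independent Gaussian noise is added to every coordinate. This illustrates the generic mechanism only and is not the LSG framework. The function names and parameter values are hypothetical, and no privacy accounting is performed.

```python
import math
import random
from typing import List


def clip(grad: List[float], max_norm: float) -> List[float]:
    """Scale a per-example gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_gradient(per_example_grads: List[List[float]], max_norm: float,
                    noise_multiplier: float, rng: random.Random) -> List[float]:
    """Clip each gradient, sum, add Gaussian noise per coordinate, average."""
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        for j, g in enumerate(clip(grad, max_norm)):
            total[j] += g
    sigma = noise_multiplier * max_norm  # noise standard deviation
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [v / len(per_example_grads) for v in noisy]


# The L2 norm of the noise vector grows roughly like sqrt(dimension).
rng = random.Random(0)
for dim in (10, 1000):
    noise = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    print(dim, round(math.sqrt(sum(z * z for z in noise)), 1))
```

Because the clipped signal has bounded norm while the noise norm grows with the number of coordinates, reducing the dimension of the update (for example through low-rank or sparse representations) improves the signal-to-noise ratio.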

preprint · 2022 · arXiv

Similarity Search on Computational Notebooks

Computational notebook software such as Jupyter Notebook is popular for data science tasks. Numerous computational notebooks are available on the Web and reusable; however, searching for computational notebooks manually is a tedious task, and so far, there are no tools to search for computational notebooks effectively and efficiently. In this paper, we propose a similarity search on computational notebooks and develop a new framework for the similarity search. Given contents (i.e., source code, tabular data, libraries, and output formats) in computational notebooks as a query, the similarity search problem aims to find the top-k computational notebooks with the most similar contents. We define two similarity measures: set-based and graph-based similarities. Set-based similarity handles each content independently, while graph-based similarity captures the relationships between contents. Our framework can effectively prune the candidates of computational notebooks that should not be in the top-k results. Furthermore, we develop optimization techniques such as caching and indexing to accelerate the search. Experiments using Kaggle notebooks show that our method, in particular graph-based similarity, can achieve high accuracy and high efficiency.
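
As a rough illustration of a set-based top-k search, the sketch below reduces each notebook to the set of libraries it imports and ranks notebooks by Jaccard similarity to the query. The choice of Jaccard similarity, the restriction to a single content type, and the corpus are assumptions made for illustration; the abstract does not specify the paper's measures, and this exhaustive scan omits the pruning, caching and indexing it mentions.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two sets; defined as 1.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def top_k_notebooks(query: Set[str], notebooks: Dict[str, Set[str]],
                    k: int) -> List[Tuple[str, float]]:
    """Rank notebooks by similarity to the query and keep the k best."""
    scored = [(name, jaccard(query, libs)) for name, libs in notebooks.items()]
    scored.sort(key=lambda item: (-item[1], item[0]))  # ties broken by name
    return scored[:k]


# Hypothetical corpus: each notebook is reduced to the libraries it imports.
corpus = {
    "nb_a": {"pandas", "numpy", "sklearn"},
    "nb_b": {"pandas", "matplotlib"},
    "nb_c": {"torch", "numpy"},
}
print(top_k_notebooks({"pandas", "numpy"}, corpus, k=2))
```

A graph-based measure would additionally compare how the contents relate to one another (for example, which code cell produced which table), which a per-set score like this cannot capture.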

preprint · 2022 · arXiv

Unconditional Quantile Regression with High Dimensional Data

This paper considers estimation and inference for heterogeneous counterfactual effects with high-dimensional data. We propose a novel robust score for debiased estimation of the unconditional quantile regression (Firpo, Fortin, and Lemieux, 2009) as a measure of heterogeneous counterfactual marginal effects. We propose a multiplier bootstrap inference and develop asymptotic theories to guarantee size control in large samples. Simulation studies support our theories. Applying the proposed method to Job Corps survey data, we find that a policy which counterfactually extends the duration of exposure to the Job Corps training program will be effective especially for the targeted subpopulations of lower potential wage earners.

preprint · 2021 · arXiv

Estimation and Inference for Moments of Ratios with Robustness against Large Trimming Bias

Empirical researchers often trim observations with small denominator A when they estimate moments of the form E[B/A]. Large trimming is a common practice to mitigate variance, but it incurs large trimming bias. This paper provides a novel method of correcting large trimming bias. If a researcher is willing to assume that the joint distribution of A and B is smooth, then a large trimming bias may be estimated well. With the bias correction, we also develop a valid and robust inference result for E[B/A].
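
The trade-off described above can be illustrated with a toy trimmed estimator of E[B/A]. The sketch shows only the uncorrected trimmed mean, not the paper's bias correction. The data-generating process (A uniform on [0.05, 1], B fixed at 1) and the convention that trimmed observations contribute zero while still counting in the sample size are assumptions chosen for illustration.

```python
import random
from typing import Sequence


def trimmed_ratio_mean(a: Sequence[float], b: Sequence[float],
                       threshold: float) -> float:
    """Sample mean of (b/a) * 1{|a| > threshold}.

    Trimmed observations add zero to the sum but still count in n.
    """
    if not a or len(a) != len(b):
        raise ValueError("a and b must be non-empty and of equal length")
    total = 0.0
    for a_i, b_i in zip(a, b):
        if abs(a_i) > threshold:
            total += b_i / a_i
    return total / len(a)


# Hypothetical data: A ~ Uniform(0.05, 1) and B = 1, so E[B/A] = ln(20)/0.95.
rng = random.Random(0)
a = [rng.uniform(0.05, 1.0) for _ in range(50000)]
b = [1.0] * len(a)
for h in (0.0, 0.1, 0.2):
    print(h, round(trimmed_ratio_mean(a, b, h), 3))
```

Raising the threshold discards the observations with the largest ratios, so the estimate becomes less variable but drifts away from the target; that drift is the trimming bias.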

preprint · 2021 · arXiv

Linear programming approach to nonparametric inference under shape restrictions: with an application to regression kink designs

We develop a novel method of constructing confidence bands for nonparametric regression functions under shape constraints. This method can be implemented via linear programming, and it is thus computationally appealing. We illustrate the use of our proposed method with an application to the regression kink design (RKD). Econometric analyses based on the RKD often suffer from wide confidence intervals due to slow convergence rates of nonparametric derivative estimators. We demonstrate that economic models and structures motivate shape restrictions, which in turn contribute to shrinking the confidence interval for an analysis of the causal effects of unemployment insurance benefits on unemployment durations.

preprint · 2021 · arXiv

Slow Movers in Panel Data

Panel data often contain stayers (units with no within-variation) and slow movers (units with little within-variation). In the presence of many slow movers, conventional econometric methods can fail to work. We propose a novel method of inference for the average partial effects in correlated random coefficient models that is robust across various distributions of within-variations, including cases with many stayers and/or many slow movers, in a unified manner. In addition to this robustness property, our proposed method entails smaller biases and hence improves accuracy in inference compared to existing alternatives. Simulation studies support our theoretical claims about these properties: the conventional 95% confidence interval covers the true parameter value with frequencies of 37-93%, whereas our proposed one achieves coverage frequencies of 93-96%.

preprint · 2020 · arXiv

Efficient Network Reliability Computation in Uncertain Graphs

Network reliability is an important metric to evaluate the connectivity among given vertices in uncertain graphs. Since the network reliability problem is known to be #P-complete, existing studies have used approximation techniques. In this paper, we propose a new sampling-based approach that efficiently and accurately approximates network reliability. Our approach improves efficiency by reducing the number of samples based on stratified sampling. We theoretically guarantee that our approach improves the accuracy of approximation by using lower and upper bounds of network reliability, even though it reduces the number of samples. To efficiently compute the bounds, we develop an extended BDD, called S2BDD. While constructing the S2BDD, our approach employs dynamic programming for efficiently sampling possible graphs. Our experiment with real datasets demonstrates that our approach is up to 51.2 times faster than the existing sampling-based approach with higher accuracy.
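
For context, the plain sampling baseline that such work improves upon can be sketched as follows: draw possible graphs by keeping each edge independently with its existence probability, and report the fraction in which the terminals are connected. This is ordinary Monte Carlo for the two-terminal case, not the stratified S2BDD approach from the abstract. The graph, the function names and the sample count are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]  # (endpoint, endpoint, existence probability)


def connected(edges: List[Tuple[str, str]], source: str, target: str) -> bool:
    """Depth-first search on an undirected graph given as an edge list."""
    adjacency: Dict[str, List[str]] = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)
    stack, seen = [source], {source}
    while stack:
        node = stack.pop()
        if node == target:
            return True
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def estimate_reliability(edges: List[Edge], source: str, target: str,
                         samples: int, rng: random.Random) -> float:
    """Fraction of sampled possible graphs in which source reaches target."""
    hits = 0
    for _ in range(samples):
        present = [(u, v) for u, v, p in edges if rng.random() < p]
        hits += connected(present, source, target)
    return hits / samples


# Hypothetical uncertain graph: two edges in series, each present with
# probability 0.5, so the exact s-t reliability is 0.5 * 0.5 = 0.25.
graph = [("s", "m", 0.5), ("m", "t", 0.5)]
print(estimate_reliability(graph, "s", "t", samples=20000, rng=random.Random(0)))
```

The standard error of this estimator shrinks only with the square root of the number of samples, which is why variance-reduction ideas such as stratification and analytic bounds are attractive.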

preprint · 2020 · arXiv

Estimation and Inference for Policy Relevant Treatment Effects

The policy relevant treatment effect (PRTE) measures the average effect of switching from a status-quo policy to a counterfactual policy. Estimation of the PRTE involves estimation of multiple preliminary parameters, including propensity scores, conditional expectation functions of the outcome and covariates given the propensity score, and marginal treatment effects. These preliminary estimators can affect the asymptotic distribution of the PRTE estimator in complicated and intractable manners. In this light, we propose an orthogonal score for double debiased estimation of the PRTE, whereby the asymptotic distribution of the PRTE estimator is obtained without any influence of preliminary parameter estimators as long as they satisfy mild requirements of convergence rates. To our knowledge, this paper is the first to develop limit distribution theories for inference about the PRTE.

preprint · 2020 · arXiv

Fixed-k Inference for Conditional Extremal Quantiles

We develop a new extreme value theory for repeated cross-sectional and panel data to construct asymptotically valid confidence intervals (CIs) for conditional extremal quantiles from a fixed number $k$ of nearest-neighbor tail observations. As a by-product, we also construct CIs for extremal quantiles of coefficients in linear random coefficient models. For any fixed $k$, the CIs are uniformly valid without parametric assumptions over a set of nonparametric data generating processes associated with various tail indices. Simulation studies show that our CIs exhibit small-sample coverage and length properties superior to those of alternative nonparametric methods based on asymptotic normality. Applying the proposed method to Natality Vital Statistics, we study factors of extremely low birth weights. We find that signs of major effects are the same as those found in preceding studies based on parametric models, but with different magnitudes.

preprint · 2020 · arXiv

Multiway Cluster Robust Double/Debiased Machine Learning

This paper investigates double/debiased machine learning (DML) under multiway clustered sampling environments. We propose a novel multiway cross fitting algorithm and a multiway DML estimator based on this algorithm. We also develop a multiway cluster robust standard error formula. Simulations indicate that the proposed procedure has favorable finite sample performance. Applying the proposed method to market share data for demand analysis, we obtain larger two-way cluster robust standard errors than non-robust ones.
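
One way to picture cross fitting under two-way clustering is to split both cluster dimensions into folds and, for every pair of folds, hold out the block of cells they define while training only on cells that share neither a row index nor a column index with that block. The sketch below builds such index sets. Whether this matches the paper's algorithm in every detail is an assumption, and the function names and grid sizes are hypothetical.

```python
from itertools import product
from typing import Iterator, List, Tuple

Cell = Tuple[int, int]  # (row cluster index, column cluster index)


def split_folds(n: int, k: int) -> List[List[int]]:
    """Partition indices 0..n-1 into k interleaved folds."""
    return [list(range(start, n, k)) for start in range(k)]


def two_way_cross_fit_blocks(n_rows: int, n_cols: int,
                             k: int) -> Iterator[Tuple[List[Cell], List[Cell]]]:
    """Yield (eval_cells, train_cells) for each pair of row and column folds."""
    row_folds = split_folds(n_rows, k)
    col_folds = split_folds(n_cols, k)
    for rows, cols in product(row_folds, col_folds):
        row_set, col_set = set(rows), set(cols)
        eval_cells = [(i, j) for i in rows for j in cols]
        # Training cells avoid every held-out row and every held-out column.
        train_cells = [(i, j)
                       for i in range(n_rows) if i not in row_set
                       for j in range(n_cols) if j not in col_set]
        yield eval_cells, train_cells


for eval_cells, train_cells in two_way_cross_fit_blocks(4, 4, k=2):
    print(len(eval_cells), len(train_cells))
```

Nuisance functions would be fitted on `train_cells` and the score evaluated on `eval_cells`; with `k` of at least 2 the training set is non-empty, and the held-out blocks together cover every cell exactly once.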

preprint · 2020 · arXiv

Quantile Regression with Interval Data

This paper investigates the identification of quantiles and quantile regression parameters when observations are set valued. We define the identification set of quantiles of random sets in a way that extends the definition of quantiles for regular random variables. We then give a sharp characterization of this set by extending concepts from random set theory. For quantile regression parameters, we show that the identification set is characterized by a system of conditional moment inequalities. This characterization extends that of parametric quantile regression for regular random variables. Estimation and inference theories are developed for continuous cases, discrete cases, nonparametric conditional quantiles, and parametric quantile regressions. A fast computational algorithm of set linear programming is proposed. Monte Carlo experiments support our theoretical properties.

preprint · 2020 · arXiv

Sequenced Route Query with Semantic Hierarchy

The trip planning query searches for preferred routes starting from a given point through multiple Points of Interest (PoIs) that match user requirements. Although previous studies have investigated trip planning queries, they lack flexibility for finding routes because all of them output routes that strictly match user requirements. We study trip planning queries that output multiple routes in a flexible manner. We propose a new type of query called the skyline sequenced route (SkySR) query, which searches for all preferred sequenced routes to users by extending the shortest route search with the semantic similarity of PoIs in the route. Flexibility is achieved by the semantic hierarchy of the PoI category. We propose an efficient algorithm for the SkySR query, the bulk SkySR algorithm, which simultaneously searches for sequenced routes and prunes unnecessary routes effectively. Experimental evaluations show that the proposed approach significantly outperforms the existing approaches in terms of response time (up to four orders of magnitude). Moreover, we develop a prototype service that uses the SkySR query, and conduct a user test to evaluate its usefulness.
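
The "skyline" in the query name refers to keeping only routes that are not dominated by another route on every criterion. The sketch below applies that idea to candidate routes scored by length and by a semantic mismatch value, both lower-is-better. The two-score representation, the candidate routes and the quadratic scan are assumptions for illustration; this is not the bulk SkySR algorithm.

```python
from typing import List, Tuple

# (label, route length, semantic mismatch); lower is better for both scores.
Route = Tuple[str, float, float]


def dominates(p: Route, q: Route) -> bool:
    """p dominates q if it is no worse on both scores and better on one."""
    no_worse = p[1] <= q[1] and p[2] <= q[2]
    strictly_better = p[1] < q[1] or p[2] < q[2]
    return no_worse and strictly_better


def skyline(routes: List[Route]) -> List[Route]:
    """Keep the routes that no other route dominates."""
    return [r for r in routes if not any(dominates(o, r) for o in routes)]


# Hypothetical candidates for one sequenced route request.
candidates = [
    ("exact match, long detour", 9.0, 0.0),
    ("sibling category, short", 4.0, 0.3),
    ("sibling category, longer", 6.0, 0.3),
    ("loose match, shortest", 3.0, 0.8),
]
for route in skyline(candidates):
    print(route)
```

Returning the whole skyline rather than a single best route lets the user trade a shorter trip against a looser category match.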

preprint · 2020 · arXiv

Testing Finite Moment Conditions for the Consistency and the Root-N Asymptotic Normality of the GMM and M Estimators

Common approaches to inference for structural and reduced-form parameters in empirical economic analysis are based on the consistency and the root-n asymptotic normality of the GMM and M estimators. The canonical consistency (respectively, root-n asymptotic normality) for these classes of estimators requires at least the first (respectively, second) moment of the score to be finite. In this article, we present a method of testing these conditions for the consistency and the root-n asymptotic normality of the GMM and M estimators. The proposed test controls size nearly uniformly over the set of data generating processes that are compatible with the null hypothesis. Simulation studies support this theoretical result. Applying the proposed test to the market share data from the Dominick's Finer Foods retail chain, we find that a common ad hoc procedure to deal with zero market shares in analysis of differentiated products markets results in a failure to satisfy the conditions for both the consistency and the root-n asymptotic normality.