Researcher profile

Kuan Wang

Kuan Wang contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 (Emerging) · Verification L1 · Unclaimed author
6 works
0 followers
4 topics
4 close collaborators

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Published work

6 published items

Preprint · 2020 · arXiv

APQ: Joint Search for Network Architecture, Pruning and Quantization Policy

We present APQ for efficient deep learning inference on resource-constrained hardware. Unlike previous methods that search the neural architecture, pruning policy, and quantization policy separately, we optimize them jointly. To deal with the larger design space this creates, a promising approach is to train a quantization-aware accuracy predictor that quickly estimates the accuracy of a quantized model, which is then fed to the search engine to select the best fit. However, training this quantization-aware accuracy predictor requires collecting a large number of quantized <model, accuracy> pairs, which involves quantization-aware finetuning and is therefore highly time-consuming. To tackle this challenge, we propose to transfer knowledge from a full-precision (i.e., fp32) accuracy predictor to the quantization-aware (i.e., int8) accuracy predictor, which greatly improves sample efficiency. In addition, collecting the dataset for the fp32 accuracy predictor only requires evaluating neural networks sampled from a pretrained once-for-all network, without any training cost, which is highly efficient. Extensive experiments on ImageNet demonstrate the benefits of our joint optimization approach. At the same accuracy, APQ reduces latency/energy by 2x/1.3x over MobileNetV2+HAQ. Compared to the separate optimization approach (ProxylessNAS+AMC+HAQ), APQ achieves 2.3% higher ImageNet accuracy while reducing GPU hours and CO2 emissions by orders of magnitude, pushing the frontier for environmentally friendly green AI. The code and video are publicly available.
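The predictor-transfer idea can be illustrated with a toy sketch. Everything here is an assumption for illustration, not APQ's implementation: architectures are encoded as hypothetical 3-element feature vectors, the accuracy predictor is a linear model fitted by full-batch gradient descent (the abstract does not describe APQ's predictor), and the "ground-truth" fp32 and int8 accuracies are synthetic functions. The sketch warm-starts the int8 predictor from the fp32 one and compares it with a cold start given the same small budget of int8 samples and training steps.

```python
import random

def predict(params, x):
    """Linear accuracy predictor: params[0] is the bias, params[1:] are feature weights."""
    return params[0] + sum(w * xi for w, xi in zip(params[1:], x))

def fit(samples, params, lr=0.3, epochs=1000):
    """Full-batch gradient descent on mean squared error, starting from `params`."""
    params = list(params)
    n = len(samples)
    for _ in range(epochs):
        grad = [0.0] * len(params)
        for x, y in samples:
            err = predict(params, x) - y
            grad[0] += 2.0 * err / n
            for i, xi in enumerate(x, start=1):
                grad[i] += 2.0 * err * xi / n
        params = [p - lr * g for p, g in zip(params, grad)]
    return params

def mse(params, samples):
    return sum((predict(params, x) - y) ** 2 for x, y in samples) / len(samples)

# Hypothetical ground truth: int8 accuracy is fp32 accuracy minus a small,
# architecture-dependent drop. The 3 features stand in for an architecture encoding.
def acc_fp32(x):
    return 0.70 + 0.10 * x[0] + 0.05 * x[1] - 0.02 * x[2]

def acc_int8(x):
    return acc_fp32(x) - 0.01 - 0.01 * x[2]

rng = random.Random(0)

def sample_archs(k):
    return [[rng.random() for _ in range(3)] for _ in range(k)]

# Cheap: many fp32 pairs (in APQ these come from a once-for-all network, no training).
fp32_data = [(x, acc_fp32(x)) for x in sample_archs(200)]
# Expensive: few int8 pairs (each would need quantization-aware finetuning).
int8_train = [(x, acc_int8(x)) for x in sample_archs(10)]
int8_test = [(x, acc_int8(x)) for x in sample_archs(100)]

fp32_predictor = fit(fp32_data, [0.0] * 4)
transferred = fit(int8_train, fp32_predictor, epochs=10)  # warm start from fp32
scratch = fit(int8_train, [0.0] * 4, epochs=10)           # cold start, same budget

print(f"fp32 -> int8 transfer, test MSE: {mse(transferred, int8_test):.2e}")
print(f"int8 from scratch,     test MSE: {mse(scratch, int8_test):.2e}")
```

Because the synthetic int8 accuracy differs from the fp32 one only by a small shift, the warm-started predictor begins close to its target, while the cold start has not converged within the same budget. That gap is the sample-efficiency argument in miniature.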

Preprint · 2020 · arXiv

Hardware-Centric AutoML for Mixed-Precision Quantization

Model quantization is a widely used technique to compress and accelerate deep neural network (DNN) inference. Emerging DNN hardware accelerators are beginning to support mixed precision (1-8 bits) to further improve computation efficiency, which makes it challenging to find the optimal bitwidth for each layer: it requires domain experts to explore a vast design space trading off accuracy, latency, energy, and model size, which is both time-consuming and sub-optimal. Conventional quantization algorithms ignore differences between hardware architectures and quantize all layers uniformly. In this paper, we introduce the Hardware-Aware Automated Quantization (HAQ) framework, which leverages reinforcement learning to automatically determine the quantization policy and takes the hardware accelerator's feedback into the design loop. Rather than relying on proxy signals such as FLOPs and model size, we employ a hardware simulator to generate direct feedback signals (latency and energy) for the RL agent. Compared with conventional methods, our framework is fully automated and can specialize the quantization policy for different neural network architectures and hardware architectures. Our framework reduced latency by 1.4-1.95x and energy consumption by 1.9x with negligible loss of accuracy compared with fixed-bitwidth (8-bit) quantization. Our framework reveals that the optimal policies on different hardware architectures (i.e., edge and cloud architectures) under different resource constraints (i.e., latency, energy, and model size) are drastically different. We interpret the implications of the different quantization policies, which offer insights for both neural network architecture design and hardware architecture design.
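The sketch below shows what a per-layer mixed-precision quantization policy means in practice. It applies uniform symmetric linear quantization with a different bitwidth per layer and reports the weight storage cost. The quantizer, the layer names, and the policy are hypothetical. In HAQ the policy would come from the RL agent, and latency and energy would come from a hardware simulator; neither is modelled here.

```python
def quantize_symmetric(weights, bits):
    """Uniform symmetric linear quantization of a list of floats to signed `bits`-bit codes.

    Returns (codes, scale); each weight is approximated by code * scale.
    """
    if bits < 2:
        raise ValueError("a signed symmetric grid needs at least 2 bits")
    qmax = 2 ** (bits - 1) - 1                     # 127 for 8 bits, 7 for 4 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale

def apply_policy(layers, policy):
    """Quantize each layer with its own bitwidth; return {layer: (dequantized weights, max abs error)}."""
    result = {}
    for name, weights in layers.items():
        codes, scale = quantize_symmetric(weights, policy[name])
        deq = [c * scale for c in codes]
        result[name] = (deq, max(abs(w - d) for w, d in zip(weights, deq)))
    return result

def model_size_bytes(layers, policy):
    """Weight storage cost under a per-layer bitwidth policy (per-layer scales ignored)."""
    return sum(len(weights) * policy[name] for name, weights in layers.items()) / 8

# Hypothetical two-layer model and a mixed-precision policy (not taken from HAQ).
layers = {
    "conv1": [0.42, -0.17, 0.08, -0.51, 0.33, 0.02, -0.29, 0.11],
    "fc":    [1.20, -0.95, 0.40, -0.05, 0.77, -1.31, 0.64, 0.18],
}
policy = {"conv1": 8, "fc": 4}

for name, (deq, err) in apply_policy(layers, policy).items():
    print(f"{name}: {policy[name]}-bit, max abs error {err:.4f}")
print("weight storage:", model_size_bytes(layers, policy), "bytes")  # (8*8 + 8*4) / 8 = 12.0
```

Lower bitwidths shrink storage at the cost of a coarser grid (the rounding error is at most half a step, `scale / 2`). Searching over such trade-offs per layer is the job the framework automates.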

Preprint · 2019 · arXiv

How to Optimally Constrain Galaxy Assembly Bias: Supplement Projected Correlation Functions with Count-in-cells Statistics

Most models for the connection between galaxies and their haloes ignore the possibility that galaxy properties may be correlated with halo properties other than mass, a phenomenon known as galaxy assembly bias. Yet such correlations are known to lead to systematic errors in the interpretation of survey data. At present, the degree to which galaxy assembly bias is present in the real Universe, and the best strategies for constraining it, remain uncertain. We study the ability of several observables to constrain galaxy assembly bias from redshift survey data using the decorated halo occupation distribution (dHOD), an empirical model of the galaxy-halo connection that incorporates assembly bias. We cover an expansive set of observables, including the projected two-point correlation function $w_{\mathrm{p}}(r_{\mathrm{p}})$, the galaxy-galaxy lensing signal $\Delta\Sigma(r_{\mathrm{p}})$, the void probability function $\mathrm{VPF}(r)$, the distributions of counts-in-cylinders $P(N_{\mathrm{CIC}})$ and counts-in-annuli $P(N_{\mathrm{CIA}})$, and the distribution of the ratio of counts in cylinders of different sizes $P(N_2/N_5)$. We find that despite the frequent use of the combination $w_{\mathrm{p}}(r_{\mathrm{p}})+\Delta\Sigma(r_{\mathrm{p}})$ in interpreting galaxy data, the count statistics $P(N_{\mathrm{CIC}})$ and $P(N_{\mathrm{CIA}})$ are generally more efficient in constraining galaxy assembly bias when combined with $w_{\mathrm{p}}(r_{\mathrm{p}})$. Constraints based upon $w_{\mathrm{p}}(r_{\mathrm{p}})$ and $\Delta\Sigma(r_{\mathrm{p}})$ share common degeneracy directions in parameter space, while combinations of $w_{\mathrm{p}}(r_{\mathrm{p}})$ with the count statistics are more complementary. We therefore strongly suggest that count statistics be used to complement the canonical observables in future studies of the galaxy-halo connection.
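The counts-in-cylinders statistic $P(N_{\mathrm{CIC}})$ works as follows: for every galaxy, count the other galaxies inside a cylinder centred on it, then histogram those counts. The sketch below rests on assumptions that do not come from the abstract. Galaxies sit in a periodic cubic box, the z axis is the line of sight, and the cylinder radius `rp_max` and half-length `pi_max` are free parameters. Setting a nonzero `rp_min` turns the cylinder into an annulus, giving a counts-in-annuli variant. The brute-force O(N^2) loop is for clarity only; a real analysis would use a spatial tree or grid.

```python
from collections import Counter

def periodic_sep(a, b, box):
    """Separation along one axis in a periodic box of side `box`."""
    d = abs(a - b) % box
    return min(d, box - d)

def counts_in_cylinders(positions, rp_max, pi_max, box, rp_min=0.0):
    """For each galaxy, count the *other* galaxies in a cylinder centred on it.

    x, y are projected coordinates and z is the line of sight (assumption).
    A galaxy is counted if rp_min <= r_p < rp_max and |dz| < pi_max;
    rp_min > 0 gives counts-in-annuli instead of counts-in-cylinders.
    """
    counts = []
    for i, (xi, yi, zi) in enumerate(positions):
        n = 0
        for j, (xj, yj, zj) in enumerate(positions):
            if i == j:
                continue
            rp2 = periodic_sep(xi, xj, box) ** 2 + periodic_sep(yi, yj, box) ** 2
            if rp_min ** 2 <= rp2 < rp_max ** 2 and periodic_sep(zi, zj, box) < pi_max:
                n += 1
        counts.append(n)
    return counts

def count_distribution(counts):
    """Normalized histogram P(N) of the per-galaxy counts."""
    hist = Counter(counts)
    return {n: hist[n] / len(counts) for n in sorted(hist)}

# Toy catalogue in a 10 Mpc/h box; the first two galaxies are neighbours across the x boundary.
galaxies = [(0.2, 1.0, 1.0), (9.9, 1.0, 1.0), (5.0, 5.0, 5.0), (5.0, 5.0, 8.0)]
counts = counts_in_cylinders(galaxies, rp_max=1.0, pi_max=2.0, box=10.0)
print(counts)                      # [1, 1, 0, 0]
print(count_distribution(counts))  # {0: 0.5, 1: 0.5}
```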

Preprint · 2018 · arXiv

Maturing Satellite Kinematics into a Competitive Probe of the Galaxy-Halo Connection

The kinematics of satellite galaxies moving in a dark matter halo are a direct probe of the underlying gravitational potential. Thus, the phase-space distributions of satellites represent a powerful tool to determine the galaxy-halo connection from observations. By stacking the signal of a large number of satellite galaxies, this potential can be unlocked even for haloes hosting only a few satellites on average. In this work, we test the impact of various modelling assumptions on constraints derived from analysing satellite phase-space distributions in the non-linear, 1-halo regime. We discuss their potential to explain the previously reported discrepancy between average halo masses derived from satellite kinematics and from gravitational lensing. Furthermore, we develop an updated, more robust analysis to extract constraints on the galaxy-halo relation from satellite properties in spectroscopic galaxy surveys such as the SDSS, and we test the accuracy of this approach using a large number of realistic mock catalogues. Finally, we find that constraints derived from such an analysis are complementary to, and competitive with, those from the commonly used galaxy clustering and galaxy-galaxy lensing observables.
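A minimal sketch of a stacked line-of-sight velocity dispersion makes stacking concrete. Assumptions: each system is a hypothetical (central velocity, list of satellite velocities) tuple in km/s, and offsets are centred on zero. Interloper rejection and redshift errors, which a real analysis must model, are ignored. The satellite-weighted estimate counts every satellite once. The host-weighted variant, included for contrast, first averages within each host so that hosts with many satellites do not dominate.

```python
import math

def satellite_weighted_dispersion(systems):
    """Stacked line-of-sight velocity dispersion in which every satellite counts once.

    `systems` is a list of (v_central, [v_satellite, ...]) velocities in km/s.
    """
    dv = [v_sat - v_cen for v_cen, sats in systems for v_sat in sats]
    if not dv:
        raise ValueError("no satellites to stack")
    return math.sqrt(sum(d * d for d in dv) / len(dv))

def host_weighted_dispersion(systems):
    """Variant in which each host counts once: average dv^2 within each host first."""
    per_host = [sum((v - v_cen) ** 2 for v in sats) / len(sats)
                for v_cen, sats in systems if sats]
    if not per_host:
        raise ValueError("no satellites to stack")
    return math.sqrt(sum(per_host) / len(per_host))

# Hypothetical stack of two hosts in the same central-luminosity bin.
stack = [(1000.0, [1100.0, 900.0]), (2000.0, [2300.0])]
print(satellite_weighted_dispersion(stack))  # sqrt(110000 / 3), about 191.5 km/s
print(host_weighted_dispersion(stack))       # sqrt(50000), about 223.6 km/s
```

No single host here has enough satellites for a dispersion estimate on its own. Pooling the offsets across hosts is what lets sparse systems contribute.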

Preprint · 2018 · arXiv

Updated Results on the Galaxy-Halo Connection from Satellite Kinematics in SDSS

We present new results on the relationship between central galaxies and dark matter haloes inferred from observations of satellite kinematics in the Sloan Digital Sky Survey (SDSS) DR7. We employ an updated analysis framework that includes detailed mock catalogues to model observational effects in SDSS. Our results constrain the colour-dependent conditional luminosity function (CLF) of dark matter haloes, as well as the radial profile of satellite galaxies. Confirming previous results, we find that at fixed luminosity, red central galaxies live in more massive haloes than blue central galaxies. Additionally, our results suggest that satellite galaxies have a radial profile that is less centrally concentrated than the dark matter but not as cored as that of resolved subhaloes in dark matter-only simulations. Compared to previous satellite-kinematics work by More et al., we obtain much more competitive constraints on the galaxy-halo connection, on par with those derived from a combination of galaxy clustering and galaxy-galaxy lensing. We compare our results on the galaxy-halo connection to other studies using galaxy clustering and group catalogues and find very good agreement between these different techniques. We discuss future applications of satellite kinematics in the context of constraining cosmology and the relationship between galaxies and dark matter haloes.
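The CLF describes the luminosity distribution of galaxies in haloes of a given mass. As an illustration only, the sketch below evaluates the central-galaxy term in the lognormal form commonly used in CLF modelling. The abstract does not state this paper's parametrization, so the form, the names `L_c` (median central luminosity at a given halo mass) and `sigma_c` (scatter in $\log_{10} L$), and the example values are all assumptions. A colour-dependent CLF would use separate parameters for red and blue centrals.

```python
import math

def central_clf(L, L_c, sigma_c):
    """Lognormal central CLF, Phi_c(L|M): probability density per unit luminosity.

    L_c is the median central luminosity for the halo mass M, and sigma_c is the
    scatter in log10(L). Both would be functions of M in a full model.
    """
    x = math.log10(L / L_c)
    norm = math.sqrt(2.0 * math.pi) * math.log(10.0) * sigma_c * L
    return math.exp(-x * x / (2.0 * sigma_c ** 2)) / norm

def integrate_over_L(L_c, sigma_c, half_width_dex=1.0, steps=2000):
    """Integrate Phi_c dL on a log10 grid (dL = L ln10 dlog10L) with the trapezoid rule."""
    lo = math.log10(L_c) - half_width_dex
    dx = 2.0 * half_width_dex / steps
    total = 0.0
    for k in range(steps + 1):
        L = 10.0 ** (lo + k * dx)
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * central_clf(L, L_c, sigma_c) * L * math.log(10.0) * dx
    return total

# Hypothetical values: L_c = 1e10 (in L_sun/h^2) with 0.15 dex scatter.
print(integrate_over_L(1e10, 0.15))  # close to 1, since this form is normalized
```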

Preprint · 2017 · arXiv

The Immitigable Nature of Assembly Bias: The Impact of Halo Definition on Assembly Bias

Dark matter halo clustering depends not only on halo mass but also on other properties such as concentration and shape. This phenomenon is broadly known as assembly bias. We explore the dependence of assembly bias on halo definition, parametrized by the spherical overdensity parameter $\Delta$. We summarize the strength of concentration-, shape-, and spin-dependent halo clustering as a function of halo mass and halo definition. Concentration-dependent clustering depends strongly on mass at all $\Delta$. For conventional halo definitions ($\Delta \sim 200\mathrm{m}$ to $600\mathrm{m}$), concentration-dependent clustering at low mass is driven by a population of haloes that has been altered through interactions with neighbouring haloes. Concentration-dependent clustering can be greatly reduced through a mass-dependent halo definition with $\Delta \sim 20\mathrm{m}$ to $40\mathrm{m}$ for haloes with $M_{200\mathrm{m}} \lesssim 10^{12}\, h^{-1}\mathrm{M}_{\odot}$. Smaller $\Delta$ implies larger radii and mitigates assembly bias at low mass by subsuming altered, so-called backsplash haloes into the now larger host haloes. At higher masses ($M_{200\mathrm{m}} \gtrsim 10^{13}\, h^{-1}\mathrm{M}_{\odot}$), larger overdensities, $\Delta \gtrsim 600\mathrm{m}$, are necessary. Shape- and spin-dependent clustering are significant for all halo definitions that we explore and exhibit a relatively weaker mass dependence. Generally, both the strength and the sense of assembly bias depend on halo definition, varying significantly even among common definitions. We identify no halo definition that mitigates all manifestations of assembly bias. A halo definition that mitigates assembly bias based on one halo property (e.g., concentration) must be mass dependent. The halo definitions that best mitigate concentration-dependent halo clustering do not coincide with the expected average splashback radii at fixed halo mass.
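The statement that smaller $\Delta$ implies larger radii follows from the spherical-overdensity definition $M_\Delta = \frac{4\pi}{3} \Delta \rho_{\mathrm{m}} R_\Delta^3$, where the "m" suffix means $\Delta$ is measured relative to the mean matter density $\rho_{\mathrm{m}}$. The sketch below assumes $\Omega_{\mathrm{m}} = 0.3$, which is not from the abstract, and holds the enclosed mass fixed for simplicity. In reality the mass assigned to a halo also changes with $\Delta$, since a larger radius encloses more material.

```python
import math

RHO_CRIT = 2.775e11  # present-day critical density, h^2 M_sun / Mpc^3

def halo_radius(mass, delta, omega_m=0.3):
    """Radius R_Delta in Mpc/h for mass M (M_sun/h) at Delta times the mean matter density.

    Inverts M = (4 pi / 3) * Delta * rho_m * R^3; omega_m = 0.3 is an assumed value.
    """
    rho_m = omega_m * RHO_CRIT
    return (3.0 * mass / (4.0 * math.pi * delta * rho_m)) ** (1.0 / 3.0)

m = 1e12  # M_sun/h, the low-mass regime discussed above
for delta in (20, 40, 200, 600):
    print(f"Delta = {delta:>3}m: R = {halo_radius(m, delta):.3f} Mpc/h")

# At fixed mass, R scales as Delta^(-1/3): R_40m / R_200m = 5^(1/3), about 1.71.
print(halo_radius(m, 40) / halo_radius(m, 200))
```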