Researcher profile

Chun Wang

Chun Wang contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging | Verification L1 | Unclaimed author
12 works
0 followers
10 topics
4 close collaborators

Published work

12 published items

preprint | 2024 | arXiv

Coordinating Guidance, Matching, and Charging Station Selection for Electric Vehicle Ride-Hailing Services through Data-Driven Stochastic Optimization

Electric vehicles (EVs) play a pivotal role in sustainable ride-hailing services primarily due to their potential in reducing carbon emissions and enhancing environmental protection. Despite their significance, current research in the realm of EV batched matching frequently overlooks critical aspects such as rider demand uncertainty and charging station (CS) selection, leading to inefficiencies like decreased matching rates and prolonged waiting times for both riders and EV drivers. To fill the research gap, we propose a data-driven optimization framework that incorporates two inter-connected stochastic optimization models to address the challenges. The first model aims to relocate the idle EVs under satisfied conditions to the designated regions based on the probabilistic rider demand forecasting result before the real rider demand is revealed. Taking the solutions of the first model as the input, the second model optimizes the batched matching results by minimizing the rider's average waiting time and EV charging waiting time at CS. This integrated framework not only elevates the matching rate through the incorporation of rider demand uncertainties in the guidance module but also substantially curtails both rider and EV charging waiting times by synergizing guidance with CS selection choices. Empirical validation of our framework was conducted through an extensive case study in New York City, utilizing real-world data sets. The validation results demonstrate that the proposed data-driven optimization framework outperforms the benchmark models in terms of the proposed evaluation metrics. Most importantly, when deploying our framework, the charging waiting time of the EVs with low SOC can be reduced up to 73.6% compared to the benchmark model without CS selection.

preprint | 2022 | arXiv

A spatially dependent correction of Gaia EDR3 parallax zero-point offset based on 0.3 million LAMOST DR8 giant stars

We have studied the zero-point offset of Gaia early Data Release 3 (EDR3) parallaxes based on a sample of 0.3 million giant stars built from the LAMOST data with distance accuracy better than 8.5%. The official parallax zero-point corrections largely reduce the global offset in the Gaia EDR3 parallaxes: the global parallax offsets are $-$27.9 $μ$as and $-$26.5 $μ$as (before correction) and $+$2.6 $μ$as and $+$2.9 $μ$as (after correction) for the five- and six-parameter solutions, respectively. The bias of the raw parallax measurements depends significantly on the $G$ magnitudes, spectral colors, and positions of stars. The official parallax zero-point corrections reduce the parallax bias patterns with $G$ magnitude, but cannot fully account for the patterns in the spaces of spectral color and position. In the current paper, a spatially dependent parallax zero-point correction model for the Gaia EDR3 five-parameter solution in the LAMOST footprint is provided for the first time, taking advantage of the large number of stars in our sample.

preprint | 2022 | arXiv

Double-Barreled Question Detection at Momentive

Momentive offers solutions in market research, customer experience, and enterprise feedback. The technology is gleaned from the billions of real responses to questions asked on the platform. However, people may create biased questions. A double-barreled question (DBQ) is a common type of biased question that asks about two aspects in one question. For example, "Do you agree with the statement: The food is yummy, and the service is great." This DBQ confuses survey respondents because there are two parts in one question. DBQs impact both the survey respondents and the survey owners. Momentive aims to detect DBQs and recommend that survey creators revise them in order to gather high-quality, unbiased survey data. Previous research has suggested detecting DBQs by checking for the existence of a grammatical conjunction. While this is a simple rule-based approach, it is error-prone because conjunctions can also exist in properly constructed questions. We present an end-to-end machine learning approach for DBQ classification in this work. We handled the imbalanced data using active learning, and compared state-of-the-art embedding algorithms to transform text data into vectors. Furthermore, we proposed a model interpretation technique that propagates the vector-level SHAP values to a SHAP value for each word in the questions. We concluded that the word2vec subword embedding with maximum pooling is the optimal word embedding representation in terms of precision and running time in the offline experiments using the survey data at Momentive. The A/B test and production metrics indicate that this model brings a positive change to the business. To the best of our knowledge, this is the first machine learning framework for DBQ detection, and it successfully differentiates Momentive from its competitors. We hope our work sheds light on machine learning approaches for biased question detection.
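To illustrate why the conjunction rule mentioned in this abstract is error-prone, here is a minimal Python sketch of such a baseline. The function name `has_conjunction`, the conjunction list, and the two extra sample questions are hypothetical choices made for illustration; this is not Momentive's implementation or the paper's classifier.

```python
import re

# Hypothetical conjunction list; a real rule set would likely be broader.
CONJUNCTIONS = ("and", "or")
_PATTERN = re.compile(r"\b(" + "|".join(CONJUNCTIONS) + r")\b", re.IGNORECASE)

def has_conjunction(question: str) -> bool:
    """Return True if the question contains a listed conjunction as a whole word."""
    return _PATTERN.search(question) is not None

# The DBQ example quoted in the abstract.
dbq = "Do you agree with the statement: The food is yummy, and the service is great."
# A single-topic question that still contains "and" (illustrative, not from the paper).
single = "How satisfied are you with our research and development team?"
# A single-topic question without any conjunction (illustrative).
no_conj = "How satisfied are you with the service?"

flags = [has_conjunction(q) for q in (dbq, single, no_conj)]
print(flags)
```

The rule flags the second question even though it asks about one thing only, which is the kind of false positive that motivates a learned classifier.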

preprint | 2022 | arXiv

Li-rich Giants in LAMOST Survey. III. The statistical analysis of Li-rich giants

The puzzle of Li-rich giant is still unsolved, contradicting the prediction of the standard stellar models. Although the exact evolutionary stages play a key role in the knowledge of Li-rich giants, a limited number of Li-rich giants have been taken with high-quality asteroseismic parameters to clearly distinguish the stellar evolutionary stages. Based on the LAMOST Data Release 7 (DR7), we applied a data-driven neural network method to derive the parameters for giant stars, which contain the largest number of Li-rich giants. The red giant stars are classified into three stages of Red Giant Branch (RGB), Primary Red Clump (PRC), and Secondary Red Clump (SRC) relying on the estimated asteroseismic parameters. In the statistical analysis of the properties (i.e. stellar mass, carbon, nitrogen, Li-rich distribution, and frequency) of Li-rich giants, we found that: (1) Most of the Li-rich RGB stars are suggested to be the descendants of Li-rich pre-RGB stars and/or the result of engulfment of planet or substellar companions; (2) The massive Li-rich SRC stars could be the natural consequence of Li depletion from the high-mass Li-rich RGB stars. (3) Internal mixing processes near the helium flash can account for the phenomenon of Li-rich on PRC that dominated the Li-rich giants. Based on the comparison of [C/N] distributions between Li-rich and normal PRC stars, the Li-enriched processes probably depend on the stellar mass.

preprint | 2022 | arXiv

Play Like the Pros? Solving the Game of Darts as a Dynamic Zero-Sum Game

The game of darts has enjoyed great growth over the past decade with the perception of darts moving from that of a pub game to a game that is regularly scheduled on prime-time television in many countries including the U.K., Germany, the Netherlands, and Australia among others. The game of darts involves strategic interactions between two players but to date the literature has ignored these interactions. In this paper, we formulate and solve the game of darts as a dynamic zero-sum-game (ZSG), and to the best of our knowledge we are the first to do so. We also estimate individual skill models using a novel data-set based on darts matches that were played by the top 16 professional players in the world during the 2019 season. Using the fitted skill models and our ZSG problem formulation, we quantify the importance of playing strategically, i.e. taking into account the score and strategy of one's opponent, when computing an optimal strategy. For top professionals we find that playing strategically results in an increase in win-probability of just 0.2% - 0.6% over a single leg but as much as 2.3% over a best-of-35 legs match.

preprint | 2022 | arXiv

The value-added catalogue for LAMOST DR8 low-resolution spectra

We present a value-added catalog containing stellar parameters estimated from 7.10 million low-resolution spectra for 5.16 million unique stars with spectral signal-to-noise ratios (SNRs) higher than 10 obtained by the Large Sky Area Multi-Object Fibre Spectroscopic Telescope (LAMOST) Galactic spectroscopic surveys. The catalog presents values of stellar atmospheric parameters (effective temperature $T_{\mathrm{eff}}$, surface gravity $\log g$, metallicity [Fe/H]/[M/H]), $α$-element to metal abundance ratio [$α$/M], carbon and nitrogen to iron abundance ratios [C/Fe] and [N/Fe], and absolute magnitudes in 14 bands, deduced from LAMOST spectra using a neural network method. The spectro-photometric distances of those stars are also provided based on the distance modulus. For stars with spectral SNRs larger than 50, the precisions of $T_{\mathrm{eff}}$, $\log g$, [Fe/H], [M/H], [C/Fe], [N/Fe] and [$α$/M] are 85 K, 0.098 dex, 0.05 dex, 0.05 dex, 0.052 dex, 0.082 dex and 0.027 dex, respectively. The errors of the absolute magnitudes in the 14 bands are 0.16-0.22 mag for stars with spectral SNRs larger than 50. The spectro-photometric distance is accurate to 8.5% for stars with spectral SNRs larger than 50, and is more accurate than the geometrical distance for stars with distances larger than 2.0 kpc. Our estimates of [Fe/H] are reliable down to [Fe/H] $\sim -3.5$ dex, significantly better than previous results. The catalog provides 26,868 unique very metal-poor star candidates ([Fe/H] $\leq -2.0$). The catalog would be a valuable data set to study the structure and evolution of the Galaxy, especially the solar neighbourhood and the outer disc.
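For reference, the standard distance-modulus relation that such spectro-photometric distances rest on can be written as below. The symbols are notation introduced here rather than taken from the catalog paper: $m$ is the apparent magnitude, $M$ the absolute magnitude, $A$ the extinction in the same band, and $d$ the distance in parsecs. How the catalog combines its 14 bands is not stated in the abstract and is not assumed here.

```latex
m - M = 5 \log_{10}\!\left(\frac{d}{10\,\mathrm{pc}}\right) + A
\quad\Longrightarrow\quad
d = 10^{\,(m - M - A + 5)/5}\ \mathrm{pc}
```

For example, with $m - M - A = 10$ the relation gives $d = 10^{3}$ pc, that is, 1 kpc.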

preprint | 2021 | arXiv

Beyond spectroscopy. I. Metallicities, distances, and age estimates for over twenty million stars from SMSS DR2 and Gaia EDR3

Accurate determinations of stellar parameters and distances for large complete samples of stars are key to conducting detailed studies of the formation and evolution of our Galaxy. Here we present stellar atmospheric parameter ($T_{\rm eff}$, luminosity classifications, and [Fe/H]) estimates for some 24 million stars determined from the stellar colors of SMSS DR2 and Gaia EDR3, based on training datasets with available spectroscopic measurements from previous high/medium/low-resolution spectroscopic surveys. The number of stars with photometric-metallicity estimates is 4-5 times larger than that collected by the largest spectroscopic survey to date - LAMOST - over the course of the past decade. External checks indicate that the precision of the photometric-metallicity estimates is quite high, comparable to or slightly better than that derived from spectroscopy, with typical values around 0.05-0.15 dex for both dwarf and giant stars with [Fe/H]>$-$1.0, 0.10-0.20 dex for giant stars with $-$2.0<[Fe/H]<$-$1.0, and 0.20-0.25 dex for giant stars with [Fe/H]<$-$2.0, and include estimates for stars as metal-poor as [Fe/H]~$-$3.5, substantially lower than previous photometric techniques. Photometric-metallicity estimates are obtained for an unprecedented number of metal-poor stars, including a total of over three million metal-poor (MP; [Fe/H]<$-$1.0) stars, over half a million very metal-poor (VMP; [Fe/H]<$-$2.0) stars, and over 25,000 extremely metal-poor (EMP; [Fe/H]<$-$3.0) stars. Moreover, distances are determined for over 20 million stars in our sample. For the over 18 million sample stars with accurate Gaia parallaxes, stellar ages are estimated by comparing with theoretical isochrones. Astrometric information is provided for the stars in our catalog, along with radial velocities for ~10% of our sample stars, taken from completed/ongoing large-scale spectroscopic surveys.

preprint | 2020 | arXiv

A Catalog of RV Variable Star Candidates from LAMOST

RV variable stars are important in astrophysics. The Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) spectroscopic survey has provided ~6.5 million stellar spectra in its Data Release 4 (DR4). During the survey, ~4.7 million unique sources were targeted and ~1 million stars were observed repeatedly. The probabilities of stars being RV variables are estimated by comparing the observed radial velocity variations with the simulated ones. We build a catalog of 80,702 RV variable candidates with probability greater than 0.60 by analyzing the duplicate-observed multi-epoch sources covered by the LAMOST DR4. Simulations and cross-identifications show that the purity of the catalog is higher than 80%. The catalog consists of 77% binary systems and 7% pulsating stars as well as 16% pollution by single stars. 3,138 RV variables are classified through cross-identifications with published results in the literature. By using the 3,138 sources common to both LAMOST and a collection of published RV variable catalogs we are able to analyze LAMOST's RV variable detection rate. The efficiency of the method adopted in this work relies not only on the sampling frequency of observations but also on the periods and amplitudes of RV variables. With the progress of LAMOST, Gaia and other surveys, more and more RV variables will be confirmed and classified. This catalog is valuable for other large-scale surveys, especially for RV variable searches. The catalog will be released according to the LAMOST Data Policy via http://dr4.lamost.org.

preprint | 2020 | arXiv

A GRU-based Mixture Density Network for Data-Driven Dynamic Stochastic Programming

The conventional deep learning approaches for solving time-series problems, such as long short-term memory (LSTM) and gated recurrent unit (GRU), both take the time-series data sequence as the input and produce one single unit as the output (the predicted time-series result). Those deep learning approaches have had tremendous success in many time-series related problems; however, they cannot be applied in data-driven stochastic programming problems, since the output of either LSTM or GRU is a scalar rather than the probability distribution required by a stochastic programming model. To fill the gap, in this work we propose an innovative data-driven dynamic stochastic programming (DD-DSP) framework for time-series decision-making problems, which involves three components: GRU, Gaussian Mixture Model (GMM) and SP. Specifically, we devise a deep neural network that integrates GRU and GMM, called the GRU-based Mixture Density Network (MDN), where GRU is used to predict the time-series outcomes based on the recent historical data, and GMM is used to extract the corresponding probability distribution of the predicted outcomes; the results are then input as the parameters for SP. To validate our approach, we apply the framework to the car-sharing relocation problem. The experimental validations show that our framework is superior to data-driven optimization based on LSTM, with lower average vehicle movement than LSTM.

preprint | 2020 | arXiv

DDKSP: A Data-Driven Stochastic Programming Framework for Car-Sharing Relocation Problem

Car sharing is a popular research field in the sharing economy. In this paper, we investigate the car-sharing relocation problem (CSRP) under uncertain demands. Normally, the real customer demands follow complicated probability distributions which cannot be described by parametric approaches. In order to overcome the problem, an innovative framework called Data-Driven Kernel Stochastic Programming (DDKSP) that integrates a non-parametric approach - kernel density estimation (KDE) - and a two-stage stochastic programming (SP) model is proposed. Specifically, the probability distributions are derived from historical data by KDE and are used as the input uncertain parameters for SP. Additionally, the CSRP is formulated as a two-stage SP model. Meanwhile, a Monte Carlo method called sample average approximation (SAA) and the Benders decomposition algorithm are introduced to solve the large-scale optimization model. Finally, the numerical experimental validations, which are based on New York taxi trip data sets, show that the proposed framework outperforms the pure parametric approaches including Gaussian, Laplace and Poisson distributions by 3.72%, 4.58% and 11%, respectively, in terms of overall profits.
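The pipeline this abstract describes (fit a KDE to historical demand, sample scenarios, then optimize a two-stage decision by sample average approximation) can be sketched on a toy problem. Everything below is an assumption made for illustration: the single-station setting, the demand history, the revenue and cost values, the rule-of-thumb bandwidth, and the function names `sample_kde` and `saa_best_supply`. It is not the DDKSP model, and it solves the toy by enumeration rather than Benders decomposition.

```python
import numpy as np

def sample_kde(history, n_samples, rng):
    """Draw samples from a Gaussian KDE fitted to 1-D historical demand."""
    history = np.asarray(history, dtype=float)
    n = history.size
    # Normal-reference rule-of-thumb bandwidth (an assumption, not from the paper).
    bandwidth = 1.06 * history.std(ddof=1) * n ** (-1 / 5)
    # Sampling a Gaussian KDE: pick a data point, then add kernel noise.
    centers = rng.choice(history, size=n_samples, replace=True)
    samples = centers + rng.normal(0.0, bandwidth, size=n_samples)
    return np.clip(samples, 0.0, None)  # demand cannot be negative

def saa_best_supply(scenarios, revenue, cost, max_cars):
    """Pick the first-stage supply x that maximizes the sample-average profit."""
    best_x, best_profit = 0, -np.inf
    for x in range(max_cars + 1):
        # Second stage: served trips are limited by supply and realized demand.
        served = np.minimum(x, scenarios)
        profit = float(np.mean(revenue * served - cost * x))
        if profit > best_profit:
            best_x, best_profit = x, profit
    return best_x, best_profit

rng = np.random.default_rng(0)
history = [8, 10, 12, 9, 11, 10, 13, 7, 10, 11]  # hypothetical daily demand
scenarios = sample_kde(history, n_samples=2000, rng=rng)
best_x, best_profit = saa_best_supply(scenarios, revenue=5.0, cost=2.0, max_cars=30)
print(best_x, round(best_profit, 2))
```

Enumeration is only viable because the toy has one integer decision; the abstract turns to Benders decomposition precisely because the real model is large-scale.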

preprint | 2020 | arXiv

Generation and manipulation of chiral terahertz waves emitted from the three-dimensional topological insulator Bi2Te3

Arbitrary manipulation of broadband terahertz waves with flexible polarization shaping at the source has great potential in expanding real applications such as imaging, information encryption, and all-optical coherent control of terahertz nonlinear phenomena. Topological insulators featuring a unique spin-momentum locked surface state have already exhibited very promising prospects in terahertz emission, detection and modulation, which may lay a foundation for future on-chip topological insulator-based terahertz systems. However, polarization-shaped terahertz emission with prescribed manners of arbitrarily manipulated temporal evolution of the amplitude and electric-field vector direction based on topological insulators has not yet been explored. Here we systematically investigated the terahertz radiation from topological insulator Bi2Te3 nanofilms driven by femtosecond laser pulses, and successfully realized the generation of efficient chiral terahertz waves with controllable chirality, ellipticity, and principal axis. The convenient engineering of the chiral terahertz waves was interpreted by photogalvanic effect induced photocurrent, while the linearly polarized terahertz waves originated from linear photogalvanic effect induced shift currents. We believe our work not only helps further the understanding of femtosecond coherent control of ultrafast spin currents in light-matter interaction but also provides an effective way to generate spin-polarized terahertz waves and accelerates the proliferation of twisting the terahertz waves at the source.

preprint | 2020 | arXiv

Mapping the Galactic disk with the LAMOST and Gaia Red clump sample: I: precise distances, masses, ages and 3D velocities of $\sim$ 140000 red clump stars

We present a sample of $\sim$ 140,000 primary red clump (RC) stars of spectral signal-to-noise ratios higher than 20 from the LAMOST Galactic spectroscopic surveys, selected based on their positions in the metallicity-dependent effective temperature-surface gravity and color-metallicity diagrams, supervised by high-quality Kepler asteroseismology data. The stellar masses and ages of those stars are further determined from the LAMOST spectra, using the Kernel Principal Component Analysis method, trained with thousands of RCs in the LAMOST-Kepler fields with accurate asteroseismic mass measurements. The purity and completeness of our primary RC sample are generally higher than 80 per cent. For the mass and age, a variety of tests show typical uncertainties of 15 and 30 per cent, respectively. Using over ten thousand primary RCs with accurate distance measurements from the parallaxes of Gaia DR2, we re-calibrate the $K_{\rm s}$ absolute magnitudes of primary RCs by, for the first time, considering both the metallicity and age dependencies. With the new calibration, distances are derived for all the primary RCs, with a typical uncertainty of 5-10 per cent, even better than the values yielded by the Gaia parallax measurements for stars beyond 3-4 kpc. The sample covers a significant volume of the Galactic disk of $4 \leq R \leq 16$ kpc, $|Z| \leq 5$ kpc, and $-20 \leq ϕ \leq 50^{\circ}$. Stellar atmospheric parameters, line-of-sight velocities and elemental abundances derived from the LAMOST spectra and proper motions of Gaia DR2 are also provided for the sample stars. Finally, the selection function of the sample is carefully evaluated in the color-magnitude plane for different sky areas. The sample is publicly available.