Source author record

Jinhua Zhao

Jinhua Zhao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Applications, Machine Learning, Artificial Intelligence, Social and Information Networks, cs.CY, Computer Vision, Cryptography and Security, Data Structures and Algorithms, econ.GN, eess.SY, math.PR, physics.soc-ph, q-fin.EC, Robotics, Systems and Control

Catalog footprint

What is connected

17 works

15 topics

4 close collaborators


preprint, 2026, arXiv

SENSE: Satellite-based ENergy Synthesis for Sustainable Environment

Urban Building Energy Modeling (UBEM) plays a critical role in achieving the United Nations' Sustainable Development Goals 7 and 11. Although existing studies based on satellite imagery and deep learning have achieved remarkable progress, three challenges remain: first, most existing studies are inherently predictive, failing to reflect the generative nature of urban planning; second, although generative AI and diffusion models have seen explosive growth in satellite imagery, they lack urban functional generation (e.g., an energy layer); third, high-quality, high-resolution building energy data aligned with satellite imagery are scarce. Here we propose SENSE (Satellite-based ENergy Synthesis for Sustainable Environment), a unified generative UBEM framework that jointly synthesizes realistic urban satellite imagery and aligned high-quality building energy consumption and height maps. By conditioning on road networks and urban density metrics, SENSE, based on a controllable diffusion model, leverages the knowledge learned by large vision models to generate urban building energy consumption and height information (annotations) in the latent space. Experiments across four cities (New York City, Boston, Lyon, Busan) demonstrate that SENSE achieves high visual fidelity and strong physical consistency, satisfying the ASHRAE standard metric. They also show that SENSE can generate enough annotated synthetic data using less than 20% labeled energy data, boosting downstream prediction performance by 10% IoU. Compared with SOTA urban energy prediction methods, SENSE significantly reduces prediction error (by 3%-11% in NMBE and 1%-9% in CVRMSE). This study offers an energy-efficient urban planning and physical generation solution for urban science, energy science and building science. The dataset and code are available at https://huggingface.co/datasets/skl24/MUSE and https://github.com/kailaisun/GenAI4Urban-Energy/.
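For readers unfamiliar with the two error metrics quoted in this abstract, the following is a minimal sketch of how NMBE and CVRMSE are commonly computed. It assumes the usual definitions (bias and root-mean-square error normalized by the mean of the measured values, with an optional degrees-of-freedom adjustment `p`) and the sign convention measured minus predicted. The function names and sample values are hypothetical and are not taken from the SENSE code base.

```python
import math


def nmbe(measured, predicted, p=0):
    """Normalized mean bias error, in percent.

    Assumed convention: bias = measured - predicted, normalized by
    (n - p) times the mean of the measured values.
    """
    n = len(measured)
    mean_m = sum(measured) / n
    bias = sum(m - s for m, s in zip(measured, predicted))
    return 100.0 * bias / ((n - p) * mean_m)


def cvrmse(measured, predicted, p=0):
    """Coefficient of variation of the RMSE, in percent."""
    n = len(measured)
    mean_m = sum(measured) / n
    sse = sum((m - s) ** 2 for m, s in zip(measured, predicted))
    return 100.0 * math.sqrt(sse / (n - p)) / mean_m


# Hypothetical energy readings, for illustration only.
measured = [100.0, 200.0, 300.0]
under_predicted = [90.0, 180.0, 270.0]  # every value 10% too low
print(nmbe(measured, under_predicted))
print(cvrmse(measured, under_predicted))
```

NMBE keeps the sign of the error, so over- and under-predictions can cancel; CVRMSE does not, which is why the two are usually reported together.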

preprint, 2026, arXiv

TailedTS: Benchmark Dataset for Heavy-Tailed Time Series Prediction and Periodicity Quantification

We present TailedTS, a large-scale benchmark dataset derived from Wikipedia hourly page view observations throughout 2024, specifically designed to test time series forecasting models under heavy-tailed, zero-inflated, and non-Gaussian conditions. The dataset comprises approximately 24.69 billion data points spanning roughly 3 million unique Wikipedia pages per month, stored in high-efficiency Apache Parquet format. Wikipedia traffic follows a pronounced power-law distribution where roughly 5% of pages account for over 70% of total page views, creating a natural and rigorous testbed for model robustness against extreme volatility, a condition that is absent from or underrepresented in existing benchmarks such as M4, M5, and UCI electricity datasets. TailedTS enables several research tasks. First, we introduce a periodicity quantification framework based on sparse autoregression with sparsity and non-negativity constraints, revealing that frequently viewed pages exhibit significantly weaker periodic structure than their less-viewed counterparts, with direct implications for server allocation and traffic forecasting on large digital platforms. Second, we provide standardized prediction benchmarks evaluated under a suite of non-Gaussian loss functions, including $\ell_1$-norm, Huber, quantile, and $\ell_p$-norm losses, demonstrating that standard Gaussian-based estimators degrade substantially on high-volume page categories, while robust alternatives provide consistent gains across all traffic scales. TailedTS is publicly available at https://doi.org/10.5281/zenodo.17070469.
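The loss functions named in this abstract have standard textbook forms. The sketch below writes them out per observation and compares them with squared error on a toy series containing one traffic spike. The function names, the default `delta`, `q`, and `p` values, and the toy numbers are illustrative assumptions, not the benchmark's actual configuration.

```python
def squared_loss(y, y_hat):
    """Squared error, the loss implied by a Gaussian likelihood."""
    return (y - y_hat) ** 2


def l1_loss(y, y_hat):
    """Absolute error (l1-norm loss)."""
    return abs(y - y_hat)


def huber_loss(y, y_hat, delta=1.0):
    """Quadratic for small residuals, linear beyond delta."""
    r = abs(y - y_hat)
    if r <= delta:
        return 0.5 * r * r
    return delta * (r - 0.5 * delta)


def quantile_loss(y, y_hat, q=0.5):
    """Pinball loss for quantile level q in (0, 1)."""
    r = y - y_hat
    return q * r if r >= 0 else (q - 1.0) * r


def lp_loss(y, y_hat, p=1.5):
    """|residual|^p; p = 1.5 is an arbitrary illustrative choice."""
    return abs(y - y_hat) ** p


def mean_loss(loss, ys, y_hats, **kwargs):
    """Average a per-observation loss over a series."""
    return sum(loss(y, h, **kwargs) for y, h in zip(ys, y_hats)) / len(ys)


# Toy hourly counts: mostly zero, with one large spike.
views = [0.0, 0.0, 0.0, 1000.0]
forecast = [0.0, 0.0, 0.0, 0.0]
print(mean_loss(squared_loss, views, forecast))
print(mean_loss(huber_loss, views, forecast))
print(mean_loss(l1_loss, views, forecast))
```

On the toy series the single spike contributes its squared magnitude to the squared-error average but only grows linearly under the Huber and l1 losses, which is the sense in which those losses are more robust to heavy tails.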

preprint, 2025, arXiv

Reproducibility in the Control of Autonomous Mobility-on-Demand Systems

Autonomous Mobility-on-Demand (AMoD) systems, powered by advances in robotics, control, and Machine Learning (ML), offer a promising paradigm for future urban transportation. AMoD offers fast and personalized travel services by leveraging centralized control of autonomous vehicle fleets to optimize operations and enhance service performance. However, the rapid growth of this field has outpaced the development of standardized practices for evaluating and reporting results, leading to significant challenges in reproducibility. As AMoD control algorithms become increasingly complex and data-driven, a lack of transparency in modeling assumptions, experimental setups, and algorithmic implementation hinders scientific progress and undermines confidence in the results. This paper presents a systematic study of reproducibility in AMoD research. We identify key components across the research pipeline, spanning system modeling, control problems, simulation design, algorithm specification, and evaluation, and analyze common sources of irreproducibility. We survey prevalent practices in the literature, highlight gaps, and propose a structured framework to assess and improve reproducibility. Specifically, concrete guidelines are offered, along with a "reproducibility checklist", to support future work in achieving replicable, comparable, and extensible results. While focused on AMoD, the principles and practices we advocate generalize to a broader class of cyber-physical systems that rely on networked autonomy and data-driven control. This work aims to lay the foundation for a more transparent and reproducible research culture in the design and deployment of intelligent mobility systems.

preprint, 2023, arXiv

Evaluation of Public Transit Systems under Short Random Service Suspensions: A Bulk-Service Queuing Approach

This paper proposes a stochastic framework to evaluate the performance of public transit systems under short random service suspensions. We aim to derive closed-form formulations of the mean and variance of the queue length and waiting time. A bulk-service queue model is adopted to formulate the queuing behavior in the system. The random service suspension is modeled as a two-state (disruption and normal) Markov process. We prove that headway is distributed as the difference between two compound Poisson exponential random variables. The distribution is used to specify the mean and variance of queue length and waiting time at each station with analytical formulations. The closed-form stability condition of the system is also derived, implying that the system is more likely to be unstable with high incident rates and long incident durations. The proposed model is implemented on a bus network. Results show that higher incident rates and higher average incident durations increase both the mean and variance of queue length and waiting time, which is consistent with the theoretical analysis. Crowded stations are more vulnerable to random service suspensions. The theoretical results are validated with a simulation model, showing consistency between the two outcomes.

preprint, 2023, arXiv

Fairness-Enhancing Vehicle Rebalancing in the Ride-hailing System

The rapid growth of the ride-hailing industry has revolutionized urban transportation worldwide. Despite its benefits, equity concerns arise as underserved communities face limited accessibility to affordable ride-hailing services. A key issue in this context is the vehicle rebalancing problem, where idle vehicles are moved to areas with anticipated demand. Without equitable approaches in demand forecasting and rebalancing strategies, these practices can further deepen existing inequities. In the realm of ride-hailing, three main facets of fairness are recognized: algorithmic fairness, fairness to drivers, and fairness to riders. This paper focuses on enhancing both algorithmic and rider fairness through a novel vehicle rebalancing method. We introduce an approach that combines a Socio-Aware Spatial-Temporal Graph Convolutional Network (SA-STGCN) for refined demand prediction and a fairness-integrated Matching-Integrated Vehicle Rebalancing (MIVR) model for subsequent vehicle rebalancing. Our methodology is designed to reduce prediction discrepancies and ensure equitable service provision across diverse regions. The effectiveness of our system is evaluated using simulations based on real-world ride-hailing data. The results suggest that our proposed method enhances both accuracy and fairness in forecasting ride-hailing demand, ultimately resulting in more equitable vehicle rebalancing in subsequent operations. Specifically, the algorithm developed in this study reduces the standard deviation and the average of customer wait times by 6.48% and 0.49%, respectively. This outcome benefits ride-hailing platforms by striking a balance between operational efficiency and fairness.

preprint, 2023, arXiv

Modeling Virus Transmission Risks in Commuting with Emerging Mobility Services: A Case Study of COVID-19

Commuting is an important part of daily life. With the gradual recovery from COVID-19 and more people returning to work in the office, the transmission of COVID-19 during commuting becomes a concern. Recent emerging mobility services (such as ride-hailing and bike-sharing) further increase the infection risks due to shared vehicles or spaces during travel. Hence, it is important to quantify the infection risks in commuting. This paper proposes a probabilistic framework to estimate the risk of infection during an individual's commute considering different travel modes, including public transit, ride-share, bike, and walking. The objective is to evaluate the probability of infection as well as the estimation errors (i.e., uncertainty quantification) given the origin-destination (OD), departure time, and travel mode. We first define a general trip planning function to generate trip trajectories and probabilities of choosing different paths according to the OD, departure time, and travel mode. Then, we consider two channels of infection: 1) infection by close contact and 2) infection by touching surfaces. The infection risks are calculated on a trip segment basis. Different sources of data (such as smart card data, travel surveys, and population data) are used to estimate the potential interactions between the individual and the infectious environment. The model is implemented in the MIT community as a case study. We evaluate the commute infection risks for employees and students. Results show that most individuals have an infection probability close to zero. The maximum infection probability is around 0.8%, implying that the probability of getting infected during the commuting process is low. Individuals with larger travel distances, traveling in transit, and traveling during peak hours are more likely to get infected.

preprint, 2023, arXiv

Passenger Path Choice Estimation Using Smart Card Data: A Latent Class Approach with Panel Effects Across Days

Understanding passengers' path choice behavior in urban rail systems is a prerequisite for effective operations and planning. This paper proposes a probabilistic approach to infer passengers' path choice behavior in urban rail systems using large-scale smart card data. The model uses latent classes and panel effects to capture passengers' implicit behavior heterogeneity and longitudinal correlations, key research gaps in big-data-driven behavior studies. We formulate the probability of each individual's arrival time at a destination based on their path choice behavior, and estimate the corresponding path choice model parameters as a maximum likelihood estimation problem. The original likelihood function is intractable due to its exponential computational complexity. We derive a tractable likelihood function and propose a numerical integral approach to efficiently estimate the model. We also propose a method to calculate the t-statistic of the estimated choice parameters based on the numerically estimated Hessian matrix and the Cramer-Rao bound (the lower bound on the coefficient variance). Case studies using synthetic data validate the model performance and its robustness against parameter initialization and input errors, and highlight the importance of incorporating crowding impact in path choice estimation. Applications using actual data from the Mass Transit Railway, Hong Kong reveal two latent groups of passengers: time-sensitive (TS) and comfort-aware (CA). TS passengers are those who are more likely to choose paths with short travel times. Most of them are regular commuters with high travel frequency and less schedule flexibility. CA passengers care more about the travel comfort experience and choose paths with less walking and waiting time. The proposed approach is data-driven and general enough to accommodate other discrete choice structures.

preprint, 2023, arXiv

Proof of Travel for Trust-Based Data Validation in V2I Communication

Previous work on misbehavior detection and trust management for Vehicle-to-Everything (V2X) communication security is effective in identifying falsified and malicious V2X data. Each vehicle in a given region can be a witness to report on the misbehavior of other nearby vehicles, which will then be added to a "blacklist." However, there may not be enough witness vehicles willing to opt in during the early stage of connected-vehicle deployment. In this paper, we propose a "whitelisting" approach to V2X security, titled Proof-of-Travel (POT), which leverages the support of roadside infrastructure. Our goal is to transform the power of cryptography techniques embedded within Vehicle-to-Infrastructure (V2I) protocols into game-theoretic mechanisms to incentivize connected-vehicle data sharing and validate data trustworthiness simultaneously. The key idea is to determine the reputation of and the contribution made by a vehicle based on its distance traveled and the information it shared through V2I channels. In particular, the total vehicle miles traveled for a vehicle must be attested by digital signatures signed by each infrastructure component along the path of its movement. While building a chain of proofs of spatial movement creates burdens for malicious vehicles, acquiring proofs does not result in extra costs for normal vehicles, which naturally want to move from the origin to the destination. The POT protocol is used to enhance the security of previous voting-based data validation algorithms for V2I crowdsensing applications. For the POT-enhanced voting, we prove using game-theoretic analysis that all vehicles choosing to cheat is not a pure Nash equilibrium. Simulation results suggest that the POT-enhanced voting is more robust to malicious data.

preprint, 2022, arXiv

Economies and Diseconomies of Scale in Segmented Mobility Sharing Markets

On-demand mobility sharing, provided by one or several transportation network companies (TNCs), is realized by real-time optimization algorithms to connect trips among tens of thousands of drivers and fellow passengers. In a mobility sharing market comprised of TNCs, there are two competing principles, the economies of network scale and the healthy competition between TNCs, which can lead to "segmentation" of the market. To understand the significance and relationship of the two competing principles, we need to answer how much efficiency loss is generated by the segmentation of the market, and which factors are related to it. Here we show how four critical factors of market structure and characteristics of mobility sharing services (density of trips, or thickness; maximum detour allowed for sharing, or tightness; market shares, or unevenness; and spatial segregation of the TNCs, or dissolvedness) are associated with the efficiency loss, represented as the difference in vehicle miles traveled (VMT) under different market structures. We found that 1) while VMT follows a simple power function of thickness, the corresponding exponent can be expressed as a non-monotonic function of tightness, essentially showing how economies and diseconomies of scale in this market arise and taking a form very similar to the Lennard-Jones model of intermolecular potentials; and 2) the efficiency loss is higher when unevenness is closer to 0.5 (50-50 market share) and dissolvedness is larger. Our results give a comprehensive analysis of how the inefficiency of market segmentation is generated, and how it may potentially be avoided through market mechanism design.

preprint, 2022, arXiv

Impact of unplanned service disruptions on urban public transit systems

This paper proposes a general unplanned incident analysis framework for public transit systems from the supply and demand sides using automated fare collection (AFC) and automated vehicle location (AVL) data. Specifically, on the supply side, we propose an incident-based network redundancy index to analyze the network's ability to provide alternative services under a specific rail disruption. The impacts on operations are analyzed through the headway changes. On the demand side, the analysis takes place at two levels: aggregate flows and individual response. We calculate the demand changes of different rail lines, rail stations, bus routes, and bus stops to better understand the passenger flow redistribution under incidents. Individual behavior is analyzed using a binary logit model based on passengers' mode choices and socio-demographics inferred from AFC data. The public transit system of the Chicago Transit Authority is used as a case study. Two rail disruption cases are analyzed, one with high network redundancy around the impacted stations and the other with low. Results show that the service frequency of the incident line was largely reduced (by around 30% to 70%) during the incident time. Nearby rail lines with substitutional functions were also slightly affected. Passengers showed different behavioral responses in the two incident scenarios. In the low redundancy case, most of the passengers chose to use nearby buses to travel, either to their destinations or to the nearby rail lines. In the high redundancy case, most of the passengers transferred directly to nearby lines. Corresponding policy implications and operating suggestions are discussed.

preprint, 2022, arXiv

Preparing urban mobility for the future of work

A gradual growth in flexible work over many decades has been suddenly and dramatically accelerated by the COVID-19 pandemic. The share of flexible work days in the United States is forecast to grow from 4% in 2018 to over 26% by 2022. This rapid and unexpected shift in the nature of work will have a profound effect on the demand for, and supply of, urban transportation. Understanding how people make decisions around where and with whom to work will be critical for predicting future travel patterns and designing mobility systems to serve flexible commuters. To that end, this paper establishes a formal taxonomy for describing possible flexible work arrangements, the stakeholders involved and the relationships between them. An analytical framework is then developed for adapting existing transportation models to incorporate the unique dynamics of flexible work location choice. Several examples are provided to demonstrate how the new taxonomy and analytical framework can be applied across a broad set of scenarios. Finally, a critical research agenda is proposed to create both the empirical knowledge and methodological tools to prepare urban mobility for the future of work.

preprint, 2022, arXiv

Uncertainty Quantification of Sparse Travel Demand Prediction with Spatial-Temporal Graph Neural Networks

Origin-Destination (O-D) travel demand prediction is a fundamental challenge in transportation. Recently, spatial-temporal deep learning models have demonstrated tremendous potential to enhance prediction accuracy. However, few studies have tackled the uncertainty and sparsity issues in fine-grained O-D matrices. This presents a serious problem, because a vast number of zeros deviate from the Gaussian assumption underlying the deterministic deep learning models. To address this issue, we design a Spatial-Temporal Zero-Inflated Negative Binomial Graph Neural Network (STZINB-GNN) to quantify the uncertainty of the sparse travel demand. It analyzes spatial and temporal correlations using diffusion and temporal convolution networks, which are then fused to parameterize the probabilistic distributions of travel demand. The STZINB-GNN is examined using two real-world datasets with various spatial and temporal resolutions. The results demonstrate the superiority of STZINB-GNN over benchmark models, especially under high spatial-temporal resolutions, because of its high accuracy, tight confidence intervals, and interpretable parameters. The sparsity parameter of the STZINB-GNN has a physical interpretation for various transportation applications.
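The zero-inflated negative binomial distribution named in this abstract can be written down directly. The sketch below assumes the common parameterization with a zero-inflation (sparsity) probability `pi`, a shape `n`, and a success probability `p`; whether the paper uses this exact parameterization is an assumption. In the model described above these parameters would be produced by the network for each O-D pair and time step, whereas here they are fixed by hand, and all function names and values are hypothetical.

```python
import math


def nb_log_pmf(k, n, p):
    """Log pmf of a negative binomial with shape n > 0 and success
    probability p in (0, 1); the mean is n * (1 - p) / p."""
    return (math.lgamma(k + n) - math.lgamma(k + 1) - math.lgamma(n)
            + n * math.log(p) + k * math.log(1.0 - p))


def zinb_pmf(k, pi, n, p):
    """Zero-inflated NB: with probability pi the count is a structural
    zero, otherwise it is drawn from the negative binomial."""
    nb = math.exp(nb_log_pmf(k, n, p))
    if k == 0:
        return pi + (1.0 - pi) * nb
    return (1.0 - pi) * nb


def zinb_nll(counts, pi, n, p):
    """Negative log-likelihood of observed counts, the kind of quantity
    a probabilistic model would minimize during training."""
    return -sum(math.log(zinb_pmf(k, pi, n, p)) for k in counts)


# Hypothetical sparse demand for one O-D pair: mostly zeros.
demand = [0, 0, 0, 1, 0, 0, 3, 0]
print(zinb_pmf(0, pi=0.6, n=2.0, p=0.5))
print(zinb_nll(demand, pi=0.6, n=2.0, p=0.5))
```

A larger `pi` puts more mass on exact zeros without changing the shape of the positive counts, which is one way to read the abstract's remark that the sparsity parameter has a physical interpretation.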

preprint, 2021, arXiv

Estimating the potential for shared autonomous scooters

Recent technological developments have shown significant potential for transforming urban mobility. Considering first- and last-mile travel and short trips, the rapid adoption of dockless bike-share systems showed the possibility of disruptive change, while simultaneously presenting new challenges, such as fleet management or the use of public spaces. In this paper, we evaluate the operational characteristics of a new class of shared vehicles that are being actively developed in the industry: scooters with self-repositioning capabilities. We do this by adapting the methodology of shareability networks to a large-scale dataset of dockless bike-share usage, giving us estimates of ideal fleet size under varying assumptions of fleet operations. We show that the availability of self-repositioning capabilities can help achieve up to 10 times higher utilization of vehicles than possible in current bike-share systems. We show that actual benefits will highly depend on the availability of dedicated infrastructure, a key issue for scooter and bicycle use. Based on our results, we envision that technological advances can present an opportunity to rethink urban infrastructures and how transportation can be effectively organized in cities.

preprint, 2021, arXiv

Individual Mobility Prediction: An Interpretable Activity-based Hidden Markov Approach

Individual mobility is driven by demand for activities with diverse spatiotemporal patterns, but existing methods for mobility prediction often overlook the underlying activity patterns. To address this issue, this study develops an activity-based modeling framework for individual mobility prediction. Specifically, an input-output hidden Markov model (IOHMM) framework is proposed to simultaneously predict the (continuous) time and (discrete) location of an individual's next trip using transit smart card data. The prediction task can be transformed into predicting the hidden activity duration and end location. Based on a case study of Hong Kong's metro system, we show that the proposed model can achieve prediction performance similar to that of the state-of-the-art long short-term memory (LSTM) model. Unlike LSTM, the proposed IOHMM model can also be used to analyze hidden activity patterns, which provides a meaningful behavioral interpretation of why an individual makes a certain trip. Therefore, the activity-based prediction framework offers a way to preserve the predictive power of advanced machine learning methods while enhancing our ability to generate insightful behavioral explanations, which is useful for enhancing situational awareness in user-centric transportation applications such as personalized traveler information.

preprint, 2020, arXiv

Assignment-based Path Choice Estimation for Metro Systems Using Smart Card Data

Urban rail services are the principal means of public transportation in many cities. To understand the crowding patterns and develop efficient operation strategies in the system, obtaining path choices is important. This paper proposes an assignment-based path choice estimation framework using automated fare collection (AFC) data. The framework captures the inherent correlation of crowding among stations, as well as the interaction between path choice and passengers being left behind. The path choice estimation is formulated as an optimization problem. The original problem is intractable because of a non-analytical constraint and a non-linear equation constraint. A solution procedure is proposed to decompose the original problem into three tractable sub-problems, which can be solved efficiently. The model is validated using both synthetic data and real-world AFC data from the Hong Kong Mass Transit Railway (MTR) system. The synthetic data test validates the model's effectiveness in estimating path choice parameters, outperforming purely simulation-based optimization methods in both accuracy and efficiency. The test results using actual data show that the estimated path shares are more reasonable than survey-derived path shares and uniform path shares. Model robustness with respect to different initial values and different case study dates is also verified.

preprint, 2020, arXiv

Profiling presence patterns and segmenting user locations from cell phone data

The dynamic monitoring of commuting flows is crucial for improving transit systems in fast-developing cities around the world. However, existing methods for inferring commuting origins and destinations rely either on large-scale survey data, which are inherently expensive to collect, or on Call Detail Records combined with ad-hoc heuristic assignment rules based on the frequency of appearance at given locations. In this paper, we propose a novel method to accurately infer the origins and destinations of commuting flows based on individuals' spatial-temporal patterns inferred from Call Detail Records. Our method significantly improves accuracy over the heuristic assignment rules popularly adopted in the literature. Starting with the historical data of geo-temporal travel patterns for a panel of individuals, we create, for each person-location, a probability distribution vector capturing the likelihood that the person will appear in that location at a given time of day. Stacked in this way, the matrix of historical geo-temporal data enables us to apply eigendecomposition and use unsupervised machine learning techniques to extract commonalities across locations for the different groups of travelers, which ultimately allows us to make inferences and create labels, such as home and work, on specific locations. Testing the methodology on real-world data with known location labels shows that our method identifies home and workplaces with significant accuracy, improving upon the most commonly used methods in the literature by 79% and 34%, respectively. Most importantly, our methodology does not bear any significant computational burden and is easily scaled and extended to other real-world data with historical tracking.

preprint, 2016, arXiv

Managing travel demand: Location recommendation for system efficiency based on mobile phone data

Growth in leisure travel has become increasingly significant economically, socially, and environmentally. However, flexible but uncoordinated travel behaviors exacerbate traffic congestion. Mobile phone records not only reveal human mobility patterns, but also enable us to manage travel demand for system efficiency. In this paper, we propose a location recommendation system that infers personal preferences while accounting for constraints imposed by road capacity in order to manage travel demand. We first infer unobserved preferences from phone records using a machine learning technique. We then formulate an optimization method to improve system efficiency. Coupling mobile phone data with traffic counts and road network infrastructure data collected in Andorra, this study shows that uncoordinated travel behaviors lead to longer average travel delay, implying opportunities to manage travel demand through collective decisions. The interplay between congestion relief and overall satisfied location preferences observed in extensive simulations indicates that moderate sacrifices of individual utility lead to significant travel time savings. Specifically, the results show that under a full compliance rate, travel delay fell by 52% at a cost of 31% less satisfaction. Under a 60% compliance rate, travel delay fell by 41% with a 17% reduction in satisfaction. This paper highlights the effectiveness of the synergy among collective behaviors in increasing system efficiency.
