Source author record

Zhe Jiang

Zhe Jiang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Computer Vision physics.ao-ph Artificial Intelligence Computation and Language Databases Distributed, Parallel, and Cluster Computing eess.SP

Catalog footprint

What is connected

12works

8topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Table as a Modality for Large Language Models

To migrate the remarkable successes of Large Language Models (LLMs), the community has made numerous efforts to generalize them to the table reasoning tasks for the widely deployed tabular data. Despite that, in this work, by showing a probing experiment on our proposed StructQA benchmark, we postulate that even the most advanced LLMs (such as GPTs) may still fall short of coping with tabular data. More specifically, the current scheme often simply relies on serializing the tabular data, together with the meta information, then inputting them through the LLMs. We argue that the loss of structural information is the root of this shortcoming. In this work, we further propose TAMO, which bears an ideology to treat the tables as an independent modality integrated with the text tokens. The resulting model in TAMO is a multimodal framework consisting of a hypergraph neural network as the global table encoder seamlessly integrated with the mainstream LLM. Empirical results on various benchmarking datasets, including HiTab, WikiTQ, WikiSQL, FeTaQA, and StructQA, have demonstrated significant improvements on generalization with an average relative gain of 42.65%.

preprint2026arXiv

VNU-Bench: A Benchmarking Dataset for Multi-Source Multimodal News Video Understanding

News videos are carefully edited multimodal narratives that combine narration, visuals, and external quotations into coherent storylines. In recent years, there have been significant advances in evaluating multimodal large language models (MLLMs) for news video understanding. However, existing benchmarks largely focus on single-source, intra-video reasoning, where each report is processed in isolation. In contrast, real-world news consumption is inherently multi-sourced: the same event is reported by different outlets with complementary details, distinct narrative choices, and sometimes conflicting claims that unfold over time. Robust news understanding, therefore, requires models to compare perspectives from different sources, align multimodal evidence across sources, and synthesize multi-source information. To fill this gap, we introduce VNU-Bench, the first benchmark for multi-source, cross-video understanding in the news domain. We design a set of new question types that are unique in testing models' ability of understanding multi-source multimodal news from a variety of different angles. We design a novel hybrid human-model QA generation process that addresses the issues of scalability and quality control in building a large dataset for cross-source news understanding. The dataset comprises 429 news groups, 1,405 videos, and 2,501 high-quality questions. Comprehensive evaluation of both closed- and open-source multimodal models shows that VNU-Bench poses substantial challenges for current MLLMs.

preprint2022arXiv

An Explainer for Temporal Graph Neural Networks

Temporal graph neural networks (TGNNs) have been widely used for modeling time-evolving graph-related tasks due to their ability to capture both graph topology dependency and non-linear temporal dynamic. The explanation of TGNNs is of vital importance for a transparent and trustworthy model. However, the complex topology structure and temporal dependency make explaining TGNN models very challenging. In this paper, we propose a novel explainer framework for TGNN models. Given a time series on a graph to be explained, the framework can identify dominant explanations in the form of a probabilistic graphical model in a time period. Case studies on the transportation domain demonstrate that the proposed approach can discover dynamic dependency structures in a road network for a time period.

preprint2022arXiv

Large discrepancy between observations and simulations: Implications for urban air quality in China

Chemical transport models (CTMs) have been widely used to provide instructions for the control of ozone (O3) pollution. However, we find large discrepancies between observation- and model-based urban O3 chemical regimes: volatile organic compound (VOC)-limited regimes over N. China and weak nitrogen oxides (NOx)-limited regimes over S. China in observations, in contrast to simulations with widespread distributions of strong NOx-limited regimes. The conflicting O3 evolutions are caused by underestimated urban NOx concentrations and the possible overestimation of biogenic VOC emissions. Reductions in NOx emissions, in response to regulations, have thus led to an unintended deterioration of O3 pollution over N. China provinces, for example, an increase in surface O3 by approximately 7 ppb over the Sichuan Basin (SCB) in 2014-2020. The NOx-induced urban O3 changes resulted in an increase in premature mortality by approximately 3000 cases in 2015-2020.

preprint2022arXiv

Modeling Reservoir Release Using Pseudo-Prospective Learning and Physical Simulations to Predict Water Temperature

This paper proposes a new data-driven method for predicting water temperature in stream networks with reservoirs. The water flows released from reservoirs greatly affect the water temperature of downstream river segments. However, the information of released water flow is often not available for many reservoirs, which makes it difficult for data-driven models to capture the impact to downstream river segments. In this paper, we first build a state-aware graph model to represent the interactions amongst streams and reservoirs, and then propose a parallel learning structure to extract the reservoir release information and use it to improve the prediction. In particular, for reservoirs with no available release information, we mimic the water managers' release decision process through a pseudo-prospective learning method, which infers the release information from anticipated water temperature dynamics. For reservoirs with the release information, we leverage a physics-based model to simulate the water release temperature and transfer such information to guide the learning process for other reservoirs. The evaluation for the Delaware River Basin shows that the proposed method brings over 10\% accuracy improvement over existing data-driven models for stream temperature prediction when the release data is not available for any reservoirs. The performance is further improved after we incorporate the release data and physical simulations for a subset of reservoirs.

preprint2022arXiv

Spatiotemporal Data Mining: A Survey

Spatiotemporal data mining aims to discover interesting, useful but non-trivial patterns in big spatial and spatiotemporal data. They are used in various application domains such as public safety, ecology, epidemiology, earth science, etc. This problem is challenging because of the high societal cost of spurious patterns and exorbitant computational cost. Recent surveys of spatiotemporal data mining need update due to rapid growth. In addition, they did not adequately survey parallel techniques for spatiotemporal data mining. This paper provides a more up-to-date survey of spatiotemporal data mining methods. Furthermore, it has a detailed survey of parallel formulations of spatiotemporal data mining.

preprint2021arXiv

Flood Extent Mapping based on High Resolution Aerial Imagery and DEM: A Hidden Markov Tree Approach

Flood extent mapping plays a crucial role in disaster management and national water forecasting. In recent years, high-resolution optical imagery becomes increasingly available with the deployment of numerous small satellites and drones. However, analyzing such imagery data to extract flood extent poses unique challenges due to the rich noise and shadows, obstacles (e.g., tree canopies, clouds), and spectral confusion between pixel classes (flood, dry) due to spatial heterogeneity. Existing machine learning techniques often focus on spectral and spatial features from raster images without fully incorporating the geographic terrain within classification models. In contrast, we recently proposed a novel machine learning model called geographical hidden Markov tree that integrates spectral features of pixels and topographic constraints from Digital Elevation Model (DEM) data (i.e., water flow directions) in a holistic manner. This paper evaluates the model through case studies on high-resolution aerial imagery from the National Oceanic and Atmospheric Administration (NOAA) National Geodetic Survey together with DEM. Three scenes are selected in heavily vegetated floodplains near the cities of Grimesland and Kinston in North Carolina during Hurricane Matthew floods in 2016. Results show that the proposed hidden Markov tree model outperforms several state of the art machine learning algorithms (e.g., random forests, gradient boosted model) by an improvement of F-score (the harmonic mean of the user's accuracy and producer's accuracy) from around 70% to 80% to over 95% on our datasets.

preprint2020arXiv

Deep Neural Network for 3D Surface Segmentation based on Contour Tree Hierarchy

Given a 3D surface defined by an elevation function on a 2D grid as well as non-spatial features observed at each pixel, the problem of surface segmentation aims to classify pixels into contiguous classes based on both non-spatial features and surface topology. The problem has important applications in hydrology, planetary science, and biochemistry but is uniquely challenging for several reasons. First, the spatial extent of class segments follows surface contours in the topological space, regardless of their spatial shapes and directions. Second, the topological structure exists in multiple spatial scales based on different surface resolutions. Existing widely successful deep learning models for image segmentation are often not applicable due to their reliance on convolution and pooling operations to learn regular structural patterns on a grid. In contrast, we propose to represent surface topological structure by a contour tree skeleton, which is a polytree capturing the evolution of surface contours at different elevation levels. We further design a graph neural network based on the contour tree hierarchy to model surface topological structure at different spatial scales. Experimental evaluations based on real-world hydrological datasets show that our model outperforms several baseline methods in classification accuracy.

preprint2020arXiv

Impacts of COVID-19 control measures on tropospheric NO$_2$ over China, South Korea and Italy

Tropospheric nitrogen dioxide (NO$_2$) concentrations are strongly affected by anthropogenic activities. Using space-based measurements of tropospheric NO$_2$, here we investigate the responses of tropospheric NO$_2$ to the 2019 novel coronavirus (COVID-19) over China, South Korea, and Italy. We find noticeable reductions of tropospheric NO$_2$ columns due to the COVID-19 controls by more than 40% over E. China, South Korea, and N. Italy. The 40% reductions of tropospheric NO$_2$ are coincident with intensive lockdown events as well as up to 20% reductions in anthropogenic nitrogen oxides (NO$_x$) emissions. The perturbations in tropospheric NO$_2$ diminished accompanied with the mitigation of COVID-19 pandemic, and finally disappeared within around 50-70 days after the starts of control measures over all three nations, providing indications for the start, maximum, and mitigation of intensive controls. This work exhibits significant influences of lockdown measures on atmospheric environment, highlighting the importance of satellite observations to monitor anthropogenic activity changes.

preprint2020arXiv

Learning Event-Based Motion Deblurring

Recovering sharp video sequence from a motion-blurred image is highly ill-posed due to the significant loss of motion information in the blurring process. For event-based cameras, however, fast motion can be captured as events at high time rate, raising new opportunities to exploring effective solutions. In this paper, we start from a sequential formulation of event-based motion deblurring, then show how its optimization can be unfolded with a novel end-to-end deep architecture. The proposed architecture is a convolutional recurrent neural network that integrates visual and temporal knowledge of both global and local scales in principled manner. To further improve the reconstruction, we propose a differentiable directional event filtering module to effectively extract rich boundary prior from the stream of events. We conduct extensive experiments on the synthetic GoPro dataset and a large newly introduced dataset captured by a DAVIS240C camera. The proposed approach achieves state-of-the-art reconstruction quality, and generalizes better to handling real-world motion blur.

preprint2020arXiv

Semi-supervised Learning with the EM Algorithm: A Comparative Study between Unstructured and Structured Prediction

Semi-supervised learning aims to learn prediction models from both labeled and unlabeled samples. There has been extensive research in this area. Among existing work, generative mixture models with Expectation-Maximization (EM) is a popular method due to clear statistical properties. However, existing literature on EM-based semi-supervised learning largely focuses on unstructured prediction, assuming that samples are independent and identically distributed. Studies on EM-based semi-supervised approach in structured prediction is limited. This paper aims to fill the gap through a comparative study between unstructured and structured methods in EM-based semi-supervised learning. Specifically, we compare their theoretical properties and find that both methods can be considered as a generalization of self-training with soft class assignment of unlabeled samples, but the structured method additionally considers structural constraint in soft class assignment. We conducted a case study on real-world flood mapping datasets to compare the two methods. Results show that structured EM is more robust to class confusion caused by noise and obstacles in features in the context of the flood mapping application.

preprint2020arXiv

Spatial Classification With Limited Observations Based On Physics-Aware Structural Constraint

Spatial classification with limited feature observations has been a challenging problem in machine learning. The problem exists in applications where only a subset of sensors are deployed at certain spots or partial responses are collected in field surveys. Existing research mostly focuses on addressing incomplete or missing data, e.g., data cleaning and imputation, classification models that allow for missing feature values or model missing features as hidden variables in the EM algorithm. These methods, however, assume that incomplete feature observations only happen on a small subset of samples, and thus cannot solve problems where the vast majority of samples have missing feature observations. To address this issue, we recently proposed a new approach that incorporates physics-aware structural constraint into the model representation. Our approach assumes that a spatial contextual feature is observed for all sample locations and establishes spatial structural constraint from the underlying spatial contextual feature map. We design efficient algorithms for model parameter learning and class inference. This paper extends our recent approach by allowing feature values of samples in each class to follow a multi-modal distribution. We propose learning algorithms for the extended model with multi-modal distribution. Evaluations on real-world hydrological applications show that our approach significantly outperforms baseline methods in classification accuracy, and the multi-modal extension is more robust than our early single-modal version especially when feature distribution in training samples is multi-modal. Computational experiments show that the proposed solution is computationally efficient on large datasets.

Zhe Jiang

What is connected

Connect this record

See the researcher in context

Building this map preview

12 published item(s)

Table as a Modality for Large Language Models

VNU-Bench: A Benchmarking Dataset for Multi-Source Multimodal News Video Understanding

An Explainer for Temporal Graph Neural Networks

Large discrepancy between observations and simulations: Implications for urban air quality in China

Modeling Reservoir Release Using Pseudo-Prospective Learning and Physical Simulations to Predict Water Temperature

Spatiotemporal Data Mining: A Survey

Flood Extent Mapping based on High Resolution Aerial Imagery and DEM: A Hidden Markov Tree Approach

Deep Neural Network for 3D Surface Segmentation based on Contour Tree Hierarchy

Impacts of COVID-19 control measures on tropospheric NO$_2$ over China, South Korea and Italy

Learning Event-Based Motion Deblurring

Semi-supervised Learning with the EM Algorithm: A Comparative Study between Unstructured and Structured Prediction

Spatial Classification With Limited Observations Based On Physics-Aware Structural Constraint