Source author record

Siqi Wang

Siqi Wang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Topics: Computer Vision; cond-mat.mtrl-sci; Machine Learning; Distributed, Parallel, and Cluster Computing; eess.IV; eess.SP; eess.SY; math.OC; Systems and Control; astro-ph.IM; cond-mat.mes-hall; Multimedia; Performance

Catalog footprint: 14 works, 13 topics, 4 close collaborators

preprint, 2026, arXiv

ASCNet: Research on all-sky camera images classification at the Muztagh-ata site

Cloud coverage is one of the crucial elements of site testing in astronomy. All-sky camera (ASC) images are beneficial for our research on cloud coverage. In this paper, we propose ASCNet, an innovative model specifically designed for classifying nighttime ASC images collected at the Muztagh-ata site from 2022 March to 2024 June. ASCNet integrates ResNet34 with an ASCModule, which employs Depthwise Dilated Convolution and embeds lightweight Squeeze-and-Excitation attention within its branches to extract fine-grained texture information from the luminance channel. The data set is partitioned by category, with 70% of images assigned to the training set and 30% to the test set. The model's performance is assessed by comparing its predictions on the test set with manually annotated labels, yielding a consistency rate of 92.7%. All evaluation metrics of ASCNet are as follows: Accuracy 92.66%, Precision 83.26%, Recall 84.25%, and F1-Score 83.67%, and both ablation and comparative experiments demonstrate significant superiority over other models. A confusion matrix is utilized to analyze the differences between manual classification and model classification. The statistical results demonstrate the model's excellent classification performance and its robust generalization ability, illustrating that ASCNet has potential for application in future astronomical image classifications.
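
The four metrics reported above can all be derived from a confusion matrix. The sketch below is illustrative only: the abstract does not state how the multi-class precision, recall, and F1 are averaged, so macro averaging (an unweighted mean over classes) is an assumption, and the function name `macro_metrics` and the toy matrix are made up for the example.

```python
def macro_metrics(cm):
    """Accuracy and macro-averaged precision/recall/F1.

    cm[i][j] is the number of images whose manual label is class i
    and whose predicted label is class j.
    Macro averaging is an assumption; the abstract does not specify it.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(n))
    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum
        actual_k = sum(cm[k])                          # row sum
        p = tp / predicted_k if predicted_k else 0.0
        r = tp / actual_k if actual_k else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy 3-class confusion matrix (invented numbers, not the paper's data).
toy_cm = [[50, 5, 0],
          [4, 30, 6],
          [1, 4, 20]]
print(macro_metrics(toy_cm))
```

With macro averaging, the F1 score is the mean of the per-class F1 values, so it need not equal the harmonic mean of the averaged precision and recall.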

preprint, 2026, arXiv

Beyond Feature Mapping GAP: Integrating Real HDRTV Priors for Superior SDRTV-to-HDRTV Conversion

The rise of HDR-WCG display devices has highlighted the need to convert SDRTV to HDRTV, as most video sources are still in SDR. Existing methods primarily focus on designing neural networks to learn a single-style mapping from SDRTV to HDRTV. However, the limited information in SDRTV and the diversity of styles in real-world conversions render this process an ill-posed problem, thereby constraining the performance and generalization of these methods. Inspired by generative approaches, we propose a novel method for SDRTV to HDRTV conversion guided by real HDRTV priors. Despite the limited information in SDRTV, introducing real HDRTV as reference priors significantly constrains the solution space of the originally high-dimensional ill-posed problem. This shift transforms the task from solving an unreferenced prediction problem to making a referenced selection, thereby markedly enhancing the accuracy and reliability of the conversion process. Specifically, our approach comprises two stages: the first stage employs a Vector Quantized Generative Adversarial Network to capture HDRTV priors, while the second stage matches these priors to the input SDRTV content to recover realistic HDRTV outputs. We evaluate our method on public datasets, demonstrating its effectiveness with significant improvements in both objective and subjective metrics across real and synthetic datasets.
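
The "referenced selection" idea rests on vector quantization: a feature is replaced by the closest entry of a learned codebook. The sketch below shows only that generic nearest-neighbour lookup, under the assumption of squared Euclidean distance; it is not the paper's stage-two matching, and `quantize`, the codebook, and the feature values are hypothetical.

```python
def quantize(features, codebook):
    """Map each feature vector to its nearest codebook entry.

    Generic vector-quantization lookup (squared Euclidean distance),
    shown for illustration; not the paper's matching procedure.
    Returns (indices, quantized_vectors).
    """
    indices = []
    for f in features:
        dists = [sum((a - b) ** 2 for a, b in zip(f, c)) for c in codebook]
        indices.append(dists.index(min(dists)))
    return indices, [codebook[i] for i in indices]


# Hypothetical 2-D codebook standing in for learned HDRTV priors.
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
features = [[0.1, -0.2], [3.5, 0.4], [0.9, 1.2]]
indices, quantized = quantize(features, codebook)
print(indices)  # → [0, 2, 1]
```

Because the output is always drawn from the codebook, the prediction is constrained to entries learned from real data, which is the sense in which a prior narrows the solution space.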

preprint, 2024, arXiv

Real-Time Asphalt Pavement Layer Thickness Prediction Using Ground-Penetrating Radar Based on a Modified Extended Common Mid-Point (XCMP) Approach

The conventional surface reflection method has been widely used to measure the asphalt pavement layer dielectric constant using ground-penetrating radar (GPR). This method may be inaccurate for in-service pavement thickness estimation when the dielectric constant varies through the depth, a limitation that could be addressed using the extended common mid-point (XCMP) method with air-coupled GPR antennas. However, the factors affecting the thickness prediction accuracy of the XCMP method have not been studied. Manual acquisition of key factors is also required, which hinders real-time application. This study investigates these factors and develops a modified XCMP method to allow automatic thickness prediction of in-service asphalt pavement with non-uniform dielectric properties through depth. A sensitivity analysis was performed, necessitating the accurate estimation of times of flight (TOFs) from antenna pairs. A modified XCMP method based on edge detection was proposed to allow real-time TOF estimation, followed by dielectric constant and thickness predictions. Field tests using a multi-channel GPR system were performed for validation. Both the surface reflection and XCMP setups were tested. Results show that the modified XCMP method is recommended, with a mean prediction error of 1.86%, which is more accurate than the surface reflection method (5.73%).
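
For orientation, the conventional surface reflection baseline mentioned above is commonly written as two formulas: the dielectric constant from the ratio of the surface reflection amplitude to a metal-plate reference amplitude, and the thickness from the two-way travel time in the layer. The sketch below uses those textbook relations; it is not the modified XCMP method, and the function names and input values are hypothetical.

```python
import math

C_M_PER_NS = 0.299792458  # speed of light in vacuum, metres per nanosecond


def surface_dielectric(a_surface, a_plate):
    """Dielectric constant from the surface reflection method.

    a_surface: amplitude reflected from the pavement surface.
    a_plate:   amplitude reflected from a metal plate (total reflection).
    """
    ratio = a_surface / a_plate
    if not 0.0 <= ratio < 1.0:
        raise ValueError("amplitude ratio must lie in [0, 1)")
    return ((1.0 + ratio) / (1.0 - ratio)) ** 2


def layer_thickness(two_way_time_ns, dielectric):
    """Layer thickness in metres from the two-way travel time in the layer."""
    return C_M_PER_NS * two_way_time_ns / (2.0 * math.sqrt(dielectric))


# Hypothetical readings, for illustration only.
eps = surface_dielectric(a_surface=0.4, a_plate=1.0)
print(eps, layer_thickness(two_way_time_ns=1.5, dielectric=eps))
```

The surface amplitude only reflects the dielectric constant near the top of the layer, which is why this estimate can drift when the dielectric constant varies with depth.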

preprint, 2022, arXiv

An Effective Transformer-based Solution for RSNA Intracranial Hemorrhage Detection Competition

We present an effective method for Intracranial Hemorrhage Detection (IHD) that exceeds the performance of the winning solution in the RSNA-IHD competition (2019). Moreover, our model uses only a quarter of the parameters and ten percent of the FLOPs of the winning solution. The IHD task requires predicting the hemorrhage category of each slice of the input brain CT. We review the top-5 solutions for the IHD competition held by the Radiological Society of North America (RSNA) in 2019. Nearly all the top solutions rely on 2D convolutional networks and sequential models (Bidirectional GRU or LSTM) to extract intra-slice and inter-slice features, respectively. All the top solutions enhance performance through model ensembles, with the number of models varying from 7 to 31. Because much progress has been made in computer vision in recent years, especially with Transformer-based models, we introduce Transformer-based techniques to extract features in both intra-slice and inter-slice views for IHD tasks. Additionally, a semi-supervised method is embedded into our workflow to further improve the performance. The code is available in the manuscript.

preprint, 2022, arXiv

Deep Anomaly Discovery From Unlabeled Videos via Normality Advantage and Self-Paced Refinement

While classic video anomaly detection (VAD) requires labeled normal videos for training, emerging unsupervised VAD (UVAD) aims to discover anomalies directly from fully unlabeled videos. However, existing UVAD methods still rely on shallow models to perform detection or initialization, and they are evidently inferior to classic VAD methods. This paper proposes a full deep neural network (DNN) based solution that can realize highly effective UVAD. First, we, for the first time, point out that deep reconstruction can be surprisingly effective for UVAD, which inspires us to unveil a property named "normality advantage", i.e., normal events will enjoy lower reconstruction loss when DNN learns to reconstruct unlabeled videos. With this property, we propose Localization based Reconstruction (LBR) as a strong UVAD baseline and a solid foundation of our solution. Second, we propose a novel self-paced refinement (SPR) scheme, which is synthesized into LBR to conduct UVAD. Unlike ordinary self-paced learning that injects more samples in an easy-to-hard manner, the proposed SPR scheme gradually drops samples so that suspicious anomalies can be removed from the learning process. In this way, SPR consolidates normality advantage and enables better UVAD in a more proactive way. Finally, we further design a variant solution that explicitly takes the motion cues into account. The solution evidently enhances the UVAD performance, and it sometimes even surpasses the best classic VAD methods. Experiments show that our solution not only significantly outperforms existing UVAD methods by a wide margin (5% to 9% AUROC), but also enables UVAD to catch up with the mainstream performance of classic VAD.
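
The core of the refinement idea, dropping the samples that look most anomalous instead of adding harder ones, can be sketched in a few lines. This is a minimal illustration that assumes a fixed fraction of the highest-loss samples is discarded in each round; the abstract does not give the actual schedule, and `self_paced_drop`, `drop_fraction`, and the loss values are hypothetical.

```python
def self_paced_drop(losses, drop_fraction):
    """Return indices of samples kept after dropping the highest-loss ones.

    Relies on the "normality advantage": normal events tend to have lower
    reconstruction loss, so high-loss samples are suspected anomalies.
    The fixed drop_fraction is an assumption made for this sketch.
    """
    n_drop = int(len(losses) * drop_fraction)
    order = sorted(range(len(losses)), key=lambda i: losses[i])  # ascending loss
    kept = order[:len(losses) - n_drop]
    return sorted(kept)


# Invented per-sample reconstruction losses; indices 2 and 4 stand out.
losses = [0.10, 0.12, 0.95, 0.11, 0.80, 0.09, 0.13, 0.10, 0.14, 0.12]
print(self_paced_drop(losses, drop_fraction=0.2))  # → [0, 1, 3, 5, 6, 7, 8, 9]
```

In a training loop, the model would then be refitted on the kept samples only, so that suspected anomalies stop influencing what the network learns to reconstruct.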

preprint, 2022, arXiv

Pervasive beyond room-temperature ferromagnetism in a doped van der Waals magnet: Ni doped Fe$_5$GeTe$_2$ with $T_{\text{C}}$ up to 478 K

The existence of long-range magnetic order in low-dimensional magnetic systems, such as the quasi-two-dimensional (2D) van der Waals (vdW) magnets, has attracted intensive studies of new physical phenomena. The vdW Fe$_N$GeTe$_2$ ($N$ = 3, 4, 5; FGT) family is exceptional owing to its vast tunability of magnetic properties. In particular, a ferromagnetic ordering temperature ($T_{\text{C}}$) above room temperature is observed at $N$ = 5 (F5GT). Here, our study shows that, by nickel (Ni) substitution of iron (Fe) in F5GT, a record high $T_{\text{C}}$ = 478(6) K is achieved. Importantly, pervasive, beyond-room-temperature ferromagnetism exists in almost the entire doping range of the phase diagram of Ni-F5GT. We argue that this striking observation in Ni-F5GT may be due to several contributing factors, among which the 3D magnetic couplings enhanced by structural alteration might be critical for enhancing the ferromagnetic order.

preprint, 2020, arXiv

Berry curvature memory through electrically driven stacking transitions

In two-dimensional layered quantum materials, the stacking order of the layers determines both the crystalline symmetry and electronic properties such as the Berry curvature, topology and electron correlation. Electrical stimuli can influence quasiparticle interactions and the free-energy landscape, making it possible to dynamically modify the stacking order and reveal hidden structures that host different quantum properties. Here we demonstrate electrically driven stacking transitions that can be applied to design nonvolatile memory based on Berry curvature in few-layer WTe$_2$. The interplay of out-of-plane electric fields and electrostatic doping controls in-plane interlayer sliding and creates multiple polar and centrosymmetric stacking orders. In situ nonlinear Hall transport reveals such stacking rearrangements result in a layer-parity-selective Berry curvature memory in momentum space, where the sign reversal of the Berry curvature and its dipole only occurs in odd-layer crystals. Our findings open an avenue towards exploring coupling between topology, electron correlations, and ferroelectricity in hidden stacking orders and demonstrate a new low-energy-cost, electrically controlled topological memory in the atomically thin limit.

preprint, 2020, arXiv

Cloze Test Helps: Effective Video Anomaly Detection via Learning to Complete Video Events

As a vital topic in media content interpretation, video anomaly detection (VAD) has made fruitful progress via deep neural networks (DNNs). However, existing methods usually follow a reconstruction or frame prediction routine. They suffer from two gaps: (1) They cannot localize video activities in a manner that is both precise and comprehensive. (2) They lack sufficient abilities to utilize high-level semantics and temporal context information. Inspired by the frequently used cloze test in language study, we propose a brand-new VAD solution named Video Event Completion (VEC) to bridge the gaps above. First, we propose a novel pipeline to achieve both precise and comprehensive enclosure of video activities. Appearance and motion are exploited as mutually complementary cues to localize regions of interest (RoIs). A normalized spatio-temporal cube (STC) is built from each RoI as a video event, which lays the foundation of VEC and serves as a basic processing unit. Second, we encourage the DNN to capture high-level semantics by solving a visual cloze test. To build such a visual cloze test, a certain patch of the STC is erased to yield an incomplete event (IE). The DNN learns to restore the original video event from the IE by inferring the missing patch. Third, to incorporate richer motion dynamics, another DNN is trained to infer the optical flow of the erased patches. Finally, two ensemble strategies using different types of IE and modalities are proposed to boost VAD performance, so as to fully exploit the temporal context and modality information for VAD. VEC can consistently outperform state-of-the-art methods by a notable margin (typically 1.5%-5% AUROC) on commonly used VAD benchmarks. Our codes and results can be verified at github.com/yuguangnudt/VEC_VAD.
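
Building a visual cloze test from a spatio-temporal cube amounts to removing one patch and keeping it as the prediction target. The sketch below enumerates every such pair for one cube, assuming the cube is simply an ordered list of patches; the function name `cloze_pairs` and the string placeholders are hypothetical, and the authors' repository should be consulted for the actual implementation.

```python
def cloze_pairs(stc):
    """Build one (erased_index, incomplete_event, erased_patch) per position.

    stc is an ordered list of patches forming one video event.
    Each pair is a cloze task: infer the erased patch from the rest.
    """
    pairs = []
    for t in range(len(stc)):
        incomplete = stc[:t] + stc[t + 1:]  # the event with patch t erased
        pairs.append((t, incomplete, stc[t]))
    return pairs


# Placeholder patches; real patches would be image arrays.
stc = ["p0", "p1", "p2", "p3", "p4"]
for erased_index, incomplete, target in cloze_pairs(stc):
    print(erased_index, incomplete, target)
```

Erasing a different position each time yields different types of incomplete event, which is the raw material an ensemble over IE types would draw on.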

preprint, 2020, arXiv

Evaluating Load Models and Their Impacts on Power Transfer Limits

Power transfer limits or transfer capability (TC) directly relate to the system operation and control as well as electricity markets. As a consequence, their assessment has to comply with static constraints, such as line thermal limits, and dynamic constraints, such as transient stability limits, voltage stability limits and small-signal stability limits. Since the load dynamics have substantial impacts on power system transient stability, load models are one critical factor that affects the power transfer limits. Currently, multiple load models have been proposed and adopted in industry and academia, including the ZIP model, the ZIP plus induction motor composite model (ZIP + IM) and the WECC composite load model (WECC CLM). Each of them has its unique advantages, but their impacts on the power transfer limits are not yet adequately addressed. One existing challenge is fitting high-order nonlinear models such as WECC CLM. In this study, we adopt a double deep Q-learning network (DDQN) agent as a general load modeling tool in the dynamic assessment procedure and fit the same transient field measurements into different load models. A comprehensive evaluation is then conducted to quantify the load models' impacts on the power transfer limits. The simulation environment is the IEEE-39 bus system constructed in Transient Security Assessment Tool (TSAT).
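
Of the load models listed above, the ZIP model is the simplest to write down: active power is a weighted sum of constant-impedance (Z), constant-current (I), and constant-power (P) components. The sketch below uses the standard static form with weights that sum to one; the function name, parameter names, and numbers are hypothetical, and the dynamic models (ZIP + IM, WECC CLM) are not covered.

```python
def zip_power(v, v0, p0, a_z, a_i, a_p):
    """Active power of a static ZIP load at voltage v.

    p0 is the power drawn at nominal voltage v0.
    a_z, a_i, a_p weight the constant-impedance, constant-current,
    and constant-power shares and must sum to 1.
    """
    if abs(a_z + a_i + a_p - 1.0) > 1e-9:
        raise ValueError("ZIP coefficients must sum to 1")
    ratio = v / v0
    return p0 * (a_z * ratio ** 2 + a_i * ratio + a_p)


# Illustrative values: a 5% voltage sag on a 100 MW load.
print(zip_power(v=0.95, v0=1.0, p0=100.0, a_z=0.3, a_i=0.3, a_p=0.4))
```

Fitting a load model then means choosing the coefficients so that the simulated response matches measured voltage and power, which becomes much harder for high-order models with many more parameters.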

preprint, 2020, arXiv

Global Sensitivity Analysis in Load Modeling via Low-rank Tensor

Growing model complexity in load modeling has created high dimensionality in parameter estimation, thereby substantially increasing the associated computational costs. In this paper, a tensor-based method is proposed for identifying composite load modeling (CLM) parameters and for conducting a global sensitivity analysis. Tensor format and Fokker-Planck equations are used to estimate the power output response of CLM in the context of simultaneously varying parameters under their full parameter distribution ranges. The proposed tensor structure is shown to be effective for tackling high-dimensional parameter estimation and for improving computational performance in load modeling through global sensitivity analysis.

preprint, 2020, arXiv

High-Throughput CNN Inference on Embedded ARM big.LITTLE Multi-Core Processors

IoT edge intelligence requires Convolutional Neural Network (CNN) inference to take place on the edge devices themselves. The ARM big.LITTLE architecture is at the heart of prevalent commercial edge devices. It comprises single-ISA heterogeneous cores grouped into multiple homogeneous clusters that enable power and performance trade-offs. All cores are expected to be simultaneously employed in inference to attain maximal throughput. However, the high communication overhead involved in parallelizing computations from convolution kernels across clusters is detrimental to throughput. We present an alternative framework called Pipe-it that employs a pipelined design to split convolutional layers across clusters while limiting parallelization of their respective kernels to the assigned cluster. We develop a performance-prediction model that utilizes only the convolutional layer descriptors to predict the execution time of each layer individually on all permitted core configurations (type and count). Pipe-it then exploits the predictions to create a balanced pipeline using an efficient design space exploration algorithm. On average, Pipe-it achieves 39% higher throughput than the highest antecedent throughput.
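
A pipeline's throughput is set by its slowest stage, so balancing means splitting the layer sequence into contiguous stages whose largest total time is as small as possible. The sketch below solves that simplified problem with a textbook linear-partition dynamic program. It is not Pipe-it's design space exploration: it assumes each layer has a single predicted time regardless of which cores run it, and `balance_pipeline` and the timings are hypothetical.

```python
from functools import lru_cache


def balance_pipeline(layer_times, num_stages):
    """Split layers into contiguous stages, minimising the slowest stage.

    Returns (bottleneck_time, cuts), where cuts holds the index of the
    first layer of every stage after the first.
    Simplification: layer times do not depend on the assigned cores.
    """
    n = len(layer_times)
    if not 1 <= num_stages <= n:
        raise ValueError("need 1 <= num_stages <= number of layers")
    prefix = [0.0]
    for t in layer_times:
        prefix.append(prefix[-1] + t)

    @lru_cache(maxsize=None)
    def best(i, k):
        # Best split of layers i..n-1 into k stages.
        if k == 1:
            return prefix[n] - prefix[i], ()
        best_val, best_cuts = float("inf"), ()
        for j in range(i + 1, n - k + 2):  # first stage is layers i..j-1
            first = prefix[j] - prefix[i]
            rest, cuts = best(j, k - 1)
            val = max(first, rest)
            if val < best_val:
                best_val, best_cuts = val, (j,) + cuts
        return best_val, best_cuts

    return best(0, num_stages)


# Invented per-layer times (ms); three stages, e.g. one per core group.
bottleneck, cuts = balance_pipeline([4, 2, 3, 5, 1, 3], num_stages=3)
print(bottleneck, cuts)  # → 8.0 (2, 4)
```

Steady-state throughput is roughly one inference per bottleneck interval, so lowering the slowest stage from 9 ms to 8 ms in this toy case raises throughput accordingly.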

preprint, 2020, arXiv

Neural Network Inference on Mobile SoCs

The ever-increasing demand from mobile Machine Learning (ML) applications calls for ever more powerful on-chip computing resources. Mobile devices are empowered with heterogeneous multi-processor Systems-on-Chips (SoCs) to process ML workloads such as Convolutional Neural Network (CNN) inference. Mobile SoCs house several different types of ML-capable components on-die, such as CPU, GPU, and accelerators. These different components are capable of independently performing inference but with very different power-performance characteristics. In this article, we provide a quantitative evaluation of the inference capabilities of the different components on mobile SoCs. We also present insights behind their respective power-performance behavior. Finally, we explore the performance limit of the mobile SoCs by synergistically engaging all the components concurrently. We observe that a mobile SoC provides up to 2x improvement with parallel inference when all its components are engaged, as opposed to engaging only one component.

preprint, 2020, arXiv

Self-Supervised Gait Encoding with Locality-Aware Attention for Person Re-Identification

Gait-based person re-identification (Re-ID) is valuable for safety-critical applications, and using only 3D skeleton data to extract discriminative gait features for person Re-ID is an emerging open topic. Existing methods either adopt hand-crafted features or learn gait features by traditional supervised learning paradigms. Unlike previous methods, we propose, for the first time, a generic gait encoding approach that can utilize unlabeled skeleton data to learn gait representations in a self-supervised manner. Specifically, we first propose to introduce self-supervision by learning to reconstruct input skeleton sequences in reverse order, which facilitates learning richer high-level semantics and better gait representations. Second, inspired by the fact that motion's continuity endows temporally adjacent skeletons with higher correlations ("locality"), we propose a locality-aware attention mechanism that encourages learning larger attention weights for temporally adjacent skeletons when reconstructing the current skeleton, so as to learn locality when encoding gait. Finally, we propose Attention-based Gait Encodings (AGEs), which are built using context vectors learned by locality-aware attention, as final gait representations. AGEs are directly utilized to realize effective person Re-ID. Our approach typically improves existing skeleton-based methods by 10-20% Rank-1 accuracy, and it achieves comparable or even superior performance to multi-modal methods with extra RGB or depth information. Our codes are available at https://github.com/Kali-Hac/SGE-LA.

preprint, 2019, arXiv

Observation of Rydberg exciton polaritons and their condensate in a perovskite cavity

The condensation of half-light half-matter exciton polaritons in semiconductor optical cavities is a striking example of macroscopic quantum coherence in a solid state platform. Quantum coherence is possible only when there are strong interactions between the exciton polaritons provided by their excitonic constituents. Rydberg excitons with high principal value exhibit strong dipole-dipole interactions in cold atoms. However, polaritons with the excitonic constituent that is an excited state, namely Rydberg exciton polaritons (REPs), have not yet been experimentally observed. Here, for the first time, we observe the formation of REPs in a single crystal CsPbBr3 perovskite cavity without any external fields. These polaritons exhibit strong nonlinear behavior that leads to a coherent polariton condensate with a prominent blue shift. Furthermore, the REPs in CsPbBr3 are highly anisotropic and have a large extinction ratio, arising from the perovskite's orthorhombic crystal structure. Our observation not only sheds light on the importance of many-body physics in coherent polariton systems involving higher-order excited states, but also paves the way for exploring these coherent interactions for solid state quantum optical information processing.
