Source author record

Bowen Zhang

Bowen Zhang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Computer Vision, Machine Learning, Systems and Control, Artificial Intelligence, Computation and Language, eess.IV, cond-mat.str-el, Cryptography and Security, eess.AS, eess.SP, Software Engineering, Sound

Catalog footprint

What is connected

21 works

12 topics

4 close collaborators

preprint, 2026, arXiv

A Systematic Comparison of Prompting and Multi-Agent Methods for LLM-based Stance Detection

Stance detection identifies the attitude of a text author toward a given target. Recent studies have explored various LLM-based strategies for this task, from zero-shot prompting to multi-agent debate. However, existing works differ in data splits, base models, and evaluation protocols, making fair comparison difficult. We conduct a systematic comparison that evaluates five methods across two categories -- prompt-based inference (Direct Prompting, Auto-CoT, StSQA) and agent-based debate (COLA, MPRF) -- on four datasets with 14 subtasks, using 15 LLMs from six model families with parameter sizes from 7B to 72B+. Our experiments yield several findings. First, on all models with complete results, the best prompt-based method outperforms the best agent-based method, while agent methods require 7 to 12 times more API calls per sample. Second, model scale has a larger impact on performance than method choice, with gains plateauing around 32B. Third, reasoning-enhanced models (DeepSeek-R1) do not consistently outperform general models of the same size on this task.

preprint, 2026, arXiv

Improving Day-Ahead Grid Carbon Intensity Forecasting by Joint Modeling of Local-Temporal and Cross-Variable Dependencies Across Different Frequencies

Accurate forecasting of the grid carbon intensity factor (CIF) is critical for enabling demand-side management and reducing emissions in modern electricity systems. Leveraging multiple interrelated time series, CIF prediction is typically formulated as a multivariate time series forecasting problem. Despite advances in deep learning-based methods, it remains challenging to capture the fine-grained local-temporal dependencies, dynamic higher-order cross-variable dependencies, and complex multi-frequency patterns for CIF forecasting. To address these issues, we propose a novel model that integrates two parallel modules: 1) one enhances the extraction of local-temporal dependencies under multi-frequency by applying multiple wavelet-based convolutional kernels to overlapping patches of varying lengths; 2) the other captures dynamic cross-variable dependencies under multi-frequency to model how inter-variable relationships evolve across the time-frequency domain. Evaluations on four representative electricity markets from Australia, featuring varying levels of renewable penetration, demonstrate that the proposed method outperforms the state-of-the-art models. An ablation study further validates the complementary benefits of the two proposed modules. Designed with built-in interpretability, the proposed model also enables better understanding of its predictive behavior, as shown in a case study where it adaptively shifts attention to relevant variables and time intervals during a disruptive event.

preprint, 2024, arXiv

Cross-target Stance Detection by Exploiting Target Analytical Perspectives

Cross-target stance detection (CTSD) is an important task that infers the attitude toward a destination target by using annotated data derived from a source target. One important approach in CTSD is to extract domain-invariant features to bridge the knowledge gap between multiple targets. However, informal and short text structure, together with implicit expressions, complicates the extraction of domain-invariant knowledge. In this paper, we propose a Multi-Perspective Prompt-Tuning (MPPT) model for CTSD that uses the analysis perspective as a bridge to transfer knowledge. First, we develop a two-stage instruct-based chain-of-thought method (TsCoT) to elicit target analysis perspectives and provide natural language explanations (NLEs) from multiple viewpoints by formulating instructions based on a large language model (LLM). Second, we propose a multi-perspective prompt-tuning framework (MultiPLN) to fuse the NLEs into the stance predictor. Extensive experimental results demonstrate the superiority of MPPT over state-of-the-art baseline methods.

preprint, 2024, arXiv

MetaDiff: Meta-Learning with Conditional Diffusion for Few-Shot Learning

Equipping a deep model with the ability of few-shot learning, i.e., learning quickly from only a few examples, is a core challenge for artificial intelligence. Gradient-based meta-learning approaches effectively address the challenge by learning how to learn novel tasks. Their key idea is to learn a deep model in a bi-level optimization manner, where the outer-loop process learns a shared gradient descent algorithm (i.e., its hyperparameters), while the inner-loop process leverages it to optimize a task-specific model using only a few labeled data. Although these existing methods have shown superior performance, the outer-loop process requires calculating second-order derivatives along the inner optimization path, which imposes considerable memory burdens and the risk of vanishing gradients. Drawing inspiration from recent progress in diffusion models, we find that the inner-loop gradient descent process can actually be viewed as a reverse process (i.e., denoising) of diffusion, where the target of denoising is the model weights rather than the original data. Based on this fact, in this paper, we propose to model the gradient descent optimizer as a diffusion model and then present a novel task-conditional diffusion-based meta-learning method, called MetaDiff, that effectively models the optimization process of model weights from Gaussian noise to target weights in a denoising manner. Thanks to the training efficiency of diffusion models, MetaDiff does not need to differentiate through the inner-loop path, so the memory burdens and the risk of vanishing gradients can be effectively alleviated. Experimental results show that MetaDiff outperforms the state-of-the-art gradient-based meta-learning family in few-shot learning tasks.

preprint, 2022, arXiv

Censer: Curriculum Semi-supervised Learning for Speech Recognition Based on Self-supervised Pre-training

Recent studies have shown that the benefits provided by self-supervised pre-training and self-training (pseudo-labeling) are complementary. Semi-supervised fine-tuning strategies under the pre-training framework, however, remain insufficiently studied. Besides, modern semi-supervised speech recognition algorithms either treat unlabeled data indiscriminately or filter out noisy samples with a confidence threshold. The dissimilarities among different unlabeled data are often ignored. In this paper, we propose Censer, a semi-supervised speech recognition algorithm based on self-supervised pre-training to maximize the utilization of unlabeled data. The pre-training stage of Censer adopts wav2vec2.0 and the fine-tuning stage employs an improved semi-supervised learning algorithm from slimIPL, which leverages unlabeled data progressively according to the quality of their pseudo labels. We also incorporate a temporal pseudo label pool and an exponential moving average to control the pseudo labels' update frequency and to avoid model divergence. Experimental results on the Libri-Light and LibriSpeech datasets show that our proposed method achieves better performance than existing approaches while being more unified.
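The exponential moving average mentioned in this abstract can be illustrated with a minimal sketch. It assumes the average is taken over model weights, which is one common use; the abstract does not state exactly how Censer applies it. The function name `ema_update`, the flat list-of-floats weight representation, and the decay values are illustrative assumptions, not details from the paper.

```python
def ema_update(ema_weights, new_weights, decay=0.999):
    """Blend the running average with the current weights.

    A decay close to 1.0 makes the averaged weights change slowly.
    The default 0.999 is an assumed value, not taken from the paper.
    """
    if len(ema_weights) != len(new_weights):
        raise ValueError("weight vectors must have the same length")
    return [decay * e + (1.0 - decay) * w
            for e, w in zip(ema_weights, new_weights)]


# Usage: the averaged weights drift gradually toward the current weights.
ema = [0.0, 0.0]
for _ in range(3):
    ema = ema_update(ema, [1.0, 2.0], decay=0.5)
```

With a decay of 0.5 the average closes half of the remaining gap at each step; a decay nearer 1.0 would smooth more strongly and react more slowly.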

preprint, 2022, arXiv

Deep Learning-Based Perceptual Stimulus Encoder for Bionic Vision

Retinal implants have the potential to treat incurable blindness, yet the quality of the artificial vision they produce is still rudimentary. An outstanding challenge is identifying electrode activation patterns that lead to intelligible visual percepts (phosphenes). Here we propose a perceptual stimulus encoder (PSE) based on a convolutional neural network (CNN) that is trained in an end-to-end fashion to predict the electrode activation patterns required to produce a desired visual percept. We demonstrate the effectiveness of the encoder on MNIST using a psychophysically validated phosphene model tailored to individual retinal implant users. The present work constitutes an essential first step towards improving the quality of the artificial vision provided by retinal implants.

preprint, 2022, arXiv

FlexMatch: Boosting Semi-Supervised Learning with Curriculum Pseudo Labeling

The recently proposed FixMatch achieved state-of-the-art results on most semi-supervised learning (SSL) benchmarks. However, like other modern SSL algorithms, FixMatch uses a pre-defined constant threshold for all classes to select unlabeled data that contribute to the training, thus failing to consider different learning status and learning difficulties of different classes. To address this issue, we propose Curriculum Pseudo Labeling (CPL), a curriculum learning approach to leverage unlabeled data according to the model's learning status. The core of CPL is to flexibly adjust thresholds for different classes at each time step to let pass informative unlabeled data and their pseudo labels. CPL does not introduce additional parameters or computations (forward or backward propagation). We apply CPL to FixMatch and call our improved algorithm FlexMatch. FlexMatch achieves state-of-the-art performance on a variety of SSL benchmarks, with especially strong performances when the labeled data are extremely limited or when the task is challenging. For example, FlexMatch achieves 13.96% and 18.96% error rate reduction over FixMatch on CIFAR-100 and STL-10 datasets respectively, when there are only 4 labels per class. CPL also significantly boosts the convergence speed, e.g., FlexMatch can use only 1/5 training time of FixMatch to achieve even better performance. Furthermore, we show that CPL can be easily adapted to other SSL algorithms and remarkably improve their performances. We open-source our code at https://github.com/TorchSSL/TorchSSL.
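The idea of per-class thresholds that follow the model's learning status can be sketched as below. This is a simplified illustration under stated assumptions, not the published algorithm: learning status is estimated by counting confident predictions per class, thresholds are scaled linearly by the best-learned class, and the fallback for the case where no class is confident is an invented convenience. The name `class_thresholds` and the base threshold 0.95 are also assumptions.

```python
def class_thresholds(confidences, predictions, num_classes, base_threshold=0.95):
    """Scale a fixed confidence threshold per class by estimated learning status.

    confidences: highest predicted probability for each unlabeled sample
    predictions: predicted class index for each unlabeled sample
    """
    # Learning status: how many samples each class already predicts confidently.
    status = [0] * num_classes
    for conf, pred in zip(confidences, predictions):
        if conf > base_threshold:
            status[pred] += 1

    top = max(status)
    if top == 0:
        # Assumed fallback: no class is confident yet, so keep the fixed threshold.
        return [base_threshold] * num_classes

    # The best-learned class keeps the base threshold; classes that lag behind
    # get a lower threshold so that more of their samples pass.
    return [base_threshold * s / top for s in status]


confs = [0.99, 0.97, 0.96, 0.98, 0.60, 0.97]
preds = [0, 0, 0, 0, 1, 1]
thresholds = class_thresholds(confs, preds, num_classes=2)
```

In this toy input, class 0 has four confident predictions and class 1 has one, so class 1 receives a quarter of the base threshold. No extra parameters or forward passes are needed, which matches the abstract's claim about cost.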

preprint, 2022, arXiv

Instance and Panoptic Segmentation Using Conditional Convolutions

We propose a simple yet effective framework for instance and panoptic segmentation, termed CondInst (conditional convolutions for instance and panoptic segmentation). In the literature, top-performing instance segmentation methods typically follow the paradigm of Mask R-CNN and rely on ROI operations (typically ROIAlign) to attend to each instance. In contrast, we propose to attend to the instances with dynamic conditional convolutions. Instead of using instance-wise ROIs as inputs to the instance mask head of fixed weights, we design dynamic instance-aware mask heads, conditioned on the instances to be predicted. CondInst enjoys three advantages: 1.) Instance and panoptic segmentation are unified into a fully convolutional network, eliminating the need for ROI cropping and feature alignment. 2.) The elimination of the ROI cropping also significantly improves the output instance mask resolution. 3.) Due to the much improved capacity of dynamically-generated conditional convolutions, the mask head can be very compact (e.g., 3 conv. layers, each having only 8 channels), leading to significantly faster inference time per instance and making the overall inference time almost constant, irrelevant to the number of instances. We demonstrate a simpler method that can achieve improved accuracy and inference speed on both instance and panoptic segmentation tasks. On the COCO dataset, we outperform a few state-of-the-art methods. We hope that CondInst can be a strong baseline for instance and panoptic segmentation. Code is available at: https://git.io/AdelaiDet

preprint, 2022, arXiv

StyleSwin: Transformer-based GAN for High-resolution Image Generation

Despite their tantalizing success in a broad range of vision tasks, transformers have not yet demonstrated ability on par with ConvNets in high-resolution image generative modeling. In this paper, we seek to explore using pure transformers to build a generative adversarial network for high-resolution image synthesis. To this end, we believe that local attention is crucial to strike the balance between computational efficiency and modeling capacity. Hence, the proposed generator adopts Swin transformer in a style-based architecture. To achieve a larger receptive field, we propose double attention which simultaneously leverages the context of the local and the shifted windows, leading to improved generation quality. Moreover, we show that offering the knowledge of the absolute position that has been lost in window-based transformers greatly benefits the generation quality. The proposed StyleSwin is scalable to high resolutions, with both the coarse geometry and fine structures benefiting from the strong expressivity of transformers. However, blocking artifacts occur during high-resolution synthesis because performing the local attention in a block-wise manner may break the spatial coherency. To solve this, we empirically investigate various solutions, among which we find that employing a wavelet discriminator to examine the spectral discrepancy effectively suppresses the artifacts. Extensive experiments show the superiority over prior transformer-based GANs, especially on high resolutions, e.g., 1024x1024. StyleSwin, without complex training strategies, excels over StyleGAN on CelebA-HQ 1024 and achieves on-par performance on FFHQ-1024, proving the promise of using transformers for high-resolution image generation. The code and models will be available at https://github.com/microsoft/StyleSwin.

preprint, 2021, arXiv

CREATe: Clinical Report Extraction and Annotation Technology

Clinical case reports are written descriptions of the unique aspects of a particular clinical case, playing an essential role in sharing clinical experiences about atypical disease phenotypes and new therapies. However, to our knowledge, there has been no attempt to develop an end-to-end system to annotate, index, or otherwise curate these reports. In this paper, we propose a novel computational resource platform, CREATe, for extracting, indexing, and querying the contents of clinical case reports. CREATe fosters an environment of sustainable resource support and discovery, enabling researchers to overcome the challenges of information science. An online video of the demonstration can be viewed at https://youtu.be/Q8owBQYTjDc.

preprint, 2021, arXiv

Visually Grounded Concept Composition

We investigate ways to compose complex concepts in texts from primitive ones while grounding them in images. We propose Concept and Relation Graph (CRG), which builds on top of constituency analysis and consists of recursively combined concepts with predicate functions. Meanwhile, we propose a concept composition neural network called Composer to leverage the CRG for visually grounded concept learning. Specifically, we learn the grounding of both primitive and all composed concepts by aligning them to images and show that learning to compose leads to more robust grounding results, measured in text-to-image matching accuracy. Notably, our model can model grounded concepts forming at both the finer-grained sentence level and the coarser-grained intermediate level (or word-level). Composer leads to pronounced improvement in matching accuracy when the evaluation data has significant compound divergence from the training data.

preprint, 2020, arXiv

Characterizing Cryptocurrency Exchange Scams

As the indispensable trading platforms of the ecosystem, hundreds of cryptocurrency exchanges are emerging to facilitate the trading of digital assets. However, this growth also attracts the attention of attackers. A number of scam attacks targeting cryptocurrency exchanges have been reported, leading to a huge amount of financial loss. However, no previous work in our research community has systematically studied this problem. In this paper, we make the first effort to identify and characterize cryptocurrency exchange scams. We first identify over 1,500 scam domains and over 300 fake apps by collecting existing reports and using typosquatting generation techniques. Then we investigate the relationship between them, and identify 94 scam domain families and 30 fake app families. We further characterize the impacts of such scams, and reveal that these scams have incurred a financial loss of at least 520k US dollars. We further observe that the fake apps have been sneaked into major app markets (including Google Play) to infect unsuspecting users. Our findings demonstrate the urgency of identifying and preventing cryptocurrency exchange scams. To facilitate future research, we have publicly released all the identified scam domains and fake apps to the community.

preprint, 2020, arXiv

Solving Sparse Linear Inverse Problems in Communication Systems: A Deep Learning Approach With Adaptive Depth

Sparse signal recovery problems from noisy linear measurements appear in many areas of wireless communications. In recent years, deep learning (DL) based approaches have attracted the interest of researchers for solving the sparse linear inverse problem by unfolding iterative algorithms as neural networks. Typically, research concerning DL assumes a fixed number of network layers. However, this ignores a key characteristic of traditional iterative algorithms, where the number of iterations required for convergence changes with varying sparsity levels. By investigating projected gradient descent, we unveil the drawbacks of the existing DL methods with fixed depth. Then we propose an end-to-end trainable DL architecture, which involves an extra halting score at each layer. As a result, the proposed method learns how many layers to execute to emit an output, and the network depth is dynamically adjusted for each task in the inference phase. We conduct experiments using both synthetic data and applications including random access in massive MTC and massive MIMO channel estimation, and the results demonstrate the improved efficiency of the proposed approach.

preprint, 2020, arXiv

Visual Storytelling via Predicting Anchor Word Embeddings in the Stories

We propose a learning model for the task of visual storytelling. The main idea is to predict anchor word embeddings from the images and use the embeddings and the image features jointly to generate narrative sentences. We use the embeddings of randomly sampled nouns from the groundtruth stories as the target anchor word embeddings to learn the predictor. To narrate a sequence of images, we use the predicted anchor word embeddings and the image features as the joint input to a seq2seq model. As opposed to state-of-the-art methods, the proposed model is simple in design, easy to optimize, and attains the best results in most automatic evaluation metrics. In human evaluation, the method also outperforms competing methods.

preprint, 2019, arXiv

Persistent Insulator: Avoidance of Metallization at Megabar Pressures in Strongly Spin-Orbit-Coupled Sr2IrO4

It is commonly anticipated that an insulating state collapses in favor of an emergent metallic state at high pressures as the unit cell shrinks and the electronic bandwidth broadens to fill the insulating energy band gap. Here we report a rare insulating state that persists up to at least 185 GPa in the antiferromagnetic iridate Sr2IrO4, which is the archetypical spin-orbit-driven J_eff = 1/2 insulator. This study shows the electrical resistance of single-crystal Sr2IrO4 initially decreases with applied pressure, reaches a minimum in the range 32-38 GPa, then abruptly rises to fully recover the insulating state with further pressure increases up to 185 GPa. Our synchrotron x-ray diffraction and Raman scattering data show the onset of the rapid increase in resistance is accompanied by a structural phase transition from the native tetragonal I4_1/acd phase to an orthorhombic Pbca phase (with much reduced symmetry) at 40.6 GPa. The clear-cut correspondence of these two anomalies is key to understanding the stability of the insulating state at megabar pressures: Pressure-induced, severe structural distortions prevent the expected metallization, despite the 26% volume compression attained at the highest pressure accessed in this study. Moreover, the resistance of Sr2IrO4 remains stable while the applied pressure is tripled from 61 GPa to 185 GPa. These results suggest that a novel type of electronic Coulomb correlation compensates the anticipated band broadening in strongly spin-orbit-coupled materials at megabar pressures.

preprint, 2016, arXiv

CUHK & ETHZ & SIAT Submission to ActivityNet Challenge 2016

This paper presents the method that underlies our submission to the untrimmed video classification task of ActivityNet Challenge 2016. We follow the basic pipeline of temporal segment networks and further raise the performance via a number of other techniques. Specifically, we use the latest deep model architectures, e.g., ResNet and Inception V3, and introduce new aggregation schemes (top-k and attention-weighted pooling). Additionally, we incorporate the audio as a complementary channel, extracting relevant information via a CNN applied to the spectrograms. With these techniques, we derive an ensemble of deep models, which, together, attained a high classification accuracy (mAP 93.23%) on the testing set and secured first place in the challenge.
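The two aggregation schemes named in this abstract can be illustrated with a minimal sketch that turns per-segment class scores into one video-level score. The function names, the toy scores, and the use of a softmax over attention logits are assumptions made for illustration; the abstract does not specify how the submission computes either pooling.

```python
import math


def top_k_pool(scores, k):
    """Average the k highest per-segment scores for one class."""
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and the number of segments")
    return sum(sorted(scores, reverse=True)[:k]) / k


def attention_pool(scores, attention_logits):
    """Weight per-segment scores by a softmax over attention logits."""
    peak = max(attention_logits)  # subtract the max for numerical stability
    exps = [math.exp(a - peak) for a in attention_logits]
    total = sum(exps)
    return sum(s * e / total for s, e in zip(scores, exps))


segment_scores = [0.1, 0.9, 0.8, 0.2]
video_score_topk = top_k_pool(segment_scores, k=2)
video_score_attn = attention_pool(segment_scores, [0.0, 0.0, 0.0, 0.0])
```

Top-k pooling lets a few strongly activated segments decide the video label, which suits untrimmed videos where most segments are background. With equal logits, the attention-weighted pool reduces to a plain average.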

preprint, 2016, arXiv

Real-time Action Recognition with Enhanced Motion Vector CNNs

The deep two-stream architecture has exhibited excellent performance on video-based action recognition. The most computationally expensive step in this approach is the calculation of optical flow, which prevents it from running in real time. This paper accelerates this architecture by replacing optical flow with motion vectors, which can be obtained directly from compressed videos without extra calculation. However, motion vectors lack fine structures and contain noisy and inaccurate motion patterns, leading to an evident degradation of recognition performance. Our key insight for relieving this problem is that optical flow and motion vectors are inherently correlated. Transferring the knowledge learned with an optical flow CNN to a motion vector CNN can significantly boost the performance of the latter. Specifically, we introduce three strategies for this: initialization transfer, supervision transfer, and their combination. Experimental results show that our method achieves recognition performance comparable to the state-of-the-art, while processing 390.7 frames per second, which is 27 times faster than the original two-stream method.

preprint, 2015, arXiv

Control and Communication Protocols that Enable Smart Building Microgrids

Recent communication, computation, and technology advances coupled with climate change concerns have transformed the near future prospects of electricity transmission, and, more notably, distribution systems and microgrids. Distributed resources (wind and solar generation, combined heat and power) and flexible loads (storage, computing, EV, HVAC) make it imperative to increase investment and improve operational efficiency. Commercial and residential buildings, being the largest energy consumption group among flexible loads in microgrids, have the largest potential and flexibility to provide demand side management. Recent advances in networked systems and the anticipated breakthroughs of the Internet of Things will enable significant advances in demand response capabilities of intelligent load network of power-consuming devices such as HVAC components, water heaters, and buildings. In this paper, a new operating framework, called packetized direct load control (PDLC), is proposed based on the notion of quantization of energy demand. This control protocol is built on top of two communication protocols that carry either complete or binary information regarding the operation status of the appliances. We discuss the optimal demand side operation for both protocols and analytically derive the performance differences between the protocols. We propose an optimal reservation strategy for traditional and renewable energy for the PDLC in both day-ahead and real time markets. In the end we discuss the fundamental trade-off between achieving controllability and endowing flexibility.

preprint, 2015, arXiv

The Kirchhoff-Braess Paradox and Its Implications for Smart Microgrids

Well known in the theory of network flows, the Braess paradox states that in a congested network, adding a new path between destinations can increase the level of congestion. In transportation networks the phenomenon results from the decisions of network participants who selfishly seek to optimize their own performance metrics. In an electric power distribution network, an analogous increase in congestion can arise as a consequence of Kirchhoff's laws. Even for the simplest linear network of resistors and voltage sources, the sudden appearance of congestion due to an additional conductive line is a nonlinear phenomenon that results in a discontinuous change in the network state. It is argued that the phenomenon can occur in almost any grid in which there are loops, and with the increasing penetration of small-scale distributed generation it suggests challenges ahead in the operation of microgrids.
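The transportation form of the paradox can be checked with the standard textbook example, which is not taken from this paper. It assumes 4000 drivers and two routes, each made of one congestible road that takes n/100 minutes for n drivers and one fixed 45-minute road. All numbers and function names are illustrative, and the sketch does not model the Kirchhoff-law version that the paper studies.

```python
def time_on_original_route(drivers_on_route):
    """One congestible road (n/100 minutes) followed by one fixed 45-minute road."""
    return drivers_on_route / 100 + 45


def time_via_shortcut(total_drivers):
    """Both congestible roads joined by a zero-cost shortcut, used by everyone."""
    return total_drivers / 100 + 0 + total_drivers / 100


TOTAL = 4000

# Without the shortcut, drivers split evenly across the two routes.
before = time_on_original_route(TOTAL / 2)

# With the shortcut, every driver prefers the congestible roads.
after = time_via_shortcut(TOTAL)

# A single driver who returns to an original route fares even worse,
# so nobody has an incentive to leave the slower equilibrium.
deviate = TOTAL / 100 + 45
```

Here `before` is 65 minutes and `after` is 80 minutes, so the added link makes every driver slower, while `deviate` at 85 minutes shows why the worse outcome is stable.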

preprint, 2013, arXiv

A Packetized Direct Load Control Mechanism for Demand Side Management

Electricity peaks can be harmful to grid stability and result in additional generation costs to balance supply with demand. By developing a network of smart appliances together with a quasi-decentralized control protocol, direct load control (DLC) provides an opportunity to reduce peak consumption by directly controlling the on/off switch of the networked appliances. This paper proposes a packetized DLC (PDLC) solution that is illustrated by an application to air conditioning temperature control. Here the term packetized refers to a fixed-time energy usage authorization. The consumers in each room choose their preferred set point, and then an operator of the local appliance pool determines the comfort band around the set point. We use a thermal dynamic model to investigate the duty cycle of thermostatic appliances. Three theorems are proposed in this paper. The first two theorems evaluate the performance of the PDLC in both transient and steady-state operation. The first theorem proves that the average room temperature converges to the average room set point when a fixed number of packets is applied in each discrete interval. The second theorem proves that the PDLC solution is guaranteed to keep the temperature of every room within its individual comfort band. The third theorem proposes an allocation method that links the results of theorem 1 and the assumptions of theorem 2 such that the overall PDLC solution works. The direct result of the theorems is that we can reduce the consumption oscillation that occurs when no control is applied. Simulation is provided to verify the theoretical results.

preprint, 2013, arXiv

A Two Level Feedback System Design to Regulation Service Provision

Demand side management has gained increasing importance as the penetration of renewable energy grows. Based on a Markov jump process model of a group of thermostatic loads, this paper proposes a two-level feedback system design between the independent system operator (ISO) and the regulation service provider such that two objectives are achieved: (1) the ISO can optimally dispatch regulation signals to multiple providers in real time in order to reduce the requirement for expensive spinning reserves, and (2) each regulation provider can control its thermostatic loads to respond to the ISO signal. It is also shown that the amount of regulation service that can be provided is implicitly restricted by a few fundamental parameters of the provider itself, such as the allowable set point choice and its thermal constant. An interesting finding is that the regulation provider's ability to provide a large amount of long-term accumulated regulation and its ability to track short-term signals restrict each other. Simulation results are presented to verify and illustrate the performance of the proposed framework.
