Source author record

Hu Wang

Hu Wang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Computer Vision; Machine Learning; Computation and Language; Computational Engineering, Finance, and Science; astro-ph.GA; astro-ph.HE; cond-mat.mtrl-sci; cond-mat.str-el; q-fin.ST

Catalog footprint

What is connected

13 works

9 topics

4 close collaborators

preprint, 2026, arXiv

FACTOR: Counterfactual Training-Free Test-Time Adaptation for Open-Vocabulary Object Detection

Open-vocabulary object detection often fails under distribution shifts, as it can be misled by spurious correlations between non-causal visual attributes (e.g., brightness, texture) and object categories. Existing test-time adaptation (TTA) methods either depend on costly online optimization or perform global calibration, overlooking the attribute-specific nature of these failures. To address this, we propose FACTOR (counterFACtual training-free Test-time adaptation for Open-vocabulaRy object detection), a lightweight framework grounded in counterfactual reasoning. By perturbing test images along non-causal attributes and comparing region-level predictions between original and counterfactual views, FACTOR quantifies attribute sensitivity, semantic relevance, and prediction variation to selectively suppress attribute-dependent predictions without parameter updates. Experiments on PASCAL-C, COCO-C, and FoggyCityscapes show that FACTOR consistently outperforms prior TTA methods, demonstrating that explicit counterfactual reasoning effectively improves robustness under distribution shifts.

preprint, 2023, arXiv

An efficient topology optimization method based on adaptive reanalysis with projection reduction

An efficient topology optimization method based on adaptive auxiliary reduced model reanalysis (AARMR) is proposed to improve computational efficiency and scale. In this method, a projection auxiliary reduced model (PARM) is integrated into the combined approximation reduced model (CARM) to reduce the dimension of the model in different aspects. First, the CARM restricts the solution space to avoid large matrix factorization. Second, the PARM is proposed to construct the CARM dynamically to save computational cost. Furthermore, the multi-grid conjugate gradient method is suggested to update the PARM adaptively. Finally, several classic numerical examples are tested to show that the proposed method not only significantly improves computational efficiency, but can also solve large-scale problems that are difficult to solve with direct solvers due to memory limitations.

preprint, 2022, arXiv

Dependency Structure for News Document Summarization

In this work, we develop a neural-network-based model that leverages dependency parsing to capture cross-positional dependencies and grammatical structures. With the help of linguistic signals, sentence-level relations can be correctly captured, thus improving news document summarization performance. Empirical studies demonstrate that this simple but effective method outperforms existing works on the benchmark dataset. Extensive analyses examine different settings and configurations of the proposed model, which provide a good reference for the community.

preprint, 2022, arXiv

Document-aware Positional Encoding and Linguistic-guided Encoding for Abstractive Multi-document Summarization

One key challenge in multi-document summarization is to capture the relations among input documents that distinguish multi-document summarization (MDS) from single-document summarization (SDS). Few existing MDS works address this issue. One effective way is to encode document positional information to assist models in capturing cross-document relations. However, existing MDS models, such as Transformer-based models, only consider token-level positional information. Moreover, these models fail to capture sentences' linguistic structure, which inevitably causes confusion in the generated summaries. Therefore, in this paper, we propose document-aware positional encoding and linguistic-guided encoding that can be fused with the Transformer architecture for MDS. For document-aware positional encoding, we introduce a general protocol to guide the selection of document encoding functions. For linguistic-guided encoding, we propose to embed syntactic dependency relations into the dependency relation mask with a simple but effective non-linear encoding learner for feature learning. Extensive experiments show the proposed model can generate high-quality summaries.
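The idea of adding document-level position information to token-level position information can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not say which document encoding function is selected or how it is fused, so this sketch reuses the standard Transformer sinusoidal function for the document index and simply sums the two encodings. The names `sinusoidal` and `document_aware_encoding` are hypothetical.

```python
import math


def sinusoidal(position, d_model):
    """Standard Transformer sinusoidal encoding for one integer position."""
    enc = []
    for i in range(d_model):
        # Even and odd dimensions share a frequency; sin on even, cos on odd.
        angle = position / (10000 ** (2 * (i // 2) / d_model))
        enc.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return enc


def document_aware_encoding(doc_lengths, d_model):
    """Return one vector per token: token-position encoding plus an
    encoding of the index of the document the token belongs to.
    Summation is an assumed fusion rule, not taken from the paper."""
    out = []
    token_pos = 0
    for doc_idx, length in enumerate(doc_lengths):
        doc_enc = sinusoidal(doc_idx, d_model)
        for _ in range(length):
            tok_enc = sinusoidal(token_pos, d_model)
            out.append([t + d for t, d in zip(tok_enc, doc_enc)])
            token_pos += 1
    return out


# Two input documents with 3 and 2 tokens, concatenated into one sequence.
enc = document_aware_encoding([3, 2], d_model=8)
```

With this layout, tokens from the same document share the document component, so a model could in principle tell cross-document token pairs from within-document pairs.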

preprint, 2022, arXiv

Uncertainty-aware Multi-modal Learning via Cross-modal Random Network Prediction

Multi-modal learning focuses on training models by equally combining multiple input data modalities during the prediction process. However, this equal combination can be detrimental to the prediction accuracy because different modalities are usually accompanied by varying levels of uncertainty. Using such uncertainty to combine modalities has been studied by a couple of approaches, but with limited success because these approaches are either designed to deal with specific classification or segmentation problems and cannot be easily translated into other tasks, or suffer from numerical instabilities. In this paper, we propose a new Uncertainty-aware Multi-modal Learner that estimates uncertainty by measuring feature density via Cross-modal Random Network Prediction (CRNP). CRNP is designed to require little adaptation to translate between different prediction tasks, while having a stable training process. From a technical point of view, CRNP is the first approach to explore random network prediction to estimate uncertainty and to combine multi-modal data. Experiments on two 3D multi-modal medical image segmentation tasks and three 2D multi-modal computer vision classification tasks show the effectiveness, adaptability and robustness of CRNP. Also, we provide an extensive discussion on different fusion functions and visualization to validate the proposed model.

preprint, 2020, arXiv

Memory-Gated Recurrent Networks

The essence of multivariate sequential learning is all about how to extract dependencies in data. These data sets, such as hourly medical records in intensive care units and multi-frequency phonetic time series, often exhibit not only strong serial dependencies in the individual components (the "marginal" memory) but also non-negligible memories in the cross-sectional dependencies (the "joint" memory). Because of the multivariate complexity in the evolution of the joint distribution that underlies the data generating process, we take a data-driven approach and construct a novel recurrent network architecture, termed Memory-Gated Recurrent Networks (mGRN), with gates explicitly regulating two distinct types of memories: the marginal memory and the joint memory. Through a combination of comprehensive simulation studies and empirical experiments on a range of public datasets, we show that our proposed mGRN architecture consistently outperforms state-of-the-art architectures targeting multivariate time series.

preprint, 2020, arXiv

Molecular dynamics simulation of crack growth in mono-crystal nickel with voids and inclusions

In this study, crack propagation in pre-cracked mono-crystal nickel with voids and inclusions has been investigated by molecular dynamics simulations. Different sizes of voids and inclusions, and different inclusion materials, are used to study the effect of the voids and inclusions during the crack propagation process. The dislocation evolution, stress distribution and crack length are analyzed as the associated mechanical properties. The results indicate that the voids and inclusions can change the path of crack propagation in the pre-cracked mono-crystal nickel. Moreover, the results show that the voids and inclusions can lead to better resistance to plastic deformation of the mono-crystal, and that the inclusions can make the system more difficult to fracture.

preprint, 2020, arXiv

Multi-label Thoracic Disease Image Classification with Cross-Attention Networks

Automated disease classification of radiology images has been emerging as a promising technique to support clinical diagnosis and treatment planning. Unlike generic image classification tasks, a real-world radiology image classification task is significantly more challenging: the training data is far more expensive to collect, the labeled data is multi-label by nature, samples from easy classes often dominate, and the training data is highly class-imbalanced in practice. To overcome these challenges, in this paper, we propose a novel scheme of Cross-Attention Networks (CAN) for automated thoracic disease classification from chest x-ray images, which can effectively excavate more meaningful representations from data to boost performance through cross-attention using only image-level annotations. We also design a new loss function that goes beyond cross-entropy loss to help the cross-attention process and is able to overcome the imbalance between classes and the dominance of easy samples within each class. The proposed method achieves state-of-the-art results.

preprint, 2020, arXiv

Randomized Online CP Decomposition

CANDECOMP/PARAFAC (CP) decomposition has been widely used to deal with multi-way data. For real-time or large-scale tensors, a novel CP decomposition algorithm called randomized online CP decomposition (ROCP) is proposed in this paper, based on the ideas of the randomized-sampling CP decomposition algorithm and the online CP decomposition algorithm. The proposed algorithm avoids forming the full Khatri-Rao product, which greatly increases speed and reduces memory usage. The experimental results on synthetic data and real-world data show that the ROCP algorithm is able to cope with CP decomposition for large-scale tensors with an arbitrary number of dimensions. In addition, ROCP can reduce the computing time and memory usage dramatically, especially for large-scale tensors.
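Why sampling lets an algorithm avoid forming the full Khatri-Rao product can be shown with a small sketch. It is not the ROCP algorithm itself, only the underlying identity: row `j * K + k` of the Khatri-Rao product of `B` (J x R) and `C` (K x R) is the elementwise product of row `j` of `B` and row `k` of `C`, so sampled rows can be computed directly. Uniform sampling and all function names below are assumptions for illustration.

```python
import random


def khatri_rao_row(B, C, j, k):
    """Row (j, k) of the Khatri-Rao product of B (J x R) and C (K x R)."""
    return [b * c for b, c in zip(B[j], C[k])]


def full_khatri_rao(B, C):
    """Explicit (J*K) x R product, built here only to check sampled rows."""
    return [khatri_rao_row(B, C, j, k)
            for j in range(len(B)) for k in range(len(C))]


def sample_khatri_rao_rows(B, C, n_samples, rng):
    """Draw (j, k) pairs uniformly and compute only those rows.
    Memory use is n_samples x R instead of (J*K) x R."""
    J, K = len(B), len(C)
    pairs = [(rng.randrange(J), rng.randrange(K)) for _ in range(n_samples)]
    flat = [j * K + k for j, k in pairs]  # index into the full product
    rows = [khatri_rao_row(B, C, j, k) for j, k in pairs]
    return flat, rows


rng = random.Random(0)
B = [[rng.random() for _ in range(3)] for _ in range(4)]  # J=4, R=3
C = [[rng.random() for _ in range(3)] for _ in range(5)]  # K=5, R=3
flat, rows = sample_khatri_rao_rows(B, C, n_samples=6, rng=rng)
```

In a sampled least-squares update, the same `flat` indices would also select the matching rows of the unfolded tensor, so neither the full product nor the full unfolding needs to be touched.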

preprint, 2020, arXiv

Soft Expert Reward Learning for Vision-and-Language Navigation

Vision-and-Language Navigation (VLN) requires an agent to find a specified spot in an unseen environment by following natural language instructions. Dominant methods based on supervised learning clone the expert's behaviours and thus perform better on seen environments, while showing restricted performance on unseen ones. Reinforcement Learning (RL) based models show better generalisation ability but have issues as well, one of which is the need for a large amount of manual reward engineering. In this paper, we introduce a Soft Expert Reward Learning (SERL) model to overcome the reward engineering and generalisation problems of the VLN task. Our proposed method consists of two complementary components: the Soft Expert Distillation (SED) module encourages agents to behave like an expert as much as possible, but in a soft fashion; the Self Perceiving (SP) module aims to push the agent towards the final destination as fast as possible. Empirically, we evaluate our model on the VLN seen, unseen and test splits, and the model outperforms the state-of-the-art methods on most of the evaluation metrics.

preprint, 2020, arXiv

Unsupervised Representation Learning by Predicting Random Distances

Deep neural networks have gained tremendous success in a broad range of machine learning tasks due to their remarkable capability to learn semantic-rich features from high-dimensional data. However, they often require large-scale labelled data to successfully learn such features, which significantly hinders their adoption in unsupervised learning tasks, such as anomaly detection and clustering, and limits their application in critical domains where obtaining massive labelled data is prohibitively expensive. To enable unsupervised learning in those domains, in this work we propose to learn features without using any labelled data by training neural networks to predict data distances in a randomly projected space. Random mapping is a theoretically proven approach to obtain approximately preserved distances. To predict these random distances well, the representation learner is optimised to learn genuine class structures that are implicitly embedded in the randomly projected space. Empirical results on 19 real-world datasets show that our learned representations substantially outperform a few state-of-the-art competing methods in both anomaly detection and clustering tasks. Code is available at https://git.io/RDP
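The claim that a random mapping approximately preserves distances, and can therefore supply label-free training targets, can be checked with a small sketch. This is not the released code: the Gaussian projection, the use of Euclidean distance as the target, the dimensions, and all function names are assumptions chosen for illustration, and the network that would regress these targets is omitted.

```python
import math
import random


def random_projection(dim_in, dim_out, rng):
    """Fixed Gaussian matrix, scaled so squared distances are
    preserved in expectation."""
    scale = 1.0 / math.sqrt(dim_out)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(dim_in)]
            for _ in range(dim_out)]


def project(W, x):
    """Apply the projection matrix W to a single vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def distance_targets(W, pairs):
    """Pseudo-labels: distance between the projected points of each pair."""
    return [euclidean(project(W, a), project(W, b)) for a, b in pairs]


rng = random.Random(0)
dim_in, dim_out = 300, 200
W = random_projection(dim_in, dim_out, rng)
points = [[rng.gauss(0.0, 1.0) for _ in range(dim_in)] for _ in range(6)]
pairs = [(points[i], points[j]) for i in range(6) for j in range(i + 1, 6)]

targets = distance_targets(W, pairs)
# Ratio of projected distance to original distance; values near 1 mean
# the random mapping kept the geometry roughly intact.
ratios = [t / euclidean(a, b) for t, (a, b) in zip(targets, pairs)]
```

A representation learner would then be trained so that distances between its outputs match `targets`; no class labels enter at any point.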

preprint, 2020, arXiv

Years Delayed X-ray Afterglows of TDEs Originated from Wind-Torus Interactions

Tidal disruption events (TDEs) occurring in active galactic nuclei (AGNs) are a special class of sources with outstanding scientific significance. TDEs can generate ultrafast winds, which should almost inevitably collide with the preexisting AGN dusty tori. We perform analytical calculations and simulations of the wind-torus interactions and find that such a process can generate considerable X-ray afterglow radiation several years or decades after the TDE outburst. This provides a new origin for the years-delayed X-rays in TDEs. The X-ray luminosity can reach $10^{41}$ to $10^{42}$ erg/s, and the light curve characteristics depend on the parameters of the winds and tori. We apply the model to two TDE candidates, and provide lower limits on the masses of the disrupted stars, as well as rigorous constraints on the gas densities of the tori. Our results suggest that observations of the time delay, spectral shape, luminosity and light curve of the X-ray afterglow can be used to constrain the physical parameters of both TDE winds and tori, including the wind velocity, wind density and torus density.

preprint, 2015, arXiv

Fermi arcs, pseudogap and collective excitations in doped Sr2IrO4: A generalized fluctuation exchange study

Motivated by recent experimental measurements, we study the quasiparticle spectra and the collective excitations in doped Sr$_2$IrO$_4$, in which an interesting interplay exists between the electronic correlations and strong spin-orbital coupling (SOC). To include the SOC, we use the Hugenholtz diagrams to extend the fluctuation exchange (FLEX) approach to the case where the SU(2) symmetry can be broken. By using this generalized FLEX method, we find a weak pseudogap behavior near $(\pi,0)$ in the slightly electron-doped system, with the corresponding Fermi arc formed by the partial destruction of the Fermi surface. Similar features also appear in the hole-doped system; however, the position of the Fermi arc is rotated $45^\circ$ with respect to the former. These results are consistent with the recent angle-resolved photoemission spectra in Sr$_2$IrO$_4$. We demonstrate that these anomalous features are mainly caused by the isospin fluctuations derived from the effective $J_{\text{eff}}=1/2$ doublet.
