Source author record

Xiaohui Wang

Xiaohui Wang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Social and Information Networks, Artificial Intelligence, Computation and Language, Computer Vision, cond-mat, cond-mat.mes-hall, cond-mat.other, Human-Computer Interaction, Machine Learning, math.AP, math.OC, Mathematical Software, Multimedia, physics.class-ph, physics.ins-det, quant-ph, Robotics

Catalog footprint

What is connected

10 works

17 topics

4 close collaborators

preprint, 2022, arXiv

LightSeq2: Accelerated Training for Transformer-based Models on GPUs

Transformer-based neural models are used in many AI applications. Training these models is expensive, requiring substantial GPU resources and long training times. It is also challenging because typical data such as sentences have variable lengths, and the Transformer's computation patterns are more complex than those of convolutional neural networks. Existing systems focus either on model inference only or on optimizing only BERT-like encoder models. In this paper, we present LightSeq2, a system to accelerate training for a general family of Transformer models on GPUs. We propose a series of GPU optimization techniques tailored to the specific computation flow and memory access patterns of Transformer models. LightSeq2 supports many model architectures, including BERT (encoder-only), GPT (decoder-only), Transformer (encoder-decoder), and vision Transformer. Our experiments on a variety of models and benchmarks show that LightSeq2 is consistently faster (1.4-3.5x) than previous systems on different GPUs. In particular, it gains a 308% training speedup compared with existing systems on a large public machine translation benchmark (WMT14 English-German).

preprint, 2021, arXiv

How Information Diffuse in a Nomination Network

During the COVID-19 outbreak, this project investigated the driving factors in different information diffusion modes (i.e., broadcasting mode and contagion mode) based on the nomination relations in a social welfare campaign on Weibo. Specifically, we mapped a nomination social network and tracked the core communicators in both modes. We also examined the network from perspectives such as the relationships between core communicators and the modularity of the whole network. We extracted 6 homophily factors and tested them on 2 representative communities within the largest component of the network. We found that some core communicators were distributed in a co-dependent way. Finally, we proposed several explanations for this phenomenon that can be explored in further research.

preprint, 2020, arXiv

An Efficient Agreement Mechanism in CapsNets By Pairwise Product

Capsule networks (CapsNets) are capable of modeling visual hierarchical relationships, which is achieved by the "routing-by-agreement" mechanism. This paper proposes a pairwise agreement mechanism to build capsules, inspired by the feature interactions of factorization machines (FMs). The proposed method has much lower computational complexity. We further propose a new CapsNet architecture that combines the strengths of residual networks in representing low-level visual features and of CapsNets in modeling the relationships of parts to wholes. We conduct comprehensive experiments to compare the routing algorithms, including dynamic routing, EM routing, and our proposed FM agreement, on both the original CapsNet architecture and our proposed one. The results show that our method achieves both excellent performance and efficiency in a variety of situations.
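
The abstract does not spell out how the pairwise agreement is computed. As a rough illustration of why FM-style feature interactions are cheap, the sketch below uses the standard factorization-machine identity, which sums the element-wise products over all pairs of n vectors of dimension d in O(n d) time instead of O(n^2 d). The function names and the toy vectors are hypothetical, and this is not claimed to be the paper's exact mechanism.

```python
from itertools import combinations


def pairwise_sum_naive(vectors):
    """Sum of element-wise products over all pairs i < j: O(n^2 * d)."""
    dim = len(vectors[0])
    total = [0.0] * dim
    for a, b in combinations(vectors, 2):
        for k in range(dim):
            total[k] += a[k] * b[k]
    return total


def pairwise_sum_fm(vectors):
    """Same quantity via the FM identity 0.5 * ((sum v)^2 - sum v^2): O(n * d)."""
    dim = len(vectors[0])
    s = [0.0] * dim   # running element-wise sum of the vectors
    sq = [0.0] * dim  # running element-wise sum of squared entries
    for v in vectors:
        for k in range(dim):
            s[k] += v[k]
            sq[k] += v[k] * v[k]
    return [0.5 * (s[k] * s[k] - sq[k]) for k in range(dim)]


# Hypothetical toy input: three 2-dimensional vectors standing in for capsule votes.
votes = [[1.0, 2.0], [3.0, -1.0], [0.5, 4.0]]
```

Both functions return the same vector for `votes`; only the fast version avoids enumerating the pairs explicitly.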

preprint, 2020, arXiv

Multi-Task Reinforcement Learning based Mobile Manipulation Control for Dynamic Object Tracking and Grasping

Agile control of a mobile manipulator is challenging because of the high complexity of the robotic system coupled with the unstructured working environment. Tracking and grasping a dynamic object with a random trajectory is even harder. In this paper, a multi-task reinforcement learning-based mobile manipulation control framework is proposed to achieve general dynamic object tracking and grasping. Several basic types of dynamic trajectories are chosen as the task training set. To improve policy generalization in practice, random noise and dynamics randomization are introduced during the training process. Extensive experiments show that our trained policy can adapt to unseen random dynamic trajectories with about 0.1 m tracking error and a 75% grasping success rate for dynamic objects. The trained policy can also be successfully deployed on a real mobile manipulator.

preprint, 2016, arXiv

Comment on "Anomalous Edge State in a Non-Hermitian Lattice"

In this comment, we criticize three main conclusions of the letter \cite{Lee2016}. We show that the concept of fractional winding number (FWN) is factitious, that Lee's conclusions on Fig. 3 are a finite-size effect, and that the breakdown of bulk-boundary correspondence (BBBC) cannot be explained by "defective".

preprint, 2015, arXiv

Implementation and verification of different ECC mitigation designs for BRAMs in flash-based FPGAs

Embedded RAM blocks (BRAMs) in field programmable gate arrays (FPGAs) are susceptible to single event effects (SEEs) induced by environmental factors such as cosmic rays, heavy ions, and alpha particles. As technology scales, the issue will become more serious. To tackle this issue, two different error correcting codes (ECCs), the shortened Hamming codes and the shortened BCH codes, are investigated in this paper. The concrete design methods of the codes are presented, and both codes are implemented in flash-based FPGAs. The synthesis report and simulation results are then presented. Moreover, heavy-ion experiments are performed; the results indicate that the error cross-section with the shortened Hamming codes is reduced by two orders of magnitude compared with the device without mitigation, and no errors are observed in the experiments for the device using the shortened BCH codes.
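
The abstract does not give the code parameters used for the BRAMs. To illustrate the single-error-correcting principle behind Hamming codes, here is a minimal sketch of the textbook, unshortened Hamming(7,4) code. It is a software illustration only, not the shortened codes or the FPGA design from the paper, and the function names are hypothetical.

```python
def hamming74_encode(data):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword.

    Bit positions run from 1 to 7; parity bits sit at positions 1, 2 and 4.
    """
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over positions 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(word):
    """Correct at most one flipped bit and return (data_bits, error_position).

    error_position is 0 when the syndrome is zero, otherwise the 1-based
    position of the bit that was flipped back.
    """
    c = list(word)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]  # checks positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]  # checks positions 2, 3, 6, 7
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]  # checks positions 4, 5, 6, 7
    position = s1 + 2 * s2 + 4 * s4  # syndrome read as a binary number
    if position:
        c[position - 1] ^= 1  # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], position
```

A single upset bit in a stored word, such as one caused by an SEE, yields a nonzero syndrome that points directly at the faulty position. Two simultaneous upsets exceed what this code can correct.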

preprint, 2014, arXiv

Modeling Emotion Influence from Images in Social Networks

Images have become an important and prevalent way to express users' activities, opinions and emotions. In a social network, individual emotions may be influenced by others, in particular by close friends. We focus on understanding how users embed emotions into the images they upload to social websites and how social influence plays a role in changing users' emotions. We first verify the existence of emotion influence in the image networks, and then propose a probabilistic factor graph based emotion influence model to answer the question of "who influences whom". Employing a real network from Flickr as experimental data, we study the effectiveness of factors in the proposed model with in-depth data analysis. Our experiments also show that our model, by incorporating the emotion influence, can significantly improve the accuracy (+5%) for predicting emotions from images. Finally, a case study is used as anecdotal evidence to further demonstrate the effectiveness of the proposed model.

preprint, 2013, arXiv

Multiscale Decompositions and Optimization

In this paper, the following Tikhonov-type regularization problem will be systematically studied: \[(u_t,v_t):=\operatorname*{arg\,min}_{u+v=f} \{\|v\|_X+t|u|_Y\},\] where $Y$ is a smooth space such as a $\mathrm{BV}$ space or a Sobolev space and $X$ is the space in which we measure distortion. Examples of the above problem occur in denoising in image processing, in numerically treating inverse problems, and in the sparse recovery problem of compressed sensing. It is also at the heart of interpolation of linear operators by the real method of interpolation. We shall characterize the minimizing pair $(u_t,v_t)$ for $(X,Y)=(L_2(\Omega),\mathrm{BV}(\Omega))$ as a primary example and generalize Yves Meyer's result in [11] and Antonin Chambolle's result in [6]. After that, the following multiscale decomposition scheme will be studied: \[u_{k+1}:=\operatorname*{arg\,min}_{u\in \mathrm{BV}(\Omega)\cap L_2(\Omega)} \{\tfrac{1}{2}\|f-u\|^2_{L_2}+t_{k}|u-u_k|_{\mathrm{BV}}\},\] where $u_0=0$ and $\Omega$ is a bounded Lipschitz domain in $\mathbb{R}^d$. This method was introduced by Eitan Tadmor et al. and we will improve the $L_2$ convergence result in \cite{Tadmor}. Other pairs such as $(X,Y)=(L_p,W^{1}(L_\tau))$ and $(X,Y)=(\ell_2,\ell_p)$ will also be mentioned. In the end, the numerical implementation for $(X,Y)=(L_2(\Omega),\mathrm{BV}(\Omega))$ and the corresponding convergence results will be given.
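
To make the multiscale scheme in the abstract concrete, the following is a minimal 1D discrete sketch. Writing $w = u - u_k$ and $r = f - u_k$, each step is a total-variation denoising of the current residual. Several things here are assumptions rather than the paper's method: the discrete TV is the sum of absolute forward differences, the inner solver is a standard projected-gradient iteration on the dual problem, the scales are halved at each level ($t_{k+1} = t_k/2$), and all function names and default parameters are hypothetical.

```python
import math


def tv_denoise_1d(r, t, iters=500, tau=0.25):
    """Approximately minimize 0.5*||r - w||^2 + t*TV(w) for a 1D signal r.

    Runs projected gradient ascent on a dual variable p with |p_i| <= t,
    where w = r - D^T p and (D w)_i = w[i+1] - w[i].
    """
    n = len(r)
    p = [0.0] * (n - 1)

    def primal(p):
        # (D^T p)_j = p[j-1] - p[j], with p treated as zero outside its range.
        return [r[j] - ((p[j - 1] if j > 0 else 0.0) - (p[j] if j < n - 1 else 0.0))
                for j in range(n)]

    for _ in range(iters):
        w = primal(p)
        # Gradient step on the dual, then projection onto the box [-t, t].
        p = [max(-t, min(t, p[i] + tau * (w[i + 1] - w[i]))) for i in range(n - 1)]
    return primal(p)


def multiscale_decompose(f, t0=4.0, levels=6):
    """Iterate u_{k+1} = u_k + argmin_w 0.5*||(f - u_k) - w||^2 + t_k*TV(w)."""
    u = [0.0] * len(f)  # u_0 = 0, as in the abstract
    residual_norms = []
    t = t0
    for _ in range(levels):
        r = [fi - ui for fi, ui in zip(f, u)]
        w = tv_denoise_1d(r, t)
        u = [ui + wi for ui, wi in zip(u, w)]
        residual_norms.append(math.sqrt(sum((fi - ui) ** 2 for fi, ui in zip(f, u))))
        t /= 2.0  # assumed dyadic refinement of the scale
    return u, residual_norms
```

Calling `multiscale_decompose` on a step signal such as `[0.0] * 8 + [1.0] * 8` returns the accumulated approximation together with the $L_2$ residual norm after each level, which gives a quick way to watch the residual shrink as finer scales are added.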

preprint, 2011, arXiv

Controlling the band gap of ZnO by programmable annealing

Annealing has been extensively used to control crystal growth and the physical properties of materials, although its mechanism and quantitative correlations remain unclear. Here we present the "annealing temperature - grain size - band gap" correlation for ZnO nanocrystals with experimental evidence. The findings reveal that the annealing condition determines the critical size by equating the thermal and the cohesive energy of the undercoordinated atoms in the surface skin, which in turn induce local strain and quantum entrapment, perturbing the Hamiltonian and hence the band gap. The formulation provides a general guideline for controlling crystal growth and the performance of materials, and makes predictive design and fabrication of functional nanomaterials a reality.

preprint, 1995, arXiv

Coulomb Charging at Large Conduction

We discuss the suppression of Coulomb charging effects on a small metallic island coupled to an electrode by a tunnel junction. At high temperatures the quantum corrections to the classical charging energy $E_c=e^2/2C$, where $C$ is the island capacitance, are evaluated. At low temperatures the large quantum fluctuations of the island charge cause a strong reduction of the effective $E_c$ which is determined explicitly in the limit of a large tunneling conductance.
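
For a sense of scale, the classical charging energy $E_c=e^2/2C$ from the abstract can be evaluated numerically. The sketch below is a small illustration with hypothetical function names. It covers only the classical formula and does not model the quantum corrections or the strong-tunneling renormalization the paper derives.

```python
E_CHARGE = 1.602176634e-19  # elementary charge in coulombs (exact SI value)
K_BOLTZMANN = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)


def charging_energy_joules(capacitance_farads):
    """Classical charging energy E_c = e^2 / (2 C)."""
    return E_CHARGE ** 2 / (2.0 * capacitance_farads)


def charging_temperature_kelvin(capacitance_farads):
    """Temperature T at which k_B * T equals the classical E_c."""
    return charging_energy_joules(capacitance_farads) / K_BOLTZMANN
```

For an assumed island capacitance of 1 fF, `charging_temperature_kelvin(1e-15)` comes out at about 0.93 K, which is why charging effects of this kind are usually studied at sub-kelvin temperatures.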