Source author record

Hui Jiang

Hui Jiang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Catalog footprint

What is connected

43 works

20 topics

4 close collaborators

Preprint · 2023 · arXiv

Stochastic volatility modeling of high-frequency CSI 300 index and dynamic jump prediction driven by machine learning

This paper models the stochastic process underlying the price time series of the CSI 300 index in the Chinese financial market and analyzes the volatility characteristics of intraday high-frequency price data. In the new generalized Barndorff-Nielsen and Shephard model, the lag caused by asynchrony of market information is considered, and the problem of lack of long-term dependence is solved. To speed up the valuation process, several machine learning and deep learning algorithms are used to estimate parameters and evaluate forecast results. Tracking historical jumps of different magnitudes offers promising avenues for simulating dynamic price processes and predicting future jumps. Numerical results show that the deterministic component of stochastic volatility processes is always captured over both short and longer-term windows. The findings may be useful to investors and regulators interested in predicting market dynamics based on realized volatility.

Preprint · 2022 · arXiv

Analysis of stock index with a generalized BN-S model: an approach based on machine learning and fuzzy parameters

In this paper we combine data science and fuzzy theory to improve the classical Barndorff-Nielsen and Shephard model, and apply the result to analyze the S&P 500 index. We pre-process the index data based on fuzzy theory. After that, S&P 500 stock index data for the past ten years are analyzed, and a deterministic parameter is extracted using various machine and deep learning methods. The results show that the new model, in which fuzzy parameters are incorporated, can capture the long-term dependence that is missing from the classical Barndorff-Nielsen and Shephard model. The modification is based on only a few changes compared to the classical model. At the same time, the resulting analysis effectively captures the stochastic dynamics of the stock index time series.

Preprint · 2022 · arXiv

DavarOCR: A Toolbox for OCR and Multi-Modal Document Understanding

This paper presents DavarOCR, an open-source toolbox for OCR and document understanding tasks. DavarOCR currently implements 19 advanced algorithms, covering 9 different task forms. DavarOCR provides detailed usage instructions and the trained models for each algorithm. Compared with previous open-source OCR toolboxes, DavarOCR has relatively more complete support for the sub-tasks of cutting-edge document understanding technology. In order to promote the development and application of OCR technology in academia and industry, we pay more attention to the use of modules that different sub-domains of technology can share. DavarOCR is publicly released at https://github.com/hikopensource/Davar-Lab-OCR.

Preprint · 2022 · arXiv

Functional large deviations for Stroock's approximation to a class of Gaussian processes with application to small noise diffusions

Letting $N=\left\{N(t), t\geq0\right\}$ be a standard Poisson process, Stroock \cite{Stroock-1981} constructed a family of continuous processes by $$Θ_ε(t)=\int_0^tθ_ε(r)\,dr, \quad 0 \le t \le 1,$$ where $θ_ε(r)=\frac{1}{ε}(-1)^{N(ε^{-2}r)}$, and proved that it converges weakly to a standard Brownian motion under the continuous function topology. We establish the functional large deviations principle (LDP) for the approximations of a class of Gaussian processes constructed by integrals over $Θ_ε(t)$, and find the explicit form of the rate function. As an application, we consider the following (non-Markovian) stochastic differential equation $$X^ε(t) = x_{0}+\int^{t}_{0}b(X^ε(s))\,ds+λ(ε)\int^{t}_{0}σ(X^ε(s))\,dΘ_ε(s),$$ where $b$ and $σ$ are both Lipschitz functions, and establish its Freidlin-Wentzell type LDP as $ε\rightarrow 0$. The rate function indicates a phase transition phenomenon as $λ(ε)$ moves from one region to the other.
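
The construction in this abstract can be simulated directly. The following is a minimal sketch, not the authors' code: it samples $Θ_ε(1)$ exactly by noting that $N(ε^{-2}r)$ jumps at rate $ε^{-2}$ in the variable $r$, so $θ_ε$ is piecewise constant at $\pm 1/ε$ and flips sign at each jump. The function name `stroock_theta`, the choice $ε=0.1$ and the sample size are illustrative assumptions.

```python
import random


def stroock_theta(eps, t_end=1.0, rng=None):
    """Sample Theta_eps(t_end), the integral of theta_eps over [0, t_end].

    theta_eps(r) = (1/eps) * (-1)**N(r / eps**2), with N a standard Poisson process.
    """
    rng = rng or random.Random()
    rate = eps ** -2                    # jump rate of N(eps^-2 r) in the variable r
    t, sign, integral = 0.0, 1.0, 0.0   # N(0) = 0, so theta_eps starts at +1/eps
    while True:
        gap = rng.expovariate(rate)     # waiting time until the next Poisson jump
        if t + gap >= t_end:
            integral += sign * (t_end - t)
            return integral / eps
        integral += sign * gap
        t += gap
        sign = -sign                    # each jump flips the sign of (-1)**N


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [stroock_theta(0.1, rng=rng) for _ in range(2000)]
    mean = sum(samples) / len(samples)
    var = sum((s - mean) ** 2 for s in samples) / len(samples)
    # For small eps the sample variance should be close to 1, the variance of B(1).
    print(f"mean={mean:.3f} var={var:.3f}")
```

Because $|θ_ε| = 1/ε$, every sample satisfies $|Θ_ε(1)| \le 1/ε$; the Brownian limit only appears as $ε \to 0$.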

Preprint · 2022 · arXiv

Kullback-Leibler-Based Discrete Failure Time Models for Integration of Published Prediction Models with New Time-To-Event Dataset

Prediction of time-to-event data often suffers from rare event rates, small sample sizes, high dimensionality and low signal-to-noise ratios. Incorporating published prediction models from large-scale studies is expected to improve the performance of prognosis prediction on internal individual-level time-to-event data. However, existing integration approaches typically assume that underlying distributions from the external and internal data sources are similar, which is often invalid. To account for challenges including heterogeneity, data sharing, and privacy constraints, we propose a discrete failure time modeling procedure, which utilizes a discrete hazard-based Kullback-Leibler discriminatory information measuring the discrepancy between the published models and the internal dataset. Simulations show the advantage of the proposed method compared with those solely based on the internal data or published models. We apply the proposed method to improve prediction performance on a kidney transplant dataset from a local hospital by integrating this small-scale dataset with published survival models obtained from the national transplant registry.

Preprint · 2022 · arXiv

One-dimensional quasi bound states in the continuum in the ω~k space for nonlinear optical applications

The phenomenon of bound states in the continuum (BIC), with infinite quality factor and lifetime, has emerged in recent years in photonics as a new tool for manipulating light-matter interactions. However, most of the investigated structures only support BIC resonances at very few discrete points in the ω~k space. Even when the BIC is switched to a quasi-BIC (QBIC) resonance through perturbation, its frequency will still be located within a narrow spectral band close to that of the original BIC, restricting applications in the many fields where random or multiple input frequencies beyond the narrow band are required. In this work, we demonstrate that a new set of QBIC resonances can be supported by making use of a special binary grating consisting of two alternatingly aligned ridge arrays with the same period and zero-approaching ridge width difference on a slab waveguide. These QBIC resonances are distributed continuously over a broad band along a line in the ω~k space and can thus be considered as one-dimensional QBICs. With the Q factors generally affected by the ridge difference, it is now possible to choose any frequency on the dispersion line to achieve significantly enhanced light-matter interactions, facilitating many applications where multiple input wavelengths are required, e.g. sum or difference frequency generation in nonlinear optics.

Preprint · 2021 · arXiv

Enhanced Aspect-Based Sentiment Analysis Models with Progressive Self-supervised Attention Learning

In aspect-based sentiment analysis (ABSA), many neural models are equipped with an attention mechanism to quantify the contribution of each context word to sentiment prediction. However, such a mechanism suffers from one drawback: only a few frequent words with sentiment polarities tend to be taken into consideration for the final sentiment decision, while abundant infrequent sentiment words are ignored by models. To deal with this issue, we propose a progressive self-supervised attention learning approach for attentional ABSA models. In this approach, we iteratively perform sentiment prediction on all training instances, and continually learn useful attention supervision information in the meantime. During training, at each iteration, context words with the highest impact on sentiment prediction, identified based on their attention weights or gradients, are extracted as words with active/misleading influence on the correct/incorrect prediction for each instance. Words extracted in this way are masked for subsequent iterations. To exploit these extracted words for refining ABSA models, we augment the conventional training objective with a regularization term that encourages ABSA models to not only take full advantage of the extracted active context words but also decrease the weights of those misleading words. We integrate the proposed approach into three state-of-the-art neural ABSA models. Experiment results and in-depth analyses show that our approach yields better attention results and significantly enhances the performance of all three models. We release the source code and trained models at https://github.com/DeepLearnXMU/PSSAttention.

Preprint · 2021 · arXiv

Filling up complex spectral regions through non-Hermitian disordered chains

Eigenspectra that fill regions in the complex plane have been intriguing to many, inspiring research from random matrix theory to esoteric semi-infinite bounded non-Hermitian lattices. In this work, we propose a simple and robust ansatz for constructing models whose eigenspectra fill up generic prescribed regions. Our approach utilizes specially designed non-Hermitian random couplings that allow the co-existence of eigenstates with a continuum of localization lengths, mathematically emulating the effects of semi-infinite boundaries. While some of these couplings are necessarily long-ranged, they are still far more local than what is possible with known random matrix ensembles. Our ansatz can be feasibly implemented in physical platforms such as classical and quantum circuits, and harbors very high tolerance to imperfections due to its stochastic nature.

Preprint · 2020 · arXiv

Match$^2$: A Matching over Matching Model for Similar Question Identification

Community Question Answering (CQA) has become a primary means for people to acquire knowledge, where people are free to ask questions or submit answers. To enhance the efficiency of the service, similar question identification becomes a core task in CQA which aims to find a similar question from the archived repository whenever a new question is asked. However, it has long been a challenge to properly measure the similarity between two questions due to the inherent variation of natural language, i.e., there could be different ways to ask the same question or different questions sharing similar expressions. To alleviate this problem, it is natural to involve the existing answers for the enrichment of the archived questions. Traditional methods typically take a one-side usage, which leverages the answer as some expanded representation of the corresponding question. Unfortunately, this may introduce unexpected noise into the similarity computation since answers are often long and diverse, leading to inferior performance. In this work, we propose a two-side usage, which leverages the answer as a bridge between the two questions. The key idea is based on our observation that similar questions could be addressed by similar parts of the answer while different questions may not. In other words, we can compare the matching patterns of the two questions over the same answer to measure their similarity. In this way, we propose a novel matching over matching model, namely Match$^2$, which compares the matching patterns between two question-answer pairs for similar question identification. Empirical experiments on two benchmark datasets demonstrate that our model can significantly outperform previous state-of-the-art methods on the similar question identification task.

Preprint · 2020 · arXiv

On Approximation Capabilities of ReLU Activation and Softmax Output Layer in Neural Networks

In this paper, we have extended the well-established universal approximator theory to neural networks that use the unbounded ReLU activation function and a nonlinear softmax output layer. We have proved that a sufficiently large neural network using the ReLU activation function can approximate any function in $L^1$ up to any arbitrary precision. Moreover, our theoretical results have shown that a large enough neural network using a nonlinear softmax output layer can also approximate any indicator function in $L^1$, which is equivalent to mutually-exclusive class labels in any realistic multiple-class pattern classification problems. To the best of our knowledge, this work is the first theoretical justification for using the softmax output layers in neural networks for pattern classification.

Preprint · 2020 · arXiv

Topological invariants, zero mode edge states and finite size effect for a generalized non-reciprocal Su-Schrieffer-Heeger model

Intriguing issues in one-dimensional non-reciprocal topological systems include the breakdown of usual bulk-edge correspondence and the occurrence of half-integer topological invariants. In order to understand these unusual topological properties, we investigate the topological phase diagrams and the zero-mode edge states of a generalized non-reciprocal Su-Schrieffer-Heeger model, based on some analytical results. Meanwhile, we provide a concise geometrical interpretation of the bulk topological invariants in terms of two independent winding numbers and also give an alternative interpretation related to the linking properties of curves in three-dimensional space. For the system under the open boundary condition, we construct analytically the wavefunctions of zero-mode edge states by properly considering a hidden symmetry of the system and the normalization condition with the use of biorthogonal eigenvectors. Our analytical results directly give the phase boundary for the existence of zero-mode edge states and unveil clearly the evolution behavior of edge states. In comparison with results via exact diagonalization of finite-size systems, we find our analytical results agree with the numerical results very well.

Preprint · 2017 · arXiv

Enhanced LSTM for Natural Language Inference

Reasoning and inference are central to human and artificial intelligence. Modeling inference in human language is very challenging. With the availability of large annotated data (Bowman et al., 2015), it has recently become feasible to train neural network based inference models, which have been shown to be very effective. In this paper, we present a new state-of-the-art result, achieving an accuracy of 88.6% on the Stanford Natural Language Inference Dataset. Unlike the previous top models that use very complicated network architectures, we first demonstrate that carefully designing sequential inference models based on chain LSTMs can outperform all previous models. Based on this, we further show that by explicitly considering recursive architectures in both local inference modeling and inference composition, we achieve additional improvement. In particular, incorporating syntactic parsing information contributes to our best result: it further improves the performance even when added to the already very strong model.

Preprint · 2016 · arXiv

A Deep Learning Based Fast Image Saliency Detection Algorithm

In this paper, we propose a fast deep learning method for object saliency detection using convolutional neural networks. In our approach, we use a gradient descent method to iteratively modify the input images based on the pixel-wise gradients to reduce a pre-defined cost function, which is defined to measure the class-specific objectness and clamp the class-irrelevant outputs to maintain the image background. The pixel-wise gradients can be efficiently computed using the back-propagation algorithm. We further apply SLIC superpixels and LAB color based low-level saliency features to smooth and refine the gradients. Our methods are computationally efficient and much faster than other deep learning based saliency methods. Experimental results on two benchmark tasks, namely Pascal VOC 2012 and MSRA10k, have shown that our proposed methods can generate high-quality saliency maps, at least comparable with many slow and complicated deep learning methods. Compared with the pure low-level methods, our approach excels in handling many difficult images, which contain complex backgrounds, highly variable salient objects, multiple objects, and/or very small salient objects.

Preprint · 2016 · arXiv

A FOFE-based Local Detection Approach for Named Entity Recognition and Mention Detection

In this paper, we study a novel approach for named entity recognition (NER) and mention detection in natural language processing. Instead of treating NER as a sequence labelling problem, we propose a new local detection approach, which relies on the recent fixed-size ordinally forgetting encoding (FOFE) method to fully encode each sentence fragment and its left/right contexts into a fixed-size representation. Afterwards, a simple feedforward neural network is used to reject each individual fragment or predict an entity label for it. The proposed method has been evaluated in several popular NER and mention detection tasks, including the CoNLL 2003 NER task and TAC-KBP2015 and TAC-KBP2016 Tri-lingual Entity Discovery and Linking (EDL) tasks. Our methods have yielded strong performance in all of these examined tasks. This local detection approach has shown many advantages over the traditional sequence labelling methods.

Preprint · 2016 · arXiv

A Unified Model for Differential Expression Analysis of RNA-seq Data via L1-Penalized Linear Regression

RNA-sequencing (RNA-seq) is becoming increasingly popular for quantifying gene expression levels. Since the RNA-seq measurements are relative in nature, between-sample normalization of counts is an essential step in differential expression (DE) analysis. The normalization of existing DE detection algorithms is ad hoc and performed once for all prior to DE detection, which may be suboptimal since ideally normalization should be based on non-DE genes only and thus coupled with DE detection. We propose a unified statistical model for joint normalization and DE detection of log-transformed RNA-seq data. Sample-specific normalization factors are modeled as unknown parameters in the gene-wise linear models and jointly estimated with the regression coefficients. By imposing a sparsity-inducing L1 penalty (or mixed L1/L2-norm for multiple treatment conditions) on the regression coefficients, we formulate the problem as a penalized least-squares regression problem and apply the augmented Lagrangian method to solve it. Simulation studies show that the proposed model and algorithms outperform existing methods in terms of detection power and false-positive rate when more than half of the genes are differentially expressed and/or when the numbers of up- and down-regulated genes among DE genes are unbalanced.

Preprint · 2016 · arXiv

Commonsense Knowledge Enhanced Embeddings for Solving Pronoun Disambiguation Problems in Winograd Schema Challenge

In this paper, we propose commonsense knowledge enhanced embeddings (KEE) for solving the Pronoun Disambiguation Problems (PDP). The PDP task we investigate in this paper is a complex coreference resolution task which requires the utilization of commonsense knowledge. This task is a standard first round test set in the 2016 Winograd Schema Challenge. In this task, traditional linguistic features that are useful for coreference resolution, e.g. context and gender information, are no longer effective. Therefore, the KEE models are proposed to provide a general framework to make use of commonsense knowledge for solving the PDP problems. Since the PDP task doesn't have training data, the KEE models are used during the unsupervised feature extraction process. To evaluate the effectiveness of the KEE models, we propose to incorporate various commonsense knowledge bases, including ConceptNet, WordNet, and CauseCom, into the KEE training process. We achieved the best performance by applying the proposed methods to the 2016 Winograd Schema Challenge. In addition, experiments conducted on the standard PDP task indicate that the proposed KEE models can solve the PDP problems with 66.7% accuracy, which is a new state-of-the-art performance.

Preprint · 2016 · arXiv

Distraction-Based Neural Networks for Document Summarization

Distributed representation learned with neural networks has recently been shown to be effective in modeling natural languages at fine granularities such as words, phrases, and even sentences. Whether and how such an approach can be extended to help model larger spans of text, e.g., documents, is intriguing, and further investigation would still be desirable. This paper aims to enhance neural network models for such a purpose. A typical problem of document-level modeling is automatic summarization, which aims to model documents in order to generate summaries. In this paper, we propose neural models to train computers not just to pay attention to specific regions and content of input documents with attention models, but also to distract them to traverse between different content of a document so as to better grasp the overall meaning for summarization. Without engineering any features, we train the models on two large datasets. The models achieve state-of-the-art performance, and they significantly benefit from the distraction modeling, particularly when input documents are long.

Preprint · 2016 · arXiv

Erdős-Gallai-type results for total monochromatic connection of graphs

Preprint · 2016 · arXiv

Feedforward Sequential Memory Networks: A New Structure to Learn Long-term Dependency

In this paper, we propose a novel neural network structure, namely feedforward sequential memory networks (FSMN), to model long-term dependency in time series without using recurrent feedback. The proposed FSMN is a standard fully-connected feedforward neural network equipped with some learnable memory blocks in its hidden layers. The memory blocks use a tapped-delay line structure to encode the long context information into a fixed-size representation as a short-term memory mechanism. We have evaluated the proposed FSMNs in several standard benchmark tasks, including speech recognition and language modelling. Experimental results have shown that FSMNs significantly outperform the conventional recurrent neural networks (RNN), including LSTMs, in modeling sequential signals like speech or language. Moreover, FSMNs can be learned much more reliably and faster than RNNs or LSTMs due to the inherent non-recurrent model structure.
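
The tapped-delay memory block described in this abstract can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the paper's implementation: it assumes scalar tap coefficients `taps = [a_0, ..., a_N]` shared across all hidden dimensions and computes a memory vector as the weighted sum of the current and previous `N` hidden vectors. The name `fsmn_memory` is hypothetical, and in a trained network the taps would be learned rather than fixed.

```python
def fsmn_memory(hidden, taps):
    """Apply a tapped-delay line to a sequence of hidden vectors.

    hidden: list of equal-length lists, one hidden vector per time step.
    taps:   [a_0, ..., a_N]; output[t] = sum_i a_i * hidden[t - i] (assumed form).
    Steps before the start of the sequence are treated as zero.
    """
    out = []
    for t in range(len(hidden)):
        acc = [0.0] * len(hidden[t])
        for i, a in enumerate(taps):
            if t - i < 0:
                break                       # no history before the first step
            for d, v in enumerate(hidden[t - i]):
                acc[d] += a * v
        out.append(acc)
    return out


if __name__ == "__main__":
    print(fsmn_memory([[1.0], [2.0], [3.0]], [1.0, 0.5]))
```

No output is fed back into the computation, which is why the block stays purely feedforward even though it summarizes past context.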

Preprint · 2016 · arXiv

Generating images with recurrent adversarial networks

Gatys et al. (2015) showed that optimizing pixels to match features in a convolutional network with respect to reference image features is a way to render images of high visual quality. We show that unrolling this gradient-based optimization yields a recurrent computation that creates images by incrementally adding onto a visual "canvas". We propose a recurrent generative model inspired by this view, and show that it can be trained using adversarial training to generate very good image samples. We also propose a way to quantitatively compare adversarial networks by having the generators and discriminators of these networks compete against each other.

Preprint · 2016 · arXiv

Higher Order Recurrent Neural Networks

In this paper, we study novel neural network structures to better model long-term dependency in sequential data. We propose to use more memory units to keep track of more preceding states in recurrent neural networks (RNNs), which are all recurrently fed to the hidden layers as feedback through different weighted paths. By extending the popular recurrent structure in RNNs, we provide the models with a better short-term memory mechanism to learn long-term dependency in sequences. By analogy with digital filters in signal processing, we call these structures higher order RNNs (HORNNs). Similar to RNNs, HORNNs can also be learned using the back-propagation through time method. HORNNs are generally applicable to a variety of sequence modelling tasks. In this work, we have examined HORNNs for the language modeling task using two popular data sets, namely the Penn Treebank (PTB) and English text8 data sets. Experimental results have shown that the proposed HORNNs yield state-of-the-art performance on both data sets, significantly outperforming the regular RNNs as well as the popular LSTMs.
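
The higher-order recurrence described in this abstract can be illustrated with a scalar toy model. This is a minimal sketch under assumptions the abstract does not spell out: a scalar hidden state, a `tanh` nonlinearity, and one weight per delayed state, so that `h[t] = tanh(w_in * x[t] + sum_n w_rec[n-1] * h[t-n])`. The name `hornn_scalar` and all weights are hypothetical.

```python
import math


def hornn_scalar(xs, w_in, w_rec):
    """Run a scalar higher-order recurrence over the input sequence xs.

    w_rec[n-1] weights the hidden state from n steps back, so len(w_rec)
    is the order of the recurrence (order 1 is an ordinary simple RNN).
    """
    hs = []
    for t, x in enumerate(xs):
        pre = w_in * x
        for n, w in enumerate(w_rec, start=1):
            if t - n >= 0:                  # states before t = 0 count as zero
                pre += w * hs[t - n]
        hs.append(math.tanh(pre))
    return hs


if __name__ == "__main__":
    # Only the second-order path is active: h[2] sees h[0] directly, skipping h[1].
    print(hornn_scalar([1.0, 0.0, 0.0], w_in=1.0, w_rec=[0.0, 1.0]))
```

In the usage line, the input at step 0 reaches step 2 through a single weighted path instead of two chained ones, which is the shortcut the extra feedback paths provide.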

Preprint · 2016 · arXiv

Learning Convolutional Neural Networks using Hybrid Orthogonal Projection and Estimation

Convolutional neural networks (CNNs) have yielded excellent performance in a variety of computer vision tasks, where CNNs typically adopt a similar structure consisting of convolution layers, pooling layers and fully connected layers. In this paper, we propose to apply a novel method, namely Hybrid Orthogonal Projection and Estimation (HOPE), to CNNs in order to introduce orthogonality into the CNN structure. The HOPE model can be viewed as a hybrid model that combines feature extraction using orthogonal linear projection with mixture models. It is an effective model for extracting useful information from the original high-dimension feature vectors while filtering out irrelevant noise. In this work, we present three different ways to apply the HOPE models to CNNs, i.e., HOPE-Input, single-HOPE-Block and multi-HOPE-Blocks. For HOPE-Input CNNs, a HOPE layer is used directly after the input to de-correlate high-dimension input feature vectors. Alternatively, in single-HOPE-Block and multi-HOPE-Blocks CNNs, we use HOPE layers to replace one or more blocks in the CNNs, where one block may include several convolutional layers and one pooling layer. The experimental results on both the Cifar-10 and Cifar-100 data sets have shown that the orthogonal constraints imposed by the HOPE layers can significantly improve the performance of CNNs in these image classification tasks (we have achieved one of the best performances when image augmentation has not been applied, and top 5 performance with image augmentation).

Preprint · 2016 · arXiv

More on total monochromatic connection of graphs

A graph is said to be total-colored if all the edges and the vertices of the graph are colored. A total-coloring of a graph is a total monochromatically-connecting coloring (TMC-coloring, for short) if any two vertices of the graph are connected by a path whose edges and internal vertices on the path have the same color. For a connected graph $G$, the total monochromatic connection number, denoted by $tmc(G)$, is defined as the maximum number of colors used in a TMC-coloring of $G$. Note that a TMC-coloring does not exist if $G$ is not connected, in which case we simply let $tmc(G)=0$. In this paper, we first characterize all graphs of order $n$ and size $m$ with $tmc(G)=3,4,5,6,m+n-2,m+n-3$ and $m+n-4$, respectively. Then we determine the threshold function for a random graph to have $tmc(G)\geq f(n)$, where $f(n)$ is a function satisfying $1\leq f(n)<\frac{1}{2}n(n-1)+n$. Finally, we show that for a given connected graph $G$, and a positive integer $L$ with $L\leq m+n$, it is NP-complete to decide whether $tmc(G)\geq L$.

Preprint · 2016 · arXiv

Neural Networks Models for Entity Discovery and Linking

This paper describes the USTC_NELSLIP systems submitted to the Trilingual Entity Detection and Linking (EDL) track in the 2016 TAC Knowledge Base Population (KBP) contests. We have built two systems for entity discovery and mention detection (MD): one uses the conditional RNNLM and the other one uses the attention-based encoder-decoder framework. The entity linking (EL) system consists of two modules: a rule-based candidate generation module and a neural network probability ranking model. Moreover, some simple string matching rules are used for NIL clustering. In the end, our best system has achieved an F1 score of 0.624 in the end-to-end typed mention ceaf plus metric.

Preprint · 2016 · arXiv

Part-of-Speech Relevance Weights for Learning Word Embeddings

This paper proposes a model to learn word embeddings with weighted contexts based on part-of-speech (POS) relevance weights. POS is a fundamental element in natural language. However, state-of-the-art word embedding models fail to consider it. This paper proposes to use position-dependent POS relevance weighting matrices to model the inherent syntactic relationship among words within a context window. We utilize the POS relevance weights to model each word-context pair during the word embedding training process. The model proposed in this paper jointly optimizes word vectors and the POS relevance matrices. Experiments conducted on popular word analogy and word similarity tasks all demonstrated the effectiveness of the proposed method.

Preprint · 2016 · arXiv

Probabilistic Reasoning via Deep Learning: Neural Association Models

In this paper, we propose a new deep learning approach, called neural association model (NAM), for probabilistic reasoning in artificial intelligence. We propose to use neural networks to model association between any two events in a domain. Neural networks take one event as input and compute a conditional probability of the other event to model how likely these two events are to be associated. The actual meaning of the conditional probabilities varies between applications and depends on how the models are trained. In this work, as two case studies, we have investigated two NAM structures, namely deep neural networks (DNN) and relation-modulated neural nets (RMNN), on several probabilistic reasoning tasks in AI, including recognizing textual entailment, triple classification in multi-relational knowledge bases and commonsense reasoning. Experimental results on several popular datasets derived from WordNet, FreeBase and ConceptNet have all demonstrated that both DNNs and RMNNs perform equally well and they can significantly outperform the conventional methods available for these reasoning tasks. Moreover, compared with DNNs, RMNNs are superior in knowledge transfer, where a pre-trained model can be quickly extended to an unseen relation after observing only a few training samples. To further prove the effectiveness of the proposed models, in this work, we have applied NAMs to solving challenging Winograd Schema (WS) problems. Experiments conducted on a set of WS problems prove that the proposed models have the potential for commonsense reasoning.

Preprint · 2016 · arXiv

Total monochromatic connection of graphs

A graph is said to be total-colored if all the edges and the vertices of the graph are colored. A path in a total-colored graph is a total monochromatic path if all the edges and internal vertices on the path have the same color. A total-coloring of a graph is a total monochromatically-connecting coloring (TMC-coloring, for short) if any two vertices of the graph are connected by a total monochromatic path of the graph. For a connected graph $G$, the total monochromatic connection number, denoted by $tmc(G)$, is defined as the maximum number of colors used in a TMC-coloring of $G$. These concepts are inspired by the concepts of monochromatic connection number $mc(G)$, monochromatic vertex connection number $mvc(G)$ and total rainbow connection number $trc(G)$ of a connected graph $G$. Let $l(T)$ denote the number of leaves of a tree $T$, and let $l(G)=\max\{ l(T) \mid T \text{ is a spanning tree of } G \}$ for a connected graph $G$. In this paper, we show that there are many graphs $G$ such that $tmc(G)=m-n+2+l(G)$, and moreover, we prove that for almost all graphs $G$, $tmc(G)=m-n+2+l(G)$ holds. Furthermore, we compare $tmc(G)$ with $mvc(G)$ and $mc(G)$, respectively, and obtain that there exist graphs $G$ such that $tmc(G)$ is not less than $mvc(G)$ and vice versa, and that $tmc(G)=mc(G)+l(G)$ holds for almost all graphs. Finally, we prove that $tmc(G)\leq mc(G)+mvc(G)$, and the equality holds if and only if $G$ is a complete graph.
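
The quantity $m-n+2+l(G)$ from this abstract is easy to evaluate on small graphs. The sketch below is a brute-force illustration, not the paper's method: it enumerates all spanning trees to find the maximum leaf number $l(G)$ and then evaluates the expression. The names `max_leaf_number` and `tmc_candidate` are hypothetical, and the result equals $tmc(G)$ only for graphs where the stated identity holds.

```python
from itertools import combinations


def max_leaf_number(n, edges):
    """Return l(G): the largest number of leaves over all spanning trees of G.

    Vertices are 0..n-1 (n >= 2); edges is a list of (u, v) pairs.
    Exponential in the number of edges, so only suitable for tiny graphs.
    """
    best = 0
    for tree in combinations(edges, n - 1):
        parent = list(range(n))

        def find(v):
            while parent[v] != v:
                parent[v] = parent[parent[v]]
                v = parent[v]
            return v

        acyclic = True
        for u, v in tree:
            ru, rv = find(u), find(v)
            if ru == rv:                    # this edge would close a cycle
                acyclic = False
                break
            parent[ru] = rv
        if not acyclic:
            continue                        # n-1 acyclic edges form a spanning tree
        degree = [0] * n
        for u, v in tree:
            degree[u] += 1
            degree[v] += 1
        best = max(best, sum(1 for d in degree if d == 1))
    return best


def tmc_candidate(n, edges):
    """Evaluate m - n + 2 + l(G) for a connected graph G."""
    return len(edges) - n + 2 + max_leaf_number(n, edges)


if __name__ == "__main__":
    k4 = list(combinations(range(4), 2))
    print(tmc_candidate(4, k4))
    print(tmc_candidate(3, [(0, 1), (1, 2)]))
```

For the path on three vertices the expression gives 3, which matches a direct count: the two edges and the middle vertex must share one color, and each end vertex can take its own.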

preprint 2016 arXiv

Unit-free and robust detection of differential expression from RNA-Seq data

Ultra high-throughput sequencing of transcriptomes (RNA-Seq) is a widely used method for quantifying gene expression levels due to its low cost, high accuracy and wide dynamic range for detection. However, the nature of RNA-Seq makes it nearly impossible to provide absolute measurements of transcript abundances. Several units or data summarization methods for transcript quantification have been proposed in the past to account for differences in transcript lengths and sequencing depths across different genes and different samples. Nevertheless, further between-sample normalization is still needed for reliable detection of differentially expressed genes. In this paper we propose a unified statistical model for joint detection of differential gene expression and between-sample normalization. Our method is independent of the unit in which gene expression levels are summarized. We also introduce an efficient algorithm for model fitting. Due to the L0-penalized likelihood used in our model, it is able to reliably normalize the data and detect differential gene expression in some cases when more than $50\%$ of the genes are differentially expressed in an asymmetric manner. We compare our method with existing methods using simulated and real data sets.

preprint 2015 arXiv

A Fixed-Size Encoding Method for Variable-Length Sequences with its Application to Neural Network Language Models

In this paper, we propose the new fixed-size ordinally-forgetting encoding (FOFE) method, which can almost uniquely encode any variable-length sequence of words into a fixed-size representation. FOFE can model the word order in a sequence using a simple ordinally-forgetting mechanism according to the positions of words. In this work, we have applied FOFE to feedforward neural network language models (FNN-LMs). Experimental results have shown that, without using any recurrent feedback, FOFE-based FNN-LMs can significantly outperform not only the standard fixed-input FNN-LMs but also the popular RNN-LMs.
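
As a rough illustration of the ordinally-forgetting idea described above, the sketch below assumes the commonly cited recursive form z_t = alpha * z_(t-1) + e_t, where e_t is the one-hot vector of the t-th word and alpha in (0, 1) is a forgetting factor. The recursion, the function name `fofe_encode` and the toy vocabulary are assumptions made for illustration; none of them is stated in the abstract.

```python
from typing import Dict, List


def fofe_encode(words: List[str], vocab: Dict[str, int], alpha: float = 0.5) -> List[float]:
    """Encode a word sequence into one fixed-size vector.

    Assumed recursion (not given in the abstract): z_t = alpha * z_{t-1} + e_t.
    """
    z = [0.0] * len(vocab)
    for word in words:
        # Decay everything seen so far, then add the one-hot of the current word.
        z = [alpha * value for value in z]
        z[vocab[word]] += 1.0
    return z


# Hypothetical three-word vocabulary, used only for this example.
vocab = {"A": 0, "B": 1, "C": 2}
code_abc = fofe_encode(["A", "B", "C"], vocab)
code_cba = fofe_encode(["C", "B", "A"], vocab)
```

The two sequences contain the same words but receive different codes, because earlier words are decayed more often. The output length equals the vocabulary size regardless of the sequence length.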

preprint 2015 arXiv

Deep Learning for Object Saliency Detection and Image Segmentation

In this paper, we propose several novel deep learning methods for object saliency detection based on the powerful convolutional neural networks. In our approach, we use a gradient descent method to iteratively modify an input image based on the pixel-wise gradients to reduce a cost function measuring the class-specific objectness of the image. The pixel-wise gradients can be efficiently computed using the back-propagation algorithm. The discrepancy between the modified image and the original one may be used as a saliency map for the image. Moreover, we have further proposed several new training methods to learn saliency-specific convolutional nets for object saliency detection, in order to leverage the available pixel-wise segmentation information. Our methods are extremely computationally efficient (processing 20-40 images per second on one GPU). In this work, we use the computed saliency maps for image segmentation. Experimental results on two benchmark tasks, namely Microsoft COCO and Pascal VOC 2012, have shown that our proposed methods can generate high-quality saliency maps, clearly outperforming many existing methods. In particular, our approaches excel in handling many difficult images, which contain complex backgrounds, highly-variable salient objects, multiple objects, and/or very small salient objects.

preprint 2015 arXiv

Feedforward Sequential Memory Neural Networks without Recurrent Feedback

We introduce a new structure for memory neural networks, called feedforward sequential memory networks (FSMN), which can learn long-term dependency without using recurrent feedback. The proposed FSMN is a standard feedforward neural network equipped with learnable sequential memory blocks in the hidden layers. In this work, we have applied FSMN to several language modeling (LM) tasks. Experimental results have shown that the memory blocks in FSMN can learn effective representations of long history. Experiments have shown that FSMN-based language models can significantly outperform not only feedforward neural network (FNN) based LMs but also the popular recurrent neural network (RNN) LMs.
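
One way to picture a sequential memory block without recurrence is as a tapped delay line over the hidden activations: the memory at step t is a weighted sum of the current and a few previous hidden vectors. The sketch below assumes this scalar-tap form, m_t = sum_i a_i * h_(t-i), with fixed tap weights; in the paper's setting the weights would be learned. The function name `memory_block` and the toy values are hypothetical.

```python
from typing import List


def memory_block(hidden: List[List[float]], taps: List[float]) -> List[List[float]]:
    """Compute m_t = sum_i taps[i] * hidden[t - i] for every time step t.

    Assumed scalar-tap form; no output is fed back, so there is no recurrence.
    """
    memory = []
    for t in range(len(hidden)):
        acc = [0.0] * len(hidden[t])
        for i, weight in enumerate(taps):
            if t - i < 0:
                break  # no history before the start of the sequence
            for d, value in enumerate(hidden[t - i]):
                acc[d] += weight * value
        memory.append(acc)
    return memory


# Toy hidden activations for three time steps (hypothetical values).
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
taps = [1.0, 0.5]  # weight on the current step, then on the previous step
memory = memory_block(hidden, taps)
```

Each memory vector depends only on a finite window of inputs, so all time steps can be computed independently of one another.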

preprint 2015 arXiv

Good upper bounds for the total rainbow connection of graphs

A total-colored graph is a graph $G$ such that all edges and all vertices of $G$ are colored. A path in a total-colored graph $G$ is a total rainbow path if its edges and internal vertices have distinct colors. A total-colored graph $G$ is total-rainbow connected if any two vertices of $G$ are connected by a total rainbow path of $G$. The total rainbow connection number of $G$, denoted by $trc(G)$, is defined as the smallest number of colors that are needed to make $G$ total-rainbow connected. These concepts were introduced by Liu et al. Notice that for a connected graph $G$, $2diam(G)-1\leq trc(G)\leq 2n-3$, where $diam(G)$ denotes the diameter of $G$ and $n$ is the order of $G$. In this paper we show, for a connected graph $G$ of order $n$ with minimum degree $δ$, that $trc(G)\leq6n/{(δ+1)}+28$ for $δ\geq\sqrt{n-2}-1$ and $n\geq 291$, while $trc(G)\leq7n/{(δ+1)}+32$ for $16\leqδ\leq\sqrt{n-2}-2$ and $trc(G)\leq7n/{(δ+1)}+4C(δ)+12$ for $6\leqδ\leq15$, where $C(δ)=e^{\frac{3\log(δ^3+2δ^2+3)-3(\log3-1)}{δ-3}}-2$. This implies that when $δ$ is linear in $n$, the total rainbow connection number $trc(G)$ is bounded by a constant. We also show that $trc(G)\leq 7n/4-3$ for $δ=3$, $trc(G)\leq8n/5-13/5$ for $δ=4$ and $trc(G)\leq3n/2-3$ for $δ=5$. Furthermore, an example shows that our bound can be seen to be tight up to additive factors when $δ\geq\sqrt{n-2}-1$.
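
The piecewise bounds quoted above can be evaluated numerically. The sketch below simply transcribes them and returns the smallest one that applies, falling back to the general bound 2n-3. It assumes that log denotes the natural logarithm and that n >= 2; the function names `c_delta` and `trc_upper_bound` are hypothetical.

```python
import math


def c_delta(delta: int) -> float:
    """C(delta) from the abstract, assuming log is the natural logarithm."""
    exponent = (3 * math.log(delta**3 + 2 * delta**2 + 3) - 3 * (math.log(3) - 1)) / (delta - 3)
    return math.exp(exponent) - 2


def trc_upper_bound(n: int, delta: int) -> float:
    """Smallest applicable upper bound on trc(G) for order n and minimum degree delta."""
    bounds = [2 * n - 3]  # general bound for any connected graph
    root = math.sqrt(n - 2)
    if delta >= root - 1 and n >= 291:
        bounds.append(6 * n / (delta + 1) + 28)
    if 16 <= delta <= root - 2:
        bounds.append(7 * n / (delta + 1) + 32)
    if 6 <= delta <= 15:
        bounds.append(7 * n / (delta + 1) + 4 * c_delta(delta) + 12)
    if delta == 3:
        bounds.append(7 * n / 4 - 3)
    if delta == 4:
        bounds.append(8 * n / 5 - 13 / 5)
    if delta == 5:
        bounds.append(3 * n / 2 - 3)
    return min(bounds)


sparse_case = trc_upper_bound(100, 3)   # only the delta = 3 bound and 2n-3 apply
dense_case = trc_upper_bound(400, 99)   # delta is about n/4, so the first bound applies
```

For the dense case the value does not grow with n as long as the ratio n/(δ+1) stays fixed, which is the sense in which the bound becomes a constant when δ is linear in n.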

preprint 2015 arXiv

Hybrid Orthogonal Projection and Estimation (HOPE): A New Framework to Probe and Learn Neural Networks

In this paper, we propose a novel model for high-dimensional data, called the Hybrid Orthogonal Projection and Estimation (HOPE) model, which combines a linear orthogonal projection and a finite mixture model under a unified generative modeling framework. The HOPE model itself can be learned unsupervised from unlabelled data based on the maximum likelihood estimation as well as discriminatively from labelled data. More interestingly, we have shown the proposed HOPE models are closely related to neural networks (NNs) in the sense that each hidden layer can be reformulated as a HOPE model. As a result, the HOPE framework can be used as a novel tool to probe why and how NNs work and, more importantly, to learn NNs in either supervised or unsupervised ways. In this work, we have investigated the HOPE framework to learn NNs for several standard tasks, including image recognition on MNIST and speech recognition on TIMIT. Experimental results have shown that the HOPE framework yields significant performance gains over the current state-of-the-art methods in various types of NN learning problems, including unsupervised feature learning, supervised or semi-supervised learning.

preprint 2015 arXiv

Large deviations of the Threshold estimator of integrated (co-)volatility vector in the presence of jumps

Recently, considerable interest has been paid to the estimation problem of the realized volatility and covolatility by using high-frequency data of financial price processes in financial econometrics. Threshold estimation is one of the useful techniques in the inference for jump-type stochastic processes from discrete observations. In this paper, we adopt the threshold estimator introduced by Mancini where only the variations under a given threshold function are taken into account. The purpose of this work is to investigate large and moderate deviations for the threshold estimator of the integrated variance-covariance vector. This paper is an extension of the previous work in Djellout et al. where the problem has been studied in the absence of the jump component. We will use the approximation lemma to prove the LDP. As the reader can expect, we obtain the same results as in the case without jumps.
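
The thresholding idea, keeping only the variations that fall under a threshold, can be sketched as follows. This is a simplified illustration and not the estimator as defined in the paper: it assumes a constant threshold on absolute increments, whereas a threshold function would normally depend on the sampling step. The function names and the toy price paths are hypothetical.

```python
from typing import Sequence


def threshold_realized_variance(prices: Sequence[float], threshold: float) -> float:
    """Sum of squared increments, skipping increments larger than the threshold."""
    total = 0.0
    for prev, cur in zip(prices, prices[1:]):
        inc = cur - prev
        if abs(inc) <= threshold:  # increments above the threshold are treated as jumps
            total += inc * inc
    return total


def threshold_realized_covariance(x: Sequence[float], y: Sequence[float], threshold: float) -> float:
    """Sum of cross products of increments, kept only when both are under the threshold."""
    total = 0.0
    for (x0, x1), (y0, y1) in zip(zip(x, x[1:]), zip(y, y[1:])):
        dx, dy = x1 - x0, y1 - y0
        if abs(dx) <= threshold and abs(dy) <= threshold:
            total += dx * dy
    return total


# Toy log-price paths; the third increment of x is a jump of size 3.
x = [0.0, 0.5, 1.0, 4.0, 4.5]
y = [0.0, 0.5, 1.0, 1.5, 2.0]
plain_rv = threshold_realized_variance(x, float("inf"))  # no truncation
trunc_rv = threshold_realized_variance(x, 1.0)
trunc_cov = threshold_realized_covariance(x, y, 1.0)
```

Without truncation the single jump dominates the realized variance; with the threshold applied, only the small increments contribute.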

preprint 2015 arXiv

Leverage Financial News to Predict Stock Price Movements Using Word Embeddings and Deep Neural Networks

Financial news contains useful information on public companies and the market. In this paper we apply the popular word embedding methods and deep neural networks to leverage financial news to predict stock price movements in the market. Experimental results have shown that our proposed methods are simple but very effective, which can significantly improve the stock prediction accuracy on a standard financial database over the baseline system using only the historical price information.

preprint 2015 arXiv

On (strong) proper vertex-connection of graphs

A path in a vertex-colored graph is a {\it vertex-proper path} if any two internal adjacent vertices differ in color. A vertex-colored graph is {\it proper vertex $k$-connected} if any two vertices of the graph are connected by $k$ disjoint vertex-proper paths of the graph. For a $k$-connected graph $G$, the {\it proper vertex $k$-connection number} of $G$, denoted by $pvc_{k}(G)$, is defined as the smallest number of colors required to make $G$ proper vertex $k$-connected. A vertex-colored graph is {\it strong proper vertex-connected}, if for any two vertices $u,v$ of the graph, there exists a vertex-proper $u$-$v$ geodesic. For a connected graph $G$, the {\it strong proper vertex-connection number} of $G$, denoted by $spvc(G)$, is the smallest number of colors required to make $G$ strong proper vertex-connected. These concepts are inspired by the concepts of rainbow vertex $k$-connection number $rvc_k(G)$, strong rainbow vertex-connection number $srvc(G)$, and proper $k$-connection number $pc_k(G)$ of a $k$-connected graph $G$. Firstly, we determine the value of $pvc(G)$ for general graphs and $pvc_k(G)$ for some specific graphs. We also compare the values of $pvc_k(G)$ and $pc_k(G)$. Then, sharp bounds of $spvc(G)$ are given for a connected graph $G$ of order $n$, that is, $0\leq spvc(G)\leq n-2$. Moreover, we characterize the graphs of order $n$ such that $spvc(G)=n-2,n-3$, respectively. Finally, we study the relationship among the three vertex-coloring parameters, namely, $spvc(G), \ srvc(G)$ and the chromatic number $χ(G)$ of a connected graph $G$.

preprint 2015 arXiv

Total proper connection of graphs

A graph is said to be {\it total-colored} if all the edges and the vertices of the graph are colored. A path in a total-colored graph is a {\it total proper path} if $(i)$ any two adjacent edges on the path differ in color, $(ii)$ any two internal adjacent vertices on the path differ in color, and $(iii)$ any internal vertex of the path differs in color from its incident edges on the path. A total-colored graph is called {\it total-proper connected} if any two vertices of the graph are connected by a total proper path of the graph. For a connected graph $G$, the {\it total proper connection number} of $G$, denoted by $tpc(G)$, is defined as the smallest number of colors required to make $G$ total-proper connected. These concepts are inspired by the concepts of proper connection number $pc(G)$, proper vertex connection number $pvc(G)$ and total rainbow connection number $trc(G)$ of a connected graph $G$. In this paper, we first determine the value of the total proper connection number $tpc(G)$ for some special graphs $G$. Secondly, we obtain that $tpc(G)\leq 4$ for any $2$-connected graph $G$ and give examples to show that the upper bound $4$ is sharp. For general graphs, we also obtain an upper bound for $tpc(G)$. Furthermore, we prove that $tpc(G)\leq \frac{3n}{δ+1}+1$ for a connected graph $G$ with order $n$ and minimum degree $δ$. Finally, we compare $tpc(G)$ with $pvc(G)$ and $pc(G)$, respectively, and obtain that $tpc(G)>pvc(G)$ for any nontrivial connected graph $G$, and that $tpc(G)$ and $pc(G)$ can differ by $t$ for $0\leq t\leq 2$.
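
The three conditions defining a total proper path can be checked mechanically. The sketch below assumes a path given as a list of vertices, vertex colors stored in a dictionary, and edge colors keyed by unordered vertex pairs; the function name `is_total_proper_path` and the small example coloring are hypothetical.

```python
from typing import Dict, FrozenSet, List


def is_total_proper_path(
    path: List[str],
    vertex_color: Dict[str, int],
    edge_color: Dict[FrozenSet[str], int],
) -> bool:
    """Check conditions (i)-(iii) of a total proper path from the definition above."""
    edges = [edge_color[frozenset((u, v))] for u, v in zip(path, path[1:])]
    internal = path[1:-1]
    # (i) any two adjacent edges on the path differ in color
    if any(a == b for a, b in zip(edges, edges[1:])):
        return False
    # (ii) any two adjacent internal vertices differ in color
    if any(vertex_color[u] == vertex_color[v] for u, v in zip(internal, internal[1:])):
        return False
    # (iii) each internal vertex differs in color from its two incident path edges
    for k, v in enumerate(internal):
        if vertex_color[v] in (edges[k], edges[k + 1]):
            return False
    return True


# Path a-b-c-d with a hypothetical total coloring.
vertex_color = {"a": 1, "b": 1, "c": 2, "d": 1}
good_edges = {frozenset("ab"): 3, frozenset("bc"): 4, frozenset("cd"): 3}
bad_edges = {frozenset("ab"): 3, frozenset("bc"): 4, frozenset("cd"): 4}
good = is_total_proper_path(["a", "b", "c", "d"], vertex_color, good_edges)
bad = is_total_proper_path(["a", "b", "c", "d"], vertex_color, bad_edges)
```

End vertices are not constrained, which is why `a` and `b` may share a color in the example. The second coloring fails condition (i) because edges `bc` and `cd` have the same color.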

preprint 2014 arXiv

Asymptotic distributions related to mildly-explosive second order autoregressive models

In this paper, we consider the normalized least squares estimator of the parameter in a mildly-explosive first-order autoregressive model with dependent errors which are modeled as a mildly-explosive AR(1) process. We prove that the estimator has a Cauchy limit law which provides a bridge between moderate deviation asymptotics and the earlier results on the local to unity and explosive autoregressive models. In particular, the results can be applied to understand the near-integrated second order autoregressive processes. Simulation studies are also carried out to assess the performance of least squares estimation in finite samples.

preprint 2014 arXiv

Robust estimation of isoform expression with RNA-Seq data

Quantifying gene and isoform expression is one of the primary tasks for RNA-Seq experiments. Given a sequence of counts representing numbers of reads mapped to different positions (exons and junctions) of isoforms, methods based on Poisson generalized linear models (GLM) with the identity link function have been proposed to estimate isoform expression levels from these counts. These Poisson-based models have very limited ability to handle the overdispersion in the counts brought by various sources, and some of them are not robust to outliers. We propose a negative binomial based GLM with identity link, and use a set of robustified quasi-likelihood equations to make it resistant to outliers. An efficient and reliable numeric algorithm has been identified to solve these equations. In simulations, we find that our approach seems to outperform existing approaches. We also find evidence supporting this conclusion in real RNA-Seq data.

preprint 2013 arXiv

A penalized likelihood approach for robust estimation of isoform expression

Ultra high-throughput sequencing of transcriptomes (RNA-Seq) has enabled the accurate estimation of gene expression at individual isoform level. However, systematic biases introduced during the sequencing and mapping processes as well as incompleteness of the transcript annotation databases may cause the estimates of isoform abundances to be unreliable, and in some cases, highly inaccurate. This paper introduces a penalized likelihood approach to detect and correct for such biases in a robust manner. Our model extends those previously proposed by introducing bias parameters for reads. An L1 penalty is used for the selection of non-zero bias parameters. We introduce an efficient algorithm for model fitting and analyze the statistical properties of the proposed model. Our experimental studies on both simulated and real datasets suggest that the model has the potential to improve isoform-specific gene expression estimates and identify incompletely annotated gene models.

preprint 2013 arXiv

Computational Aspects of Optional Pólya Tree

Optional Pólya Tree (OPT) is a flexible non-parametric Bayesian model for density estimation. Despite its merits, the computation for OPT inference is challenging. In this paper we present time complexity analysis for OPT inference and propose two algorithmic improvements. The first improvement, named Limited-Lookahead Optional Pólya Tree (LL-OPT), aims to greatly accelerate the computation for OPT inference. The second improvement modifies the output of OPT or LL-OPT and produces a continuous piecewise linear density estimate. We demonstrate the performance of these two improvements using simulations.

preprint 2011 arXiv

Statistical Modeling of RNA-Seq Data

Recently, ultra high-throughput sequencing of RNA (RNA-Seq) has been developed as an approach for analysis of gene expression. By obtaining tens or even hundreds of millions of reads of transcribed sequences, an RNA-Seq experiment can offer a comprehensive survey of the population of genes (transcripts) in any sample of interest. This paper introduces a statistical model for estimating isoform abundance from RNA-Seq data that is flexible enough to accommodate both single end and paired end RNA-Seq data and sampling bias along the length of the transcript. Based on the derivation of minimal sufficient statistics for the model, a computationally feasible implementation of the maximum likelihood estimator of the model is provided. Further, it is shown that using paired end RNA-Seq provides more accurate isoform abundance estimates than single end sequencing at fixed sequencing depth. Simulation studies are also given.

preprint 2011 arXiv

Thermodynamic properties and phase diagrams of spin-1 quantum Ising systems with three-spin interactions

The spin-1 quantum Ising systems with three-spin interactions on two-dimensional triangular lattices are studied by the mean-field method. The thermal variations of order parameters and phase diagrams are investigated in detail. The stable, metastable and unstable branches of the order parameters are obtained. According to the stability conditions at the critical point, we find that the systems exhibit tricritical points. With crystal field and biquadratic interactions, the system has rich phase diagrams with single reentrant or double reentrant phase transitions for appropriate ranges of both parameters.

43 published item(s)

Stochastic volatility modeling of high-frequency CSI 300 index and dynamic jump prediction driven by machine learning

Analysis of stock index with a generalized BN-S model: an approach based on machine learning and fuzzy parameters

DavarOCR: A Toolbox for OCR and Multi-Modal Document Understanding

Functional large deviations for Stroock's approximation to a class of Gaussian processes with application to small noise diffusions

Kullback-Leibler-Based Discrete Failure Time Models for Integration of Published Prediction Models with New Time-To-Event Dataset

One-dimensional quasi bound states in the continuum in the ω~k space for nonlinear optical applications

Enhanced Aspect-Based Sentiment Analysis Models with Progressive Self-supervised Attention Learning

Filling up complex spectral regions through non-Hermitian disordered chains

Match$^2$: A Matching over Matching Model for Similar Question Identification

On Approximation Capabilities of ReLU Activation and Softmax Output Layer in Neural Networks

Topological invariants, zero mode edge states and finite size effect for a generalized non-reciprocal Su-Schrieffer-Heeger model

Enhanced LSTM for Natural Language Inference

A Deep Learning Based Fast Image Saliency Detection Algorithm

A FOFE-based Local Detection Approach for Named Entity Recognition and Mention Detection

A Unified Model for Differential Expression Analysis of RNA-seq Data via L1-Penalized Linear Regression

Commonsense Knowledge Enhanced Embeddings for Solving Pronoun Disambiguation Problems in Winograd Schema Challenge

Distraction-Based Neural Networks for Document Summarization

Erdős-Gallai-type results for total monochromatic connection of graphs

Feedforward Sequential Memory Networks: A New Structure to Learn Long-term Dependency

Generating images with recurrent adversarial networks

Higher Order Recurrent Neural Networks

Learning Convolutional Neural Networks using Hybrid Orthogonal Projection and Estimation

More on total monochromatic connection of graphs

Neural Networks Models for Entity Discovery and Linking

Part-of-Speech Relevance Weights for Learning Word Embeddings

Probabilistic Reasoning via Deep Learning: Neural Association Models

Total monochromatic connection of graphs

Unit-free and robust detection of differential expression from RNA-Seq data

A Fixed-Size Encoding Method for Variable-Length Sequences with its Application to Neural Network Language Models

Deep Learning for Object Saliency Detection and Image Segmentation

Feedforward Sequential Memory Neural Networks without Recurrent Feedback

Good upper bounds for the total rainbow connection of graphs

Hybrid Orthogonal Projection and Estimation (HOPE): A New Framework to Probe and Learn Neural Networks

Large deviations of the Threshold estimator of integrated (co-)volatility vector in the presence of jumps

Leverage Financial News to Predict Stock Price Movements Using Word Embeddings and Deep Neural Networks

On (strong) proper vertex-connection of graphs

Total proper connection of graphs

Asymptotic distributions related to mildly-explosive second order autoregressive models

Robust estimation of isoform expression with RNA-Seq data

A penalized likelihood approach for robust estimation of isoform expression

Computational Aspects of Optional Pólya Tree

Statistical Modeling of RNA-Seq Data

Thermodynamic properties and phase diagrams of spin-1 quantum Ising systems with three-spin interactions