Source author record

Nan Ding

Nan Ding appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Machine Learning; astro-ph.HE; Computation and Language; quant-ph; astro-ph.GA; Computer Vision; Distributed, Parallel, and Cluster Computing; eess.AS; Genomics; math.ST; Molecular Networks; Neural and Evolutionary Computing; Sound; Statistics Theory; Tissues and Organs

Catalog footprint

What is connected

23 works

15 topics

4 close collaborators

preprint 2022 arXiv

All You May Need for VQA are Image Captions

Visual Question Answering (VQA) has benefited from increasingly sophisticated models, but has not enjoyed the same level of engagement in terms of data creation. In this paper, we propose a method that automatically derives VQA examples at volume, by leveraging the abundance of existing image-caption annotations combined with neural models for textual question generation. We show that the resulting data is of high quality. VQA models trained on our data improve state-of-the-art zero-shot accuracy by double digits and achieve a level of robustness that is lacking in the same model trained on human-annotated VQA data.

preprint 2022 arXiv

PACTran: PAC-Bayesian Metrics for Estimating the Transferability of Pretrained Models to Classification Tasks

With the increasing abundance of pretrained models in recent years, the problem of selecting the best pretrained checkpoint for a particular downstream classification task has been gaining increased attention. Although several methods have recently been proposed to tackle the selection problem (e.g. LEEP, H-score), these methods resort to applying heuristics that are not well motivated by learning theory. In this paper we present PACTran, a theoretically grounded family of metrics for pretrained model selection and transferability measurement. We first show how to derive PACTran metrics from the optimal PAC-Bayesian bound under the transfer learning setting. We then empirically evaluate three metric instantiations of PACTran on a number of vision tasks (VTAB) as well as a language-and-vision (OKVQA) task. An analysis of the results shows PACTran is a more consistent and effective transferability measure compared to existing selection methods.

preprint 2021 arXiv

Detection of a possible high-confidence radio quasi-periodic oscillation in the BL Lac PKS J2134-0153

We have searched for quasi-periodic oscillations (QPOs) of the BL Lac PKS J2134-0153 in the 15 GHz radio light curve from the Owens Valley Radio Observatory 40-m telescope covering the period from 2008-01-05 to 2019-05-18, utilizing the Lomb-Scargle periodogram (LSP) and the weighted wavelet Z-transform (WWZ) techniques. This is the first search for a periodic radio signal in BL Lac PKS J2134-0153 using these two methods. The two methods consistently reveal a QPO of 4.69 $\pm$ 0.14 years (>5 $\sigma$ confidence level). We discuss possible causes for this QPO, and we expect that the binary black hole scenario, in which the QPO is caused by the precession of the binary black holes, is the most likely explanation. BL Lac PKS J2134-0153 thus could be a good binary black hole candidate. In the binary black hole scenario, the distance between the primary black hole and the secondary black hole is 1.83$\times$10$^{16}$ cm.
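The Lomb-Scargle periodogram named in this abstract can be illustrated with a minimal pure-Python sketch. It uses the classical (unnormalized) Lomb-Scargle formula on a synthetic, noiseless, irregularly sampled sinusoid. The function names, the sampling span, and the trial-period grid are assumptions made for illustration; the data are not the OVRO light curve and the code is not the authors' pipeline.

```python
import math
import random


def lomb_scargle_power(times, values, period):
    """Classical (unnormalized) Lomb-Scargle power at one trial period."""
    omega = 2.0 * math.pi / period
    mean = sum(values) / len(values)
    y = [v - mean for v in values]
    # The time offset tau makes the sine and cosine terms orthogonal.
    s2 = sum(math.sin(2.0 * omega * t) for t in times)
    c2 = sum(math.cos(2.0 * omega * t) for t in times)
    tau = math.atan2(s2, c2) / (2.0 * omega)
    cos_t = [math.cos(omega * (t - tau)) for t in times]
    sin_t = [math.sin(omega * (t - tau)) for t in times]
    yc = sum(a * b for a, b in zip(y, cos_t))
    ys = sum(a * b for a, b in zip(y, sin_t))
    cc = sum(c * c for c in cos_t)
    ss = sum(s * s for s in sin_t)
    return 0.5 * (yc * yc / cc + ys * ys / ss)


def best_period(times, values, periods):
    """Return the trial period with the highest Lomb-Scargle power."""
    return max(periods, key=lambda p: lomb_scargle_power(times, values, p))


def synthetic_light_curve(period, n=200, span=40.0, seed=0):
    """Hypothetical test signal: a noiseless sinusoid at irregular times."""
    rng = random.Random(seed)
    times = sorted(rng.uniform(0.0, span) for _ in range(n))
    values = [math.sin(2.0 * math.pi * t / period) for t in times]
    return times, values


times, values = synthetic_light_curve(period=4.69)
trial_periods = [2.0 + 0.01 * i for i in range(801)]  # 2.00 .. 10.00
print("recovered period:", best_period(times, values, trial_periods))
```

A real analysis would also need a significance estimate for the peak, which this sketch does not attempt.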

preprint 2020 arXiv

A two-zone blazar radiation model for "orphan" neutrino flares

In this work, we investigate the 2014-2015 neutrino flare associated with the blazar TXS 0506+056 and a recently discovered muon neutrino event IceCube-200107A in spatial coincidence with the blazar 4FGL J0955.1+3551, under the framework of a two-zone radiation model of blazars where an inner/outer blob close to/far from the supermassive black hole are invoked. An interesting feature that the two sources share is that no evidence of GeV gamma-ray activity is found during the neutrino detection period, probably implying a large opacity for GeV gamma rays in the neutrino production region. In our model, continuous particle acceleration/injection takes place in the inner blob at the jet base, where the hot X-ray corona of the supermassive black hole provides target photon fields for efficient neutrino production and strong GeV gamma-ray absorption. We show that this model can self-consistently interpret the neutrino emission from both blazars in a large parameter space. In the meantime, the dissipation processes in the outer blob are responsible for the simultaneous multi-wavelength emission of both sources. In agreement with previous studies of TXS 0506+056, an intense MeV emission from the induced electromagnetic cascade in the inner blob is robustly expected to accompany the neutrino flare in our model, which could be used to test the model with the next-generation MeV gamma-ray detector in the future.

preprint 2020 arXiv

From the Fermi blazar sequence to the relation between Fermi blazars and gamma-ray Narrow-line Seyfert 1 Galaxies

We use the third catalog of blazars detected by Fermi/LAT (3LAC) and gamma-ray Narrow-line Seyfert 1 Galaxies (gamma-NLSy1s) to study the blazar sequence and the relationship between them. Our results are as follows: (i) There is a weak anti-correlation between synchrotron peak frequency and peak luminosity for both Fermi blazars and gamma-NLSy1s, which supports the blazar sequence. However, after Doppler correction, the inverse correlation disappears, which suggests that the anti-correlation between synchrotron peak frequency and peak luminosity is affected by the beaming effect. (ii) There is a significant anti-correlation between jet kinetic power and synchrotron peak frequency for both Fermi blazars and gamma-NLSy1s, which suggests that the gamma-NLSy1s could fit well into the original blazar sequence. (iii) According to previous work, the relationship between synchrotron peak frequency and synchrotron curvature can be explained by statistical or stochastic acceleration mechanisms. There are significant correlations between synchrotron peak frequency and synchrotron curvature for the whole sample, Fermi blazars and BL Lacs, respectively. The slopes of the correlation are consistent with statistical acceleration. For FSRQs, LBLs, IBLs, HBLs, and gamma-NLSy1s, we also find a significant correlation, but in these cases the slopes cannot be explained by previous theoretical models. (iv) The slope of the relation between synchrotron peak frequency and synchrotron curvature in gamma-NLSy1s is larger than that of FSRQs and BL Lacs. This result may imply that cooling dominates over the acceleration process for FSRQs and BL Lacs, while the opposite holds for gamma-NLSy1s.

preprint 2020 arXiv

iqiyi Submission to ActivityNet Challenge 2019 Kinetics-700 challenge: Hierarchical Group-wise Attention

In this report, the method for the iqiyi submission to the task of ActivityNet 2019 Kinetics-700 challenge is described. Three models are involved in the model ensemble stage: TSN, HG-NL and StNet. We propose the hierarchical group-wise non-local (HG-NL) module for frame-level feature aggregation for video classification. The standard non-local (NL) module is effective in aggregating frame-level features on the task of video classification but presents low parameter efficiency and high computational cost. The HG-NL method involves a hierarchical group-wise structure and generates multiple attention maps to enhance performance. Based on this hierarchical group-wise structure, the proposed method has competitive accuracy, fewer parameters and smaller computational cost than the standard NL. For the task of ActivityNet 2019 Kinetics-700 challenge, after model ensemble, we finally obtain an averaged top-1 and top-5 error of 28.444% on the test set.

preprint 2020 arXiv

LOGAN: High-Performance GPU-Based X-Drop Long-Read Alignment

Pairwise sequence alignment is one of the most computationally intensive kernels in genomic data analysis, accounting for more than 90% of the runtime for key bioinformatics applications. This method is particularly expensive for third-generation sequences due to the high computational cost of analyzing sequences of length between 1Kb and 1Mb. Given the quadratic overhead of exact pairwise algorithms for long alignments, the community primarily relies on approximate algorithms that search only for high-quality alignments and stop early when one is not found. In this work, we present the first GPU optimization of the popular X-drop alignment algorithm, which we named LOGAN. Results show that our high-performance multi-GPU implementation achieves up to 181.6 GCUPS and speed-ups up to 6.6x and 30.7x using 1 and 6 NVIDIA Tesla V100, respectively, over the state-of-the-art software running on two IBM Power9 processors using 168 CPU threads, with equivalent accuracy. We also demonstrate a 2.3x LOGAN speed-up versus ksw2, a state-of-the-art vectorized algorithm for sequence alignment implemented in minimap2, a long-read mapping software. To highlight the impact of our work on a real-world application, we couple LOGAN with a many-to-many long-read alignment software called BELLA, and demonstrate that our implementation improves the overall BELLA runtime by up to 10.6x. Finally, we adapt the Roofline model for LOGAN and demonstrate that our implementation is near-optimal on the NVIDIA Tesla V100s.
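The early-stopping idea behind X-drop can be sketched in a few lines of Python. This is a deliberately simplified, ungapped extension: it only walks the main diagonal and stops once the running score falls more than X below the best score seen. LOGAN itself applies the criterion inside a gapped alignment on GPUs, which this sketch does not reproduce. The function name and the scoring values (match = +1, mismatch = -1) are assumptions chosen for illustration.

```python
def xdrop_extend(a, b, x, match=1, mismatch=-1):
    """Ungapped X-drop extension from the start of sequences a and b.

    Returns (best_score, best_length): the highest running score and the
    number of aligned positions at which it was reached.
    """
    score = 0
    best_score = 0
    best_length = 0
    for i, (ca, cb) in enumerate(zip(a, b), start=1):
        score += match if ca == cb else mismatch
        if score > best_score:
            best_score, best_length = score, i
        elif best_score - score > x:
            # The score dropped more than X below the best: stop early.
            break
    return best_score, best_length


# Two mismatches in the middle: a small X gives up, a larger X recovers.
seq_a = "AAAAAATTAAAAAA"
seq_b = "AAAAAACCAAAAAA"
print(xdrop_extend(seq_a, seq_b, x=1))
print(xdrop_extend(seq_a, seq_b, x=2))
```

The choice of X trades sensitivity for speed: a larger X tolerates longer low-scoring stretches before abandoning an extension.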

preprint 2020 arXiv

Multi-wavelength Selected Compton-thick AGNs in Chandra Deep Field-South Survey

Even in deep X-ray surveys, Compton-thick active galactic nuclei (CT AGNs, ${\rm N_H} \geqslant 1.5~\times~10^{24}~{\rm cm}^{-2}$) are difficult to identify due to X-ray flux suppression and their complex spectral shape. However, the study of CT AGNs is vital for understanding the rapid growth of black holes and the origin of the cosmic X-ray background. In the local universe, CT AGNs account for 30% of the whole AGN population. We may expect a higher fraction of CT AGNs in deep X-ray surveys; however, only 10% of AGNs have been identified as CT AGNs in the 7 Ms Chandra Deep Field-South (CDFS) survey. In this work, we select 51 AGNs with abundant multi-wavelength data. Using the method of the mid-infrared (mid-IR) excess, we select 8 hitherto unknown CT AGN candidates in our sample. Seven of these candidates can be confirmed as CT AGNs based on the multi-wavelength identification approach, and a new CT AGN (XID 133) is identified through the mid-IR diagnostics. We also discuss the X-ray origin of these eight CT AGNs and the reason why their column densities were underestimated in previous studies. We find that the multi-wavelength approaches of selecting CT AGNs are highly efficient, provided that the observational data are of high quality. We also find that CT AGNs have a higher Eddington ratio than non-CT AGNs, and that both CT AGNs and non-CT AGNs show similar properties of host galaxies.

preprint 2020 arXiv

Multicolor Optical Monitoring of the Blazar S5 0716+714 from 2017 to 2019

We continuously monitored the blazar S5 0716+714 in the optical $g$, $r$ and $i$ bands from Nov. 10, 2017 to Jun. 06, 2019. The observations span 201 nights and include 26973 data points. This is a very large quasi-simultaneous multicolor sample for the blazar. The average time spans and time resolutions are 3.4 hours and 2.9 minutes per night, respectively. During the period of observations, the target source in the $r$ band brightens from $14^{\rm m}.16$ to $12^{\rm m}.29$ together with five prominent sub-flares, and then first becomes fainter to $14^{\rm m}.76$ and again brightens to $12^{\rm m}.94$ with seven prominent sub-flares. For the long-term variations, we find a strong flatter when brighter (FWB) trend at a low flux state and then a weak FWB trend at a higher flux state. A weak FWB trend at a low flux state and then a strong FWB trend at a higher flux state are also reported. Most of the sub-flares show strong FWB trends, except for two flares with a weak FWB trend. The particle acceleration and cooling mechanisms together with the superposition of different FWB-slopes from sub-flares are likely to explain the optical color behaviours. A bent-jet scenario is discussed.

preprint 2020 arXiv

Talking-Heads Attention

We introduce "talking-heads attention" - a variation on multi-head attention which includes linear projections across the attention-heads dimension, immediately before and after the softmax operation. While inserting only a small number of additional parameters and a moderate amount of additional computation, talking-heads attention leads to better perplexities on masked language modeling tasks, as well as better quality when transfer-learning to language comprehension and question answering tasks.
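The mechanism described in this abstract can be sketched in plain Python with nested lists. The sketch computes per-head dot-product logits, mixes them across the heads dimension with one projection matrix, applies the softmax over keys, mixes the resulting weights across heads with a second projection matrix, and then averages the values. All names and tensor layouts are assumptions made for illustration, the usual 1/sqrt(d_k) scaling is omitted for brevity, and the code is not taken from the paper.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def mix_heads(maps, proj):
    """Linearly mix attention maps across the heads dimension.

    maps: h_in matrices of shape [n_q][n_k]; proj: matrix [h_in][h_out].
    Returns h_out matrices of shape [n_q][n_k].
    """
    h_in, h_out = len(proj), len(proj[0])
    n_q, n_k = len(maps[0]), len(maps[0][0])
    return [[[sum(maps[a][i][j] * proj[a][b] for a in range(h_in))
              for j in range(n_k)]
             for i in range(n_q)]
            for b in range(h_out)]


def talking_heads_attention(q, k, v, proj_logits, proj_weights):
    """q: [h_k][n_q][d_k], k: [h_k][n_k][d_k], v: [h_v][n_k][d_v]."""
    # Per-head dot-product logits, as in ordinary multi-head attention.
    logits = [[[sum(x * y for x, y in zip(qi, kj)) for kj in kh]
               for qi in qh]
              for qh, kh in zip(q, k)]
    logits = mix_heads(logits, proj_logits)      # projection before softmax
    weights = [[softmax(row) for row in head] for head in logits]
    weights = mix_heads(weights, proj_weights)   # projection after softmax
    # Weighted sum of the values, one output matrix per value head.
    return [[[sum(w[j] * vh[j][c] for j in range(len(vh)))
              for c in range(len(vh[0]))]
             for w in wh]
            for wh, vh in zip(weights, v)]
```

With identity matrices for both projections the function reduces to ordinary (unscaled) multi-head attention, which makes a convenient sanity check.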

preprint 2016 arXiv

Building Large Machine Reading-Comprehension Datasets using Paragraph Vectors

We present a dual contribution to the task of machine reading-comprehension: a technique for creating large-sized machine-comprehension (MC) datasets using paragraph-vector models; and a novel, hybrid neural-network architecture that combines the representation power of recurrent neural networks with the discriminative power of fully-connected multi-layered networks. We use the MC-dataset generation technique to build a dataset of around 2 million examples, for which we empirically determine the high-ceiling of human performance (around 91% accuracy), as well as the performance of a variety of computer models. Among all the models we have experimented with, our hybrid neural-network architecture achieves the highest performance (83.2% accuracy). The remaining gap to the human-performance ceiling provides enough room for future model improvements.

preprint 2016 arXiv

Multilingual Word Embeddings using Multigraphs

We present a family of neural-network-inspired models for computing continuous word representations, specifically designed to exploit both monolingual and multilingual text. This framework allows us to perform unsupervised training of embeddings that exhibit higher accuracy on syntactic and semantic compositionality, as well as multilingual semantic similarity, compared to previous models trained in an unsupervised fashion. We also show that such multilingual embeddings, optimized for semantic similarity, can improve the performance of statistical machine translation with respect to how it handles words not present in the parallel data.

preprint 2016 arXiv

On the Convergence of Stochastic Gradient MCMC Algorithms with High-Order Integrators

Recent advances in Bayesian learning with large-scale data have witnessed the emergence of stochastic gradient MCMC algorithms (SG-MCMC), such as stochastic gradient Langevin dynamics (SGLD), stochastic gradient Hamiltonian MCMC (SGHMC), and the stochastic gradient thermostat. While finite-time convergence properties of the SGLD with a 1st-order Euler integrator have recently been studied, corresponding theory for general SG-MCMCs has not been explored. In this paper we consider general SG-MCMCs with high-order integrators, and develop theory to analyze finite-time convergence properties and their asymptotic invariant measures. Our theoretical results show faster convergence rates and more accurate invariant measures for SG-MCMCs with higher-order integrators. For example, with the proposed efficient 2nd-order symmetric splitting integrator, the mean square error (MSE) of the posterior average for the SGHMC achieves an optimal convergence rate of $L^{-4/5}$ at $L$ iterations, compared to $L^{-2/3}$ for the SGHMC and SGLD with 1st-order Euler integrators. Furthermore, convergence results of decreasing-step-size SG-MCMCs are also developed, with the same convergence rates as their fixed-step-size counterparts for a specific decreasing sequence. Experiments on both synthetic and real datasets verify our theory, and show advantages of the proposed method in two large-scale real applications.
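To give a feel for the two rates quoted in this abstract, the short Python sketch below evaluates $L^{-2/3}$ and $L^{-4/5}$ at a few iteration counts and reports their ratio, which equals $L^{2/15}$. The constants hidden in the rates are unknown and are set to 1 here, so only the scaling with $L$ is meaningful; the function names are illustrative assumptions.

```python
def mse_rate(num_iterations, exponent):
    """Return L**(-exponent): the rate with its unknown constant set to 1."""
    return float(num_iterations) ** (-exponent)


def improvement_factor(num_iterations):
    """Ratio of the 1st-order rate L^(-2/3) to the 2nd-order rate L^(-4/5).

    Algebraically this equals L^(2/15).
    """
    first_order = mse_rate(num_iterations, 2.0 / 3.0)
    second_order = mse_rate(num_iterations, 4.0 / 5.0)
    return first_order / second_order


for L in (10**3, 10**6, 10**9):
    print(L, mse_rate(L, 2.0 / 3.0), mse_rate(L, 4.0 / 5.0),
          improvement_factor(L))
```

Because the exponent 2/15 is small, the gap between the two rates widens slowly: under these assumptions the ratio is about 6.3 at one million iterations.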

preprint 2016 arXiv

Stochastic Gradient MCMC with Stale Gradients

Stochastic gradient MCMC (SG-MCMC) has played an important role in large-scale Bayesian learning, with well-developed theoretical convergence properties. In such applications of SG-MCMC, it is becoming increasingly popular to employ distributed systems, where stochastic gradients are computed based on some outdated parameters, yielding what are termed stale gradients. While stale gradients could be directly used in SG-MCMC, their impact on convergence properties has not been well studied. In this paper we develop theory to show that while the bias and MSE of an SG-MCMC algorithm depend on the staleness of stochastic gradients, its estimation variance (relative to the expected estimate, based on a prescribed number of samples) is independent of it. In a simple Bayesian distributed system with SG-MCMC, where stale gradients are computed asynchronously by a set of workers, our theory indicates a linear speedup on the decrease of estimation variance w.r.t. the number of workers. Experiments on synthetic data and deep neural networks validate our theory, demonstrating the effectiveness and scalability of SG-MCMC with stale gradients.

preprint 2016 arXiv

Understanding Image and Text Simultaneously: a Dual Vision-Language Machine Comprehension Task

We introduce a new multi-modal task for computer systems, posed as a combined vision-language comprehension challenge: identifying the most suitable text describing a scene, given several similar options. Accomplishing the task entails demonstrating comprehension beyond just recognizing "keywords" (or key-phrases) and their corresponding visual concepts. Instead, it requires an alignment between the representations of the two modalities that achieves a visually-grounded "understanding" of various linguistic elements and their dependencies. This new task also admits an easy-to-compute and well-studied metric: the accuracy in detecting the true target among the decoys. The paper makes several contributions: an effective and extensible mechanism for generating decoys from (human-created) image captions; an instance of applying this mechanism, yielding a large-scale machine comprehension dataset (based on the COCO images and captions) that we make publicly available; human evaluation results on this dataset, informing a performance upper-bound; and several baseline and competitive learning approaches that illustrate the utility of the proposed task and dataset in advancing both image and language comprehension. We also show that, in a multi-task learning setting, the performance on the proposed task is positively correlated with the end-to-end task of image captioning.

preprint 2016 arXiv

What is the Computational Value of Finite Range Tunneling?

Quantum annealing (QA) has been proposed as a quantum enhanced optimization heuristic exploiting tunneling. Here, we demonstrate how finite range tunneling can provide considerable computational advantage. For a crafted problem designed to have tall and narrow energy barriers separating local minima, the D-Wave 2X quantum annealer achieves significant runtime advantages relative to Simulated Annealing (SA). For instances with 945 variables, this results in a time-to-99%-success-probability that is $\sim 10^8$ times faster than SA running on a single processor core. We also compared physical QA with Quantum Monte Carlo (QMC), an algorithm that emulates quantum tunneling on classical processors. We observe a substantial constant overhead against physical QA: D-Wave 2X again runs up to $\sim 10^8$ times faster than an optimized implementation of QMC on a single core. We note that there exist heuristic classical algorithms that can solve most instances of Chimera structured problems in a timescale comparable to the D-Wave 2X. However, we believe that such solvers will become ineffective for the next generation of annealers currently being designed. To investigate whether finite range tunneling will also confer an advantage for problems of practical interest, we conduct numerical studies on binary optimization problems that cannot yet be represented on quantum hardware. For random instances of the number partitioning problem, we find numerically that QMC, as well as other algorithms designed to simulate QA, scale better than SA. We discuss the implications of these findings for the design of next generation quantum annealers.
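The "time-to-99%-success-probability" figure of merit mentioned in this abstract is commonly computed by asking how many independent runs are needed to see at least one success with 99% probability. The Python sketch below uses that standard definition; whether the paper applies exactly this form is an assumption, and the function names, run time, and success probabilities are hypothetical values chosen for illustration.

```python
import math


def repetitions_for_success(p_single, target=0.99):
    """Independent runs needed so that at least one succeeds with
    probability `target`, given per-run success probability p_single."""
    if not 0.0 < p_single < 1.0:
        raise ValueError("p_single must lie strictly between 0 and 1")
    return math.log(1.0 - target) / math.log(1.0 - p_single)


def time_to_success(run_time, p_single, target=0.99):
    """Total time: duration of a single run times the required repetitions."""
    return run_time * repetitions_for_success(p_single, target)


# Hypothetical numbers: a 20-microsecond run that succeeds half of the time.
print(time_to_success(run_time=20e-6, p_single=0.5))
```

The repetition count may be fractional; some treatments round it up or clamp it to at least one run, which this sketch leaves out.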

preprint 2015 arXiv

Probabilistic Label Relation Graphs with Ising Models

We consider classification problems in which the label space has structure. A common example is hierarchical label spaces, corresponding to the case where one label subsumes another (e.g., animal subsumes dog). But labels can also be mutually exclusive (e.g., dog vs cat) or unrelated (e.g., furry, carnivore). To jointly model hierarchy and exclusion relations, the notion of a HEX (hierarchy and exclusion) graph was introduced in [7]. This combined a conditional random field (CRF) with a deep neural network (DNN), resulting in state of the art results when applied to visual object classification problems where the training labels were drawn from different levels of the ImageNet hierarchy (e.g., an image might be labeled with the basic level category "dog", rather than the more specific label "husky"). In this paper, we extend the HEX model to allow for soft or probabilistic relations between labels, which is useful when there is uncertainty about the relationship between two labels (e.g., an antelope is "sort of" furry, but not to the same degree as a grizzly bear). We call our new model pHEX, for probabilistic HEX. We show that the pHEX graph can be converted to an Ising model, which allows us to use existing off-the-shelf inference methods (in contrast to the HEX method, which needed specialized inference algorithms). Experimental results show significant improvements in a number of large-scale visual object classification tasks, outperforming the previous HEX model.

preprint 2015 arXiv

Totally Corrective Boosting with Cardinality Penalization

We propose a totally corrective boosting algorithm with explicit cardinality regularization. The resulting combinatorial optimization problems are not known to be efficiently solvable with existing classical methods, but emerging quantum optimization technology gives hope for achieving sparser models in practice. In order to demonstrate the utility of our algorithm, we use a distributed classical heuristic optimizer as a stand-in for quantum hardware. Even though this evaluation methodology incurs large time and resource costs on classical computing machinery, it allows us to gauge the potential gains in generalization performance and sparsity of the resulting boosted ensembles. Our experimental results on public data sets commonly used for benchmarking of boosting algorithms decidedly demonstrate the existence of such advantages. If actual quantum optimization were to be used with this algorithm in the future, we would expect equivalent or superior results at much smaller time and energy costs during training. Moreover, studying cardinality-penalized boosting also sheds light on why unregularized boosting algorithms with early stopping often yield better results than their counterparts with explicit convex regularization: Early stopping performs suboptimal cardinality regularization. The results that we present here indicate it is beneficial to explicitly solve the combinatorial problem still left open at early termination.

preprint 2014 arXiv

Construction of non-convex polynomial loss functions for training a binary classifier with quantum annealing

Quantum annealing is a heuristic quantum algorithm which exploits quantum resources to minimize an objective function embedded as the energy levels of a programmable physical system. To take advantage of a potential quantum advantage, one needs to be able to map the problem of interest to the native hardware with reasonably low overhead. Because experimental considerations constrain our objective function to take the form of a low degree PUBO (polynomial unconstrained binary optimization), we employ non-convex loss functions which are polynomial functions of the margin. We show that these loss functions are robust to label noise and provide a clear advantage over convex methods. These loss functions may also be useful for classical approaches as they compile to regularized risk expressions which can be evaluated in constant time with respect to the number of training examples.

preprint 2013 arXiv

Warburg Effect due to Exposure to Different Types of Radiation

Cancer cells maintain a high level of aerobic glycolysis (the Warburg effect), which is associated with their rapid proliferation. Many studies have reported that the suppression of glycolysis and activation of oxidative phosphorylation can repress the growth of cancer cells through regulation of key regulators. Can the Warburg effect of cancer cells be switched by some other environmental stimulus? Herein, we report an interesting phenomenon in which cells alternated between glycolysis and mitochondrial respiration depending on the type of radiation they were exposed to. We observed enhanced glycolysis and mitochondrial respiration in HeLa cells exposed to 2-Gy X-ray and 2-Gy carbon ion radiation, respectively. This discovery may provide novel insights for tumor therapy.

preprint 2012 arXiv

Dependent Hierarchical Normalized Random Measures for Dynamic Topic Modeling

We develop dependent hierarchical normalized random measures and apply them to dynamic topic modeling. The dependency arises via superposition, subsampling and point transition on the underlying Poisson processes of these measures. The measures used include normalised generalised Gamma processes that demonstrate power law properties, unlike Dirichlet processes used previously in dynamic topic modeling. Inference for the model includes adapting a recently developed slice sampler to directly manipulate the underlying Poisson process. Experiments performed on news, blogs, academic and Twitter collections demonstrate the technique gives superior perplexity over a number of previous models.

preprint 2012 arXiv

Robust Classification with Adiabatic Quantum Optimization

We propose a non-convex training objective for robust binary classification of data sets in which label noise is present. The design is guided by the intention of solving the resulting problem by adiabatic quantum optimization. Two requirements are imposed by the engineering constraints of existing quantum hardware: training problems are formulated as quadratic unconstrained binary optimization; and model parameters are represented as binary expansions of low bit-depth. In the present work we validate this approach by using a heuristic classical solver as a stand-in for quantum hardware. Testing on several popular data sets and comparing with a number of existing losses we find substantial advantages in robustness as measured by test error under increasing label noise. Robustness is enabled by the non-convexity of our hardware-compatible loss function, which we name q-loss.

preprint 2012 arXiv

Theory of Dependent Hierarchical Normalized Random Measures

This paper presents theory for Normalized Random Measures (NRMs), Normalized Generalized Gammas (NGGs), a particular kind of NRM, and Dependent Hierarchical NRMs which allow networks of dependent NRMs to be analysed. These have been used, for instance, for time-dependent topic modelling. In this paper, we first introduce some mathematical background of completely random measures (CRMs) and their construction from Poisson processes, and then introduce NRMs and NGGs. Slice sampling is also introduced for posterior inference. The dependency operators in Poisson processes and for the corresponding CRMs and NRMs are then introduced, and posterior inference for the NGG is presented. Finally, we give dependency and composition results when applying these operators to NRMs so they can be used in a network with hierarchical and dependent relations.
