Source author record

Kenji Kawaguchi

Kenji Kawaguchi appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning, Artificial Intelligence, astro-ph.HE, math.OC, Computer Vision, Neural and Evolutionary Computing, astro-ph.GA, astro-ph.SR, astro-ph.EP, Computer Science and Game Theory, math.PR

Catalog footprint

What is connected

29 works

11 topics

4 close collaborators


preprint · 2024 · arXiv

Simple Hierarchical Planning with Diffusion

Diffusion-based generative methods have proven effective in modeling trajectories with offline datasets. However, they often face computational challenges and can falter in generalization, especially in capturing temporal abstractions for long-horizon tasks. To overcome this, we introduce the Hierarchical Diffuser, a simple, fast, yet surprisingly effective planning method combining the advantages of hierarchical and diffusion-based planning. Our model adopts a "jumpy" planning strategy at the higher level, which allows it to have a larger receptive field but at a lower computational cost -- a crucial factor for diffusion-based planning methods, as we have empirically verified. Additionally, the jumpy sub-goals guide our low-level planner, facilitating a fine-tuning stage and further improving our approach's effectiveness. We conducted empirical evaluations on standard offline reinforcement learning benchmarks, demonstrating our method's superior performance and efficiency in terms of training and planning speed compared to the non-hierarchical Diffuser as well as other hierarchical planning methods. Moreover, we explore our model's generalization capability, particularly on how our method improves generalization capabilities on compositional out-of-distribution tasks.
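The "jumpy" high-level plan can be pictured with a minimal sketch, assuming the sub-goals are simply every k-th state of a stored trajectory (plus the final state) and that the low-level planner only has to fill in the dense segment between consecutive sub-goals. The function names and the fixed stride k are illustrative assumptions, and no diffusion model appears here; the sketch only shows how a larger stride shortens the sequence the high-level planner must generate.

```python
from typing import List, Sequence, Tuple

State = Tuple[float, ...]


def jumpy_subgoals(trajectory: Sequence[State], k: int) -> List[State]:
    """Keep every k-th state, plus the final state, as high-level sub-goals."""
    if k < 1:
        raise ValueError("k must be >= 1")
    if not trajectory:
        return []
    goals = list(trajectory[::k])
    # Make sure the plan always ends at the last state of the trajectory.
    if (len(trajectory) - 1) % k != 0:
        goals.append(trajectory[-1])
    return goals


def low_level_segments(trajectory: Sequence[State], k: int) -> List[List[State]]:
    """Dense segments connecting consecutive sub-goals (endpoints included)."""
    if k < 1:
        raise ValueError("k must be >= 1")
    segments = []
    for start in range(0, len(trajectory) - 1, k):
        end = min(start + k, len(trajectory) - 1)
        segments.append(list(trajectory[start:end + 1]))
    return segments


# Hypothetical 1-D trajectory of 8 states with stride 3.
traj = [(float(t),) for t in range(8)]
print(jumpy_subgoals(traj, 3))           # [(0.0,), (3.0,), (6.0,), (7.0,)]
print(len(low_level_segments(traj, 3)))  # 3
```

Each segment starts and ends on a sub-goal, so a low-level planner conditioned on two consecutive sub-goals covers the whole trajectory.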

preprint · 2023 · arXiv

Clustering Aware Classification for Risk Prediction and Subtyping in Clinical Data

In data containing heterogeneous subpopulations, classification performance benefits from incorporating the knowledge of cluster structure in the classifier. Previous methods for such combined clustering and classification either 1) are classifier-specific and not generic, or 2) independently perform clustering and classifier training, which may not form clusters that can potentially benefit classifier performance. The question of how to perform clustering to improve the performance of classifiers trained on the clusters has received scant attention in previous literature, despite its importance in several real-world applications. In this paper, first, we theoretically analyze the generalization performance of classifiers trained on clustered data and find conditions under which clustering can potentially aid classification. This motivates the design of a simple k-means-based classification algorithm called Clustering Aware Classification (CAC) and its neural variant DeepCAC. DeepCAC effectively leverages deep representation learning to learn latent embeddings and finds clusters in a manner that makes the clustered data suitable for training classifiers for each underlying subpopulation. Our experiments on synthetic and real benchmark datasets demonstrate the efficacy of DeepCAC over previous methods for combined clustering and classification.

preprint · 2022 · arXiv

Adaptive Discrete Communication Bottlenecks with Dynamic Vector Quantization

Vector Quantization (VQ) is a method for discretizing latent representations and has become a major part of the deep learning toolkit. It has been theoretically and empirically shown that discretization of representations leads to improved generalization, including in reinforcement learning where discretization can be used to bottleneck multi-agent communication to promote agent specialization and robustness. The discretization tightness of most VQ-based methods is defined by the number of discrete codes in the representation vector and the codebook size, which are fixed as hyperparameters. In this work, we propose learning to dynamically select discretization tightness conditioned on inputs, based on the hypothesis that data naturally contains variations in complexity that call for different levels of representational coarseness. We show that dynamically varying tightness in communication bottlenecks can improve model performance on visual reasoning and reinforcement learning tasks.
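To make "input-conditioned tightness" concrete, here is a toy sketch: a vector is quantized with the coarsest codebook whose squared error is within a tolerance, falling back to the finest codebook otherwise. The tolerance rule is a hand-written stand-in for the learned selection described in the abstract, only the codebook size varies (not the number of codes per vector), and all names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def nearest_code(v: Vector, codebook: Sequence[Vector]) -> Tuple[int, float]:
    """Index of the closest code vector and its squared quantization error."""
    errs = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in codebook]
    best = min(range(len(codebook)), key=errs.__getitem__)
    return best, errs[best]


def adaptive_quantize(
    v: Vector, codebooks: Sequence[Sequence[Vector]], tol: float
) -> Tuple[int, int, List[float]]:
    """Return (level, code index, quantized vector) from the coarsest adequate codebook.

    `codebooks` is ordered from coarse (few codes) to fine (many codes).
    The threshold `tol` is a hypothetical stand-in for a learned selector.
    """
    for level, book in enumerate(codebooks):
        idx, err = nearest_code(v, book)
        if err <= tol:
            return level, idx, list(book[idx])
    # No codebook met the tolerance: fall back to the finest one.
    level = len(codebooks) - 1
    idx, _ = nearest_code(v, codebooks[level])
    return level, idx, list(codebooks[level][idx])


coarse = [(0.0, 0.0)]
fine = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0)]
print(adaptive_quantize((0.1, 0.0), [coarse, fine], tol=0.05))  # (0, 0, [0.0, 0.0])
print(adaptive_quantize((0.9, 1.0), [coarse, fine], tol=0.05))  # (1, 1, [1.0, 1.0])
```

Simple inputs stay in the tight (coarse) bottleneck, while inputs the coarse codebook represents poorly get a finer one.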

preprint · 2022 · arXiv

ExpertNet: A Symbiosis of Classification and Clustering

A widely used paradigm to improve the generalization performance of high-capacity neural models is through the addition of auxiliary unsupervised tasks during supervised training. Tasks such as similarity matching and input reconstruction have been shown to provide a beneficial regularizing effect by guiding representation learning. Real data often has complex underlying structures and may be composed of heterogeneous subpopulations that are not learned well with current approaches. In this work, we design ExpertNet, which uses novel training strategies to learn clustered latent representations and leverage them by effectively combining cluster-specific classifiers. We theoretically analyze the effect of clustering on its generalization gap, and empirically show that clustered latent representations from ExpertNet lead to disentangling the intrinsic structure and improvement in classification performance. ExpertNet also meets an important real-world need where classifiers need to be tailored for distinct subpopulations, such as in clinical risk models. We demonstrate the superiority of ExpertNet over state-of-the-art methods on 6 large clinical datasets, where our approach leads to valuable insights on group-specific risks.

preprint · 2022 · arXiv

MemStream: Memory-Based Streaming Anomaly Detection

Given a stream of entries over time in a multi-dimensional data setting where concept drift is present, how can we detect anomalous activities? Most of the existing unsupervised anomaly detection approaches seek to detect anomalous events in an offline fashion and require a large amount of data for training. This is not practical in real-life scenarios where we receive the data in a streaming manner and do not know the size of the stream beforehand. Thus, we need a data-efficient method that can detect and adapt to changing data trends, or concept drift, in an online manner. In this work, we propose MemStream, a streaming anomaly detection framework, allowing us to detect unusual events as they occur while being resilient to concept drift. We leverage the power of a denoising autoencoder to learn representations and a memory module to learn the dynamically changing trend in data without the need for labels. We prove the optimum memory size required for effective drift handling. Furthermore, MemStream makes use of two architecture design choices to be robust to memory poisoning. Experimental results show the effectiveness of our approach compared to state-of-the-art streaming baselines using $2$ synthetic datasets and $11$ real-world datasets.
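A minimal sketch of the memory idea, under loudly stated assumptions: the denoising autoencoder is replaced by an identity encoder, the anomaly score is the L1 distance to the nearest memory entry, and a hypothetical threshold `beta` decides whether a point is written back into a fixed-size FIFO memory. It illustrates the update rule only; the optimal memory size and the two anti-poisoning design choices from the abstract are not modeled.

```python
from collections import deque
from typing import Deque, Sequence, Tuple

Vector = Tuple[float, ...]


class MemoryScorer:
    """Toy memory-based streaming anomaly scorer (identity encoder: an assumption)."""

    def __init__(self, warmup: Sequence[Vector], beta: float):
        if not warmup:
            raise ValueError("warm-up data must be non-empty")
        # Fixed-size memory of "normal" encodings; maxlen gives FIFO replacement.
        self.memory: Deque[Vector] = deque(warmup, maxlen=len(warmup))
        self.beta = beta

    @staticmethod
    def _l1(a: Vector, b: Vector) -> float:
        return sum(abs(x - y) for x, y in zip(a, b))

    def score(self, x: Vector) -> float:
        # Anomaly score: L1 distance to the closest stored encoding.
        return min(self._l1(x, m) for m in self.memory)

    def process(self, x: Vector) -> float:
        s = self.score(x)
        # Only points that look normal are written back, which lets the memory
        # follow gradual drift without absorbing obvious outliers.
        if s <= self.beta:
            self.memory.append(x)  # the oldest entry is evicted automatically
        return s


scorer = MemoryScorer([(0.0, 0.0), (1.0, 0.0)], beta=0.5)
print(scorer.process((0.2, 0.0)))    # small score, stored in memory
print(scorer.process((10.0, 10.0)))  # large score, not stored
```

No labels are used at any point: the memory is initialized from a warm-up slice of the stream and then updated only from its own scores.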

preprint · 2022 · arXiv

Multi-Task Learning as a Bargaining Game

In multi-task learning (MTL), a joint model is trained to simultaneously make predictions for several tasks. Joint training reduces computation costs and improves data efficiency; however, since the gradients of these different tasks may conflict, training a joint model for MTL often yields lower performance than its corresponding single-task counterparts. A common method for alleviating this issue is to combine per-task gradients into a joint update direction using a particular heuristic. In this paper, we propose viewing the gradient combination step as a bargaining game, where tasks negotiate to reach an agreement on a joint direction of parameter update. Under certain assumptions, the bargaining problem has a unique solution, known as the Nash Bargaining Solution, which we propose to use as a principled approach to multi-task learning. We describe a new MTL optimization procedure, Nash-MTL, and derive theoretical guarantees for its convergence. Empirically, we show that Nash-MTL achieves state-of-the-art results on multiple MTL benchmarks in various domains.
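The bargaining step can be illustrated with a small sketch. It assumes the characterization of the Nash bargaining solution used for Nash-MTL: the joint update is the sum of alpha_i * g_i, where the weights alpha > 0 solve (G^T G) alpha = 1/alpha elementwise. Rather than the authors' solver, this swaps in plain coordinate descent on the strictly convex function 0.5 * alpha^T (G^T G) alpha - sum(log alpha_i), whose stationary point is exactly that equation; each coordinate step is the positive root of a quadratic. Function names are hypothetical.

```python
import math
from typing import List, Sequence


def nash_weights(grads: Sequence[Sequence[float]], iters: int = 200) -> List[float]:
    """Solve (G^T G) alpha = 1/alpha for alpha > 0 by exact coordinate descent."""
    k = len(grads)
    # Gram matrix of the per-task gradients.
    gram = [[sum(a * b for a, b in zip(gi, gj)) for gj in grads] for gi in grads]
    if any(gram[i][i] <= 0.0 for i in range(k)):
        raise ValueError("every task gradient must be non-zero")
    alpha = [1.0] * k
    for _ in range(iters):
        for i in range(k):
            # Interaction with the other tasks at their current weights.
            c = sum(gram[i][j] * alpha[j] for j in range(k) if j != i)
            m = gram[i][i]
            # Positive root of m*a^2 + c*a - 1 = 0 minimises the objective in alpha_i.
            alpha[i] = (-c + math.sqrt(c * c + 4.0 * m)) / (2.0 * m)
    return alpha


def joint_update(grads: Sequence[Sequence[float]], alpha: Sequence[float]) -> List[float]:
    """Weighted sum of task gradients: the agreed-upon update direction."""
    dim = len(grads[0])
    return [sum(a * g[d] for a, g in zip(alpha, grads)) for d in range(dim)]


g = [[1.0, 0.0], [1.0, 1.0]]  # two partially aligned task gradients
alpha = nash_weights(g)
print(alpha, joint_update(g, alpha))
```

At the solution each task's directional derivative along the update equals 1/alpha_i, so a task with a small gradient receives a larger weight rather than being drowned out.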

preprint · 2022 · arXiv

Robustness Implies Generalization via Data-Dependent Generalization Bounds

This paper proves that robustness implies generalization via data-dependent generalization bounds. As a result, robustness and generalization are shown to be connected closely in a data-dependent manner. Our bounds improve previous bounds in two directions, to solve an open problem that has seen little development since 2010. The first is to reduce the dependence on the covering number. The second is to remove the dependence on the hypothesis space. We present several examples, including ones for lasso and deep learning, in which our bounds are provably preferable. The experiments on real-world data and theoretical models demonstrate near-exponential improvements in various situations. To achieve these improvements, we do not require additional assumptions on the unknown distribution; instead, we only incorporate an observable and computable property of the training samples. A key technical innovation is an improved concentration bound for multinomial random variables that is of independent interest beyond robustness and generalization.

preprint · 2022 · arXiv

Training Free Graph Neural Networks for Graph Matching

We present a framework of Training Free Graph Matching (TFGM) to boost the performance of Graph Neural Networks (GNNs) based graph matching, providing a fast promising solution without training (training-free). TFGM provides four widely applicable principles for designing training-free GNNs and is generalizable to supervised, semi-supervised, and unsupervised graph matching. The keys are to handcraft the matching priors, which used to be learned by training, into GNN's architecture and discard the components inessential under the training-free setting. Further analysis shows that TFGM is a linear relaxation to the quadratic assignment formulation of graph matching and generalizes TFGM to a broad set of GNNs. Extensive experiments show that GNNs with TFGM achieve comparable (if not better) performances to their fully trained counterparts, and demonstrate TFGM's superiority in the unsupervised setting. Our code is available at https://github.com/acharkq/Training-Free-Graph-Matching.

preprint · 2022 · arXiv

Understanding Dynamics of Nonlinear Representation Learning and Its Application

Representations of the world environment play a crucial role in artificial intelligence. It is often inefficient to conduct reasoning and inference directly in the space of raw sensory representations, such as pixel values of images. Representation learning allows us to automatically discover suitable representations from raw sensory data. For example, given raw sensory data, a deep neural network learns nonlinear representations at its hidden layers, which are subsequently used for classification (or regression) at its output layer. This happens implicitly during training through minimizing a supervised or unsupervised loss in common practical regimes of deep learning, unlike the neural tangent kernel (NTK) regime. In this paper, we study the dynamics of such implicit nonlinear representation learning, which is beyond the NTK regime. We identify a pair of a new assumption and a novel condition, called the common model structure assumption and the data-architecture alignment condition. Under the common model structure assumption, the data-architecture alignment condition is shown to be sufficient for the global convergence and necessary for the global optimality. Moreover, our theory explains how and when increasing the network size does and does not improve the training behaviors in the practical regime. Our results provide practical guidance for designing a model structure: e.g., the common model structure assumption can be used as a justification for using a particular model structure instead of others. We also derive a new training framework based on the theory. The proposed framework is empirically shown to maintain competitive (practical) test performances while providing global convergence guarantees for deep residual neural networks with convolutions, skip connections, and batch normalization with standard benchmark datasets, including CIFAR-10, CIFAR-100, and SVHN.

preprint · 2022 · arXiv

When and How Mixup Improves Calibration

In many machine learning applications, it is important for the model to provide confidence scores that accurately capture its prediction uncertainty. Although modern learning methods have achieved great success in predictive accuracy, generating calibrated confidence scores remains a major challenge. Mixup, a popular yet simple data augmentation technique based on taking convex combinations of pairs of training examples, has been empirically found to significantly improve confidence calibration across diverse applications. However, when and how Mixup helps calibration is still a mystery. In this paper, we theoretically prove that Mixup improves calibration in high-dimensional settings by investigating natural statistical models. Interestingly, the calibration benefit of Mixup increases as the model capacity increases. We support our theories with experiments on common architectures and datasets. In addition, we study how Mixup improves calibration in semi-supervised learning. While incorporating unlabeled data can sometimes make the model less calibrated, adding Mixup training mitigates this issue and provably improves calibration. Our analysis provides new insights and a framework to understand Mixup and calibration.
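The convex-combination step that Mixup applies to training pairs can be sketched as follows, using the common recipe of drawing the mixing weight from a symmetric Beta(alpha, alpha) distribution and mixing each example with a randomly permuted partner. The value of alpha and the function names are assumptions for illustration; the calibration analysis itself is not reproduced here.

```python
import random
from typing import List, Optional, Sequence, Tuple

Example = Tuple[List[float], List[float]]


def mixup_pair(x1: Sequence[float], y1: Sequence[float],
               x2: Sequence[float], y2: Sequence[float], lam: float) -> Example:
    """Convex combination of two inputs and of their one-hot labels."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y


def mixup_batch(xs, ys, alpha: float = 1.0,
                rng: Optional[random.Random] = None) -> List[Example]:
    """Mix each example with a randomly permuted partner, lam ~ Beta(alpha, alpha)."""
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)
    partner = list(range(len(xs)))
    rng.shuffle(partner)
    return [mixup_pair(xs[i], ys[i], xs[j], ys[j], lam) for i, j in enumerate(partner)]


print(mixup_pair([0.0, 2.0], [1.0, 0.0], [4.0, 6.0], [0.0, 1.0], lam=0.25))
# ([3.0, 5.0], [0.25, 0.75])
```

Because the labels are mixed with the same weight as the inputs, the training targets become soft probabilities, which is the mechanism usually credited with discouraging over-confident predictions.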

preprint · 2021 · arXiv

Meta-learning PINN loss functions

We propose a meta-learning technique for offline discovery of physics-informed neural network (PINN) loss functions. We extend earlier works on meta-learning, and develop a gradient-based meta-learning algorithm for addressing diverse task distributions based on parametrized partial differential equations (PDEs) that are solved with PINNs. Furthermore, based on new theory we identify two desirable properties of meta-learned losses in PINN problems, which we enforce by proposing a new regularization method or using a specific parametrization of the loss function. In the computational examples, the meta-learned losses are employed at test time for addressing regression and PDE task distributions. Our results indicate that significant performance improvement can be achieved by using a shared-among-tasks offline-learned loss function even for out-of-distribution meta-testing. In this case, we solve for test tasks that do not belong to the task distribution used in meta-training, and we also employ PINN architectures that are different from the PINN architecture used in meta-training. To better understand the capabilities and limitations of the proposed method, we consider various parametrizations of the loss function and describe different algorithm design options and how they may affect meta-learning performance.

preprint · 2021 · arXiv

On the Theory of Implicit Deep Learning: Global Convergence with Implicit Layers

A deep equilibrium model uses implicit layers, which are implicitly defined through an equilibrium point of an infinite sequence of computation. It avoids any explicit computation of the infinite sequence by finding an equilibrium point directly via root-finding and by computing gradients via implicit differentiation. In this paper, we analyze the gradient dynamics of deep equilibrium models with nonlinearity only on weight matrices and non-convex objective functions of weights for regression and classification. Despite non-convexity, convergence to global optimum at a linear rate is guaranteed without any assumption on the width of the models, allowing the width to be smaller than the output dimension and the number of data points. Moreover, we prove a relation between the gradient dynamics of the deep implicit layer and the dynamics of trust region Newton method of a shallow explicit layer. This mathematically proven relation along with our numerical observation suggests the importance of understanding implicit bias of implicit layers and an open problem on the topic. Our proofs deal with implicit layers, weight tying and nonlinearity on weights, and differ from those in the related literature.
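The forward pass of an equilibrium layer can be sketched as finding a fixed point of a weight-tied map instead of unrolling an infinite stack of layers. This minimal sketch assumes the map z = gamma * sigma(A) z + U x, with sigma chosen, purely for illustration, as a row-wise softmax so that gamma < 1 makes the map a contraction and the equilibrium is unique. Plain fixed-point iteration stands in for the root-finders used in practice, implicit differentiation for the backward pass is not shown, and the names and the softmax choice are assumptions rather than the paper's exact model.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def row_softmax(a: Matrix) -> List[List[float]]:
    """Row-wise softmax: every row becomes non-negative and sums to 1."""
    out = []
    for row in a:
        top = max(row)
        e = [math.exp(v - top) for v in row]
        total = sum(e)
        out.append([v / total for v in e])
    return out


def matvec(m: Matrix, v: Sequence[float]) -> List[float]:
    return [sum(r * x for r, x in zip(row, v)) for row in m]


def equilibrium(a: Matrix, u: Matrix, x: Sequence[float], gamma: float = 0.5,
                tol: float = 1e-10, max_iter: int = 1000) -> List[float]:
    """Solve z = gamma * softmax(A) @ z + U @ x by fixed-point iteration."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1) so the map is a contraction")
    s = row_softmax(a)  # the nonlinearity acts on the weights, not on z
    b = matvec(u, x)    # input injection, computed once
    z = [0.0] * len(b)
    for _ in range(max_iter):
        z_new = [gamma * sz + bz for sz, bz in zip(matvec(s, z), b)]
        if max(abs(p - q) for p, q in zip(z_new, z)) < tol:
            return z_new
        z = z_new
    raise RuntimeError("fixed-point iteration did not converge")


print(equilibrium([[0.0, 1.0], [2.0, -1.0]], [[1.0, 0.0], [0.0, 1.0]], [1.0, -2.0]))
```

Since the nonlinearity sits on A rather than on z, the equilibrium is linear in z, which is the setting in which the abstract's global-convergence result is stated.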

preprint · 2020 · arXiv

Elimination of All Bad Local Minima in Deep Learning

In this paper, we theoretically prove that adding one special neuron per output unit eliminates all suboptimal local minima of any deep neural network, for multi-class classification, binary classification, and regression with an arbitrary loss function, under practical assumptions. At every local minimum of any deep neural network with these added neurons, the set of parameters of the original neural network (without added neurons) is guaranteed to be a global minimum of the original neural network. The effects of the added neurons are proven to automatically vanish at every local minimum. Moreover, we provide a novel theoretical characterization of a failure mode of eliminating suboptimal local minima via an additional theorem and several examples. This paper also introduces a novel proof technique based on the perturbable gradient basis (PGB) necessary condition of local minima, which provides new insight into the elimination of local minima and is applicable to analyze various models and transformations of objective functions beyond the elimination of local minima.

preprint · 2020 · arXiv

Evidence for planetary hypothesis for PTFO 8-8695b with five-year optical/infrared monitoring observations

PTFO 8-8695b (CVSO 30b) is a young planet candidate whose host star is a $\sim$2.6 Myr-old T-Tauri star, and there have been continuous discussions about the nature of this system. To unveil the mystery of this system, we observed PTFO 8-8695 for around five years at optical and infrared bands simultaneously using the Kanata telescope at the Higashi-Hiroshima Observatory. Through our observations, we found that the reported fading event split into two: deeper but phase-shifted "dip-A" and shallower but equiphase "dip-B". These dips disappeared at different epochs, and then dip-B reappeared. Based on the observed wavelength dependence of dip depths, a dust clump and a precessing planet are likely origins of dip-A and dip-B, respectively. Here we propose "a precessing planet associated with a dust cloud" scenario for this system. This scenario is consistent with the reported change in the depth of fading events, and even with reported results that were thought to be evidence against the planetary hypothesis, such as the past non-detection of the Rossiter-McLaughlin effect. If this scenario is correct, this is the third case of a young (<3 Myr) planet around a pre-main sequence star. This finding implies that a planet can be formed within a few Myr.

preprint · 2020 · arXiv

Gradient Descent Finds Global Minima for Generalizable Deep Neural Networks of Practical Sizes

In this paper, we theoretically prove that gradient descent can find a global minimum of non-convex optimization of all layers for nonlinear deep neural networks of sizes commonly encountered in practice. The theory developed in this paper only requires the practical degrees of over-parameterization unlike previous theories. Our theory only requires the number of trainable parameters to increase linearly as the number of training samples increases. This allows the size of the deep neural networks to be consistent with practice and to be several orders of magnitude smaller than that required by the previous theories. Moreover, we prove that the linear increase of the size of the network is the optimal rate and that it cannot be improved, except by a logarithmic factor. Furthermore, deep neural networks with the trainability guarantee are shown to generalize well to unseen test samples with a natural dataset but not a random dataset.

preprint · 2020 · arXiv

Ordered SGD: A New Stochastic Optimization Framework for Empirical Risk Minimization

We propose a new stochastic optimization framework for empirical risk minimization problems such as those that arise in machine learning. The traditional approaches, such as (mini-batch) stochastic gradient descent (SGD), utilize an unbiased gradient estimator of the empirical average loss. In contrast, we develop a computationally efficient method to construct a gradient estimator that is purposely biased toward those observations with higher current losses. On the theory side, we show that the proposed method minimizes a new ordered modification of the empirical average loss, and is guaranteed to converge at a sublinear rate to a global optimum for convex loss and to a critical point for weakly convex (non-convex) loss. Furthermore, we prove a new generalization bound for the proposed algorithm. On the empirical side, the numerical experiments show that our proposed method consistently improves the test errors compared with the standard mini-batch SGD in various models including SVM, logistic regression, and deep learning problems.
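The purposely biased estimator can be illustrated on a one-parameter linear model with squared loss: rank the mini-batch by current loss and average the gradient over only the q highest-loss samples. This is a minimal sketch under those assumptions (model, loss, q, learning rate and function names are all hypothetical choices), not the authors' implementation.

```python
import random
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (x, y)


def ordered_sgd_step(w: float, batch: Sequence[Sample], q: int, lr: float) -> float:
    """One Ordered-SGD-style update for the model y ~ w * x with squared loss."""
    if not 1 <= q <= len(batch):
        raise ValueError("need 1 <= q <= batch size")
    # Rank the mini-batch by current loss, largest first, and keep the top q.
    top = sorted(batch, key=lambda s: (w * s[0] - s[1]) ** 2, reverse=True)[:q]
    # Average gradient of 0.5 * (w*x - y)^2 over the q highest-loss samples only,
    # which deliberately biases the estimator toward hard examples.
    grad = sum((w * x - y) * x for x, y in top) / q
    return w - lr * grad


def train(data: List[Sample], q: int, batch_size: int = 4, lr: float = 0.1,
          steps: int = 200, seed: int = 0) -> float:
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        w = ordered_sgd_step(w, rng.sample(data, batch_size), q, lr)
    return w


data = [(t / 10, 2 * t / 10) for t in range(1, 11)]  # synthetic, noiseless y = 2x
print(train(data, q=2))  # should approach 2.0
```

With q equal to the batch size the step reduces to ordinary mini-batch SGD; smaller q shifts the update toward the currently worst-fit samples.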

preprint · 2016 · arXiv

Bayesian Optimization with Exponential Convergence

This paper presents a Bayesian optimization method with exponential convergence without the need for auxiliary optimization and without delta-cover sampling. Most Bayesian optimization methods require auxiliary optimization: an additional non-convex global optimization problem, which can be time-consuming and hard to implement in practice. Also, the existing Bayesian optimization method with exponential convergence requires access to delta-cover sampling, which was considered to be impractical. Our approach eliminates both requirements and achieves an exponential convergence rate.

preprint · 2016 · arXiv

Bounded Optimal Exploration in MDP

Within the framework of probably approximately correct Markov decision processes (PAC-MDP), much theoretical work has focused on methods to attain near optimality after a relatively long period of learning and exploration. However, practical concerns require the attainment of satisfactory behavior within a short period of time. In this paper, we relax the PAC-MDP conditions to reconcile theoretically driven exploration methods and practical needs. We propose simple algorithms for discrete and continuous state spaces, and illustrate the benefits of our proposed relaxation via theoretical analyses and numerical examples. Our algorithms also maintain anytime error bounds and average loss bounds. Our approach accommodates both Bayesian and non-Bayesian methods.

preprint · 2016 · arXiv

Deep Learning without Poor Local Minima

In this paper, we prove a conjecture published in 1989 and also partially address an open problem announced at the Conference on Learning Theory (COLT) 2015. With no unrealistic assumption, we first prove the following statements for the squared loss function of deep linear neural networks with any depth and any widths: 1) the function is non-convex and non-concave, 2) every local minimum is a global minimum, 3) every critical point that is not a global minimum is a saddle point, and 4) there exist "bad" saddle points (where the Hessian has no negative eigenvalue) for the deeper networks (with more than three layers), whereas there is no bad saddle point for the shallow networks (with three layers). Moreover, for deep nonlinear neural networks, we prove the same four statements via a reduction to a deep linear model under the independence assumption adopted from recent work. As a result, we present an instance, for which we can answer the following question: how difficult is it to directly train a deep model in theory? It is more difficult than the classical machine learning models (because of the non-convexity), but not too difficult (because of the nonexistence of poor local minima). Furthermore, the mathematically proven existence of bad saddle points for deeper models would suggest a possible open problem. We note that even though we have advanced the theoretical foundations of deep learning and non-convex optimization, there is still a gap between theory and practice.

preprint · 2016 · arXiv

Global Continuous Optimization with Error Bound and Fast Convergence

This paper considers global optimization with a black-box unknown objective function that can be non-convex and non-differentiable. Such a difficult optimization problem arises in many real-world applications, such as parameter tuning in machine learning, engineering design problems, and planning with a complex physics simulator. This paper proposes a new global optimization algorithm, called Locally Oriented Global Optimization (LOGO), to aim for both fast convergence in practice and finite-time error bound in theory. The advantage and usage of the new algorithm are illustrated via theoretical analysis and an experiment conducted with 11 benchmark test functions. Further, we modify the LOGO algorithm to specifically solve a planning problem via policy search with continuous state/action space and long time horizon while maintaining its finite-time error bound. We apply the proposed planning method to accident management of a nuclear power plant. The result of the application study demonstrates the practical utility of our method.

preprint · 2016 · arXiv

Streaming Normalization: Towards Simpler and More Biologically-plausible Normalizations for Online and Recurrent Learning

We systematically explore a spectrum of normalization algorithms related to Batch Normalization (BN) and propose a generalized formulation that simultaneously solves two major limitations of BN: (1) online learning and (2) recurrent learning. Our proposal is simpler and more biologically plausible. Unlike previous approaches, our technique can be applied out of the box to all learning scenarios (e.g., online learning, batch learning, fully-connected, convolutional, feedforward, recurrent and mixed recurrent-convolutional) and compares favorably with existing approaches. We also propose Lp Normalization for normalizing by different orders of statistical moments. In particular, L1 normalization performs well, is simple to implement, fast to compute and more biologically plausible, and is thus ideal for GPU or hardware implementations.
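One plausible reading of "normalizing by different orders of statistical moments" is sketched below: center the activations, then divide by the p-th root of the p-th absolute central moment. This is an assumption about the formulation, not the paper's definition; it uses batch statistics rather than the streaming estimates the paper targets, and adds `eps` to the scale as a simplification.

```python
from typing import List, Sequence


def lp_normalize(values: Sequence[float], p: float = 1.0, eps: float = 1e-5) -> List[float]:
    """Center the values, then divide by the p-th root of the p-th absolute central moment.

    p = 2 gives the (biased) standard-deviation scaling used by Batch Normalization;
    p = 1 divides by the mean absolute deviation, needing only absolute values and a mean.
    """
    if p <= 0:
        raise ValueError("p must be positive")
    if not values:
        raise ValueError("values must be non-empty")
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    scale = (sum(abs(c) ** p for c in centered) / n) ** (1.0 / p)
    return [c / (scale + eps) for c in centered]


print(lp_normalize([1.0, 2.0, 3.0, 6.0], p=1))
print(lp_normalize([1.0, 2.0, 3.0, 6.0], p=2))
```

The L1 variant avoids squaring and square roots, which is consistent with the abstract's point that it is cheap to compute in hardware.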

preprint · 2016 · arXiv

X-ray and Optical Correlation of Type I Seyfert NGC 3516 Studied with Suzaku and Japanese Ground-Based Telescopes

From 2013 April to 2014 April, we performed simultaneous X-ray and optical monitoring of the type 1.5 Seyfert galaxy NGC 3516. It employed Suzaku and five Japanese ground-based telescopes: Pirka, Kiso Schmidt, Nayuta, MITSuME, and Kanata. The Suzaku observations were conducted seven times with various intervals ranging from days, weeks, to months, with an exposure of $\sim50$ ksec each. The optical $B$-band observations not only covered those of Suzaku almost simultaneously, but also followed the source as frequently as possible. As a result, NGC 3516 was found in its faint phase with a 2-10 keV flux of $0.21-2.70 \times 10^{-11}$ erg s$^{-1}$ cm$^{-2}$. The 2-45 keV X-ray spectra were composed of a dominant variable hard power-law continuum with a photon index of $\sim1.7$, and a non-relativistic reflection component with a prominent Fe-K$\alpha$ emission line. Producing the $B$-band light curve by differential image photometry, we found that the $B$-band flux changed by $\sim2.7 \times 10^{-11}$ erg s$^{-1}$ cm$^{-2}$, which is comparable to the X-ray variation, and detected a significant flux correlation between the hard power-law component in X-rays and the $B$-band radiation, for the first time in NGC 3516. By examining their correlation, we found that the X-ray flux preceded that of the $B$ band by $2.0^{+0.7}_{-0.6}$ days ($1\sigma$ error). Although this result supports the X-ray reprocessing model, the derived lag is too large to be explained by the standard view, which assumes a "lamppost"-type X-ray illuminator located near a standard accretion disk. Our results are better explained by assuming a hot accretion flow and a truncated disk.
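The idea of measuring which band leads can be illustrated with a simple discrete cross-correlation, assuming two evenly sampled, equal-length light curves. Real monitoring data are unevenly sampled and usually need methods such as the interpolated or discrete correlation function, so this is not the analysis used in the paper; the function names and the sign convention (a positive lag means the responder trails the driver) are assumptions.

```python
import math
from typing import Sequence, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length, non-constant series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def best_lag(driver: Sequence[float], responder: Sequence[float],
             max_lag: int) -> Tuple[int, float]:
    """Integer lag (in samples) maximizing corr(driver[t], responder[t + lag])."""
    n = len(driver)
    if len(responder) != n or not 0 <= max_lag <= n - 3:
        raise ValueError("need equal lengths and at least 3 overlapping samples")
    best = (0, -math.inf)
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = driver[:n - lag], responder[lag:]
        else:
            a, b = driver[-lag:], responder[:n + lag]
        r = pearson(a, b)
        if r > best[1]:
            best = (lag, r)
    return best


# Synthetic test: the responder repeats the driver 2 samples later.
base = [math.sin(0.37 * t) for t in range(62)]
print(best_lag(base[2:62], base[0:60], max_lag=5))  # lag 2, r close to 1
```

With daily-binned X-ray flux as the driver and $B$-band flux as the responder, a positive lag in this convention would correspond to the X-rays leading, as reported above.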

preprint · 2015 · arXiv

Kanata optical and X-ray monitoring of Gamma-ray emitting Narrow-Line Seyfert 1 and Radio galaxies

The broadband spectrum of an AGN consists of multiple components, such as jet emission and accretion-disk emission. Temporal correlation studies are useful for understanding these emission components and their physical origins. We have performed optical monitoring with the Kanata telescope of 4 radio galaxies and 6 radio-loud narrow-line Seyfert 1s (RL-NLSy1s): 2 gamma-ray-loud RL-NLSy1s, 1H 0323+342 and PMN J0948+0022, and 4 gamma-ray-quiet RL-NLSy1s. These results suggest that RL-NLSy1s show a disk-dominant phase and a jet-dominant phase in the optical band, but this is not well correlated with brightness.

preprint · 2015 · arXiv

Probing the nature of the TeV gamma-ray binary HESS J0632+057 by monitoring Be disk variability

We report on monitoring observations of the TeV gamma-ray binary HESS J0632+057, which were carried out to constrain the interaction between the Be circumstellar disk and the compact object of unknown nature, and provide for the first time high-dispersion (R > 50000) optical spectra in the second half of the orbital cycle, from apastron through periastron. The Halpha, Hbeta, and Hgamma line profiles are found to exhibit remarkable short-term variability for ~1 month after the apastron (phase 0.6--0.7), whereas they show little variation near the periastron. These emission lines show "S-shaped" variations with timescale of ~150 days, which is about twice that reported previously. In contrast to the Balmer lines, no profile variability is seen in any FeII emission line. We estimate the radii of emitting regions of the Halpha, Hbeta, Hgamma, and FeII emission lines to be ~30, 11, 7, and 2 stellar radii (R_*), respectively. The amplitudes of the line profile variations in different lines indicate that the interaction with the compact object affects the Be disk down to, at least, the radius of 7 R_* after the apastron. This fact, together with little profile variability near the periastron, rules out the tidal force as the major cause of disk variability. Although this leaves the pulsar wind as the most likely candidate mechanism for disk variations, understanding the details of the interaction, particularly the mechanism for causing a large-scale disk disturbance after the apastron, remains an open question.

preprint · 2015 · arXiv

Study for relation between direction of relativistic jet and optical polarization angle with multi-wavelength observation

Blazars are thought to possess a relativistic jet pointing toward the Earth, and the effect of relativistic beaming enhances their apparent brightness. They radiate in all wavebands from radio to gamma rays via the synchrotron and inverse Compton scattering processes. Numerous observations have been performed, but the mechanism of variability and the creation and composition of jets are still controversial. We performed multi-wavelength monitoring with optical polarization for 3C 66A, Mrk 421, CTA 102 and PMN J0948+0022 to investigate the mechanisms of variability and to study the emission regions in the relativistic jets. Consequently, the emergence of a new emission component in the flaring state is suggested in each object. The most significant aspect of these results is the wide range of emission-region sizes, $10^{14}-10^{16}$ cm, which implies a model with a number of independent emission regions of various sizes and random orientations. The "shock-in-jet" scenario can explain the high polarization degree (PD) and the direction of the polarization angle (PA) in each object. It might reflect a common mechanism of flares in relativistic jets.

preprint2015arXiv

Suzaku X-Ray Monitoring of Gamma-Ray-Emitting Radio Galaxy, NGC 1275

NGC 1275 is a gamma-ray-emitting radio galaxy at the center of the Perseus cluster. Its multi-wavelength spectrum is similar to that of blazars, and thus the gamma-ray emission is believed to originate in the jet. In the optical and X-ray bands, NGC 1275 also shows a bright core, but its origin is not understood, since disk emission has not been ruled out. In fact, NGC 1275 exhibits optical broad emission lines and an X-ray Fe-K line, which are typical of Seyfert galaxies. In our previous studies of NGC 1275 with Suzaku/XIS, no X-ray variability was found from 2006 to 2011, despite moderate gamma-ray variability observed by Fermi-LAT [Yamazaki]. We have continued monitoring observations of NGC 1275 with Suzaku/XIS. In 2013–2014, the MeV/GeV gamma-ray flux of NGC 1275 gradually increased and reached its maximum at the beginning of 2014. Correlated with this recent gamma-ray activity, we found that the X-ray flux also increased; this is the first evidence of X-ray variability in NGC 1275. Based on these results, we discuss the emission component responsible for the variability, but we cannot determine the origin of the X-ray variability correlated with the gamma rays. Therefore, future observations of NGC 1275 with Fermi (gamma-ray), XMM-Newton, NuSTAR, and ASTRO-H (X-ray), CTA (TeV gamma-ray), and the Kanata optical telescope will be important.

preprint2014arXiv

Variable optical polarization during high state in gamma-ray loud narrow line Seyfert 1 galaxy 1H 0323+342

We present results of optical polarimetric and multi-band photometric observations of the gamma-ray-loud narrow-line Seyfert 1 galaxy 1H 0323+342. This object has been monitored with the 1.5 m Kanata telescope since 2012 September; following a gamma-ray flux enhancement detected by Fermi-LAT on MJD 56483 (2013 July 10), dense follow-up observations were performed with ten 0.5–2.0 m telescopes in Japan over one week. The 2-year R_C-band light curve showed a clear brightening corresponding to the gamma-ray flux increase, followed by a gradual decay. The high state as a whole lasted for ~20 days, during which we clearly detected optical polarization from this object. The polarization degree (PD) increased from 0–1% in quiescence to ~3% at maximum and then declined to the quiescent level, the enhancement lasting less than 10 days. The moderate PD around the peak allowed us to measure the daily polarization angle (PA) precisely. We found that the daily PAs were almost constant and aligned with the jet axis, suggesting that the magnetic field at the emission region is transverse to the jet. This implies the presence of either a helical/toroidal magnetic field or a transverse magnetic field compressed by shock(s). We also found small-amplitude intra-night variability during a 2-hour continuous exposure on a single night. We discuss these findings in the context of the turbulent multi-zone model recently advocated by Marscher (2014). The optical-to-ultraviolet (UV) spectrum rose toward higher frequencies, and the UV magnitude measured by Swift/UVOT was steady even during the flaring state, suggesting that thermal emission from the accretion disk dominates in that band.
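PD and PA follow from the linear Stokes parameters I, Q, U as PD = √(Q² + U²) / I and PA = ½ · arctan2(U, Q). Because PA is an orientation rather than a direction, comparing it with the jet axis has to be done modulo 180°. For optically thin synchrotron emission the electric-vector PA is perpendicular to the projected magnetic field, which is why a PA aligned with the jet implies a transverse field. A minimal sketch; the function names, sample Stokes values, and jet position angle are hypothetical, and it ignores instrumental calibration and the positive bias of PD at low signal-to-noise:

```python
import math


def polarization_from_stokes(i: float, q: float, u: float) -> tuple[float, float]:
    """Return (PD in percent, PA in degrees within [0, 180)) from Stokes I, Q, U.

    Assumes Q and U are already in the same sky frame as the jet position angle;
    no debiasing of PD is applied.
    """
    if i <= 0:
        raise ValueError("Stokes I must be positive")
    pd = 100.0 * math.hypot(q, u) / i
    pa = (0.5 * math.degrees(math.atan2(u, q))) % 180.0
    return pd, pa


def axial_offset_deg(pa_deg: float, jet_pa_deg: float) -> float:
    """Smallest angle between two orientations defined modulo 180 deg, in [0, 90]."""
    d = abs(pa_deg - jet_pa_deg) % 180.0
    return min(d, 180.0 - d)


if __name__ == "__main__":
    # Hypothetical nightly Stokes values (I normalized to 1) and jet position angle.
    jet_pa = 125.0
    nights = [(1.0, -0.010, -0.025), (1.0, -0.008, -0.029), (1.0, -0.002, -0.010)]
    for i, q, u in nights:
        pd, pa = polarization_from_stokes(i, q, u)
        offset = axial_offset_deg(pa, jet_pa)
        print(f"PD = {pd:.1f}%  PA = {pa:.0f} deg  offset from jet = {offset:.0f} deg")
```

Folding the offset into [0°, 90°] means that a PA of 170° and a jet axis at 10° are treated as 20° apart, not 160°.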

preprint2013arXiv

A Greedy Approximation of Bayesian Reinforcement Learning with Probably Optimistic Transition Model

Bayesian reinforcement learning (RL) can not only incorporate domain knowledge but also resolve the exploration-exploitation dilemma in a natural way. Because Bayesian RL is intractable except in special cases, previous work has proposed several approximation methods. However, these methods are usually too sensitive to parameter values, and finding an acceptable parameter setting is practically impossible in many applications. In this paper, we propose a new algorithm that greedily approximates Bayesian RL to achieve robustness in parameter space. We show that, for a desired learning behavior, the proposed algorithm has a polynomial sample complexity that is lower than those of existing algorithms. We also demonstrate that the proposed algorithm naturally outperforms other existing algorithms when the prior distributions are not significantly misleading. On the other hand, the proposed algorithm cannot handle greatly misspecified priors as well as the other algorithms can. This is a natural consequence of the fact that the proposed algorithm is greedier than the other algorithms. Accordingly, we discuss how to select an appropriate algorithm for different tasks based on the algorithms' greediness. We also introduce a new way of simplifying Bayesian planning, on which future work could build to derive new algorithms.
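The abstract does not spell out the algorithm, so the following is not the paper's method. It only sketches, with NumPy, two ingredients the paragraph refers to: domain knowledge encoded as Dirichlet pseudo-counts over transition probabilities, and greedy planning by value iteration on a single model extracted from that posterior. Here that model is the posterior mean, an assumption; the paper's "probably optimistic" transition model is not reproduced. Array shapes and names are hypothetical:

```python
import numpy as np


def dirichlet_posterior_mean(counts: np.ndarray, prior: np.ndarray) -> np.ndarray:
    """Posterior-mean transition model P[s, a, s'] under independent Dirichlet priors.

    counts[s, a, s']: observed transition counts
    prior[s, a, s']:  Dirichlet pseudo-counts encoding prior domain knowledge (all > 0)
    """
    alpha = counts + prior
    return alpha / alpha.sum(axis=2, keepdims=True)


def value_iteration(P: np.ndarray, R: np.ndarray, gamma: float = 0.95,
                    tol: float = 1e-8, max_iter: int = 10_000) -> tuple[np.ndarray, np.ndarray]:
    """Greedy planning on a fixed model: returns (state values V, greedy policy).

    P has shape (S, A, S); R[s, a] is the expected immediate reward.
    """
    V = np.zeros(P.shape[0])
    for _ in range(max_iter):
        Q = R + gamma * (P @ V)  # (S, A, S) @ (S,) -> (S, A)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    Q = R + gamma * (P @ V)
    return V, Q.argmax(axis=1)


if __name__ == "__main__":
    # Hypothetical 2-state, 2-action problem: action 0 = stay, action 1 = switch state.
    counts = np.zeros((2, 2, 2))
    counts[0, 0, 0] = counts[1, 0, 1] = 8   # "stay" observed to stay
    counts[0, 1, 1] = counts[1, 1, 0] = 8   # "switch" observed to switch
    prior = np.full((2, 2, 2), 0.5)          # weak, symmetric prior
    R = np.array([[0.0, 0.0], [1.0, 0.0]])   # reward only for staying in state 1
    V, policy = value_iteration(dirichlet_posterior_mean(counts, prior), R)
    print("V =", V.round(2), "policy =", policy)
```

Planning on one point-estimate model rather than over the full belief space is what keeps approximations of this kind tractable; how that model is chosen (mean, optimistic, or sampled) determines how greedy the resulting agent is.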

preprint2013arXiv

Minute-Scale Rapid Variability of Optical Polarization in Narrow-Line Seyfert 1 Galaxy: PMN J0948+0022

We report optical photopolarimetric results for the radio-loud narrow-line Seyfert 1 (RL-NLSy1) galaxy PMN J0948+0022 from 2012 December to 2013 February, triggered by flux enhancements in the near-infrared and gamma-ray bands. Thanks to the one-shot polarimetry of HOWPol, installed on the Kanata telescope, we detected very rapid variability in the polarized-flux light curve on MJD 56281 (2012 December 20). The rise and decay times were about 140 s and 180 s, respectively. The polarization degree (PD) reached 36 ± 3% at the peak of the short-duration pulse, while the polarization angle (PA) remained almost constant. In addition, the temporal profiles of the total flux and PD were highly variable but well correlated, and a discrete correlation function analysis revealed no significant time lag longer than 10 min. The high PD and minute-scale variability in polarized flux provide clear evidence of synchrotron radiation from a very compact emission region, ~10^14 cm in size, with a highly ordered magnetic field. Such microvariability of polarization is also observed in several blazar jets, but there the complex relation between total flux and PD is explained by multi-zone models. The implied single emission region in PMN J0948+0022 might reflect a difference between the jets of RL-NLSy1s and blazars.
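The discrete correlation function (DCF) of Edelson & Krolik (1988) is a standard tool for cross-correlating unevenly sampled light curves such as the total-flux and PD curves described above. The abstract does not give the binning or normalization used, so the version below is a minimal NumPy sketch under stated assumptions: each pair of points contributes an unbinned value normalized by the error-corrected variances, and pairs are averaged in lag bins of a chosen width. The function name, the lag convention (lag = t_b − t_a), and the demo data are hypothetical, and the per-bin DCF uncertainty is omitted:

```python
import numpy as np


def discrete_correlation_function(t_a, a, err_a, t_b, b, err_b, lags, bin_width):
    """Binned DCF of two unevenly sampled series (Edelson & Krolik style).

    Lag convention: tau = t_b - t_a, so a positive peak lag means b lags a.
    Returns one value per entry of `lags` (NaN where a lag bin has no pairs).
    """
    t_a, a, err_a, t_b, b, err_b = (np.asarray(x, dtype=float)
                                    for x in (t_a, a, err_a, t_b, b, err_b))
    # Variances corrected for measurement errors (population variance, ddof=0).
    var_a = a.var() - np.mean(err_a ** 2)
    var_b = b.var() - np.mean(err_b ** 2)
    if var_a <= 0 or var_b <= 0:
        raise ValueError("measurement errors dominate the variance; DCF undefined")
    # Unbinned DCF for every pair (i, j) and the matching time differences t_b[j] - t_a[i].
    udcf = np.outer(a - a.mean(), b - b.mean()) / np.sqrt(var_a * var_b)
    dt = t_b[np.newaxis, :] - t_a[:, np.newaxis]
    out = np.full(len(lags), np.nan)
    for k, lag in enumerate(lags):
        in_bin = np.abs(dt - lag) < bin_width / 2.0
        if in_bin.any():
            out[k] = udcf[in_bin].mean()
    return out


if __name__ == "__main__":
    # Hypothetical data (evenly sampled here for simplicity): b is a copy of a delayed by 3 units.
    rng = np.random.default_rng(42)
    x = rng.normal(size=203)
    t = np.arange(200.0)
    a, b = x[3:], x[:200]  # b(t) = a(t - 3)
    zeros = np.zeros(200)
    lags = np.arange(-5.0, 6.0)
    dcf = discrete_correlation_function(t, a, zeros, t, b, zeros, lags, bin_width=1.0)
    print("peak lag:", lags[np.nanargmax(dcf)])
```

In the setting of the abstract, a and b would be the total-flux and PD light curves with lags in minutes; a DCF peak consistent with zero lag within the bin width corresponds to the "no significant time lag" result.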


29 published item(s)

Simple Hierarchical Planning with Diffusion

Clustering Aware Classification for Risk Prediction and Subtyping in Clinical Data

Adaptive Discrete Communication Bottlenecks with Dynamic Vector Quantization

ExpertNet: A Symbiosis of Classification and Clustering

MemStream: Memory-Based Streaming Anomaly Detection

Multi-Task Learning as a Bargaining Game

Robustness Implies Generalization via Data-Dependent Generalization Bounds

Training Free Graph Neural Networks for Graph Matching

Understanding Dynamics of Nonlinear Representation Learning and Its Application

When and How Mixup Improves Calibration

Meta-learning PINN loss functions

On the Theory of Implicit Deep Learning: Global Convergence with Implicit Layers

Elimination of All Bad Local Minima in Deep Learning

Evidence for planetary hypothesis for PTFO 8-8695b with five-year optical/infrared monitoring observations

Gradient Descent Finds Global Minima for Generalizable Deep Neural Networks of Practical Sizes

Ordered SGD: A New Stochastic Optimization Framework for Empirical Risk Minimization

Bayesian Optimization with Exponential Convergence

Bounded Optimal Exploration in MDP

Deep Learning without Poor Local Minima

Global Continuous Optimization with Error Bound and Fast Convergence

Streaming Normalization: Towards Simpler and More Biologically-plausible Normalizations for Online and Recurrent Learning

X-ray and Optical Correlation of Type I Seyfert NGC 3516 Studied with Suzaku and Japanese Ground-Based Telescopes

Kanata optical and X-ray monitoring of Gamma-ray emitting Narrow-Line Seyfert 1 and Radio galaxies

Probing the nature of the TeV gamma-ray binary HESS J0632+057 by monitoring Be disk variability

Study for relation between direction of relativistic jet and optical polarization angle with multi-wavelength observation

Suzaku X-Ray Monitoring of Gamma-Ray-Emitting Radio Galaxy, NGC 1275

Variable optical polarization during high state in gamma-ray loud narrow line Seyfert 1 galaxy 1H 0323+342

A Greedy Approximation of Bayesian Reinforcement Learning with Probably Optimistic Transition Model

Minute-Scale Rapid Variability of Optical Polarization in Narrow-Line Seyfert 1 Galaxy: PMN J0948+0022