Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
24 works
0 followers
16 topics
4 close collaborators

Published work

24 published items

preprint · 2026 · arXiv

Disciplined Diffusion: Text-to-Image Diffusion Model against NSFW Generation

Text-to-image (T2I) diffusion models have the ability to build high-quality pictures from text prompts, but they pose safety concerns because they can generate offensive or disturbing imagery when provided with harmful inputs. Existing safety filters typically rely on text-based classifiers or image-based checkers that completely block the output upon detecting a threat, issuing an explicit allow/block feedback signal to the user. This binary strategy leaves models vulnerable to adversarial attacks that alter keywords to bypass detection, and it causes high false-alarm rates that degrade the experience for benign users. To address such vulnerabilities, we propose Disciplined Diffusion (DDiffusion), a novel robust text-to-image diffusion that counters Not Safe For Work (NSFW) generation by uncovering implicit malicious semantics in prompt embeddings. DDiffusion leverages a semantic retrieval mechanism to evaluate prompts against concept distributions rather than relying on brittle pairwise similarity. Furthermore, it employs a localization method during the diffusion process to selectively edit only the harmful regions of the generated image. By returning locally sanitized images instead of applying uniform blocking, DDiffusion suppresses malicious content while preserving generation fidelity for benign prompts and avoiding the binary allow-deny signal on which existing probing attacks rely.

preprint · 2022 · arXiv

Evolution of Galaxy Types and HI Gas in Hickson Compact Groups

Compact groups have high galaxy densities and low velocity dispersions, and their group members have experienced numerous and frequent interactions during their lifetimes. They provide a unique environment to study the evolution of galaxies. We examined the galaxy types and HI contents in groups to study galaxy evolution in compact groups. We used the group crossing time as an age indicator for galaxy groups. Our sample is derived from the Hickson Compact Group catalog. We obtained group morphology data from the Hyper-Leda database and the IR classification based on Wide-Field Infrared Survey Explorer (WISE) fluxes from Zucker et al. (2016). By cross-matching the latest released ALFALFA 100% HI source catalog, supplemented by data found in the literature, we obtained 40 galaxy groups with HI data available. We confirmed that the weak correlation between HI mass fraction and group crossing time found by Ai & Zhu (2018) in SDSS groups also exists in compact groups. We also found that the group spiral galaxy fraction is correlated with the group crossing time, but the actively star-forming galaxy fraction is not correlated with the group crossing time. These results seem to fit with the hypothesis that the sequential acquisition of neighbors from surrounding larger-scale structures has affected the morphology transition and star formation efficiency in compact groups.

preprint · 2022 · arXiv

Gas Column Density Distribution of Molecular Clouds in the Third Quadrant of the Milky Way

We have obtained column density maps for an unbiased sample of 120 molecular clouds in the third quadrant of the Milky Way mid-plane ($|b| \le 5^{\circ}$) within the galactic longitude range from 195$^{\circ}$ to 225$^{\circ}$, using the high sensitivity $^{12}$CO and $^{13}$CO ($J=1-0$) data from the Milky Way Imaging Scroll Painting (MWISP) project. The probability density functions of the molecular hydrogen column density of the clouds, N-PDFs, are fitted with both a log-normal (LN) function and a log-normal plus power-law (LN+PL) function. The molecular clouds are classified into three categories according to the shapes of their N-PDFs, i.e., LN, LN+PL, and UN (unclear). About 72\% of the molecular clouds fall into the LN category, while 18\% and 10\% fall into the LN+PL and UN categories, respectively. A power-law scaling relation, $\sigma_s\propto N_{H_2}^{0.44}$, exists between the width of the N-PDF, $\sigma_s$, and the average column density, $N_{H_2}$, of the molecular clouds. However, $\sigma_s$ shows no correlation with the mass of the clouds. A correlation is found between the dispersion of normalized column density, $\sigma_{N/\langle N \rangle}$, and the sonic Mach number, $\mathcal{M}$, of molecular clouds. Overall, as predicted by numerical simulations, the molecular clouds with active star formation tend to have N-PDFs with power-law high-density tails.

preprint · 2022 · arXiv

Generalized Federated Learning via Sharpness Aware Minimization

Federated Learning (FL) is a promising framework for performing privacy-preserving, distributed learning with a set of clients. However, the data distribution among clients is often non-IID, i.e., subject to distribution shift, which makes efficient optimization difficult. To tackle this problem, many FL algorithms focus on mitigating the effects of data heterogeneity across clients by increasing the performance of the global model. However, almost all algorithms use Empirical Risk Minimization (ERM) as the local optimizer, which can easily drive the global model into a sharp valley and cause large deviations for some local clients. Therefore, in this paper, we revisit the solutions to the distribution shift problem in FL with a focus on local learning generality. To this end, we propose a general, effective algorithm, \texttt{FedSAM}, based on the Sharpness Aware Minimization (SAM) local optimizer, and develop a momentum FL algorithm to bridge local and global models, \texttt{MoFedSAM}. Theoretically, we show the convergence analysis of these two algorithms and demonstrate the generalization bound of \texttt{FedSAM}. Empirically, our proposed algorithms substantially outperform existing FL studies and significantly decrease the learning deviation.
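
As a rough illustration of the SAM local optimizer that this abstract builds on, the Python sketch below applies SAM steps (perturb the weights along the normalized gradient, then descend using the gradient taken at the perturbed point) inside a plain averaging round. The function names, the learning rate `lr`, the radius `rho`, and the equal-weight server average are assumptions made for illustration; this is not the authors' implementation, and the momentum used by MoFedSAM is not shown.

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One Sharpness Aware Minimization update on parameter vector w."""
    g = grad_fn(w)
    # Ascent: step of length rho along the normalized gradient
    # (first-order approximation of the worst point in an L2 ball).
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # Descent: use the gradient at the perturbed point, applied to w itself.
    g_sharp = grad_fn(w + eps)
    return w - lr * g_sharp

def local_update(w_global, grad_fn, steps=5, lr=0.1, rho=0.05):
    """Hypothetical client routine: a few SAM steps starting from the global model."""
    w = w_global.copy()
    for _ in range(steps):
        w = sam_step(w, grad_fn, lr=lr, rho=rho)
    return w

def fed_round(w_global, client_grad_fns, **kw):
    """Hypothetical server routine: equal-weight average of the client results."""
    return np.mean([local_update(w_global, g, **kw) for g in client_grad_fns], axis=0)
```

With two toy clients whose losses are quadratics centered at different points, repeated calls to `fed_round` pull the global model toward the midpoint of the two centers.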

preprint · 2022 · arXiv

Information-Theoretic Limits of Integrated Sensing and Communication with Correlated Sensing and Channel States for Vehicular Networks

In connected vehicular networks, it is vital to have vehicular nodes that are capable of sensing their surrounding environments and exchanging messages with each other for automation and coordination purposes. Towards this end, integrated sensing and communication (ISAC), which combines sensing and communication systems to jointly utilize their resources and to pursue mutual benefits, emerges as a new cost-effective solution. In ISAC, the hardware and spectrum co-sharing leads to a fundamental tradeoff between sensing and communication performance, which is not well understood except for very simple cases with the same sensing and channel states, and perfect channel state information at the receiver (CSIR). In this paper, a general point-to-point ISAC model is proposed to account for scenarios in which the sensing state is different from but correlated with the channel state, and the CSIR is not necessarily perfect. For the model considered, the optimal tradeoff is characterized by a capacity-distortion function that quantifies the best communication rate for a given sensing distortion constraint. An iterative algorithm is proposed to compute this tradeoff, and a few non-trivial examples are constructed to demonstrate the benefits of ISAC as compared to the separation-based approach.

preprint · 2022 · arXiv

Learning to Guide Human Attention on Mobile Telepresence Robots with 360 Vision

Mobile telepresence robots (MTRs) allow people to navigate and interact with a remote environment that is in a place other than the person's true location. Thanks to the recent advances in 360-degree vision, many MTRs are now equipped with an all-degree visual perception capability. However, people's visual field horizontally spans only about 120 degrees of the visual field captured by the robot. To bridge this observability gap toward human-MTR shared autonomy, we have developed a framework, called GHAL360, to enable the MTR to learn a goal-oriented policy from reinforcements for guiding human attention using visual indicators. Three telepresence environments were constructed using datasets that are extracted from Matterport3D and collected from a real robot, respectively. Experimental results show that GHAL360 outperformed the baselines from the literature in the efficiency of a human-MTR team completing target search tasks.

preprint · 2022 · arXiv

LoMar: A Local Defense Against Poisoning Attack on Federated Learning

Federated learning (FL) provides a highly efficient decentralized machine learning framework, where the training data remains distributed at remote clients in a network. Though FL enables a privacy-preserving mobile edge computing framework using IoT devices, recent studies have shown that this approach is susceptible to poisoning attacks from the side of remote clients. To address the poisoning attacks on FL, we provide a two-phase defense algorithm called Local Malicious Factor (LoMar). In phase I, LoMar scores model updates from each remote client by measuring the relative distribution over their neighbors using a kernel density estimation method. In phase II, an optimal threshold is approximated to distinguish malicious and clean updates from a statistical perspective. Comprehensive experiments on four real-world datasets have been conducted, and the experimental results show that our defense strategy can effectively protect the FL system. Specifically, the defense performance on the Amazon dataset under a label-flipping attack indicates that, compared with FG+Krum, LoMar increases the target label testing accuracy from $96.0\%$ to $98.8\%$, and the overall averaged testing accuracy from $90.1\%$ to $97.0\%$.
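
The phase I idea of scoring each update by its density relative to its neighbors can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: the Gaussian kernel, the bandwidth, the neighbor count `k`, and the ratio used as the score are all choices made here, and the abstract does not confirm that LoMar uses any of them. The phase II threshold is not reproduced.

```python
import numpy as np

def kde_density(x, points, bandwidth):
    """Gaussian kernel density of x estimated from `points` (constant factor dropped)."""
    sq = np.sum((points - x) ** 2, axis=1)
    return np.mean(np.exp(-sq / (2.0 * bandwidth ** 2)))

def relative_density_scores(updates, k=3, bandwidth=1.0):
    """Hypothetical score: density of each update among its k nearest
    neighbors, divided by the mean density of those neighbors.
    Scores well below 1 suggest an update that sits apart from its peers."""
    n = len(updates)
    dists = np.linalg.norm(updates[:, None, :] - updates[None, :, :], axis=2)
    neighbors = [np.argsort(dists[i])[1:k + 1] for i in range(n)]  # index 0 is the point itself
    dens = np.array([kde_density(updates[i], updates[neighbors[i]], bandwidth)
                     for i in range(n)])
    return np.array([dens[i] / (np.mean(dens[neighbors[i]]) + 1e-12)
                     for i in range(n)])
```

On a toy set of tightly clustered update vectors plus one distant vector, the distant vector receives a score close to zero while the clustered ones score close to one.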

preprint · 2022 · arXiv

Molecules with ALMA at Planet-forming Scales (MAPS) III: Characteristics of Radial Chemical Substructures

The Molecules with ALMA at Planet-forming Scales (MAPS) Large Program provides a detailed, high resolution (${\sim}$10-20 au) view of molecular line emission in five protoplanetary disks at spatial scales relevant for planet formation. Here, we present a systematic analysis of chemical substructures in 18 molecular lines toward the MAPS sources: IM Lup, GM Aur, AS 209, HD 163296, and MWC 480. We identify more than 200 chemical substructures, which are found at nearly all radii where line emission is detected. A wide diversity of radial morphologies - including rings, gaps, and plateaus - is observed both within each disk and across the MAPS sample. This diversity in line emission profiles is also present in the innermost 50 au. Overall, this suggests that planets form in varied chemical environments both across disks and at different radii within the same disk. Interior to 150 au, the majority of chemical substructures across the MAPS disks are spatially coincident with substructures in the millimeter continuum, indicative of physical and chemical links between the disk midplane and warm, elevated molecular emission layers. Some chemical substructures in the inner disk and most chemical substructures exterior to 150 au cannot be directly linked to dust substructure, however, which indicates that there are also other causes of chemical substructures, such as snowlines, gradients in UV photon fluxes, ionization, and radially-varying elemental ratios. This implies that chemical substructures could be developed into powerful probes of different disk characteristics, in addition to influencing the environments within which planets assemble. This paper is part of the MAPS special issue of the Astrophysical Journal Supplement.

preprint · 2022 · arXiv

Molecules with ALMA at Planet-forming Scales (MAPS). A Circumplanetary Disk Candidate in Molecular Line Emission in the AS 209 Disk

We report the discovery of a circumplanetary disk (CPD) candidate embedded in the circumstellar disk of the T Tauri star AS 209 at a radial distance of about 200 au (on-sky separation of 1."4 from the star at a position angle of $161^\circ$), isolated via $^{13}$CO $J=2-1$ emission. This is the first instance of CPD detection via gaseous emission capable of tracing the overall CPD mass. The CPD is spatially unresolved with a $117\times82$ mas beam and manifests as a point source in $^{13}$CO, indicating that its diameter is $\lesssim14$ au. The CPD is embedded within an annular gap in the circumstellar disk previously identified using $^{12}$CO and near-infrared scattered light observations, and is associated with localized velocity perturbations in $^{12}$CO. The coincidence of these features suggests that they have a common origin: an embedded giant planet. We use the $^{13}$CO intensity to constrain the CPD gas temperature and mass. We find that the CPD temperature is $\gtrsim35$ K, higher than the circumstellar disk temperature at the radial location of the CPD, 22 K, suggesting that heating sources localized to the CPD must be present. The CPD gas mass is $\gtrsim 0.095 M_{\rm Jup} \simeq 30 M_{\rm Earth}$ adopting a standard $^{13}$CO abundance. From the non-detection of millimeter continuum emission at the location of the CPD ($3\sigma$ flux density $\lesssim26.4~\mu$Jy), we infer that the CPD dust mass is $\lesssim 0.027 M_{\rm Earth} \simeq 2.2$ lunar masses, indicating a low dust-to-gas mass ratio of $\lesssim9\times10^{-4}$. We discuss the formation mechanism of the CPD-hosting giant planet on a wide orbit in the framework of gravitational instability and pebble accretion.
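
The quoted dust-to-gas ratio follows directly from the two mass limits in this abstract. The worked arithmetic below only restates those numbers; the conversion factor of about 317.8 Earth masses per Jupiter mass is a standard value that is not given in the abstract.

```latex
M_{\rm gas} \gtrsim 0.095\,M_{\rm Jup} \times 317.8\,\frac{M_{\rm Earth}}{M_{\rm Jup}} \simeq 30\,M_{\rm Earth},
\qquad
\frac{M_{\rm dust}}{M_{\rm gas}} \lesssim \frac{0.027\,M_{\rm Earth}}{30\,M_{\rm Earth}} = 9\times10^{-4}.
```

Because the dust mass is an upper limit and the gas mass is a lower limit, the ratio is itself an upper limit.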

preprint · 2022 · arXiv

Offline Policy Optimization with Eligible Actions

Offline policy optimization could have a large impact on many real-world decision-making problems, as online learning may be infeasible in many applications. Importance sampling and its variants are a commonly used type of estimator in offline policy evaluation, and such estimators typically do not require assumptions on the properties and representational capabilities of value function or decision process model function classes. In this paper, we identify an important overfitting phenomenon in optimizing the importance weighted return, in which it may be possible for the learned policy to essentially avoid making aligned decisions for part of the initial state space. We propose an algorithm to avoid this overfitting through a new per-state-neighborhood normalization constraint, and provide a theoretical justification of the proposed algorithm. We also show the limitations of previous attempts to this approach. We test our algorithm in a healthcare-inspired simulator, a logged dataset collected from real hospitals and continuous control tasks. These experiments show the proposed method yields less overfitting and better test performance compared to state-of-the-art batch reinforcement learning algorithms.

preprint · 2022 · arXiv

On the Convergence of Multi-Server Federated Learning with Overlapping Area

Multi-server federated learning (FL) has been considered a promising solution to the limited communication resource problem of single-server FL. We consider a typical multi-server FL architecture, where the coverage areas of regional servers may overlap. The key point of this architecture is that the clients located in the overlapping areas update their local models based on the average model of all accessible regional models, which enables indirect model sharing among different regional servers. Due to the complicated network topology, the convergence analysis is much more challenging than for single-server FL. In this paper, we first propose a novel MS-FedAvg algorithm for this multi-server FL architecture and analyze its convergence on non-iid datasets for general non-convex settings. Since the number of clients located in each regional server is much smaller than in single-server FL, the bandwidth of each client should be large enough to successfully communicate training models with the server, which indicates that full client participation can work in multi-server FL. Also, we provide the convergence analysis of the partial client participation scheme and develop a new biased partial participation strategy to further accelerate convergence. Our results indicate that convergence depends strongly on the ratio of the number of clients in each area type to the total number of clients in all three strategies. The extensive experiments show remarkable performance and support our theoretical results.
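
The overlapping-area rule described here, in which a client starts from the average of every regional model it can reach, can be sketched in a few lines of Python. The dictionary layout, the server labels, and the function name are hypothetical, and the equal-weight average is an assumption; the rest of MS-FedAvg (local training and regional aggregation) is not shown.

```python
import numpy as np

def client_start_model(regional_models, accessible):
    """Average the regional models a client can reach.

    regional_models: dict mapping a server label to its parameter vector.
    accessible: labels of the servers whose coverage includes this client.
    """
    return np.mean([regional_models[s] for s in accessible], axis=0)

# Two hypothetical regional servers with toy two-parameter models.
regional = {"A": np.array([1.0, 1.0]), "B": np.array([3.0, 5.0])}
overlap_client = client_start_model(regional, ["A", "B"])  # reaches both servers
single_client = client_start_model(regional, ["A"])        # reaches only server A
```

A client in the overlap therefore carries information from both regions into its next local update, which is the indirect model sharing the abstract refers to.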

preprint · 2022 · arXiv

Perception-Aware Attack: Creating Adversarial Music via Reverse-Engineering Human Perception

Recently, adversarial machine learning attacks have posed serious security threats against practical audio signal classification systems, including speech recognition, speaker recognition, and music copyright detection. Previous studies have mainly focused on ensuring the effectiveness of attacking an audio signal classifier by creating a small noise-like perturbation on the original signal. It is still unclear whether an attacker is able to create audio signal perturbations that are perceived well by human beings in addition to being effective as attacks. This is particularly important for music signals as they are carefully crafted with human-enjoyable audio characteristics. In this work, we formulate the adversarial attack against music signals as a new perception-aware attack framework, which integrates human study into adversarial attack design. Specifically, we conduct a human study to quantify the human perception with respect to a change of a music signal. We invite human participants to rate their perceived deviation based on pairs of original and perturbed music signals, and reverse-engineer the human perception process by regression analysis to predict the human-perceived deviation given a perturbed signal. The perception-aware attack is then formulated as an optimization problem that finds an optimal perturbation signal to minimize the prediction of perceived deviation from the regressed human perception model. We use the perception-aware framework to design a realistic adversarial music attack against YouTube's copyright detector. Experiments show that the perception-aware attack produces adversarial music with significantly better perceptual quality than prior work.

preprint · 2022 · arXiv

Provably Sample-Efficient RL with Side Information about Latent Dynamics

We study reinforcement learning (RL) in settings where observations are high-dimensional, but where an RL agent has access to abstract knowledge about the structure of the state space, as is the case, for example, when a robot is tasked to go to a specific room in a building using observations from its own camera, while having access to the floor plan. We formalize this setting as transfer reinforcement learning from an abstract simulator, which we assume is deterministic (such as a simple model of moving around the floor plan), but which is only required to capture the target domain's latent-state dynamics approximately up to unknown (bounded) perturbations (to account for environment stochasticity). Crucially, we assume no prior knowledge about the structure of observations in the target domain except that they can be used to identify the latent states (but the decoding map is unknown). Under these assumptions, we present an algorithm, called TASID, that learns a robust policy in the target domain, with sample complexity that is polynomial in the horizon, and independent of the number of states, which is not possible without access to some prior knowledge. In synthetic experiments, we verify various properties of our algorithm and show that it empirically outperforms transfer RL algorithms that require access to "full simulators" (i.e., those that also simulate observations).

preprint · 2022 · arXiv

The Ages of Optically Bright Sub-Clusters in the Serpens Star-Forming Region

The Serpens Molecular Cloud is one of the most active star-forming regions within 500 pc, with over one thousand YSOs at different evolutionary stages. The ages of the member stars inform us about the star formation history of the cloud. In this paper, we develop a spectral energy distribution (SED) fitting method for nearby evolved (diskless) young stars from members of the Pleiades to estimate their ages, with a temperature scale adopted from APOGEE spectra. When compared with literature temperatures of selected YSOs in Orion, the SED fits to cool (<5000 K) stars have temperatures that differ by an average of <~ 50 K and have a scatter of ~ 210 K for both disk-hosting and diskless stars. We then apply this method to YSOs in the Serpens Molecular Cloud to estimate ages of optical members previously identified from Gaia DR2 astrometry data. The optical members in Serpens are concentrated in different subgroups with ages from ~4 Myr to ~22 Myr; the youngest clusters, W40 and Serpens South, are dusty regions that lack enough optical members to be included in this analysis. These ages establish that the Serpens Molecular Cloud has been forming stars for much longer than has been inferred from infrared surveys.

preprint · 2022 · arXiv

Tiny Object Tracking: A Large-scale Dataset and A Baseline

Tiny objects, which appear frequently in practical applications, have weak appearance and features, and have received increasing interest in many vision tasks, such as object detection and segmentation. To promote the research and development of tiny object tracking, we create a large-scale video dataset, which contains 434 sequences with a total of more than 217K frames. Each frame is carefully annotated with a high-quality bounding box. In data creation, we take 12 challenge attributes into account to cover a broad range of viewpoints and scene complexities, and annotate these attributes to facilitate attribute-based performance analysis. To provide a strong baseline in tiny object tracking, we propose a novel Multilevel Knowledge Distillation Network (MKDNet), which pursues three-level knowledge distillations in a unified framework to effectively enhance the feature representation, discrimination and localization abilities in tracking tiny objects. Extensive experiments are performed on the proposed dataset, and the results prove the superiority and effectiveness of MKDNet compared with state-of-the-art methods. The dataset, the algorithm code, and the evaluation code are available at https://github.com/mmic-lcl/Datasets-and-benchmark-code.

preprint · 2022 · arXiv

Towards Adaptive Unknown Authentication for Universal Domain Adaptation by Classifier Paradox

Universal domain adaptation (UniDA) is a general unsupervised domain adaptation setting, which addresses both domain and label shifts in adaptation. Its main challenge lies in how to identify target samples in unshared or unknown classes. Previous methods commonly strive to depict sample "confidence" along with a threshold for rejecting unknowns, and align feature distributions of shared classes across domains. However, it is still hard to pre-specify a "confidence" criterion and threshold which are adaptive to various real tasks, and a mis-prediction of unknowns further incurs misalignment of features in shared classes. In this paper, we propose a new UniDA method with adaptive Unknown Authentication by Classifier Paradox (UACP), considering that samples with paradoxical predictions are probably unknowns belonging to none of the source classes. In UACP, a composite classifier is jointly designed with two types of predictors. That is, a multi-class (MC) predictor classifies samples into one of the multiple source classes, while a binary one-vs-all (OVA) predictor further verifies the prediction by the MC predictor. Samples with verification failure or paradox are identified as unknowns. Further, instead of feature alignment for shared classes, implicit domain alignment is conducted in output space such that samples across domains share the same decision boundary, though with feature discrepancy. Empirical results validate UACP under both open-set and universal UDA settings.
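
The verification step described here can be illustrated with a small Python sketch: the MC predictor picks a class, the OVA predictor for that same class is asked to confirm it, and a disagreement marks the sample as unknown. The array layout, the `UNKNOWN` label, and the 0.5 acceptance level are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

UNKNOWN = -1  # hypothetical label for samples rejected as unknown

def predict_with_paradox_check(mc_probs, ova_probs, accept=0.5):
    """mc_probs[i, c]: multi-class probability of class c for sample i.
    ova_probs[i, c]: one-vs-all probability that sample i belongs to class c.
    A sample keeps its MC class only if the OVA predictor for that class accepts it."""
    mc_probs = np.asarray(mc_probs, dtype=float)
    ova_probs = np.asarray(ova_probs, dtype=float)
    picked = mc_probs.argmax(axis=1)
    verified = ova_probs[np.arange(len(picked)), picked] >= accept
    return np.where(verified, picked, UNKNOWN)

labels = predict_with_paradox_check(
    mc_probs=[[0.70, 0.20, 0.10], [0.40, 0.35, 0.25]],
    ova_probs=[[0.90, 0.10, 0.10], [0.20, 0.30, 0.10]],
)
```

In this toy input the first sample is confirmed as class 0, while the second is picked as class 0 by the MC predictor but rejected by the matching OVA predictor, so it is treated as unknown.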

preprint · 2020 · arXiv

Bringing high spatial resolution to the Far-infrared -- A giant leap for astrophysics

The far-infrared (FIR) regime is one of the few wavelength ranges where no astronomical data with sub-arcsecond spatial resolution exist. None of the medium-term satellite projects, such as SPICA, Millimetron, or O.S.T., will resolve this malady. For many research areas, however, information at high spatial and spectral resolution in the FIR, taken from atomic fine-structure lines, from highly excited carbon monoxide (CO), light hydrides, and especially from water lines, would open the door for transformative science. A main theme will be to trace the role of water in proto-planetary disks, to observationally advance our understanding of the planet formation process and, intimately related to that, the pathways to habitable planets and the emergence of life. Furthermore, key observations will zoom into the physics and chemistry of the star-formation process in our own Galaxy, as well as in external galaxies. The FIR provides unique tools to investigate in particular the energetics of heating, cooling and shocks. The velocity-resolved data in these tracers will reveal the detailed dynamics engrained in these processes in a spatially resolved fashion, and will deliver the perfect synergy with ground-based molecular line data for the colder dense gas.

preprint · 2020 · arXiv

Dual-Wavelength ALMA Observations of Dust Rings in Protoplanetary Disks

We present new Atacama Large Millimeter/submillimeter Array (ALMA) observations for three protoplanetary disks in Taurus at 2.9\,mm and comparisons with previous 1.3\,mm data both at an angular resolution of $\sim0.''1$ (15\,au for the distance of Taurus). In the single-ring disk DS Tau, double-ring disk GO Tau, and multiple-ring disk DL Tau, the same rings are detected at both wavelengths, with radial locations spanning from 50 to 120\,au. To quantify the dust emission morphology, the observed visibilities are modeled with a parametric prescription for the radial intensity profile. The disk outer radii, taken as 95\% of the total flux encircled in the model intensity profiles, are consistent at both wavelengths for the three disks. Dust evolution models show that dust trapping in local pressure maxima in the outer disk could explain the observed patterns. Dust rings are mostly unresolved. The marginally resolved ring in DS Tau shows a tentatively narrower ring at the longer wavelength, an observational feature expected from efficient dust trapping. The spectral index ($\alpha_{\rm mm}$) increases outward and exhibits local minima that correspond to the peaks of dust rings, indicative of the changes in grain properties across the disks. The low optical depths ($\tau\sim$0.1--0.2 at 2.9\,mm and 0.2--0.4 at 1.3\,mm) in the dust rings suggest that grains in the rings may have grown to millimeter sizes. The ubiquitous dust rings in protoplanetary disks modify the overall dynamics and evolution of dust grains, likely paving the way towards the new generation of planet formation.

preprint · 2020 · arXiv

Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions

Off-policy evaluation in reinforcement learning offers the chance of using observational data to improve future outcomes in domains such as healthcare and education, but safe deployment in high stakes settings requires ways of assessing its validity. Traditional measures such as confidence intervals may be insufficient due to noise, limited data and confounding. In this paper we develop a method that could serve as a hybrid human-AI system, to enable human experts to analyze the validity of policy evaluation estimates. This is accomplished by highlighting observations in the data whose removal will have a large effect on the OPE estimate, and formulating a set of rules for choosing which ones to present to domain experts for validation. We develop methods to compute exactly the influence functions for fitted Q-evaluation with two different function classes: kernel-based and linear least squares, as well as importance sampling methods. Experiments on medical simulations and real-world intensive care unit data demonstrate that our method can be used to identify limitations in the evaluation process and make evaluation more robust.
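
The notion of an observation "whose removal will have a large effect on the OPE estimate" can be made concrete with a brute-force leave-one-out calculation on a plain importance-sampling estimate. The Python sketch below is only that: the function name and the trajectory-level weights are assumptions, and it recomputes the estimate directly rather than using the influence functions the paper derives for fitted Q-evaluation.

```python
import numpy as np

def removal_effects(weights, returns):
    """Change in the importance-sampling estimate when each trajectory is dropped.

    weights[i]: importance weight of trajectory i (hypothetical input).
    returns[i]: observed return of trajectory i.
    """
    terms = np.asarray(weights, dtype=float) * np.asarray(returns, dtype=float)
    n = len(terms)
    full = terms.mean()
    # Estimate recomputed on the remaining n - 1 trajectories, for every i at once.
    leave_one_out = (terms.sum() - terms) / (n - 1)
    return leave_one_out - full

effects = removal_effects(weights=[1.0, 1.0, 4.0], returns=[1.0, 1.0, 1.0])
most_influential = int(np.argmax(np.abs(effects)))
```

Here the third trajectory, which carries a weight of 4, shifts the estimate most when removed, so it is the one a domain expert would be asked to inspect first.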

preprint · 2020 · arXiv

NetReduce: RDMA-Compatible In-Network Reduction for Distributed DNN Training Acceleration

We present NetReduce, a novel RDMA-compatible in-network reduction architecture to accelerate distributed DNN training. Compared to existing designs, NetReduce maintains a reliable connection between end-hosts in the Ethernet and does not terminate the connection in the network. The advantage of doing so is that we can fully reuse the designs of congestion control and reliability in RoCE. Meanwhile, we do not need to implement a high-cost network protocol processing stack in the switch, as IB does. The prototype, implemented using an FPGA, is an out-of-the-box solution that requires no modification of commodity devices such as NICs or switches. For the coordination between the end-host and the switch, NetReduce customizes the transport protocol only on the first packet in a data message to comply with RoCE v2. A special status monitoring module is designed to reuse the reliability mechanism of RoCE v2 for dealing with packet loss. A message-level credit-based flow control algorithm is also proposed to fully utilize bandwidth and avoid buffer overflow. We study the effects of intra bandwidth on the training performance in the multi-machine, multi-GPU scenario and give sufficient conditions for hierarchical NetReduce to outperform other algorithms. We also extend the design from rack-level aggregation to the more general spine-leaf topology in the data center. NetReduce accelerates the training by up to 1.7x and 1.5x for CNN-based CV and transformer-based NLP tasks, respectively. Simulations on large-scale systems indicate the superior scalability of NetReduce compared to the state-of-the-art ring all-reduce.

preprint · 2020 · arXiv

Pebbles in an Embedded Protostellar Disk: The Case of CB26

Planetary cores are thought to form in proto-planetary disks via the growth of dusty solid material. However, it is unclear how early this process begins. We study the physical structure and grain growth in the edge-on disk that surrounds the ~1 Myr old low-mass (~0.55 Msun) protostar embedded in the Bok Globule CB26 to examine how much grain growth has already occurred in the protostellar phase. We combine the SED between 0.9 $\mu$m and 6.4 cm with high angular resolution continuum maps at 1.3, 2.9, and 8.1 mm, and use the radiative transfer code RADMC-3D to conduct a detailed modelling of the dust emission from the disk and envelope of CB 26. We infer inner and outer disk radii of around 16 au and 172$\pm$22 au, respectively. The total gas mass in the disk is ~0.076 Msun, which amounts to ~14% of the mass of the central star. The inner disk contains a compact free-free emission region, which could be related to either a jet or a photoevaporation region. The thermal dust emission from the outer disk is optically thin at mm wavelengths, while the emission from the inner disk midplane is moderately optically thick. Our best-fit radiative transfer models indicate that the dust grains in the disk have already grown to pebbles with diameters of the order of 10 cm. Residual 8.1 mm emission suggests the presence of even larger particles in the inner disk. For the optically thin mm dust emission from the outer disk, we derive a mean opacity slope of 0.6$\pm$0.4, which is consistent with the presence of large dust grains. The presence of cm-sized bodies in the CB 26 disk indicates that solids grow rapidly already during the first million years in a protostellar disk. It is thus possible that Class II disks are already seeded with large particles and may even contain planetesimals.

preprint 2020 arXiv

Provably Good Batch Reinforcement Learning Without Great Exploration

Batch reinforcement learning (RL) is important for applying RL algorithms to many high-stakes tasks. Doing batch RL in a way that yields a reliable new policy in large domains is challenging: a new decision policy may visit states and actions outside the support of the batch data, and function approximation and optimization with limited samples can further increase the potential of learning policies with overly optimistic estimates of their future performance. Recent algorithms have shown promise but can still be overly optimistic in their expected outcomes. Theoretical work that provides strong guarantees on the performance of the output policy relies on a strong concentrability assumption that makes it unsuitable for cases where the ratio between the state-action distributions of the behavior policy and some candidate policies is large. This is because in the traditional analysis, the error bound scales up with this ratio. We show that a small modification to the Bellman optimality and evaluation back-ups to take a more conservative update can have much stronger guarantees. In certain settings, the resulting algorithms can find the approximately best policy within the state-action space explored by the batch data, without requiring a priori assumptions of concentrability. We highlight the necessity of our conservative update and the limitations of previous algorithms and analyses through illustrative MDP examples, and present an empirical comparison of our algorithm and other state-of-the-art batch RL baselines on standard benchmarks.
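A minimal tabular sketch of what a conservative Bellman back-up could look like is given below. It is an illustration under stated assumptions, not the authors' algorithm: support is approximated by a hypothetical visit-count threshold, the model `P` and rewards `R` are assumed known, and the function name is invented for this example.

```python
import numpy as np


def conservative_q_iteration(P, R, counts, threshold, gamma=0.9, iters=200):
    """Q-iteration that ignores state-action pairs with too little batch data.

    P: (S, A, S) transition probabilities, R: (S, A) rewards,
    counts: (S, A) visit counts in the batch (hypothetical support estimate).
    """
    P = np.asarray(P, dtype=float)
    R = np.asarray(R, dtype=float)
    supported = np.asarray(counts) >= threshold  # (S, A) boolean mask
    Q = np.zeros_like(R)
    for _ in range(iters):
        # Unsupported pairs contribute value 0 to the next-state maximum,
        # so optimistic estimates outside the data cannot propagate.
        V = np.max(np.where(supported, Q, 0.0), axis=1)  # shape (S,)
        Q = R + gamma * (P @ V)                          # shape (S, A)
    # The greedy policy is restricted to supported actions.
    policy = np.argmax(np.where(supported, Q, -np.inf), axis=1)
    return Q, policy


if __name__ == "__main__":
    # One state, two self-looping actions; action 1 pays more but is unseen.
    P = np.ones((1, 2, 1))
    R = np.array([[0.5, 1.0]])
    counts = np.array([[10, 0]])
    print(conservative_q_iteration(P, R, counts, threshold=1))
    print(conservative_q_iteration(P, R, counts, threshold=0))
```

With the threshold at 1 the unseen action is never selected and cannot inflate the value of the state; with the threshold at 0 the update reduces to ordinary Q-iteration and prefers the unseen action.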

preprint 2020 arXiv

Understanding the Curse of Horizon in Off-Policy Evaluation via Conditional Importance Sampling

Off-policy evaluation estimators that use importance sampling (IS) can suffer from high variance in long-horizon domains, and there has been particular excitement over new IS methods that leverage the structure of Markov decision processes. We analyze the variance of the most popular approaches through the viewpoint of conditional Monte Carlo. Surprisingly, we find that in finite-horizon MDPs there is no strict variance reduction of per-decision importance sampling or stationary importance sampling compared with vanilla importance sampling. We then provide sufficient conditions under which the per-decision or stationary estimators will provably reduce the variance over importance sampling with finite horizons. For the asymptotic (in terms of horizon $T$) case, we develop upper and lower bounds on the variance of those estimators, which yield sufficient conditions under which there exists an exponential vs. polynomial gap between the variance of importance sampling and that of the per-decision or stationary estimators. These results help advance our understanding of whether and when new types of IS estimators will improve the accuracy of off-policy estimation.
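For readers unfamiliar with the estimators being compared, here is a minimal single-trajectory sketch of vanilla and per-decision importance sampling in their standard textbook forms. The function names and the `ratios` argument (per-step ratios of target to behavior action probabilities) are illustrative choices, and the stationary estimator is omitted.

```python
import numpy as np


def importance_sampling(rewards, ratios, gamma=1.0):
    """Vanilla IS: weight the whole return by the product of all step ratios."""
    rewards = np.asarray(rewards, dtype=float)
    ratios = np.asarray(ratios, dtype=float)
    discounts = gamma ** np.arange(len(rewards))
    return float(np.prod(ratios) * np.sum(discounts * rewards))


def per_decision_importance_sampling(rewards, ratios, gamma=1.0):
    """Per-decision IS: weight each reward only by the ratios up to its step."""
    rewards = np.asarray(rewards, dtype=float)
    ratios = np.asarray(ratios, dtype=float)
    discounts = gamma ** np.arange(len(rewards))
    cumulative = np.cumprod(ratios)  # rho_{0:t} for each step t
    return float(np.sum(discounts * cumulative * rewards))


if __name__ == "__main__":
    rewards = [1.0, 0.0, 2.0]   # hypothetical trajectory rewards
    ratios = [0.5, 2.0, 1.5]    # hypothetical pi_target / pi_behavior ratios
    print(importance_sampling(rewards, ratios))
    print(per_decision_importance_sampling(rewards, ratios))
```

Both estimators are unbiased for the target policy's return when averaged over trajectories from the behavior policy; they differ only in which ratios multiply each reward, which is where the variance comparison in the abstract comes in.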

preprint 2019 arXiv

Combining Parametric and Nonparametric Models for Off-Policy Evaluation

We consider a model-based approach for performing batch off-policy evaluation in reinforcement learning. Our method takes a mixture-of-experts approach to combine parametric and non-parametric models of the environment such that the final value estimate has the least expected error. We do so by first estimating the local accuracy of each model and then using a planner to select which model to use at every time step so as to minimize the return error estimate along entire trajectories. Across a variety of domains, our mixture-based approach outperforms the individual models alone as well as state-of-the-art importance sampling-based estimators.