Researcher profile

Zhiqiang Shen

Zhiqiang Shen contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
34 works
0 followers
13 topics
4 close collaborators


Published work

34 published item(s)

preprint · 2026 · arXiv

Beyond Sgr A* and M87*: Sub-Microarcsecond Black Hole Shadow Detection via Lunar-based Extremely Long Baseline Interferometry

The 1.3 mm ground-based very long baseline interferometry (VLBI) array, the Event Horizon Telescope (EHT), is limited by the Earth's diameter and can image the supermassive black hole (SMBH) shadows of only M87* and Sgr A*. Extending the array with an assumed lunar-based telescope could achieve $\sim 0.85\ μ$as angular resolution at 230 GHz, enabling black hole shadow detection for a larger SMBH sample. The concept is motivated by space VLBI missions and lunar exploration, including the ongoing Lunar Orbit VLBI Experiment (LOVEX) aboard QueQiao-2 (Chang'E-7) and the planned International Lunar Research Station (ILRS). We assess shadow detectability for 31 SMBHs with predicted large angular sizes, exploring different telescope locations and antenna sizes. Assuming a telescope at the lunar antipode, we simulate the Moon-Earth (u,v) coverage and show that source geometry relative to the Moon's orbit determines whether the primary indicator of a shadow, the first visibility null, can be sampled. Using a geometric ring model, we identify six high-priority targets: M104, NGC 524, PGC 049940, NGC 5077, NGC 5252, and NGC 1052. Shadows of M104, NGC 5077, and NGC 1052 are detectable with a 5 m lunar-based telescope; PGC 049940 requires 20 m; NGC 524 and NGC 5252 require 100 m. Photon ring detection for Sgr A*, M87*, NGC 1600, and M31 is possible if space telescopes fill the baseline coverage gaps and sensitivity requirements are met. These results provide a clear scientific and technical motivation for lunar-based telescopes in future black hole shadow studies.
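
The quoted resolution of about 0.85 μas at 230 GHz is consistent with a simple diffraction-limit estimate, θ ≈ 1.22 λ/B, if the baseline B is taken to be the mean Earth-Moon distance. The sketch below only checks that arithmetic. The 384,400 km baseline and the Rayleigh factor of 1.22 are assumptions made here for illustration, not values stated in the abstract.

```python
import math

C = 299_792_458.0  # speed of light, m/s
RAD_TO_UAS = math.degrees(1.0) * 3600.0 * 1e6  # micro-arcseconds per radian

def angular_resolution_uas(freq_hz, baseline_m, factor=1.22):
    """Diffraction-limit estimate factor * lambda / B, in micro-arcseconds."""
    wavelength_m = C / freq_hz
    return factor * wavelength_m / baseline_m * RAD_TO_UAS

# Assumed baseline: mean Earth-Moon distance (about 384,400 km).
theta = angular_resolution_uas(230e9, 3.844e8)
print(f"{theta:.2f} micro-arcsec")
```

Setting `factor=1.0` gives the plain λ/B value, which is somewhat smaller; which convention the authors used is not stated in the abstract.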

preprint · 2026 · arXiv

Does the radio-active phase of XTE~J1810$-$197 recur following the same evolutionary pattern?

Magnetars are the most strongly magnetized compact objects known in the Universe and are regarded as one of the primary engines powering a variety of enigmatic, high-energy transients. However, our understanding of magnetars remains highly limited, constrained by observational sample size and radiative variability. XTE~J1810$-$197, which re-entered a radio-active phase in 2018, is one of only six known radio-pulsating magnetars. Leveraging the distinctive capability for simultaneous dual-frequency observations, we utilized the Shanghai Tianma Radio Telescope (TMRT) to monitor this magnetar continuously at both 2.25 and 8.60~GHz, capturing its entire evolution from radio activation to quenching. This enabled precise characterization of the evolution in its integrated profile, spin frequency, flux density, and spectral index ($α$, defined by $S \propto f^α$). The first time derivative of its spin frequency $\dotν$ passed through four distinct phases -- rapid decrease, violent oscillation, steady decline, and stable recovery -- before returning to its pre-outburst value concomitant with the cessation of radio emission. Remarkably, both the amplitudes and the characteristic time-scales of these $\dotν$ variations match those observed during the previous outburst that began in 2003, providing the first demonstration that post-outburst rotational evolution and radiative behavior in a magnetar are repeatable. A twisted-magnetosphere model can qualitatively account for this repeatability as well as for the progressive narrowing and abrupt disappearance of the radio pulse radiation, thereby receiving strong observational support.
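
With simultaneous flux densities at two frequencies, the spectral index defined by $S \propto f^α$ follows from a two-point estimate, α = ln(S_high/S_low) / ln(f_high/f_low). The sketch below is a minimal illustration of that formula at the 2.25 and 8.60 GHz bands named in the abstract. The flux densities are hypothetical values chosen only for the example, and the function name is an assumption.

```python
import math

def spectral_index(s_low, s_high, f_low=2.25, f_high=8.60):
    """Two-point spectral index alpha for S ∝ f^alpha (frequencies in GHz)."""
    return math.log(s_high / s_low) / math.log(f_high / f_low)

# Hypothetical flux densities in mJy, not measurements from the paper.
alpha = spectral_index(s_low=10.0, s_high=5.0)
print(round(alpha, 2))
```

A negative result means the source is fainter at the higher frequency; equal flux densities give α = 0 (a flat spectrum).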

preprint · 2026 · arXiv

On the Cultural Anachronism and Temporal Reasoning in Vision Language Models

Vision-Language Models (VLMs) are increasingly applied to cultural heritage materials, from digital archives to educational platforms. This work identifies a fundamental issue in how these models interpret historical artifacts. We define this phenomenon as cultural anachronism, the tendency to misinterpret historical objects using temporally inappropriate concepts, materials, or cultural frameworks. To quantify this phenomenon, we introduce the Temporal Anachronism Benchmark for Vision-Language Models (TAB-VLM), a dataset of 600 questions across six categories, designed to evaluate temporal reasoning on 1,600 Indian cultural artifacts spanning prehistoric to modern periods. Systematic evaluations of ten state-of-the-art models reveal significant deficiencies on our benchmark, and even the best model (GPT-5.2) achieves only 58.7% overall accuracy. The performance gap persists across varying architectures and scales, suggesting that cultural anachronism represents a significant limitation in visual AI systems, regardless of model size. These findings highlight the disparity between current VLM capabilities and the requirements for accurately interpreting cultural heritage materials, particularly for non-Western visual cultures underrepresented in training data. Our benchmark provides a foundation for enhancing temporal cognition in multimodal AI systems that interact with historical artifacts. The dataset and code are available in our project page.

preprint · 2026 · arXiv

SlimQwen: Exploring the Pruning and Distillation in Large MoE Model Pre-training

Structured pruning and knowledge distillation (KD) are typical techniques for compressing large language models, but it remains unclear how they should be applied at pretraining scale, especially to recent mixture-of-experts (MoE) models. In this work, we systematically study MoE compression in large-scale pretraining, focusing on three key questions: whether pruning provides a better initialization than training from scratch, how expert compression choices affect the final model after continued training, and which training strategy is most effective. We have the following findings: First, across depth, width, and expert compression, pruning a pretrained MoE consistently outperforms training the target architecture from scratch under the same training budget. Second, different one-shot expert compression methods converge to similar final performance after large-scale continual pretraining. Motivated by this, we introduce a simple partial-preservation expert merging strategy that improves downstream performance across most benchmarks. Third, combining KD with the language modeling loss outperforms KD alone, particularly on knowledge-intensive tasks. We further propose multi-token prediction (MTP) distillation, which yields consistent gains. Finally, given the same training tokens, progressive pruning schedules outperform one-shot compression, suggesting that gradual architecture transitions lead to better optimization trajectories. Putting it all together, we compress Qwen3-Next-80A3B to a 23A2B model that retains competitive performance. These results offer practical guidance for efficient MoE compression at scale.
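
The third finding, combining KD with the language modeling loss, can be pictured as a weighted sum of two terms per token position: a KL divergence from the teacher distribution to the student distribution, and the usual cross-entropy against the ground-truth token. The sketch below is a generic formulation under stated assumptions, not the authors' training code. The weighting scheme, the `kd_weight` parameter, and the toy logits are all hypothetical.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def kd_plus_lm_loss(student_logits, teacher_logits, target, kd_weight=0.5):
    """One position: kd_weight * KL(teacher || student) + (1 - kd_weight) * cross-entropy."""
    s_logp = log_softmax(student_logits)
    t_logp = log_softmax(teacher_logits)
    kd = sum(math.exp(t) * (t - s) for t, s in zip(t_logp, s_logp))
    lm = -s_logp[target]
    return kd_weight * kd + (1.0 - kd_weight) * lm

# Toy logits over a 3-token vocabulary (illustrative values only).
student = [2.0, 0.5, -1.0]
teacher = [1.5, 1.0, -0.5]
loss = kd_plus_lm_loss(student, teacher, target=0)
print(loss)
```

Setting `kd_weight=1.0` recovers KD alone, the baseline the abstract reports as weaker on knowledge-intensive tasks.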

preprint · 2025 · arXiv

Introduction to the Chinese Space Station Survey Telescope (CSST)

The Chinese Space Station Survey Telescope (CSST) is an upcoming Stage-IV sky survey telescope, distinguished by its large field of view (FoV), high image quality, and multi-band observation capabilities. It can simultaneously conduct precise measurements of the Universe by performing multi-color photometric imaging and slitless spectroscopic surveys. The CSST is equipped with five scientific instruments, i.e. Multi-band Imaging and Slitless Spectroscopy Survey Camera (SC), Multi-Channel Imager (MCI), Integral Field Spectrograph (IFS), Cool Planet Imaging Coronagraph (CPI-C), and THz Spectrometer (TS). Using these instruments, CSST is expected to make significant contributions and discoveries across various astronomical fields, including cosmology, galaxies and active galactic nuclei (AGN), the Milky Way and nearby galaxies, stars, exoplanets, Solar System objects, astrometry, and transients and variable sources. This review aims to provide a comprehensive overview of the CSST instruments, observational capabilities, data products, and scientific potential.

preprint · 2023 · arXiv

FerKD: Surgical Label Adaptation for Efficient Distillation

We present FerKD, a novel efficient knowledge distillation framework that incorporates partial soft-hard label adaptation coupled with a region-calibration mechanism. Our approach stems from the observation and intuition that standard data augmentations, such as RandomResizedCrop, tend to transform inputs into diverse conditions: easy positives, hard positives, or hard negatives. In traditional distillation frameworks, these transformed samples are utilized equally through their predictive probabilities derived from pretrained teacher models. However, merely relying on prediction values from a pretrained teacher, a common practice in prior studies, neglects the reliability of these soft label predictions. To address this, we propose a new scheme that calibrates the less-confident regions to be the context using softened hard groundtruth labels. Our approach involves the processes of hard regions mining + calibration. We demonstrate empirically that this method can dramatically improve the convergence speed and final accuracy. Additionally, we find that a consistent mixing strategy can stabilize the distributions of soft supervision, taking advantage of the soft labels. As a result, we introduce a stabilized SelfMix augmentation that weakens the variation of the mixed images and corresponding soft labels through mixing similar regions within the same image. FerKD is an intuitive and well-designed learning system that eliminates several heuristics and hyperparameters in former FKD solution. More importantly, it achieves remarkable improvement on ImageNet-1K and downstream tasks. For instance, FerKD achieves 81.2% on ImageNet-1K with ResNet-50, outperforming FKD and FunMatch by remarkable margins. Leveraging better pre-trained weights and larger architectures, our finetuned ViT-G14 even achieves 89.9%. Our code is available at https://github.com/szq0214/FKD/tree/main/FerKD.

preprint · 2022 · arXiv

ALMA Survey of Orion Planck Galactic Cold Clumps (ALMASOP): How do dense core properties affect the multiplicity of protostars?

During the transition phase from a prestellar to a protostellar cloud core, one or several protostars can form within a single gas core. The detailed physical processes of this transition, however, still remain unclear. We present 1.3 mm dust continuum and molecular line observations with the Atacama Large Millimeter/submillimeter Array (ALMA) toward 43 protostellar cores in the Orion Molecular Cloud Complex ($λ$ Orionis, Orion B, and Orion A) with an angular resolution of $\sim$ 0.35" ($\sim$ 140 au). In total, we detect 13 binary/multiple systems. We derive an overall multiplicity frequency (MF) of 28$\%$ $\pm$ 4$\%$ and a companion star fraction (CSF) of 51$\%$ $\pm$ 6$\%$, over a separation range of 300-8900 au. The median separation of companions is about 2100 au. The occurrence of stellar multiplicity may depend on the physical characteristics of the dense cores. Notably, those containing binary/multiple systems tend to show higher gas density and Mach number than cores forming single stars. The integral-shaped filament (ISF) of Orion A giant molecular cloud (GMC), which has the highest gas density and hosts high-mass star formation in its central region (the Orion Nebula cluster), shows the highest MF and CSF among the Orion GMCs. In contrast, the $λ$ Orionis Giant Molecular Cloud (GMC) has a lower MF and CSF than the Orion B and Orion A GMCs, indicating that feedback from HII regions may suppress the formation of multiple systems. We also find that the protostars comprising a binary/multiple system are usually at different evolutionary stages.

preprint · 2022 · arXiv

CHANG-ES. XXIV. First Detection of A Radio Nuclear Ring and Potential LLAGN in NGC 5792

We report the discoveries of a nuclear ring of diameter 10$\arcsec$ ($\sim$1.5 kpc) and a potential low luminosity active galactic nucleus (LLAGN) in the radio continuum emission map of the edge-on barred spiral galaxy NGC~5792. These discoveries are based on the Continuum Halos in Nearby Galaxies - an Expanded Very Large Array (VLA) Survey, as well as subsequent VLA observations of sub-arcsecond resolution. Using a mixture of H$α$ and 24 $μ$m calibration, we disentangle the thermal and non-thermal radio emission of the nuclear region, and derive a star formation rate (SFR) of $\sim 0.4~M_{\sun}$ yr$^{-1}$. We find that the nuclear ring is dominated by non-thermal synchrotron emission. The synchrotron-based SFR is about three times the mixture-based SFR. This result indicates that the nuclear ring underwent more intense star-forming activity in the past, and that its star formation is now in a low state. The sub-arcsecond VLA images resolve six individual knots on the nuclear ring. The equipartition magnetic field strength $B_{\rm eq}$ of the knots varies from 77 to 88 $μ$G. The radio ring surrounds a point-like faint radio core of $S_{\rm 6GHz}=(16\pm4)$ $μ$Jy with polarized lobes at the center of NGC~5792, which suggests an LLAGN with an Eddington ratio $\sim10^{-5}$. This radio nuclear ring is reminiscent of the Central Molecular Zone (CMZ) of the Galaxy. Both consist of a nuclear ring and an LLAGN.

preprint · 2022 · arXiv

Data-Free Neural Architecture Search via Recursive Label Calibration

This paper aims to explore the feasibility of neural architecture search (NAS) given only a pre-trained model without using any original training data. This is an important circumstance for privacy protection, bias avoidance, etc., in real-world scenarios. To achieve this, we start by synthesizing usable data through recovering the knowledge from a pre-trained deep neural network. Then we use the synthesized data and their predicted soft-labels to guide neural architecture search. We identify that the NAS task requires the synthesized data (we target the image domain here) to have enough semantics, diversity, and a minimal domain gap from natural images. For semantics, we propose recursive label calibration to produce more informative outputs. For diversity, we propose a regional update strategy to generate more diverse and semantically-enriched synthetic data. For minimal domain gap, we use input and feature-level regularization to mimic the original data distribution in latent space. We instantiate our proposed framework with three popular NAS algorithms: DARTS, ProxylessNAS and SPOS. Surprisingly, our results demonstrate for the first time that the architectures discovered by searching with our synthetic data achieve accuracy that is comparable to, or even higher than, architectures discovered by searching with the original data. We conclude that NAS can be done effectively without access to the original (natural) data if the synthesis method is well designed.

preprint · 2022 · arXiv

Exploring Simple and Transferable Recognition-Aware Image Processing

Recent progress in image recognition has stimulated the deployment of vision systems at an unprecedented scale. As a result, visual data are now often consumed not only by humans but also by machines. Existing image processing methods only optimize for better human perception, yet the resulting images may not be accurately recognized by machines. This can be undesirable, e.g., the images can be improperly handled by search engines or recommendation systems. In this work, we examine simple approaches to improve machine recognition of processed images: optimizing the recognition loss directly on the image processing network or through an intermediate input transformation model. Interestingly, the processing model's ability to enhance recognition quality can transfer when evaluated on models of different architectures, recognized categories, tasks and training datasets. This makes the methods applicable even when we do not have the knowledge of future recognition models, e.g., when uploading processed images to the Internet. We conduct experiments on multiple image processing tasks paired with ImageNet classification and PASCAL VOC detection as recognition tasks. With these simple yet effective methods, substantial accuracy gain can be achieved with strong transferability and minimal image quality loss. Through a user study we further show that the accuracy gain can transfer to a black-box cloud model. Finally, we try to explain this transferability phenomenon by demonstrating the similarities of different models' decision boundaries. Code is available at https://github.com/liuzhuang13/Transferable_RA .

preprint · 2022 · arXiv

Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation

The nonuniform quantization strategy for compressing neural networks usually achieves better performance than its counterpart, i.e., uniform strategy, due to its superior representational capacity. However, many nonuniform quantization methods overlook the complicated projection process in implementing the nonuniformly quantized weights/activations, which incurs non-negligible time and space overhead in hardware deployment. In this study, we propose Nonuniform-to-Uniform Quantization (N2UQ), a method that can maintain the strong representation ability of nonuniform methods while being hardware-friendly and efficient as the uniform quantization for model inference. We achieve this through learning the flexible in-equidistant input thresholds to better fit the underlying distribution while quantizing these real-valued inputs into equidistant output levels. To train the quantized network with learnable input thresholds, we introduce a generalized straight-through estimator (G-STE) for intractable backward derivative calculation w.r.t. threshold parameters. Additionally, we consider entropy preserving regularization to further reduce information loss in weight quantization. Even under this adverse constraint of imposing uniformly quantized weights and activations, our N2UQ outperforms state-of-the-art nonuniform quantization methods by 0.5~1.7 on ImageNet, demonstrating the contribution of N2UQ design. Code and models are available at: https://github.com/liuzechun/Nonuniform-to-Uniform-Quantization.
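
The core idea described above, non-equidistant input thresholds mapped to equidistant output levels, can be illustrated with a forward pass alone. The sketch below counts how many thresholds an input has crossed and returns that count times a fixed step. It does not implement the G-STE backward pass or the entropy regularization, and the threshold values, step size, and function name are hypothetical.

```python
from bisect import bisect_right

def n2u_quantize(x, thresholds, step=1.0):
    """Map x to an equidistant level k * step, where k counts thresholds <= x."""
    return bisect_right(thresholds, x) * step

# Hypothetical non-equidistant thresholds for a 2-bit (4-level) quantizer.
thresholds = [0.2, 0.9, 2.5]
levels = [n2u_quantize(v, thresholds) for v in [0.1, 0.5, 1.0, 3.0]]
print(levels)
```

Because the outputs are evenly spaced, they can be stored and multiplied as plain integers at inference time, which is the hardware-friendliness the abstract refers to; in the paper the thresholds are learned rather than fixed as they are here.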

preprint · 2022 · arXiv

Radio properties of the OH megamaser galaxy IIZw 096

Based on two epochs of archival EVN data from OH line observations of IIZw 096, we confirm that the high-resolution OH emission in this source mainly comes from two spots (OH1 and OH2) of comp D1 of this merging system. We found no significant variations in the OH line emission. The OH 1665 MHz line emission is detected at about the 6 $σ$ level in the OH1 region by combining the two epochs of EVN observations. We found that comp D1 shows the brightest CO and HCO+ line emission, as well as multi-band radio continuum emission. The environment around D1 shows no clear velocity structure associated with circular motions, making it different from most other OHMs in the literature, which might have been caused by an effect during the merger stage. Meanwhile, we found that the CO emission shows three velocity structures around D1, including the central broad FWHM region, the double peak region where the CO line profile shows two separated peaks, and the region of the high-velocity clouds where the CO line peaks at a high velocity ($\sim$ 11000 km s$^{-1}$). H I in absorption also shows high-velocity clouds around the D1 region, which might be due to inflows caused by the merging of two or more galaxy components. Based on the high-resolution K-band VLA and L-band VLBA observations of the radio continuum emission, we derived the brightness temperature in the range $10^{5}$ K to $10^{6}$ K, which is consistent with other starburst dominant OHM sources in the literature. The multi-band VLA observations show that the radio continuum emission of comp D might also have contributions from free-free emission, besides synchrotron emission. As a consequence, these results support a starburst origin for the OHMs, without the presence of an AGN.

preprint · 2022 · arXiv

SDQ: Stochastic Differentiable Quantization with Mixed Precision

In order to deploy deep models in a computationally efficient manner, model quantization approaches have been frequently used. In addition, as new hardware that supports mixed bitwidth arithmetic operations emerges, recent research on mixed precision quantization (MPQ) has begun to fully leverage the capacity of representation by searching optimized bitwidths for different layers and modules in a network. However, previous studies mainly search the MPQ strategy in a costly scheme using reinforcement learning, neural architecture search, etc., or simply utilize partial prior knowledge for bitwidth assignment, which might be biased and sub-optimal. In this work, we present a novel Stochastic Differentiable Quantization (SDQ) method that can automatically learn the MPQ strategy in a more flexible and globally-optimized space with smoother gradient approximation. Particularly, Differentiable Bitwidth Parameters (DBPs) are employed as the probability factors in stochastic quantization between adjacent bitwidth choices. After the optimal MPQ strategy is acquired, we further train our network with entropy-aware bin regularization and knowledge distillation. We extensively evaluate our method for several networks on different hardware (GPUs and FPGA) and datasets. SDQ outperforms all state-of-the-art mixed or single precision quantization with a lower bitwidth and is even better than the full-precision counterparts across various ResNet and MobileNet families, demonstrating the effectiveness and superiority of our method.

preprint · 2022 · arXiv

Sliced Recursive Transformer

We present a neat yet effective recursive operation on vision transformers that can improve parameter utilization without involving additional parameters. This is achieved by sharing weights across the depth of transformer networks. The proposed method can obtain a substantial gain (~2%) simply using naive recursive operation, requires no special or sophisticated knowledge for designing principles of networks, and introduces minimal computational overhead to the training procedure. To reduce the additional computation caused by recursive operation while maintaining the superior accuracy, we propose an approximating method through multiple sliced group self-attentions across recursive layers which can reduce the cost consumption by 10~30% with minimal performance loss. We call our model Sliced Recursive Transformer (SReT), a novel and parameter-efficient vision transformer design that is compatible with a broad range of other designs for efficient ViT architectures. Our best model establishes significant improvement on ImageNet-1K over state-of-the-art methods while containing fewer parameters. The proposed weight sharing mechanism by sliced recursion structure allows us to build a transformer with more than 100 or even 1000 shared layers with ease while keeping a compact size (13~15M), to avoid optimization difficulties when the model is too large. The flexible scalability has shown great potential for scaling up models and constructing extremely deep vision transformers. Code is available at https://github.com/szq0214/SReT.

preprint · 2022 · arXiv

Stereo Neural Vernier Caliper

We propose a new object-centric framework for learning-based stereo 3D object detection. Previous studies build scene-centric representations that do not consider the significant variation among outdoor instances and thus lack the flexibility and functionalities that an instance-level model can offer. We build such an instance-level model by formulating and tackling a local update problem, i.e., how to predict a refined update given an initial 3D cuboid guess. We demonstrate how solving this problem can complement scene-centric approaches in (i) building a coarse-to-fine multi-resolution system, (ii) performing model-agnostic object location refinement, and (iii) conducting stereo 3D tracking-by-detection. Extensive experiments demonstrate the effectiveness of our approach, which achieves state-of-the-art performance on the KITTI benchmark. Code and pre-trained models are available at https://github.com/Nicholasli1995/SNVC.

preprint · 2022 · arXiv

Un-Mix: Rethinking Image Mixtures for Unsupervised Visual Representation Learning

The recently advanced unsupervised learning approaches use the siamese-like framework to compare two "views" from the same image for learning representations. Making the two views distinctive is a core to guarantee that unsupervised methods can learn meaningful information. However, such frameworks are sometimes fragile on overfitting if the augmentations used for generating two views are not strong enough, causing the over-confident issue on the training data. This drawback hinders the model from learning subtle variance and fine-grained information. To address this, in this work we aim to involve the distance concept on label space in the unsupervised learning and let the model be aware of the soft degree of similarity between positive or negative pairs through mixing the input data space, to further work collaboratively for the input and loss spaces. Despite its conceptual simplicity, we show empirically that with the solution -- Unsupervised image mixtures (Un-Mix), we can learn subtler, more robust and generalized representations from the transformed input and corresponding new label space. Extensive experiments are conducted on CIFAR-10, CIFAR-100, STL-10, Tiny ImageNet and standard ImageNet with popular unsupervised methods SimCLR, BYOL, MoCo V1&V2, SwAV, etc. Our proposed image mixture and label assignment strategy can obtain consistent improvement by 1~3% following exactly the same hyperparameters and training procedures of the base methods. Code is publicly available at https://github.com/szq0214/Un-Mix.

preprint · 2022 · arXiv

Vision Transformer Slimming: Multi-Dimension Searching in Continuous Optimization Space

This paper explores the feasibility of finding an optimal sub-model from a vision transformer and introduces a pure vision transformer slimming (ViT-Slim) framework. It can search a sub-structure from the original model end-to-end across multiple dimensions, including the input tokens, MHSA and MLP modules with state-of-the-art performance. Our method is based on a learnable and unified $\ell_1$ sparsity constraint with pre-defined factors to reflect the global importance in the continuous searching space of different dimensions. The searching process is highly efficient through a single-shot training scheme. For instance, on DeiT-S, ViT-Slim only takes ~43 GPU hours for the searching process, and the searched structure is flexible with diverse dimensionalities in different modules. Then, a budget threshold is employed according to the requirements of accuracy-FLOPs trade-off on running devices, and a re-training process is performed to obtain the final model. The extensive experiments show that our ViT-Slim can compress up to 40% of parameters and 40% FLOPs on various vision transformers while increasing the accuracy by ~0.6% on ImageNet. We also demonstrate the advantage of our searched models on several downstream datasets. Our code is available at https://github.com/Arnav0400/ViT-Slim.

preprint · 2021 · arXiv

Interpretative Computer-aided Lung Cancer Diagnosis: from Radiology Analysis to Malignancy Evaluation

Background and Objective: Computer-aided diagnosis (CAD) systems promote diagnosis effectiveness and alleviate pressure on radiologists. A CAD system for lung cancer diagnosis includes nodule candidate detection and nodule malignancy evaluation. Recently, deep learning-based pulmonary nodule detection has reached satisfactory performance ready for clinical application. However, deep learning-based nodule malignancy evaluation depends on heuristic inference from low-dose computed tomography volume to malignant probability, which lacks clinical cognition. Methods: In this paper, we propose a joint radiology analysis and malignancy evaluation network (R2MNet) to evaluate the pulmonary nodule malignancy via radiology characteristics analysis. Radiological features are extracted as channel descriptors to highlight specific regions of the input volume that are critical for nodule malignancy evaluation. In addition, for model explanations, we propose channel-dependent activation mapping (CDAM) to visualize the features and shed light on the decision process of the deep neural network. Results: Experimental results on the LIDC-IDRI dataset demonstrate that the proposed method achieved an area under the curve (AUC) of 96.27% on nodule radiology analysis and an AUC of 97.52% on nodule malignancy evaluation. In addition, explanations of CDAM features proved that the shape and density of nodule regions were two critical factors that influence a nodule to be inferred as malignant, which conforms with the diagnosis cognition of experienced radiologists. Conclusion: By incorporating radiology analysis with nodule malignancy evaluation, the network inference process conforms to the diagnostic procedure of radiologists and increases the confidence of evaluation results. Besides, model interpretation with CDAM features sheds light on the regions which DNNs focus on when they estimate nodule malignancy probabilities.

preprint · 2021 · arXiv

Partial Is Better Than All: Revisiting Fine-tuning Strategy for Few-shot Learning

The goal of few-shot learning is to learn a classifier that can recognize unseen classes from limited support data with labels. A common practice for this task is to train a model on the base set first and then transfer to novel classes through fine-tuning (Here fine-tuning procedure is defined as transferring knowledge from base to novel data, i.e. learning to transfer in few-shot scenario.) or meta-learning. However, as the base classes have no overlap to the novel set, simply transferring whole knowledge from base data is not an optimal solution since some knowledge in the base model may be biased or even harmful to the novel class. In this paper, we propose to transfer partial knowledge by freezing or fine-tuning particular layer(s) in the base model. Specifically, layers will be imposed different learning rates if they are chosen to be fine-tuned, to control the extent of preserved transferability. To determine which layers to be recast and what values of learning rates for them, we introduce an evolutionary search based method that is efficient to simultaneously locate the target layers and determine their individual learning rates. We conduct extensive experiments on CUB and mini-ImageNet to demonstrate the effectiveness of our proposed method. It achieves the state-of-the-art performance on both meta-learning and non-meta based frameworks. Furthermore, we extend our method to the conventional pre-training + fine-tuning paradigm and obtain consistent improvement.

preprint · 2021 · arXiv

Structural and spectral properties of Galactic plane variable radio sources

In the time domain, the radio sky, in particular along the Galactic plane direction, may vary significantly because of various energetic activities associated with stars and with stellar and supermassive black holes. Using multi-epoch Very Large Array surveys of the Galactic plane at 5.0 GHz, Becker et al. (2010) presented a catalogue of 39 variable radio sources in the flux density range 1-70 mJy. To probe their radio structures and spectra, we observed 17 sources with the very-long-baseline interferometric (VLBI) imaging technique and collected additional multi-frequency data from the literature. We detected all of the sources at 5 GHz with the Westerbork Synthesis Radio Telescope, but only G23.6644-0.0372 with the European VLBI Network (EVN). Together with its decadal variability and multi-frequency radio spectrum, we interpret it as an extragalactic peaked-spectrum source with a size of <~10 pc. The remaining sources were resolved out by the long baselines of the EVN because of either strong scatter broadening at Galactic latitudes <1 deg or intrinsically very extended structures on centi-arcsec scales. According to their spectral and structural properties, we find that the sample has a diverse nature. We notice two young H II regions and spot a radio star and a candidate planetary nebula. The rest of the sources are very likely associated with radio active galactic nuclei (AGN). Two of them also display arcsec-scale faint jet activity. The sample study indicates that AGN are commonplace even among variable radio sources in the Galactic plane.

preprint2021arXiv

The intrinsic structure of Sagittarius A* at 1.3 cm and 7 mm

Sagittarius A* (Sgr A*), the Galactic Center supermassive black hole (SMBH), is one of the best targets for resolving the innermost region of an SMBH with very long baseline interferometry (VLBI). In this study, we have carried out observations toward Sgr A* at 1.349 cm (22.223 GHz) and 6.950 mm (43.135 GHz) with the East Asian VLBI Network, as a part of the multi-wavelength campaign of the Event Horizon Telescope (EHT) in 2017 April. To mitigate scattering effects, the physically motivated scattering kernel model from Psaltis et al. (2018) and the scattering parameters from Johnson et al. (2018) have been applied. As a result, a single, symmetric Gaussian model describes the intrinsic structure of Sgr A* well at both wavelengths. From closure amplitudes, the major-axis sizes are $\sim$704$\pm$102 $μ$as (axial ratio $\sim$1.19$^{+0.24}_{-0.19}$) and $\sim$300$\pm$25 $μ$as (axial ratio $\sim$1.28$\pm$0.2) at 1.349 cm and 6.95 mm, respectively. Together with a quasi-simultaneous observation at 3.5 mm (86 GHz) by Issaoun et al. (2019), we show that the intrinsic size scales with observing wavelength as a power law, with an index $\sim$1.2$\pm$0.2. Our results also provide estimates of the size and compact flux density at 1.3 mm, which can be incorporated into the analysis of the EHT observations. In terms of the origin of radio emission, we have compared the intrinsic structures with the accretion flow scenario, especially the radiatively inefficient accretion flow based on the Keplerian shell model. With this, we show that a nonthermal electron population is necessary to reproduce the source sizes.
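As a rough illustration of the power-law scaling mentioned above, the following minimal sketch estimates the index from only the two major-axis sizes quoted in the abstract. It ignores the measurement uncertainties and the 3.5 mm data point that the authors also use, so it is a consistency check under those assumptions rather than a reproduction of their fit. The function name `power_law_index` is hypothetical.

```python
import math

# Central values quoted in the abstract: (wavelength in mm, major-axis size in microarcseconds).
SIZE_1P3_CM = (13.49, 704.0)
SIZE_7_MM = (6.95, 300.0)

def power_law_index(point_a, point_b):
    """Index beta of size ∝ wavelength**beta passing through two (wavelength, size) points."""
    (lam_a, theta_a), (lam_b, theta_b) = point_a, point_b
    return math.log(theta_a / theta_b) / math.log(lam_a / lam_b)

# Two-point estimate; uncertainties are deliberately not propagated in this sketch.
beta_two_point = power_law_index(SIZE_1P3_CM, SIZE_7_MM)
print(f"two-point index: {beta_two_point:.2f}")
```

The two-point value comes out near 1.29, which lies inside the quoted range of 1.2 ± 0.2.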

preprint2021arXiv

The TMRT K Band Observations towards 26 Infrared Dark Clouds: NH$_{3}$, CCS, and HC$_{3}$N

We present one of the first Shanghai Tian Ma Radio Telescope (TMRT) K Band observations towards a sample of 26 infrared dark clouds (IRDCs). We observed the (1,1), (2,2), (3,3), and (4,4) transitions of NH$_{3}$ together with CCS (2$_{1}$-1$_{0}$) and HC$_{3}$N $J\,$=2-1, simultaneously. The survey dramatically increases the existing CCS-detected IRDC sample from 8 to 23, enabling a better statistical study of the ratios of carbon-chain molecules (CCM) to N-bearing molecules in IRDCs. With the newly developed hyperfine group ratio (HFGR) method of fitting NH$_{3}$ inversion lines, we found the gas temperature to be between 10 and 18 K. The column density ratios of CCS to NH$_{3}$ for most of the IRDCs are less than 10$^{-2}$, distinguishing IRDCs from low-mass star-forming regions. We carried out chemical evolution simulations based on a three-phase chemical model NAUTILUS. Our measurements of the column density ratios between CCM and NH$_{3}$ are consistent with chemical evolutionary ages of $\lesssim$10$^{5}$ yr in the models. Comparisons of the data and chemical models suggest that CCS, HC$_{3}$N, and NH$_{3}$ are sensitive to the chemical evolutionary stages of the sources.

preprint2020arXiv

Attentive CutMix: An Enhanced Data Augmentation Approach for Deep Learning Based Image Classification

Convolutional neural networks (CNNs) are capable of learning robust representations with different regularization methods and activations, as convolutional layers are spatially correlated. Based on this property, a large variety of regional dropout strategies have been proposed, such as Cutout, DropBlock, and CutMix. These methods aim to help the network generalize better by partially occluding the discriminative parts of objects. However, all of them perform this operation randomly, without capturing the most important region(s) within an object. In this paper, we propose Attentive CutMix, a naturally enhanced augmentation strategy based on CutMix. In each training iteration, we choose the most descriptive regions based on the intermediate attention maps from a feature extractor, which enables searching for the most discriminative parts in an image. Our proposed method is simple yet effective, easy to implement, and can boost the baseline significantly. Extensive experiments on the CIFAR-10/100 and ImageNet datasets with various CNN architectures (in a unified setting) demonstrate the effectiveness of our proposed method, which consistently outperforms the baseline CutMix and other methods by a significant margin.
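The attention-guided patch selection described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the grid size, the `top_n` value, the assumption that the attention map is already computed by some feature extractor, and the function name `attentive_cutmix` are all assumptions made for the example.

```python
import numpy as np

def attentive_cutmix(target, source, attention, top_n=6):
    """Paste the top_n most-attended grid cells of `source` onto `target`.

    target, source: arrays of shape (H, W, C), with H and W divisible by the grid size.
    attention: square (G, G) attention map for `source` (assumed to be given).
    Returns the mixed image and the fraction of pixels taken from `source`.
    """
    g = attention.shape[0]
    h, w = target.shape[:2]
    cell_h, cell_w = h // g, w // g
    mixed = target.copy()
    # Flattened indices of the grid cells with the highest attention values.
    top_cells = np.argsort(attention.ravel())[::-1][:top_n]
    for idx in top_cells:
        r, c = divmod(int(idx), g)
        rows = slice(r * cell_h, (r + 1) * cell_h)
        cols = slice(c * cell_w, (c + 1) * cell_w)
        mixed[rows, cols] = source[rows, cols]
    source_fraction = top_n * cell_h * cell_w / (h * w)
    return mixed, source_fraction

# Toy usage: a 4x4 single-channel image pair and a 2x2 attention grid.
target_img = np.zeros((4, 4, 1))
source_img = np.ones((4, 4, 1))
attn = np.array([[0.1, 0.9], [0.3, 0.2]])
mixed_img, frac = attentive_cutmix(target_img, source_img, attn, top_n=1)
```

Following the usual CutMix convention, the returned fraction could be used to mix the labels, for example `frac * y_source + (1 - frac) * y_target`; that label rule is likewise an assumption here.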

preprint2020arXiv

Binarizing MobileNet via Evolution-based Searching

Binary Neural Networks (BNNs), known to be among the most effective compact network architectures, have achieved strong results in visual tasks. Designing efficient binary architectures is not trivial due to the binary nature of the network. In this paper, we propose the use of evolutionary search to facilitate the construction and training scheme when binarizing MobileNet, a compact network with separable depth-wise convolution. Inspired by one-shot architecture search frameworks, we exploit the idea of group convolution to design efficient 1-Bit Convolutional Neural Networks (CNNs), aiming for an approximately optimal trade-off between computational cost and model accuracy. Our objective is to come up with a tiny yet efficient binary neural architecture by exploring the best candidates of the group convolution while optimizing the model performance in terms of complexity and latency. The approach is threefold. First, we train strong baseline binary networks with a wide range of random group combinations at each convolutional layer. This set-up gives the binary neural networks the capability of preserving essential information through layers. Second, to find a good set of hyperparameters for group convolutions, we use evolutionary search, which supports the exploration of efficient 1-bit models. Lastly, these binary models are trained from scratch in the usual manner to obtain the final binary model. Various experiments on ImageNet show that, following our construction guideline, the final model achieves 60.09% Top-1 accuracy and outperforms the state-of-the-art CI-BCNN at the same computational cost.

preprint2020arXiv

Channel-wise Alignment for Adaptive Object Detection

Generic object detection has been immensely promoted by the development of deep convolutional neural networks in the past decade. However, under domain shift, changes in weather, illumination, etc., often cause a domain gap, and performance thus drops substantially when detecting objects from one domain to another. Existing methods for this task usually focus on high-level alignment based on the whole image or object of interest, which naturally cannot fully utilize the fine-grained channel information. In this paper, we realize adaptation from a thoroughly different perspective, i.e., channel-wise alignment. Motivated by the finding that each channel focuses on a specific pattern (e.g., on special semantic regions, such as car), we aim to align the distributions of the source and target domains at the channel level, which is finer for integration between discrepant domains. Our method mainly consists of self channel-wise and cross channel-wise alignment. These two parts explore the inner-relation and cross-relation of attention regions implicitly from the view of channels. Furthermore, we also propose an RPN domain classifier module to obtain a domain-invariant RPN network. Extensive experiments show that the proposed method performs notably better than existing methods, with about 5% improvement under various domain-shift settings. Experiments on a different task (e.g., instance segmentation) also demonstrate its good scalability.

preprint2020arXiv

Cross-Supervised Object Detection

After learning a new object category from image-level annotations (with no object bounding boxes), humans are remarkably good at precisely localizing those objects. However, building good object localizers (i.e., detectors) currently requires expensive instance-level annotations. While some work has been done on learning detectors from weakly labeled samples (with only class labels), these detectors do poorly at localization. In this work, we show how to build better object detectors from weakly labeled images of new categories by leveraging knowledge learned from fully labeled base categories. We call this novel learning paradigm cross-supervised object detection. We propose a unified framework that combines a detection head trained from instance-level annotations and a recognition head learned from image-level annotations, together with a spatial correlation module that bridges the gap between detection and recognition. These contributions enable us to better detect novel objects with image-level annotations in complex multi-object scenes such as the COCO dataset.

preprint2020arXiv

DR 21 South Filament: a Parsec-sized Dense Gas Accretion Flow onto the DR 21 Massive Young Cluster

DR21 south filament (DR21SF) is a unique component of the giant network of filamentary molecular clouds in the northern region of the Cygnus X complex. Unlike the highly fragmented, actively star-forming environment in which it resides, DR21SF exhibits a coherent profile in the column density map with very few star formation signposts, even though the previously reported linear density of the filament is an order of magnitude higher than the threshold for thermal stability. We derive the size (3.6 pc by 0.13 pc), temperature (10 to 15 K), and mass (1048 $M_\odot$) of DR21SF from Shanghai 65 m TianMa Radio Telescope (TMRT) observations of NH$_3$ (1, 1) and (2, 2) inversion lines in conjunction with the column density map from our previous work. Star-forming sites are identified along the filament where the gas temperature is elevated. We find clear gradients in radial velocity and intrinsic line-width along the spine of the filament. The gradients can be well interpreted with a scenario of an accretion flow feeding DR 21 at a mass transfer rate of $1.1 \times 10^{-3}$ $M_\odot$ yr$^{-1}$. Based on the analysis of its kinematic temperature, intrinsic line-width and mass distribution, we conclude that DR21SF is in an overall trans-critical status, which indicates an early evolutionary stage.

preprint2020arXiv

ReActNet: Towards Precise Binary Neural Network with Generalized Activation Functions

In this paper, we propose several ideas for enhancing a binary network to close its accuracy gap from real-valued networks without incurring any additional computational cost. We first construct a baseline network by modifying and binarizing a compact real-valued network with parameter-free shortcuts, bypassing all the intermediate convolutional layers including the downsampling layers. This baseline network strikes a good trade-off between accuracy and efficiency, outperforming most existing binary networks at approximately half of the computational cost. Through extensive experiments and analysis, we observed that the performance of binary networks is sensitive to activation distribution variations. Based on this important observation, we propose to generalize the traditional Sign and PReLU functions, denoted as RSign and RPReLU for the respective generalized functions, to enable explicit learning of the distribution reshape and shift at near-zero extra cost. Lastly, we adopt a distributional loss to further enforce the binary network to learn output distributions similar to those of a real-valued network. We show that after incorporating all these ideas, the proposed ReActNet outperforms all state-of-the-art methods by a large margin. Specifically, it outperforms Real-to-Binary Net and MeliusNet29 by 4.0% and 3.6%, respectively, in top-1 accuracy and also reduces the gap to its real-valued counterpart to within 3.0% top-1 accuracy on the ImageNet dataset. Code and models are available at: https://github.com/liuzechun/ReActNet.
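The abstract does not spell out the generalized activations, so the following is only a sketch of one plausible way to add learnable shifts to Sign and PReLU. The parameter names (`alpha`, `gamma`, `zeta`, `beta`) and the scalar, non-learned form used here are assumptions for illustration; the linked repository remains the reference for the actual definitions.

```python
def rsign(x, alpha):
    """Sign with a shifted threshold: +1 above alpha, -1 otherwise."""
    return 1.0 if x > alpha else -1.0

def rprelu(x, gamma, zeta, beta):
    """PReLU applied around a shifted origin (gamma), then shifted in output by zeta.

    beta is the slope used on the negative side of the shifted input.
    """
    shifted = x - gamma
    activated = shifted if shifted > 0 else beta * shifted
    return activated + zeta

# With zero shifts the functions reduce to plain Sign and PReLU.
plain_sign = rsign(0.3, alpha=0.0)
plain_prelu = rprelu(-2.0, gamma=0.0, zeta=0.0, beta=0.25)
shifted_sign = rsign(0.3, alpha=0.5)
shifted_prelu = rprelu(0.0, gamma=1.0, zeta=0.5, beta=0.25)
```

In a real network these shifts would typically be learnable per-channel parameters updated by backpropagation; the scalar version above only shows how a shift moves the activation's threshold and output level.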

preprint2020arXiv

Soft Anchor-Point Object Detection

Recently, anchor-free detection methods have made great progress. The two major families, anchor-point detection and key-point detection, are at opposite ends of the speed-accuracy trade-off, with anchor-point detectors having the speed advantage. In this work, we boost the performance of the anchor-point detector over the key-point counterparts while maintaining the speed advantage. To achieve this, we formulate the detection problem from the anchor point's perspective and identify ineffective training as the main problem. Our key insight is that anchor points should be optimized jointly as a group both within and across feature pyramid levels. We propose a simple yet effective training strategy with soft-weighted anchor points and soft-selected pyramid levels to address the false attention issue within each pyramid level and the feature selection issue across all the pyramid levels, respectively. To evaluate the effectiveness, we train a single-stage anchor-free detector called Soft Anchor-Point Detector (SAPD). Experiments show that our concise SAPD pushes the envelope of speed/accuracy trade-off to a new level, outperforming recent state-of-the-art anchor-free and anchor-based detectors. Without bells and whistles, our best model can achieve a single-model single-scale AP of 47.4% on COCO.

preprint2020arXiv

Solving Missing-Annotation Object Detection with Background Recalibration Loss

This paper focuses on a novel and challenging detection scenario: a majority of true objects/instances are unlabeled in the datasets, so these missing-label areas are regarded as background during training. Previous work on this problem has proposed using soft sampling to re-weight the gradients of RoIs based on their overlaps with positive instances, but that method is mainly based on the two-stage detector (i.e., Faster RCNN), which is more robust and friendly to the missing-label scenario. In this paper, we introduce a superior solution called Background Recalibration Loss (BRL) that can automatically re-calibrate the loss signals according to the pre-defined IoU threshold and input image. Our design is built on the one-stage detector, which is faster and lighter. Inspired by the Focal Loss formulation, we make several significant modifications to suit the missing-annotation setting. We conduct extensive experiments on the curated PASCAL VOC and MS COCO datasets. The results demonstrate that our proposed method outperforms the baseline and other state-of-the-art methods by a large margin. Code available: https://github.com/Dwrety/mmdetection-selective-iou.

preprint2020arXiv

Space VLBI 2020: Science and Technology Futures Conference Summary

The "Space VLBI 2020: Science and Technology Futures" meeting was the second in The Future of High-Resolution Radio Interferometry in Space series. The first meeting (2018 September 5-6; Noordwijk, the Netherlands) focused on the full range of science applications possible for very long baseline interferometry (VLBI) with space-based antennas. Accordingly, the observing frequencies (wavelengths) considered ranged from below 1 MHz (> 300 m) to above 300 GHz (< 1 mm). For this second meeting, the focus was narrowed to mission concepts and the supporting technologies to enable the highest angular resolution observations at frequencies of 30 GHz and higher (< 1 cm). This narrowing of focus was driven by both scientific and technical considerations. First, results from the RadioAstron mission and the Event Horizon Telescope (EHT) have generated considerable excitement for studying the inner portions of black hole (BH) accretion disks and jets and testing elements of the General Theory of Relativity (GR). Second, the technologies and requirements involved in space-based VLBI differ considerably between 100 MHz and 100 GHz; a related consideration is that there are a number of existing instruments or mission concepts for frequencies of approximately 100 MHz and below, while it has been some time since attention has been devoted to space VLBI at frequencies above 10 GHz. This conference summary attempts to capture elements of presentations and discussions that occurred.

preprint2020arXiv

The radio properties of the OH megamaser galaxy IRAS 02524+2046

We present results from VLBI observations of continuum and OH line emission in IRAS 02524+2046, together with the arcsecond-scale radio properties of this galaxy derived from VLA archive data. The VLBI observations yield no significant detection of radio continuum emission. The arcsecond-scale radio images of this source show no clear extended emission. The total radio flux densities at L and C band are around 2.9 mJy and 1.0 mJy, respectively, which indicates a steep radio spectral index between the two bands. The steep spectral index, low brightness temperature, and high $q$-ratio (the ratio of FIR to radio flux density), three critical indicators for classifying radio activity in galactic nuclei, are all consistent with the classification of this source as a starburst galaxy from its optical spectrum. The high-resolution line profiles show that both the 1665 and 1667 MHz OH maser lines are detected, with three and two clear components, respectively. The channel maps show that the maser emission is distributed over a region of $\sim$210 pc $\times$ 90 pc. The maser components detected in different regions show similar double spectral features, which may be evidence that this galaxy is undergoing a major merger, as suggested by its optical morphology.
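To make the "steep spectral index" statement concrete, the sketch below computes a two-point spectral index from the quoted flux densities. The abstract gives only the band names, so the frequencies used here (1.4 GHz for L band and 4.9 GHz for C band) are assumed nominal values, and the function name `spectral_index` is hypothetical. The convention is $S \propto \nu^{\alpha}$.

```python
import math

def spectral_index(flux_low, freq_low, flux_high, freq_high):
    """Two-point spectral index alpha, with flux density S ∝ frequency**alpha."""
    return math.log(flux_high / flux_low) / math.log(freq_high / freq_low)

# Flux densities (mJy) from the abstract; band frequencies (GHz) are assumed nominal values.
L_BAND_GHZ, C_BAND_GHZ = 1.4, 4.9
alpha = spectral_index(2.9, L_BAND_GHZ, 1.0, C_BAND_GHZ)
print(f"alpha = {alpha:.2f}")
```

Under these assumed frequencies the index is about -0.85. Spectra with an index below roughly -0.5 are commonly described as steep, so the result agrees with the abstract's description; a different choice of band frequencies would shift the exact value.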

preprint2019arXiv

Kinematics of the M87 jet in the collimation zone: gradual acceleration and velocity stratification

We study the kinematics of the M87 jet using the first year data of the KVN and VERA Array (KaVA) large program, which has densely monitored the jet at 22 and 43 GHz since 2016. We find that the apparent jet speeds generally increase from $\approx0.3c$ at $\approx0.5$ mas from the jet base to $\approx2.7c$ at $\approx20$ mas, indicating that the jet is accelerated from subluminal to superluminal speeds on these scales. We perform a complementary jet kinematic analysis by using archival Very Long Baseline Array monitoring data observed in $2005-2009$ at 1.7 GHz and find that the jet is moving at relativistic speeds up to $\approx5.8c$ at distances of $200-410$ mas. We combine the two kinematic results and find that the jet is gradually accelerated over a broad distance range that coincides with the jet collimation zone, implying that conversion of Poynting flux to kinetic energy flux takes place. If the jet emission consists of a single streamline, the observed trend of jet acceleration ($Γ\propto z^{0.16\pm0.01}$) is relatively slow compared to models of a highly magnetized jet. This indicates that Poynting flux conversion through the differential collimation of poloidal magnetic fields may be less efficient than expected. However, we find a non-negligible dispersion in the observed speeds for a given jet distance, making it difficult to describe the jet velocity field with a single power-law acceleration function. We discuss the possibility that the jet emission consists of multiple streamlines following different acceleration profiles, resulting in jet velocity stratification.

preprint2019arXiv

Mapping Observations of complex organic molecules around Sagittarius B2 with ARO 12m telescope

We performed high-sensitivity mapping observations of several complex organic molecules around Sagittarius B2 with the ARO 12m telescope at 3-mm wavelength. Based on their spatial distribution, molecules can be classified as either "extended", detected not only in Sgr B2(N) and Sgr B2(M), or "compact", detected only toward or near Sgr B2(N) and Sgr B2(M). The "extended" molecules include glycolaldehyde (CH2OHCHO), methyl formate (CH3OCHO), formic acid (t-HCOOH), ethanol (C2H5OH) and methyl amine (CH3NH2), while the "compact" molecules include dimethyl ether (CH3OCH3), ethyl cyanide (C2H5CN), and amino acetonitrile (H2NCH2CN). These "compact" molecules are likely produced under strong UV radiation, while "extended" molecules are likely formed at low temperature, via gas-phase or grain-surface reactions. The spatial distribution of "warm" CH2OHCHO at 89 GHz differs from the spatial distribution of "cold" CH2OHCHO observed at 13 GHz. We found evidence for an overabundance of CH2OHCHO compared to that expected from the gas-phase model, which indicates that grain-surface reactions are necessary to explain the origin of CH2OHCHO in Sagittarius B2. Grain-surface reactions are also needed to explain the correlation between the abundances of "cold" CH2OHCHO and C2H5OH. These results demonstrate the importance of grain-surface chemistry in the production of complex organic molecules.