Source author record

Yixiao Zhang

Yixiao Zhang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Topics: Computer Vision, Computation and Language, eess.AS, Machine Learning, Sound, Artificial Intelligence, astro-ph.EP, eess.IV, math.CO

Catalog footprint

What is connected

9 works

9 topics

4 close collaborators

preprint, 2022, arXiv

A Light-weight Interpretable Compositional Model for Nuclei Detection and Weakly-Supervised Segmentation

Computational pathology has advanced greatly since deep neural networks became widely applied. These networks usually require large amounts of annotated data to train their many parameters. However, annotating a large histopathology dataset takes significant effort. We introduce a light-weight and interpretable model for nuclei detection and weakly-supervised segmentation. It requires annotations only on isolated nuclei, rather than on all nuclei in the dataset. Moreover, it is a generative compositional model that first locates parts of a nucleus, then learns the spatial correlation of the parts to further locate the nucleus. This process makes its predictions interpretable. Empirical results on an in-house dataset show that in detection, the proposed method achieved comparable or better performance than its deep network counterparts, especially when annotated data is limited. It also outperforms popular weakly-supervised segmentation methods. The proposed method could be an alternative solution for the data-hungry problem of deep learning methods.

preprint, 2022, arXiv

Fast AdvProp

Adversarial Propagation (AdvProp) is an effective way to improve recognition models, leveraging adversarial examples. Nonetheless, AdvProp suffers from extremely slow training, mainly because: a) extra forward and backward passes are required for generating adversarial examples; b) both original samples and their adversarial counterparts are used for training (i.e., 2$\times$ data). In this paper, we introduce Fast AdvProp, which aggressively revamps AdvProp's costly training components, rendering the method nearly as cheap as vanilla training. Specifically, our modifications in Fast AdvProp are guided by the hypothesis that disentangled learning with adversarial examples is the key to performance improvements, while other training recipes (e.g., paired clean and adversarial training samples, multi-step adversarial attackers) could be largely simplified. Our empirical results show that, compared to the vanilla training baseline, Fast AdvProp is able to further improve model performance on a spectrum of visual benchmarks, without incurring extra training cost. Additionally, our ablations find that Fast AdvProp scales better if larger models are used, is compatible with existing data augmentation methods (i.e., Mixup and CutMix), and can be easily adapted to other recognition tasks like object detection. The code is available here: https://github.com/meijieru/fast_advprop.

preprint, 2022, arXiv

Generating non-jumps from a known one

Let $r\ge 2$ be an integer. The real number $α\in [0,1]$ is a jump for $r$ if there exists a constant $c > 0$ such that for any $ε>0$ and any integer $m \geq r$, there exists an integer $n_0(ε, m)$ such that any $r$-uniform graph with $n\ge n_0(ε, m)$ vertices and density at least $α+ε$ contains a subgraph with $m$ vertices and density at least $α+c$. A result of Erdős, Stone and Simonovits implies that every $α\in [0,1)$ is a jump for $r=2$. Erdős asked whether the same is true for $r\ge 3$. Frankl and Rödl gave a negative answer by showing that $1-\frac{1}{l^{r-1}}$ is not a jump for $r$ if $r\ge 3$ and $l>2r$. After that, more non-jumps were found using a method of Frankl and Rödl. In this note, we show a method to construct maps $f \colon [0,1] \to [0,1]$ that preserve non-jumps: if $α$ is a non-jump for $r$ given by the method of Frankl and Rödl, then $f(α)$ is also a non-jump for $r$. We use these maps to study hypergraph Turán densities and answer a question posed by Grosu.
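The quantities in this abstract can be made concrete with a small sketch. It assumes the standard definition of density for an $r$-uniform graph on $n$ vertices, namely the number of edges divided by $\binom{n}{r}$ (the abstract itself does not spell this out), and it evaluates the Frankl and Rödl values $1-\frac{1}{l^{r-1}}$ quoted above. The function names `density` and `frankl_rodl_non_jump` are hypothetical and chosen only for illustration; the sketch does not implement the maps $f$ constructed in the paper.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def density(n, r, edges):
    """Edge density of an r-uniform graph on n vertices: |E| / C(n, r).

    Assumption: this is the standard density definition, not one
    quoted from the abstract.
    """
    if n < r:
        raise ValueError("need at least r vertices")
    distinct = {frozenset(e) for e in edges}  # ignore repeated edges
    if any(len(e) != r for e in distinct):
        raise ValueError("every edge must have exactly r distinct vertices")
    return Fraction(len(distinct), comb(n, r))


def frankl_rodl_non_jump(r, l):
    """Value 1 - 1/l^(r-1), stated to be a non-jump for r >= 3 and l > 2r."""
    if r < 3 or l <= 2 * r:
        raise ValueError("the stated result needs r >= 3 and l > 2r")
    return 1 - Fraction(1, l ** (r - 1))


# The complete 3-uniform graph on 4 vertices contains every triple.
complete = list(combinations(range(4), 3))
print(density(4, 3, complete))
print(frankl_rodl_non_jump(3, 7))
```

Exact fractions are used instead of floats because the non-jump values are rational and lie very close to 1 as $l$ grows, so rounding could hide the difference.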

preprint, 2022, arXiv

Interpreting Song Lyrics with an Audio-Informed Pre-trained Language Model

Lyric interpretations can help people understand songs and their lyrics quickly, and can also make it easier to manage, retrieve and discover songs efficiently from the growing mass of music archives. In this paper we propose BART-fusion, a novel model for generating lyric interpretations from lyrics and music audio that combines a large-scale pre-trained language model with an audio encoder. We employ a cross-modal attention module to incorporate the audio representation into the lyrics representation to help the pre-trained language model understand the song from an audio perspective, while preserving the language model's original generative performance. We also release the Song Interpretation Dataset, a new large-scale dataset for training and evaluating our model. Experimental results show that the additional audio information helps our model to understand words and music better, and to generate precise and fluent interpretations. An additional experiment on cross-modal music retrieval shows that interpretations generated by BART-fusion can also help people retrieve music more accurately than with the original BART.

preprint, 2020, arXiv

C2FNAS: Coarse-to-Fine Neural Architecture Search for 3D Medical Image Segmentation

3D convolutional neural networks (CNNs) have proved very successful in parsing organs or tumours in 3D medical images, but it remains sophisticated and time-consuming to choose or design proper 3D networks given different task contexts. Recently, Neural Architecture Search (NAS) has been proposed to solve this problem by searching for the best network architecture automatically. However, inconsistency between the search stage and the deployment stage often exists in NAS algorithms due to memory constraints and a large search space, and it could become more serious when applying NAS to memory- and time-consuming tasks such as 3D medical image segmentation. In this paper, we propose coarse-to-fine neural architecture search (C2FNAS) to automatically search a 3D segmentation network from scratch without inconsistency in network size or input size. Specifically, we divide the search procedure into two stages: 1) the coarse stage, where we search the macro-level topology of the network, i.e. how each convolution module is connected to other modules; 2) the fine stage, where we search at the micro level for operations in each cell based on the previously searched macro-level topology. The coarse-to-fine manner divides the search procedure into two consecutive stages and meanwhile resolves the inconsistency. We evaluate our method on 10 public datasets from the Medical Segmentation Decathlon (MSD) challenge, and achieve state-of-the-art performance with the network searched using one dataset, which demonstrates the effectiveness and generalization of our searched models.

preprint, 2020, arXiv

Learning Interpretable Representation for Controllable Polyphonic Music Generation

While deep generative models have become the leading methods for algorithmic composition, it remains a challenging problem to control the generation process because the latent variables of most deep-learning models lack good interpretability. Inspired by the content-style disentanglement idea, we design a novel architecture, under the VAE framework, that effectively learns two interpretable latent factors of polyphonic music: chord and texture. The current model focuses on learning 8-beat long piano composition segments. We show that such chord-texture disentanglement provides a controllable generation pathway leading to a wide spectrum of applications, including compositional style transfer, texture variation, and accompaniment arrangement. Both objective and subjective evaluations show that our method achieves a successful disentanglement and high quality controlled music generation.

preprint, 2020, arXiv

PIANOTREE VAE: Structured Representation Learning for Polyphonic Music

The dominant approach for music representation learning involves the deep unsupervised model family of variational autoencoders (VAE). However, most, if not all, viable attempts on this problem have largely been limited to monophonic music. Normally composed of richer modality and more complex musical structures, the polyphonic counterpart has yet to be addressed in the context of music representation learning. In this work, we propose the PianoTree VAE, a novel tree-structure extension of the VAE aiming to fit polyphonic music learning. The experiments prove the validity of the PianoTree VAE via (i) semantically meaningful latent codes for polyphonic segments; (ii) more satisfactory reconstruction, alongside decent geometry learned in the latent space; (iii) the model's benefits to a variety of downstream music generation tasks.

preprint, 2020, arXiv

Small Sensitivity of the Simulated Climate of Tidally Locked Aquaplanets to Model Resolution

Tidally locked terrestrial planets around low-mass stars are the prime targets in the search for potentially habitable exoplanets. Several atmospheric general circulation models have been employed to simulate their possible climates; however, model intercomparisons showed that there are large differences in the results of the models even when they are forced with the same boundary conditions. In this paper, we examine whether model resolution contributes to the differences. Using the atmospheric general circulation model ExoCAM coupled to a 50-m slab ocean, we examine three different horizontal resolutions (440 km × 550 km, 210 km × 280 km, and 50 km × 70 km in latitude and longitude) and three different vertical resolutions (26, 51, and 74 levels) under the same dynamical core and the same schemes of radiation, convection and clouds. Among the experiments, the differences are within 5 K in global-mean surface temperature and within 0.007 in planetary albedo. These differences arise from cloud feedback, water vapor feedback, and the decreasing trend of relative humidity with increasing resolution. As the resolution is increased, relatively small-scale downdrafts between upwelling columns over the substellar region are better resolved, and the mixing between dry and wet air parcels and between anvil clouds and their environment is enhanced. These reduce atmospheric relative humidity and high-level cloud fraction, causing a lower clear-sky greenhouse effect, a weaker cloud longwave radiation effect, and subsequently a cooler climate with increasing model resolution. Overall, the sensitivity of the simulated climate of tidally locked aquaplanets to model resolution is small.

preprint, 2020, arXiv

When Radiology Report Generation Meets Knowledge Graph

In recent years, automatic radiology report generation has been an attractive research problem in computer-aided diagnosis, with the aim of alleviating the workload of doctors. Deep learning techniques for natural image captioning have been successfully adapted to generating radiology reports. However, radiology image reporting is different from the natural image captioning task in two aspects: 1) the accuracy of positive disease keyword mentions is critical in radiology image reporting, in comparison to the equivalent importance of every single word in a natural image caption; 2) the evaluation of reporting quality should focus more on matching the disease keywords and their associated attributes instead of counting the occurrence of N-grams. Based on these concerns, we propose to utilize a pre-constructed graph embedding module (modeled with a graph convolutional neural network) on multiple disease findings to assist the generation of reports in this work. The incorporation of the knowledge graph allows for dedicated feature learning for each disease finding and the relationship modeling between them. In addition, we propose a new evaluation metric for radiology image reporting with the assistance of the same composed graph. Experimental results demonstrate the superior performance of the methods integrated with the proposed graph embedding module on a publicly accessible dataset (IU-RR) of chest radiographs compared with previous approaches, using both the conventional evaluation metrics commonly adopted for image captioning and our proposed ones.
