Source author record

Yu Cao

Yu Cao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher. Unclaimed source record.

Catalog footprint

What is connected

34 works

27 topics

4 close collaborators

preprint 2026 arXiv

MAD-OPD: Breaking the Ceiling in On-Policy Distillation via Multi-Agent Debate

On-policy distillation (OPD) trains a student on its own trajectories under token-level teacher supervision, but existing methods are capped by a single-teacher capability ceiling: when the teacher errs, the student inherits the error. OPD also remains largely unexplored in agentic tasks, where per-step errors compound across long trajectories and destabilize training. We propose MAD-OPD (Multi-Agent Debate-driven On-Policy Distillation), which breaks this ceiling by recasting the distillation teacher as a deliberative collective of teachers that debate over the student's on-policy state; the debate produces an emergent collective intelligence that supplies token-level supervision, with each teacher's contribution weighted by its post-debate confidence. To extend OPD to agentic tasks, we also introduce On-Policy Agentic Distillation (OPAD), which adds step-level sampling to stabilize training under multi-step error compounding. We additionally derive a task-adaptive divergence principle, selecting JSD (Jensen-Shannon divergence) for agentic stability and reverse KL (Kullback-Leibler) divergence for code generation, and verify it both theoretically and empirically. Across six teacher-student configurations (Qwen3 and Qwen3.5; 1.7B-14B students, 8B-32B teachers) and five agentic and code benchmarks, MAD-OPD ranks first across all six configurations; on the 14B+8B$\to$4B setting it lifts the agentic average by $+2.4\%$ and the code average by $+3.7\%$ over the stronger single-teacher OPD.
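The two divergences named in the abstract, and the idea of weighting teachers by confidence, can be illustrated with a minimal sketch. It is not the paper's implementation: the function names are hypothetical, and normalizing the post-debate confidences into mixture weights is an assumption, since the abstract does not say how the weights are formed.

```python
import math

def mix_teachers(teacher_probs, confidences):
    """Confidence-weighted average of teacher next-token distributions.
    Assumption: confidences are normalized to sum to 1 and used as mixture weights."""
    total = sum(confidences)
    weights = [c / total for c in confidences]
    vocab = len(teacher_probs[0])
    return [sum(w * p[i] for w, p in zip(weights, teacher_probs)) for i in range(vocab)]

def kl(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def reverse_kl(student, teacher):
    """Reverse KL in the distillation sense: KL(student || teacher)."""
    return kl(student, teacher)

def jsd(p, q):
    """Jensen-Shannon divergence: symmetric and bounded above by ln 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Two teachers over a 2-token vocabulary; the first is three times as confident.
target = mix_teachers([[1.0, 0.0], [0.0, 1.0]], confidences=[3.0, 1.0])
student = [0.6, 0.4]
print(target, reverse_kl(student, target), jsd(student, target))
```

JSD stays bounded even when the two distributions barely overlap, which is one plausible reading of why the abstract associates it with stability, while reverse KL grows without bound when the student puts mass where the teacher puts almost none.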

preprint 2022 arXiv

A Knowledge-Based Decision Support System for In Vitro Fertilization Treatment

In Vitro Fertilization (IVF) is the most widely used Assisted Reproductive Technology (ART). IVF usually involves controlled ovarian stimulation, oocyte retrieval, and fertilization in the laboratory with subsequent embryo transfer. The first two steps correspond to the follicular phase and ovulation in the menstrual cycle; we therefore refer to them as the treatment cycle in our paper. The treatment cycle is crucial because the stimulation medications in IVF treatment are applied directly to patients. To optimize the stimulation effects and reduce the side effects of the stimulation medications, prompt treatment adjustments are needed. In addition, the quality and quantity of the retrieved oocytes have a significant effect on the outcome of the subsequent procedures. To improve the IVF success rate, we propose a knowledge-based decision support system that can provide medical advice on the treatment protocol and medication adjustment for each patient visit during the IVF treatment cycle. Our system is efficient in data processing and lightweight, so it can be easily embedded into electronic medical record systems. Moreover, an oocyte-retrieval-oriented evaluation demonstrates that our system performs well in terms of the accuracy of its advice on protocols and medications.

preprint 2022 arXiv

A Model-Agnostic Data Manipulation Method for Persona-based Dialogue Generation

Towards building intelligent dialogue agents, there has been growing interest in introducing explicit personas into generation models. However, with limited persona-based dialogue data at hand, it may be difficult to train a dialogue generation model well. We point out that the data challenges of this generation task lie in two aspects: first, it is expensive to scale up current persona-based dialogue datasets; second, each data sample in this task is more complex to learn from than conventional dialogue data. To alleviate these data issues, we propose a data manipulation method that is model-agnostic and can be combined with any persona-based dialogue generation model to improve its performance. The original training samples are first distilled so that they can be fitted more easily. Next, we show various effective ways to diversify such easier distilled data. A given base model is then trained on the constructed data curricula, i.e., first on augmented distilled samples and then on the original ones. Experiments illustrate the superiority of our method with two strong base dialogue models (Transformer encoder-decoder and GPT2).

preprint 2022 arXiv

COIN: Communication-Aware In-Memory Acceleration for Graph Convolutional Networks

Graph convolutional networks (GCNs) have shown remarkable learning capabilities when processing graph-structured data found inherently in many application areas. GCNs distribute the outputs of neural networks embedded in each vertex over multiple iterations to take advantage of the relations captured by the underlying graphs. Consequently, they incur a significant amount of computation and irregular communication overheads, which call for GCN-specific hardware accelerators. To this end, this paper presents a communication-aware in-memory computing architecture (COIN) for GCN hardware acceleration. Besides accelerating the computation using custom compute elements (CEs) and in-memory computing, COIN aims to minimize the intra- and inter-CE communication in GCN operations to optimize performance and energy efficiency. Experimental evaluations with widely used datasets show up to a 105x improvement in energy consumption compared to a state-of-the-art GCN accelerator.

preprint 2022 arXiv

DeePKS+ABACUS as a Bridge between Expensive Quantum Mechanical Models and Machine Learning Potentials

Recently, the development of machine learning (ML) potentials has made it possible to perform large-scale and long-time molecular simulations with the accuracy of quantum mechanical (QM) models. However, for high-level QM methods, such as density functional theory (DFT) at the meta-GGA level and/or with exact exchange, quantum Monte Carlo, etc., generating a sufficient amount of data for training an ML potential has remained computationally challenging due to their high cost. In this work, we demonstrate that this issue can be largely alleviated with Deep Kohn-Sham (DeePKS), an ML-based DFT model. DeePKS employs a computationally efficient neural network-based functional model to construct a correction term added to a cheap DFT model. After training, DeePKS yields energies and forces that closely match those of high-level QM methods, but the amount of training data required is orders of magnitude less than that required for training a reliable ML potential. As such, DeePKS can serve as a bridge between expensive QM models and ML potentials: one can generate a decent amount of high-accuracy QM data to train a DeePKS model, and then use the DeePKS model to label a much larger number of configurations to train an ML potential. This scheme for periodic systems is implemented in the DFT package ABACUS, which is open-source and ready for use in various applications.

preprint 2022 arXiv

Dual-CLVSA: a Novel Deep Learning Approach to Predict Financial Markets with Sentiment Measurements

Predicting financial markets is a challenging task. The complexity of this task is mainly due to the interaction between financial markets and market participants, who cannot stay rational all the time and are often affected by emotions such as fear and ecstasy. Building on CLVSA, a state-of-the-art approach for financial market prediction based on a hybrid convolutional LSTM based variational sequence-to-sequence model with attention, we propose a novel deep learning approach, named dual-CLVSA, to predict financial market movement with both trading data and the corresponding social sentiment measurements, each through a separate sequence-to-sequence channel. We evaluate the performance of our approach with backtesting on historical trading data of the SPDR S&P 500 Trust ETF over eight years. The experimental results show that dual-CLVSA can effectively fuse the two types of data, and verify that sentiment measurements are not only informative for financial market predictions but also contain extra profitable features that boost the performance of our prediction system.

preprint 2022 arXiv

Interpretable Proof Generation via Iterative Backward Reasoning

We present IBR, an Iterative Backward Reasoning model for proof generation tasks in rule-based Question Answering (QA), where models are required to reason over a series of textual rules and facts to find the related proof path and derive the final answer. We address the limitations of existing works in two ways: 1) we enhance the interpretability of reasoning procedures with detailed tracking, by predicting nodes and edges in the proof path iteratively backward from the question; 2) we improve efficiency and accuracy by reasoning on elaborate representations of nodes and history paths, without any intermediate texts that may introduce external noise during proof generation. IBR has three main modules: QA and proof strategy prediction, which obtains the answer and offers guidance for the following procedure; parent node prediction, which determines the node in the existing proof that a new child node will link to; and child node prediction, which determines which new node will be added to the proof. Experiments on both synthetic and paraphrased datasets demonstrate that IBR has better in-domain performance and cross-domain transferability than several strong baselines. Our code and models are available at https://github.com/find-knowledge/IBR .

preprint 2022 arXiv

Phrase-level Textual Adversarial Attack with Label Preservation

Generating high-quality textual adversarial examples is critical for investigating the pitfalls of natural language processing (NLP) models and further promoting their robustness. Existing attacks are usually realized through word-level or sentence-level perturbations, which either limit the perturbation space or sacrifice fluency and textual quality, both of which reduce attack effectiveness. In this paper, we propose Phrase-Level Textual Adversarial aTtack (PLAT), which generates adversarial samples through phrase-level perturbations. PLAT first uses a syntactic parser to extract vulnerable phrases as attack targets, and then perturbs them with a pre-trained blank-infilling model. This flexible perturbation design substantially expands the search space for more effective attacks without introducing too many modifications, while maintaining textual fluency and grammaticality via contextualized generation using the surrounding text. Moreover, we develop a label-preservation filter that leverages the likelihoods of language models fine-tuned on each class, rather than textual similarity, to rule out perturbations that could alter the original class label for humans. Extensive experiments and human evaluation demonstrate that PLAT achieves superior attack effectiveness as well as better label consistency than strong baselines.

preprint 2022 arXiv

Swin-Pose: Swin Transformer Based Human Pose Estimation

Convolutional neural networks (CNNs) have been widely used in many computer vision tasks. However, CNNs have a fixed receptive field and lack the ability of long-range perception, which is crucial to human pose estimation. Owing to its capability to capture long-range dependencies between pixels, the transformer architecture has recently been adopted in computer vision applications and has proven to be highly effective. We are interested in exploring its capability in human pose estimation, and thus propose a novel model based on the transformer architecture, enhanced with a feature pyramid fusion structure. More specifically, we use a pre-trained Swin Transformer as our backbone to extract features from input images, and we leverage a feature pyramid structure to extract feature maps from different stages. By fusing the features together, our model predicts the keypoint heatmap. The experimental results of our study demonstrate that the proposed transformer-based model can achieve better performance than state-of-the-art CNN-based models.

preprint 2021 arXiv

CLVSA: A Convolutional LSTM Based Variational Sequence-to-Sequence Model with Attention for Predicting Trends of Financial Markets

Financial markets are a complex dynamical system. The complexity comes from the interaction between a market and its participants: the integrated outcome of the activities of all participants determines the market's trend, while the market's trend affects the activities of participants. These interwoven interactions keep financial markets evolving. Inspired by stochastic recurrent models that successfully capture variability observed in natural sequential data such as speech and video, we propose CLVSA, a hybrid model that consists of stochastic recurrent networks, the sequence-to-sequence architecture, the self- and inter-attention mechanism, and convolutional LSTM units to capture variationally underlying features in raw financial trading data. Our model outperforms basic models, such as a convolutional neural network, a vanilla LSTM network, and a sequence-to-sequence model with attention, based on backtesting results of six futures from January 2010 to December 2017. Our experimental results show that, by introducing an approximate posterior, CLVSA takes advantage of an extra regularizer based on the Kullback-Leibler divergence to keep itself from falling into overfitting traps.

preprint 2021 arXiv

Colorectal Polyp Detection in Real-world Scenario: Design and Experiment Study

Colorectal polyps are abnormal tissues growing on the intima of the colon or rectum with a high risk of developing into colorectal cancer, the third leading cause of cancer death worldwide. Early detection and removal of colon polyps via colonoscopy have proved to be an effective approach to preventing colorectal cancer. Recently, various CNN-based computer-aided systems have been developed to help physicians detect polyps. However, these systems do not perform well in real-world colonoscopy operations due to the significant difference between images in a real colonoscopy and those in the public datasets. Unlike the well-chosen clear images with obvious polyps in the public datasets, images from a colonoscopy are often blurry and contain various artifacts such as fluid, debris, bubbles, reflection, specularity, contrast, saturation, and medical instruments, with a wide variety of polyps of different sizes, shapes, and textures. All these factors pose a significant challenge to effective polyp detection in a colonoscopy. To this end, we collect a private dataset that contains 7,313 images from 224 complete colonoscopy procedures. This dataset represents realistic operation scenarios and thus can be used to better train the models and evaluate a system's performance in practice. We propose an integrated system architecture to address the unique challenges of polyp detection. Extensive experimental results show that our system can effectively detect polyps in a colonoscopy with excellent performance in real time.

preprint 2021 arXiv

Complexity of randomized algorithms for underdamped Langevin dynamics

We establish an information complexity lower bound of randomized algorithms for simulating underdamped Langevin dynamics. More specifically, we prove that the worst $L^2$ strong error is of order $\Omega(\sqrt{d}\, N^{-3/2})$, for solving a family of $d$-dimensional underdamped Langevin dynamics, by any randomized algorithm with only $N$ queries to $\nabla U$, the driving Brownian motion and its weighted integration, respectively. The lower bound we establish matches the upper bound for the randomized midpoint method recently proposed by Shen and Lee [NIPS 2019], in terms of both parameters $N$ and $d$.

preprint 2021 arXiv

Financial Markets Prediction with Deep Learning

Financial markets are difficult to predict due to their complex system dynamics. Although some recent studies use machine learning techniques for financial market prediction, they do not offer satisfactory performance on financial returns. We propose a novel one-dimensional convolutional neural network (CNN) model to predict financial market movement. The customized one-dimensional convolutional layers scan financial trading data through time, while different types of data, such as prices and volume, share parameters (kernels) with each other. Our model automatically extracts features instead of using traditional technical indicators, and thus avoids biases caused by the selection of technical indicators and by pre-defined coefficients in technical indicators. We evaluate the performance of our prediction model with strict backtesting on historical trading data of six futures from January 2010 to October 2017. The experimental results show that our CNN model can effectively extract more generalized and informative features than traditional technical indicators, and achieves more robust and profitable financial performance than previous machine learning approaches.

preprint 2021 arXiv

Towards Efficiently Diversifying Dialogue Generation via Embedding Augmentation

Dialogue generation models face the challenge of producing generic and repetitive responses. Unlike previous augmentation methods, which mostly focus on token manipulation and, by using hard labels, ignore the essential variety within a single sample, we propose to promote the generation diversity of neural dialogue models via soft embedding augmentation along with soft labels. In particular, we select some key input tokens and fuse their embeddings with the embeddings of their semantic-neighbor tokens. The new embeddings replace the original ones as the input of the model. In addition, soft labels are used in the loss calculation, resulting in multi-target supervision for a given input. Our experimental results on two datasets illustrate that our proposed method is capable of generating more diverse responses than raw models while retaining a similar n-gram accuracy, which ensures the quality of the generated responses.
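The fusion and soft-label steps described in this abstract can be sketched in a few lines. This is only an illustration: the mixing weight `lam`, the use of a plain mean over neighbor embeddings, and the even split of label mass across neighbors are all assumptions, as are the function names; the abstract does not specify them.

```python
import math

def fuse_embedding(token_vec, neighbor_vecs, lam=0.7):
    """Blend a token embedding with the mean embedding of its semantic neighbors.
    Assumption: a fixed convex combination with weight lam on the original token."""
    dim = len(token_vec)
    mean_nb = [sum(v[i] for v in neighbor_vecs) / len(neighbor_vecs) for i in range(dim)]
    return [lam * t + (1 - lam) * m for t, m in zip(token_vec, mean_nb)]

def soft_label(vocab_size, target_id, neighbor_ids, lam=0.7):
    """Put mass lam on the original target and split the rest evenly over neighbors."""
    label = [0.0] * vocab_size
    label[target_id] = lam
    share = (1 - lam) / len(neighbor_ids)
    for j in neighbor_ids:
        label[j] += share
    return label

def soft_cross_entropy(label, probs, eps=1e-12):
    """Cross-entropy against a soft target distribution (multi-target supervision)."""
    return -sum(y * math.log(p + eps) for y, p in zip(label, probs) if y > 0)

fused = fuse_embedding([1.0, 0.0], [[0.0, 1.0], [0.0, 1.0]], lam=0.5)
label = soft_label(vocab_size=4, target_id=0, neighbor_ids=[1, 2], lam=0.8)
print(fused, label, soft_cross_entropy(label, [0.7, 0.1, 0.1, 0.1]))
```

With a soft label, a prediction that lands on a semantic neighbor is penalized less than one that lands on an unrelated token, which is the sense in which one input receives several targets.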

preprint 2020 arXiv

3D Aggregated Faster R-CNN for General Lesion Detection

Lesions are damage and abnormalities in tissues of the human body. Many of them can later turn into fatal diseases such as cancers. Detecting lesions is of great importance for early diagnosis and timely treatment. To this end, Computed Tomography (CT) scans often serve as the screening tool, allowing us to leverage modern object detection techniques to detect the lesions. However, lesions in CT scans are often small and sparse. The local area around a lesion can be very confusing, causing the region-based classifier branch of Faster R-CNN to fail easily. Therefore, most of the existing state-of-the-art solutions train two types of heterogeneous networks (multi-phase) separately for candidate generation and False Positive Reduction (FPR). In this paper, we propose an end-to-end 3D Aggregated Faster R-CNN solution by stacking an "aggregated classifier branch" on the backbone of the RPN. This classifier branch is equipped with Feature Aggregation and Local Magnification Layers to strengthen it. We demonstrate that our model can achieve state-of-the-art performance on both the LUNA16 and DeepLesion datasets. In particular, we achieve the best single-model FROC performance on LUNA16, with an inference time of 4.2s per processed scan.

preprint 2020 arXiv

A Deep Reinforcement Learning Approach to Multi-component Job Scheduling in Edge Computing

We are interested in the optimal scheduling of a collection of multi-component application jobs in an edge computing system that consists of geo-distributed edge computing nodes connected through a wide area network. The scheduling and placement of application jobs in an edge system is challenging due to the interdependence of multiple components of each job, and the communication delays between the geographically distributed data sources and edge nodes and their dynamic availability. In this paper we explore the feasibility of applying Deep Reinforcement Learning (DRL) based design to address these challenges. We introduce a DRL actor-critic algorithm that aims to find an optimal scheduling policy to minimize average job slowdown in the edge system. We have demonstrated through simulations that our design outperforms a few existing algorithms, based on both synthetic data and a Google cloud data trace.

preprint 2020 arXiv

A Progressive Sub-Network Searching Framework for Dynamic Inference

Many techniques, such as model compression, have been developed to make Deep Neural Network (DNN) inference more efficient. Nevertheless, DNNs still lack strong run-time dynamic inference capability that lets users trade off accuracy and computation complexity (i.e., latency on target hardware) after model deployment, based on dynamic requirements and environments. This research direction has recently drawn great attention; one realization is to train the target DNN through a multiple-term objective function, which consists of cross-entropy terms from multiple sub-nets. Our investigation in this work shows that the performance of dynamic inference relies heavily on the quality of sub-net sampling. With the objective of constructing a dynamic DNN and searching for multiple high-quality sub-nets at minimal searching cost, we propose a progressive sub-net searching framework, which incorporates several effective techniques, including trainable noise ranking, channel group and fine-tuning threshold setting, and sub-net re-selection. The proposed framework gives the target DNN better dynamic inference capability, and it outperforms prior works on both the CIFAR-10 and ImageNet datasets in comprehensive experiments on different network structures. Taking ResNet18 as an example, our proposed method improves dynamic inference accuracy over the popular Universally-Slimmable-Network by up to 4.4% and by 2.3% on average on the ImageNet dataset with the same model size.

preprint 2020 arXiv

Algebraic Bounds on the Rayleigh-Bénard attractor

The Rayleigh-Bénard system with stress-free boundary conditions is shown to have a global attractor in each affine space where velocity has fixed spatial average. The physical problem is shown to be equivalent to one with periodic boundary conditions and certain symmetries. This enables a Gronwall estimate on enstrophy. That estimate is then used to bound the $L^2$ norm of the temperature gradient on the global attractor, which, in turn, is used to find a bounding region for the attractor in the enstrophy-palinstrophy plane. All final bounds are algebraic in the viscosity and thermal diffusivity, a significant improvement over previously established estimates. The sharpness of the bounds is tested with numerical simulations.

preprint 2020 arXiv

Deep Fashion3D: A Dataset and Benchmark for 3D Garment Reconstruction from Single Images

High-fidelity clothing reconstruction is the key to achieving photorealism in a wide range of applications including human digitization, virtual try-on, etc. Recent advances in learning-based approaches have accomplished unprecedented accuracy in recovering unclothed human shape and pose from single images, thanks to the availability of powerful statistical models, e.g. SMPL, learned from a large number of body scans. In contrast, modeling and recovering clothed humans and 3D garments remains notoriously difficult, mostly due to the lack of large-scale clothing models available to the research community. We propose to fill this gap by introducing Deep Fashion3D, the largest collection to date of 3D garment models, with the goal of establishing a novel benchmark and dataset for the evaluation of image-based garment reconstruction systems. Deep Fashion3D contains 2078 models reconstructed from real garments, which cover 10 different categories and 563 garment instances. It provides rich annotations including 3D feature lines, 3D body pose and the corresponding multi-view real images. In addition, each garment is randomly posed to enhance the variety of real clothing deformations. To demonstrate the advantage of Deep Fashion3D, we propose a novel baseline approach for single-view garment reconstruction, which leverages the merits of both mesh and implicit representations. A novel adaptable template is proposed to enable the learning of all types of clothing in a single network. Extensive experiments have been conducted on the proposed dataset to verify its significance and usefulness. We will make Deep Fashion3D publicly available upon publication.

preprint 2020 arXiv

Identification of New Assembly Mode in the Heliconical Nematic Phase via Tender Resonant X-ray Scattering

Helical structures are exciting and are utilized in numerous applications ranging from biotechnology to displays to medicine. Accurate description and understanding of resonance effects in helical structures provide crucial knowledge of molecular packing beyond positional ordering. We examined the manifestation of resonance effects in a nematic phase with heliconical structure, the so-called twist-bend nematic (NTB) phase, via tender resonant X-ray scattering (TReXS) at the sulfur K-edge. We demonstrate quantitatively, for the first time, that the energy dependence of the scattering peak in the NTB phase follows the energy dependence of the complex refractive indices measured by X-ray absorption. This allows us to identify a new self-assembly mode for specific sets of liquid crystal dimers in the NTB phase. We anticipate that new avenues will be pursued in the exploration of complex orientational structures, both in static modes and in dynamic modes induced by external stimuli.

preprint 2020 arXiv

Pseudo-Labeling for Small Lesion Detection on Diabetic Retinopathy Images

Diabetic retinopathy (DR) is a primary cause of blindness in working-age people worldwide. About 3 to 4 million people with diabetes become blind because of DR every year. Diagnosis of DR through color fundus images is a common approach to mitigating this problem. However, DR diagnosis is a difficult and time-consuming task, which requires experienced clinicians to identify the presence and significance of many small features on high-resolution images. Convolutional Neural Networks (CNNs) have recently proved to be a promising approach for automatic biomedical image analysis. In this work, we investigate lesion detection on DR fundus images with CNN-based object detection methods. Lesion detection on fundus images faces two unique challenges. The first is that our dataset is not fully labeled, i.e., only a subset of all lesion instances is marked. Not only do these unlabeled lesion instances fail to contribute to the training of the model, but they are also mistakenly counted as false negatives, pushing the model in the wrong direction. The second challenge is that the lesion instances are usually very small, making them difficult for normal object detectors to find. To address the first challenge, we introduce an iterative training algorithm for the semi-supervised method of pseudo-labeling, in which a considerable number of unlabeled lesion instances can be discovered to boost the performance of the lesion detector. For the small-target problem, we extend both the input size and the depth of the feature pyramid network (FPN) to produce a large CNN feature map, which preserves the detail of small lesions and thus enhances the effectiveness of the lesion detector. The experimental results show that our proposed methods significantly outperform the baselines.

preprint 2020 arXiv

Retinopathy of Prematurity Stage Diagnosis Using Object Segmentation and Convolutional Neural Networks

Retinopathy of Prematurity (ROP) is an eye disorder primarily affecting premature infants with lower weights. It causes proliferation of vessels in the retina and could result in vision loss and, eventually, retinal detachment, leading to blindness. While human experts can easily identify severe stages of ROP, the diagnosis of earlier stages, which are the most relevant to determining treatment choice, is much more affected by variability in the subjective interpretations of human experts. In recent years, there has been a significant effort to automate the diagnosis using deep learning. This paper builds upon the success of previous models and develops a novel architecture, which combines object segmentation and convolutional neural networks (CNNs) to construct an effective classifier of ROP stages 1-3 based on neonatal retinal images. Motivated by the fact that the formation and shape of a demarcation line in the retina is the distinguishing feature between earlier ROP stages, our proposed system first trains an object segmentation model to identify the demarcation line at the pixel level and adds the resulting mask as an additional "color" channel in the original image. Then, the system trains a CNN classifier on the processed images to leverage information from both the original image and the mask, which helps direct the model's attention to the demarcation line. In a number of careful experiments comparing its performance to previous object segmentation systems and CNN-only systems trained on our dataset, our novel architecture significantly outperforms previous systems in accuracy, demonstrating the effectiveness of our proposed pipeline.
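The step of adding the segmentation mask as an extra "color" channel can be illustrated with a minimal sketch on nested lists. The function name and the H x W x C layout are assumptions made for illustration; the abstract does not describe the data layout or the tooling used.

```python
def add_mask_channel(image, mask):
    """Append a per-pixel mask value as one more channel.
    image: H x W x C nested lists; mask: H x W nested lists.
    Returns a new H x W x (C + 1) image and leaves the inputs unchanged."""
    if len(image) != len(mask) or any(len(r) != len(m) for r, m in zip(image, mask)):
        raise ValueError("image and mask must share height and width")
    return [
        [list(pixel) + [m] for pixel, m in zip(row, mask_row)]
        for row, mask_row in zip(image, mask)
    ]

# A 1 x 2 RGB image; the mask marks the second pixel as demarcation line.
rgb = [[[10, 20, 30], [40, 50, 60]]]
line_mask = [[0, 1]]
print(add_mask_channel(rgb, line_mask))
```

A classifier consuming this input would need its first layer to accept four channels instead of three.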

preprint 2020 arXiv

Unsupervised Domain Adaptation on Reading Comprehension

Reading comprehension (RC) has been studied on a variety of datasets, with performance boosted by deep neural networks. However, the generalization capability of these models across different domains remains unclear. To address this issue, we investigate unsupervised domain adaptation on RC, wherein a model is trained on a labeled source domain and applied to a target domain with only unlabeled samples. We first show that even with the powerful BERT contextual representation, the performance is still unsatisfactory when a model trained on one dataset is directly applied to another target dataset. To solve this, we provide a novel conditional adversarial self-training method (CASe). Specifically, our approach leverages a BERT model fine-tuned on the source dataset along with confidence filtering to generate reliable pseudo-labeled samples in the target domain for self-training. In addition, it further reduces the domain distribution discrepancy through conditional adversarial learning across domains. Extensive experiments show that our approach achieves accuracy comparable to supervised models on multiple large-scale benchmark datasets.

preprint 2016 arXiv

DeepFood: Deep Learning-Based Food Image Recognition for Computer-Aided Dietary Assessment

Worldwide, in 2014, more than 1.9 billion adults, 18 years and older, were overweight. Of these, over 600 million were obese. Accurately documenting dietary caloric intake is crucial to managing weight loss, but it also presents challenges because most current methods for dietary assessment rely on memory to recall foods eaten. The ultimate goal of our research is to develop computer-aided technical solutions to enhance and improve the accuracy of current measurements of dietary intake. Our proposed system in this paper aims to improve the accuracy of dietary assessment by analyzing food images captured by mobile devices (e.g., smartphones). The key technical innovation in this paper is the deep learning-based food image recognition algorithm. Substantial research has demonstrated that digital imaging accurately estimates dietary intake in many environments and has many advantages over other methods. However, how to derive food information (e.g., food type and portion size) from food images effectively and efficiently remains a challenging and open research problem. We propose a new Convolutional Neural Network (CNN)-based food image recognition algorithm to address this problem. We applied our proposed approach to two real-world food image data sets (UEC-256 and Food-101) and achieved impressive results. To the best of our knowledge, these results outperformed all other reported work using these two data sets. Our experiments have demonstrated that the proposed approach is a promising solution to the food image recognition problem. Our future work includes further improving the performance of the algorithms and integrating our system into a real-world mobile and cloud computing-based system to enhance the accuracy of current measurements of dietary intake.

preprint2016arXiv

Explore Stochastic Instabilities of Periodic Points by Transition Path Theory

We consider the noise-induced transitions in the randomly perturbed discrete logistic map from a linearly stable periodic orbit consisting of T periodic points. The traditional large deviation theory and asymptotic analysis for small noise limit as well as the derived quasi-potential cannot distinguish the quantitative difference in noise-induced stochastic instabilities of these T periodic points. We generalize the transition path theory to the discrete-time continuous-space stochastic process to attack this problem. As a first criterion of quantifying the relative instability among T periodic points, we compare the distribution of the last passage locations in the transitions from the whole periodic orbit to a prescribed set far away. This distribution is related to the contributions to the transition rate from each periodic point. The second criterion is based on the competency of the transition paths associated with each periodic point. Both criteria utilise the reactive probability current in the transition path theory. Our numerical results for the logistic map reveal the transition mechanism of escaping from the stable periodic orbit and identify which periodic point is more prone to lose stability so as to make successful transitions under random perturbations.

preprint2016arXiv

Reducing the Model Order of Deep Neural Networks Using Information Theory

Deep neural networks are typically represented by a much larger number of parameters than shallow models, making them prohibitive for small footprint devices. Recent research shows that there is considerable redundancy in the parameter space of deep neural networks. In this paper, we propose a method to compress deep neural networks by using the Fisher Information metric, which we estimate through a stochastic optimization method that keeps track of second-order information in the network. We first remove unimportant parameters and then use non-uniform fixed point quantization to assign more bits to parameters with higher Fisher Information estimates. We evaluate our method on a classification task with a convolutional neural network trained on the MNIST data set. Experimental results show that our method outperforms existing methods for both network pruning and quantization.
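The prune-then-quantize pipeline described above can be sketched as follows. This is an illustration under stated assumptions, not the paper's method: the diagonal Fisher information is approximated here by the mean squared gradient (the paper estimates it through a stochastic optimization method), and the helper names, the prune fraction and the 4-bit/8-bit split are all hypothetical.

```python
def empirical_fisher(grad_samples):
    """Assumed estimator: diagonal Fisher ~ mean of squared per-sample gradients."""
    n = len(grad_samples)
    dim = len(grad_samples[0])
    return [sum(g[j] ** 2 for g in grad_samples) / n for j in range(dim)]


def prune_and_allocate_bits(weights, fisher, prune_fraction=0.5,
                            low_bits=4, high_bits=8):
    """Zero the lowest-Fisher weights, then give more bits to higher-Fisher survivors."""
    order = sorted(range(len(weights)), key=lambda j: fisher[j])
    n_prune = int(len(weights) * prune_fraction)
    pruned = set(order[:n_prune])
    kept = order[n_prune:]  # still sorted by increasing Fisher estimate
    cutoff = len(kept) // 2
    bits = {j: (low_bits if rank < cutoff else high_bits)
            for rank, j in enumerate(kept)}
    new_weights = [0.0 if j in pruned else w for j, w in enumerate(weights)]
    return new_weights, bits


def quantize_fixed_point(w, bits, max_abs=1.0):
    """Symmetric fixed-point rounding of w onto 2**(bits-1) - 1 levels per sign."""
    levels = 2 ** (bits - 1) - 1
    clipped = max(-max_abs, min(max_abs, w))
    return round(clipped / max_abs * levels) / levels * max_abs


# Toy example: four weights, two gradient samples.
grads = [[0.1, 1.0, 0.0, 2.0], [0.1, -1.0, 0.0, 2.0]]
weights = [0.5, -0.3, 0.9, 0.7]
fisher = empirical_fisher(grads)
sparse_weights, bit_widths = prune_and_allocate_bits(weights, fisher)
quantized = [quantize_fixed_point(w, bit_widths[j]) if j in bit_widths else 0.0
             for j, w in enumerate(sparse_weights)]
```

In this toy run the two weights with the smallest estimates are zeroed, and the remaining weight with the largest estimate receives the wider bit width.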

preprint2015arXiv

Optimization of Unequal Error Protection Rateless Codes for Multimedia Multicasting

Rateless codes have been shown to be able to provide greater flexibility and efficiency than fixed-rate codes for multicast applications. In the following, we optimize rateless codes for unequal error protection (UEP) for multimedia multicasting to a set of heterogeneous users. The proposed designs have the objectives of providing either guaranteed or best-effort quality of service (QoS). A randomly interleaved rateless encoder is proposed whereby users only need to decode symbols up to their own QoS level. The proposed coder is optimized based on measured transmission properties of standardized raptor codes over wireless channels. It is shown that a guaranteed QoS problem formulation can be transformed into a convex optimization problem, yielding a globally optimal solution. Numerical results demonstrate that the proposed optimized random interleaved UEP rateless coder's performance compares favorably with that of other recently proposed UEP rateless codes.

preprint2015arXiv

QoE Optimization of Video Multicast with Heterogeneous Channels and Playback Requirements

We propose an application-layer forward error correction (AL-FEC) code rate allocation scheme to maximize the quality of experience (QoE) of a video multicast. The allocation dynamically assigns multicast clients to the quality layers of a scalable video bitstream, based on their heterogeneous channel qualities and video playback capabilities. Normalized mean opinion score (NMOS) is employed to value the client's quality of experience across various possible adaptations of a multilayer video, coded using mixed spatial-temporal-amplitude scalability. The scheme provides assurance of reception of the video layers using fountain coding and effectively allocates coding rates across the layers to maximize a multicast utility measure. An advantageous feature of the proposed scheme is that the complexity of the optimization is independent of the number of clients. Additionally, a convex formulation is proposed that attains close to the best performance and offers a reliable alternative when further reduction in computational complexity is desired. The optimization is extended to perform suppression of QoE fluctuations for clients with marginal channel qualities. The scheme offers a means to trade-off service utility for the entire multicast group and clients with the worst channels. According to the simulation results, the proposed optimization framework is robust against source rate variations and limited amount of client feedback.

preprint2015arXiv

Unsupervised Cross-Domain Recognition by Identifying Compact Joint Subspaces

This paper introduces a new method to solve the cross-domain recognition problem. Different from the traditional domain adaptation methods which rely on a global domain shift for all classes between source and target domain, the proposed method is more flexible to capture individual class variations across domains. By adopting a natural and widely used assumption -- "the data samples from the same class should lie on a low-dimensional subspace, even if they come from different domains", the proposed method circumvents the limitation of the global domain shift, and solves the cross-domain recognition by finding the compact joint subspaces of source and target domain. Specifically, given labeled samples in source domain, we construct subspaces for each of the classes. Then we construct subspaces in the target domain, called anchor subspaces, by collecting unlabeled samples that are close to each other and highly likely all fall into the same class. The corresponding class label is then assigned by minimizing a cost function which reflects the overlap and topological structure consistency between subspaces across source and target domains, and within anchor subspaces, respectively. We further combine the anchor subspaces to corresponding source subspaces to construct the compact joint subspaces. Subsequently, one-vs-rest SVM classifiers are trained in the compact joint subspaces and applied to unlabeled data in the target domain. We evaluate the proposed method on two widely used datasets: object recognition dataset for computer vision tasks, and sentiment classification dataset for natural language processing tasks. Comparison results demonstrate that the proposed method outperforms the comparison methods on both datasets.

preprint2013arXiv

Elongation of energy exchange between femtosecond laser pulses via plasma formation in air

We experimentally demonstrate energy exchange between a delay-tuned femtosecond beam and two delay-fixed ones as they spatiotemporally overlapped and experienced filamentation in air. The energy exchange process in the relative time delay is dramatically elongated up to 40 ps in the presence of plasma grating, indicating that filamentary beams coupling may be an effective method for filament control.

preprint2013arXiv

Hyper-Graph Based Database Partitioning for Transactional Workloads

A common approach to scaling transactional databases in practice is horizontal partitioning, which increases system scalability, high availability and self-manageability. Usually it is very challenging to choose or design an optimal partitioning scheme for a given workload and database. In this technical report, we propose a fine-grained hyper-graph based database partitioning system for transactional workloads. The partitioning system takes a database, a workload, a node cluster and partitioning constraints as input and outputs a lookup-table encoding the final database partitioning decision. The database partitioning problem is modeled as a multi-constraints hyper-graph partitioning problem. By deriving a min-cut of the hyper-graph, our system can minimize the total number of distributed transactions in the workload, balance the sizes and workload accesses of the partitions and satisfy all the partition constraints imposed. Our system is highly interactive as it allows users to impose partition constraints, watch visualized partitioning effects, and provide feedback based on human expertise and indirect domain knowledge for generating better partitioning schemes.
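To make the objective concrete, the sketch below scores a candidate lookup table rather than deriving a min-cut. It assumes each transaction is a hyperedge over the tuples it touches, so a transaction is distributed exactly when its hyperedge is cut. The function names and the toy workload are hypothetical and not part of the described system.

```python
from collections import Counter


def count_distributed(transactions, lookup):
    """Count transactions (hyperedges) whose tuples span more than one partition."""
    return sum(
        1 for txn in transactions
        if len({lookup[t] for t in txn}) > 1
    )


def partition_sizes(lookup):
    """Number of tuples assigned to each partition, for checking balance."""
    return Counter(lookup.values())


# Toy workload: each set lists the tuples accessed by one transaction.
transactions = [{"a", "b"}, {"b", "c"}, {"c", "d"}, {"a", "b", "c"}]

# Two candidate lookup tables mapping tuple -> partition id.
lookup_good = {"a": 0, "b": 0, "c": 1, "d": 1}
lookup_bad = {"a": 0, "b": 1, "c": 0, "d": 1}

cut_good = count_distributed(transactions, lookup_good)
cut_bad = count_distributed(transactions, lookup_bad)
```

Both tables are equally balanced (two tuples per partition), but the first cuts fewer hyperedges, which is the kind of trade-off a multi-constraint partitioner searches over.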

preprint2012arXiv

Optimization of Analytic Window Functions

Analytic functions represent the state-of-the-art way of performing complex data analysis within a single SQL statement. In particular, an important class of analytic functions that has been frequently used in commercial systems to support OLAP and decision support applications is the class of window functions. A window function returns for each input tuple a value derived from applying a function over a window of neighboring tuples. However, existing window function evaluation approaches are based on a naive sorting scheme. In this paper, we study the problem of optimizing the evaluation of window functions. We propose several efficient techniques, and identify optimization opportunities that allow us to optimize the evaluation of a set of window functions. We have integrated our scheme into PostgreSQL. Our comprehensive experimental study on the TPC-DS datasets as well as synthetic datasets and queries demonstrates significant speedup over existing approaches.
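As an illustration of what a window function computes, the sketch below evaluates a moving average per partition with a straightforward sort-then-scan approach, roughly what `AVG(amount) OVER (PARTITION BY store ORDER BY day ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)` would return. It shows the baseline semantics only, not the optimizations proposed in the paper; the `moving_avg` helper and the sample rows are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def moving_avg(rows, partition_key, order_key, value_key,
               preceding=1, following=1):
    """Sort by (partition, order), then average each row's frame of neighbors."""
    ordered = sorted(rows, key=itemgetter(partition_key, order_key))
    out = []
    for _, group in groupby(ordered, key=itemgetter(partition_key)):
        part = list(group)
        for i, row in enumerate(part):
            # Frame: up to `preceding` rows before and `following` rows after.
            frame = part[max(0, i - preceding): i + following + 1]
            avg = sum(r[value_key] for r in frame) / len(frame)
            out.append({**row, "moving_avg": avg})
    return out


sales = [
    {"store": "A", "day": 2, "amount": 20},
    {"store": "B", "day": 1, "amount": 5},
    {"store": "A", "day": 1, "amount": 10},
    {"store": "A", "day": 3, "amount": 30},
    {"store": "B", "day": 2, "amount": 15},
]

result = moving_avg(sales, "store", "day", "amount")
```

Each input row yields exactly one output row, which is what distinguishes a window function from a grouped aggregate.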

preprint2010arXiv

Polarization-Engineering in III-V Nitride Heterostructures: New Opportunities For Device Design

The role of spontaneous and piezoelectric polarization in III-V nitride heterostructure devices is discussed. Problems as well as opportunities in incorporating polarization in abrupt and graded heterojunctions composed of binary, ternary, and quaternary nitrides are outlined.

preprint1997arXiv

The High Resolution IRAS Galaxy Atlas

An atlas of the Galactic plane (-4.7 deg < b < 4.7 deg) plus the molecular clouds in Orion, Rho Oph, and Taurus-Auriga has been produced at 60 and 100 micron from IRAS data. The Atlas consists of resolution-enhanced coadded images having 1 arcmin -- 2 arcmin resolution as well as coadded images at the native IRAS resolution. The IRAS Galaxy Atlas, together with the DRAO HI line / 21 cm continuum and FCRAO CO (1-0) line Galactic plane surveys, both with similar (approx. 1 arcmin) resolution, provide a powerful venue for studying the interstellar medium, star formation and large scale structure in our Galaxy. This paper documents the production and characteristics of the Atlas.
