Source author record

C. L. Philip Chen

C. L. Philip Chen appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Computer Vision, Cryptography and Security, Machine Learning, Artificial Intelligence, eess.AS, Information Retrieval, Information Theory, math.IT, Multimedia, Sound

Catalog footprint

What is connected

9 works

10 topics

4 close collaborators

preprint, 2026, arXiv

MobileGeo: Exploring Hierarchical Knowledge Distillation for Resource-Efficient Cross-view Drone Geo-Localization

Cross-view geo-localization (CVGL) plays a vital role in drone-based multimedia applications, enabling precise localization by matching drone-captured aerial images against geo-tagged satellite databases in GNSS-denied environments. However, existing methods rely on resource-intensive feature alignment and multi-branch architectures, incurring high inference costs that limit their deployment on edge devices. We propose MobileGeo, a mobile-friendly framework designed for efficient on-device CVGL: 1) During training, a Hierarchical Distillation (HD-CVGL) paradigm, coupled with Uncertainty-Aware Prediction Alignment (UAPA), distills essential information into a compact model without incurring inference overhead. 2) During inference, an efficient Multi-view Selection Refinement Module (MSRM) leverages mutual information to filter redundant views and reduce computational load. Extensive experiments demonstrate that MobileGeo outperforms previous state-of-the-art methods, achieving a 4.19% improvement in AP on the University1652 dataset while being over 5 times more efficient in FLOPs and 3 times faster. Crucially, MobileGeo runs at 251.5 FPS on an NVIDIA AGX Orin edge device, demonstrating its practical viability for real-time on-device drone geo-localization. The code is available at https://github.com/SkyEyeLoc/MobileGeo.
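As a rough illustration of the distillation idea mentioned in the abstract, the sketch below shows the generic temperature-scaled distillation loss (KL divergence between softened teacher and student predictions). It is not the HD-CVGL or UAPA formulation, whose details are not given here; the function names, the temperature value, and the example logits are assumptions chosen for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, computed stably by subtracting the max."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Generic knowledge-distillation loss; the temperature is a
    hypothetical default, not a value taken from the paper.
    """
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


if __name__ == "__main__":
    teacher = [1.0, 2.0, 3.0]
    student = [3.0, 2.0, 1.0]
    print(distillation_loss(student, teacher))
```

The loss is zero when the student reproduces the teacher's logits and grows as the two distributions diverge. The teacher is only needed during training, which matches the abstract's point that distillation adds no inference overhead.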

preprint, 2022, arXiv

A Novel Multi-Task Learning Method for Symbolic Music Emotion Recognition

Symbolic Music Emotion Recognition (SMER) aims to predict music emotion from symbolic data, such as MIDI and MusicXML. Previous work mainly focused on learning better representations via (masked) language model pre-training but ignored the intrinsic structure of the music, which is extremely important to the emotional expression of music. In this paper, we present a simple multi-task framework for SMER, which combines the emotion recognition task with other emotion-related auxiliary tasks derived from the intrinsic structure of the music. The results show that our multi-task framework can be adapted to different models. Moreover, the labels of the auxiliary tasks are easy to obtain, which means our multi-task methods do not require manually annotated labels other than emotion. Experiments conducted on two publicly available datasets (EMOPIA and VGMIDI) show that our methods perform better on the SMER task. Specifically, accuracy increased by 4.17 absolute points to 67.58 on the EMOPIA dataset, and by 1.97 absolute points to 55.85 on the VGMIDI dataset. Ablation studies also show the effectiveness of the multi-task methods designed in this paper.
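A multi-task objective of the kind described in the abstract is commonly written as the main-task loss plus weighted auxiliary losses. The sketch below assumes that common form; the auxiliary task name `aux_a`, the weights, and the logits are hypothetical placeholders, since the abstract does not list the actual auxiliary tasks or how their losses are combined.

```python
import math


def cross_entropy(logits, label):
    """Negative log-probability of `label` under softmax(logits)."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_norm - logits[label]


def multitask_loss(emotion_logits, emotion_label, aux_outputs, aux_weights):
    """Main emotion loss plus a weighted sum of auxiliary-task losses.

    aux_outputs maps a (hypothetical) task name to a (logits, label) pair;
    aux_weights maps the same names to scalar weights (default 1.0).
    """
    loss = cross_entropy(emotion_logits, emotion_label)
    for name, (logits, label) in aux_outputs.items():
        loss += aux_weights.get(name, 1.0) * cross_entropy(logits, label)
    return loss


if __name__ == "__main__":
    emotion_logits = [2.0, 0.5, 0.1, -1.0]       # four emotion classes (assumed)
    aux_outputs = {"aux_a": ([0.3, 1.2], 1)}     # placeholder auxiliary task
    print(multitask_loss(emotion_logits, 0, aux_outputs, {"aux_a": 0.5}))
```

Setting an auxiliary weight to zero recovers the single-task loss, which is a convenient way to run the kind of ablation the abstract mentions.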

preprint, 2022, arXiv

A Survey on Masked Facial Detection Methods and Datasets for Fighting Against COVID-19

Coronavirus disease 2019 (COVID-19) continues to pose a great challenge to the world since its outbreak. To fight against the disease, a series of artificial intelligence (AI) techniques have been developed and applied to real-world scenarios such as safety monitoring, disease diagnosis, infection risk assessment, lesion segmentation of COVID-19 CT scans, etc. The coronavirus epidemic has forced people to wear masks to counteract the transmission of the virus, which also makes it difficult to monitor large groups of people wearing masks. In this paper, we primarily focus on the AI techniques of masked facial detection and related datasets. We survey the recent advances, beginning with descriptions of masked facial detection datasets. Thirteen available datasets are described and discussed in detail. Then, the methods are roughly categorized into two classes: conventional methods and neural network-based methods. Conventional methods are usually trained by boosting algorithms with hand-crafted features, and account for a small proportion. Neural network-based methods are further classified into three categories according to the number of processing stages. Representative algorithms are described in detail, coupled with some typical techniques that are described briefly. Finally, we summarize the recent benchmarking results, discuss the limitations of datasets and methods, and outline future research directions. To our knowledge, this is the first survey about masked facial detection methods and datasets. We hope our survey can provide some help in the fight against epidemics.

preprint, 2022, arXiv

Multi-party Secure Broad Learning System for Privacy Preserving

Multi-party learning is an indispensable technique for improving learning performance by integrating data from multiple parties. Unfortunately, directly integrating multi-party data would not meet privacy-preserving requirements. Therefore, Privacy-Preserving Machine Learning (PPML) has become a key research task in multi-party learning. In this paper, we present a new PPML method based on a secure multi-party interactive protocol, namely Multi-party Secure Broad Learning System (MSBLS), and provide a security analysis of the method. Existing PPML methods generally cannot simultaneously meet multiple requirements such as security, accuracy, efficiency and application scope, but MSBLS achieves satisfactory results in these aspects. It uses an interactive protocol and random mapping to generate the mapped features of the data, and then uses efficient broad learning to train a neural network classifier. This is the first privacy computing method that combines secure multi-party computing and neural networks. Theoretically, this method can ensure that the accuracy of the model will not be reduced due to encryption, and that computation is very fast. We verify this conclusion on three classical datasets.

preprint, 2022, arXiv

OneDConv: Generalized Convolution For Transform-Invariant Representation

Convolutional Neural Networks (CNNs) have shown great power in a variety of vision tasks. However, the lack of a transform-invariant property limits their further application in complicated real-world scenarios. In this work, we propose a novel generalized one-dimensional convolutional operator (OneDConv), which dynamically transforms the convolution kernels based on the input features in a computationally and parametrically efficient manner. The proposed operator can extract transform-invariant features naturally. It improves the robustness and generalization of convolution without sacrificing performance on common images. The proposed OneDConv operator can substitute the vanilla convolution, so it can be readily incorporated into current popular convolutional architectures and trained end-to-end. On several popular benchmarks, OneDConv outperforms the original convolution operation and other proposed models on both canonical and distorted images.

preprint, 2022, arXiv

Siamese Labels Auxiliary Learning

In deep learning, auxiliary training has been widely used to assist the training of models. During the training phase, using auxiliary modules to assist training can improve the performance of the model. During the testing phase, auxiliary modules can be removed, so the test parameters are not increased. In this paper, we propose a novel auxiliary training method, Siamese Labels Auxiliary Learning (SiLa). Unlike Deep Mutual Learning (DML), SiLa emphasizes auxiliary learning and can be easily combined with DML. The main contributions of this paper are: (1) we propose SiLa, which improves the performance of common models without increasing test parameters; (2) we compare SiLa with DML and demonstrate that SiLa can improve the generalization of the model; (3) we apply SiLa to Dynamic Neural Networks and demonstrate that SiLa can be used with various types of network structures.

preprint, 2022, arXiv

Stacked BNAS: Rethinking Broad Convolutional Neural Network for Neural Architecture Search

Unlike other deep scalable architecture-based NAS approaches, Broad Neural Architecture Search (BNAS) proposes a broad scalable architecture consisting of convolution and enhancement blocks, dubbed Broad Convolutional Neural Network (BCNN), as the search space to greatly improve efficiency. BCNN reuses the topologies of cells in the convolution block so that BNAS can employ few cells for efficient search. Moreover, multi-scale feature fusion and knowledge embedding are proposed to improve the performance of BCNN with shallow topology. However, BNAS suffers from some drawbacks: 1) insufficient representation diversity for feature fusion and enhancement and 2) time-consuming knowledge embedding design by human experts. This paper proposes Stacked BNAS, whose search space is a developed broad scalable architecture named Stacked BCNN, with better performance than BNAS. On the one hand, Stacked BCNN treats mini BCNN as a basic block to preserve comprehensive representation and deliver powerful feature extraction ability. For multi-scale feature enhancement, each mini BCNN feeds the outputs of deep and broad cells to the enhancement cell. For multi-scale feature fusion, each mini BCNN feeds the outputs of deep, broad and enhancement cells to the output node. On the other hand, Knowledge Embedding Search (KES) is proposed to learn appropriate knowledge embeddings in a differentiable way. Moreover, the basic unit of KES is an over-parameterized knowledge embedding module that consists of all possible candidate knowledge embeddings. Experimental results show that 1) Stacked BNAS obtains better performance than BNAS-v2 on both CIFAR-10 and ImageNet, 2) the proposed KES algorithm contributes to reducing the parameters of the learned architecture with satisfactory performance, and 3) Stacked BNAS delivers a state-of-the-art efficiency of 0.02 GPU days.

preprint, 2020, arXiv

Modal Regression based Structured Low-rank Matrix Recovery for Multi-view Learning

Low-rank Multi-view Subspace Learning (LMvSL) has shown great potential in cross-view classification in recent years. Despite their empirical success, existing LMvSL-based methods cannot handle view discrepancy and discriminancy well at the same time, which leads to performance degradation when there is a large discrepancy among multi-view data. To circumvent this drawback, motivated by block-diagonal representation learning, we propose Structured Low-rank Matrix Recovery (SLMR), a unique method that effectively removes view discrepancy and improves discriminancy through the recovery of a structured low-rank matrix. Furthermore, recent low-rank modeling provides a satisfactory solution for data contaminated by noise that follows a predefined distribution, such as a Gaussian or Laplacian distribution. However, these models are not practical, since complicated noise in practice may violate those assumptions and the distribution is generally unknown in advance. To alleviate this limitation, modal regression is incorporated into the framework of SLMR (termed MR-SLMR). Unlike previous LMvSL-based methods, our MR-SLMR can handle any zero-mode noise variable, which covers a wide range of noise, such as Gaussian noise, random noise and outliers. The alternating direction method of multipliers (ADMM) framework and half-quadratic theory are used to efficiently optimize MR-SLMR. Experimental results on four public databases demonstrate the superiority of MR-SLMR and its robustness to complicated noise.

preprint, 2012, arXiv

A Novel Latin Square Image Cipher

In this paper, we introduce a symmetric-key Latin square image cipher (LSIC) for grayscale and color images. Our contributions to the image encryption community include: 1) we develop new Latin square image encryption primitives including Latin Square Whitening, Latin Square S-box and Latin Square P-box; 2) we provide a new way of integrating probabilistic encryption in image encryption by embedding random noise in the least significant image bit-plane; and 3) we construct LSIC with these Latin square image encryption primitives all on one keyed Latin square in a new loom-like substitution-permutation network. Consequently, the proposed LSIC achieves many desired properties of a secure cipher including a large key space, high key sensitivity, uniformly distributed ciphertext, excellent confusion and diffusion properties, semantic security, and robustness against channel noise. Theoretical analysis shows that the LSIC has good resistance to many attack models including brute-force attacks, ciphertext-only attacks, known-plaintext attacks and chosen-plaintext attacks. Experimental analysis with extensive simulations on the complete USC-SIPI Miscellaneous image dataset demonstrates that LSIC outperforms or reaches the state of the art set by many peer algorithms. All these analyses and results demonstrate that the LSIC is very suitable for digital image encryption. Finally, we open-source the LSIC MATLAB code at https://sites.google.com/site/tuftsyuewu/source-code.
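To make the idea of a keyed Latin square used for substitution more concrete, here is a minimal sketch. It is not the LSIC construction or any of its primitives as defined in the paper: the square is built from two key-seeded permutations (an assumption), the row selection scheme is hypothetical, and Python's `random` module is not cryptographically secure, so this is for illustration only.

```python
import random


def keyed_latin_square(key: int, n: int = 256) -> list[list[int]]:
    """Build an n x n Latin square from two key-seeded permutations.

    Entry (i, j) is (rows[i] + cols[j]) mod n, so every row and every
    column contains each symbol 0..n-1 exactly once.
    """
    rng = random.Random(key)  # illustration only, NOT a secure key schedule
    rows = list(range(n))
    cols = list(range(n))
    rng.shuffle(rows)
    rng.shuffle(cols)
    return [[(r + c) % n for c in cols] for r in rows]


def is_latin_square(square) -> bool:
    """Check that each row and each column is a permutation of 0..n-1."""
    n = len(square)
    symbols = set(range(n))
    rows_ok = all(set(row) == symbols for row in square)
    cols_ok = all({square[i][j] for i in range(n)} == symbols for j in range(n))
    return rows_ok and cols_ok


def substitute(square, data: bytes, row_indices) -> bytes:
    """Replace each byte p by square[k][p], with k chosen per position."""
    return bytes(square[k][p] for k, p in zip(row_indices, data))


def invert(square, data: bytes, row_indices) -> bytes:
    """Undo substitute(): each row is a permutation, so the lookup is unique."""
    return bytes(square[k].index(c) for k, c in zip(row_indices, data))


if __name__ == "__main__":
    square = keyed_latin_square(key=2012)
    message = b"latin square"
    rows = [i % 256 for i in range(len(message))]  # hypothetical row schedule
    cipher = substitute(square, message, rows)
    print(is_latin_square(square), invert(square, cipher, rows) == message)
```

The property that matters is that every row of a Latin square is a permutation of the symbol set, so a row lookup is always invertible by anyone who can rebuild the same square from the key.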
