Source author record

Xiao Han

Xiao Han appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Catalog footprint

What is connected

26 works

29 topics

4 close collaborators

preprint 2026 arXiv

A Comprehensive Survey of Website Fingerprinting Attacks and Defenses in Tor: Advances and Open Challenges

The Tor network provides users with strong anonymity by routing their internet traffic through multiple relays. While Tor encrypts traffic and hides IP addresses, it remains vulnerable to traffic analysis attacks such as website fingerprinting (WF), which achieves increasingly high accuracy even under open-world conditions. In response, researchers have proposed a variety of defenses, ranging from adaptive padding, traffic regularization, and traffic morphing to adversarial perturbation, that seek to obfuscate or reshape traffic traces. However, these defenses often entail trade-offs between privacy, usability, and system performance. Despite extensive research, a comprehensive survey unifying WF datasets, attack methodologies, and defense strategies remains absent. This paper fills that gap by systematically categorizing existing WF research into three key domains: datasets, attack models, and defense mechanisms. We provide an in-depth comparative analysis of techniques, highlight their strengths and limitations under diverse threat models, and discuss emerging challenges such as multi-tab browsing and coarse-grained traffic features. By consolidating prior work and identifying open research directions, this survey serves as a foundation for advancing stronger privacy protection in Tor.

preprint 2026 arXiv

VISAFF: Speaker-Centered Visual Affective Feature Learning for Emotion Recognition in Conversation

Emotion Recognition in Conversation (ERC) is essential for effective human-machine interaction, aiming to identify speakers' emotional states in multi-turn dialogues. Early text-based methods struggle with complex scenarios like sarcasm because they inherently neglect vital non-verbal information. While recent Vision-Language Models (VLMs) address this by analyzing video directly, they are not inherently tailored for ERC and often focus on emotionally irrelevant background regions or passive listeners rather than the active speaker. Furthermore, fine-tuning these large models incurs prohibitive computational costs. Additionally, isolated visual signals are frequently ambiguous or technically compromised without the context of linguistic content and vocal prosody. To address these challenges, we propose VISAFF, a speaker-centered VISual AFFective feature learning framework for ERC. VISAFF consists of two stages: Speaker-Centered Affective Grounding and Reliability-Guided Affective Complementation. VISAFF utilizes a tuning-free approach to unlock the reasoning capabilities of frozen VLMs, efficiently steering them to focus on the active speaker's emotional visual cues without heavy training overheads. In the second stage, we introduce a reliability-guided affective complementation mechanism that dynamically leverages textual and acoustic modalities to compensate for visual uncertainty. Experiments on two real-world datasets demonstrate that VISAFF achieves highly competitive performance compared to state-of-the-art methods in a tuning-free setting, significantly enhancing computational efficiency by eliminating the need for expensive fine-tuning of large VLMs. The source code is available at https://anonymous.4open.science/r/speaker-2365/.

preprint 2025 arXiv

Degree-Weighted Social Learning

We study social learning in which agents weight neighbors' opinions differently based on their degrees, capturing situations in which agents place more trust in well-connected individuals or, conversely, discount their influence. We derive asymptotic properties of learning outcomes in large stochastic networks and analyze how the weighting rule affects societal wisdom and convergence speed. We find that assigning greater weight to higher-degree neighbors harms wisdom but has a non-monotonic effect on convergence speed, depending on the diversity of views within high- and low-degree groups, highlighting a potential trade-off between convergence speed and wisdom.

preprint 2024 arXiv

Data Valuation for Vertical Federated Learning: A Model-free and Privacy-preserving Method

Vertical Federated learning (VFL) is a promising paradigm for predictive analytics, empowering an organization (i.e., task party) to enhance its predictive models through collaborations with multiple data suppliers (i.e., data parties) in a decentralized and privacy-preserving way. Despite the fast-growing interest in VFL, the lack of effective and secure tools for assessing the value of data owned by data parties hinders the application of VFL in business contexts. In response, we propose FedValue, a privacy-preserving, task-specific but model-free data valuation method for VFL, which consists of a data valuation metric and a federated computation method. Specifically, we first introduce a novel data valuation metric, namely MShapley-CMI. The metric evaluates a data party's contribution to a predictive analytics task without the need of executing a machine learning model, making it well-suited for real-world applications of VFL. Next, we develop an innovative federated computation method that calculates the MShapley-CMI value for each data party in a privacy-preserving manner. Extensive experiments conducted on six public datasets validate the efficacy of FedValue for data valuation in the context of VFL. In addition, we illustrate the practical utility of FedValue with a case study involving federated movie recommendations.

preprint 2024 arXiv

Safety and Performance, Why Not Both? Bi-Objective Optimized Model Compression against Heterogeneous Attacks Toward AI Software Deployment

The size of deep learning models in artificial intelligence (AI) software is increasing rapidly, hindering the large-scale deployment on resource-restricted devices (e.g., smartphones). To mitigate this issue, AI software compression plays a crucial role, which aims to compress model size while keeping high performance. However, the intrinsic defects in a big model may be inherited by the compressed one. Such defects may be easily leveraged by adversaries, since a compressed model is usually deployed in a large number of devices without adequate protection. In this article, we aim to address the safe model compression problem from the perspective of safety-performance co-optimization. Specifically, inspired by the test-driven development (TDD) paradigm in software engineering, we propose a test-driven sparse training framework called SafeCompress. By simulating the attack mechanism as safety testing, SafeCompress can automatically compress a big model to a small one following the dynamic sparse training paradigm. Then, considering two kinds of representative and heterogeneous attack mechanisms, i.e., black-box membership inference attack and white-box membership inference attack, we develop two concrete instances called BMIA-SafeCompress and WMIA-SafeCompress. Further, we implement another instance called MMIA-SafeCompress by extending SafeCompress to defend against the occasion when adversaries conduct black-box and white-box membership inference attacks simultaneously. We conduct extensive experiments on five datasets for both computer vision and natural language processing tasks. The results show the effectiveness and generalizability of our framework. We also discuss how to adapt SafeCompress to other attacks besides membership inference attack, demonstrating the flexibility of SafeCompress.

preprint 2023 arXiv

Artificial intelligence for diagnosing and predicting survival of patients with renal cell carcinoma: Retrospective multi-center study

Background: Clear cell renal cell carcinoma (ccRCC) is the most common renal-related tumor with high heterogeneity. There is still an urgent need for novel diagnostic and prognostic biomarkers for ccRCC. Methods: We proposed a weakly-supervised deep learning strategy using conventional histology of 1752 whole slide images from multiple centers. Our study was demonstrated through internal cross-validation and external validations for the deep learning-based models. Results: Automatic diagnosis for ccRCC through intelligent subtyping of renal cell carcinoma was proved in this study. Our graderisk achieved area under the curve (AUC) of 0.840 (95% confidence interval: 0.805-0.871) in the TCGA cohort, 0.840 (0.805-0.871) in the General cohort, and 0.840 (0.805-0.871) in the CPTAC cohort for the recognition of high-grade tumor. The OSrisk for the prediction of 5-year survival status achieved AUC of 0.784 (0.746-0.819) in the TCGA cohort, which was further verified in the independent General cohort and the CPTAC cohort, with AUC of 0.774 (0.723-0.820) and 0.702 (0.632-0.765), respectively. Cox regression analysis indicated that graderisk, OSrisk, tumor grade, and tumor stage were independent prognostic factors, which were further incorporated into the competing-risk nomogram (CRN). Kaplan-Meier survival analyses further illustrated that our CRN could significantly distinguish patients with high survival risk, with hazard ratio of 5.664 (3.893-8.239, p < 0.0001) in the TCGA cohort, 35.740 (5.889-216.900, p < 0.0001) in the General cohort and 6.107 (1.815-20.540, p < 0.0001) in the CPTAC cohort. Comparison analyses confirmed that our CRN outperformed current prognosis indicators in the prediction of survival status, with higher concordance index for clinical prognosis.

preprint 2022 arXiv

FashionViL: Fashion-Focused Vision-and-Language Representation Learning

Large-scale Vision-and-Language (V+L) pre-training for representation learning has proven to be effective in boosting various downstream V+L tasks. However, when it comes to the fashion domain, existing V+L methods are inadequate as they overlook the unique characteristics of both the fashion V+L data and downstream tasks. In this work, we propose a novel fashion-focused V+L representation learning framework, dubbed as FashionViL. It contains two novel fashion-specific pre-training tasks designed particularly to exploit two intrinsic attributes with fashion V+L data. First, in contrast to other domains where a V+L data point contains only a single image-text pair, there could be multiple images in the fashion domain. We thus propose a Multi-View Contrastive Learning task for pulling closer the visual representation of one image to the compositional multimodal representation of another image+text. Second, fashion text (e.g., product description) often contains rich fine-grained concepts (attributes/noun phrases). To exploit this, a Pseudo-Attributes Classification task is introduced to encourage the learned unimodal (visual/textual) representations of the same concept to be adjacent. Further, fashion V+L tasks uniquely include ones that do not conform to the common one-stream or two-stream architectures (e.g., text-guided image retrieval). We thus propose a flexible, versatile V+L model architecture consisting of a modality-agnostic Transformer so that it can be flexibly adapted to any downstream tasks. Extensive experiments show that our FashionViL achieves a new state of the art across five downstream tasks. Code is available at https://github.com/BrandonHanx/mmf.

preprint 2022 arXiv

Hopf algebroids from noncommutative bundles

We present two classes of examples of Hopf algebroids associated with noncommutative principal bundles. The first comes from deforming the principal bundle while leaving unchanged the structure Hopf algebra. The second is related to deforming a quantum homogeneous space; this needs a careful deformation of the structure Hopf algebra in order to preserve the compatibilities between the Hopf algebra operations.

preprint 2022 arXiv

Large-Scale Privacy-Preserving Network Embedding against Private Link Inference Attacks

Network embedding represents network nodes by a low-dimensional informative vector. While it is generally effective for various downstream tasks, it may leak some private information of networks, such as hidden private links. In this work, we address a novel problem of privacy-preserving network embedding against private link inference attacks. Basically, we propose to perturb the original network by adding or removing links, and expect the embedding generated on the perturbed network can leak little information about private links but hold high utility for various downstream tasks. Towards this goal, we first propose general measurements to quantify privacy gain and utility loss incurred by candidate network perturbations; we then design a PPNE framework to identify the optimal perturbation solution with the best privacy-utility trade-off in an iterative way. Furthermore, we propose many techniques to accelerate PPNE and ensure its scalability. For instance, as the skip-gram embedding methods including DeepWalk and LINE can be seen as matrix factorization with closed form embedding results, we devise efficient privacy gain and utility loss approximation methods to avoid the repetitive time-consuming embedding training for every candidate network perturbation in each iteration. Experiments on real-life network datasets (with up to millions of nodes) verify that PPNE outperforms baselines by sacrificing less utility and obtaining higher privacy protection.

preprint 2022 arXiv

Large-Scale Product Retrieval with Weakly Supervised Representation Learning

Large-scale weakly supervised product retrieval is a practically useful yet computationally challenging problem. This paper introduces a novel solution for the eBay Visual Search Challenge (eProduct) held at the Ninth Workshop on Fine-Grained Visual Categorisation (FGVC9) of CVPR 2022. This competition presents two challenges: (a) E-commerce is a drastically fine-grained domain including many products with subtle visual differences; (b) A lack of target instance-level labels for model training, with only coarse category labels and product titles available. To overcome these obstacles, we formulate a strong solution by a set of dedicated designs: (a) Instead of using text training data directly, we mine thousands of pseudo-attributes from product titles and use them as the ground truths for multi-label classification. (b) We incorporate several strong backbones with advanced training recipes for more discriminative representation learning. (c) We further introduce a number of post-processing techniques including whitening, re-ranking and model ensemble for retrieval enhancement. By achieving 71.53% MAR, our solution "Involution King" achieves the second position on the leaderboard.

preprint 2022 arXiv

Safety and Performance, Why not Both? Bi-Objective Optimized Model Compression toward AI Software Deployment

The size of deep learning models in artificial intelligence (AI) software is increasing rapidly, which hinders the large-scale deployment on resource-restricted devices (e.g., smartphones). To mitigate this issue, AI software compression plays a crucial role, which aims to compress model size while keeping high performance. However, the intrinsic defects in the big model may be inherited by the compressed one. Such defects may be easily leveraged by attackers, since the compressed models are usually deployed in a large number of devices without adequate protection. In this paper, we try to address the safe model compression problem from a safety-performance co-optimization perspective. Specifically, inspired by the test-driven development (TDD) paradigm in software engineering, we propose a test-driven sparse training framework called SafeCompress. By simulating the attack mechanism as the safety test, SafeCompress can automatically compress a big model to a small one following the dynamic sparse training paradigm. Further, considering a representative attack, i.e., membership inference attack (MIA), we develop a concrete safe model compression mechanism, called MIA-SafeCompress. Extensive experiments are conducted to evaluate MIA-SafeCompress on five datasets for both computer vision and natural language processing tasks. The results verify the effectiveness and generalization of our method. We also discuss how to adapt SafeCompress to other attacks besides MIA, demonstrating the flexibility of SafeCompress.

preprint 2022 arXiv

UIGR: Unified Interactive Garment Retrieval

Interactive garment retrieval (IGR) aims to retrieve a target garment image based on a reference garment image along with user feedback on what to change on the reference garment. Two IGR tasks have been studied extensively: text-guided garment retrieval (TGR) and visually compatible garment retrieval (VCR). The user feedback for the former indicates what semantic attributes to change with the garment category preserved, while the category is the only thing to be changed explicitly for the latter, with an implicit requirement on style preservation. Despite the similarity between these two tasks and the practical need for an efficient system tackling both, they have never been unified and modeled jointly. In this paper, we propose a Unified Interactive Garment Retrieval (UIGR) framework to unify TGR and VCR. To this end, we first contribute a large-scale benchmark suited for both problems. We further propose a strong baseline architecture to integrate TGR and VCR in one model. Extensive experiments suggest that unifying two tasks in one framework is not only more efficient by requiring a single model only, it also leads to better performance. Code and datasets are available at https://github.com/BrandonHanx/CompFashion.

preprint 2020 arXiv

Covidex: Neural Ranking Models and Keyword Search Infrastructure for the COVID-19 Open Research Dataset

We present Covidex, a search engine that exploits the latest neural ranking models to provide information access to the COVID-19 Open Research Dataset curated by the Allen Institute for AI. Our system has been online and serving users since late March 2020. The Covidex is the user application component of our three-pronged strategy to develop technologies for helping domain experts tackle the ongoing global pandemic. In addition, we provide robust and easy-to-use keyword search infrastructure that exploits mature fusion-based methods as well as standalone neural ranking models that can be incorporated into other applications. These techniques have been evaluated in the ongoing TREC-COVID challenge: Our infrastructure and baselines have been adopted by many participants, including some of the highest-scoring runs in rounds 1, 2, and 3. In round 3, we report the highest-scoring run that takes advantage of previous training data and the second-highest fully automatic run.

preprint 2020 arXiv

Eigen selection in spectral clustering: a theory guided practice

Based on a Gaussian mixture type model, we derive an eigen selection procedure that improves the usual spectral clustering in high-dimensional settings. Concretely, we derive the asymptotic expansion of the spiked eigenvalues under eigenvalue multiplicity and eigenvalue ratio concentration results, giving rise to the first theory-backed eigen selection procedure in spectral clustering. The resulting eigen-selected spectral clustering (ESSC) algorithm enjoys better stability and compares favorably against canonical alternatives. We demonstrate the advantages of ESSC using extensive simulation and multiple real data studies.

preprint 2020 arXiv

IEEE 802.11be-Wi-Fi 7: New Challenges and Opportunities

With the emergence of 4k/8k video, the throughput requirement of video delivery will keep growing to tens of Gbps. Other new high-throughput and low-latency video applications including augmented reality (AR), virtual reality (VR), and online gaming, are also proliferating. Due to the related stringent requirements, supporting these applications over wireless local area network (WLAN) is far beyond the capabilities of the new WLAN standard -- IEEE 802.11ax. To meet these emerging demands, the IEEE 802.11 will release a new amendment standard IEEE 802.11be -- Extremely High Throughput (EHT), also known as Wireless-Fidelity (Wi-Fi) 7. This article provides a comprehensive survey on the key medium access control (MAC) layer techniques and physical layer (PHY) techniques being discussed in the EHT task group, including the channelization and tone plan, multiple resource units (multi-RU) support, 4096 quadrature amplitude modulation (4096-QAM), preamble designs, multiple link operations (e.g., multi-link aggregation and channel access), multiple input multiple output (MIMO) enhancement, multiple access point (multi-AP) coordination (e.g., multi-AP joint transmission), enhanced link adaptation and retransmission protocols (e.g., hybrid automatic repeat request (HARQ)). This survey covers both the critical technologies being discussed in EHT standard and the latest related progress from worldwide research. Besides, the potential developments beyond EHT are discussed to provide some possible future research directions for WLAN.

preprint 2020 arXiv

Microscope Based HER2 Scoring System

The overexpression of human epidermal growth factor receptor 2 (HER2) has been established as a therapeutic target in multiple types of cancers, such as breast and gastric cancers. Immunohistochemistry (IHC) is employed as a basic HER2 test to identify the HER2-positive, borderline, and HER2-negative patients. However, the reliability and accuracy of HER2 scoring are affected by many factors, such as pathologists' experience. Recently, artificial intelligence (AI) has been used in the diagnosis of various diseases to improve diagnostic accuracy and reliability, but the interpretation of diagnosis results is still an open problem. In this paper, we propose a real-time HER2 scoring system, which follows the HER2 scoring guidelines to complete the diagnosis, and thus each step is explainable. Unlike the previous scoring systems based on whole-slide imaging, our HER2 scoring system is integrated into an augmented reality (AR) microscope that can feed AI results back to the pathologists while they read the slide. The pathologists can help select informative fields of view (FOVs), avoiding the confounding regions, such as DCIS. Importantly, we illustrate the intermediate results with membrane staining condition and cell classification results, making it possible to evaluate the reliability of the diagnostic results. Also, we support the interactive modification of selecting regions-of-interest, making our system more flexible in clinical practice. The collaboration of AI and pathologists can significantly improve the robustness of our system. We evaluate our system with 285 breast IHC HER2 slides, and the classification accuracy of 95% shows the effectiveness of our HER2 scoring system.

preprint 2020 arXiv

On Chemical Distance and Local Uniqueness of a Sufficiently Supercritical Finitary Random Interlacement

In this paper, we study geometric properties of the unique infinite cluster $Γ$ in a sufficiently supercritical Finitary Random Interlacements $\mathcal{FI}^{u,T}$ in $\mathbb{Z}^d, \ d\ge 3$. We prove that the chemical distance in $Γ$ is, with stretched exponentially high probability, of the same order as the Euclidean distance in $\mathbb{Z}^d$. This also implies a shape theorem parallel to those for Bernoulli percolation and random interlacements. We also prove local uniqueness of $\mathcal{FI}^{u,T}$, which says any two large clusters in $\mathcal{FI}^{u,T}$ "close to each other" will with stretched exponentially high probability be connected to each other within the same order of the distance between them.

preprint 2020 arXiv

SK-Unet: an Improved U-net Model with Selective Kernel for the Segmentation of Multi-sequence Cardiac MR

In the clinical environment, myocardial infarction (MI) as one common cardiovascular disease is mainly evaluated based on the late gadolinium enhancement (LGE) cardiac magnetic resonance images (CMRIs). The automatic segmentations of left ventricle (LV), right ventricle (RV), and left ventricular myocardium (LVM) in the LGE CMRIs are desired for the aided diagnosis in clinic. To accomplish this segmentation task, this paper proposes a modified U-net architecture by combining multi-sequence CMRIs, including the cine, LGE, and T2-weighted CMRIs. The cine and T2-weighted CMRIs are used to assist the segmentation in the LGE CMRIs. In this segmentation network, the squeeze-and-excitation residual (SE-Res) and selective kernel (SK) modules are inserted in the down-sampling and up-sampling stages, respectively. The SK module makes the obtained feature maps more informative in both spatial and channel-wise space, and attains more precise segmentation result. The utilized dataset is from the MICCAI challenge (MS-CMRSeg 2019), which is acquired from 45 patients including three CMR sequences. The cine and T2-weighted CMRIs acquired from 35 patients and the LGE CMRIs acquired from 5 patients are labeled. Our method achieves the mean dice score of 0.922 (LV), 0.827 (LVM), and 0.874 (RV) in the LGE CMRIs.

preprint 2020 arXiv

Twisted Ehresmann Schauenburg bialgebroids

We construct an invertible normalised 2-cocycle on the Ehresmann Schauenburg bialgebroid of a cleft Hopf Galois extension under the condition that the corresponding Hopf algebra is cocommutative and the image of the unital cocycle corresponding to this cleft Hopf Galois extension belongs to the centre of the coinvariant subalgebra. Moreover, we show that any Ehresmann Schauenburg bialgebroid of this kind is isomorphic to a 2-cocycle twist of the Ehresmann Schauenburg bialgebroid corresponding to a Hopf Galois extension without cocycle, where the comodule algebra is an ordinary smash product of the coinvariant subalgebra and the Hopf algebra (i.e. $\mathcal{C}(B\#_σ H, H)\simeq \mathcal{C}(B\#H, H)^{\tilde{σ}}$). We also study the theory in the case of a Galois object where the base is trivial but without requiring the Hopf algebra to be cocommutative.

preprint 2016 arXiv

A unified matrix model including both CCA and F matrices in multivariate analysis: the largest eigenvalue and its applications

Let $\mathbf{Z}_{M_1\times N}=\mathbf{T}^{\frac{1}{2}}\mathbf{X}$ where $(\mathbf{T}^{\frac{1}{2}})^2=\mathbf{T}$ is a positive definite matrix and $\mathbf{X}$ consists of independent random variables with mean zero and variance one. This paper proposes a unified matrix model $$\boldsymbol{\Omega}=(\mathbf{Z}\mathbf{U}_2\mathbf{U}_2^T\mathbf{Z}^T)^{-1}\mathbf{Z}\mathbf{U}_1\mathbf{U}_1^T\mathbf{Z}^T,$$ where $\mathbf{U}_1$ and $\mathbf{U}_2$ are isometric with dimensions $N\times N_1$ and $N\times (N-N_2)$ respectively such that $\mathbf{U}_1^T\mathbf{U}_1=\mathbf{I}_{N_1}$, $\mathbf{U}_2^T\mathbf{U}_2=\mathbf{I}_{N-N_2}$ and $\mathbf{U}_1^T\mathbf{U}_2=0$. Moreover, $\mathbf{U}_1$ and $\mathbf{U}_2$ (random or non-random) are independent of $\mathbf{Z}_{M_1\times N}$ and with probability tending to one, $\operatorname{rank}(\mathbf{U}_1)=N_1$ and $\operatorname{rank}(\mathbf{U}_2)=N-N_2$. We establish the asymptotic Tracy-Widom distribution for its largest eigenvalue under moment assumptions on $\mathbf{X}$ when $N_1,N_2$ and $M_1$ are comparable. By selecting appropriate matrices $\mathbf{U}_1$ and $\mathbf{U}_2$, the asymptotic distributions of the maximum eigenvalues of the matrices used in Canonical Correlation Analysis (CCA) and of F matrices (including centered and non-centered versions) can be both obtained from that of $\boldsymbol{\Omega}$. Moreover, via appropriate matrices $\mathbf{U}_1$ and $\mathbf{U}_2$, this matrix $\boldsymbol{\Omega}$ can be applied to some multivariate testing problems that cannot be done by the traditional CCA matrix.

preprint 2015 arXiv

Are You Really Hidden? Predicting Current City from Profile and Social Relationship

Privacy has become a major concern in Online Social Networks (OSNs) due to threats such as advertising spam, online stalking and identity theft. Although many users hide or do not fill out their private attributes in OSNs, prior studies point out that the hidden attributes may be inferred from some other public information. Thus, users' private information could still be at risk of exposure. Hitherto, little work helps users to assess the exposure probability/risk that the hidden attributes can be correctly predicted, let alone provides them with pointed countermeasures. In this article, we focus our study on the exposure risk assessment of a particular privacy-sensitive attribute, current city, on Facebook. Specifically, we first design a novel current city prediction approach that discloses users' hidden 'current city' from their self-exposed information. Based on 371,913 Facebook users' data, we verify that our proposed prediction approach can predict users' current city more accurately than state-of-the-art approaches. Furthermore, we inspect the prediction results and model the current city exposure probability via some measurable characteristics of the self-exposed information. Finally, we construct an exposure estimator to assess the current city exposure risk for individual users, given their self-exposed information. Several case studies are presented to illustrate how to use our proposed estimator for privacy protection.

preprint 2015 arXiv

Robust Reconstruction of Complex Networks from Sparse Data

Reconstructing complex networks from measurable data is a fundamental problem for understanding and controlling collective dynamics of complex networked systems. However, a significant challenge arises when we attempt to decode structural information hidden in limited amounts of data accompanied by noise and in the presence of inaccessible nodes. Here, we develop a general framework for robust reconstruction of complex networks from sparse and noisy data. Specifically, we decompose the task of reconstructing the whole network into recovering local structures centered at each node. Thus, the natural sparsity of complex networks ensures a conversion from the local structure reconstruction into a sparse signal reconstruction problem that can be addressed by using the lasso, a convex optimization method. We apply our method to evolutionary games, transportation and communication processes taking place in a variety of model and real complex networks, finding that universal high reconstruction accuracy can be achieved from sparse data in spite of noise in time series and missing data of partial nodes. Our approach opens new routes to the network reconstruction problem and has potential applications in a wide range of fields.

preprint 2014 arXiv

High Dimensional Correlation Matrices: CLT and Its Applications

Statistical inferences for sample correlation matrices are important in high dimensional data analysis. Motivated by this, this paper establishes a new central limit theorem (CLT) for a linear spectral statistic (LSS) of high dimensional sample correlation matrices for the case where the dimension $p$ and the sample size $n$ are comparable. This result is of independent interest in large dimensional random matrix theory. Meanwhile, we apply the linear spectral statistic to an independence test for $p$ random variables, and then an equivalence test for $p$ factor loadings and $n$ factors in a factor model. The finite sample performance of the proposed test shows its applicability and effectiveness in practice. An empirical application to test the independence of household incomes from different cities in China is also conducted.

preprint 2013 arXiv

Biomimetic fabrication and tunable wetting properties of three-dimensional hierarchical ZnO structures by combining soft lithography templated with lotus leaf and hydrothermal treatments

Three-dimensional hierarchical ZnO films with lotus-leaf-like micro/nano structures were successfully fabricated via a biomimetic route combining the sol-gel technique, soft lithography and hydrothermal treatments. A PDMS mold replicated from a fresh lotus leaf was used to imprint microscale pillar structures directly into a ZnO sol film. Hierarchical ZnO micro/nano structures were subsequently fabricated by low-temperature hydrothermal growth of secondary ZnO nanorod arrays on the micro-structured ZnO film. The morphology and size of the ZnO hierarchical micro/nano structures can be easily controlled by adjusting the hydrothermal reaction time. The wettability of the hierarchical ZnO thin films was found to convert from superhydrophilicity to hydrophobicity after a low-surface-energy fluoroalkylsilane modification. The wetting properties can be further tuned from hydrophobic to superhydrophobic by increasing the growth of the ZnO nanorod structures.

preprint 2013 arXiv

Constraining the neutron-proton effective mass splitting using empirical constraints on the density dependence of nuclear symmetry energy around normal density

According to the Hugenholtz-Van Hove theorem, nuclear symmetry energy $E_{sym}(ρ)$ and its slope $L(ρ)$ at an arbitrary density $ρ$ are determined by the nucleon isovector (symmetry) potential $U_{sym}(ρ,k)$ and its momentum dependence $\frac{\partial U_{sym}}{\partial k}$. The latter determines uniquely the neutron-proton effective k-mass splitting $m^*_{n-p}(ρ,δ)\equiv (m_{\rm n}^*-m_{\rm p}^*)/m$ in neutron-rich nucleonic matter of isospin asymmetry $δ$. Using currently available constraints on the $E_{sym}(ρ_0)$ and $L(ρ_0)$ at normal density $ρ_0$ of nuclear matter from 28 recent analyses of various terrestrial nuclear laboratory experiments and astrophysical observations, we try to infer the corresponding neutron-proton effective k-mass splitting $m^*_{n-p}(ρ_0,δ)$. While the mean values of the $m^*_{n-p}(ρ_0,δ)$ obtained from most of the studies are remarkably consistent with each other and scatter very closely around an empirical value of $m^*_{n-p}(ρ_0,δ)=0.27\cdotδ$, it is currently not possible to scientifically state surely that the $m^*_{n-p}(ρ_0,δ)$ is positive within the present knowledge of the uncertainties. Quantifying, better understanding and then further reducing the uncertainties using modern statistical and computational techniques in extracting the $E_{sym}(ρ_0)$ and $L(ρ_0)$ from analyzing the experimental data are much needed.

preprint 2011 arXiv

Fabrication of surface-patterned ZnO thin films using sol-gel methods and nanoimprint lithography

Surface-patterned ZnO thin films were fabricated by direct imprinting on ZnO sol followed by an annealing process. The polymer-based ZnO sols were deposited on various substrates for nanoimprint lithography and converted to surface-patterned ZnO gel films during the thermal curing nanoimprint process. Finally, crystalline ZnO films were obtained by subsequent annealing of the patterned ZnO gel films. The optical characterization indicates that the surface patterning of ZnO thin films can lead to an enhanced transmittance. Large-scale ZnO thin films with different patterns can be fabricated from various easily made ordered templates using this combination of sol-gel and nanoimprint lithography techniques.

26 published items

A Comprehensive Survey of Website Fingerprinting Attacks and Defenses in Tor: Advances and Open Challenges

VISAFF: Speaker-Centered Visual Affective Feature Learning for Emotion Recognition in Conversation

Degree-Weighted Social Learning

Data Valuation for Vertical Federated Learning: A Model-free and Privacy-preserving Method

Safety and Performance, Why Not Both? Bi-Objective Optimized Model Compression against Heterogeneous Attacks Toward AI Software Deployment

Artificial intelligence for diagnosing and predicting survival of patients with renal cell carcinoma: Retrospective multi-center study

FashionViL: Fashion-Focused Vision-and-Language Representation Learning

Hopf algebroids from noncommutative bundles

Large-Scale Privacy-Preserving Network Embedding against Private Link Inference Attacks

Large-Scale Product Retrieval with Weakly Supervised Representation Learning

Safety and Performance, Why not Both? Bi-Objective Optimized Model Compression toward AI Software Deployment

UIGR: Unified Interactive Garment Retrieval

Covidex: Neural Ranking Models and Keyword Search Infrastructure for the COVID-19 Open Research Dataset

Eigen selection in spectral clustering: a theory guided practice

IEEE 802.11be-Wi-Fi 7: New Challenges and Opportunities

Microscope Based HER2 Scoring System

On Chemical Distance and Local Uniqueness of a Sufficiently Supercritical Finitary Random Interlacement

SK-Unet: an Improved U-net Model with Selective Kernel for the Segmentation of Multi-sequence Cardiac MR

Twisted Ehresmann Schauenburg bialgebroids

A unified matrix model including both CCA and F matrices in multivariate analysis: the largest eigenvalue and its applications

Are You Really Hidden? Predicting Current City from Profile and Social Relationship

Robust Reconstruction of Complex Networks from Sparse Data

High Dimensional Correlation Matrices: CLT and Its Applications

Biomimetic fabrication and tunable wetting properties of three-dimensional hierarchical ZnO structures by combining soft lithography templated with lotus leaf and hydrothermal treatments

Constraining the neutron-proton effective mass splitting using empirical constraints on the density dependence of nuclear symmetry energy around normal density

Fabrication of surface-patterned ZnO thin films using sol-gel methods and nanoimprint lithography