Source author record

Yitong Li

Yitong Li appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Catalog footprint

What is connected

34 works

22 topics

4 close collaborators

preprint, 2026, arXiv

MoIIE: Mixture of Intra- and Inter-Modality Experts for Large Vision Language Models

Large Vision-Language Models (LVLMs) have demonstrated remarkable performance across multi-modal tasks by scaling model size and training data. However, these dense LVLMs incur significant computational costs and motivate the exploration of sparse Mixture of Experts (MoE) architectures. While MoE improves parameter efficiency, effectively applying MoE to simultaneously model modality-specific features and cross-modal associations in LVLMs remains challenging. In this work, we propose to incorporate Mixture of Intra- and Inter-Modality Experts (MoIIE) to LVLMs. For each token, expert routing is guided by its modality, directing tokens to their respective intra-modality experts as well as a shared pool of inter-modality experts, enabling the model to jointly learn rich intra-modal features and cross-modal interactions. We further introduce an effective and straightforward two-stage training strategy, which facilitates the direct activation of both MoE and multi-modal capabilities. Extensive experiments across different data scales and LLM backbones demonstrate the effectiveness, efficiency and generality of our approach. Notably, our MoIIE models with 5.5B and 11.3B activated parameters match or even surpass the performance of existing advanced open-source MoE-LLMs based multi-modal models that involve more activated parameters. The code is available at https://github.com/AlenjandroWang/MoIIE.
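The modality-guided routing described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the expert names, the router scores, the `route_token` helper and the top-k selection rule are all assumptions made for illustration. The only idea taken from the abstract is that a token may be sent to the intra-modality experts of its own modality or to a shared pool of inter-modality experts.

```python
from typing import Dict, List


def route_token(
    modality: str,
    scores: Dict[str, float],
    intra_experts: Dict[str, List[str]],
    inter_experts: List[str],
    top_k: int = 2,
) -> List[str]:
    """Return the top_k experts a token may use (hypothetical helper).

    Candidates are the intra-modality experts of the token's own modality
    plus the shared inter-modality pool; experts of other modalities are
    never considered, whatever their score.
    """
    candidates = intra_experts[modality] + inter_experts
    ranked = sorted(candidates, key=lambda name: scores[name], reverse=True)
    return ranked[:top_k]


# Hypothetical expert layout and router scores for a single token.
intra = {"text": ["text_0", "text_1"], "image": ["img_0", "img_1"]}
inter = ["shared_0", "shared_1"]
scores = {
    "text_0": 0.1, "text_1": 0.7,
    "img_0": 0.9, "img_1": 0.2,
    "shared_0": 0.5, "shared_1": 0.3,
}

text_choice = route_token("text", scores, intra, inter)
image_choice = route_token("image", scores, intra, inter)
print(text_choice, image_choice)
```

In this toy setup `img_0` has the highest score overall, yet a text token cannot reach it because the candidate set is restricted by modality.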

preprint, 2022, arXiv

CheXplaining in Style: Counterfactual Explanations for Chest X-rays using StyleGAN

Deep learning models used in medical image analysis are prone to raising reliability concerns due to their black-box nature. To shed light on these black-box models, previous works predominantly focus on identifying the contribution of input features to the diagnosis, i.e., feature attribution. In this work, we explore counterfactual explanations to identify what patterns the models rely on for diagnosis. Specifically, we investigate the effect of changing features within chest X-rays on the classifier's output to understand its decision mechanism. We leverage a StyleGAN-based approach (StyleEx) to create counterfactual explanations for chest X-rays by manipulating specific latent directions in their latent space. In addition, we propose EigenFind to significantly reduce the computation time of generated explanations. We clinically evaluate the relevancy of our counterfactual explanations with the help of radiologists. Our code is publicly available.

preprint, 2022, arXiv

CINS: Comprehensive Instruction for Few-shot Learning in Task-oriented Dialog Systems

As labeling cost for different modules in task-oriented dialog (ToD) systems is high, a major challenge in practice is to learn different tasks with the least amount of labeled data. Recently, prompting methods over pre-trained language models (PLMs) have shown promising results for few-shot learning in ToD. To better utilize the power of PLMs, this paper proposes Comprehensive Instruction (CINS) that exploits PLMs with extra task-specific instructions. We design a schema (definition, constraint, prompt) of instructions and their customized realizations for three important downstream tasks in ToD, i.e. intent classification, dialog state tracking, and natural language generation. A sequence-to-sequence model (T5) is adopted to solve these three tasks in a unified framework. Extensive experiments are conducted on these ToD tasks in realistic few-shot learning scenarios with small validation data. Empirical results demonstrate that the proposed CINS approach consistently improves techniques that finetune PLMs with raw input or short prompts.

preprint, 2022, arXiv

Compilable Neural Code Generation with Compiler Feedback

Automatically generating compilable programs with (or without) natural language descriptions has always been a touchstone problem for computational linguistics and automated software engineering. Existing deep-learning approaches model code generation as text generation, either constrained by grammar structures in decoder, or driven by pre-trained language models on large-scale code corpus (e.g., CodeGPT, PLBART, and CodeT5). However, few of them account for compilability of the generated programs. To improve compilability of the generated programs, this paper proposes COMPCODER, a three-stage pipeline utilizing compiler feedback for compilable code generation, including language model fine-tuning, compilability reinforcement, and compilability discrimination. Comprehensive experiments on two code generation tasks demonstrate the effectiveness of our proposed approach, improving the success rate of compilation from 44.18 to 89.18 in code completion on average and from 70.3 to 96.2 in text-to-code generation, respectively, when comparing with the state-of-the-art CodeGPT.

preprint, 2022, arXiv

fairlib: A Unified Framework for Assessing and Improving Classification Fairness

This paper presents fairlib, an open-source framework for assessing and improving classification fairness. It provides a systematic framework for quickly reproducing existing baseline models, developing new methods, evaluating models with different metrics, and visualizing their results. Its modularity and extensibility enable the framework to be used for diverse types of inputs, including natural language, images, and audio. In detail, we implement 14 debiasing methods, including pre-processing, at-training-time, and post-processing approaches. The built-in metrics cover the most commonly used fairness criteria and can be further generalized and customized for fairness evaluation.

preprint, 2022, arXiv

Impedance-based Root-cause Analysis: Comparative Study of Impedance Models and Calculation of Eigenvalue Sensitivity

Impedance models of power systems are useful when state-space models of apparatus such as inverter-based resources (IBRs) have not been made available and instead only black-box impedance models are available. For tracing the root causes of poor damping and tuning modes of the system, the sensitivity of the modes to components and parameters is needed. The so-called critical admittance-eigenvalue sensitivity based on nodal admittance model has provided a partial solution but omits meaningful directional information. The alternative whole-system impedance model yields participation factors of shunt-connected apparatus with directional information that allows separate tuning for damping and frequency, yet does not cover series-connected components. This paper formalises the relationships between the two forms of impedance models and between the two forms of root-cause analysis. The calculation of system eigenvalue sensitivity in impedance models is further developed, which fills the gaps of previous research and establishes a complete theory of impedance-based root-cause analysis. The theoretical relationships and the tuning of parameters have been illustrated with a three-node passive network, a modified IEEE 14-bus network and a modified NETS-NYPS 68-bus network, showing that tools can be developed for tuning of IBR-rich power systems where only black-box impedance models are available.

preprint, 2022, arXiv

Improving Disentangled Text Representation Learning with Information-Theoretic Guidance

Learning disentangled representations of natural language is essential for many NLP tasks, e.g., conditional text generation, style transfer, personalized dialogue systems, etc. Similar problems have been studied extensively for other forms of data, such as images and videos. However, the discrete nature of natural language makes the disentangling of textual representations more challenging (e.g., the manipulation over the data space cannot be easily achieved). Inspired by information theory, we propose a novel method that effectively manifests disentangled representations of text, without any supervision on semantics. A new mutual information upper bound is derived and leveraged to measure dependence between style and content. By minimizing this upper bound, the proposed method induces style and content embeddings into two independent low-dimensional spaces. Experiments on both conditional text generation and text-style transfer demonstrate the high quality of our disentangled representation in terms of content and style preservation.

preprint, 2022, arXiv

Is my Depth Ground-Truth Good Enough? HAMMER -- Highly Accurate Multi-Modal Dataset for DEnse 3D Scene Regression

Depth estimation is a core task in 3D computer vision. Recent methods investigate the task of monocular depth trained with various depth sensor modalities. Every sensor has its advantages and drawbacks caused by the nature of estimates. In the literature, mostly mean average error of the depth is investigated and sensor capabilities are typically not discussed. Especially indoor environments, however, pose challenges for some devices. Textureless regions pose challenges for structure from motion, reflective materials are problematic for active sensing, and distances for translucent material are intricate to measure with existing sensors. This paper proposes HAMMER, a dataset comprising depth estimates from multiple commonly used sensors for indoor depth estimation, namely ToF, stereo, structured light together with monocular RGB+P data. We construct highly reliable ground truth depth maps with the help of 3D scanners and aligned renderings. A popular depth estimator is trained on this data and typical depth sensors. The estimates are extensively analyzed on different scene structures. We notice generalization issues arising from various sensor technologies in household environments with challenging but everyday scene content. HAMMER, which we make publicly available, provides a reliable base to pave the way to targeted depth improvements and sensor fusion approaches.

preprint, 2022, arXiv

Observation of Acoustic Non-Hermitian Bloch Braids and Associated Topological Phase Transitions

Topological features embedded in ancient braiding and knotting arts endow significant impacts on our daily life and even cutting-edge science. Recently, fast growing efforts are invested to the braiding topology of complex Bloch bands in non-Hermitian systems. This new classification of band topology goes far beyond those established in Hermitian counterparts. Here, we present the first acoustic realization of the topological non-Hermitian Bloch braids, based on a two-band model easily accessible for realizing any desired knot structure. The non-Hermitian bands are synthesized by a simple binary cavity-tube system, where the long-range, complex-valued, and momentum-resolved couplings are accomplished by a well-controlled unidirectional coupler. In addition to directly visualizing various two-band braiding patterns, we unambiguously observe the highly-elusive topological phase transitions between them. Not only do our results provide a direct demonstration for the non-Hermitian band topology, but also the experimental techniques open new avenues for designing unconventional acoustic metamaterials.

preprint, 2022, arXiv

Pan More Gold from the Sand: Refining Open-domain Dialogue Training with Noisy Self-Retrieval Generation

Real human conversation data are complicated, heterogeneous, and noisy, which makes building open-domain dialogue systems a challenging task. Such dialogue data nevertheless contain a wealth of information and knowledge that has not been fully explored. In this paper, we show that existing open-domain dialogue generation methods, which memorize context-response paired data with autoregressive or encoder-decoder language models, underutilize the training data. Different from current approaches using external knowledge, we explore a retrieval-generation training framework that can take advantage of the heterogeneous and noisy training data by considering them as "evidence". In particular, we use BERTScore for retrieval, which gives better quality of both the evidence and the generation. Experiments over publicly available datasets demonstrate that our method can help models generate better responses, even though such training data are usually regarded as low-quality. The performance gain is comparable to, or even better than, that obtained by enlarging the training set. We also found that model performance is positively correlated with the relevance of the retrieved evidence. Moreover, our method performed well in zero-shot experiments, which indicates that it can be more robust to real-world data.

preprint, 2022, arXiv

PanGu-Bot: Efficient Generative Dialogue Pre-training from Pre-trained Language Model

In this paper, we introduce PanGu-Bot, a Chinese pre-trained open-domain dialogue generation model based on a large pre-trained language model (PLM) PANGU-alpha (Zeng et al., 2021). Different from other pre-trained dialogue models trained over a massive amount of dialogue data from scratch, we aim to build a powerful dialogue model with relatively fewer data and computation costs by inheriting valuable language capabilities and knowledge from PLMs. To this end, we train PanGu-Bot from the large PLM PANGU-alpha, which has been proven to perform well on a variety of Chinese natural language tasks. We investigate different aspects of responses generated by PanGu-Bot, including response quality, knowledge, and safety. We show that PanGu-Bot outperforms state-of-the-art Chinese dialogue systems (CDIALGPT (Wang et al., 2020), EVA (Zhou et al., 2021), EVA2.0 (Gu et al., 2022)) w.r.t. the above three aspects. We also demonstrate that PanGu-Bot can be easily deployed to generate emotional responses without further training. Throughout our empirical analysis, we also point out that the PanGu-Bot response quality, knowledge correctness, and safety are still far from perfect, and further explorations are indispensable to building reliable and smart dialogue systems. Our model and code will be available at https://github.com/huawei-noah/Pretrained-Language-Model/tree/master/PanGu-Bot soon.

preprint, 2022, arXiv

PhoCaL: A Multi-Modal Dataset for Category-Level Object Pose Estimation with Photometrically Challenging Objects

Object pose estimation is crucial for robotic applications and augmented reality. Beyond instance level 6D object pose estimation methods, estimating category-level pose and shape has become a promising trend. As such, a new research field needs to be supported by well-designed datasets. To provide a benchmark with high-quality ground truth annotations to the community, we introduce a multimodal dataset for category-level object pose estimation with photometrically challenging objects termed PhoCaL. PhoCaL comprises 60 high quality 3D models of household objects over 8 categories including highly reflective, transparent and symmetric objects. We developed a novel robot-supported multi-modal (RGB, depth, polarisation) data acquisition and annotation process. It ensures sub-millimeter accuracy of the pose for opaque textured, shiny and transparent objects, no motion blur and perfect camera synchronisation. To set a benchmark for our dataset, state-of-the-art RGB-D and monocular RGB methods are evaluated on the challenging scenes of PhoCaL.

preprint, 2022, arXiv

Polarimetric Pose Prediction

Light has many properties that vision sensors can passively measure. Colour-band separated wavelength and intensity are arguably the most commonly used for monocular 6D object pose estimation. This paper explores how complementary polarisation information, i.e. the orientation of light wave oscillations, influences the accuracy of pose predictions. A hybrid model that leverages physical priors jointly with a data-driven learning strategy is designed and carefully tested on objects with different levels of photometric complexity. Our design significantly improves the pose accuracy compared to state-of-the-art photometric approaches and enables object pose estimation for highly reflective and transparent objects. A new multi-modal instance-level 6D object pose dataset with highly accurate pose annotations for multiple objects with varying photometric complexity is introduced as a benchmark.

preprint, 2022, arXiv

Revisiting Grid-Forming and Grid-Following Inverters: A Duality Theory

Power electronic converters for integrating renewable energy resources into power systems can be divided into grid-forming and grid-following inverters. They possess certain similarities, but several important differences, which means that the relationship between them is quite subtle and sometimes obscure. In this article, a new perspective based on duality is proposed to create new insights. It successfully unifies the grid interfacing and synchronization characteristics of the two inverter types in a symmetric, elegant, and technology-neutral form. Analysis shows that the grid-forming and grid-following inverters are duals of each other in several ways including a) synchronization controllers: frequency droop control and phase-locked loop (PLL); b) grid-interfacing characteristics: current-following voltage-forming and voltage-following current-forming; c) swing characteristics: current-angle swing and voltage-angle swing; d) inner-loop controllers: output impedance shaping and output admittance shaping; and e) grid strength compatibility: strong-grid instability and weak-grid instability. The swing equations are also derived in dual form, which reveal the dynamic interaction between the grid strength, the synchronization controllers, and the inner-loop controllers. Insights are generated into cases of poor stability in both small-signal and transient/large-signal. The theoretical analysis and simulation results are used to illustrate cases for simple single-inverter-infinite-bus systems and a multi-inverter power network.

preprint, 2022, arXiv

RotateQVS: Representing Temporal Information as Rotations in Quaternion Vector Space for Temporal Knowledge Graph Completion

Temporal factors are tied to the growth of facts in realistic applications, such as the progress of diseases and the development of political situations; therefore, research on Temporal Knowledge Graph (TKG) attracts much attention. In TKG, relation patterns inherent with temporality are required to be studied for representation learning and reasoning across temporal facts. However, existing methods can hardly model temporal relation patterns, nor can they capture the intrinsic connections between relations when evolving over time, and they lack interpretability. In this paper, we propose a novel temporal modeling method which represents temporal entities as Rotations in Quaternion Vector Space (RotateQVS) and relations as complex vectors in Hamilton's quaternion space. We demonstrate our method can model key patterns of relations in TKG, such as symmetry, asymmetry, inverse, and can further capture time-evolved relations by theory. Empirically, we show that our method can boost the performance of link prediction tasks over four temporal knowledge graph benchmarks.

preprint, 2021, arXiv

Acoustic Möbius insulators from projective symmetry

Symmetry plays a critical role in classifying phases of matter. This is exemplified by how crystalline symmetries enrich the topological classification of materials and enable unconventional phenomena in topologically nontrivial ones. After an extensive study over the past decade, the list of topological crystalline insulators and semimetals seems to be exhaustive and concluded. However, in the presence of gauge symmetry, common but not limited to artificial crystals, the algebraic structure of crystalline symmetries needs to be projectively represented, giving rise to unprecedented topological physics. Here we demonstrate this novel idea by exploiting a projective translation symmetry and constructing a variety of Möbius-twisted topological phases. Experimentally, we realize two Möbius insulators in acoustic crystals for the first time: a two-dimensional one of first-order band topology and a three-dimensional one of higher-order band topology. We observe unambiguously the peculiar Möbius edge and hinge states via real-space visualization of their localizations, momentum-space spectroscopy of their 4π periodicity, and phase-space winding of their projective translation eigenvalues. Not only does our work open a new avenue for artificial systems under the interplay between gauge and crystalline symmetries, but it also initializes a new framework for topological physics from projective symmetry.

preprint, 2021, arXiv

Automated Generation of Interorganizational Disaster Response Networks through Information Extraction

When a disaster occurs, maintaining and restoring community lifelines subsequently require collective efforts from various stakeholders. Aiming at reducing the efforts associated with generating Stakeholder Collaboration Networks (SCNs), this paper proposes a systematic approach to reliable information extraction for stakeholder collaboration and automated network generation. Specifically, stakeholders and their interactions are extracted from texts through Named Entity Recognition (NER), one of the techniques in natural language processing. Once extracted, the collaboration information is transformed into structured datasets to generate the SCNs automatically. A case study of stakeholder collaboration during Hurricane Harvey was conducted and demonstrated the feasibility and applicability of the proposed method. Hence, the proposed approach was shown to significantly reduce practitioners' interpretation and data collection workloads. In the end, discussions and future work are provided.

preprint, 2021, arXiv

Polarized hyperspectral imaging with single fiber bundle via incoherent light transmission matrix approach

The scattering of multispectral incoherent light is a common and unfavorable signal scrambling in natural scenes. However, the blurred light spot due to scattering still holds lots of information remaining to be explored. Former methods failed to recover the polarized hyperspectral information from scattered incoherent light or relied on additional dispersion elements. Here we put forward the transmission matrix (TM) approach for extended objects under incoherent illumination by speculating the unknown TM through experimentally calibrated or digitally emulated ways. Employing a fiber bundle as a powerful imaging and dispersion element, we recover the spatial information in 252 polarized-spectral channels from a single speckle, thus achieving single-shot, high-resolution, broadband hyperspectral imaging for two polarization states with the cheap, compact, fiber-bundle-only system. Based on the scattering principle itself, our method not only greatly improves the robustness of the TM approach to retrieve the input spectral information, but also reveals the feasibility to explore the polarized spatio-spectral information from blurry speckles only with the help of simple optical setups.

preprint, 2020, arXiv

Automated Abstraction of Operation Processes from Unstructured Text for Simulation Modeling

Abstraction of operation processes is a fundamental step for simulation modeling. To reliably abstract an operation process, modelers rely on text information to study and understand details of operations. Aiming at reducing modelers' interpretation load and ensuring the reliability of the abstracted information, this research proposes a systematic methodology to automate the abstraction of operation processes. The methodology applies rule-based information extraction to automatically extract operation process-related information from unstructured text and creates graphical representations of operation processes using the extracted information. To demonstrate the applicability and feasibility of the proposed methodology, a text description of an earthmoving operation is used to create its corresponding graphical representation. Overall, this research enhances the state-of-the-art simulation modeling through achieving automated abstraction of operation processes, which largely reduces modelers' interpretation load and ensures the reliability of the abstracted operation processes.

preprint, 2020, arXiv

How to Democratise and Protect AI: Fair and Differentially Private Decentralised Deep Learning

This paper firstly considers the research problem of fairness in collaborative deep learning, while ensuring privacy. A novel reputation system is proposed through digital tokens and local credibility to ensure fairness, in combination with differential privacy to guarantee privacy. In particular, we build a fair and differentially private decentralised deep learning framework called FDPDDL, which enables parties to derive more accurate local models in a fair and private manner by using our developed two-stage scheme: during the initialisation stage, artificial samples generated by Differentially Private Generative Adversarial Network (DPGAN) are used to mutually benchmark the local credibility of each party and generate initial tokens; during the update stage, Differentially Private SGD (DPSGD) is used to facilitate collaborative privacy-preserving deep learning, and local credibility and tokens of each party are updated according to the quality and quantity of individually released gradients. Experimental results on benchmark datasets under three realistic settings demonstrate that FDPDDL achieves high fairness, yields comparable accuracy to the centralised and distributed frameworks, and delivers better accuracy than the standalone framework.

preprint, 2020, arXiv

Impedance-Based Whole-System Modeling for a Composite Grid via Frame-Dynamics Embedding

The paper establishes a methodology to overcome the difficulty of dynamic frame alignment and system separation in impedance modeling of ac grids, and thereby enables impedance-based whole-system modeling of generator-converter composite power systems. The methodology is based on a frame-dynamics-embedding transformation via an intermediary steady frame between local and global frames, which yields a locally defined impedance model for each generator or converter that does not rely on a global frame but retains all frame dynamics. The individual impedance model can then be readily combined into a whole-system model even for meshed networks via the proposed closed-loop formulation without network separation. Compared to state-of-the-art impedance-based models, the proposed method retains both frame dynamics and scalability, and is generally applicable to various network topologies (meshed, radial, etc.) and combinations of machines (generators, motors, converters, etc.). The methodology is used to analyze the dynamic interaction between generators and converters in a composite grid, which yields important findings and potential solutions for unstable oscillation caused by PLL-swing coupling in low-inertia grids.

preprint, 2020, arXiv

Sequential Attention GAN for Interactive Image Editing

Most existing text-to-image synthesis tasks are static single-turn generation, based on pre-defined textual descriptions of images. To explore more practical and interactive real-life applications, we introduce a new task - Interactive Image Editing, where users can guide an agent to edit images via multi-turn textual commands on-the-fly. In each session, the agent takes a natural language description from the user as the input and modifies the image generated in the previous turn to a new design, following the user description. The main challenges in this sequential and interactive image generation task are two-fold: 1) contextual consistency between a generated image and the provided textual description; 2) step-by-step region-level modification to maintain visual consistency across the generated image sequence in each session. To address these challenges, we propose a novel Sequential Attention Generative Adversarial Network (SeqAttnGAN), which applies a neural state tracker to encode the previous image and the textual description in each turn of the sequence, and uses a GAN framework to generate a modified version of the image that is consistent with the preceding images and coherent with the description. To achieve better region-specific refinement, we also introduce a sequential attention mechanism into the model. To benchmark on the new task, we introduce two new datasets, Zap-Seq and DeepFashion-Seq, which contain multi-turn sessions with image-description sequences in the fashion domain. Experiments on both datasets show that the proposed SeqAttnGAN model outperforms state-of-the-art approaches on the interactive image editing task across all evaluation metrics including visual quality, image sequence coherence, and text-image consistency.

preprint, 2020, arXiv

Toward Interpretability of Dual-Encoder Models for Dialogue Response Suggestions

This work shows how to improve and interpret the commonly used dual encoder model for response suggestion in dialogue. We present an attentive dual encoder model that includes an attention mechanism on top of the extracted word-level features from two encoders, one for context and one for label respectively. To improve the interpretability in the dual encoder models, we design a novel regularization loss to minimize the mutual information between unimportant words and desired labels, in addition to the original attention method, so that important words are emphasized while unimportant words are de-emphasized. This can help not only with model interpretability, but can also further improve model accuracy. We propose an approximation method that uses a neural network to calculate the mutual information. Furthermore, by adding a residual layer between raw word embeddings and the final encoded context feature, word-level interpretability is preserved at the final prediction of the model. We compare the proposed model with existing methods for the dialogue response task on two public datasets (Persona and Ubuntu). The experiments demonstrate the effectiveness of the proposed model in terms of better Recall@1 accuracy and visualized interpretability.

preprint, 2020, arXiv

Towards Differentially Private Text Representations

Most deep learning frameworks require users to pool their local data or model updates to a trusted server to train or maintain a global model. The assumption of a trusted server who has access to user information is ill-suited in many applications. To tackle this problem, we develop a new deep learning framework under an untrusted server setting, which includes three modules: (1) embedding module, (2) randomization module, and (3) classifier module. For the randomization module, we propose a novel local differentially private (LDP) protocol to reduce the impact of privacy parameter $ε$ on accuracy, and provide enhanced flexibility in choosing randomization probabilities for LDP. Analysis and experiments show that our framework delivers comparable or even better performance than the non-private framework and existing LDP protocols, demonstrating the advantages of our LDP protocol.
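For readers unfamiliar with local differential privacy, the sketch below shows the classical binary randomized-response mechanism, not the novel protocol proposed in the paper. The function names and the example bit vector are assumptions for illustration. Each bit is reported truthfully with probability e^ε / (1 + e^ε) and flipped otherwise, which gives ε-LDP for that single bit; applying it independently to every bit of a vector spends the budget additively across bits.

```python
import math
import random
from typing import List


def keep_probability(epsilon: float) -> float:
    """Probability of reporting a bit truthfully under randomized response."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomize_bits(bits: List[int], epsilon: float, rng: random.Random) -> List[int]:
    """Flip each bit independently with probability 1 - keep_probability(epsilon)."""
    p = keep_probability(epsilon)
    return [b if rng.random() < p else 1 - b for b in bits]


# Hypothetical binary representation produced on the user's device.
rng = random.Random(0)
noisy = randomize_bits([1, 0, 1, 1, 0], epsilon=1.0, rng=rng)
print(noisy)
```

A smaller ε pushes the keep probability toward 0.5 (more noise, stronger privacy), which is the accuracy cost the abstract says its protocol aims to reduce.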

preprint, 2020, arXiv

Towards Fair and Privacy-Preserving Federated Deep Models

The current standalone deep learning framework tends to result in overfitting and low utility. This problem can be addressed by either a centralized framework that deploys a central server to train a global model on the joint data from all parties, or a distributed framework that leverages a parameter server to aggregate local model updates. Server-based solutions are prone to the problem of a single-point-of-failure. In this respect, collaborative learning frameworks, such as federated learning (FL), are more robust. Existing federated learning frameworks overlook an important aspect of participation: fairness. All parties are given the same final model without regard to their contributions. To address these issues, we propose a decentralized Fair and Privacy-Preserving Deep Learning (FPPDL) framework to incorporate fairness into federated deep learning models. In particular, we design a local credibility mutual evaluation mechanism to guarantee fairness, and a three-layer onion-style encryption scheme to guarantee both accuracy and privacy. Different from existing FL paradigm, under FPPDL, each participant receives a different version of the FL model with performance commensurate with his contributions. Experiments on benchmark datasets demonstrate that FPPDL balances fairness, privacy and accuracy. It enables federated learning ecosystems to detect and isolate low-contribution parties, thereby promoting responsible participation.

preprint2020arXiv

Understanding the Dynamics of Information Flow During Disaster Response Using Absorbing Markov Chains

This paper aims to derive a quantitative model to evaluate the impact of information flow on the effectiveness of disaster response. At the core of the model is a specialized absorbing Markov chain that models the process of delivering federal assistance to the community while considering stakeholder interactions and information flow uncertainty. Using the proposed model, the probability of community satisfaction is computed to reflect the effectiveness of disaster response. A hypothetical example is provided to demonstrate the applicability and interpretability of the derived quantitative model. Practically, the research provides governmental stakeholders with interpretable insights for evaluating the impact of information flow on their disaster response effectiveness, so that critical stakeholders can be targeted with proactive actions to enhance disaster response.
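As a rough illustration of how an absorption probability such as "community satisfaction" can be computed, the sketch below solves the standard absorbing-chain relation (I - Q) B = R, where Q holds transient-to-transient and R holds transient-to-absorbing transition probabilities. The states and numbers are invented for illustration and are not taken from the paper.

```python
from fractions import Fraction


def absorption_probabilities(Q, R):
    """Solve (I - Q) B = R for B by Gauss-Jordan elimination.

    Q[i][j]: probability of moving from transient state i to transient state j.
    R[i][k]: probability of moving from transient state i to absorbing state k.
    B[i][k]: probability of eventually being absorbed in k when starting from i.
    """
    n, m = len(Q), len(R[0])
    # Build the augmented matrix [I - Q | R] with exact rational arithmetic.
    A = [
        [Fraction(1 if i == j else 0) - Fraction(Q[i][j]) for j in range(n)]
        + [Fraction(R[i][k]) for k in range(m)]
        for i in range(n)
    ]
    for col in range(n):
        # Pick a row with a non-zero pivot and move it into place.
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        pv = A[col][col]
        A[col] = [x / pv for x in A[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
    return [row[n:] for row in A]


# Hypothetical chain: transient states (0 = request submitted, 1 = under review),
# absorbing states (0 = community satisfied, 1 = community unsatisfied).
F = Fraction
Q = [[F(0), F(1, 2)], [F(1, 4), F(0)]]
R = [[F(1, 4), F(1, 4)], [F(1, 2), F(1, 4)]]
B = absorption_probabilities(Q, R)
```

In this toy chain, `B[0][0]` is the probability that a submitted request eventually ends in the "satisfied" state; each row of `B` sums to 1 because absorption is certain.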

preprint2019arXiv

Enhanced Input Modeling for Construction Simulation using Bayesian Deep Neural Networks

This paper aims to propose a novel deep learning-integrated framework for deriving reliable simulation input models by incorporating multi-source information. The framework sources and extracts multi-source data generated from construction operations, which provides rich information for input modeling. The framework implements Bayesian deep neural networks to incorporate this richer information in input modeling. A case study on a road paving operation is performed to test the feasibility and applicability of the proposed framework. Overall, this research enhances input modeling by deriving detailed input models, thereby augmenting the decision-making processes in construction operations. This research also sheds light on promoting data-driven simulation by incorporating machine learning techniques.

preprint2019arXiv

Enhanced Welding Operator Quality Performance Measurement: Work Experience-Integrated Bayesian Prior Determination

Measurement of operator quality performance has been challenging in the construction fabrication industry. Among various causes, the learning effect is a significant factor, which needs to be incorporated in achieving a reliable operator quality performance analysis. This research aims to enhance a previously developed operator quality performance measurement approach by incorporating the learning effect (i.e., work experience). To achieve this goal, the Plateau learning model is selected to quantitatively represent the relationship between quality performance and work experience through a beta-binomial regression approach. Based on this relationship, an informative prior determination approach, which incorporates operator work experience information, is developed to enhance the previous Bayesian-based operator quality performance measurement. Academically, this research provides a systematic approach to derive Bayesian informative priors through integrating multi-source information. Practically, the proposed approach reliably measures operator quality performance in fabrication quality control processes.

preprint2019arXiv

Interpreting Frame Transformations as Diagonalization of Harmonic Transfer Functions

Analysis of ac electrical systems can be performed via frame transformations in the time-domain or via harmonic transfer functions (HTFs) in the frequency-domain. The two approaches each have unique advantages but are hard to reconcile because the coupling effect in the frequency-domain leads to infinite dimensional HTF matrices that need to be truncated. This paper explores the relation between the two representations and shows that applying a similarity transformation to an HTF matrix creates a direct equivalence to a frame transformation on the input-output signals. Under certain conditions, such similarity transformations have a diagonalizing effect which, essentially, reduces the HTF matrix order from infinity to two or one, making the matrix tractable mathematically without truncation or approximation. This theory is applied to a droop-controlled voltage source inverter as an illustrative example. A stability criterion is derived in the frequency-domain which agrees with the conventional state-space model but offers greater insights into the mechanism of instability in terms of the negative damping (non-passivity) under droop control. The paper not only establishes a unified view in theory but also offers an effective practical tool for stability assessment.

preprint2016arXiv

Learning Robust Representations of Text

Deep neural networks have achieved remarkable results across many language processing tasks; however, these methods are highly sensitive to noise and adversarial attacks. We present a regularization-based method for limiting network sensitivity to its inputs, inspired by ideas from computer vision, thus learning models that are more robust. Empirical evaluation over a range of sentiment datasets with a convolutional neural network shows that, compared to a baseline model and the dropout method, our method achieves superior performance over noisy inputs and out-of-domain data.

preprint2015arXiv

A Survey of Heterogeneous Information Network Analysis

Most real systems consist of a large number of interacting, multi-typed components, while most contemporary research models them as homogeneous networks, without distinguishing different types of objects and links in the networks. Recently, more and more researchers have begun to consider these interconnected, multi-typed data as heterogeneous information networks, and to develop structural analysis approaches by leveraging the rich semantic meaning of structural types of objects and links in the networks. Compared to the widely studied homogeneous network, the heterogeneous information network contains richer structure and semantic information, which provides plenty of opportunities as well as a lot of challenges for data mining. In this paper, we provide a survey of heterogeneous information network analysis. We introduce basic concepts of heterogeneous information network analysis, examine its developments on different data mining tasks, discuss some advanced topics, and point out some future research directions.

preprint2015arXiv

EventNet: A Large Scale Structured Concept Library for Complex Event Detection in Video

Event-specific concepts are the semantic concepts designed for the events of interest, which can be used as a mid-level representation of complex events in videos. Existing methods only focus on defining event-specific concepts for a small number of predefined events, but cannot handle novel unseen events. This motivates us to build a large scale event-specific concept library that covers as many real-world events and their concepts as possible. Specifically, we choose WikiHow, an online forum containing a large number of how-to articles on human daily life events. We perform a coarse-to-fine event discovery process and discover 500 events from WikiHow articles. Then we use each event name as a query to search YouTube and discover event-specific concepts from the tags of returned videos. After an automatic filtering process, we end up with 95,321 videos and 4,490 concepts. We train a Convolutional Neural Network (CNN) model on the 95,321 videos over the 500 events, and use the model to extract deep learning features from video content. With the learned deep learning features, we train 4,490 binary SVM classifiers as the event-specific concept library. The concepts and events are further organized in a hierarchical structure defined by WikiHow, and the resultant concept library is called EventNet. Finally, the EventNet concept library is used to generate concept-based representations of event videos. To the best of our knowledge, EventNet represents the first video event ontology that organizes events and their concepts into a semantic structure. It offers great potential for event retrieval and browsing. Extensive experiments over the zero-shot event retrieval task, when no training samples are available, show that the EventNet concept library consistently and significantly outperforms the state-of-the-art (such as the 20K ImageNet concepts trained with CNN) by a large margin of up to 207%.

preprint2015arXiv

Maximum Sum Rate of Slotted Aloha with Capture

The sum rate performance of random-access networks crucially depends on the access protocol and receiver structure. Despite extensive studies, how to characterize the maximum sum rate of the simplest version of random access, Aloha, remains an open question. In this paper, a comprehensive study of the sum rate performance of slotted Aloha networks is presented. By extending the unified analytical framework proposed in [20], [21] from the classical collision model to the capture model, the network steady-state point in saturated conditions is derived as a function of the signal-to-interference-plus-noise ratio (SINR) threshold which determines a fundamental tradeoff between the information encoding rate and the network throughput. To maximize the sum rate, both the SINR threshold and backoff parameters of nodes should be properly selected. Explicit expressions of the maximum sum rate and the optimal setting are obtained, which show that similar to the sum capacity of the multiple access channel, the maximum sum rate of slotted Aloha also logarithmically increases with the mean received signal-to-noise ratio (SNR), but the high-SNR slope is only $e^{-1}$. Effects of backoff and power control on the sum rate performance of slotted Aloha networks are further discussed, which shed important light on the practical network design.
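The constant $e^{-1}$ is also familiar from the classical collision model of slotted Aloha, where a slot succeeds only if exactly one node transmits. The sketch below covers that textbook case only, under the assumption of n saturated nodes that each transmit independently with probability p; it does not implement the capture model or the backoff analysis described in the abstract, and the function names are hypothetical.

```python
import math


def collision_model_throughput(n: int, p: float) -> float:
    """Probability that exactly one of n saturated nodes transmits in a slot."""
    return n * p * (1.0 - p) ** (n - 1)


def best_throughput(n: int) -> float:
    """Throughput at the maximizing transmission probability p = 1/n."""
    return collision_model_throughput(n, 1.0 / n)


# The maximum decreases toward exp(-1) as the number of nodes grows.
limits = [best_throughput(n) for n in (2, 10, 100, 10**6)]
```

With two nodes the best achievable throughput is 0.5 packets per slot, and the values in `limits` approach `math.exp(-1)` from above as n increases.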

preprint2014arXiv

HRank: A Path based Ranking Framework in Heterogeneous Information Network

Recently, there has been a surge of interest in heterogeneous information network analysis. As a newly emerging network model, heterogeneous information networks have many unique features (e.g., complex structure and rich semantics), and a number of interesting data mining tasks have been explored in this kind of network, such as similarity measure, clustering, and classification. Although evaluating the importance of objects has been well studied in homogeneous networks, it has not yet been explored in heterogeneous networks. In this paper, we study the ranking problem in heterogeneous networks and propose the HRank framework to evaluate the importance of multiple types of objects and meta paths. Since the importance of objects depends upon the meta paths in heterogeneous networks, HRank develops a path-based random walk process. Moreover, a constrained meta path is proposed to subtly capture the rich semantics in heterogeneous networks. Furthermore, HRank can simultaneously determine the importance of objects and meta paths by applying tensor analysis. Extensive experiments on three real datasets show that HRank can effectively evaluate the importance of objects and paths together. Moreover, the constrained meta path shows its potential for mining subtle semantics by obtaining more accurate ranking results.

34 published item(s)

MoIIE: Mixture of Intra- and Inter-Modality Experts for Large Vision Language Models

CheXplaining in Style: Counterfactual Explanations for Chest X-rays using StyleGAN

CINS: Comprehensive Instruction for Few-shot Learning in Task-oriented Dialog Systems

Compilable Neural Code Generation with Compiler Feedback

fairlib: A Unified Framework for Assessing and Improving Classification Fairness

Impedance-based Root-cause Analysis: Comparative Study of Impedance Models and Calculation of Eigenvalue Sensitivity

Improving Disentangled Text Representation Learning with Information-Theoretic Guidance

Is my Depth Ground-Truth Good Enough? HAMMER -- Highly Accurate Multi-Modal Dataset for DEnse 3D Scene Regression

Observation of Acoustic Non-Hermitian Bloch Braids and Associated Topological Phase Transitions

Pan More Gold from the Sand: Refining Open-domain Dialogue Training with Noisy Self-Retrieval Generation

PanGu-Bot: Efficient Generative Dialogue Pre-training from Pre-trained Language Model

PhoCaL: A Multi-Modal Dataset for Category-Level Object Pose Estimation with Photometrically Challenging Objects

Polarimetric Pose Prediction

Revisiting Grid-Forming and Grid-Following Inverters: A Duality Theory

RotateQVS: Representing Temporal Information as Rotations in Quaternion Vector Space for Temporal Knowledge Graph Completion

Acoustic Möbius insulators from projective symmetry

Automated Generation of Interorganizational Disaster Response Networks through Information Extraction

Polarized hyperspectral imaging with single fiber bundle via incoherent light transmission matrix approach

Automated Abstraction of Operation Processes from Unstructured Text for Simulation Modeling

How to Democratise and Protect AI: Fair and Differentially Private Decentralised Deep Learning

Impedance-Based Whole-System Modeling for a Composite Grid via Frame-Dynamics Embedding

Sequential Attention GAN for Interactive Image Editing

Toward Interpretability of Dual-Encoder Models for Dialogue Response Suggestions

Towards Differentially Private Text Representations

Towards Fair and Privacy-Preserving Federated Deep Models

Understanding the Dynamics of Information Flow During Disaster Response Using Absorbing Markov Chains

Enhanced Input Modeling for Construction Simulation using Bayesian Deep Neural Networks

Enhanced Welding Operator Quality Performance Measurement: Work Experience-Integrated Bayesian Prior Determination

Interpreting Frame Transformations as Diagonalization of Harmonic Transfer Functions

Learning Robust Representations of Text

A Survey of Heterogeneous Information Network Analysis

EventNet: A Large Scale Structured Concept Library for Complex Event Detection in Video

Maximum Sum Rate of Slotted Aloha with Capture

HRank: A Path based Ranking Framework in Heterogeneous Information Network