Source author record

Yiming Zhang

Yiming Zhang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Catalog footprint

What is connected

30 works

24 topics

4 close collaborators


preprint, 2026, arXiv

DINORANKCLIP: DINOv3 Distillation and Injection for Vision-Language Pretraining with High-Order Ranking Consistency

Contrastive language-image pretraining (CLIP) suffers from two structural weaknesses: the symmetric InfoNCE loss discards the relative ordering among unmatched in-batch pairs, and global pooling collapses the visual representation into a semantic bottleneck that is poorly sensitive to fine-grained local structure. RANKCLIP partially addresses the first issue with a list-wise Plackett-Luce ranking-consistency loss, but its model is strictly first-order and inherits the second weakness untouched. We propose DINORANKCLIP, a pretraining framework that addresses both jointly. Our principal contribution is injecting a frozen DINOv3 teacher into the contrastive trunk through a dual-branch lightweight student and a multi-scale fusion module with channel-spatial attention, a self-attention refiner, and a conflict-aware gate that preserves the cross-modal alignment up to first order. Complementarily, we introduce a high-order Plackett-Luce ranking model in which the per-position utility is augmented with attention-parameterised pairwise and tuple-wise transition terms; the family contains CLIP and RANKCLIP as nested zero-order and first-order special cases, and the optimal order on every benchmark is $R^*=3$. The full empirical study -- order sweep, Fine-grained Probe on five datasets, four-node Modality-Gap analysis, six-variant Fusion ablation -- fits in 72 hours on a single eight-GPU H100 node and trains entirely on Conceptual Captions 3M. DINORANKCLIP consistently outperforms CLIP, CyCLIP, ALIP, and RANKCLIP under matched compute, with the largest relative gains on the fine-grained and out-of-distribution evaluations that most directly stress local structural reasoning.
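As a point of reference for the ranking-consistency idea, the following is a minimal sketch of the standard first-order Plackett-Luce log-likelihood over a list of scores. It is not the paper's high-order model or its loss; the function name and the use of raw similarity scores as log-utilities are assumptions made for illustration.

```python
import math

def plackett_luce_log_likelihood(scores, ranking):
    """Log-probability of `ranking` (item indices, best first) under a
    first-order Plackett-Luce model whose utilities are exp(score).

    Hypothetical helper for illustration only.
    """
    remaining = list(ranking)
    log_prob = 0.0
    for item in ranking:
        # Numerically stable log-sum-exp over the items not yet placed.
        top = max(scores[j] for j in remaining)
        log_norm = top + math.log(sum(math.exp(scores[j] - top) for j in remaining))
        log_prob += scores[item] - log_norm
        remaining.remove(item)
    return log_prob
```

With equal scores every ordering is equally likely, and an ordering that agrees with the scores is more likely than its reverse.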

preprint, 2026, arXiv

Learning to Decode in Parallel: Self-Coordinating Neural Network for Real-Time Quantum Error Correction

Fast, reliable decoders are pivotal components for enabling fault-tolerant quantum computation (FTQC). Neural network decoders like AlphaQubit have demonstrated potential, achieving higher accuracy than traditional human-designed decoding algorithms. However, existing implementations of neural network decoders lack the parallelism required to decode the syndrome stream generated by a superconducting logical qubit in real time. Moreover, integrating AlphaQubit with sliding window-based parallel decoding schemes presents non-trivial challenges: AlphaQubit is trained solely to output a single bit corresponding to the global logical correction for an entire memory experiment, rather than local physical corrections that can be easily integrated. We address this issue by training a recurrent, transformer-based neural network specifically tailored for parallel window decoding. While it still outputs a single bit, we derive training labels from a consistent set of local corrections and train on various types of decoding windows simultaneously. This approach enables the network to self-coordinate across neighboring windows, facilitating high-accuracy parallel decoding of arbitrarily long memory experiments. As a result, we overcome the throughput bottleneck that previously precluded the use of AlphaQubit-type decoders in FTQC. Our work presents the first scalable, neural-network-based parallel decoding framework that simultaneously achieves SOTA accuracy and the stringent throughput required for real-time quantum error correction. Using an end-to-end experimental workflow, we benchmark our decoder on the Zuchongzhi 3.2 superconducting quantum processor on surface codes with distances up to 7, demonstrating its superior accuracy. Moreover, we demonstrate that, using our approach, a single TPU v6e is capable of decoding surface codes with distances up to 25 within 1us per decoding round.

preprint, 2026, arXiv

LinguaGame: A Linguistically Grounded Game-Theoretic Paradigm for Multi-Agent Dialogue Generation

Large Language Models (LLMs) have enabled Multi-Agent Systems (MASs) where agents interact through natural language to solve complex tasks or simulate multi-party dialogues. Recent work on LLM-based MASs has mainly focused on architecture design, such as role assignment and workflow orchestration. In contrast, this paper targets the interaction process itself, aiming to improve agents' communication efficiency by helping them convey their intended meaning more effectively through language. To this end, we propose LinguaGame, a linguistically-grounded game-theoretic paradigm for multi-agent dialogue generation. Our approach models dialogue as a signalling game over communicative intents and strategies, solved with a training-free equilibrium approximation algorithm for inference-time decision adjustment. Unlike prior game-theoretic MASs, whose game designs are often tightly coupled with task-specific objectives, our framework relies on linguistically informed reasoning with minimal task-specific coupling. Specifically, it treats dialogue as intentional and strategic communication, requiring agents to infer what others aim to achieve (intents) and how they pursue those goals (strategies). We evaluate our framework in simulated courtroom proceedings and debates, with human expert assessments showing significant gains in communication efficiency.

preprint, 2026, arXiv

ScribbleEdit: Synthetic Data for Image Editing with Scribbles and Text

Recent progress in generative models has significantly advanced image editing capabilities, yet precise and intuitive user control remains difficult. Specifically, users often struggle to communicate both exact spatial layouts and specific semantic details simultaneously. While natural language instructions effectively convey high-level semantics like texture and color, they lack spatial specificity. Conversely, freehand scribbles provide rough spatial boundaries but cannot express detailed visual attributes. Consequently, achieving precise control requires combining both modalities. However, existing models struggle to jointly interpret abstract scribbles alongside text due to a lack of specialized training data. In this work, we introduce ScribbleEdit, a large-scale synthetic dataset designed to bridge this gap by combining natural language instructions with freehand scribble inputs for more accurate, controllable edits. We construct this dataset through a synthetic pipeline that automatically generates source-target image pairs via inpainting, which are then paired with human-drawn scribbles and VLM-generated text instructions. Using ScribbleEdit, we evaluate and finetune both diffusion-based and autoregressive unified multimodal image editing models. Our experiments reveal that while off-the-shelf models struggle with abstract scribble inputs, finetuning on our synthetic dataset significantly improves their ability to generate spatially aligned and semantically consistent edits.

preprint, 2026, arXiv

SRAW-Attack: Space-Reweighted Adversarial Warping Attack for SAR Target Recognition

Synthetic aperture radar (SAR) imagery exhibits intrinsic information sparsity due to its unique electromagnetic scattering mechanism. Despite the widespread adoption of deep neural network (DNN)-based SAR automatic target recognition (SAR-ATR) systems, they remain vulnerable to adversarial examples and tend to over-rely on background regions, leading to degraded adversarial robustness. Existing adversarial attacks for SAR-ATR often require visually perceptible distortions to achieve effective performance, thereby necessitating an attack method that balances effectiveness and stealthiness. In this paper, a novel attack method termed Space-Reweighted Adversarial Warping (SRAW) is proposed, which generates adversarial examples through optimized spatial deformation with reweighted budgets across foreground and background regions. Extensive experiments demonstrate that SRAW significantly degrades the performance of state-of-the-art SAR-ATR models and consistently outperforms existing methods in terms of imperceptibility and adversarial transferability. Code is made available at https://github.com/boremycin/SAR-ATR-TransAttack.

preprint, 2026, arXiv

Table as a Modality for Large Language Models

To extend the remarkable successes of Large Language Models (LLMs), the community has made numerous efforts to generalize them to table reasoning tasks for widely deployed tabular data. Nevertheless, through a probing experiment on our proposed StructQA benchmark, we postulate that even the most advanced LLMs (such as GPTs) may still fall short of coping with tabular data. More specifically, the current scheme often simply serializes the tabular data together with the meta information and feeds the result to the LLM. We argue that the loss of structural information is the root of this shortcoming. In this work, we further propose TAMO, which treats tables as an independent modality integrated with the text tokens. The resulting model in TAMO is a multimodal framework consisting of a hypergraph neural network as the global table encoder, seamlessly integrated with a mainstream LLM. Empirical results on various benchmarking datasets, including HiTab, WikiTQ, WikiSQL, FeTaQA, and StructQA, demonstrate significant improvements in generalization, with an average relative gain of 42.65%.

preprint, 2025, arXiv

Improving Few-Shot Change Detection Visual Question Answering via Decision-Ambiguity-guided Reinforcement Fine-Tuning

Change detection visual question answering (CDVQA) requires answering text queries by reasoning about semantic changes in bi-temporal remote sensing images. A straightforward approach is to boost CDVQA performance with generic vision-language models via supervised fine-tuning (SFT). Despite recent progress, we observe that a significant portion of failures do not stem from clearly incorrect predictions, but from decision ambiguity, where the model assigns similar confidence to the correct answer and strong distractors. To formalize this challenge, we define Decision-Ambiguous Samples (DAS) as instances with a small probability margin between the ground-truth answer and the most competitive alternative. We argue that explicitly optimizing DAS is crucial for improving the discriminability and robustness of CDVQA models. To this end, we propose DARFT, a Decision-Ambiguity-guided Reinforcement Fine-Tuning framework that first mines DAS using an SFT-trained reference policy and then applies group-relative policy optimization on the mined subset. By leveraging multi-sample decoding and intra-group relative advantages, DARFT suppresses strong distractors and sharpens decision boundaries without additional supervision. Extensive experiments demonstrate consistent gains over SFT baselines, particularly under few-shot settings.
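The mining step for Decision-Ambiguous Samples can be sketched as a simple margin filter. This is only an illustration of the definition given in the abstract: the function name, the input layout, the use of an absolute margin, and the threshold value are all assumptions, not details taken from the paper.

```python
def mine_decision_ambiguous(samples, margin_threshold=0.1):
    """Return indices of samples whose ground-truth probability lies within
    `margin_threshold` of the strongest competing answer.

    samples: list of (probs, truth) pairs, where `probs` maps each candidate
             answer to the reference policy's probability and `truth` is the
             ground-truth answer. Layout and threshold are hypothetical.
    """
    mined = []
    for idx, (probs, truth) in enumerate(samples):
        # Probability of the most competitive alternative to the ground truth.
        best_other = max(p for answer, p in probs.items() if answer != truth)
        margin = abs(probs[truth] - best_other)
        if margin < margin_threshold:
            mined.append(idx)
    return mined


example = [
    ({"yes": 0.48, "no": 0.45, "unsure": 0.07}, "yes"),  # ambiguous
    ({"yes": 0.90, "no": 0.10}, "yes"),                  # confident
]
```

In this sketch only the first example would be passed on to the reinforcement fine-tuning stage.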

preprint, 2022, arXiv

Caption Feature Space Regularization for Audio Captioning

Audio captioning aims at describing the content of audio clips with human language. Due to the ambiguity of audio, different people may perceive the same audio differently, resulting in caption disparities (i.e., one audio may correlate to several captions with diverse semantics). For that, general audio captioning models achieve the one-to-many training by randomly selecting a correlated caption as the ground truth for each audio. However, it leads to a significant variation in the optimization directions and weakens the model stability. To eliminate this negative effect, in this paper, we propose a two-stage framework for audio captioning: (i) in the first stage, via the contrastive learning, we construct a proxy feature space to reduce the distances between captions correlated to the same audio, and (ii) in the second stage, the proxy feature space is utilized as additional supervision to encourage the model to be optimized in the direction that benefits all the correlated captions. We conducted extensive experiments on two datasets using four commonly used encoder and decoder architectures. Experimental results demonstrate the effectiveness of the proposed method. The code is available at https://github.com/PRIS-CV/Caption-Feature-Space-Regularization.

preprint, 2022, arXiv

EmbRace: Accelerating Sparse Communication for Distributed Training of NLP Neural Networks

Distributed data-parallel training has been widely adopted for deep neural network (DNN) models. Although current deep learning (DL) frameworks scale well for dense models like image classification models, we find that these DL frameworks have relatively low scalability for sparse models like natural language processing (NLP) models that have highly sparse embedding tables. Most existing works overlook the sparsity of model parameters thus suffering from significant but unnecessary communication overhead. In this paper, we propose EmbRace, an efficient communication framework to accelerate communications of distributed training for sparse models. EmbRace introduces Sparsity-aware Hybrid Communication, which integrates AlltoAll and model parallelism into data-parallel training, so as to reduce the communication overhead of highly sparse parameters. To effectively overlap sparse communication with both backward and forward computation, EmbRace further designs a 2D Communication Scheduling approach which optimizes the model computation procedure, relaxes the dependency of embeddings, and schedules the sparse communications of each embedding row with a priority queue. We have implemented a prototype of EmbRace based on PyTorch and Horovod, and conducted comprehensive evaluations with four representative NLP models. Experimental results show that EmbRace achieves up to 2.41X speedup compared to the state-of-the-art distributed training baselines.

preprint, 2022, arXiv

Heterogeneous Global Graph Neural Networks for Personalized Session-based Recommendation

Predicting the next interaction of a short-term interaction session is a challenging task in session-based recommendation. Almost all existing works rely on item transition patterns and neglect the impact of user historical sessions while modeling user preference, which often leads to non-personalized recommendation. Additionally, existing personalized session-based recommenders capture user preference based only on the sessions of the current user, but ignore the useful item-transition patterns from other users' historical sessions. To address these issues, we propose a novel Heterogeneous Global Graph Neural Network (HG-GNN) to exploit the item transitions over all sessions in a subtle manner for better inferring user preference from the current and historical sessions. To effectively exploit the item transitions over all sessions from users, we propose a novel heterogeneous global graph that contains item transitions of sessions, user-item interactions and global co-occurrence items. Moreover, to capture user preference from sessions comprehensively, we propose to learn two levels of user representations from the global graph via two graph augmented preference encoders. Specifically, we design a novel heterogeneous graph neural network (HGNN) on the heterogeneous global graph to learn the long-term user preference and item representations with rich semantics. Based on the HGNN, we propose the Current Preference Encoder and the Historical Preference Encoder to capture the different levels of user preference from the current and historical sessions, respectively. To achieve personalized recommendation, we integrate the representations of the user's current preference and historical interests to generate the final user preference representation. Extensive experimental results on three real-world datasets show that our model outperforms other state-of-the-art methods.

preprint, 2022, arXiv

High thermoelectric performances in PbP monolayers considering full electron-phonon coupling and four-phonon scattering processes

The band convergence strategy, which improves the Seebeck coefficient by inducing multiple valleys in the band structure, has been widely used to enhance thermoelectric (TE) performance. However, the phonon-assisted intervalley scattering effect is neglected and the mode-selection rules remain unclear. In this work, the TE properties of $α$-, $β$- and $γ$-PbP are investigated under the consideration of full mode-, energy- and momentum-resolved electron-phonon interactions (EPI). Group theory is used to analyze the selection rules for EPI matrix elements. Our calculations reveal that the intervalley scattering contributes non-trivially to the total carrier relaxation time, and that the intervalley scattering can be modulated through crystal symmetry. In addition, the investigation of the thermal properties reveals that the four-phonon scattering effect dominates the phonon relaxation processes, since the three-phonon scattering is suppressed due to the significantly large acoustic-optical phonon bandgap in $α$-, $β$- and $γ$-PbP. By considering the full EPI effect and high-order phonon scattering processes, the calculated ZT values reach 0.90, 0.24 and 1.25 for $α$-, $β$- and $γ$-PbP, respectively, indicating their promising applications in thermoelectric devices.
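For readers unfamiliar with the quantities quoted above, the figure of merit has the standard textbook definition below. The symbols (Seebeck coefficient $S$, electrical conductivity $σ$, temperature $T$, electronic and lattice thermal conductivities $κ_e$ and $κ_l$, phonon lifetimes $τ$) are not defined in the abstract and are assumed here. Combining three- and four-phonon scattering rates through Matthiessen's rule is a common convention and not necessarily the exact procedure used in the paper.

```latex
ZT = \frac{S^{2}\sigma T}{\kappa_{e} + \kappa_{l}},
\qquad
\frac{1}{\tau_{\mathrm{ph}}} = \frac{1}{\tau_{3\mathrm{ph}}} + \frac{1}{\tau_{4\mathrm{ph}}}
```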

preprint, 2022, arXiv

Intention Adaptive Graph Neural Network for Category-aware Session-based Recommendation

Session-based recommendation (SBR) is proposed to recommend items within short sessions given that user profiles are invisible in various scenarios nowadays, such as e-commerce and short video recommendation. A common scenario is that a user specifies a target category of items as a global filter; however, previous SBR settings mainly consider the item sequence and overlook the rich target category information. Therefore, we define a new task called Category-aware Session-Based Recommendation (CSBR), focusing on the above scenario, in which the user-specified category can be efficiently utilized by the recommendation system. To address the challenges of the proposed task, we develop a novel method called Intention Adaptive Graph Neural Network (IAGNN), which takes advantage of the relationship between items and their categories to achieve an accurate recommendation result. Specifically, we construct a category-aware graph with both item and category nodes to represent the complex transition information in the session. An intention-adaptive graph neural network on the category-aware graph is utilized to capture user intention by transferring the historical interaction information to the user-specified category domain. Extensive experiments on three real-world datasets show that IAGNN outperforms the state-of-the-art baselines in the new task.

preprint, 2022, arXiv

Measuring the knot of non-Hermitian degeneracies and non-commuting braids

Any system of coupled oscillators may be characterized by its spectrum of resonance frequencies (or eigenfrequencies), which can be tuned by varying the system's parameters. The relationship between control parameters and the eigenfrequency spectrum is central to a range of applications. However, fundamental aspects of this relationship remain poorly understood. For example, if the controls are varied along a path that returns to its starting point (i.e., around a "loop"), the system's spectrum must return to itself. In systems that are Hermitian (i.e., lossless and reciprocal) this process is trivial, and each resonance frequency returns to its original value. However, in non-Hermitian systems, where the eigenfrequencies are complex, the spectrum may return to itself in a topologically non-trivial manner, a phenomenon known as spectral flow. The spectral flow is determined by how the control loop encircles degeneracies, and this relationship is well understood for $N=2$ (where $N$ is the number of oscillators in the system). Here we extend this description to arbitrary $N$. We show that control loops generically produce braids of eigenfrequencies, and for $N>2$ these braids form a non-Abelian group which reflects the non-trivial geometry of the space of degeneracies. We demonstrate these features experimentally for $N=3$ using a cavity optomechanical system.

preprint, 2022, arXiv

Multiple Choice Questions based Multi-Interest Policy Learning for Conversational Recommendation

A conversational recommendation system (CRS) can obtain fine-grained and dynamic user preferences through interactive dialogue. Previous CRS work assumes that the user has a clear target item. However, many users who resort to a CRS might not have a clear idea of what they really like. Specifically, the user may have a clear single preference for some attribute types (e.g. color) of items, while for other attribute types the user may have multiple preferences or even no clear preference, which leads to multiple acceptable attribute instances (e.g. black and red) of one attribute type. Therefore, users could show their preferences over items under multiple combinations of attribute instances rather than a single item with a unique combination of all attribute instances. As a result, we first propose a more realistic CRS learning setting, namely Multi-Interest Multi-round Conversational Recommendation, where users may have multiple interests in attribute instance combinations and accept multiple items with partially overlapping combinations of attribute instances. To cope effectively with the new CRS learning setting, we propose a novel learning framework, namely Multi-Choice questions based Multi-Interest Policy Learning. To obtain user preferences more efficiently, the agent generates multi-choice questions rather than binary yes/no questions about specific attribute instances. In addition, we propose a union set strategy to select candidate items instead of the existing intersection set strategy, in order to avoid over-filtering items during the conversation. Finally, we design a Multi-Interest Policy Learning module, which utilizes the captured multiple interests of the user to decide the next action, either asking about attribute instances or recommending items. Extensive experimental results on four datasets verify the superiority of our method for the proposed setting.
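The difference between the intersection set and union set strategies for candidate selection can be illustrated with a small sketch. The catalog, the item names, and the function below are hypothetical and only mirror the black-and-red example from the abstract; the paper's actual selection procedure may differ.

```python
def select_candidates(catalog, accepted, strategy="union"):
    """Return the sorted names of items that survive filtering.

    catalog:  dict mapping item name -> set of attribute instances
    accepted: set of attribute instances the user has accepted so far
    """
    if strategy == "intersection":
        # Keep only items that carry every accepted instance.
        keep = lambda attrs: accepted <= attrs
    elif strategy == "union":
        # Keep items that carry at least one accepted instance.
        keep = lambda attrs: bool(accepted & attrs)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return sorted(name for name, attrs in catalog.items() if keep(attrs))


catalog = {
    "shirt_a": {"black", "cotton"},
    "shirt_b": {"red", "cotton"},
    "shirt_c": {"blue", "wool"},
}
accepted = {"black", "red"}  # the user accepts two instances of one attribute type
```

Here the intersection strategy leaves no candidates, because no single item is both black and red, while the union strategy keeps both acceptable shirts.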

preprint, 2022, arXiv

Nonholonomic dynamics and control of road vehicles: moving toward automation

Nonholonomic models of automobiles are developed by utilizing tools of analytical mechanics, in particular the Appellian approach, which allows one to describe the vehicle dynamics with a minimum number of time-dependent state variables. The models are categorized based on how they represent the wheel-ground contact, whether they incorporate the longitudinal dynamics, and whether they consider the steering dynamics. It is demonstrated that the developed models can be used to design low-complexity controllers that enable automated vehicles to execute a large variety of maneuvers with high precision.

preprint, 2022, arXiv

Towards Unifying the Label Space for Aspect- and Sentence-based Sentiment Analysis

The aspect-based sentiment analysis (ABSA) is a fine-grained task that aims to determine the sentiment polarity towards targeted aspect terms occurring in the sentence. The development of the ABSA task is very much hindered by the lack of annotated data. To tackle this, the prior works have studied the possibility of utilizing the sentiment analysis (SA) datasets to assist in training the ABSA model, primarily via pretraining or multi-task learning. In this article, we follow this line, and for the first time, we manage to apply the Pseudo-Label (PL) method to merge the two homogeneous tasks. While it seems straightforward to use generated pseudo labels to handle this case of label granularity unification for two highly related tasks, we identify its major challenge in this paper and propose a novel framework, dubbed as Dual-granularity Pseudo Labeling (DPL). Further, similar to PL, we regard the DPL as a general framework capable of combining other prior methods in the literature. Through extensive experiments, DPL has achieved state-of-the-art performance on standard benchmarks surpassing the prior work significantly.

preprint, 2022, arXiv

Virtual Displacement based Discontinuity Layout Optimization

Discontinuity layout optimization (DLO) is a relatively new upper bound limit analysis method. Compared to classic topology optimization methods, which aim to obtain the optimum design of a structure by considering its self-weight, building cost or bearing capacity, DLO optimizes the failure pattern of the structure under specific loading conditions and constraints by minimizing the dissipation energy. In this work, we present a modified DLO algorithm that retains all of the advantages of DLO. It is referred to as virtual displacement-based discontinuity layout optimization (VDLO). VDLO takes the stress state of a loaded structure as a snapshot and correspondingly provides the optimum failure pattern, which greatly extends the application potential of DLO. Numerical examples indicate the effectiveness and flexibility of VDLO. It is regarded as a highly promising supplemental tool for other numerical methods in element-/node-based frameworks.

preprint, 2021, arXiv

Conversations Gone Alright: Quantifying and Predicting Prosocial Outcomes in Online Conversations

Online conversations can go in many directions: some turn out poorly due to antisocial behavior, while others turn out positively to the benefit of all. Research on improving online spaces has focused primarily on detecting and reducing antisocial behavior. Yet we know little about positive outcomes in online conversations and how to increase them: is a prosocial outcome simply the lack of antisocial behavior or something more? Here, we examine how conversational features lead to prosocial outcomes within online discussions. We introduce a series of new theory-inspired metrics to define prosocial outcomes such as mentoring and esteem enhancement. Using a corpus of 26M Reddit conversations, we show that these outcomes can be forecast from the initial comment of an online conversation, with the best model providing a relative 24% improvement over human forecasting performance at ranking conversations for predicted outcome. Our results indicate that platforms can use these early cues in their algorithmic ranking of early conversations to prioritize better outcomes.

preprint, 2021, arXiv

MAX Phase Zr2SeC and Its Thermal Conduction Behavior

Elemental diversity is crucial for screening out ternary MAX phases with outstanding properties via tuning of the bonding types and strength between constitutive atoms. As a matter of fact, the interactions between M and A atoms largely determine the physical and chemical properties of MAX phases. Herein, the Se element was experimentally realized to occupy the A site of a MAX phase, Zr2SeC, becoming a new member of this nanolaminated ternary carbide family. Comprehensive characterizations, including Rietveld refinement of X-ray diffraction and atom-resolved transmission electron microscopy techniques, were employed to validate this novel MAX phase. The distinct thermal conduction behaviors that emerge are attributed to the characteristic interactions between Zr and Se atoms.

preprint, 2021, arXiv

Temporal aware Multi-Interest Graph Neural Network For Session-based Recommendation

Session-based recommendation (SBR) is a challenging task, which aims at recommending next items based on anonymous interaction sequences. Despite the superior performance of existing methods for SBR, there are still several limitations: (i) Almost all existing works concentrate on single interest extraction and fail to disentangle multiple interests of a user, which easily results in suboptimal representations for SBR. (ii) Furthermore, previous methods also ignore the multi-form temporal information, which is a significant signal for obtaining the current intention in SBR. To address the limitations mentioned above, we propose a novel method, called Temporal aware Multi-Interest Graph Neural Network (TMI-GNN), to disentangle multiple interests and yield refined intention representations with the injection of two-level temporal information. Specifically, by appending multiple interest nodes, we construct a multi-interest graph for the current session, and adopt GNNs to model the item-item relation to capture adjacent item transitions, the item-interest relation to disentangle the multiple interests, and the interest-item relation to refine the item representation. Meanwhile, we incorporate item-level time interval signals to guide the item information propagation, and interest-level time distribution information to assist the scattering of interest information. Experiments on three benchmark datasets demonstrate that TMI-GNN outperforms other state-of-the-art methods consistently.

preprint, 2020, arXiv

$α$-Satellite: An AI-driven System and Benchmark Datasets for Hierarchical Community-level Risk Assessment to Help Combat COVID-19

The novel coronavirus and its deadly outbreak have posed grand challenges to human society: as of March 26, 2020, there have been 85,377 confirmed cases and 1,293 reported deaths in the United States; and the World Health Organization (WHO) characterized coronavirus disease (COVID-19) - which has infected more than 531,000 people with more than 24,000 deaths in at least 171 countries - as a global pandemic. A growing number of areas reporting local sub-national community transmission would represent a significant turn for the worse in the battle against the novel coronavirus, which points to an urgent need for expanded surveillance so we can better understand the spread of COVID-19 and thus better respond with actionable strategies for community mitigation. By advancing capabilities of artificial intelligence (AI) and leveraging the large-scale and real-time data generated from heterogeneous sources (e.g., disease-related data from official public health organizations, demographic data, mobility data, and user-generated data from social media), in this work, we propose and develop an AI-driven system (named $α$-Satellite), as an initial offering, to provide hierarchical community-level risk assessment to assist with the development of strategies for combating the fast-evolving COVID-19 pandemic. More specifically, given a specific location (either user input or automatic positioning), the developed system will automatically provide risk indexes associated with it in a hierarchical manner (e.g., state, county, city, specific location) to enable individuals to select appropriate actions for protection while minimizing disruptions to daily life to the extent possible. The developed system and the generated benchmark datasets have been made publicly accessible through our website. The system description and disclaimer are also available on our website.

preprint, 2020, arXiv

A micropolar peridynamics model with non-unified horizon for damage of solids with different non-local effects

Most peridynamics models adopt regular point distribution and unified horizon, limiting their flexibility and engineering applications. In this work, a micropolar peridynamics approach with non-unified horizon (NHPD) is proposed. This approach is implemented in a conventional finite element framework, using element-based discretization. By modifying the dual horizon approach into the pre-processing part, point dependent horizon and non-unified beam-like bonds are built. By implementing a domain correction strategy, the equivalence of strain energy density is assured. Then, a novel energy density-based failure criterion is presented which directly bridges the critical stretch to the mechanical strength. The numerical results indicate the weak mesh dependency of NHPD and the effectiveness of the new failure criterion. Moreover, it is proven that damage of solid with different non-local effects can lead to similar results by only adjusting the mechanical strength.

preprint 2020 arXiv

Advances in Bayesian Probabilistic Modeling for Industrial Applications

Industrial applications frequently pose a notorious challenge for state-of-the-art methods in the contexts of optimization, designing experiments and modeling unknown physical response. This problem is aggravated by limited availability of clean data, uncertainty in available physics-based models and the additional logistic and computational expense associated with experiments. In such a scenario, Bayesian methods have played an impactful role in alleviating the aforementioned obstacles by quantifying uncertainty of different types under limited resources. These methods, usually deployed as a framework, allow decision makers to make informed choices under uncertainty while incorporating information on the fly, usually in the form of data from multiple sources, and remaining consistent with physical intuition about the problem. This is a major advantage that Bayesian methods bring to fruition, especially in the industrial context. This paper is a compendium of the Bayesian modeling methodology that is being consistently developed at GE Research. The methodology, called GE's Bayesian Hybrid Modeling (GEBHM), is a probabilistic modeling method, based on the Kennedy and O'Hagan framework, that has been continuously scaled up and industrialized over several years. In this work, we explain the various advancements in GEBHM's methods and demonstrate their impact on several challenging industrial problems.

preprint 2020 arXiv

AinnoSeg: Panoramic Segmentation with High Perfomance

Panoramic segmentation is a setting in which image segmentation is particularly difficult. With the development of CNNs, panoramic segmentation has advanced considerably. However, current panoramic segmentation algorithms focus on contextual semantics and do not process image details sufficiently. Moreover, they do not solve problems such as the accuracy of occluded-object segmentation, small-object segmentation, and boundary-pixel classification. To address these issues, this paper presents several useful tricks: (a) changing the basic segmentation model so that it takes into account both large objects and the boundary-pixel classification of image details; (b) modifying the loss function so that it takes into account the boundary pixels of multiple objects in the image; (c) using a semi-supervised approach to regain control of the training process; and (d) using multi-scale training and inference. Together, these operations are named AinnoSeg, which achieves state-of-the-art performance on the well-known ADE20K dataset.

preprint 2020 arXiv

Data-Informed Decomposition for Localized Uncertainty Quantification of Dynamical Systems

Industrial dynamical systems often exhibit multi-scale response due to material heterogeneities, operating conditions and complex environmental loadings. In such problems, the smallest length scale of the system's dynamics controls the numerical resolution required to effectively resolve the embedded physics. In practice, however, high numerical resolution is only required in a confined region of the system where fast dynamics or localized material variability are exhibited, whereas a coarser discretization can be sufficient in the remaining majority of the system. Consequently, a unified computational scheme with uniform spatio-temporal resolution for uncertainty quantification can be very computationally demanding. Partitioning the complex dynamical system into smaller, easier-to-solve problems based on the localized dynamics and material variability can reduce the overall computational cost. However, identifying the region of interest for high-resolution and intensive uncertainty quantification can be problem dependent. The region of interest can be specified based on the localization features of the solution, user interest, and the correlation length of the random material properties. For problems where a region of interest is not evident, Bayesian inference can provide a feasible solution. In this work, we employ a Bayesian framework to update our prior knowledge of the localized region of interest using measurements and system response. To address the computational cost of the Bayesian inference, we construct a Gaussian process surrogate for the forward model. Once the localized region of interest is identified, we use polynomial chaos expansion to propagate the localization uncertainty. We demonstrate our framework through numerical experiments on a three-dimensional elastodynamic problem.

preprint 2020 arXiv

Direct dissipation-based arc-length approach for the cracking elements method

Dissipated energy, representing a monotonically increasing state variable in nonlinear fracture mechanics, can be used as a restraint for tracing the dissipation instead of the elastic unloading path of the structure response. In this work, in contrast to other energy-based approaches that use internal energy and the work done by the external loads, a novel arc-length approach is proposed. It directly extracts the dissipated energy based on crack openings and tractions (displacement jumps and cohesive forces between two surfaces of one crack), taking advantage of the global/extended method of cracking elements. Its linearized form is developed, and the stiffness factor of the arc-length restraint is naturally obtained by means of the Sherman-Morrison formula. Once cohesive cracks appear, the proposed approach can be applied until most of the fracture energy is dissipated. Results from several numerical tests, in which arc-length control and self-propagating cracks are jointly used, are presented. They demonstrate the robustness of the proposed method, which captures both global and local peak loads and all snap-back parts of the force-displacement responses of loaded structures with multiple cracks.

preprint 2020 arXiv

MLCVNet: Multi-Level Context VoteNet for 3D Object Detection

In this paper, we address the 3D object detection task by capturing multi-level contextual information with the self-attention mechanism and multi-scale feature fusion. Most existing 3D object detection methods recognize objects individually, without giving any consideration to contextual information between these objects. In contrast, we propose Multi-Level Context VoteNet (MLCVNet) to recognize 3D objects correlatively, building on the state-of-the-art VoteNet. We introduce three context modules into the voting and classifying stages of VoteNet to encode contextual information at different levels. Specifically, a Patch-to-Patch Context (PPC) module is employed to capture contextual information between the point patches, before voting for their corresponding object centroid points. Subsequently, an Object-to-Object Context (OOC) module is incorporated before the proposal and classification stage, to capture the contextual information between object candidates. Finally, a Global Scene Context (GSC) module is designed to learn the global scene context. Together, these modules capture contextual information at the patch, object and scene levels. Our method is an effective way to improve detection accuracy, achieving new state-of-the-art detection performance on challenging 3D object detection datasets, i.e., SUN RGBD and ScanNet. We also release our code at https://github.com/NUAAXQ/MLCVNet.

preprint 2020 arXiv

The role of Hume-Rothery's rules play in the MAX phases formability

MAX phases are a family of layered, hexagonal-structure ternary carbides or nitrides of a transition metal and an A-group element. What makes this type of material fascinating and potentially useful is its remarkable combination of metallic and ceramic characteristics, as well as its indispensable role in the 'top-down' synthesis of its 2D counterparts, MXenes. To make the search for potential novel MAX phases more efficient, the main efforts could go toward creating an information-prediction system incorporating all MAX phase databases, as well as generally valid principles and high-quality regularities. In this work, we employ structure mapping methodology, which has proven to be a useful guide in materials design, with Hume-Rothery parameters to provide guiding principles in the search for novel MAX phases. The formable/non-formable data on MAX phases can be ordered within a two-dimensional plot by using the proposed expression of geometrical and electron concentration factors.

preprint 2019 arXiv

Multielemental single-atom-thick A layers in nanolaminated V2(Sn, A)C (A=Fe, Co, Ni, Mn) for tailoring magnetic properties

Tailoring of individual single-atom-thick layers in nanolaminated materials offers atomic-level control over material properties. Nonetheless, multielement alloying in individual atomic layers in nanolaminates is largely unexplored. Here, we report a series of inherently nanolaminated V2(A'xSn1-x)C (A'=Fe, Co, Ni and Mn, and combinations thereof, with x=1/3) synthesized by an alloy-guided reaction. With the simultaneous occupancy of the four magnetic elements and Sn, the individual single-atom-thick A layers in the compound constitute high-entropy-alloy analogues, two-dimensional in the sense that the alloying occurs exclusively in the A layers. V2(A'xSn1-x)C exhibits distinct ferromagnetic behavior that can be compositionally tailored through the multielement A-layer alloying. This two-dimensional alloying provides a structural-design route with expanded chemical space for discovering materials and exploiting properties.

preprint 2008 arXiv

Distinct Signatures For Coulomb Blockade and Aharonov-Bohm Interference in Electronic Fabry-Perot Interferometers

Two distinct types of magnetoresistance oscillations are observed in two electronic Fabry-Perot interferometers of different sizes in the integer quantum Hall regime. Measuring these oscillations as a function of magnetic field and gate voltages, we observe three signatures that distinguish the two types. The oscillations observed in a 2.0 square micron device are understood to arise from the Coulomb blockade mechanism, and those observed in an 18 square micron device from the Aharonov-Bohm mechanism. This work clarifies, provides ways to distinguish, and demonstrates control over, these distinct physical origins of resistance oscillations seen in electronic Fabry-Perot interferometers.

30 published items

DINORANKCLIP: DINOv3 Distillation and Injection for Vision-Language Pretraining with High-Order Ranking Consistency

Learning to Decode in Parallel: Self-Coordinating Neural Network for Real-Time Quantum Error Correction

LinguaGame: A Linguistically Grounded Game-Theoretic Paradigm for Multi-Agent Dialogue Generation

ScribbleEdit: Synthetic Data for Image Editing with Scribbles and Text

SRAW-Attack: Space-Reweighted Adversarial Warping Attack for SAR Target Recognition

Table as a Modality for Large Language Models

Improving Few-Shot Change Detection Visual Question Answering via Decision-Ambiguity-guided Reinforcement Fine-Tuning

Caption Feature Space Regularization for Audio Captioning

EmbRace: Accelerating Sparse Communication for Distributed Training of NLP Neural Networks

Heterogeneous Global Graph Neural Networks for Personalized Session-based Recommendation

High thermoelectric performances in PbP monolayers considering full electron-phonon coupling and four-phonon scattering processes

Intention Adaptive Graph Neural Network for Category-aware Session-based Recommendation

Measuring the knot of non-Hermitian degeneracies and non-commuting braids

Multiple Choice Questions based Multi-Interest Policy Learning for Conversational Recommendation

Nonholonomic dynamics and control of road vehicles: moving toward automation

Towards Unifying the Label Space for Aspect- and Sentence-based Sentiment Analysis

Virtual Displacement based Discontinuity Layout Optimization

Conversations Gone Alright: Quantifying and Predicting Prosocial Outcomes in Online Conversations

MAX Phase Zr2SeC and Its Thermal Conduction Behavior

Temporal aware Multi-Interest Graph Neural Network For Session-based Recommendation

$α$-Satellite: An AI-driven System and Benchmark Datasets for Hierarchical Community-level Risk Assessment to Help Combat COVID-19

A micropolar peridynamics model with non-unified horizon for damage of solids with different non-local effects

Advances in Bayesian Probabilistic Modeling for Industrial Applications

AinnoSeg: Panoramic Segmentation with High Perfomance

Data-Informed Decomposition for Localized Uncertainty Quantification of Dynamical Systems

Direct dissipation-based arc-length approach for the cracking elements method

MLCVNet: Multi-Level Context VoteNet for 3D Object Detection

The role of Hume-Rothery's rules play in the MAX phases formability

Multielemental single-atom-thick A layers in nanolaminated V2(Sn, A)C (A=Fe, Co, Ni, Mn) for tailoring magnetic properties

Distinct Signatures For Coulomb Blockade and Aharonov-Bohm Interference in Electronic Fabry-Perot Interferometers