Researcher profile

Yilong Yang

Yilong Yang contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 19 - Unverified | Verification L1 | Unclaimed author
5 works
0 followers
6 topics
4 close collaborators

Published work

5 published items

preprint | 2026 | arXiv

A Cross-Modal Prompt Injection Attack against Large Vision-Language Models with Image-Only Perturbation

Large vision-language models (LVLMs) have emerged as a powerful paradigm for multimodal intelligence, but their growing deployment also expands the attack surface of prompt injection. Despite this growing concern, existing attacks still suffer from a critical limitation: the injected prompt for one modality only steers the model's interpretation of that single input. Alternatively, these attacks remain multimodal but fail to achieve cross-modal prompt perturbation. To bridge this gap, we introduce a novel cross-modal prompt injection attack, CrossMPI, which can steer the model's interpretation of both textual and visual inputs via image-only prompt injection. Our design is underpinned by the following key breakthroughs. First, we shift the focus of the injected prompt perturbation optimization from the visual embedding space (typically with only $10^5$ parameters) to the model hidden state space (which handles multimodal information integration and has $10^7$ parameters). Then, two strategies are adopted to mitigate the optimization challenges posed by the larger parameter space. To constrain the optimized model parameter space, we introduce a layer selection strategy that identifies the layers most critical to multimodal integration. Interestingly, contrary to prior experience, our analysis reveals that the optimal layers for LVLM prompt perturbation reside in the middle of the model rather than at the end. To constrain the image perturbation space, we propose a new distance-decremental perturbation budget assignment strategy that allocates smaller budgets as the pixel distance to semantic-critical regions increases. Extensive experiments across multiple LVLMs and datasets show that our method significantly outperforms baseline approaches.
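The distance-decremental budget idea in the abstract above can be illustrated with a small sketch. The abstract does not give the decay function, so everything here is an assumption made for illustration: the function name `budget_map`, the linear decay, the Euclidean pixel distance, and the parameters `eps_max`, `eps_min`, and `radius` are all hypothetical and are not taken from the paper.

```python
import math

def budget_map(height, width, critical_pixels, eps_max=8.0, eps_min=1.0, radius=4.0):
    """Assign each pixel a perturbation budget that shrinks with distance.

    Hypothetical rule (not from the paper): the budget falls linearly from
    eps_max at a semantic-critical pixel to eps_min at `radius` pixels away,
    and stays at eps_min beyond that.
    """
    budgets = []
    for r in range(height):
        row = []
        for c in range(width):
            # Euclidean distance to the nearest semantic-critical pixel.
            d = min(math.hypot(r - cr, c - cc) for cr, cc in critical_pixels)
            frac = min(d / radius, 1.0)
            row.append(eps_max - (eps_max - eps_min) * frac)
        budgets.append(row)
    return budgets

# A 5x5 image with one critical pixel at the center.
demo = budget_map(5, 5, [(2, 2)])
```

In this toy setup the center pixel receives the full budget and the value decreases monotonically toward the corners; a real attack would presumably derive the critical regions from the model rather than from a hand-picked list.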

preprint | 2026 | arXiv

On the Generation and Mitigation of Harmful Geometry in Image-to-3D Models

Recent advances in image-to-3D models have significantly improved the fidelity and accessibility of 3D content creation. The same powerful reconstruction capability that enables creative design can also be misused by adversaries to generate harmful geometries, which can then be fabricated with 3D printers and pose real-world risks. However, such risks are largely underexplored: it remains unclear how well current image-to-3D models can produce these harmful geometries, and whether existing safeguards can reliably prevent such generation. To fill this gap, we conduct a systematic measurement study of harmful geometry generation and mitigation. We first describe this risk through three unsafe categories: direct-use physical hazards, risky templates or components, and deceptive replicas. Each category is instantiated with representative objects. We evaluate both open-source and commercial image-to-3D models under original, degraded, viewpoint-shifted, and semantically camouflaged inputs. We consider different evaluation metrics, including geometric validity, multi-view VLM-based semantic scoring, targeted human validation, and controlled physical fabrication. The results reveal a concerning reality: current image-to-3D models can effectively reconstruct the harmful geometries, while fewer than 0.3% of such geometries trigger commercial moderation flags. As a first step toward mitigation, we evaluate three representative safeguard families: input moderation, model-level benign alignment, and output-level filtering. We find that existing safeguards have distinct weaknesses. We further develop a stacked defense that can reduce harmful retention to <1%, but at an overall false-positive cost of 11%. Taken together, our findings demonstrate the risk in current systems and encourage better geometry-aware safeguards for moderation.

preprint | 2022 | arXiv

Automated Enterprise Applications Generation from Requirements Model

Enterprise applications can be automatically generated from a sophisticated OO design model using a model-driven approach. The design model contains information about how to decompose the system into components, how to encapsulate the system operations into classes, and how the objects of classes collaborate to fulfill the functionality of the system operations. However, the effort required to build the design model from a validated requirements model is not proportional to the return. In practice, it is very desirable to have an approach that can automatically generate standardized enterprise applications directly from validated requirements models. In this paper, we propose an approach named RM2EA, which reaches this goal based on a contract-based requirements model. We demonstrate the proposed approach through 13 case studies. The evaluation result shows that the quality and efficiency of the generated applications are almost equal to those of applications implemented by developers: firstly, we demonstrate that a popular type of enterprise application (i.e., a Jakarta EE application) can be successfully generated by customizing and improving the set of rules; secondly, RM2EA can generate more readable or maintainable code; thirdly, the enterprise applications generated by RM2EA achieve similar performance in test results. Overall, the result is satisfactory, and the implementation of the proposed approach can be further enhanced and applied to software development in industry.

preprint | 2022 | arXiv

Reduced Power Graphs of $\mathrm{PGL}_n(\mathbb{F}_q)$

Given a group $G$, let us connect two distinct non-identity elements by an edge if and only if one is a power of the other. This gives a graph structure on $G$ minus the identity, called the reduced power graph. It is conjectured by Akbari and Ashrafi that if a non-abelian finite simple group has a connected reduced power graph, then it must be an alternating group. In this paper, we give a complete description of when the reduced power graphs of $\mathrm{PGL}_n(\mathbb{F}_q)$ are connected for all $q$ and all $n\geq 3$. In particular, the conjecture of Akbari and Ashrafi is false. We also provide an upper bound on their diameters and, in the case of disconnection, a description of all connected components.
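The definition of the reduced power graph above can be tried out on very small groups. The sketch below is only an illustration on products of cyclic groups, written additively so that "power" means "multiple"; it does not handle $\mathrm{PGL}_n(\mathbb{F}_q)$, and the function names are hypothetical choices rather than anything from the paper.

```python
from itertools import product

def powers(g, moduli):
    """Non-identity multiples of g in Z_m1 x ... x Z_mk (additive notation)."""
    identity = tuple(0 for _ in moduli)
    seen = set()
    current = g
    while current != identity:
        seen.add(current)
        current = tuple((c + x) % m for c, x, m in zip(current, g, moduli))
    return seen

def reduced_power_graph(moduli):
    """Adjacency sets: distinct non-identity g, h are joined if one is a power of the other."""
    identity = tuple(0 for _ in moduli)
    vertices = [g for g in product(*(range(m) for m in moduli)) if g != identity]
    adj = {g: set() for g in vertices}
    for g in vertices:
        for h in powers(g, moduli):
            if h != g:
                adj[g].add(h)
                adj[h].add(g)
    return adj

def connected_components(adj):
    """Connected components of the graph, found by depth-first search."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(adj[v] - comp)
        seen |= comp
        components.append(comp)
    return components

cyclic = connected_components(reduced_power_graph([6]))    # Z_6
klein = connected_components(reduced_power_graph([2, 2]))  # Z_2 x Z_2
```

For the cyclic group $\mathbb{Z}_6$ a generator is adjacent to every other vertex, so the graph is connected, while in $\mathbb{Z}_2 \times \mathbb{Z}_2$ every non-identity element has order 2 and the three vertices are isolated.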

preprint | 2020 | arXiv

Cloud-based Federated Boosting for Mobile Crowdsensing

The application of federated extreme gradient boosting to mobile crowdsensing apps brings several benefits, in particular high efficiency and classification performance. However, it also brings a new challenge for data and model privacy protection. Besides being vulnerable to Generative Adversarial Network (GAN) based user data reconstruction attacks, existing architectures do not consider how to preserve model privacy. In this paper, we propose a secret-sharing-based federated learning architecture, FedXGB, to achieve privacy-preserving extreme gradient boosting for mobile crowdsensing. Specifically, we first build a secure classification and regression tree (CART) of XGBoost using secret sharing. Then, we propose a secure prediction protocol to protect the model privacy of XGBoost in mobile crowdsensing. We conduct a comprehensive theoretical analysis and extensive experiments to evaluate the security, effectiveness, and efficiency of FedXGB. The results indicate that FedXGB is secure against honest-but-curious adversaries and attains less than 1% accuracy loss compared with the original XGBoost model.
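To give a feel for how secret sharing lets parties aggregate values without revealing them, here is a minimal sketch of additive secret sharing over integers. The abstract does not say which sharing scheme FedXGB uses, so the additive scheme, the modulus, and the function names `share`, `reconstruct`, and `secure_sum` are all assumptions chosen for illustration, not the paper's protocol.

```python
import secrets

# Hypothetical modulus for the sketch; any sufficiently large integer works.
MODULUS = 2**61 - 1

def share(value, n_parties, modulus=MODULUS):
    """Split an integer into n_parties random shares that sum to it mod modulus."""
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % modulus)
    return shares

def reconstruct(shares, modulus=MODULUS):
    """Recover the secret by adding all shares mod modulus."""
    return sum(shares) % modulus

def secure_sum(values, n_parties, modulus=MODULUS):
    """Sum several users' secrets while each party only ever sees shares."""
    all_shares = [share(v, n_parties, modulus) for v in values]
    # Party j adds up the j-th share from every user.
    partial_sums = [sum(column) % modulus for column in zip(*all_shares)]
    return reconstruct(partial_sums, modulus)

total = secure_sum([3, 5, 7], n_parties=4)
```

Any strict subset of the shares of one value is uniformly random, so no individual value leaks unless all parties collude. Real-valued quantities such as gradients would first need a fixed-point integer encoding, which this sketch leaves out.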