Source author record

Daqing He

Daqing He appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computation and Language; Information Retrieval; Artificial Intelligence; Cryptography and Security; cs.CY; Machine Learning; physics.soc-ph; Social and Information Networks

Catalog footprint

What is connected

8 works

8 topics

4 close collaborators

preprint · 2026 · arXiv

Jailbreaking Large Language Models through Iterative Tool-Disguised Attacks via Reinforcement Learning

Large language models (LLMs) have demonstrated remarkable capabilities across diverse applications; however, they remain critically vulnerable to jailbreak attacks that elicit harmful responses violating human values and safety guidelines. Despite extensive research on defense mechanisms, existing safeguards prove insufficient against sophisticated adversarial strategies. In this work, we propose iMIST (interactive Multi-step Progressive Tool-disguised Jailbreak Attack), a novel adaptive jailbreak method that synergistically exploits vulnerabilities in current defense mechanisms. iMIST disguises malicious queries as normal tool invocations to bypass content filters, while simultaneously introducing an interactive progressive optimization algorithm that dynamically escalates response harmfulness through multi-turn dialogues guided by real-time harmfulness assessment. Our experiments on widely used models demonstrate that iMIST achieves higher attack effectiveness while maintaining low rejection rates. These results reveal critical vulnerabilities in current LLM safety mechanisms and underscore the urgent need for more robust defense strategies.

preprint · 2026 · arXiv

Retrieval–Reasoning Processes for Multi-hop Question Answering: A Four-Axis Design Framework and Empirical Trends

Multi-hop question answering (QA) requires systems to iteratively retrieve evidence and reason across multiple hops. While recent RAG and agentic methods report strong results, the underlying retrieval–reasoning process is often left implicit, making procedural choices hard to compare across model families. This survey takes the execution procedure as the unit of analysis and introduces a four-axis framework covering (A) overall execution plan, (B) index structure, (C) next-step control (strategies and triggers), and (D) stop/continue criteria. Using this schema, we map representative multi-hop QA systems and synthesize reported ablations and tendencies on standard benchmarks (e.g., HotpotQA, 2WikiMultiHopQA, MuSiQue), highlighting recurring trade-offs among effectiveness, efficiency, and evidence faithfulness. We conclude with open challenges for retrieval–reasoning agents, including structure-aware planning, transferable control policies, and robust stopping under distribution shift.
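
As a rough illustration of how the four axes could serve as a comparison schema, the sketch below records one system per record and reports the axes on which two systems differ. The class name, field names, and the two example systems with their axis values are hypothetical placeholders, not entries taken from the survey.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ProcessProfile:
    """One multi-hop QA system described along the four axes (hypothetical schema)."""
    execution_plan: str     # axis A: overall execution plan
    index_structure: str    # axis B: index structure
    next_step_control: str  # axis C: next-step control (strategies and triggers)
    stop_criteria: str      # axis D: stop/continue criteria


def differing_axes(a: ProcessProfile, b: ProcessProfile) -> list[str]:
    """Return the names of the axes on which two profiles differ, in axis order."""
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]


# Illustrative values only; they do not describe any real system.
system_x = ProcessProfile(
    execution_plan="plan once, then retrieve",
    index_structure="flat passage index",
    next_step_control="retrieve at every step",
    stop_criteria="fixed hop budget",
)
system_y = ProcessProfile(
    execution_plan="interleave retrieval and reasoning",
    index_structure="flat passage index",
    next_step_control="retrieve on low confidence",
    stop_criteria="fixed hop budget",
)

print(differing_axes(system_x, system_y))
```

Keeping each axis as a separate field makes procedural differences explicit, which is the kind of comparison the abstract says is hard when the process is left implicit.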

preprint · 2022 · arXiv

Does Order Matter? An Empirical Study on Generating Multiple Keyphrases as a Sequence

Recently, concatenating multiple keyphrases as a target sequence has been proposed as a new learning paradigm for keyphrase generation. Existing studies concatenate target keyphrases in different orders, but no study has examined the effects of ordering on models' behavior. In this paper, we propose several orderings for concatenation and inspect the important factors for training a successful keyphrase generation model. By running comprehensive comparisons, we observe one preferable ordering and summarize a number of empirical findings and challenges, which can shed light on future research in this line of work.
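
To make the idea of ordering keyphrases before concatenation concrete, here is a minimal sketch. The three strategies shown (alphabetical, by word count, by first appearance in the source text) are illustrative assumptions and are not necessarily the orderings compared in the paper; the function name and the " ; " separator are likewise hypothetical.

```python
def order_keyphrases(keyphrases, source_text, strategy):
    """Return the keyphrases sorted according to a named (hypothetical) strategy."""
    if strategy == "alphabetical":
        return sorted(keyphrases, key=str.lower)
    if strategy == "length":
        # Shorter phrases first, measured in whitespace-separated words.
        return sorted(keyphrases, key=lambda k: len(k.split()))
    if strategy == "appearance":
        text = source_text.lower()

        def position(phrase):
            # Phrases absent from the text sort after all present ones.
            idx = text.find(phrase.lower())
            return idx if idx >= 0 else len(text)

        return sorted(keyphrases, key=position)
    raise ValueError(f"unknown strategy: {strategy}")


source = "Neural keyphrase generation with recurrent models"
phrases = ["recurrent models", "keyphrase generation", "attention"]

for name in ("alphabetical", "length", "appearance"):
    # Each ordering yields a different training target for the same document.
    print(name, "->", " ; ".join(order_keyphrases(phrases, source, name)))
```

Because Python's `sorted` is stable, ties keep their original relative order, so the same input list always produces the same target sequence for a given strategy.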

preprint · 2020 · arXiv

Concept Annotation for Intelligent Textbooks

With the increased popularity of electronic textbooks, there is growing interest in developing a new generation of "intelligent textbooks", which have the ability to guide readers according to their learning goals and current knowledge. Intelligent textbooks extend regular textbooks by integrating machine-manipulable knowledge, such as a knowledge map or a prerequisite-outcome relationship between sections; the most popular form of integrated knowledge is a list of unique knowledge concepts associated with each section. With the help of these concepts, multiple intelligent operations, such as content linking, content recommendation, or student modeling, can be performed. However, annotating a reliable set of concepts for a textbook section is a challenge. Automatic unsupervised methods for extracting key-phrases as the concepts are known to have insufficient accuracy. Manual annotation by experts is considered the preferred approach and can be used to produce both the target outcome and the labeled data for training supervised models. However, most researchers in the education domain still consider the concept annotation process an ad-hoc activity rather than an engineering task, resulting in low-quality annotated data. In this paper, we present a textbook knowledge engineering method to obtain reliable concept annotations. The outcomes of our work include a validated knowledge engineering procedure, a code-book for technical concept annotation, and a set of concept annotations for the target textbook, which could be used as a gold standard in further research.

preprint · 2020 · arXiv

One Size Does Not Fit All: Generating and Evaluating Variable Number of Keyphrases

Different texts naturally correspond to different numbers of keyphrases. This desideratum is largely missing from existing neural keyphrase generation models. In this study, we address this problem from both modeling and evaluation perspectives. We first propose a recurrent generative model that generates multiple keyphrases as delimiter-separated sequences. Generation diversity is further enhanced with two novel techniques that manipulate decoder hidden states. In contrast to previous approaches, our model is capable of generating diverse keyphrases and controlling the number of outputs. We further propose two evaluation metrics tailored to variable-number generation. We also introduce a new dataset, StackEx, that expands beyond the only existing genre (i.e., academic writing) in keyphrase generation tasks. With both previous and new evaluation metrics, our model outperforms strong baselines on all datasets.
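
The delimiter-separated formulation can be sketched as a pair of helper functions: one that serializes a variable-length list of keyphrases into a single target string, and one that parses a generated string back into a list. The token strings `<sep>` and `<eos>`, the function names, and the deduplication step are assumptions made for illustration; the paper's actual tokens and post-processing may differ.

```python
SEP = "<sep>"  # hypothetical delimiter token between keyphrases
EOS = "<eos>"  # hypothetical end-of-sequence token


def to_target_sequence(keyphrases):
    """Join any number of keyphrases into one delimiter-separated target string."""
    return f" {SEP} ".join(keyphrases) + f" {EOS}"


def from_generated_sequence(sequence):
    """Split a generated string into unique keyphrases, ignoring text after EOS."""
    body = sequence.split(EOS, 1)[0]
    seen, result = set(), []
    for part in body.split(SEP):
        phrase = " ".join(part.split())  # normalize whitespace
        key = phrase.lower()
        if phrase and key not in seen:
            seen.add(key)
            result.append(phrase)
    return result


target = to_target_sequence(["topic models", "clustering"])
decoded = from_generated_sequence("topic models <sep> clustering <sep> topic models <eos> extra")
print(target)
print(decoded)
```

Because the sequence ends only when the end token appears, the number of keyphrases is decided during generation rather than fixed in advance, which is the property the abstract emphasizes.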

preprint · 2015 · arXiv

User Participation in an Academic Social Networking Service: A Survey of Open Group Users on Mendeley

Although there are a number of social networking services that specifically target scholars, little has been published about the actual practices and the usage of these so-called academic social networking services (ASNSs). To fill this gap, we explore the populations of academics who engage in social activities using an ASNS; as an indicator of further engagement, we also determine their various motivations for joining a group in ASNSs. Using groups and their members in Mendeley as the platform for our case study, we obtained 146 participant responses from our online survey about users' common activities, usage habits, and motivations for joining groups. Our results show that 1) participants did not engage with social-based features as frequently and actively as they engaged with research-based features, and 2) users who joined more groups seemed to have a stronger motivation to increase their professional visibility and to contribute the research articles they had read to the group reading list. Our results generate interesting insights into Mendeley's user populations, their activities, and their motivations relative to the social features of Mendeley. We also argue that further design of ASNSs is needed to take greater account of disciplinary differences in scholarly communication and to establish incentive mechanisms for encouraging user participation.

preprint · 2014 · arXiv

Benchmarking the Privacy-Preserving People Search

People search is an important topic in information retrieval. Many previous studies on this topic employed social networks to boost search performance by incorporating either local network features (e.g., the common connections between the querying user and candidates in social networks), or global network features (e.g., PageRank), or both. However, the available social network information can be restricted because of the privacy settings of the users involved, which in turn would affect the performance of people search. Therefore, in this paper, we focus on the privacy issues in people search. Because privacy-restricted networks are unavailable, we propose simulating different privacy settings with a public social network. Our study examines the influence of privacy concerns on the local and global network features, and their impact on the performance of people search. Our results show that: 1) the privacy concerns of different people in the networks have different influences, and people with higher association (i.e., higher degree in a network) have a much greater impact on the performance of people search; 2) local network features are more sensitive to privacy concerns, especially when such concerns come from high-association people in the network who are also related to the querying user. As the first study on this topic, we hope to generate further discussion of these issues.
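
The simulation idea can be illustrated on a toy graph: mark some users as private, hide their connections, and recompute a local feature such as the number of common connections. The sketch below assumes that a private user hides all of their edges, which is only one possible privacy setting and may not match those simulated in the paper; the graph, user names, and function names are hypothetical.

```python
def hide_private_users(graph, private_users):
    """Return a copy of the graph in which private users expose no connections."""
    visible = {}
    for user, neighbors in graph.items():
        if user in private_users:
            visible[user] = set()
        else:
            visible[user] = {n for n in neighbors if n not in private_users}
    return visible


def common_connections(graph, querier, candidate):
    """Local feature: number of neighbors shared by the querier and a candidate."""
    return len(graph.get(querier, set()) & graph.get(candidate, set()))


# Toy undirected network stored as adjacency sets (illustrative data only).
network = {
    "ann": {"bob", "cat", "dan"},
    "bob": {"ann", "eve"},
    "cat": {"ann", "eve"},
    "dan": {"ann", "eve"},
    "eve": {"bob", "cat", "dan"},
}

before = common_connections(network, "ann", "eve")
after = common_connections(hide_private_users(network, {"bob"}), "ann", "eve")
print(before, after)
```

Varying which users are marked private, for example high-degree versus low-degree users, gives a simple way to observe how a local feature degrades as visible information shrinks.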

preprint · 2013 · arXiv

Automatic Detection of Search Tactic in Individual Information Seeking: A Hidden Markov Model Approach

The information seeking process is an important topic in information seeking behavior research. Both qualitative and empirical methods have been adopted in analyzing information seeking processes, with a major focus on uncovering the latent search tactics behind user behaviors. Most existing works require defining search tactics in advance and coding data manually. The few works that can recognize search tactics automatically do not make sense of those tactics. In this paper, we propose using an automatic technique, the Hidden Markov Model (HMM), to explicitly model the search tactics. HMM results show that the identified search tactics of individual information seeking behaviors are consistent with Marchionini's information seeking process model. Because HMM shows the connections between search tactics and search actions, as well as the transitions among search tactics, we argue that it is a useful tool for investigating the information seeking process, or at least that it provides a feasible way to analyze large-scale datasets.
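
In an HMM of this kind, search tactics play the role of hidden states and logged search actions play the role of observations. The sketch below decodes the most likely tactic sequence for a short action log with the Viterbi algorithm. It only illustrates decoding with fixed parameters: the state names, action names, and all probabilities are made up for the example, and the paper's model would estimate its parameters from data rather than use hand-picked values.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path for a non-empty observation list.

    Works in log space; all probabilities used must be strictly positive.
    """
    first = observations[0]
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][first]) for s in states}]
    back = []
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            # Best predecessor state for reaching s at this step.
            prev = max(states, key=lambda p: best[-1][p] + math.log(trans_p[p][s]))
            scores[s] = best[-1][prev] + math.log(trans_p[prev][s]) + math.log(emit_p[s][obs])
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical tactics (hidden states), actions (observations), and parameters.
tactics = ["search", "evaluate"]
start = {"search": 0.8, "evaluate": 0.2}
trans = {
    "search": {"search": 0.6, "evaluate": 0.4},
    "evaluate": {"search": 0.3, "evaluate": 0.7},
}
emit = {
    "search": {"query": 0.7, "click": 0.2, "save": 0.1},
    "evaluate": {"query": 0.1, "click": 0.5, "save": 0.4},
}

decoded_tactics = viterbi(["query", "click", "save"], tactics, start, trans, emit)
print(decoded_tactics)  # ['search', 'evaluate', 'evaluate']
```

The transition table corresponds to the transitions among tactics, and the emission table to the connection between tactics and actions, which are the two aspects the abstract highlights as advantages of the approach.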