Source author record

Heng Ji

Heng Ji appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computation and Language Artificial Intelligence Machine Learning Computer Vision cond-mat.mtrl-sci cond-mat.str-el cs.CY Social and Information Networks Multimedia cond-mat.mes-hall hep-ex Human-Computer Interaction Information Retrieval Neural and Evolutionary Computing physics.ins-det

Catalog footprint

What is connected

40works

15topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

A Survey of Self-Evolving Agents: What, When, How, and Where to Evolve on the Path to Artificial Super Intelligence

Large Language Models (LLMs) have demonstrated remarkable capabilities across diverse tasks but remain fundamentally static, unable to adapt their internal parameters to novel tasks, evolving knowledge domains, or dynamic interaction contexts. As LLMs are increasingly deployed in open-ended, interactive environments, this static nature has become a critical bottleneck, necessitating agents that can adaptively reason, act, and evolve in real time. This paradigm shift -- from scaling static models to developing self-evolving agents -- has sparked growing interest in architectures and methods enabling continual learning and adaptation from data, interactions, and experiences. This survey provides the first systematic and comprehensive review of self-evolving agents, organizing the field around three foundational dimensions: what, when, and how to evolve. We examine evolutionary mechanisms across agent components (e.g., models, memory, tools, architecture), categorize adaptation methods by stages (e.g., intra-test-time, inter-test-time), and analyze the algorithmic and architectural designs that guide evolutionary adaptation (e.g., scalar rewards, textual feedback, single-agent and multi-agent systems). Additionally, we analyze evaluation metrics and benchmarks tailored for self-evolving agents, highlight applications in domains such as coding, education, and healthcare, and identify critical challenges and research directions in safety, scalability, and co-evolutionary dynamics. By providing a structured framework for understanding and designing self-evolving agents, this survey establishes a roadmap for advancing more adaptive, robust, and versatile agentic systems in both research and real-world deployments, and ultimately sheds light on the realization of Artificial Super Intelligence (ASI) where agents evolve autonomously and perform beyond human-level intelligence across tasks.

preprint2026arXiv

Agentic Reasoning for Large Language Models

Reasoning is a fundamental cognitive process underlying inference, problem-solving, and decision-making. While large language models (LLMs) demonstrate strong reasoning capabilities in closed-world settings, they struggle in open-ended and dynamic environments. Agentic reasoning marks a paradigm shift by reframing LLMs as autonomous agents that plan, act, and learn through continual interaction. In this survey, we organize agentic reasoning along three complementary dimensions. First, we characterize environmental dynamics through three layers: foundational agentic reasoning, which establishes core single-agent capabilities including planning, tool use, and search in stable environments; self-evolving agentic reasoning, which studies how agents refine these capabilities through feedback, memory, and adaptation; and collective multi-agent reasoning, which extends intelligence to collaborative settings involving coordination, knowledge sharing, and shared goals. Across these layers, we distinguish in-context reasoning, which scales test-time interaction through structured orchestration, from post-training reasoning, which optimizes behaviors via reinforcement learning and supervised fine-tuning. We further review representative agentic reasoning frameworks across real-world applications and benchmarks, including science, robotics, healthcare, autonomous research, and mathematics. This survey synthesizes agentic reasoning methods into a unified roadmap bridging thought and action, and outlines open challenges and future directions, including personalization, long-horizon interaction, world modeling, scalable multi-agent training, and governance for real-world deployment.

preprint2026arXiv

Assessing the Creativity of Large Language Models: Testing, Limits, and New Frontiers

Measuring the creativity of large language models (LLMs) is essential for designing methods that can improve creativity and for enhancing our scientific understanding of this ability. To accomplish this, it has become common in recent years to administer tests of human creativity to LLMs. Although these tests provide a convenient and fully automated way to score "creativity," their validity as measures of machine creativity has not been established, and these tests already have limited validity as predictors of human creativity. To address this problem, we conduct the first large-scale, systematic study assessing the effectiveness of human creativity tests for predicting the creative achievement of LLMs across three target constructs: creative writing, divergent thinking, and scientific ideation. We find that the Divergent Association Task (DAT) and the Conditional DAT are the best predictors of creative writing and divergent thinking, respectively, but that test effectiveness varies significantly by construct, and no single test predicts all constructs well. Moreover, contrary to popular belief, no existing test reliably predicts scientific ideation ability. Motivated by this problem, we introduce the Divergent Remote Association Test (DRAT), a vocabulary-space test that assesses both convergent and divergent thinking in a single instrument. The DRAT is the first and only creativity test for LLMs that is a significant predictor of scientific ideation ability, demonstrating robustness across major design choices. Furthermore, the performance gain of the DRAT is not recoverable from any linear combination of the Divergent Association Task and the Remote Associates Test, indicating that assessing divergent and convergent thinking in the same test is essential to reliably predicting scientific ideation ability.

preprint2026arXiv

Beyond Perfect APIs: A Comprehensive Evaluation of LLM Agents Under Real-World API Complexity

We introduce WildAGTEval, a benchmark designed to evaluate large language model (LLM) agents' function-calling capabilities under realistic API complexity. Unlike prior work that assumes an idealized API system and disregards real-world factors such as noisy API outputs, WildAGTEval accounts for two dimensions of real-world complexity: 1. API specification, which includes detailed documentation and usage constraints, and 2. API execution, which captures runtime challenges. Consequently, WildAGTEval offers (i) an API system encompassing 60 distinct complexity scenarios that can be composed into approximately 32K test configurations, and (ii) user-agent interactions for evaluating LLM agents on these scenarios. Using WildAGTEval, we systematically assess several advanced LLMs and observe that most scenarios are challenging, with irrelevant information complexity posing the greatest difficulty and reducing the performance of strong LLMs by 27.3%. Furthermore, our qualitative analysis reveals that LLMs occasionally distort user intent merely to claim task completion, critically affecting user satisfaction.

preprint2026arXiv

Current Agents Fail to Leverage World Model as Tool for Foresight

Agents built on vision-language models increasingly face tasks that demand anticipating future states rather than relying on short-horizon reasoning. Generative world models offer a promising remedy: agents could use them as external simulators to foresee outcomes before acting. This paper empirically examines whether current agents can leverage such world models as tools to enhance their cognition. Across diverse agentic and visual question answering tasks, we observe that some agents rarely invoke simulation (fewer than 1%), frequently misuse predicted rollouts (approximately 15%), and often exhibit inconsistent or even degraded performance (up to 5%) when simulation is available or enforced. Attribution analysis further indicates that the primary bottleneck lies in the agents' capacity to decide when to simulate, how to interpret predicted outcomes, and how to integrate foresight into downstream reasoning. These findings underscore the need for mechanisms that foster calibrated, strategic interaction with world models, paving the way toward more reliable anticipatory cognition in future agent systems.

preprint2024arXiv

If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents

The prominent large language models (LLMs) of today differ from past language models not only in size, but also in the fact that they are trained on a combination of natural language and formal language (code). As a medium between humans and computers, code translates high-level goals into executable steps, featuring standard syntax, logical consistency, abstraction, and modularity. In this survey, we present an overview of the various benefits of integrating code into LLMs' training data. Specifically, beyond enhancing LLMs in code generation, we observe that these unique properties of code help (i) unlock the reasoning ability of LLMs, enabling their applications to a range of more complex natural language tasks; (ii) steer LLMs to produce structured and precise intermediate steps, which can then be connected to external execution ends through function calls; and (iii) take advantage of code compilation and execution environment, which also provides diverse feedback for model improvement. In addition, we trace how these profound capabilities of LLMs, brought by code, have led to their emergence as intelligent agents (IAs) in situations where the ability to understand instructions, decompose goals, plan and execute actions, and refine from feedback are crucial to their success on downstream tasks. Finally, we present several key challenges and future directions of empowering LLMs with code.

preprint2024arXiv

Open Vocabulary Electroencephalography-To-Text Decoding and Zero-shot Sentiment Classification

State-of-the-art brain-to-text systems have achieved great success in decoding language directly from brain signals using neural networks. However, current approaches are limited to small closed vocabularies which are far from enough for natural communication. In addition, most of the high-performing approaches require data from invasive devices (e.g., ECoG). In this paper, we extend the problem to open vocabulary Electroencephalography(EEG)-To-Text Sequence-To-Sequence decoding and zero-shot sentence sentiment classification on natural reading tasks. We hypothesis that the human brain functions as a special text encoder and propose a novel framework leveraging pre-trained language models (e.g., BART). Our model achieves a 40.1% BLEU-1 score on EEG-To-Text decoding and a 55.6% F1 score on zero-shot EEG-based ternary sentiment classification, which significantly outperforms supervised baselines. Furthermore, we show that our proposed model can handle data from various subjects and sources, showing great potential for a high-performance open vocabulary brain-to-text system once sufficient data is available

preprint2022arXiv

A Survey of Knowledge-Enhanced Text Generation

The goal of text generation is to make machines express in human language. It is one of the most important yet challenging tasks in natural language processing (NLP). Since 2014, various neural encoder-decoder models pioneered by Seq2Seq have been proposed to achieve the goal by learning to map input text to output text. However, the input text alone often provides limited knowledge to generate the desired output, so the performance of text generation is still far from satisfaction in many real-world scenarios. To address this issue, researchers have considered incorporating various forms of knowledge beyond the input text into the generation models. This research direction is known as knowledge-enhanced text generation. In this survey, we present a comprehensive review of the research on knowledge enhanced text generation over the past five years. The main content includes two parts: (i) general methods and architectures for integrating knowledge into text generation; (ii) specific techniques and applications according to different forms of knowledge data. This survey can have broad audiences, researchers and practitioners, in academia and industry.

preprint2022arXiv

A Weibo Dataset for the 2022 Russo-Ukrainian Crisis

Online social networks such as Twitter and Weibo play an important role in how people stay informed and exchange reactions. Each crisis encompasses a new opportunity to study the portability of models for various tasks (e.g., information extraction, complex event understanding, misinformation detection, etc.), due to differences in domain, entities, and event types. We present the Russia-Ukraine Crisis Weibo (RUW) dataset, with over 3.5M user posts and comments in the first release. Our data is available at https://github.com/yrf1/RussiaUkraine_weibo_dataset.

preprint2022arXiv

CLIP-Event: Connecting Text and Images with Event Structures

Vision-language (V+L) pretraining models have achieved great success in supporting multimedia applications by understanding the alignments between images and text. While existing vision-language pretraining models primarily focus on understanding objects in images or entities in text, they often ignore the alignment at the level of events and their argument structures. In this work, we propose a contrastive learning framework to enforce vision-language pretraining models to comprehend events and associated argument (participant) roles. To achieve this, we take advantage of text information extraction technologies to obtain event structural knowledge, and utilize multiple prompt functions to contrast difficult negative descriptions by manipulating event structures. We also design an event graph alignment loss based on optimal transport to capture event argument structures. In addition, we collect a large event-rich dataset (106,875 images) for pretraining, which provides a more challenging image retrieval benchmark to assess the understanding of complicated lengthy sentences. Experiments show that our zero-shot CLIP-Event outperforms the state-of-the-art supervised model in argument extraction on Multimedia Event Extraction, achieving more than 5% absolute F-score gain in event extraction, as well as significant improvements on a variety of downstream tasks under zero-shot settings.

preprint2022arXiv

Corpus-based Open-Domain Event Type Induction

Traditional event extraction methods require predefined event types and their corresponding annotations to learn event extractors. These prerequisites are often hard to be satisfied in real-world applications. This work presents a corpus-based open-domain event type induction method that automatically discovers a set of event types from a given corpus. As events of the same type could be expressed in multiple ways, we propose to represent each event type as a cluster of <predicate sense, object head> pairs. Specifically, our method (1) selects salient predicates and object heads, (2) disambiguates predicate senses using only a verb sense dictionary, and (3) obtains event types by jointly embedding and clustering <predicate sense, object head> pairs in a latent spherical space. Our experiments, on three datasets from different domains, show our method can discover salient and high-quality event types, according to both automatic and human evaluations.

preprint2022arXiv

EA$^2$E: Improving Consistency with Event Awareness for Document-Level Argument Extraction

Events are inter-related in documents. Motivated by the one-sense-per-discourse theory, we hypothesize that a participant tends to play consistent roles across multiple events in the same document. However recent work on document-level event argument extraction models each individual event in isolation and therefore causes inconsistency among extracted arguments across events, which will further cause discrepancy for downstream applications such as event knowledge base population, question answering, and hypothesis generation. In this work, we formulate event argument consistency as the constraints from event-event relations under the document-level setting. To improve consistency we introduce the Event-Aware Argument Extraction (EA$^2$E) model with augmented context for training and inference. Experiment results on WIKIEVENTS and ACE2005 datasets demonstrate the effectiveness of EA$^2$E compared to baseline methods.

preprint2022arXiv

Enhanced Knowledge Selection for Grounded Dialogues via Document Semantic Graphs

Providing conversation models with background knowledge has been shown to make open-domain dialogues more informative and engaging. Existing models treat knowledge selection as a sentence ranking or classification problem where each sentence is handled individually, ignoring the internal semantic connection among sentences in the background document. In this work, we propose to automatically convert the background knowledge documents into document semantic graphs and then perform knowledge selection over such graphs. Our document semantic graphs preserve sentence-level information through the use of sentence nodes and provide concept connections between sentences. We jointly apply multi-task learning for sentence-level and concept-level knowledge selection and show that it improves sentence-level selection. Our experiments show that our semantic graph-based knowledge selection improves over sentence selection baselines for both the knowledge selection task and the end-to-end response generation task on HollE and improves generalization on unseen topics in WoW.

preprint2022arXiv

Entity-Conditioned Question Generation for Robust Attention Distribution in Neural Information Retrieval

We show that supervised neural information retrieval (IR) models are prone to learning sparse attention patterns over passage tokens, which can result in key phrases including named entities receiving low attention weights, eventually leading to model under-performance. Using a novel targeted synthetic data generation method that identifies poorly attended entities and conditions the generation episodes on those, we teach neural IR to attend more uniformly and robustly to all entities in a given passage. On two public IR benchmarks, we empirically show that the proposed method helps improve both the model's attention patterns and retrieval performance, including in zero-shot settings.

preprint2022arXiv

Improving Candidate Retrieval with Entity Profile Generation for Wikidata Entity Linking

Entity linking (EL) is the task of linking entity mentions in a document to referent entities in a knowledge base (KB). Many previous studies focus on Wikipedia-derived KBs. There is little work on EL over Wikidata, even though it is the most extensive crowdsourced KB. The scale of Wikidata can open up many new real-world applications, but its massive number of entities also makes EL challenging. To effectively narrow down the search space, we propose a novel candidate retrieval paradigm based on entity profiling. Wikidata entities and their textual fields are first indexed into a text search engine (e.g., Elasticsearch). During inference, given a mention and its context, we use a sequence-to-sequence (seq2seq) model to generate the profile of the target entity, which consists of its title and description. We use the profile to query the indexed search engine to retrieve candidate entities. Our approach complements the traditional approach of using a Wikipedia anchor-text dictionary, enabling us to further design a highly effective hybrid method for candidate retrieval. Combined with a simple cross-attention reranker, our complete EL framework achieves state-of-the-art results on three Wikidata-based datasets and strong performance on TACKBP-2010.

preprint2022arXiv

MuMuQA: Multimedia Multi-Hop News Question Answering via Cross-Media Knowledge Extraction and Grounding

Recently, there has been an increasing interest in building question answering (QA) models that reason across multiple modalities, such as text and images. However, QA using images is often limited to just picking the answer from a pre-defined set of options. In addition, images in the real world, especially in news, have objects that are co-referential to the text, with complementary information from both modalities. In this paper, we present a new QA evaluation benchmark with 1,384 questions over news articles that require cross-media grounding of objects in images onto text. Specifically, the task involves multi-hop questions that require reasoning over image-caption pairs to identify the grounded visual object being referred to and then predicting a span from the news body text to answer the question. In addition, we introduce a novel multimedia data augmentation framework, based on cross-media knowledge extraction and synthetic question-answer generation, to automatically augment data that can provide weak supervision for this task. We evaluate both pipeline-based and end-to-end pretraining-based multimedia QA models on our benchmark, and show that they achieve promising performance, while considerably lagging behind human performance hence leaving large room for future work on this challenging new task.

preprint2022arXiv

Rethinking Task Sampling for Few-shot Vision-Language Transfer Learning

Despite achieving state-of-the-art zero-shot performance, existing vision-language models still fall short of few-shot transfer ability on domain-specific problems. Classical fine-tuning often fails to prevent highly expressive models from exploiting spurious correlations. Although model-agnostic meta-learning (MAML) presents as a natural alternative for few-shot transfer learning, the expensive computation due to implicit second-order optimization limits its use on large-scale vision-language models such as CLIP. While much literature has been devoted to exploring alternative optimization strategies, we identify another essential aspect towards effective few-shot transfer learning, task sampling, which is previously only be viewed as part of data pre-processing in MAML. To show the impact of task sampling, we propose a simple algorithm, Model-Agnostic Multitask Fine-tuning (MAMF), which differentiates classical fine-tuning only on uniformly sampling multiple tasks. Despite its simplicity, we show that MAMF consistently outperforms classical fine-tuning on five few-shot vision-language classification tasks. We further show that the effectiveness of the bi-level optimization in MAML is highly sensitive to the zero-shot performance of a task in the context of few-shot vision-language classification. The goal of this paper is to provide new insights on what makes few-shot learning work, and encourage more research into investigating better task sampling strategies.

preprint2022arXiv

Schema-Guided Event Graph Completion

We tackle a new task, event graph completion, which aims to predict missing event nodes for event graphs. Existing link prediction or graph completion methods have difficulty dealing with event graphs because they are usually designed for a single large graph such as a social network or a knowledge graph, rather than multiple small dynamic event graphs. Moreover, they can only predict missing edges rather than missing nodes. In this work, we propose to utilize event schema, a template that describes the stereotypical structure of event graphs, to address the above issues. Our schema-guided event graph completion approach first maps an instance event graph to a subgraph of the schema graph by a heuristic subgraph matching algorithm. Then it predicts whether a candidate event node in the schema graph should be added to the instantiated schema subgraph by characterizing two types of local topology of the schema graph: neighbors of the candidate node and the subgraph, and paths that connect the candidate node and the subgraph. These two modules are later combined together for the final prediction. We also propose a self-supervised strategy to construct training samples, as well as an inference algorithm that is specifically designed to complete event graphs. Extensive experimental results on four datasets demonstrate that our proposed method achieves state-of-the-art performance, with 4.3% to 19.4% absolute F1 gains over the best baseline method on the four datasets.

preprint2022arXiv

Seeded Hierarchical Clustering for Expert-Crafted Taxonomies

Practitioners from many disciplines (e.g., political science) use expert-crafted taxonomies to make sense of large, unlabeled corpora. In this work, we study Seeded Hierarchical Clustering (SHC): the task of automatically fitting unlabeled data to such taxonomies using only a small set of labeled examples. We propose HierSeed, a novel weakly supervised algorithm for this task that uses only a small set of labeled seed examples. It is both data and computationally efficient. HierSeed assigns documents to topics by weighing document density against topic hierarchical structure. It outperforms both unsupervised and supervised baselines for the SHC task on three real-world datasets.

preprint2022arXiv

Semi-supervised New Event Type Induction and Description via Contrastive Loss-Enforced Batch Attention

Most event extraction methods have traditionally relied on an annotated set of event types. However, creating event ontologies and annotating supervised training data are expensive and time-consuming. Previous work has proposed semi-supervised approaches which leverage seen (annotated) types to learn how to automatically discover new event types. State-of-the-art methods, both semi-supervised or fully unsupervised, use a form of reconstruction loss on specific tokens in a context. In contrast, we present a novel approach to semi-supervised new event type induction using a masked contrastive loss, which learns similarities between event mentions by enforcing an attention mechanism over the data minibatch. We further disentangle the discovered clusters by approximating the underlying manifolds in the data, which allows us to increase normalized mutual information and Fowlkes-Mallows scores by over 20% absolute. Building on these clustering results, we extend our approach to two new tasks: predicting the type name of the discovered clusters and linking them to FrameNet frames.

preprint2022arXiv

The Future is not One-dimensional: Complex Event Schema Induction by Graph Modeling for Event Prediction

Event schemas encode knowledge of stereotypical structures of events and their connections. As events unfold, schemas are crucial to act as a scaffolding. Previous work on event schema induction focuses either on atomic events or linear temporal event sequences, ignoring the interplay between events via arguments and argument relations. We introduce a new concept of Temporal Complex Event Schema: a graph-based schema representation that encompasses events, arguments, temporal connections and argument relations. In addition, we propose a Temporal Event Graph Model that predicts event instances following the temporal complex event schema. To build and evaluate such schemas, we release a new schema learning corpus containing 6,399 documents accompanied with event graphs, and we have manually constructed gold-standard schemas. Intrinsic evaluations based on schema matching and instance graph perplexity, prove the superior quality of our probabilistic graph schema library compared to linear representations. Extrinsic evaluation on schema-guided future event prediction further demonstrates the predictive power of our event graph model, significantly outperforming human schemas and baselines by more than 23.8% on HITS@1.

preprint2021arXiv

Controllable and Diverse Text Generation in E-commerce

In E-commerce, a key challenge in text generation is to find a good trade-off between word diversity and accuracy (relevance) in order to make generated text appear more natural and human-like. In order to improve the relevance of generated results, conditional text generators were developed that use input keywords or attributes to produce the corresponding text. Prior work, however, do not finely control the diversity of automatically generated sentences. For example, it does not control the order of keywords to put more relevant ones first. Moreover, it does not explicitly control the balance between diversity and accuracy. To remedy these problems, we propose a fine-grained controllable generative model, called~\textit{Apex}, that uses an algorithm borrowed from automatic control (namely, a variant of the \textit{proportional, integral, and derivative (PID) controller}) to precisely manipulate the diversity/accuracy trade-off of generated text. The algorithm is injected into a Conditional Variational Autoencoder (CVAE), allowing \textit{Apex} to control both (i) the order of keywords in the generated sentences (conditioned on the input keywords and their order), and (ii) the trade-off between diversity and accuracy. Evaluation results on real-world datasets show that the proposed method outperforms existing generative models in terms of diversity and relevance. Apex is currently deployed to generate production descriptions and item recommendation reasons in Taobao owned by Alibaba, the largest E-commerce platform in China. The A/B production test results show that our method improves click-through rate (CTR) by 13.17\% compared to the existing method for production descriptions. For item recommendation reason, it is able to increase CTR by 6.89\% and 1.42\% compared to user reviews and top-K item recommendation without reviews, respectively.

preprint2021arXiv

The Paradox of Information Access: On Modeling Social-Media-Induced Polarization

The paper develops a stochastic model of drift in human beliefs that shows that today's sheer volume of accessible information, combined with consumers' confirmation bias and natural preference to more outlying content, necessarily lead to increased polarization. The model explains the paradox of growing ideological fragmentation in the age of increased sharing. As social media, search engines, and other real-time information sharing outlets purport to facilitate access to information, a need for content filtering arises due to the ensuing information overload. In general, consumers select information that matches their individual views and values. The bias inherent in such selection is echoed by today's information curation services that maximize user engagement by filtering new content in accordance with observed consumer preferences. Consequently, individuals get exposed to increasingly narrower bands of the ideology spectrum, thus fragmenting society into increasingly ideologically isolated enclaves. We call this dynamic the paradox of information access. The model also suggests the disproportionate damage attainable with a small infusion of well-positioned misinformation. The paper describes the modeling methodology, and evaluates modeling results for different population sizes and parameter settings.

preprint2021arXiv

White Paper: Challenges and Considerations for the Creation of a Large Labelled Repository of Online Videos with Questionable Content

This white paper presents a summary of the discussions regarding critical considerations to develop an extensive repository of online videos annotated with labels indicating questionable content. The main discussion points include: 1) the type of appropriate labels that will result in a valuable repository for the larger AI community; 2) how to design the collection and annotation process, as well as the distribution of the corpus to maximize its potential impact; and, 3) what actions we can take to reduce risk of trauma to annotators.

preprint2020arXiv

Cross-media Structured Common Space for Multimedia Event Extraction

We introduce a new task, MultiMedia Event Extraction (M2E2), which aims to extract events and their arguments from multimedia documents. We develop the first benchmark and collect a dataset of 245 multimedia news articles with extensively annotated events and arguments. We propose a novel method, Weakly Aligned Structured Embedding (WASE), that encodes structured representations of semantic information from textual and visual data into a common embedding space. The structures are aligned across modalities by employing a weakly supervised training strategy, which enables exploiting available resources without explicit cross-media annotation. Compared to uni-modal state-of-the-art methods, our approach achieves 4.0% and 9.8% absolute F-score gains on text event argument role labeling and visual event extraction. Compared to state-of-the-art multimedia unstructured representations, we achieve 8.3% and 5.0% absolute F-score gains on multimedia event extraction and argument role labeling, respectively. By utilizing images, we extract 21.4% more event mentions than traditional text-only methods.

preprint2020arXiv

Learning to Learn Words from Visual Scenes

Language acquisition is the process of learning words from the surrounding scene. We introduce a meta-learning framework that learns how to learn word representations from unconstrained scenes. We leverage the natural compositional structure of language to create training episodes that cause a meta-learner to learn strong policies for language acquisition. Experiments on two datasets show that our approach is able to more rapidly acquire novel words as well as more robustly generalize to unseen compositions, significantly outperforming established baselines. A key advantage of our approach is that it is data efficient, allowing representations to be learned from scratch without language pre-training. Visualizations and analysis suggest visual information helps our approach learn a rich cross-modal representation from minimal examples. Project webpage is available at https://expert.cs.columbia.edu/

preprint2020arXiv

The Paradox of Information Access: Growing Isolation in the Age of Sharing

Modern online media, such as Twitter, Instagram, and YouTube, enable anyone to become an information producer and to offer online content for potentially global consumption. By increasing the amount of globally accessible real-time information, today's ubiquitous producers contribute to a world, where an individual consumes vanishingly smaller fractions of all produced content. In general, consumers preferentially select information that closely matches their individual views and values. The bias inherent in such selection is further magnified by today's information curation services that maximize user engagement (and thus service revenue) by filtering new content in accordance with observed consumer preferences. Consequently, individuals get exposed to increasingly narrower bands of the ideology spectrum. Societies get fragmented into increasingly ideologically isolated enclaves. These enclaves (or echo-chambers) then become vulnerable to misinformation spread, which in turn further magnifies polarization and bias. We call this dynamic the paradox of information access; a growing ideological fragmentation in the age of sharing. This article describes the technical, economic, and socio-cognitive contributors to this paradox, and explores research directions towards its mitigation.

preprint2020arXiv

Training with Streaming Annotation

In this paper, we address a practical scenario where training data is released in a sequence of small-scale batches and annotation in earlier phases has lower quality than the later counterparts. To tackle the situation, we utilize a pre-trained transformer network to preserve and integrate the most salient document information from the earlier batches while focusing on the annotation (presumably with higher quality) from the current batch. Using event extraction as a case study, we demonstrate in the experiments that our proposed framework can perform better than conventional approaches (the improvement ranges from 3.6 to 14.9% absolute F-score gain), especially when there is more noise in the early annotation; and our approach spares 19.1% time with regard to the best conventional method.

preprint2016arXiv

Aligning Coordinated Text Streams through Burst Information Network Construction and Decipherment

Aligning coordinated text streams from multiple sources and multiple languages has opened many new research venues on cross-lingual knowledge discovery. In this paper we aim to advance state-of-the-art by: (1). extending coarse-grained topic-level knowledge mining to fine-grained information units such as entities and events; (2). following a novel Data-to-Network-to-Knowledge (D2N2K) paradigm to construct and utilize network structures to capture and propagate reliable evidence. We introduce a novel Burst Information Network (BINet) representation that can display the most important information and illustrate the connections among bursty entities, events and keywords in the corpus. We propose an effective approach to construct and decipher BINets, incorporating novel criteria based on multi-dimensional clues from pronunciation, translation, burst, neighbor and graph topological structure. The experimental results on Chinese and English coordinated text streams show that our approach can accurately decipher the nodes with high confidence in the BINets and that the algorithm can be efficiently run in parallel, which makes it possible to apply it to huge amounts of streaming data for never-ending language and information decipherment.

preprint2016arXiv

Building a Fine-Grained Entity Typing System Overnight for a New X (X = Language, Domain, Genre)

Recent research has shown great progress on fine-grained entity typing. Most existing methods require pre-defining a set of types and training a multi-class classifier from a large labeled data set based on multi-level linguistic features. They are thus limited to certain domains, genres and languages. In this paper, we propose a novel unsupervised entity typing framework by combining symbolic and distributional semantics. We start from learning general embeddings for each entity mention, compose the embeddings of specific contexts using linguistic structures, link the mention to knowledge bases and learn its related knowledge representations. Then we develop a novel joint hierarchical clustering and linking algorithm to type all mentions using these representations. This framework doesn't rely on any annotated data, predefined typing schema, or hand-crafted features, therefore it can be quickly adapted to a new domain, genre and language. Furthermore, it has great flexibility at incorporating linguistic structures (e.g., Abstract Meaning Representation (AMR), dependency relations) to improve specific context representation. Experiments on genres (news and discussion forum) show comparable performance with state-of-the-art supervised typing systems trained from a large amount of labeled data. Results on various languages (English, Chinese, Japanese, Hausa, and Yoruba) and domains (general and biomedical) demonstrate the portability of our framework.

preprint2016arXiv

Label Noise Reduction in Entity Typing by Heterogeneous Partial-Label Embedding

Current systems of fine-grained entity typing use distant supervision in conjunction with existing knowledge bases to assign categories (type labels) to entity mentions. However, the type labels so obtained from knowledge bases are often noisy (i.e., incorrect for the entity mention's local context). We define a new task, Label Noise Reduction in Entity Typing (LNR), to be the automatic identification of correct type labels (type-paths) for training examples, given the set of candidate type labels obtained by distant supervision with a given type hierarchy. The unknown type labels for individual entity mentions and the semantic similarity between entity types pose unique challenges for solving the LNR task. We propose a general framework, called PLE, to jointly embed entity mentions, text features and entity types into the same low-dimensional space where, in that space, objects whose types are semantically close have similar representations. Then we estimate the type-path for each training example in a top-down manner using the learned embeddings. We formulate a global objective for learning the embeddings from text corpora and knowledge bases, which adopts a novel margin-based loss that is robust to noisy labels and faithfully models type correlation derived from knowledge bases. Our experiments on three public typing datasets demonstrate the effectiveness and robustness of PLE, with an average of 25% improvement in accuracy compared to next best method.

preprint2015arXiv

A Dependency-Based Neural Network for Relation Classification

Previous research on relation classification has verified the effectiveness of using dependency shortest paths or subtrees. In this paper, we further explore how to make full use of the combination of these dependency information. We first propose a new structure, termed augmented dependency path (ADP), which is composed of the shortest dependency path between two entities and the subtrees attached to the shortest path. To exploit the semantic representation behind the ADP structure, we develop dependency-based neural networks (DepNN): a recursive neural network designed to model the subtrees, and a convolutional neural network to capture the most important features on the shortest path. Experiments on the SemEval-2010 dataset show that our proposed method achieves state-of-art results.

preprint2015arXiv

A Novel Method of Encoded Multiplexing Readout for Micro-pattern Gas Detectors

The requirement of a large number of electronic channels poses a big challenge for Micro-pattern Gas Detector (MPGD) to achieve good spatial resolution. By using the redundancy that at least two neighboring strips record the signal of a particle, a novel method of encoded multiplexing readout for MPGDs is presented in this paper. The method offers a feasible and easily-extensible way of encoding and decoding, and can significantly reduce the number of readout channels. A verification test was carried out on a 5*5 cm2 Thick Gas Electron Multiplier (THGEM) detector using a 8 keV Cu X-ray source with 100um slit, where 166 strips are read out by 21 encoded readout channels. The test results show a good linearity in its position response, and the spatial resolution root-mean-square (RMS) of the test system is about 260 μm. This method has an attractive potential to build large area detectors and can be easily adapted to other detectors like MPGDs.

preprint2015arXiv

Leveraging Deep Neural Networks and Knowledge Graphs for Entity Disambiguation

Entity Disambiguation aims to link mentions of ambiguous entities to a knowledge base (e.g., Wikipedia). Modeling topical coherence is crucial for this task based on the assumption that information from the same semantic context tends to belong to the same topic. This paper presents a novel deep semantic relatedness model (DSRM) based on deep neural networks (DNN) and semantic knowledge graphs (KGs) to measure entity semantic relatedness for topical coherence modeling. The DSRM is directly trained on large-scale KGs and it maps heterogeneous types of knowledge of an entity from KGs to numerical feature vectors in a latent space such that the distance between two semantically-related entities is minimized. Compared with the state-of-the-art relatedness approach proposed by (Milne and Witten, 2008a), the DSRM obtains 19.4% and 24.5% reductions in entity disambiguation errors on two publicly available datasets respectively.

preprint2015arXiv

Very large magnetoresistance in Fe$_{0.28}$TaS$_{2}$ single crystals

Magnetic moments intercalated into layered transition metal dichalcogenides are an excellent system for investigating the rich physics associated with magnetic ordering in a strongly anisotropic, strong spin-orbit coupling environment. We examine electronic transport and magnetization in Fe$_{0.28}$TaS$_{2}$, a highly anisotropic ferromagnet with a Curie temperature $T_{\mathrm{C}} \sim 68.8~$K. We find anomalous Hall data confirming a dominance of spin-orbit coupling in the magnetotransport properties of this material, and a remarkably large field-perpendicular-to-plane MR exceeding 60% at 2 K, much larger than the typical MR for bulk metals, and comparable to state-of-the-art GMR in thin film heterostructures, and smaller only than CMR in Mn perovskites or high mobility semiconductors. Even within the Fe$_x$TaS$_2$ series, for the current $x$ = 0.28 single crystals the MR is nearly $100\times$ higher than that found previously in the commensurate compound Fe$_{0.25}$TaS$_{2}$. After considering alternatives, we argue that the large MR arises from spin disorder scattering in the strong spin-orbit coupling environment, and suggest that this can be a design principle for materials with large MR.

preprint2014arXiv

Hydrogen Diffusion and Stabilization in Single-crystal VO2 Micro/nanobeams by Direct Atomic Hydrogenation

We report measurements of the diffusion of atomic hydrogen in single crystalline VO2 micro/nanobeams by direct exposure to atomic hydrogen, without catalyst. The atomic hydrogen is generated by a hot filament, and the doping process takes place at moderate temperature (373 K). Undoped VO2 has a metal-to-insulator phase transition at ~340 K between a high-temperature, rutile, metallic phase and a low-temperature, monoclinic, insulating phase with a resistance exhibiting a semiconductor-like temperature dependence. Atomic hydrogenation results in stabilization of the metallic phase of VO2 micro/nanobeams down to 2 K, the lowest point we could reach in our measurement setup. Based on observing the movement of the hydrogen diffusion front in single crystalline VO2 beams, we estimate the diffusion constant for hydrogen along the c-axis of the rutile phase to be 6.7 x 10^{-10} cm^2/s at approximately 373 K, exceeding the value in isostructural TiO2 by ~ 38x. Moreover, we find that the diffusion constant along the c-axis of the rutile phase exceeds that along the equivalent a-axis of the monoclinic phase by at least three orders of magnitude. This remarkable change in kinetics must originate from the distortion of the "channels" when the unit cell doubles along this direction upon cooling into the monoclinic structure. Ab initio calculation results are in good agreement with the experimental trends in the relative kinetics of the two phases. This raises the possibility of a switchable membrane for hydrogen transport.

preprint2014arXiv

In situ diffraction study of catalytic hydrogenation of VO2: Stable phases and origins of metallicity

Controlling electronic population through chemical doping is one way to tip the balance between competing phases in materials with strong electronic correlations. Vanadium dioxide exhibits a first-order phase transition at around 338 K between a high temperature, tetragonal, metallic state (T) and a low temperature, monoclinic, insulating state (M1), driven by electron-electron and electron-lattice interactions. Intercalation of VO2 with atomic hydrogen has been demonstrated, with evidence that this doping suppresses the transition. However, the detailed effects of intercalated H on the crystal and electronic structure of the resulting hydride have not been previously reported. Here we present synchrotron and neutron diffraction studies of this material system, mapping out the structural phase diagram as a function of temperature and hydrogen content. In addition to the original T and M1 phases, we find two orthorhombic phases, O1 and O2, which are stabilized at higher hydrogen content. We present density functional calculations that confirm the metallicity of these states and discuss the physical basis by which hydrogen stabilizes conducting phases, in the context of the metal-insulator transition.

preprint2014arXiv

Nanostructure Investigations of Nonlinear Differential Conductance in NdNiO$_3$ Thin Films

Transport measurements on thin films of NdNiO$_3$ reveal a crossover to a regime of pronounced nonlinear conduction below the well-known metal-insulator transition temperature. The evolution of the transport properties at temperatures well below this transition appears consistent with a gradual formation of a gap in the hole-like Fermi surface of this strongly correlated system. As $T$ is decreased below the nominal transition temperature, transport becomes increasily non-Ohmic, with a model of Landau-Zener breakdown becoming most suited for describing $I(V)$ characteristics as the temperature approaches 2~K.

preprint2014arXiv

Thermally Driven Analog of the Barkhausen Effect at the Metal-Insulator Transition in Vanadium Dioxide

The physics of the metal-insulator transition (MIT) in vanadium dioxide remains a subject of intense interest. Because of the complicating effects of elastic strain on the phase transition, there is interest in comparatively strain-free means of examining VO2 material properties. We report contact-free, low-strain studies of the MIT through an inductive bridge approach sensitive to the magnetic response of VO2 powder. Rather than observing the expected step-like change in susceptibility at the transition, we argue that the measured response is dominated by an analog of the Barkhausen effect, due to the extremely sharp jump in the magnetic response of each grain as a function of time as the material is cycled across the phase boundary. This effect suggests that future measurements could access the dynamics of this and similar phase transitions.

preprint2013arXiv

Tweet, but Verify: Epistemic Study of Information Verification on Twitter

While Twitter provides an unprecedented opportunity to learn about breaking news and current events as they happen, it often produces skepticism among users as not all the information is accurate but also hoaxes are sometimes spread. While avoiding the diffusion of hoaxes is a major concern during fast-paced events such as natural disasters, the study of how users trust and verify information from tweets in these contexts has received little attention so far. We survey users on credibility perceptions regarding witness pictures posted on Twitter related to Hurricane Sandy. By examining credibility perceptions on features suggested for information verification in the field of Epistemology, we evaluate their accuracy in determining whether pictures were real or fake compared to professional evaluations performed by experts. Our study unveils insight about tweet presentation, as well as features that users should look at when assessing the veracity of tweets in the context of fast-paced events. Some of our main findings include that while author details not readily available on Twitter feeds should be emphasized in order to facilitate verification of tweets, showing multiple tweets corroborating a fact misleads users to trusting what actually is a hoax. We contrast some of the behavioral patterns found on tweets with literature in Psychology research.

Heng Ji

What is connected

Connect this record

See the researcher in context

Building this map preview

40 published item(s)

A Survey of Self-Evolving Agents: What, When, How, and Where to Evolve on the Path to Artificial Super Intelligence

Agentic Reasoning for Large Language Models

Assessing the Creativity of Large Language Models: Testing, Limits, and New Frontiers

Beyond Perfect APIs: A Comprehensive Evaluation of LLM Agents Under Real-World API Complexity

Current Agents Fail to Leverage World Model as Tool for Foresight

If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents

Open Vocabulary Electroencephalography-To-Text Decoding and Zero-shot Sentiment Classification

A Survey of Knowledge-Enhanced Text Generation

A Weibo Dataset for the 2022 Russo-Ukrainian Crisis

CLIP-Event: Connecting Text and Images with Event Structures

Corpus-based Open-Domain Event Type Induction

EA$^2$E: Improving Consistency with Event Awareness for Document-Level Argument Extraction

Enhanced Knowledge Selection for Grounded Dialogues via Document Semantic Graphs

Entity-Conditioned Question Generation for Robust Attention Distribution in Neural Information Retrieval

Improving Candidate Retrieval with Entity Profile Generation for Wikidata Entity Linking

MuMuQA: Multimedia Multi-Hop News Question Answering via Cross-Media Knowledge Extraction and Grounding

Rethinking Task Sampling for Few-shot Vision-Language Transfer Learning

Schema-Guided Event Graph Completion

Seeded Hierarchical Clustering for Expert-Crafted Taxonomies

Semi-supervised New Event Type Induction and Description via Contrastive Loss-Enforced Batch Attention

The Future is not One-dimensional: Complex Event Schema Induction by Graph Modeling for Event Prediction

Controllable and Diverse Text Generation in E-commerce

The Paradox of Information Access: On Modeling Social-Media-Induced Polarization

White Paper: Challenges and Considerations for the Creation of a Large Labelled Repository of Online Videos with Questionable Content

Cross-media Structured Common Space for Multimedia Event Extraction

Learning to Learn Words from Visual Scenes

The Paradox of Information Access: Growing Isolation in the Age of Sharing

Training with Streaming Annotation

Aligning Coordinated Text Streams through Burst Information Network Construction and Decipherment

Building a Fine-Grained Entity Typing System Overnight for a New X (X = Language, Domain, Genre)

Label Noise Reduction in Entity Typing by Heterogeneous Partial-Label Embedding

A Dependency-Based Neural Network for Relation Classification

A Novel Method of Encoded Multiplexing Readout for Micro-pattern Gas Detectors

Leveraging Deep Neural Networks and Knowledge Graphs for Entity Disambiguation

Very large magnetoresistance in Fe$_{0.28}$TaS$_{2}$ single crystals

Hydrogen Diffusion and Stabilization in Single-crystal VO2 Micro/nanobeams by Direct Atomic Hydrogenation

In situ diffraction study of catalytic hydrogenation of VO2: Stable phases and origins of metallicity

Nanostructure Investigations of Nonlinear Differential Conductance in NdNiO$_3$ Thin Films

Thermally Driven Analog of the Barkhausen Effect at the Metal-Insulator Transition in Vanadium Dioxide

Tweet, but Verify: Epistemic Study of Information Verification on Twitter