Researcher profile

Gang Fang

Gang Fang contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Trust 19 (Unverified) · Verification L1 · Unclaimed author
5 works · 0 followers · 9 topics · 4 close collaborators

Published work

5 published items

Preprint · 2026 · arXiv

Community-aware evaluation and threshold calibration for open-set plankton image recognition

Automated plankton image recognition is increasingly used in aquatic ecosystem monitoring, but deployed classifiers inevitably encounter unseen taxa and non-target particles. Open-set recognition methods are usually evaluated with sample-level metrics such as AUROC, AUPR, and FPR@95% unknown-recall operating points, whereas ecological monitoring depends on community-level estimates of taxon abundance and diversity. This study examines the mismatch between these objectives using controlled pseudo-communities and three datasets spanning marine zooplankton imaged by ZooScan, marine phytoplankton imaged by IFCB, and freshwater plankton imaged by an in-situ camera. We define Open-Set Community Distortion (OSCD), a Bray-Curtis-style error over known taxa plus an unknown bin, with directional components distinguishing known-taxon overestimation from underestimation. Closed-set classifiers achieved high known-class accuracy, but unknown samples were often absorbed with high confidence and in structured ways. Sample-level OOD metrics were not sufficient to select ecological operating points: for MSP, FPR@95% unknown-recall thresholds produced large test-community OSCD on all three datasets mainly because true known taxa were over-rejected into the unknown bin. Community-aware threshold calibration reduced MSP OSCD relative to fixed 95% known recall on SYKE-ZooScan 2024 and SYKE-IFCB 2022; on ZooLake the fixed-recall baseline was already close to the community-aware threshold, and the best community-level method was a prototype-distance variant rather than MSP. The benefit of community-aware calibration therefore depends on validation-community representativeness and the gap between fixed recall and the community optimum. These results show that open-set plankton recognition should be evaluated as an ecological measurement problem, not only as a sample-level detection task.
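The abstract defines OSCD only as a Bray-Curtis-style error over known taxa plus an unknown bin, with directional components, and contrasts a fixed 95% known-recall MSP threshold with community-aware threshold calibration. The Python sketch below illustrates those ideas under stated assumptions: the exact distortion formula, its split into overestimation, underestimation and unknown-bin parts, the `UNKNOWN` label, the grid search over candidate thresholds, and all function names are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence

UNKNOWN = -1  # hypothetical label for samples rejected into the unknown bin


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax over one sample's logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def msp_predict(logits_batch, threshold: float) -> list[int]:
    """Argmax class, or UNKNOWN when the maximum softmax probability (MSP) is below threshold."""
    preds = []
    for logits in logits_batch:
        probs = softmax(logits)
        k = max(range(len(probs)), key=probs.__getitem__)
        preds.append(k if probs[k] >= threshold else UNKNOWN)
    return preds


def threshold_for_known_recall(known_msp: Sequence[float], recall: float = 0.95) -> float:
    """Highest threshold that still accepts at least `recall` of known validation samples."""
    scores = sorted(known_msp)
    n_reject = math.floor((1.0 - recall) * len(scores))  # lowest-scoring samples we may reject
    return scores[n_reject]


def composition(labels: Sequence[int], n_known: int) -> list[float]:
    """Relative abundance over known taxa 0..n_known-1, plus a final unknown bin."""
    counts = [0] * (n_known + 1)
    for y in labels:
        counts[n_known if y == UNKNOWN else y] += 1
    total = sum(counts)
    return [c / total for c in counts]


def community_distortion(true_labels, pred_labels, n_known: int) -> dict:
    """Bray-Curtis dissimilarity between true and predicted communities, split (by assumption)
    into known-taxon overestimation, known-taxon underestimation and unknown-bin error."""
    p = composition(true_labels, n_known)
    q = composition(pred_labels, n_known)
    denom = sum(p) + sum(q)  # 2 (up to rounding) for relative abundances
    over = sum(max(q[i] - p[i], 0.0) for i in range(n_known)) / denom
    under = sum(max(p[i] - q[i], 0.0) for i in range(n_known)) / denom
    unknown = abs(q[n_known] - p[n_known]) / denom
    return {"total": over + under + unknown, "over": over, "under": under, "unknown": unknown}


def community_aware_threshold(logits_batch, true_labels, n_known: int, candidates) -> float:
    """Candidate MSP threshold that minimizes total distortion on a validation community."""
    return min(
        candidates,
        key=lambda t: community_distortion(true_labels, msp_predict(logits_batch, t), n_known)["total"],
    )


if __name__ == "__main__":
    # Toy validation community: two known taxa and one unknown particle.
    logits = [[3.0, 0.0], [0.0, 2.0], [0.2, 0.0]]
    truth = [0, 1, UNKNOWN]
    best = community_aware_threshold(logits, truth, n_known=2, candidates=[0.5, 0.6, 0.9])
    print(best, community_distortion(truth, msp_predict(logits, best), 2))
```

In the toy community, a threshold of 0.6 rejects only the low-confidence unknown sample, so predicted and true compositions coincide; 0.5 absorbs the unknown into a known taxon, and 0.9 over-rejects a true known sample into the unknown bin, which mirrors the over-rejection failure the abstract describes.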

Preprint · 2026 · arXiv

Cutscene Agent: An LLM Agent Framework for Automated 3D Cutscene Generation

Cutscenes are carefully choreographed cinematic sequences embedded in video games and interactive media, serving as the primary vehicle for narrative delivery, character development, and emotional engagement. Producing cutscenes is inherently complex: it demands seamless coordination across screenwriting, cinematography, character animation, voice acting, and technical direction, often requiring days to weeks of collaborative effort from multidisciplinary teams to produce minutes of polished content. In this work, we present Cutscene Agent, an LLM agent framework for automated end-to-end cutscene generation. The framework makes three contributions: (1) a Cutscene Toolkit built on the Model Context Protocol (MCP) that establishes bidirectional integration between LLM agents and the game engine, in which agents not only invoke engine operations but continuously observe real-time scene state, enabling closed-loop generation of editable engine-native cinematic assets; (2) a multi-agent system where a director agent orchestrates specialist subagents for animation, cinematography, and sound design, augmented by a visual reasoning feedback loop for perception-driven refinement; and (3) CutsceneBench, a hierarchical evaluation benchmark for cutscene generation. Unlike typical tool-use benchmarks that evaluate short, isolated function calls, cutscene generation requires long-horizon, multi-step orchestration of dozens of interdependent tool invocations with strict ordering constraints, a capability dimension that existing benchmarks do not cover. We evaluate a range of LLMs on CutsceneBench and analyze their performance across this challenging task.

Preprint · 2026 · arXiv

Evidence-Guided Unknown Rejection for High-Confidence Near-Known Unknowns

Open-set recognition systems face a neglected failure mode: high-confidence near-known unknowns, which lie outside the known label set but are close enough to known classes that a closed-set classifier accepts them with high confidence. We show that this failure is widespread across scalar-threshold methods, including recent post-hoc detectors, and that stronger encoders can amplify rather than remove the risk. We propose EGUR-A, which changes the decision from "is this sample's score high enough?" to "does this predicted known class have sufficient evidence to accept this sample?" EGUR-A combines class-conditional local acceptance evidence with global residual evidence, and selects their relative weight from known-sample statistics without unknown validation data. Across CUB, FGVC-Aircraft, and ImageNet-hard, EGUR-A substantially reduces high-confidence false known acceptance at matched known-rejection operating points. The result is not a stronger threshold; it is a different question: whether a known class is entitled to accept a sample.

Preprint · 2011 · arXiv

Characterizing Discriminative Patterns

Discriminative patterns are association patterns that occur with disproportionate frequency in some classes versus others, and have been studied under names such as emerging patterns and contrast sets. Such patterns have demonstrated considerable value for classification and subgroup discovery, but a detailed understanding of the types of interactions among items in a discriminative pattern is lacking. To address this issue, we propose to categorize discriminative patterns according to four types of item interaction: (i) driver-passenger, (ii) coherent, (iii) independent additive and (iv) synergistic beyond independent additive. Each of the last three is of practical importance, with the latter two representing a gain in the discriminative power of a pattern over its subsets. Synergistic patterns are the most restrictive, but perhaps the most interesting since they capture a cooperative effect. For domains such as genetic research, differentiating among these types of patterns is critical since each yields very different biological interpretations. For general domains, the characterization provides a novel view of the nature of the discriminative patterns in a dataset, which yields insights beyond those provided by current approaches that focus mostly on pattern-based classification and subgroup discovery. This paper presents a comprehensive discussion that defines these four pattern types and investigates their properties and their relationship to one another. In addition, these ideas are explored for a variety of datasets (ten UCI datasets, one gene expression dataset and two genetic-variation datasets). The results demonstrate the existence, characteristics and statistical significance of the different types of patterns. They also illustrate how pattern characterization can provide novel insights into discriminative pattern mining and the discriminative structure of different datasets.

Preprint · 2011 · arXiv

Integration of Differential Gene-combination Search and Gene Set Enrichment Analysis: A General Approach

Gene Set Enrichment Analysis (GSEA) and its variations aim to discover collections of genes that show moderate but coordinated differences in expression. However, such techniques may be ineffective if many individual genes in a phenotype-related gene set have weak discriminative power. A potential solution is to search for combinations of genes that are highly differentiating even when individual genes are not. Although such techniques have been developed, these approaches have not been used with GSEA to any significant degree because of the large number of potential gene combinations and the heterogeneity of measures that assess the differentiation provided by gene groups of different sizes. To integrate the search for differentiating gene combinations and GSEA, we propose a general framework with two key components: (A) a procedure that reduces the number of scores to be handled by GSEA to the number of genes by summarizing the scores of the gene combinations involving a particular gene in a single score, and (B) a procedure to integrate the heterogeneous scores from combinations of different sizes and from different gene combination measures by mapping the scores to p-values. Experiments on four gene expression data sets demonstrate that the integration of GSEA and gene combination search can enhance the power of traditional GSEA by discovering gene sets that include genes with weak individual differentiation but strong joint discriminative power. Also, gene sets discovered by the integrative framework share several common biological processes and improve the consistency of the results among three lung cancer data sets.
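The abstract names two components, (A) collapsing the scores of all gene combinations that involve a gene into one per-gene score and (B) mapping heterogeneous scores from different measures and combination sizes to p-values, without giving formulas. The Python sketch below is a minimal illustration under assumptions of our own: empirical p-values against a separate null distribution for each measure and size (for example, from permuted phenotype labels), and a minimum-p-value summary per gene. It applies the p-value mapping first so that scores from different groups become comparable; these choices and every name below are hypothetical, not the paper's procedure.

```python
import math
from collections import defaultdict


def empirical_p_value(score: float, null_scores) -> float:
    """Share of null scores at least as large as the observed score, with a +1 correction."""
    exceed = sum(1 for s in null_scores if s >= score)
    return (exceed + 1) / (len(null_scores) + 1)


def gene_level_scores(combo_scores, null_by_group) -> dict:
    """Collapse gene-combination scores into one score per gene.

    combo_scores: iterable of (genes, group, score), where `group` names the
        combination measure and size, e.g. ("measure_a", 2).
    null_by_group: group -> null scores for that measure and size.
    Each score is mapped to a p-value within its own group; each gene then keeps
    -log10 of the smallest p-value among the combinations that contain it.
    """
    best_p = defaultdict(lambda: 1.0)
    for genes, group, score in combo_scores:
        p = empirical_p_value(score, null_by_group[group])
        for gene in genes:
            best_p[gene] = min(best_p[gene], p)
    return {gene: -math.log10(p) for gene, p in best_p.items()}


if __name__ == "__main__":
    nulls = {("measure_a", 2): [0.1, 0.2, 0.3, 0.4], ("measure_b", 3): [1.0, 2.0, 3.0, 4.0]}
    combos = [
        (("G1", "G2"), ("measure_a", 2), 0.35),
        (("G2", "G3", "G4"), ("measure_b", 3), 5.0),
    ]
    scores = gene_level_scores(combos, nulls)
    # A ranked gene list like this is the kind of input a preranked GSEA run consumes.
    ranked = sorted(scores, key=scores.get, reverse=True)
    print(ranked, scores)
```

Because the per-gene score has one value per gene, the number of scores handed to GSEA equals the number of genes, which is the reduction component (A) is described as providing; a gene with weak individual differentiation can still rank highly if it belongs to a strongly differentiating combination.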