Researcher profile

Jianfeng Chen

Jianfeng Chen contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 19 - UnverifiedVerification L1Unclaimed author
5works
0followers
2topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

5 published item(s)

preprint2024arXiv

AudioLog: LLMs-Powered Long Audio Logging with Hybrid Token-Semantic Contrastive Learning

Previous studies in automated audio captioning have faced difficulties in accurately capturing the complete temporal details of acoustic scenes and events within long audio sequences. This paper presents AudioLog, a large language models (LLMs)-powered audio logging system with hybrid token-semantic contrastive learning. Specifically, we propose to fine-tune the pre-trained hierarchical token-semantic audio Transformer by incorporating contrastive learning between hybrid acoustic representations. We then leverage LLMs to generate audio logs that summarize textual descriptions of the acoustic environment. Finally, we evaluate the AudioLog system on two datasets with both scene and event annotations. Experiments show that the proposed system achieves exceptional performance in acoustic scene classification and sound event detection, surpassing existing methods in the field. Further analysis of the prompts to LLMs demonstrates that AudioLog can effectively summarize long audio sequences. To the best of our knowledge, this approach is the first attempt to leverage LLMs for summarizing long audio sequences.

preprint2021arXiv

Faster SAT Solving for Software with Repeated Structures (with Case Studies on Software Test Suite Minimization)

Theorem provers has been used extensively in software engineering for software testing or verification. However, software is now so large and complex that additional architecture is needed to guide theorem provers as they try to generate test suites. The SNAP test suite generator (introduced in this paper) combines the Z3 theorem prover with the following tactic: cluster some candidate tests, then search for valid tests by proposing small mutations to the cluster centroids. This technique effectively removes repeated structures in the tests since many repeated structures can be replaced with one centroid. In practice, SNAP is remarkably effective. For 27 real-world programs with up to half a million variables, SNAP found test suites which were 10 to 750 smaller times than those found by the prior state-of-the-art. Also, SNAP ran orders of magnitude faster and (unlike prior work) generated 100% valid tests.

preprint2021arXiv

Learning to Recognize Actionable Static Code Warnings (is Intrinsically Easy)

Static code warning tools often generate warnings that programmers ignore. Such tools can be made more useful via data mining algorithms that select the "actionable" warnings; i.e. the warnings that are usually not ignored. In this paper, we look for actionable warnings within a sample of 5,675 actionable warnings seen in 31,058 static code warnings from FindBugs. We find that data mining algorithms can find actionable warnings with remarkable ease. Specifically, a range of data mining methods (deep learners, random forests, decision tree learners, and support vector machines) all achieved very good results (recalls and AUC (TRN, TPR) measures usually over 95% and false alarms usually under 5%). Given that all these learners succeeded so easily, it is appropriate to ask if there is something about this task that is inherently easy. We report that while our data sets have up to 58 raw features, those features can be approximated by less than two underlying dimensions. For such intrinsically simple data, many different kinds of learners can generate useful models with similar performance. Based on the above, we conclude that learning to recognize actionable static code warnings is easy, using a wide range of learning algorithms, since the underlying data is intrinsically simple. If we had to pick one particular learner for this task, we would suggest linear SVMs (since, at least in our sample, that learner ran relatively quickly and achieved the best median performance) and we would not recommend deep learning (since this data is intrinsically very simple).

preprint2020arXiv

Building Very Small Test Suites (with Snap)

Software is now so large and complex that additional architecture is needed to guide theorem provers as they try to generate test suites. For example, the SNAP test suite generator (introduced in this paper) combines the Z3 theorem prover with the following tactic: sample around the average values seen in a few randomly selected valid tests. This tactic is remarkably effective. For 27 real-world programs with up to half a million variables, SNAP found test suites which were 10 to 750 smaller times than those found by the prior state-of-the-art. Also, SNAP ran orders of magnitude faster and (unlike prior work) generated 100% valid tests.

preprint2018arXiv

Data-Driven Search-based Software Engineering

This paper introduces Data-Driven Search-based Software Engineering (DSE), which combines insights from Mining Software Repositories (MSR) and Search-based Software Engineering (SBSE). While MSR formulates software engineering problems as data mining problems, SBSE reformulates SE problems as optimization problems and use meta-heuristic algorithms to solve them. Both MSR and SBSE share the common goal of providing insights to improve software engineering. The algorithms used in these two areas also have intrinsic relationships. We, therefore, argue that combining these two fields is useful for situations (a) which require learning from a large data source or (b) when optimizers need to know the lay of the land to find better solutions, faster. This paper aims to answer the following three questions: (1) What are the various topics addressed by DSE? (2) What types of data are used by the researchers in this area? (3) What research approaches do researchers use? The paper briefly sets out to act as a practical guide to develop new DSE techniques and also to serve as a teaching resource. This paper also presents a resource (tiny.cc/data-se) for exploring DSE. The resource contains 89 artifacts which are related to DSE, divided into 13 groups such as requirements engineering, software product lines, software processes. All the materials in this repository have been used in recent software engineering papers; i.e., for all this material, there exist baseline results against which researchers can comparatively assess their new ideas.