Source author record

Min Song

Min Song appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher. Unclaimed source record

Digital Libraries; Artificial Intelligence; Computation and Language; Distributed, Parallel, and Cluster Computing; eess.SP; hep-th; physics.optics; physics.soc-ph

Catalog footprint

What is connected

9 works

8 topics

4 close collaborators

preprint, 2026, arXiv

Read, Grep, and Synthesize: Diagnosing Cross-Domain Seed Exposure for LLM Research Ideation

The discovery of novel methodologies for emerging problems is a continuing cycle in ML, often driven by the migration of techniques across domains. Building on this observation, we ask whether current LLM ideation systems benefit from targeted cross-domain retrieval or simply from exposure to diverse mechanisms. We study this question through PaperGym, a three-stage pipeline: (1) tool-augmented seed extraction via read, grep, and bash over an isolated paper environment, (2) cross-domain seed retrieval via paraphrasing across seven ML domains, and (3) method synthesis from retrieved seeds, each scored by rubric-based judges. Tool-augmented extraction improves specificity, and paraphrase-based retrieval broadens domain coverage. In synthesis, cross-domain retrieval receives more pairwise novelty wins than no-retrieval and same-domain baselines, but shows no significant difference from a random diverse-seed control. These findings suggest LLM ideation systems benefit from diverse seed exposure, but do not yet reliably exploit the semantic reason particular seeds were retrieved. We release the seed library, rubric prompts, and run scripts at https://github.com/yunjoochoi/PaperGym

preprint, 2023, arXiv

Topic Segmentation Model Focusing on Local Context

Topic segmentation is important in understanding scientific documents since it not only improves readability but also facilitates downstream tasks such as information retrieval and question answering by creating appropriate sections or paragraphs. In the topic segmentation task, topic coherence is critical in predicting segmentation boundaries. Most existing models have tried to exploit as much context as possible to extract useful topic-related information. However, additional context does not always improve results, because the local context between sentences becomes incoherent as more sentences are added. To alleviate this issue, we propose siamese sentence embedding layers, which process two input sentences independently to obtain an appropriate amount of information without being hampered by excessive information. We also adopt multi-task learning techniques including Same Topic Prediction (STP), Topic Classification (TC) and Next Sentence Prediction (NSP). When these three classification layers are combined in a multi-task manner, they can make up for each other's limitations, improving performance in all three tasks. We experiment with different combinations of the three layers and report how each layer affects the other layers in the same combination as well as the overall segmentation performance. The proposed model achieves state-of-the-art results on the WikiSection dataset.
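To make the link between pairwise predictions and segmentation boundaries concrete, here is a minimal sketch. It assumes, as an illustration only, that a Same Topic Prediction style classifier has already produced one probability per adjacent sentence pair, and that a boundary is placed wherever that probability falls below a threshold. The function name, the threshold value, and the decoding rule are hypothetical and are not taken from the paper.

```python
def segment_by_same_topic(sentences, same_topic_probs, threshold=0.5):
    """Group sentences into segments using adjacent-pair same-topic probabilities.

    same_topic_probs[i] is the (assumed) probability that sentences[i] and
    sentences[i + 1] belong to the same topic. A boundary is placed between
    them when that probability is below `threshold` (a hypothetical rule).
    """
    if len(same_topic_probs) != max(len(sentences) - 1, 0):
        raise ValueError("need exactly one probability per adjacent sentence pair")
    if not sentences:
        return []

    segments = [[sentences[0]]]
    for sentence, prob in zip(sentences[1:], same_topic_probs):
        if prob < threshold:
            segments.append([sentence])    # low coherence: start a new segment
        else:
            segments[-1].append(sentence)  # high coherence: extend the current one
    return segments


# Toy input: placeholder sentences and made-up probabilities.
example = segment_by_same_topic(["s1", "s2", "s3", "s4"], [0.9, 0.2, 0.8])
```

With the toy input above, the only low-probability pair is (s2, s3), so the sketch splits the text into two segments at that point.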

preprint, 2020, arXiv

Building a PubMed knowledge graph

PubMed is an essential resource for the medical domain, but useful concepts are either difficult to extract or ambiguous, which has significantly hindered knowledge discovery. To address this issue, we constructed a PubMed knowledge graph (PKG) by extracting bio-entities from 29 million PubMed abstracts, disambiguating author names, integrating funding data through the National Institutes of Health (NIH) ExPORTER, collecting affiliation history and educational background of authors from ORCID, and identifying fine-grained affiliation data from MapAffil. By integrating these credible multi-source data, we could create connections among the bio-entities, authors, articles, affiliations, and funding. Data validation revealed that the BioBERT deep learning method of bio-entity extraction significantly outperformed the state-of-the-art models based on the F1 score (by 0.51%), with the author name disambiguation (AND) achieving an F1 score of 98.09%. PKG can trigger broader innovations, not only enabling us to measure scholarly impact, knowledge usage, and knowledge transfer, but also assisting us in profiling authors and organizations based on their connections with bio-entities. The PKG is freely available on Figshare (https://figshare.com/s/6327a55355fc2c99f3a2, simplified version that excludes PubMed raw data) and the TACC website (http://er.tacc.utexas.edu/datasets/ped, full version).

preprint, 2020, arXiv

Holographic Complexity Growth Rate in a dual FLRW Universe

In this paper, taking the large $R$ limit and using the complexity-volume duality, we investigate the holographic complexity growth rate of a field state defined on the universe located at an asymptotic AdS boundary in Gauss-Bonnet gravity and massive gravity, respectively. For the Gauss-Bonnet gravity case, the growth behavior of the state mainly presents three kinds of contributions: the first, a finite term viewed as an interaction term, comes from a conserved charge; the second comes from the spatial volume of the universe; and the third relates to the curvature of the horizon in the AdS Gauss-Bonnet black hole, where the Gauss-Bonnet effect plays a vital role in the growth rate. For the massive gravity case, while the first divergent term still follows the growth rate of the spatial volume of the universe, the results reveal more interesting novel phenomena: besides the conserved charge $E$, the graviton mass term also contributes to the finite term; the third divergent term is determined by the spatial curvature of the horizon $k$ and the graviton mass effect; furthermore, the graviton mass effect can be completely responsible for the second divergent term, a new additional term saturating an area law.

preprint, 2020, arXiv

The Pace of Artificial Intelligence Innovations: Speed, Talent, and Trial-and-Error

Innovations in artificial intelligence (AI) are occurring at speeds faster than ever witnessed before. However, few studies have managed to measure or depict this increasing velocity of innovations in the field of AI. In this paper, we combine data on AI from arXiv and Semantic Scholar to explore the pace of AI innovations from three perspectives: AI publications, AI players, and AI updates (trial and error). A research framework and three novel indicators, Average Time Interval (ATI), Innovation Speed (IS) and Update Speed (US), are proposed to measure the pace of innovations in the field of AI. The results show that: (1) in 2019, more than 3 AI preprints were submitted to arXiv per hour, over 148 times faster than in 1994. Furthermore, there was one deep learning-related preprint submitted to arXiv every 0.87 hours in 2019, over 1,064 times faster than in 1994. (2) For AI players, 5.26 new researchers entered the field of AI each hour in 2019, more than 175 times faster than in the 1990s. (3) As for AI updates (trial and error), one updated AI preprint was submitted to arXiv every 41 days, with around 33% of AI preprints having been updated at least twice in 2019. In addition, as reported in 2019, it took, on average, only around 0.2 years for AI preprints to receive their first citations, which is 5 times faster than in 2000-2007. This swift pace in AI illustrates the increase in popularity of AI innovation. The systematic and fine-grained analysis of the AI field made it possible to portray the pace of AI innovation and demonstrated that the proposed approach can be adopted to understand other fast-growing fields such as cancer research and nano science.
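The interval-style figures in this abstract can be sanity-checked with simple arithmetic. The sketch below assumes, without confirmation from the abstract, that an average time interval is just the length of the observation period divided by the number of items, and that "times faster" means a ratio of intervals. The function names are hypothetical, and the paper's exact indicator definitions may differ.

```python
HOURS_PER_YEAR = 365 * 24  # 2019 was not a leap year


def average_interval_hours(item_count, period_hours=HOURS_PER_YEAR):
    """Average time between items, assuming interval = period / count."""
    if item_count <= 0:
        raise ValueError("item_count must be positive")
    return period_hours / item_count


def items_per_hour(interval_hours):
    """Convert an average interval (hours per item) into a rate (items per hour)."""
    if interval_hours <= 0:
        raise ValueError("interval_hours must be positive")
    return 1.0 / interval_hours


def speedup(old_interval_hours, new_interval_hours):
    """How many times shorter the new interval is than the old one."""
    return old_interval_hours / new_interval_hours


# One deep learning preprint every 0.87 hours (figure quoted in the abstract).
rate_2019 = items_per_hour(0.87)             # items per hour
implied_count_2019 = HOURS_PER_YEAR / 0.87   # items per year under this reading
implied_interval_1994 = 0.87 * 1064          # hours, if "1,064 times faster" is an interval ratio
```

Under these assumptions, an interval of 0.87 hours corresponds to roughly 1.15 preprints per hour, or about ten thousand per year, and the quoted speedup would imply an interval of roughly 926 hours (about 39 days) in 1994. These are back-of-the-envelope readings of the quoted numbers, not values reported by the authors.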

preprint, 2018, arXiv

DTER: Schedule Optimal RF Energy Request and Harvest for Internet of Things

We propose a new energy harvesting strategy that uses a dedicated energy source (ES) to optimally replenish energy for radio frequency (RF) energy harvesting powered Internet of Things. Specifically, we develop a two-step dual tunnel energy requesting (DTER) strategy that minimizes the energy consumption on both the energy harvesting device and the ES. Besides the causality and capacity constraints that are investigated in the existing approaches, DTER also takes into account the overhead issue and the nonlinear charge characteristics of an energy storage component to make the proposed strategy practical. Both offline and online scenarios are considered in the second step of DTER. To solve the nonlinear optimization problem of the offline scenario, we convert the offline optimal energy requesting problem into a classic shortest path problem, so that a globally optimal solution can be obtained through dynamic programming (DP) algorithms. An online suboptimal transmission strategy is developed as well. A simulation study verifies that the online strategy can achieve almost the same energy efficiency as the globally optimal solution in the long term.
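The reduction to a shortest path problem solved by dynamic programming can be illustrated generically. The sketch below is not the DTER formulation: it assumes a small directed acyclic graph whose nodes are already numbered in topological order (for example, time slots in sequence) and whose edge costs stand in for energy spent. The graph, costs, and function name are hypothetical.

```python
from math import inf


def shortest_path_dag(num_nodes, edges, source, target):
    """Shortest path in a DAG by dynamic programming.

    `edges` is a list of (u, v, cost) with u < v, so increasing node index
    is a valid topological order (an assumption of this sketch).
    Returns (total_cost, path); path is empty if target is unreachable.
    """
    outgoing = [[] for _ in range(num_nodes)]
    for u, v, cost in edges:
        if not u < v:
            raise ValueError("edges must go from a lower to a higher node index")
        outgoing[u].append((v, cost))

    dist = [inf] * num_nodes
    prev = [None] * num_nodes
    dist[source] = 0.0

    # Relax edges in topological order; each node is final when it is visited.
    for u in range(num_nodes):
        if dist[u] == inf:
            continue
        for v, cost in outgoing[u]:
            if dist[u] + cost < dist[v]:
                dist[v] = dist[u] + cost
                prev[v] = u

    if dist[target] == inf:
        return inf, []

    path = []
    node = target
    while node is not None:
        path.append(node)
        node = prev[node]
    path.reverse()
    return dist[target], path


# Toy graph with made-up costs.
toy_edges = [(0, 1, 2.0), (0, 2, 5.0), (1, 2, 1.0), (1, 3, 6.0), (2, 3, 1.0)]
toy_cost, toy_path = shortest_path_dag(4, toy_edges, source=0, target=3)
```

Because every node is processed once and every edge is relaxed once, the run time is linear in the size of the graph, which is what makes a globally optimal offline solution tractable once the problem is cast in this form.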

preprint, 2013, arXiv

Entitymetrics: Measuring the Impact of Entities

This paper proposes entitymetrics to measure the impact of knowledge units. Entitymetrics highlight the importance of entities embedded in scientific literature for further knowledge discovery. In this paper, we use Metformin, a drug for diabetes, as an example to form an entity-entity citation network based on literature related to Metformin. We then calculate the network features and compare the centrality ranks of biological entities with results from the Comparative Toxicogenomics Database (CTD). The comparison demonstrates the usefulness of entitymetrics in detecting most of the outstanding interactions manually curated in CTD.
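As a rough illustration of ranking entities by centrality in an entity-entity network, the sketch below computes normalized degree centrality on a toy graph. It assumes an undirected, unweighted simplification and uses placeholder entity names. The abstract does not say which centrality measures were used, so degree centrality is an assumption made here for brevity.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Normalized degree centrality for an undirected graph given as entity pairs.

    Each entity's score is its number of distinct neighbors divided by (n - 1),
    where n is the number of entities that appear in at least one edge.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-links in this simplified sketch
        neighbors[a].add(b)
        neighbors[b].add(a)

    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


# Placeholder entities and links, not data from the paper.
toy_links = [("entity_A", "entity_B"), ("entity_A", "entity_C"),
             ("entity_A", "entity_D"), ("entity_B", "entity_C")]
scores = degree_centrality(toy_links)
ranked = sorted(scores, key=lambda name: (-scores[name], name))
```

The resulting ranking is the kind of list that could then be compared against a curated resource, which is the validation step the abstract describes.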

preprint, 2013, arXiv

Real-time Data Collection Scheduling in Multi-hop Wireless Sensor Networks

We study real-time periodic query scheduling for data collection in multihop Wireless Sensor Networks (WSNs). Given a set of heterogeneous data collection queries in WSNs, each query requires the data from the source sensor nodes to be collected to the control center within a certain end-to-end delay. We first propose almost-tight necessary conditions for a set of different queries to be schedulable by a WSN. We then develop a family of efficient and effective data collection algorithms that can meet the real-time requirement under resource constraints by addressing three tightly coupled tasks: (1) routing tree construction for data collection, (2) link activity scheduling, and (3) packet-level scheduling. Our theoretical analysis of the schedulability of these algorithms shows that they can achieve a constant fraction of the maximum schedulable load. For the case of overloaded networks where not all queries can possibly be satisfied, we propose an efficient approximation algorithm to select queries to maximize the total weight of selected schedulable queries. The simulations corroborate our theoretical analysis.

preprint, 2013, arXiv

Tunable Fano resonances in heterogenous Al-Ag nanorod dimer

We theoretically investigate the plasmonic coupling in a heterogeneous Al-Ag nanorod dimer. A pronounced Fano dip is found in the extinction spectrum, produced by the destructive interference between the bright dipole mode from a short Al nanorod and the dark quadrupole mode from a nearby long Ag nanorod. This Fano resonance can be widely tuned in both wavelength and amplitude by varying the rod dimensions, the separation distance and the local dielectric environment. The Al-Ag heterogeneous nanorod dimer shows a high sensitivity to the surrounding environment, with a local surface plasmon resonance figure of merit of 7.0, making it promising for applications in plasmonic sensing and detection.
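For readers unfamiliar with the terms, the standard textbook expressions behind a Fano dip and a sensing figure of merit are sketched below. These are the commonly used general forms, given here as background under the assumption that the paper follows the usual conventions. The abstract itself does not state which definitions were used.

```latex
% Generic Fano line shape: q is the asymmetry parameter, \omega_0 the resonance
% frequency and \Gamma the resonance width (symbols introduced here for illustration).
\sigma(\epsilon) \propto \frac{(q + \epsilon)^2}{1 + \epsilon^2},
\qquad
\epsilon = \frac{2(\omega - \omega_0)}{\Gamma}

% Common refractive-index sensing figure of merit: sensitivity S divided by the
% resonance linewidth \Delta\lambda (full width at half maximum).
\mathrm{FOM} = \frac{S}{\Delta\lambda},
\qquad
S = \frac{\mathrm{d}\lambda_{\mathrm{res}}}{\mathrm{d}n}
```

In this reading, a narrow Fano feature (small $\Delta\lambda$) that shifts strongly with the surrounding refractive index $n$ gives a large figure of merit.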
