Source author record

Hsinchun Chen

Hsinchun Chen appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning cs.CY Molecular Networks Applications Computation and Language Computer Vision Information Retrieval Quantitative Methods Social and Information Networks

Catalog footprint

What is connected

6works

9topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Counteracting Dark Web Text-Based CAPTCHA with Generative Adversarial Learning for Proactive Cyber Threat Intelligence

Automated monitoring of dark web (DW) platforms on a large scale is the first step toward developing proactive Cyber Threat Intelligence (CTI). While there are efficient methods for collecting data from the surface web, large-scale dark web data collection is often hindered by anti-crawling measures. In particular, text-based CAPTCHA serves as the most prevalent and prohibiting type of these measures in the dark web. Text-based CAPTCHA identifies and blocks automated crawlers by forcing the user to enter a combination of hard-to-recognize alphanumeric characters. In the dark web, CAPTCHA images are meticulously designed with additional background noise and variable character length to prevent automated CAPTCHA breaking. Existing automated CAPTCHA breaking methods have difficulties in overcoming these dark web challenges. As such, solving dark web text-based CAPTCHA has been relying heavily on human involvement, which is labor-intensive and time-consuming. In this study, we propose a novel framework for automated breaking of dark web CAPTCHA to facilitate dark web data collection. This framework encompasses a novel generative method to recognize dark web text-based CAPTCHA with noisy background and variable character length. To eliminate the need for human involvement, the proposed framework utilizes Generative Adversarial Network (GAN) to counteract dark web background noise and leverages an enhanced character segmentation algorithm to handle CAPTCHA images with variable character length. Our proposed framework, DW-GAN, was systematically evaluated on multiple dark web CAPTCHA testbeds. DW-GAN significantly outperformed the state-of-the-art benchmark methods on all datasets, achieving over 94.4% success rate on a carefully collected real-world dark web dataset...

preprint2022arXiv

Gene Function Prediction with Gene Interaction Networks: A Context Graph Kernel Approach

Predicting gene functions is a challenge for biologists in the post genomic era. Interactions among genes and their products compose networks that can be used to infer gene functions. Most previous studies adopt a linkage assumption, i.e., they assume that gene interactions indicate functional similarities between connected genes. In this study, we propose to use a gene's context graph, i.e., the gene interaction network associated with the focal gene, to infer its functions. In a kernel-based machine-learning framework, we design a context graph kernel to capture the information in context graphs. Our experimental study on a testbed of p53-related genes demonstrates the advantage of using indirect gene interactions and shows the empirical superiority of the proposed approach over linkage-assumption-based methods, such as the algorithm to minimize inconsistent connected genes and diffusion kernels.

preprint2022arXiv

Global Mapping of Gene/Protein Interactions in PubMed Abstracts: A Framework and an Experiment with P53 Interactions

Gene/protein interactions provide critical information for a thorough understanding of cellular processes. Recently, considerable interest and effort has been focused on the construction and analysis of genome-wide gene networks. The large body of biomedical literature is an important source of gene/protein interaction information. Recent advances in text mining tools have made it possible to automatically extract such documented interactions from free-text literature. In this paper, we propose a comprehensive framework for constructing and analyzing large-scale gene functional networks based on the gene/protein interactions extracted from biomedical literature repositories using text mining tools. Our proposed framework consists of analyses of the network topology, network topology-gene function relationship, and temporal network evolution to distill valuable information embedded in the gene functional interactions in literature. We demonstrate the application of the proposed framework using a testbed of P53-related PubMed abstracts, which shows that literature-based P53 networks exhibit small-world and scale-free properties. We also found that high degree genes in the literature-based networks have a high probability of appearing in the manually curated database and genes in the same pathway tend to form local clusters in our literature-based networks. Temporal analysis showed that genes interacting with many other genes tend to be involved in a large number of newly discovered interactions.

preprint2013arXiv

A Statistical Learning Based System for Fake Website Detection

Existing fake website detection systems are unable to effectively detect fake websites. In this study, we advocate the development of fake website detection systems that employ classification methods grounded in statistical learning theory (SLT). Experimental results reveal that a prototype system developed using SLT-based methods outperforms seven existing fake website detection systems on a test bed encompassing 900 real and fake websites.

preprint2013arXiv

Detecting Fake Escrow Websites using Rich Fraud Cues and Kernel Based Methods

The ability to automatically detect fraudulent escrow websites is important in order to alleviate online auction fraud. Despite research on related topics, fake escrow website categorization has received little attention. In this study we evaluated the effectiveness of various features and techniques for detecting fake escrow websites. Our analysis included a rich set of features extracted from web page text, image, and link information. We also proposed a composite kernel tailored to represent the properties of fake websites, including content duplication and structural attributes. Experiments were conducted to assess the proposed features, techniques, and kernels on a test bed encompassing nearly 90,000 web pages derived from 410 legitimate and fake escrow sites. The combination of an extended feature set and the composite kernel attained over 98% accuracy when differentiating fake sites from real ones, using the support vector machines algorithm. The results suggest that automated web-based information systems for detecting fake escrow sites could be feasible and may be utilized as authentication mechanisms.

preprint2013arXiv

Evaluating the Usefulness of Sentiment Information for Focused Crawlers

Despite the prevalence of sentiment-related content on the Web, there has been limited work on focused crawlers capable of effectively collecting such content. In this study, we evaluated the efficacy of using sentiment-related information for enhanced focused crawling of opinion-rich web content regarding a particular topic. We also assessed the impact of using sentiment-labeled web graphs to further improve collection accuracy. Experimental results on a large test bed encompassing over half a million web pages revealed that focused crawlers utilizing sentiment information as well as sentiment-labeled web graphs are capable of gathering more holistic collections of opinion-related content regarding a particular topic. The results have important implications for business and marketing intelligence gathering efforts in the Web 2.0 era.

Hsinchun Chen

What is connected

Connect this record

See the researcher in context

Building this map preview

6 published item(s)

Counteracting Dark Web Text-Based CAPTCHA with Generative Adversarial Learning for Proactive Cyber Threat Intelligence

Gene Function Prediction with Gene Interaction Networks: A Context Graph Kernel Approach

Global Mapping of Gene/Protein Interactions in PubMed Abstracts: A Framework and an Experiment with P53 Interactions

A Statistical Learning Based System for Fake Website Detection

Detecting Fake Escrow Websites using Rich Fraud Cues and Kernel Based Methods

Evaluating the Usefulness of Sentiment Information for Focused Crawlers