Source author record

Giancarlo Crocetti

Giancarlo Crocetti appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Catalog footprint

What is connected

6 works
7 topics
4 close collaborators

Published work

6 published items

preprint · 2016 · arXiv

Identifying Structures in Social Conversations in NSCLC Patients through the Semi-Automatic extraction of Topical Taxonomies

The exploration of social conversations to address patients' needs is an important analytical task, and many scholarly publications are helping to fill the knowledge gap in this area. The main difficulty remains the inability to turn such contributions into pragmatic processes that the pharmaceutical industry can leverage to generate insight from social media data, which can be considered one of the most challenging sources of information available today because of its sheer volume and noise. This study builds on the work of Scott Spangler and Jeffrey Kreulen and applies it to identify structure in social media through the extraction of a topical taxonomy able to capture the latent knowledge in social conversations on health-related sites. The mechanism for automatically identifying and generating a taxonomy from social conversations is developed and pressure tested using public data from media sites focused on the needs of cancer patients and their families. Moreover, a novel method for generating category labels and determining an optimal number of categories is presented, which extends Spangler and Kreulen's research in a meaningful way. We assume the reader is familiar with taxonomies, what they are and how they are used.

preprint · 2015 · arXiv

A Multivariate Biomarker for Parkinson's Disease

In this study, we carried out a genomic analysis with the objective of selecting a (possibly small) set of genes that would help in the detection and classification of samples from patients affected by Parkinson's disease. We performed a complete data analysis and, during the exploratory phase, selected a list of differentially expressed genes. Despite their association with the diseased state, we could not use them as a biomarker tool. Therefore, our research was extended to include a multivariate analysis approach, resulting in the identification and selection of a group of 20 genes that showed a clear potential for detecting and correctly classifying Parkinson's disease samples even in the presence of other neurodegenerative disorders.

preprint · 2015 · arXiv

Textual Spatial Cosine Similarity

Many methods exist today for measuring document similarity, such as cosine similarity. More complex methods based on the semantic analysis of textual information are also available, but they are computationally expensive and rarely used for the real-time feeding of content, as in enterprise-wide search environments. To address these real-time constraints, we developed a new measure of document similarity called Textual Spatial Cosine Similarity, which is able to detect similarity at the semantic level using the word placement information contained in the document. We will see in this paper that two degenerate cases exist for this model, which coincide with cosine similarity on one side and with a paraphrase detection model on the other.
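For orientation, the plain cosine similarity that the abstract names as one degenerate case can be sketched as follows. This is a minimal bag-of-words baseline only, not the Textual Spatial Cosine Similarity measure itself, whose formula is not given on this page. The function name and the whitespace tokenization are assumptions made for illustration.

```python
import math
from collections import Counter


def cosine_similarity(doc_a: str, doc_b: str) -> float:
    """Bag-of-words cosine similarity between two documents (baseline sketch)."""
    # Term-frequency vectors; lowercasing and whitespace splitting are assumptions.
    tf_a = Counter(doc_a.lower().split())
    tf_b = Counter(doc_b.lower().split())

    # Dot product over the terms the two documents share.
    dot = sum(tf_a[term] * tf_b[term] for term in tf_a.keys() & tf_b.keys())

    # Euclidean norms of the two term-frequency vectors.
    norm_a = math.sqrt(sum(count * count for count in tf_a.values()))
    norm_b = math.sqrt(sum(count * count for count in tf_b.values()))

    # An empty document has no direction, so treat its similarity as zero.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Because this baseline ignores word order, "dog bites man" and "man bites dog" receive the maximum score even though they mean different things. Word placement is the information this baseline discards.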

preprint · 2015 · arXiv

Topical Discovery of Web Content

This work describes the theory and implementation of a new software tool, the "Web Topical Discovery System" (WTDS), which provides an approach to the automatic discovery and selection of new web pages relevant to specific analytical needs. We will see how the research context can be specified with search keywords related to the area of interest, and we consider the important problem of removing extraneous data from a web page containing an article, in order to minimize false positives such as a keyword match that occurs only in the latest-news box of the same page. The removal of duplicates, the richness of the information contained in the article and its lexical diversity are all taken into consideration in order to provide the optimum set of recommendations to the end user or system.
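Two of the filtering signals mentioned in the abstract, duplicate removal and lexical diversity, can be illustrated with a short sketch. The abstract does not say how WTDS computes either one. The type-token ratio and the hash-based exact-duplicate check below are common stand-ins chosen for illustration, and both function names are hypothetical.

```python
import hashlib
import re


def lexical_diversity(text: str) -> float:
    """Type-token ratio: distinct words divided by total words (assumed measure)."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def drop_exact_duplicates(articles: list[str]) -> list[str]:
    """Keep the first copy of each article, comparing case- and whitespace-normalized text."""
    seen = set()
    unique = []
    for article in articles:
        # Normalize case and collapse whitespace so trivial variations still match.
        normalized = " ".join(article.lower().split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(article)
    return unique
```

A real pipeline would probably need near-duplicate detection as well, since syndicated articles rarely match character for character. This sketch only catches copies that are identical after normalization.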

preprint · 2015 · arXiv

Transforming Telemedicine Through Big Data Analytics

A look at how big data is transforming telemedicine to provide better care by tapping into a larger source of patient information. Telemedicine will have a profound impact on patient care, increase access and quality, and represent an opportunity to keep health care costs down. Data generated by smart devices will enable the real-time monitoring of chronic diseases, allowing optimal drug dosage and improving patient outcomes.

preprint · 2015 · arXiv

Using Ensemble Models in the Histological Examination of Tissue Abnormalities

Classification models for the automatic detection of abnormalities in histological samples do exist, with an active debate on the cost associated with false negative diagnoses (underdiagnosis) and false positive diagnoses (overdiagnosis). Current models tend to underdiagnose, failing to recognize a potentially fatal disease. The objective of this study is to investigate the possibility of automatically identifying abnormalities in tissue samples through the use of an ensemble model on data generated by histological examination, while minimizing the number of false negative cases.