Source author record

Silvio Peroni

Silvio Peroni appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Digital Libraries, Artificial Intelligence, Computation and Language, Social and Information Networks

Catalog footprint

What is connected

11 works

4 topics

4 close collaborators


preprint, 2022, arXiv

A Knowledge Graph Embeddings based Approach for Author Name Disambiguation using Literals

Scholarly data is growing continuously and contains information about articles from a plethora of venues, including conferences and journals. Many initiatives have been taken to make scholarly data available as Knowledge Graphs (KGs). These efforts to standardize the data and make it accessible have also led to many challenges, such as the exploration of scholarly articles and ambiguous authors. This study specifically targets the problem of Author Name Disambiguation (AND) on scholarly KGs and presents a novel framework, Literally Author Name Disambiguation (LAND), which utilizes Knowledge Graph Embeddings (KGEs) using multimodal literal information generated from these KGs. The framework is based on three components: 1) multimodal KGEs, 2) a blocking procedure, and 3) Hierarchical Agglomerative Clustering. Extensive experiments have been conducted on two newly created KGs: (i) a KG containing information from the Scientometrics journal from 1978 onwards (OC-782K), and (ii) a KG extracted from a well-known benchmark for AND provided by AMiner (AMiner-534K). The results show that our proposed architecture outperforms our baselines by 8-14% in terms of the F1 score and shows competitive performance on a challenging benchmark such as AMiner. The code and the datasets are publicly available through GitHub (https://github.com/sntcristian/and-kge) and Zenodo (https://doi.org/10.5281/zenodo.6309855), respectively.
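The blocking and clustering steps named in the abstract above can be illustrated with a minimal sketch. This is not the LAND implementation: the blocking key (last name plus first initial), the cosine distance, the average-linkage rule, the 0.3 threshold, and the toy records with hand-written two-dimensional "embeddings" are all assumptions made for illustration. In the paper the vectors would come from the multimodal KGE component.

```python
from collections import defaultdict
from math import sqrt


def blocking_key(name):
    """Hypothetical blocking key: last name plus first initial, lowercased."""
    parts = name.lower().split()
    return f"{parts[-1]} {parts[0][0]}"


def cosine_distance(u, v):
    """Cosine distance between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def agglomerative_clusters(vectors, threshold):
    """Average-linkage agglomerative clustering over index lists.

    Merging stops when the closest pair of clusters is farther apart
    than `threshold` (an assumed, untuned value).
    """
    clusters = [[i] for i in range(len(vectors))]

    def linkage(a, b):
        pairs = [(i, j) for i in a for j in b]
        total = sum(cosine_distance(vectors[i], vectors[j]) for i, j in pairs)
        return total / len(pairs)

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        dist, x, y = min(
            ((linkage(a, b), x, y)
             for x, a in enumerate(clusters)
             for y, b in enumerate(clusters) if x < y),
            key=lambda t: t[0],
        )
        if dist > threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


def disambiguate(records, threshold=0.3):
    """records: list of (record_id, author_name, embedding) tuples."""
    # Blocking: only records sharing a key are ever compared.
    blocks = defaultdict(list)
    for rec in records:
        blocks[blocking_key(rec[1])].append(rec)

    result = []
    for block in blocks.values():
        vectors = [rec[2] for rec in block]
        for cluster in agglomerative_clusters(vectors, threshold):
            result.append(sorted(block[i][0] for i in cluster))
    return sorted(result)


# Toy data: invented names and vectors, not taken from OC-782K or AMiner-534K.
records = [
    ("p1", "Maria Rossi", [1.0, 0.0]),
    ("p2", "M. Rossi", [0.9, 0.1]),
    ("p3", "Marco Rossi", [0.0, 1.0]),
    ("p4", "Anna Bianchi", [1.0, 0.0]),
]
clusters = disambiguate(records)
print(clusters)
```

Blocking keeps the pairwise comparison cost manageable, since clustering runs only inside each block. In this toy run the three "Rossi M" records share a block, and the clustering step separates them into two candidate authors.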

preprint, 2022, arXiv

A map of Digital Humanities research across bibliographic data sources

Purpose. This study presents the results of an experiment we performed to measure the coverage of Digital Humanities (DH) publications in mainstream open and proprietary bibliographic data sources, by further highlighting the relations among DH and other disciplines. Methodology. We created a list of DH journals based on manual curation and bibliometric data. We used that list to identify DH publications in the bibliographic data sources under consideration. We used the ERIH-PLUS list of journals to identify Social Sciences and Humanities (SSH) publications. We analysed the citation links they included to understand the relationship between DH publications and SSH and non-SSH fields. Findings. Crossref emerges as the database containing the highest number of DH publications. Citations from and to DH publications show strong connections between DH and research in Computer Science, Linguistics, Psychology, and Pedagogical & Educational Research. Computer Science is responsible for a large part of incoming and outgoing citations to and from DH research, which suggests a reciprocal interest between the two disciplines. Value. This is the first bibliometric study of DH research involving several bibliographic data sources, including open and proprietary databases. Research limitations. The list of DH journals we created might be only partially representative of broader DH research. In addition, some DH publications could have been cut off from the study since we did not consider books and other publications published in proceedings of DH conferences and workshops. Finally, we used a specific time coverage (2000-2018) that could have prevented the inclusion of additional DH publications.

preprint, 2022, arXiv

Open bibliographic data and the Italian National Scientific Qualification: measuring coverage of academic fields

The importance of open bibliographic repositories is widely accepted by the scientific community. For evaluation processes, however, there is still some skepticism: even though large repositories of open access articles and free publication indexes exist and are continuously growing, assessment procedures still rely on proprietary databases, mainly due to the richness of the data available in these databases and the services provided by the companies that offer them. This paper investigates the status of open bibliographic data in three of the most used open resources, namely Microsoft Academic Graph, Crossref and OpenAIRE, evaluating their potential as substitutes for proprietary databases in academic evaluation processes. We focused on the Italian National Scientific Qualification (NSQ), the Italian process for University Professor qualification, which uses data from commercial indexes, and investigated similarities and differences between research areas, disciplines and application roles. The main conclusion is that open datasets are ready to be used for some disciplines, including mathematics, natural sciences, economics and statistics, even if there is still room for improvement; but there is still a large gap to fill in others, such as history, philosophy, pedagogy and psychology, and a stronger effort is required from researchers and institutions.

preprint, 2022, arXiv

OpenCitations, an open e-infrastructure to foster maximum reuse of citation data

OpenCitations is an independent not-for-profit infrastructure organization for open scholarship dedicated to the publication of open bibliographic and citation data by the use of Semantic Web (Linked Data) technologies. OpenCitations collaborates with projects that are part of the Open Science ecosystem and complies with the UNESCO founding principles of Open Science, the I4OC recommendations, and the FAIR data principles that data should be Findable, Accessible, Interoperable and Reusable. Since its data satisfies all the Reuse guidelines provided by FAIR in terms of richness, provenance, usage licenses and domain-relevant community standards, OpenCitations provides an example of a successful open e-infrastructure in which the reusability of data is integral to its mission.

preprint, 2022, arXiv

The case for the Humanities Citation Index (HuCI): a citation index by the humanities, for the humanities

Citation indexes are by now part of the research infrastructure in use by most scientists: a necessary tool in order to cope with the increasing amounts of scientific literature being published. Commercial citation indexes are designed for the sciences and have uneven coverage and unsatisfactory characteristics for humanities scholars, while no comprehensive citation index is published by a public organization. We argue that an open citation index for the humanities is desirable, for four reasons: it would greatly improve and accelerate the retrieval of sources, it would offer a way to interlink collections across repositories (such as archives and libraries), it would foster the adoption of metadata standards and best practices by all stakeholders (including publishers) and it would contribute research data to fields such as bibliometrics and science studies. We also suggest that the citation index should be informed by a set of requirements relevant to the humanities. We discuss four such requirements: source coverage must be comprehensive, including books and citations to primary sources; there needs to be chronological depth, as scholarship in the humanities remains relevant over time; the index should be collection-driven, leveraging the accumulated thematic collections of specialized research libraries; and it should be rich in context in order to allow for the qualification of each citation, for example by providing citation excerpts. We detail the fit-for-purpose research infrastructure which can make the Humanities Citation Index a reality. Ultimately, we argue that a citation index for the humanities can be created by humanists, via a collaborative, distributed and open effort.

preprint, 2022, arXiv

The way we cite: common metadata used across disciplines for defining bibliographic references

Current citation practices observed in articles are very noisy, confusing, and not standardised, making identifying the cited works problematic for humans and any reference extraction software. In this work, we want to investigate such citation practices for referencing different types of entities and, in particular, to understand the most used metadata in bibliographic references. We identified 36 types of cited entities (the most cited ones were articles, books, and proceeding papers) within the 34,140 bibliographic references extracted from a vast set of journal articles on 27 different subject areas. The analysis of such bibliographic references, grouped by the particular type of cited entities, enabled us to highlight the most used metadata for defining bibliographic references across the subject areas. However, we also noticed that, in some cases, bibliographic references did not provide the essential elements to identify the work they refer to easily.

preprint, 2021, arXiv

Citing and referencing habits in Medicine and Social Sciences journals in 2019

This article explores citing and referencing systems in Social Sciences and Medicine articles from different theoretical and practical perspectives, considering bibliographic references as a facet of descriptive representation. The analysis of citing and referencing elements (i.e. bibliographic references, mentions, quotations, and respective in-text reference pointers) identified citing and referencing habits within the disciplines under consideration, as well as errors occurring over the long term, expanding on those reported in previous studies. Expected future trends in information retrieval from bibliographic metadata were gathered by approaching these referencing elements from the perspective of the FRBR Entities concepts. Reference styles do not fully accomplish their role of guiding authors and publishers in providing concise and well-structured bibliographic metadata within bibliographic references. Trends in the revision of representative description suggest a predicted distancing between the ways information is approached by bibliographic references and by bibliographic catalogs adopting FRBR concepts, including the description levels adopted by each of them under the perspective of the FRBR Entities concept. This study was based on a subset of Medicine and Social Sciences articles published in 2019 and, therefore, it may not be taken as final or as having broad coverage. Future studies expanding these approaches to other disciplines and chronological periods are encouraged. By approaching citing and referencing issues as facets of descriptive representation, the findings of this study may encourage further studies that will support Information Science and Computer Science in providing tools to make bibliographic metadata description simpler, better structured and more efficient in the face of the revision of descriptive representation currently in progress.

preprint, 2020, arXiv

The Landscape of Ontology Reuse Approaches

Ontology reuse aims to foster interoperability and facilitate knowledge reuse. Several approaches are typically evaluated by ontology engineers when bootstrapping a new project. However, current practices are often motivated by subjective, case-by-case decisions, which hamper the definition of a recommended behaviour. In this chapter we argue that to date there are no effective solutions for supporting developers' decision-making process when deciding on an ontology reuse strategy. The objective is twofold: (i) to survey current approaches to ontology reuse, presenting motivations, strategies, benefits and limits, and (ii) to analyse two representative approaches and discuss their merits.

preprint, 2020, arXiv

The OpenCitations Data Model

A variety of schemas and ontologies are currently used for the machine-readable description of bibliographic entities and citations. This diversity, and the reuse of the same ontology terms with different nuances, generates inconsistencies in data. Adoption of a single data model would facilitate data integration tasks regardless of the data supplier or context application. In this paper we present the OpenCitations Data Model (OCDM), a generic data model for describing bibliographic entities and citations, developed using Semantic Web technologies. We also evaluate the effective reusability of OCDM according to ontology evaluation practices, mention existing users of OCDM, and discuss the use and impact of OCDM in the wider open science community.

preprint, 2020, arXiv

The practice of self-citations: a longitudinal study

In this article, we discuss the outcomes of an experiment in which we analysed whether and to what extent the introduction, in 2012, of the new research assessment exercise in Italy (a.k.a. Italian Scientific Habilitation) affected self-citation behaviours in the Italian research community. The Italian Scientific Habilitation attests to the scientific maturity of researchers and in Italy, as in many other countries, is a requirement for access to a professorship. To this end, we obtained from ScienceDirect 35,673 articles published between 1957 and 2016 by the participants in the 2012 Italian Scientific Habilitation, which resulted in the extraction of 1,379,050 citations retrieved through Semantic Publishing technologies. Our analysis showed an overall increment in author self-citations (i.e. where the citing article and the cited article share at least one author) in several of the 24 academic disciplines considered. However, we found a stronger causal relation between such increment and the rules introduced by the 2012 Italian Scientific Habilitation in 10 out of the 24 disciplines analysed.
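The definition of an author self-citation given in the abstract above (the citing and cited articles share at least one author) can be written as a short check. This is a sketch under assumptions: authors are represented by already-disambiguated identifier strings, and the function names and sample data are hypothetical, not taken from the study.

```python
def is_author_self_citation(citing_authors, cited_authors):
    """True when the two articles share at least one author identifier."""
    return not set(citing_authors).isdisjoint(cited_authors)


def self_citation_rate(citations):
    """Fraction of (citing_authors, cited_authors) pairs that are self-citations."""
    if not citations:
        return 0.0
    hits = sum(is_author_self_citation(a, b) for a, b in citations)
    return hits / len(citations)


# Invented author identifiers, for illustration only.
citations = [
    (["a1", "a2"], ["a2", "a3"]),  # shares a2: self-citation
    (["a1"], ["a4"]),              # no shared author
    (["a5", "a6"], ["a6"]),        # shares a6: self-citation
    (["a7"], ["a8", "a9"]),        # no shared author
]
rate = self_citation_rate(citations)
print(rate)
```

Comparing this rate for articles published before and after 2012, per discipline, is the kind of longitudinal comparison the abstract describes. The quality of the result depends on how reliably author names are resolved to identifiers.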

preprint, 2019, arXiv

OpenCitations, an infrastructure organization for open scholarship

OpenCitations is an infrastructure organization for open scholarship dedicated to the publication of open citation data as Linked Open Data using Semantic Web technologies, thereby providing a disruptive alternative to traditional proprietary citation indexes. Open citation data are valuable for bibliometric analysis, increasing the reproducibility of large-scale analyses by enabling publication of the source data. Following brief introductions to the development and benefits of open scholarship and to Semantic Web technologies, this paper describes OpenCitations and its datasets, tools, services and activities. These include the OpenCitations Data Model; the SPAR (Semantic Publishing and Referencing) Ontologies; OpenCitations' open software of generic applicability for searching, browsing and providing REST APIs over RDF triplestores; Open Citation Identifiers (OCIs) and the OpenCitations OCI Resolution Service; the OpenCitations Corpus (OCC), a database of open downloadable bibliographic and citation data made available in RDF under a Creative Commons public domain dedication; and the OpenCitations Indexes of open citation data, of which the first and largest is COCI, the OpenCitations Index of Crossref Open DOI-to-DOI Citations, which currently contains over 445 million bibliographic citations and is receiving considerable usage by the scholarly community.
