Source author record

Mike Thelwall

Mike Thelwall appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Digital Libraries, Social and Information Networks, Computation and Language, cs.CY, physics.soc-ph, Human-Computer Interaction, Information Retrieval

Catalog footprint

What is connected

31 works

7 topics

4 close collaborators

preprint, 2021, arXiv

Google Scholar, Microsoft Academic, Scopus, Dimensions, Web of Science, and OpenCitations' COCI: a multidisciplinary comparison of coverage via citations

New sources of citation data have recently become available, such as Microsoft Academic, Dimensions, and the OpenCitations Index of CrossRef open DOI-to-DOI citations (COCI). Although these have been compared to the Web of Science (WoS), Scopus, or Google Scholar, there is no systematic evidence of their differences across subject categories. In response, this paper investigates 3,073,351 citations found by these six data sources to 2,515 English-language highly-cited documents published in 2006 from 252 subject categories, expanding and updating the largest previous study. Google Scholar found 88% of all citations, many of which were not found by the other sources, and nearly all citations found by the remaining sources (89%-94%). A similar pattern held within most subject categories. Microsoft Academic is the second largest overall (60% of all citations), including 82% of Scopus citations and 86% of Web of Science citations. In most categories, Microsoft Academic found more citations than Scopus and WoS (182 and 223 subject categories, respectively), but had coverage gaps in some areas, such as Physics and some Humanities categories. After Scopus, Dimensions is fourth largest (54% of all citations), including 84% of Scopus citations and 88% of WoS citations. It found more citations than Scopus in 36 categories, more than WoS in 185, and displays some coverage gaps, especially in the Humanities. Following WoS, COCI is the smallest, with 28% of all citations. Google Scholar is still the most comprehensive source. In many subject categories Microsoft Academic and Dimensions are good alternatives to Scopus and WoS in terms of coverage.

preprint, 2020, arXiv

A gender equality paradox in academic publishing: Countries with a higher proportion of female first-authored journal articles have larger first author gender disparities between fields

Current attempts to address the shortfall of female researchers in Science, Technology, Engineering and Mathematics (STEM) have not yet succeeded despite other academic subjects having female majorities. This article investigates the extent to which gender disparities are subject-wide or nation-specific by a first author gender comparison of 30 million articles from all 27 Scopus broad fields within the 31 countries with the most Scopus-indexed articles 2014-18. The results show overall and geocultural patterns as well as individual national differences. Almost half of the subjects were always more male (7; e.g., Mathematics) or always more female (6; e.g., Immunology & Microbiology) than the national average. A strong overall trend (Spearman correlation 0.546) is for countries with a higher proportion of female first-authored research to also have larger differences in gender disparities between fields (correlation 0.314 for gender ratios). This confirms the international gender equality paradox previously found for degree subject choices: increased gender equality overall associates with moderately greater gender differentiation between subjects. This is consistent with previous USA-based claims that gender differences in academic careers are partly due to (socially constrained) gender differences in personal preferences. Radical solutions may therefore be needed for some STEM subjects to overcome gender disparities.

preprint, 2020, arXiv

All downhill from the PhD? The typical impact trajectory of US academic careers

Within academia, mature researchers tend to be more senior, but do they also tend to write higher impact articles? This article assesses long-term publishing (16+ years) United States (US) researchers, contrasting them with shorter-term publishing researchers (1, 6 or 10 years). A long-term US researcher is operationalised as having a first Scopus-indexed journal article in exactly 2001 and one in 2016-2019, with US main affiliations in their first and last articles. Researchers publishing in large teams (11+ authors) were excluded. The average field and year normalised citation impact of long- and shorter-term US researchers' journal articles decreases over time relative to the national average, with especially large falls to the last articles published that may be at least partly due to a decline in self-citations. In many cases researchers start by publishing above US average citation impact research and end by publishing below US average citation impact research. Thus, research managers should not assume that senior researchers will usually write the highest impact papers.

preprint, 2020, arXiv

Coronavirus research before 2020 is more relevant than ever, especially when interpreted for COVID-19

The speed with which biomedical researchers were able to identify and characterise COVID-19 was clearly due to prior research with other coronaviruses. Early epidemiological comparisons with two previous coronaviruses, Severe Acute Respiratory Syndrome (SARS) and Middle East Respiratory Syndrome (MERS), also made it easier to predict COVID-19's likely spread and lethality. This article assesses whether academic interest in prior coronavirus research has translated into interest in the primary source material, using Mendeley reader counts for early academic impact evidence. The results confirm that SARS and MERS research 2008-2017 experienced anomalously high increases in Mendeley readers in April-May 2020. Nevertheless, studies learning COVID-19 lessons from SARS and MERS or using them as a benchmark for COVID-19 have generated much more academic interest than primary studies of SARS or MERS. Thus, research that interprets prior relevant research for new diseases when they are discovered seems to be particularly important to help researchers to understand its implications in the new context.

preprint, 2020, arXiv

COVID-19 publications: Database coverage, citations, readers, tweets, news, Facebook walls, Reddit posts

The COVID-19 pandemic requires a fast response from researchers to help address biological, medical and public health issues to minimize its impact. In this rapidly evolving context, scholars, professionals and the public may need to quickly identify important new studies. In response, this paper assesses the coverage of scholarly databases and impact indicators from 21 March to 18 April 2020. The results confirm a rapid increase in the volume of research, which is particularly accessible through Google Scholar and Dimensions, and less through Scopus, the Web of Science and PubMed. A few COVID-19 papers from the 21,395 in Dimensions were already highly cited, with substantial news and social media attention. For this topic, in contrast to previous studies, there seems to be a high degree of convergence between articles shared in the social web and citation counts, at least in the short term. In particular, articles that are extensively tweeted on the day first indexed are likely to be highly read and relatively highly cited three weeks later. Researchers needing wide scope literature searches (rather than health focused PubMed or medRxiv searches) should start with Google Scholar or Dimensions and can use tweet and Mendeley reader counts as indicators of likely importance.

preprint, 2020, arXiv

Covid-19 Tweeting in English: Gender Differences

At the start of 2020, COVID-19 became the most urgent threat to global public health. Uniquely in recent times, governments have imposed partly voluntary, partly compulsory restrictions on the population to slow the spread of the virus. In this context, public attitudes and behaviors are vitally important for reducing the death rate. Analyzing tweets about the disease may therefore give insights into public reactions that may help guide public information campaigns. This article analyses 3,038,026 English tweets about COVID-19 from March 10 to 23, 2020. It focuses on one relevant aspect of public reaction: gender differences. The results show that females are more likely to tweet about the virus in the context of family, social distancing and healthcare whereas males are more likely to tweet about sports cancellations, the global spread of the virus and political reactions. Thus, women seem to be taking a disproportionate share of the responsibility for directly keeping the population safe. The detailed results may be useful to inform public information announcements and to help understand the spread of the virus. For example, failure to impose sporting bans whilst encouraging social distancing may send mixed messages to males.

preprint, 2020, arXiv

Pot, kettle: Nonliteral titles aren't (natural) science

Researchers may be tempted to attract attention through poetic titles for their publications, but would this be mistaken in some fields? Whilst poetic titles are known to be common in medicine, it is not clear whether the practice is widespread elsewhere. This article investigates the prevalence of poetic expressions in journal article titles 1996-2019 in 3.3 million articles from all 27 Scopus broad fields. Expressions were identified by manually checking all phrases with at least 5 words that occurred at least 25 times, finding 149 stock phrases, idioms, sayings, literary allusions, film names and song titles or lyrics. The expressions found are most common in the social sciences and the humanities. They are also relatively common in medicine, but almost absent from engineering and the natural and formal sciences. The differences may reflect the less hierarchical and more varied nature of the social sciences and humanities, where interesting titles may attract an audience. In engineering, natural science and formal science fields, authors should take extra care with poetic expressions, in case their choice is judged inappropriate. This includes interdisciplinary research overlapping these areas. Conversely, reviewers of interdisciplinary research involving the social sciences should be more tolerant of poetic license.

preprint, 2019, arXiv

Should Citations be Counted Separately from Each Originating Section?

Articles are cited for different purposes and differentiating between reasons when counting citations may therefore give finer-grained citation count information. Although identifying and aggregating the individual reasons for each citation may be impractical, recording the number of citations that originate from different article sections might illuminate the general reasons behind a citation count (e.g., 110 citations = 10 Introduction citations + 100 Methods citations). To help investigate whether this could be a practical and universal solution, this article compares 19 million citations with DOIs from six different standard sections in 799,055 PubMed Central open access articles across 21 out of 22 fields. There are apparently non-systematic differences between fields in the most citing sections and the extent to which citations from one section overlap with citations from another, with some degree of overlap in most cases. Thus, at a science-wide level, section headings are partly unreliable indicators of citation context, even if they are more standard within individual fields. They may still be used within fields to help identify individual highly cited articles that have had one type of impact, especially methodological (Methods) or context setting (Introduction), but expert judgement is needed to validate the results.

preprint, 2019, arXiv

The rhetorical structure of science? A multidisciplinary analysis of article headings

An effective structure helps an article to convey its core message. The optimal structure depends on the information to be conveyed and the expectations of the audience. In the current increasingly interdisciplinary era, structural norms can be confusing to the authors, reviewers and audiences of scientific articles. Despite this, no prior study has attempted to assess variations in the structure of academic papers across all disciplines. This article reports on the headings commonly used by over 1 million research articles from the PubMed Central Open Access collection, spanning 22 broad categories covering all academia and 172 out of 176 narrow categories. The results suggest that no headings are close to ubiquitous in any broad field and that there are substantial differences in the extent to which most headings are used. In the humanities, headings may be avoided altogether. Researchers should therefore be aware of unfamiliar structures that are nevertheless legitimate when reading, writing and reviewing articles.

preprint, 2016, arXiv

Are the discretised lognormal and hooked power law distributions plausible for citation data?

There is no agreement over which statistical distribution is most appropriate for modelling citation count data. This is important because if one distribution is accepted then the relative merits of different citation-based indicators, such as percentiles, arithmetic means and geometric means, can be more fully assessed. In response, this article investigates the plausibility of the discretised lognormal and hooked power law distributions for modelling the full range of citation counts, with an offset of 1. The citation counts from 23 Scopus subcategories were fitted to hooked power law and discretised lognormal distributions but both distributions failed a Kolmogorov-Smirnov goodness of fit test in over three quarters of cases. The discretised lognormal distribution also seems to have the wrong shape for citation distributions, with too few zeros and not enough medium values for all subjects. The cause of poor fits could be the impurity of the subject subcategories or the presence of interdisciplinary research. Although it is possible to test for subject subcategory purity indirectly through a goodness of fit test in theory with large enough sample sizes, it is probably not possible in practice. Hence it seems difficult to get conclusive evidence about the theoretically most appropriate statistical distribution.

preprint, 2016, arXiv

Are there too many uncited articles? Zero inflated variants of the discretised lognormal and hooked power law distributions

Although statistical models fit many citation data sets reasonably well, with the best fitting models being the hooked power law and discretised lognormal distribution, the fits are rarely close. One possible reason is that there might be more uncited articles than would be predicted by any model if some articles are inherently uncitable. Using data from 23 different Scopus categories, this article tests the assumption that removing a proportion of uncited articles from a citation dataset allows statistical distributions to have much closer fits. It also introduces two new models, the zero inflated discretised lognormal distribution and the zero inflated hooked power law distribution, and algorithms to fit them. In all 23 cases, the zero inflated version of the discretised lognormal distribution was an improvement on the standard version and in 15 out of 23 cases the zero inflated version of the hooked power law was an improvement on the standard version. Without zero inflation the discretised lognormal models fit the data better than the hooked power law distribution 6 out of 23 times and with it, the discretised lognormal models fit the data better than the hooked power law distribution 9 out of 23 times. Apparently uncitable articles seem to occur due to the presence of academic-related magazines in Scopus categories. In conclusion, future citation analysis and research indicators should take into account uncitable articles, and the best fitting distribution for sets of citation counts from a single subject and year is either the zero inflated discretised lognormal or zero inflated hooked power law.

preprint, 2016, arXiv

Citation count distributions for large monodisciplinary journals

Many different citation-based indicators are used by researchers and research evaluators to help evaluate the impact of scholarly outputs. Although the appropriateness of individual citation indicators depends in part on the statistical properties of citation counts, there is no universally agreed best-fitting statistical distribution against which to check them. The two current leading candidates are the discretised lognormal and the hooked or shifted power law. These have been mainly tested on sets of articles from a single field and year but these collections can include multiple specialisms that might dilute their properties. This article fits statistical distributions to 50 large subject-specific journals in the belief that individual journals can be purer than subject categories and may therefore give clearer findings. The results show that in most cases the discretised lognormal fits significantly better than the hooked power law, reversing previous findings for entire subcategories. This suggests that the discretised lognormal is the more appropriate distribution for modelling pure citation data. Thus future analytical investigations of the properties of citation indicators can use the lognormal distribution to analyse their basic properties. This article also includes improved software for fitting the hooked power law.

preprint, 2016, arXiv

Not dead, just resting: The practical value of per publication citation indicators

In the final analysis citation-based indicators are inferior to effective peer review and even peer review is flawed. It is impossible to accurately measure the value or impact of scientific research and a key task of scientometricians should be to produce figures for policy makers and others that are as informative as it is practical to make them and to ensure that users are fully aware of their limitations. Although the Abramo and D'Angelo (2016) suggestions make a lot of theoretical sense and so are a goal that is worth aiming for, it is unrealistic in practice to advocate their universal use in the contexts discussed above. This is because the indicators would still have flaws in addition to the generic limitations of citation-based indicators and would still be inadequate for replacing peer review. Thus, the expense of the data gathering does not always justify the value in practice of the extra accuracy. In the longer term, the restructuring of education needed in order to get the homogeneity necessary for genuinely comparable statistics would be too expensive and probably damaging to the research mission, in addition to being out of proportion to the likely value of any citation-based indicator.

preprint, 2016, arXiv

TensiStrength: Stress and relaxation magnitude detection for social media texts

Computer systems need to be able to react to stress in order to perform optimally on some tasks. This article describes TensiStrength, a system to detect the strength of stress and relaxation expressed in social media text messages. TensiStrength uses a lexical approach and a set of rules to detect direct and indirect expressions of stress or relaxation, particularly in the context of transportation. It is slightly more effective than a comparable sentiment analysis program, although their similar performances occur despite differences on almost half of the tweets gathered. The effectiveness of TensiStrength depends on the nature of the tweets classified, with tweets that are rich in stress-related terms being particularly problematic. Although generic machine learning methods can give better performance than TensiStrength overall, they exploit topic-related terms in a way that may be undesirable in practical applications and that may not work as well in more focused contexts. In conclusion, TensiStrength and generic machine learning approaches work well enough to be practical choices for intelligent applications that need to take advantage of stress information, and the decision about which to use depends on the nature of the texts analysed and the purpose of the task.

preprint, 2016, arXiv

The discretised lognormal and hooked power law distributions for complete citation data: Best options for modelling and regression

Identifying the statistical distribution that best fits citation data is important to allow robust and powerful quantitative analyses. Whilst previous studies have suggested that both the hooked power law and discretised lognormal distributions fit better than the power law and negative binomial distributions, no comparisons so far have covered all articles within a discipline, including those that are uncited. Based on an analysis of 26 different Scopus subject areas in seven different years, this article reports comparisons of the discretised lognormal and the hooked power law with citation data, adding 1 to citation counts in order to include zeros. The hooked power law fits better in two thirds of the subject/year combinations tested for journal articles that are at least three years old, including most medical, life and natural sciences, and for virtually all subject areas for younger articles. Conversely, the discretised lognormal tends to fit best for arts, humanities, social science and engineering fields. The difference between the fits of the distributions is mostly small, however, and so either could reasonably be used for modelling citation data. For regression analyses, however, the best option is to use ordinary least squares regression applied to the natural logarithm of citation counts plus one, especially for sets of younger articles, because of the increased precision of the parameters.

preprint, 2016, arXiv

Three practical field normalised alternative indicator formulae for research evaluation

Although altmetrics and other web-based alternative indicators are now commonplace in publishers' websites, they can be difficult for research evaluators to use because of the time or expense of the data, the need to benchmark in order to assess their values, the high proportion of zeros in some alternative indicators, and the time taken to calculate multiple complex indicators. These problems are addressed here by (a) a field normalisation formula, the Mean Normalised Log-transformed Citation Score (MNLCS) that allows simple confidence limits to be calculated and is similar to a proposal of Lundberg, (b) field normalisation formulae for the proportion of cited articles in a set, the Equalised Mean-based Normalised Proportion Cited (EMNPC) and the Mean-based Normalised Proportion Cited (MNPC), to deal with mostly uncited data sets, (c) a sampling strategy to minimise data collection costs, and (d) free unified software to gather the raw data, implement the sampling strategy, and calculate the indicator formulae and confidence limits. The approach is demonstrated (but not fully tested) by comparing the Scopus citations, Mendeley readers and Wikipedia mentions of research funded by Wellcome, NIH, and MRC in three large fields for 2013-2016. Within the results, statistically significant differences in both citation counts and Mendeley reader counts were found even for sets of articles that were less than six months old. Mendeley reader counts were more precise than Scopus citations for the most recent articles and all three funders could be demonstrated to have an impact in Wikipedia that was significantly above the world average.
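
The abstract names the MNLCS but does not state its formula. The sketch below is one reading of the name, assuming that each article's ln(1 + citations) is divided by the world mean of ln(1 + citations) for the same field and year, and that these ratios are then averaged over the group. The function name `mnlcs`, the tuple layout and the toy data are illustrative assumptions, not taken from the paper or its software.

```python
import math
from collections import defaultdict


def mnlcs(group, world):
    """Hypothetical MNLCS sketch.

    group, world: lists of (field, year, citations) tuples.
    Assumes every (field, year) in `group` also occurs in `world`
    and that the world mean of ln(1 + c) is non-zero there.
    """
    # Accumulate [sum of ln(1 + c), article count] per field and year.
    totals = defaultdict(lambda: [0.0, 0])
    for field, year, citations in world:
        entry = totals[(field, year)]
        entry[0] += math.log1p(citations)
        entry[1] += 1
    world_mean = {key: s / n for key, (s, n) in totals.items()}

    # Normalise each article by its own field/year world mean, then average.
    scores = [
        math.log1p(citations) / world_mean[(field, year)]
        for field, year, citations in group
    ]
    return sum(scores) / len(scores)


# Toy data (made up for illustration only).
world_set = [("A", 2015, 0), ("A", 2015, 1), ("A", 2015, 3)]
funder_set = [("A", 2015, 3)]
print(mnlcs(funder_set, world_set))
```

Under this reading, a value above 1 would indicate that the group's log-transformed counts are above the world average for their fields and years.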

preprint, 2015, arXiv

Distributions for cited articles from individual subjects and years

The citations to a set of academic articles are typically unevenly shared, with many articles attracting few citations and few attracting many. It is important to know more precisely how citations are distributed in order to help statistical analyses of citations, especially for sets of articles from a single discipline and a small range of years, as normally used for research evaluation. This article fits discrete versions of the power law, the lognormal distribution and the hooked power law to 20 different Scopus categories, using citations to articles published in 2004 and ignoring uncited articles. The results show that, despite its popularity, the power law is not a suitable model for collections of articles from a single subject and year, even for the purpose of estimating the slope of the tail of the citation data. Both the hooked power law and the lognormal distributions fit best for some subjects but neither is a universal optimal choice and parameter estimates for both seem to be unreliable. Hence only the hooked power law and discrete lognormal distributions should be considered for subject-and-year-based citation analysis in future and parameter estimates should always be interpreted cautiously.

preprint, 2015, arXiv

Geometric journal impact factors correcting for individual highly cited articles

Journal impact factors (JIFs) are widely used and promoted but have important limitations. In particular, JIFs can be unduly influenced by individual highly cited articles and hence are inherently unstable. A logical way to reduce the impact of individual high citation counts is to use the geometric mean rather than the standard mean in JIF calculations. Based upon journal rankings 2004-2014 in 50 sub-categories within 5 broad categories, this study shows that journal rankings based on JIF variants tend to be more stable over time if the geometric mean is used rather than the standard mean. The same is true for JIF variants using Mendeley reader counts instead of citation counts. Thus, although the difference is not large, the geometric mean is recommended instead of the arithmetic mean for future JIF calculations. In addition, Mendeley readership-based JIF variants are as stable as those using Scopus citations, confirming the value of Mendeley readership as an academic impact indicator.
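
The effect of a single highly cited article on the two kinds of mean can be sketched as follows. The abstract does not say how zero citation counts are handled, so the offset of 1 used here, exp(mean(ln(1 + c))) - 1, is an assumption, and the citation counts are made up for illustration.

```python
import math


def arithmetic_mean(counts):
    """Standard mean of a list of citation counts."""
    return sum(counts) / len(counts)


def geometric_mean_offset(counts):
    """Geometric mean with an assumed offset of 1 so that zeros are allowed:
    exp(mean(ln(1 + c))) - 1."""
    logs = [math.log1p(c) for c in counts]
    return math.expm1(sum(logs) / len(logs))


# Hypothetical journal: five ordinary articles, then one extreme outlier.
journal = [0, 1, 2, 3, 4]
with_outlier = journal + [500]

print(arithmetic_mean(journal), arithmetic_mean(with_outlier))
print(geometric_mean_offset(journal), geometric_mean_offset(with_outlier))
```

In this toy case the outlier moves the arithmetic mean far more than the geometric mean, which is the instability the abstract describes.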

preprint, 2015, arXiv

More Precise Methods for National Research Citation Impact Comparisons

Governments sometimes need to analyse sets of research papers within a field in order to monitor progress, assess the effect of recent policy changes, or identify areas of excellence. They may compare the average citation impacts of the papers by dividing them by the world average for the field and year. Since citation data is highly skewed, however, simple averages may be too imprecise to robustly identify differences within, rather than across, fields. In response, this article introduces two new methods to identify national differences in average citation impact, one based on linear modelling for normalised data and the other using the geometric mean. Results from a sample of 26 Scopus fields between 2009-2015 show that geometric means are the most precise and so are recommended for smaller sample sizes, such as for individual fields. The regression method has the advantage of distinguishing between national contributions to internationally collaborative articles, but has substantially wider confidence intervals than the geometric mean, undermining its value for any except the largest sample sizes.

preprint, 2015, arXiv

National, disciplinary and temporal variations in the extent to which articles with more authors have more impact: Evidence from a geometric field normalised citation indicator

The importance of collaboration in research is widely accepted, as is the fact that articles with more authors tend to be more cited. Nevertheless, although previous studies have investigated whether the apparent advantage of collaboration varies by country, discipline, and number of co-authors, this study introduces a more fine-grained method to identify differences: the geometric Mean Normalized Citation Score (gMNCS). Based on comparisons between disciplines, years and countries for two million journal articles, the average citation impact of articles increases with the number of authors, even when international collaboration is excluded. This apparent advantage of collaboration varies substantially by discipline and country and changes a little over time. Against the trend, however, in Russia solo articles have more impact. Across the four broad disciplines examined, collaboration had by far the strongest association with impact in the arts and humanities. Although international comparisons are limited by the availability of systematic data for author country affiliations, the new indicator is the most precise yet and can give statistical evidence rather than estimates.

preprint, 2015, arXiv

Regression for citation data: An evaluation of different methods

Citations are increasingly used for research evaluations. It is therefore important to identify factors affecting citation scores that are unrelated to scholarly quality or usefulness so that these can be taken into account. Regression is the most powerful statistical technique to identify these factors and hence it is important to identify the best regression strategy for citation data. Citation counts tend to follow a discrete lognormal distribution and, in the absence of alternatives, have been investigated with negative binomial regression. Using simulated discrete lognormal data (continuous lognormal data rounded to the nearest integer) this article shows that a better strategy is to add one to the citations, take their log and then use the general linear (ordinary least squares) model for regression (e.g., multiple linear regression, ANOVA), or to use the generalised linear model without the log. Reasonable results can also be obtained if all the zero citations are discarded, the log is taken of the remaining citation counts and then the general linear model is used, or if the generalised linear model is used with the continuous lognormal distribution. Similar approaches are recommended for altmetric data, if it proves to be lognormally distributed.
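
The recommended strategy (add one to the citation counts, take the log, then fit an ordinary least squares model) can be sketched for a single predictor as follows. The function name, the predictor and the data are hypothetical, and a real analysis would normally use a statistics package with several predictors.

```python
import math


def ols_log_citations(predictor, citations):
    """Fit ln(1 + citations) = intercept + slope * predictor by ordinary
    least squares (single-predictor closed form)."""
    y = [math.log1p(c) for c in citations]
    n = len(predictor)
    mean_x = sum(predictor) / n
    mean_y = sum(y) / n
    sxx = sum((x - mean_x) ** 2 for x in predictor)
    sxy = sum((x - mean_x) * (yi - mean_y) for x, yi in zip(predictor, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: number of extra authors and citation counts.
extra_authors = [0, 1, 2, 3]
citation_counts = [0, 1, 3, 7]  # 1 + c doubles with each extra author

intercept, slope = ols_log_citations(extra_authors, citation_counts)
print(intercept, slope)
```

Because the response is on a log scale, exp(slope) can be read as the multiplicative change in (1 + citations) per unit of the predictor.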

preprint, 2015, arXiv

The influence of time and discipline on the magnitude of correlations between citation counts and quality scores

Although various citation-based indicators are commonly used to help research evaluations, there are ongoing controversies about their value. In response, they are often correlated with quality ratings or with other quantitative indicators in order to partly assess their validity. When correlations are calculated for sets of publications from multiple disciplines or years, however, the magnitude of the correlation coefficient may be reduced, masking the strength of the underlying correlation. In response, this article uses simulations to systematically investigate the extent to which mixing years or disciplines reduces correlations. The results show that mixing two sets of articles with different correlation strengths can reduce the correlation for the combined set to substantially below the average of the two. Moreover, even mixing two sets of articles with the same correlation strength but different mean citation counts can substantially reduce the correlation for the combined set. The extent of the reduction in correlation also depends upon whether the articles assessed have been pre-selected for being high quality and whether the relationship between the quality ratings and citation counts is linear or exponential. The results underline the importance of using homogeneous data sets but also help to interpret correlation coefficients when this is impossible.
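
A toy deterministic illustration of the second finding (equal correlations, different mean citation counts) is sketched below. It is not the article's simulation: the two sets, their values and the helper `pearson` are made up to show the mechanism only.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Two hypothetical fields with identical quality scores and a perfect
# quality-citation correlation, but very different mean citation counts.
quality_a, citations_a = [1, 2, 3], [1, 2, 3]
quality_b, citations_b = [1, 2, 3], [101, 102, 103]

print(pearson(quality_a, citations_a))
print(pearson(quality_b, citations_b))
print(pearson(quality_a + quality_b, citations_a + citations_b))
```

Within each set the correlation is perfect, yet the combined correlation is close to zero because the between-field difference in citation levels dominates the variance.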

preprint, 2015, arXiv

The precision of the arithmetic mean, geometric mean and percentiles for citation data: An experimental simulation modelling approach

When comparing the citation impact of nations, departments or other groups of researchers within individual fields, three approaches have been proposed: arithmetic means, geometric means, and percentage in the top X%. This article compares the precision of these statistics using 97 trillion experimentally simulated citation counts from 6875 sets of different parameters (although all having the same scale parameter) based upon the discretised lognormal distribution with limits from 1000 repetitions for each parameter set. The results show that the geometric mean is the most precise, closely followed by the percentage of a country's articles in the top 50% most cited articles for a field, year and document type. Thus the geometric mean citation count is recommended for future citation-based comparisons between nations. The percentage of a country's articles in the top 1% most cited is a particularly imprecise indicator and is not recommended for international comparisons based on individual fields. Moreover, whereas standard confidence interval formulae for the geometric mean appear to be accurate, confidence interval formulae are less accurate and consistent for percentile indicators. These recommendations assume that the scale parameters of the samples are the same but the choice of indicator is complex and partly conceptual if they are not.
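A much smaller version of this kind of precision comparison might look like the sketch below. It makes several assumptions that the abstract does not state: the geometric mean is computed with a +1 offset so that uncited articles can be included, precision is summarised by the coefficient of variation across repeated samples, and the parameter values, sample size and repetition count are arbitrary.

```python
import math
import random
import statistics


def simulate_citations(n, mu, sigma, rng):
    """Discretised lognormal counts: continuous lognormal draws rounded to the nearest integer."""
    return [round(rng.lognormvariate(mu, sigma)) for _ in range(n)]


def arithmetic_mean(counts):
    return sum(counts) / len(counts)


def geometric_mean(counts):
    """exp(mean(log(1 + c))) - 1; the +1 offset (an assumption here) handles zeros."""
    return math.exp(sum(math.log(1 + c) for c in counts) / len(counts)) - 1


def coefficient_of_variation(values):
    """Standard deviation relative to the mean; lower means a more precise statistic."""
    return statistics.stdev(values) / statistics.mean(values)


rng = random.Random(2)  # fixed seed so the sketch is repeatable
MU, SIGMA = 2.0, 1.3    # hypothetical lognormal parameters
SAMPLE_SIZE, REPETITIONS = 200, 500

arith, geo = [], []
for _ in range(REPETITIONS):
    sample = simulate_citations(SAMPLE_SIZE, MU, SIGMA, rng)
    arith.append(arithmetic_mean(sample))
    geo.append(geometric_mean(sample))

cv_arith = coefficient_of_variation(arith)
cv_geo = coefficient_of_variation(geo)
print(f"CV arithmetic mean: {cv_arith:.3f}, CV geometric mean: {cv_geo:.3f}")
```

The arithmetic mean is pulled around by occasional very highly cited articles in the skewed tail, whereas the log transform inside the geometric mean dampens their influence.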

preprint2014arXiv

The role of handbooks in knowledge creation and diffusion: A case of science and technology studies

Genre is considered to be an important element in scholarly communication and in the practice of scientific disciplines. However, scientometric studies have typically focused on a single genre, the journal article. The goal of this study is to understand the role that handbooks play in knowledge creation and diffusion and their relationship with the genre of journal articles, particularly in highly interdisciplinary and emergent social science and humanities disciplines. To shed light on these questions we focused on handbooks and journal articles published over the last four decades belonging to the research area of Science and Technology Studies (STS), broadly defined. To get a detailed picture we used the full-text of five handbooks (500,000 words) and a well-defined set of 11,700 STS articles. We confirmed the methodological split of STS into qualitative and quantitative (scientometric) approaches. Even when the two traditions explore similar topics (e.g., science and gender) they approach them from different starting points. The change in cognitive foci in both handbooks and articles partially reflects the changing trends in STS research, often driven by technology. Using text similarity measures we found that, in the case of STS, handbooks play no special role in either focusing the research efforts or marking their decline. In general, they do not represent the summaries of research directions that have emerged since the previous edition of the handbook.

preprint2014arXiv

Tweets vs. Mendeley readers: How do these two social media metrics differ?

A set of 1.4 million biomedical papers was analyzed with regard to how often articles are mentioned on Twitter or saved by users on Mendeley. While Twitter is a microblogging platform used by a general audience to distribute information, Mendeley is a reference manager targeted at an academic user group to organize scholarly literature. Both platforms are used as sources for so-called altmetrics to measure a new kind of research impact. Based on a large set of PubMed papers, this analysis shows how far the two metrics differ from each other and how they compare to traditional citation impact metrics.

preprint2013arXiv

Mapping the network structure of science parks: An exploratory study of cross-sectoral interactions reflected on the web

This study introduces a method based on link analysis to investigate the structure of the R&D support infrastructure associated with science parks in order to determine whether this webometric approach gives plausible results. Three science parks from Yorkshire and the Humber in the UK were analysed with webometric and social network analysis techniques. Interlinking networks were generated through the combination of two different data sets extracted from three sources (Yahoo!, Bing, SocSciBot). These networks suggest that institutional sectors, representing business, universities and public bodies, are primarily tied together by a core formed by research institutions, support structure organisations and business developers. The comparison of the findings with traditional indicators suggests that the web-based networks reflect the offline conditions and policy measures adopted in the region, giving some evidence that the webometric approach is plausible for investigating science park networks. This is the first study that applies a web-based approach to investigate to what extent science parks facilitate closer interaction between the heterogeneous organisations that converge in R&D networks. The results indicate that link analysis may help to give a first insight into the organisation of the R&D support infrastructure provided by science parks.

preprint2013arXiv

Motivation for hyperlink creation using inter-page relationships

Using raw hyperlink counts for webometrics research has been shown to be unreliable and researchers have looked for alternatives. One alternative is classifying hyperlinks in a website based on the motivation behind the hyperlink creation. The method used for this type of classification involves manually visiting a webpage and then classifying individual links on the webpage. This is time-consuming, making it infeasible for large-scale studies. This paper speeds up the classification of hyperlinks in UK academic websites by using a machine learning technique, decision tree induction, to group web pages found in UK academic websites into one of eight categories and then infer the motivation for the creation of a hyperlink in a webpage based on the linking pattern of the category the webpage belongs to.

preprint2013arXiv

The entrepreneurial role of the University: a link analysis of York Science Park

This study introduces a structured analysis of science parks as arenas designed to stimulate institutional collaboration and the commercialization of academic knowledge and technology, and the promotion of social welfare. A framework for key actors and their potential behaviour in this context is introduced based on the Triple Helix (TH) model and related literature. A link analysis was conducted to build an inter-linking network that may map the infrastructure support network through the online interactions of the organisations involved in York Science Park. A comparison between the framework and the diagram shows that the framework can be used to identify most of the actors and assess their interconnections. The web patterns found correspond to previous evaluations based on traditional indicators and suggest that the network, which is developed to foster and support innovation, arises from the functional cooperation between the University of York and regional authorities, which both serve as the major driving forces in the trilateral linkages and the development of an innovation infrastructure.

preprint2013arXiv

Tweeting biomedicine: an analysis of tweets and citations in the biomedical literature

Data collected by social media platforms have recently been introduced as a new source for indicators to help measure the impact of scholarly research in ways that are complementary to traditional citation-based indicators. Data generated from social media activities related to scholarly content can be used to reflect broad types of impact. This paper aims to provide systematic evidence regarding how often Twitter is used to diffuse journal articles in the biomedical and life sciences. The analysis is based on a set of 1.4 million documents covered by both PubMed and Web of Science (WoS) and published between 2010 and 2012. The number of tweets containing links to these documents was analyzed to evaluate the degree to which certain journals, disciplines, and specialties were represented on Twitter. It is shown that, with less than 10% of PubMed articles mentioned on Twitter, its uptake is low in general. The relationship between tweets and WoS citations was examined for each document at the level of journals and specialties. The results show that tweeting behavior varies between journals and specialties and correlations between tweets and citations are low, implying that impact metrics based on tweets are different from those based on citations. A framework utilizing the coverage of articles and the correlation between Twitter mentions and citations is proposed to facilitate the evaluation of novel social-media based metrics and to shed light on the question of how far the number of tweets is a valid metric for measuring research impact.

preprint2011arXiv

Collective emotions online and their influence on community life

E-communities, social groups interacting online, have recently become an object of interdisciplinary research. As with face-to-face meetings, Internet exchanges may not only include factual information but also emotional information - how participants feel about the subject discussed or other group members. Emotions are known to be important in affecting interaction partners in offline communication in many ways. Could emotions in Internet exchanges affect others and systematically influence quantitative and qualitative aspects of the trajectory of e-communities? The development of automatic sentiment analysis has made large scale emotion detection and analysis possible using text messages collected from the web. It is not clear if emotions in e-communities primarily derive from individual group members' personalities or if they result from intra-group interactions, and whether they influence group activities. We show the collective character of affective phenomena on a large scale as observed in 4 million posts downloaded from Blogs, Digg and BBC forums. To test whether the emotions of a community member may influence the emotions of others, posts were grouped into clusters of messages with similar emotional valences. The frequency of long clusters was much higher than it would be if emotions occurred at random. Distributions for cluster lengths can be explained by preferential processes because conditional probabilities for consecutive messages grow as a power law with cluster length. For BBC forum threads, average discussion lengths were higher for larger values of absolute average emotional valence in the first ten comments and the average amount of emotion in messages fell during discussions. Our results prove that collective emotional states can be created and modulated via Internet communication and that emotional expressiveness is the fuel that sustains some e-communities.
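The cluster analysis described above can be sketched on a toy thread. The sketch assumes that a cluster is a maximal run of consecutive posts sharing the same valence label, and that the conditional probability for length n is the share of runs reaching length n that go on to length n + 1. Both definitions, the example thread and the function names are illustrative assumptions rather than the study's exact procedure.

```python
from itertools import groupby


def run_lengths(valences):
    """Lengths of maximal runs of consecutive equal valence labels."""
    return [len(list(group)) for _, group in groupby(valences)]


def continuation_probability(lengths, n):
    """Share of runs that reached length n and then continued to length n + 1."""
    reached = sum(1 for length in lengths if length >= n)
    continued = sum(1 for length in lengths if length > n)
    return continued / reached if reached else float("nan")


# Hypothetical thread: -1 = negative, 0 = neutral, 1 = positive post.
thread = [1, 1, -1, -1, -1, 0, 1]

lengths = run_lengths(thread)
for n in range(1, max(lengths) + 1):
    print(n, continuation_probability(lengths, n))
```

To judge whether long clusters are more frequent than chance, the same statistics could be recomputed on randomly shuffled copies of the thread (for example with `random.shuffle`) and compared with the observed values.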

preprint2011arXiv

Negative emotions boost users activity at BBC Forum

We present an empirical study of user activity in online BBC discussion forums, measured by the number of posts written by individual debaters and the average sentiment of these posts. Nearly 2.5 million posts from over 18 thousand users were investigated. Scale-free distributions were observed for activity in individual discussion threads as well as for overall activity. The number of unique users in a thread normalized by the thread length decays with thread length, suggesting that thread life is sustained by mutual discussions rather than by independent comments. Automatic sentiment analysis shows that most posts contain negative emotions and the most active users in individual threads express predominantly negative sentiments. It follows that the average emotion of longer threads is more negative and that threads can be sustained by negative comments. An agent-based computer simulation model has been used to reproduce several essential characteristics of the analyzed system. The model stresses the role of discussions between users, especially emotionally laden quarrels between supporters of opposite opinions, and reproduces many observed statistics of the forum.

31 published item(s)

Google Scholar, Microsoft Academic, Scopus, Dimensions, Web of Science, and OpenCitations' COCI: a multidisciplinary comparison of coverage via citations

A gender equality paradox in academic publishing: Countries with a higher proportion of female first-authored journal articles have larger first author gender disparities between fields

All downhill from the PhD? The typical impact trajectory of US academic careers

Coronavirus research before 2020 is more relevant than ever, especially when interpreted for COVID-19

COVID-19 publications: Database coverage, citations, readers, tweets, news, Facebook walls, Reddit posts

Covid-19 Tweeting in English: Gender Differences

Pot, kettle: Nonliteral titles aren't (natural) science

Should Citations be Counted Separately from Each Originating Section

The rhetorical structure of science? A multidisciplinary analysis of article headings

Are the discretised lognormal and hooked power law distributions plausible for citation data?

Are there too many uncited articles? Zero inflated variants of the discretised lognormal and hooked power law distributions

Citation count distributions for large monodisciplinary journals

Not dead, just resting: The practical value of per publication citation indicators

TensiStrength: Stress and relaxation magnitude detection for social media texts

The discretised lognormal and hooked power law distributions for complete citation data: Best options for modelling and regression

Three practical field normalised alternative indicator formulae for research evaluation

Distributions for cited articles from individual subjects and years

Geometric journal impact factors correcting for individual highly cited articles

More Precise Methods for National Research Citation Impact Comparisons

National, disciplinary and temporal variations in the extent to which articles with more authors have more impact: Evidence from a geometric field normalised citation indicator

Regression for citation data: An evaluation of different methods

The influence of time and discipline on the magnitude of correlations between citation counts and quality scores

The precision of the arithmetic mean, geometric mean and percentiles for citation data: An experimental simulation modelling approach

The role of handbooks in knowledge creation and diffusion: A case of science and technology studies

Tweets vs. Mendeley readers: How do these two social media metrics differ?

Mapping the network structure of science parks: An exploratory study of cross-sectoral interactions reflected on the web

Motivation for hyperlink creation using inter-page relationships

The entrepreneurial role of the University: a link analysis of York Science Park

Tweeting biomedicine: an analysis of tweets and citations in the biomedical literature

Collective emotions online and their influence on community life

Negative emotions boost users activity at BBC Forum