This study investigates “key statistical instruments”, such as the mean or the sum, used to obtain numeric polarity scores in lexicon-based tools for sentiment analysis. First, a large number of texts rated for sentiment intensity by independent human judges was collected. Next, 15 different sentiment lexicons were used to generate sets of numeric values for each of the texts. Then, the key statistical instruments were calculated from these values and compared with the corresponding human scores using tests for association between paired samples. The results of these tests were further examined with ANOVA and Tukey HSD post-hoc analysis. The broad conclusion drawn from the analysis is that the mean, all other things being equal, is the most reliable key statistical instrument for obtaining numeric polarity scores that are similar to scores provided by human assessors. These results may be of particular importance both for developers of lexicon-based programs performing sentiment analysis and for users of such software packages.
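The pipeline summarised above (look up each word of a text in a lexicon, aggregate the matched values with a key statistical instrument, then test association with human ratings) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the five-word lexicon, the example texts and the "human" ratings are all hypothetical toy data, tokenisation is a simple regular expression, and Pearson's correlation stands in for whichever test of association between paired samples the authors used.

```python
import re
from statistics import mean
from typing import Callable, Sequence

# Hypothetical toy lexicon (word -> polarity value); not one of the study's 15 lexicons.
LEXICON = {"good": 2.0, "great": 3.0, "bad": -2.0, "awful": -3.0, "fine": 1.0}

def lexicon_values(text: str) -> list[float]:
    """Return the polarity values of all tokens that appear in the lexicon."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [LEXICON[t] for t in tokens if t in LEXICON]

def text_score(text: str, instrument: Callable[[Sequence[float]], float]) -> float:
    """Aggregate a text's lexicon values with a key statistical instrument (e.g. mean, sum).

    Texts with no lexicon hits get a neutral score of 0.0 (an assumption of this sketch).
    """
    values = lexicon_values(text)
    return float(instrument(values)) if values else 0.0

def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation of two paired samples (undefined if either has zero variance)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Hypothetical texts and human intensity ratings, paired by position.
TEXTS = ["Good, good and fine.", "Awful. Just bad.", "Great!", "Bad, but fine fine fine."]
HUMAN = [1.5, -2.5, 3.0, 0.5]

if __name__ == "__main__":
    for name, instrument in [("mean", mean), ("sum", sum)]:
        scores = [text_score(t, instrument) for t in TEXTS]
        print(name, [round(s, 3) for s in scores], round(pearson(scores, HUMAN), 3))
```

The instrument is passed in as a function so that the same lookup step can be reused for every aggregate being compared. On this toy data the mean correlates more closely with the invented ratings than the sum does, largely because the sum grows with the number of matched words; this illustrates the kind of comparison described, not the study's evidence. The further comparison across lexicons with ANOVA and Tukey HSD is not shown here.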
Wydawnictwo Uniwersytetu Jana Kochanowskiego w Kielcach
oai:bibliotekacyfrowa.ujk.edu.pl:8009 ; doi:10.25951/4827
Token : A Journal of English Linguistics
Feb 14, 2023
Feb 13, 2023
20
https://bibliotekacyfrowa.ujk.edu.pl/publication/4827