Quantitative Analysis: Looking for Commonalities in a Sea of Differences
How are professors paid? Can the "best and brightest" be attracted to the academic profession? With universities facing international competition, which countries compensate their academics best, and which ones lag behind? Paying the Professoriate examines these questions and provides key insights into, and recommendations on, the current state of the academic profession worldwide. Paying the Professoriate is the first comparative analysis of global faculty salaries, remuneration, and terms of employment. Offering an in-depth international comparison of academic salaries in twenty-eight countries across public, private, research, and non-research universities, chapter authors shed light on the conditions and expectations that shape the modern academic profession. The top researchers on the academic profession worldwide analyze common themes, trends, and the impact of these matters on academic quality and research productivity. In a world where higher education capacity is a key driver of national innovation and prosperity, and nations seek to fast-track their economic growth through expansion of higher education systems, policy makers and administrators increasingly seek answers about what actions they should be taking. Paying the Professoriate provides a much-needed resource, illuminating the key issues and offering recommendations
Machine learning systems face the problem that their results are not comparable across languages; one affected subarea is the quantitative analysis of syntax. In this paper, we introduce a new quantitative method based on word co-occurrence statistics in syntactically tagged corpora. The method makes it possible to quantitatively evaluate differences and similarities among languages and to select the most influential phenomena. The experimental setup comprises materials for more than 50 languages. Our experiments demonstrate that the introduced method correctly clusters languages into language families.
Corpus linguistics can be broadly defined in terms of two partially overlapping research dimensions. On the one hand, corpus linguistics is knowledge of how to compile and annotate linguistic corpora. On the other hand, corpus linguistics is a family of qualitative and quantitative methods of language study based on corpus data. The book presents the first steps taken by Russian corpus linguistics toward the development of language corpora and corpus-based resources as well as their use in grammatical and lexical analysis.
The first part of the book focuses on the annotation of Russian texts at several levels: lemmas, part of speech and inflectional forms, word formation, lexical-semantic classes, syntactic dependencies, semantic roles, frames, and lexical constructions. We discuss various theoretical principles and practical considerations motivating the corpus markup design, provide details on the creation of lexical resources (electronic dictionaries and databases) and text processing software, and consider complicated cases that present challenges for the annotation of corpora both manually and automatically. In most cases we describe the annotation of the Russian National Corpus (RNC, ruscorpora.ru) and its affiliate project FrameBank (framebank.ru).
Frequency data depend not only on the representativeness and balance of texts in a corpus, but also on the rules and tools used for annotation. The book addresses the development of evaluation standards for Russian NLP resources, namely, morphological taggers and dependency parsers. In addition, the book presents several experiments on automatic annotation and disambiguation: lemmatization of word forms not in the dictionary; word sense disambiguation based on vectors formed by lexical, semantic and grammatical cues of context; and semantic role labeling.
The final chapters of the first part of the book outline two types of frequency dictionaries based on the RNC data: a general-purpose frequency dictionary and a lexico-grammatical one.
The second part of the book presents an analysis of corpus data and includes a number of case studies of Russian grammar and lexical-grammatical interaction using quantitative methods. The key concept underlying our analysis is the behavioral profile (Hanks 1996; Divjak, Gries 2006), which is the frequency distribution of variable elements in a linguistic unit as attested in a corpus. This covers grammatical profiles (the frequency distribution of inflected forms of a word), constructional profiles (the frequency distribution of argument or any other constructions attested for a key predicate), lexical and semantic profiles (the frequency distribution of words and lexical-semantic classes in construction slots or, more generally, in the context of a word), and radial category profiles (the frequency distribution of word senses and word uses across the radial category network of a polysemous unit). We use grammatical, constructional, semantic, and radial category profiling to study tense, aspect and mood specialization of Russian verb forms; to identify singular-oriented and plural-oriented nouns; to investigate factors for prefix choice and prefix variation in natural perfectives (chistovidovye perfectivy); to analyze constraints on the filling of slots in a construction and how this affects the meaning of the construction, taking as an example the Genitive construction of shape and the spatial construction with the preposition poverkh ‘up and over’.
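A grammatical profile as defined above can be sketched in a few lines of Python. This is only an illustration, not the book's software: the function name `grammatical_profile`, the `(lemma, tag)` token format, and the toy annotations are all assumptions made here for the example.

```python
from collections import Counter


def grammatical_profile(tokens, lemma):
    """Return the relative frequency of each grammatical tag for `lemma`.

    `tokens` is an iterable of (lemma, tag) pairs, a hypothetical,
    simplified stand-in for real corpus annotation.
    """
    counts = Counter(tag for lem, tag in tokens if lem == lemma)
    total = sum(counts.values())
    if total == 0:
        return {}  # the lemma is not attested in the sample
    return {tag: n / total for tag, n in counts.items()}


# Hand-made toy annotations (hypothetical; not taken from any corpus).
toy_tokens = [
    ("noun_a", "pl"), ("noun_a", "pl"), ("noun_a", "sg"), ("noun_a", "pl"),
    ("noun_b", "sg"), ("noun_b", "sg"),
]

profile = grammatical_profile(toy_tokens, "noun_a")
print(profile)
```

In this toy sample three of the four occurrences of `noun_a` are plural, which is the kind of skew that would mark a noun as plural-oriented. The other profile types follow the same pattern with a different variable element (construction, slot filler, or word sense) in place of the tag.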
The quantitative corpus-based techniques used for the analysis vary from simple descriptive statistics (e.g., absolute frequencies, percentages, measures of central tendency and outliers) to Fisher's exact test and logistic regression. We claim that the vector modeling approaches to quantitative grammatical studies in theoretical linguistics are no less effective than in computational linguistics, where they have become a standard tool.
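To make the Fisher test mentioned above concrete, here is a minimal standard-library sketch of the two-sided test for a 2x2 contingency table, for example the counts of two inflected forms for two lemmas. The function name `fisher_exact_two_sided` and the example counts are assumptions for illustration, not figures from the book.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)   # smallest feasible top-left count
    hi = min(row1, col1)       # largest feasible top-left count
    # Sum the probabilities of all tables no more likely than the observed one
    # (small relative tolerance guards against floating-point ties).
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Hypothetical counts: lemma 1 has 3 singular / 0 plural tokens,
# lemma 2 has 0 singular / 3 plural tokens.
p_value = fisher_exact_two_sided(3, 0, 0, 3)
print(p_value)
```

The exact test is a natural choice for corpus counts because many cells are small, where the chi-squared approximation is unreliable.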
Cross-national tests with negative binomial regression models support the presence of a curvilinear relationship between the quantitative expansion of education (measured as mean years of schooling) and terrorist attack intensity. Growth of schooling in the least educationally developed countries is associated with a significant tendency towards the growth of terrorist attack intensity. This tendency remains significant when controlled for income level, type of political regime, unemployment, inequality, and urbanization; the peak of terrorist attack intensity is observed at a relatively low, but nonzero, level of the quantitative expansion of formal education (approximately three to six years of schooling). Further growth of schooling in more developed countries is associated with a significant trend toward a decrease in terrorist attack intensity. This tendency remains significant after controlling for income level, political regime, unemployment, inequality, and urbanization. The most radical decrease is observed for the interval between seven and eight mean years of schooling. In addition, this quantitative analysis indicates the presence of a similar curvilinear relationship between GDP per capita and terrorist attack intensity, with a wide peak from $4000 to $14,000. The explanation of the curvilinear relationship between GDP per capita and terrorist activity through the intermediary of mean years of schooling can only be partial. The regression analysis suggests that the growth of mean years of schooling that accompanies economic development in middle and high income countries may indeed be one of the factors accounting for the decrease of terrorist attacks in countries with GDP per capita growth.
However, this regression analysis indicates that the negative correlation between GDP per capita and terrorist attack intensity for middle and high income countries is also partly explained by a lower unemployment rate in the high income countries, as well as by a very high share of consolidated democracies and an extremely low share of factional democracies among the high income states. It is especially worth noting that after the introduction of all controls, the coefficient sign for per capita GDP changes from negative to positive, i.e., GDP growth in middle and high income countries, after the introduction of controls for inequality, education, unemployment, type of regime, etc., turns out to be a factor of increase rather than decline in the intensity of terrorist activity. On the one hand, this suggests that the negative correlation between per capita GDP and the level of terrorist activity in these countries is actually explained to an extremely high degree by the fact that per capita GDP growth here tends to be accompanied by an increase in the educational level of the population, a decrease in unemployment, a reduction in inequality, a decrease in the number of factional democracies, and an increase in the number of consolidated democracies. On the other hand, the positive sign (with a statistically significant coefficient) indicates that if economic growth in the middle and high income countries is not accompanied by an increase in economic equality and education of the population, a decrease in unemployment, a decrease in the number of unstable factional democracies, and an increase in the number of consolidated democracies (that is, if in fact all the fruits of economic growth are captured by the elites, and almost nothing from this growth reaches the general population), then such economic growth would tend to lead to an increase in terrorist activity (and not to its reduction).
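A curvilinear (inverted-U) relationship of the kind reported in this abstract is often specified by adding a squared term to the regression. The following is a sketch under that assumption only: the abstract does not state the functional form used, and the symbols (attack count Y_i, mean years of schooling S_i, control vector x_i, coefficients beta and gamma, dispersion alpha) are notation introduced here for illustration.

```latex
% Negative binomial count model with a quadratic schooling term (assumed form)
Y_i \sim \mathrm{NegBin}(\mu_i, \alpha), \qquad
\log \mu_i = \beta_0 + \beta_1 S_i + \beta_2 S_i^{2} + \boldsymbol{\gamma}^{\top}\mathbf{x}_i

% An inverted U requires \beta_1 > 0 and \beta_2 < 0; the peak is where
% the derivative with respect to S_i vanishes:
\frac{\partial \log \mu_i}{\partial S_i} = \beta_1 + 2\beta_2 S_i = 0
\quad\Longrightarrow\quad
S^{*} = -\frac{\beta_1}{2\beta_2}
```

Under this specification, expected attack intensity rises with schooling below S* and falls above it. The reported peak at approximately three to six years of schooling would correspond to S* in that range.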
The article analyzes relative deprivation as a possible factor of sociopolitical instability during the Arab Spring events using the methods of correlation and multiple regression analysis. In this case, relative deprivation is operationalized in two ways: (a) through the indicator of subjective feeling of happiness on the eve of the Arab Spring, and (b) through the scale of the decrease in the subjective feeling of happiness on the eve of the Arab Spring. It is shown that the change in the level of subjective feeling of happiness between 2009 and 2010 is a powerful, statistically significant predictor of the level of destabilization in Arab countries in 2011. The next most powerful predictor is the mean value of the subjective feeling of happiness in the corresponding country for 2010. At the same time, the fundamental economic indicators that we tested as controls turned out to be extremely weak and statistically insignificant predictors of the level of sociopolitical instability in the Arab countries in 2011.