Интеллектуальный анализ текстов в социальных науках
Throughout most of their history, sociologists have sought to study unstructured organic texts: newspaper materials, diaries, memoirs, letters, documents, and, more recently, messages, publications and other texts on various online platforms. This article discusses how modern techniques of text mining can improve classical sociological approaches to the analysis of this type of data. The article is structured according to the following plan. First, examples of classical quantitative content analysis and its limitations are discussed that could be solved with the help of text mining. Then I discuss how text mining is applied in contemporary social science research with topic modeling and text classification. Finally, I conclude with a discussion of some of the current approaches to text analysis using deep learning, as well as theoretical issues related to the application of text mining
This is the first part of a large survey paper in which we analyze recent literature on Formal Concept Analysis (FCA) and some closely related disciplines using FCA. We collected 1072 papers published between 2003 and 2011 mentioning terms related to Formal Concept Analysis in the title, abstract and keywords. We developed a knowledge browsing environment to support our literature analysis process. We use the visualization capabilities of FCA to explore the literature, to discover and conceptually represent the main research topics in the FCA community. In this first part, we zoom in on and give an extensive overview of the papers published between 2003 and 2011 on developing FCA-based methods for knowledge processing. We also give an overview of the literature on FCA extensions such as pattern structures, logical concept analysis, relational concept analysis, power context families, fuzzy FCA, rough FCA, temporal and triadic concept analysis and discuss scalability issues.
An important text mining problem is to find, in a large collection of texts, documents related to specific topics and then discern further structure among the found texts. This problem is especially important for social sciences, where the purpose is to find the most representative documents for subsequent qualitative interpretation. To solve this problem, we propose an interval semi-supervised LDA approach, in which certain predefined sets of keywords (that define the topics researchers are interested in) are restricted to specific intervals of topic assignments.
Concept discovery is a Knowledge Discovery in Databases (KDD) research field that uses human-centered techniques such as Formal Concept Analysis (FCA), Biclustering, Triclustering, Conceptual Graphs etc. for gaining insight into the underlying conceptual structure of the data. Traditional machine learning techniques are mainly focusing on structured data whereas most data available resides in unstructured, often textual, form. Compared to traditional data mining techniques, human-centered instruments actively engage the domain expert in the discovery process. This volume contains the contributions to CDUD 2011, the International Workshop on Concept Discovery in Unstructured Data (CDUD) held in Moscow. The main goal of this workshop was to provide a forum for researchers and developers of data mining instruments working on issues with analyzing unstructured data. We are proud that we could welcome 13 valuable contributions to this volume. The majority of the accepted papers described innovative research on data discovery in unstructured texts. Authors worked on issues such as transforming unstructured into structured information by amongst others extracting keywords and opinion words from texts with Natural Language Processing methods. Multiple authors who participated in the workshop used methods from the conceptual structures field including Formal Concept Analysis and Conceptual Graphs. Applications include but are not limited to text mining police reports, sociological definitions, movie reviews, etc.
Formal Concept Analysis (FCA) is a mathematical technique that has been extensively applied to Boolean data in knowledge discovery, information retrieval, web mining, etc. applications. During the past years, the research on extending FCA theory to cope with imprecise and incomplete information made significant progress. In this paper, we give a systematic overview of the more than 120 papers published between 2003 and 2011 on FCA with fuzzy attributes and rough FCA. We applied traditional FCA as a text-mining instrument to 1072 papers mentioning FCA in the abstract. These papers were formatted in pdf files and using a thesaurus with terms referring to research topics, we transformed them into concept lattices. These lattices were used to analyze and explore the most prominent research topics within the FCA with fuzzy attributes and rough FCA research communities. FCA turned out to be an ideal metatechnique for representing large volumes of unstructured texts.
Concept Relation Discovery and Innovation Enabling Technology (CORDIET), is a toolbox for gaining new knowledge from unstructured text data. At the core of CORDIET is the C-K theory which captures the essential elements of innovation. The tool uses Formal Concept Analysis (FCA), Emergent Self Organizing Maps (ESOM) and Hidden Markov Models (HMM) as main artifacts in the analysis process. The user can define temporal, text mining and compound attributes. The text mining attributes are used to analyze the unstructured text in documents, the temporal attributes use these document’s timestamps for analysis. The compound attributes are XML rules based on text mining and temporal attributes. The user can cluster objects with object-cluster rules and can chop the data in pieces with segmentation rules. The artifacts are optimized for efficient data analysis; object labels in the FCA lattice and ESOM map contain an URL on which the user can click to open the selected document.
An important text mining problem is to find, in a large collection of texts, documents related to specic topics and then discern further structure among the found texts. This problem is especially important for social sciences, where the purpose is to nd the most representative documents for subsequent qualitative interpretation. To solve this problem, we propose an interval semi-supervised LDA approach, in which certain predened sets of keywords (that dene the topics researchers are interested in) are restricted to specic intervals of topic assignments. We present a case study on a Russian LiveJournal dataset aimed at ethnicity discourse analysis.
Formal Concept Analysis (FCA) is an unsupervised clustering technique and many scientific papers are devoted to applying FCA in Information Retrieval (IR) research. We collected 103 papers published between 2003-2009 which mention FCA and information retrieval in the abstract, title or keywords. Using a prototype of our FCA-based toolset CORDIET, we converted the pdf-files containing the papers to plain text, indexed them with Lucene using a thesaurus containing terms related to FCA research and then created the concept lattice shown in this paper. We visualized, analyzed and explored the literature with concept lattices and discovered multiple interesting research streams in IR of which we give an extensive overview. The core contributions of this paper are the innovative application of FCA to the text mining of scientific papers and the survey of the FCA-based IR research.
Several approaches to the concept of fatherhood present in Western sociological tradition are analyzed and compared: biological determinism, social constructivism and biosocial theory. The problematics of fatherhood and men’s parental practices is marginalized in modern Russian social research devoted to family and this fact makes the traditional inequality in family relations, when the father’s role is considered secondary compared to that of mother, even stronger. However, in Western critical men’s studies several stages can be outlined: the development of “sex roles” paradigm (biological determinism), the emergence of the hegemonic masculinity concept, inter-disciplinary stage (biosocial theory). According to the approach of biological determinism, the role of a father is that of the patriarch, he continues the family line and serves as a model for his ascendants. Social constructivism looks into man’s functions in the family from the point of view of masculine pressure and establishing hegemony over a woman and children. Biosocial theory aims to unite the biological determinacy of fatherhood with social, cultural and personal context. It is shown that these approaches are directly connected with the level of the society development, marriage and family perceptions, the level of egality of gender order.