Skill-based clustering algorithm for online job advertisements

A. Ternikov

doi:10.18500/1816-9791-2022-22-2-250-265

Известия Саратовского университета. Новая серия. Серия: Математика. Механика. Информатика. 2022. Vol. 22. No. 2. P. 250–265.

Clustering categorical data is one of the challenging problems in data mining. This paper presents a clustering algorithm for job vacancies that uses information about the required skills. First, a procedure for standardizing unstructured textual information is proposed. It includes stages for identifying synonyms and general terms, based on a combination of TF-IDF and 𝑛-gram approaches for translated and transliterated terms. The algorithm is then applied to, and validated on, data obtained from a cross-regional hiring platform. The algorithm includes validation of cluster extraction, using hierarchical cluster analysis and Girvan–Newman coalition search. The resulting number of clusters is verified with internal validity scores, and the clusters form disjoint sets of terms that describe particular job occupation groups in the IT sector. Within the obtained clusters, well-matched and mismatched terms are identified using Silhouette scores. These procedures minimize human involvement in the clustering itself and produce reasonable clusters for subsequent interpretation and analysis. Overall, the paper provides an approach to identifying clusters in categorical data and tests it on a sample of online job advertisements. The approach has high potential for feature engineering tasks in machine learning research and for applied labor market research in economics.
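
As a rough illustration of the last validation step mentioned in the abstract, the sketch below computes a Silhouette score for each skill term and flags the terms with negative scores as mismatched. It is not the author's implementation. The toy vacancies, the cluster labels, the function names, and the choice of Jaccard distance over vacancy co-occurrence are all assumptions made for this example; the abstract does not say which distance measure the paper uses.

```python
from collections import defaultdict


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b|; two empty sets count as distance 0."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def term_occurrences(vacancies):
    """Map each skill term to the set of vacancy ids that mention it."""
    occurrences = defaultdict(set)
    for vacancy_id, skills in vacancies.items():
        for skill in skills:
            occurrences[skill].add(vacancy_id)
    return dict(occurrences)


def silhouette_scores(occurrences, labels):
    """Silhouette score per term: (b - a) / max(a, b).

    a is the mean distance to the other terms in the same cluster,
    b is the smallest mean distance to the terms of any other cluster.
    Terms alone in their cluster get 0.0 by the usual convention.
    """
    scores = {}
    for term, occ in occurrences.items():
        by_cluster = defaultdict(list)
        for other, other_occ in occurrences.items():
            if other != term:
                by_cluster[labels[other]].append(jaccard_distance(occ, other_occ))
        own = by_cluster.pop(labels[term], [])
        if not own or not by_cluster:
            scores[term] = 0.0
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in by_cluster.values())
        scores[term] = (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return scores


# Hypothetical toy data: vacancy id -> required skills (assumption).
vacancies = {
    1: {"python", "sql", "pandas"},
    2: {"python", "pandas"},
    3: {"python", "sql"},
    4: {"html", "css", "javascript"},
    5: {"html", "css"},
    6: {"javascript", "html"},
}
# Hypothetical cluster assignment; "css" is deliberately put in the wrong cluster.
labels = {
    "python": "data", "sql": "data", "pandas": "data", "css": "data",
    "html": "web", "javascript": "web",
}

scores = silhouette_scores(term_occurrences(vacancies), labels)
mismatched = sorted(term for term, score in scores.items() if score < 0)
print(mismatched)
```

On this toy input only "css" receives a negative score, because it co-occurs with the web terms while being labelled as a data term. On real data the distance measure and the threshold for "mismatched" would need to follow the paper itself.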

Research target: Computer Science; Economics and Management

Language: English

Keywords: online job advertisements

Neural network modeling technology and a review of the work of the Perm scientific school of artificial intelligence

Yasnitsky L., Черепанов Ф. М., Богданов К. В., Фундаментальные исследования 2013 No. 1-3 P. 736–740

The paper presents the technology and describes experience in applying neural network mathematical modeling in the work of the Perm scientific school of artificial intelligence: in industry, economics, political science, sociology, medicine, forensics, and other fields. It notes that neural networks have helped uncover new scientific knowledge. ...

Added: November 18, 2013

Simulation of human crowd behavior in extreme situations

Akopov A. S., Beklaryan L., International Journal of Pure and Applied Mathematics 2012 Vol. 79 No. 1 P. 121–138

This work presents an approach to modeling crowd (ensemble) behavior in extreme situations based on agent-based simulation methods. The main feature of the approach is that it takes into account the dynamics of each agent in the ensemble under study. It is important to note, the effect of the full or partial ...

Added: September 26, 2012

Constructing a confidence set of connected stocks in the stock market

Koldanov A. P., Koldanov P., Semenov D., Журнал Новой экономической ассоциации 2021 Vol. 2 No. 50 P. 12–34

The problem of analyzing pairwise connections between stocks in a financial market from observations of stock returns is considered. Such a problem arises in stock market network analysis. It is assumed that the joint distribution of stock returns belongs to the wide class of elliptical distributions. Classical Pearson correlation, Fechner correlation and Kendall correlation are used ...

Added: June 17, 2021

Block by block: a bibliometric analysis of blockchain in real estate

Wang F., Journal of information systems engineering & management 2023 Vol. 8 No. 2 Article 21498

Blockchain technology is a novel and disruptive innovation that has the potential to transform various industries, such as finance, supply chain, and healthcare. However, the application and impact of blockchain technology in real estate remain largely unexplored. This study aims to investigate the characteristics, development, and structure of the research field of blockchain in real ...

Added: November 13, 2023

Proceedings of the International Symposium "Reliability and Quality": in 2 vols.

Penza: ПГУ, 2015

The proceedings include papers from the jubilee 20th International Symposium "Reliability and Quality", held on May 25–31, 2015 in Penza. They address current problems in the theory and practice of improving reliability and quality; the effectiveness of introducing innovative and information technologies in fundamental and applied research, educational and communication systems and environments, economics, and law; methods and ...

Added: May 31, 2015

Modeling market liquidity risk with regard to market depth

Naumenko V., / Высшая школа экономики. Series WP16 "Финансовая инженерия, риск-менеджмент и актуарная наука". 2007. No. 04.

First of all, the concept of liquidity applies both to markets and to individual companies. Liquidity of firms refers to a company's ability to match incoming and outgoing cash flows so that its obligations are repaid on time. A financial institution that has trouble making payments on its obligations on time is exposed to so-called funding liquidity risk (funding ...

Added: February 15, 2013

Importance of Information Technology in Reaching HR Effectiveness: Example of Local and International Banks in Russia

Prosvirkina E. Y., Humanity & Social Sciences Journal 2013 No. 8 (1) P. 35–40

It is often supposed that the usage of technology applications increases effectiveness. The current research is devoted to the analysis of information technology in human resource management of the Russian banking industry and its influence on the organizational performance of banks. The semi-structured interviews with HR Directors of both international and local banks were conducted ...

Added: September 23, 2013

Health Care Information Technologies Innovation in Russia: Comparative Analysis and Measuring

Serova E., Guryeva I., Khvatova T., International Journal of Technology and Human Interaction 2018

In a sphere as socially significant as the healthcare industry, innovation has become vital, especially in areas such as automation of the physician's workplace, creation of a unified electronic medical record, distribution of intelligent decision support systems for medical solutions, application and wide dissemination of new medical technologies, and telemedicine development. The intersection of medicine and ICT ...

Added: March 15, 2018

Applying a neural network trend indicator to the analysis of oil futures prices in 2014

Kryuchkov M., Rusakov S. V., Вестник Ижевского государственного технического университета 2015 No. 2(66) P. 110–112

This paper describes the results of testing a neural network technical trend indicator on the Brent oil price in 2014. The model was tested on three time series, each characterized by its own features. ...

Added: August 31, 2015

Dominant, Weakly Stable, Uncovered Sets: Properties and Extensions

Subochev A., / Высшая школа экономики. Series WP7 "Математические методы анализа решений в экономике, бизнесе и политике". 2008. No. 3.

Twelve sets, proposed as social choice solution concepts, are compared: the core, five versions of the uncovered set, two versions of the minimal weakly stable sets, the uncaptured set, the untrapped set, the minimal undominated set (strong top cycle) and the minimal dominant set (weak top cycle). The main results presented are the following. A ...

Added: December 26, 2012

On the stability of results for aggregation rules

Karabekyan D., Журнал Новой экономической ассоциации 2022 No. 5(57) P. 24–37

Some distortions are possible in the process of preference aggregation. For example, one voter who is pivotal for some preference profile may not read the instructions properly and accidentally submit a wrong preference. We study how different voting rules react to these distortions for three, four and five alternatives using computer modelling. One of the ...

Added: January 17, 2023

Parallel Computing and Control Problems (PACO 2010). Proceedings of the 5th International Conference

M.: ИПУ РАН, 2010

Proceedings of the International Conference "Parallel Computing and Control Problems" PACO'10. ...

Added: March 4, 2013

Analytical technologies in the training of logisticians

Zakhodiakin G., Novikov V. E., Логистика сегодня 2013 No. 1 P. 56–61

The article argues that university students who are future logisticians need to be introduced to the ways analytical systems can be applied to logistics problems. The authors describe in detail how this approach is implemented at HSE University. ...

Added: November 26, 2013

Analytical justification for choosing a source of financing for leasing activities

Kravchenko T. K., Аудит и финансовый анализ 2013 No. 5 P. 352–357

At present, analytical justification of the strategic decisions of leasing companies is quite rare. The main reason is the lack of a market for specialized software products, namely decision support systems (DSS), which could be applied to deliberate decision making. At present the most common are decision support systems that use methods advanced by Thomas ...

Added: November 12, 2013

Fifth International Conference "System Analysis and Information Technologies" SAIT-2013 (September 19–25, 2013, Krasnoyarsk, Russia): Conference proceedings. In 2 vols.

Krasnoyarsk: ИВМ СО РАН, 2013

Proceedings of the Fifth International Conference "System Analysis and Information Technologies" SAIT-2013 (September 19–25, 2013, Krasnoyarsk, Russia): ...

Added: November 18, 2013

Oil and gas: shale, intelligence, IT

Марина Полякова, Директор информационной службы 2013 No. 5 P. 20–25

We analyze the situation in the oil and gas market. New technologies, foreign consumers, and environmental requirements significantly affect the development of the Russian energy sector. Companies need to increase efficiency, reduce costs, and automate operational processes. ...

Added: November 17, 2013

Improving the operation of a manufacturing company's logistics network using simulation modeling

Lychkina N. N., Глазков Д. Н., Хоружевская А. П., Логистика и управление цепями поставок 2014 Vol. 05 No. 64 P. 57–63

The article demonstrates the use of simulation modeling in the AnyLogic environment to optimize the logistics network of an industrial company operating in the industrial gases market. The model details the main processes related to product transportation, handling of customer orders, and distribution of the product to consumers. Stochastic factors were taken into account, such as irregular production, weather conditions that affect the level of demand, the number of ...

Added: November 30, 2014

Proceedings of the 2016 SAI Intelligent Systems Conference (INTELLISYS)

L.: IEEE, 2016

IntelliSys 2016 conference will focus on areas of intelligent systems and artificial intelligence and how it applies to the real world. It is an opportunity for researchers in this field to meet and discuss solutions, scientific results, and methods in solving important problems in this field. Conference Topics include, but are not limited to: Artificial ...

Added: February 25, 2017

Innovations in Science and Education: Annual international scientific-practical conference.

M.: МАКС Пресс, 2016

This volume includes the research articles of the participants of the Annual international scientific-practical conference «Innovations in science and education», which took place in the Educational private institution of higher education «International Jewish Institute of Economics, Finance and Law». The works touch on the wide range of topics that were discussed in the following sections ...

Added: January 12, 2017

Stationary regimes in the Henning model and its modifications

Бекларян Л. А., Makarov V., Машинное обучение и анализ данных 2015 Vol. 10 P. 1385–1395

The Henning model of population behavior and its modifications are considered. The modifications are made to overcome some disadvantages of the Henning model that are connected with the death of the whole population. This subject is important to study, since such phenomena may be observed both in unexplored wilderness and in human civilization. Another one ...

Added: February 21, 2015

Using principal component analysis to analyze supply chain reliability

Kuznetsov V. O., Логистика и управление цепями поставок 2018 No. 4 (87) P. 27–33

One of the options for a more flexible approach to analyzing the reliability of supply chains is principal component analysis (PCA). With a large number of variables describing a supply chain, it is a difficult task to analyze the structure of the variables in two-dimensional space. Within the analysis of the variables dependencies PCA allows to ...

Added: November 29, 2018

Analysis of supply chain operation in retail chain companies using a modified Boston matrix

Придворова Е. Э., Sorsunova L. A., Логистика и управление цепями поставок 2009 No. 5(34) P. 12–20

The article examines a comparative analysis of periods to detect trends in sales volume and structure across different product categories and suppliers, with a view to possible adjustments in how supply chains are organized. Retail sales in retail chain companies are analyzed using BI (Business Intelligence) information systems and OLAP (On-Line Analytical Processing) technologies. The article uses ...

Added: February 11, 2013

Knowledge management system in the strategic management of a university

Dneprovskaya N., Шевцова И. В., Бизнес-информатика 2023 Vol. 17 No. 2 P. 20–40

The purpose of this study is a conceptual description of the implementation of knowledge management systems (KMS) as a mechanism for universities’ strategic development. Knowledge management (KM) practice from around the world proved the positive influence of KMS on productivity of educational institutions. The theoretical provisions and concept for KMS are determined based on an ...

Added: August 2, 2023

Fraud prevention system as a component of the credit pipeline

Levin V., Козлов Д. Н., Банковское кредитование 2013 Vol. 48 No. 2 P. 15–25

The system for preventing fraud (internal and external) in consumer lending aims to detect distortion of personal data by the customers themselves and/or with the involvement of parties outside the bank (organized criminal groups, "black (gray)" brokers), as well as to detect internal fraud involving bank employees. The emphasis is on fraud indicators built on a fuzzy matching algorithm. The system provides capabilities ...

Added: November 24, 2013