In the domain of web security, websites strive to protect themselves from data gathering performed by automated programs called bots. Crawler traps are an effective defense against such programs: by dynamically generating similar pages or random content, they feed fake information to the bot, wasting its time and resources. To date, no available bot is able to detect the presence of a crawler trap. Our aim was to find a generic solution for escaping any type of crawler trap. Since the random generation is potentially endless, the only way to perform crawler trap detection is on the fly. Using machine learning, it is possible to compare datasets of webpages extracted from regular websites with those generated by crawler traps. Since machine learning requires distances, we designed our system using information theory. We compared widely used distances with a new one designed to take heterogeneous data into account: two pages do not necessarily share the same words, and it is operationally impossible to know all possible words in advance. Our new distance compares two webpages, and the results show that it is more accurate than the other distances tested. By extension, our distance has a much larger potential range of application than crawler trap detection alone. This opens many new possibilities in the scope of data classification and data mining.
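The abstract does not give the formula of the authors' new distance. As a minimal sketch of the general idea it addresses, here is an information-theoretic comparison of two pages over the union of their vocabularies (the Jensen-Shannon distance, a standard measure, not the authors' distance): absent words simply get probability zero, so the two pages need not share the same words.

```python
import math
from collections import Counter

def word_distribution(text):
    """Empirical word-frequency distribution of a page's text."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def js_distance(text_a, text_b):
    """Jensen-Shannon distance between two pages' word distributions.

    Works over the union of both vocabularies, so heterogeneous pages
    can still be compared. Returns a value in [0, 1] (base-2 logs);
    0 means identical distributions, 1 means disjoint vocabularies.
    """
    p = word_distribution(text_a)
    q = word_distribution(text_b)
    vocab = set(p) | set(q)
    # Mixture distribution, defined on the union vocabulary.
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}

    def kl(d):
        # Kullback-Leibler divergence D(d || m); 0 * log(0) taken as 0.
        return sum(d[w] * math.log2(d[w] / m[w]) for w in d if d[w] > 0)

    return math.sqrt(0.5 * kl(p) + 0.5 * kl(q))
```

Because the mixture `m` covers every word seen in either page, the divergence is always finite, which is what makes this family of distances workable on open vocabularies.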
In 2018, the authors of this article developed a cryptographic mechanism, which was adopted in 2019 as the standardization recommendations R 1323565.1.028-2019, “Cryptographic mechanisms for secure interaction of control and measuring devices”, by the Technical Committee “Cryptographic Information Protection”. These recommendations describe a family of cryptographic protocols designed to produce key information and to exchange encrypted information with integrity protection. The article describes the cryptographic mechanisms used in the protocol, their differences from existing solutions, the peculiarities of the key system, and the methods of authenticating the participants in a secure interaction. The results of the software implementation developed by the authors are also presented.
In this paper, we present the results of a deep analysis of the TOR routing protocol from a statistical and combinatorial point of view. We have exhaustively modelled all possible routes of this famous anonymity network, taking different parameters into account, using only the data provided by the TOR Foundation. We have then confronted our theoretical model with reality in the field. To do so, we generated thousands of routes on the actual TOR network and compared the results obtained with those predicted by the theory. A final step of combinatorial analysis enabled us to identify critical subsets of onion routers (ORs) on which 33%, 50%, 66% and 75% of TOR traffic respectively depends. We have also managed to extract most of the TOR relay bridges, which are non-public nodes managed by the TOR Foundation. The same results as for the ORs were observed.
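The abstract does not spell out how the critical subsets are computed. Under the simplifying assumption (ours, not the paper's) that a relay's share of traffic is proportional to its consensus bandwidth weight, such a subset can be sketched greedily: take relays heaviest first until the cumulative weight reaches the target fraction.

```python
def critical_subset(relay_weights, fraction):
    """Greedy sketch of a critical relay subset.

    relay_weights: dict mapping relay fingerprint -> bandwidth weight
    (hypothetical data; in practice it would come from the consensus).
    fraction: target share of total traffic, e.g. 0.5 for 50%.
    Returns the fingerprints of the smallest prefix of relays, heaviest
    first, whose cumulative weight reaches that share.
    """
    total = sum(relay_weights.values())
    subset, covered = [], 0.0
    for fp, w in sorted(relay_weights.items(), key=lambda kv: -kv[1]):
        if covered >= fraction * total:
            break
        subset.append(fp)
        covered += w
    return subset
```

For a toy network `{"A": 50, "B": 30, "C": 15, "D": 5}`, half of the traffic already depends on `["A"]` alone, which illustrates how small such critical subsets can be when weights are skewed.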
The problem of protecting printed documents against leakage remains relevant. Existing security tools allow us to protect electronic text documents but are ineffective at protecting their printed versions. This research presents a marking approach for electronic text documents that is invariant to the print-and-scan transformation. During marker embedding, the line spacing values of the source text are changed to specified values within the bounds of perceptual invisibility. Watermark extraction is carried out from images containing text and is based on the normal Radon transform and a Gaussian mixture model. This marking approach is robust to various image transformations and distortions. The accuracy of embedded information extraction was more than 0.98 for 200 DPI images and a line spacing change of about 490 micrometers.
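The extraction pipeline (Radon transform plus Gaussian mixture model on scanned images) is beyond an abstract-level sketch, but the embedding principle can be illustrated on its own. A minimal sketch, assuming hypothetical lists of inter-line gaps in pixels and an idealized, distortion-free readout (not the authors' method):

```python
def embed_bits(spacings, bits, delta):
    """Embed bits by nudging inter-line gaps of a text document.

    spacings: nominal inter-line gaps in pixels (one per line pair).
    Each bit widens (1) or narrows (0) one gap by delta pixels; delta
    is kept small relative to the nominal spacing so the change stays
    perceptually invisible. At 200 DPI, the 490-micrometer change
    mentioned in the abstract is roughly 4 pixels.
    """
    marked = list(spacings)
    for i, bit in enumerate(bits):
        marked[i] += delta if bit else -delta
    return marked

def extract_bits(marked, nominal, n_bits):
    """Idealized readout: compare each gap to the nominal spacing.

    The real scheme must first estimate line positions from a scanned
    image (hence the Radon transform and the Gaussian mixture model);
    here the gaps are assumed to be known exactly.
    """
    return [1 if marked[i] > nominal else 0 for i in range(n_bits)]
```

Round-tripping a bit string through `embed_bits` and `extract_bits` recovers it exactly in this noise-free setting; the contribution of the paper lies in surviving the print-and-scan channel, which this sketch deliberately omits.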