Detection Method for Distributed Web-Crawlers: A Long-Tail Threshold Model
Joint Authors
Im, Eul Gyu
Ro, Inwoo
Han, Joong Soo
Source
Security and Communication Networks
Issue
Vol. 2018, Issue 2018 (31 Dec. 2018), pp.1-7, 7 p.
Publisher
Hindawi Publishing Corporation
Publication Date
2018-12-04
Country of Publication
Egypt
No. of Pages
7
Main Subjects
Information Technology and Computer Science
Abstract EN
This paper proposes an advanced countermeasure against distributed web-crawlers.
We investigated other methods for crawler detection and analyzed how distributed crawlers can bypass these methods.
Our method detects distributed crawlers by exploiting the property that web traffic follows a power-law distribution.
When web pages are sorted by the number of requests, most requests are concentrated on the most frequently requested pages.
In addition, there will be some web pages that normal users do not generally request.
Crawlers, however, will request these pages, because their algorithms parse each web page and iteratively request every item they encounter.
Therefore, we can assume that if some IP addresses are frequently used to request web pages located in the long-tail area of a power-law distribution graph, those IP addresses can be classified as crawler nodes.
Experimental results with NASA web traffic data showed that our method identified distributed crawlers with 0.0275% false positives, whereas a conventional frequency-based detection method produced 2.882% false positives at the same access threshold.
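The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `head_share` cutoff that separates the head from the long tail, and the `tail_threshold` on long-tail requests per IP address are all assumptions made for illustration, and the paper may define the long-tail boundary and the threshold differently.

```python
from collections import Counter, defaultdict


def long_tail_pages(requests, head_share=0.8):
    """Return the pages that fall outside the most-requested 'head'.

    requests: iterable of (ip, page) pairs.
    head_share: assumed fraction of all requests absorbed by the head;
    every page ranked after that point is treated as long-tail.
    """
    counts = Counter(page for _ip, page in requests)
    total = sum(counts.values())
    tail = set()
    covered = 0
    # Walk pages from most to least requested, accumulating request volume.
    for page, n in counts.most_common():
        if covered >= head_share * total:
            tail.add(page)
        covered += n
    return tail


def detect_crawlers(requests, head_share=0.8, tail_threshold=3):
    """Flag IP addresses that request long-tail pages at least tail_threshold times.

    Both parameters are hypothetical tuning knobs, not values from the paper.
    """
    requests = list(requests)
    tail = long_tail_pages(requests, head_share)
    tail_hits = defaultdict(int)
    for ip, page in requests:
        if page in tail:
            tail_hits[ip] += 1
    return {ip for ip, n in tail_hits.items() if n >= tail_threshold}
```

Calling `detect_crawlers(log)` on a list of `(ip, page)` pairs returns the set of suspected crawler addresses. Because the count is kept per IP address on long-tail pages only, a crawler node that stays below an overall request-rate limit can still be flagged, while a normal user who occasionally lands on a rarely requested page is not.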
American Psychological Association (APA)
Ro, Inwoo & Han, Joong Soo & Im, Eul Gyu. 2018. Detection Method for Distributed Web-Crawlers: A Long-Tail Threshold Model. Security and Communication Networks, Vol. 2018, no. 2018, pp.1-7.
https://search.emarefa.net/detail/BIM-1214488
Modern Language Association (MLA)
Ro, Inwoo…[et al.]. Detection Method for Distributed Web-Crawlers: A Long-Tail Threshold Model. Security and Communication Networks No. 2018 (2018), pp.1-7.
https://search.emarefa.net/detail/BIM-1214488
American Medical Association (AMA)
Ro, Inwoo & Han, Joong Soo & Im, Eul Gyu. Detection Method for Distributed Web-Crawlers: A Long-Tail Threshold Model. Security and Communication Networks. 2018. Vol. 2018, no. 2018, pp.1-7.
https://search.emarefa.net/detail/BIM-1214488
Data Type
Journal Articles
Language
English
Notes
Includes bibliographical references
Record ID
BIM-1214488