EasySpider: A No-Code Visual System for Crawling the Web

2023-04-30 · ACM The Web Conference 2023

Naibo Wang, Wenjie Feng, Jianwei Yin, See-Kiong Ng

Abstract

The web is a treasure trove of data. Computer scientists increasingly use it to build large machine learning models, and non-computer scientists use it for social studies or marketing analyses. Web crawling is therefore an essential research tool for both computational and non-computational scientists. However, most existing web crawler frameworks and software products either require professional coding skills and lack an easy-to-use graphical user interface, or are expensive and limited in features. They are thus unfriendly to newcomers and inconvenient for complicated web-crawling tasks. In this paper, we present EasySpider, an easy-to-use visual web crawler system for designing and executing web-crawling tasks without coding. The workflow of a new crawling task can be visually programmed by following EasySpider's visual wizard on the target webpages through an intuitive point-and-click interface. The generated crawler task can then be invoked locally or as a web service. EasySpider is cross-platform and flexible enough to adapt to different web resources. It also supports advanced configuration for complicated tasks, as well as extensions. The whole system is open source and freely available on GitHub, and this transparency avoids possible privacy leakage.
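To make the idea of a "generated crawler task" concrete, the sketch below shows one way a point-and-click recording could be stored as a declarative list of steps and then replayed by a small interpreter. Everything here is an assumption for illustration: the step format (`open`, `loop`), the field names, and the helper `run_task` are hypothetical and are not EasySpider's actual task schema or API. To keep the sketch runnable offline, it reads a hard-coded, well-formed XHTML page instead of fetching one, and it uses the limited XPath subset supported by Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Hypothetical task format: an ordered list of steps, the kind of structure a
# point-and-click wizard could record. This is NOT EasySpider's actual schema.
TASK = [
    {"action": "open", "page": "listing"},
    {
        "action": "loop",
        "xpath": ".//li[@class='item']",       # repeated element the user clicked
        "fields": {                            # relative paths for each column
            "title": "./a",
            "price": "./span[@class='price']",
        },
    },
]

# Stand-in for fetched pages so the sketch runs offline (well-formed XHTML).
PAGES = {
    "listing": """<html><body><ul>
      <li class="item"><a href="/a">Widget</a><span class="price">9.99</span></li>
      <li class="item"><a href="/b">Gadget</a><span class="price">4.50</span></li>
    </ul></body></html>""",
}


def run_task(task, pages):
    """Interpret the step list and return one dict per extracted record."""
    root = None
    records = []
    for step in task:
        if step["action"] == "open":
            # "Open" a page: here we just parse the stored markup.
            root = ET.fromstring(pages[step["page"]])
        elif step["action"] == "loop":
            # Visit every element matching the loop path and pull each field.
            for node in root.findall(step["xpath"]):
                record = {}
                for name, rel_path in step["fields"].items():
                    child = node.find(rel_path)
                    record[name] = child.text if child is not None else None
                records.append(record)
        else:
            raise ValueError(f"unknown action: {step['action']}")
    return records


if __name__ == "__main__":
    for row in run_task(TASK, PAGES):
        print(row)
```

Keeping the task as plain data rather than code is what allows the same recording to be replayed by a local runner or handed to a service; a real crawler would replace the `PAGES` lookup with a browser or HTTP fetch.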
