Selective Web Scraping

Over the last few weeks, my research has led me to web scraping tools.

I prefer low-code or no-code web scrapers and Chrome extensions. I’m not interested in learning Python beyond its rudiments.

What I noticed is that web scraping tools, and the people behind them, like to download huge volumes of data. This creates a new, bigger problem: dirty data.

Dirty data requires extensive data scrubbing. Scrubbing can outweigh the benefits of scraping.

Save time acquiring data; lose more time cleaning and sorting it.

I prefer clean, organized, uniform data in all that I do.

These two points, the cost of scrubbing and my preference for tidy data, led me to scraping tools that favor selection and control rather than collecting masses of unorganized data.

A slower scrape gives me control of the extraction and keeps the data organized, which minimizes cleaning and sorting afterward.

Two browser extensions I’ve found useful for this approach:

  1. Link Klipper, a Chrome extension, to extract a href (link) data
  2. Simple Mass Downloader, a Firefox extension, to extract files via selected links
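
For anyone curious what selective link extraction looks like in code, here is a minimal sketch using only Python's standard library. It is not how Link Klipper or Simple Mass Downloader work internally. The class name, function name, and sample HTML are all made up for illustration. The idea is simply to keep only the links that match a chosen file type instead of grabbing everything on the page.

```python
from html.parser import HTMLParser


class SelectiveLinkParser(HTMLParser):
    """Collect href values from <a> tags, keeping only the wanted file types.

    Hypothetical helper for illustration; not part of any extension.
    """

    def __init__(self, wanted_suffixes):
        super().__init__()
        # Lowercase the suffixes so ".PDF" and ".pdf" are treated the same.
        self.wanted_suffixes = tuple(s.lower() for s in wanted_suffixes)
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Keep the link only if it ends with one of the chosen suffixes.
            if name == "href" and value and value.lower().endswith(self.wanted_suffixes):
                self.links.append(value)


def select_links(html, wanted_suffixes):
    """Return the hrefs in `html` that end with one of `wanted_suffixes`."""
    parser = SelectiveLinkParser(wanted_suffixes)
    parser.feed(html)
    return parser.links


# Made-up sample page: four links, only two of which are PDFs.
SAMPLE_HTML = """
<a href="/reports/2021.pdf">2021 report</a>
<a href="/about">About</a>
<a href="/reports/2022.PDF">2022 report</a>
<a href="/data/table.csv">Table</a>
"""

print(select_links(SAMPLE_HTML, [".pdf"]))
```

Filtering at extraction time means the result is already a short, uniform list, so there is little left to scrub afterward.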
