Jason Frasca

Selective Web Scraping

Over the last few weeks, my research has led me to web scraping tools.

I prefer web scrapers that are low-code, no-code, or packaged as Chrome Extensions. I’m not interested in learning Python beyond its rudiments.

What I noticed is that web scraping tools, and the people behind them, like to download huge volumes of data. This creates a new, bigger problem: dirty data.

Dirty data requires extensive data scrubbing. Scrubbing can outweigh the benefits of scraping.

Save time acquiring data; lose more time cleaning and sorting it.

I prefer clean, organized, uniform data in all that I do.

These two points led me to scraping tools that privilege selection and control over masses of unorganized data.

A slower scrape gives me control of the extraction and keeps the data organized from the start, which minimizes cleaning afterward.

Two browser extensions I’ve found useful for this approach:

  1. Link Klipper - a Chrome Extension that extracts the links (a href data) on a page
  2. Simple Mass Downloader - a Firefox Extension that downloads files from the links I select
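
For anyone comfortable with rudimentary Python, the same select-first idea can be sketched in a few lines using only the standard library. This is a rough illustration, not how either extension works internally. The names `LinkCollector` and `select_links`, the sample HTML, and the example.com address are all made up for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def select_links(html, base_url, suffixes):
    """Return absolute URLs from html that end with one of the given suffixes.

    Only the links that match are kept, so nothing needs scrubbing afterward.
    """
    parser = LinkCollector()
    parser.feed(html)
    selected = []
    for href in parser.links:
        url = urljoin(base_url, href)  # resolve relative links against the page URL
        if url.lower().endswith(tuple(suffixes)) and url not in selected:
            selected.append(url)
    return selected


# Hypothetical page snippet: two files worth keeping, one link to ignore.
page = (
    '<a href="/files/report.pdf">Report</a>'
    '<a href="/about">About</a>'
    '<a href="data.csv">Data</a>'
)
wanted = select_links(page, "https://example.com/docs/", (".pdf", ".csv"))
for url in wanted:
    print(url)
```

The filter runs before anything is saved, so the output is already limited to the file types chosen up front.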




© 2020 Jason M. Frasca