Posts tagged "scraping"

Tracking HackerNews history in git

#hackernews #scraping #automation 

This is based on Simon Willison's git scraping idea (https://simonwillison.net/2020/Oct/9/git-scraping/) and his https://github.com/simonw/ca-fires-history repository.

The gist:

  • every 20 minutes
    • scrape Hacker News front page items
    • save them in hn.json
    • commit and push
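
The loop above can be sketched in Node.js. This is a minimal sketch under stated assumptions, not the code from the repository: it assumes Node.js 18+ (for the global `fetch`), reads the front page from the public Hacker News Firebase API instead of scraping HTML, and expects to run inside a git checkout with a configured remote. The function names (`fetchFrontPage`, `toSnapshot`, `commitAndPush`) are hypothetical.

```javascript
const fs = require('node:fs');
const { execFileSync } = require('node:child_process');

// Assumption: the public Hacker News API, not an HTML scrape of the front page.
const API = 'https://hacker-news.firebaseio.com/v0';

// Fetch the first `limit` top stories (30 is roughly one front page).
async function fetchFrontPage(limit = 30) {
  const ids = await (await fetch(`${API}/topstories.json`)).json();
  return Promise.all(
    ids.slice(0, limit).map(async (id) => (await fetch(`${API}/item/${id}.json`)).json())
  );
}

// Keep a few fields in a fixed key order so git diffs stay readable.
function toSnapshot(items) {
  const rows = items.map(({ id, title, url, score, by }) => ({ id, title, url, score, by }));
  return JSON.stringify(rows, null, 2) + '\n';
}

// Commit and push only when the staged file actually changed.
function commitAndPush(file) {
  execFileSync('git', ['add', file]);
  try {
    // Exits with a non-zero status (and therefore throws) when there are staged changes.
    execFileSync('git', ['diff', '--cached', '--quiet']);
    return false;
  } catch {
    execFileSync('git', ['commit', '-m', `Update ${file}`]);
    execFileSync('git', ['push']);
    return true;
  }
}

async function main() {
  const items = await fetchFrontPage();
  fs.writeFileSync('hn.json', toSnapshot(items));
  commitAndPush('hn.json');
}

module.exports = { fetchFrontPage, toSnapshot, commitAndPush, main };
```

The 20-minute schedule is left to the environment. Saved as a hypothetical `scrape.js`, the script could be triggered from cron or a CI scheduler with `node -e "require('./scrape.js').main()"`.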

Find the git repository at christian-fei/hn-history.


Read more

Full list of Chromium Puppeteer flags and command line switches

#puppeteer #scraping 

After quite a bit of research, I found the full list of Chromium Command Line Switches.


Read more

How to solve Puppeteer Chrome Error ERR_INVALID_ARGUMENT

#puppeteer #javascript #scraping 

I encountered this error while setting up a Puppeteer instance with a proxy.


Read more

How to connect puppeteer to a Proxy

#puppeteer #javascript #scraping 

In a previous post I tried to explain how to troubleshoot an issue when connecting to a proxy with Puppeteer, investigating API documentation, Chromium flags and all that funny jazz.
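
As a rough sketch of what connecting Puppeteer to a proxy can look like: Chromium takes the proxy address through the `--proxy-server` switch, and, to my knowledge, credentials are supplied separately through `page.authenticate()` rather than embedded in that switch. The proxy address and credentials are placeholders, and `splitProxy` and `openThroughProxy` are hypothetical helpers, not part of Puppeteer.

```javascript
// Hypothetical helper: split a proxy URL into a Chromium flag and credentials.
function splitProxy(proxyUrl) {
  const { protocol, host, username, password } = new URL(proxyUrl);
  return {
    // The switch only carries scheme, host and port.
    flag: `--proxy-server=${protocol}//${host}`,
    credentials: username
      ? { username: decodeURIComponent(username), password: decodeURIComponent(password) }
      : null,
  };
}

// Open `targetUrl` through the proxy and return the page title.
async function openThroughProxy(proxyUrl, targetUrl) {
  const puppeteer = require('puppeteer'); // assumes `npm install puppeteer`
  const { flag, credentials } = splitProxy(proxyUrl);
  const browser = await puppeteer.launch({ args: [flag] });
  try {
    const page = await browser.newPage();
    // Answer the proxy's authentication challenge, if credentials were given.
    if (credentials) await page.authenticate(credentials);
    await page.goto(targetUrl);
    return await page.title();
  } finally {
    await browser.close();
  }
}
```

A call such as `openThroughProxy('http://user:secret@proxy.example.com:8080', 'https://example.com')` would then route the page load through the placeholder proxy.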


Read more

Crawling a web site with browserless, puppeteer and Node.js

Example repository and explanation of a practical crawl with browserless and puppeteer.


Read more

Ultimate web scraping with browserless, puppeteer and Node.js

Browser automation built for enterprises, loved by developers.

browserless.io is a neat service for hosted puppeteer scraping, but there is also the official Docker image for running it locally.

I was amazed when I found out about it 🤯!

Find the whole source code on GitHub at christian-fei/browserless-example!
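
To give a feel for the local setup, the sketch below connects Puppeteer to a running browserless container over WebSocket instead of launching its own Chromium. It assumes a container started with something like `docker run -p 3000:3000 browserless/chrome` and listening on port 3000 (the image name may differ between browserless versions). The `wsEndpoint` and `pageTitle` helpers are illustrative and not taken from the example repository.

```javascript
// Hypothetical helper: build the WebSocket endpoint of a browserless instance.
function wsEndpoint(host = 'localhost', port = 3000) {
  return `ws://${host}:${port}`;
}

// Load `url` in the remote browser and return the page title.
async function pageTitle(url, endpoint = wsEndpoint()) {
  const puppeteer = require('puppeteer'); // assumes `npm install puppeteer`
  // connect() attaches to an already running browser instead of launching one.
  const browser = await puppeteer.connect({ browserWSEndpoint: endpoint });
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle2' });
    return await page.title();
  } finally {
    await browser.close();
  }
}
```

Because only the endpoint changes, the same function could point at a hosted instance by passing a different `endpoint` value.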


Read more
