First of all: this is based on the git scraping idea described at https://simonwillison.net/2020/Oct/9/git-scraping/ and on the example repository https://github.com/simonw/ca-fires-history

The gist:

  • every 20 minutes
    • scrape the Hacker News front page items
    • save them in hn.json
    • commit and push

The git repository is at christian-fei/hn-history

The schedule is handled by the following GitHub Actions workflow file:

name: scrape hn data

on:
  push:
  workflow_dispatch:
  schedule:
    - cron:  '5,25,45 * * * *'

jobs:
  scheduled:
    runs-on: ubuntu-latest
    steps:
    - name: checkout
      uses: actions/[email protected]
    - name: fetch hndata
      run: |-
        npx @christian_fei/hn --json | jq . > hn.json
    - name: Commit and push if it changed
      run: |-
        git config user.name "Automated"
        git config user.email "[email protected]"
        git add -A
        timestamp=$(date -u)
        git commit -m "Latest data: ${timestamp}" || exit 0
        git push
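
The cron expression '5,25,45 * * * *' fires at minutes 5, 25 and 45 of every hour, which is where the 20-minute interval comes from. A quick sanity check in Python (not part of the workflow; the helper name is made up for illustration) suggests the gaps stay at 20 minutes across the hour boundary too:

```python
def gaps_between_runs(minutes, period=60):
    """Return the gap in minutes between consecutive runs within one repeating hour."""
    ordered = sorted(minutes)
    # Pair each run with the next one; the last run wraps around
    # to the first run of the following hour.
    following = ordered[1:] + [ordered[0] + period]
    return [nxt - cur for cur, nxt in zip(ordered, following)]


# Minute field of the cron expression used in the workflow.
cron_minutes = [5, 25, 45]
print(gaps_between_runs(cron_minutes))  # prints [20, 20, 20]
```

GitHub notes that scheduled runs can be delayed under load, so the actual commit times may drift a little from this ideal spacing.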

The data is scraped with @christian_fei/hn using the --json flag.
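
To work with the saved file, it helps to have a rough picture of its shape. The sketch below is an assumption based only on the fields visible in the diff further down: hn.json appears to be a top-level JSON array of objects, where job posts carry just a title and url, and comments can be null. The Item class and parse_items helper are hypothetical names, not part of @christian_fei/hn.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    title: str
    url: str
    # The remaining fields are missing on some entries (e.g. job posts)
    # and "comments" may be null, so they are all optional here.
    upvotes: Optional[int] = None
    author: Optional[str] = None
    comments: Optional[int] = None
    link: Optional[str] = None


def parse_items(text):
    """Parse the JSON array (assumed layout of hn.json) into Item objects."""
    items = []
    for raw in json.loads(text):
        items.append(Item(
            title=raw["title"],
            url=raw["url"],
            upvotes=raw.get("upvotes"),
            author=raw.get("author"),
            comments=raw.get("comments"),
            link=raw.get("link"),
        ))
    return items


# Two entries copied from the diff shown later in this post.
sample = """
[
  {
    "title": "Hooking Up Our Custom OS to a Standard Library",
    "url": "https://blog.stephenmarz.com/2020/10/25/hooking-up-our-custom-os-to-a-standard-library/",
    "upvotes": 22,
    "author": "azhenley",
    "comments": null,
    "link": "https://news.ycombinator.com/item?id=25008953"
  },
  {
    "title": "Double Robotics",
    "url": "https://angel.co/company/double-robotics/jobs/73583-operations-and-supply-chain-manager"
  }
]
"""
items = parse_items(sample)
print(items[0].upvotes, items[0].comments)  # prints 22 None
print(items[1].author)  # prints None
```

Against a checkout of the repository, the same helper could be fed the file contents, for example parse_items(open("hn.json").read()).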

Read more about it here

Git diff FTW

The diff is actually quite readable:

   {
     "title": "Hooking Up Our Custom OS to a Standard Library",
     "url": "https://blog.stephenmarz.com/2020/10/25/hooking-up-our-custom-os-to-a-standard-library/",
-    "upvotes": 18,
+    "upvotes": 22,
     "author": "azhenley",
     "comments": null,
     "link": "https://news.ycombinator.com/item?id=25008953"
   },
+  {
+    "title": "HP ends its customers' lives [redefines 'lifetime' deal]",
+    "url": "https://pluralistic.net/2020/11/06/horrible-products/#inkwars",
+    "upvotes": 143,
+    "author": "samizdis",
+    "comments": 66,
+    "link": "https://news.ycombinator.com/item?id=25008894"
+  },
   {
     "title": "Gron – Make JSON Greppable",
     "url": "https://github.com/tomnomnom/gron",
-    "upvotes": 247,
+    "upvotes": 252,
     "author": "capableweb",
-    "comments": 58,
+    "comments": 59,
     "link": "https://news.ycombinator.com/item?id=25006277"
   },
   {
     "title": "Double Robotics",
     "url": "https://angel.co/company/double-robotics/jobs/73583-operations-and-supply-chain-manager"
   },
-  {
-    "title": "HP ends its customers' lives [redefines 'lifetime' deal]",
-    "url": "https://pluralistic.net/2020/11/06/horrible-products/#inkwars",
-    "upvotes": 118,
-    "author": "samizdis",
-    "comments": 47,
-    "link": "https://news.ycombinator.com/item?id=25008894"
-  },
   {
     "title": "Ask HN: As a person, what can I do to improve a city?",
     "url": "item?id=25007697",
-    "upvotes": 231,
[diff] Changes to 'hn.json' - line 74 of 346 (34%)
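
One thing the line-based diff cannot show directly is that an item which moved on the front page (like the HP story above) is the same item: it appears once as a removal and once as an addition. A minimal sketch of a keyed comparison between two snapshots, assuming each snapshot is the parsed list from hn.json and that the "link" field identifies an item (with "url" as a fallback for job posts, which have no link); changed_counts is a hypothetical helper, not something the repository provides:

```python
def changed_counts(old_items, new_items, fields=("upvotes", "comments")):
    """Map item title -> {field: (old, new)} for items present in both snapshots."""
    def key(item):
        # Assumption: the HN discussion link identifies an item;
        # job posts have no link, so fall back to the url.
        return item.get("link") or item["url"]

    old_by_key = {key(item): item for item in old_items}
    changes = {}
    for item in new_items:
        previous = old_by_key.get(key(item))
        if previous is None:
            continue  # item was not in the older snapshot
        delta = {
            field: (previous.get(field), item.get(field))
            for field in fields
            if previous.get(field) != item.get(field)
        }
        if delta:
            changes[item["title"]] = delta
    return changes


# Values taken from the diff above; other fields omitted for brevity.
old = [
    {"title": "Gron – Make JSON Greppable", "upvotes": 247, "comments": 58,
     "url": "https://github.com/tomnomnom/gron",
     "link": "https://news.ycombinator.com/item?id=25006277"},
    {"title": "HP ends its customers' lives [redefines 'lifetime' deal]",
     "upvotes": 118, "comments": 47,
     "url": "https://pluralistic.net/2020/11/06/horrible-products/#inkwars",
     "link": "https://news.ycombinator.com/item?id=25008894"},
]
new = [
    {"title": "HP ends its customers' lives [redefines 'lifetime' deal]",
     "upvotes": 143, "comments": 66,
     "url": "https://pluralistic.net/2020/11/06/horrible-products/#inkwars",
     "link": "https://news.ycombinator.com/item?id=25008894"},
    {"title": "Gron – Make JSON Greppable", "upvotes": 252, "comments": 59,
     "url": "https://github.com/tomnomnom/gron",
     "link": "https://news.ycombinator.com/item?id=25006277"},
]

for title, delta in changed_counts(old, new).items():
    print(title, delta)
```

The two snapshots could come from git itself, for instance by parsing the output of git show HEAD~1:hn.json and the current hn.json with json.loads.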