In a previous post I tried to explain how to troubleshoot an issue when connecting to a Proxy with Puppeteer, digging through API documentation, Chromium flags and all that funny jazz…
This is the succinct version of how to use a Proxy with Puppeteer.
using get-free-https-proxy
In this example I am going to use get-free-https-proxy, a small module that returns a list of free HTTPS proxies found on sslproxies.org.
The same applies, of course, if you already have a Proxy: you would simply pass --proxy-server=YOUR_IP:YOUR_PORT (omit the port if not needed).
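If you bring your own Proxy, the only moving part is the string handed to Chromium. Here is a minimal sketch of building it, assuming you have the host and (optionally) the port at hand. `proxyServerArg` is a hypothetical helper name of mine, not part of Puppeteer, and the IP is a documentation placeholder.

```javascript
// Hypothetical helper: builds the --proxy-server flag from a host and an optional port.
function proxyServerArg (host, port) {
  if (!host) throw new Error('a proxy host is required')
  // Append the port only when one was given
  return port ? `--proxy-server=${host}:${port}` : `--proxy-server=${host}`
}

console.log(proxyServerArg('203.0.113.10', 8080)) // --proxy-server=203.0.113.10:8080
console.log(proxyServerArg('203.0.113.10')) // --proxy-server=203.0.113.10
```

The returned string goes straight into the args array passed to puppeteer.launch().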
Full source code below:
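const puppeteer = require('puppeteer')
const getFreeProxies = require('get-free-https-proxy')

;(async () => {
  // Take the first proxy from the list scraped from sslproxies.org
  const [proxy1] = await getFreeProxies()
  console.log('using proxy', proxy1)

  const browser = await puppeteer.launch({
    args: [
      '--no-sandbox',
      `--proxy-server=${proxy1.host}:${proxy1.port}`
    ],
    headless: false,
    ignoreHTTPSErrors: true
  })

  const page = await browser.newPage()
  // ipinfo.io reports the IP the request came from, which should be the proxy's
  await page.goto('https://ipinfo.io/json')
  const content = await page.content()
  // The JSON is wrapped in HTML by the browser, so cut out the object itself
  const serialized = content.substring(content.indexOf('{'), content.indexOf('}') + 1)
  console.log(JSON.parse(serialized))

  // Keep the window open for 5 seconds. page.waitFor is deprecated (and missing
  // in recent Puppeteer releases), so a plain timer is used instead.
  await new Promise(resolve => setTimeout(resolve, 5000))
  await page.close()
  await browser.close()
  process.exit(0)
})()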
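One thing to watch in the listing above: cutting the content at the first `}` works for a flat JSON object, but would truncate a response with nested objects. A slightly more forgiving sketch is below; `extractJson` is a hypothetical helper of mine, the sample HTML is made up, and it assumes the page contains a single JSON object with no HTML-escaped characters in it.

```javascript
// Hypothetical helper: pulls one JSON object out of the HTML returned by page.content().
function extractJson (content) {
  const start = content.indexOf('{')
  // lastIndexOf keeps nested objects intact, unlike indexOf('}')
  const end = content.lastIndexOf('}')
  if (start === -1 || end <= start) {
    throw new Error('no JSON object found in page content')
  }
  return JSON.parse(content.substring(start, end + 1))
}

// Made-up sample of what a browser-wrapped JSON response can look like
const html = '<html><body><pre>{"ip":"203.0.113.10","geo":{"city":"Example"}}</pre></body></html>'
console.log(extractJson(html).geo.city) // Example
```

In the Puppeteer script it would be called as extractJson(await page.content()).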
using mega-scraper
mega-scraper is an OSS node.js library built to save some time while scraping any webpage. It can be used as a CLI and as a node.js module with a clean API, based on puppeteer of course.
Install it via npm i mega-scraper.
Use it as follows to create a browser instance with a configured proxy.
By default, a free random proxy is used, scraped from sslproxies.org (using get-free-https-proxy).
You can supply your own Proxy address by passing it as an option in the format host:port:
const {browser: {createBrowser}} = require('mega-scraper')

;(async () => {
  const browser = await createBrowser({
    proxy: true, // or `YOUR_PROXY_IP:YOUR_PROXY_PORT`,
    // more options!!
    // incognito: true,
    // headless: true,
    // cookie: '',
    // stylesheets: false,
    // images: false,
    // slowMo: true,
    // userAgent: ''
    // ...
  })

  const page = await browser.newPage()
  // ipinfo.io reports the IP the request came from, which should be the proxy's
  await page.goto('https://ipinfo.io/json')
  const content = await page.content()
  const serialized = content.substring(content.indexOf('{'), content.indexOf('}') + 1)
  console.log(JSON.parse(serialized))

  // Keep the window open for 5 seconds. page.waitFor is deprecated (and missing
  // in recent Puppeteer releases), so a plain timer is used instead.
  await new Promise(resolve => setTimeout(resolve, 5000))
  await page.close()
  await browser.close()
  process.exit(0)
})()
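Since the proxy option is a plain host:port string, a typo in it is easy to miss until the browser fails to connect. A small sanity check before passing it along could look like the sketch below. `parseProxyAddress` is a hypothetical helper of mine and not part of mega-scraper; it assumes a hostname or IPv4 address followed by a numeric port.

```javascript
// Hypothetical helper: validates a "host:port" string and splits it into its parts.
function parseProxyAddress (address) {
  const separator = address.lastIndexOf(':')
  // A separator at index 0 (or none at all) means the host is missing
  if (separator <= 0) {
    throw new Error(`expected host:port, got "${address}"`)
  }
  const host = address.slice(0, separator)
  const port = Number(address.slice(separator + 1))
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`invalid port in "${address}"`)
  }
  return { host, port }
}

console.log(parseProxyAddress('203.0.113.10:8080')) // { host: '203.0.113.10', port: 8080 }
```

If the call throws, you know the problem is the address and not the proxy itself.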
Authenticating to a Proxy with Puppeteer
To use a proxy that requires authentication, you need to call await page.authenticate() before navigating. It is documented in the official pptr.dev documentation.
const puppeteer = require('puppeteer')

;(async () => {
  const browser = await puppeteer.launch({
    args: [
      '--no-sandbox',
      '--proxy-server=YOUR_PROXY_IP:YOUR_PROXY_PORT'
    ],
    headless: false,
    ignoreHTTPSErrors: true
  })

  const page = await browser.newPage()
  // Provide the proxy credentials before the first navigation
  await page.authenticate({
    username: 'YOUR_PROXY_USERNAME',
    password: 'YOUR_PROXY_PASSWORD'
  })

  await page.goto('https://ipinfo.io/json')
  const content = await page.content()
  const serialized = content.substring(content.indexOf('{'), content.indexOf('}') + 1)
  console.log(JSON.parse(serialized))

  // Keep the window open for 5 seconds. page.waitFor is deprecated (and missing
  // in recent Puppeteer releases), so a plain timer is used instead.
  await new Promise(resolve => setTimeout(resolve, 5000))
  await page.close()
  await browser.close()
  process.exit(0)
})()
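Proxy providers often hand out a single URL such as http://user:password@host:port. As far as I know, Chromium does not pick up credentials embedded in --proxy-server, so the URL has to be split into the flag and the object given to page.authenticate(). A minimal sketch follows; `splitProxyUrl` is a hypothetical helper of mine, and the URL and credentials are placeholders.

```javascript
// Hypothetical helper: splits a proxy URL into the launch flag and the credentials.
function splitProxyUrl (proxyUrl) {
  // URL is a Node.js global; host includes the port when one is present
  const { host, username, password } = new URL(proxyUrl)
  return {
    serverArg: `--proxy-server=${host}`,
    credentials: {
      // URL keeps special characters percent-encoded, so decode them
      username: decodeURIComponent(username),
      password: decodeURIComponent(password)
    }
  }
}

const { serverArg, credentials } = splitProxyUrl('http://alice:s3cret@203.0.113.10:8080')
console.log(serverArg) // --proxy-server=203.0.113.10:8080
console.log(credentials) // { username: 'alice', password: 's3cret' }
```

serverArg goes into the args array of puppeteer.launch(), and credentials can be passed as-is to page.authenticate().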