NodeCrawler vs. Proxies API

The world of web scraping is varied and complex, and Proxies API sits at one of its most crucial junctions: it lets web scrapers and crawlers bypass IP blocks by routing every request through a single API endpoint backed by our pool of more than 20 million high-speed rotating proxies.

Example:

curl "http://api.proxiesapi.com/?auth_key=YOUR_KEY&url=URL"

One question we get frequently is how we differ from services like OctoParse or Diffbot. Often it is like comparing apples and oranges. Still, when we send this comparison to a customer's developer team, CXO, marketing team, or SEO team, they can typically tell quite quickly whether our service is a good fit.

So here is how we are different from NodeCrawler.

NodeCrawler is a powerful crawling and scraping package for Node.js. It provides a server-side DOM with jQuery injection, plus queueing support with configurable pool sizes, priority settings, and rate-limit control.

That makes it great for working around bottlenecks like the rate limits many websites impose.

Here is an example that does that.

var Crawler = require('crawler'); // note: the constructor is capitalized

var c = new Crawler({
    rateLimit: 2000, // minimum gap between two tasks, in ms
    maxConnections: 1,
    callback: function (error, res, done) {
        if (error) {
            console.log(error);
        } else {
            var $ = res.$; // server-side jQuery over the fetched page
            console.log($('title').text());
        }
        done();
    }
});

// crawl a website with a 2000ms gap between requests
c.queue('');
c.queue('');
c.queue('');

// crawl a website through proxies, with a 2000ms gap
// between requests on each proxy: tasks that share a
// limiter name also share that limiter's rate limit
c.queue({
    uri: '',
    limiter: 'proxy_1',
    proxy: 'proxy_1'
});
c.queue({
    uri: '',
    limiter: 'proxy_2',
    proxy: 'proxy_2'
});
c.queue({
    uri: '',
    limiter: 'proxy_3',
    proxy: 'proxy_3'
});
c.queue({
    uri: '',
    limiter: 'proxy_1',
    proxy: 'proxy_1'
});

Link: http://nodecrawler.org/#basic-usage


This post was originally published at: https://www.proxiesapi.com/blog/nodecrawler-vs-proxies-api.html.php