Colly Vs. Proxies API

The world of web scraping is varied and complex, and Proxies API sits at one of its most crucial junctions. It lets web scrapers and crawlers bypass IP blocks by using a single API endpoint to access our pool of more than 20 million high-speed rotating proxies.

Example:

curl "http://api.proxiesapi.com/?auth_key=YOUR_KEY&url=URL"
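Because the target URL travels inside a query parameter, it generally needs to be percent-encoded before it is appended to the endpoint. The following is a minimal sketch of that step in Go, using only the endpoint and the two parameter names from the curl example above. The helper name buildProxyURL and the example.com target are assumptions made for illustration, not part of any official client.

```go
package main

import (
	"fmt"
	"net/url"
)

// proxiesAPIEndpoint is the endpoint shown in the curl example.
const proxiesAPIEndpoint = "http://api.proxiesapi.com/"

// buildProxyURL is a hypothetical helper: it returns the request URL that
// asks the API to fetch target, with both query parameters percent-encoded.
func buildProxyURL(authKey, target string) string {
	q := url.Values{}
	q.Set("auth_key", authKey)
	q.Set("url", target)
	// Encode sorts parameters by key, so auth_key comes before url.
	return proxiesAPIEndpoint + "?" + q.Encode()
}

func main() {
	// The target below is a placeholder; substitute the page you want to fetch.
	fmt.Println(buildProxyURL("YOUR_KEY", "https://example.com/page?id=1&lang=en"))
}
```

Encoding the target this way keeps its own query string (here id and lang) from being mistaken for parameters of the API call itself.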

One of the questions we get frequently is how we differ from services like OctoParse or Diffbot. Often it is like comparing apples and oranges. Still, when we send this comparison table to a customer's developer team, CXO, marketing team, or SEO team, they can usually tell quite easily whether we are a convenient service for them.

So here is how we are different from Colly.

Colly is a very fast, scalable, and extremely popular spider/scraper framework for Go. It supports web crawling, rate limiting, caching, parallel scraping, cookie and session handling, and distributed scraping.

Here is an example that fetches URLs in parallel using a request queue with 2 consumer threads.

package main

import (
	"fmt"

	"github.com/gocolly/colly/v2"
	"github.com/gocolly/colly/v2/queue"
)

func main() {
	// Base URL to crawl; set this before running
	url := ""

	// Instantiate default collector
	c := colly.NewCollector(colly.AllowURLRevisit())

	// Create a request queue with 2 consumer threads
	q, _ := queue.New(
		2, // Number of consumer threads
		&queue.InMemoryQueueStorage{MaxSize: 10000}, // Use default queue storage
	)

	// Log each request and enqueue a follow-up request for the first few IDs
	c.OnRequest(func(r *colly.Request) {
		fmt.Println("visiting", r.URL)
		if r.ID < 15 {
			r2, err := r.New("GET", fmt.Sprintf("%s?x=%v", url, r.ID), nil)
			if err == nil {
				q.AddRequest(r2)
			}
		}
	})

	for i := 0; i < 5; i++ {
		// Add URLs to the queue
		q.AddURL(fmt.Sprintf("%s?n=%d", url, i))
	}

	// Consume URLs
	q.Run(c)
}

Colly vs. Proxies API

This blog post was originally published at: https://www.proxiesapi.com/blog/colly-vs-proxies-api.html.php