Scrapy vs Proxies API

Mohan Ganesan
Jun 12, 2020

The world of web scraping is varied and complex, and Proxies API sits at one of its most crucial junctions: it lets web scrapers and crawlers bypass IP blocks by calling a single API endpoint that rotates requests across our 20 million-plus high-speed proxies.

Example:

curl "http://api.proxiesapi.com/?auth_key=YOUR_KEY&url=URL"

One question we get frequently is how we differ from services like OctoParse or Diffbot. Often it is like comparing apples and oranges, but when we send this comparison to a customer's developer team, their CXO, or their marketing or SEO team, they can usually tell quite quickly whether we are a suitable service for them.

So here is how we are different from Scrapy.

Scrapy is an extremely powerful crawling and scraping library written in Python.

Here is how easy it is to create a concurrent crawler that follows links and parses every page with a single callback:

import scrapy


class MySpider(scrapy.Spider):
    name = 'example.com'
    allowed_domains = ['example.com']
    start_urls = [
        # start URLs go here, e.g. pages under example.com
        '',
        '',
        '',
    ]

    def parse(self, response):
        # yield each <h3> element found on the page as an item
        for h3 in response.xpath('//h3').getall():
            yield {"title": h3}

        # follow every link and parse it with this same callback
        for href in response.xpath('//a/@href').getall():
            yield scrapy.Request(response.urljoin(href), self.parse)
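
To try the spider, one way (assuming it is saved as myspider.py; the file and output names are arbitrary) is Scrapy's stand-alone runner, which writes the scraped items to a JSON file:

scrapy runspider myspider.py -o items.json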

For extracting data, it supports both XPath and CSS selectors:

>>> response.xpath('//span/text()').get()
'good'
>>> response.css('span::text').get()
'good'
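
If a target site starts blocking the crawler's IP, the two tools can be combined. Below is a sketch, not an official integration; the spider name and the via_proxies_api helper are made up for illustration, and the idea is simply to wrap every start URL in the Proxies API endpoint shown earlier.

import scrapy
from urllib.parse import quote


API_KEY = "YOUR_KEY"  # your Proxies API auth key


def via_proxies_api(url):
    # wrap a target URL so it is fetched through the rotating proxies
    return "http://api.proxiesapi.com/?auth_key=%s&url=%s" % (API_KEY, quote(url, safe=''))


class BlockedSiteSpider(scrapy.Spider):
    name = 'blocked_site'
    start_urls = [via_proxies_api('https://example.com/')]

    def parse(self, response):
        # parsing works exactly as before; only the fetch goes through the proxy
        for title in response.css('h3::text').getall():
            yield {"title": title}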

Comparison table: Scrapy vs Proxies API

This post was originally published at https://www.proxiesapi.com/blog/scrapy-vs-proxies-api.html.php
