What to Look For When Hiring a Web Scraping Agency

If you don’t have in-house resources like programmers and data managers, you have the option of using a cloud-based web crawling company like Octoparse.

But if you need someone to manage everything fully, you might look for an agency with a good track record.

Web crawling and scraping are largely about taming chaos, and much of that chaos is outside your control. Websites change their code, rework their navigation, and put up restrictions; they may even block your IP if you are not using rotating proxies like Proxies API; network speeds go up and down. These are simply the realities of web scraping.

Here are the three most important things to look for in an agency:

a. A proven track record — ideally, look for their presence on a marketplace like Upwork. The most important things I look for are a combination of high ratings (4.5 and above), enough experience (total hours worked), and a near-100% completion rate. You never want an agency abandoning your work midway. Read the customer reviews to get a sense of the rapport the team has with its customers.

b. The right combination of skill sets — web crawling is not just about coding; there will be a fair bit of manual work and data wrangling. Check the range of skills represented on their team.

c. It’s ideal if they list web crawling or web scraping as part of their leading service description.

Once you have shortlisted a few, ask them some questions to get a further sense of their level of competence.

Here are five questions you can ask. Notice how comfortable they are with the challenges these questions pose: are they vague, struggling for clear answers, or are they ready for you and even impressed that you asked? Here they are:

a. Ask them what measures they will take to overcome restrictions like CAPTCHAs, rate limits, and code changes.
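To make the rate-limit part of that question concrete, a common answer is exponential backoff with jitter. Here is a minimal sketch; the `fetch` callable and the check against HTTP status 429 ("Too Many Requests") are illustrative assumptions, not any particular agency's code:

```python
import random
import time

def fetch_with_backoff(fetch, url, max_retries=5, base=0.5):
    """Retry `fetch(url)` with exponential backoff when rate-limited.

    `fetch` is any callable returning (status_code, body).
    """
    for attempt in range(max_retries):
        status, body = fetch(url)
        if status != 429:  # 429 = Too Many Requests
            return status, body
        # Wait base * 2^attempt seconds, plus random jitter so many
        # crawlers backing off at once don't retry in lockstep.
        time.sleep(base * (2 ** attempt) + random.uniform(0, base))
    return status, body
```

An agency that can explain something like this, and when it is not enough, is a good sign.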

b. Go for the jugular — ask them how they will deal with a website blocking their IP.
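A credible answer usually involves rotating requests across a pool of IPs. A minimal round-robin sketch follows, with made-up proxy addresses; a service like Proxies API replaces the static pool with a fresh IP on every request:

```python
import itertools

# Hypothetical proxy pool; in practice these come from your proxy provider.
PROXIES = [
    "http://10.0.0.1:8080",
    "http://10.0.0.2:8080",
    "http://10.0.0.3:8080",
]
_pool = itertools.cycle(PROXIES)

def next_proxy():
    """Return a requests-style proxies dict, rotating through the pool."""
    proxy = next(_pool)
    return {"http": proxy, "https": proxy}
```

Each call to `next_proxy()` yields the next proxy in the cycle, so no single IP carries enough traffic to get itself banned.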

c. Ask what checks and measures they have built in to know whether the web crawler is working as it should.
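Good answers here include automated sanity checks on every scraped batch: enough records came back, and required fields are not empty. A minimal sketch, with illustrative field names:

```python
def validate_records(records, required_fields=("title", "price"), min_count=1):
    """Basic health checks on a scraped batch; returns a list of problems.

    Field names are illustrative — use whatever your schema requires.
    """
    problems = []
    if len(records) < min_count:
        problems.append(f"only {len(records)} records, expected >= {min_count}")
    for i, rec in enumerate(records):
        for field in required_fields:
            if not rec.get(field):
                problems.append(f"record {i}: missing or empty '{field}'")
    return problems
```

An empty return value means the batch passed; anything else should page a human, because a silently broken crawler is worse than a crashed one.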

d. Ask what framework they build on top of, and why. You don’t want them reinventing the wheel from scratch — it will NOT go well. Frameworks like Scrapy offer abstractions for controlling concurrency, running multiple spiders, CSS selectors for scraping, automatic link extraction, cookie support, and more.

e. Ask them how they will get data that is rendered by JavaScript.
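One common answer is a headless browser. A minimal sketch using Playwright's sync API follows; the URL and selector are placeholders, and alternatives include Selenium or integrations like scrapy-playwright. The import is deferred into the function so the sketch reads without Playwright installed:

```python
def fetch_rendered(url, selector="body"):
    """Load a page in headless Chromium and return its HTML after JS runs.

    `selector` is the element to wait for before grabbing the page —
    pick one your target page only renders via JavaScript.
    """
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)
        page.wait_for_selector(selector)  # wait until the JS has rendered it
        html = page.content()
        browser.close()
    return html
```

An agency that reaches for a headless browser only when needed — and scrapes the underlying JSON APIs when it can — understands the cost trade-off.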

The author is the founder of Proxies API, a proxy rotation API service.