5 Web Crawling Habits You Should Adopt.

I am just going to preach it to you. Here is the list of frameworks you HAVE to know about and study because if you don’t and call yourself a programmer that knows web scraping, well, there is something wrong with you.

a. Scrapy — please, for the love of GOD, don’t take a step in any direction before you have explored what this can do for you… let it open your mind to what is out there, the kind of abstractions and weaponry the community has created before you venture out on your own armed with your cute CURL calls and stupid RegEx patterns.

b. Puppeteer — Whatever a human can do, you can simulate that with Puppeteer. Do you see those services that take snapshots of entire webpages, fully rendered in their evening best? Ever wondered how they do it? It is by using Puppeteer man. Go know it now before you read another line or waste another breath.

c. There is a lot more. But I know you are not listening to me. Just do this for now. I have low expectations for you!

2. Take these tests to ramp up your web crawling genius.

3. Ask yourself — How can I double the speed of my web crawler?

The path should take you to something like Scrapyd. When you come to it, tell them my name, and they will set you right up.

4. Just learn XPath and CSS selectors if you haven’t already. That's how you get good at extracting data. XPath is confusing, to be honest, just force yourself to learn it, man. It's your life raft!

5. I can't remember the 5th…. I KNOW I SAID THERE ARE 5 !@^$#@… Just do the 4 for now, will you?

Sorry, I am in a bad kinda way today.

The author is the founder of Proxies API, a proxy rotation api service.

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store