This is an excellent question, because well begun is half the battle.
The choice of framework depends on your immediate needs, the kind of website you need to scrape, and the limitations of each tool, which will make a big difference as well.
If nothing in particular is pushing you elsewhere, I am always tempted to say Scrapy.
The Scrapy framework provides abstractions for most common crawling concerns: concurrent requests, running multiple spiders with Scrapyd, managing crawling rules, obeying site constraints like rate limits, processing data with item pipelines, and support for XPath and CSS selectors so you can extract data quickly.
With that, you are pretty well set. But production-level web scraping is mostly not about coding; it is about reliability. If you run a web crawler at any scale, run it frequently, and the data is mission-critical, you will find that most web servers are savvy enough to detect, warn, and block your crawler quite easily.
YELPPPP. I mean HELPPPP
To overcome this, you will need to use a proxy service and write your code so that it rotates between several proxies from time to time. There are places on the internet where you can get a list of free active proxies, like https://www.proxynova.com/proxy-server-list/, or you can go for a professional rotating proxy API service like Proxies API. Full disclosure: I am the founder of Proxies API.
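The rotation itself can be very simple. Here is a sketch using only the Python standard library; the proxy addresses are placeholders from the reserved TEST-NET range, not live servers, so substitute your own list.

```python
# Sketch of rotating requests through a pool of proxies using only the
# standard library. The addresses below are placeholders (TEST-NET range),
# not real proxy servers.
import itertools
import urllib.request

PROXIES = [
    "203.0.113.10:8080",
    "203.0.113.11:8080",
    "203.0.113.12:8080",
]

# cycle() loops over the pool forever, so each fetch uses the next proxy.
proxy_pool = itertools.cycle(PROXIES)


def fetch(url: str) -> bytes:
    """Fetch a URL through the next proxy in the rotation."""
    proxy = next(proxy_pool)
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    opener = urllib.request.build_opener(handler)
    return opener.open(url, timeout=10).read()
```

In a real crawler you would also drop proxies that fail repeatedly and add a random delay between requests, but the cycle-through-a-pool idea is the core of it.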
The blog was originally posted at: https://www.proxiesapi.com/blog/copy-of-web-crawling-where-do-i-start.html.php