How to Scrape Wikipedia using Python Scrapy

pip install scrapy
scrapy startproject scrapingproject
New Scrapy project 'scrapingproject', using template directory '/Library/Python/2.7/site-packages/scrapy/templates/project', created in:
/Applications/MAMP/htdocs/scrapy_examples/scrapingproject
You can start your first spider with:
cd scrapingproject
scrapy genspider example example.com
cd scrapingproject
scrapy genspider ourfirstbot https://en.wikipedia.org/wiki/List_of_common_misconceptions
Created spider 'ourfirstbot' using template 'basic' in module:
scrapingproject.spiders.ourfirstbot
# -*- coding: utf-8 -*-
import scrapy


class OurfirstbotSpider(scrapy.Spider):
    name = 'ourfirstbot'
    start_urls = ['https://en.wikipedia.org/wiki/List_of_common_misconceptions']

    def parse(self, response):
        pass
dates = response.css('.mw-headline').extract()
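The selector call above returns each matching element as raw HTML, tags included. As a rough illustration of what a `.mw-headline` class selector picks out, here is a minimal sketch that uses only Python's standard library instead of Scrapy's selector engine. The `HeadlineParser` class and the `SAMPLE_HTML` snippet are made up for this example and are not taken from the live Wikipedia page.

```python
from html.parser import HTMLParser


class HeadlineParser(HTMLParser):
    """Collect the text of every element whose class list includes 'mw-headline'.

    Hypothetical helper for illustration only. It assumes the matched elements
    contain no unclosed void tags such as <br>, which would throw off the
    depth counter.
    """

    def __init__(self):
        super().__init__()
        self.headlines = []
        self._depth = 0  # greater than 0 while inside a matching element

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self._depth:
            # Nested tag inside a headline: track it so we know when the headline ends.
            self._depth += 1
        elif "mw-headline" in classes:
            self._depth = 1
            self.headlines.append("")

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.headlines[-1] += data


# Made-up markup that mimics a heading followed by a list.
SAMPLE_HTML = """
<h2><span class="mw-headline" id="Arts_and_culture">Arts and culture</span></h2>
<ul><li>First item</li></ul>
<h2><span class="mw-headline" id="History">History</span></h2>
<ul><li>Second item</li></ul>
"""

parser = HeadlineParser()
parser.feed(SAMPLE_HTML)
print(parser.headlines)
```

Inside a spider, Scrapy's own selectors do this work; the sketch is only meant to show which elements the class selector targets and why the extracted strings still need their tags stripped.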
curl "http://api.proxiesapi.com/?key=API_KEY&url=https://example.com"
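The same request URL can be assembled in Python before it goes into `start_urls`. The sketch below percent-encodes the target URL so that any `?` or `&` it contains is not mistaken for part of the proxy's own query string. It assumes the service decodes query parameters in the usual way; `proxied_url` and `PROXY_ENDPOINT` are hypothetical names introduced here, and `API_KEY` remains a placeholder.

```python
from urllib.parse import urlencode

# Endpoint as it appears in the curl command above.
PROXY_ENDPOINT = "http://api.proxiesapi.com/"


def proxied_url(target_url, api_key="API_KEY"):
    """Return the proxy request URL for target_url (hypothetical helper).

    urlencode() percent-encodes the target, so characters such as ':', '/',
    '?' and '&' inside it cannot collide with the outer query string.
    """
    query = urlencode({"key": api_key, "url": target_url})
    return f"{PROXY_ENDPOINT}?{query}"


print(proxied_url("https://en.wikipedia.org/wiki/List_of_common_misconceptions"))
```

A spider could then set `start_urls = [proxied_url(page)]` for whichever page it needs to fetch.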
# -*- coding: utf-8 -*-
import scrapy
from bs4 import BeautifulSoup


class OurfirstbotSpider(scrapy.Spider):
    name = 'ourfirstbot'
    start_urls = [
        'http://api.proxiesapi.com/?key=API_KEY&url=https://en.wikipedia.org/wiki/List_of_common_misconceptions',
    ]

    def parse(self, response):
        # Each match comes back as raw HTML, tags included.
        headings = response.css('.mw-headline').extract()
        datas = response.css('ul').extract()

        # Pair headings with lists by position; zip() stops at the shorter sequence.
        for item in zip(headings, datas):
            all_items = {
                # Strip the tags and keep only the visible text.
                'headings': BeautifulSoup(item[0], 'html.parser').text,
                'datas': BeautifulSoup(item[1], 'html.parser').text,
            }

            yield all_items
