How to Scrape Weather Data Using Python Scrapy

pip install scrapy
scrapy startproject scrapingproject

New Scrapy project 'scrapingproject', using template directory '/Library/Python/2.7/site-packages/scrapy/templates/project', created in:
    /Applications/MAMP/htdocs/scrapy_examples/scrapingproject

You can start your first spider with:
    cd scrapingproject
    scrapy genspider example example.com
cd scrapingproject
scrapy genspider ourfirstbot https://weather.com/en-IN/weather/tenday/l/6d031a57074ba2aebf48f086cb118df52748edf41d9c624fd95329c6e070754d
Created spider 'ourfirstbot' using template 'basic' in module:
    scrapingproject.spiders.ourfirstbot
# -*- coding: utf-8 -*-
import scrapy


class OurfirstbotSpider(scrapy.Spider):
    name = 'ourfirstbot'
    start_urls = ['https://weather.com/en-IN/weather/tenday/l/6d031a57074ba2aebf48f086cb118df52748edf41d9c624fd95329c6e070754d']

    def parse(self, response):
        pass

dates = response.css('.day-detail').extract()
descriptions = response.css('.description').extract()
temps = response.css('.temp').extract()
precipitations = response.css('.precip').extract()
winds = response.css('.wind').extract()
humiditys = response.css('.humidity').extract()
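Each of these selectors returns the full HTML of every matching node, tags included, which is why the finished spider below strips the markup before yielding items. As a rough illustration of what that stripping step does (this helper is not part of the spider — just a standard-library sketch):

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(fragment):
    # Feed the fragment through the parser and keep only the text it finds
    parser = TextExtractor()
    parser.feed(fragment)
    return ''.join(parser.parts).strip()


print(strip_tags('<span class="day-detail">SUN 12</span>'))  # SUN 12
```

Scrapy can also do this for you: selecting `response.css('.temp::text').extract()` returns only the text content, with no tags to strip afterwards.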
# -*- coding: utf-8 -*-
import scrapy
from bs4 import BeautifulSoup


class OurfirstbotSpider(scrapy.Spider):
    name = 'ourfirstbot'
    start_urls = [
        'https://weather.com/en-IN/weather/tenday/l/6d031a57074ba2aebf48f086cb118df52748edf41d9c624fd95329c6e070754d',
    ]

    def parse(self, response):
        dates = response.css('.day-detail').extract()
        descriptions = response.css('.description').extract()
        temps = response.css('.temp').extract()
        precipitations = response.css('.precip').extract()
        winds = response.css('.wind').extract()
        humiditys = response.css('.humidity').extract()

        # Pair the parallel lists row by row and strip the HTML from each cell
        for item in zip(dates, descriptions, temps, precipitations, winds, humiditys):
            all_items = {
                'date': BeautifulSoup(item[0], 'html.parser').text,
                'description': BeautifulSoup(item[1], 'html.parser').text,
                'temp': BeautifulSoup(item[2], 'html.parser').text,
                'precipitation': BeautifulSoup(item[3], 'html.parser').text,
                'wind': BeautifulSoup(item[4], 'html.parser').text,
                'humidity': BeautifulSoup(item[5], 'html.parser').text,
            }
            yield all_items
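One caveat with the `zip` call above: it stops at the shortest list, so if one selector matches fewer nodes than the others, whole rows are silently dropped rather than raising an error. A quick illustration with made-up values:

```python
dates = ['SUN 12', 'MON 13', 'TUE 14']
temps = ['31/24', '30/23']  # imagine this selector matched one node fewer

# zip truncates to the shortest input, so 'TUE 14' never appears
rows = list(zip(dates, temps))
print(rows)  # [('SUN 12', '31/24'), ('MON 13', '30/23')]
```

If the counts should always agree, comparing the list lengths before zipping turns a selector mismatch into a visible error instead of missing data.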
scrapy crawl ourfirstbot -s ROBOTSTXT_OBEY=False
scrapy crawl ourfirstbot -o data.csv
scrapy crawl ourfirstbot -o data.json
scrapy crawl ourfirstbot \
  -s USER_AGENT="Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.131 Safari/537.36" \
  -s ROBOTSTXT_OBEY=False
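Instead of passing `-s` overrides on every run, the same two settings can live in the project's `settings.py` — a config sketch equivalent to the flags above:

```python
# scrapingproject/settings.py
ROBOTSTXT_OBEY = False
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/34.0.1847.131 Safari/537.36"
)
```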
  • With millions of high speed rotating proxies located all over the world
  • With our automatic IP rotation
  • With our automatic User-Agent-String rotation (which simulates requests from different, valid web browsers and web browser versions)
  • With our automatic CAPTCHA solving technology
curl "http://api.proxiesapi.com/?key=API_KEY&url=https://example.com"
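One detail worth noting when passing the target address as a query parameter: if the target URL contains its own `?` or `&`, it should be percent-encoded first, or the characters will be read as part of the API request instead. A small standard-library sketch (the helper name is ours, not part of any API):

```python
import urllib.parse


def via_proxies_api(target_url, api_key):
    # Percent-encode the target URL (safe="" also encodes the slashes)
    # so its scheme and query string survive as a single parameter value
    encoded = urllib.parse.quote(target_url, safe="")
    return "http://api.proxiesapi.com/?key=" + api_key + "&url=" + encoded


print(via_proxies_api("https://example.com/a?b=1", "API_KEY"))
# http://api.proxiesapi.com/?key=API_KEY&url=https%3A%2F%2Fexample.com%2Fa%3Fb%3D1
```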
# -*- coding: utf-8 -*-
import scrapy
from bs4 import BeautifulSoup


class OurfirstbotSpider(scrapy.Spider):
    name = 'ourfirstbot'
    start_urls = [
        'http://api.proxiesapi.com/?key=API_KEY&url=https://weather.com/en-IN/weather/tenday/l/6d031a57074ba2aebf48f086cb118df52748edf41d9c624fd95329c6e070754d',
    ]

    def parse(self, response):
        dates = response.css('.day-detail').extract()
        descriptions = response.css('.description').extract()
        temps = response.css('.temp').extract()
        precipitations = response.css('.precip').extract()
        winds = response.css('.wind').extract()
        humiditys = response.css('.humidity').extract()

        # Pair the parallel lists row by row and strip the HTML from each cell
        for item in zip(dates, descriptions, temps, precipitations, winds, humiditys):
            all_items = {
                'date': BeautifulSoup(item[0], 'html.parser').text,
                'description': BeautifulSoup(item[1], 'html.parser').text,
                'temp': BeautifulSoup(item[2], 'html.parser').text,
                'precipitation': BeautifulSoup(item[3], 'html.parser').text,
                'wind': BeautifulSoup(item[4], 'html.parser').text,
                'humidity': BeautifulSoup(item[5], 'html.parser').text,
            }
            yield all_items
Mohan Ganesan, Founder @ ProxiesAPI.com
