Scraping Hacker News with Python and Beautiful Soup

pip3 install beautifulsoup4
pip3 install requests soupsieve lxml
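(requests fetches the pages, lxml is the parser we will hand to Beautiful Soup, and soupsieve is the library that powers the CSS-selector .select() calls used further down.)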
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

# Pretend to be a regular web browser
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}
url = 'https://news.ycombinator.com/'

response = requests.get(url, headers=headers)
print(response)
Save this as HN_bs.py and run it:

python3 HN_bs.py
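If the request goes through, this just prints the response object, which should look something like <Response [200]>.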
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}
url = 'https://news.ycombinator.com/'

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'lxml')

# Every story (and its subtext) on the front page sits in its own <tr> row
for item in soup.find_all('tr'):
    try:
        print(item)
    except Exception as e:
        raise e
    print('')
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}
url = 'https://news.ycombinator.com/'

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'lxml')

for item in soup.find_all('tr'):
    try:
        #print(item)
        # .storylink is the class Hacker News puts on the title link of each story
        if item.select('.storylink'):
            print(item.select('.storylink')[0].get_text())
    except Exception as e:
        raise e
    print('')
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}
url = 'https://news.ycombinator.com/'

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'lxml')

for item in soup.find_all('tr'):
    try:
        #print(item)
        # The title row: headline text and the link it points to
        if item.select('.storylink'):
            print(item.select('.storylink')[0].get_text())
            print(item.select('.storylink')[0]['href'])
        # The subtext row that follows: submitter, score and comment count
        if item.select('.hnuser'):
            print(item.select('.hnuser')[0].get_text())
            print(item.select('.score')[0].get_text())
            print(item.find_all('a')[3].get_text())
            print('------------------')
    except Exception as e:
        raise e
    print('')
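Printing is fine for a quick check, but you will usually want the data in a structured form. Here is a minimal sketch, assuming the same page structure as above, that pairs each title row with the subtext row that follows it and writes the result to a CSV file (the csv module usage and the hn_stories.csv file name are my own additions, not part of the original script):

# -*- coding: utf-8 -*-
import csv

from bs4 import BeautifulSoup
import requests

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}
url = 'https://news.ycombinator.com/'

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'lxml')

stories = []
current = {}
for item in soup.find_all('tr'):
    # Title row: headline and link
    if item.select('.storylink'):
        current = {
            'title': item.select('.storylink')[0].get_text(),
            'link': item.select('.storylink')[0]['href'],
        }
    # Subtext row: submitter, score and comment count belong to the title row above
    elif item.select('.hnuser') and current:
        current['user'] = item.select('.hnuser')[0].get_text()
        score = item.select('.score')
        current['score'] = score[0].get_text() if score else ''
        links = item.find_all('a')
        current['comments'] = links[3].get_text() if len(links) > 3 else ''
        stories.append(current)
        current = {}

with open('hn_stories.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=['title', 'link', 'user', 'score', 'comments'])
    writer.writeheader()
    writer.writerows(stories)

Each dictionary holds one story, so the same loop could just as easily feed a database insert instead of a CSV writer.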
The code above fetches a single page, but if you want to scrape Hacker News (or any other site) at scale, you will eventually run into IP blocks. A rotating proxy service like Proxies API takes care of that for you:

  • Millions of high-speed rotating proxies located all over the world
  • Automatic IP rotation
  • Automatic User-Agent string rotation (which simulates requests from different, valid web browsers and web browser versions)
  • Automatic CAPTCHA solving technology

All of this sits behind a simple API call:
curl "http://api.proxiesapi.com/?key=API_KEY&url=https://example.com"
