Monday, 24 August 2015

(Beautiful Soup) How to extract data from HTML tags

Here is what I have so far. I can't get the plain text out of the div.

# Python 2 with BeautifulSoup 3
import urllib2
from BeautifulSoup import BeautifulSoup

html = urllib2.urlopen("https://example/com/").read()
# findAll returns the matching <div> tags, markup included
get = BeautifulSoup(html).findAll('div', {'class': 'h4 entry-title'})
for i in get:
    print i

How can I scrape the data from this HTML? I only need the color names and the paragraphs.

<div class="h4 entry-title">
<a href="https://example/com/01/">RED</a>
</div>
<p>
I am paragraph red
</p>

<div class="h4 entry-title">
<a href="https://example.com/02/">WHITE</a>
</div>
<p>
I am paragraph white
</p>


<div class="h4 entry-title">
<a href="https://example.com/03/">PINK</a>
</div>
<p>
I am paragraph pink
</p>
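
One possible way to get the color names and paragraphs out of markup like the sample above is sketched here. It is not Beautiful Soup code: it swaps in Python 3's standard-library `html.parser` so that it runs without extra installs. The names `EntryParser`, `extract_entries`, and `SAMPLE_HTML` are made up for illustration, and the sketch assumes every title `<div>` is followed by one `<p>` that is properly closed with `</p>`.

```python
from html.parser import HTMLParser

# Assumption: a copy of the sample markup, with every <p> closed by </p>.
SAMPLE_HTML = """
<div class="h4 entry-title">
<a href="https://example.com/01/">RED</a>
</div>
<p>
I am paragraph red
</p>
<div class="h4 entry-title">
<a href="https://example.com/02/">WHITE</a>
</div>
<p>
I am paragraph white
</p>
<div class="h4 entry-title">
<a href="https://example.com/03/">PINK</a>
</div>
<p>
I am paragraph pink
</p>
"""


class EntryParser(HTMLParser):
    """Hypothetical helper: pairs each entry-title div with the next <p>."""

    def __init__(self):
        super().__init__()
        self.rows = []           # finished (name, description) pairs
        self._in_title = False   # currently inside a title <div>
        self._in_para = False    # currently inside the <p> after a title
        self._name = None        # title text waiting for its paragraph
        self._text = []          # text pieces collected for the current tag

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "entry-title" in classes:
            self._in_title = True
            self._text = []
        elif tag == "p" and self._name is not None:
            self._in_para = True
            self._text = []

    def handle_endtag(self, tag):
        if tag == "div" and self._in_title:
            self._name = "".join(self._text).strip()
            self._in_title = False
        elif tag == "p" and self._in_para:
            self.rows.append((self._name, "".join(self._text).strip()))
            self._in_para = False
            self._name = None

    def handle_data(self, data):
        # Keep text only while inside a title div or its paragraph.
        if self._in_title or self._in_para:
            self._text.append(data)


def extract_entries(html):
    """Return a list of (name, description) tuples found in html."""
    parser = EntryParser()
    parser.feed(html)
    parser.close()
    return parser.rows


if __name__ == "__main__":
    for name, description in extract_entries(SAMPLE_HTML):
        print(name, description)
```

With Beautiful Soup 4 the same pairing could probably be done by calling `find_all("div", class_="entry-title")` and then `find_next_sibling("p")` on each result, reading the text with `get_text(strip=True)`.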

My Questions:

  1. How can I scrape the data from this HTML? I need only the link text and the paragraph.

Output I need in the console:

RED I am paragraph red
WHITE I am paragraph white
PINK I am paragraph pink

  2. How can I import this set of data into a SQL file automatically?

Database table (name, description) I want:

name  | description
RED   | I am paragraph red
WHITE | I am paragraph white
PINK  | I am paragraph pink
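
For the second question, a minimal sketch using Python 3's standard-library `sqlite3` module could look like the following. Everything specific here is an assumption: the table name `colors`, the helper names `save_rows` and `dump_sql`, and the choice of SQLite itself, since the post does not say which database is in use. The rows are hard-coded to the three pairs from the sample.

```python
import sqlite3

# Assumption: the (name, description) pairs have already been scraped.
ROWS = [
    ("RED", "I am paragraph red"),
    ("WHITE", "I am paragraph white"),
    ("PINK", "I am paragraph pink"),
]


def save_rows(rows, db_path=":memory:"):
    """Insert rows into a hypothetical 'colors' table and return the connection."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS colors (name TEXT, description TEXT)"
    )
    # Placeholders let sqlite3 handle quoting of the values.
    conn.executemany(
        "INSERT INTO colors (name, description) VALUES (?, ?)", rows
    )
    conn.commit()
    return conn


def dump_sql(conn):
    """Return the whole database as a string of SQL statements."""
    return "\n".join(conn.iterdump())


if __name__ == "__main__":
    connection = save_rows(ROWS)
    print(dump_sql(connection))
    connection.close()
```

Redirecting the printed dump to a file, or writing the string returned by `dump_sql` with `open(...)`, would give a `.sql` file that another SQLite database can load. Passing a file path instead of `":memory:"` keeps the data in a database file on disk.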
