I have a few questions about Instagram's API. I'm trying to get the info about all the pictures with a certain hashtag. I would like to store some of that information in a sqlite database and then retrieve it later on (for example, to show a map with the pictures' locations). I do not need to download the pictures; just saving the URLs is enough.
I wrote a small script (copied below) that connects to Instagram and keeps asking for more pictures by following Instagram's next_url until there is no next_url available; I save all the pictures in the pics variable.
After that, I loop through pics and save each picture in a row of my sqlite database.
I can't understand why:
- I had to add the IGNORE clause to the sqlite query; otherwise some pictures would appear twice in the database.
- I try to save the pictures with the oldest in the first row and the newest in the last row, but judging by the pictures' ids this is not working. I'm doing this so that I can later update the database by querying Instagram's API with the max-tag-id and ask only for pictures that were published after the picture in the last row of my db.
- When I run this with the theearthcap test hashtag, I end up with 281 rows in my database, while http://ift.tt/1pe2hZO shows 682 photos.
Here is the code:
import requests
import sqlite3

access_token = "InstagramAccessToken"
token = "theearthcap"

conn = sqlite3.connect('maps.db')
c = conn.cursor()

url = "http://ift.tt/1p5IevP" % \
    (token, access_token)

def getPictures(next_url):
    c = 0  # page counter (local to this function, separate from the cursor)
    pics = []
    raw_data = requests.get(next_url).json()
    pics += raw_data['data']
    next_url = raw_data['pagination']['next_url']
    while next_url is not None:
        print(c)
        raw_data = requests.get(next_url).json()
        pics += raw_data['data']
        try:
            next_url = raw_data['pagination']['next_url']
        except KeyError:
            # no next_url in the response: this was the last page
            next_url = None
        c += 1
    return pics

pics = getPictures(url)

# walk the list backwards, hoping to insert the oldest picture first
for p in reversed(pics):
    current_pic = ([
        str(p['id']),
        str(p['user']['username']),
        int(p['likes']['count']),
        int(p['comments']['count']),
        str(p['type']),
        int(p['created_time']),
        str(p['location']),
        str(p['images']['thumbnail']['url']),
        str(p['images']['low_resolution']['url']),
        str(p['images']['standard_resolution']['url']),
        str(p['caption'])
    ])
    print(current_pic[0])
    print('inserting')
    c.execute('INSERT OR IGNORE INTO theearthcap VALUES (?,?,?,?,?,?,?,?,?,?,?)', current_pic)

conn.commit()
conn.close()
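A minimal sketch of one way to make the duplicate handling and the row order explicit, rather than relying on the order in which the pages arrive. It assumes each picture is a dict with the same id, created_time, user and images keys used in the script. The table name pictures, its four columns and the function name store_pictures are illustrative assumptions, not the schema used for the theearthcap table. It runs on in-memory sample data and makes no request to Instagram.

```python
import sqlite3


def store_pictures(conn, pics):
    """Insert pictures oldest-first, storing each id at most once.

    Hypothetical helper: the 'pictures' table and its columns are
    assumptions made for this sketch only.
    """
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE IF NOT EXISTS pictures ("
        "id TEXT PRIMARY KEY, username TEXT, created_time INTEGER, url TEXT)"
    )
    # Collapse duplicates that show up on more than one page: one entry per id.
    unique = {p["id"]: p for p in pics}
    # Sort explicitly by creation time instead of trusting the page order.
    ordered = sorted(unique.values(), key=lambda p: int(p["created_time"]))
    for p in ordered:
        cur.execute(
            "INSERT OR IGNORE INTO pictures VALUES (?,?,?,?)",
            (
                p["id"],
                p["user"]["username"],
                int(p["created_time"]),
                p["images"]["standard_resolution"]["url"],
            ),
        )
    conn.commit()
    return len(ordered)


def make_pic(pic_id, created, username):
    # Build a fake picture dict with only the keys this sketch reads.
    return {
        "id": pic_id,
        "created_time": created,
        "user": {"username": username},
        "images": {"standard_resolution": {"url": "http://example.com/" + pic_id}},
    }


# Sample data: newest first, with one picture repeated.
sample = [make_pic("2_1", "200", "b"), make_pic("1_1", "100", "a"), make_pic("2_1", "200", "b")]

conn = sqlite3.connect(":memory:")
n = store_pictures(conn, sample)
rows = conn.execute("SELECT id FROM pictures ORDER BY rowid").fetchall()
```

With the primary key on id, INSERT OR IGNORE only acts as a safety net for pictures that are already stored from an earlier run; the in-memory de-duplication and the sort on created_time decide what gets inserted and in which order.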
Thanks!