I have a very large, relatively complex JSON file with one JSON object per line. Each object looks like this:
{"domain": {"google.com": 150.0},
 "time": 15736,
 "uid": 199.0,
 "hours": [0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0, 20.0, 5.0, 0.0, 5.0, 5.0, 20.0, 5.0, 10.0, 5.0, 5.0, 10.0, 5.0, 5.0, 5.0, 5.0],
 "bid": "0928134h12349",
 "user": "",
 "b": "news.google.com",
 "q": {},
 "total": 150.0,
 "type": 2.0}
Because the file is 3.5 GB and I don't have a computer with enough memory to load it all at once, I would like to put all of this data into a SQLite database for easy analysis with Pandas.
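As context for the analysis step: pandas can read from SQLite in chunks, so the full table never has to fit in memory at once. A minimal sketch (the tiny in-memory table and the column names here are stand-ins, not the real schema):

```python
import sqlite3
import pandas as pd

# Stand-in table; the real code would connect to the database file on disk.
conn = sqlite3.connect(":memory:")
conn.execute("create table data_table (time, total)")
conn.executemany("insert into data_table values (?, ?)",
                 [(i, float(i)) for i in range(10)])
conn.commit()

# chunksize makes read_sql_query yield DataFrames of at most that many rows,
# so aggregates can be accumulated chunk by chunk.
total = 0.0
for chunk in pd.read_sql_query("select total from data_table", conn, chunksize=4):
    total += chunk["total"].sum()
```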
There are a couple of caveats to how I want to load this data into the SQLite database:
- I would like to leave out the user, q (query), and domain key:value pairs
- I would like each of the 24 items in the hours key's array to have its own column
This is what I have attempted so far:
import json
import sqlite3

conn = sqlite3.connect(DB_FILE)
c = conn.cursor()

query = "insert into data_table values (?,?,?,?,?,?,?,?,?,?)"
columns = ['domain', 'time', 'uid', 'hours',
           'bid', 'user', 'b',
           'q', 'total', 'type']

with open(JSON_FILE) as f:
    for line in f:
        x = json.loads(line)
        keys = tuple(x[col] for col in columns)
        c.execute(query, keys)
The above does not work, and it does nothing yet to satisfy my two conditions above. Any thoughts on how best to accomplish this?
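For reference, one way to satisfy both conditions is to bind only the wanted scalar keys and splat the hours list into 24 separate values. A sketch under assumptions not in the original: an in-memory database stands in for the real file, and the hour columns are given hypothetical names hour_0 through hour_23.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a database file on disk
c = conn.cursor()

scalar_cols = ['time', 'uid', 'bid', 'b', 'total', 'type']  # drops domain/user/q
hour_cols = ['hour_%d' % i for i in range(24)]              # hypothetical names
all_cols = scalar_cols + hour_cols                          # 30 columns total
c.execute("create table data_table (%s)" % ", ".join(all_cols))

insert = "insert into data_table values (%s)" % ",".join("?" * len(all_cols))

# One sample line standing in for the 3.5 GB file.
sample = {"domain": {"google.com": 150.0}, "time": 15736, "uid": 199.0,
          "hours": [5.0] * 24, "bid": "0928134h12349", "user": "",
          "b": "news.google.com", "q": {}, "total": 150.0, "type": 2.0}

for line in [json.dumps(sample)]:   # the real code would iterate over the file
    x = json.loads(line)
    # Keep only the scalar columns, then splat hours into 24 separate values,
    # which also leaves out domain, user, and q entirely.
    row = [x[col] for col in scalar_cols] + x["hours"]
    c.execute(insert, row)
conn.commit()
```

Committing once at the end (or every N rows) rather than per line keeps the insert loop fast for a file this size.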