Saturday, June 13, 2015

Large, complex JSON file to SQLite with Python

I have a very large, relatively complex JSON file with one JSON object per line. Each object looks like this:

{'domain': {'google.com': 150.0},
 'time': 15736,
 'uid': 199.0,
 'hours': [0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0, 20.0, 5.0, 0.0, 5.0, 5.0, 20.0, 5.0, 10.0, 5.0, 5.0, 10.0, 5.0, 5.0, 5.0, 5.0],
 'bid': '0928134h12349',
 'user': u'',
 'b': 'news.google.com',
 'q': {},
 'total': 150.0,
 'type': 2.0}

Because the file is 3.5 GB and I don't have a computer capable of loading it into memory, I would like to put all of this data into a SQLite database for easy analysis with Pandas.

There are some caveats about how I want to put this data into the SQLite database:

  • I would like to leave out the user, q (query), and domain key:value pairs
  • I would like each of the 24 items in the hours key's array to have its own column (see the schema sketch after this list)
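
For reference, here is a minimal sketch of what the target table might look like. The table name data_table comes from the attempt below, but the hr_0 through hr_23 column names, the data.db path, and the INTEGER/REAL/TEXT types are my own guesses from the sample object, not part of the original:

import sqlite3

conn = sqlite3.connect('data.db')  # database path assumed for illustration

# one column per scalar field that is kept, plus 24 hour columns
hour_cols = ", ".join("hr_%d REAL" % i for i in range(24))
conn.execute("create table if not exists data_table "
             "(time INTEGER, uid REAL, bid TEXT, b TEXT, total REAL, type REAL, "
             + hour_cols + ")")
conn.commit()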

This is what I have attempted so far:

query = "insert into data_table values (?,?,?,?,?,?,?,?,?,?)"

columns = ['domain', 'time', 'uid', 'hours',
           'bid', 'user','b',
           'q', 'total', 'type']

with open(JSON_FILE) as f:
    for line in f:
        x = json.loads(line)
        keys = tuple(x[c] for c in columns)
        c.execute(query, keys)

The above does not work, and it does not include code to satisfy my two conditions above. Any thoughts on steps I can take to best accomplish this?
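
One possible direction, sketched under the assumptions above (hr_0 through hr_23 columns, a data.db path, and a scalar_cols list I chose myself), is to skip the unwanted keys when building each row and unpack hours into 24 separate values. Note also that sqlite3 cannot bind dict values such as domain and q, which is one reason the attempt above fails:

import json
import sqlite3

conn = sqlite3.connect('data.db')  # database path assumed for illustration
c = conn.cursor()

# keep only the scalar fields, then append the 24 flattened hour values;
# user, q and domain are dropped simply by not selecting them
scalar_cols = ['time', 'uid', 'bid', 'b', 'total', 'type']
placeholders = ",".join("?" * (len(scalar_cols) + 24))
query = "insert into data_table values (%s)" % placeholders

with open(JSON_FILE) as f:
    for line in f:
        x = json.loads(line)
        row = tuple(x[col] for col in scalar_cols) + tuple(x['hours'])
        c.execute(query, row)

conn.commit()
conn.close()

Once the rows are in SQLite, Pandas can read them back in manageable pieces, for example with pandas.read_sql_query("select * from data_table", conn, chunksize=100000), which fits the goal of analysing the data without holding the whole file in memory.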
