I am trying using pandas to analyze some json data and later write to a sqlite3 database.
The input data is loaded from here and I only care about the 'bugs' information. Since there are list(for blocks, depends_on, flags, keywords) and null (for dupe_of), I use json_normalize to generate the DataFrame df. But when I use to_sql:
engine = create_engine('sqlite:///mydb.db')
df.to_sql("mydb", engine, index=False, if_exists='append')
It reports error:
File "/usr/local/lib/python2.7/dist-packages/pandas/core/generic.py", line 966, in to_sql
dtype=dtype)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/sql.py", line 538, in to_sql
chunksize=chunksize, dtype=dtype)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/sql.py", line 1172, in to_sql
table.insert(chunksize)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/sql.py", line 717, in insert
self._execute_insert(conn, keys, chunk_iter)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/sql.py", line 692, in _execute_insert
conn.execute(self.insert_statement(), data)
File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 841, in execute
return meth(self, multiparams, params)
File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/sql/elements.py", line 322, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 938, in _execute_clauseelement
compiled_sql, distilled_params
File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1070, in _execute_context
context)
File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1271, in _handle_dbapi_exception
exc_info
File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/util/compat.py", line 199, in raise_from_cause
reraise(type(exception), exception, tb=exc_tb)
File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1063, in _execute_context
context)
File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/default.py", line 442, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.InterfaceError: (InterfaceError) Error binding parameter 0 - probably unsupported type. u'INSERT INTO "MozillaRepo" (blocks, classification, component, creatio
n_time, depends_on, dupe_of, flags, id, is_confirmed, is_open, keywords, last_change_time, op_sys, platform, priority, product, resolution, severity, status, summary, version) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)' ([], u'Client Software', u'Preferences', u'1999-07-30T22:55:51Z', [], None, [] ... displaying 10 of 22 total bound parameter sets ... u'Dialup properties needs to be exposed in prefs', u'Trunk')
From the last line of the message, it should be the data type issue. But I don't know how to correct it.
The result of df.dtypes is:
blocks object
classification object
component object
creation_time object
depends_on object
dupe_of object
flags object
id int64
is_confirmed bool
is_open bool
keywords object
last_change_time object
op_sys object
platform object
priority object
product object
resolution object
severity object
status object
summary object
version object
dtype: object
And I can see that the database table is created as:
CREATE TABLE "mydb" (
blocks TEXT,
classification TEXT,
component TEXT,
creation_time TEXT,
depends_on TEXT,
dupe_of TEXT,
flags TEXT,
id BIGINT,
is_confirmed BOOLEAN,
is_open BOOLEAN,
keywords TEXT,
last_change_time TEXT,
op_sys TEXT,
platform TEXT,
priority TEXT,
product TEXT,
resolution TEXT,
severity TEXT,
status TEXT,
summary TEXT,
version TEXT,
CHECK (is_confirmed IN (0, 1)),
CHECK (is_open IN (0, 1)),
CHECK (is_open IN (0, 1)),
CHECK (is_confirmed IN (0, 1))
)
Aucun commentaire:
Enregistrer un commentaire