mercredi 30 septembre 2015

SQLITE_ERROR: Connection is closed when connecting from Spark via JDBC to SQLite database

I am using Apache Spark 1.5 and trying to connect to a local SQLite database named clinton.db. Creating a data frame from a table of the database works fine but when I do some operations on the created object, I get the error below which says "SQL error or missing database (Connection is closed)". Funny thing is that I get the result of the operation nevertheless. Any idea what I can do to solve the problem, i.e., avoid the error?

Start command for spark-shell:

../spark/bin/spark-shell --master local[8] --jars ../libraries/sqlite-jdbc-3.8.11.1.jar --classpath ../libraries/sqlite-jdbc-3.8.11.1.jar

Reading from the database:

val emails = sqlContext.read.format("jdbc").options(Map("url" -> "jdbc:sqlite:../data/clinton.sqlite", "dbtable" -> "Emails")).load()

Simple count (fails):

emails.count

Error:

15/09/30 09:06:39 WARN JDBCRDD: Exception closing statement java.sql.SQLException: [SQLITE_ERROR] SQL error or missing database (Connection is closed) at org.sqlite.core.DB.newSQLException(DB.java:890) at org.sqlite.core.CoreStatement.internalClose(CoreStatement.java:109) at org.sqlite.jdbc3.JDBC3Statement.close(JDBC3Statement.java:35) at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.org$apache$spark$sql$execution$datasources$jdbc$JDBCRDD$$anon$$close(JDBCRDD.scala:454) at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1$$anonfun$8.apply(JDBCRDD.scala:358) at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1$$anonfun$8.apply(JDBCRDD.scala:358) at org.apache.spark.TaskContextImpl$$anon$1.onTaskCompletion(TaskContextImpl.scala:60) at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:79) at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:77) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:77) at org.apache.spark.scheduler.Task.run(Task.scala:90) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) res1: Long = 7945

Aucun commentaire:

Enregistrer un commentaire