Friday, 27 November 2015

Connect to SQLite in Apache Spark

I want to run a custom function on all tables in a SQLite database. The function is more or less the same for every table, but it depends on the individual table's schema. Also, the tables and their schemata are only known at runtime (the program is called with an argument that specifies the path of the database file).
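To make the shape of the problem concrete, here is roughly what the per-table function looks like. This is a hypothetical sketch; the actual body of myFunc is not important, only that it takes a row together with the table's column names:

def myFunc(row: org.apache.spark.sql.Row, schema: Array[String]): String = {
    // Hypothetical example: pair each column name with the row's value at
    // the same position. The real logic would differ per use case.
    schema.zipWithIndex
        .map { case (col, i) => s"$col=${row.get(i)}" }
        .mkString(", ")
}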

This is what I have so far:

val conf = new SparkConf().setAppName("MyApp")
val sc = new SparkContext(conf)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)

// somehow bind sqlContext to DB

val allTables = sqlContext.tableNames

for (t <- allTables) {
    val df = sqlContext.table(t)
    val schema = df.columns
    sqlContext.sql("SELECT * FROM " + t + "...").map(x => myFunc(x,schema))
}

The only hint I have found so far requires knowing the table name in advance, which is not the case in my scenario:

val tableData = sqlContext.read.format("jdbc")
    .options(Map("url" -> "jdbc:sqlite:/path/to/file.db", "dbtable" -> t))
    .load()

I am using the xerial sqlite-jdbc driver. So how can I connect solely to a database, not to a table?
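For what it's worth, the direction I am currently considering is a minimal, untested sketch: enumerate the table names with plain JDBC through SQLite's sqlite_master catalog, then feed each name into the dbtable option from the snippet above. The url value and the loop body are assumptions pieced together from the code above, not a confirmed solution:

import java.sql.DriverManager

val url = "jdbc:sqlite:/path/to/file.db"

// List all tables via SQLite's own catalog table, using plain JDBC.
Class.forName("org.sqlite.JDBC")
val conn = DriverManager.getConnection(url)
val rs = conn.createStatement().executeQuery(
    "SELECT name FROM sqlite_master WHERE type = 'table'")
val tableNames = Iterator.continually(rs)
    .takeWhile(_.next())
    .map(_.getString("name"))
    .toList
conn.close()

// Load each table through the Spark JDBC source, as in the snippet above.
for (t <- tableNames) {
    val df = sqlContext.read.format("jdbc")
        .options(Map("url" -> url, "dbtable" -> t))
        .load()
    val schema = df.columns
    df.map(x => myFunc(x, schema))
}

But this bypasses Spark for the discovery step, which feels like a workaround rather than the intended way, hence the question.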
