Thursday, 22 January 2015

Unable to retrieve the entire data frame from SQLite using R (dplyr)

I am experimenting with manipulating data in R when the data lives in remote, on-disk SQLite databases. Here are my steps:


First, I copied the flights data set to my empty test database:



library(nycflights13)
library(dplyr)
test_db = src_sqlite("E:/sqlite/test_db")
copy_to(test_db, flights, temporary = FALSE)


Then I cleared R's memory and tried to extract the flights table from the test database:



library(dplyr)
test_db = src_sqlite("E:/sqlite/test_db")
flights = tbl(test_db, "flights")
flights_df = data.frame(select(flights, year, month, day, dep_time, tailnum))


A warning popped up:



Only first 100,000 results retrieved. Use n = -1 to retrieve all.


I have no idea how to pull the entire flights data, which should contain more than 300,000 rows, into a data frame.
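
For reference, here is a minimal sketch of one way to pull every row, assuming a current dplyr with dbplyr, DBI and RSQLite installed. The in-memory database and the synthetic table named big are hypothetical stand-ins for test_db and flights, so the snippet runs on its own. In current dplyr, collect() fetches all rows by default; in the version that printed the warning above, the message itself points to passing n = -1.

```r
library(dplyr)   # assumes dbplyr, DBI and RSQLite are also installed
library(DBI)

# Hypothetical stand-in for the on-disk file: an in-memory SQLite database
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Synthetic table with more than 100,000 rows (illustrative data only)
big <- data.frame(id = seq_len(150000),
                  month = rep(1:12, length.out = 150000))
dbWriteTable(con, "big", big)

# tbl() only creates a lazy reference; no rows are fetched yet
big_tbl <- tbl(con, "big")

# collect() runs the query and brings every row into a local data frame
big_df <- big_tbl %>%
  select(id, month) %>%
  collect()

nrow(big_df)

dbDisconnect(con)
```

Applied to the code above, the same idea would be collect(select(flights, year, month, day, dep_time, tailnum)) in place of the data.frame() call.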


A side question: is it possible to do more advanced data manipulation (reshaping long data to wide and wide to long) in R directly on the remote on-disk data in the database, instead of first loading it into a data frame in R, which could cause memory problems?
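
As a rough illustration of the side question, the sketch below does a long-to-wide reshape inside SQLite using conditional aggregation, which dplyr translates to SQL through dbplyr. It assumes dbplyr, DBI and RSQLite are installed. The table long and its columns id, key and value are hypothetical, and the set of output columns (a and b) has to be known in advance. It is only one possible approach, not necessarily the best one.

```r
library(dplyr)   # assumes dbplyr, DBI and RSQLite are also installed
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical long-format table (illustrative data only)
long <- data.frame(
  id    = c(1, 1, 2, 2),
  key   = c("a", "b", "a", "b"),
  value = c(10, 20, 30, 40)
)
dbWriteTable(con, "long", long)

# Build the reshape lazily: one output column per key value.
# ifelse() is translated to CASE WHEN and sum() to SUM() in the generated SQL.
wide_query <- tbl(con, "long") %>%
  group_by(id) %>%
  summarise(
    a = sum(ifelse(key == "a", value, 0), na.rm = TRUE),
    b = sum(ifelse(key == "b", value, 0), na.rm = TRUE)
  )

# Inspect the SQL that the database will run
show_query(wide_query)

# Only the already-reshaped (smaller) result is pulled into R
wide_df <- collect(wide_query)

dbDisconnect(con)
```

The heavy work stays in the database here; R only receives one row per id.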

