I am experimenting with using R to manipulate remote on-disk data stored in SQLite databases. Here are my steps:
First, I copied the flights data set into my empty test database:
library(nycflights13)
library(dplyr)
# connect to the on-disk SQLite database
test_db = src_sqlite("E:/sqlite/test_db")
# write flights into the database as a permanent (non-temporary) table
copy_to(test_db, flights, temporary = FALSE)
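As a sanity check (my addition, not part of the original steps), I believe dplyr's src_tbls() should list the tables the source now exposes:
# assumption: src_tbls() lists the tables visible through this src,
# so "flights" should appear after the copy_to() above
src_tbls(test_db)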
Then I cleared the R workspace and attempted to read the flights table back from the test database:
library(dplyr)
test_db = src_sqlite("E:/sqlite/test_db")
# lazy reference to the remote table; no rows are pulled into memory yet
flights = tbl(test_db, "flights")
# coercing to a data frame forces the rows to be fetched from the database
flights_df = data.frame(select(flights, year, month, day, dep_time, tailnum))
A warning popped up:
Only first 100,000 results retrieved. Use n = -1 to retrieve all.
I have no clue how to get the entire flights table, which should contain more than 300,000 rows, into a data frame.
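Based on the hint in the warning, here is a minimal sketch of what I am hoping works, assuming collect() accepts an n argument (with n = Inf fetching every row), as in recent dplyr versions:
# assumption: collect(n = Inf) forces the full result set to be fetched
# from SQLite before it is materialised as a local data frame
flights_df = flights %>%
  select(year, month, day, dep_time, tailnum) %>%
  collect(n = Inf) %>%
  as.data.frame()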
A side question: is it possible to do advanced data manipulation (reshaping, long to wide, wide to long) with R directly on the remote on-disk data in the database, rather than pulling everything into an R data frame, which can easily cause memory issues?
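To make the side question concrete: my understanding (an assumption I have not verified) is that verbs like group_by() and summarise() are translated to SQL and run inside the database, so something like the sketch below would stay on disk until collect() is called, whereas reshaping (e.g. tidyr-style long/wide conversions) has no SQL translation and would need the data collected into R first:
# assumption: these verbs execute in SQLite as generated SQL,
# so no rows reach R until collect() is called
per_plane = flights %>%
  group_by(tailnum) %>%
  summarise(n = n(), mean_dep = mean(dep_time)) %>%
  filter(n > 20)
per_plane_df = collect(per_plane, n = Inf)  # rows are pulled into R only here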