So I have a table in a SQLite database that contains just over 3 billion values, ranging from 0 to 400. It's just a single column of data. I want to bin the data into bins of 0.001 size, i.e. 0-0.001, 0.001-0.001 .... 399.998-399.999, 399.999-400 before calculating the median value of the binned values. I know this can be done in R, however I've noticed that I run into problems when doing this with a dataframe containing just a million values of the 3 billion. It reports the median value of the bin range (i.e. for bin 0 - 0.001, if there's no values that fall into it it still reports a median value of 0.0005) whereas it's simply reported as NA when using a data frame the size of say 100,000.
Is there a way I can do this in SQL or over come the problem I'm facing with R?
This is what I perform in R:-
> library(DBI)
> con <- dbConnect(RSQLite::SQLite(), dbname = "diffs.sqlite")
> tables <- dbListTables(con)
> mzDiff150 <- dbGetQuery(conn = con, statement = paste("SELECT `abs(diffs)` FROM mzdiff where `abs(diffs)` <= 150 and `abs(diffs)` > 100 ", sep = ","))
> bin <- seq(100, 150, by = 0.001)
> binnedData <- tapply(mzDiff150[, 1], cut(mzDiff150[, 1], breaks = bin), median)
Thanks
 
Aucun commentaire:
Enregistrer un commentaire