Monday, 16 March 2015

Unable to calculate correct median value in R

I'm importing data from a SQLite database into R so that I can bin the values it contains, specifically the values in the range 100 to 150. I want to cut the values into bins of width 0.001 and then take the median of the values in each bin. For example:



> head(mzDiff150)
abs(diffs)
1 100.0008
2 100.0158
3 100.0212
4 100.0233
5 100.0327
6 100.0364


These values, which make up the head of my data, should be binned into bins of width 0.001 like this:



(100,100.001] (100.001,100.002] (100.002,100.003]
     100.0008               N/A               N/A etc


From looking at my data, I expect a lot of bins to contain no values and therefore to come back as N/A, which is fine. However, I get the following results:



(100,100.001] (100.001,100.002] (100.002,100.003] (100.003,100.004] (100.004,100.005] (100.005,100.006]
     100.0005          100.0015          100.0025          100.0035          100.0045          100.0055


I don't understand this result, as there shouldn't be any values that fall within those bin ranges. The data is sorted as well. This is the code that I run:



> library(DBI)
> con <- dbConnect(RSQLite::SQLite(), dbname = "diffs.sqlite")
> tables <- dbListTables(con)
> mzDiff150 <- dbGetQuery(conn = con, statement = "SELECT `abs(diffs)` FROM mzdiff WHERE `abs(diffs)` <= 150 AND `abs(diffs)` > 100")
> bin <- seq(100, 150, by = 0.001)
> binnedData <- tapply(mzDiff150[, 1], cut(mzDiff150[, 1], breaks = bin), median)
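To make the expectation concrete, here is a minimal sketch that applies the same `cut`/`tapply` call to only the six values printed by `head(mzDiff150)` above. The vector `x` and the final subsetting step are illustrative additions and not part of the original session; the full table may well contain more values than the six shown here.

```r
# The six values shown by head(mzDiff150), typed in by hand (illustrative only)
x <- c(100.0008, 100.0158, 100.0212, 100.0233, 100.0327, 100.0364)

# Same breaks as above: bins of width 0.001 from 100 to 150
bin <- seq(100, 150, by = 0.001)

# Median per bin; bins that receive no value come back as NA
binnedData <- tapply(x, cut(x, breaks = bin), median)

# Show only the bins that actually received a value
print(binnedData[!is.na(binnedData)])

# Number of values that landed in each non-empty bin
counts <- table(cut(x, breaks = bin))
print(counts[counts > 0])
```

On these six values alone, each non-empty bin holds exactly one value, so its median is that value. Running the same `table(cut(...))` count on the full `mzDiff150[, 1]` column would show how many values each bin really contains.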


I feel like the mistake is obvious but I can't see where it is. Can anyone see where I'm going wrong?


Thanks

