Wednesday, 25 November 2015

R: importing a large fixed-width file with no line-ending indicators

I am trying to import a large government text file (about 7 GB) in fixed-width format that has no indicators for the ends of lines. I have the SAS input file that the agency uses.
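Because there are no line endings, each record can only be located by its position in the byte stream: record k occupies bytes (k - 1) * lrecl + 1 to k * lrecl. Here is a minimal base-R sketch of that idea. It assumes every record is exactly lrecl single-byte characters (the lrecl = 1210 passed to read.SAScii.sqlite further down), and it uses a made-up three-record file with hypothetical field widths rather than the real TANF layout. readChar() with a vector of lengths splits the stream into records, and substring() cuts out the columns:

```r
# Hypothetical helper: split a file with no line terminators into
# fixed-length records, then cut each record into fields by position.
# Assumes single-byte characters and a constant record length (lrecl).
read_fixed_records <- function(path, lrecl, widths) {
  n_records <- file.size(path) %/% lrecl          # whole records only
  records <- readChar(path, nchars = rep(lrecl, n_records), useBytes = TRUE)
  ends <- cumsum(widths)                          # last column of each field
  starts <- ends - widths + 1L                    # first column of each field
  fields <- lapply(seq_along(widths),
                   function(i) substring(records, starts[i], ends[i]))
  names(fields) <- names(widths)
  data.frame(fields, stringsAsFactors = FALSE)
}

# Toy file: three 8-byte records written back to back, no newlines.
path <- tempfile(fileext = ".txt")
writeBin(charToRaw("AB121234CD345678EF569999"), path)

# Hypothetical layout: 2-char state, 2-char code, 4-digit amount.
df <- read_fixed_records(path, lrecl = 8L,
                         widths = c(state = 2L, code = 2L, amount = 4L))
df$amount <- as.numeric(df$amount)
print(df)
```

This version reads the whole file at once, so on the 7 GB file it would run into the same memory limit as read.fwf. It only illustrates how records are located when there are no line endings.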

Here's what I've tried:

  1. read.fwf - This crashes R, presumably because it runs out of memory (my computer has 16 GB of RAM, but I gather you need about three times the file size).

  2. LaF - This looked like it would solve the memory problem, but it doesn't seem to handle the missing line-end indicators: it ends up reading only the first row of data.

  3. read.SAScii.sqlite - This uses the SAS input file to read the data into R, and uses SQLite to get around the memory limitations. But when I run it with this call:

    tanf12 <- read.SAScii.sqlite("TANF12A.txt",
                                 "input_TANF12A.sas",
                                 lrecl = 1210)

    I get the following error message:

    Error in match(x, table, nomatch = 0L) : 
      argument "tablename" is missing, with no default
    Called from: top level 
    
    

I don't understand what the function does well enough to fix this.
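One way around both the memory limit and LaF's apparent need for line endings might be to stream the file once and write a copy with a newline after every lrecl bytes. The sketch below assumes a constant record length, single-byte characters, and a file size that is an exact multiple of lrecl. add_line_endings() is a hypothetical helper, not part of LaF or SAScii, and it has not been tried on the TANF file. The copy needs roughly another 7 GB of disk space.

```r
# Hypothetical helper: stream a fixed-width file that has no line
# terminators and write a copy with "\n" after every lrecl bytes.
# Only chunk_records records are held in memory at any one time.
add_line_endings <- function(in_path, out_path, lrecl,
                             chunk_records = 100000L) {
  con_in <- file(in_path, open = "rb")
  con_out <- file(out_path, open = "wb")   # "wb": no \r\n translation on Windows
  on.exit({
    close(con_in)
    close(con_out)
  }, add = TRUE)
  repeat {
    # Returns fewer than chunk_records strings near the end of the file,
    # and character(0) once the file is exhausted.
    records <- readChar(con_in, nchars = rep(lrecl, chunk_records),
                        useBytes = TRUE)
    if (length(records) == 0L) break
    writeLines(records, con_out, sep = "\n", useBytes = TRUE)
  }
  invisible(out_path)
}

# For the file in the question this would be something like
# (the output file name is hypothetical):
# add_line_endings("TANF12A.txt", "TANF12A_lines.txt", lrecl = 1210)
```

The newline-terminated copy should then be readable by tools that expect one record per line, such as read.fwf or LaF's laf_open_fwf(). chunk_records only controls how much text is held in memory at a time: 100,000 records of 1,210 bytes is about 121 MB.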
