I am debugging database corruption. After collecting several corrupted databases, I found that all of them were corrupted by zero-valued bytes being written into them.
So I decided to add a check to the source code that dumps the call stack, in order to find out who corrupts the database.
Here is the code I added:
int sqlite3CheckZeroValuedBytes(const unsigned char* data, const int length)
{
  /* Compare one machine word at a time, then the remaining tail bytes. */
  const size_t* s = (const size_t*)data;
  const unsigned char* d = (const unsigned char*)data;
  int n = length/sizeof(size_t);
  int i;
  for (i = 0; i < n; i++) {
    if (s[i]!=0) {
      return 0;
    }
  }
  for (i = i*sizeof(size_t); i < length; i++) {
    if (d[i]!=0) {
      return 0;
    }
  }
  return 1;  /* every byte is zero */
}
static int unixWrite(
  sqlite3_file *id,
  const void *pBuf,
  int amt,
  sqlite3_int64 offset
){
  unixFile *pFile = (unixFile*)id;
  /* Report any write whose whole buffer is zero; offset is 64-bit, so use %lld. */
  if (amt>0 && sqlite3CheckZeroValuedBytes(pBuf, amt)) {
    SQLITE_KNOWN_ERROR(SQLITE_CORRUPT, "writing zero-valued bytes into %s from %lld length %d", unixGetFilename(pFile->zPath), offset, amt);
  }
  ...
}
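The word-sized loads in the check above cast the buffer to `const size_t*`, which assumes the buffer is suitably aligned for `size_t`. A minimal sketch of a variant that avoids that assumption by using only `memcmp`; the name `isAllZero` is hypothetical and not part of SQLite:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical alternative to sqlite3CheckZeroValuedBytes (not SQLite code).
** Returns 1 if all `length` bytes at `data` are zero, 0 otherwise.
** An empty buffer counts as all zero, like the original loops. */
static int isAllZero(const unsigned char *data, size_t length)
{
    assert(data != NULL || length == 0);
    if (length == 0) return 1;
    if (data[0] != 0) return 0;
    /* Compare the buffer with itself shifted by one byte: if byte 0 is zero
    ** and every byte equals its predecessor, every byte is zero. */
    return memcmp(data, data + 1, length - 1) == 0;
}
```

It could be called the same way as the original check, for example `isAllZero((const unsigned char*)pBuf, (size_t)amt)` guarded by `amt>0`.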
The code is simple. [sqlite3CheckZeroValuedBytes] checks whether the data is all zero, and the macro [SQLITE_KNOWN_ERROR], which is defined as [sqlite3_log], reports the error outside SQLite. Outside SQLite, I dump the call stacks of all threads, and I got this:
0x195774000 + 113628 objc_msgSend (in libobjc.dylib) + 28
0x1000f8000 + 7781724 _ZL9LogSQLitePviPKc,WCDataBase.mm,line 81
0x1000f8000 + 2836888 sqlite3_vlog,printf.c,line 1023
0x1000f8000 + 2778664 sqlite3KnownError,main.c,line 3192
0x1000f8000 + 2554560 unixWrite,os_unix.c,line 3335
0x1000f8000 + 2821984 sqlite3WalCheckpoint,wal.c,line 1798
0x1000f8000 + 2819864 sqlite3WalClose,wal.c,line 1914
0x1000f8000 + 2529964 sqlite3PagerClose,pager.c,line 3995
0x1000f8000 + 2574152 sqlite3BtreeClose,btree.c,line 2516
0x1000f8000 + 2774444 sqlite3LeaveMutexAndCloseZombie,main.c,line 10834297741736
0x1000f8000 + 2774220 sqlite3Close,main.c,line 1026
This is the only thread operating on the database; the call stacks of the other threads are unrelated. You can see that SQLite is checkpointing. That is how my database becomes corrupted. I have no idea how this happens, even after checking the source code.
Here are some of my conclusions:
- The zero-valued bytes check also covers writes into the WAL file, but there is no report of zero-valued bytes being written into the WAL.
- Some rogue file descriptor may write the zero-valued bytes into the WAL file. But I have several databases with the same problem. It would be unlikely for a rogue writer to write zero-valued bytes only into the WAL and not into other database files or normal files.
- I guess it could be a problem in the operating system. I work on iOS, but I have no further idea.
- It can happen under normal conditions, but it happens more easily when free disk space is low. I have no further idea about this either.
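One way to test the rogue-writer guess, when a WAL file can be captured before it is checkpointed, is to scan its frames for all-zero page payloads. A rough sketch, assuming the documented WAL layout (32-byte header with the page size at offset 8, then frames of a 24-byte header plus one page, all big-endian); `countZeroWalFrames` is a hypothetical helper, and it does not validate salts or checksums, so stale frames are scanned too:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read a 32-bit big-endian integer. */
static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical helper, not part of SQLite. Returns the number of WAL frames
** whose page payload is entirely zero, or -1 on I/O error or a bad header. */
static long countZeroWalFrames(const char *walPath)
{
    assert(walPath != NULL);
    FILE *f = fopen(walPath, "rb");
    if (f == NULL) return -1;

    unsigned char hdr[32];
    if (fread(hdr, 1, sizeof hdr, f) != sizeof hdr) { fclose(f); return -1; }
    uint32_t magic = be32(hdr);
    uint32_t pageSize = be32(hdr + 8);
    if ((magic != 0x377f0682u && magic != 0x377f0683u) ||
        pageSize < 512 || pageSize > 65536 || (pageSize & (pageSize - 1)) != 0) {
        fclose(f);
        return -1;
    }

    size_t frameSize = 24 + (size_t)pageSize;   /* frame header + page */
    unsigned char *frame = malloc(frameSize);
    if (frame == NULL) { fclose(f); return -1; }

    long found = 0, frameNo = 0;
    while (fread(frame, 1, frameSize, f) == frameSize) {
        const unsigned char *page = frame + 24;
        frameNo++;
        if (page[0] == 0 && memcmp(page, page + 1, pageSize - 1) == 0) {
            printf("frame %ld (page %lu) is all zero\n",
                   frameNo, (unsigned long)be32(frame));
            found++;
        }
    }
    int err = ferror(f);
    free(frame);
    fclose(f);
    return err ? -1 : found;
}
```

An all-zero payload is only a hint worth looking at, not proof of corruption; a result of zero frames would at least be consistent with the observation that the WAL writes themselves were never reported.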
So, here are my questions:
- Does anyone have any idea about this?
- What can I do to resolve this type of corruption?
Note that if a page of sqlite_master has been overwritten with zero-valued bytes, the [.dump] shell command will not work to repair the database.
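To see which pages of a damaged database file were hit, the file can be scanned page by page. A minimal sketch, assuming the page size is known to the caller (a zeroed first page no longer carries the header that stores it); `countZeroPages` is a hypothetical helper, not a SQLite function:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of SQLite. Prints the 1-based number of every
** page in `path` that consists only of zero bytes and returns how many were
** found, or -1 on I/O or allocation failure. A trailing partial page is
** ignored. */
static long countZeroPages(const char *path, size_t pageSize)
{
    assert(pageSize > 0);
    FILE *f = fopen(path, "rb");
    if (f == NULL) return -1;

    unsigned char *page = malloc(pageSize);
    if (page == NULL) { fclose(f); return -1; }

    long found = 0, pgno = 0;
    while (fread(page, 1, pageSize, f) == pageSize) {
        pgno++;
        /* Byte 0 is zero and every byte equals its predecessor: all zero. */
        if (page[0] == 0 && memcmp(page, page + 1, pageSize - 1) == 0) {
            printf("page %ld is all zero\n", pgno);
            found++;
        }
    }
    int err = ferror(f);
    free(page);
    fclose(f);
    return err ? -1 : found;
}
```

Page 1 holds the database header and the root of sqlite_master, so a report for page 1 would match the case where [.dump] cannot help.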