I have several SQLite databases ranging in size from 1 to 150 MB some with as many as 30,000 rows. The data being searched is basic HTML. I'm looking for the quickest way to search the HTML text while compensating for any HTML tags.
For instance, if I am searching for "the sky is blue" and a record in a database has an italics tag (i.e. "the <i>sky</i> is blue"), I need it to find it.
Obviously a straight search,
SELECT * FROM dictionary WHERE definition LIKE "%the sky is blue%"
won't work.
So I tried a search for all the individual words in a record in any order and then filter them with a regular expression. This works but is slow. It delivers too many false records that must be scanned by the regex. Especially if there are common words in the search string.
I tried searching for the individual words in order (LIKE "%the%sky%is%blue%") but this would sometimes cause the SQL search to hang with the larger records for some reason. I think it is because of the short common strings ("is", "at", etc.) that produce 1000s of hits.
An SQL regex search is also too slow for my purposes.
One option is to make another table with the data in all the records stripped of HTML tags and search that instead, but this nearly doubles the size of the database.
What other options are there to compensate for the tags?
Aucun commentaire:
Enregistrer un commentaire