lundi 29 juin 2015

Synonym chains - Efficient routing algorithm for iOS/sqlite

A synonym chain is a series of closely related words that span two anchors. For example, the English words "black" and "white" can connected as:

black-dark-obscure-hidden-concealed-snug-comfortable-easy-simple-pure-white

Or, here's "true" and "false":

true-just=fair=beautiful=pretty-artful-artificial-sham-false

I'm working on a thesaurus iOS app, and I would like to display synonym chains also. The goal is to return minimum spanning tree(s) of a weighted planar graph of word relations. My source is a very large thesaurus with weighted data, where the weights measure similarity between words. (e.g., "outlaw" is closely related to "bandit", but more distantly related to "rogue.")

What optimization strategies do you recommend to make this realistic, e.g., within 5 seconds of processing on a typical iOS device? Assume the thesaurus has half a million terms, each with 20 associations. I'm sure there's a ton of prior research on these kinds of problems, and I'd appreciate pointers on what might be applied to this.

My current algorithm involves recursively descending a few levels from the start and end words, and then looking for intercepting words, but that becomes too slow with thousands of sqlite (or Realm) selects.

Aucun commentaire:

Enregistrer un commentaire