I have a column of URIs in my SQLite database. I would like to identify which of these are subdomains of a same website.
For instance, for the given set...
1. daiquiri.rum.cu
2. mojito.rum.cu
3. cubalibre.rum.cu
4. americano.campari.it
5. negroni.campari.it
6. hemingway.com
... I would like to run a query that returns:
Website | Occurrences
----------------------------
rum.cu | 3
campari.it | 2
hemingway.com | 1
That is, the domain names / patterns that were matched, ranked by the number of times they were found in the database.
The heuristic I would use is: for every URI with 3+ domains, replace first domain with '%'and execute the pseudoquery: COUNT(uris from website where uris LIKE '%.remainderofmyuri').
Note that I don't care much about execution speed (in fact, not at all). The number of entries is within the range of 10k-100k.
Aucun commentaire:
Enregistrer un commentaire