I would describe myself as pretty much a beginner when it comes to databases. I once used MySQL for an application and know some basic SQL, but I never had any formal training or read a book that goes deep into the topic. Everything I know I picked up from random places on the internet.
So I was quite confused when I looked into how to improve the performance of my small SQLite database, because everyone seems to have different priorities and recommendations. Maybe someone here can help me figure out what the most important things are for my use case.
I am writing a small RSS reader that stores all its articles, feeds, categories, tags and so on in the SQL database. All the writing is done by a daemon, while the UI (another process that communicates with the daemon over D-Bus) just reads the data and presents it. Write performance is not super important to me, because it happens in the background anyway and the user won't notice if it takes 10 seconds longer until he gets notified about new articles. But reading should be reasonably fast, because there is nothing more annoying than a slow UI.
My database contains the following tables:
articles:
- "articleID" TEXT primary key
- "feedID" TEXT
- "title" TEXT
- "author" TEXT
- "url" TEXT
- "preview" TEXT
- "unread" INT
- "marked" INT
- "tags" TEXT
categories:
- "categorieID" TEXT primary key
- "title" TEXT
- "unread" INT
- "orderID" INT
- "parent" TEXT
- "level" INT
feeds:
- "feedID" TEXT primary key
- "name" TEXT
- "url" TEXT
- "hasIcon" INT
- "unread" INT
- "categorieID" TEXT
tags:
- "tagID" TEXT primary key
- "title" TEXT
- "color" INT
So first about inserts:
At the moment they are as simple and inefficient as they can get: every article, feed and category gets inserted one at a time. I read that I can insert multiple rows at once using only a single query, like this:
INSERT INTO "tablename" ("column1", "column2") VALUES
('data1', 'data2'),
('data1', 'data2'),
('data1', 'data2'),
('data1', 'data2');
But that would create one huge query, since in theory there can be hundreds of new articles at once.
Another source recommended using "BEGIN TRANSACTION", then preparing a single statement and binding the different parameters each time, and finishing with an "END TRANSACTION".
What would be the better solution for me?
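If I understood that correctly, the pattern would look roughly like this (just my own sketch; the actual binding of the values would happen through whatever SQLite API is used, so the placeholders and the comment are only there to illustrate the idea):

BEGIN TRANSACTION;
-- prepared once; for each new article the values get bound,
-- the statement is stepped and then reset
INSERT INTO articles (articleID, feedID, title, author, url, preview, unread, marked, tags)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?);
COMMIT;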
Another tip I read about was setting "PRAGMA synchronous=OFF". Are there any drawbacks to this?
What else would you recommend? I don't want to over-complicate things since write-performance is not super critical for me and safety is more important :)
Reading:
This is the more critical and complicated part, I guess. These are the things I read about but am not really sure of:
Indexing: if I understood correctly, this only comes with the drawback of slightly slower inserts, which is fine by me. But which columns should I index (unique, compound - I have no idea)? And do I create an index only once after creating the database, or after each insert?
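Just so it is clear what I mean, this is what I would guess at (no idea whether these are the right columns or whether a compound index even makes sense here - the index names are just made up by me):

CREATE INDEX IF NOT EXISTS idx_articles_feedID ON articles (feedID);
-- or maybe a compound index, since I usually filter on feedID together with unread/marked?
CREATE INDEX IF NOT EXISTS idx_articles_feed_unread ON articles (feedID, unread, marked);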
I read that a lot of WHERE and OR clauses are bad, but what are the alternatives? This is how a typical query looks at the moment:
SELECT * FROM "main"."articles" WHERE ("feedID" = '37' OR "feedID" = '45' OR "feedID" = '47' OR "feedID" = '50' OR "feedID" = '57' OR "feedID" = '61' OR "feedID" = '62') ORDER BY rowid DESC LIMIT 23 OFFSET 0
This is, for example, a query that gets all articles of a single category: first I get the feedIDs of all the feeds the category contains, and then I search for articles with those feedIDs. I guess this is suboptimal at best :D What can be done to improve it?
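One thing I wondered about is whether using IN, or joining against the feeds table directly, would be better than chaining ORs - something like this (just a guess on my part, I have not measured anything; 'someCategory' is only a placeholder):

SELECT * FROM articles WHERE feedID IN ('37', '45', '47', '50', '57', '61', '62') ORDER BY rowid DESC LIMIT 23 OFFSET 0;

-- or skip the extra lookup of the feedIDs entirely:
SELECT articles.* FROM articles JOIN feeds ON articles.feedID = feeds.feedID
WHERE feeds.categorieID = 'someCategory' ORDER BY articles.rowid DESC LIMIT 23 OFFSET 0;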
What about increasing the number of pages kept in memory with "PRAGMA cache_size"?
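For example something like this (the value is only a guess; as far as I understand, a negative number means the size is given in KiB instead of pages):

PRAGMA cache_size = -2000; -- roughly 2 MB of page cache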
Search is implemented using "instr":
SELECT * FROM "main"."articles" WHERE instr(UPPER("title"), UPPER('searchterm')) > 0 ORDER BY rowid DESC LIMIT 23 OFFSET 0
This is pretty much the most complex query I use. How can something like this be optimized, with all its ANDs, ORs and WHEREs?
SELECT * FROM "main"."articles" WHERE ("feedID" = '41' OR "feedID" = '42' OR "feedID" = '46' OR "feedID" = '48') AND "unread" = 9 AND "marked" = 11 AND instr(UPPER("title"), UPPER('searchterm')) > 0 ORDER BY rowid DESC LIMIT 23 OFFSET 0
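While searching around I also came across SQLite's FTS5 extension for full-text search. I have no idea whether that is overkill for just searching titles, but from what I read the setup would look roughly like this (the table and column layout are purely my guess, and the table would of course have to be kept in sync with "articles"):

CREATE VIRTUAL TABLE articles_fts USING fts5(articleID UNINDEXED, title);
-- the search would then become something like:
SELECT articleID FROM articles_fts WHERE articles_fts MATCH 'searchterm';

Would that be worth it, or is there a simpler way to speed up the instr() searches?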
Is it better to select the rows one by one (LIMIT 1 and an additional OFFSET for every select query) or get them all at once?
Any other tips for making SELECT statements faster? :)
I hope this topic is not against any rules around here, since it is somewhat opinion-based. I would just like to know which optimizations are the most important for my use case, without over-complicating the whole thing :)
Thank you in advance for any answers.