So I have two tables, ‘videos’ and ‘video_likes’. The ‘videos’ table looks like this:
id | creator | url | likes 1 | 5 |https://... | 10 2 | 8 |https://... | 20 3 | 4 |https://... | 30 4 | 2 |https://... | 40
And ‘video_likes’ table looks like this:
id | video_id | like_user_id 1 | 2 | 8 2 | 2 | 5 3 | 4 | 2
As you can see both are basic tables. Ok Now here is the fun part. When a user likes/dislikes a post, instead of incrementing and decrementing likes’ from ‘videos’ table, I just update them to be as same as the count of rows in the ‘video_likes’ table, like this:
UPDATE videos SET likes = (SELECT COUNT(id) FROM video_likes WHERE video_id = '$video_id') WHERE id = '$video_id'
When it is time to fetch the no. of likes of the video, I just do this:
SELECT likes FROM videos WHERE id = '$video_id' LIMIT 1
The problem is that the update query is called each time a user likes or dislikes a video, which could be very expensive I guess. I have three questions related to this problem:
1) is there a performance issue when the ‘video_likes’ table grows to a large number, say a million or a billion?
2) Does sql lock the table while updating the count? If yes, will a new insert (a user liking the post while the update is going on) fail? If so, how to prevent this?
3) What is the most fastest and efficient way to achieve the same result? I don’t want to increment or decrement the likes field, because this will be just a denormalization and there could be inconsistencies in the actual likes count. What would be the right way to do this?
These are a few questions that has been troubling me for the past few days. Hope you’ll answer. Regards.
- Don’t recompute; that will just get slower and slower.
- Increment the counter at the same time you
INSERTinto the Likes table.
- When you get so busy that there are hundreds of writes (Inserts, Updates) per second, split out the like counter into a parallel table (“vertical partitioning”). This will help avoid conflicts due to the row lock (not table lock) when doing increment a single video like-count, versus other accesses to that table.
SELECT likes FROM videos WHERE id = '$video_id' LIMIT 1does not make sense — is there more than one row for a video in the
videostable? Why? If so, which one (
limit 1) do you want?
- Did you also want a “dislike” counter? Or will you simply ‘decrement’. Think about it.
video_likesdoes not need an
PRIMARY KEY(video_id, user_id). This will let you prevent a user from repeatedly bumping the count. (However, this needs checking before doing the increment.)
If “Dislike” is not a feature, but “Un-Like” is, I suggest “Un-Like” be implemented something like
START TRANSACTION; DELETE FROM video_likes WHERE ...; UPDATE videos SET like_ct = like_ct - 1 WHERE ...; COMMIT;
That way, the two “like” mechanism would stay perfectly in sync. This is, to some extent, a mirror image of the “Like” code.
LIMIT only for controlling how much output you get.
- When there is only one row anyway,
LIMIT 1doesn’t change how long the query takes.
- When the query has to sort (
ORDER BY) to decide which is the first/last/whatever, then it has already done a lot of work; the
LIMITmerely cuts back on the number of rows delivered.
- On the other hand, when an
INDEXcan be used for the
ORDER BY, the sort might vanish. Hence the
LIMITcontrols the number of rows delivered.
AUTO_INCREMENT or not
- A table must have a
- If you don’t explicitly specify a PK, one is generated for you. (Not advised.)
- A “natural” PK is on that occurs naturally. Example, a table of countries could (should) use the standard 2-letter “country codes” — US, FR, IN, RU, CN, CH, IT, etc.
id INT UNSIGNED NOT NULL AUTO_INCREMENTis useful for tables that don’t have a “natural” PK. Or the natural PK is a long string.
In that example with a 4 byte id and a 2-byte country-code, you need two indexes:
PRIMARY KEY(id), INDEX(country_code)
And the table with
id is bigger. (Though not enough bigger in this example to really matter.)
In the case of a “many:many”, such as your
id hurts performance. I discuss that here.