I have a table tab
contains events where participants are listed as a comma separated list of ids: 1,5,6,24,124
I want to get the AVG participant number for events only if the resolution
is equal to Completed
This is what I tried:
SELECT avg(case when (resolution = 'Completed' then LENGTH(REPLACE(participants, ',', '')) end) participants FROM tab;
but it doesn’t work
Answer
Fix your data model! Storing lists in a string is wrong for a lot of reasons:
- Numbers should be stored as numbers not strings.
- Ids should have properly declared foreign key relationships, which doesn’t work when they are in a string.
- Columns are intended to store single values, not multiple values.
- SQL has lousy string processing capabilities.
- SQL has a great data structure for storing lists. It is called a table.
The right thing to do is to fix the data model. Sometimes, you are stuck with someone else’s really, really bad data model. In that case, you can count the number of commas and add one:
SELECT AVG( 1 + LENGTH(participants) - LENGTH(REPLACE(participants, ',', ''))) as avg_participants FROM tab WHERE resolution = 'Completed';
However, your real effort should be on fixing the data model.