11 May 2022

Subset ratio


Every Friday at Bitmetric we post a new Qlik certification practice question on our LinkedIn company page. Last Friday we asked the following Qlik Data Architect certification practice question:

The correct answer is C: 28% of the CustomerIDs have never placed an order.

The data model viewer is an important tool for validating and checking the quality of the data model you have just created. One of the things you can check here is the subset ratio. The subset ratio is the number of distinct values of a field that are present in the selected table, expressed as a percentage of the total number of distinct values of that field in the whole data model. To demonstrate this, see the following image:

In the image above we have selected the key field CustomerID in the Sales table. In the bottom half of the screen we can see that the field CustomerID has a total of 100 distinct values across all tables in the model, not just in the selected table. This is shown as Total distinct values. Because we selected the field within the Sales table, we also see that there are 72 distinct values in this table alone, shown as Present distinct values. So of the 100 distinct CustomerIDs across all tables, 72 are present in the Sales table. Dividing these two numbers gives the subset ratio:

Present distinct values / Total distinct values = 72 / 100 = 0.72, or 72%

Since there are 100 distinct values in total, the Customer table should contain all 100 distinct CustomerIDs. A look at the data model viewer confirms this:

There are 100 distinct values present in the Customer table, so its subset ratio is 100%. Subtracting the 72% of the Sales table from the total of 100% leaves 28% of the CustomerIDs in the model (in this case all present in the Customer table) that have never placed an order.
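The calculation above can be reproduced outside Qlik with a small Python sketch. The data is made up to match the numbers in the example (IDs 1 to 100 in the Customer table, IDs 1 to 72 in the Sales table), and `subset_ratio` is a hypothetical helper written for illustration, not a Qlik function.

```python
def subset_ratio(table_values, all_tables):
    """Distinct values present in one table / distinct values of the field in all tables."""
    total_distinct = set().union(*all_tables)   # "Total distinct values"
    present_distinct = set(table_values)        # "Present distinct values"
    return len(present_distinct) / len(total_distinct)

# Assumed sample data that matches the example in the text.
customer_ids = list(range(1, 101))                       # Customer table: 100 distinct IDs
sales_ids = [i for i in range(1, 73) for _ in range(3)]  # Sales table: 72 distinct IDs, 3 orders each

tables = [customer_ids, sales_ids]
sales_ratio = subset_ratio(sales_ids, tables)
customer_ratio = subset_ratio(customer_ids, tables)

# Customers that never appear in the Sales table, counted directly.
never_ordered = set(customer_ids) - set(sales_ids)
never_ordered_share = len(never_ordered) / len(set(customer_ids))

print(f"Sales: {sales_ratio:.0%}, Customer: {customer_ratio:.0%}, never ordered: {never_ordered_share:.0%}")
# prints: Sales: 72%, Customer: 100%, never ordered: 28%
```

Note that the shortcut "100% minus 72%" only gives the share of customers without orders because the Customer table itself has a subset ratio of 100%. Counting the set difference directly, as the sketch does, also works when that is not the case.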

Some other things to keep in mind about the subset ratio:

  • What if the subset ratio of the dimension table is also lower than 100%?

If the subset ratio of the Customer table had also been lower than 100%, there would be a discrepancy between the Customer table and the Sales table: each table would contain values that are not present in the other. For a fact table it is not uncommon to have a subset ratio lower than 100%. A dimension table with a subset ratio below 100%, like the Customer table in the example, means that you should take a closer look at the data in the model. If, for example, the subset ratio of the Customer table had been 90%, then 10% of the distinct CustomerIDs would be present in the Sales table without a matching CustomerID in the Customer table.
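To illustrate the 90% scenario, here is a minimal Python sketch that lists the unmatched keys. The ID ranges are invented for the illustration; in a real model you would inspect the actual CustomerID values of both tables.

```python
# Assumed sample data: the Customer table is missing IDs 91..100.
customer_ids = set(range(1, 91))    # 90 distinct CustomerIDs in the dimension table
sales_ids = set(range(30, 101))     # Sales refers to IDs 30..100

total_distinct = customer_ids | sales_ids             # all distinct CustomerIDs in the model
customer_ratio = len(customer_ids) / len(total_distinct)

# CustomerIDs used in Sales that have no matching row in the Customer table.
orphans = sales_ids - customer_ids
orphan_share = len(orphans) / len(total_distinct)

print(f"Customer subset ratio: {customer_ratio:.0%}")          # prints: Customer subset ratio: 90%
print(f"Unmatched CustomerIDs in Sales: {orphan_share:.0%}")   # prints: Unmatched CustomerIDs in Sales: 10%
```

Sales rows with these unmatched keys cannot be linked to any customer attributes, which is why a dimension table with a subset ratio below 100% deserves a closer look.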

  • What if the total of the subset ratios is 100%?

If the subset ratios of all tables that share the field add up to exactly 100%, there are no matching values between the tables: every distinct value occurs in only one table, so nothing is actually linked. Good luck 😉
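Why a combined total of 100% implies no overlap can be checked with another short Python sketch, again on made-up data. The distinct counts of the tables only add up to the total distinct count when no value is shared.

```python
# Assumed sample data: two tables whose CustomerID values do not overlap at all.
customer_ids = set(range(1, 41))    # 40 distinct values
sales_ids = set(range(41, 101))     # 60 distinct values

total_distinct = customer_ids | sales_ids
present_counts = [len(customer_ids), len(sales_ids)]

# Subset ratios as whole percentages: 40% and 60%, together 100%.
ratios_pct = [100 * n // len(total_distinct) for n in present_counts]
shared = customer_ids & sales_ids

print(ratios_pct, sum(ratios_pct))   # prints: [40, 60] 100
print(len(shared))                   # prints: 0
```

As soon as a single value occurs in both tables, it is counted once in the total but once in each table, so the ratios add up to more than 100%.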

That’s it for this week. See you next Friday? And remember:

  • If you have suggestions for questions, we love to hear from you via WhatsApp or at info@bitmetric.nl
  • If you’re enjoying these questions and want to work on stuff like this every day (but a bit more challenging), we’re always on the lookout for new colleagues. Check our job openings here.
