12 May 2022

# Using the subset ratio in Qlik to spot errors in key fields

Every Friday at Bitmetric we’re posting a new Qlik certification practice question to our LinkedIn company page. Last Friday we asked the following Qlik Data Architect certification practice question about subset ratios in Qlik Sense:

### The correct answer is C: 28% of the CustomerID’s have never placed an order

To validate and check the quality of the data model which you have just created, the data model viewer is an important tool. One of the things you can check here is the subset ratio. The subset ratio is the percentage of present distinct values within that table compared to the total distinct values of the chosen field in the whole data model . To demonstrate this see the following image:

In the image above we have selected the key field CustomerID in the Sales table. We can see in the bottom half of the screen that the field CustomerID has a total of 100 distinct values in all tables in the model, not just in the selected table, visible as Total distinct values. By selecting this field within the Sales table however, we see that there are 72 distinct values in this table alone, visible as the Present distinct values. So of all 100 distinct CustomerID’s within all tables we now know that there are 72 distinct CustomerID’s in the Sales table. So by diving these we will receive the subset ratio:

Present distinct values / total distinct values = 72 / 100 = 0,72 or 72%

Now knowing that there is a total of 100 values this means that the Customer table must be filled with 100 distinct CustomerID’s. Having a look in the data model viewer confirms this:

There are 100 distinct values present in the Customer table. Now by subtracting the 72% of the Sales table of the total 100% we end up with 28% of CustomerID’s in the total model (in this case all present in the Customer table) which have never placed an order.

### Other things to keep in mind

Some other things to keep in mind about the subset ratio:

• What if the subset ratio of the dimension table is also lower then 100%?

If this amount would have been lower there would have been a discrepancy between the Customer table and the Sales table in which both tables would have had values not present in the other. For a fact table it is not uncommon to have a subset ratio of lower then 100%, however a dimension table, like the Customer table in the example, with a subset ratio of less then 100% means that you should have a look at the data in the model. If for example the subset ratio in the Customer table would have been 90%, it means that we have 10% of distinct CustomerID’s present in the Sales table, which are not being matched with a CustomerID in the Customer table.

• What if the total of the subset ratio’s is 100%?

If the combined total of the subset ratio’s of all tables would be 100% it means that there are no matching values between the tables. Good luck 😉

Addressing or correcting the errors identified by a low subset ratio in Qlik Sense or QlikView typically involves a manual review of the data model and the relationships between tables. Once a low subset ratio is identified, indicating discrepancies or missing data, users should examine the key fields and relationships to understand the cause of these discrepancies. This might include checking for incorrect key field associations, missing entries, or data quality issues in the source data. Corrective actions can vary, ranging from modifying the data load script to adjust how tables are joined, correcting data entry errors in the source data, or redefining relationships within the data model to ensure all relevant data is accurately associated.

Qlik Sense and QlikView do not have built-in tools that automatically fix discrepancies highlighted by the subset ratio. However, they offer robust data modeling and scripting capabilities that allow users to manually address and rectify these issues. Users can leverage the data model viewer and the script editor to make adjustments and improvements to the data model. Additionally, Qlik’s community forums and documentation provide extensive resources and examples on how to handle common data modeling challenges.

The subset ratio’s impact on the performance and accuracy of data analysis in Qlik Sense or QlikView is significant. A low subset ratio can indicate that not all data is being correctly associated, which may lead to incomplete or inaccurate analysis outcomes. For instance, if customer IDs are not matching correctly between tables, sales analysis could miss out on certain transactions, leading to underreported sales figures. Furthermore, discrepancies in key field associations can lead to performance issues, as the data model might be more complex or less optimized than intended. Ensuring a high subset ratio, where appropriate, helps in maintaining the integrity and reliability of the data analysis, enabling more accurate and efficient reporting and insights.

That’s it for this week. See you next Friday?

### Take your Qlik skills to the next level!

Since 2013, the Masters Summit for Qlik is the premier advanced training for Qlik. Join us in Vienna and take your Qlik skills to the next level.

### Join the team!

Do you want to work within a highly-skilled, informal team where craftsmanship, ingenuity, knowledge sharing and personal development are valued and encouraged? Check out our job openings.

Data Model Friday Qlik Test Prep Solution

### How can we help?

Barry has over 20 years experience as a Data & Analytics architect, developer, trainer and author. He will gladly help you with any questions you may have.