22 June 2022

GROUP BY

Share this message

Every Friday at Bitmetric we’re posting a new Qlik certification practice question to our LinkedIn company page. Last Friday we asked the following Qlik Data Architect certification practice question:

The correct answer is D.

The problem stated in this question was that the data in the front end showed wrong results. By examining the table in Figure 1 we can quickly determine that the problem lies in the Target. This was duplicated per row for each Customer and Month. So when using a Sum(Sales) and Sum(Target) in the frontend, Customer A in January would end up with a total sales amount of 41 and a total target amount of 100, which should have been 50. It was requested to fix this in the script.

To create the solution in the script we need to aggregate the sales for each customer for each month while loading the table. This should be done by using Sum() as the aggregation function. However just using Sum() in the load is not enough, when making aggregations we need to tell Qlik what to aggregate them by. This is done using the Group By clause.

So let’s have a look at how the final table should be loaded into the data model and which answer provides this.

It is apparent that we need to sum the sales per customer and month. This is done by adding the following script:

Totals:
LOAD
Customer
, Month
, Sum(Sales) As SalesAmount
Resident Sales
GROUP BY Customer, Month
;

Aggregating values while loading a table in the script always has to be accompanied by the Group By clause. In this case we need the Sum() of Sales per Customer and Month, so the Group By should include those as is shown above.

All that is left now is to add the target to the table. As we already figured out is that the target is duplicated per row, so if we check the possible answers, using a join and distinct load this can negate this:

Target:
LEFT JOIN (Totals)
LOAD DISTINCT
Customer
, Month
, Target
RESIDENT Sales
;

The script above will load the table shown below and join it to the final table:

If we wouldn’t have used distinct load the table below would have been loaded and joined duplicating rows in the joined table:

This in turn could lead to bad results in the front end, since by the duplication of the rows not only the target would be duplicated but also the rows containing the SalesAmount, which is unwanted.

By combining these two options we would get the end result we are looking for and the answer to this question. First aggregating the values per customer and month, then adding the unique target per customer and month to that table.

Some other things to notice:

  • It is not needed to include all Group By fields into the load. If we only load Customer and sum(Sales) grouped by Customer and Month, we would have gotten the same aggregated results, be it without the Month field, which was needed in this example month to get the appropriate end result.
  • Loading fields and not adding them to the Group By clause won’t work. All fields that are being loaded into the table and are not aggregated, need to be mentioned in the Group By clause. For example; if we would load Customer, Month and Sum(Sales) and only group these on Customer we would receive an error. In the resulting table Qlik can’t aggregate the Sum(Sales), since it doesn’t know how to aggregate on the Month field, without telling to do so, it has only received to Group By Customer.

And then finally, before we round up a quick shoutout to everyone who answered C. This is technically speaking definitely a correct answer. The end results are not wrong and the data model is not bad either:

The two tables are connected on Customer and Month, resulting in a synthetic key, something which in the official documentation is noticed as something unwanted and a possible cause of low performance.

However, in this case the SalesAmount and Target are to be linked to each other trough Customer and Month. So instead of creating a composite key ourselves:

Totals:
LOAD
Customer & ‘-‘ & Month AS %CustomerMonthKEY

Qlik has done this for us in the form of a synthetic key and table. However, since we are preparing for Qlik certification questions, synthetic keys are ruled out. Be assured that in any Qlik certification you will be taking, a synthetic key would never be the correct answer.

 That’s it for this week. See you next Friday? And remember:

  • If you have suggestions for questions, we love to hear from you via WhatsApp or at info@bitmetric.nl
  • If you’re enjoying these questions and want to work on stuff like this every day (but a bit more challenging), we’re always on the lookout for new colleagues. Check our job openings here.

How can we help?

Feel free to contact us if you have any comments or questions.