22 June 2022

Using the GROUP BY clause in Qlik Sense

Using the GROUP BY clause in Qlik Sense and QlikView

Every Friday at Bitmetric we post a new Qlik certification practice question to our LinkedIn company page. Last Friday we asked the following Qlik Data Architect certification practice question about the GROUP BY clause in Qlik:

A Data Architect receives the table shown in figure 1. The current front end solution shows the wrong results. The request is to clean this table in the script so that the front end shows the total sales per customer by month compared to the target per month. Which solution should be used?

The correct answer is D.

The problem stated in this question is that the front end shows wrong results. Examining the table in Figure 1, we can quickly see that the problem lies in the Target field: the same target is repeated on every row for a given Customer and Month. So when using Sum(Sales) and Sum(Target) in the front end, Customer A in January ends up with a total sales amount of 41 and a total target amount of 100, where the target should have been 50. The request is to fix this in the script.
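To make the problem concrete, here is a minimal sketch of a source table that behaves as described. Only the January totals for Customer A (sales of 41, target of 50) come from the question; the split into two rows of 20 and 21 is an assumption made for illustration.

```qlik
// Hypothetical sample rows: the individual Sales values (20 and 21) are
// assumed; only their total of 41 and the target of 50 come from the text.
Sales:
LOAD * INLINE [
Customer, Month, Sales, Target
A, Jan, 20, 50
A, Jan, 21, 50
];

// In the front end, for Customer A in Jan:
//   Sum(Sales)  = 20 + 21 = 41   (correct)
//   Sum(Target) = 50 + 50 = 100  (wrong, the target is counted once per row)
```

The table is named Sales so that it lines up with the RESIDENT Sales loads used in the rest of this post.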

To build the solution in the script we need to aggregate the sales for each customer and each month while loading the table, using Sum() as the aggregation function. Just using Sum() in the load is not enough, however: when aggregating we also need to tell Qlik what to aggregate by. This is done with the Group By clause.

So let’s have a look at how the final table should be loaded into the data model and which answer provides this.

The source table for this question.

It is apparent that we need to sum the sales per customer and month. This is done by adding the following script:

Totals:
LOAD
Customer
, Month
, Sum(Sales) As SalesAmount
Resident Sales
GROUP BY
Customer
, Month
;

Aggregating values in the script next to non-aggregated fields always has to be accompanied by a Group By clause. In this case we need the Sum() of Sales per Customer and Month, so the Group By should list those two fields, as shown above.

All that is left now is to add the target to the table. We already established that the target is duplicated on every row, so looking at the possible answers, a join combined with a distinct load removes the duplication:

Target:
LEFT JOIN (Totals)
LOAD DISTINCT
Customer
, Month
, Target
RESIDENT Sales
;

The script above will load the table shown below and join it to the final table:

The target table

If we had not used a distinct load, the table below would have been loaded and joined, duplicating rows in the joined table:

The target table, if we hadn't used DISTINCT.

This in turn leads to wrong results in the front end: the duplicated rows repeat not only the target but also the SalesAmount, which is unwanted.
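One way to check whether the join duplicated any rows is to count the rows per Customer and Month in the joined table. The sketch below is not part of the original answer; the table name RowCheck and the Check* field names are hypothetical, chosen so that the check table does not associate with Totals.

```qlik
// Hypothetical check table, run after the LEFT JOIN into Totals.
// Fields are renamed so RowCheck stays unlinked from the data model.
RowCheck:
LOAD
Customer AS CheckCustomer
, Month AS CheckMonth
, Count(Customer) AS RowsPerCustomerMonth
RESIDENT Totals
GROUP BY
Customer
, Month
;
```

With the distinct load, RowsPerCustomerMonth should be 1 for every combination. A higher value means the join multiplied rows and the front-end sums will be inflated. Drop the check table again once the model has been verified.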

Combining these two steps gives the end result we are looking for and the answer to this question: first aggregate the values per customer and month, then add the unique target per customer and month to that table.

Some other things to notice:

  • It is not necessary to include all Group By fields in the load. If we only load Customer and Sum(Sales), grouped by Customer and Month, we get the same aggregated values, albeit without the Month field. In this example Month is needed to get the appropriate end result.
  • Loading fields without adding them to the Group By clause won’t work. All fields that are loaded into the table and are not aggregated need to be listed in the Group By clause. For example, if we load Customer, Month and Sum(Sales) and only group by Customer, the reload fails with an error. Qlik can’t calculate Sum(Sales) per row, because it has only been told to group by Customer and has no instruction for how to handle Month.
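Both points can be sketched in script. The table names TotalsNoMonth and TotalsBroken and the field name SalesAmountNoMonth are hypothetical, and the second load is commented out because it is expected to stop the reload.

```qlik
// Works: Month is part of the Group By but is not loaded.
// The result has one row per Customer and Month, without a Month label.
TotalsNoMonth:
LOAD
Customer
, Sum(Sales) AS SalesAmountNoMonth
RESIDENT Sales
GROUP BY
Customer
, Month
;

// Fails on reload: Month is loaded but is neither aggregated
// nor listed in the Group By clause.
// TotalsBroken:
// LOAD
// Customer
// , Month
// , Sum(Sales) AS SalesAmount
// RESIDENT Sales
// GROUP BY
// Customer
// ;
```

Uncomment the second load in a scratch app to see the reload error for yourself.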

And then finally, before we wrap up, a quick shout-out to everyone who answered C. Technically speaking this is a correct answer. The end results are not wrong and the data model is not bad either:

Synthetic key in Qlik Sense

The two tables are connected on Customer and Month, resulting in a synthetic key, which the official documentation flags as undesirable and a possible cause of poor performance.

However, in this case SalesAmount and Target are meant to be linked to each other through Customer and Month. So instead of creating a composite key ourselves:

Totals:
LOAD
Customer & '-' & Month AS %CustomerMonthKEY

Qlik has done this for us in the form of a synthetic key and table. However, since we are preparing for Qlik certification questions, synthetic keys are ruled out. Rest assured that in any Qlik certification exam you take, a synthetic key will never be the correct answer.
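For reference, the hand-built composite key hinted at above could be completed roughly as follows. This is a sketch under assumptions: it supposes the two-table layout keeps Totals and Target as separate tables loaded from a resident table named Sales, and the DROP TABLE at the end is added here so the source table does not link back into the model.

```qlik
// Sketch only: two tables linked by one explicit key instead of
// sharing both Customer and Month.
Totals:
LOAD
Customer & '-' & Month AS %CustomerMonthKEY
, Customer
, Month
, Sum(Sales) AS SalesAmount
RESIDENT Sales
GROUP BY
Customer
, Month
;

// Target carries only the key and the de-duplicated target value.
Target:
LOAD DISTINCT
Customer & '-' & Month AS %CustomerMonthKEY
, Target
RESIDENT Sales
;

// Assumption: the source table is no longer needed in the data model.
DROP TABLE Sales;
```

Because Customer and Month now live only in Totals, the two tables share a single field and no synthetic key is generated.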

That’s it for this week. See you next Friday?

