19 October 2022 Qlik subset load with SAMPLE and FIRST Share this message Every Friday at Bitmetric we’re posting a new Qlik certification practice question to our LinkedIn company page. Last Friday we asked the following Qlik Data Architect certification practice question about Qlik subset creation in load scripts with the FIRST and SAMPLE statements: While many people knew about loading data subsets using First, it was clear that the Sample prefix is a lot less known. The correct answer is D: Random 10% of total rows are loaded from Customer.QVD Arguably one of the most encountered problems during development is long loading times of (very) large datasets. In order to speed up development it is obviously nice to be able to create a subset in the load of the dataset. There are various ways of doing this, but looking at the comments and answers to the question, First is one which came to mind by many. First is a prefix which limits the load to a specified number of rows. For example First 1000 will limit the load to the first 1000 rows from the table. Making the load and thus the development time quicker. So what is the catch? However, in some cases it could be that loading the first rows doesn’t cut it. For example: The first rows of the dataset are filled with data of questionable quality (empty rows or test data) Data is sorted on alphabet and only name records starting with A are loaded Dates. The data could be sorted on entrydate and you only load the first dates from the dataset Especially that last example makes sense if we are looking into front-end results. How do we compare year to date measures or point in time reporting if we only load a few dates? Take the following example dashboard from a car sales company. In this example we have used First 1000 as a prefix. We can clearly see that there is data in the dashboard, but that most of the charts and KPI objects are not really telling us much. In this way we cannot check whether the measures are performing as expected. Now using an equivalent sized Sample load, in this case using Sample (0.1) as prefix we get the following result: This dashboards speaks more to the imagination, giving us a better look at all the various data. Since we now have data randomly picked from the total dataset we can immediately see that year to date and in time measures are performing as expected. Using the SAMPLE prefix in Qlik The sample prefix has the following syntax: Sample p ( loadstatement | selectstatement ) The p argument is to indicate the probability that a record is read. It is optional to put this amount in parentheses. So looking at the example from the question we can tell that a random 10% is added to the loaded dataset. Things to keep in mind? Sample will look through the entire dataset. First will limit it to a fixed row number and stop. This means that sample could potentially still be slower. Every time you reload again the sample set will change. Keep that in mind. Each row is evaluated separately. So with Sample (0.1) each row will have 10% chance to end up in the resulting table. This means that if you load a dataset consisting of 1000 rows, it will on average load 100 rows into the table, but for each iteration this could deviate a little bit, since it will go through the total dataset and evaluate each row on 10%. Sample must be a value between 0 and 1. Combining First and Sample is possible. Wait what? Combining the FIRST and SAMPLE prefixes Yes it is possible to combine both prefixes. Should you and what does it do? Let’s have a look at the results first to understand what is happening. The following load script is used: FIRST 1000SAMPLE (0.1)LOAD*FROM [LIB:\\Customers.qvd](qvd); After this load we also have loaded Sample (0.2), Sample (0.3) and so on, untill Sample (1). If we then check the resulting data models record and row numbers we see the following results: As we can see the load is limited to 1000 rows, but still random sampled. There is no difference in order of these prefixes. So it doesn’t matter if we use first and then sample or sample and then first. However the logical conclusion of First 1000 Sample (0.1) would be that Qlik would load the first 1000 rows and then samples 10% of these, giving us probably 100 rows as result. But this is not the case. When combining first and sample the order doesn’t matter. Qlik will always evaluate sample first and then limit this load to the first 1000 samples. As we can see in the table above this is as expected. Sample(1) will result in a 100% chance for each row to return. This gradually moves to a bigger sample with how less the chance is a record is added. So sample (0.1) means that each row has a 10% chance of being added to the data model. First 1000 tells the load script to stop at row 1000, which could be a higher record number because of the chance of addition. So should you use this? All depends on the size of the data sets and what you wish to evaluate. But if you want to do a over time analysis on a subset of data, the usage of sample should be sufficient. That’s it for this week. See you next Friday? Take your Qlik skills to the next level! If you enjoy Qlik, you’ll love the Masters Summit for Qlik. 3 days of Qlik deep dives, peer networking, ready to use resources and much more! See you there! Test your Qlik knowledge! Check out the Friday Qlik Test Prep archive for more Qlik questions and answers. Friday Qlik Test Prep Solution How can we help? Feel free to contact us if you have any comments or questions. Call us Mail us 21 September 2023 Qlik Sense Enterprise for Windows – Critical Security Fix #2 A new security issue in Qlik Sense Enterprise for Windows has been identified. This resolves an incomplete fix for the vulnerability that was reported earlier this month. Patches are available and it is recommended to update your Qlik Sense Enterprise for Windows environment at the earliest possibility. New Release Qlik Security 6 September 2023 Battle Ships & Big Data: join us for some fun! Unleash the Power of Data in the Ultimate Battle Ship Game! Join us at the Big Data Expo for a thrilling experience that blends strategy, competition, and analytics. Discover how data influences gameplay and be part of our research! Plus, dive into the world of data and analytics with the experts from Bitmetric. Don’t miss out! See you at booth 46! Bitmetric Event Qlik Team 29 August 2023 Qlik Sense Enterprise for Windows – Critical Security Fixes Two security issues in Qlik Sense Enterprise for Windows have been identified and patches made available. Details can be found in Security Bulletin Critical Security fixes for Qlik Sense Enterprise for Windows (CVE-2023-41266, CVE-2023-41265). New Release Qlik Security
21 September 2023 Qlik Sense Enterprise for Windows – Critical Security Fix #2 A new security issue in Qlik Sense Enterprise for Windows has been identified. This resolves an incomplete fix for the vulnerability that was reported earlier this month. Patches are available and it is recommended to update your Qlik Sense Enterprise for Windows environment at the earliest possibility. New Release Qlik Security
6 September 2023 Battle Ships & Big Data: join us for some fun! Unleash the Power of Data in the Ultimate Battle Ship Game! Join us at the Big Data Expo for a thrilling experience that blends strategy, competition, and analytics. Discover how data influences gameplay and be part of our research! Plus, dive into the world of data and analytics with the experts from Bitmetric. Don’t miss out! See you at booth 46! Bitmetric Event Qlik Team
29 August 2023 Qlik Sense Enterprise for Windows – Critical Security Fixes Two security issues in Qlik Sense Enterprise for Windows have been identified and patches made available. Details can be found in Security Bulletin Critical Security fixes for Qlik Sense Enterprise for Windows (CVE-2023-41266, CVE-2023-41265). New Release Qlik Security