Enabling execution of Cartesian Joins in MicroStrategy

Enabling execution of Cartesian Joins in MicroStrategy

Recent years have seen the emergence of self-service and data visualization vendors such as Tableau, Power BI and Qlik Sense that has forced enterprise BI players such as MicroStrategy to improvise their tools to compete with these vendors. I have been a solutions architect in the field of BI been using MicroStrategy for more than 10 years. From my perspective as an architect, I appreciate the power of the underlying SQL engine in MicroStrategy and do not foresee self-service vendors replacing MicroStrategy as an enterprise BI tool in the coming years. What thrills me most as a Solutions Architect in Microstrategy is implementing workaround solutions to meet complex reporting requirements. In other words, I am building additional logic to change the default behavior of the MicroStrategy SQL engine. One example that comes to mind is the effective use of Cartesian joins of dimension tables in certain advanced reporting scenarios. As per conventional best practices, it is not recommended to use Cartesian joins in MicroStrategy.  However, there are certain business cases where the execution of Cartesian joins is required. In this blog, I have illustrated the use of Cartesian joins of dimension tables using two scenarios.

Scenario 1- Accurate analysis across date dimension without the need of fictitious/dummy data

In this scenario, I used a public data set of construction spending data across educational and religious sectors for the year 2016.  I massaged this data set further by deleting construction spending data for the religious sector for the month June 2016.  Shown below is a simple bar chart visualization displaying this scenario.

Construction Spending - Educational

Construction Spending - Religious.PNG

As seen in this graph, since there is no data in religious sector for May 2016, the bars are more spread apart in comparison with the educational sector data. Moreover, reference line calculations such mean, median ignore the fact that there was no construction spending for the religious sector in the month May 2016, thereby misleading the audience. This issue becomes evident if the bar chart is converted in to a line graph.

Construction Spending - Religious line graph 1

Using this graph, we may wrongly conclude that construction spending was consistently high during the middle of the year and slows down during the start and end of year.

This issue can be resolved by entering dummy data in the fact table to ensure all months appear across any other dimension during time series analysis. Nevertheless, not all organizations provide flexibility to manually enter fictitious data in the data warehouse to satisfy certain reporting scenarios. In such cases we may have to implement workaround solutions within the reporting layer to handle these complex issues.

In order to resolve this issue using MicroStrategy, I enabled the VLDB property- ‘Cartesian Join Warning’ to ‘Execute’. I also created an additional dummy metric that serves the same purpose as introduction of dummy data within the data warehouse.  The sql statement for this metric is shown below.

CREATE TEMPORARY TABLE ZZMD01 as

select       a11.month_key  month_key,

a12.sector_key  cat_key,

max(a11.month_key)  WJXBFS1

from         d_month  a11

cross join d_sector  a12

where       a11.month_key between 201601 and 201612

group by  a11.month_key,

a12.sector_key

 

This sql pass is basically a Cartesian join between the dimensions- Month and Sector with an additional filter restricting data to 2016. This sql pass thereby ensures there is a row for all months of 2016 for all sectors irrespective of the actual construction spending data.

The revised bar chart and line chart with the introduction of this metric is shown below.

Construction Spending - Religious 1

 

Construction Spending - Religiious 1- line graphAs seen in the graphs, the revised average line considers the month of May 2016, dropping the average from 311 to 285.

Scenario 2- Month End Reporting

For this scenario, let us consider a hypothetical example of ‘Sales Commissions data’.  A sales man ‘TOM’ belongs to the ‘CENTRAL’ region. On Jan 15, 2017, he moved from ‘CENTRAL’ region to ‘SOUTH EAST’ region. The dimension data corresponding to ‘Tom’ is shown below.

Sales Rep Key Sales Rep ID Sales Rep Region Effective Start Date Effective End Date
100 1 TOM CENTRAL 1/1/2016 1/15/2017
200 1 TOM SOUTH EAST 1/16/2017 12/31/9999

 

As seen above, Type 2 strategy has been used to maintain history for the ‘Sales Rep’ dimension.

The Commissions fact data for this sales rep is shown below.

Sales Rep Key Month Commission Amt
100 Jan-17 2000
200 Jan-17 1000

 

A simple report to show distribution of commissions across regions has been shown below.

Let us make this scenario a bit more complicated. What if the requirement for the month-end report is to allocate all commissions for each month to the region where the sales rep was assigned at the end of the month? To satisfy this requirement, I had to implement a workaround solution in which execution of Cartesian Joins was enabled. In this solution, the month-end region had to be created as a metric.  An alias table for the Type 2 dimension was also created. The sql is shown below.

CREATE TEMPORARY TABLE ZZMD01 as

select       a11.sls_rep_id  sls_rep_id,

a12.month_key  month_key,

max(a11.sls_rep_nm)  WJXBFS1,

max(a11.sls_rep_region)  WJXBFS2

from         d_salesrep                 a11

cross join d_month a12

where       (((a11.sls_rep_key)

in              (select      r12.sls_rep_key

from         d_month  r11

cross join d_salesrep                 r12

where       r11.month_end_date between r12.effective_start_dttm and r12.effective_end_dttm

group by  r12.sls_rep_key))

and a12.month_key = 201701)

group by  sls_rep_id,

a12.month_key

 

This sql pass is basically a Cartesian join between the dimensions- Month and Sales Rep identifying all sales rep versions that are active for the date- ‘201701’.

The revised report results is shown below.

As seen in both the scenarios, execution of Cartesian joins can prove to be very useful to meet complex reporting scenarios. What was used in both these scenarios was essentially a non-equi join with additional where clauses.  It is recommended though that enabling execution of Cartesian joins be enabled only for certain complex reports. Enabling this setting at an overall project level can result in incorrect results and performance degradation especially when the Cartesian joins are implemented without any filters.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s