Tableau Blog Posts Series – Tips,Tricks,Best Practices
Blog three – Data Blending
- When you add the first dimension/measure to the view, the data source that the first dimension/measure belongs to become the primary data source.
- All dimensions should come from the primary data source, all secondary measures are aggregated.
- When to use data blending instead of joining:
- 1. the data source contains too many records for a join to be practical.
- 2. You want to display a summary and details at the same time. (eg, duplicate data source and do a self-blending)
- http://kb.tableausoftware.com/articles/knowledgebase/join-vs-relationship
- http://kb.tableausoftware.com/articles/knowledgebase/summary-and-detail-60
- Data blending is sheet specific, only the data in the view will be linked.
- Primary data source is left join secondary data source.
- For each dimension in primary, there can be only one attribute for each value in that dimension, can create roll up in secondary but on slice.
- Blend a secondary data source that does not have common key with the primary but have common key with another secondary data source: Create a help sheet to group the common key (with the primary) in the secondary data source into the other secondary that does not have common key with the primary.
- no-common key secondary should be the primary in the help sheet
- Drag the common keys of these two secondary and the key you want to group into the no-common key secondary into the view
- Click the common key (with the primary) and choose to create the primary group.
- Then blend data on the primary key group you created.
- http://kb.tableausoftware.com/articles/knowledgebase/how-create-primary-groups-secondary-source?lang=es-es
- There are two drawbacks of this approach:
- 1. The primary group is static instead of dynamic. If there are changes to any fields in the group, you need to recreate the group.
- 2. The performance of blending on group can be very bad since Tableau is adding a case statement into the query to the data source ( the group in Tableau is essential a case statement in the query)
- Use IF FIRST()==0 part to ensure that there is only one mark per partition in order to save resources http://microstrategyexamprep.com/2014/04/13/an-introduction-to-data-blending-part-4-data-blending-design-principles/
- In Data blending, since the join key in a secondary result set may match a subset of the blended result set primary key, portions of the secondary result set may be duplicated across repeated values in the mediated result set. This does not pose risk of double-counting measure values, because all aggregation is performed prior to the join. It is the benefit of using data blending compared to join since join may result in double-counting measure values.
- Blend on Date also need to define relationship for Day,Week,Month, Quarter,Year and so on, if you need to use Day/Week/Month/Quarter/Year on the view instead of the exact date.
- Understand the business question and then choose the right blending keys. Different blending keys may answer different questions. Make sure to use the right lending to answer the right question.
- Domain padding assumption: the complete domain of each individual join key is contained in at least one of the data sources. Why? Because blending does not do outer joins.
- Remove duplicated data after joining tables