精品丨DAX性能优化问题
BOSS:那个,白茶啊,这个报表刷新有点慢啊,你看,每次我点筛选或者刷新都会让我等很久。
白茶:(认真脸)BOSS,那您再等会就好!
…
BOSS:白茶!!两小时了!!还没出来!!
白茶:(思考)老板,这个有点难啊,这个问题技术要求比较高。
BOSS:加钱!!!!
白茶:好嘞!
一张好的报表是如何界定的?DAX计算无误、前端展现明了、业务思路清晰、报表响应速度,白茶觉得这些因素就可以界定一张好的报表。
本期我们来聊一聊PowerBI中DAX函数性能优化的问题。
毕竟一张可视化报表需要15分钟刷新才能呈现出来,这对用户来说太不友好了。
先来看看本期的示例文件:
一张产品维度表,一张销售明细表。
需求是什么?
这张是销售明细表中的分店维度信息,为了便于小伙伴理解,白茶单独整理出来。
这张表是需求的计算逻辑图。什么意思呢?就是当Key小于15时,计算每个Key对应的分店,当Key大于14时,根据计算逻辑对不同的分店进行汇总计算。
编写基础的DAX计算代码:
SalesAmount =
SUMX ( 'Fact_SalesDetail', [Quantity] * RELATED ( Dim_Product[SalesAmount] ) )
在不考虑性能的情况下,DAX计算逻辑如下:
SalesAmountByDisplay =
SUMX (
'Dim_DisplayDepartment',
SWITCH (
TRUE (),
SELECTEDVALUE ( Dim_DisplayDepartment[DepartmentKey] ) = 1,
CALCULATE (
[SalesAmount],
FILTER ( 'Fact_SalesDetail', 'Fact_SalesDetail'[DepartmentKey] = 1 )
),
SELECTEDVALUE ( Dim_DisplayDepartment[DepartmentKey] ) = 2,
CALCULATE (
[SalesAmount],
FILTER ( 'Fact_SalesDetail', 'Fact_SalesDetail'[DepartmentKey] = 2 )
),
SELECTEDVALUE ( Dim_DisplayDepartment[DepartmentKey] ) = 3,
CALCULATE (
[SalesAmount],
FILTER ( 'Fact_SalesDetail', 'Fact_SalesDetail'[DepartmentKey] = 3 )
),
SELECTEDVALUE ( Dim_DisplayDepartment[DepartmentKey] ) = 4,
CALCULATE (
[SalesAmount],
FILTER ( 'Fact_SalesDetail', 'Fact_SalesDetail'[DepartmentKey] = 4 )
),
SELECTEDVALUE ( Dim_DisplayDepartment[DepartmentKey] ) = 5,
CALCULATE (
[SalesAmount],
FILTER ( 'Fact_SalesDetail', 'Fact_SalesDetail'[DepartmentKey] = 5 )
),
SELECTEDVALUE ( Dim_DisplayDepartment[DepartmentKey] ) = 6,
CALCULATE (
[SalesAmount],
FILTER ( 'Fact_SalesDetail', 'Fact_SalesDetail'[DepartmentKey] = 6 )
),
SELECTEDVALUE ( Dim_DisplayDepartment[DepartmentKey] ) = 7,
CALCULATE (
[SalesAmount],
FILTER ( 'Fact_SalesDetail', 'Fact_SalesDetail'[DepartmentKey] = 7 )
),
SELECTEDVALUE ( Dim_DisplayDepartment[DepartmentKey] ) = 8,
CALCULATE (
[SalesAmount],
FILTER ( 'Fact_SalesDetail', 'Fact_SalesDetail'[DepartmentKey] = 8 )
),
SELECTEDVALUE ( Dim_DisplayDepartment[DepartmentKey] ) = 9,
CALCULATE (
[SalesAmount],
FILTER ( 'Fact_SalesDetail', 'Fact_SalesDetail'[DepartmentKey] = 9 )
),
SELECTEDVALUE ( Dim_DisplayDepartment[DepartmentKey] ) = 10,
CALCULATE (
[SalesAmount],
FILTER ( 'Fact_SalesDetail', 'Fact_SalesDetail'[DepartmentKey] = 10 )
),
SELECTEDVALUE ( Dim_DisplayDepartment[DepartmentKey] ) = 11,
CALCULATE (
[SalesAmount],
FILTER ( 'Fact_SalesDetail', 'Fact_SalesDetail'[DepartmentKey] = 11 )
),
SELECTEDVALUE ( Dim_DisplayDepartment[DepartmentKey] ) = 12,
CALCULATE (
[SalesAmount],
FILTER ( 'Fact_SalesDetail', 'Fact_SalesDetail'[DepartmentKey] = 12 )
),
SELECTEDVALUE ( Dim_DisplayDepartment[DepartmentKey] ) = 13,
CALCULATE (
[SalesAmount],
FILTER ( 'Fact_SalesDetail', 'Fact_SalesDetail'[DepartmentKey] = 13 )
),
SELECTEDVALUE ( Dim_DisplayDepartment[DepartmentKey] ) = 14,
CALCULATE (
[SalesAmount],
FILTER ( 'Fact_SalesDetail', 'Fact_SalesDetail'[DepartmentKey] = 14 )
),
SELECTEDVALUE ( Dim_DisplayDepartment[DepartmentKey] ) = 15,
CALCULATE (
[SalesAmount],
FILTER ( 'Fact_SalesDetail', 'Fact_SalesDetail'[DepartmentKey] = 13 )
)
+ CALCULATE (
[SalesAmount],
FILTER ( 'Fact_SalesDetail', 'Fact_SalesDetail'[DepartmentKey] = 14 )
),
SELECTEDVALUE ( Dim_DisplayDepartment[DepartmentKey] ) = 16,
CALCULATE (
[SalesAmount],
FILTER ( 'Fact_SalesDetail', 'Fact_SalesDetail'[DepartmentKey] = 1 )
)
+ CALCULATE (
[SalesAmount],
FILTER ( 'Fact_SalesDetail', 'Fact_SalesDetail'[DepartmentKey] = 7 )
)
+ CALCULATE (
[SalesAmount],
FILTER ( 'Fact_SalesDetail', 'Fact_SalesDetail'[DepartmentKey] = 9 )
),
SELECTEDVALUE ( Dim_DisplayDepartment[DepartmentKey] ) = 17,
CALCULATE (
[SalesAmount],
FILTER ( 'Fact_SalesDetail', 'Fact_SalesDetail'[DepartmentKey] = 1 )
)
+ CALCULATE (
[SalesAmount],
FILTER ( 'Fact_SalesDetail', 'Fact_SalesDetail'[DepartmentKey] = 2 )
)
+ CALCULATE (
[SalesAmount],
FILTER ( 'Fact_SalesDetail', 'Fact_SalesDetail'[DepartmentKey] = 3 )
)
+ CALCULATE (
[SalesAmount],
FILTER ( 'Fact_SalesDetail', 'Fact_SalesDetail'[DepartmentKey] = 11 )
),
SELECTEDVALUE ( Dim_DisplayDepartment[DepartmentKey] ) = 18,
CALCULATE (
[SalesAmount],
FILTER ( 'Fact_SalesDetail', 'Fact_SalesDetail'[DepartmentKey] = 9 )
)
+ CALCULATE (
[SalesAmount],
FILTER ( 'Fact_SalesDetail', 'Fact_SalesDetail'[DepartmentKey] = 11 )
)
+ CALCULATE (
[SalesAmount],
FILTER ( 'Fact_SalesDetail', 'Fact_SalesDetail'[DepartmentKey] = 12 )
)
- CALCULATE (
[SalesAmount],
FILTER ( 'Fact_SalesDetail', 'Fact_SalesDetail'[DepartmentKey] = 2 )
),
SELECTEDVALUE ( Dim_DisplayDepartment[DepartmentKey] ) = 19,
CALCULATE (
[SalesAmount],
FILTER ( 'Fact_SalesDetail', 'Fact_SalesDetail'[DepartmentKey] = 4 )
)
+ CALCULATE (
[SalesAmount],
FILTER ( 'Fact_SalesDetail', 'Fact_SalesDetail'[DepartmentKey] = 7 )
),
SELECTEDVALUE ( Dim_DisplayDepartment[DepartmentKey] ) = 20, CALCULATE ( [SalesAmount], ALLSELECTED ( Fact_SalesDetail ) )
)
)
相信不用白茶多说,小伙伴也能看出来代码的问题,太长了。
这段代码功能基本上是实现了,问题点有哪些呢?
1.SELECTEDVALUE复用度较高,可以使用变量代替
2.多个条件汇总迭代次数较多,可以使用提供List
这也是小伙伴常见的问题,如果DAX的构建可以绕开这两个问题,那么性能会有很大的提升。
简化版写法:
SalesAmountByDisplay2 =
VAR CurrentDepartmentKey =
SELECTEDVALUE ( Dim_DisplayDepartment[DepartmentKey] )
VAR CurrentDepartment =
TREATAS (
VALUES ( Dim_DisplayDepartment[DepartmentKey] ),
Fact_SalesDetail[DepartmentKey]
)
VAR Results =
SWITCH (
TRUE (),
CurrentDepartmentKey = 15,
CALCULATE (
[SalesAmount],
FILTER ( 'Fact_SalesDetail', 'Fact_SalesDetail'[DepartmentKey] IN { 13, 14 } )
),
CurrentDepartmentKey = 16,
CALCULATE (
[SalesAmount],
FILTER ( 'Fact_SalesDetail', 'Fact_SalesDetail'[DepartmentKey] IN { 1, 7, 9 } )
),
CurrentDepartmentKey = 17,
CALCULATE (
[SalesAmount],
FILTER (
'Fact_SalesDetail',
'Fact_SalesDetail'[DepartmentKey] IN { 1, 2, 3, 11 }
)
),
CurrentDepartmentKey = 18,
CALCULATE (
[SalesAmount],
FILTER (
'Fact_SalesDetail',
'Fact_SalesDetail'[DepartmentKey] IN { 9, 11, 12 }
)
)
- CALCULATE (
[SalesAmount],
FILTER ( 'Fact_SalesDetail', 'Fact_SalesDetail'[DepartmentKey] = 2 )
),
CurrentDepartmentKey = 19,
CALCULATE (
[SalesAmount],
FILTER ( 'Fact_SalesDetail', 'Fact_SalesDetail'[DepartmentKey] IN { 4, 7 } )
),
CurrentDepartmentKey = 20, CALCULATE ( [SalesAmount], ALLSELECTED ( Fact_SalesDetail ) ),
CALCULATE ( [SalesAmount], CurrentDepartment )
)
RETURN
Results
相较于之前的写法,这个写法通过VAR定义变量,和使用TREATAS来减少代码计算逻辑的书写。
通过提供List来减少迭代遍历的次数。
那么有没有继续可以优化的空间?有的。
优化写法:
SalesAmountByDisplay3 =
VAR CurrentDetail =
ADDCOLUMNS (
DISTINCT ( 'Dim_DisplayDepartment' ),
"@CurrentValue",
VAR CurrentDepartmentKey = 'Dim_DisplayDepartment'[DepartmentKey]
RETURN
SWITCH (
TRUE (),
CurrentDepartmentKey = 15,
CALCULATE (
[SalesAmount],
FILTER ( 'Fact_SalesDetail', 'Fact_SalesDetail'[DepartmentKey] IN { 13, 14 } )
),
CurrentDepartmentKey = 16,
CALCULATE (
[SalesAmount],
FILTER ( 'Fact_SalesDetail', 'Fact_SalesDetail'[DepartmentKey] IN { 1, 7, 9 } )
),
CurrentDepartmentKey = 17,
CALCULATE (
[SalesAmount],
FILTER (
'Fact_SalesDetail',
'Fact_SalesDetail'[DepartmentKey] IN { 1, 2, 3, 11 }
)
),
CurrentDepartmentKey = 18,
CALCULATE (
[SalesAmount],
FILTER (
'Fact_SalesDetail',
'Fact_SalesDetail'[DepartmentKey] IN { 9, 11, 12 }
)
)
- CALCULATE (
[SalesAmount],
FILTER ( 'Fact_SalesDetail', 'Fact_SalesDetail'[DepartmentKey] = 2 )
),
CurrentDepartmentKey = 19,
CALCULATE (
[SalesAmount],
FILTER ( 'Fact_SalesDetail', 'Fact_SalesDetail'[DepartmentKey] IN { 4, 7 } )
),
CurrentDepartmentKey = 20, CALCULATE ( [SalesAmount], ALLSELECTED ( Fact_SalesDetail ) ),
CALCULATE (
[SalesAmount],
'Fact_SalesDetail'[DepartmentKey] = CurrentDepartmentKey
)
)
)
RETURN
SUMX ( CurrentDetail, [@CurrentValue] )
这种写法,相较于上一种书写量多一些,通过定义虚拟表来减少迭代遍历的次数。从理论上来说,因为定义了虚拟表,无论源代码需要对事实表迭代多少次,这个思路迭代的永远都是虚拟表,优化度很高。
我们来对比一下:
DAX Studio测试:
三种写法的差距很明显。其实这里有一些争议的,简化写法是通过减少资源占用来实现优化,优化写法是通过减少迭代遍历实现优化。
从测试结果上来看,是简化写法优化度较高,但是在实际应用中,测试发现优化写法的方式响应更迅速。
你以为到这里就结束了么?
其实还有第四种优化的思路,只不过这个思路比较难。
SalesAmountByDisplay4 =
SUMX (
VALUES ( 'Dim_DisplayDepartment'[DepartmentKey] ),
VAR CurDpmKey = 'Dim_DisplayDepartment'[DepartmentKey]
VAR TempTable =
FILTER (
ALL ( 'Dim_DisplayDepartment'[DepartmentKey] ),
'Dim_DisplayDepartment'[DepartmentKey] <= 14
)
VAR AllDetail =
ADDCOLUMNS (
TempTable,
"SalesAmount",
VAR CurrentDepartment = 'Dim_DisplayDepartment'[DepartmentKey]
RETURN
CALCULATE (
[SalesAmount],
ALL ( 'Dim_DisplayDepartment' ),
'Fact_SalesDetail'[DepartmentKey] = CurrentDepartment
)
)
VAR FilterContent =
CALCULATE (
MAX ( 'Dim_ComputationalLogic'[FilterContent] ),
ALL ( Dim_DisplayDepartment ),
'Dim_ComputationalLogic'[DepartmentKey] = CurDpmKey
)
VAR Length =
LEN ( FilterContent )
VAR FilterTable =
SELECTCOLUMNS (
GENERATESERIES ( 1, Length, 4 ),
"DepmKey", MID ( FilterContent, [Value], 3 ) * 1
)
VAR Result =
SUMX (
FilterTable,
VAR DpmKey = [DepmKey]
VAR SalesValue =
SUMMARIZE (
FILTER ( AllDetail, 'Dim_DisplayDepartment'[DepartmentKey] = ABS ( DpmKey ) ),
[SalesAmount]
)
RETURN
IF ( DpmKey >= 0, SalesValue, - SalesValue )
)
RETURN
Result
)
--作者:夕枫
这个优化的思路,是**@夕枫**大佬提出来的。通过定义计算表,减少代码书写量,使用ALL减少上下文转换的消耗,减少查询次数,命中缓存。
DAX Studio测试:
总结一下:
1.可以通过变量和定义表来减少代码书写量
2.可以通过减少资源调用优化
3.可以通过虚拟表减少迭代遍历
4.可以通过命中缓存进行优化
比较常用的是前三种,第四种难度系数较高。
往期推荐:
小伙伴们❤GET了么?
(白茶:别问我第四种,我不会TAT)
这里是白茶,一个PowerBI的初学者。
数据分析进阶之路,带你深入了解可视化技巧。