Anomaly Detection with adobeanalyticsr
The hope was always that Anomaly Detection would let analysts separate “true signals” from “noise,” but that has been difficult to do in the Analysis Workspace UI. It has definitely helped ‘identify potential factors that contributed to those signals or anomalies’, but it has fallen short of providing a final answer. That’s because anomalies are complex and require context to decide whether the event can be repeated or should simply be explained.
The promise of anomaly detection has always been the same, and Adobe’s documentation expresses it well:
…it lets you identify which statistical fluctuations matter and which don’t. You can then identify the root cause of a true anomaly. Furthermore, you can get reliable metric (KPI) forecasts.
Unfortunately, in practice this kind of statistical analysis can consume a lot of time and effort. That said, Adobe’s anomaly detection offers a powerful opportunity when used correctly.
The current version of Analysis Workspace’s anomaly detection algorithm includes:
- Support for hourly, weekly, and monthly granularity, in addition to the existing daily granularity.
- Awareness of seasonality (such as “Black Friday”) and holidays.
So what does this look like in adobeanalyticsr?
The new adobeanalyticsr function for anomaly detection, `aw_anomaly_report()`, is designed to facilitate the principle of “speed to analysis” while fostering better reporting opportunities.
The default function call will return a basic data frame of 7 different columns.
## [1] "day" "metric" "data"
## [4] "dataExpected" "dataUpperBound" "dataLowerBound"
## [7] "dataAnomalyDetected"
If you request more than one metric, it will return a row for each metric at the granularity level you requested in the function.
For instance, the following function will return this:
aw_anomaly_report(date_range = c("2020-12-01", "2021-03-01"),
metrics = c('visits','visitors'))
day | metric | data | dataExpected | dataUpperBound | dataLowerBound | dataAnomalyDetected |
---|---|---|---|---|---|---|
2020-12-01 | visits | 347 | 214.72423 | 319.0386 | 110.4099176 | TRUE |
2020-12-01 | visitors | 312 | 195.45319 | 283.5220 | 107.3843914 | TRUE |
2020-12-02 | visits | 432 | 194.90034 | 299.2147 | 90.5860230 | TRUE |
2020-12-02 | visitors | 384 | 177.27466 | 265.3435 | 89.2058535 | TRUE |
2020-12-03 | visits | 356 | 262.08380 | 385.2547 | 138.9129307 | FALSE |
2020-12-03 | visitors | 324 | 242.80209 | 355.8016 | 129.8026120 | FALSE |
2020-12-04 | visits | 252 | 160.20426 | 264.5186 | 55.8899478 | FALSE |
2020-12-04 | visitors | 223 | 153.12744 | 241.1962 | 65.0586389 | FALSE |
2020-12-05 | visits | 85 | 89.85654 | 194.1709 | 0.0000000 | FALSE |
2020-12-05 | visitors | 76 | 88.35717 | 176.4260 | 0.2883632 | FALSE |
Notice that each row already includes the observed value, the expected value, and the upper and lower bounds. It also records whether the value crossed one of those bounds and was therefore flagged as an anomaly.
For those looking to get to the ‘raw’ data, this should be just what you need to get going. But often all you want to do is visualize the data or list the dates on which an anomaly was detected. This was my main use case, so I created an argument that helps you view the results quickly.
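As a quick sanity check on how the flag relates to the bounds, here is a minimal sketch that rebuilds the sample rows from the table above in base R, with no API call. It assumes a value is flagged whenever it falls outside the range between `dataLowerBound` and `dataUpperBound`; that rule is my reading of the output, not a documented guarantee. The `report`, `outsideBounds`, and `anomalies` names are just for illustration.

```r
# Sample rows copied from the table above (first five days, two metrics).
report <- data.frame(
  day = as.Date(rep(c("2020-12-01", "2020-12-02", "2020-12-03",
                      "2020-12-04", "2020-12-05"), each = 2)),
  metric = rep(c("visits", "visitors"), times = 5),
  data = c(347, 312, 432, 384, 356, 324, 252, 223, 85, 76),
  dataUpperBound = c(319.0386, 283.5220, 299.2147, 265.3435, 385.2547,
                     355.8016, 264.5186, 241.1962, 194.1709, 176.4260),
  dataLowerBound = c(110.4099176, 107.3843914, 90.5860230, 89.2058535,
                     138.9129307, 129.8026120, 55.8899478, 65.0586389,
                     0.0000000, 0.2883632),
  dataAnomalyDetected = c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE,
                          FALSE, FALSE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Assumption: a point counts as an anomaly when it lies outside [lower, upper].
report$outsideBounds <- report$data > report$dataUpperBound |
  report$data < report$dataLowerBound

# Keep only the flagged rows, e.g. to list the dates worth investigating.
anomalies <- report[report$dataAnomalyDetected, c("day", "metric", "data")]
print(anomalies)
```

For these sample rows the recomputed `outsideBounds` column agrees with `dataAnomalyDetected`, so filtering on either one gives the same dates.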
Adding the argument `quickView = TRUE` to the function call will return a list of 3 items. If more than one metric was requested, the results are also split by metric.
The following example shows the same function call as above, but with the `quickView = TRUE` argument. The list includes:
- Data = The raw data just like in the default function but split up by metric if you have requested more than one.
- Anoms = The filtered view of the data showing only those rows (by metric) where ‘dataAnomalyDetected = TRUE’.
- Viz = A line graph produced using ggplot which includes the error bar, points on the timeline where an anomaly was detected, and the data shown as a line spanning the period requested in the date range.
df <- aw_anomaly_report(date_range = c("2020-12-01", "2021-03-01"),
metrics = c('visits','visitors'),
quickView = TRUE)
df[[1]]$data
## # A tibble: 90 x 7
## day metric data dataExpected dataUpperBound dataLowerBound
## <date> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 2020-12-01 visits 347 215. 319. 110.
## 2 2020-12-02 visits 432 195. 299. 90.6
## 3 2020-12-03 visits 356 262. 385. 139.
## 4 2020-12-04 visits 252 160. 265. 55.9
## 5 2020-12-05 visits 85 89.9 194. 0
## 6 2020-12-06 visits 99 91.1 195. 0
## 7 2020-12-07 visits 267 230. 341. 119.
## 8 2020-12-08 visits 314 303. 448. 157.
## 9 2020-12-09 visits 229 257. 380. 135.
## 10 2020-12-10 visits 255 330. 485. 175.
## # … with 80 more rows, and 1 more variable: dataAnomalyDetected <lgl>
df[[1]]$anoms
## # A tibble: 4 x 7
## day metric data dataExpected dataUpperBound dataLowerBound
## <date> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 2020-12-01 visits 347 215. 319. 110.
## 2 2020-12-02 visits 432 195. 299. 90.6
## 3 2020-12-24 visits 67 260. 377. 143.
## 4 2021-01-05 visits 347 213. 320. 106.
## # … with 1 more variable: dataAnomalyDetected <lgl>
df[[2]]$data
## # A tibble: 90 x 7
## day metric data dataExpected dataUpperBound dataLowerBound
## <date> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 2020-12-01 visitors 312 195. 284. 107.
## 2 2020-12-02 visitors 384 177. 265. 89.2
## 3 2020-12-03 visitors 324 243. 356. 130.
## 4 2020-12-04 visitors 223 153. 241. 65.1
## 5 2020-12-05 visitors 76 88.4 176. 0.288
## 6 2020-12-06 visitors 96 88.0 176. 0
## 7 2020-12-07 visitors 237 218. 322. 114.
## 8 2020-12-08 visitors 274 279. 411. 147.
## 9 2020-12-09 visitors 198 222. 326. 118.
## 10 2020-12-10 visitors 238 275. 402. 148.
## # … with 80 more rows, and 1 more variable: dataAnomalyDetected <lgl>
df[[2]]$anoms
## # A tibble: 6 x 7
## day metric data dataExpected dataUpperBound dataLowerBound
## <date> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 2020-12-01 visitors 312 195. 284. 107.
## 2 2020-12-02 visitors 384 177. 265. 89.2
## 3 2020-12-24 visitors 66 227. 326. 128.
## 4 2021-01-05 visitors 317 180. 268. 91.5
## 5 2021-01-06 visitors 282 180. 269. 91.4
## 6 2021-01-07 visitors 434 252. 379. 126.
## # … with 1 more variable: dataAnomalyDetected <lgl>
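Because the quickView result is split by metric, it can be handy to stack the anomaly tables back together. The sketch below does not call the API. Instead it builds a hypothetical stand-in list (`quick`) holding only an `anoms` element per metric, filled with the anomaly dates printed above, and then combines them in base R. The names `quick`, `all_anoms`, `counts`, and `shared_days` are my own and not part of the package.

```r
# Hypothetical stand-in for the quickView result: one element per metric,
# each with an `anoms` data frame (the data and plot elements are left out).
quick <- list(
  list(anoms = data.frame(
    day = as.Date(c("2020-12-01", "2020-12-02", "2020-12-24", "2021-01-05")),
    metric = "visits",
    data = c(347, 432, 67, 347),
    stringsAsFactors = FALSE)),
  list(anoms = data.frame(
    day = as.Date(c("2020-12-01", "2020-12-02", "2020-12-24",
                    "2021-01-05", "2021-01-06", "2021-01-07")),
    metric = "visitors",
    data = c(312, 384, 66, 317, 282, 434),
    stringsAsFactors = FALSE))
)

# Stack the per-metric anomaly tables into one data frame, ordered by date.
all_anoms <- do.call(rbind, lapply(quick, function(x) x$anoms))
all_anoms <- all_anoms[order(all_anoms$day, all_anoms$metric), ]

# Number of anomalies per metric.
counts <- table(all_anoms$metric)

# Days on which every requested metric was flagged.
day_counts <- table(format(all_anoms$day))
shared_days <- as.Date(names(day_counts)[day_counts == length(quick)])

print(counts)
print(shared_days)
```

With a real result you would pass the object returned by `aw_anomaly_report(..., quickView = TRUE)` in place of `quick`, assuming its elements are named as in the printed output above.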
For more on Anomaly Detection in Analysis Workspace check out this video.
I’m always looking for new ways to serve up the anomaly detection data. If you have an idea, make sure to submit an issue for me to work on with you.