Anomaly Detection adobeanalyticsr

March 2, 2021

The hope was always that Anomaly Detection would allow analysts to separate “true signals” from “noise”, but that has been pretty difficult in the Analysis Workspace UI. It has definitely helped ‘identify potential factors that contributed to those signals or anomalies’, but it has fallen short of actually providing the final solution. That’s because anomalies are complex and require context to determine whether the event can be repeated or should just be explained.

The hope of anomaly detection has always been the same. Adobe’s documentation expresses it very well:

…it lets you identify which statistical fluctuations matter and which don’t. You can then identify the root cause of a true anomaly. Furthermore, you can get reliable metric (KPI) forecasts.

Unfortunately, the reality is that using this statistical analysis tool can prove to be a lot of wasted time and effort. That said, Adobe’s anomaly detection does provide a very powerful opportunity if used correctly.

The current application of Analysis Workspace’s anomaly detection algorithm includes:

  1. Support for hourly, weekly, and monthly granularity, in addition to the existing daily granularity.
  2. Awareness of seasonality (such as “Black Friday”) and holidays.

So what does this look like in adobeanalyticsr?

The new adobeanalyticsr function for anomaly detection, aw_anomaly_report(), is designed to facilitate the principle of “speed to analysis” while fostering better reporting opportunities.

The default function call will return a basic data frame with 7 columns:

## [1] "day"                 "metric"              "data"               
## [4] "dataExpected"        "dataUpperBound"      "dataLowerBound"     
## [7] "dataAnomalyDetected"

If you request more than one metric, it will return a row for each metric at the granularity level you requested in the function.

For instance, the following function will return this:

aw_anomaly_report(date_range = c("2020-12-01", "2021-03-01"),
                  metrics = c('visits','visitors'))
day        metric   data dataExpected dataUpperBound dataLowerBound dataAnomalyDetected
2020-12-01 visits    347    214.72423       319.0386    110.4099176 TRUE
2020-12-01 visitors  312    195.45319       283.5220    107.3843914 TRUE
2020-12-02 visits    432    194.90034       299.2147     90.5860230 TRUE
2020-12-02 visitors  384    177.27466       265.3435     89.2058535 TRUE
2020-12-03 visits    356    262.08380       385.2547    138.9129307 FALSE
2020-12-03 visitors  324    242.80209       355.8016    129.8026120 FALSE
2020-12-04 visits    252    160.20426       264.5186     55.8899478 FALSE
2020-12-04 visitors  223    153.12744       241.1962     65.0586389 FALSE
2020-12-05 visits     85     89.85654       194.1709      0.0000000 FALSE
2020-12-05 visitors   76     88.35717       176.4260      0.2883632 FALSE

Notice that each row already includes the observed data, the expected value, and the upper and lower bounds calculated for you. It also includes whether or not the data crossed one of those bounds and was determined to be an anomaly.
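To make the relationship between the bounds and the flag concrete, here is a minimal sketch that rebuilds the rows shown above as a plain data frame and recomputes the flag by hand. It assumes that a row is flagged whenever the observed value falls outside the band; that rule is my reading of the output, not something taken from Adobe's algorithm. The object names anomaly_df and outside_band are made up for this illustration.

```r
# Rows copied from the example output above (first five days, two metrics).
anomaly_df <- data.frame(
  day = as.Date(rep(c("2020-12-01", "2020-12-02", "2020-12-03",
                      "2020-12-04", "2020-12-05"), each = 2)),
  metric = rep(c("visits", "visitors"), times = 5),
  data = c(347, 312, 432, 384, 356, 324, 252, 223, 85, 76),
  dataExpected = c(214.72423, 195.45319, 194.90034, 177.27466, 262.08380,
                   242.80209, 160.20426, 153.12744, 89.85654, 88.35717),
  dataUpperBound = c(319.0386, 283.5220, 299.2147, 265.3435, 385.2547,
                     355.8016, 264.5186, 241.1962, 194.1709, 176.4260),
  dataLowerBound = c(110.4099176, 107.3843914, 90.5860230, 89.2058535,
                     138.9129307, 129.8026120, 55.8899478, 65.0586389,
                     0.0000000, 0.2883632),
  dataAnomalyDetected = c(TRUE, TRUE, TRUE, TRUE, FALSE,
                          FALSE, FALSE, FALSE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Assumption: a value is anomalous when it sits above the upper bound
# or below the lower bound.
outside_band <- with(anomaly_df,
                     data > dataUpperBound | data < dataLowerBound)

# Compare the recomputed flag with the flag returned in the report.
identical(outside_band, anomaly_df$dataAnomalyDetected)
```

For these ten rows the recomputed flag agrees with dataAnomalyDetected, which is a handy sanity check before building anything on top of the report.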

For those looking to get to the ‘raw’ data, this should be just what you need to get going. But there are many times when all you want to do is visualize the data or just show the dates on which an anomaly was detected. This was my main use case, so I created an argument that will help you quickly view the results.

Adding the argument quickView = TRUE to the function call will return a list of 3 items. It will also split these results by metric if more than one was requested.

The following example shows the same function call as above but it includes the quickView = TRUE argument. The list includes:

  1. Data = The raw data, just like in the default function call, but split up by metric if you have requested more than one.
  2. Anoms = The filtered view of the data showing only those rows (by metric) where ‘dataAnomalyDetected = TRUE’.
  3. Viz = A line graph produced using ggplot which includes the error bar, points on the timeline where an anomaly was detected, and finally the data shown as a line spanning the period requested in the date range.

df <- aw_anomaly_report(date_range = c("2020-12-01", "2021-03-01"),
                  metrics = c('visits','visitors'),
                  quickView = TRUE)
df[[1]]$data
## # A tibble: 90 x 7
##    day        metric  data dataExpected dataUpperBound dataLowerBound
##    <date>     <chr>  <dbl>        <dbl>          <dbl>          <dbl>
##  1 2020-12-01 visits   347        215.            319.          110. 
##  2 2020-12-02 visits   432        195.            299.           90.6
##  3 2020-12-03 visits   356        262.            385.          139. 
##  4 2020-12-04 visits   252        160.            265.           55.9
##  5 2020-12-05 visits    85         89.9           194.            0  
##  6 2020-12-06 visits    99         91.1           195.            0  
##  7 2020-12-07 visits   267        230.            341.          119. 
##  8 2020-12-08 visits   314        303.            448.          157. 
##  9 2020-12-09 visits   229        257.            380.          135. 
## 10 2020-12-10 visits   255        330.            485.          175. 
## # … with 80 more rows, and 1 more variable: dataAnomalyDetected <lgl>
df[[1]]$anoms
## # A tibble: 4 x 7
##   day        metric  data dataExpected dataUpperBound dataLowerBound
##   <date>     <chr>  <dbl>        <dbl>          <dbl>          <dbl>
## 1 2020-12-01 visits   347         215.           319.          110. 
## 2 2020-12-02 visits   432         195.           299.           90.6
## 3 2020-12-24 visits    67         260.           377.          143. 
## 4 2021-01-05 visits   347         213.           320.          106. 
## # … with 1 more variable: dataAnomalyDetected <lgl>

df[[2]]$data
## # A tibble: 90 x 7
##    day        metric    data dataExpected dataUpperBound dataLowerBound
##    <date>     <chr>    <dbl>        <dbl>          <dbl>          <dbl>
##  1 2020-12-01 visitors   312        195.            284.        107.   
##  2 2020-12-02 visitors   384        177.            265.         89.2  
##  3 2020-12-03 visitors   324        243.            356.        130.   
##  4 2020-12-04 visitors   223        153.            241.         65.1  
##  5 2020-12-05 visitors    76         88.4           176.          0.288
##  6 2020-12-06 visitors    96         88.0           176.          0    
##  7 2020-12-07 visitors   237        218.            322.        114.   
##  8 2020-12-08 visitors   274        279.            411.        147.   
##  9 2020-12-09 visitors   198        222.            326.        118.   
## 10 2020-12-10 visitors   238        275.            402.        148.   
## # … with 80 more rows, and 1 more variable: dataAnomalyDetected <lgl>
df[[2]]$anoms
## # A tibble: 6 x 7
##   day        metric    data dataExpected dataUpperBound dataLowerBound
##   <date>     <chr>    <dbl>        <dbl>          <dbl>          <dbl>
## 1 2020-12-01 visitors   312         195.           284.          107. 
## 2 2020-12-02 visitors   384         177.           265.           89.2
## 3 2020-12-24 visitors    66         227.           326.          128. 
## 4 2021-01-05 visitors   317         180.           268.           91.5
## 5 2021-01-06 visitors   282         180.           269.           91.4
## 6 2021-01-07 visitors   434         252.           379.          126. 
## # … with 1 more variable: dataAnomalyDetected <lgl>
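If you prefer to start from the default output, the Data and Anoms pieces of this structure can be approximated with base R. The sketch below is only an illustration: split_anomalies() is a hypothetical helper, not part of adobeanalyticsr, and it names the list elements by metric rather than by position, so the ordering differs from the quickView result shown above. It does not attempt to reproduce the Viz element.

```r
# Hypothetical helper (not part of adobeanalyticsr): splits the default
# long-format report by metric and pairs each piece with its flagged rows.
split_anomalies <- function(report) {
  lapply(split(report, report$metric), function(metric_rows) {
    list(
      data  = metric_rows,
      anoms = metric_rows[metric_rows$dataAnomalyDetected, , drop = FALSE]
    )
  })
}

# A few rows copied from the default output shown earlier in the post.
report <- data.frame(
  day = as.Date(rep(c("2020-12-01", "2020-12-02", "2020-12-03"), each = 2)),
  metric = rep(c("visits", "visitors"), times = 3),
  data = c(347, 312, 432, 384, 356, 324),
  dataExpected = c(214.72423, 195.45319, 194.90034,
                   177.27466, 262.08380, 242.80209),
  dataUpperBound = c(319.0386, 283.5220, 299.2147,
                     265.3435, 385.2547, 355.8016),
  dataLowerBound = c(110.4099176, 107.3843914, 90.5860230,
                     89.2058535, 138.9129307, 129.8026120),
  dataAnomalyDetected = c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

result <- split_anomalies(report)

# Dates on which an anomaly was flagged for the visits metric.
result$visits$anoms$day
```

Accessing the pieces by name (result$visits, result$visitors) avoids having to remember the order in which the metrics were requested.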

For more on Anomaly Detection in Analysis Workspace check out this video.

I’m always looking for new ways to serve up the anomaly detection data. If you have an idea, make sure to submit an issue for me to work on with you.