In the AWS Glue console, set up a crawler and name it CDR_CRAWLER.
Point the crawler to s3://telco-dest-bucket/blog, where the Parquet CDR data resides.
Next, create a new IAM role to be used by the AWS Glue crawler.
For Frequency, leave the default value, Run on Demand.
Next, choose Add database and define the name of the database. This database contains the table discovered by the AWS Glue crawler.
Choose Next and review the crawler settings.
When you’re satisfied, choose Finish.
Next, choose Crawlers, select the crawler that you just created (CDR_CRAWLER), and choose Run crawler.
The AWS Glue crawler starts crawling the data in the S3 location. This can take a minute or more to complete.
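The console steps above could also be scripted. The following is a minimal sketch, assuming a boto3 Glue client is passed in as `glue`. The crawler name, database name, and S3 path come from this walkthrough; the role ARN is a hypothetical placeholder for the IAM role you created, and the polling interval is an arbitrary choice. Leaving out a schedule corresponds to running the crawler on demand.

```python
import time


def build_crawler_request(role_arn):
    """Return keyword arguments for the Glue create_crawler call."""
    return {
        "Name": "CDR_CRAWLER",
        "Role": role_arn,  # hypothetical: the IAM role created for the crawler
        "DatabaseName": "blog",
        "Targets": {"S3Targets": [{"Path": "s3://telco-dest-bucket/blog"}]},
    }


def create_and_run_crawler(glue, role_arn, poll_seconds=15):
    """Create the crawler, start it, and poll until it reports READY again."""
    request = build_crawler_request(role_arn)
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])
    while True:
        state = glue.get_crawler(Name=request["Name"])["Crawler"]["State"]
        if state == "READY":  # the crawler has finished and is idle again
            return state
        time.sleep(poll_seconds)
```

With real credentials, `glue` would typically be `boto3.client("glue")`. Passing the client in as an argument keeps the sketch easy to try against a stub before running it against an AWS account.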
When it’s complete, under Data catalog, choose Databases. You should be able to see the new database created by the AWS Glue crawler.
In this case, the name of the database is blog.
To view the tables created under this database, select the relevant database and choose Tables. The crawler’s table also points to the location of the Parquet format CDRs.
To see the table’s schema, select the table created by the crawler.
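The same information is available programmatically. Here is a rough sketch, again assuming a boto3 Glue client passed in as `glue`. The table name is not given in this walkthrough, so the sketch first lists the table names in the database. It ignores pagination, which is reasonable only for a database with a handful of tables.

```python
def list_table_names(glue, database):
    """Return the names of the tables in a Data Catalog database (first page only)."""
    response = glue.get_tables(DatabaseName=database)
    return [table["Name"] for table in response["TableList"]]


def describe_table(glue, database, table):
    """Return the data location and the (column name, type) pairs of a table."""
    response = glue.get_table(DatabaseName=database, Name=table)
    descriptor = response["Table"]["StorageDescriptor"]
    columns = [(col["Name"], col["Type"]) for col in descriptor["Columns"]]
    return descriptor["Location"], columns
```

For example, `describe_table(glue, "blog", name)` for each `name` returned by `list_table_names(glue, "blog")` should report a location under the S3 path that the crawler scanned.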
Next, build out anomaly detection using Amazon QuickSight. To get started, follow these steps.
Under visual types, choose Line chart.
Drag call_service_duration to the Value field well.
Drag Date to the X axis field well.
Amazon QuickSight generates a dashboard, as in the following screenshot.
The x-axis shows the full date on which each call happened. By default, the aggregation window is one day. You can change this by choosing a different value.
Because Date is currently aggregated by day, the call duration shown for each point is the sum of all call durations from all call records within that day. We can begin the search by looking for days where the total call duration is high.
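To make the one-day aggregation concrete, here is a small illustrative sketch of the same calculation outside Amazon QuickSight. The sample records are made up, and the duration values carry no particular unit. Each record pairs a call timestamp with its call_service_duration.

```python
from collections import defaultdict
from datetime import datetime


def total_duration_per_day(records):
    """Sum call_service_duration over all records that fall on the same calendar day."""
    totals = defaultdict(int)
    for timestamp, duration in records:
        totals[timestamp.date()] += duration
    return dict(totals)


# Made-up sample records: (call timestamp, call_service_duration).
sample = [
    (datetime(2018, 8, 28, 9, 15), 120),
    (datetime(2018, 8, 28, 17, 40), 60),
    (datetime(2018, 8, 29, 11, 5), 300),
]
daily = total_duration_per_day(sample)

# The day with the highest total is the natural place to start looking.
busiest_day = max(daily, key=daily.get)
```

In this sample, the two calls on August 28 add up to 180 and the single call on August 29 contributes 300, so August 29 is the day that stands out.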
Now look at how to start using the ML insights anomaly detection feature.
A-number calling multiple B-numbers, or multiple A-numbers calling B-numbers.
The categories represent the dimensional values by which Amazon QuickSight splits the metric. For example, you can analyze anomalies on sales across all product categories and product SKUs. Assuming there are 10 product categories, each with 10 product SKUs, Amazon QuickSight splits the metric by the 100 unique combinations and runs anomaly detection on each split metric.
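The splitting step can be pictured with a short sketch. This illustrates only the grouping idea, not how Amazon QuickSight implements it. The field names and data are made up to match the sales example: 10 categories, each with 10 SKUs of its own.

```python
from collections import defaultdict


def split_metric(rows, dimensions, metric):
    """Group metric values into one series per unique combination of dimension values."""
    series = defaultdict(list)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        series[key].append(row[metric])
    return dict(series)


# Made-up data: 10 categories x 10 SKUs each, three observations per SKU.
rows = [
    {"category": f"cat{c}", "sku": f"cat{c}-sku{s}", "sales": c + s + day}
    for c in range(10)
    for s in range(10)
    for day in range(3)
]
series = split_metric(rows, ["category", "sku"], "sales")
print(len(series))  # 100 series, one per category/SKU combination
```

Anomaly detection would then run separately on each of the 100 series. The same idea applies to the CDR data when the A-number and B-number fields serve as categories.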
On the anomaly detection configuration screen, set up the following options:
After setting the configuration, choose Run Now to run the job manually. While the job runs, the visual displays the message “Detecting anomalies… This may take a while…” Depending on the size of your dataset, this may take a few minutes or up to an hour.
When the anomaly detection job is complete, any anomalies are called out in the insights visual. By default, only the top anomalies for the latest time period in the data are shown in the insights visuals.
Anomaly detection reveals several B-numbers being called from multiple A-numbers with a high call service duration on August 29, 2018. That looks interesting!
1. To explore all anomalies for this insight, select the menu on the top-right corner of the visual and choose Explore Anomalies.
2. On the Anomalies detailed page, you can see all the anomalies for the latest period.
In the view, you can see that two anomalies were detected, shown as two time series. The title of each visual represents the metric that is run on the unique combination of the categorical fields. In this case:
[All] | 9645000024
3512000024 | [ALL]
So the system detected anomalies for multiple A-numbers calling 9645000024, and for 3512000024 calling multiple B-numbers. In both cases, it observed a high call duration. The labeled data point on the chart represents the most recent anomaly detected for that time series.
Important
The first 32 points in the dataset are used for training and are not scored by the anomaly detection algorithm. You may not see any anomalies on the first 32 data points.
You can expand the filter controls on the top of the screen. With the filter controls, you can change the anomaly threshold to show high, medium, or low significance anomalies.
You can choose to show only anomalies that are higher than expected or lower than expected. You can also filter by the categorical values that are present in your dataset to look at anomalies only for those categories.
Look at the contributors columns. When you configured the anomaly detection, you defined the accounting ID as another dimension. If this were real call traffic instead of practice data, you would be able to single out specific accounting IDs that contribute to the anomaly.
When you’re done, choose Back to analysis.