AlienVault Longitudinal Study Part 4
chartsnthings 2014-06-09
Summary:
In Part 1 we looked atacquiring raw data, and wrangling it into a time series dataset. InPart 2 we looked attypes of threats in the time series. In Part 3we looked at countries. Now we will examine countries and types incombination in the AlienVault reputation database.
Just as we shaped our dataset for better understanding in previous posts, we willuse the top 25 countries in the cctab table, and ignore address types with a compounddescription, i.e. with a semicolon in the type field. Here is the SQL to make ourcore dataset for this post:
-- Create table with countries and type summariesDROP TABLE IF EXISTS avcctyp;CREATE TABLE avcctyp ( d DATE, cc CHAR(2), type CHAR(50), avgrisk DECIMAL(4,2), avgrel DECIMAL(4,2), minrisk INTEGER UNSIGNED, minrel INTEGER UNSIGNED, maxrisk INTEGER UNSIGNED, maxrel INTEGER UNSIGNED, count INTEGER UNSIGNED);-- Insert countries, types, and average, min, max scores INSERT INTO avcctypSELECT d, cc, type, AVG(risk) AS avgrisk, AVG(rel) AS avgrel, MIN(risk) AS minrisk, MIN(rel) AS minrel, MAX(risk) AS maxrisk, MAX(rel) AS maxrel, COUNT(*) AS count FROM avtrack, avip WHERE avtrack.iind=avip.iindGROUP BY d, cc, type;-- Delete compound types (with semicolon) and non-top-25 countries DELETE FROM avcctyp WHERE type LIKE '%;%';DELETE FROM avcctyp WHERE cc NOT IN (SELECT * FROM cctab);
Let us now load a data frame in R with the country and