AlienVault Longitudinal Study Part 4

chartsnthings 2014-06-09

Summary:

In Part 1 we looked atacquiring raw data, and wrangling it into a time series dataset. InPart 2 we looked attypes of threats in the time series. In Part 3we looked at countries. Now we will examine countries and types incombination in the AlienVault reputation database.

Just as we shaped our dataset for better understanding in previous posts, we willuse the top 25 countries in the cctab table, and ignore address types with a compounddescription, i.e. with a semicolon in the type field. Here is the SQL to make ourcore dataset for this post:

-- Create table with countries and type summariesDROP TABLE IF EXISTS avcctyp;CREATE TABLE avcctyp (    d         DATE,    cc        CHAR(2),    type      CHAR(50),    avgrisk   DECIMAL(4,2),    avgrel    DECIMAL(4,2),    minrisk   INTEGER UNSIGNED,    minrel    INTEGER UNSIGNED,    maxrisk   INTEGER UNSIGNED,    maxrel    INTEGER UNSIGNED,    count     INTEGER UNSIGNED);-- Insert countries, types, and average, min, max scores INSERT INTO avcctypSELECT d, cc, type,       AVG(risk) AS avgrisk, AVG(rel) AS avgrel,       MIN(risk) AS minrisk, MIN(rel) AS minrel,       MAX(risk) AS maxrisk, MAX(rel) AS maxrel,       COUNT(*) AS count  FROM avtrack, avip WHERE avtrack.iind=avip.iindGROUP BY d, cc, type;-- Delete compound types (with semicolon) and non-top-25 countries DELETE FROM avcctyp WHERE type LIKE '%;%';DELETE FROM avcctyp WHERE cc NOT IN (SELECT * FROM cctab);

Let us now load a data frame in R with the country and

Link:

http://feedproxy.google.com/~r/RBloggers/~3/FElEy3Yv9Fc/

From feeds:

Statistics and Visualization » R-bloggers

Tags:

r bloggers

Authors:

Steve Patton (@spttnnh)

Date tagged:

06/09/2014, 14:20

Date published:

06/09/2014, 09:00