Notes from the Kölner R meeting, 26 June 2015
R-bloggers 2015-07-01
Summary:
Last Friday the Cologne R user group came together for the 14th time, and for the first time we met at Startplatz, a start-up incubator venue. The venue was excellent: not only did they provide us with a much larger room, but also with the whole infrastructure, including table football and drinks. Many thanks to Kirill for organising all of this!
Photo: Günter Faes
We had two excellent advanced talks. Both were very informative and well presented.
Data Science at the Command Line
Kirill Pomogajko showed us how he uses various command line tools to pre-process log files for further analysis with R.
Photo: Günter Faes
Imagine you have several servers that generate large data sets with no standard delimiters. In Kirill's example, the columns appear at first glance to be separated by blanks, but the second column contains strings such as "Air Force". Furthermore, other columns have missing data and another uses speech marks. The data are therefore messy and difficult to read into R. To solve the problem, Kirill developed a Makefile that uses tools such as scp, sed and awk to download and clean the server files. Kirill's tutorial files are available via GitHub.
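To give a flavour of the cleaning step, here is a minimal sketch in shell. It is not Kirill's Makefile: the record layout (a numeric id, a name that may contain blanks, a count that may be missing, and a quoted comment), the sample records and the function name clean_log are all assumptions made for illustration. The download step with scp is left out so that the sketch runs without any servers.

```shell
#!/bin/sh
# Hypothetical log layout (an assumption, not the data from the talk):
#   <numeric id> <name, may contain blanks> <count, may be missing> "<quoted comment>"

clean_log() {
  # sed: anchor on the numeric id at the start and the quoted comment at the
  #      end, so blanks inside the name survive; rewrite the record with '|'
  #      as an unambiguous delimiter and drop the speech marks.
  # awk: replace an empty count with NA so that R reads it as a missing value.
  sed -E 's/^([0-9]+) (.*) ([0-9]*) "(.*)"$/\1|\2|\3|\4/' |
    awk -F'|' 'BEGIN { OFS = "|" } $3 == "" { $3 = "NA" } { print }'
}

# Three made-up records: a plain one, a name with a blank, and a missing count.
printf '%s\n' \
  '1 Army 12 "ok"' \
  '2 Air Force 7 "needs review"' \
  '3 Navy  "ok"' | clean_log
```

The pipe-delimited output could then be read into R with something like read.table("clean.txt", sep = "|", header = FALSE), where clean.txt is a hypothetical file name. Choosing a delimiter that never occurs in the data is what makes the multi-word names harmless.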