CSV data manipulation in Bash

I used the akw, join and pr to do some csv data manipulation in bash.

I was in the desperate need for converting the csv file of my actual time tracking tool Toggl to a different format I can import. To relieve my weekend pain of manually editing spreadsheets, I wrote a bash script to do this for me.

This is the csv export of Toggl:

User,Email,Client,Project,Task,Description,Billable,Start date,Start time,End date,End time,Duration,Tags,Amount ()
stefan,mail@mail.com,goodclient,299 3021 mytask,"","",No,2013-04-29,08:24:08,2013-04-29,09:18:58,00:54:50,,

This is the format I need:

date;id;customer;project;task;start;end;duration;text
05/02/2013;55;299;299;201;08:17;12:16;03.59;text

There is some information I can exclude, some information I need to reformat (eg. time, date) and add some extra stuff like the client id. Here is the script.

[code language=”bash”]
#!/bin/bash

awk -F’,’ ‘NR==1 {$15="Mitarbeiter"} NR>1 {split($8,a,"-");$8=a[2]"/"a[3]"/"a[1];$15="55"}1′ OFS=’;’ input.csv > a.csv;
awk -F’,’ ‘NR==1 {print "Projekt Tät"} NR>1 {print $4}’ input.csv | awk ‘{print $1,$2,""}’ OFS=’;’ > b.csv;
awk -F’,’ ‘NR==1 {print "Von"} NR>1 {print $9}’ input.csv | awk -F’:’ ‘{print $1,$2;}’ OFS=":" > c.csv;
awk -F’,’ ‘NR==1 {print "Bis"} NR>1 {print $11}’ input.csv | awk -F’:’ ‘{print $1,$2;}’ OFS=":" > d.csv;
awk -F’,’ ‘NR==1 {print "Dauer"} NR>1 {print $12}’ input.csv | awk -F’:’ ‘{print $1,$2}’ OFS="." > e.csv;
pr -m -t -s";" a.csv b.csv c.csv d.csv e.csv > f.csv;
LANG=en_EN sort -t ";" -k 2,2 projektliste.csv > projektliste_cens.csv;
LANG=en_EN sort -t ";" -k 16 f.csv > f_cens.csv ;
LANG=en_EN join -t ";" -o 2.8,2.15,2.16,1.4,2.17,2.19,2.20,2.21,2.6 -1 2 -2 16 projektliste_cens.csv f_cens.csv > output.csv;
[/code]

  • Awk is the perfect command for these kind of tasks. Howevere, I needed to save each step in its own temp file.
  • The first line adds the column “Mitarbeiter” with the according headline and my id. I also reformat the date with “/” rather than “-“.
  • The second line extracts the project and task id from the ‘project column”, which is delimited by spaces.
  • The third and fourth line removes the seconds from the start and end times.
  • The fifth line, basically, does the same thing but uses a dot instead of a colon for the duration (hh.mm).
  • Then, the whole csv files are merged using pr.

The grand final is done by join, because i needed to get some more information from an additional csv file, the client id. Join is pretty much the same as a ‘vlookup’ in spreadsheets. Before this works, however, the matching columns need to be sorted. I had big troubles getting this to work, since there was some general language fuckup. I found the solution here. Join also rearranges the columns.

Author: Stefan Schwaha

-<>-

Leave a Reply