shell tools – parametric commands with xargs

Sometimes it is necessary to execute a command several times, changing only some of its parameters.

In most cases you can use a for or a while loop:

for FILE in $(ls -1); do echo ${FILE}; done 

This form isn’t particularly robust, because file names containing spaces are split into several words, but it is very common anyway because of its simplicity.
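To make the problem with spaces concrete, here is a minimal sketch that compares the loop above with a glob-based loop. The scratch directory and the file names are made up for the demonstration.

```shell
# Create a scratch directory with a file whose name contains a space.
workdir=$(mktemp -d)
touch "$workdir/plain.txt" "$workdir/with space.txt"

# Fragile: the output of ls is split on whitespace, so "with space.txt"
# is seen as two separate words.
fragile_count=0
for FILE in $(ls -1 "$workdir"); do
    fragile_count=$((fragile_count + 1))
done

# Robust: the glob expands to one word per file, whatever the name contains.
robust_count=0
for FILE in "$workdir"/*; do
    robust_count=$((robust_count + 1))
done

echo "fragile: $fragile_count, robust: $robust_count"
rm -rf "$workdir"
```

With two files in the directory, the fragile loop iterates three times and the glob loop twice.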

Another option is the -exec option of the find command:

find . -type f -exec echo '{}' \;

This form is a really powerful way of looping, but it can be used only on lists of files.

In some cases neither of the previous forms can be used. Let’s see an example: you have to read a file (e.g. an Apache access log), process each row in some way and execute a command for each row. In a case like this find isn’t an option, and the for loop is inconvenient at best, because you have to put much more logic in the definition of the loop variable than in the loop body itself. Moreover, if the file is big you may not have enough memory, because the whole list is built before the loop starts.

It would be much better to process one line at a time: xargs is the tool that lets us proceed in this way.

Let’s look at an example: suppose I want to loop through a log file and find the served pages that don’t contain a given word. I’ll have to read the file, probably select some lines, extract the information required to run curl, run curl and examine the result.
I can do all this in a single line using xargs, without involving intermediate files.

cat 20150608_access.log | grep "linuxandcompany" | awk -F"GET " '{print $2}' | awk -F" HTTP/" '{print $1}' | xargs -I {} sh -c 'curl -s "{}" | grep "pippo" > /dev/null; [ $? -ne 0 ] && echo "{}"'

The first part is really basic: the cat command is used to read the log file. Then I use a filter (grep) to select some lines (the ones related to this website).

To extract the information I need (the URI) I use the awk command twice.
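The two awk steps can be tried in isolation. The sketch below runs them on a single made-up log line in the usual Apache format. The IP address, timestamp and path are invented for illustration and do not come from a real log.

```shell
# Hypothetical access log line (invented for the example).
line='203.0.113.7 - - [08/Jun/2015:10:15:32 +0200] "GET /2015/06/some-page/ HTTP/1.1" 200 5120'

# First awk: split on "GET " and keep what follows the method.
# Second awk: split on " HTTP/" and keep what precedes the protocol.
uri=$(printf '%s\n' "$line" | awk -F"GET " '{print $2}' | awk -F" HTTP/" '{print $1}')

echo "$uri"
```

For this sample line the value left in the pipe is /2015/06/some-page/.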

At this point I have a pipe carrying a list of URIs. The job of xargs is just to take these values, substitute each one into a parametrized command and execute the result.
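The substitution can be seen on its own with a harmless command. In this small sketch three made-up paths stand in for the list of URIs and echo stands in for the real command.

```shell
# With -I {}, xargs reads one input line at a time and runs the command
# once per line, replacing every {} with that line.
result=$(printf '%s\n' /index.html /about.html /contact.html |
    xargs -I {} echo "would fetch: {}")

echo "$result"
```

Each input line produces one separate invocation of echo, so the output has three lines.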

In this example I have to execute several commands at each iteration. xargs doesn’t permit this directly, but a simple trick helps: we choose a shell as the command and execute a script inside it.
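The trick can be sketched with two echo commands per item. This is a variant of the form used in the one-liner above, not the same form: instead of placing {} inside the script text, the value is passed as a positional parameter and read as "$1". The underscore only fills $0. This way the value is never parsed as shell code, which matters when the input contains quotes or other special characters.

```shell
# Each input line runs a small inline script made of two commands.
# "_" becomes $0 of the inline script, the substituted value becomes $1.
result=$(printf '%s\n' alpha beta |
    xargs -I {} sh -c 'echo "start $1"; echo "end $1"' _ {})

echo "$result"
```

Both commands run for alpha before xargs moves on to beta.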

Just to complete the description of the example: curl retrieves the web page from the internet, grep checks for the searched word, and its output is thrown away, because I’m interested only in the return value of grep.

The return value of grep is checked and, if the word isn’t found, the URI is written to the output.
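The same check can be tried without network access. In the sketch below cat on two made-up local files stands in for curl, so it runs offline. It also uses grep -q, which prints nothing and only sets the return value, together with || in place of the redirection to /dev/null and the explicit test on $?. Treat it as an equivalent variant under those assumptions, not as the original command.

```shell
# Two scratch files play the role of the downloaded pages.
workdir=$(mktemp -d)
printf 'hello pippo\n' > "$workdir/with_word.txt"
printf 'hello world\n' > "$workdir/without_word.txt"

# For each file: if grep -q does not find the word, print the file name.
missing=$(printf '%s\n' "$workdir/with_word.txt" "$workdir/without_word.txt" |
    xargs -I {} sh -c 'cat "$1" | grep -q "pippo" || echo "$1"' _ {})

echo "$missing"
rm -rf "$workdir"
```

Only the path of without_word.txt is printed, because it is the only file that doesn’t contain the word.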