Manipulating text with awk, gawk and sed

The awk, gawk and sed commands on Linux are extremely versatile tools for manipulating text, rearranging columns, generating reports and modifying file content.

Using awk and gawk

To select portions of command output using awk or gawk, you can try commands like those below. The first displays the first field in the output of the date command. The second displays the last field. NF holds the number of fields on the current line, so $NF refers to the value of the last field.

$ date | awk '{print $1}'
Sat
$ date | awk '{print $NF}'
2023
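To see how NF and $NF relate, the short script below feeds awk a fixed sample line instead of live date output, which changes from day to day. It also shows $(NF-1), the next-to-last field. The parentheses matter: $NF-1 subtracts 1 from the value of the last field instead of selecting a different field.

```shell
#!/bin/bash
# A fixed sample line stands in for date output so the results are predictable.
line="Sat Mar 25 10:15:42 AM EDT 2023"

# NF is the number of fields on the current line; $NF is the last field.
fields=$(echo "$line" | awk '{print NF}')
last=$(echo "$line" | awk '{print $NF}')

# $(NF-1) is the next-to-last field; $NF-1 computes (last field) - 1.
second_last=$(echo "$line" | awk '{print $(NF-1)}')
arith=$(echo "$line" | awk '{print $NF-1}')

echo "fields: $fields"
echo "last: $last"
echo "next-to-last: $second_last"
echo "\$NF-1: $arith"
```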

Note that on many Linux systems, awk is a symbolic link to gawk, so you can type either command and get the same result. Here’s what you might see when you do a long listing of the awk executable:

$ ls -l /usr/bin/awk
lrwxrwxrwx. 1 root root 4 Jul 21  2021 /usr/bin/awk -> gawk

The “l” at the beginning identifies /usr/bin/awk as a symbolic link.
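Because the link can point elsewhere (on many Debian and Ubuntu installations, for example, awk leads to mawk by way of /etc/alternatives), it can be worth checking which program awk actually runs before relying on gawk-only features such as -i inplace. A minimal sketch, assuming GNU coreutils' readlink -f is available to follow the whole chain of links:

```shell
#!/bin/bash
# Find the awk that the shell will run, then follow every symbolic link
# (readlink -f, from GNU coreutils) to the real program behind it.
awk_path=$(command -v awk)
awk_target=$(readlink -f "$awk_path")

echo "awk found at: $awk_path"
echo "which runs:   $awk_target"
```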

To break a line of text into pieces, the text must use a common delimiter to separate the pieces you need to work with. The default delimiter for both awk and gawk is white space. The -F option, however, allows you to specify whatever delimiter separates the chunks of text that you are working with. Any character can serve as a delimiter as long as it is used only as a delimiter and never appears inside the data itself. Delimiters are usually blanks, tabs, colons, semicolons and such. You can, however, break on a letter or other character if your data calls for that.
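The sketch below shows -F with a colon, using a made-up record laid out like a line from /etc/passwd. It also shows that a -F value longer than one character is treated as a regular expression, so a bracket expression can match more than one delimiter character.

```shell
#!/bin/bash
# Made-up colon-delimited record in the style of /etc/passwd.
record="alice:x:1001:1001:Alice Example:/home/alice:/bin/bash"

# Split on colons and print the user name and the login shell (last field).
user_shell=$(echo "$record" | awk -F ':' '{print $1, $NF}')

# A multi-character -F value is a regular expression:
# '[;,]' splits on either a semicolon or a comma.
mixed=$(echo "red;green,blue" | awk -F '[;,]' '{print $2}')

echo "$user_shell"
echo "$mixed"
```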

To rearrange the portions of a line of text separated by commas, you could use a command like this:

$ echo "one,two,three" | gawk -F ',' '{print $3,$2,$1}'
three two one

While “x” is rarely used as a delimiter, even that would work.

$ echo onextwoxthree | gawk -F 'x' '{print $2,$3,$1}'
two three one

NOTE: Without the commas between the fields in the print statement, the fields would be run together and the result would be “twothreeone”.

Note that gawk allows you to arrange the pieces of data in any order and that you can ignore fields that you don’t wish to include in your output.
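Each comma in a print statement is replaced in the output by the output field separator, OFS, which is a single blank by default. The sketch below sets OFS so that reordered fields keep their commas, drops the middle field, and shows what happens when the commas are left out.

```shell
#!/bin/bash
# Reorder comma-separated fields, drop the middle one, and keep commas
# in the output by setting the output field separator (OFS).
out=$(echo "one,two,three" | awk -F ',' -v OFS=',' '{print $3, $1}')
echo "$out"

# With no commas between the fields, print concatenates them directly.
joined=$(echo "one,two,three" | awk -F ',' '{print $3 $1}')
echo "$joined"
```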

Breaking on white space

Unlike other delimiters, a run of repeated blanks (a/k/a “white space”) serves as a single delimiter. No matter how many blank characters are in each stretch of white space, the words are easily rearranged and displayed with single blanks between them. The text is quoted below so that the shell passes the extra blanks to awk instead of collapsing them itself.

$ echo "one      two    three" | gawk -F ' ' '{print $2,$3,$1}'
two three one
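A single blank given to -F is a special case that selects the default splitting: runs of blanks and tabs count as one delimiter, and leading and trailing white space is ignored. To split on every individual blank instead, you can use a bracket expression such as '[ ]', which produces empty fields wherever blanks are repeated. The sample text below is made up.

```shell
#!/bin/bash
# Two leading blanks, three blanks between words, then a tab.
text=$'  one   two\tthree'

# Default splitting (-F ' '): blank/tab runs are one delimiter and
# leading white space is ignored, giving 3 fields.
default_nf=$(printf '%s\n' "$text" | awk -F ' ' '{print NF}')
first=$(printf '%s\n' "$text" | awk -F ' ' '{print $1}')

# '[ ]' splits on each single blank (the tab is not a delimiter here),
# so empty fields appear: "", "", "one", "", "", "two<TAB>three".
strict_nf=$(printf '%s\n' "$text" | awk -F '[ ]' '{print NF}')

echo "default: $default_nf fields, first is '$first'"
echo "strict:  $strict_nf fields"
```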

Editing files “in place”

You can also use gawk to make changes directly to files by loading its inplace extension with -i inplace. In the example below, a new file is created using the fortune command, and gawk then adds a line number to each line. NR holds the number of the current record (here, the line number), and it is followed by a period and a space character.

$ fortune > fortune
$ gawk -i inplace '{print NR ". " $0}' fortune

To view the changes, display the file again.

$ cat fortune
1. Work is of two kinds: first, altering the position of matter at or near
2. the earth's surface relative to other matter; second, telling other people
3. to do so.
4.              -- Bertrand Russell
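The -i inplace option is specific to gawk. If awk on your system is another implementation, a common portable approach is to write the output to a temporary file and move it over the original only if awk succeeded. A minimal sketch, using a hypothetical sample file named notes.txt in place of the fortune output:

```shell
#!/bin/bash
# Hypothetical sample file standing in for fortune output.
printf '%s\n' "first line" "second line" "third line" > notes.txt

# Number the lines into a temporary file, then replace the original
# only if awk succeeded (&&). The new file gets default permissions.
awk '{print NR ". " $0}' notes.txt > notes.txt.tmp && mv notes.txt.tmp notes.txt

cat notes.txt
```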

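To alter the text, you could use a command like the one below, which replaces every “two” with “2”. The trailing 1 is a pattern that is always true; since it has no action attached, awk prints each line after gsub() has modified it: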

$ awk -i inplace '{gsub("two", "2")} 1' fortune
$ cat fortune
1. Work is of 2 kinds: first, altering the position of matter at or near
2. the earth's surface relative to other matter; second, telling other people
3. to do so.
4.              -- Bertrand Russell
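gsub() also returns the number of replacements it made on the line, which can help you check what an in-place edit will change before you run it. The sketch below applies the same edit to a made-up line twice: once in the '{gsub(...)} 1' form and once with an explicit print that also reports the count.

```shell
#!/bin/bash
line="two cats and two dogs"

# gsub() modifies $0; the always-true pattern 1 then prints the line.
short=$(echo "$line" | awk '{gsub("two", "2")} 1')

# Same edit with an explicit print, prefixed by gsub()'s return value
# (the number of replacements made on this line).
counted=$(echo "$line" | awk '{n = gsub("two", "2"); print n ": " $0}')

echo "$short"
echo "$counted"
```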

To reverse this change, you could try a command like this, but notice that it changes one of the line numbers as well.

$ awk -i inplace '{gsub("2", "two")} 1' fortune
$ cat fortune
1. Work is of two kinds: first, altering the position of matter at or near
two. the earth's surface relative to other matter; second, telling other people
3. to do so.
4.              -- Bertrand Russell

To avoid changing text that you don’t want to change, add something to your command that distinguishes the target text from the rest. In the command below, run in place of the previous one (that is, against the version of the file containing “2 kinds”), I’ve added a blank before the 2 so that the line numbers won’t be affected.

$ awk -i inplace '{gsub(" 2", " two")} 1' fortune
$ cat fortune
1. Work is of two kinds: first, altering the position of matter at or near
2. the earth's surface relative to other matter; second, telling other people
3. to do so.
4.              -- Bertrand Russell
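Adding a blank works for this file, but it could still catch a number elsewhere in the text. Another approach, sketched below under the assumption that every line begins with a number, a period and a space, is to split off that prefix with match() and apply gsub() only to the rest of the line. The file numbered.txt is a hypothetical sample.

```shell
#!/bin/bash
# Hypothetical sample resembling the numbered fortune file.
printf '%s\n' "1. Work is of 2 kinds" "2. the earth's surface" > numbered.txt

# match() finds the leading "N. " prefix and sets RLENGTH to its length.
# Only the text after the prefix is passed to gsub().
result=$(awk '{
    if (match($0, /^[0-9]+\. /)) {
        prefix = substr($0, 1, RLENGTH)
        rest = substr($0, RLENGTH + 1)
    } else {
        prefix = ""
        rest = $0
    }
    gsub(/2/, "two", rest)
    print prefix rest
}' numbered.txt)

printf '%s\n' "$result"
```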

Separating columns of data

To separate tab-separated columns of data into individual files, you could use a script like this one:

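#!/bin/bash

# Ask for the name of a tab-separated file
echo -n "file> "
read -r file

# '\t' means a tab; a bare 't' would split on the letter t
awk -F '\t' '{ print $1 }' "$file" > col1
echo ==============================

awk -F '\t' '{ print $2 }' "$file" > col2
echo ==============================

awk -F '\t' '{ print $3 }' "$file" > col3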

Assuming the original file has at least three columns of data separated by tabs, you would end up with three separate files (col1, col2 and col3), each holding one column. If the original file has only two fields, col3 will contain nothing but empty lines. If the fields are separated by blanks instead of tabs, each complete line ends up in col1, and col2 and col3 will contain only empty lines.
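The script reads the input file three times. Because awk’s print can redirect its output to a file whose name is built from an expression, a single pass can write all three files. A sketch using a hypothetical sample file named data.tsv:

```shell
#!/bin/bash
# Hypothetical tab-separated sample file with three columns.
printf 'a1\tb1\tc1\na2\tb2\tc2\n' > data.tsv

# One pass: field i of every line goes to the file named "col" i.
# The parentheses keep the file-name expression unambiguous.
awk -F '\t' '{ for (i = 1; i <= 3; i++) print $i > ("col" i) }' data.tsv

# Put the columns back side by side to confirm the split.
paste col1 col2 col3
```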

Using sed to modify file content

You can also use sed commands to make changes to files. For example, to replace every instance of one word or phrase with another, you could use a command like the one shown below. The file first looks like this:

$ cat txt
This is the old text.

The sed command below replaces “the old” with “better”.

$ sed -i "s/the old/better/" txt
$ cat txt
This is better text.
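With GNU sed (the sed found on most Linux systems), a suffix attached directly to -i makes sed save a copy of the original file before editing it, which gives you an easy way back if a substitution goes wrong. A sketch using a hypothetical file named demo.txt:

```shell
#!/bin/bash
# Hypothetical sample file.
echo "This is the old text." > demo.txt

# GNU sed: -i.bak edits demo.txt in place after saving the original
# as demo.txt.bak. The suffix must follow -i with no space.
sed -i.bak "s/the old/better/" demo.txt

cat demo.txt       # edited version
cat demo.txt.bak   # original version
```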

If any lines in the file contain multiple instances of the word or phrase, you need to add the “g” (global) flag to the sed command; otherwise, it will change only the first instance on each line.

$ cat txt
This is the old text.
I like the old text. Do you like the old textbooks?
$ sed -i "s/the old/better/g" txt
$ cat txt
This is better text.
I like better text. Do you like better textbooks?
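Between changing only the first match and changing all of them, a number after the final slash selects a single occurrence on each line, and GNU sed also accepts a number combined with g to change that occurrence and every later one. A sketch using a made-up line:

```shell
#!/bin/bash
line="the old text and the old textbooks and the old ways"

# A number after the final slash replaces only that occurrence.
second=$(echo "$line" | sed "s/the old/better/2")

# GNU sed: a number plus g replaces that occurrence and all later ones.
from_second=$(echo "$line" | sed "s/the old/better/2g")

echo "$second"
echo "$from_second"
```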

Wrap-up

The awk, gawk and sed commands can help when you need to manipulate text in some way, especially if you need to make a lot of changes to a lot of text. To look further into how you can use these commands, check out my earlier articles on them.
