The comm command on Linux systems compares two files line by line and displays the results in a clear and useful way. Think of “comm” not so much as a reference to “compare” as to “common,” since the command writes to standard output both the lines that the files have in common and the lines that are unique to each of them. With a little help from ls, it can also compare the lists of files in two directories.
One key requirement when using comm is that the content to be compared must be in sorted order. However, there are ways that you can get away with comparing content that isn’t sorted. Some examples of how to do this will be presented in this post.
Comparing files
Normally, when using the comm command, you would compare two sorted text files to see their shared and unique lines. Here’s an example in which a list of friends and a list of neighbors are compared.
$ comm friends neighbors
Alice
Betty
Christopher
                Diane
George
                Patty
Ricky
Sam
                Tim
        Zelda
Notice that the output is displayed in three columns. The first includes the names that are only included in the first file. The second shows the names that are only included in the second file. The third shows the common names.
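If you want to try these examples yourself, the short bash script below recreates the two lists in a scratch directory. The file contents are an assumption, inferred from the names in the example output (friends: Alice, Betty, Christopher, Diane, George, Patty, Ricky, Sam and Tim; neighbors: Diane, Patty, Tim and Zelda). LC_ALL=C is set so that the collation order is predictable and sort and comm agree on it.

```shell
#!/bin/bash
# Recreate the two sample lists in a scratch directory and compare them.
# List contents are inferred from the example output, not the article's own files.
set -euo pipefail
export LC_ALL=C   # byte-order collation so the sorted order is predictable

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

printf '%s\n' Alice Betty Christopher Diane George Patty Ricky Sam Tim > friends
printf '%s\n' Diane Patty Tim Zelda > neighbors

# No prefix: only in friends; one tab: only in neighbors; two tabs: in both.
output=$(comm friends neighbors)
printf '%s\n' "$output"
```

The columns are produced by leading tabs: second-column lines start with one tab and third-column lines with two, which is what gives the output its staggered look.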
NOTE: If one of the files were not sorted, you would see something like this:
$ comm friends neighbors
Alice
Betty
Christopher
                Diane
                Patty
comm: file 1 is not in sorted order   <==
George
Ricky
Sam
                Tim
        Zelda
comm: input is not in sorted order    <==
You could, however, get around this issue without actually changing the sort order of the files themselves. Instead, you could sort the files when running the comm command as in this example:
$ comm <(sort friends) <(sort neighbors)
Alice
Betty
Christopher
                Diane
George
                Patty
Ricky
Sam
                Tim
        Zelda
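Here is a minimal sketch of the same idea using deliberately shuffled copies of the two lists (the shuffled order is hypothetical). Note that process substitution, the <(...) syntax, is a bash/ksh/zsh feature rather than POSIX sh, so the sketch also shows one portable alternative: sort one list into a temporary file and pipe the other into comm, which reads standard input when a file operand is “-”.

```shell
#!/bin/bash
# Sketch: compare unsorted lists without rewriting them on disk.
set -euo pipefail
export LC_ALL=C   # make sort and comm use the same (byte-order) collation

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Hypothetical unsorted versions of the two lists.
printf '%s\n' Tim Alice Patty Betty George Christopher Diane Sam Ricky > friends
printf '%s\n' Zelda Diane Tim Patty > neighbors

# bash: each <(sort ...) becomes a readable path holding the sorted stream.
common=$(comm -12 <(sort friends) <(sort neighbors))

# POSIX sh alternative: sort one list to a temporary file and pipe the
# other into comm, which reads standard input when an operand is "-".
sort friends > "$workdir/friends.sorted"
common_posix=$(sort neighbors | comm -12 "$workdir/friends.sorted" -)

echo "common (bash):  $(paste -s -d ' ' - <<<"$common")"
echo "common (POSIX): $(paste -s -d ' ' - <<<"$common_posix")"
```

Either way, the files on disk keep their original, unsorted order.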
If you want to see only the contents that are common to the files being compared, you can suppress the display of the first two columns with a command like this one:
$ comm -12 friends neighbors
Diane
Patty
Tim
The “-12” means “suppress columns 1 and 2.” Any of the columns can be suppressed in this way. In the command below, only the third column is suppressed. As a result, you see the names that are unique to each file, but not those included in both files.
$ comm -3 friends neighbors
Alice
Betty
Christopher
George
Ricky
Sam
        Zelda
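The three suppression flags can be combined to pull out any single column. The sketch below, which uses the same assumed sample lists, stores each column in its own shell variable.

```shell
#!/bin/bash
# Sketch: combine the -1, -2 and -3 flags to keep exactly one column.
# Sample list contents are assumed (inferred from the example output).
set -euo pipefail
export LC_ALL=C

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

printf '%s\n' Alice Betty Christopher Diane George Patty Ricky Sam Tim > friends
printf '%s\n' Diane Patty Tim Zelda > neighbors

only_friends=$(comm -23 friends neighbors)    # hide columns 2 and 3
only_neighbors=$(comm -13 friends neighbors)  # hide columns 1 and 3
in_both=$(comm -12 friends neighbors)         # hide columns 1 and 2

echo "Only friends:   $(paste -s -d ' ' - <<<"$only_friends")"
echo "Only neighbors: $(paste -s -d ' ' - <<<"$only_neighbors")"
echo "Both:           $(paste -s -d ' ' - <<<"$in_both")"
```

Because suppressed columns contribute no leading tabs, each variable holds bare names, one per line, which makes them easy to pass to other commands.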
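If you want to compare files that may not be sorted, you can use the --nocheck-order option to suppress the comm command’s complaints. Be aware that this only silences the warnings: comm still processes its input as though it were sorted, so matches can be missed. In the output below, for example, Tim is listed twice instead of once as a name common to both files.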
$ comm --nocheck-order friends neighbors
Alice
Betty
Christopher
                Diane
George
                Patty
Ricky
Sam
Tim
        Zelda
        Tim
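Rather than silencing the check, it is usually safer to make sure both inputs really are sorted. The sketch below first shows what can go wrong: with a hypothetical neighbors file that lists Zelda before Tim, comm --nocheck-order never pairs the two Tim lines, so the common list comes back as just Diane and Patty. It then defines safe_comm, a hypothetical helper (not a standard command) that relies on sort -C, which exits with a nonzero status when its input is not sorted, to decide whether a file needs sorting first. It assumes GNU comm, since --nocheck-order is not a POSIX option.

```shell
#!/bin/bash
# Sketch: why --nocheck-order is risky, and a safer alternative.
set -eo pipefail
export LC_ALL=C

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

printf '%s\n' Alice Betty Christopher Diane George Patty Ricky Sam Tim > friends
printf '%s\n' Diane Patty Zelda Tim > neighbors   # hypothetical: Zelda out of order

# With the check disabled, comm still merges as if both files were sorted,
# so "Tim" is never paired up and the common list comes out short.
naive=$(comm --nocheck-order -12 friends neighbors)
echo "--nocheck-order says common: $(paste -s -d ' ' - <<<"$naive")"

# safe_comm (hypothetical helper): sort an input only when `sort -C` reports
# it is not already sorted, then pass any extra options through to comm.
safe_comm() {
    local first=$1 second=$2
    shift 2
    if ! sort -C "$first"; then
        sort "$first" > "$workdir/first.sorted"
        first=$workdir/first.sorted
    fi
    if ! sort -C "$second"; then
        sort "$second" > "$workdir/second.sorted"
        second=$workdir/second.sorted
    fi
    comm "$@" "$first" "$second"
}

checked=$(safe_comm friends neighbors -12)
echo "safe_comm says common:       $(paste -s -d ' ' - <<<"$checked")"
```

The helper only pays the cost of sorting when a file actually needs it, and it never modifies the original files.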
To have the comm command count the number of lines in each column, add the --total option as shown below.
$ comm --total friends neighbors
Alice
Betty
Christopher
                Diane
George
                Patty
Ricky
Sam
                Tim
        Zelda
6       1       3       total
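The summary is the last line of output: the three counts separated by the column delimiter and followed by the word “total.” The sketch below reads those counts into shell variables and cross-checks one of them by counting the lines of a single column. It assumes GNU coreutils comm, since --total is not part of POSIX comm, and the friends and neighbors contents inferred from the example output.

```shell
#!/bin/bash
# Sketch: read the per-column counts from the --total summary line.
set -euo pipefail
export LC_ALL=C   # keeps the word "total" untranslated and the sort order fixed

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
printf '%s\n' Alice Betty Christopher Diane George Patty Ricky Sam Tim > friends
printf '%s\n' Diane Patty Tim Zelda > neighbors

# Last line of output: count1<TAB>count2<TAB>count3<TAB>total
read -r only_friends only_neighbors both _ < <(comm --total friends neighbors | tail -n 1)
echo "only friends: $only_friends, only neighbors: $only_neighbors, both: $both"

# Portable cross-check: keep just column 3 and count its lines.
# $(( ... )) discards the leading blanks some wc implementations print.
check_both=$(( $(comm -12 friends neighbors | wc -l) ))
echo "both (via wc -l): $check_both"
```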
To use a delimiter other than the tab that separates the columns by default, use the --output-delimiter option as shown in the example below. Lines with no “:” characters are first-column entries (only in the first file). Those starting with a single “:” are second-column names (only in the second file), and those starting with “::” are third-column names (in both files). This can make it easier to import the output into a spreadsheet.
$ comm --output-delimiter=: friends neighbors
Alice
Betty
Christopher
::Diane
George
::Patty
Ricky
Sam
::Tim
:Zelda
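Because the number of leading delimiters identifies the column, a short awk filter can turn this output into a two-column CSV file that labels each name. This is a sketch that assumes GNU comm and that no name contains a “:” or “,”; the file name report.csv and the labels are illustrative choices.

```shell
#!/bin/bash
# Sketch: turn --output-delimiter output into a CSV of name,where.
set -euo pipefail
export LC_ALL=C

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
printf '%s\n' Alice Betty Christopher Diane George Patty Ricky Sam Tim > friends
printf '%s\n' Diane Patty Tim Zelda > neighbors

{
    echo "name,where"
    # No leading ":" = column 1, one ":" = column 2, "::" = column 3,
    # so with -F: the field count (NF) equals the column number.
    comm --output-delimiter=: friends neighbors |
        awk -F: '{
            where = (NF == 1) ? "friends only" : (NF == 2) ? "neighbors only" : "both"
            print $NF "," where
        }'
} > report.csv

cat report.csv
```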
Comparing directories
To compare the contents of two directories, you can use the same process-substitution technique shown earlier for unsorted files, this time feeding comm the output of ls for each directory rather than the output of sort. Since ls lists names in sorted order by default, the listings are ready to compare. In this example, the contents of the two directories are listed before being compared.
$ comm <(ls dir1) <(ls dir2)
file1
        file2
                file3
In the example above, the only file that is common to both directories is file3.
Note, however, that the comm command just shown compares only the file names. It does not compare the files’ contents.
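One way to go a step further is sketched below: comm reports which names appear in only one directory, and cmp -s then checks whether same-named files actually match. The directories and files are hypothetical, and the approach assumes that no file name contains a newline, because piped ls output is read one name per line.

```shell
#!/bin/bash
# Sketch: compare the file names in two directories with comm, then check
# whether files with the same name also have the same contents.
set -euo pipefail
export LC_ALL=C   # ls and comm then both use byte-order sorting

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Hypothetical sample directories.
mkdir dir1 dir2
echo one  > dir1/file1
echo two  > dir2/file2
echo same > dir1/file3
echo same > dir2/file3
echo old  > dir1/file4
echo new  > dir2/file4

echo "Only in dir1: $(comm -23 <(ls dir1) <(ls dir2) | paste -s -d ' ' -)"
echo "Only in dir2: $(comm -13 <(ls dir1) <(ls dir2) | paste -s -d ' ' -)"

# For names found in both directories, compare the files byte by byte.
differing=()
while IFS= read -r name; do
    if ! cmp -s "dir1/$name" "dir2/$name"; then
        differing+=("$name")
    fi
done < <(comm -12 <(ls dir1) <(ls dir2))

echo "Same name, different contents: ${differing[*]:-none}"
```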
Adding headings
If you want to add column headings to your comm output, you can put echo commands for the headings, followed by the comm command that you want to run, into a script file. Though the headings won’t necessarily align precisely with the content being compared, they can still be useful. Here’s an example script:
#!/bin/bash
echo -e "friends\tneighbors\tboth"
echo -e "=======\t=========\t===="
comm friends neighbors
Here’s example output from the script:
$ compare
friends neighbors       both
======= =========       ====
Alice
Betty
Christopher
                Diane
George
                Patty
Ricky
Sam
                Tim
        Zelda
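If you want the headings to line up, one option is to have comm emit a delimiter you can split on and let awk pad every column to a fixed width. The sketch below wraps that in a bash function named compare (a hypothetical name mirroring the script above) that takes the two file names as arguments. It assumes GNU comm for --output-delimiter, that no line or file name contains “:”, and an arbitrary column width of 15 characters.

```shell
#!/bin/bash
# Sketch: a "compare" function that prints headed columns padded to a fixed
# width so the headings line up with the names underneath them.
set -euo pipefail
export LC_ALL=C

compare() {
    {
        printf '%s:%s:%s\n' "$1" "$2" "both"
        # ${var//?/=} swaps every character for "=" to build a matching underline.
        printf '%s:%s:%s\n' "${1//?/=}" "${2//?/=}" "===="
        comm --output-delimiter=: "$1" "$2"
    } | awk -F: '{
        line = sprintf("%-15s%-15s%s", $1, $2, $3)  # 15-character columns
        sub(/ +$/, "", line)                          # drop trailing padding
        print line
    }'
}

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
printf '%s\n' Alice Betty Christopher Diane George Patty Ricky Sam Tim > friends
printf '%s\n' Diane Patty Tim Zelda > neighbors

report=$(compare friends neighbors)
printf '%s\n' "$report"
```

Names longer than 14 characters would push the next column out of line, so the width would need to grow with the data.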
Wrap-up
The comm command makes it quite easy to compare the contents of text files, even when they’re not sorted to begin with. It also allows you to compare the names of the files in two directories. Check out the comm man page for additional options.
Copyright © 2023 IDG Communications, Inc.