Remove Duplicate Lines with One Command

You can easily do this with awk:

awk '!seen[$0]++'

Introduction

This is a quick note on how to remove duplicate lines with a single command, so that each line appears only once.

Note: This article was translated from my original post.

Remove Duplicate Lines with One Command

The following command removes duplicate lines and outputs each one only once:

awk '!seen[$0]++'

Command Breakdown

awk '!seen[$0]++'
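  • $0: The entire current line being processed
  • seen[$0]: The element of the associative array seen whose key is the entire line
  • seen[$0]++: Post-increment: returns the current value of seen[$0], then adds 1 to the stored value
    • If the line is seen for the first time: the element is still uninitialized, so the expression returns 0 (and the stored value becomes 1)
    • If the line has already been seen: the expression returns 1 or more (and the stored value keeps increasing)
  • !seen[$0]++: Logical NOT applied to the value returned by seen[$0]++
    • If the line is seen for the first time: !0 = true
    • If the line has already been seen: !1 (or more) = false
  • awk '!seen[$0]++': The expression is a pattern with no action, so awk runs the default action (print the line) only when the pattern is true, that is, only the first time each line appears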
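
To watch these values directly, you can print what seen[$0]++ returns next to each line. This is a small sketch assuming a POSIX-compatible awk; the sample input is made up for illustration:

```shell
# For each line, print: the line, the value seen[$0]++ returns, and its NOT.
# Post-increment returns the old value (0 for an unseen line), then stores old+1.
printf 'apple\nbanana\napple\napple\n' |
  awk '{ v = seen[$0]++; print $0, v, !v }'
# Expected output:
# apple 0 1
# banana 0 1
# apple 1 0
# apple 2 0
```

The third column is 1 exactly for the lines that awk '!seen[$0]++' would print.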

As a result, each line is printed once, at its first occurrence, and later duplicates are dropped. The original order of the lines is preserved.
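
For readability, here is a long-form awk program that should behave the same way as the one-liner. It is a sketch written for this note, not a standard form of the command; it tests for a first occurrence with the in operator and relies on print with no arguments printing $0:

```shell
printf 'apple\nbanana\napple\norange\n' |
  awk '{
    if (!($0 in seen)) {   # no entry for this exact line yet: first occurrence
      print                # print with no arguments prints $0
    }
    seen[$0]++             # record the line (and count how often it appeared)
  }'
# Expected output:
# apple
# banana
# orange
```

The one-liner packs the test, the counter update, and the implicit print into a single pattern.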

Example

Let's try using the command to remove duplicates from a file called list.txt:

apple
banana
apple
orange
banana
grape
apple


Command:

cat list.txt | awk '!seen[$0]++'

Output:

apple
banana
orange
grape

As you can see, each line is shown only once, with duplicates removed.
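
If you want to reproduce this example from scratch, the following sketch creates list.txt with the same contents and runs the command on it. It assumes a POSIX shell; awk accepts the file name as an argument, so the cat in the pipeline above is optional. The output file name list_unique.txt is just a hypothetical example:

```shell
# Create list.txt with the sample contents (one word per line)
printf '%s\n' apple banana apple orange banana grape apple > list.txt

# Same command as above, with awk reading the file directly instead of via cat
awk '!seen[$0]++' list.txt
# Expected output:
# apple
# banana
# orange
# grape

# To keep the result, write it to a different file (hypothetical name);
# redirecting to list.txt itself would truncate the input before awk reads it
awk '!seen[$0]++' list.txt > list_unique.txt
```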

Conclusion

That was a quick note on how to remove duplicate lines and show them only once using a command.

Hope it helps someone!


Reference