How to list duplicate lines in a text file, with counts next to each unique line
At some point last year (it’s been in my ‘toblog’ file all this time), I needed to analyze the lines in a text file: collapse the duplicate lines, count how many times each duplicated line occurred within the file, and sort the result from most common to least common.
For example, using a text file called ‘dupetest.txt’, containing:
foo bar baz
foo qux corge
spugbrap likes bacon
foo qux corge
spugbrap likes bacon
foo bar baz
oatmeal cookies are good
oatmeal cookies are good
foo bar baz
foo qux corge
foo bar baz
The output I want is:
4 foo bar baz
3 foo qux corge
2 spugbrap likes bacon
2 oatmeal cookies are good
I knew there had to be a simple way of doing this by stringing together a few Unix commands (in Cygwin), but finding the right combination took me some effort. Here’s what I came up with:
sort dupetest.txt | uniq -c -d | sort -n -r
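In case it helps to see what each stage contributes, here is the same pipeline with the sample file recreated and each step commented. Treat it as a rough sketch: it assumes a POSIX-style shell with the usual sort and uniq (as shipped with Cygwin), and the `result` variable is just something I added for illustration.

```shell
# Recreate the sample file from this post.
printf '%s\n' \
  'foo bar baz' \
  'foo qux corge' \
  'spugbrap likes bacon' \
  'foo qux corge' \
  'spugbrap likes bacon' \
  'foo bar baz' \
  'oatmeal cookies are good' \
  'oatmeal cookies are good' \
  'foo bar baz' \
  'foo qux corge' \
  'foo bar baz' \
  > dupetest.txt

# 1. sort        : puts identical lines next to each other, because uniq
#                  only compares adjacent lines.
# 2. uniq -c -d  : -c prefixes each line with its count,
#                  -d keeps only lines that occur more than once.
# 3. sort -n -r  : orders by the leading count, numerically, largest first.
result=$(sort dupetest.txt | uniq -c -d | sort -n -r)

echo "$result"
```

The counts come out padded with leading spaces, so the real output is indented a bit compared to what I showed above. If you also want the lines that occur only once, dropping the -d should do it, since uniq -c on its own counts every distinct line.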


May 5th, 2006 at 4:07 pm
Awesome tip; I’ve been needing that sort of trick for a while now! Thanks!
May 1st, 2007 at 2:53 pm
they should just come up with a primitive like “notuniq” that we can use!
anyway, i need to do this RIGHT NOW and remembered you had posted this.. heh