May 4th, 2006

How to list duplicate lines in a text file, with counts next to each unique line

At some point last year (it’s been in my ‘toblog’ file all this time), I needed to analyze the lines in a text file: collapse the duplicate lines, count how many times each duplicated line occurred in the file, and sort the results from most common to least common.

For example, using a text file called ‘dupetest.txt’, containing:

foo bar baz
foo qux corge
spugbrap likes bacon
foo qux corge
spugbrap likes bacon
foo bar baz
oatmeal cookies are good
oatmeal cookies are good
foo bar baz
foo qux corge
foo bar baz

The output I want is:

4 foo bar baz
3 foo qux corge
2 spugbrap likes bacon
2 oatmeal cookies are good

I knew there had to be a simple way of doing this by just stringing together a few Unix commands (in Cygwin), but finding the right combination took me some effort. Note that lines that appear only once in the file are left out of the output entirely. Here’s what I came up with:

sort dupetest.txt | uniq -c -d | sort -n -r
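
To make it clearer what each stage of that pipeline does, here is a minimal sketch that recreates the sample file and wraps the command in a small function. The function name dupe_counts is made up for illustration, and the sketch assumes a uniq that accepts -c and -d together, as the GNU version that ships with Cygwin does.

```shell
#!/bin/sh
# dupe_counts is a hypothetical helper name, not something from a standard tool.
dupe_counts() {
    # sort        : put identical lines next to each other, because uniq
    #               only compares adjacent lines
    # uniq -c -d  : -c prefixes each line with its count,
    #               -d keeps only lines that occur more than once
    # sort -n -r  : order by the leading count, numerically, largest first
    sort "$1" | uniq -c -d | sort -n -r
}

# Recreate the sample file from this post.
printf '%s\n' \
    'foo bar baz' \
    'foo qux corge' \
    'spugbrap likes bacon' \
    'foo qux corge' \
    'spugbrap likes bacon' \
    'foo bar baz' \
    'oatmeal cookies are good' \
    'oatmeal cookies are good' \
    'foo bar baz' \
    'foo qux corge' \
    'foo bar baz' > dupetest.txt

dupe_counts dupetest.txt
```

In the real output the counts are typically padded with leading spaces, so the columns line up rather than starting at the left margin. If you also want the lines that occur only once, dropping -d should list them too, each with a count of 1.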

2 Responses to “How to list duplicate lines in a text file, with counts next to each unique line”

  1. Dave O Says:

    Awesome tip; I’ve been needing that sort of trick for a while now! Thanks!

  2. ClintJCL Says:

    they should just come up with a primitive like “notuniq” that we can use!

    anyway, i need to do this RIGHT NOW and remembered you had posted this.. heh
