March 21st, 2007

Gmail snippets include ALT text for images in HTML emails

While checking my email this morning, I noticed something interesting about Gmail’s “show snippets” feature. It didn’t jump out at me; in fact, it didn’t really register until after I’d already clicked to view the message. The words “Bank of America Customer using a laptop” seemed a little strange, so I went back to my Inbox and saw this snippet:
[Image: Gmail snippet reading “Bank of America Customer using a laptop for Online Banking”]

This seemed like an odd bit of text for an email notifying me that a direct deposit had just posted to my account. Sure, I am a Bank of America customer who usually uses a laptop to access Online Banking. But they shouldn’t know that, so I clicked on the message again to see what they had to say about using laptops.

The message itself showed no sign of the word “laptop”, but the large header image in this HTML email had a picture of a laptop in it. That’s when I realized that Gmail was probably showing the ALT text for the header image! To verify this, I used Gmail’s “show original” option to view the full message source. Sure enough, the header was made up of several images, each of which had an ALT attribute, and the header images appeared before any of the actual message content. The ALT text for the laptop image was, as expected, “Customer using a laptop for Online Banking”.

Apparently, to generate message snippets, Gmail strips the HTML out of the message but keeps the ALT text from any IMG tags that appear in that code.
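
The behavior I’m guessing at can be imitated with a few sed substitutions. This is only a rough sketch of the idea, not Gmail’s actual code: the function name snippet_text and the sample HTML are made up for illustration, and it assumes every ALT value is double-quoted and every tag fits on one line.

```shell
# Hypothetical helper (not Gmail's code): turn HTML into snippet-like text.
# Assumes each alt value is double-quoted and each tag sits on a single line.
snippet_text() {
  printf '%s\n' "$1" |
    sed -E \
      -e 's/<img[^>]*alt="([^"]*)"[^>]*>/\1 /g' \
      -e 's/<[^>]*>//g' \
      -e 's/  +/ /g' \
      -e 's/^ +//; s/ +$//'
}

# Made-up sample: two header images followed by the real message text.
html='<img src="logo.gif" alt="Bank of America"><img src="laptop.jpg" alt="Customer using a laptop for Online Banking"><p>A direct deposit posted to your account.</p>'

snippet_text "$html"
```

With this sample, the output is “Bank of America Customer using a laptop for Online Banking A direct deposit posted to your account.” The ALT text comes first simply because the images come first in the source, which matches what I saw in my Inbox.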

That makes some sense, since the ALT attribute provides a textual representation of the image content, for accessibility purposes. However, I’d bet that most of the time, images in HTML emails are not meant to be part of the content… Most of the time, they’re probably things like company logos, navigation bars (linking to different parts of a company’s website), list bullet icons, pictures of your [family member/friend]’s children, etc.

Don’t get me wrong. I’m not complaining about Bank of America’s email header, or Gmail’s snippet generation method. I just thought this was interesting behavior. I have a few ideas about who might benefit from this information, and how they might use it, but I’ll have to save that for another post, tomorrow.

October 23rd, 2006

MS Word’s Find and Replace has limited regex support

I just noticed a post on TGAW’s blog, “Word 2003 - The Find What text contains a Pattern Match expression which is not valid”, with some information that I don’t need right this second, but that I want to note somewhere more conspicuous for future reference.

She was trying to do a search/replace in Microsoft Word 2003, and stumbled across the fact that the Find and Replace dialog, with the ‘Use Wildcards’ option checked, “is pretty much a regular expression search (it does appear to be missing some support like \d, \w, etc)”.
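
As a reminder of what to reach for when \d and \w aren’t available: bracket expressions cover the same ground in most pattern dialects. The sketch below uses grep -E rather than Word, and the sample lines are made up. I’d expect the [0-9] range to carry over to Word’s wildcard mode, but that is an assumption I haven’t verified here.

```shell
# Made-up sample lines for illustration.
sample='order 42 shipped
no digits here
room_7b'

# [0-9] stands in for \d: keep only the lines that contain a digit.
printf '%s\n' "$sample" | grep -E '[0-9]+'

# [A-Za-z0-9_] stands in for \w: keep only the lines made up entirely
# of "word" characters (letters, digits, underscore).
printf '%s\n' "$sample" | grep -E '^[A-Za-z0-9_]+$'
```

The first grep prints “order 42 shipped” and “room_7b”; the second prints only “room_7b”.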

I’m a big fan of regular expressions (although I must admit to having been intimidated by them for years before finally getting comfortable using them). I often post code samples that use regexes, along with plain search/replace expressions that have proven useful to me, on my old geek blog (in the future they will go here instead).

May 4th, 2006

How to list duplicate lines in a text file, with counts next to each unique line

At some point, last year (it’s been in my ‘toblog’ file all this time), I needed to analyze the lines in a text file, removing duplicate lines, while counting how many times each duplicated line occurred within the file, and sorting from most common to least common.

For example, using a text file called ‘dupetest.txt’, containing:

foo bar baz
foo qux corge
spugbrap likes bacon
foo qux corge
spugbrap likes bacon
foo bar baz
oatmeal cookies are good
oatmeal cookies are good
foo bar baz
foo qux corge
foo bar baz

The output I want is:

4 foo bar baz
3 foo qux corge
2 spugbrap likes bacon
2 oatmeal cookies are good

I knew there had to be a simple way of doing this by just stringing together a few Unix commands (in Cygwin), but finding the right combination took me some effort. Here’s what I came up with:

sort dupetest.txt | uniq -c -d | sort -n -r
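
To try the pipeline end to end, here is a self-contained sketch that recreates the sample file and runs the same command. Two pieces are additions for illustration: the heredoc that writes dupetest.txt, and the trailing sed, which strips the left padding that uniq -c puts in front of each count. The relative order of lines with equal counts may depend on the sort implementation and locale.

```shell
# Recreate the sample file from the example above.
cat > dupetest.txt <<'EOF'
foo bar baz
foo qux corge
spugbrap likes bacon
foo qux corge
spugbrap likes bacon
foo bar baz
oatmeal cookies are good
oatmeal cookies are good
foo bar baz
foo qux corge
foo bar baz
EOF

# sort groups identical lines together; uniq -c -d counts each group and
# keeps only lines that occur more than once; sort -n -r orders the result
# by count, highest first. The final sed removes uniq's leading padding.
sort dupetest.txt | uniq -c -d | sort -n -r | sed 's/^ *//'
```

Note that -d drops any line that appears only once. Leaving it off (uniq -c alone) would list every unique line with its count, including the singletons.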