March 27th, 2008

Recursively grep for a substring, open all results in TextPad with cursor positioned appropriately

I’ve been using Ext-JS on a new project recently. It’s pretty neat, and the examples are impressive, but the documentation leaves a lot to be desired. I needed to make a section of a web page collapsible, and the Ext.Panel class seemed like the way to do that, but I was having trouble figuring out exactly how to get my existing HTML content into a collapsible Ext.Panel. Almost as a last resort, I ended up grepping my local ext-2.0/examples directory tree to find examples that instantiate Ext.Panel objects:

$ grep -Ri "new Ext.Panel" *
code-display.js: var panel = new Ext.Panel({
core/templates.js: var p = new Ext.Panel({
core/templates.js: var p2 = new Ext.Panel({
feed-viewer/MainPanel.js: this.preview = new Ext.Panel({
feed-viewer/MainPanel.js: tab = new Ext.Panel({
[…]

This was not very useful. I needed to see the whole constructor invocation for each of those cases. So, I decided to grep again, showing just the filenames (using the -l parameter), so I could open all of those files in TextPad. The first part of that (showing just the filenames) was the easy part:

$ grep -Rli "new Ext.Panel" *
code-display.js
core/templates.js
feed-viewer/MainPanel.js
form/combos.js
form/custom.js
[…]

Next, I needed to change those file paths from cygwin/unix-style paths to windows paths, so they could be passed to TextPad on the command-line. Time for a for loop:

$ for f in `grep -Rli "new Ext.Panel" *`; do cygpath -w -a $f; done
c:\api\js\ext-2.0\examples\code-display.js
c:\api\js\ext-2.0\examples\core\templates.js
c:\api\js\ext-2.0\examples\feed-viewer\MainPanel.js
c:\api\js\ext-2.0\examples\form\combos.js
c:\api\js\ext-2.0\examples\form\custom.js
[…]

Okay, so I probably could have built up an environment variable as I looped through and converted these paths, but if I ever wanted to run this over a bigger directory tree, with more search results, that command line could get extremely long.

So, I checked the TextPad help to see if I could pass in the name of a file containing full file paths for TextPad to open. Sure enough:

@filename
Open all the files that are listed, one per line, in the specified file. This overrides the option to load the workspace, specified on the General page of the Preferences dialog box.

You just need to put an at sign (@) before the filename, and TextPad will look at that file to find a list of files to open. So, I decided to create a temporary file, output the filenames found and converted by my set of commands (above) into that temporary file, and then run TextPad, passing the temporary filename preceded by an @ sign.
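A minimal sketch of that temporary-file plan, assuming bash and GNU grep (which Cygwin ships). The function name `make_textpad_list` is hypothetical, and the fallback that prints paths unchanged when `cygpath` is missing is an assumption added only so the sketch can be tried outside Cygwin:

```shell
#!/usr/bin/env bash
# Hypothetical helper: write one native (Windows) path per line for every file
# under the current directory containing PATTERN, into a new temp file, and
# print that temp file's name.
make_textpad_list() {
  local pattern=$1 list f have_cygpath=no
  command -v cygpath >/dev/null 2>&1 && have_cygpath=yes
  list=$(mktemp) || return 1
  # -F: fixed string, so the "." in "Ext.Panel" is literal.
  # -Z: end each filename with a NUL byte, so names with spaces survive.
  grep -RliZF -- "$pattern" . |
    while IFS= read -r -d '' f; do
      if [ "$have_cygpath" = yes ]; then
        cygpath -w -a "$f"      # Cygwin: absolute Windows path
      else
        printf '%s\n' "$f"      # assumed fallback outside Cygwin
      fi
    done > "$list"
  printf '%s\n' "$list"
}
```

From the directory being searched, `list=$(make_textpad_list "new Ext.Panel")` followed by `textpad "@$(cygpath -w "$list")" &` would hand the list to TextPad; converting the list file's own path is an assumption, on the reasoning that TextPad, as a Windows program, won't understand a Cygwin `/tmp/...` path.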

But wait! I noticed something else in the TextPad help that seemed like a cool idea:

Notes:

  • [...]
  • If the filename to be edited (not printed) is followed by "(<line>[,<col>])", with no intervening spaces, the file will be opened with the cursor at that position. If <line> is a hex number (eg. 0x1a22), a hex view of the file will be created, with the cursor at that address.
eg. TEXTPAD.EXE -ac "Read me.txt"(51,20)
In this example TextPad will start up and open "Read me.txt" at line 51, column 20 and display it in a cascaded window.

So, I decided to figure out a way to put the filenames to open, as well as the row and column number to position the cursor at within each of those files, into the temporary file that I was going to pass to TextPad. I already knew how to get grep to output line numbers (using the -n parameter), so I thought that would be the easy part.

However, it seems that you can’t usefully combine the -l (show filenames) and -n (show line numbers) parameters on the grep command line. No, -l does more than simply tell grep to show the filename next to each matching line (-H does that); -l tells it to ONLY show the filenames. Here’s the -l parameter definition from the grep man page:

-l, --files-with-matches
Suppress normal output; instead print the name of each input file from which output would normally have been printed. The scanning will stop on the first match.

As far as I could tell, if I wanted line numbers and filenames, I needed to use -n and -H, and deal with the fact that the output would also include the text of the matching line. I also threw in -m 1 to limit the output to one result per file, since the cursor can only be positioned in one place in each file. I didn’t need -m previously, because -l already limited the output to one result per file by showing only the name of each matching file. Here’s what the grep command line and output looked like at this point:

$ grep -RHn -m 1 "new Ext.Panel" *
code-display.js:11: var panel = new Ext.Panel({
core/templates.js:30: var p = new Ext.Panel({
feed-viewer/MainPanel.js:10: this.preview = new Ext.Panel({
form/combos.js:49: new Ext.Panel({
form/custom.js:40: var panel = new Ext.Panel({
[…]

At first, I thought the matching line text was just in my way, so I used sed to filter it out, and to replace the colon (:) between the filename and the line number with an open parenthesis, to prepare it for the format TextPad wanted:

$ grep -RHn -m 1 "new Ext.Panel" * | sed -e 's/\(^[^:]\+\):\([0-9]\+\):.*$/\1(\2/g'
code-display.js(11
core/templates.js(30
feed-viewer/MainPanel.js(10
form/combos.js(49
form/custom.js(40
[…]
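The same filename/line-number extraction can also be sketched without sed, assuming bash: with `IFS=:`, `read` puts the text before the first colon in the first variable, the text before the second colon in the second, and everything left over (including any further colons in the matched code) in the last one. The function name `to_textpad_prefix` is hypothetical, and filenames containing colons are assumed not to occur:

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn "file:line:text" lines (grep -Hn output) into
# "file(line" prefixes, the same shape the sed command above produces.
to_textpad_prefix() {
  local file line _rest
  while IFS=: read -r file line _rest; do
    printf '%s(%s\n' "$file" "$line"
  done
}
```

Usage would be `grep -RHn -m 1 "new Ext.Panel" * | to_textpad_prefix`.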

Next, I needed to get the offsets or column numbers for each matching line number that the previous command returned, to tell TextPad exactly where to put the cursor in each file. At first, I thought I could do this with grep, but the closest grep parameter seemed to be -b:

-b, --byte-offset
Print the byte offset within the input file before each line of output

However, -b gives the absolute byte offset starting from the very beginning of the file, rather than the offset within the matching lines. So, I had to find a different way to get the column offset within each matching line. This is when I realized that having the matching line text returned by my grep command could actually be useful. I figured I could just split that text out and count the characters leading up to the matching string with wc -c, among other things.
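The "count the characters leading up to the match" idea can be sketched in pure bash, assuming the search string is literal: `${text%%"$pat"*}` deletes everything from the first occurrence of the pattern onward, leaving only the prefix, and `${#prefix}` is its length. Adding 1 gives a 1-based column, the same +1 that the trailing newline contributes to the `wc -c` count in the command below. `match_column` is a hypothetical name; a tab counts as one character here, just as it does after the original's one-tab-to-one-space substitution:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the 1-based column of the first occurrence of
# PATTERN (treated as a literal string) in TEXT; return 1 if it doesn't occur.
match_column() {
  local text=$1 pat=$2 prefix
  [[ $text == *"$pat"* ]] || return 1
  prefix=${text%%"$pat"*}            # everything before the first match
  printf '%s\n' $(( ${#prefix} + 1 ))
}
```

For example, `match_column "$(grep -m 1 "new Ext.Panel" form/combos.js)" "new Ext.Panel"` would print the column for that file's first match.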

Anyways, after a lot of trial and error, and a lot of re-checking man pages for bash, grep, wc, etc., I ended up with the following set of commands:

textpad $(for g in `for f in \`grep -Rli "new Ext.Panel" *\`; do (grep -Hn -m 1 "new Ext.Panel" $f | sed -e 's/\(^[^:]\+\):\([0-9]\+\):.*$/\1(\2/g'); done`; do echo `cygpath -w -a ${g/\(*/}`\(${g/*\(/},`grep -m 1 "new Ext.Panel" ${g/(*/} | sed -e 's/\t/ /g' -e 's/new Ext.Panel.*$//g' | wc -c`\); done) &

I’m sure this could be done more efficiently, but this was a fun challenge to take on, and I managed to find a way to do what I wanted to do. Feel free to leave a comment if you know a better way of doing this!
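One possible alternative, sketched with awk rather than taken from the original: awk's `index()` does a literal substring search and returns a 1-based position, so a single pass over the files can emit `file(line,col)` for the first match in each file, with no separate grep/sed/wc per file. The function name `textpad_positions` is hypothetical, and the paths it prints are still in Cygwin form, so they would need `cygpath -w -a` before going to TextPad:

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each FILE argument, print "FILE(LINE,COL)" for the
# first line containing PATTERN (a literal string, not a regex).
# Note: awk -v interprets backslash escapes, so patterns with backslashes
# would need extra care.
textpad_positions() {
  local pattern=$1
  shift
  awk -v pat="$pattern" '
    FNR == 1 { found = 0 }                    # new file: reset per-file flag
    !found && (col = index($0, pat)) > 0 {    # literal match, 1-based column
      print FILENAME "(" FNR "," col ")"
      found = 1
    }
  ' "$@"
}
```

Usage could look like `textpad_positions "new Ext.Panel" $(grep -RlF "new Ext.Panel" .)`, which assumes no spaces in the filenames.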

April 13th, 2007

launching textpad from cygwin

This is a simple bash function that I use pretty often. It comes in handy when I’m navigating a tree of source code in a cygwin bash shell, and want to edit a file in TextPad. You can just put this line in your .bashrc file, and make sure the directory where TextPad.exe lives is in your $PATH environment variable:

function tp() { textpad "$(cygpath --mixed "$1")" & }

This allows me to do things like:

$ find . -name '*Foo*.java'

which returns results like:

./src/com/spugbrap/foo/bar/TestFooImpl.java
./servlet/com/spugbrap/baz/FooDispatcher.java

Then I can just copy one of those full (but relative) paths to the clipboard, and paste it into a command that looks like this:

$ tp ./servlet/com/spugbrap/baz/FooDispatcher.java

Now, regardless of where the root of this relative path exists on the file system, it will open that file in TextPad.

The only limitation that I run into with this is that it only lets you specify one file to open. It could probably be modified to handle multiple files pretty easily, but this hasn’t bothered me enough to deal with yet.
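A sketch of that multi-file modification, assuming bash (for arrays) and the same `textpad` on `$PATH`. It relies on TextPad accepting several files on one command line, which the command in the March 2008 entry above also depends on:

```shell
#!/usr/bin/env bash
# A variant of tp that accepts any number of files. Each path is converted
# separately and kept as its own array element, so paths with spaces survive.
tp() {
  local f
  local -a paths=()
  for f in "$@"; do
    paths+=("$(cygpath --mixed "$f")")
  done
  textpad "${paths[@]}" &
}
```

With this in .bashrc, `tp ./src/com/spugbrap/foo/bar/TestFooImpl.java ./servlet/com/spugbrap/baz/FooDispatcher.java` would open both files in one TextPad launch.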

August 10th, 2006

How to list just directories in bash

This morning, I was trying to find a way to list just the subdirectories in the current directory, in a bash shell script I was writing. I thought it would be simple, but everything I tried seemed to either take an extraordinarily long time, or felt like an ugly hack.

The first thing I tried was:
find . -type d

But this was extremely slow, because it was recursively searching inside every subdirectory as well. I just wanted a list of subdirectories inside the current directory. I won’t bore you/clutter this post up with any more of my less-than-ideal methods.

What follows are a couple of ways of doing what I was trying to do, which I found in a post (and its comments) on the Ubuntu Blog, “List only the directories”:

ls -l | grep "^d"

This works, but gives a ‘long’ directory listing, when all I wanted was a list of directory names.


find . -type d -maxdepth 1 -mindepth 1
This one was my favorite, since it used the method I originally tried, but it fixed the slowness by using parameters to avoid recursion. It gave me a couple of warnings about the order of the parameters, though, so I changed it to this:
find . -maxdepth 1 -mindepth 1 -type d


ls -d */
This gave me the same list of directories as the ‘find’ method did (with a trailing slash on each name instead of a leading ‘./’), but some timing tests showed me that the ‘find’ method was about 2 times faster.
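If only the bare names are wanted, without the leading `./` that `find` prints, GNU find (which is what Cygwin ships) can print just the last path component with `-printf '%f\n'`. A sketch, with a hypothetical function name and an optional directory argument:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the names of the immediate subdirectories of DIR
# (default: the current directory), one per line, without recursing.
list_subdirs() {
  # -mindepth/-maxdepth come first, which avoids find's option-order warning.
  # -printf is a GNU find extension; %f is the name without leading directories.
  find "${1:-.}" -mindepth 1 -maxdepth 1 -type d -printf '%f\n'
}
```

Unlike `ls -d */`, this also lists hidden directories (such as `.svn`), and it skips symlinks that point to directories, since `-type d` doesn't follow links.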

August 2nd, 2006

screen in cygwin needs System attribute on SockDir files

This will not be of use to many, if any, but I expended effort trying to figure out how to solve this today, so I’m posting it here for future reference, if nothing else.

Today, at work, I ssh’d to my home computer, and tried to run ‘screen -r -d’, to reattach to an existing session of gnu screen at home. Here’s what happened:

$ screen -r -d
There is no screen to be detached.

I knew this was not true, so I tried this:
$ screen -list
No Sockets found in /tmp/uscreens/S-myusername.

I didn’t believe it, because I knew I had an existing session open, so I looked for myself:
$ ls -l /tmp/uscreens/S-myusername/
total 2
-rw------- 1 myusername None 54 Jul 24 14:35 1696.tty0.spugbrap-home
-rw------- 1 myusername None 54 Aug 2 14:19 3500.tty0.spugbrap-home

I saw it right there, so I looked to see if one of the processes that I had running in my existing screen session was still running:
$ ps | grep perl
3204 368 3204 1760 13 1003 14:20:05 /usr/bin/perl

Sure enough, there it was… So, I took a look at my SockDir on my laptop, at work, to see if permissions might be involved in some way:
$ ls -l /tmp/uscreens/S-myusername/
total 3
srwx------ 1 myusername None 53 Aug 2 14:18 2568.tty1.dave-laptop
srw------- 1 myusername None 53 Jul 4 01:05 3600.tty1.dave-laptop
srw------- 1 myusername None 53 May 15 15:42 960.tty1.dave-laptop

Ah hah! There was a difference! There was an ‘s’ at the beginning of the permissions list for the files in my laptop’s SockDir, but not for the ones on my home machine.

So, I went searching for what the heck that ‘s’ stands for, since usually, if anything, I see either an ‘l’ (L) or a ‘d’ there. I checked the help, info, and man pages for ‘ls’ and ‘chmod’, but didn’t find anything that actually matched this flag. The closest thing was ‘suid + executable’, but when I tried to chmod that onto one of my files, the permissions showed ‘-rws------’, which is not what I was looking for. (As it turns out, an ‘s’ in that first position of an ‘ls -l’ listing marks the file as a socket, which is what screen expects to find in its SockDir.)

A Google search or two, for things like ‘srwx------’, ‘srw-------’, ‘srwx srw’, ‘cygwin srw’, etc., didn’t turn up anything useful, at least not in the first pages of results.

I tried looking at my laptop’s SockDir in Windows Explorer, and looking at the advanced security properties of one of the files. Nothing looked interesting. Then I looked at it from a command prompt (4NT), and saw this:
[c:\]dir c:\cygwin\tmp\uscreens\S-myusername
[…]
0 bytes in 0 files and 2 dirs

Oops, let’s try with ‘attrib’ instead of ‘dir’:
[c:\]attrib c:\cygwin\tmp\uscreens\S-myusername
__SA_ C:\cygwin\tmp\uscreens\S-myusername\2568.tty1.dave-laptop
__SA_ C:\cygwin\tmp\uscreens\S-myusername\3600.tty1.dave-laptop
__SA_ C:\cygwin\tmp\uscreens\S-myusername\960.tty1.dave-laptop

Ah hah! The ‘system’ and ‘archive’ attributes were set on these files. So, I verified that these flags were NOT set on the files on my home machine:
$ attrib "c:\\cygwin\\tmp\\uscreens\\S-myusername\*"
C:\cygwin\tmp\uscreens\S-myusername\1696.tty0.spugbrap-home
C:\cygwin\tmp\uscreens\S-myusername\3500.tty0.spugbrap-home

Sure enough, there’s the difference. So, I set the ‘system’ attribute on those files (I didn’t bother with the ‘archive’ attribute, though I’m not sure what causes it to be there on my laptop but not on my home machine):
$ attrib +s "c:\\cygwin\\tmp\\uscreens\\S-myusername\*"

Verified that it worked:
srw------- 1 myusername None 54 Jul 24 14:35 1696.tty0.spugbrap-home
srw------- 1 myusername None 54 Aug 2 14:19 3500.tty0.spugbrap-home

Then, I tried connecting to my existing session, and it succeeded.

While preparing this post, I experimented a little more, and noticed that, for some reason, my home PC is no longer creating these screen socket files with the required ‘system’ attribute at all. I’m not sure why this is happening now, because I can’t think of anything I’ve done recently that might have changed its behavior as far as permissions and such.

I will post again if I figure this out, but for now I am content with adjusting the attribute manually. In theory, I can keep each screen session alive until my next reboot (which is rare), so I shouldn’t have to do too many manual adjustments like this. I also welcome any comments on how to solve this problem, or any other useful tips for effectively using gnu screen in cygwin.
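Until the root cause turns up, the manual fix could be wrapped in a small bash function. This is a sketch under assumptions: `fix_screen_socks` is a hypothetical name, the SockDir defaults to the `/tmp/uscreens/S-<username>` layout shown above (or can be passed as an argument), and Cygwin's `cygpath` plus the Windows `attrib` command must be on the PATH:

```shell
#!/usr/bin/env bash
# Hypothetical helper: set the Windows System attribute on every file in the
# screen SockDir, mirroring the manual "attrib +s" fix.
fix_screen_socks() {
  local dir="${1:-/tmp/uscreens/S-$(id -un)}" f
  for f in "$dir"/*; do
    [ -e "$f" ] || continue          # skip the literal glob when dir is empty
    attrib +s "$(cygpath -w "$f")"   # attrib expects a Windows-style path
  done
}
```

Something like `fix_screen_socks && screen -r -d` would then apply the fix right before reattaching.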

May 4th, 2006

How to list duplicate lines in a text file, with counts next to each unique line

At some point last year (it’s been in my ‘toblog’ file all this time), I needed to analyze the lines in a text file: remove duplicate lines, count how many times each duplicated line occurred within the file, and sort from most common to least common.

For example, using a text file called ‘dupetest.txt’, containing:

foo bar baz
foo qux corge
spugbrap likes bacon
foo qux corge
spugbrap likes bacon
foo bar baz
oatmeal cookies are good
oatmeal cookies are good
foo bar baz
foo qux corge
foo bar baz

The output I want is:

4 foo bar baz
3 foo qux corge
2 spugbrap likes bacon
2 oatmeal cookies are good

I knew there had to be a simple way of doing this by just stringing together a few unix commands (in cygwin), but finding the right combination of commands took me some effort. Here’s what I came up with:

sort dupetest.txt | uniq -c -d | sort -n -r
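For comparison, here is a sketch of a different technique (an alternative to the pipeline above, not a replacement): awk can count each distinct line in an associative array in a single pass, without sorting the whole file first, so only the list of distinct duplicated lines gets sorted. `count_dupes` is a hypothetical name; the counts come out without the leading padding that `uniq -c` adds, and lines that tie on count may appear in a different order than `sort -n -r` gives:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "COUNT LINE" for every line that occurs more
# than once in FILE, most frequent first.
count_dupes() {
  awk '{ count[$0]++ }
       END { for (line in count) if (count[line] > 1) print count[line], line }' "$1" |
    sort -k1,1nr
}
```

The trade-off is memory: awk keeps every distinct line in memory, whereas sort can spill to temporary files on very large inputs.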