Age of Reason

Random musing of books and stuff I am reading.

2004/12/30

Suffix trees and Shattered Perls

I had to Postpone the reading of Richard II
due to the technical problems. Instead I took
libstree for BSD and compiled it on Windows.
Since I boot into BSD only occasionally.

Pulled in bsd net/queue.h into win32/sys/queue.h,
a few typedefs into dummy win32/unistd.h,
dropped the buffer overflow check from snprintf to sprintf,
and voila everything compiled without warnings.
VC did catch one sizeof(void*) problem.

Shattered Perls

Windows has a dozen ports of perl, and all of them are broken,
one of them died on this:
c:\libstree-0.4.0\src> perl -ne 'print if m/\r/' *.c
It seems this windows port of perl converts all backslashes
in the arguments to forward slashes "@ARGV =~ s,\\,/,g"
assuming all arguments are dos paths.

2004/12/29

Tsunami in Chennai

Another harmonic of the Tsunami in Chennai.

Tsunami

After the first Tsunami (tidal wave) in Madras.

2004/12/24

Christmas Reading

Bought DK's "Essential Shakespeare Guide" on a Borders gift card.
The book has excellent pictures, but not much information on language.
Probably popular book with party goers and trivia droppers, kind of people
who talk about Shakespeare rather than read his plays.

That is the problem with gift cards -
you have to carry it around, you feel obliged to cash it,
it was useless at a Barnes and Noble with a larger book selection.

Also Plodding thru Gusfield's explanation of Ukkonen's algorithm to
build suffix trees in linear time.

Also reading "World of Nagaraj" by R.K.Narayan. Nagaraj is building up
suspense by doing nothing, the reader is waiting for something to happen,
why is the tao silent? R.K. is a master writer.


2004/12/22

Rabin Karp Fingerprinting

Dan Gusfield presents Rabin & Karp's fingerprinting algorithm for
string matching in expected linear time.

Fingerprinting and Signatures are sparse hashing,
while parity check is dense hashing.

The idea is this.
If the parity of pattern and text differ,
there can be no match.
Otherwise the probability of a false
match is given by the compression ratio
of the parity, 1/2 in this case.

match(pattern,text)
for(i=1 to |text|)
if( hash(pattern[1..n]) == hash(text[i..i+n]) ){
pattern =~ text[i] with
probability of false match being #patterns / #hash
else
no match at i.

Other idea is to compute the sliding hash incrementally inside the loop:

hash(text[i..i+n]) = hash(text[i-1..i+n-1]) + incremental effort.

Rabin and Karp use numbers mod p as the hashing function,
where p is a random prime < maxint on the computer.
Computation mod p is fast and there is
no need to carry around bignums.

The probability of error falls quickly
by hashing w.r.t. multiple random primes.

2004/12/17

Parsing in 21st century -- "possible unbalanced brace in comment" -- tcl error message.

A real life Tcl error message goes:
"Possible unbalanced brace in comment".
It got me wondering "is nothing sacred?"
Why cannot I comment out a piece of tcl code
without eliciting rude comments from my interpreter?

It reminded me of command dot com, where
the implementation is the spec and the spec is
a collection of examples on the web.

Don't you love software that is just a collection of
special cases, and as the product grows, you just have
to add more cases. Finally you get the big bank theory
of software management.

One exception in the family is the language "Expect" by Don Libes.
Expect is a beautiful extension based on tcl, but that is just
the genius of Libes. Libe's books "Exploring Expect" and "Obfuscated C
and other mysteries
" are exceptionally good.
See my review of the latter book on Amazon.
It is pity the book is out of print.

2004/12/16

Cycle stealers fight back the keyjackers

The latest process explorer from sys internals is awesome,
it is free, and shows everything about the system (win9x/2k/xp).

The CPU graph shows process hog history,
process paths/tcpip ports/file handle/strings,
right click and find about that process on google!
It did find a keyboard logger -- user32.exe on my win2k.

I can see that the antivirus CPU usage jumps in lock step
with my own program CPU usage. Who says anti-virus programs
don't steal cycles?


2004/12/14

Books I am reading - King John ..

I never seem to get down to write about books I am reading.

King John by Shakespeare. I was following the Cambridge Shakespeare by John Dover Wilson. Yesterday I switched to the Folgers edition, since I like the notes on the side and the smaller size.

The play is never mentioned in recent times,
because it religiously deals with power and
politics, and maybe too close to reality.
The Protagonist Bastid saves the
day for England while John wavers.
It has other strange characters like
the Prophet of Pomfret and Lord Bigot.

See http://www.online-literature.com/shakespeare/kingjohn
and http://ise.uvic.ca/Library/SLT/history/kingjohn.html

Also reading Dan Gusfield's classic. Aho Corasick's algorithm is cute, and now I can see why agrep is blindingly fast. Bit-Shift-Anding is kind of multiplication, and that leads to FFTing the bits in m.log(m) time, interesting. So far, so good. Gusfield is great writer, his book could be taught to Btechs, though the price is steep at ~$50.

The third book I am reading is R.K.Narayan's "World of Nagaraj", not much happening, but that is the way I like my novels to be. I finished "Waiting for Mahatama" last month, very touching. The book gives a personal view of South India in 1940s thru the eyes of Sriram, that no history book can match.

I am once again reading Hardi's book, stiring a healthy mix verilog and vhdl for the hardware debugger's compiler.

Agrep got me thinking, that vim needs a dynamic spell check against a dictionary or words in current buffer. String searching can be very fast, especially the bit-shift-and algorithms of agrep. The other option is dynamically lookup esoteric words in google as I type. For example, what is the Latin name of pipal tree in Mumbai.

I try to cross check proper names when I am photo blogging.

Usually I search in a tabbed window of Opera in text mode. Today I tried wget google and got back over-htmlized unreadable text. That's when I found google apis - not fast enough to complete as I type, but certainly plug-able into emacs and gvim to do lazy-lookups in a secondary buffer, while I type ahead. The CPU and network capacity and already there.

- M.

2004/12/11

File Descriptors in Bourne shell (sh,ksh,bash).

In the 1980s Steve Bourne in 11 pages specified the grammar for
a functionally complete shell (see original ATT Unix manual).
It is a tribute to his foresight, that those 11 pages still
suffice to write arbitarily complex shell scripts in 2004.

Marc Rochkind's Advanced Unix Programming book fills in the
elegant plumbing details, and how pipes in sh work.

Compare that to csh and msdos shell, where quoting/aliasing/redirection are
broken, and work only for special common cases. For details read
Tom Christiansen's csh write up
http://www.faqs.org/faqs/unix-faq/shell/csh-whynot

That is the difference between Computer Science (rather mathematical training) grammar and bad engineering hacks. What is a good grammar? It is being able to write arbitarily complex scripts, without having to wonder if it will work.

The elegance of sh has been continued in Perl, while the tradition of csh
has been carried into tcl (in tcl, the parser cannot deal with unbalanced
braces inside comments, and all data is a null terminated string).

If you find sh redirection in configure files spinning your head, read on:



Notation:
FD is file descriptor [0-9-].
&FD is reference to a file descriptor.
WORD is an unglobbed filename.

&- NULL # closed FD.
E<&- FD[E] := NULL # close fd E (default is stdin /0). E>&- FD[E] := NULL # close fd E (default is stdout/1).
<&- FD[0] := NULL # Close stdin >&- FD[1] := NULL # close stdout

FD[-] NULL # closed fd).
FD[0] stdin # default for input
FD[2] stderr

E<>F FD[E] := open(F,rw) # Redirect E to file F in read-write mode
E>&D FD[E] := FD[D] # fd D is duped and result is fd E (default is stdout).
>&D FD[1] := FD[D]
E> F FD[E] := open(F,w) # Redirect fd E to file F (default is stdout).
> F FD[1] := open(F,w) # Redirect stdout to file F.
E>>F FD[E] := open(F,a) # fd E is appeneded to F.

E<&D FD[E] := FD[D] fd D is duped and result is fd E (default is stdin). E<&- FD[E] := NULL, fd E closed. E<<<\STR \n(HERE DOCUMENT $XYZ not expanded\n)*STR E<< -STR \n(\t*HERE DOCUMENT\n)*STR .. Leading \t will be stripped from each line. Notes: > is really 1>
<>/dev/null Leaves STDOUT open
2>/dev/null Discard STDERR
2>&1 Send STDERR to STDOUT instead
2>&- Close STDERR (not recommended)
3<>/dev/tty Open fd 3 to /dev/tty in read-write mode

Only param and command subst,
Not filename or blank interpretations (no globbing).
Eg. $ >b cat a .. copies a to b.
Eg. $ echo x > *.c .. creates '*.c'

Shell ReDirection Examples: moshtag=redirection

Think of each FD as a variable pointing to a fileio_data_buffer.

N>&M FD[N] := FD[M].
N> M FD[N] := open(M,w);
2>&1 Means FD[2] := FD[1].
Now FD[2] stream is lost,
and anything written to FD[2] is sent to FD[1],
since FD[2] is pointing to the FD[1] fileio_data_buffer.

Think of redirection as connecting pipes?
2>&1 Means dup the RHS pipe and connect the LHS source to it.

Eg. $ gcc x.c 2>&1 | grep errors
Eg. $ 2>&1 gcc x.c | grep errors

Order of redirection is always left to right.
Eg. $ gcc x.c > log 2>&1
FD[1] := open(log,w) # 1. First send stdout 1> to ./log
FD[2] := FD[1]; # 2. Then send stderr 2> to &1 (ie. ./log)
# Both FD[1] = FD[2] =

Eg. $ gcc x.c 2>&1 > log
FD[2] := FD[1]; # FD[2] = FD[1] =
FD[1] := open(log,w); # FD[1] =

Eg. $ exec 5>./config.log ; echo "into config.log" 1>&5
FD[5] := open(config.log,w); # Connect pipe 5 to config.log,
FD[1] := FD[5]; # then send pipe 1 data into pipe 5.

Eg. $ cmd 3>&1 1>&2 2>&3 3>&- # Swap stdout and stderr for cmd
# fd3 = fd1 = stdout
# fd1 = fd2 = stderr
# fd2 = fd3 = stdout
# fd3 = null
1. First save stdout as &3 (&1 is duped into 3).
2. Next send stdout to stderr (&2 is duped into 1).
3. Send stderr to &3 (stdout) (&3 is duped into 2).
4. close &3 (&- is duped into 3)

Eg. stderr_output=`cmd 3>&1 1>&2 2>&3 3>&-` moshtag=stderr_output

$ 2>b ls xxx # redirect stderr,
$ cat b
ls: "xxx" not found

# close stdout .. FD[1] :=
$ exec >&-
$ cat b
cat: write error on standard output.

# close stdin .. FD[0] :=
$ exec <&- ... shell exits
Easiest way to learn is to try these in bash.
The firefox is mangling the ampersands in the posting (so watch out).

mosh

Debugger Command Cross Reference

Feature Desc vc dbx gdb windbg ntsd edebug perld i386kd wdeb386
Lang - c c c c c elisp perl c c
OS - win unix unix win win all all win win
GUI - Y N N Y N emacs N N N
Alt GUI - N emacs ddd emacs N N N ddd N N
Vendor GUI - Y Sun/workshop HP/wdbg Y N emacs N N N
Cmdline - N Y Y Y Y - Y Y Y
Alias - N alias/define command N ini Y = N N
Completion N Y YY N N YY - N N
script script VbPerlOle ksh Y dll dll Y Y $< file dll
Actions When .. do when a<>{}
history cmd history N Y Y Y Y Y ! Y Y
init setup regedit .dbxrc .gdbinit regedit ini .emacs - ENV -
Options config regedit dbxenv set O
help ? msdn help i info ? .hlp ? info h pod ? msdn ? msnd
Feature Desc vc dbx gdb windbg ntsd edebug perld i386kd wdeb386
bp break F9 stop at b bp bp - - bp bp
ba hardware A-F9 stop access hbreak/b *addr br ba - N ba br
bl handlers list stops A-F9 status i b - - l L bl -
bd delete F9 delete N delete break N - - u d D - -
bc clear del current C-F9 clear clear - - - - bc -
bp once break once F9 F5 F9 stop -temp tbreak - - - c .. - -
be on enable C-F9 - enable - - - - be -
bd off disable C-F9 -disable disable - - - - bd -
bp if conditional C-F9 cond -if condition J - - - J -
bp --k==0 count to bp C-F9 #NAME? ignore .. k J - - - J -
bp do act action on b N when..{..} command ... J - - - J -
Unassemble disassemble C-F11/A-8 dis $pc disassem addr u - - - U -
bt backtrace A-7 where bt kd kb kd - - T ~k ln -
frame A-7 frame up/down - - - - - -
Locals vars - A-4 dump - - - - V - -
Feature Desc vc dbx gdb windbg ntsd edebug perld i386kd wdeb386
regs registers A-5 regs -F info regs - #R - - - -
expr eval expr F8 print print ?expr - e/R x m p ?expr -
change var F8 p x=y p x=y - - - = E F -
trace single step VC trace trace - - - t tr -
Examine Mem - A-6 x addr/fmt x/fmt addr dw db dc - - - D -
Threads - A-6 - - - ~*KB - - ~ -
Symbols - - - i - X C-h - X -
Watch - A-3 display - - - - w - -
typeof whatis -t ptype - - - - - -
exe - Menu debug a.out file a.out - - - - ini win.com
args - Menu runargs set args Menu cmd - cmd Kernel Kernel
src - vc pdb dir ~/src pathmap ~/src - - - - - -
edit - builtin list list - - - - - -
go - S-F5 run [args] run G - - - G -
Feature Desc vc dbx gdb windbg ntsd edebug perld i386kd wdeb386
step into - F11 step step T - d s T -
stepi instruction - stepi stepi - - - - J -
step over - F10 next next P - - n P -
step if - - - - - - - - J -
finish till return F12 step up finish - - r r - -
continue - F5 cont continue - - c c - -
until till cursor F6 - - - - bc - - -
jump - F2 cont at jump - - - - - -
pop return now \jump pop return - - - r - -
repeat redo - !! - - - - -
quit - MENU quit q q q q C-] - q q
Feature Desc vc dbx gdb windbg ntsd edebug perld i386kd wdeb386


Corrections welcome, it is out of date, but nevertheless useful when switching platforms. Debugger command cross reference, GPL(C) Mohsin Ahmed.

- M

Gmail rocks .. here's why.

I have got ZERO spam on gmail in last three months,
only about 30 friends have that address (kind of
friends who never forward chain letters).

I haven't published it or written my gmail address
on any commercial forms, hence so far so good.

I got hooked to gmail, the day it offered
https access
meaning my mail was safe from local snoppers.

I never realised https was so light-weight till I used
it on a daily basis on an old win98/64Mb/celeron400
(for comparison Knoppix/X thrashes the RAM on this machine).

Now I use Eudora 6.1/pop3 secure connect to gmail from home,
I don't have to worry about the gmail quotas.

On the flip side, I have to use IE & Firefox + Java.
Instead of faster and safer Opera without Java.
Firefox makes my Win98/64Mb disk thrash,
IE brings in viruses. All of them are leave emails
and address-book on the disk.

With Gmail pushing Firefox and it being free,
means soon Firefox will be faster and better than
Opera - that's the way cookie crumbles.

Here's how to convert return addresses into an image on linux:
> pbmtext $LOGNAME@gmail.com | convert - email_address.gif
Now I need my mailer to do it automatically, and also to
shear the image.


2004/12/10

How to install Apache with SSL on a windows box.

Here is how I installed Apache with SSL on a Windows box

1. Read this HowTo and wiki.

2. You need to download apache 1.3 or 2.0 web-server msi installer
and install in c:/apache (call it $apache).

3. Download ssleay32.dll and libeay32.dlls into %WINDIR%/system32

4. Make certificates with openssl.exe and put them in $apache/conf/ssl/


> openssl req -new -out server.csr
> openssl rsa -in privkey.pem -out server.key
> openssl x509 -in server.csr -out server.crt -req -signkey server.key -days 36500
> openssl x509 -in server.crt -out server.der.crt -outform DER
> cp server.crt server.key $apache/conf/ssl



5. wget and edit conf/ssl/ssl.conf

6. Edit conf/http.conf

7. Test with Opera https://localhost
Opera will ask you about accepting the your server certificate,
the one you created in step 4.
Problems? Debug with wget -d https://localhost and see $apache/logs/*.*

---------------------
8. How to selectively export directories to a $USER?

Create an account for the USER
> $apache/bin/htpasswd.exe -c $apache/conf/userlist.lst $USER

And export c:/ as https://localhost/exported/c/
Add this to $apache/conf/httpd.conf


LoadModule ssl_module modules/mod_ssl.so
ServerSignature Off
SSLCertificateFile conf/ssl/server.crt
SSLCertificateKeyFile conf/ssl/server.key

Alias "/exported/c/" "c:/"
< Directory "c:/">
AllowOverride AuthConfig
Options Indexes FollowSymLinks MultiViews
AuthType Basic
AuthName "Password Required"
AuthUserFile abs_path_to_$apache/conf/userlist.lst
Require valid-user
SSLOptions +StrictRequire
< /Directory>