Monday, November 7, 2011

Conventional Syntax is Critical in Programming

Languages, as everyone knows, come with rules that govern when and where you can say something. Programming languages are no different. The term for such rules is syntax.

Now what amazes me is the diversity of syntax among computer languages and how baroque so many of them appear.

For example, Lisp uses something called Polish notation, which--in plain English--puts the verb first and wraps the whole thing in parens. Here's an example from Scheme, a dialect of Lisp, that displays the sum of two numbers:
(display (+ 20 30))
See what happened? I summed 20 and 30 and then printed the result to the screen with the display function.
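To see how "verb first" works mechanically, here's a minimal sketch in Python of evaluating prefix expressions written as nested lists. The function name prefix_eval and the list encoding are my own illustration, not anything from Scheme itself:

```python
# A toy evaluator for prefix (Polish) notation, with expressions
# written as nested Python lists instead of parenthesized text.
def prefix_eval(expr):
    # A bare number evaluates to itself.
    if isinstance(expr, (int, float)):
        return expr
    # Otherwise the first element is the operator ("verb first"),
    # and the rest are arguments, each evaluated recursively.
    op, *args = expr
    values = [prefix_eval(a) for a in args]
    if op == '+':
        return sum(values)
    if op == 'display':
        print(values[0])
        return values[0]
    raise ValueError("unknown operator: %r" % op)

# (display (+ 20 30)) becomes:
prefix_eval(['display', ['+', 20, 30]])   # prints 50
```

Notice that evaluation starts at the innermost expression and works outward, which is exactly the reading order the parens impose.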

Let's compare that to reverse Polish notation, which turns it all around. Here's an example from the programming language Forth that once again sums 20 and 30 and then prints the result to the screen.
20 30 + . 
Boom! How does that look? Here's what happened. Forth passes values between words on a stack (Forth calls functions "words"). So let's trace it out. First, I put 20 on the stack. Then 30. Then the + word executes: it pops both numbers, adds them, and pushes the sum back onto the stack. Finally, the . word pops the top of the stack and prints it.
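The trace above can be sketched as a tiny stack machine in Python. This is an illustrative toy in the spirit of Forth, not a real Forth; the run function and its token handling are my own:

```python
# A toy stack machine in the spirit of Forth. Numbers are pushed;
# words like "+" and "." execute immediately, operating on the stack.
def run(tokens):
    stack = []
    for tok in tokens.split():
        if tok == '+':
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)          # "+" pops two values, pushes their sum
        elif tok == '.':
            print(stack.pop())           # "." pops and prints the top of the stack
        else:
            stack.append(int(tok))       # anything else is a number: push it
    return stack

run("20 30 + .")   # prints 50
```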

If you want to see a larger example of Forth, read Dadgum's blog on What It's Like to Program Forth.

Here's the example in Python:
print 20 + 30
Notice anything? I do. The Lisp and Forth examples require explanation. They look tricky. They are Puzzle Languages.

In contrast, the Python code is pretty straightforward. It tracks the way most of us in the West have been reading, writing, and speaking all our lives. This is important.

Programming languages are tools. They should make problem solving easier. When they don't, they need to be replaced by tools that do. I'm going to give such replacements a name: better tools.

I know that's going to be hard for some programmers to accept, but it's true. Here's why: Programming languages that track the syntax of spoken languages enable programmers to leverage their human language experience. This almost certainly accelerates comprehension when reading code and decreases the time needed to learn a given language.

Python has really been climbing the popularity charts over the past decade, and I would argue a large part of its success as a programming language is due to how simple it is to learn and remember.

And don't ignore that "remember" bit. Just learning a language isn't enough; you also must remember the language when you need it. Ten years ago, Perl was the language everybody used for scripting, but not anymore. Perl's syntax is tricky and hard to remember, which is probably why languages like Python, Ruby, and PHP have been stealing away so many programmers who counted themselves as Perl programmers back around 2000. Many of them were Perl programmers back then. But not anymore.

I'm arguing that straightforward syntax is one of the most important features a programming language can have and that lacking such syntax can result in the language declining in popularity or never really getting adopted at all.

This is actually coming up in my own experience. One of my favorite languages right now is Erlang. It's pretty fast, really powerful, and has a ton of features that make programming easy. But its syntax is a mess. Just a mess. (Pop quiz: what does less-than-or-equal look like in Erlang? It's =<, backwards from the <= nearly every other language uses.) That worries me. It's a problem. It means that given a choice between Erlang and another language with more conventional syntax, Erlang might fail to gain mind share--even if it's technically superior--and that's a death sentence for a programming language. Not right away, of course. Languages don't die suddenly. It's more like a creeping death where new programmers don't bother to learn the language and old programmers slowly migrate away.

But it's death all the same.

Programming Languages are for Humans

Take a look at the title of this post: Programming Languages are for Humans.

That ought to be obvious, but you still find people arguing all the time that it's the job of the programmer to mould themselves to the machine rather than the other way around. Of course, they don't mean to do this--often they don't even realize that's what they're saying. After all, few people deliberately search for obstacles to place in their own way, but that doesn't change the fact that people--and especially programming language designers--still accidentally do it all the time. So, here's the question: How do you know when a programming language is favoring the machine over the programmer?

Answer: Anytime a programming tutorial tries to justify a feature because of the machine's needs, it's probably a mistake. The best example of this kind of mis-feature is zero based indexing.

For those new to programming, picture an array of five objects: [1,2,3,4,5].  Now, let's assign that array a name:
array = [1,2,3,4,5]
Ok, so far so good. Now, let's pull out the third item from that array. How do we do it? Well, if you didn't know anything about how most programming languages work, you might guess something like this:
print array[3]
But that's actually wrong for arrays with zero-based indexes, which start counting at ZERO instead of one. So the code above (print array[3]) would actually print the FOURTH item in the array, not the third. Which means that every time you want the third item, you have to ask for the second one instead, like so:
print array[2] 
Let me say that again: to get the third item in the list, you have to ask for the second.
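Here's the off-by-one in runnable Python, along with one common workaround. The helper name nth is my own, not a built-in:

```python
# With zero-based indexing, the "third" item lives at index 2.
array = [1, 2, 3, 4, 5]

print(array[2])   # prints 3 -- the third item
print(array[3])   # prints 4 -- the fourth item, not the third

# A helper that counts from one (the name `nth` is hypothetical,
# not part of Python's standard library):
def nth(seq, n):
    return seq[n - 1]

print(nth(array, 3))   # prints 3
```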

If you are new to programming, this looks like sheer lunacy. And most of the time it is. Zero based indexing is horribly unintuitive. Human beings just don't count this way. Now, it may still offer some advantages if you are worried about every byte being produced by your compiler. So, if you program in Assembly, C, or C++, then I exempt you from this essay. But that's it. Everyone else should start counting items at one. So who am I talking to?

If your programming language runs in an interpreter or virtual machine then I'm talking to you. If your programming language claims to abstract away pointers and manual memory management for the sake of increased productivity then I'm talking to you. If your programming language doesn't have pointers at all, then I'm definitely talking to you.

In other words, if you program in Python, Perl, Awk, PHP, Ruby, or Javascript then I'm talking to you.

Now, some languages already index their arrays at one. Here are a few: Cobol, Fortran, Pascal, Ada, Lua, Matlab, Smalltalk, Erlang.

Notice anything? These are warhorses: industrial strength programming languages designed so that non-expert programmers could solve industrial strength problems. That's amazing. That's putting the human first.

Those languages might not be as fashionable as Python or Ruby, but so what? They work. And what's more, they eliminate a whole area of confusion and potential bugs simply by playing to the programmer's intuition.

This should be the battle cry of every programmer. We should constantly challenge every feature in programming languages to make sure they always play to the human's strengths--not the machine's. We should be embarrassed every time a language tutorial has to explain a certain feature as being necessary because that's how the computer thinks.

What? If this is just about how computers think, then let's just stick to Assembly. Otherwise, let's admit that we don't think like computers and build our tools for ourselves instead, and every time someone says something to the contrary, just ask who they think the programming language is for.

Their answer might be illuminating.

Friday, November 4, 2011

Arbitrary Precision Everywhere

Math should just do "the right thing" when programming.

Once upon a time, computers were slow and desperately short on RAM, and programmers were forced to cut corners everywhere they could just to get their programs to run. It made sense to specify, down to the byte, how much memory to set aside for every variable. That micromanagement really got you something. These constraints still apply in the world of embedded programming or for building really foundational programs like operating systems and the like, but for everything else, modern computers are awash in clock cycles and RAM.

So why do I still see programming languages that can't do arbitrary precision math?

Futzing with ints and doubles today is just premature optimization. It increases the program's complexity and makes it more likely to crash. To be fair, more and more languages are baking in arbitrary precision arithmetic, but I still see a lot of languages that really ought to have it but don't. (Perl, I'm looking at you!) Unless the language in question is designed to compete with C or to act primarily as an embedded language like Lua, there is no reason for it to ship with limited arithmetic facilities.

Arbitrary precision math should also be transparent. Don't make me go looking for a bignum module or a special set of math functions. Design your language so that math just does the right thing. The default should be arbitrary precision; the special library should be ints and doubles.
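Python is an example of doing this right. Here's a small sketch contrasting transparent arbitrary precision with the wraparound a fixed-width integer would give (the masking line is my own emulation of 64-bit unsigned overflow, not anything Python does on its own):

```python
# In Python, integer arithmetic is arbitrary precision by default:
# no bignum import, no special call, no silent overflow.
x = 2 ** 64          # already past the range of a 64-bit machine word
print(x + 1)         # prints 18446744073709551617 -- exact, no wraparound

# Contrast with fixed-width arithmetic, emulated here by masking
# to 64 bits the way a C unsigned long long would wrap:
wrapped = (x + 1) & 0xFFFFFFFFFFFFFFFF
print(wrapped)       # prints 1 -- the overflow a fixed-width int would give
```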

Given how much RAM we already have, running out of memory is hard to do and getting harder every day. Ten years from now, when we're all sitting on hundreds of gigabytes (if not terabytes) of RAM, exceeding the computer's capacity will be quite an accomplishment.

For that matter, it already is.

Here's an experiment for you Linux users. Type the following at your command line:
time python -c 'print 2 ** 100000'
For those who don't know python, this program computes 2 to the 100,000th power and then prints it to your command line. Here's the output (minus printing the digits):
real 0m0.768s
user 0m0.736s
sys 0m0.004s
Look at those numbers. This program runs in less than a second. That's over 30,000 digits in less than a second! Let's see what happens when you cut out the print statement:
real 0m0.031s
user 0m0.024s
sys 0m0.004s
Wow! Just wow!

So let's run through this again. Computers are lightning fast and stuffed full of RAM. You don't have to justify arbitrary precision anymore; you have to justify its absence.

Thursday, November 3, 2011

Seven Plus Or Minus Lisp

Have you ever heard of George Miller's famous psychological study claiming that the working memory of your average human holds seven things, plus or minus two? What this means is that most people can remember, on average, seven "chunks" of information.

So what does this have to do with Lisp?

Many programming languages can be read (and thus programmed) in a very orderly fashion: top down and left to right.

Lisp programs don't seem to follow this convention. Instead, you sometimes find yourself reading right to left (because of returning function values) as well as left to right, and bottom-up (again, function returns) instead of just top-down. Lisp programs feel a lot more interconnected to me than many other languages. I always find myself looking for breakpoints where I know the state of the program and can take a breather. Lisp doesn't give you many of these.

Tracing through a Lisp program feels like following an M.C. Escher picture. Things go everywhere and loop back again inexplicably, which is tough when your goal is to understand it.
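The contrast I'm describing shows up even in Python when you nest calls instead of naming intermediate results. A small sketch (the variable names are my own):

```python
# The same computation written two ways. The nested version must be
# read inside-out (innermost call first); the sequential version reads
# top-down, with a named intermediate result at each step.
data = [3, 1, 2]

# Nested: evaluation starts at the innermost expression.
result = sum(sorted(set(data)))

# Sequential: each line is a "breakpoint" where the state is known.
unique = set(data)
ordered = sorted(unique)
result2 = sum(ordered)

print(result)    # prints 6
print(result2)   # prints 6
```

Heavily nested Lisp reads like the first version everywhere, all the time, which is exactly what taxes working memory.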

Making matters worse, the Lisp culture appears perfectly content with this complex style of coding. You don't really notice this until you examine other programming cultures, such as Forth's, which has the same potential for abuse but tries to avoid it by explicitly encouraging a very concise style of programming. Forth programmers shoot for only 2-3 lines of code per function (word, in Forth speak). I don't see this same cultural value in Lisp, where verbose, multi-stage functions appear perfectly normal.

Which brings us back to that working memory limit I mentioned earlier. All this complexity increases the amount of program state that programmers must keep in working memory in order to understand their programs. Think about that!

I bet that if you tested many of the programmers who promote Lisp, you would find that they have excellent memories or have discovered some memory scheme for conceptualizing their programs. Maybe they've just learned to ignore the parts of a function they aren't working on. I don't know. I do read comments by other programmers mentioning that a good memory is a key trait they look for in fellow programmers.

Here's this blog's thesis: Requiring programmers to keep track of more than five or six non-linear details at the same time will render a programming language more difficult than its competitors and lead to reduced usage.

Programming languages should be designed with this limit in mind and programming cultures should consciously promote it.

Lisp Needs A Domain of Its Own

Go anywhere on the web that discusses programming and after a while someone will claim that Lisp is the ultimate programming language. Almost immediately others will likewise pile on to support this claim with statements that Lisp is the ultimate--the Alpha and Omega--the One True Programming Language. Famous programmers are usually named who program only in Lisp followed by famous books on programming that use Lisp to illustrate their programs.

The thesis for most of these arguments is this: Lisp is the ultimate programming language because it is a programmable programming language. By "programmable programming language" they mean that you can extend Lisp from within Lisp itself; its macro system lets you add new syntax and language features in ordinary Lisp code.

In most languages, if the language doesn't have a particular feature you need, you have only three options: 
  1. Beg the developers to add the feature you want
  2. Figure out a way to do without
  3. Go find some other language that solves your problem
Lisp adds a fourth option. It lets you add the feature yourself. This sounds cool. It sounds like you need never learn another language again. You just add it to Lisp and keep on trucking. This makes it the ultimate general purpose programming language. It can be anything!

So great. We've done it. We have found The One True Programming Language.

There's just one little wrinkle: Lisp is basically nowhere.

So far as I can tell, I don't use a single program written in this ultimate programming language.  

None of the office programs I use every day are programmed in Lisp. Nor are the web browsers that make up the rest of my computing experience. None of the major operating systems are written in Lisp. Not Windows. Not Mac. Not Linux. None of the most popular servers that run most of the internet are written in Lisp. Nor are the databases that store those websites' information. Do any smart phones run Lisp? My Android doesn't. Nor does my Apple iPad or my iPod or any other Apple device as far as I know. None of them are written in Lisp. Not one.

For the ultimate programming language, that's not a very good showing.

Programming languages are designed to solve specific problems. Assembly provides better mnemonics than raw machine code. Fortran is more human readable than assembly. C is more human readable and more portable than assembly. Awk has more tools for text processing than raw C. Perl has more tools for text processing than raw Awk. Python has more tools and is (arguably) cleaner than all of them. Erlang can be made more fault tolerant than other languages. Javascript is more ubiquitous for client-side web programming than other languages. Lua is easier to embed in C programs for scripting purposes than other languages. PHP is more ubiquitous (and maybe has more tools) for server-side web programming than other languages. Etc...

Obviously you can argue these points, but the designers of these languages would largely support what I just said as primary rationales for why they designed these languages in the first place.

So what again was Lisp's advantage? What problem does it specially solve?

Oh yeah: Lisp has more tools for programming Lisp than other languages.  

Hmmmm...  Is that a problem that needs solving? I don't think so, and I think this answers the question of why we don't see Lisp actually being used very often: it doesn't provide an advantage for solving the problems programmers need solved. To use Lisp as its acolytes direct: you first need to reprogram Lisp into a domain specific language targeted at a particular problem. But as the list above suggests, others have already been working in those domains and have already gone a long way towards building special tools for them.

What this means is that vanilla Lisp isn't good enough. Nor is merely--and that's a big merely--reprogramming Lisp for a certain problem space good enough. It isn't enough to make Lisp just as good as Perl for text processing. You have to make it better.  You have to reprogram Lisp into a domain specific language that is better than what others have already built.


Here is my thesis: The Cambrian explosion of domain specific programming languages is a better solution to most problems than a general purpose Lisp.

I wrote this essay to find out what I thought Lisp's future is going to be.  People still argue--and quite convincingly--that Lisp is The Next Big Thing coming down the pipe, but I'm going to respectfully disagree unless we see a new generation of Lisp Machines emerge with parentheses going all the way down.

Because then Lisp will finally have a domain of its own.

(Note: For the purposes of this article I'm lumping Scheme, Common Lisp, Racket, and other Lisp-like dialects together.)