Download tiny.tar.gz   [35k]
(includes source and binaries)

Tiny Programs

The programs listed here are provided as exhibitions of just how compressed a Linux ELF executable can be and still function.

Take a look through /usr/bin: how big is the smallest ELF executable in there? Now try writing the smallest hello-world program you can. Can you get it under 1K? Can you get it under 100 bytes?

This is precisely what I set out to do one day, and I ended up with a minor obsession over creating the smallest possible executables.

The distribution comes with preassembled binaries along with the assembly source code. If you wish to be able to rebuild them yourself, you will need to have nasm installed.


Some Introductory Blather Which You Should Feel Free to Skip if You Want to Get to the Good Stuff

Your average ELF executable, even after being stripped, contains a fair bit of overhead, some of it completely superfluous. And when you rip out everything that doesn't contribute to a program's memory image, much of what remains, beside the program code itself, is information needed by the dynamic linker. This is of course very important information to have, normally — but if the program makes its own system calls, it can dispense with that overhead as well.

Of course, there are few tools out there that will create an executable that has none of this extra machinery. But if one is willing to define the file entirely by hand, an executable can be created with the absolute minimum contents, including only the information that Linux needs in order to get it into memory and start it running.

The minimum information an ELF executable file needs to have is an ELF header and a program header table with at least one entry. (For more information about these entities, consult the ELF standard.) But, even in these two remaining structures, there are fields that aren't used, particularly with a bare-bones executable. Furthermore, there are other fields in these structures which do serve a purpose, but which are not, at this point in time, actually examined by the Linux OS. So, these fields can be used, if the programmer is desperate enough, to hold other bits of data, or even small amounts of code....

Let me pause for a moment and make this perfectly clear: I emphatically do not advocate the wilfull, widespread disregard of standards. Certainly, just because the ELF standard says that a given field will hold a certain value does not automatically mean that Linux ought to examine that field and reject any ELF file with something else there. But by the same token, just because (the current version of) Linux ignores the contents of that field does NOT give programmers implicit permission to fill it in with anything they like. When programmers en masse adhere to existing implementations instead of existing standards is exactly when a well-written standard becomes an obstacle instead of a tool, when future standards are forced to canonize ludicrous practices in the name of backward-compatibility, and when everything generally goes to hell. So: where these programs violate requirements of the ELF standard, do not take them seriously. These programs have been offered up as entertainment, and perhaps as an education in existing practice. But please do not seek to emulate such behavior in other, more serious programming work.

This truth is illustrated by the fact that I have had portability issues with these programs more than once in their history. For example, the 2.0 Linux kernel ignored the e_phentsize entry in the ELF header. Early 2.2 and 2.4 kernels did as well, but then at some point it began to be checked, presumably as part of adding support for 64-bit ELF files. Starting with the 2.6 kernel, program segments that had a different memory size than their file size were no longer allowed to be non-writeable. Some 2.6 kernels permit ELF binaries to load to address zero, and some do not. All of the above cheats were ones that I have had to abandon in order to keep my smallest programs working. Partly because of that, you will see that with the more interesting programs, these tricks are ignored in favor of full compliance with the ELF standard, while still being small.

Of course, violating standards requirements is only one way in which these programs achieve such compression. Other aspects regarding the organization of the ELF structures in these programs is admittedly unorthodox, but nonetheless in complete adherence to the standard. Most important of all, of course, is the creative reworking of algorithms. Finally, as with any CISC assembly programming, there are also plenty of uncommon instructions that can be used to accomplish common tasks.

The path of optimizing an ELF executable for size is described in loving detail in my essay:


The Tiny Programs Collection

true.asm source
true binary

This program returns an exit code of either zero or one, depending on whether it is invoked with the name "true" or "false", respectively. This one is the runt of the litter. Its size is 45 bytes. Count them. This is the smallest it is possible for a Linux ELF executable to be.

hello.asm source
hello binary

The program that started me off on this whole pursuit: hello world. It is 62 bytes long. This one is quite dense. With the program header table overlaid on top of the ELF header, and program itself running through both of them, some of the bytes have no less than three different purposes.

Here you can find an even smaller hello-world program. (Unfortunately I do not know the author's name.) It is only 58 bytes long, and would have been 57 if the author had used the same version of the greeting as myself. The program uses the technique of loading to absolute address zero, which permits a number of tricks that further reduce code size. I have not used this technique myself, because sadly some versions of Linux do not permit executables to load to address zero. It is an ingenious little program, though.

keepalive.asm source
keepalive binary

Here's a program that I actually use. It simply runs forever, printing a bell character every four minutes or so. I keep it in the background after logging into a remote machine that times out connections when they're idle. At 56 bytes, it's a bit longer than the one-line shell script I originally used, but on the plus side it doesn't take up an extra process in order to sleep.

bf.asm source
bf binary

Brainfuck is a very simplistic programming language, which was invented by Urban Mueller solely for the purpose of being able to create a compiler that was less than 256 bytes in size, for the Amiga OS. (More information about Brainfuck can be found at my Brainfuck page.) I eventually decided to take up the challenge as well, and create a Brainfuck compiler for Linux using less than 256 bytes. The compiler works as a filter: redirect the source code to the compiler's input and the compiler's output to a file. (And don't forget to chmod +x the output file.) At the beginning I didn't expect that I would actually succeed, since I thought I would need almost half of that just for the headers. I think my first cut was 458 bytes in size. I was quite pleased with myself when I trimmed that down to less than 300 bytes, ecstatic when I finally reached the 255-byte goal, and downright triumphant when I later got it to work in 238 bytes. By the time I had a 198-byte version, I was just plain dumbfounded. It's now at 166 bytes. And though I can't quite believe it, it works just as well as the first one did. As useless as it is, I think this one would have to be the crown jewel of this little collection. It is quite possibly also the ugliest and most opaque program I have ever written.

hexdump.asm source
hexdump binary

This is your classic hexdump program. I wrote it one day in a fit of pique, after trying (and failing) to convince the standard Linux hexdump program to use the canonical hexdump formatting style. This program, and the ones that follow, are different from the previous ones in that these adhere 100% to the requirements of the ELF standard. I did this not only because I got tired of fixing them when newer versions of Linux broke my old versions, but mainly because, unlike the previous programs, these actually do something useful. I still use hexdump on a regular basis, as the standard Linux hexdump program still can't be coaxed into this output format. The initial version of hexdump simply read from standard input, but I found this limitation irritating enough that I sacrificed a few more bytes in order to specify a filename on the command line. Its size is therefore 202 bytes.

ls.asm source
ls binary

After hexdump, I came into contact with Konstantin Boldyshev, who headed the asmutils project, and I got involved in creating more tiny programs that were actually useful. ls was my first real achievement along these lines. It is 1017 bytes in size, and sports many of the important command-line options:

-C columnar output; default if stdout is a tty
-1 one file per output line; default if stdout is not a tty
-l "long" output
-F show file type suffixes
-i show inode numbers
-s show size of files in 1K-blocks
-d don't expand directories
-a show all (hidden) files
-B don't show files ending in ~ (backups)
-N don't replace non-graphic characters with a question mark
-b show non-graphic characters as octal escape sequences
-R recursively list contents of subdirectories

The "long" output format only displays numeric user IDs, and it displays timestamps as an age, instead of the actual timestamp. There is no sorting capability, and the columnar output is much less intelligent than GNU's (though it looks pretty good most of the time). Beyond that, however, it conforms pretty closely to the standard ls program we all know and love.

base64.asm source
base64 binary

This is a simple base64 decoder utility. Like hexdump, it can either accept a filename on the command line, or work on standard input. Since sometimes very large files need to be encoded in base64, I violated my prime directive a little bit and let the program grow a bit more than strictly necessary, in order to optimize for speed. This version is there significantly faster than the utility included in GNU coreutils. Its size is exactly 256 bytes.

factor.asm source
factor binary

This is an implementation of the standard Unix utility. It displays the prime factors of the integers specified on the command line, or on standard input if no arguments are given. This program is the oddball of this collection. Instead of simply trying to achieve the smallest possible size, I decided to shoot for real portability, for a change. This program not only conforms to the requirements of the ELF standard, it is also the only program in this collection that actually dynamically links with libc (as opposed to making its own system calls). Thus it should continue to work with any future version of Linux, as long as new versions of the libc ABI and the ELF standard remain backwards-compatible. And since, like base64, factor is a program that can sometimes run for a long time, I attempted to balance optimizing for both size and speed. This program therefore makes significant use of the FPU, in order to increase parallelization in the inner loop. (The success of these optimizations varies greatly from CPU to CPU: on my machine it requires about 60% of the time of the standard utility, but on others it runs at effectively the same speed.) Another unique feature of factor is that it has online help, version information, and error messages. It therefore arguably stands as a completely functional replacement for the version in GNU coreutils. Its size is 1020 bytes.

snake.asm source
snake binary

By now you're probably thinking: "Okay, the first couple of programs were impressive, but now it's just repetitive. Isn't there anything interesting in here?" Well, here's something to reward you for reading all the way to the end: a game! Snake is a classic computer game -- in fact it's so old that you can hardly find it anymore in its original form. I originally wrote this as an Easter Egg for a contract involving text-terminal interfaces. Now it's available to all as a tiny program. In case you've never played it, the object of the game is to guide your snake around the playing field, eating the food blocks that appear at random locations, and dodging the walls and your own tail. As you eat the blocks, your score increases but so does your length. The program should work on any VT100-compatible terminal (although see the program comments regarding the line-drawing character set — not all Linux console fonts provide correct support for them). Plus, at 1496 bytes, it's the perfect addition to a rescue floppy ... when you've just lost all your data, what's better than a game to take your mind off your troubles? (My personal best is 499, by the way. I'm sure you can do better.)


Software
Brian Raiter
Muppetlabs