Category Archives: Computers

Why do we code

Last night, as the hours went on and I worked away optimising some code on a website, I got to thinking about why we do it. Why do we code? And by code I'd guess you could apply this to anyone in the open-source community too. It's all a lot of effort for no measurable payoff. Or is it?

The project I’ve been working on will never make me money. It’ll probably never be in a portfolio. My name isn’t visible on the site and nothing is credited back to me. And yet I’ve spent at least a few hundred hours on it at this stage with many more to come.

For as long as I've coded, I've also been doing unpaid coding. Hell, it's pretty much how we learn most languages, unless we are lucky enough to have an understanding workplace. But no, this isn't about learning something new either: the project is written in PHP, a language I've 10+ years of experience in.

So why do we do it?

The three reasons I have are pretty simple, although explaining them to non-coders never works. If you code, you'll just nod along; if you don't, well, the usual response is just "but why?"

  1. It's fun. (a.k.a. the "what the hell are you smoking" reason)
    No really, most coders I know actually enjoy it. It's quite satisfying to look at a problem and then solve it. Or to be creating something, even if that something takes days of work and ends up as just a small dot on screen.
    In short, the best coders like their work.
  2. I get some use out of it. (a.k.a. the selfish reason)
    In this case it started as some fun, a challenge even. Then it turned into some useful things that I like to see on a regular basis. It just so happens that others also like to see the same stuff so why not share the love?
  3. Experience. (a.k.a. the job reason)
    Some things in computers you can't do without real-world, high-traffic systems. Yes, you can simulate things, but it is never the same as that crazy user who does that thing you never thought possible. And when you get a few hundred of those, well, that is when things get interesting. Sure, you can say you built a site that does a, b, c, but it is much more impressive to have that site handle X number of users.

Overall I guess the whole thing loops around on itself a lot. It starts because it's something you need or want, but it gives you some enjoyment so you keep at it, and then it grows, only to become something that gives a bit of experience while building something else that you'll get some use out of. And that gives you some enjoyment, so you…

Email Compression

Recently, one of the companies I work for switched across to Dovecot as the email daemon of choice, opening up a host of possibilities for new advancements. Compression is one such advancement, since Dovecot supports it transparently. But you can't just enable something that will change hundreds of millions of files without asking some questions first. It can all be summarised as Speed, Speed, and Speed. Read on…

Background to the testing

As with all testing, it begins at the individual mailbox level. My initial tests with mailbox compression consisted of first tar'ing my own mailbox, then gzipping that, and then gzipping the individual files in the Maildir. Next I tried my archive mailbox, where lots of stuff gets dumped to keep it off my primary mailbox. Immediately a problem came up: the compression ratios between the two mailboxes varied quite a bit, one at 10%, the other at 30%. Obviously I needed a bigger corpus to test upon.
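
For the curious, those first ad-hoc tests were nothing fancier than something along these lines (my own Maildir path, totals compared by eye; a sketch rather than the exact commands):

# whole mailbox as one compressed stream
tar -cf - ~/Maildir | gzip -9 | wc -c

# per-message compression, closer to how Dovecot would store it
find ~/Maildir -type f | while read -r f; do
    gzip -9 -c "$f" | wc -c
done | awk '{ total += $1 } END { print total }'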

Given that the email systems I have access to in my new job are a slight bit bigger than what I had previously, I decided to take one of the backup snapshots and work off that. And so went the weekend.

Testing Size and Methodology

This data set is a subset of a subset of a subset. It was originally meant to be a single disk from one server, but due to some unforeseen complications (read: a stupid scripting error), it covers about 60-70% of that set. And to fend off the inevitable questions, a single disk is just a crude segmentation unit we use to partition email. The reality is that the disk is on a NetApp filer, so multiple spindles are behind it.

The data set covers a few hundred active accounts, some POP, some IMAP, some webmail. No attempt was made to count which was which, nor should it really make a difference to the overall conclusion. (Hint: compression is good.)

Overall, a total of 2,625,147 emails were processed, having a combined size of 399,502,115,126 bytes (approx 372 gigabytes). (As an aside, this gives an average email size of 148 kilobytes, which is probably above average; it may be skewed by some of the very large emails. There is a further look at this below.)

My testing method was to scan the backup snapshot for all emails using the following logic.

foreach email
 copy to ramdisk
 measure size
 gzip, bzip at 1/5/9 compression ratios
 measure size of compressed files
 measure time to read back the compressed files
 cleanup and save results

Given that the files were all being worked on in the ramdisk, disk latency can be ruled out. This was especially important given that the source of the files was a filesystem snapshot. The machine in question also maintained a load under 1 the whole time, and it is a multi-CPU system, so no CPU wait time should have crept into the measurements.
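
As a rough sketch, that loop looks something like the following shell script. The paths and output format are illustrative rather than the exact script I ran:

RAMDISK=/mnt/ramdisk            # tmpfs mount, illustrative path
SNAPSHOT=/backup/snapshot/mail  # read-only filesystem snapshot, illustrative path

find "$SNAPSHOT" -type f | while read -r mail; do
    cp "$mail" "$RAMDISK/msg" || continue
    orig=$(stat -c %s "$RAMDISK/msg")
    for level in 1 5 9; do
        gzip  -$level -c "$RAMDISK/msg" > "$RAMDISK/msg.gz"
        bzip2 -$level -c "$RAMDISK/msg" > "$RAMDISK/msg.bz2"
        # readback timing: decompress to /dev/null, record wall-clock seconds from GNU time
        tgz=$( { /usr/bin/time -f %e zcat  "$RAMDISK/msg.gz"  > /dev/null; } 2>&1 )
        tbz=$( { /usr/bin/time -f %e bzcat "$RAMDISK/msg.bz2" > /dev/null; } 2>&1 )
        echo "$orig $level $(stat -c %s "$RAMDISK/msg.gz") $tgz $(stat -c %s "$RAMDISK/msg.bz2") $tbz"
    done >> /root/compression-results.txt
    rm -f "$RAMDISK"/msg*
done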

Goals

I went into this with a few goals set. The main one was to figure out whether compression would actually be a good idea. At 10%, the savings wouldn't be huge compared to the overhead that compressing everything would bring, especially given the time to set up and maintain it. At 30%, it is a different matter. I wanted to know what the actual average would be over a larger set, since obviously my own mailboxes were much too different from each other.

The next goal was to produce some graphs. I kid you not. It has been a while since I've created something fun, and graphing compression ratios seemed like a nice way to do it. It's no MapMap, but it's a start!

Next in line was figuring out the overheads. I knew that bzip2 has a larger overhead than gzip but produces higher compression ratios. Perhaps there would be a point where using bzip2 would make sense over gzip. Perhaps there was even a point where emails would be too small to justify any compression at all. Who knows, but the averages from this are ideally suited to a graph.

Results

Speed 1 – Compression speed

While I did measure the overhead of compressing the files, the original plan was that compression would happen solely as an offline task on CPU-idle hosts. As such, even a 2000% overhead wouldn't matter much provided the host could keep up with the daily volume. Dovecot does support realtime compression via both IMAP and the LDA, so perhaps this is something I'll come back to.

What I did set out to measure was the impact running the script would have on live or active mailboxes.

The script works roughly as follows

find all mails in $PATH without Z flag and with the S= set
if file is already compressed
 add the Z flag
else
 run the compression

Two things to note: running the find to ignore mails with the Z flag greatly speeds things up. This is needed because Dovecot will drop the Z flag when moving a compressed email between folders, which is also why it is necessary to check whether the file is actually compressed before operating on it. The find itself is also quicker than expected given a hot FS cache.

When compressing, I chose to run maildirlock only around the moving of the compressed emails back into place. This means the mailbox isn't locked for extended periods – it is not uncommon to hit a folder with tens of thousands of mails (or a few very large ones) that drastically increases the time taken to process it. The benefit of this approach is that users won't notice a compression cycle running against their folder.
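
A single pass over one folder works out roughly like the sketch below. Treat it as an outline rather than a drop-in script: the paths are illustrative, and a production version should also preserve each mail's mtime and double-check the file still exists before swapping it in.

MAILDIR=/var/vmail/example.com/user/Maildir   # illustrative path
LOCK=/usr/lib/dovecot/maildirlock             # Debian path for Dovecot's lock helper

cd "$MAILDIR/cur" || exit 1
# mails that have a size (S=) in the filename but no Z flag yet
find . -maxdepth 1 -type f -name '*,S=*' ! -name '*:2,*Z*' | while read -r mail; do
    if gzip -t "$mail" 2>/dev/null; then
        # already gzipped (moved out of a compressed folder); just restore the Z flag
        mv "$mail" "${mail}Z"
    else
        tmp="../tmp/$(basename "$mail")"
        gzip -6 -c "$mail" > "$tmp"
        # lock only for the swap itself; maildirlock prints the PID holding the lock
        pid=$("$LOCK" "$MAILDIR" 10) || { rm -f "$tmp"; continue; }
        mv "$tmp" "${mail}Z" && rm -f "$mail"
        kill "$pid"
    fi
done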

To demonstrate a worst-case scenario, I took a folder with 65,000 emails. This was loaded up in webmail (Roundcube) and I started selecting random emails. A compression cycle was then started across the folder. Lastly, I started a script which connected in over IMAP and began moving emails out of the folder. Surprisingly, nothing lagged by any measurable amount.

Speed 2 – Compression speed – Readback

The overhead while reading back compressed files was an area I definitely wanted to measure. If compressing an email saved 50% disk space but increased read latency by 50%, then would it be worth it? What about 40-60? How much additional slowness for the user would be okay? Would the extra files in cache offset any introduced slowness on a larger sample size?

As mentioned above, the tests involved operating on a ramdisk to rule out any disk latency. This means the measurements here are purely the compression overheads, i.e. CPU time. My hope was to measure a percentage low enough that it could be justified by having more data in cache and thus reducing disk latency. What I found was even more promising.

[Graphs: Compression Overhead vs Size – VSmall, Small, Large]

As the graphs show, gzip really doesn't add much overhead across the sizes until the files get large. Shockingly, in certain cases – usually large files – reading back through gzip was actually quicker than simply reading the uncompressed file, even on the ramdisk. Results like that would easily help the case, though the averages obviously go both ways. And the average overhead of 25.8% is easily offset by the extra data that fits in cache thanks to the reduced sizes of the emails. In fact, on some tests with larger emails, we were getting responses back via IMAP for compressed mails quicker than for uncompressed mails even before the cache became useful: less data to read from the disk meant it returned the data quicker. Definitely something that will need further work.

Overall, since it is usually disk IO that runs out before CPU on a mail server, this really comes down to trading some CPU for reduced disk IO.

Speed 3 – Backup Speed

Since each file only changes once (when it gets compressed), the backups take a sizeable hit at that point. The same goes for space if you lag compression instead of doing it via the LDA. Ideally you'd be able to differentiate between active POP users (i.e. those that download all mail without leaving a copy on the server) and everyone else. An active POP user probably doesn't need compression, although the benefit of having less data to read still comes into play. The real benefit of doing it through the LDA is that your backups will only ever grab one copy of an email, not both the pre- and post-compression copies.

The real improvement for the backups comes with the lower data sizes. Less data means more of it ends up in the filesystem cache. When backing up with something like rsync, more folder entries will remain hot, allowing it to scan them much quicker. Never underestimate the importance of the FS cache. On a slower testing box, a difference of 20-30 seconds can be seen between a cold read (i.e. a non-cached folder entry) and a hot read of a large (~100k mails) Maildir.

Lower individual mail sizes only really play a part if your backups are restricted by your link speed or are going to a remote location. They do play a part for rsync'd MDBOX folders, since those files can be large in size; compressing the MDBOX is much more efficient than doing differential block syncs with rsync. The only exception is when the MDBOX purge is run: depending on the user's delete pattern, the chunk size of the MDBOX, and the latency, the differential option can end up faster.

If you are only doing the purge say weekly, then try running it on your backup host (after it has run on the live system) before the sync. If your users are delete-happy, it can help quite a bit.
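
Assuming Dovecot 2.x, where the purge is the doveadm purge command, a weekly run could be as simple as a cron entry along these lines (timing and scheduling purely illustrative):

# weekly mdbox purge for all users, before the backup sync window
30 1 * * 0      root    doveadm purge -A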

Some Conclusions

Firstly, sysadmin mailboxes are not typical mailboxes. Testing several of my own, I got wildly different results: one had a >80% compression rate, another <10% with a huge overhead. Yes, some of your users will match these profiles (video storage over email, anyone?), but it is unlikely you'll hit a mailbox with >100k mails of nearly identical pure text (a Nagios archive).

Once I had the script running on a much larger set, the averages started floating around the same point. Things were within 3% of the final average after around 25 mailboxes, which was somewhat surprising.

To be honest, with the graphs I was really hoping for an "aha" graph: one with a really clear crossover point you could point to and go, yup, that is the correct figure for X. But since I didn't know what X was, it was hard to produce such a graph. I had also forgotten just how much moving the axes affects things, and how much you can adjust things to fit the answers you want to see. More so, defining which crossover point you actually want to see also affects the outcome. Do you want to compare the overhead of compression against the data in cache? Well then, how much cache do you have now? Because that matters.

One example of the different crossover points is in the following graphs, which cover compression vs the size of mails. The crossover changes drastically between the small and vsmall graphs.

[Graphs: Compression vs File size – VSmall, Small, Large]

But then that isn’t an overly useful cross over. At least not looking at it. Originally I wanted to see if there was any points where compression wouldn’t make sense. Combining the information with the overhead graphs above, it is clear that compression makes sense everywhere from a read-back point of view.

How about if we look at the distribution of file sizes?

[Graphs: Number of files vs File Size – VSmall, Small, Large]

There are far more small files than large files. This makes a difference if you lag your compression, since you can ignore the smaller files and still get most of the benefit of compression in a much shorter time. Just how much?

[Graphs: Compression with number of files vs Ignored size – VSmall, Small, Large]

What was really interesting is just how much space you can save by only compressing large files, and by large I mean >1 megabyte large. If you ignore everything smaller than 500KB, you still get a 28% compression rate with gzip vs an overall rate of 38%, and that is while skipping well over half the data set. Not bad at all.

But what should you choose to ignore? Well, the difference between doing everything and ignoring, say, anything under 2KB is within the margin of error: it probably isn't going to do anything to your compression rate, but it will help with the speed of your compression runs. Where you draw the line really depends on how cold your data will be. If you are going to do it via the LDA then you will automatically hit everything, and frankly I doubt there is a need to exclude anything at that stage.

My Recommendations

In simple terms, the best I can offer is:

  • Turn on compression at the LDA (see the config sketch after this list).
  • Use gzip with a middle setting (5/6) unless you know your mail is in some way different (think archiving with attachments stripped, but even then the midway works pretty well).
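
For Dovecot 2.x with the zlib plugin, that translates into roughly the following in dovecot.conf. Check the documentation for your version before copying it, as the plugin settings have moved around between releases:

mail_plugins = $mail_plugins zlib
protocol lda {
  mail_plugins = $mail_plugins zlib
}
plugin {
  zlib_save = gz        # gzip rather than bzip2
  zlib_save_level = 6   # the middle setting recommended above
}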

To back-compress your existing data, start with a high exclude value, let your backups catch up, then lower it; rinse and repeat. It is probably also worth excluding by age until you get caught up and your backups have stabilised. Remember that backup space will balloon out by at least 40-50% while you are doing this; all the compressed mails are effectively new data to the backups.
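
In practice the excludes are just extra tests on the find from the compression script, something like the lines below (thresholds purely illustrative):

# first pass: only mails over 1MB and older than 30 days
find . -maxdepth 1 -type f -name '*,S=*' ! -name '*:2,*Z*' -size +1M -mtime +30
# later passes: lower the thresholds (e.g. -size +500k), then drop them entirely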

Dirvish Backup – Multiple seperate backup schedules

I've been using Dirvish now for just over a year. It replaced a number of rsync replication scripts that I had running to do rolling backups. While moving to Dirvish has required a few extra scripts to be written, it has been a worthwhile move; my own scripts weren't able to handle holding backups for longer periods, at least not gracefully. The biggest issue we had was trying to get Dirvish to do different backups on different schedules. The Dirvish config, while it may look like it allows this at first glance, really isn't set up for it: one backup per day is its bread and butter.

Hopefully this will help clear up a few minor issues with Dirvish and get you running with multiple independent schedules.

Credit where credit is due, some of this is the result of different sources found via Google. We have modified it a number of times over the last year to fit our needs, so I'm not totally sure how much of the original remains.

Note: This is from a Debian based system. Paths reflect same.

Initial Dirvish Configuration

For this guide, our setup consists of one host which we back up once per day, plus a directory on that same host which gets backed up once per hour. Backups are stored under /storage/Backups/dirvish.

Our master.conf file – notice that no hosts are actually defined here.

bank:
     /storage/Backups/dirvish
exclude:
     lost+found/
     *~
     .nfs*
expire-default: +15 days
expire-rule:
#       MIN HR    DOM MON       DOW  STRFTIME_FMT
    *   *     *   *         1    +3 months
    *   *     1-7 *         1    +6 months
    *   *     1-7 1,4,7,10  1    +6 month
    *   10-20 *   *         *    +4 days
#   *   *     *   *         2-7  +15 days

runall-daily.conf

Runall:
     host     02:00

runall-hourly.conf

Runall:
      host/hourly/folder

/etc/cron.d/dirvish – This is what calls the jobs

00 01 * * *     root    /usr/sbin/dirvish-expire --quiet   # Expire old images
00 02 * * *     root    /usr/sbin/dirvish-runall --quiet --config runall-daily.conf
00 * * * *      root    /usr/sbin/dirvish-runall --quiet --config runall-hourly.conf

With those in place, our host backs up at 2am every day and the hourly job kicks in every hour. We set up the vaults as normal in the folders defined above, so the hourly one is /storage/Backups/dirvish/host/hourly.

The only thing that needs changing is the image-default option in each vault's config.

Daily vault: image-default: %Y%m%d
Hourly vault: image-default: %Y%m%d%H
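
For completeness, image-default lives in each vault's dirvish/default.conf alongside the client and tree settings. The hourly vault's might look roughly like this (client and tree are placeholders for your own values):

client: host.example.com
tree: /path/to/folder
xdev: 1
index: gzip
image-default: %Y%m%d%H
exclude:
     lost+found/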

Living with it

This setup has run great for us. We get what we need backed up when we need it backed up. There have been a few notable exceptions, however.

First, one of our hosts started locking up during backup windows. Dirvish then went nuts and somehow started marking incomplete backups as successful. We noticed when our bandwidth shot up as it began recopying full machine images across.

Second, our expire rules tend to leave too much data. Yes, we could probably fix this, and we probably will when space becomes an issue. I guess the first step would be to reduce the hourlies down to one per day on the older sets instead of keeping the whole day. But since Dirvish is so good with space, a few months of hourlies isn't taking up too much room.
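
If we ever get around to it, it would likely be a matter of tweaking the hourly vault's expire rules so that only one image per day survives past the first couple of days, along the lines of the (untested) rules below:

expire-default: +2 days
expire-rule:
#       MIN HR  DOM MON  DOW  STRFTIME_FMT
    *   2   *   *    *    +15 days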

Overall, however, I can't say we'll be moving away from Dirvish anytime soon, at least for our Linux machines.

Windows 7 Aero Peek and System Administration

Windows 7 brought a number of improvements for my system administrator job, most notably the fact that it connects to Win2k8 servers where the old XP system didn't. However, this morning during a mass patching, the Aero Peek feature really showed its worth.

Right now I'm installing patches on 6 identical machines to bring them up to date with the patches from our WSUS server. (There are reasons why we don't clone the boxes to get the patches on.) Picture the old, XP way of doing this: 6 remote desktop sessions and randomly switching between them to see the progress. Switch to the Windows 7 way and it becomes: open 6 Hyper-V connections, then mouse over the taskbar icon to get the image below.

Completely wonderful. Total time saver.

In the time normally spent flicking between machines, I've written this post, added some more Nagios alerts, and checked on a few other servers.

Komplett.ie and the end of a customer

Long-time users of Komplett already know that they switched warehouses and had a few account problems during the switchover. The whole thing seemed a little rushed judging by the experience on the website, but then management could easily have been pushing them. Things such as not being able to open multiple tabs by control-clicking were when I first started getting worried.

Don't get me wrong now, I wouldn't be a massive Komplett purchaser, at least not on a personal level. We did put a little bit of the company purchasing through them from time to time, although that was about to expand. However, when I could no longer log in to the site and kept reading about missing orders, I decided to pass on placing the order at the time, go somewhere else, and return when all the problems were fixed.

Today is the day all that changed. The following email arrived this morning, marking the end of my use of Komplett.

So what now? Komplett have always been a consistent company and decent to deal with. Well, basically, any company incapable of transferring accounts, or of even telling customers openly about the issues, isn't a company I want to deal with. (I had to Google it to find out what was going on. To Komplett: a graphic at the top of the homepage isn't enough, especially one that isn't the default graphic; the correct place is a link on the login page.)

In the end it has worked out better for me. We ended up with a large account with one of the suppliers that the likes of Komplett use. Basically we are saving another 2-5% on Komplett prices, and we get things quite a bit faster, generally next-day delivery.

Acronis Backup – How not to do business

Acronis Backup, a great tool. Backs up files. Lets you set a real schedule (Windows, take note). The new one even seems to compress your older backup files to save space. Forces you to buy new products with every new version. Does not reply to emails outside of maintenance contracts (which you cannot renew). Makes it hard to upgrade.

What? Hold on a second. Maybe something is amiss here.

Some Background

I've been a user of Acronis backup products for close on 18 months now, specifically the TrueImage corporate product. And I've been reasonably happy with it, but then I haven't used it to do any full restores. I did try to pull a large file from a backup in the past, but that didn't work: it repeatedly crashed while trying to do so. Support, who responded at the time since my maintenance contract was in date, weren't much help. If the file had been critical, I'd have been pissed. Luckily it wasn't, and a few days and another disk later, I managed a full disk restore, mounted the new disk as an external drive, and copied the file off. Painful.

Wonderful Search…but only for home users

The latest problem with restoring came from me needing to find a folder on my desktop that I knew existed sometime 5 months ago. All fine, I thought, I'll do a search of the files and it'll show up. The product doesn't support searching? Oh, your HOME product does but your business one does not. Fine so. I'll "test" the home version and see if I can switch to that. So in goes the trial version on a blank machine. The initial search looks promising: it finds a few text files in the first backup. I'll leave it searching and see if it can find the files I'm after.

A day later, Windows is still searching, still reading files, sort of, but no results. Watching the accesses, it is bouncing through the individual files in sequence, but then repeating. The Windows search service had also crashed and was unable to index inside the TIB files.

How about Google? They can search everything! And yes, there is an iFilter for Google Desktop. But, and there always is a but, it won’t search the archives either. It craps out.

Again, luckily, I managed to find the files I needed by browsing through a few days in June looking at the desktop each time. Can’t say I’d like to do that for something that went missing randomly.

Upgrades

I don't know if they make this purposefully hard or not, but finding an option to upgrade took time. Hell, I can't even remember where it was now. All I remember is that the upgrade from the old TrueImage to the new Backup10 costs MORE than buying the home version outright as a new customer. And to add insult to injury, the home version has more features, not that you can be sure they work either.

Looking at upgrades began due to the fact that TrueImage does not work on Windows 7: it has been added to the DoNotExecute list that Microsoft have in the OS, and it was added by Acronis. There is a way around this list, and everyone who has bypassed it has had NO problems with the program. We can only guess that Acronis had some reason for doing this. Of course, it would have nothing to do with their new product coming out – a new product that may or may not actually support Windows 7. The Home one does, but I wanted business-style products.

When I couldn't find the upgrade option or any mention of it, I emailed Acronis via my customer area. They even replied with one of those auto-replies to say the email had been received. All I wanted to know was whether the product was supported on Windows 7 and whether it could be fixed; if not, what the upgrade path was so I could take it.

Conclusions

Having to wait over a week and counting for a reply about upgrading has left a bad taste in my mouth. It reminded me of my past experiences of the product failing. Being unable to renew the maintenance support after the initial 12 months is worse, more so since they seem to operate on a 14-16 month product release cycle, going by other people I've talked to. That only serves to make sure people don't get product updates under their maintenance agreement and are forced to upgrade or buy the brand new product each time.

It is the lack of support that has sealed the deal for me. TrueImage has been uninstalled from my laptop (not that it worked since I swapped to Windows 7), and I've already begun searching for alternatives. The built-in Windows Backup may do the trick if I can get it to run on weekdays only. One I am itching to trial is inSync by a company called Druvaa; I've been testing their new server backup tool and am VERY impressed with it. Something like this would probably be overkill for personal use, but it may work, especially with the dedupe. Time will tell. All that is sure is that Acronis will not be getting any more of my money, or any from the companies I work for. Their loss.

Double disk failures – A storage nightmare

Anyone who has worked with storage systems, or even large personal installs, has heard of them. Double disk failures. Words you never say. Ever! You can be banished from the server room for even suggesting such a thing is possible!

But the reality is it can, and does, happen. It is why we have hot-swappable disks, or even hot-spare drives. I'm even looking at an array by NetApp which has something called DP, or Dual Parity, which, they say, can handle two separate disk failures without taking down the array. That sounds very interesting. The Dell/EqualLogic array we have on test currently runs a type of RAID 50, so you can lose two disks, but only from separate sub-arrays. The other two disks are running as hot standbys to allow for online rebuilding.

The setup

My current dilemma, we'll call it, is with a much simpler setup: an Intel-based server with 8 SATA disks connected to a 3ware card doing RAID 5. It is a high-end 3ware card too, a 9650. (I do NOT recommend these cards. We have numerous other performance issues with them in both Windows and Linux, the Windows ones being much worse and currently stopping me from copying backups.) Anyway, to make things a little more challenging (and every admin loves a challenge in their day), the server is remote. In-another-country remote.

Anyway, this machine has been running fine for nearly a year, the RAID array sitting there taking files happily enough. When I started testing some further backups recently, I ran into some trouble. Most of it looked to be Windows-related, so the usual: apply the updates, reboot the machine and see what happens. Only on the first reboot, wham, disk 8 offline. OK, so I'll finish the updates and then worry about getting another disk over to be put into the machine. Next reboot, the disk comes magically back online, but the array is in a degraded state. Strange; we'll let it rebuild and return tomorrow, see if life has returned to normal.

Normal is normal is just a cover

After sleeping on things and letting the array rebuild, everything looks to be great: just a temporary problem that we can forget about and move on from. Never a good idea, but when you are overworked, what can you do?

Another day passes trying to move backups across and we hit another Windows error, this time requiring a registry fix to increase the IRPStackSize. So I bang in the first change and reboot. I log in and, strange, the system appears to be locking up. I open the 3ware web interface and am greeted with something I'd not seen until now.

Raid Status: Inoperable

Luckily these are backups, so no live data is lost. We can fix this. Hell, let's try a reboot and see. It can't do any more damage, can it?

The Recovery?

Rebooting fixes disks, magically. Both disks back online. Array in a consistent state. Why not leave well enough alone?

More Windows problems and another reboot. Back to two disks offline. Reboot again and one disk gone. Useless, useless, useless.

Solutions…

If this were a live server, with live data? I'd probably cry; there wouldn't be much else to do. You could probably have it rebuild by replacing the disk that goes offline the most, but I'd be moving as much off as quickly as possible. In this case, since it is a backup server, I'll be getting the guys local to the machine to remove and reseat all the drives, check the cables inside the case, and then destroy and rebuild the array and the filesystem, with full formats all around.

And then to top it off, 10 reboots, minimum, when the server isn’t looking! If they all work, then maybe, just maybe I’ll look at trusting it again. Any problems and I guess I’m on a plane 🙁

Lessons learned

Well, I think I'll be putting the really critical data onto more than one backup server in future, at least more of the fileserver data anyway. The massive Exchange backups will need to be looked at separately.

Enterprise-level SANs are cheaper than you think when you factor in the cost of fixing setups like this. Okay, so you aren't going to be able to get a SAN for twice the price of a server with 16x1TB drives in it, or even three times. You may get a low-spec one, however, and if it gives more peace of mind, maybe that is worth the cost. I know that if faced with the decision in future, I'll probably recommend a SAN and attached server for a file server, assuming it is above the 1TB mark. Lower than that, you can probably get away with multiple servers, replication software AND backups. Replication software is NOT backup software: delete from live, and it deletes from backup.

And what nows?

I don’t know. All I can hope is that reseating disks and cables fixes the array, gets it online and lets me start transferring backups offsite. Another box is going to be added to give more backups, hopefully point in time backups too.

Backups really are the largest cost for something you will hopefully never use. I honestly hope I never have to pull any data from backups, ever. That is possible, what with Volume Shadow Copies on file servers and RAID disks in servers. And maybe real permissions for applications, but that is a post for another day!

In Private Browsing – aka Porn Mode

I've recently started using the InPrivate Browsing feature of IE8 more and more, and no, not for porn!

For testing sites I'm developing, or doing a clean Google search, it would usually involve closing the browser, clearing cookies / cache etc. and then restarting. It is now reduced to Tools -> InPrivate Browsing, and bang, you've got a clean browser session. I know Firefox supports this too, but they really make it unusable if you run Firefox with lots of open tabs (currently I've 36 and it isn't a busy day), because it shuts the browser down, opens the private browsing mode, and then restores everything after you're finished.

I guess I've long since used two browsers: Firefox for personal stuff and general web development tasks, IE for intranet applications and now the InPrivate feature.

If only someone would invent some proper workspaces for a browser. And some better way of storing/organising favourites.

Bill Gates – An Interview Through Time

Some things don't change, and while computing may look to be moving really fast, a lot of the ideas have been floating around for a while. Technology enables technology.

I read about the book "Coders at Work", for which the author kindly posted a few interviews online. Bill Gates has one from 1986 (found here). It really is a truly insightful read. Bear in mind that this is pre-Windows, and not just GUI Windows. Hell, it is pre-CD-ROM.

While talking about 4K memory limits, the thought of 650MB of data being accessible to a program must have seemed surreal. Our minds aren't capable of grasping jumps like that anymore. The best way I can explain it is that it's like moving from only having the data on your hard disk available to having the data of a whole Google data centre available.

One of the parts that really hit home was when Bill Gates starts talking about what is effectively a Google Maps mash-up.

GATES: CD ROM is totally different. We hope with CD ROM you’ll be able to
 look at a map of the United States, point somewhere, click, zoom in and
 say, “Hey, what hotels are around here?” And the program will tell you.

Quotes like that really do show how far ahead he was. The internet was only being born, but here he is, thinking of new ways to display data.

If 1986 could produce ideas that only come to life today, what ways will people find to display data in the future? Sure, today's BIG computer problem is more a data problem than a technical one.