
Who

This project was started by me (Barrett Lyon) as a response to a conversation with my colleagues. Over lunch we were discussing William Cheswick and Hal Burch's Internet Mapping Project. I was inspired by their beautiful maps, but the maps did not seem to be very useful, nor did they release their code freely. Their mapping also was not much of a public affair, and there were reports that the images took months to make. (According to Bill Cheswick, the Internet Mapping Project now runs 20-minute daily scans and has a much improved image creation system.) My comment during the lunch was, "I can write a program that can map the entire net in a single day." The comment was met with some hostility. Thus, this project was born.

What

The first goal of this project is to use a single computer and single Internet connection to map the location of every single class C network on the Internet. Obviously the Internet is not routed as a bunch of class C networks, but by treating the Internet IP space as a bunch of class C networks, it is possible to make a detailed map of the entire Internet. The global Internet address space currently offers 32 bits worth of unique host addresses, or a theoretical maximum of 2^32 = 4,294,967,296 hosts. In reality, the address space has been allocated in fairly large contiguous blocks, which renders strictly optimal utilization difficult. The smallest block that is logically routed via BGP or allocated by ARIN is a class C network (CIDR /24). After that concept was proven possible, we moved on to a multi-node scanning system to provide better image and route detail.

At a rate of 194 traceroutes per second it is possible to scan the entire theoretical 2^24 space within a single day. Thus about 16,777,216 class C networks could be processed by a single computer in a single day. Yet huge portions of the address space do not need to be scanned: many network blocks are no longer used, many fall under the RFC 1918 private addressing standard, and other blocks are reserved by ARIN.

According to ARIN there are about 47 class A networks in reserved status (search ARIN for OrgName "Internet Assigned Numbers Authority"). Doing the math, that removes 3,080,192 class C blocks from the scan list, leaving us with a theoretical list of 13,697,024 blocks.
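
For the curious, here is that arithmetic as a quick PHP check (PHP being the language the rest of the project is written in; the 47 reserved class A figure comes from the ARIN search above):

    <?php
    // Total /24 (class C) networks in the 32-bit address space: 2^24.
    $totalC = pow(2, 24);              // 16,777,216

    // At 194 traceroutes per second, one day of scanning covers almost
    // exactly that many targets:
    $perDay = 194 * 86400;             // 16,761,600

    // 47 reserved class A blocks, each holding 2^16 class C networks:
    $reservedC = 47 * pow(2, 16);      // 3,080,192

    printf("Scannable /24 blocks: %d\n", $totalC - $reservedC); // 13,697,024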

Applying some additional thought, large portions of those 13.7 million blocks may route to the same place. By tracing about 20 routes at random within a class B and comparing the results, it is possible to see whether there are multiple routes worth investigating or whether the entire block goes to the same place. Applying that logic increases the speed of the scanning.
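
Here is a minimal sketch of that sampling test in PHP; the function name, the traceroute flags, and the simple "all last hops identical" comparison are my own illustration, not the project's actual code:

    <?php
    // Trace ~20 random /24s inside one /16 and compare the last responding
    // hop of each trace. If every sample ends at the same router, assume
    // the whole class B routes to one place and skip its remaining blocks.
    function class_b_is_uniform($prefix16, $samples = 20) {
        $lastHops = array();
        for ($i = 0; $i < $samples; $i++) {
            $target = sprintf("%s.%d.1", $prefix16, mt_rand(0, 255));
            // -n: no DNS, -q 1: one probe per hop, -w 1: one-second timeout
            exec("traceroute -n -q 1 -w 1 " . escapeshellarg($target), $out);
            $lastHops[] = end($out);   // last output line = final hop
            unset($out);               // exec() appends, so reset each pass
        }
        return count(array_unique($lastHops)) === 1;
    }

    var_dump(class_b_is_uniform("10.0"));   // test 10.0.0.0/16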

After some testing and beta code, I proved that with enough bandwidth it is possible to scan the entire Internet with a single computer. The map of 1/5th of the Internet took only about 2 hours to create, yet it generated nearly 200k/sec of traffic and put my machine at a load of 60+ while scanning. Extrapolating, the entire Internet would take about 10 hours to scan, plus another hour or two for the visual map output.

As it turns out, the route collection process is rather slow if we want high detail in the image. It takes only minutes to draw the entire Internet's routing structure, but for a very high resolution image, the longer you scan, the better the image you get.

Creating the image itself takes about 20 seconds once we have the LGL output from the database. All of the first images were created on an old Dell 533MHz laptop running Linux. The new images are all created on a PowerBook G4 with OSX!

I found a lot of value in the project, so after the proof of concept was completed I continued to program. I turned the entire system into a distributed client/server model. The clients request a chunk of random IP space from the server and when it is completed the IP space is registered with the server. This is done until all of the IP space has been scanned. I'm also working on a stats system so I can monitor the productivity of the different scanning nodes and users involved in the project.
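
A rough sketch of what the client side of that loop could look like; the URLs, parameter names, and the "DONE" sentinel are hypothetical stand-ins, not the real protocol:

    <?php
    // Hypothetical client loop: check out a random chunk of unscanned IP
    // space, trace it, then register it with the server as completed.
    function scan_chunk($chunk) {
        echo "tracing $chunk\n";       // stand-in for the real traceroute work
    }

    while (true) {
        $chunk = trim(file_get_contents("http://server/opte/checkout.php"));
        if ($chunk === "" || $chunk === "DONE") break;  // nothing left to scan
        scan_chunk($chunk);            // e.g. $chunk = "24.5.0.0/16"
        file_get_contents("http://server/opte/register.php?chunk=" . urlencode($chunk));
    }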

By taking a more distributed approach the data will look more like the real Internet. It will show more of the backup routes, more of the smaller links in different countries, etc. When the first version of the code is done I should have about 5 to 10 different scanning nodes running on the Internet. If you would like to donate a computer and some bandwidth to this project, please contact me. I can give credit where credit is due!

When

The first scanning tests began in late October 2003, and my goal is to have the project generate a new map every week.

Where

Currently the project is hosted by Lyon Labs.

Why

This project started as a bet, but after I warmed up to the idea I found a lot of value in the project itself:

  • Mapping the Internet weekly will allow us to see major disasters in different parts of the world. The Internet is a huge disaster sensor. If I had maps of pre-war Iraq to compare with maps from today, one could see how badly Iraq's network was destroyed. The idea of a metaphysical representation of the real world is very interesting to me.
  • The project can show the Internet's growth.
  • The project is art.

How

The project is simple: ARIN checks create a list of usable IP space, and PHP controls the entire project. The PHP code I wrote has a process control unit called trctl() that controls all traceroute activities, uses a resource handler to monitor the I/O of the applications, and handles conversion of the traceroute output into Dot. It also manages the route testing.
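
The original trctl() source is not reproduced here, but the heart of the conversion, turning traceroute hops into Dot edges, might look roughly like this (everything below, including the traceroute flags, is an approximation rather than the project's real code):

    <?php
    // Each consecutive pair of responding hops becomes one directed edge
    // in the Dot graph; silent hops ("* * *") are simply skipped.
    function trace_to_dot($target) {
        // -n: skip DNS, -q 1: one probe per hop, -w 1: short timeout
        exec("traceroute -n -q 1 -w 1 " . escapeshellarg($target), $lines);
        $edges = array();
        $prev  = "origin";
        foreach ($lines as $line) {
            if (preg_match('/^\s*\d+\s+(\d+\.\d+\.\d+\.\d+)/', $line, $m)) {
                $edges[] = sprintf('  "%s" -> "%s";', $prev, $m[1]);
                $prev = $m[1];
            }
        }
        return "digraph routes {\n" . implode("\n", $edges) . "\n}\n";
    }

    echo trace_to_dot("198.51.100.1");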

The out-of-the-box traceroute code had to be modified to increase its speed and to terminate on a single failed TTL. It is also executed with DNS resolution turned off, among other options.

The PHP program's output lands in a huge database, which is then parsed into a format called LGL. The LGL data is then fed to a Java application called LGL View, which outputs the images you see.
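
For reference, LGL's native format is a simple adjacency list: a "# vertex" header line followed by that vertex's neighbors, one per line. A minimal conversion from an edge list might look like this (the $edges array is sample data only):

    <?php
    // Group edges by source vertex, then emit LGL adjacency records.
    $edges = array(
        array("10.0.0.1", "10.0.1.1"),
        array("10.0.0.1", "10.0.2.1"),
        array("10.0.1.1", "10.0.3.1"),
    );

    $adj = array();
    foreach ($edges as $e) {
        $adj[$e[0]][] = $e[1];
    }
    foreach ($adj as $vertex => $neighbors) {
        echo "# $vertex\n" . implode("\n", $neighbors) . "\n";
    }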

What now?

In 2003 Barrett started a company called Prolexic Technologies, where he spent every hour of his work time saving businesses from DDoS attacks. After a major conflict with the management, Barrett left the company he started and created a new content delivery system called BitGravity. This project was put on the back burner until Barrett had the time and money to invest in Opte. Someday soon he would like to run a software creation contest with a large grant going to the winner. This would further research and design and continue the spirit of Opte! We wish him luck with BitGravity and we hope he can rebuild Opte into something magical in the coming year!
