Tom's research


This is a brief document outlining current thinking on the WISP-in-a-box project that I (Tom) have been working on. It is meant to eventually be merged into the rest of the WISPiab articles, but for the time being it's here so new thinking can be easily seen and commented upon.

Use Cases

Our basic mesh setup consists of mesh nodes that provide a wireless backhaul to a central server and gateway, and end-user access in the form of Ethernet connections to the mesh node. We aren't explicitly concerned with running access points; if that functionality is desired then separate devices will need to be attached to the mesh nodes to act as wireless bridges. Each mesh node will have its own captive portal, supporting multiple clients, and will communicate with a central AAA server.

Design Goals

The mantra here is reusing existing solutions on low-cost hardware. We don't want to reinvent the wheel, and we need to do everything as cheaply as possible. The cost decisions mainly influence the hardware choice (see below). The goal of software reuse means we want to do as little development in-house as possible, instead relying on synthesizing a single product from various existing components. At the end of the day we may need to do some of our own development, but the decisions I'm suggesting in this document are made with the intent of minimizing that kind of work.

Hardware Choice

The platform for the mesh nodes will be the Linksys WRT. On the basis of cost and availability, there's really no better platform. Those considerations trump the technical requirements that the Broadcom chipsets fail to meet. There are some consequences of this choice, however, in terms of what software can be run on the mesh nodes. More on this in the later section on the mesh software.

Network Component Layout

Overview

The entire WISP-in-a-box system consists of:

  • A central dashboard, running on the gateway server, from which:
    • the network can be monitored and all nodes can be configured
    • pre-paid vouchers can be administered
    • software running on the server (web & mail services, etc) can be administered
    • all functionality will work without upstream Internet access
  • Mesh nodes which:
    • receive configuration updates from the central dashboard
    • run a captive portal that interacts with clients and authenticates against the gateway server

Dashboard & Server

The OrangeMesh/Open-Mesh dashboards provide almost exactly the functionality we require. However, they have a number of problems:

  • Both dashboards require the nodes to be running ROBIN firmware to receive configuration updates. ROBIN, as it stands, only runs on Atheros-based devices, not Broadcom devices (like the Linksys WRT). The reason for this is that they want a single radio to operate both as the mesh backhaul and as an access point. This requires two SSIDs to be running on the same radio, something the Broadcom radios aren't capable of. We don't need the radio to also work as an access point, however, so the limitation isn't absolute; still, some semi-serious hacking will be required to add support for Broadcom devices to ROBIN, since right now it assumes the Atheros-only MadWifi drivers (see the sketch after this list for the kind of driver-specific calls involved). See later in this document for the results of my first attempts to make this work.
  • From the standpoint of localization & modularity, the OrangeMesh sources aren't quite what we'd like them to be
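
To give a concrete sense of that porting work, here's a rough illustration of the kind of driver-specific calls involved (interface names, SSID and channel are just examples, not a tested recipe): ROBIN's scripts build their mesh interface with madwifi's wlanconfig tool, which simply doesn't exist for the Broadcom driver, where the single radio instead has to be switched into ad-hoc mode with the wl and iwconfig tools.

  # madwifi (Atheros) - the style of call ROBIN's scripts currently make:
  # a virtual ad-hoc interface is created on top of the wifi0 radio
  wlanconfig ath0 create wlandev wifi0 wlanmode adhoc
  iwconfig ath0 essid "mesh-backhaul" channel 1
  ifconfig ath0 up

  # Broadcom (e.g. a WRT54G under OpenWRT) - there is no wlanconfig; the
  # single wl0 interface is taken out of AP mode and put into ad-hoc mode
  wl ap 0
  iwconfig wl0 mode ad-hoc essid "mesh-backhaul" channel 1
  ifconfig wl0 up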

Because of these limitations, we may not want to use the OrangeMesh/Open-Mesh dashboard. We still want to imitate the same basic style of system, however, while adding some functionality to meet the additional needs of our dashboard (i.e. ability to administer vouchers and server services, things that the OrangeMesh/Open-Mesh dashboards don't cover). This, therefore, is what I envision the dashboard looking like:

Overall, there is a tab-based configuration page, with one tab for each high-level task, sitting in a frame above the main content frame. The tabs are:

  • Server Administration (brings up Webmin in the main frame)
  • Vouchers (brings up phpMyPrepaid, our voucher control system)
  • Network Status (new custom dashboard component)
  • Network Configuration (new custom dashboard component)

We will have to write our own interface for Network Configuration. Despite the design goal of heavy code re-use, the limitations of existing software and the current needs of this project mean we'll have to do our own development here. We can use the OrangeMesh/Open-Mesh/ROBIN setup as a reference, and where appropriate can adapt their code to suit our needs.

The Network Status and Network Configuration components (which we may decide to combine into one) provide the following functionality (keep in mind this is first-round functionality for a version 1; there may be other features we'd like to implement later):

Network Status:

  • A semi-detailed list of all the nodes in the mesh, including:
    • the node's IP and MAC addresses
    • the node's current status (simple up or down)
    • when they were last heard from (when an update from that node was last received)
  • Potentially, some basic usage statistics, particularly for heavy users (this may already be in phpMyPrepaid)

Network Configuration:


All of the components of the larger dashboard will be skinned in a unified CSS style.

Mesh Nodes

The mesh nodes themselves will run OpenWRT. They will run BATMAN for routing. They will run a client daemon modeled on the ROBIN scripts that receives updates from the dashboard and sends any necessary information back to the dashboard. The client daemon will have to be co-developed with the dashboard.
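
As a very rough sketch of what I imagine that daemon looking like (the dashboard URL, the report fields, and the config file path below are all made up for illustration, not a decided format), it could be little more than a shell loop that periodically reports the node's status and pulls down any new configuration:

  #!/bin/sh
  # Hypothetical node check-in loop; the URL, fields and paths are placeholders.
  DASHBOARD="http://10.0.0.1/dashboard"
  NODE_MAC=$(ifconfig wl0 | awk '/HWaddr/ {print $5}')
  NODE_IP=$(ifconfig wl0 | awk '/inet addr/ {print substr($2,6)}')

  while true; do
      # report the fields the Network Status page wants: MAC, IP and uptime
      wget -q -O /dev/null \
          "$DASHBOARD/checkin?mac=$NODE_MAC&ip=$NODE_IP&uptime=$(cut -d. -f1 /proc/uptime)"

      # fetch this node's current configuration and apply it only if it changed
      wget -q -O /tmp/node.conf.new "$DASHBOARD/config?mac=$NODE_MAC"
      if ! cmp -s /tmp/node.conf.new /etc/node.conf; then
          mv /tmp/node.conf.new /etc/node.conf
          /etc/init.d/network restart
      fi
      sleep 300
  done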

In addition to being configured through the dashboard (which will be the primary way of configuring things), we will include a minimal web interface for configuring each individual node. This interface will run on the nodes, and can be a stripped-down version of X-WRT's webif^2 or FFL.

Other thoughts & issues

Positioning of the billing system

There is some question as to where the captive portal should sit. We've thus far assumed it would reside on the nodes themselves, in the form of CoovaAP. CoovaAP is no good for this purpose, however, because in order for it to run a captive portal it automatically takes over the wireless interface to run as an access point (so it can't be used for the mesh backhaul). Just using the CoovaChilli component, however, and swapping around its normal outgoing / incoming interfaces, might work. The idea would be to run BATMAN on the wireless interface, tell CoovaChilli that the wireless interface is its outgoing interface, and then somehow bridge the LAN ports together and run a DHCP server only on them, not on the wireless. This seems easy enough, but as far as I can tell has never really been done, so this is a good point for experimentation.

An alternate strategy would be to have the captive portal reside on the server/gateway. This wouldn't allow for control of traffic inside the mesh, but that's probably all right (local traffic isn't what's going to be billed). This has the distinct advantage of only having *one* instance in the entire network, so configuration changes made to it don't need to be pushed out to all mesh nodes. The problem with this method is that it needs to be implemented at Layer 3, because clients won't be on the same Ethernet segment as the captive portal (they'll be multiple wireless hops away). CoovaChilli is a Layer 2 captive portal solution, and furthermore it *needs* to be the DHCP server for its clients (there may be ways to hack it to not require this, but it does in its stock form). Layer 3 captive portals exist, and this may be worth looking into as well. If we can find one that works with RADIUS, we can just plug it into our existing FreeRADIUS / phpMyPrepaid billing architecture.
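
Just to make that idea concrete, the core of a Layer 3 portal on the gateway is really a firewall arrangement like the sketch below (the mesh-facing interface name and all addresses are invented for illustration; this is not any particular product's configuration): unauthenticated clients' web traffic is redirected to a local login page, and once a client has authenticated against FreeRADIUS its IP address is simply exempted from the redirect.

  # create a chain for the captive portal and send mesh-side web traffic to it
  iptables -t nat -N portal
  iptables -t nat -A PREROUTING -i eth1 -p tcp --dport 80 -j portal

  # by default, redirect everything to the login page running on the gateway
  iptables -t nat -A portal -j DNAT --to-destination 10.0.0.1:8000

  # when a client authenticates, exempt its IP; note this is keyed on IP, not
  # MAC, since clients are several wireless hops away from the gateway
  iptables -t nat -I portal -s 10.1.2.3 -j RETURN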

Update: I've gotten CoovaChilli running on a WRT running Kamikaze, configured such that the wireless is used as the default gateway (right now it's in client mode tied to another AP, but it could be used in ad-hoc mode for a mesh too) and the LAN hands out DHCP leases. RADIUS configuration is a bit tedious, so it's not quite fully functioning yet, but it looks like it will work without problems. This is a good thing, because it means we can run just CoovaChilli on all the mesh nodes without it affecting the wireless interface, and alongside any other software that runs on Kamikaze. There is one open problem here, however: CoovaChilli itself doesn't come with a web interface. Its interface is built into the rest of the interface for the CoovaAP firmware. There are a few possibilities to handle this (write our own interface, rip the existing one out of the CoovaAP firmware) but none are totally desirable.
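
For reference, here's roughly what the full node setup would boil down to, with BATMAN on the wireless and CoovaChilli serving the LAN bridge; every interface name, address and secret below is a placeholder, not a value from my actual test configuration:

  # mesh routing runs over the wireless interface, which CoovaChilli never touches
  batmand wl0

  # CoovaChilli treats the bridged LAN ports as its client-facing side, hands
  # out leases there, and authenticates against FreeRADIUS on the gateway
  chilli --dhcpif br-lan \
         --net 192.168.182.0/24 \
         --uamlisten 192.168.182.1 \
         --radiusserver1 10.0.0.1 \
         --radiusserver2 10.0.0.1 \
         --radiussecret testing123 \
         --uamserver http://10.0.0.1/cgi-bin/hotspotlogin.cgi \
         --uamsecret anothersecret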

Single-configuration of Mesh Nodes

One of the major goals of this project is to have a single interface from which we can make changes to all of the mesh nodes. There are several approaches to this that I can see:

  • There has been the suggestion of simply designing our own web interface that directly modifies the config files of every node. I am a little concerned about how robust and modular this solution would be. If we want to change a component, we need to redesign the web interface, and if a component's config file changes (say, between versions, and we want to upgrade to get some new feature or patch a security hole) then the web interface could potentially break. Furthermore, this task is a bit tedious, although certainly not impossible. In sticking with the design goal of reusing existing tools (not reinventing the wheel), I think we may be better off with a solution that doesn't require us to develop as heavily for a specific task like this unless we absolutely need to.
  • Alternatively, OrangeMesh provides this functionality through ROBIN, but for the reasons mentioned above it doesn't really suit our purposes. We could try to modify it or otherwise use their update system (which they must have, but which I'm unfamiliar with).
  • Lastly, I have a nascent idea that rather than updating the mesh nodes by simply modifying their configuration files, we could design a small utility that would run on the mesh node connected to the server. This mesh node would have everything we need on it (e.g. OpenWRT, BATMAN, possibly CoovaChilli, etc.). After it was satisfactorily configured, hopefully using existing interfaces, this utility would essentially make an image of the entire node, and then upload this image to every other node in turn using private-key based SSH file transfers. Each node, upon receiving such an image, would write it to disk and reboot. This is certainly a little slower than directly modifying configuration files, but I think it has the possibility to be a simpler, more robust, and much more modular solution (one could even add whole packages to the master node and then have those changes propagate throughout the network). The feasibility of this approach isn't certain, but I think it's worth further investigation.

How a flash-based configuration solution might work

A discussion with David Johnson made it seem like this is possible. Some specifics:

  • The master node (the node connected to the server) gets configured. Once it looks good, the flash distribution utility is run. It commits all changes and writes an image file based on the master node's disk.
  • The master node then goes through the list of every node it knows about (i.e. the rest of the mesh) and, using a private SSH key, initiates a script on each of those nodes.
  • That script goes and fetches the image from the master node. When it gets it, it does some basic sanity checks (checksum, etc.) on the image, then sends a message to the master node saying it's ready.
  • When the master node has gotten a ready message from every node in its list, it sends out another command telling each node to execute the change. If it doesn't get a message from every node, it informs the user which ones are missing, indicating that some human intervention is needed at those nodes.
  • Every node, upon receiving the execute command, waits roughly twice the maximum transmission time, then writes the change to disk and reboots.
  • Furthermore (and this is borrowed from the FreeBSD guys), each node keeps the old image around (it should have enough RAM to do so) and runs a watchdog timer after rebooting. If it can't find the mesh after a few minutes, it reverts to the old image and reboots again. This is probably the trickiest part, but the entire system seems fairly simple.

It'll take a little while for the entire process to go through, but many configuration changes can be lumped into one update, so for normal operation this should be fine. A further feature that was discussed is the possibility of all nodes, at some set time (say midnight each night), switching to a certain "update" SSID and channel and checking the master node for a firmware version more recent than what they currently have. Most nodes will already be running the current firmware and will immediately switch back to their normal SSID and channel, resulting in little downtime. Any "lost" nodes, however, will download the correct firmware, thereby "rejoining the herd", so to speak.
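
As a rough feasibility sketch of the master-node side of this (emphatically not a design: the node list file, the image path, the choice to push the image rather than have nodes fetch it, and the use of sysupgrade to write flash are all assumptions that would need checking on the actual hardware), the whole sequence is only a handful of shell:

  #!/bin/sh
  # Hypothetical sketch of the distribution step, run on the master node.
  # Assumes /etc/mesh_nodes lists the other nodes' IPs and passwordless SSH
  # keys are already installed; as a simplification of the scheme above, the
  # image is pushed to the nodes instead of being fetched by them.
  IMAGE=/tmp/master.img
  NODES=$(cat /etc/mesh_nodes)
  SUM=$(md5sum "$IMAGE" | cut -d' ' -f1)

  READY=""
  for n in $NODES; do
      # copy the image over SSH and verify its checksum on the far side
      ssh root@$n "cat > /tmp/update.img" < "$IMAGE" || continue
      REMOTE=$(ssh root@$n "md5sum /tmp/update.img | cut -d' ' -f1")
      [ "$REMOTE" = "$SUM" ] && READY="$READY $n"
  done

  # don't go any further unless every node reported ready; otherwise tell
  # the user which nodes need human intervention
  for n in $NODES; do
      case " $READY " in
          *" $n "*) ;;
          *) echo "node $n is not ready - manual attention needed"; exit 1 ;;
      esac
  done

  # send the execute command: each node waits a while, then writes the image
  # to flash and reboots (the old-image/watchdog fallback described above is
  # not shown here)
  for n in $NODES; do
      ssh root@$n "sleep 60; sysupgrade /tmp/update.img" &
  done
  wait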