September 2013


Earlier posts discussed the distributed microblogging system I’m building, why I’m writing it, how you would use it, and how it works. In this post, I’ll describe the tools & technologies I’m using to write it, and how you can get involved. With life, family, study and a day job, I can only devote limited time to it, so it’ll take a long time to write, and I’d be very happy to receive help!

The software is written in Scala, and its code is currently hosted in a Mercurial repository on BitBucket. I build it with Maven, write it in IntelliJ IDEA, and use test-driven development as rigorously as possible. It is released under the Apache License, v2.0. Installable software is available for Mac OS X (Snow Leopard or greater), Windows (XP or greater), and Ubuntu Linux (10.04 or greater). I use Software Crafting as my approach; apprentices are always welcome!

The main technologies I’m using are JXTA for the peer-to-peer communications (of which, more later), Play for the client REST API, and Bootstrap, jQuery and HTML5 for the web UI. Storage is handled by an embedded H2 database, with my CommonDb Framework for data access.

The rough architectural plan is to have, on top of JXTA, an anti-corruption messaging/asynchronous RPC layer feeding into the domain model, so that the domain model is isolated from JXTA. Group membership may be handled by an implementation of the Paxos consensus algorithm. Replication is to be handled by a simple gossip protocol, both for updates to the directory and between peers in a message replica set.
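To make the gossip idea concrete, here is a minimal sketch in Scala of how two peers might reconcile their state. It is only an illustration under stated assumptions, not the project's code: the names `GossipSketch`, `Versioned`, `merge` and `exchange` are hypothetical, and the "highest version wins" rule is an assumption, since the actual protocol has not been specified.

```scala
// Hypothetical sketch: none of these names come from the real code base.
object GossipSketch {
  // A value tagged with a version number; the higher version is assumed to win.
  final case class Versioned(version: Long, value: String)

  // A peer's view of the replicated data, keyed by an arbitrary string.
  type State = Map[String, Versioned]

  // Merge a remote view into a local one, keeping the newest version of each key.
  // On a version tie, the local entry is kept.
  def merge(local: State, remote: State): State =
    remote.foldLeft(local) { case (acc, (key, incoming)) =>
      acc.get(key) match {
        case Some(existing) if existing.version >= incoming.version => acc
        case _ => acc.updated(key, incoming)
      }
    }

  // One gossip round between two peers: both end up holding the merged state.
  def exchange(a: State, b: State): (State, State) = {
    val merged = merge(a, b)
    (merged, merged)
  }
}
```

Repeating such exchanges between randomly chosen pairs of peers would, over time, spread each update to the whole group; a real implementation would also need peer selection, failure handling and a transport, all of which are omitted here.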

Interested in contributing to the project? Contact me via this blog, or via @mattgumbley or @devzendo on Twitter; you can find my mail details on the Contact page.

To be continued…

In which I describe the features I’m hoping to provide in a peer-to-peer social network, run by its users, for its users.

In my previous post, I laid out some arguments behind my wish for a decentralised, peer-to-peer social network. This is a system that I’m building. In this post, I’ll describe the features and usage I’m hoping to provide, in as non-technical a manner as I can. It’s a simple set of features, but it provides the essentials. A subsequent post will describe the technologies I’m using to implement this.

No central system

I should explain the main difference between this system and other social networks you may be familiar with.

You might visit facebook.com or twitter.com in your web browser, or might use an app on a smartphone or tablet, but Facebook and Twitter have a large set of servers providing the social network to you.

With this system, there are no central servers. The users of the system each run a piece of software (the ‘node’) on their computers, and this plays its part in building the social network, allowing people’s posts to be distributed to their followers.

Open Source

It’s Open Source, free as in speech and as in beer: the source code is available for you to read or scrutinise, and you are free to join me in developing it, translating it, enhancing it, and discussing its future direction.

Free as in cost: it costs nothing to run a node, although it will increase your Internet connectivity costs. I’ll try to ensure the node isn’t too greedy with your bandwidth!

Getting started

You would download and run your own copy of the node software – available for Mac OS X (Snow Leopard or above), Windows (XP and above), or Ubuntu Linux (10.04 or above). There’s a desktop version with a small GUI, and a version you can run as a service/daemon without a GUI.

Installation is trivial: drag the application icon to Applications on a Mac; run an installer on Windows; some arcane apt-get incantation on Ubuntu that Linux-heads will find soothingly easy.

I said above that there’s no central server. There is a server hosting DevZendo.org, which is where the software is downloaded from, but that’s almost all it’s used for. More on that later.

You run this software whenever and wherever you can – whenever you are using your computer, or when it is idle. The node software would find and join the peer-to-peer network, handling replication of some users’ messages and the building of your timeline. It is your home node. You can have more than one home node, say one at home and one at work, and all nodes help to build the network. You can only log into your home nodes, however. Your neighbours might have their own home nodes that help to build the network, but you can’t log into them unless they grant you access.

You can opt to provide relay facilities for other users (those behind NAT routers) if you are running the node on a publicly accessible system and can spare bandwidth, storage and CPU. If you have plenty of these, you can opt to form part of the directory, of which, more later.

Once running, the node software gives you a web site, and you log in to this with your web browser. The desktop node provides a button, which, when clicked, loads the client web site into your web browser.

All setup and operations are then done from your web browser.

To access your home node from the public Internet, you may need to open its web ports on your firewall. Although I’m trying to make all this as easy to use as possible, this step might be problematic for less technical users.

Using the network

After the node is installed and running, you log into it with a web browser, and assign an Administrator password. You can’t do anything else with it until this is done.

Then, as the Administrator, you can create an account on the network.

From the login screen, you can see the Administrator account, and all other users of this home node, including the one you just set up. Now log in using this account.

Logged in as a user, you can set basic bio information from your account settings, and set an avatar picture.

You can search for other users, and follow them. You can see who is following you.

You can post private messages to those you follow, and who also follow you. Such private messages are sent directly to the follower’s node, if it is online. Delivery will be retried if the follower is offline.
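For the technically curious, the retry behaviour could be pictured as an outbox that is flushed periodically. The Scala below is a minimal sketch under assumptions: `OutboxSketch`, `PrivateMessage`, `flush` and the `send` function are hypothetical names, and `send` stands in for whatever transport the node actually uses.

```scala
// Hypothetical sketch of retrying private message delivery.
object OutboxSketch {
  final case class PrivateMessage(to: String, text: String)

  // Attempt to deliver every pending message once.
  // `send` is an assumed transport function returning true on successful delivery.
  // The result is the set of messages still undelivered, to be retried later.
  def flush(pending: Vector[PrivateMessage],
            send: PrivateMessage => Boolean): Vector[PrivateMessage] =
    pending.filterNot(send)
}
```

Calling `flush` again later with the returned messages gives the retry; how often to retry, and when to give up, are left open here.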

You can post a new public message. This is public to your followers, and will be replicated to a small set of peer nodes that hold your posts, so that your messages remain available if your home node is offline or uncontactable. (These peers form your message replica set.)

Messages are short – perhaps a little longer than 140 characters though.

Your timeline view will show the posts of those you follow, sorted chronologically. The timeline will show 72 hours of messages; messages older than this disappear. The message expiry time exists since the system relies on the goodwill of its users, hosting message storage – I don’t want to eat your entire disk! Messages replicated to other peers also expire after 72 hours. Maybe more than 72 hours is needed – but there should be some finite expiry.
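As an illustration of the expiry rule, here is a minimal Scala sketch that filters and sorts a timeline. The `TimelineSketch` object, the `Message` shape and the oldest-first ordering are assumptions made for the example; only the 72-hour figure comes from the description above.

```scala
import java.time.{Duration, Instant}

// Hypothetical sketch: the real domain model may look quite different.
object TimelineSketch {
  final case class Message(author: String, text: String, postedAt: Instant)

  // 72 hours, as described above; this may well become a larger value.
  val Expiry: Duration = Duration.ofHours(72)

  // Drop expired messages, then sort the rest chronologically (oldest first, by assumption).
  def timeline(messages: Seq[Message], now: Instant): Seq[Message] = {
    val cutoff = now.minus(Expiry)
    messages
      .filter(m => m.postedAt.isAfter(cutoff))
      .sortBy(_.postedAt.toEpochMilli)
  }
}
```

Passing `now` in as a parameter, rather than reading the clock inside the function, keeps the expiry logic easy to test.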

It’s likely that in building the timeline, there may be replicas that are not online or contactable. You may be viewing part of today’s timeline when these replicas come online, making yesterday’s messages available, so there will need to be some visual indication that there are some older posts from yesterday that you could now read.

A selected message in your timeline can be replied to. Your client would show any replies to your messages.

How the network is built

When you post a message, it is quickly replicated to your replica set (if possible), to improve your messages’ availability to your followers. The size of your replica set depends on the number of followers you have: a celebrity may have thousands of followers or more, so there need to be many replicas available to serve their messages. For new users with few followers, fewer replicas are needed, but there will always be a small number.
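One way to picture the sizing rule is a function from follower count to replica count. The Scala below is only a sketch: the constants `MinReplicas` and `FollowersPerReplica`, and the linear growth, are invented for illustration, since no actual numbers or formula have been decided.

```scala
// Hypothetical sketch: the constants and the linear rule are assumptions.
object ReplicaSizing {
  val MinReplicas = 3            // "there will always be a small number"
  val FollowersPerReplica = 500  // one extra replica per this many followers

  // The minimum number of replicas, plus one more for each full block of followers.
  def replicaCount(followers: Int): Int = {
    require(followers >= 0, "followers must be non-negative")
    MinReplicas + followers / FollowersPerReplica
  }
}
```

With these made-up constants, a new user gets the minimum of 3 replicas, while an account with 5000 followers would get 13. A real rule might grow more slowly, or cap the total.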

Your timeline is built by your home node contacting, for each person you follow, one of the peers that replicate their messages. By contacting a replica of their messages, you will receive a read-only copy of them.

User information and the graph of followers will be stored in the directory: a replicated set of highly-available, well-connected peers. The directory also records the set of peers that replicate your messages, and which peers are your home nodes.

Search will prove difficult; there may be a need to send all posts to an indexing service, again on a set of high-availability peers, from where search can be effected. This could also be used to provide a “firehose”.