The day started with presentations, and at first I was interested in many of them. The spreadsheet is still online, but I put the following projects on my shortlist:
- Better blood results
- British English medical spelling dictionary
- CAMHS Inpatient Bed Finder
- Daily pollute
- Rota Manager
- Dockerised integration engine
After walking around and talking to different people in the room, I ended up sitting with Mike and Tony, who work in the NHS at King's College Hospital as developers and were behind the "Dockerised integration engine" project, and Piete, who has some experience with system architecture, and Docker in particular.
After a bit of discussion, it turned out that the main problem Mike and Tony had was that they were using tools, both for the framework and the deployment processes, that they did not have much experience with, and this manifested itself as one large "monolith", which required restarting whenever any component was changed.
At this point, I left Piete, Mike and Tony to work on just setting up a smaller isolated service, using Docker and NodeJS, which Tony already knew a bit about, but was interested in getting more experience in.
Now it was lunchtime, and I ended up sitting with Calum (who I have met before at Many to Many) and Devon, who had proposed the "British English medical spelling dictionary" project earlier in the morning.
We decided to have a go at this project, and set about working out the goals, and what work would be required to achieve them. I ended up doing some research into generating a wordlist (which we ended up not doing), the initial tweaks to the website style, and attempting to create an extension for LibreOffice which would install a dictionary (which was not finished).
As is sometimes the case with these events, the real value was found in the conversations: discussing the problems that Mike and Tony face day to day, and talking with Devon, Calum and others, both at the event and on Saturday evening in the pub, on diverse topics of software (including free software, Guix and Debian), and with Matt about some of his work and background.
All in all, I'm glad I took the time to attend.
There was a brief gap in use, but then I set it up again in October, driven by the need to monitor machine resources and the lengths of the queues (Thread use django-lightweight-queue). This proved very useful: when a problem arose, you could look at the queue and machine stats, which helped greatly in determining the correct course of action.
When using Prometheus, the server scrapes metrics from services that expose them (e.g. the prometheus-node-exporter). This is a common pattern, and I had already thrown together an exporter for django-lightweight-queue (that simply got the data out of Redis), so as new and interesting problems occurred, I began looking at how Prometheus could be used to provide better visibility.
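The shape of such an exporter can be sketched with just the standard library: render the queue lengths in the Prometheus text exposition format, and serve them over HTTP. The metric name, queue names, and the stubbed data source below are all invented for illustration; a real exporter would read the list lengths out of Redis (e.g. with `llen`), and would more likely use the prometheus_client library.

```python
# Sketch of a queue-length exporter in the Prometheus text exposition
# format, using only the standard library. The metric and queue names
# are hypothetical, and the data source is stubbed.
from http.server import BaseHTTPRequestHandler, HTTPServer


def get_queue_lengths():
    # A real exporter would fetch these from Redis, e.g.
    # redis.Redis().llen(queue_key); here they are hard-coded.
    return {"emails": 42, "thumbnails": 3}


def render_metrics(lengths):
    # Render one gauge per queue in the Prometheus text format.
    lines = [
        "# HELP dlq_queue_length Number of jobs waiting in each queue.",
        "# TYPE dlq_queue_length gauge",
    ]
    for queue, length in sorted(lengths.items()):
        lines.append('dlq_queue_length{queue="%s"} %d' % (queue, length))
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = render_metrics(get_queue_lengths()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.end_headers()
        self.wfile.write(body)


# To actually serve metrics for scraping:
# HTTPServer(("", 9200), MetricsHandler).serve_forever()
```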
The first issue that I addressed was a fun one with exhausting the PGBouncer connection pool. The first time this happened, it was only noticed because emails were being sent really slowly, and eventually it was determined that this was due to workers waiting for a database connection. PGBouncer does expose metrics, but having them in Prometheus makes them far more accessible, so I wrote a Prometheus exporter for PGBouncer.
The data from this is displayed on relevant dashboards in Grafana, and has helped more than once to quickly solve issues. Recently, the prometheus-pgbouncer-exporter was accepted into Debian, which will hopefully make it easy for others to install and use.
With the success of the PGBouncer exporter, I recently started working on another exporter, this time for HAProxy. Now, there is already a HAProxy exporter, but the metrics which I wanted (per-HTTP-request-path rates, per-HTTP-request-path response duration histograms, ...) are not something that it offers (as it just exposes the metrics on the status page). These are something that you can get from the HAProxy logs, and there are existing libraries to parse these, which made it easier to put together an exporter.
It was using the data from the HAProxy log exporter that I began to get a better grasp on the power of aggregating metrics. The HAProxy log exporter exports a metric haproxy_log_requests_total, which can have a number of labels (status_code, frontend_name, backend_name, server_name, http_request_path, http_request_method, client_ip, client_port). Say you enable the status_code, server_name and http_request_method labels; then, if you want to get a rate of requests per status code (e.g. to check the rate of HTTP 500 responses), you just run:
sum(rate(haproxy_log_requests_total[1m])) by (status_code)
Perhaps you want to compare the performance of two servers for the different request paths, you would run:
sum(rate(haproxy_log_requests_total[1m])) by (http_request_path, server_name)
And everything you can do with a simple counter, you can also do with histograms for response duration. So say you want to know how a particular request path is being handled over a set of servers, you can run:
histogram_quantile(0.95, sum(rate(haproxy_log_response_processing_milliseconds_bucket[20m])) by (http_request_path, le, server_name))
This last query is aggregating the cumulative histograms exported for each set of label values, allowing very flexible views on the response processing duration.
At the moment, both exporters are running fine. The PGBouncer exporter is in Debian, and I am planning to do the same with the HAProxy log exporter (however, this will take a little longer as there are missing dependencies).
The next thing I am interested in exploring in Prometheus is its capability to make metrics available for automated alerting.
On Monday I started working at Thread, a 3-year-old startup that has set out to reinvent how the world buys clothes.
On arrival, I began setting my office machine up with Debian, and left it cloning the rather large git repository while I and the rest of the company went out to a nearby pub for lunch. By the end of the day I had my office machine set up, my name on the website, and had begun working on a small feature for the order management part of the site.
On Tuesday, work came to a halt at 11:30. Everyone set off to London Fields for the picnic in celebration of Thread's 3rd birthday.
Wednesday was actually normal as far as I can remember, Thursday featured an office movie night, and today (Friday) I published my first contribution to the site, along with enjoying my first office lunch.
Thread use a great set of technologies, Python, Django, PostgreSQL and Debian. I have learned loads in just my first week, and I can't wait to get stuck in over the next few weeks.
My last event in Southampton last week was the Maptime Southampton June meetup. This was a joint event organised by Charlie (who regularly organises Maptime Southampton) and Rebecca Kinge who I believe runs Dangerous Ideas Southampton.
The event, Mapping Real Treasure, featured some introductions from Charlie and Rebecca, and then several small talks from various interesting people, and myself.
Other talks included a map of fruit trees, Placebook, some work by the University of Southampton and SUSU relating to students and local businesses from Julia Kendal, Chris Gutteridge's recent Minecraft/OpenStreetMap/Open Data project, and some very cool OpenStreetMap jigsaw pieces from Rebecca's husband (whose name I cannot remember/find).
The slides (git repository) for my talk are available. The aim was to give a brief introduction to what OpenStreetMap is, particularly mentioning interesting things like the Humanitarian OpenStreetMap Team.
I was not quite expecting to be presenting to such a large (~50 people!) and varied audience (in age and gender). In hindsight, I should probably have done a better job of selling OSM, rather than the talk I gave, which was more technical in nature. I ended up talking more about the nature of OSM as a digital map, consisting of data, and skipping over the slides I had on editing OSM. I did, however, demo using iD at the end of the presentation (although I should perhaps have made this a bigger part).
Towards the end of the presentation, I discussed the legal side of OSM, in terms of the copyright of the data, and the licensing. Again, I am unsure if I approached this issue correctly; I think I should probably have given examples of what you can do with OSM, and then related this back to the licence and copyright.
I should probably also mention the Maptime May meetup, where I ran a smaller workshop on OpenStreetMap and the Humanitarian OpenStreetMap Team. For this I wrote two presentations, one for OSM and the other specifically for HOT. The shorter presentation I gave recently was adapted from these two presentations.
Take any software project: on its own, it's probably not very useful. First of all, you probably need a compiler or interpreter, something to directly run the software, or to convert the source form (the preferred form for editing) into a form which can be run by the computer.
In addition to this compiler or interpreter, it's very unusual to have software which does not use other software projects. This might require the availability of these other projects when compiling, or just at runtime.
So say you write some software: the other bits of software that your users must have to build it (to generate the useful form of the software from the source form) are called build dependencies. Any bits of software that are required when your software is run are called runtime dependencies.
This complexity can make trying to use software a bit difficult... You find some software on the web, it sounds good, so you download it. First of all, you need to satisfy all the build dependencies, and their dependencies, and so on... If you manage to make it this far, you can then actually compile/run the software. After this, you then need to install all the runtime dependencies, and their dependencies, and so on... before you can run the software.
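The "and their dependencies, and so on" step above is just a transitive closure over a dependency graph, which can be sketched in a few lines (the package names here are invented):

```python
# Toy transitive closure over a dependency graph: given each package's
# direct dependencies, compute everything that must also be present.
# All package names are invented for illustration.
def all_dependencies(package, direct_deps):
    seen = set()
    stack = [package]
    while stack:
        current = stack.pop()
        for dep in direct_deps.get(current, []):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


deps = {
    "myapp": ["libweb", "libjson"],
    "libweb": ["libssl", "libjson"],
    "libssl": ["libc"],
    "libjson": ["libc"],
}
# all_dependencies("myapp", deps) == {"libweb", "libjson", "libssl", "libc"}
```

Note how adding the single direct dependency libweb pulls in libssl and libc indirectly, which is exactly how one direct dependency can add many more indirect ones.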
This is a rather off-putting situation. Making modular software is good practice, but even adding one direct dependency can add many more indirect dependencies.
Now there are systems to help with this, but unfortunately I don't think there is yet a perfect, or even good approach. The above description may make this seem easy to manage, but many of the systems around fall short.
"Software package", or just "package" for short, is a term describing some software (normally a single software project), in some form (source, binary, or perhaps both), along with some metadata (information about the software, e.g. its version or contributors).
Packages are the key component of the (poor) solutions discussed below to the problem of distributing, and using software.
Debian, "The universal operating system" uses packages (*.deb's). Debian packages are written as source packages, that can be built to create binary packages (one source package can make many binary packages). Debian packages are primarily distributed as binary packages (which means that the user does not have to install the build dependencies, or spend time building the package).
Packaging the operating system from the bottom up has its advantages: it means that Debian can attempt to solve complex issues like bootstrapping (building all packages from scratch) and reproducible builds (making sure the build process works exactly the same even when the time, system name, or other irrelevant things differ).
Using Debian's packages does have some disadvantages. They only work if you are installing the package into the operating system, which is quite a big deal, especially if you are not the owner of the system you are using. You can also only install one version of a Debian package on a system. This means that for some software projects, there are different packages for different versions (normally different major versions) of the software.
npm, on the other hand, has no concept of source packages, which makes it difficult to ensure that the software you are using is secure, and that it does what it says it will. It is also limited in scope (although this is not necessarily bad).
I feel that there must be some middle ground between these two situations. Maybe involving, one, two, or more separate or interconnected bits of software that together can provide all the desirable properties.
I think that language-specific package managers are currently only good for development; when it comes to deployment, you often need something that can manage more of the system.
Also, language-specific package managers do not account for dependencies that cross language boundaries. This means that you cannot really reason about reproducible builds, or bootstrapping, with a language-specific package manager.
On the other end of the scale, Debian binary packages are effectively just archives that you unpack into the root directory. They assume absolute and relative paths, which makes them unsuitable for installing elsewhere (e.g. in a user's home directory). This means that it is not possible to use them if you do not have root access on the system.
All is not yet lost...
There are some signs of light in the darkness. Debian's reproducible builds initiative is progressing well. In the Debian way, this has ramifications for everyone, as an effort will be made to include any changes made in Debian, in the software projects themselves.
I am also hearing more and more about package managers that seem to be in roughly the right spot. Nix and Guix (although I have used neither) both sound enticing, promising "atomic upgrades and rollbacks, side-by-side installation of multiple versions of a package, multi-user package management and easy setup of build environments" (from the Nix homepage). Although with great power comes great responsibility: performing security updates in Debian would probably be more complex if there could be multiple installations, of perhaps multiple versions, of an insecure piece of software on a system.
Perhaps some semantic web technologies can play a part. URIs could prove useful as unique identifiers for software, and software versions. Basic package descriptions could be written in RDF, using URIs, allowing these to be used by multiple packaging systems (the ability to have sameAs properties in RDF might be useful).
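A rough sketch of that idea, using plain tuples for triples rather than a real RDF library: descriptions are keyed by URI, and a sameAs mapping folds together the identifiers used by different packaging systems. Every URI and property name here is invented.

```python
# Sketch: package descriptions as RDF-style (subject, predicate, object)
# triples, with a sameAs mapping used to merge descriptions of the same
# package published under different URIs. All URIs are invented.
def merge_same_as(triples, same_as):
    # Rewrite each subject through the sameAs mapping, so descriptions
    # of the same package end up under one canonical URI.
    def canonical(uri):
        return same_as.get(uri, uri)

    merged = {}
    for subject, predicate, obj in triples:
        merged.setdefault(canonical(subject), {})[predicate] = obj
    return merged


triples = [
    ("http://deb.example/pkg/requests", "version", "2.4.3"),
    ("http://pypi.example/requests", "homepage", "http://example.org/requests"),
]
same_as = {"http://pypi.example/requests": "http://deb.example/pkg/requests"}
# merge_same_as(triples, same_as) collects both facts under the Debian URI.
```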
At the moment, I am working on Debian packages. I depend on these for most of my computers. Unfortunately, for some of the software projects I write, it is not really possible to just depend on Debian packages. For some I have managed to get by with git submodules, for others I have entered the insane world of shell scripts which just download the dependencies off the web, sometimes also using Bower and Grunt.
Needless to say I am always on the look out for ways to improve this situation.
I can't quite remember how I found out about it, but I ended up attending the 2nd FLOSS (Free/Libre and Open Source Software) for Peer to Peer workshop, and I am very glad that I made the effort.
Things of particular note:
The variety of people in the room made it a very interesting event, I have found out about projects I had not heard of, and found out more about some projects which I already knew about.
The first item on the agenda was a "participatory dynamic with open questions for debate" aka pose controversial/unspecific questions to the room, and have people just stand around and talk. While this did get people talking, the questions were not specific enough to be useful and any division in the room was mostly due to interpretation of the question.
The second day started off with some very interesting lightning talks. There were also some more in-depth talks after lunch, and the day finished off with several tutorials (run in parallel).
More Interesting Links
In particular, Cozy caught my eye. I am usually critical of projects that appear to duplicate the functionality of others, and in this case, Cozy is very outwardly similar to ownCloud (which I currently use). However, internally Cozy looks to be built on better technologies (CouchDB and NodeJS rather than PHP and *SQL). There would probably be more key differences, if I understood the architecture of both projects better.
I was also very interested in the remotestorage protocol. This might fit in well with a web application I have been developing for the University of Southampton.
Bugs Everywhere is a “distributed bugtracker”, designed to complement distributed revision control systems.
Bugs Everywhere is packaged for Debian: install the bugs-everywhere package. However, this does not seem to contain the interactive web interface, so you might also want to clone the repository.
git clone git://gitorious.org/be/be
Adding a bug
When you add a bug, either by the web interface, or the command line, you will get a new file like this.
The first UUID (bea...) is the bug directory UUID. You can have multiple directories in the .be directory, perhaps one for bugs, and one for planning. However, I am not sure how to use this correctly, and there seems to be a bug in the be new command when I tried to specify a bug directory explicitly.
Adding a comment
This creates two new files, values, which contains a JSON object describing the comment, and body, which contains the text for the comment.
There are at least a couple of web interfaces for bugs everywhere.
The first and simplest can be accessed by using the be html command. With no options, this will start serving up pages that look like this.
Cherry Flavoured Bugs Everywhere (cfbe)
I had a dive in to the rather absent world of distributed issue/bug tracking systems on Saturday, and they are still on my mind...
This was triggered by discussions around the impending shutdown of Gitorious.
I started by improving the dist-bugs wiki, and have narrowed down the ones that interest me to four:
I am going to give each a go, one by one, and will hopefully post my thoughts online over the next few weeks.
My current dream of a distributed bug/issue tracking system has plain-text, editable YAML files, with a web server and a command-line utility for editing. Hopefully I will find that one of the tools above is like this, or just good enough/better...
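As a sketch of what that dream might look like (one editable file per issue, trackable in git), here is a minimal version; it uses JSON from the standard library in place of YAML, and the field names are invented:

```python
# Sketch of a plain-text, one-file-per-issue tracker, along the lines
# of the "dream" above. JSON (stdlib) stands in for YAML, and the
# field names are invented.
import json
import os
import uuid


def new_issue(directory, title, status="open"):
    # Write one issue as one editable plain-text file.
    issue = {"id": str(uuid.uuid4()), "title": title, "status": status}
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, issue["id"] + ".json")
    with open(path, "w") as f:
        json.dump(issue, f, indent=2, sort_keys=True)
    return issue


def list_issues(directory):
    # Read every issue file back; this is all a CLI or web UI would need.
    issues = []
    for name in sorted(os.listdir(directory)):
        if name.endswith(".json"):
            with open(os.path.join(directory, name)) as f:
                issues.append(json.load(f))
    return issues
```

Because each issue is a separate file, merging branches that each added issues would rarely conflict, which is what makes the plain-text-in-git approach attractive for distributed tracking.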
All in all, I think the material was well received. I was quite glad at how much people deviated from the examples that I gave, as that really showed that there was understanding.
I recently spent a week in the Lake District. This was a nice change of pace after finishing exams.
In contrast to Milton Keynes, or Southampton, there was quite a large amount of wildlife to be found immediately outside the window. I saw a great number of birds, who came to the feeders, and the seed that the birds dropped attracted two ducks, some mice (or similar small mammals) and some rabbits (including some very young rabbits).
While not doing any of the above, I worked on ikiwiki, mostly because it was the easiest thing to work on without an internet connection.
I would like to put photos up on this website, using git annex. To do this, there were two obstacles to overcome.
Firstly, there was the album plugin for ikiwiki: the absolute positioning of the controls meant that it did not work too well with the layout of this site. It was also a bit awkward to use, as the templates, CSS and code are not included within ikiwiki.
To address this, I started using the album4 branch from Simon McVittie, rebased this on to the current master branch, and attempted to improve the CSS. I believe it now works a bit better, as there is no absolute positioning.
You can see these changes in the album branch, and I also created a simple example site to demonstrate the plugin.
The second hurdle was getting ikiwiki working with git annex. The main issue here is how ikiwiki and git annex handle symlinks: ikiwiki, for security reasons, ignores symlinks, while git annex, when operating normally (not in direct mode), uses symlinks, as those are what is committed to git.
There have been some attempts to do this previously, some using git annex in direct mode, and some using underlays to get the content across. I opted for a different approach: modifying ikiwiki to follow symlinks (or rather, making this configurable). While this has some security implications, for the use on this site it works well.
The album for this week is now up, and is the reason why this post has taken so long to publish, as I kept encountering issues with ikiwiki while getting the album up.