November 9, 2008

Using BBC Backstage Weather

Tags: Projects, Technical, Weather

Recently, for a little project, I wanted to get weather reports and, being in London, my first thought was to use BBC Weather. A little searching showed that the BBC provides a number of RSS feeds for its data (news, weather, etc.) as part of the Backstage project. Details of the weather feeds are here. This post gives some of the tips and tricks I discovered while using these feeds.

The feeds are split into a number of categories: there are global and UK feeds, and forecasts and observations. The details of the feeds are here. The world feed includes many UK cities too. For example, the world forecast feed for London, UK is d/0008.xml. Note the last part of the URL: the number is the BBC location ID. There are different IDs for the world and UK feeds; London is 1769 on the UK feed. As far as I can tell, these IDs do not match any other location ID system; they are certainly not the same as the or location IDs.

There also does not seem to be an official list of IDs, but here is a CSV of the world locations I managed to generate earlier. Looking at the list, you will see that large parts of the number line are empty; for example, there are no entries between 525 and 999. Some of these missing entries come back as blank, but most come back as duplicates of previously seen locations. Aberporth, for example, is both location ID 380 and 7510. For me, the missing entries are mainly duplicates of London (0008), but I think this may be related to having London set as my “home” city on the BBC website.
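Once you have some kind of ID-to-name mapping, the duplicates are easy to spot by grouping on the place name. The sketch below is only an illustration: the method names are my own, the sample entries are the ones mentioned in this post, and the four-digit zero padding is an assumption based on London appearing as 0008 in its URL.

```ruby
# Group location IDs by place name and keep only names seen more than once.
def duplicate_locations(names_by_id)
  names_by_id
    .group_by { |_id, name| name }
    .transform_values { |pairs| pairs.map(&:first).sort }
    .select { |_name, ids| ids.size > 1 }
end

# Assumption: world feed IDs are zero-padded to four digits in the URL,
# as London's 0008 suggests. Adjust if your feed URLs differ.
def padded_id(id)
  format('%04d', id)
end

# Sample entries taken from this post; a real run would load the full list.
sample = { 8 => 'London', 380 => 'Aberporth', 7510 => 'Aberporth' }
duplicate_locations(sample).each do |name, ids|
  puts "#{name}: #{ids.join(', ')}"
end
puts padded_id(8)
```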

The observation feed is a little different to what one might hope. It is a point-in-time observation (updated at least twice a day according to the BBC website). However, there seems to be no guarantee of when these updates will occur, so this feed is often out of date and not representative of that day’s weather. The observation feeds also seem far more fragile: the data is often marked as N/A. For most purposes I find the forecast feed more useful than the observations. The current day’s weather is the first day of the forecast.

A forecast feed has the standard RSS XML tags, with one <item> per day’s forecast, starting with the current day and going forward 3 or 5 days. Also, the <image> tag for the channel contains the URL of an image depicting the current day’s weather. Other than that, most of the interesting data is in the <item> elements. Below is a verbatim example of the contents of an <item> for London (take a look at a feed for the other tags):

  <title>Sunday: light showers, Max Temp: 12&#xB0;C (54&#xB0;F), Min Temp: 6&#xB0;C (43&#xB0;F)</title>
  <description>Max Temp: 12&#xB0;C (54&#xB0;F), Min Temp: 6&#xB0;C (43&#xB0;F), Wind Direction: SSW, Wind Speed:   29mph, Visibility: poor, Pressure: 1011mb, Humidity: 52, UV risk: low, Pollution: moderate, Sunrise: 07:07GMT,  Sunset: 16:21GMT</description>
  <guid isPermaLink="false">,(none):/weather/5day/world/0008-1</guid>
  <pubDate>Sun,  9 Nov 2008 08:10:04 +0000</pubDate>
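To get at these tags from code, any XML parser will do. Here is a minimal sketch using REXML, which ships with Ruby; the method name forecast_items and the cut-down sample feed are my own, and a real feed has more tags than shown here.

```ruby
require 'rexml/document'

# Return every <item> in the feed as a hash of its main child tags.
# Missing child tags come back as nil rather than raising.
def forecast_items(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//item').map do |item|
    {
      title: item.elements['title']&.text,
      description: item.elements['description']&.text,
      pub_date: item.elements['pubDate']&.text
    }
  end
end

# A cut-down feed built from the example above (illustrative only).
sample = <<~XML
  <rss version="2.0"><channel>
    <item>
      <title>Sunday: light showers, Max Temp: 12&#xB0;C (54&#xB0;F), Min Temp: 6&#xB0;C (43&#xB0;F)</title>
      <pubDate>Sun,  9 Nov 2008 08:10:04 +0000</pubDate>
    </item>
  </channel></rss>
XML

# The first item is the current day's forecast.
today = forecast_items(sample).first
puts today[:title]
```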

If you need to do something other than just display the weather forecast, you will need to parse the <title> or <description> tags. I don’t personally recommend this as the data is unstructured and liable to change. In particular I have found:

  • extra spaces are sometimes inserted or removed
  • data is often missing (for instance the latitude and longitude tags are present but empty for location IDs 7500 through to 7518)
  • data can often be marked as “NA”, “N/A” or “none”
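One way to cope with these quirks is to clean every value before using it. This is a small sketch rather than a complete solution: the helper name clean_value is my own, and the list of "missing" markers is just the set I have seen, including the “(none)” form that the patterns further down allow for.

```ruby
# Markers the feeds use for missing data (as observed; there may be others).
MISSING_VALUES = ['NA', 'N/A', 'none', '(none)'].freeze

# Trim a raw value, collapse runs of spaces, and map missing markers to nil.
def clean_value(raw)
  return nil if raw.nil?

  value = raw.strip.squeeze(' ')
  return nil if value.empty? || MISSING_VALUES.include?(value)

  value
end

p clean_value('  29mph')
p clean_value('N/A')
```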

Given that disclaimer, the regular expressions below (in Ruby regex format) worked for my task and may be a starting point for you when parsing:

  • <title> uses the format
/([\w -\/]+): ([\w -]+|N\/A|NA|\(none\)), Max Temp: (.*)/m

The first matching group is the day and the second is the forecast description.

  • <description> has the format
/Max Temp:  ([-\d\.]+|N\/A|NA|\(none\))(\w+) \((.+)\), Min Temp: ([-\d\.]+|N\/A|NA|\(none\))(\w+) \((.+)\), Wind Direction:  ([\w -\/\(\)]+), Wind Speed: ([\d\.]*|N\/A|NA|\(none\))mph, Visibility: ([\w -\/]+), Pressure: ([\d\.]+|N\/A|NA| \(none\))m([bB]), Humidity: ([\d\.]+|N\/A|NA|\(none\))(.*)/m

Matching group 1 is the max temperature in Celsius, 4 is the min temperature, 7 is the wind direction, 8 is the wind speed, 9 is the visibility, 10 is the pressure and 12 is the humidity.
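As a sketch of how this might look in use, the code below applies the title pattern as given and, for the description, swaps in a different technique: rather than one long regex, it splits the text into "Key: value" pairs, which copes better with stray spaces. The names parse_description and leading_number are my own, and the approach assumes no value contains a comma followed by a space.

```ruby
TITLE_PATTERN = /([\w -\/]+): ([\w -]+|N\/A|NA|\(none\)), Max Temp: (.*)/m

# Split "Key: value, Key: value" text into a hash, tolerating extra spaces.
# Only the first colon in each field separates key from value, so times
# such as 07:07GMT stay intact.
def parse_description(text)
  text.split(/,\s+/).each_with_object({}) do |field, result|
    key, value = field.split(/:\s*/, 2)
    result[key.strip] = value.to_s.strip
  end
end

# Pull the leading number out of a value such as "1011mb"; nil if absent.
def leading_number(value)
  match = value.to_s.match(/\A-?\d+(?:\.\d+)?/)
  match && match[0].to_f
end

# Strings taken from the example item above (\u00B0 is the degree sign).
title = "Sunday: light showers, Max Temp: 12\u00B0C (54\u00B0F), Min Temp: 6\u00B0C (43\u00B0F)"
description = 'Wind Direction: SSW, Wind Speed:   29mph, Pressure: 1011mb, ' \
              'Sunrise: 07:07GMT,  Sunset: 16:21GMT'

if (match = TITLE_PATTERN.match(title))
  puts "#{match[1]}: #{match[2]}"
end

fields = parse_description(description)
puts fields['Wind Speed']
puts leading_number(fields['Pressure'])
```

Looking fields up by name also means a field being added or reordered in the feed does not break everything after it, which is the main weakness of a single positional pattern.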

One other thing to remember when accessing the BBC weather feeds is that there appears to be an undocumented throttling feature. If you make too many requests, for a short period of time your requests will be rejected. From experimentation, making one request every 3 seconds will not breach the limit and you won’t be rejected. However, making a request every 2 seconds will result in being blocked for a couple of minutes after 1000 or so requests. I suggest using the slowest request rate you can accept.
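A simple way to stay under the limit is to pause between requests. The sketch below is one possible shape for that, not a tested client: each_throttled and its parameters are names I made up, the 3-second default comes from the experiment described above, and the actual HTTP fetch is left to the block you pass in.

```ruby
# Call the block once per location ID, pausing `delay` seconds between calls.
# `sleeper` is injectable so the pacing can be checked without really waiting.
def each_throttled(ids, delay: 3, sleeper: Kernel.method(:sleep))
  ids.each_with_index.map do |id, index|
    sleeper.call(delay) if index.positive? # no pause before the first request
    yield id
  end
end

# Hypothetical usage: fetch_feed stands in for your own HTTP code.
# feeds = each_throttled([8, 380, 7510]) { |id| fetch_feed(id) }
```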

The unstructured nature of the weather data (i.e. the individual pieces of data are not available in their own tags and have to be extracted from the description text) and the somewhat hidden nature of the location IDs (I could not find a discovery service) suggest these feeds are only designed to be used for display. Next time I’ll try the Yahoo or APIs and see if they are any different.

Update: The regex patterns shown above no longer work in all circumstances. The BBC have changed the format of their weather RSS feeds a couple of times since I wrote this blog post. If you wish to parse the feed, a little extra work updating the regexes (perhaps using the given patterns as a base) will be required.