Blog
Picasa.ini files not properly updated
Updated: Thu May 17 12:58:47 2012
Like many people I've been using the wonderful (and free) Picasa to manage my photos. One of the huge benefits of Picasa aside from its fast and friendly user interface is that it doesn't write changes to your images. Instead it stores a record of changes that are made to each original image in a Picasa.ini file in each directory. This means you can make changes to your images in Picasa, such as adjusting the contrast and brightness or cropping (or marking with a star), and you don't need to worry about it overwriting your original images.
In order to keep performance reasonable it stores a cache of these adjusted thumbnails and the changes in your Local Settings directory too.
I discovered a problem with Picasa's method of updating these Picasa.ini files though - if you make changes to a file and then move it to a different folder within Picasa, it doesn't update either the original or the new .ini file. This means the record of the changes stays in the old directory and never reaches the new one. It doesn't seem to matter at first, because Picasa also tracks this information in the Local Settings database.
The problem comes if you lose the database, or (as I did) intentionally delete it. Picasa will happily then trawl back through your pictures directory and rebuild most of the information from these .ini files. Unless you've moved the images to a different folder after editing, in which case your changes will be lost.
The frustrating thing is that this information is still available, Picasa just doesn't know where it is.
To fix this with my photos, I wrote a short Perl script to trawl all the Picasa.ini files to pull data out of them, then write them to folders where this information is missing. There are a couple of caveats with this though:
- If your camera doesn't keep track of file names (they return to 0001.JPG after emptying your memory card) this almost certainly won't work properly.
- If you've made changes to the image in the new folder too, they won't be updated or merged. It will warn you if there's filename duplication though.
- I ran this on a Linux computer. It should work on Windows too, using something like ActivePerl, but I've not tested it.
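For anyone who wants to see the idea without reading Perl, here is a rough Python sketch of the approach described above. It is not the script itself. It assumes each Picasa.ini holds one `[filename]` section per edited image followed by `key=value` lines, and that the file may be called either Picasa.ini or .picasa.ini. The function names are made up for illustration.

```python
import os

# Assumption: Picasa stores per-folder edits under one of these file names
INI_NAMES = ("picasa.ini", ".picasa.ini")


def parse_ini(path):
    """Parse a Picasa.ini file into {section_name: [raw 'key=value' lines]}."""
    sections = {}
    current = None
    with open(path, encoding="utf-8", errors="replace") as handle:
        for raw in handle:
            line = raw.strip()
            if line.startswith("[") and line.endswith("]"):
                current = line[1:-1]
                sections.setdefault(current, [])
            elif line and current is not None:
                sections[current].append(line)
    return sections


def find_missing_entries(root):
    """Return {ini_path: {image_name: lines}} for records left behind in old folders."""
    recorded = {}  # image name -> [(folder, lines)] wherever a record exists
    folders = {}   # folder -> (files present, parsed sections, ini file name)
    for folder, _dirs, files in os.walk(root):
        ini_name = next((f for f in files if f.lower() in INI_NAMES), None)
        sections = parse_ini(os.path.join(folder, ini_name)) if ini_name else {}
        folders[folder] = (set(files), sections, ini_name or "Picasa.ini")
        for name, lines in sections.items():
            if lines:
                recorded.setdefault(name, []).append((folder, lines))

    additions = {}
    for folder, (files, sections, ini_name) in folders.items():
        for name in sorted(files):
            if name in sections or name not in recorded:
                continue
            # Only records whose image is no longer beside them are candidates
            stale = [lines for src, lines in recorded[name]
                     if name not in folders[src][0]]
            if len(stale) == 1:
                target = os.path.join(folder, ini_name)
                additions.setdefault(target, {})[name] = stale[0]
            elif len(stale) > 1:
                print(f"warning: {name} has several orphaned records, skipping")
    return additions


def apply_additions(additions):
    """Append the recovered sections to each target .ini file."""
    for ini_path, entries in additions.items():
        with open(ini_path, "a", encoding="utf-8") as handle:
            for name, lines in entries.items():
                handle.write(f"\n[{name}]\n" + "\n".join(lines) + "\n")
```

Calling `find_missing_entries` on its own gives a dry run, so you can check what would be written before calling `apply_additions`. Because it matches on file name alone, it has the same weakness with cameras that reuse names.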
If you do find it useful, please do let me know, and remember, back up your files before using this script. It works for me, but I make no guarantees that it won't mess up your images.
You can get the script here
To run it simply give it the full path to the directory that contains your photos. e.g.:
./picasa.pl /home/bcc/photos
After running it, you'll need to clear out Picasa's database for it to pick up the changes. Hold down Ctrl-Alt-Shift as Picasa starts and it will ask if you want to do this. You will lose any labels you've applied to images, but if you need this script then that's probably already happened...
Dev8D 2010
Updated: Mon Mar 8 21:31:32 2010
The Event
I hadn't expected to get to go to Dev8D 2010. After the success of our entry in 2009, it was agreed that other people in the department should get the opportunity to go. It came as a pleasant surprise to be invited to join the DevCSI Developer Focus group - intended to help foster a development community based around UK HE, and carrying on the work started at Dev8D 2009. Among other responsibilities, this meant helping with some of the preparation and running of the dev8D 2010 event.
I arrived earlyish, and set up in Base Camp where I started putting together a handful of slides for my lightning talk on list8D. Matt Spence and I had prepared a demo the day before, but I wanted to give a bit of a talk about how the dev8D prototype from last year had turned into a proper funded project, how our management had supported the development, and how agile development had helped us maintain realistic expectations. Most excitingly this would include the first demo of the shiny new theme thanks to some amazing last minute work by Matt.
I had also been roped into taking photos for the event and as more people started to turn up I wandered around getting some pictures.
Lunch was excellent, and with the Linked Data event running at the same time on the first day there were around 500 people in the ULU building.
In the afternoon I had an interesting conversation with a couple of other folks about where I saw cloud computing going. Consensus seemed to be that it's a useful tool where appropriate, but not always the right answer. Services such as content delivery and compute-on-demand are definitely of value, but it's not mature enough for core service provision yet. Feels a bit like virtualisation did 5 years ago - useful but not quite there yet.
I wandered through to the expert zone to prepare for my talk on list8D which for the most part went well. Minor networking issues meant I couldn't completely demo the addition of new items, but it was nice to show off the new theme and the brilliant work put in by Matt and Simon in getting list8D ready for real use.
I also watched the excellent lightning talks by Joss Winn on WordPress, the EPrints guys talking about their challenge, and a demo of OpenGL development on Android.
Wednesday evening was set aside as 'Games Night'. In addition to a collection of the usual and not-so-usual board games, we played Developer Bingo where you had to find other developers who could sign off a specific item on your sheet. These were things like "has been slashdotted", "coded in Fortran" or "is a GNU maintainer" -- based off the signup details and a number of 'likely' other suggestions. This was a brilliant icebreaker, and the prizes of Lego board games were similarly well received, with winners playing them with people they'd only met that evening. Once again, the food was excellent, although the ULU bar could have done with some proper beer.
On Thursday morning (having stayed up finishing my slides later than I probably should have) I gave another lightning talk on Web Security which seemed to go down well. It's a lot of material to cover in 15 minutes and not really in any depth, but the major aim was to give people enough information to go and do some further research themselves. Judging from a couple of conversations I had later on in the day, it seems that at least a couple of dev8Ders will do just that so I consider that a success.
I also watched a brilliant lightning talk by Stephen Johnston on using the Microsoft Azure cloud computing platform to calculate satellite collision probabilities. Very cool stuff, and well suited to the 'compute power on demand' model.
After this I went off to see the RepRap 3D printer which had been set up and was busy printing a coathook. The buzz around this device was amazing - nobody could quite believe this thing was printing physical objects. Adrian Bowyer gave a great talk back in the expert zone on how RepRap came to be, why it was open source and how he hoped it would revolutionise the ability to make things. What's really impressive is that the RepRap device can print about 50% of its own parts, and they're constantly working to improve that percentage. They also encourage the improvement of the design of individual bits and the contribution of those changes back to the central project.
I really can't describe how cool RepRap is, and how much excitement there was at the event - you really got the feeling that RepRap is a game changer in the same way that the internet allowed anyone to publish - this gives people the ability to manufacture. Best of all, it only costs about £300 to build one from scratch, which puts it well within the reach of individuals and communities.
Thursday afternoon meant the Cloud Computing workshop which had Dave Tarrant covering Amazon EC2 and myself talking about Linode. The workshop room was pretty much full for this which only added to the pressure. Dave did a brilliant job going through the basics of EC2 and most people in the room had a working EC2 instance running Apache and MySQL. The Linode demos went pretty well, and I was happy to show off the recovery console and the new StackScripts, and a number of attendees signed up for some of the free instances that Linode had generously provided for the event.
In the evening (entertain yourself evening), despite the horrible rain a few of us went to Ciao Bella for some tasty Italian food, then on to the Jeremy Bentham pub for the Shambrarian meetup which was excellent. Good to find another pub in London that has decent beer on tap and a good whiskey selection.
Friday was finally a day where I could relax a bit, so I spent one session in the genetic algorithms workshop by Richard Jones. This is a novel approach to using multiple generations of virtual creatures to solve problems that are non-trivial to work out through conventional means. Using a set of simple rules and a fitness function, you test each set of 'DNA' against the fitness function, pick the best ones, breed them, then run them again. Over a number of generations, you should end up with a pool of creatures that get better and better at solving the problem.
A great visual example of this is the evolution of Mona Lisa demo.
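To make the pick/breed/repeat loop a bit more concrete, here is a minimal Python sketch. It is not anything from the workshop. The 'creatures' are just lists of bits, the fitness function simply counts the 1s, and the pool size, mutation rate and function names are all arbitrary choices for illustration.

```python
import random


def fitness(dna):
    """Toy fitness function: count the 1 bits (higher is better)."""
    return sum(dna)


def breed(mum, dad, rng, mutation_rate=0.02):
    """Single-point crossover followed by occasional bit-flip mutation."""
    cut = rng.randrange(1, len(mum))
    child = mum[:cut] + dad[cut:]
    return [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]


def evolve(length=32, pool_size=40, generations=60, seed=1):
    """Run the loop and return (best creature, best fitness per generation)."""
    rng = random.Random(seed)
    pool = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pool_size)]
    history = []
    for _ in range(generations):
        pool.sort(key=fitness, reverse=True)        # test every set of 'DNA'
        history.append(fitness(pool[0]))
        parents = pool[: pool_size // 4]            # pick the best quarter
        children = [breed(rng.choice(parents), rng.choice(parents), rng)
                    for _ in range(pool_size - len(parents))]
        pool = parents + children                   # parents survive unchanged
    pool.sort(key=fitness, reverse=True)
    history.append(fitness(pool[0]))
    return pool[0], history


if __name__ == "__main__":
    best, history = evolve()
    print(history[0], "->", history[-1])
```

Because the parents are carried over unchanged, the best score can never go down from one generation to the next. Real problems swap in a more interesting fitness function and encoding, but the loop stays the same shape.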
This was a great introduction to an area I knew nothing about, and although I missed part of the workshop due to helping sort out the nominations for the awards dinner, I really enjoyed getting the chance to play with this alternative approach to solving complicated problems.
I also spent a bit of time on Friday putting together a simple list8D API to LTI bridge, for our entry for the LTI challenge which Steve had noticed would be a perfect fit.
Friday evening was the awards dinner which was a lot of fun - we got to give away some cool awards (best newcomer, best leap-of-faith and best t-shirt were my favourites) and the meal was brilliant. I was taking photos of the presentation of the certificates and while there was a convenient balcony, my flash wasn't really strong enough to reach comfortably which was a shame, since the photos taken from the side of the stage weren't as good as I'd hoped.
On Saturday morning (feeling rather blurry from the very late night) I gave my web security lightning talk again as it had been asked for. Again, a good number of questions and another chat with someone after the talk suggested it was worthwhile.
I spent the rest of Saturday helping to judge some of the entries for the challenges. I was amazed at the number and quality of the submissions. Clearly a lot of work had gone into many of them, even only over a few days.
Finally with the close of Dev8D came the awarding of the bounty/challenge prizes (again, as photographer-monkey, but the light was rather better this time), then heading home, exhausted.
The Good
- Charles Severance gave a great lightning talk on what's wrong with the way programming is taught, and how he'd remixed an existing creative commons book to produce Python for Informatics.
- I got the chance to have some interesting chats with people I wouldn't otherwise have spoken to on various subjects including cloud computing, web security, agile development and Oracle application development. All directly relevant to my job.
- I liked the emphasis on smaller prizes across a range of activities. We won last year, so I am a bit biased, but I feel that it did help to encourage people to come up with something, without it needing to be a massive effort.
- The food was brilliant.
- RepRap was amazing.
- I've come back with some ideas.
The Bad
- Too much going on. I had extra stuff to do, but it seemed almost like there was so much going on you couldn't keep track. I'm still not sure if that's a good thing or not, but a common complaint seems to have been that people wanted to attend lightning talks but couldn't because of workshop clashes.
- It turns out taking photos in an 'official' capacity is hard. I wish I'd had a better flashgun and looking back I've taken a lot fewer fun or arty photographs than I might otherwise have done. All useful experience if I do this sort of thing again, though. I have no idea how wedding photographers cope. F-Spot on a netbook is a lousy substitute for Lightroom on a real PC.
The Shiny
- We learned of the existence of Basic LTI, and managed to get list8D working with Moodle, WebCT and Sakai, all with one bit of code. This entry into the LTI challenge got us second place, and gives us a possible solution to a real problem we have. Once again, Dev8D shows it is worthwhile :)
- I feel I should mention RepRap again, since it was arguably the highlight of Dev8D for me. I can see so many applications for this device...
- Ben O'Steen's MP expenses mashup.
- Too much to mention really.
Finally
Well done, if you've read this far. Here's some stuff that may be of interest:
- My Photos of Dev8D
- Happy Stories - a great idea - a single place to collect the useful output of Dev8D
- My list8D slides.
- My Web Security slides.
- Web Security Links
You may also be interested in joining the DevCSI Developer Contact group.
Thanks
Thanks to Mahendra, David F, the UKOLN events team, and anyone else involved in running Dev8D. It was an amazing event and I had a brilliant time.
Fake Drugs being sold from .ac.uk sites
Updated: Mon Mar 8 14:47:39 2010
At the end of last week BBC News reported that a number of .ac.uk sites are being used to sell counterfeit drugs. I wish I could say this surprised me, but knowing how complicated the issues are in sorting out web security at the university where I work, I can't say it's come as a massive shock.
At a university it is often the case that a department may be responsible for their own web presence - usually someone for whom it is not a priority, and they may know nothing about the technical issues involved. Sometimes a department will have had a third party company supply a site or content management system without realising it needs to be kept up to date. Even where there is a good level of centralised support for web publishing, some departments may do their own thing for historical reasons.
We've been fairly proactive at working with departments and getting our own house in order, but it's certainly been a challenge to have security taken seriously across the institution. While incidents like this are unfortunate, they do have the positive side-effect of raising the profile of these issues, and longer term this can only be a good thing.
Finally I'll share a tip for anyone working in academia. Set up some Google site alerts for the following:
- oxycontin OR levitra OR ambien OR xanax OR paxil OR porn site:example.ac.uk
- texas-holdem OR cialis OR viagra site:example.ac.uk
These will alert you to any new pages that appear on your site with those terms. It's not perfect, but it will alert you to some compromised pages, or even comment spam on wiki pages/blog posts that should be dealt with.
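If you look after more than one domain, the queries above are easy to generate. This is just a small Python sketch that rebuilds the same two searches for whatever domain you pass in. The function name and the grouping of terms are my own choices, and it only builds the search strings, it doesn't talk to Google.

```python
def spam_alert_queries(domain, term_groups):
    """Build one 'term OR term ... site:domain' query per group of terms."""
    return [" OR ".join(terms) + f" site:{domain}" for terms in term_groups]


# The same terms as the two alerts listed above
TERM_GROUPS = [
    ["oxycontin", "levitra", "ambien", "xanax", "paxil", "porn"],
    ["texas-holdem", "cialis", "viagra"],
]

if __name__ == "__main__":
    for query in spam_alert_queries("example.ac.uk", TERM_GROUPS):
        print(query)
```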
Driving 8x8 LED Displays with an Arduino
Updated: Sun Feb 7 22:13:00 2010
After playing around a bit, I moved on to connecting the 8x8 displays. I spent a bit of time thinking about how best to do it, and had come to the conclusion that using a 595 shift register to drive the anodes of each display was the way forward. I'd ordered some ULN2803A darlington transistor arrays which can sink up to 500mA of current. This is more than I was planning to draw through the 24 LEDs that make up a row, so the plan was to connect the cathodes of all the LED matrices to this chip. Again, a 595 shift register controls the 2803, so it means I can directly address each row and column in the same way.

Once I'd got one 8x8 display working, checking that it all worked properly seemed like a plan.
It did, so I moved the current limiting resistors over to the 'y-axis' board, and started building the other 2 display boards.

Each board joins directly onto the next, so the wiring isn't ridiculous on any of them, but there's still an awful lot of extremely fiddly connections to make, and it's used pretty much all of my 150 bits of wire up. I wouldn't plan on doing this again in a hurry...

Once I'd connected the other 2 displays, I changed the display code to push out 2 additional sets of bytes on the X-axis with some slightly different display patterns and we were in business:


Here you can see (from the top) one of the 595 shift registers, the 2803 darlington array and 8 current limiting resistors. These collectively make up the Y-axis board, which controls the cathodes of all the displays. Each of these is operated in turn very quickly, lighting up an entire row. These are scanned quickly enough that the image on the display seems to be complete, thanks to the persistence-of-vision effect.
I had considered driving this 595 off separate pins, but decided not to. This is the first one that is connected, so it keeps the last byte of 4 that is sent out. This has the advantage that the latch of all 4 shift registers is operated at the same time, ensuring there's no lag between changing the column and row data. This would probably not be noticeable, but it would annoy me knowing there was a slight lag :)
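The chaining behaviour is easier to see in a toy model than on a breadboard. This Python sketch is not the Arduino code. It models four daisy-chained registers at the byte level, with hypothetical class and function names, and assumes one high bit in the row byte selects one row.

```python
class ShiftRegisterChain:
    """Byte-level model of daisy-chained 8-bit shift registers with a shared latch."""

    def __init__(self, count):
        self.shift = [0] * count    # internal shift stages, index 0 = first chip
        self.outputs = [0] * count  # latched output pins

    def shift_out(self, byte):
        """Clock one byte into the first chip; older bytes move down the chain."""
        self.shift = [byte & 0xFF] + self.shift[:-1]

    def latch(self):
        """Pulse the shared latch: every chip updates its outputs together."""
        self.outputs = list(self.shift)


def show_row(chain, row_index, column_bytes):
    """Send three column bytes, then the row-select byte, then latch once."""
    for byte in column_bytes:
        chain.shift_out(byte)
    chain.shift_out(1 << row_index)  # sent last, so it ends up in chip 0
    chain.latch()


if __name__ == "__main__":
    chain = ShiftRegisterChain(4)
    show_row(chain, 2, [0xAA, 0xBB, 0xCC])
    print([hex(value) for value in chain.outputs])
```

Nothing appears on the outputs until `latch()` is called, which is why the row and column data change at the same instant.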

Putting all this work together, I still had to make the software driving the display useful, rather than just pushing out hard-coded bitmaps.
I wrote some code to turn a 2d boolean array into a series of bytes for direct output. There's an intermediate stage which updates the cached bytes from the boolean array for performance, so the continuous display scanning/multiplexing isn't slowed down by excessive data shuffling. A quick demo later, and we have something that actually shows off the displays as one single screen:
Finally, I need to run some of the LEDs at a different brightness level. I modified my code to maintain 2 arrays and 2 byte caches. One contains the 'bright' LEDs, and one the 'dim', and these are lit alternately for different periods to create this effect. Again, the refresh rate needs to be fast enough that it's not obvious to the human eye, and that's where I started to run into problems. Switching between the 2 display layers for the 24x8 display, multiplexing the rows, and varying the duty cycle of different LEDs seemed to be getting too much. I couldn't do all that fast enough to keep the refresh rate sufficiently high - the dim LEDs were showing horrible signs of flickering.
It turns out the shiftOut and digitalWrite functions provided by the Arduino software are pretty slow, and this becomes a problem when you're pushing a lot of data. My clever byte caching wasn't actually making a difference since it seems the shiftOut function turns that back into individual bits for output, which I could have done myself without the intermediate layer.
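As a rough illustration of the two ideas above (packing the boolean array into cached bytes, and alternating bright and dim layers for different lengths of time), here is a Python sketch. It is not the Arduino sketch itself. The function names, the most-significant-bit-first packing and the 4:1 slot ratio are assumptions, and it expects the display width to be a multiple of 8.

```python
def pack_rows(grid):
    """Turn a 2D list of booleans (rows x columns) into a list of byte rows."""
    packed = []
    for row in grid:
        row_bytes = []
        for start in range(0, len(row), 8):
            value = 0
            for cell in row[start:start + 8]:
                value = (value << 1) | (1 if cell else 0)  # leftmost pixel = top bit
            row_bytes.append(value)
        packed.append(row_bytes)
    return packed


def frame_schedule(bright_cache, dim_cache, bright_slots=4, dim_slots=1):
    """List the (layer, row_index, column_bytes) steps that make up one refresh.

    The bright layer is scanned bright_slots times and the dim layer
    dim_slots times, so dim LEDs are lit for a smaller share of each refresh.
    """
    steps = []
    for layer, cache, slots in (("bright", bright_cache, bright_slots),
                                ("dim", dim_cache, dim_slots)):
        for _ in range(slots):
            for index, row in enumerate(cache):
                steps.append((layer, index, row))
    return steps


if __name__ == "__main__":
    bright = [[False] * 24 for _ in range(8)]
    dim = [[False] * 24 for _ in range(8)]
    bright[0][0] = True
    dim[7][23] = True
    # Pack once when the picture changes, then reuse the caches on every refresh
    steps = frame_schedule(pack_rows(bright), pack_rows(dim))
    print(len(steps), "row updates per refresh")
```

The point of the caches is that `pack_rows` only runs when the picture changes, while the schedule is replayed continuously. With the defaults above, one refresh means 40 row updates instead of 8, which gives a feel for why output speed starts to matter.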
Fortunately it seems I'm not the only person who's had this problem, and thanks to the extremely clever MartinFick on this forum post, I replaced the shiftOut and digitalWrite calls with shiftRaw and fastWrite. The difference in performance is staggering - I have much more control over the duty cycle again, and there's no sign of flicker.

I think it's fair to say this has been a successful weekend. I've got a reasonably sane bit of code for driving the display with both dim and bright LEDs 'simultaneously', and it's run off a data structure that should be dead easy to implement the game of life on top of. All I'm missing now is an RTC to keep time, and actually porting the code over...
More Simple Arduino Goodness
Updated: Sat Feb 6 22:13:00 2010
After a little more fiddling on Friday night, I had a bunch of LEDs connected to one of the 595 shift registers following one of the Oomlout example circuits.

I added a second shift register chained off the first to run another 8 LEDs, again following an example circuit, but this time from the Earthshine Arduino Guide.

This naturally meant cool lighting effects.
Then I had a go at driving the LEDs at different duty cycles to vary their brightness. This is something I'll need to do with the 8x8 displays, so it seemed like a sensible plan to have a go with a simple circuit. It turns out it's not that hard to do:

Arduino Goodness
Updated: Thu Feb 4 22:13:00 2010
So, my plans to build the Game of Life Clock took a step closer to reality today with the arrival of my order of stuff from Oomlout following the recommendation of a couple of people. Everything turned up within 24 hours of placing the order. Very impressed.

In addition to the 8x8 LED matrices I needed, I bought a new Arduino Duemilanove, since my old NG only has an ATmega8, with 8k of flash. This has been fine for tinkering, but was looking a bit tight for running the game, RTC and matrix driver chips. The Duemilanove has 32k of flash, which is tonnes more than I need.

It also gave me a chance to order the ARDX starter kit, which in addition to the Duemilanove has a bunch of extra stuff to play with. Given the Arduino-heavy nature of some of the dev8D workshops this year, it seemed like it'd be worth having some extra bits to play with.


I've not really done much this evening other than have a play with the first starter kit circuit, and get the latest Arduino software up and running. It is worth noting that the 10mm LED that ships as part of the starter kit is a bit "argh, my eyes".
One thing that did come as a bit of a surprise was not having to hit the reset button to upload a new sketch. That'll take some getting used to.

Next up is driving an 8x8 LED matrix off a pair of 595 shift registers. I'm still torn between using shift registers or the much more sophisticated MAX7219 LED driver. Both have their advantages and disadvantages, so I think the best bet is to have a play and see...
Debenhams Payment Form - Design Gone Wrong
Updated: Wed Jul 1 23:10:26 2009
I just bought a gift for a friend who is getting married shortly, and navigated my way through the Debenhams wedding site, which was fine. I finally got to the payment page though, and felt compelled to rant about it. This is basically the email I sent them...
There are so many issues with this form that I'm not sure where to start, so I'll begin at the top.

1) Horrible JPEG compression on card images and the text around them at the top. There's no ALT text for that image, so a screen reader for the blind wouldn't see that information.
2) "Notified terms and conditions apply" - What does this even mean? I haven't been notified of any T&C at this point. If this is supposed to count for the notification, where are the terms and conditions?
3) "Security Card Number" - What security card? Card Security Number might make more sense. If you're going to use a term that requires explanation, you may as well use one of the standard terms, such as Card Verification Value or Card Security Code. This field doesn't line up with the label text.
4) "Card holders name as it appears on the card:". This should be "Card holder's name". This text is too long, wraps on to the next line and is redundant. Surely most people understand that the name on the card should go here? Failing that, simply "Name as it appears on the card:", or even "Name on card:" - shorter, avoids the grammar pitfall, and reduces repetition.
5) Make the 'if other' title text box bigger, at least to line up with the right edge of the other fields. If you've got to type 'Brigadier General' into the box, it'd be nice if you can see an entire word at a time.
6) The Card number, expiry date, security card number, and address boxes don't line up.
7) Card number (omit spaces): There are no words for how much I hate this behaviour in payment forms. It is such a simple thing to automatically remove spaces when the user clicks 'confirm'. Why not let the user enter the number as they feel comfortable and sort it out for them? If for some inexplicable reason this can't be fixed, at least the wording could be improved; "omit spaces" is a horrible phrase. "Without spaces" would be much more friendly.
9) The gift card image and balance check thing - why is that there? Was it positioned using some sort of 'pin the tail on the donkey' game? It would be much better at the top of the page, before someone is expected to have made a decision about which card to use.
10) I've already mentioned that the fields don't line up vertically, but have the 'expiry date' and 'switch issue' fields and labels been out for a heavy night on the beer, stumbled home and collapsed?
11) "The Debenhams Storecard does not require an expiry date. (Excluding Debenhams Mastercard)" -- Why say this? I assume the store card doesn't actually have an expiry date on it, so there's not one for people to enter? Even if they do manage to enter something, why not just ignore it if you don't need it?
12) Billing address - why have a big editable box with an explanation as to why you can't edit it? Why not put the 'find address' button (which clearly doesn't need all that explanatory text) where the textarea is, and make that not look like it's an editable field.
13) Why have a 'confirm' button at top and bottom? I could understand having one after the 'select a card' bit, and another at the bottom of the 'new card' form, but the positioning at the top looks really odd. Why are the confirm buttons outside the frame around the form?
14) It is impossible to operate this form by keyboard only - you can't trigger the 'find address' button or confirm order buttons without using a mouse.
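To show how little work point 7 would take, here is a minimal Python sketch of tidying up a card number on the server side. I have no idea what the site actually runs on. The function name is made up, and it deliberately does nothing beyond stripping spaces and dashes and checking that only digits are left.

```python
import re


def normalise_card_number(raw):
    """Strip spaces and dashes; return the digits, or None if anything else remains."""
    cleaned = re.sub(r"[\s-]", "", raw)
    return cleaned if re.fullmatch(r"[0-9]+", cleaned) else None


if __name__ == "__main__":
    print(normalise_card_number("4111 1111 1111 1111"))
```

The user types the number however they like, and the form sorts it out when they click 'confirm'.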
I work for a large organisation, so I know how things like this evolve over time with the input of various people. When things change gradually over time, the decline in customer experience is often overlooked until someone points it out.
In this case, I think this form provides a pretty bad user experience, and that will not encourage people to shop online with Debenhams.
London to Brighton Bike Ride 2009
Updated: Tue Jun 30 23:25:23 2009
On Sunday the 21st of June 2009 I took part in the annual London to Brighton Bike Ride for the British Heart Foundation.
We'd stayed the night with a family member who lives in Mitcham, around 4 miles from the start. I ate breakfast, showered, dressed and left at 6:45 for my 7:30 start, arriving with plenty of time to spare at 7:15. As the route passes the end of the road we were staying in, I followed the stream of cyclists in reverse but needed to hit the pavement in a few places to avoid the completely closed roads.
Unsurprisingly, there were lots of cyclists on Clapham Common, with the 7:00 starters still leaving at 7:30. I got through the start gate (and had my card stamped) at 7:45.
It took a full hour to cover the 4 miles back to Mitcham, riding past the road I'd started from, due to lots of stop/start for traffic and cyclist-related congestion. After a relatively uneventful ride, we eventually made it up the hill to Woodmansterne (where I got married), 12 miles in to the actual ride, at 9:30, where I met my wife, brother in law and mother in law. Refilled water, ate some food.
Getting back on the bike, there was a nice fast run down Rectory Lane before dismounting and walking up How Lane, again due to congestion. I stopped off at rest stop D 20 miles in for a bacon and sausage sandwich, and to rest my legs briefly.
I left stop D after a decent rest expecting to gently spin my way up Church Lane, only to discover massive congestion. Not even a walking pace - a few steps at a time, taking 30 minutes or so to cover maybe 1/4 of a mile. Turns out the delay was due to letting cars past so cyclists could cross the A25 road a few at a time. Eventually we got past and carried on through a reasonably flat section of the route. I was starting to run low on water about 27 miles in, so called in at stop F for some water. This turned out to be a mistake, as Burstow scouts were insisting on a minimum donation of 20p for a refill of tap water. It even tasted odd, the little gits.
I moved on quickly after getting the water with the intention of my next stop being Turners Hill. Pretty good and mostly uneventful run up to Turners Hill although I walked up part of the hill at this point as my legs were starting to get tired. I stopped again briefly for water which was being handed out by the extremely energetic kids from the local church, and decided to pass on this extremely busy rest stop.
It's a nice fast run down to Ardingly, where I made a proper stop. This turned out to be the right decision as it's a nice location for a rest, with a good BBQ, and decent cups of tea on offer from Ardingly Scouts. I was definitely starting to feel the tiredness at this point, so the rest was welcome.
I was expecting a nice easy run to the bottom of Ditchling Beacon at this point, but the route profile we were given lies a fair bit. Lindfield was pretty, but the extremely long hill up through Haywards Heath is unpleasant and extremely draining. I gently spun my way up this trying not to wear my legs out.
I stopped at Wivelsfield for more water and a brief rest, then on to the bottom of Ditchling Beacon. Again as you approach the bottom of Ditchling Beacon there are a few miles that are surprisingly lumpy and gently uphill, which doesn't help. I stopped at the last stop before the beacon for more water, a hotdog, a banana and a rest before tackling The Hill.
Ditchling Beacon, at 700 feet of climb in just over a mile, was exactly as hard as I'd heard. While I suspect I could (slowly) cycle my way up if I were fresh, after riding over 50 miles I had almost nothing left in my legs. I walked up slowly, just like most other people.
Eventually made it to the rest stop at the top, where I stopped for a quick cup of tea, a banana and to appreciate the amazing view before the run down the hill. Leaving the top of the beacon there's a reasonably gentle downhill at first that gets steeper. Finally, you come round a corner for the big descent. I set a new personal speed record of 42.9 mph at this point, and that was going 'slowly' on the brakes due to the 'slow down' warning signs. I could have gone significantly faster given how quiet the road was at that point -- I almost wish I had.
The last 3-4 miles through Brighton are fairly frustrating with a lot of stop/start for traffic, especially as you know that you're so close, but it is thankfully all flat. Finally I made it to Madeira Drive on the sea front, the finish line in sight. It was an amazing feeling crossing it after so much effort. I got my card stamped and collected my medal, grinning like an idiot.
Then I realised I had about 20 minutes to get to the coach back and I had no idea where it was. I asked one of the marshals who directed me back past the 2 piers and on to Hove sea front. About a mile away was what the paperwork said - in reality closer to 3 which I could have done without, especially as it involved navigating round hordes of pedestrians and tired cyclists. I made it with a few minutes to spare, loaded my bike onto the lorry, and collapsed exhausted on the coach, a total of 61 miles down. I'm a bit annoyed that I didn't have enough spare time to make it a metric century.
It was extremely hard (for me, anyway), but an amazing amount of fun. I'm also extremely pleased to say that so far I've raised over £700 for the BHF.
- My photos of the event
- This site has a load of info about the London-Brighton route, including the elevation profile.
Debian Lenny Madwifi and Hostapd stability issues
Updated: Sat Mar 28 17:40:22 2009
Hopefully this saves somebody some time. I upgraded my router (a mini-itx Intel Atom box running Debian) to the new Lenny release, which was surprisingly painless. I updated madwifi from the old 0.9.2 version to the 0.9.4 current release included with Lenny.
After this upgrade, I found that the wireless would stop working after 30-90 minutes. This often (but not always) coincided with a couple of error messages often repeated every couple of seconds:
kernel: wifi0: ath_rxorn_tasklet: Receive FIFO overrun; resetting.
kernel: wifi0: ath_bstuck_tasklet: Stuck beacon; resetting (beacon miss count: 11)
These would be matched by a load of messages from hostapd repeated, during which no wireless client would stay connected:
hostapd: ath0: STA 00:13:ce:76:d2:2e IEEE 802.11: associated
hostapd: ath0: STA 00:13:ce:76:d2:2e IEEE 802.11: deauthenticated due to local deauth request
hostapd: ath0: STA 00:13:ce:76:d2:2e IEEE 802.11: disassociated
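If you want to check your own logs for the same associate/deauthenticate loop, something like the following Python sketch would do it. It assumes syslog lines shaped like the three above. The function names and the threshold of 3 are just my own choices for illustration, nothing to do with hostapd itself.

```python
import re
from collections import Counter

# Matches hostapd station events such as:
#   hostapd: ath0: STA 00:13:ce:76:d2:2e IEEE 802.11: associated
STA_EVENT = re.compile(
    r"hostapd: (\S+): STA ([0-9a-fA-F:]{17}) IEEE 802\.11: (.+)$"
)


def count_station_events(lines):
    """Count (mac, event) pairs found in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = STA_EVENT.search(line.strip())
        if match:
            _iface, mac, event = match.groups()
            counts[(mac.lower(), event)] += 1
    return counts


def flapping_stations(lines, min_deauths=3):
    """Return MACs deauthenticated at least min_deauths times (arbitrary threshold)."""
    deauths = Counter()
    for (mac, event), n in count_station_events(lines).items():
        if event.startswith("deauthenticated"):
            deauths[mac] += n
    return sorted(mac for mac, n in deauths.items() if n >= min_deauths)


SAMPLE = [
    "hostapd: ath0: STA 00:13:ce:76:d2:2e IEEE 802.11: associated",
    "hostapd: ath0: STA 00:13:ce:76:d2:2e IEEE 802.11: deauthenticated due to local deauth request",
    "hostapd: ath0: STA 00:13:ce:76:d2:2e IEEE 802.11: disassociated",
]

if __name__ == "__main__":
    # Repeat the sample three times to mimic the loop seen in the logs.
    print(flapping_stations(SAMPLE * 3))
```

Feed it the lines from your syslog file instead of SAMPLE to see which clients are being bounced.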
Reloading hostapd or the madwifi modules would make it work again for another short period.
I tried one of the newest madwifi snapshots, which made the problem worse: many more of the beacon miss count errors, extremely poor throughput and high latency, and after a few minutes the router locked up.
I also tried a number of fixes recommended to clear up the problem, none of which really made any difference:
echo 0 > /sys/devices/system/cpu/cpu1/online
iwpriv ath0 bgscan 0
iwpriv ath0 protmode 0
iwpriv ath0 rssi11a 11
iwpriv ath0 rssi11b 11
iwpriv ath0 rssi11g 11
iwpriv ath0 bintval 500
iwpriv ath0 mode 3
iwpriv ath0 turbo 0
sysctl -w dev.wifi0.diversity=0
sysctl -w dev.wifi0.txantenna=1
sysctl -w dev.wifi0.rxantenna=1
In the end I found a couple of independent suggestions that madwifi 0.9.4rc2 worked ok but no longer compiled against recent kernel versions, and that madwifi patch 3696 would fix that. I applied the patch to 0.9.4rc2, and it built just fine.
Wireless has now been stable for over 24 hours, and I've not seen any more of the missed beacon messages at all, so it looks promising.
I didn't bother attempting to package it 'the Debian way' as I was getting a bit fed up at this point, and madwifi has now been officially removed from Debian as the ath5k driver is going to be used in future. Unfortunately ath5k doesn't yet work in access point mode, but that is apparently coming soon.
I've archived the files here, just in case they get hard to find.
Networking, VLAN tagging and IPMP on LDOM vswitches
Updated: Mon Feb 23 21:55:15 2009
UPDATE: Feb 2009
This page has been rendered largely irrelevant by the release of the LDOMS 1.1 software, which now properly supports VLANs. Aside from no longer requiring an MTU bodge, you can now choose which VLANs to pass through to the LDOM, which is a massive improvement. To make this work, when you configure the service domain:
ldm add-vswitch vid=10,200,991 mac-addr=0:14:4f:1:aa:aa net-dev=e1000g0 primary-vsw0 primary
And when you configure the network for the hosted LDOM, if you just want to specify interfaces with a single untagged vlan:
ldm add-vnet pvid=10 vnet0 primary-vsw0 <LDOMHOSTNAME>
ldm add-vnet pvid=200 vnet1 primary-vsw0 <LDOMHOSTNAME>
Or for tagged links (to use vnet10000 and vnet200000):
ldm add-vnet vid=10,200 vnet0 primary-vsw0 <LDOMHOSTNAME>
Original article
Or why MTUs are a pain in the arse
I've spent some time configuring some Logical Domains on one of our T2000s for some development machines at work, one of which is a test environment for our main webserver. Having spent the best part of a day debugging some odd networking and NFS problems, I figured I'd write this up in case it saves anyone else some hassle.
I'd done most of the setup work and had the machine up and running, and all seemed to be working fine. I mounted the NFS shares which contained the development webserver files and user home directories at which point it all went a bit wrong. I could perform an 'ls' on the web server directory just fine. Trying that on the user home directories caused both NFS mounts to hang completely.
While checking that the relevant bits of NFS config on our Sun Cluster were ok (they were) and the network settings (also fine), I happened to run an ifconfig on the development LDOM, but forgot the '-a' to output the information for all interfaces. This caused my SSH session to hang.
Normally on Solaris, running ifconfig without -a displays the usage instructions. A quick test on a different machine revealed that this usage information is 1365 bytes long. Another quick test (running an ls in a directory on the local machine) also caused my connection to hang. Aha! This smells like an MTU problem.
Some background
Because we need to present multiple networks to these machines and use IP Multipathing (IPMP), we're using the built-in Solaris support for 802.1Q VLAN tagging.
On regular Solaris, this involves plumbing a virtual device with the vlan number and interface ID encoded:
vlan 10, device e1000g1 = e1000g10001
vlan 999, device bge0 = bge999000
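The encoding in those two examples is just the VLAN number multiplied by 1000, plus the instance number of the physical device, stuck on the end of the driver name. Here's a small Python sketch of that arithmetic. The function names are made up for illustration, and the range checks (VLAN 1-4094, instance below 1000) are my assumptions about sensible limits rather than anything taken from the Solaris documentation.

```python
import re

# Split a device name into driver name and trailing instance number,
# e.g. "e1000g1" -> ("e1000g", "1"), "bge0" -> ("bge", "0").
DEVICE = re.compile(r"(.*\D)(\d+)")


def vlan_interface_name(device, vlan_id):
    """Return the VLAN interface name for a device, e.g. e1000g1 + 10 -> e1000g10001."""
    match = DEVICE.fullmatch(device)
    if not match:
        raise ValueError(f"not a device name with an instance number: {device!r}")
    driver, instance = match.group(1), int(match.group(2))
    if not 1 <= vlan_id <= 4094:
        raise ValueError("VLAN ID must be between 1 and 4094")
    if instance >= 1000:
        raise ValueError("instance number must be below 1000 for this scheme")
    return f"{driver}{vlan_id * 1000 + instance}"


def split_vlan_interface(name):
    """Reverse the encoding: vnet991001 -> ("vnet1", 991)."""
    match = DEVICE.fullmatch(name)
    if not match:
        raise ValueError(f"not an interface name: {name!r}")
    driver, ppa = match.group(1), int(match.group(2))
    vlan_id, instance = divmod(ppa, 1000)
    return f"{driver}{instance}", vlan_id


if __name__ == "__main__":
    print(vlan_interface_name("e1000g1", 10))
    print(vlan_interface_name("bge0", 999))
    print(split_vlan_interface("vnet991001"))
```

The same scheme applies to the vsw and vnet devices used further down.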
On LDOMS, this involves creating a vswitch in the service domain that's attached to a physical interface. You then have to create a vlan-style virtual interface in Solaris as you would normally, but within each LDOM - you can't do this at the vswitch level yet.
On the host machine, e1000g0 and e1000g1 are identically configured tagged switch links with a number of VLANs fed down them. Each has been configured as the physical device for its own vswitch in the service domain, and these vswitches then provide the networking to the guest LDOMS.
Example config for the service domain:
ldm add-vswitch mac-addr=0:14:4f:1:aa:aa net-dev=e1000g0 primary-vsw0 primary
ldm add-vswitch mac-addr=0:14:4f:1:aa:ab net-dev=e1000g1 primary-vsw1 primary
It's important to specify the MAC address of the interface you're 'replacing', or the LDOMS won't talk to the outside world properly.
On the service domain, we then plumb some 'vsw' interfaces instead of the regular e1000g devices to provide its connection to the rest of the network, for example:
vsw10000 - VLAN 10, interface vsw0 (so e1000g0)
vsw10001 - VLAN 10, interface vsw1 (so e1000g1)
You can then use these as regular interfaces.
On this particular guest LDOM, we have the following config:
ldm add-vnet vnet0 primary-vsw0 webdevldom
ldm add-vnet vnet1 primary-vsw1 webdevldom
And the following devices are plumbed:
vnet200000 (connected to vlan 200, vswitch 0)
vnet200001 (vlan 200, vswitch 1)
vnet991000 (vlan 991, vswitch 0)
vnet991001 (vlan 991, vswitch 1)
Vlan 991 is the private network for NFS to the backend cluster, and 200 happens to be the vlan for this machine's public-facing services.
Solving the problem
VLAN tagging adds 4 bytes to the length of an ethernet frame, taking the maximum size from 1518 bytes to 1522 (that's 1500 bytes of data, plus 14 bytes of ethernet header and a 4-byte checksum, plus the 4-byte tag). What seems to be happening with vlan tagged devices on LDOMS is that the vswitch (or perhaps the vnet driver) drops ethernet frames over 1518 bytes - a reasonable thing to do for a switch that doesn't support tagging, but unreasonable given that it otherwise passes the data on without interference.
Reducing the MTU of the LDOM by 4 bytes to 1496 immediately and completely cured the problem:
ifconfig vnet991000 mtu 1496
This has to be done for every interface on an LDOM for which you're using VLAN tagging, or you'll mysteriously get some large packets simply disappearing. This was only happening for one of my NFS mounts because it happened to contain a lot of entries in its root directory, so the server sent at least one 1500 byte packet which never arrived - the other only had a couple of subdirectories, so the return data was under the maximum packet size.
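To make the arithmetic concrete, here's a minimal Python sketch of the frame sizes involved. The 1518 byte limit is only my guess at what the vswitch or vnet driver is enforcing, based on the behaviour described above, and the function names are made up for illustration.

```python
ETH_HEADER = 14   # destination MAC (6) + source MAC (6) + EtherType (2)
ETH_FCS = 4       # frame check sequence (checksum) at the end of the frame
VLAN_TAG = 4      # 802.1Q tag inserted into the header

# Assumed limit: the largest frame that seems to get through untouched.
ASSUMED_FRAME_LIMIT = 1518


def frame_size(mtu, tagged):
    """Size on the wire of a full-MTU ethernet frame, in bytes."""
    return mtu + ETH_HEADER + ETH_FCS + (VLAN_TAG if tagged else 0)


def fits(mtu, tagged, limit=ASSUMED_FRAME_LIMIT):
    """True if a full-MTU frame is no bigger than the assumed limit."""
    return frame_size(mtu, tagged) <= limit


if __name__ == "__main__":
    for mtu, tagged in [(1500, False), (1500, True), (1496, True)]:
        label = "tagged" if tagged else "untagged"
        print(mtu, label, frame_size(mtu, tagged), fits(mtu, tagged))
```

A full-size tagged frame at MTU 1500 comes out 4 bytes over the limit, and dropping the MTU to 1496 brings it back to exactly 1518, which matches what I saw.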
To enable IPMP on an LDOM across two VLAN tagged interfaces, you need to do the following:
Create entries in the /etc/hosts file for the host and two test addresses:
192.168.1.42 webdevldom-priv
192.168.1.43 webdevldom-priv-test0
192.168.1.44 webdevldom-priv-test1
In /etc/hostname.vnet991000
webdevldom-priv mtu 1496 netmask + broadcast + group webdevldom-priv-ipmp0 up
addif webdevldom-priv-test0 mtu 1496 netmask + broadcast + deprecated -failover up
and in /etc/hostname.vnet991001
webdevldom-priv-test1 mtu 1496 netmask + broadcast + group webdevldom-priv-ipmp0 deprecated -failover standby up
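If you've got a lot of LDOMS to set up, the two files above are regular enough to generate. This is a rough Python sketch that only builds the file contents as strings, following exactly the layout shown above. The function names and parameters are made up for illustration, and you'd still need to write the results to the right /etc/hostname.* files yourself.

```python
def primary_hostname_file(host, test_addr, group, mtu=1496):
    """Contents for the active interface: data address plus a test address."""
    return (
        f"{host} mtu {mtu} netmask + broadcast + group {group} up\n"
        f"addif {test_addr} mtu {mtu} netmask + broadcast + deprecated -failover up"
    )


def standby_hostname_file(test_addr, group, mtu=1496):
    """Contents for the standby interface: test address only."""
    return (
        f"{test_addr} mtu {mtu} netmask + broadcast + "
        f"group {group} deprecated -failover standby up"
    )


if __name__ == "__main__":
    group = "webdevldom-priv-ipmp0"
    print("# /etc/hostname.vnet991000")
    print(primary_hostname_file("webdevldom-priv", "webdevldom-priv-test0", group))
    print("# /etc/hostname.vnet991001")
    print(standby_hostname_file("webdevldom-priv-test1", group))
```

The MTU defaults to 1496 so it can't be forgotten on any of the addresses.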
Remember to set the MTU for each and every interface within each LDOM guest, or you'll have intermittent networking problems. Interestingly, you don't need to do this for the vsw interfaces in the service domain even though they're connected to the same vswitch as the LDOM where the problem occurs, so it appears the oversize ethernet frames are being dropped somewhere between the LDOM and the vswitch, possibly in the vnet driver - the vswitch itself seems happy to forward them on.