Porting AROS to the RaspberryPi is a lot of fun, as I have said before. There’s also a lot of frustration, and you know that too. This time it was because of the 4 CPU cores…
From the very beginning I had noticed that the frame buffer was relatively slow. At least not as fast as I would expect from a nearly 1 GHz machine. Well, there was an issue, but I ignored it at first. I carried on with the AROS port and came to a point where AROS was booting into the desktop and running programs. As a simple example I added Clock to the WBStartup folder, so that this app starts automatically once the system is up. Of course I had full debug output enabled, both on the screen console and over the serial port.
Huh, it took AROS nearly 30 seconds to boot. Not bad, but it could be better for sure. The slow redrawing of the screen was worrying me, but hey, we do have the simplest graphics driver ever. No acceleration, just a simple chunk of memory filled pixel by pixel (with some help from our base graphics class, of course). So far so good.
Then out of curiosity I decided to take a look at an old Raspberry Pi model I have on my desk. I booted it, looked at the Clock and went mad. The old Raspberry Pi with its ARM11 CPU booted in about 20 seconds. That is 2/3 of the time the RaspberryPi2 needed! Can’t be, I thought. The new machine cannot be that bad, can it? Had I missed some cache setup? The frame buffer can’t be cached, right? Why was the Linux frame buffer console faster?
Finally I found a forum where the Bare Metal guys were discussing their great efforts to develop standalone software for the RaspberryPi. Luckily for me, one of them had an issue similar to mine. He also led me to the final solution. It turned out that the CPU cores of the RaspberryPi2 are not silently sleeping and waiting for an interrupt when start.elf hands control over to the ARM CPU. No, instead they are busy looping and polling registers, anxiously waiting to start and do some useful work. As you can imagine, polling is not a very efficient technique, rather the contrary. The additional CPU cores were stealing precious bus cycles, leaving fewer for CPU#0, which was actually running the AROS code. Eureka!
There are two solutions and I have found both of them working with AROS. The first one is to extend the config.txt file (the file which is read and parsed by VideoCore). There, one has to add the following parameter:
It forces the additional CPUs to go to sleep and wait for interrupts instead of busy looping. I tested it and it really helped. After adding that line AROS really flies on that tiny computer! The frame buffer refreshes quickly, the display redraws quickly, and a few demos redraw their windows nearly immediately. Boo! Now the machine not only feels faster than the old RPi, it actually is faster.
Just letting the additional CPUs sleep is good, but not something I liked very much. Sure, start.elf does a good job, but I wanted AROS to do that job itself. So I started to code. I wrote a small assembly routine, a trampoline which initializes the caches and the MMU of the woken-up core. The trampoline also initializes the supervisor stack and jumps to a routine in C code. At the moment the C routine is rather simple. It checks the CPU type, enables VFP and enters an endless wait-for-interrupt loop. Ah, the C routine also babbles to the system log, of course, to let me know it is actually working. What I got was:
[KRN] Co]e o Co eUp ani idiwir igr rrutatuots s 0a008
Uh. Not very readable. Forgot something? Ah yes, there is no locking in our bug() function, which means all cores were fighting over the serial line. Proper locking will come later, since it has to be done right. For now I have only added some delays. This is how it looks now:
Please note that the “Core x up and waiting” lines are each sent to the console by a different ARM core. It’s not SMP, not even AMP. It’s just a small initialization routine. But at least it works as expected…
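To give an idea of what the missing locking could eventually look like, here is a minimal sketch in C. It is not the AROS code and not the real bug() function: the cores are simulated with POSIX threads so that it runs on an ordinary desktop system, the serial line is replaced by a memory buffer, and all names (locked_puts, spin_lock, run_cores and so on) are made up for illustration. The idea is simply that a core must hold a spinlock for the whole line it prints, so the characters of different cores can no longer interleave.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <string.h>

#define NUM_CORES 4      /* number of simulated cores (assumption) */
#define LOG_SIZE  1024

/* Hypothetical spinlock guarding the shared output "device". */
static atomic_flag log_lock = ATOMIC_FLAG_INIT;

/* Stand-in for the serial line: a buffer filled one byte at a time. */
static char log_buf[LOG_SIZE];
static size_t log_pos;

static void spin_lock(atomic_flag *lock)
{
    /* Spin until the flag was previously clear, i.e. we now own it. */
    while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
        ;
}

static void spin_unlock(atomic_flag *lock)
{
    atomic_flag_clear_explicit(lock, memory_order_release);
}

/* Hypothetical locked print: the whole string goes out under the lock. */
static void locked_puts(const char *s)
{
    spin_lock(&log_lock);
    while (*s && log_pos < LOG_SIZE - 1)
        log_buf[log_pos++] = *s++;
    log_buf[log_pos] = '\0';
    spin_unlock(&log_lock);
}

/* What each simulated core does once it is "up". */
static void *core_main(void *arg)
{
    char line[64];
    snprintf(line, sizeof line, "[KRN] Core %d up and waiting\n", *(int *)arg);
    locked_puts(line);
    return NULL;
}

/* Start one thread per simulated core and return the collected log. */
static const char *run_cores(void)
{
    pthread_t threads[NUM_CORES];
    int ids[NUM_CORES];

    log_pos = 0;
    log_buf[0] = '\0';
    for (int i = 0; i < NUM_CORES; i++) {
        ids[i] = i;
        pthread_create(&threads[i], NULL, core_main, &ids[i]);
    }
    for (int i = 0; i < NUM_CORES; i++)
        pthread_join(threads[i], NULL);
    return log_buf;
}
```

Calling run_cores() and printing the returned string gives one intact line per core, in whatever order the threads happened to win the lock. On real hardware the lock would have to be built on the ARM exclusive load/store instructions and would only be usable once the caches and MMU of each core are set up, which is part of why it has to be done right.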
And with the current setup AROS really flies on the RaspberryPi 2.