o1i's Planet AROS

November 23, 2015


FUSE Filesystem Bounty Completed

The bounty to deliver a working implementation of FUSE Filesystem support and NTFS filesystem driver for AROS i386 has been completed by Fredrik Wikstrom.

November 23, 2015 02:39 AM

November 20, 2015


time, there's no time..

With just a few adjustments to the SDL code I got an output window, which also contained the status line and looked quite ok.

But nothing happened. The CPU emulation code was called and then .. nothing!?

In my old 2.8.1 version I added some assembler operations to get a better timing for the processor speed, as AROS seems not to be able to provide something like that.

And I enclosed the assembler part with

#if defined(i386) || defined(x86_64)

The current build system seems to be missing those defines!? So after removing the ifdefs, the CPU emulation can now determine the speed of the processor, and now:

This is the first screenshot of a running WinUAE 3.2.0 port on AROS :-)!
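A likely explanation for the missing defines, assuming the port is built with GCC or Clang: those compilers predefine double-underscore names such as __i386__ and __x86_64__, while plain i386 is only available in GNU dialect modes on 32-bit targets and plain x86_64 is not among the predefined names. A minimal sketch of a guard based on the underscore names follows; HAVE_X86_TIMING_ASM and build_arch() are made-up names for illustration, not part of WinUAE or the AROS port.

```c
/* Hypothetical guard for the x86 timing assembler (illustrative name). */
#if defined(__i386__) || defined(__x86_64__)
#define HAVE_X86_TIMING_ASM 1
#else
#define HAVE_X86_TIMING_ASM 0
#endif

/* Returns a label for the architecture the compiler reports. */
static const char *build_arch(void)
{
#if defined(__x86_64__)
    return "x86_64";
#elif defined(__i386__)
    return "i386";
#else
    return "other";
#endif
}
```

With such a guard the assembler part could stay conditional instead of being compiled unconditionally.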

by noreply@blogger.com (o1i) at November 20, 2015 12:14 PM

November 17, 2015


WinUAE 3.2.0

Well, Toni released Version 3.2.0, so I started merging it with my (now dated) port of 2.8.1.

WinUAE progresses really fast and it is difficult to keep pace, especially as many parts are not ported completely and still need a lot of ifdefs, which I had to apply manually again..

WinUAE now contains a lot of new parts: MAME emulator parts, more QEMU parts, DOSBox parts.. all of them needed to be integrated in the port, too. And the PearPC parts are still missing ;-). WinUAE is not only an emulator, it contains a lot of emulators, too.

After some painful hours now at least the GUI comes up again:

As I now also try to get Picasso96 integrated, pressing "start" is not a good idea at the moment, if you don't like Guru messages ;-).

by noreply@blogger.com (o1i) at November 17, 2015 08:41 AM

October 19, 2015


Download Statistics

First of all I want to thank all the brave downloaders, I would not have expected so many downloads ;-).

sf.net records some statistics, interesting is the Operating-System chart:

Seems like sf.net does not know about us ;-). I am really surprised, that so many people are online with unknown operating systems.

by noreply@blogger.com (o1i) at October 19, 2015 06:56 AM

October 16, 2015

Icaros Desktop

Icaros Desktop inspired keyboards

You may already be aware of the X500 Evo, a stylish mini-ITX oriented "all into keyboard" computer case that follows the shape of classic Commodore computers like the C128 and the Amiga 500. Well, the good news is that its author Loriano Pagni recently started a Kickstarter project to build Amiga-inspired key sets for Cherry-MX compatible keyboards and - big surprise! - there's even an

by Paolo Besser (noreply@blogger.com) at October 16, 2015 12:00 PM

October 15, 2015


Janus-UAE2 v0.1

Even if it is still far from a feature-complete and rock-solid release, it might be time to release a snapshot of the current state of development.

You can download v0.1 of Janus-UAE2, the WinUAE port for AROS, here:


Please keep in mind:
  • This is a completely new WinUAE port. It is not based on the previous Janus-UAE sources.
  • This version lacks a lot of functionality; it is far from the feature set Janus-UAE 1.x offers.
  • No Picasso96, no JIT, no .. ,but Harddrives should work ;)
  • and this is for AROS x86_64 ABI v1 only
  • as ABI v1 might break binary compatibility on any day, it is working for the ABI v1 of 15.10.2015, everything else is not sure.
So for the brave, you might give it a try. Please also read the README.txt.

I will not upload this release to aros-archives or Aminet, as it is still too alpha to waste space there ;-).

by noreply@blogger.com (o1i) at October 15, 2015 01:01 PM

September 08, 2015


Amiga Parallel Port: How fast can you go?

In my plipbox project a fairly fast AVR 8-bit MCU with 16 MHz was connected to the Amiga’s parallel port to transfer incoming and outgoing IP packets from/to the attached Ethernet controller. A protocol on the parallel port was devised to quickly transmit the bytes in both directions. In version 0.6 a data rate of up to 240 KB/s was achieved… The question now arises if this is the top speed we can get or is the parallel port capable of more?

This blog post shows the results of experiments I performed with the parallel port on my Amiga. It describes the different classes of transfers possible on this port and gives the achievable maximum speed of each class.

Since the available documents and data sheets are all lacking the exact description of the I/O part on the peripheral side of the device, this blog post is also an effort to try to document this undocumented side of the parallel port (or: “What you always wanted to know about your CIA 8520 and never dared to ask”)

1. Introduction

1.1 The CIA 8520 and the Parallel Port

The Amiga has two CIA 8520 chips (Complex Interface Adapter), called CIA A and CIA B. Each CIA has two I/O ports (Port A, Port B) with 8 bits each; every bit can be individually configured as peripheral input or output.

The parallel port has three kinds of pins:

  • 8 Data Pins (In or Out), Pin 2-9
  • 1 Strobe Line (Pin 1), 1 Ack Line (Pin 10): Hardware Handshake
  • 3 Control Lines (BUSY, POUT, SELECT) (Pin 11, 12, 13)

Those pins are connected to the two CIAs as follows:

  • CIA A, Port B: 8 Data Pins
  • CIA A, PC and FLAG: Strobe and Ack for Hardware Handshake
  • CIA B, Port A, Bits 0,1,2: BUSY, POUT, SELECT

While CIA A Port B handles the data pins, CIA B Port A handles the 3 control lines. Note that the other bits of this port are connected to serial port lines.

In the Amiga memory map both CIAs are mapped to different memory ranges. Here is an excerpt with the registers useful for parallel port programming. (See the Amiga Hardware Reference Manual, Appendix F for a complete list)

Address   Name  Default  Description
BFE101    prb    0xff    Parallel port
BFE301    ddrb   0x00    Direction for port B (BFE101);1=output (can be in or out)
BFD000    pra    0xff    /DTR  /RTS  /CD   /CTS  /DSR   SEL   POUT  BUSY
BFD200    ddra   0xc0    Direction for Port A (BFD000);1 = output (set to 0xFF)

The data direction register (DDR) of each port sets every bit of that port to either input or output. The logic of the data pins is not inverted, i.e. a 1 in a register is a high (5V) level on the line.

The default values show the state after the Amiga has booted: all parallel port pins are set to input.
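The register table and the DDR rule can be written down as a small C sketch. The addresses and bit positions are taken from the table above; the constant names and the helper ddr_is_output() are illustrative assumptions, not names from any Amiga header.

```c
#include <stdint.h>

/* Register addresses from the table above (names are illustrative). */
#define CIAA_PRB   0xBFE101u  /* parallel port data */
#define CIAA_DDRB  0xBFE301u  /* direction for CIA A Port B */
#define CIAB_PRA   0xBFD000u  /* control lines and serial lines */
#define CIAB_DDRA  0xBFD200u  /* direction for CIA B Port A */

/* Bit positions of the control lines in CIAB_PRA (row BFD000). */
enum { PAR_BUSY = 0, PAR_POUT = 1, PAR_SEL = 2 };

/* Returns 1 if the given bit (0..7) is configured as output in a DDR value. */
static int ddr_is_output(uint8_t ddr, unsigned bit)
{
    return (int)((ddr >> bit) & 1u);
}
```

Applied to the boot defaults, ddrb = 0x00 reports every data pin as input, and ddra = 0xc0 reports only bits 7 and 6 (/DTR and /RTS) as outputs, so BUSY, POUT and SELECT start as inputs as well.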

1.2. The CIAs in the Amiga system

Compared to the MC680x0 CPU of an Amiga, the CIA chip is a fairly slow device. It can handle a clock rate of only 1 or 2 MHz, while the CPU runs at 7 MHz or more. The MC68000 CPU architecture offers a special access mode for such devices that is based on a slower clock called the E clock. It runs at 1/10th of the CPU clock speed.

Let's see some numbers:

  • CPU Clock F_CPU =  7.16 MHz (NTSC)  7.09 MHz (PAL)
  • E Clock F_ECLK = F_CPU / 10 = 716 kHz (NTSC)  709 kHz (PAL)
  • E cycle length t_ECLK = 1.40 us (NTSC) 1.41 us (PAL)

This means the CPU accesses the CIA at most at the rate of F_ECLK. An access is a read or write of a register. So when we transfer data we either read or write the data register of CIA A Port B. If we only access this register, the top speed we can ever achieve on this port is one byte per E cycle, or 716/709 KB/s max!

If you look in the data sheet of the CIA’s ancestor, the MOS 6526, you will see that the E clock interval is divided into two sections: a HI and a LO range. In the HI range (4/10 of t_ECLK) the CPU accesses the device; in the LO range (6/10 of t_ECLK) the device starts to realize the change set by the CPU, i.e. if a port is set to output it will drive the pins low or high accordingly. On a read, the data has to be stable on the port before the HI phase in which the CPU accesses it.
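The arithmetic behind these numbers can be sketched in a few lines of C. The CPU clock constants use the rounded values from the list above; the function and macro names are illustrative assumptions.

```c
#include <stdint.h>

/* CPU clock values as rounded in the text (Hz). */
#define F_CPU_NTSC 7160000u
#define F_CPU_PAL  7090000u

/* E clock frequency in Hz: one tenth of the CPU clock. */
static uint32_t eclk_hz(uint32_t f_cpu_hz)
{
    return f_cpu_hz / 10u;
}

/* E cycle length in nanoseconds, rounded to the nearest ns.
   Expects a CPU clock of at least 10 Hz so the divisor is not zero. */
static uint32_t eclk_period_ns(uint32_t f_cpu_hz)
{
    uint32_t f = eclk_hz(f_cpu_hz);
    return (1000000000u + f / 2u) / f;
}
```

With one register access per E cycle, eclk_hz() is also the upper bound in bytes per second for a transfer that touches only the data register.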

Here are the numbers (naming according to the 6526 data sheet):

  • t_CHW (Clock High Width) = 4 / 10 * t_ECLK = 560 ns (NTSC)  564 ns (PAL)
  • t_CLW (Clock Low Width) = 6 / 10 * t_ECLK = 840 ns (NTSC)  846 ns (PAL)

Some interesting limits of the 6526 chip:

  • t_PD (Output Delay on Write): max 1 us
  • t_PS (Port Setup Time on Read): min 300 ns

The t_PD of max 1 us means that a port setup may take almost the whole E cycle of 1.4 us, so it can overlap the next HI range for CPU access.

  <- t_ECLK -> 
   ____        ____
--|    |______|    |______|
   CPU |------->
 Write    t_PD
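The overlap shown in the diagram can be estimated with a short C sketch. It assumes, as the diagram suggests, that t_PD starts counting at the beginning of the LO phase; the function names are illustrative.

```c
#include <stdint.h>

#define T_PD_MAX_NS 1000u  /* 6526 data sheet limit quoted above */

/* Width of the HI phase: 4/10 of the E cycle. */
static uint32_t t_chw_ns(uint32_t t_eclk_ns)
{
    return t_eclk_ns * 4u / 10u;
}

/* Width of the LO phase: 6/10 of the E cycle. */
static uint32_t t_clw_ns(uint32_t t_eclk_ns)
{
    return t_eclk_ns * 6u / 10u;
}

/* How far a worst-case output delay reaches into the next HI phase,
   in ns; 0 if it fits completely into the LO phase. */
static uint32_t t_pd_overrun_ns(uint32_t t_eclk_ns)
{
    uint32_t low = t_clw_ns(t_eclk_ns);
    return T_PD_MAX_NS > low ? T_PD_MAX_NS - low : 0u;
}
```

For t_ECLK = 1400 ns (NTSC) the worst-case delay would reach 160 ns into the following HI phase, for 1410 ns (PAL) 154 ns.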

1.3 Hardware Handshake with Strobe

The parallel port offers two pins for hardware handshaking called Strobe and Ack. The hardware handshake lets the Amiga signal the external peripheral whenever new data has been set (or read!) on the external port of the CIA. Once the data is valid, the CIA sends a short active-low pulse on Strobe to signal the receiver. The receiver then reads the data byte and acknowledges the transfer by pulling Ack low. The Amiga detects the Ack pulse either by polling or by interrupt and then transmits the next byte.

While strobing (i.e. generating the Strobe pulse) after a Port B read/write happens automatically, you have to manually trigger Ack to confirm it.

Let's see a time sheet with some E clock cycles (.H and .L being the high and low range of the cycle):

ECycle   CPU                 CIA Port B     Strobe
0.H      Write Port B=42     -              H
0.L      -                   42!            H
1.H      -                   42!            H
1.L      -                   42             L
2.H      -                   42             L
2.L      -                   42             H

This example writes a byte with value 42 to Port B. The CIA realizes this value in the next two sub-cycles (denoted with !), and beginning with 1.L a stable value of 42 is available on the output of Port B. Then Strobe goes low for a full E cycle.

We see that the strobe has to be delayed; otherwise a peer reading on the falling edge of Strobe won’t see a stable data signal.

The interesting questions that now arise are:

  • What is the strobe delay in cycles of the 8520 CIA on the Amiga?
  • What is the strobe width in cycles?
  • How fast can we transfer data and still get valid strobes?

The answer to the first one can be found in the Amiga Hardware Reference Manual (Appendix F, section Handshaking):

PC will go low on the third cycle after a port B access.

But the other ones are unanswered in the docs. So it's time for some experiments…

2. My Experiments

My setup is an Amiga 500 with an ACA500 and an ACA1230/33 accelerator attached. A plipbox device running version 0.6 firmware was attached unless otherwise stated.

2.1 Setup Port

Using ASMone I quickly hacked some code to set the parallel port to data output and all lines to low/zero:

  lea $bfe101,a0 ; parallel port data
  lea $bfe301,a1 ; parallel port ddr
  move.b #$ff,(a1) ; all bits to output
  move.b #$00,(a0) ; set all lines to low/zero

2.2 Writing a byte

With the port set up, let's conduct the first experiment: write a $ff byte to the parallel port and capture the lines with a logic analyzer. The code:

  lea $bfe101,a0
  move.b #$ff,d0

  move.b d0,(a0)

The scope triggered on falling edge of strobe:


Write $ff (Port was $00)


The first interesting fact we see here is the strobe width: it's 2.813 us or 2 * t_ECLK!

So the strobe width of the CIA 8520 is (in contrast to the 6526’s 1 E) 2 E long! t_SW = 2 E

Let's repeat the write. Now write a $00 to a port that has been initialized with $ff:


Write $00 (Port was $ff)



Notable difference here is the point in time when the port signal changes:

  • LO->HI: late at end of cycle
  • HI->LO: early at the beginning of the cycle

Note: the markers are aligned to the beginning of the strobe (falling edge) in 1 E steps (i.e. 1.4 us).

If we assume that the strobe starts with the LO range of the E cycle, then the markers and the beginning of the strobe denote the HI->LO transition inside an E cycle.

If we compare these lines with the typical E cycle diagram of a data sheet, then they denote the center of the cycles and not the borders!

              ^ visible marker              ^ strobe falling edge
              |                             |
|<-- t_CHW -->|<--- t_CLW --->|<-- t_CHW -->|<--- t_CLW --->|...
|------- E cycle 0 -----------|------- E cycle 1 -----------|

              | H>L changes          L>H    |  signal changes

With this shift in mind we can conclude that the actual CPU write of this byte happened just to the left of the first marker in the lower image (i.e. in t_CHW).

Let's write down the strobe sequence in a time sheet, first for the $00 write:

EClock   CPU      PortB   Strobe   Annotation
0.H      w00      ff      H
0.L      -        00!     H        realizing 00 on port
1.H      -        00*     H        already 00 stable on port
1.L      -        00      H        \ safety range
2.H      -        00      H        /
2.L      -        00      L        strobe begin
3.H      -        00      L        \ strobe width: 2 E cycles
3.L      -        00      L        /
4.H      -        00      L        strobe end
4.L      -        00      H

and the $ff write:

EClock   CPU      PortB   Strobe   Annotation
0.H      wff      00      H
0.L      -        ff!     H        realizing ff on port
1.H      -        ff!     H        needs this range, too
1.L      -        ff      H
2.H      -        ff      H
2.L      -        ff      L
3.H      -        ff      L
3.L      -        ff      L
4.H      -        ff      L
4.L      -        ff      H

We can see the strobe starting in the third cycle as stated in the docs. It keeps a safety range of one E cycle after setting up the values before beginning the strobe.
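The two time sheets can be condensed into a tiny model. This is only a sketch of the measured behaviour for a single, isolated Port B write at 0.H (strobe low from 2.L through 4.H); it is not a description of the CIA's internal logic, and the function names are illustrative.

```c
/* Half-cycle index: 2 * cycle, plus 1 for the .L half of that cycle. */
static unsigned half_index(unsigned cycle, int is_low_half)
{
    return 2u * cycle + (is_low_half ? 1u : 0u);
}

/* Strobe level (1 = high, 0 = low) at a given half-cycle after a single
   isolated Port B write at 0.H, following the time sheets above. */
static int strobe_level(unsigned half)
{
    const unsigned begin = 5u; /* 2.L: strobe starts in the third cycle */
    const unsigned width = 4u; /* 2 E cycles = 4 half-cycles */
    return !(half >= begin && half < begin + width);
}
```

A receiver that samples the data on the falling edge at 2.L therefore sees a value that has been stable since 1.L.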

2.3 Writing multiple bytes in a row

What happens to the strobe signal if we write two or more bytes in a row (i.e. one byte in each E cycle)?

Let’s see and write two bytes (port again setup with $00):

  lea $bfe101,a0
  move.b #$ff,d0
  moveq  #$00,d1

  move.b d0,(a0) ; write in 0.H
  move.b d1,(a0) ; write in 1.H


Write $ff and $00 (Port was $00)


The time sheet:

EClock   CPU      PortB   Strobe   Annotation
0.H      wff      00      H
--- Marker
0.L      -        ff!     H        realizing ff on port (slow)
1.H      w00      ff!     H        needs this range, too
--- Marker
1.L      -        00!     H        realizing 00 on port (fast)
2.H      -        00*     H
2.L      -        00      L        regular strobe begin  (1st E)
3.H      -        00      L
3.L      -        00      L                              (2nd E)
4.H      -        00      L        regular strobe end
4.L      -        00      L        extended strobe begin (3rd E)
5.H      -        00      L        extended strobe end
5.L      -        00      H

What do we see?

  • A strobe of length 3 * E! So the strobes of the first and the second write are somehow merged now.
  • The $ff write happens right before the left marker and is established on the port within the range between the two markers. (LO-HI transition = slow)
  • The $00 write happens right before the right marker and is established right after the marker. (HI-LO transition = fast)
  • The $ff value is only valid at the end of the interval between the two markers!

Let’s write 4 bytes in a row:

Write $ff, $00, $ff, $00 (Port was $00)

Interesting Result:

  • Still a Strobe of 3E cycle length! The strobe width is not enlarged, no matter how many bytes you send. Seems that the strobe logic gets stuck.
  • Data $ff, $00, and $ff is valid at the end of the E ranges around the falling edge of strobe

Now 4 bytes starting with $00 (port was $ff):

Write $00,$ff,$00,$ff (Port was $ff)

Same result here:

  • A 3E Strobe and nothing more!
  • Data again valid at the end of the E cycles around falling edge of strobe

To sum up this experiment: While we can write to the CIA from the Amiga with E cycle speed, the resulting strobe signals are not useable anymore! However, all data values appear on the port lines (in fragments of the E cycle).

Let’s call the non-stop writes to the CIA 1E Transfers and let’s experiment now with transfers that take more E clock cycles in the next experiments.

The data transfer speed of 1E transfers is the E clock speed, i.e. 716/709 KB/s.

What is the lowest xE transfer that generates useable strobes?

2.4 2E Transfers

OK, we need to pause between the data writes from the Amiga. To be precise, we want to wait for multiples of the E clock. The best way to “wait” for (or better: waste) an E clock cycle is to actually perform a register access to one of the CIAs. Make sure the access has no side effects: reading a Port A register (which does not trigger a strobe) already does the trick.

The code for a 2E transfer does a write (first E cycle) followed by one pause (second E cycle) and looks like this:

  lea $bfe101,a0
  lea $bfd000,a1 ; let's use CIAB Port A to "waste" E cycles
  move.b #$ff,d0
  moveq  #$00,d1

  move.b d0,(a0) ; 1E write in 0.H
  tst.b  (a1)    ; 1E waste cycle by reading register (1.H)
                 ; =2E transfer per byte

  move.b d1,(a0) ; write in 2.H
  tst.b  (a1)    ; waste E cycle (3.H)


2E Transfer writing $55,$aa,$55,$aa,... (port was $ff)

2E Transfer writing $aa,$55,$aa,$55,… (port was $aa)


  • Strobe is back again at 2E. But only the first one is visible! All others are gone :(
  • Complete range of 1E port data valid (1E range for port setup)
  • Note: instead of reading a “waste” value in the second E access to the CIA, you can also perform a single control signal write. In the picture above I toggled the SEL signal. This gives you an exact location of the 1.H, 3.H, … locations and can be used on the receiver side as a sync signal! (Very useful since strobe is broken here)
  • Note2: If you toggle SEL (or POUT, BUSY) you can only write the Port A (but not read it beforehand). Therefore, a signal update of only the parallel line bits won’t work. In fact you have to ignore serial line bits in the same port and write them always to a constant value -> Serial lines don’t work with 2E transfers !! or in other words: There is no system friendly way to implement it…
  • Data transfer speed of 2E is half of 1E: 354.5 – 358 KB/s
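Note2 above implies that the whole CIA B Port A byte has to be composed in software before it is written. A minimal sketch of that composition follows, assuming the bit layout from the register table (BUSY = bit 0, POUT = bit 1, SEL = bit 2) and a caller-supplied constant for the serial line bits; all names are illustrative.

```c
#include <stdint.h>

#define PAR_BUSY_MASK 0x01u
#define PAR_POUT_MASK 0x02u
#define PAR_SEL_MASK  0x04u
#define PAR_CTRL_MASK (PAR_BUSY_MASK | PAR_POUT_MASK | PAR_SEL_MASK)

/* Builds the byte for a single write to CIA B Port A: bits 3..7 (serial
   lines) come from a fixed constant, bits 0..2 from the wanted control
   line state. No read of the port is needed. */
static uint8_t ciab_pra_byte(uint8_t serial_const, uint8_t ctrl)
{
    uint8_t serial_bits = (uint8_t)(serial_const & (uint8_t)~PAR_CTRL_MASK);
    return (uint8_t)(serial_bits | (ctrl & PAR_CTRL_MASK));
}
```

Toggling SEL between two bytes then means alternating between ciab_pra_byte(c, 0) and ciab_pra_byte(c, PAR_SEL_MASK) with the same constant c, which is exactly why the serial lines cannot be used at the same time.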

Time Sheet:

EClock   CPU      PortB   Strobe   Annotation
0.H      w55      ff      H
0.L      -        55!     H        realizing 55 on port
1.H      <waste>  55!     H        needs this range, too
1.L      -        55      H        
2.H      waa      55      H
2.L      -        aa!     L        regular strobe begin
3.H      <waste>  aa!     L
--- Marker
3.L      -        aa      L
4.H      w55      aa      L        regular strobe end
4.L      -        55!     H
5.H      -        55!     H
--- Marker
5.L      -        55      H

2.5 3E Transfers

Since 2E transfers still have broken strobe output, let's add another “wasted” cycle and set up a 3E transfer. With two spare E cycle accesses in our transfer loop we can also use the two cycles to perform a read/modify/write operation on a register. E.g. a bclr (bit clear) or bset (bit set) operation can modify a control line of the parallel port, which then serves as a “clock” line for our data transfer.

Code Example:

  lea $bfe101,a0
  lea $bfd000,a1 ; let's use CIAB Port A to "waste" E cycles
  move.b #$ff,d0
  moveq  #$00,d1

  move.b d0,(a0) ; 1E write data
  tst.b  (a1)    ; 2E waste cycles
  tst.b  (a1)

  move.b d1,(a0) ; 1E write data
  bclr   d1,(a1) ; 2E cycles to clear "clock" line (bit 0)

  move.b d1,(a0) ; 1E write data
  bset   d1,(a1) ; 2E cycles to set "clock" line

A scope plot of a 3E transfer:

3E Transfer with $aa,$55,$aa writes (Port was $55)

3E Transfer with $55,$aa,$55,$aa writes (Port was $00)

  • Ah! Now we have valid strobes! Makes sense: timing per byte is now 3E with 2E for (fixed) strobe size and 1E for the spacing between strobes.
  • Data transfer speed for a 3E transfer is a third of the 1E speed: 236 – 239 KB/s  
  • The current 0.6 plipbox implementation uses a 3E transfer method and achieves the calculated limit of about 240 KB/s.

Time sheet 3E Transfer:

EClock   CPU      PortB   Strobe   Annotation
0.H      w55      ff      H
0.L      -        55!     H        realizing $55 on port
1.H      <waste1> 55!     H        needs this range, too
1.L      -        55      H        
2.H      <waste2> 55      H
2.L      -        55      L        regular strobe begin
3.H      waa      55      L
3.L      -        aa!     L
4.H      <w1>     aa!     L        regular strobe end
4.L      -        aa      H
5.H      <w2>     aa      H
5.L      -        aa      L        next strobe begin
6.H      w55      aa      L
6.L      -        55!     L
7.H      <w1>     55!     L        next strobe end
7.L      -        55      H

Note: you can see that the first value (here $55) is valid during the H->L falling edge of the first strobe. That's the point in time when the external device reads the value.

You can now continue to add waste cycles and introduce 4E, 5E, … transfers. But they do not really make sense as they only move the strobe further apart. You cannot really use the extra E cycles…

Here is an example of a 4E transfer:

4E Transfer writing $55,$aa,$55,$aa (Port was $00)


Note the 4E strobe cycle: 2E strobe and 2E spacing between strobes.

2.6 What about read transfers?

In the above experiments I always talked about writing bytes to the port. But what changes if we want to read data with 1E, 2E, or 3E transfers?

  • Strobing is essentially the same. After a read operation the strobe will be generated.
  • The device feeding the port needs to setup the data to be read before the .H cycle that performs the CPU read operation

Here is a time sheet of a 3E read:

EClock   CPU      PortB   Strobe   Annotation
-1.H     -        11!     H        (safe setup time)
-1.L     -        11!     H        device sets up data on PortB
0.H      r11      11      H        CIA reads PortB
0.L      -        11      H
1.H      <waste1> 11      H
1.L      -        11      H        
2.H      <waste2> 22!     H        (safe setup time)
2.L      -        22!     L        device sets new data on PortB
3.H      r22      22      L        CIA reads PortB
3.L      -        22      L
4.H      <w1>     22      L        regular strobe end
4.L      -        22      H
5.H      <w2>     22      H


  • The external device needs to set up the data right before the CPU access. While the .L sub-cycle before the read might suffice for a stable read, it is safer to set up the data already in the .H before it.
  • If you use a parallel port control line to “clock” the data you can set the line before the first CPU read and start reading with the first byte.
  • If you want to use the strobes to sync your reads then you have a problem: The strobe signal arrives _after_ the read! To get in sync with this signal you must use a trick: first perform a dummy CPU read just to generate a strobe and then use this strobe to sync your device’s writes:
    • In the above time sheet we dummy read at 0.H
    • The device already sets up data 0x22
    • The CPU performs the next read at 3.H and gets 0x22
    • The device waits for the rising edge of strobe (4.H – 4.L) and sets the next data
  • Reading in 2E and even 1E gets more difficult, as in the worst case no “clock” signal is available and you have to use a sampling pattern with a fixed E size to set up the data in time on the device. It is still open whether it is possible to write a stable 1E transport this way.
  • In most reader code the interrupts have to be disabled on the Amiga side; otherwise the clocked setup of data before a read might arrive too late and the CPU reads a wrong value.

3. Summary

This (rather long) blog article shows you all the details when transferring data over the parallel port at the maximum possible speed. We discovered some interesting anomalies with strobe generation at these high transfer rates.

I introduced a new speed classification for the parallel transfer types called 1E, 2E, or 3E transfers.

The top speeds achievable with the xE transfers are:

1E: 709..716 KB/s
2E: 355..358 KB/s
3E: 236..239 KB/s
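The table follows directly from the E clock rate: an xE transfer moves one byte every x E cycles. A minimal C sketch of that calculation (the function name is an illustrative assumption; KB means 1000 bytes here, as in the table):

```c
#include <stdint.h>

/* Peak byte rate of an xE transfer in bytes per second:
   one byte every x E cycles. Returns 0 for x == 0. */
static uint32_t xe_rate_bytes_per_s(uint32_t f_eclk_hz, uint32_t x)
{
    return x ? f_eclk_hz / x : 0u;
}
```

For example, xe_rate_bytes_per_s(716000, 3) gives 238666 bytes/s (NTSC) and xe_rate_bytes_per_s(709000, 3) gives 236333 bytes/s (PAL), which matches the 3E row.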

Current plipbox version 0.6 implements a 3E transfer using external control lines for clocking. I am currently experimenting with a 3E transfer using only strobes as signalling (it frees control lines for other functions). Another interesting coding exercise will be a 2E or even a 1E transfer… Now the technical background is available!

by lallafa at September 08, 2015 08:04 PM

August 31, 2015


Dopus5.91 released

About one year after the first open source release, we are happy to release a new version of Dopus 5!

Download it from http://dopus5.org or from https://sourceforge.net/p/dopus5allamigas/ in the Files section.

August 31, 2015 10:15 PM

August 25, 2015

Icaros Desktop

Accessing the Tube!

In Italy we have a motto which says, once translated into "barely English", a single image worths a thousand words. That's why I can't really stop myself from sharing the following image with you: Yes, this basically means Deadwood has made the miracle and yes, you will be playing YouTube videos on Icaros Desktop starting with next update. No more scripts to download

by Paolo Besser (noreply@blogger.com) at August 25, 2015 01:57 PM

July 28, 2015


Hard drive tab..

Not working yet (buttons not yet activated), but at least it displays the actual config.

Now if SourceForge would come back to life, I could commit all that stuff. This all will be a big commit, no chance to ever track the single changes..

by noreply@blogger.com (o1i) at July 28, 2015 02:45 PM

July 23, 2015


Happy birthday Amiga!

Happy 30th birthday, Amiga!

Your journey has started 30 years ago, on 23rd July 1985,
when the first Amiga 1000 was introduced to the clueless public at a clumsy, but epic event.
This journey is never ending, still going on after 30 years.

We love you.

by noreply@blogger.com (Álmos Rajnai) at July 23, 2015 07:48 AM

July 09, 2015



The Listtree class is .. not my friend ;-). But somehow with a lot of trial and error, I managed to get it working:

Well, the nice icons are missing (I tried to get icons working, but did I already say Listtree is not my friend?), but it is close enough to WinUAE:

by noreply@blogger.com (o1i) at July 09, 2015 01:42 PM

May 12, 2015


Combo Boxes

WinUAE in Windows has nice comboboxes to select rom and adf images:

AROS only offers Cycle and String gadgets or listviews. None of the three can give you the functions of a combobox. I tried to work around it, but ended up wasting a lot of time, as those three can't emulate all combobox features.

Even back in gtk-mui times, I wanted a combobox custom class, so now it was time to code one:

It even works with type-ahead ;-). There are still some minor bugs left, or some bugs in other parts of the gui were added during combobox development.

So there is still progress, but as always, time is much too limited to really progress fast.

PS: Forgot to mention, I moved my development environment to Debian/64bit, so from now on, x86_64/ABI_V1 is the primary target.

by noreply@blogger.com (o1i) at May 12, 2015 01:14 PM

May 01, 2015


Spin locks and the beauty of conditional instructions


Low-level toying with multiple CPUs without proper locking mechanisms is asking for trouble. I have already seen many cryptic boot logs from native AROS on RaspberryPi2 which you simply cannot decode. This happens every time more than one core tries to speak over the serial line.

The locking primitive which we have just added to AROS is a spin lock. It does not have an owner, so one cannot re-enter it; trying to do so will result in an endless loop with no exit. The spin lock can be obtained either for reading or for writing. When the spin lock is in read mode, it can be acquired by many clients, but as long as at least one of them is holding a read lock, code willing to switch it into write mode will have to wait. When the spin lock is in write mode, it gives exclusive access to exactly one caller. Until it is released again, no other code will be able to obtain the lock at all.

So, here it goes, the spin lock:

typedef struct {
    volatile unsigned long lock;
} spinlock_t;

#define SPINLOCK_INIT_WRITE_LOCKED  { 0x80000000 }

The spin lock comes with three default initializers for those who want to place it in a defined state, e.g. in the data section. The lock uses one 32-bit value which defines the state of the lock:

  • lock == 0 – the lock is in its free state, everyone can lock it in either mode
  • lock > 0 – locked in READ mode. Everyone can lock it in READ mode (up to 2^31 times, then it wraps), but an attempt to lock it in WRITE mode will block until it is free.
  • lock == 0x80000000 – the lock is in WRITE mode. Further attempts to lock it in either mode will block.
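The three states can be decoded with a few lines of C. This is only a sketch of the encoding listed above: the type and function names are made up for illustration, and it uses uint32_t to make the 32-bit width explicit on any host.

```c
#include <stdint.h>

#define LOCK_VALUE_FREE   0x00000000u
#define LOCK_VALUE_WRITE  0x80000000u

typedef enum {
    LOCK_STATE_FREE,      /* lock == 0 */
    LOCK_STATE_READ,      /* 0 < lock < 0x80000000: number of readers */
    LOCK_STATE_WRITE,     /* lock == 0x80000000 */
    LOCK_STATE_UNDEFINED  /* values above 0x80000000 are not described */
} lock_state_t;

/* Classifies a raw lock value according to the list above. */
static lock_state_t lock_state(uint32_t value)
{
    if (value == LOCK_VALUE_FREE)
        return LOCK_STATE_FREE;
    if (value == LOCK_VALUE_WRITE)
        return LOCK_STATE_WRITE;
    if (value < LOCK_VALUE_WRITE)
        return LOCK_STATE_READ;
    return LOCK_STATE_UNDEFINED;
}
```

In READ state the raw value itself is the number of readers currently holding the lock.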

The code for locking and unlocking uses the LDREX and STREX instructions which guarantee exclusive access to addressed memory. The code uses also a nice feature of ARM processors – conditional execution of instructions. Let’s look at the code – it assumes that register r0 points to the lock

    mov       r3, #0x80000000
1:  ldrex     r2, [r0]
    teq       r2, #0
    wfene
    strexeq   r2, r3, [r0]
    teq       r2, #0
    bne       1b

There is only one single loop inside. When the function finishes, the spin lock is acquired in WRITE mode. How does it work? The LDREX instruction reads the lock value into register r2 and marks exclusive access to the addressed memory. The lock value is compared against zero. If the lock value was not zero, then the WFE instruction will be executed (please note the “ne” suffix). It puts the CPU into sleep mode until either an interrupt or an event from any other core is sent. If the lock value was zero, the WFE instruction is not executed at all. The next one is a conditional variant of STREX. It is executed only if the lock value equals zero (the spin lock is free, note the suffix “eq” after STREX). The STREX stores register r3 at the address pointed to by register r0. If the write succeeds, i.e. exclusive access was still granted, register r2 will be set to 0; if the write fails, r2 will contain 1. Finally, register r2 is tested against 0 and, if it’s not zero, we jump back and repeat.

Please note, that in second comparison r2 can contain one of three values:

  • 0, if STREXeq was executed and succeeded,
  • 1, if STREXeq was executed and failed,
  • 0x80000000, if the lock was already acquired and our CPU went to sleep (WFEne).

The last case means, that CPU has received either an event (from another CPU core when it released a spin lock) or an interrupt was triggered. In both cases the CPU will re-attempt to acquire the lock. It wakes up, STREXeq is not executed, 0x80000000 is compared against 0x00000000 and if they are not equal, CPU does a branch. Nice, isn’t it?
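For readers who do not speak ARM assembly, the same idea can be sketched in portable C11 with compare-and-exchange from <stdatomic.h>. This is not the AROS implementation: it has no equivalent of WFE (it simply spins), all names are made up, and the read-lock path is only an assumption of how the READ mode described above might look, since the post shows the write-lock code only.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_WRITE_LOCKED 0x80000000u

typedef struct { _Atomic uint32_t lock; } sketch_spinlock_t;

/* One attempt to take the lock in WRITE mode: succeeds only if it was free. */
static bool sketch_try_write_lock(sketch_spinlock_t *s)
{
    uint32_t expected = 0u;
    return atomic_compare_exchange_strong(&s->lock, &expected,
                                          SKETCH_WRITE_LOCKED);
}

/* Spins until the write lock is obtained (busy-wait, no sleep). */
static void sketch_write_lock(sketch_spinlock_t *s)
{
    while (!sketch_try_write_lock(s)) {
        /* retry, like the "bne 1b" branch */
    }
}

/* One attempt to take the lock in READ mode: increments the reader
   count unless the lock is held in WRITE mode. Counter wrap-around
   after 2^31 readers is not handled in this sketch. */
static bool sketch_try_read_lock(sketch_spinlock_t *s)
{
    uint32_t cur = atomic_load(&s->lock);
    while (cur != SKETCH_WRITE_LOCKED) {
        if (atomic_compare_exchange_weak(&s->lock, &cur, cur + 1u))
            return true;
        /* cur now holds the freshly observed value; try again */
    }
    return false;
}

static void sketch_write_unlock(sketch_spinlock_t *s)
{
    atomic_store(&s->lock, 0u);
}

static void sketch_read_unlock(sketch_spinlock_t *s)
{
    atomic_fetch_sub(&s->lock, 1u);
}
```

The compare-and-exchange plays the role of the LDREX/STREX pair: the store only happens if the value is still the one that was read, otherwise the attempt is repeated.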

There is one more scenario to be considered. What happens if there was an interrupt triggered between LDREX and STREX? Well, in that case AROS code needs to release the exclusive memory by either issuing a CLREX instruction (ARM v7 cpus and up) or by issuing a dummy STREX instruction to some arbitrary memory location. In that case the interrupted code will re-attempt the process of obtaining a spin lock.

Now after the locks were added and properly used, you can turn this:

[KRN:ide27 modbces 08_dritpri fl #s veacion nam00:
0 (0147300 7ff0)
nif8 1or08# 1ls @ 0x50 "ex0c.
 iKRar C
e f81a03tr: p110 .2
 41N]expansi CPlib60ry01
 815f] Core105CP2 =600001bu
libra Cor
 80e f70: 100001
e @ 0e500c:ec00
0x0001889a 8RN]41o"er2 .library@
b3e10 C1 e41 Bootstrad t.re @ 0x0"
[d0c: or9 2 cpu1 ontek.reizurc12
+ KR1a1Ca8e 2 cp 01tx @ 0pr00e3eb0
08] 0fm2948_ini4_c1r 43

into this:

[KRN:BCM2708] Initialising Multicore System
[KRN:BCM2708] bcm2708_init: Copy SMP trampoline from f800074c to 00002000 (100 bytes)
[KRN:BCM2708] bcm2708_init: Patching data for trampoline at offset 80
[KRN:BCM2708] bcm2708_init: Attempting to wake core #1
[KRN:BCM2708] bcm2708_init: core #1 stack @ 0x000b4380 (sp=0x000dc370)
[KRN:BCM2708] bcm2708_init: core #1 fiq stack @ 0x000dc390 (sp=0x000dd380)
[KRN:BCM2708] bcm2708_init: core #1 tls @ 0x000dd3a0
[KRN] Core 1 Boostrapping..
[KRN] Core 1 CPSR=600001d3
[KRN] Core 1 CPSR=60000193
[KRN] Core 1 TLS @ 0x000dd3a0
[KRN] Core 1 KernelBase @ 0x000b3ec0
[KRN] Core 1 SysBase @ 0x000b3200
[KRN] Core 1 Bootstrap task @ 0x000dd3c0
[KRN] Core 1 cpu context size 2124
[KRN] Core 1 cpu ctx @ 0x000dd460
[KRN:BCM2708] bcm2708_init_core(1)
[KRN] Core 1 operational
[KRN] Core 1 waiting for interrupts
[KRN:BCM2708] bcm2708_init: Attempting to wake core #2

by michal at May 01, 2015 06:17 PM

April 27, 2015


All your nightly are belong to us

Yay, I’ve killed all nightly builds. Sorry 😉

That was the short version. Last weekend I was busy with removing some legal hacks from AROS sources. The hack on the schedule was the commonly used ThisTask pointer in the SysBase. Now, at least in my local branch of AROS for RaspberryPi, SysBase->ThisTask points to a nirvana place where all code is either happy crashing, or dead, or both. ThisTask points to NULL :)

No, it didn’t disappear completely. The ThisTask pointer has been moved to (and is used in) something similar to a thread local storage. It is local, but not local to a thread. It is local to a CPU core. On RPi2 we use four independent local storages and each of them has its own ThisTask pointer. Don’t hold your breath, it’s not SMP yet. Far from it :) The scheduler works only on CPU#0. At least for now.

The TLS is used exclusively by the kernel.resource, which knows best about the low-level part of the system. Exec has gained two new architecture-specific macros, named GET_THIS_TASK and SET_THIS_TASK(x). On all other architectures they expand to SysBase->ThisTask; on RaspberryPi they expand to TLS_GET(ThisTask) and the equivalent TLS_SET. What about the rest of the AROS code? Well, there the only sane way to get ThisTask shall be used: the FindTask(NULL) call.

And here we come to the point where I’ve killed all nightlies. During my ThisTask removal fun I broke accidentally one macro in AROSTCP network stack :) It should be fixed already.

by michal at April 27, 2015 08:27 PM