FPU Emulator

agranlund
Posts: 777
Joined: Sun Aug 18, 2019 10:43 pm
Location: Sweden

Re: FPU Emulator

Post by agranlund »

agranlund wrote: Mon Jul 11, 2022 8:39 pm If you put it before Mint.prg in the auto folder it'll fool it too.
Oh, I should maybe mention that not all privileged FPU instructions are fully emulated (yet).
NetBSD didn't need these at all, since the whole thing was baked right into their kernel, but as a standalone thing we do (for multitasking environments).


TL;DR:
Context switching will not work properly yet, and running two programs that do floating point calculations concurrently may/will step on each other's toes.
If only one program is doing FPU emulation thingys it should be perfectly fine though.


And the nerdy bits:

- ftrapcc is not implemented. Probably not needed. Lowest on the list of priorities.

- fsave + frestore are only partially implemented.
Enough for most software to identify the presence of a 68881, but the internal running state is not yet fully emulated, so an OS would take the result as a hint that it doesn't need to bother saving/restoring the user-visible FPU registers.
I do plan on trying to improve these.
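
For reference, the hint works like this on real hardware: FSAVE writes a state frame whose first byte is a version number, and a value of $00 marks a null frame, meaning the FPU has not been used since reset. A context switcher can test that byte and skip saving FP0-FP7. A minimal sketch in C, assuming the frame is handed over as a plain byte buffer; fpu_frame_is_null and must_save_fp_registers are made-up names, not part of the emulator or of MiNT:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The first byte of an FSAVE state frame is the version number.
   $00 marks a null frame: the FPU is still in its reset state. */
int fpu_frame_is_null(const uint8_t *frame, size_t len)
{
    assert(frame != NULL && len >= 1);
    return frame[0] == 0x00;
}

/* Hypothetical scheduler helper: only a non-null frame means there is
   user-visible FPU state (FP0-FP7, control registers) worth saving. */
int must_save_fp_registers(const uint8_t *frame, size_t len)
{
    return !fpu_frame_is_null(frame, len);
}
```

If an emulator always hands back a null frame, a scheduler written this way never saves the registers, which is how two floating point programs could end up stepping on each other.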

Re: FPU Emulator

Post by agranlund »

frank.lukas wrote: Tue Jul 12, 2022 12:51 pm FPU Benchmark ...

https://github.com/czietz/linpack-atari/
Oh nice!

I'm getting 2.875 kFLOPS on my ST+TF536 @ 50 MHz running MiNT 1.16.

(I typed in 100, so that means LINPACK 100 right? Is that the same way you ran it @czietz ?)

[Attachment: IMG_6351.jpg]
czietz
Posts: 548
Joined: Sun Jan 14, 2018 1:02 pm

Re: FPU Emulator

Post by czietz »

No, the program is a bit misleading if you don't know LINPACK: What you type in is the array size. The matrix is half the array size. Therefore, to run the standard LINPACK 100 benchmark (solving a 100x100 matrix), you need to type "200" at the prompt (or just press enter, as this is the default value).

But still, the values would change only maybe by a few percent. As to why your TF536 is faster than the TT (without FPU, of course): I imagine the logic inside the TT will still try to access the FPU and only cause the Line F exception that triggers the emulator after a bit of delay.

EDIT: For comparison: Compiling LINPACK for 68000 with gcc's software floating point library and running it on the ST (68000 @ 8 MHz) yields 1.79 kFLOPS.
EDIT2: Running the same gcc software floating point 68000 version on the TT gives 21.4 kFLOPS.
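
To make the array-size versus matrix-size relation concrete, here is a small C sketch. It assumes the program halves the typed array size to get the matrix order n, as described above, and uses the conventional LINPACK operation count of 2n^3/3 + 2n^2 to turn a solve time into kFLOPS. The function names are illustrative and are not taken from linpack-atari:

```c
#include <assert.h>

/* Matrix order solved for a given array size typed at the prompt. */
int linpack_matrix_order(int array_size)
{
    return array_size / 2;
}

/* Conventional LINPACK operation count for factoring and solving
   one n x n system (assumption: the program uses this same count). */
double linpack_ops(int n)
{
    double dn = (double)n;
    return 2.0 * dn * dn * dn / 3.0 + 2.0 * dn * dn;
}

/* kFLOPS for one solve of order n that took `seconds`. */
double linpack_kflops(int n, double seconds)
{
    assert(seconds > 0.0);
    return linpack_ops(n) / (1000.0 * seconds);
}
```

With these definitions, typing 200 gives a matrix order of 100 and about 686,667 operations per solve, while typing 100 gives an order of only 50.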

Re: FPU Emulator

Post by agranlund »

czietz wrote: Tue Jul 12, 2022 7:15 pm No, the program is a bit misleading if you don't know LINPACK: What you type in is the array size. The matrix is half the array size. Therefore, to run the standard LINPACK 100 benchmark (solving a 100x100 matrix), you need to type "200" at the prompt (or just press enter, as this is the default value).

But still, the values would change only maybe by a few percent. As to why your TF536 is faster than the TT (without FPU, of course): I imagine the logic inside the TT will still try to access the FPU and only cause the Line F exception that triggers the emulator after a bit of delay.

EDIT: For comparison: Compiling LINPACK for 68000 with gcc's software floating point library and running it on the ST (68000 @ 8 MHz) yields 1.79 kFLOPS.
EDIT2: Running the same gcc software floating point 68000 version on the TT gives 21.4 kFLOPS.
Aha! Thank you for the explanation. I got confused by 100 and thought I was supposed to run it like that. Can you tell I’ve never used Linpack before? :lol:

This time, with the default (just pressing Enter), I got 2.901 kFLOPS.
I buy your explanation for the difference with the TT. Maybe there’s a difference in general fastram and/or burst mode speed too between our machines?
There’s a tremendous overhead for each instruction, so I’m not surprised it’s so much slower than compiled-in softfloat.

The Motorola documentation has tables showing a similar difference between their reference F-line emulator and code recompiled to call the same emulation routines directly, without the exception overhead.

(The emulator has to do a bunch of stack frame manipulation, decode the instruction and calculate the effective address, then do the actual emulation of the instruction, and then more stack frame manipulation before returning to the caller.)
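
The decode step can be sketched in C. Every coprocessor instruction word has 1111 in bits 15-12 (hence "Line F"), the coprocessor ID in bits 11-9 (1 for the FPU), an instruction type in bits 8-6, and the effective-address mode and register in bits 5-3 and 2-0. The struct and function names below are illustrative only and are not taken from the emulator's source:

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    unsigned cpid;    /* bits 11-9: coprocessor ID, 1 for the FPU     */
    unsigned type;    /* bits 8-6: 0 general, 1 FDBcc/FScc/FTRAPcc,
                         2 and 3 FBcc, 4 FSAVE, 5 FRESTORE            */
    unsigned ea_mode; /* bits 5-3: effective-address mode             */
    unsigned ea_reg;  /* bits 2-0: effective-address register         */
} fline_fields;

/* True when the top nibble is 1111, i.e. a Line-F opcode. */
int is_fline(uint16_t opcode)
{
    return (opcode & 0xF000u) == 0xF000u;
}

/* Split the first instruction word into its fields. */
fline_fields decode_fline(uint16_t opcode)
{
    fline_fields f;
    assert(is_fline(opcode));
    f.cpid    = (opcode >> 9) & 7u;
    f.type    = (opcode >> 6) & 7u;
    f.ea_mode = (opcode >> 3) & 7u;
    f.ea_reg  = opcode & 7u;
    return f;
}
```

For example, $F327 (fsave -(sp)) decodes to ID 1, type 4, mode 4, register 7. A real handler still has to fetch the extension word for general instructions and resolve the effective address before any arithmetic happens, which is where much of the per-instruction cost goes.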

Re: FPU Emulator

Post by czietz »

agranlund wrote: Tue Jul 12, 2022 8:50 pm I buy your explanation for the difference with the TT. Maybe there’s a difference in general fastram and/or burst mode speed too between our machines?
The Storm TT in my (overclocked) TT is quite fast. NemBench numbers for FastRAM:

Code:

Linear 32bit read (FastRAM)  -> 26.109 MByte/sec (~491%)
Linear 32bit write (FastRAM) -> 31.170 MByte/sec (~483%)
I would be surprised if your TF536 (albeit clocked at 50 MHz) was much faster.

No, I think the TT will wait for the FPU's DTACK and only if that doesn't come after so many bus cycles, something will assert BERR and thereby trigger the Line-F handler. (Note that the TT was never meant to be operated without FPU.) I wonder how it is handled in the Falcon; but mine is currently in storage.

Re: FPU Emulator

Post by agranlund »

czietz wrote: Tue Jul 12, 2022 9:38 pm I would be surprised if your TF536 (albeit clocked at 50 MHz) was much faster.

No, I think the TT will wait for the FPU's DTACK and only if that doesn't come after so many bus cycles, something will assert BERR and thereby trigger the Line-F handler. (Note that the TT was never meant to be operated without FPU.) I wonder how it is handled in the Falcon; but mine is currently in storage.
I think you are completely right.
The numbers I got on my machine are very similar to the ones you have on the TT so that shouldn't really be a factor.

Code:

Linear 32bit read (FastRAM)  -> 32.564 MByte/sec (~612%)
Linear 32bit write (FastRAM) -> 27.798 MByte/sec (~430%)

Re: FPU Emulator

Post by agranlund »

So this is probably never going to be useful for anyone, but the latest version supports TOS 1.x.
Because:
  • Kaos TOS1.04 works on 68030 machines
  • Programs can contain both 68000 and 68881 instructions, even though it's not common on Atari (e.g. Gembench)
  • I enjoy working with exception handlers and now I got to make 6 different variations, 12 if counting entry from super or user :)

I don't expect people to use this with TOS1, or even stock 68000s, but at least I had fun.
It should work on any combination of TOS and CPU now (up to 68030.. there is no 68040+ support)

The call overhead is slightly larger under TOS1. If TOS1 was relocated or patched to run from somewhere other than the standard 0xFC0000 location then the call overhead is slightly larger still.
Badwolf
Posts: 2231
Joined: Tue Nov 19, 2019 12:09 pm

Re: FPU Emulator

Post by Badwolf »

czietz wrote: Tue Jul 12, 2022 6:08 pm
Atari TT030 @ 48 MHz            FPEMU_220712    Real 68882      Factor
-----------------------------------------------------------------------
LINPACK 100: kFLOPS             1.59            281             177x
This kind of factor is why I don't have any truck with anyone complaining about DFB1 running the FPU at clock/2. 8-)

BW
DFB1 Open source 50MHz 030 and TT-RAM accelerator for the Falcon
DSTB1 Open source 16Mhz 68k and AltRAM accelerator for the ST
Smalliermouse ST-optimised USB mouse adapter based on SmallyMouse2
FrontBench The Frontier: Elite 2 intro as a benchmark

Re: FPU Emulator

Post by agranlund »

Badwolf wrote: Wed Jul 13, 2022 1:12 pm This kind of factor is why I don't have any truck with anyone complaining about DFB1 running the FPU at clock/2. 8-)
Haha yeah I wonder what they'd actually use the FPU for? Benchmarks only?

Old FPU-only software... is there any?
Ports of PC stuff from a time period where floating point was becoming standard? Well, in that case the 68030 is probably not going to cut it for the CPU side anyway :)

I'm sure it's been discussed to death already, but it would make a lot more sense if the most common use case was the default.
Basically that you had to explicitly flag to gcc that "yes, I am porting Quake/Whatever and it absolutely cannot run without hardware floating point", instead of defaulting to that.
The occasional printf with floats is probably completely fine with the softfloat lib :)

Re: FPU Emulator

Post by Badwolf »

agranlund wrote: Wed Jul 13, 2022 3:47 pm The occasional printf with floats is probably completely fine with the softfloat lib :)
Exactly this. The (oft-discussed) decision to have the MiNT GCC toolchain equate -m68020-60 to 020-60+FPU causes a lot of software to demand an FPU when it doesn't really need it.

I see your program as a way around that.
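
One way to check which kind of build you are producing: on m68k targets GCC defines the macro __HAVE_68881__ when it is allowed to emit 68881/68882 instructions. A small C sketch that reports this at run time; built_for_hardware_fpu and describe_float_build are made-up helper names, and on non-m68k compilers the macro is simply never defined:

```c
#include <assert.h>
#include <stdio.h>

/* 1 if the compiler was allowed to emit 68881/68882 instructions
   for this translation unit, 0 otherwise. */
int built_for_hardware_fpu(void)
{
#if defined(__HAVE_68881__)
    return 1;
#else
    return 0;
#endif
}

/* Write a one-line summary of the build into buf. */
void describe_float_build(char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    snprintf(buf, len, "float build: %s",
             built_for_hardware_fpu() ? "68881 instructions"
                                      : "no 68881 instructions");
}
```

This only describes the object file it is compiled into; libraries linked in may have been built with different options.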

Shame we can't flash a light or sound a klaxon when it decodes a real FPU instruction to ease Mikro's genuine concern regarding bug reports against MiNT for slowness as a result of using the library, though.

Anyway, joining in with the Linpack tests (because why not?) I get, for a 50x50 matrix (PiStorm not stable enough to leave it running for 200x200):-

[Attachment: IMG_5660.jpeg]

Code:

PiStorm emulated EC020 with FPEMU:          ~0.75 kflops
PiStorm emulated EC020 with emulated 68881: ~330 kflops.
A ratio of about 1:440!
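
The factors quoted in this thread are plain ratios of the two kFLOPS figures. A trivial C sketch for checking them, with speedup being an illustrative name:

```c
#include <assert.h>

/* How many times faster `fast` is than `slow`; both in the same unit. */
double speedup(double fast, double slow)
{
    assert(slow > 0.0);
    return fast / slow;
}
```

speedup(330.0, 0.75) gives 440 for the PiStorm numbers, and speedup(281.0, 1.59) gives roughly 177, matching the factor in czietz's table for the TT.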

BW