Working on bug 3921 - Add some Fastpath to BlitNtoNKey and BlitNtoNKeyCopyAlpha
author Sam Lantinga <slouken@libsdl.org>
Mon, 01 Oct 2018 14:43:03 -0700
changeset 12274 736699aa224d
parent 12273 c522b61334f1
child 12275 be554ed7b4aa

Sylvain

I ran various benchmarks with clang 6.0.0 on Linux and with ndk-r16b on Android (NDK_TOOLCHAIN_VERSION=clang).

- I still see a 10x speedup.
- With duff_loops, the compiler does not vectorise the loop (but that doesn't seem to be a problem).

On Linux, my patch is already at full speed at -O2, whereas the duff_loops need -O3 (200 ms at -O3 versus 300 ms at -O2).

I realized that on Android I had a slight variation that fits best in every configuration I tried: on Linux with -O2 and -O3, and on Android with -O2/-O3 on armeabi-v7a and arm64.

Here's the patch.
src/video/SDL_blit_N.c
--- a/src/video/SDL_blit_N.c	Mon Oct 01 21:29:11 2018 +0300
+++ b/src/video/SDL_blit_N.c	Mon Oct 01 14:43:03 2018 -0700
@@ -2344,8 +2344,9 @@
             /* *INDENT-OFF* */
             DUFFS_LOOP(
             {
-                Uint32 Pixel = (*src32 == ckey) ? *dst32 : *src32;
-                *dst32 = Pixel;
+                if (*src32 != ckey) {
+                    *dst32 = *src32;
+                }
                 ++src32;
                 ++dst32;
             },
@@ -2418,8 +2419,9 @@
             /* *INDENT-OFF* */
             DUFFS_LOOP(
             {
-                Uint32 Pixel_ = ((*src32 & rgbmask) == ckey) ? *dst32 : *src32;
-                *dst32 = Pixel_;
+                if ((*src32 & rgbmask) != ckey) {
+                    *dst32 = *src32;
+                }
                 ++src32;
                 ++dst32;
             },
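
For readers without the SDL tree at hand, the two loop bodies in the diff can be compared in isolation. The following is only a minimal standalone sketch, not SDL code: the function names blit_key_select and blit_key_branch are hypothetical, a plain for loop stands in for DUFFS_LOOP, and uint32_t stands in for Uint32. It shows that both forms leave the same pixels in the destination; the difference is that the old form stores to every destination pixel, while the new form stores only when the source pixel does not match the colour key.

```c
#include <stddef.h>
#include <stdint.h>

/* Old form (hypothetical name): select a value, then always store it.
 * Keyed pixels are rewritten with their own destination value. */
void blit_key_select(const uint32_t *src, uint32_t *dst, size_t n,
                     uint32_t ckey, uint32_t rgbmask)
{
    size_t i;
    for (i = 0; i < n; ++i) {
        uint32_t pixel = ((src[i] & rgbmask) == ckey) ? dst[i] : src[i];
        dst[i] = pixel;
    }
}

/* New form (hypothetical name): store only when the source pixel
 * does not match the colour key; keyed pixels are left untouched. */
void blit_key_branch(const uint32_t *src, uint32_t *dst, size_t n,
                     uint32_t ckey, uint32_t rgbmask)
{
    size_t i;
    for (i = 0; i < n; ++i) {
        if ((src[i] & rgbmask) != ckey) {
            dst[i] = src[i];
        }
    }
}
```

Passing rgbmask = 0xFFFFFFFF makes the comparison equivalent to the first hunk (*src32 != ckey); a mask such as 0x00FFFFFF corresponds to the second hunk, where the alpha bits are ignored for the key test but still copied. Timings will depend on compiler, flags and target, as the benchmarks above indicate.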