Working on bug 3921 - Add some Fastpath to BlitNtoNKey and BlitNtoNKeyCopyAlpha

Sylvain

I ran various benchmarks with clang 6.0.0 on Linux and with ndk-r16b on Android (NDK_TOOLCHAIN_VERSION=clang).

- I still see a 10x speedup.
- with duff_loops, the compiler does not vectorise the loop (but that doesn't seem to be a problem).

On Linux, my patch is already at full speed at -O2, whereas the duff_loops version needs -O3 (200 ms at -O3, and 300 ms at -O2).

I realized on Android that a slight variation fits best, both on Linux with -O2 and -O3, and on Android with -O2/-O3 and armeabi-v7a/arm64.

Here's the patch.
slouken committed Oct 1, 2018
1 parent 922623e commit 6e35e42
Showing 1 changed file with 6 additions and 4 deletions.
10 changes: 6 additions & 4 deletions src/video/SDL_blit_N.c
@@ -2344,8 +2344,9 @@ BlitNtoNKey(SDL_BlitInfo * info)
 /* *INDENT-OFF* */
 DUFFS_LOOP(
 {
-    Uint32 Pixel = (*src32 == ckey) ? *dst32 : *src32;
-    *dst32 = Pixel;
+    if (*src32 != ckey) {
+        *dst32 = *src32;
+    }
     ++src32;
     ++dst32;
 },
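The hunk above replaces an unconditional store of a selected value with a store that runs only when the source pixel differs from the color key. The following is a minimal standalone sketch of the two forms, not SDL code: the function names are hypothetical, plain `for` loops stand in for `DUFFS_LOOP`, and `uint32_t` stands in for `Uint32`. It only illustrates that both forms leave the destination in the same state; it says nothing about which one is faster on a given compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Old form (hypothetical name): always writes dst, picking dst or src. */
void key_copy_select(const uint32_t *src, uint32_t *dst, size_t n, uint32_t ckey)
{
    for (size_t i = 0; i < n; ++i) {
        uint32_t pixel = (src[i] == ckey) ? dst[i] : src[i];
        dst[i] = pixel;
    }
}

/* New form (hypothetical name): writes dst only for non-key pixels. */
void key_copy_branch(const uint32_t *src, uint32_t *dst, size_t n, uint32_t ckey)
{
    for (size_t i = 0; i < n; ++i) {
        if (src[i] != ckey) {
            dst[i] = src[i];
        }
    }
}

/* Returns 1 when both variants produce identical destination rows. */
int key_copy_variants_agree(void)
{
    const uint32_t ckey = 0x00FF00FFu; /* arbitrary key chosen for the demo */
    const uint32_t src[4] = { 0x11111111u, 0x00FF00FFu, 0x22222222u, 0x00FF00FFu };
    uint32_t a[4] = { 1u, 2u, 3u, 4u };
    uint32_t b[4] = { 1u, 2u, 3u, 4u };

    key_copy_select(src, a, 4, ckey);
    key_copy_branch(src, b, 4, ckey);

    /* Keyed source pixels must leave the destination untouched. */
    assert(a[1] == 2u && a[3] == 4u);
    return memcmp(a, b, sizeof a) == 0;
}
```

Calling `key_copy_variants_agree()` is a quick sanity check that the rewrite preserves behavior before measuring anything.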
@@ -2418,8 +2419,9 @@ BlitNtoNKeyCopyAlpha(SDL_BlitInfo * info)
 /* *INDENT-OFF* */
 DUFFS_LOOP(
 {
-    Uint32 Pixel_ = ((*src32 & rgbmask) == ckey) ? *dst32 : *src32;
-    *dst32 = Pixel_;
+    if ((*src32 & rgbmask) != ckey) {
+        *dst32 = *src32;
+    }
     ++src32;
     ++dst32;
 },
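In the copy-alpha hunk the comparison is made on `*src32 & rgbmask`, so the alpha bits of the source do not affect whether a pixel counts as the color key, while the full source pixel (alpha included) is what gets copied. A small sketch of that behavior follows. It is an illustration under assumptions: the function names are hypothetical, the mask `0x00FFFFFF` assumes alpha lives in the top byte, and `ckey` is assumed to already have its alpha bits cleared.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy pixels whose RGB bits differ from ckey. */
void key_copy_alpha(const uint32_t *src, uint32_t *dst, size_t n,
                    uint32_t ckey, uint32_t rgbmask)
{
    for (size_t i = 0; i < n; ++i) {
        if ((src[i] & rgbmask) != ckey) {
            dst[i] = src[i]; /* alpha bits are copied along with RGB */
        }
    }
}

/* Fills dst[0..3] from a fixed 4-pixel row; returns how many pixels changed. */
int key_copy_alpha_demo(uint32_t dst[4])
{
    const uint32_t rgbmask = 0x00FFFFFFu; /* assumption: alpha in the top byte */
    const uint32_t ckey = 0x00FF00FFu;    /* assumption: key stored without alpha */
    const uint32_t src[4] = {
        0x80FF00FFu, /* key color with alpha 0x80: skipped */
        0xFF112233u, /* ordinary pixel: copied */
        0x00FF00FFu, /* key color with alpha 0x00: skipped */
        0x40FF00FEu  /* one bit away from the key: copied */
    };
    int changed = 0;

    for (size_t i = 0; i < 4; ++i) {
        dst[i] = 0xAAAAAAAAu; /* sentinel background */
    }
    key_copy_alpha(src, dst, 4, ckey, rgbmask);
    for (size_t i = 0; i < 4; ++i) {
        changed += (dst[i] != 0xAAAAAAAAu);
    }
    assert(changed <= 4);
    return changed;
}
```

Masking before the comparison is what lets two pixels with the same RGB value but different alpha both be treated as transparent.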