Skip to content

Commit

Permalink
Browse files Browse the repository at this point in the history
Fixed bug #15
SDL_blit_A.mmx-speed.patch.txt --
        Speed improvements and a bugfix for the current GCC inline mmx
        asm code:
        - Changed some ops and removed some resulting useless ones.
        - Added some instruction parallelism (some gain)
        The resulting speed on my Xeon improved upto 35% depending on
        the function (measured in fps).
        - Fixed a bug where BlitRGBtoRGBSurfaceAlphaMMX() was
        setting the alpha component on the destination surfaces (to
        opaque-alpha) even when the surface had none.

SDL_blit_A.mmx-msvc.patch.txt --
        MSVC mmx intrinsics version of the same GCC asm code.
        MSVC compiler tries to parallelize the code and to avoid
        register stalls, but does not always do a very good job.
        Per-surface blending MSVC functions run quite a bit faster
        than their pure-asm counterparts (upto 55% faster for 16bit
        ones), but the per-pixel blending runs somewhat slower than asm.

- BlitRGBtoRGBSurfaceAlphaMMX and BlitRGBtoRGBPixelAlphaMMX (and all
variants) can now also handle formats other than (A)RGB8888. Formats
like RGBA8888 and some quite exotic ones are allowed -- like
RAGB8888, or actually anything having channels aligned on 8bit
boundary and full 8bit alpha (for per-pixel alpha blending).
The performance cost of this change is virtually 0 for per-surface
alpha blending (no extra ops inside the loop) and a single non-MMX
op inside the loop for per-pixel blending. In testing, the per-pixel
alpha blending takes a ~2% performance hit, but it still runs much
faster than the current code in CVS. If necessary, a separate function
with this functionality can be made.

This code requires Processor Pack for VC6.
  • Loading branch information
slouken committed Mar 15, 2006
1 parent cc4b955 commit ff71047
Showing 1 changed file with 960 additions and 384 deletions.

0 comments on commit ff71047

Please sign in to comment.