NOTES
author Sam Lantinga <slouken@libsdl.org>
Sun, 04 Oct 2009 03:38:01 +0000
changeset 3334 61ea9005fddf
parent 2253 6d99edd791bf
permissions -rw-r--r--
Use gcc's built-in dependency generation, thanks to Adam Strzelecki
Sam - Mon Aug  6 23:02:37 PDT 2007
------
Add color modulation to blitting
Blit convert format X -> format Y (needed for texture upload)
Blit copy / blend / modulate format X -> format X (needed for software renderer)
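
A minimal sketch of what a same-format "modulate, then blend" blit could look like for ARGB8888 pixels. Every name here (mul8, modulate_argb, blend_argb, blit_modulate_blend) is hypothetical and is not taken from the SDL sources; the blend also forces the destination opaque, which is a simplifying assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiply two 8-bit channels, treating 255 as 1.0 (rounded to nearest). */
static uint8_t mul8(uint8_t a, uint8_t b)
{
    return (uint8_t)(((unsigned)a * b + 127u) / 255u);
}

/* Modulate: scale each channel of an ARGB8888 pixel by a per-blit color. */
static uint32_t modulate_argb(uint32_t px, uint8_t a, uint8_t r, uint8_t g, uint8_t b)
{
    return ((uint32_t)mul8((uint8_t)(px >> 24), a) << 24) |
           ((uint32_t)mul8((uint8_t)(px >> 16), r) << 16) |
           ((uint32_t)mul8((uint8_t)(px >> 8), g) << 8) |
            (uint32_t)mul8((uint8_t)px, b);
}

/* Blend: source-over using the source alpha; the result is forced opaque. */
static uint32_t blend_argb(uint32_t src, uint32_t dst)
{
    uint8_t sa = (uint8_t)(src >> 24);
    uint8_t inv = (uint8_t)(255 - sa);
    uint32_t out = 0xFF000000u;
    int shift;

    for (shift = 16; shift >= 0; shift -= 8) {
        uint8_t s = (uint8_t)(src >> shift);
        uint8_t d = (uint8_t)(dst >> shift);
        out |= (uint32_t)(mul8(s, sa) + mul8(d, inv)) << shift;
    }
    return out;
}

/* Same-format blit: modulate every source pixel, then blend it onto dst. */
static void blit_modulate_blend(uint32_t *dst, const uint32_t *src, size_t count,
                                uint8_t a, uint8_t r, uint8_t g, uint8_t b)
{
    size_t i;

    assert(dst != NULL && src != NULL);
    for (i = 0; i < count; ++i) {
        dst[i] = blend_argb(modulate_argb(src[i], a, r, g, b), dst[i]);
    }
}
```

Passing 255 for all four modulation values reduces this to a plain alpha blend, and an opaque source then reduces it to a copy, so one code path can cover copy, blend and modulate in a sketch like this.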
Create full software renderer for framebuffer interfaces.
Create texture for surface, keep surface around as pixel source, allow
copying / blending / modulating from surface to display (automatically
generate texture?)
At that point, should anyone be using anything besides textures for display?
IRC - Mon Aug  6 23:50:44 PDT 2007
-----
[11:07pm] icculus: so we're clear, "textures" replace "surfaces" from 1.2 when you want to get stuff to the screen? So you have a definite point where it stops being pixels in memory and starts being an object on the graphics card?
[11:07pm] icculus: Upload once, blit many
[11:07pm] icculus: something like that?
[11:07pm] slouken: That's the idea, yes
[11:07pm] icculus: ok, just making sure
[11:08pm] slouken: Many drivers retain the "texture" as a surface and blit to opaque bits which then get copied into a framebuffer.
[11:08pm] slouken: retain -> would retain
[11:08pm] icculus: yeah, I figured
[11:08pm] slouken: That's why the features for surface blitting need to match the features for texture display.
[11:08pm] icculus: But it gives an abstraction where the app has to make a conscious action: the upload is slow, but then the blit is fast.
[11:09pm] icculus: This couldn't just map to LockSurface, though?
[11:09pm] slouken: Yes, exactly.  I wasn't sure whether to make that clear, e.g. can you display any surface, and automatically generate a texture (if necessary)?
[11:09pm] slouken: If not, it simplifies the framebuffer case.
[11:10pm] slouken: But even the framebuffer case will probably still want to convert the bits to the optimal format.
[11:10pm] slouken: And at that point, the non-optimal bits can be thrown away.
[11:11pm] slouken: e.g. SDL_DisplayFormat()
[11:10pm] icculus: oh, that's a good point.
[11:10pm] icculus: hmm
[11:11pm] icculus: yeah, okay
[11:11pm] icculus: I was thinking about if the separation is really necessary, or if LockSurface would imply a texture creation (and you just have much more strict locking requirements than most 1.2 targets had)
[11:11pm] slouken: That's also why I separated the conversion blits from the copy / blend / modulate blits.
[11:12pm] icculus: But I like that the app has to be conscious of when that's happening
[11:12pm] slouken: Yeah, I was really leaning towards making it implicit, but the memory savings is pretty significant for artwork.
[11:12pm] icculus: SDL_compat can wrap the difference for people that can't get their head around it.
[11:13pm] icculus: At the performance cost, that can be a totally external layer that manages it like 1.2's locking.
[11:13pm] slouken: Well, SDL_compat is entirely software on top of a single texture that represents the screen.
[11:14pm] slouken: Yeah, that's the way it's implemented right now.
[11:14pm] slouken: a HWSURFACE is one that is backed by a texture, and lock/unlock is used to synchronize the bits.
[11:14pm] slouken: I'm not sure if that's worth keeping though, if SDL_compat is software only.
[11:15pm] slouken: It would minimize code migration though.
[11:15pm] icculus: yeah
[11:15pm] icculus: I expect SDL_compat to be a complete cesspool
[11:15pm] icculus: just a black box that no one touches or looks at more than necessary
[11:15pm] slouken: more or less, but it's actually pretty clean right now... I think as a side effect of the new API being pretty clean.
[11:15pm] slouken: I'm just unsure how much to use texture vs HWSURFACE
[11:16pm] icculus: Besides, you'd be surprised how quickly you can get people to move if you flag functions as deprecated so that GCC bitches when you use them.
[11:16pm] slouken:
[11:16pm] icculus: how much to use texture vs HWSURFACE in 1.3, or in SDL_compat?
[11:16pm] slouken: in 1.3
[11:17pm] icculus: Pick one or the other, I would say.
[11:17pm] icculus: I don't think it's good to confuse people with both terms
[11:17pm] slouken: yeah
[11:17pm] icculus: Everything is software until it's a texture.
[11:17pm] slouken: I'm just not sure which
[11:17pm] slouken: that's certainly cleanest.
[11:18pm] slouken: and what's currently implemented
[11:18pm] slouken: Let's think through the migration process...
[11:18pm] icculus: Plus dropping the term HWSURFACE gets the point across that a) this isn't 1.2 and b) this is probably going to a 3D api that you should be using anyhow.
[11:18pm] • slouken nods
[11:18pm] icculus: I mean, "texture" is what every API calls these things
[11:18pm] slouken: Yep
[11:19pm] slouken: So let's work through a migration case...
[11:19pm] icculus: ok
[11:19pm] slouken: FooBall loads a big background and a bunch of sprites.  They are png, loaded into SDL_Surface with SDL_image, then converted with SDL_DisplayFormat()
[11:20pm] slouken: Then the background is blitted every frame and the sprites are blended on top.
[11:20pm] slouken: In the compat case:
[11:21pm] slouken: SDL_SetVideoMode() creates a single lockable texture for the display.  DisplayFormat() converts the bits into the optimal format, all blitting is done in software, and SDL_UpdateRects() pushes the bits into the texture and the texture is rendered.
[11:21pm] slouken: In the 1.3 case:
[11:22pm] slouken: The background and sprites are converted to textures using SDL_CreateTextureFromSurface(), and the appropriate blending flags are set.  Each frame copies the textures into place and then the display is presented.
[11:23pm] slouken: compat is software only, 1.3 can be 3D accelerated.
[11:23pm] icculus: wait, why does all blitting have to be done in software in the SDL_compat case?
[11:23pm] icculus: I don't understand why SDL_compat can't move things between surfaces and textures at Lock/Unlock time
[11:24pm] slouken: Because by default the screen isn't created with HWSURFACE, since apps expect to be able to munge the bits.  Therefore, the blits to it have to be done locally.
[11:24pm] icculus: And all the surfaces are flagged HWSURFACE, so ->pixels is NULL until locked.
[11:24pm] icculus: oh
[11:24pm] icculus: It wasn't possible to have a HWSURFACE screen?
[11:25pm] slouken: Yes, it was, just nobody did it because alpha blending needs to read from video memory, and poking pixels across the bus is slow.
[11:25pm] slouken: Even in 1.3, the Xlib case needs to be software renderer if you want to do any alpha blending.
[11:26pm] icculus: But arguably there's no reason that can't all be HWSURFACE (that is, they need to get moved to a texture, even if that's still a software renderer on the backend)
[11:26pm] icculus: That sounds like it's only a problem when an app touches SDL_GetVideoSurface()->pixels without checking if they should lock it.
[11:26pm] icculus: Which might be quite common
[11:27pm] slouken: Yep, in 1.2 the app was able to specify it, and most explicitly don't because it's either not available or bad for alpha blending and direct pixel poking.
[11:27pm] icculus: hmm.
[11:28pm] slouken: You see why I've been going round and round for months on this?
[11:28pm] icculus: Well, we're talking about a compatibility layer; if it's going to crash without LockSurface() on the screen, make them lock it. If that makes it slow, so be it.
[11:29pm] icculus: The options are make small changes and take a performance hit, or make bigger changes and get a big performance win.
[11:29pm] icculus: (if touching the framebuffer directly, that is)
[11:29pm] slouken: Well, at that point it's a compatibility layer, why not just leave it software?  (devil's advocate here)
[11:29pm] icculus: That's a good point.
[11:30pm] slouken: Unless we leave everything surfaces to get the best of both worlds... 
[11:30pm] • slouken gets dizzy
[11:30pm] icculus: From a technical elegance viewpoint, I can see a good mapping between HWSURFACE and textures, but realistically, you want to motivate people to move away from old APIs.
[11:31pm] slouken: Yeah probably.  There's a certain attraction to retaining the SDL_Surface usage even for hardware surfaces, simply because of code reuse.  You don't have to have separate code for your software composition and your rendering.  You also get to keep your old image loading code, etc.
[11:31pm] icculus: man, this really is a pain in the ass, I see what you mean.  
[11:32pm] slouken: Yeah. 
[11:32pm] icculus: hmm, let me think on this awhile.
[11:32pm] slouken: On the other hand, separating the Texture API for rendering is clearer and allows extension in the future.
[11:32pm] slouken: We could potentially allow you to create a software renderer pointed at an SDL surface....
[11:32pm] slouken: Hmmm
[11:33pm] icculus: well, that's what you have now for something like Doom
[11:33pm] icculus: you render to a shadow surface, and throw a hail-mary with SDL_Flip()
[11:34pm] slouken: Yep.  I mean a 1.3 "renderer" with an SDL_Surface or another texture as the target.
[11:34pm] icculus: More or less, that doesn't change. The only important thing there is not generating a new texture every time, but being able to discard what's currently in it for a fresh upload.
[11:34pm] slouken: Yep
[11:34pm] icculus: oh, I see
[11:35pm] icculus: render-to-surface  
[11:35pm] slouken: lol
[11:35pm] slouken: yeah
[11:36pm] slouken: So... where to draw the line with surface vs texture...
[11:37pm] icculus: I don't know, I would think that basically you want to get out of surfaces as fast as possible
[11:37pm] icculus: (disregarding SDL_compat for the moment)
[11:37pm] slouken: Yeah, I think so.
[11:37pm] slouken: Load the bits up, throw them into a texture, and go
[11:37pm] icculus: And basically all you really need for that is an "upload" function.
[11:38pm] slouken: Yep
[11:38pm] icculus: I'd even be inclined to not allow "Locking," so there's no readback.
[11:38pm] icculus: well, I'm sure that would cause a fight
[11:38pm] • slouken thinks
[11:40pm] slouken: Let me see where I use SDL_LockTexture() right now.
[11:42pm] slouken: The only time that's useful is to avoid a buffer copy when you're already writing the bits in the correct format.
[11:42pm] slouken: e.g. lock -> software render into bits -> unlock (upload)
[11:43pm] slouken: that may already be taken care of by the upload though.
[11:43pm] slouken: e.g. software render into bits -> upload
[11:44pm] slouken: Oh yeah, there's probably a memory copy of the bits though, so it's:  upload = copy into cached bits, copy cached bits to video memory as needed.  In that case if you lock to get access to the cached bits directly that's a win.
[11:44pm] icculus: ah, okay
[11:47pm] icculus: I don't know, my head hurts.  
[11:47pm] slouken: Yeah, mine too. 
[11:47pm] slouken: I was pretty happy with the current setup until I noticed that it's really hard to write a framebuffer driver right now.
[11:49pm] slouken: I think maybe if I clean that up and separate conversion blit and copy / blend / modulate blit, then it may work pretty cleanly.
[11:49pm] icculus: yeah
[11:54pm] slouken: So recapping... SDL_Surface is only used for loading and app composition.
[11:55pm] slouken: SDL surface blitting is enhanced to maintain parity with the renderer features, since it's used as the core of the software renderer.
[11:56pm] slouken: The software renderer is adapted to be a standalone module targeting either an SDL_Surface or an SDL_Texture.
[11:56pm] slouken: SDL_HWSURFACE goes away
[11:57pm] slouken: Anything I'm missing?
[11:58pm] icculus: no, sounds good
[11:58pm] slouken: This means we have the new 1.3 texture API pretty much as it stands.
[11:59pm] slouken: Right?
[11:59pm] icculus: yeah, I think so
[12:00am] slouken: I was trying to see if it was possible to make a pluggable blit API, but I was going insane with trying to figure out how to make it fast.
[12:01am] slouken: If it were software only I could just say, write your own and register it here, but you'd have to maintain parity with the OpenGL and Direct3D renderers as well.
[12:01am] slouken: At that point you might as well be working in surfaces and uploading to texture. 
[12:02am] icculus: yeah
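
The "convert the bits to the optimal format, then throw the non-optimal bits away" step from the discussion above can be sketched as a one-time conversion blit. This is only an illustration: it assumes ARGB8888 artwork and an RGB565 display format, and the helper names (argb8888_to_rgb565, convert_for_display) are hypothetical rather than SDL functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Pack one ARGB8888 pixel into RGB565, dropping alpha and the low color bits. */
static uint16_t argb8888_to_rgb565(uint32_t px)
{
    uint16_t r = (uint16_t)((px >> 19) & 0x1F);
    uint16_t g = (uint16_t)((px >> 10) & 0x3F);
    uint16_t b = (uint16_t)((px >> 3) & 0x1F);

    return (uint16_t)((r << 11) | (g << 5) | b);
}

/*
 * Hypothetical one-time conversion blit (format X -> format Y).
 * Returns a newly allocated buffer in the assumed display format, or NULL
 * on allocation failure. The caller owns the result and may free the
 * original artwork once the conversion has succeeded.
 */
static uint16_t *convert_for_display(const uint32_t *src, size_t count)
{
    uint16_t *out;
    size_t i;

    assert(src != NULL);
    out = malloc(count * sizeof(*out));
    if (out == NULL) {
        return NULL;
    }
    for (i = 0; i < count; ++i) {
        out[i] = argb8888_to_rgb565(src[i]);
    }
    return out;
}
```

In this sketch the converted copy takes half the memory of the source, which is the kind of saving for artwork that the discussion refers to; the real ratio depends on the formats involved.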
TODO
----
Change textures to static/streaming.  Static textures are not lockable,
streaming textures are lockable and may have system memory pixels available.
SDL_compat will use a streaming video texture, and will never be HWSURFACE,
but may be PREALLOC, if system memory pixels are available.
*** DONE Thu Aug 16 14:18:42 PDT 2007
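
A rough model of the static/streaming split described in this item, assuming a plain system-memory buffer stands in for the texture's storage. The Texture struct and the texture_* functions are hypothetical and only mirror the rule stated above: static textures accept uploads but refuse to lock, streaming textures can be locked for direct pixel access.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef enum { TEXTURE_STATIC, TEXTURE_STREAMING } TextureAccess;

typedef struct {
    TextureAccess access;
    int w, h;
    uint32_t *pixels;   /* stands in for the texture's storage */
    int locked;
} Texture;

/* Create a zero-filled w x h texture; returns NULL on allocation failure. */
static Texture *texture_create(TextureAccess access, int w, int h)
{
    Texture *t;

    assert(w > 0 && h > 0);
    t = calloc(1, sizeof(*t));
    if (t == NULL) {
        return NULL;
    }
    t->pixels = calloc((size_t)w * (size_t)h, sizeof(*t->pixels));
    if (t->pixels == NULL) {
        free(t);
        return NULL;
    }
    t->access = access;
    t->w = w;
    t->h = h;
    return t;
}

/* Upload: copy a full w*h block in; allowed for both kinds when unlocked. */
static int texture_update(Texture *t, const uint32_t *src)
{
    if (t->locked) {
        return -1;
    }
    memcpy(t->pixels, src, (size_t)t->w * (size_t)t->h * sizeof(*src));
    return 0;
}

/* Lock: direct access to the pixels, streaming textures only. */
static int texture_lock(Texture *t, uint32_t **pixels, int *pitch)
{
    if (t->access != TEXTURE_STREAMING || t->locked) {
        return -1;
    }
    t->locked = 1;
    *pixels = t->pixels;
    *pitch = t->w * (int)sizeof(uint32_t);
    return 0;
}

static void texture_unlock(Texture *t)
{
    t->locked = 0;
}

static void texture_destroy(Texture *t)
{
    if (t != NULL) {
        free(t->pixels);
        free(t);
    }
}
```

Writing through the locked pointer skips the extra buffer copy that texture_update() makes, which matches the lock -> software render into bits -> unlock case from the IRC log.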
The software renderer will be abstracted so the surface management can be
used by any renderer that provides functions to copy surfaces to the window.
Blitters...
----
Copy blit and fill rect are optimized with MMX and SSE now.
Here are the pieces we still need:
- Merging SDL texture capabilities into the SDL surface system
- Generic fallback blitter architecture
- Custom fast path blitters
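
One possible shape for the last two pieces, sketched under assumptions: a small table of fast-path blitters keyed by source and destination format, with a generic per-pixel routine as the fallback. The types and functions below (PixelFormat, BlitInfo, choose_blitter and so on) are hypothetical and do not describe the architecture SDL ended up with.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum { FMT_ARGB8888, FMT_RGB565 } PixelFormat;

typedef struct {
    PixelFormat src_fmt, dst_fmt;
    const void *src;
    void *dst;
    size_t count;       /* number of pixels */
} BlitInfo;

typedef void (*BlitFunc)(const BlitInfo *info);

/* Read pixel i from buf and expand it to ARGB8888. */
static uint32_t read_pixel(PixelFormat fmt, const void *buf, size_t i)
{
    if (fmt == FMT_ARGB8888) {
        return ((const uint32_t *)buf)[i];
    } else {
        uint16_t p = ((const uint16_t *)buf)[i];
        uint32_t r = (p >> 11) & 0x1F, g = (p >> 5) & 0x3F, b = p & 0x1F;

        r = (r << 3) | (r >> 2);    /* replicate high bits into low bits */
        g = (g << 2) | (g >> 4);
        b = (b << 3) | (b >> 2);
        return 0xFF000000u | (r << 16) | (g << 8) | b;
    }
}

/* Store an ARGB8888 value as pixel i of buf in the given format. */
static void write_pixel(PixelFormat fmt, void *buf, size_t i, uint32_t argb)
{
    if (fmt == FMT_ARGB8888) {
        ((uint32_t *)buf)[i] = argb;
    } else {
        ((uint16_t *)buf)[i] = (uint16_t)(((argb >> 8) & 0xF800) |
                                          ((argb >> 5) & 0x07E0) |
                                          ((argb >> 3) & 0x001F));
    }
}

/* Generic fallback: slow, but handles every format pair. */
static void blit_generic(const BlitInfo *info)
{
    size_t i;

    for (i = 0; i < info->count; ++i) {
        write_pixel(info->dst_fmt, info->dst, i,
                    read_pixel(info->src_fmt, info->src, i));
    }
}

/* Custom fast path: same-format ARGB8888 copy. */
static void blit_copy_argb8888(const BlitInfo *info)
{
    memcpy(info->dst, info->src, info->count * sizeof(uint32_t));
}

static const struct {
    PixelFormat src_fmt, dst_fmt;
    BlitFunc func;
} fast_paths[] = {
    { FMT_ARGB8888, FMT_ARGB8888, blit_copy_argb8888 },
};

/* Pick a registered fast path if one matches, else the generic blitter. */
static BlitFunc choose_blitter(PixelFormat src_fmt, PixelFormat dst_fmt)
{
    size_t i;

    for (i = 0; i < sizeof(fast_paths) / sizeof(fast_paths[0]); ++i) {
        if (fast_paths[i].src_fmt == src_fmt && fast_paths[i].dst_fmt == dst_fmt) {
            return fast_paths[i].func;
        }
    }
    return blit_generic;
}
```

A caller would fill in a BlitInfo and run choose_blitter(info.src_fmt, info.dst_fmt)(&info). An MMX or SSE copy like the one mentioned above would just be another row in fast_paths in this arrangement.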