Comments on The Hacks of Life: There Must Be 50 Ways to Draw Your Streaming Quads
7 comments, newest first. Feed author: Chris. Feed last updated 2023-08-05.

Anonymous, 2016-05-26 20:22 (-04:00):

From my experience developing a voxel engine: for small vertex strides and small "instance" sizes, using a geometry shader to blow up each instance is MUCH faster than instancing. As the vertex stride increases, however, the geometry shader becomes inferior to simply filling a VBO with a pair of triangles for each quad. As for instancing in that case, common wisdom amongst the voxel gurus is that instancing is intended for *large* vertex-per-instance counts, and small ones won't perform well at all. At the end of the day, though, this only matters if you're not already fill-rate bound... a likely scenario if your voxels are better looking than Minecraft's :)

Benjamin Supnik, 2013-11-06 12:45 (-05:00):

95% of the time you want to go pos0, color0, normal0, pos1, color1, normal1. The GPU is going to fetch data from memory in relatively big chunks (it has a wide memory system) - if all of the data in a single vertex is nearby, then all of the data it fetches is useful immediately, and cache utilization is good.

The only exception is if you need some data to change per frame and some is static - in that case, keep the data separate AND use separate VBOs. That way you can set the unchanged data to STATIC_DRAW and the changing data to STREAM_DRAW.

Timmo, 2013-11-06 11:28 (-05:00):

I'm curious, I've only this week started playing around properly with OpenGL so please excuse my ignorance here.

When you say your vertex attributes for all mesh data are interleaved in a single VBO...

Is that to say that the vertices and the attributes themselves are interleaved? e.g.:
{pos_0,colour_0,pos_1,colour_1,...,pos_n,colour_n}

Or is each individual object encoded such that the vertices are all sequential, followed by a sequential block of attributes?
{pos_0,pos_1,...,pos_n,colour_0,colour_1,...,colour_n}

I suspect the latter, such that the vertices for an object are sequential, followed by a sequential block of attributes for that object, and then repeat for each object...

Is there any measurable benefit of either approach?

Benjamin Supnik, 2013-04-21 11:43 (-04:00):

Hi Vladimir,

We always keep all vertex attributes interleaved in a single VBO for both iOS and desktop - only the indices are in a separate VBO.

We do this to maintain locality - whatever hardware fetches a vertex, we want the vertex to sit in a cache line, etc.

Cheers
Ben

Vladimir Hrincar, 2013-04-20 21:46 (-04:00):

You are right, "Test, compare and analyze using Instruments" is what I recommend to everyone too :D

Btw do you have all vertex data in one VBO and stream everything, or do you split it into multiple VBOs so you can update (or not update) them separately? I mean in the iOS version. I prefer the interleaved method but in our case it doesn't really matter... I think :)

Benjamin Supnik, 2013-04-20 21:26 (-04:00):

Totally true - on iOS with GLES 2.0 you do have the option to write a shader to 'decompress' some kind of packed up transformation. But you also bring up the other issue: the GPUs aren't super-beefy. I think it's a question of what the game is bound up on... if the game is dying for CPU, an offload is a win. If the CPU is idle and you've maxed out the poor GPU, then trying to use GPU-based transform is not a win.

I think in our latest code we use indexed triangles and orphaning and it works pretty well. You can tell pretty easily when you've won with Instruments - when you're doing something the driver doesn't like, a ton of CPU stuff with ominous names will show up in the stack trace under the glDrawXXX call. :-)

Vladimir Hrincar, 2013-04-20 21:19 (-04:00):

Wow, what a nice article!

You are right, the streaming method is the best for rendering dynamic 2D geometry under iOS (OpenGL ES 2.0). The thing is how to do it the most efficiently - I mean how to send data to the GPU… and also what data.

You touched on the latter part by mentioning "instancing" and "compressing". When I say "instancing under OpenGL ES 2.0" I mean that you send the object's transformation matrix (compressed if possible) as a part of the vertex data, extract/recreate the matrix in the vertex shader and then do the transform on the GPU. This is the method which I use in our game for rendering many dynamic 2D objects (which have relatively few vertices on their own) in one draw call. Btw I use "linked" triangle strips, which seemed a better solution than options like an indexed triangle list. I didn't perform any relevant comparison test to prove that, though.

But I wonder if doing the transform on the GPU is faster than doing it on the CPU (using NEON instructions if possible). PowerVR GPUs' SIMD architecture should be quite a good match for that task, but considering the differences in frequencies (1 GHz CPU vs 200 MHz GPU) I am not sure if the GPU is really that much faster at doing a 2D transform. Also there are many different SoCs in iOS devices - starting with the A4 in the iPhone 4 (Cortex-A8/SGX535) and ending with the newest A6X in the iPad 4 (?Cortex-A9 Custom?/SGX554) - so GPU/CPU performance may vary from device to device. There is no recommended method. We are not vertex bound so I don't plan to investigate further… for now :)

Now about the problem of how to send data to the GPU (update your VBO). This always translates to questions like: "Single-buffered, double-buffered or even triple-buffered? Use a ring buffer? Discard an old buffer? Use glMapBuffer(Range), glBufferData or glBufferSubData?", and more similar ones to which only the GPU driver engineers know the answer :) There are just too many ways to do it, which means you can easily do it wrong. But do I actually care? Nope, because in most iOS games the differences are really not so huge and again vary from one iDevice to another… I tested some variants a long time ago on my old iPhone 4 and iPad 2 and decided to stick with the best one after that. So right now I use a triple-buffered approach with glMapBuffer while discarding an old buffer. I haven't done any comparison tests since then, so maybe another method is better on newer devices, but I think I don't care enough because the game still runs at 60 frames per second and it has other problems :)

So in the end I am streaming my dynamic geometry using my preferred method and all seems pretty fast in that part of the pipeline because I am fill-rate bound anyway :)

#end_of_rant

PS.: Btw I really like your older articles about double-buffered VBOs :)
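To make the two layouts from Timmo's question and Benjamin's answer concrete, here is a minimal sketch in Python using only the standard `struct` module. The vertex format (three float position, four byte colour, three float normal) and the names `VERTEX_FORMAT`, `pack_interleaved` and `pack_planar` are assumptions for illustration, not anything from the thread. The stride and offsets it computes are the kind of values you would pass as the `stride` and `pointer` arguments of `glVertexAttribPointer` for the interleaved case.

```python
import struct

# Hypothetical vertex format (an assumption, not from the thread):
# 3 floats position, 4 unsigned bytes colour, 3 floats normal, tightly packed.
VERTEX_FORMAT = "<3f4B3f"
STRIDE = struct.calcsize(VERTEX_FORMAT)   # bytes from one vertex to the next
POS_OFFSET = 0
COLOR_OFFSET = struct.calcsize("<3f")     # colour starts after the position
NORMAL_OFFSET = struct.calcsize("<3f4B")  # normal starts after the colour


def pack_interleaved(vertices):
    """Pack (pos, colour, normal) tuples as pos0,colour0,normal0,pos1,..."""
    buf = bytearray()
    for pos, colour, normal in vertices:
        buf += struct.pack(VERTEX_FORMAT, *pos, *colour, *normal)
    return bytes(buf)


def pack_planar(vertices):
    """Pack the same data as pos0,pos1,...,colour0,colour1,...,normal0,..."""
    positions = b"".join(struct.pack("<3f", *p) for p, _, _ in vertices)
    colours = b"".join(struct.pack("<4B", *c) for _, c, _ in vertices)
    normals = b"".join(struct.pack("<3f", *n) for _, _, n in vertices)
    return positions + colours + normals


verts = [
    ((0.0, 0.0, 0.0), (255, 0, 0, 255), (0.0, 0.0, 1.0)),
    ((1.0, 0.0, 0.0), (0, 255, 0, 255), (0.0, 0.0, 1.0)),
]
interleaved = pack_interleaved(verts)
planar = pack_planar(verts)
```

Both buffers hold the same bytes in a different order. In the interleaved one, everything belonging to vertex 1 sits in the 28 bytes starting at `STRIDE`, which is the locality Benjamin describes. In the planar one, the same vertex is spread over three distant regions.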
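Vladimir does not say how his "linked" triangle strips are built. A common reading, assumed here, is that separate quads are joined into one strip by repeating indices so that the connecting triangles are degenerate (zero area). The sketch below compares that with an indexed triangle list for the same quads. The function names are hypothetical, and each quad's four vertices are assumed to be stored in strip order.

```python
def quad_list_indices(num_quads):
    """Indexed triangle list: 6 indices per quad (vertices in strip order)."""
    indices = []
    for q in range(num_quads):
        b = 4 * q
        indices += [b, b + 1, b + 2, b + 2, b + 1, b + 3]
    return indices


def linked_strip_indices(num_quads):
    """One strip for all quads, joined by two repeated indices per seam."""
    indices = []
    for q in range(num_quads):
        b = 4 * q
        if q > 0:
            # Repeat the previous quad's last index and this quad's first
            # index; the triangles spanning the seam then have zero area.
            indices += [b - 1, b]
        indices += [b, b + 1, b + 2, b + 3]
    return indices


def strip_triangles(indices):
    """Expand a strip into triangles, skipping degenerate ones."""
    tris = []
    for i in range(len(indices) - 2):
        a, b, c = indices[i:i + 3]
        if a != b and b != c and a != c:
            # Odd-numbered strip triangles have reversed winding, so swap
            # the first two indices to keep a consistent orientation.
            tris.append((a, b, c) if i % 2 == 0 else (b, a, c))
    return tris
```

For n quads the list needs 6n indices and the linked strip 6n - 2, so in this setup the strip saves almost nothing in index count. Any real difference would come from how a particular GPU and driver handle the two forms, which fits Vladimir's caveat that he never measured it.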
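The "triple-buffered while discarding an old buffer" scheme Vladimir mentions can be sketched without any GL calls. The `StreamRing` class below is a hypothetical CPU-side stand-in, not code from the thread or the article: each slot is a `bytearray` playing the role of a VBO. In a real renderer the discard step would typically be done by orphaning (calling `glBufferData` with a null data pointer) or by mapping with an invalidate flag, and which of those is fastest depends on the driver, as the thread says.

```python
class StreamRing:
    """CPU-side stand-in for a ring of streaming buffers (no GL calls)."""

    def __init__(self, count=3):
        self.buffers = [bytearray() for _ in range(count)]
        self.current = -1  # advanced at the start of every frame

    def begin_frame(self):
        """Move to the next slot and discard whatever it held before."""
        self.current = (self.current + 1) % len(self.buffers)
        # Stand-in for orphaning: replace the storage rather than overwrite
        # it, so data from an older frame is never modified in place.
        self.buffers[self.current] = bytearray()
        return self.current

    def write(self, data):
        """Append this frame's vertex bytes to the current slot."""
        self.buffers[self.current] += data


ring = StreamRing(count=3)
slots = []
for frame in range(5):
    slots.append(ring.begin_frame())
    ring.write(bytes([frame]) * 4)  # placeholder for real vertex data
```

With three slots, the buffer written in frame N is not touched again until frame N + 3, which gives the GPU up to two frames to finish reading it before the CPU reuses the slot.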