Comments on The Hacks of Life: Making SoA Tollerable
Feed by Chris. 3 comments, last updated 2023-08-05 11:30 (UTC-04:00).

Comment, 2021-03-02 13:49 (UTC-05:00):

Our code already had getters and setters for data such as position and orientation, so the getters and setters call an "extract" function that deinterleaves the SIMD data into non-SIMD form for us. Externally, the SIMD is hidden.

That said, the recommendation is not to call these functions too frequently (i.e. no more than once per frame per object, assuming you are forced to call them on all objects every frame for some reason), since deinterleaving for reading and interleaving again for setting has its own overhead.

But indeed, it's all tradeoffs. We also included a macro to adjust the "width" at build time.

So if for some reason the SIMD code is very detrimental to your use case, you can disable it, and the data is then laid out XYZXYZXYZ in memory. You lose SIMD but gain lower overhead per object.

Posted by Matias N. Goldberg

Comment, 2021-02-28 16:59 (UTC-05:00):

Right - the idea here is that the "inner" array matches the SIMD stride but the outer array keeps at least _part_ of the objects (e.g. components of a position vector) together?

I can see how this would be optimal for certain cases.
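To make the layout discussed in these comments concrete, here is a minimal C++ sketch. It is an illustration only, not the commenters' actual code: `SIMD_WIDTH`, `PositionBlock`, `Positions`, `get`, `set` and `translate` are hypothetical names, and a plain scalar lane loop stands in for real SIMD intrinsics. The inner arrays are `W` lanes wide, the outer `std::vector` holds blocks, and the per-object accessors do the "extract" (deinterleave) and its reverse.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical build-time width knob, in the spirit of the macro mentioned
// above. 4 matches 128-bit float lanes; 1 degenerates to plain XYZXYZ...
#ifndef SIMD_WIDTH
#define SIMD_WIDTH 4
#endif

constexpr std::size_t W = SIMD_WIDTH;

struct Vec3 { float x, y, z; };

// One block: the inner arrays match the SIMD stride (W lanes), while the
// outer array of blocks keeps the x/y/z of nearby objects together.
struct PositionBlock {
    float x[W];
    float y[W];
    float z[W];
};

class Positions {
public:
    // Rounds up to whole blocks; value-initialization zeroes every lane.
    explicit Positions(std::size_t count)
        : count_(count), blocks_((count + W - 1) / W) {}

    // Per-object "extract": read lane (i % W) of block (i / W).
    Vec3 get(std::size_t i) const {
        const PositionBlock& b = blocks_[i / W];
        const std::size_t lane = i % W;
        return {b.x[lane], b.y[lane], b.z[lane]};
    }

    // Per-object setter: write the value back into its lane.
    void set(std::size_t i, const Vec3& p) {
        PositionBlock& b = blocks_[i / W];
        const std::size_t lane = i % W;
        b.x[lane] = p.x;
        b.y[lane] = p.y;
        b.z[lane] = p.z;
    }

    // Bulk update, block by block. The inner lane loop is what a SIMD
    // version (or the auto-vectorizer) would turn into one instruction per
    // component. Unused padding lanes in the last block are updated too,
    // which is harmless.
    void translate(const Vec3& d) {
        for (PositionBlock& b : blocks_) {
            for (std::size_t lane = 0; lane < W; ++lane) {
                b.x[lane] += d.x;
                b.y[lane] += d.y;
                b.z[lane] += d.z;
            }
        }
    }

    std::size_t size() const { return count_; }

private:
    std::size_t count_;
    std::vector<PositionBlock> blocks_;
};
```

Building with `-DSIMD_WIDTH=1` turns each block into a single x, y, z triple, i.e. the plain XYZXYZ layout used as the non-SIMD fallback. The per-object accessors also show the cost of one-at-a-time access: every call needs an `i / W` and an `i % W` (which compile to a shift and a mask when `W` is a power of two).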
My one thought is: this is pretty straightforward if you are _always_ going to use SIMD to loop over objects. Since you're always going to iterate 4 objects at a time (or whatever your SIMD lane count is), the two-part iteration (outer and inner) is basically free.

But if for some reason code has to iterate over the collection one object at a time, that code becomes more complex. Then again, if you're really serious about SIMD, maybe that kind of code is itself the mistake.

Posted by Benjamin Supnik

Comment, 2021-02-28 10:31 (UTC-05:00):
Hi!

Good read. I agree with everything!

I just came in to add that there is a third way to lay out position data in memory. You mentioned:

1.
    XXXX
    YYYY
    ZZZZ

2. XYZXYZXYZXYZ

3. The third one is AoSSoA (Arrays of Structures of Structure of Arrays):
    XXXXYYYYZZZZ

e.g.

    struct AoSSoA
    {
        float x[4];
        float y[4];
        float z[4];
    };

The benefit is that all the data for a single position element sits within one cache line, or at most two.
The disadvantage is that it doesn't scale well to wide SIMD such as 512-bit: each array then holds 16 floats (64 bytes), so the x, y and z of a single element end up in 3 different cache lines. AVX-512 consumes too much power anyway, so that's rarely an issue.

Posted by Matias N. Goldberg
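The cache-line argument in this last comment can be checked with a little arithmetic. The C++ sketch below is only an illustration, assuming 64-byte cache lines, 4-byte floats, and a block array that starts on a cache-line boundary; `Block`, `linesTouched`, `maxLinesTouched` and `kCacheLine` are hypothetical names. It generalizes the `AoSSoA` struct to `W` lanes and counts how many distinct lines the x, y and z of one element fall in.

```cpp
#include <cstddef>

// Assumption: 64-byte cache lines (typical on current x86 and many ARM cores).
constexpr std::size_t kCacheLine = 64;

// AoSSoA block with W lanes per component.
template <std::size_t W>
struct Block {
    float x[W];
    float y[W];
    float z[W];
};

// Distinct cache lines touched when reading x, y and z of element i,
// assuming the array of blocks starts on a cache-line boundary.
template <std::size_t W>
std::size_t linesTouched(std::size_t i) {
    static_assert(sizeof(Block<W>) == 3 * W * sizeof(float),
                  "float arrays should pack without padding");
    const std::size_t base = (i / W) * sizeof(Block<W>);
    const std::size_t lane = (i % W) * sizeof(float);
    const std::size_t lx = (base + lane) / kCacheLine;
    const std::size_t ly = (base + W * sizeof(float) + lane) / kCacheLine;
    const std::size_t lz = (base + 2 * W * sizeof(float) + lane) / kCacheLine;
    // lx <= ly <= lz, so counting changes between neighbours counts distinct lines.
    std::size_t n = 1;
    if (ly != lx) ++n;
    if (lz != ly) ++n;
    return n;
}

// Worst case over the first `count` elements.
template <std::size_t W>
std::size_t maxLinesTouched(std::size_t count) {
    std::size_t worst = 0;
    for (std::size_t i = 0; i < count; ++i) {
        const std::size_t n = linesTouched<W>(i);
        if (n > worst) worst = n;
    }
    return worst;
}
```

With `W = 4` each block is 48 bytes, so a block can straddle two lines but never three; with `W = 16` (sixteen 32-bit floats per 512-bit register) each component array fills a whole 64-byte line, so every element touches three.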