DDraceNetwork - developer

Nice blogposts, I like this guy

ddnet-maps

[ddnet-maps:master] 1 new commit

d10e54c A run_as_a_pro - ddnet-maps

def-

[ddnet-web:master] 1 new commit

ec5af5f Add 10 € donation by Laienboy - def-

@Jupstar ✪ soo, I currently have like 3 designs in mind for tee rendering and I'm not sure which one I like best. I do want to draw them all in one draw call. The biggest drawback of this will be that keeping all skins in an array texture is a bit harder, and defragmenting unused skins or stuff like that is also harder than simply dropping the skin textures that are not in use anymore. iirc my wgpu renderer was mostly cpu-capped, so doing stuff on the gpu is probably the way to go

1.
    One static vertex buffer with all the tee body parts in their correct size with the right texture coordinates, as well as a uint which identifies the body part.
    One minimal tee vertex buffer with instance stepping, which contains minimal information about the tee
      - position
      - view angle for the eyes
      - relative position of the body
      etc
    The vertex shader then figures out which body part the vertex is from and applies the correct transformations
2.
    Again the same static vertex buffer without the identifying uint
    A vertex buffer with instance stepping, which, for every tee, has a matrix for each body part
    (since some body parts are duplicated, the number of matrices could be reduced, and this could also just be a uniform buffer that gets indexed)
3.
    A single vertex buffer for all tees that includes all transformed vertices, probably with an index buffer
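
A minimal CPU-side sketch of design 1, to make the split between static and per-instance data concrete. Everything here is an assumption for illustration: the struct names, the field layout, the part ids, and the `EYE_DISTANCE` constant are hypothetical, and `transform` only stands in for what the vertex shader would do. It is not wgpu code.

```rust
/// One vertex of the static mesh that all tees share.
#[derive(Clone, Copy, Debug, PartialEq)]
struct PartVertex {
    pos: [f32; 2], // corner offset in tee-local units
    uv: [f32; 2],  // texture coordinates inside the skin
    part: u32,     // identifies the body part this vertex belongs to
}

/// Minimal per-tee data, stepped once per instance.
#[derive(Clone, Copy, Debug)]
struct TeeInstance {
    pos: [f32; 2],
    eye_angle: f32, // view angle in radians
    body_offset: [f32; 2],
}

// Hypothetical part ids and eye shift distance.
const PART_BODY: u32 = 0;
const PART_EYE: u32 = 1;
const EYE_DISTANCE: f32 = 0.125;

/// Stand-in for the vertex shader: choose the transformation by part id.
fn transform(v: PartVertex, tee: TeeInstance) -> [f32; 2] {
    let offset = match v.part {
        PART_BODY => tee.body_offset,
        // Eyes follow the body and shift toward the view direction.
        PART_EYE => [
            tee.body_offset[0] + tee.eye_angle.cos() * EYE_DISTANCE,
            tee.body_offset[1] + tee.eye_angle.sin() * EYE_DISTANCE,
        ],
        // Other parts (feet, ...) are left without an extra offset here.
        _ => [0.0, 0.0],
    };
    [
        tee.pos[0] + v.pos[0] + offset[0],
        tee.pos[1] + v.pos[1] + offset[1],
    ]
}
```

The point of this layout is that only `TeeInstance` has to be written each frame, while the branch on `part` is the extra vertex shader work listed in the trade-offs.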

The trade-offs that I tried to take into account were:
    - The amount of redundant information
    - The size of the buffer writes each frame
    - The amount of work for the vertex shader

Thoughts? ^^

1. i assume this is the best trade-off approach, probs the cheapest on CPU
2. i assume this could be the fastest of the three (rendering-wise)
3. requires a lot of buffer updates, so it might be the slowest
u should generally not expect that instancing is faster than draw calls
it's basically like drawing the same tee instance_count times with an automatic index:
for i in 0 .. instance_count
    draw_call
as far as i've seen it also doesn't execute on the GPU
i tried to build a buffer that only contains indices counted up (to reflect what instancing would do if you count from 0 to x) and it was faster than the driver (by a very small bit), since i assume it did that in one gpu call instead of simply looping
What is your solution to the blending problem?
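
One possible reading of the instancing comparison above, sketched on the CPU. Both function names are hypothetical, and this only models which (vertex index, instance index) pairs the vertex stage sees. It makes no claim about how any driver actually schedules an instanced draw.

```rust
/// Instanced draw, conceptually: a loop over instances around the same mesh.
fn instanced(vertex_count: u32, instance_count: u32) -> Vec<(u32, u32)> {
    let mut calls = Vec::new();
    for instance in 0..instance_count {
        for vertex in 0..vertex_count {
            calls.push((vertex, instance));
        }
    }
    calls
}

/// Non-instanced alternative: one flat run of counted-up indices,
/// with the tee index recovered by division and the vertex by remainder.
fn flat(vertex_count: u32, instance_count: u32) -> Vec<(u32, u32)> {
    (0..vertex_count * instance_count)
        .map(|i| (i % vertex_count, i / vertex_count))
        .collect()
}
```

Both functions yield the same sequence, so a shader can derive the same per-tee index either way. Which one is faster is a measurement question, as the message says.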

biggest problem with rendering for teeworlds is that we apply one transformation for 3-4 vertices
my biggest problem with opengl 3.3 was really the bad control over memory management, updating uniforms in opengl is much faster than uploading/updating a buffer
e.g. with vulkan it's almost as cheap to create streamed vertices (20 bytes per vertex * 4 * skin_part_count) as it is to update matrices/information (usually sizeof(float) * 2 (pos) + rotation (float) + color (4 bytes) ~ 16 bytes)
only advantage is ofc that the rotation calc etc happens on the GPU, which is usually better than CPU if the number of vertices is high (bcs parallelism)
So i guess try out all 3 approaches xD
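
The byte counts in that message can be worked through as a small sketch. The 20 bytes per vertex and the pos/rotation/color fields come from the message. Treating the ~16 bytes as a per-part cost is an assumption, as are the function names and the part count of 6 used in the example.

```rust
const BYTES_PER_VERTEX: usize = 20; // figure from the message
const VERTICES_PER_PART: usize = 4; // one quad per skin part

// pos (2 floats) + rotation (1 float) + packed color (4 bytes)
const BYTES_PER_PART_INFO: usize =
    2 * std::mem::size_of::<f32>() + std::mem::size_of::<f32>() + 4;

/// Bytes written per tee when streaming fully transformed vertices.
fn streamed_bytes(skin_part_count: usize) -> usize {
    BYTES_PER_VERTEX * VERTICES_PER_PART * skin_part_count
}

/// Bytes written per tee when only uploading per-part transform info
/// (assumes one info record per part).
fn info_bytes(skin_part_count: usize) -> usize {
    BYTES_PER_PART_INFO * skin_part_count
}
```

With a hypothetical 6 parts per tee this gives 480 bytes streamed versus 96 bytes of transform info, a 5x difference in upload size. The message's point is that on Vulkan this difference may cost little in practice.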

hm, how is updating uniforms different to updating a buffer?

11:02

I think with wgpu the best thing to do is to interact with the api as little as possible / use the highest abstractions. that is why I am assuming that instance-based rendering is the most suitable variant

11:02

but yeah, I wasn't aware that instance-based rendering isn't done on the gpu

Patiga

hm, how is updating uniforms different to updating a buffer?

I assume the gl driver assumes that uniform buffers are updated more regularly