DDraceNetwork
Development / developer
Development discussion. Logged to https://ddnet.tw/irclogs/ Connected with DDNet's IRC channel, Matrix room and GitHub repositories — IRC: #ddnet on Quakenet | Matrix: #ddnet-developer:matrix.org GitHub: https://github.com/ddnet
Between 2023-02-12 00:00:00Z and 2023-02-13 00:00:00Z
Avatar
Nice blogposts, I like this guy
Avatar
ec5af5f Add 10 € donation by Laienboy - def-
Avatar
@Jupstar ✪ soo, I currently have like 3 designs in mind for tee rendering and I'm not sure which one I like best. I do want to draw them all in one draw call. The biggest drawback of this will be that keeping all skins in an array texture is a bit harder, and defragmenting unused skins or stuff like that is also harder than simply dropping the skin textures that are not in use anymore. iirc my wgpu renderer was mostly cpu-capped, so doing stuff on the gpu is probably the way to go
1. One static vertex buffer with all the tee body parts in their correct size with the right texture coordinates, as well as a uint which identifies the body part. One minimal tee vertex buffer with instance stepping, which contains minimal information about the tee
- position
- view angle for the eyes
- relative position of the body
etc
The vertex shader then figures out which body part the vertex is from and applies the correct transformations
2. Again the same static vertex buffer without the identifying uint. A vertex buffer with instance stepping, which, for every tee, has a matrix for each body part (since some body parts are duplicates the amount of matrices could be reduced, and this could also just be a uniform buffer that gets indexed)
3. A single vertex buffer for all tees that includes all transformed vertices, probably with an index buffer
The trade-offs that I tried to take into account were:
- The amount of redundant information
- The size of the buffer writes each frame
- The amount of work for the vertex shader
Thoughts? ^^
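A rough CPU-side sketch of what design 1 above asks of the vertex shader, written in Rust rather than WGSL so it can be run directly. All names (`Part`, `PartVertex`, `TeeInstance`, `transform_vertex`), the set of body parts and the eye shift of 0.125 are assumptions made for illustration, not twgpu's or DDNet's actual layout.

```rust
/// Identifies which body part a static vertex belongs to (hypothetical set).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Part {
    Body,
    Eyes,
    Foot,
}

/// One entry of the static, shared vertex buffer.
#[derive(Clone, Copy)]
struct PartVertex {
    /// Corner position relative to the center of the part.
    offset: [f32; 2],
    /// The identifying "uint" from design 1.
    part: Part,
}

/// One entry of the per-tee buffer that is stepped per instance.
#[derive(Clone, Copy)]
struct TeeInstance {
    position: [f32; 2],
    /// View angle in radians, used to shift the eyes.
    view_angle: f32,
    /// Position of the body relative to the tee position.
    body_offset: [f32; 2],
}

/// What the vertex shader would compute: pick the transformation
/// that belongs to the vertex's body part and apply it.
fn transform_vertex(v: PartVertex, tee: TeeInstance) -> [f32; 2] {
    let part_offset = match v.part {
        Part::Body => tee.body_offset,
        Part::Eyes => {
            // Hypothetical: eyes sit on the body, shifted towards the view direction.
            let shift = 0.125;
            [
                tee.body_offset[0] + tee.view_angle.cos() * shift,
                tee.body_offset[1] + tee.view_angle.sin() * shift,
            ]
        }
        // Hypothetical: feet stay at the tee position.
        Part::Foot => [0.0, 0.0],
    };
    [
        tee.position[0] + part_offset[0] + v.offset[0],
        tee.position[1] + part_offset[1] + v.offset[1],
    ]
}
```

With this split, only the small `TeeInstance` entries would need to be rewritten each frame. The `PartVertex` data never changes.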
Avatar
Avatar
Patiga
@Jupstar ✪ soo, I currently have like 3 designs in mind for tee rendering and I'm not sure which one I like best. I do want to draw them all in one draw call. The biggest drawback this will be that keeping all skins in an array texture is a bit harder, and defragmenting unused skins or stuff like that is also harder than simply dropping the skin textures that are not in use anymore. iirc my wgpu renderer was mostly cpu-capped, so doing stuff on the gpu is probably the way to go 1. One static vertex buffer with all the tee body parts in their correct size with the right texture coordinates, as well as a uint which identifies the body part. One minimal tee vertex buffer with instance stepping, which contains minimal information about the tee - position - view angle for the eyes - relative position of the body etc The vertex shader then figures out which body part the vertex is from and applies the correct transformations 2. Again the same static vertex buffer without the identifying uint A vertex buffer with instance stepping, which, for every tee, has a matrix for each body part (since some body parts are duplicate the amount of matrices could be reduced, and this could also just be a uniform buffer that gets indexed) 3. A single vertex buffer for all tees that includes all transformed vertices, probably with an index buffer The trade offs that I tried to take into account were: - The amount of redundant information - The size of the buffer writes each frame - The amount of work for the vertex shader Thoughts? ^^
1. i assume the best trade off approach, probs cheapest on CPU
2. i assume this could be the fastest of the three (rendering wise)
3. requires lot of buffer updates, so might be the slowest
u should generally not expect that instancing is faster than draw calls. its basically like drawing the same tee instance_count times with an automatic index: for i in 0 .. instance_count draw_call
as far as i've seen it also doesnt execute on the GPU. i tried to build a buffer that only contains indices counted up (to reflect what instancing would do if you count from 0 to x) and it was faster than the driver (by a tiny bit), since i assume it did that in one gpu call instead of simply looping
What is your solution to the blending problem?
Avatar
biggest problem with rendering for teeworlds is we apply one transformation for 3-4 vertices. my biggest problem with opengl 3.3 was really the bad control over memory management, updating uniforms in opengl is much faster than uploading/updating a buffer. e.g. with vulkan its almost as cheap to create streamed vertices (20 bytes per vertex * 4 * skin_part_count) vs updating matrices/informations (usually sizeof(float) * 2 (<= pos) + rotation (float) + color (4 bytes) ~ 16 bytes). only advantage is ofc that the rotation calc etc happens on the GPU which is usually better than CPU if the number of vertices is high (bcs parallelism). So i guess try out all 3 approaches xD
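To put numbers on the size comparison in the message above, here is a small Rust sketch. The 20 bytes per vertex, 4 vertices per quad and the pos/rotation/color layout come from the message. The function names and the part count of 6 in the usage note are only assumptions.

```rust
use std::mem::size_of;

/// Bytes per streamed vertex, as stated in the message.
const BYTES_PER_STREAMED_VERTEX: usize = 20;
/// Each skin part is drawn as one quad with 4 vertices.
const VERTICES_PER_QUAD: usize = 4;

/// Bytes uploaded per tee when all vertices are transformed on the CPU.
fn streamed_bytes_per_tee(skin_part_count: usize) -> usize {
    BYTES_PER_STREAMED_VERTEX * VERTICES_PER_QUAD * skin_part_count
}

/// Bytes uploaded per tee when only position (2 floats), rotation (1 float)
/// and a 4 byte color are sent and the GPU does the transformation.
fn instance_bytes_per_tee() -> usize {
    size_of::<[f32; 2]>() + size_of::<f32>() + size_of::<[u8; 4]>()
}
```

For a hypothetical tee with 6 skin parts, `streamed_bytes_per_tee(6)` gives 480 bytes against the 16 bytes from `instance_bytes_per_tee()`. The point in the chat is that with Vulkan this difference in upload size matters less than one might expect.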
Avatar
hm, how is updating uniforms different to updating a buffer?
11:02
I think with wgpu the best thing to do is to use the least amount of interaction with the api / using the highest abstractions. that is why I am assuming that instance based rendering is the most suitable variant
11:02
but yeah, I wasn't aware that instance based rendering isn't on the gpu
Avatar
Avatar
Patiga
hm, how is updating uniforms different to updating a buffer?
I assume the gl driver assumes that uniform buffers are updated more regularly
11:05
Also buffer has a different lifetime
11:05
Uniform are per shader program
Avatar
ah, true
11:07
instancing does seem to have support from gpus, not sure to which extent tho https://en.wikipedia.org/wiki/Geometry_instancing#Video_cards_that_support_geometry_instancing
11:09
i tried to build a buffer that only contains indices counted up (to reflect what instancing would do if you count from 0 to x) and it was faster than the driver (by a tiny bit), since i assume it did that in one gpu call instead of simply looping
heh, interesting. for 2. with a uniform buffer I also considered this and then wondered if I can use zero-sized vertex structures and then just use the built-in instance-index
11:12
What is your solution to the blending problem?
isn't the blending problem solved if I render each tee's limbs before the next by having them in this order in the vertex buffer? if you mean the order of the tees, I suppose their order is mostly fixed and I need to take care to have the own tee be in front. although it would still just take a single buffer write to change their order
Avatar
Avatar
Patiga
What is your solution to the blending problem?
isn't the blending problem solved if I render each tee's limbs before the next by having them in this order in the vertex buffer? if you mean the order of the tees, I suppose their order is mostly fixed and I need to take care to have the own tee be in front. although it would still just take a single buffer write to change their order
but are u doing it in separate draw calls?
Avatar
I think it should work fine with a single draw call
Avatar
that surprises me
11:21
so the GPU waits for other fragments to finish completely?
Avatar
uh I think yes
11:23
I mean z-fighting only happens with depth buffers with floating point inaccuracies afaik
11:24
it should be the same behavior as rendering a quads layer, I think the order is just as defined
11:24
maybe I'm just not seeing the issue you are pointing out ^^
Avatar
but if u overlay 2 half transparent quads
11:26
in one call
11:27
i dunno, to me this sounds like the GPU has to be aware of the content of the fragments instead of spamming them out
Avatar
but if it wasn't synchronized in some way, we would be able to observe some kind of flickering
11:28
or at least inconsistencies in the coloring if we would overlay such quads
Avatar
mh yeah i guess it can guarantee the order of execution at the fragmentation already
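Why the order matters for two half transparent quads can be checked with a few lines of Rust. This is a minimal sketch of the common "source over" blend (`out = src.rgb * src.a + dst.rgb * (1 - src.a)`), assuming straight alpha. `blend_over` is a hypothetical helper, not a function from any graphics API.

```rust
type Rgb = [f32; 3];
type Rgba = [f32; 4];

/// Blend `src` over `dst` with straight (non-premultiplied) alpha.
fn blend_over(dst: Rgb, src: Rgba) -> Rgb {
    let a = src[3];
    [
        src[0] * a + dst[0] * (1.0 - a),
        src[1] * a + dst[1] * (1.0 - a),
        src[2] * a + dst[2] * (1.0 - a),
    ]
}
```

Drawing half transparent red and then half transparent blue over black gives `[0.25, 0.0, 0.5]`, while the reverse order gives `[0.5, 0.0, 0.25]`. Without a defined blend order inside a draw call, overlapping quads would therefore show the inconsistent colors mentioned above.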
11:33
now that i think about it, i think there was a vulkan extension that disabled this feature which i found very useful for ddnet tile map rendering but apparently the driver is just as good https://gpuopen.com/learn/unlock-the-rasterizer-with-out-of-order-rasterization/
GCN hardware supports a special out-of-order rasterization mode which relaxes the ordering guarantee, and allows fragments to be produced out-of-order.
11:36
in this case its about z buffer, but the same problem arises for the color blending
11:37
from pure feeling this sounds rather expensive to do, i'd love to have that control
11:38
i wonder if the fragments themselves wait for other fragments or if the gpu only waits as soon as one fragment overlaps another or something similar
11:38
note this is kinda offtopic to what u want to do
Avatar
interesting, I have no idea how the lower level stuff on the graphics card works
Avatar
just a guarantee not needed for tile rendering
11:39
like, can the gpu detect overlaps and reorder the vertices like the cpu reorders operations? :D
Avatar
yeah would be interesting to know what exactly happens 😄
11:41
i bet in 10 years we have much more control over such stuff for graphics apis
11:42
lot of stuff is still "hardcoded" but not actually needed. e.g. a compute shader can probably reflect all pipeline stages but nobody has done it yet and/or it might also be slower on one architecture vs. another (edited)
Avatar
heh, operating on all hardware must be a huge pain
Avatar
i mean it can still be a programmable shader
11:44
its just that its probably not as optimized as the hardcoded stuff
11:44
compute shaders are basically what OpenCL can do, and OpenCL can do all blender rendering
11:44
so its just a matter of effort/time & next gen hardware xD
Avatar
ChillerDragon BOT 2023-02-12 11:49:50Z
11:49
this is fine
Avatar
@Patiga do you render your quads in your twgpu in one draw call? And i explicitly dont mean instancing
Avatar
yes, one quad layer is one draw call
11:52
same in ddnet?
Avatar
nope, it uses instancing
11:53
so its one draw call
11:53
but instanced
Avatar
wait why
11:53
what is your base vertex buffer?
Avatar
ok let me look xD
11:53
maybe i am wrong
Avatar
I have an indexed draw call in twgpu
Avatar
ah yeah i use the vertex index
11:54
so its not instanced
11:54
int TmpQuadIndex = int(gl_VertexID / 4) - gQuadOffset;
Avatar
👍 indices.extend([0, 1, 3, 0, 2, 3].map(|i| i + offset)); in a loop for the creation of the index buffer
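The index buffer loop mentioned above could look roughly like this in Rust. The function names and the `u32` index type are assumptions. `quad_of_vertex` mirrors the `gl_VertexID / 4` lookup from the GLSL line, leaving out the `gQuadOffset` subtraction.

```rust
/// Build the index buffer for `quad_count` quads with 4 vertices each.
/// Every quad is split into the two triangles (0, 1, 3) and (0, 2, 3).
fn quad_indices(quad_count: u32) -> Vec<u32> {
    let mut indices = Vec::with_capacity(quad_count as usize * 6);
    for quad in 0..quad_count {
        // First vertex of this quad in the vertex buffer.
        let offset = quad * 4;
        indices.extend([0, 1, 3, 0, 2, 3].map(|i| i + offset));
    }
    indices
}

/// Recover the quad a vertex belongs to from its vertex index,
/// which is what the shader does instead of using an instance index.
fn quad_of_vertex(vertex_index: u32) -> u32 {
    vertex_index / 4
}
```

Because every quad owns 4 consecutive vertices, the shader can look up per-quad data by vertex index alone, so a whole quad layer fits into one non-instanced indexed draw call.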
Avatar
but now i also remember it was the last thing i did for the vk backend, bcs it was the part where i actually thought about using streamed vertices instead. with many quads its still faster to use the GPU for rotation and stuff, but for few or single quads its not
11:56
i think i ended up splitting it like that
11:57
i ended up with some trade off: single quads have a faster code path (e.g. using push constants (which are faster than updating a buffer)) multiple quads use a buffer
Avatar
wait you rotate quads on the cpu? interesting but different rendering methods dependent on the content sounds exhausting
Avatar
nope
Avatar
currently in twgpu, you only have to update the camera buffer and the envelope buffer for the next frame
Avatar
971e1b0 server/pickup: Use enum values and struct size instead of magic numbers - Kaffeine f5d7174 GameContext: Use SERVER_DEMO_CLIENT instead of a magic number - Kaffeine cc43d40 Extract SnapLaserObject() from entities to CGameContext helpers - Kaffeine 2ad5c02 Add a simpler IServer::SnapNewItem() API based on some more generated data - Kaffeine 3a9e4ee Extract SnapPickup() to CGameContext helpers - Kaffeine cb68791 Merge #6330 - bors[bot]
Avatar
Avatar
Patiga
wait you rotate quads on the cpu? interesting but different rendering methods dependent on the content sounds exhausting
i thought about it bcs i disliked the fact that i update a buffer for one quad but instead i am now using push constants
12:00
so the calculation is on the GPU, and i dont update any buffer, which is faster than simply uploading the finished vertices
12:00
that was ok for me, but it's quite possible the CPU would still be faster for a single quad
12:00
but requires a few more bytes to upload per frame 😄
Avatar
hm I don't use push constants currently, but they are actually quite portable in wgpu, they emulate them with uniform buffers on webgl for example
Avatar
sounds cool
12:03
ah fuck, opengl is supported, but not on the web, misremembered that detail
Avatar
mh thats weird
Avatar
probably simply not implemented yet
Avatar
i'd have thought webgl2 is gles3
12:05
but their GLES3 support is also only "Ok"
12:05
if thats up to date xD
Avatar
they don't seem to be different backends
12:08
I think it would be interesting if wgpu had a compile-feature you could use to disable validation, to remove the overhead once you built a project and are finished with debugging
Avatar
Validation= their own. Or validation layers?
Avatar
#mapping
Avatar
lalala soytari 卍 2023-02-12 12:48:01Z
Sorry
12:48
İ don't see this channel
Avatar
Avatar
Jupstar ✪
Validation= their own. Or validation layers?
their own
13:06
my last callgrind informed me that my cpu-bottlenecked program spends most of its time in wgpu functions
Avatar
Avatar
Patiga
my last callgrind informed me that my cpu-bottlenecked program spends most of its time in wgpu functions
and u tested that in release mode? bcs last time i tested twgpu it was like 100% gpu for me
Avatar
afaik, yes
13:18
I think that was before I introduced those bounding boxes for the tilemaps
13:19
before I did those, I was able to cleanly create one RenderBundle, but set_scissor_rect isn't possible with render bundles, so while that took work off the gpu, the validation checks now needed to be done every frame instead of once at the creation of the render bundle
13:21
I should also test again if reverting that and doing early returns in the shader would simply be much better. last time we thought about early returns we thought that wgsl doesn't support it, but its just that you need to have a specific control flow in wgsl
13:26
this might also be interesting https://github.com/gpuweb/gpuweb/issues/3479
13:26
and TIL there is a discard statement https://gpuweb.github.io/gpuweb/wgsl/#discard-statement
Avatar
There is cl_dummy_resetonswitch for making sure inputs never get stuck on dummy switch. But I would make the bold statement that jump is one of the least needed inputs to ever be stuck. So it would be nice to be able to opt out of it in favor of avoiding annoying fails due to accidental jump stucks while keeping the hook/dir/fire stuck on switch active.
Avatar
Avatar
Patiga
and TIL there is a discard statement https://gpuweb.github.io/gpuweb/wgsl/#discard-statement
discard might not improve the performance tho, it can result in the gpu core using predication for all opcodes instead
Avatar
wouldn't it miss its purpose somewhat if it wouldn't improve performance?
13:58
it says that it turns that invocation into a "helper invocation" https://gpuweb.github.io/gpuweb/wgsl/#helper-invocation
Avatar
yep
14:01
its not comparable to early discards
Avatar
I guess I'll just have to test how well it works
Avatar
where the fragment shader will not even be called
Avatar
are early discards simply when the fragment would be out of the viewport?
Avatar
if for whatever reason it doesnt produce a fragment yeah
Avatar
if the fragment shader isn't even called it doesn't sound like it could be done in the fragment shader ^^
Avatar
in fact u can purposely discard all fragments
Avatar
I need some help to add this bot to my discord server:
18:22
im at the step 4
18:23
and im lost
Avatar
Avatar
Jupstar ✪
in fact u can purposely discard all fragments
Do u know how to make this? ^^
Avatar
Avatar
Raks
Do u know how to make this? ^^
depends what graphics api you use
18:50
just google it
Avatar
Avatar
Jupstar ✪
depends what graphics api you use
What kind of graphics card do i have?
Avatar
you have a rtx 2080
Avatar
then i shouldnt get a fortune teller
Avatar
But seriously, i need some help
Avatar
then google, u can e.g. use stencil testing to discard all fragments early
Avatar
I did but i just cant find the right one
19:10
the right explanation
Avatar
man takes 10 seconds to google
19:11
glEnable(GL_STENCIL_TEST); glStencilFunc(GL_NEVER, 1, 0xFF);
Avatar
all i got its just this
19:13
But i think ill try my best and if i cant make it, i will just give up
Avatar
????
19:13
just stop troll dude
Avatar
Im not trolling!
19:13
Avatar
u ask me about graphics api (edited)
Avatar
It says i have to download it (edited)
Avatar
and show pictures of php
Avatar
Nevermind. Im sorry for disturbing
Avatar
Avatar
Raks
Nevermind. Im sorry for disturbing
._.
Avatar
hey
Avatar
A what?
Avatar
???????
19:16
whats going on in this channel rn
Avatar
if u have a question please formulate proper english
19:17
use chatgpt if you arent a native english speaker
Avatar
Avatar
Jupstar ✪
whats going on in this channel rn
It's a channel on ddracenetwork and everything seems to be fine.
Avatar
Avatar
Jupstar ✪
if u have a question please formulate proper english
Excuse me please but I'm not from England and didn't study it in school because I didn't think it was important, and now because of this game I think.
Avatar
Avatar
J0X04D
Excuse me please but I'm not from England and didn't study it in school because I didn't think it was important, and now because of this game I think.
oh then your english is quite good
Avatar
Thank you for the praise
Avatar
ban faker
Exported 156 message(s)