MadSharp: Unsafe
Started a stream: we're tweaking and rebuilding dotnet-runtime, I'll be using it for a custom engine with C# support https://www.youtube.com/live/U57DD1g6_nA?feature=shared
The test stream went pretty well, IMHO. I didn't manage to solve the task though, so I'll keep thinking about it))
Since it's Friday, I'll host a Q&A stream tonight at 19:00 MSK. Happy to answer everyone. You can ask questions in the comments under this post or right on the stream.
I'm sure you've accumulated plenty of questions.
So here's a stream where you can ask anything. Questions go in the YouTube chat or in the comments under this post. We start in 5 minutes.
https://www.youtube.com/live/Oy2LpjdmjFg?feature=shared
What will we get when Unity finally implements .NET Core?
Short comparison of the new .NET 8.0 with Burst in terms of runtime performance.
Spoiler: .NET IS AWESOME!
https://meetemq.com/2023/09/18/is-net-8-performant-enough/
Forwarded from viruseg
I wrote an article on how to work with the Gradient class from Burst. In the process I was a bit stunned that my implementation of the Evaluate method turned out to be several times faster than the C++ one.
https://habr.com/ru/articles/761572/
Finally!
There's a new article on .NET object layouts.
Part 1 covers the layouts of:
1. System.Object
2. T[] array
3. string
The article also shows how to access these types in an unsafe or unmanaged environment: how to change an object's type, or get a pointer to an array's or string's length in order to change it later.
https://meetemq.com/2023/09/27/managed-primitives-part-i/
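Below is a minimal sketch of the kind of access the article describes, assuming the usual 64-bit CoreCLR array layout (MethodTable pointer, then a 4-byte length field, then the elements at offset 16). The offsets are runtime implementation details and can change, so treat this as illustrative, not as the article's code.

```csharp
using System;
using System.Runtime.InteropServices;

// Illustrative sketch only (not from the article): peeking at a pinned array's header,
// assuming the typical 64-bit CoreCLR layout [MethodTable*][int length][pad][elements...].
internal static unsafe class ArrayLayoutPeek
{
    private static void Main()
    {
        int[] numbers = { 1, 2, 3, 4, 5 };

        // Pin the array so the GC cannot move it while we read raw memory around it.
        var handle = GCHandle.Alloc(numbers, GCHandleType.Pinned);
        try
        {
            byte* firstElement = (byte*)handle.AddrOfPinnedObject(); // points at numbers[0]
            byte* objectStart  = firstElement - 16;                  // back up to the MethodTable pointer
            IntPtr methodTable = *(IntPtr*)objectStart;              // the object's type identity
            int length         = *(int*)(objectStart + 8);           // the array length field

            Console.WriteLine($"MethodTable = 0x{(long)methodTable:X}, length field = {length}");
        }
        finally
        {
            handle.Free();
        }
    }
}
```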
Hi everyone, there hasn't been any news here for a while. That's because I've been working a lot and building some very interesting things.
I've more or less wrapped up that work, so I'm starting a Q&A stream about unsafe and all that; you can post questions in advance in the comments under this post.
When? 16.01.2024
18:00 CET (20:00 MSK, 19:00 Kyiv)
I'll post the link closer to the start.
We start in an hour and a half.
Post your questions in the comments under this or the previous post.
18:00 CET
20:00 Msk
19:00 Kyiv
https://www.youtube.com/watch?v=vTXDPntqs6Y
Everyone knows what happened.
To those who sympathize, my condolences. I'm only just coming around myself.
OK, so I've managed to get MDI (multi-draw indirect) rendering working on DX12 and Vulkan.
Previously I worked on MDI for DX11 via NvAPI and it worked, though NVIDIA doesn't expose an API for passing a GPU-side indirect count; at least a CPU-side count worked fine.
Unity mentioned that they may support MDI in BRG, but they actually don't, and they don't even have plans to support it in Unity 6, so I made my own little takeover.
There's a working proof of concept.
All materials, shaders, and render targets (basically the PSO) are set by Unity, so we don't need to go fully native.
Writing native rendering plugins for Unity is hard, mostly due to the lack of documentation, and even where documentation exists there's a good chance it won't work.
😁 You are welcome to ask any questions in the comments; I'll answer them in an upcoming stream.
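For context on what the argument data in an MDI call looks like: a minimal C# sketch of the per-draw command struct, with the field order shared by Vulkan's VkDrawIndexedIndirectCommand and D3D12's draw-indexed arguments. The struct name and sample values are mine, not the plugin's; only the layout is standard.

```csharp
using System.Runtime.InteropServices;

// Sketch of the per-draw argument layout an indexed multi-draw indirect call consumes.
// Field order follows VkDrawIndexedIndirectCommand / D3D12_DRAW_INDEXED_ARGUMENTS;
// the type name and the example values below are illustrative only.
[StructLayout(LayoutKind.Sequential)]
public struct DrawIndexedIndirectArgs
{
    public uint IndexCount;     // indices used by this draw
    public uint InstanceCount;  // instances for this draw
    public uint FirstIndex;     // offset into the index buffer
    public int  BaseVertex;     // added to every index before fetching the vertex
    public uint FirstInstance;  // offset into per-instance data
}

public static class MdiArgsExample
{
    // An MDI call reads N of these from a GPU buffer (optionally with a separate
    // GPU-side count buffer, which NvAPI on DX11 did not let us pass).
    public static DrawIndexedIndirectArgs[] BuildSampleArgs() => new[]
    {
        new DrawIndexedIndirectArgs { IndexCount = 36,  InstanceCount = 1, FirstIndex = 0,  BaseVertex = 0, FirstInstance = 0 },
        new DrawIndexedIndirectArgs { IndexCount = 720, InstanceCount = 8, FirstIndex = 36, BaseVertex = 0, FirstInstance = 1 },
    };
}
```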
Today I'll stream a walkthrough of the Unity Native Rendering Plugin
and answer questions.
Join in)
19:00 MSK
19:00 Kyiv
18:00 CEST
https://www.youtube.com/watch?v=XcjNVTHRxqI
How to build Unity Native Rendering Plugin for Windows, Linux, WebGL and Android.
https://gist.github.com/Meetem/b8538de3e7800de3242b6e4f830b87c6
Quick overview on using Vulkan-native features in Unity's HLSL:
(bonus: adding unsupported functionality to Unity's DXC)
https://meetemq.com/2024/06/14/using-vulkan-in-unity-shaders/
Interesting discovery about Unity when using Vulkan.
See, Vulkan has two entry points for resolving functions: vkGetInstanceProcAddr and vkGetDeviceProcAddr. vkGetInstanceProcAddr returns functions for a given VkInstance. vkGetDeviceProcAddr, on the other hand, returns functions for a given VkDevice OR a device child, e.g. VkQueue and VkCommandBuffer.
According to the docs, vkGetDeviceProcAddr is preferred when resolving device/device-child functions, because the returned address incurs less overhead (probably because it doesn't need to resolve the child from the VkInstance).
So, Unity requests vkGetDeviceProcAddr from Vulkan (or from a native plugin if hooked with InterceptVulkanInitialization), but never actually uses it.
That means all device/device-child functions carry an overhead: vkCmd* functions, vkQueue* functions, etc. Those functions are used with insane frequency, in draw calls, uploading constants, binding buffer ranges, etc.
The good thing is that we can reroute vkGetInstanceProcAddr to return the pointers vkGetDeviceProcAddr would give, via a native plugin, which can yield a potential performance increase when the call count is high.
How much performance? Well, you never know before you try. I think for 10K calls on, say, Android it could be measurable, maybe 0.5 ms or so, but that's just my speculation.
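The actual rerouting has to live in the native plugin that receives Unity's Vulkan interception callbacks; the C# sketch below only illustrates the logic with unmanaged function pointers. Everything except the two Vulkan entry-point names is hypothetical (the field and method names, the device-level string heuristic).

```csharp
using System;
using System.Runtime.InteropServices;

// Conceptual sketch only: when the hooked vkGetInstanceProcAddr is asked for a
// device-level function, hand back the vkGetDeviceProcAddr result instead.
// All names here (s_device, HookedGetInstanceProcAddr, IsDeviceLevel) are hypothetical.
internal static unsafe class VulkanProcAddrReroute
{
    // PFN_vkGetInstanceProcAddr / PFN_vkGetDeviceProcAddr share the same shape:
    // (handle, const char* name) -> function pointer.
    private static IntPtr s_device;                                            // VkDevice captured at init time
    private static delegate* unmanaged<IntPtr, byte*, IntPtr> s_getInstanceProcAddr;
    private static delegate* unmanaged<IntPtr, byte*, IntPtr> s_getDeviceProcAddr;

    // The function pointer Unity would receive instead of the real vkGetInstanceProcAddr.
    public static IntPtr HookedGetInstanceProcAddr(IntPtr instance, byte* name)
    {
        // Device-level (and device-child) functions: prefer the device-resolved pointer,
        // which skips the per-call dispatch through the instance.
        if (s_device != IntPtr.Zero && IsDeviceLevel(name))
        {
            IntPtr devicePtr = s_getDeviceProcAddr(s_device, name);
            if (devicePtr != IntPtr.Zero)
                return devicePtr;
        }

        // Everything else (instance-level functions) resolves as before.
        return s_getInstanceProcAddr(instance, name);
    }

    private static bool IsDeviceLevel(byte* name)
    {
        // Crude heuristic covering the hot paths: vkCmd*, vkQueue*.
        var n = MemoryMarshal.CreateReadOnlySpanFromNullTerminated(name);
        return n.StartsWith("vkCmd"u8) || n.StartsWith("vkQueue"u8);
    }
}
```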
VkSharp is out 🥳
Fully-featured Vulkan interop compatible with .NET and Unity Burst.
GC-free (apart from a couple of allocations for the initial setup)
https://github.com/Meetem/VkSharp
Understanding GPU Virtual Addressing and Sparse Images/Buffers
Since the days of DirectX 11, and possibly even earlier, it's been possible to allocate memory on a GPU without actually using physical memory right away. But what does this mean?
Imagine you can create a buffer with a size of 64GB, even if your GPU only has 4GB of actual VRAM. How is this possible?
This works similarly to how virtual addresses work on a CPU. When you ask the operating system for memory, it doesn't immediately use real physical memory (RAM). Instead, it gives you a virtual address. The actual physical memory is only used when you start using that memory.
When you create a sparse buffer on a GPU, it only allocates a mapping table that looks something like this:
Page 0 = Address0
Page 1 = Address1
...
If you try to read this memory before it is backed by real memory, it will return zero because the memory doesn't actually exist yet.
Next, you allocate real, physical memory. This memory is usually aligned in pages (typically 64KB on modern GPUs). For example, let's say we allocate 2 pages, which equals 128KB. Then, we can bind these pages to the virtual address.
You can tell the GPU: "Bind my BufferAddress + 1GB (16384 pages) to the start of my allocated data." The mapping table then updates like this:
Page 16383 = NULL [previous value]
Page 16384 = AllocatedData + 0
Page 16385 = AllocatedData + 65536 Bytes
Page 16386 = NULL [previous value]
After binding the real memory to the virtual address, you can read or write to it in your shaders, compute passes, etc. Essentially, your 64GB buffer only takes up the size of the mapping table plus the 128KB of allocated real memory.
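Purely illustrative: a tiny CPU-side model of the mapping table described above (64KB pages, two physical pages bound at the 1GB offset). This only demonstrates the bookkeeping, not the actual Vulkan/DX12 sparse-binding API; all names and addresses are made up.

```csharp
using System;
using System.Collections.Generic;

// CPU-side model of the sparse mapping table (illustrative only, no real GPU API).
// Pages are 64KB; unbound pages behave as if they read zero.
internal static class SparsePageTableDemo
{
    private const long PageSize = 64 * 1024;

    private static void Main()
    {
        // A 64GB virtual buffer has 1,048,576 page slots, but we only store bound entries.
        var pageTable = new Dictionary<long, long>(); // virtual page index -> physical address

        // "Allocate" 2 physical pages (128KB) somewhere in VRAM.
        long allocatedData = 0x2000_0000;

        // Bind BufferAddress + 1GB to the start of the allocation.
        long firstPage = (1L << 30) / PageSize;              // 1GB / 64KB = 16384
        pageTable[firstPage]     = allocatedData;            // page 16384 -> AllocatedData + 0
        pageTable[firstPage + 1] = allocatedData + PageSize; // page 16385 -> AllocatedData + 65536

        // Resolving a virtual offset: unbound pages have a NULL mapping and read as zero.
        long Resolve(long offset)
            => pageTable.TryGetValue(offset / PageSize, out var phys)
                ? phys + offset % PageSize
                : 0;

        Console.WriteLine($"offset 1GB        -> physical 0x{Resolve(1L << 30):X}");
        Console.WriteLine($"offset 1GB + 70KB -> physical 0x{Resolve((1L << 30) + 70 * 1024):X}");
        Console.WriteLine($"offset 0          -> physical 0x{Resolve(0):X} (unbound)");
    }
}
```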
Implemented bindless for Unity. Compute and fragment shader support for now. No weird trickery with compiling shaders manually or anything else, just normal Unity shaders and a native plugin.
Unity LockBufferForWrite: When You Should Prefer Them and Why Your Choice Matters
I wrote a short overview post on when to use LockBufferForWrite and when not to, and how these buffers differ from regular GraphicsBuffer.
https://meetemq.com/2025/01/26/lockbufferforwrite-vs-other-buffer-types/
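Not from the article, just a quick sketch of the API shape: create the buffer with the LockBufferForWrite usage flag, write through the returned NativeArray, and unlock with the number of elements actually written. Buffer size, element type, and names are arbitrary.

```csharp
using Unity.Collections;
using UnityEngine;

// Minimal sketch of the LockBufferForWrite flow (names and sizes are arbitrary,
// not taken from the article). Every Lock must be paired with an Unlock.
public class LockBufferExample : MonoBehaviour
{
    private const int Count = 1024;
    private GraphicsBuffer _buffer;

    private void OnEnable()
    {
        _buffer = new GraphicsBuffer(
            GraphicsBuffer.Target.Structured,
            GraphicsBuffer.UsageFlags.LockBufferForWrite,
            Count,
            sizeof(float) * 4);
    }

    private void Update()
    {
        // Lock returns a NativeArray that, if possible, points straight at GPU-visible memory.
        NativeArray<Vector4> mapped = _buffer.LockBufferForWrite<Vector4>(0, Count);
        for (int i = 0; i < Count; i++)
            mapped[i] = new Vector4(i, Time.time, 0f, 1f);

        // Tell Unity how many elements were actually written.
        _buffer.UnlockBufferAfterWrite<Vector4>(Count);
    }

    private void OnDisable() => _buffer?.Dispose();
}
```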
Compiling Shaders on Device with Fully Dynamic Shader
The idea is that in Unity you can load a shader asset via an asset bundle. In this asset, there is either compiled bytecode (for Metal or Vulkan, as well as DX11 DXBC / DX12 DXIL) or text.
The binary format of the asset bundle is known — there are plenty of open-source rippers on GitHub.
The binary format of the shader is also known.
This leaves only compiling the shader. The simplest case is when you have an Android device, your code is in GLSL, and you only need OpenGL ES. In that case, simply write the text into the shader asset.
[First you'll need to add the available shader variants to the Shader asset, since they are always stored there; this has to be done in any case.]
It's more complicated when you have GLSL and Vulkan:
You then need to compile the SPIR-V Cross compiler for Android. It's written in C++, so there shouldn't be any issues.
If you prefer HLSL — feel free to port DXC to Android. That shouldn't be too hard either.
The resulting output is also written into the shader inside the asset bundle.
Then load the asset bundle into Unity.
??????
PROFIT!
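For the "then load the asset bundle into Unity" step, here's a minimal Unity-side sketch. The bundle path and asset name are placeholders; the on-device patching and compiling happens before this point, as described above.

```csharp
using System.IO;
using UnityEngine;

// Minimal sketch of loading a patched bundle and using the shader asset from it.
// The bundle already contains the shader text/bytecode we wrote into it on-device;
// the path and asset name below are placeholders.
public class DynamicShaderLoader : MonoBehaviour
{
    public Renderer target;

    private void Start()
    {
        string bundlePath = Path.Combine(Application.persistentDataPath, "patched_shaders.bundle");

        // Load the (already patched) bundle and pull the shader asset out of it.
        AssetBundle bundle = AssetBundle.LoadFromFile(bundlePath);
        Shader shader = bundle.LoadAsset<Shader>("MyDynamicShader");

        if (shader != null && shader.isSupported)
            target.material = new Material(shader); // PROFIT!
        else
            Debug.LogError("Dynamic shader failed to load or compile for this device.");
    }
}
```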