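Background
While investigating a performance problem on Android recently, I found that FrameTime was much larger than the Game Thread, Draw Thread, and RHI Thread times. Stat Game and Stat SceneRendering showed that the bottleneck was in neither the Game Thread nor the Draw Thread, so I suspected the GPU. However, Stat GPU showed no data, changing r.ScreenPercentage made little difference, and toggling various ShowFlags had no significant effect on the RHI Thread cost either. Along the way I became both suspicious of and curious about ShowFlag, so I investigated to confirm how it actually behaves.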
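Notes
- The goal of this article is to analyze the principle and effect of ShowFlag through both source code and experiments, so that the real impact of these settings is clear and we have more reliable context and background when doing performance analysis and optimization.
- Although this is billed as a source-based dissection of how Unreal's ShowFlag works, there is honestly not much to say. ShowFlag touches on the rendering pipeline, the Console, macros, and other mechanisms, but this article does not explain them in detail (I have written about them before; refer to those articles if needed). It focuses on analyzing, confirming, and explaining ShowFlag's principle and impact. The source code involved is part of that principle, but none of it is particularly deep, and there is no need to make simple things complicated.
- This article is based on UE5.2 and UE4.27.2. The flow, mechanisms, and even the source code involved are nearly the same in both versions (a few details differ, but not in ways that matter here).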
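Test
Test case
- Create a new Level from the Basic or Empty template and place a single Cube in it (in practice a few built-in actors, such as a Light, are also present). In World Settings, set GameMode to GameModeBase so that no character Mesh is spawned, which keeps unrelated rendering out of the measurements. Point the camera at the Cube.
- Enable Stat UnitMax and Stat GPU to display the basic per-thread costs (RHI, Draw, and so on) and the detailed GPU cost data.
- With the default setting (ShowFlag.StaticMeshes 1), capture a frame with RenderDoc (RenderDoc.CaptureFrame) and analyze the capture. Repeat with ShowFlag.StaticMeshes 0, then compare the two.
- Note that this test is limited: it does not cover every resource type or every ShowFlag, so the results may be limited in the same way.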
Test results
Stat SceneRendering
- ShowFlag.StaticMeshes 1
- ShowFlag.StaticMeshes 0
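- Observations
- RenderViewFamily dropped from 2.27ms to 2.20ms.
- InitViews dropped from 0.18ms to 0.12ms.
- Mesh draw calls dropped from an average of 11.5 to 6.
- Analysis
- Even with a very simple scene on a fairly powerful machine, ShowFlag.StaticMeshes visibly affects SceneRendering (that is, the Draw Thread), while it has essentially no effect on the Game Thread.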
Stat GPU
- ShowFlag.StaticMeshes 1
- ShowFlag.StaticMeshes 0
- Observations
- Average GPU cost dropped from 5.61ms to 5.39ms.
- The basepass dropped from 0.25ms to 0.02ms.
- Analysis
- ShowFlag.StaticMeshes mainly affects GPU cost, especially the cost of the basepass.
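As a quick sanity check on the Stat GPU figures above, the drop in the basepass accounts for essentially all of the drop in the GPU total. This is a worked calculation from the reported numbers only; treating the remaining 0.01 ms gap as display rounding or frame-to-frame noise is an assumption.

$$
\Delta_{\text{basepass}} = 0.25 - 0.02 = 0.23\ \text{ms}, \qquad \Delta_{\text{GPU}} = 5.61 - 5.39 = 0.22\ \text{ms}
$$

$$
\frac{\Delta_{\text{basepass}}}{0.25} = 92\%, \qquad \frac{\Delta_{\text{GPU}}}{5.61} \approx 3.9\%
$$

In this scene the flag removes about 92% of the basepass, but because the basepass is a small slice of the frame, the total GPU time moves by only about 3.9%.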
RenderDoc
- ShowFlag.StaticMeshes 1
- ShowFlag.StaticMeshes 0
- Observations
- With ShowFlag.StaticMeshes 0, the basepass no longer contains any StaticMeshes draws.
- Analysis
- ShowFlag.StaticMeshes 0 skips the drawing of StaticMeshes.
Source code analysis
Command registration
- Screenshot
- Call stack
[UnrealEditor-Engine.dll] `FSystemSettings::RegisterShowFlagConsoleVariables'::`2'::FIterSink::OnEngineShowFlag(unsigned int,const FString &) SystemSettings.cpp:148
[UnrealEditor-Engine.dll] FEngineShowFlags::IterateAllFlags<`FSystemSettings::RegisterShowFlagConsoleVariables'::`2'::FIterSink>(FIterSink &) ShowFlagsValues.inl:265
[Inlined] [UnrealEditor-Engine.dll] FSystemSettings::RegisterShowFlagConsoleVariables() SystemSettings.cpp:162
[UnrealEditor-Engine.dll] FSystemSettings::Initialize(bool) SystemSettings.cpp:217
[UnrealEditor.exe] FEngineLoop::PreInitPreStartupScreen(const wchar_t *) LaunchEngineLoop.cpp:2894
[Inlined] [UnrealEditor.exe] FEngineLoop::PreInit(const wchar_t *) LaunchEngineLoop.cpp:4380
[Inlined] [UnrealEditor.exe] EnginePreInit(const wchar_t *) Launch.cpp:41
[UnrealEditor.exe] GuardedMain(const wchar_t *) Launch.cpp:140
[UnrealEditor.exe] LaunchWindowsStartup(HINSTANCE__ *,HINSTANCE__ *,char *,int,const wchar_t *) LaunchWindows.cpp:233
[UnrealEditor.exe] WinMain(HINSTANCE__ *,HINSTANCE__ *,char *,int) LaunchWindows.cpp:284
[Inlined] [UnrealEditor.exe] invoke_main() 0x00007ff6e2a5d996
[UnrealEditor.exe] __scrt_common_main_seh() 0x00007ff6e2a5d975
[kernel32.dll] <unknown> 0x00007ffcc2e57604
[ntdll.dll] <unknown> 0x00007ffcc38226a1
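- Explanation
- During engine initialization (EnginePreInit), each ShowFlag is registered as a console command through IConsoleManager's RegisterConsoleVariableBitRef, which wraps it as an IConsoleObject and stores it in ConsoleObjects.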
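The registration step can be pictured with a small standalone model. This is a sketch under assumptions, not engine code: `ConsoleRegistry`, `BitRefVariable`, `RegisterSink`, and the three-flag list are hypothetical stand-ins. Only the overall shape follows the call stack above: iterate every flag, hand each index and name to a sink, and register a `ShowFlag.<Name>` entry that points at shared mask bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for a bit-ref console variable: it owns no value,
// it only remembers which bit of two shared masks it controls.
struct BitRefVariable {
    std::uint64_t* Force0Mask = nullptr;
    std::uint64_t* Force1Mask = nullptr;
    std::uint32_t  BitNumber  = 0;
};

// Hypothetical stand-in for the console manager's name -> object table.
class ConsoleRegistry {
public:
    void RegisterBitRef(const std::string& Name, std::uint32_t Bit,
                        std::uint64_t* Force0, std::uint64_t* Force1) {
        assert(Bit < 64);  // the masks in this sketch are 64 bits wide
        Objects[Name] = BitRefVariable{Force0, Force1, Bit};
    }

    const BitRefVariable* Find(const std::string& Name) const {
        const auto It = Objects.find(Name);
        return It == Objects.end() ? nullptr : &It->second;
    }

    std::size_t Num() const { return Objects.size(); }

private:
    std::map<std::string, BitRefVariable> Objects;
};

// Mirrors the "iterate all flags with a sink" shape seen in the call stack.
// The flag list here is an illustrative subset, not the engine's real list.
template <typename Sink>
void IterateAllFlags(Sink& InSink) {
    static const char* const Names[] = {"StaticMeshes", "SkeletalMeshes", "Particles"};
    for (std::uint32_t Index = 0; Index < 3; ++Index) {
        InSink.OnEngineShowFlag(Index, Names[Index]);
    }
}

// The sink turns each (index, name) pair into a "ShowFlag.<Name>" registration.
struct RegisterSink {
    ConsoleRegistry& Registry;
    std::uint64_t&   Force0;
    std::uint64_t&   Force1;

    void OnEngineShowFlag(std::uint32_t Index, const std::string& Name) {
        Registry.RegisterBitRef("ShowFlag." + Name, Index, &Force0, &Force1);
    }
};
```

After `IterateAllFlags` runs with a `RegisterSink`, looking up `"ShowFlag.StaticMeshes"` in the registry returns the entry that controls bit 0 of the shared masks.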
Invoking a command and setting its value
- Screenshot
- Call stack
[UnrealEditor-Core.dll] FConsoleVariableBitRef::Set(const wchar_t *,EConsoleVariableFlags) ConsoleManager.cpp:1121
[UnrealEditor-Core.dll] FConsoleManager::ProcessUserConsoleInput(const wchar_t *,FOutputDevice &,UWorld *) ConsoleManager.cpp:1979
[UnrealEditor-Engine.dll] UEngine::Exec(UWorld *,const wchar_t *,FOutputDevice &) UnrealEngine.cpp:4815
[UnrealEditor-UnrealEd.dll] UUnrealEdEngine::Exec(UWorld *,const wchar_t *,FOutputDevice &) UnrealEdSrv.cpp:661
[UnrealEditor-Engine.dll] UGameViewportClient::Exec(UWorld *,const wchar_t *,FOutputDevice &) GameViewportClient.cpp:3072
[UnrealEditor-Engine.dll] ULocalPlayer::Exec(UWorld *,const wchar_t *,FOutputDevice &) LocalPlayer.cpp:1538
[UnrealEditor-Engine.dll] UPlayer::ConsoleCommand(const FString &,bool) Player.cpp:53
[UnrealEditor-Engine.dll] APlayerController::ConsoleCommand(const FString &,bool) PlayerController.cpp:479
[UnrealEditor-Engine.dll] UConsole::ConsoleCommand(const FString &) Console.cpp:619
[UnrealEditor-Engine.dll] UConsole::InputKey_InputLine(FInputDeviceId,FKey,EInputEvent,float,bool) Console.cpp:975
[UnrealEditor-Engine.dll] UConsole::InputKey(FInputDeviceId,FKey,EInputEvent,float,bool) Console.cpp:1432
[UnrealEditor-Engine.dll] UGameViewportClient::InputKey(const FInputKeyEventArgs &) GameViewportClient.cpp:651
[UnrealEditor-Engine.dll] FSceneViewport::OnKeyUp(const FGeometry &,const FKeyEvent &) SceneViewport.cpp:1094
[UnrealEditor-Slate.dll] SViewport::OnKeyUp(const FGeometry &,const FKeyEvent &) SViewport.cpp:292
// ... (remaining frames omitted)
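- Explanation
- Each command is an FConsoleVariableBitRef, which derives from IConsoleObject and has two members, Force1MaskPtr and Force0MaskPtr, that record its current setting.
- When the user enters a command, FConsoleManager::ProcessUserConsoleInput looks up the matching FConsoleVariableBitRef and calls FConsoleVariableBitRef::Set on it to change "its value".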
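Why "its value" is in quotes becomes clearer with a model of two override masks. The following is a minimal sketch, assuming that 0 forces a flag off, 1 forces it on, and any other value clears the override. The source above only states that the two masks record the setting, so these exact semantics, along with the names `ShowFlagOverrides`, `SetOverride`, and `Resolve`, are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Two shared masks: one bit per flag in each.
struct ShowFlagOverrides {
    std::uint64_t Force0Mask = 0;  // bit set => flag is forced off
    std::uint64_t Force1Mask = 0;  // bit set => flag is forced on
};

// Assumed semantics: 0 forces off, 1 forces on, anything else clears the override.
inline void SetOverride(ShowFlagOverrides& Overrides, std::uint32_t Bit, int Value) {
    assert(Bit < 64);
    const std::uint64_t Mask = std::uint64_t{1} << Bit;
    Overrides.Force0Mask = (Value == 0) ? (Overrides.Force0Mask | Mask)
                                        : (Overrides.Force0Mask & ~Mask);
    Overrides.Force1Mask = (Value == 1) ? (Overrides.Force1Mask | Mask)
                                        : (Overrides.Force1Mask & ~Mask);
}

// Resolves the effective flag value for a view, given the view's own default.
inline bool Resolve(const ShowFlagOverrides& Overrides, std::uint32_t Bit, bool ViewDefault) {
    assert(Bit < 64);
    const std::uint64_t Mask = std::uint64_t{1} << Bit;
    if (Overrides.Force0Mask & Mask) return false;
    if (Overrides.Force1Mask & Mask) return true;
    return ViewDefault;
}
```

In this model the console variable never stores a boolean of its own. It only flips bits in the masks, and the masks are applied when a view's flags are resolved.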
Reading the command value
- Screenshot
- Call stack
[UnrealEditor-Engine.dll] FStaticMeshSceneProxy::GetViewRelevance(const FSceneView *) StaticMeshRender.cpp:1984
[UnrealEditor-Renderer.dll] FRelevancePacket::ComputeRelevance() SceneVisibility.cpp:2412
[Inlined] [UnrealEditor-Renderer.dll] FRelevancePacket::AnyThreadTask() SceneVisibility.cpp:2387
[UnrealEditor-Renderer.dll] <lambda_a90e2bcd6b6574494bdf974833b26e7a>::operator()(int) SceneVisibility.cpp:3172
[Inlined] [UnrealEditor-Renderer.dll] UE::Core::Private::Function::TFunctionRefBase<UE::Core::Private::Function::FFunctionRefStoragePolicy,void __cdecl(int)>::operator()(int) Function.h:600
[Inlined] [UnrealEditor-Renderer.dll] ParallelForImpl::CallBody(const TFunctionRef<void __cdecl(int)> &,const TArrayView<std::nullptr_t,int> &,int,int) ParallelFor.h:80
[UnrealEditor-Renderer.dll] ParallelForImpl::ParallelForInternal<TFunctionRef<void __cdecl(int)>,<lambda_06f0d5d1c5fc9bccb0e7d1fd1e70ca42>,std::nullptr_t>(const wchar_t *,int,int,TFunctionRef<void __cdecl(int)>,<lambda_06f0d5d1c5fc9bccb0e7d1fd1e70ca42>,EParallelForFlags,const TArrayView<std::nullptr_t,int> &) ParallelFor.h:137
[UnrealEditor-Renderer.dll] ParallelFor(int,TFunctionRef<void __cdecl(int)>,bool,bool) ParallelFor.h:447
[UnrealEditor-Renderer.dll] ComputeAndMarkRelevanceForViewParallel(FRHICommandListImmediate &,const FScene *,FViewInfo &,FViewCommands &,unsigned char,TArray<unsigned char,TConcurrentLinearArrayAllocator<FSceneRenderingBlockAllocationTag> > &,TArray<unsigned char,TConcurrentLinearArrayAllocator<FSceneRenderingBlockAllocationTag> > &) SceneVisibility.cpp:3169
[UnrealEditor-Renderer.dll] FSceneRenderer::ComputeViewVisibility(FRHICommandListImmediate &,Type,TArray<FViewCommands,TSizedInlineAllocator<4,32,TSizedDefaultAllocator<32> > > &,FGlobalDynamicIndexBuffer &,FGlobalDynamicVertexBuffer &,FGlobalDynamicReadBuffer &,FInstanceCullingManager &) SceneVisibility.cpp:4893
[UnrealEditor-Renderer.dll] FDeferredShadingSceneRenderer::BeginInitViews(FRDGBuilder &,const FSceneTexturesConfig &,Type,FILCUpdatePrimTaskData &,FInstanceCullingManager &) SceneVisibility.cpp:5293
[UnrealEditor-Renderer.dll] FDeferredShadingSceneRenderer::Render(FRDGBuilder &) DeferredShadingRenderer.cpp:2638
[UnrealEditor-Renderer.dll] RenderViewFamilies_RenderThread(FRHICommandListImmediate &,const TArray<FSceneRenderer *,TSizedDefaultAllocator<32> > &) SceneRendering.cpp:4410
// ... (remaining frames omitted)
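- Explanation
- In the rendering pipeline, when FStaticMeshSceneProxy::GetViewRelevance computes the view relevance, it uses (FSceneView)View->Family->EngineShowFlags.StaticMeshes to set (FPrimitiveViewRelevance)Result.bDrawRelevance.
- Note that the FEngineShowFlags source does not spell out the command names directly; they are generated at compile time by a macro (SHOWFLAG_ALWAYS_ACCESSIBLE). For how each name is bound to its FConsoleVariableBitRef, see the ShowFlagsValues.inl frame in the call stack of the command-registration section above.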
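The macro-generated members and the relevance check can be illustrated together. This is a sketch of the general X-macro technique, not the engine's actual macros: `DEMO_SHOWFLAGS`, `DemoShowFlags`, `DemoViewRelevance`, and `GetStaticMeshViewRelevance` are hypothetical names, and the flag list is an illustrative subset.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical flag list. A single list drives every expansion below, which is
// the role an .inl file full of macro invocations can play.
#define DEMO_SHOWFLAGS(X) \
    X(StaticMeshes)       \
    X(SkeletalMeshes)     \
    X(Particles)

struct DemoShowFlags {
    // Expansion 1: one single-bit member per flag.
#define DECLARE_FLAG(Name) unsigned Name : 1;
    DEMO_SHOWFLAGS(DECLARE_FLAG)
#undef DECLARE_FLAG

    // Expansion 2: every flag defaults to "shown".
    DemoShowFlags() {
#define INIT_FLAG(Name) Name = 1;
        DEMO_SHOWFLAGS(INIT_FLAG)
#undef INIT_FLAG
    }

    // Expansion 3: the member name is stringified, so the textual name never
    // has to be written by hand. Returns -1 when the name is unknown.
    static int FindIndexByName(const char* Wanted) {
        int Index = 0;
#define MATCH_FLAG(Name) if (std::strcmp(Wanted, #Name) == 0) return Index; ++Index;
        DEMO_SHOWFLAGS(MATCH_FLAG)
#undef MATCH_FLAG
        return -1;
    }
};

struct DemoViewRelevance {
    bool bDrawRelevance = false;
};

// Simplified relevance check: the static mesh proxy is draw-relevant only
// while the StaticMeshes flag is on for the view.
inline DemoViewRelevance GetStaticMeshViewRelevance(const DemoShowFlags& Flags) {
    DemoViewRelevance Result;
    Result.bDrawRelevance = Flags.StaticMeshes != 0;
    return Result;
}
```

Adding a flag to the list in this sketch adds the member, its default, and its name lookup in one place, which is why a text search for a full command name can come up empty in macro-generated code.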
Effect on the rendering pipeline
- Screenshot
- Call stack
//ComputeDynamicMeshRelevance
[UnrealEditor-Renderer.dll] ComputeDynamicMeshRelevance(EShadingPath,bool,const FPrimitiveViewRelevance &,const FMeshBatchAndRelevance &,FViewInfo &,FMeshPassMask &,FPrimitiveSceneInfo *,const FPrimitiveBounds &) SceneVisibility.cpp:3398
[UnrealEditor-Renderer.dll] FSceneRenderer::GatherDynamicMeshElements(TArray<FViewInfo,TSizedDefaultAllocator<32> > &,const FScene *,const FSceneViewFamily &,FGlobalDynamicIndexBuffer &,FGlobalDynamicVertexBuffer &,FGlobalDynamicReadBuffer &,const TArray<unsigned char,TConcurrentLinearArrayAllocator<FSceneRenderingBlockAllocationTag> > &,const TArray<unsigned char,TConcurrentLinearArrayAllocator<FSceneRenderingBlockAllocationTag> > &,FMeshElementCollector &) SceneVisibility.cpp:3556
[UnrealEditor-Renderer.dll] FSceneRenderer::ComputeViewVisibility(FRHICommandListImmediate &,Type,TArray<FViewCommands,TSizedInlineAllocator<4,32,TSizedDefaultAllocator<32> > > &,FGlobalDynamicIndexBuffer &,FGlobalDynamicVertexBuffer &,FGlobalDynamicReadBuffer &,FInstanceCullingManager &) SceneVisibility.cpp:4963
[UnrealEditor-Renderer.dll] FDeferredShadingSceneRenderer::BeginInitViews(FRDGBuilder &,const FSceneTexturesConfig &,Type,FILCUpdatePrimTaskData &,FInstanceCullingManager &) SceneVisibility.cpp:5293
[UnrealEditor-Renderer.dll] FDeferredShadingSceneRenderer::Render(FRDGBuilder &) DeferredShadingRenderer.cpp:2638
[UnrealEditor-Renderer.dll] RenderViewFamilies_RenderThread(FRHICommandListImmediate &,const TArray<FSceneRenderer *,TSizedDefaultAllocator<32> > &) SceneRendering.cpp:4410
// ... (remaining frames omitted)
//FRelevancePacket::ComputeRelevance
[UnrealEditor-Renderer.dll] FRelevancePacket::ComputeRelevance() SceneVisibility.cpp:2434
[Inlined] [UnrealEditor-Renderer.dll] FRelevancePacket::AnyThreadTask() SceneVisibility.cpp:2387
[UnrealEditor-Renderer.dll] <lambda_a90e2bcd6b6574494bdf974833b26e7a>::operator()(int) SceneVisibility.cpp:3172
[Inlined] [UnrealEditor-Renderer.dll] UE::Core::Private::Function::TFunctionRefBase<UE::Core::Private::Function::FFunctionRefStoragePolicy,void __cdecl(int)>::operator()(int) Function.h:600
[Inlined] [UnrealEditor-Renderer.dll] ParallelForImpl::CallBody(const TFunctionRef<void __cdecl(int)> &,const TArrayView<std::nullptr_t,int> &,int,int) ParallelFor.h:80
[UnrealEditor-Renderer.dll] ParallelForImpl::ParallelForInternal<TFunctionRef<void __cdecl(int)>,<lambda_06f0d5d1c5fc9bccb0e7d1fd1e70ca42>,std::nullptr_t>(const wchar_t *,int,int,TFunctionRef<void __cdecl(int)>,<lambda_06f0d5d1c5fc9bccb0e7d1fd1e70ca42>,EParallelForFlags,const TArrayView<std::nullptr_t,int> &) ParallelFor.h:137
[UnrealEditor-Renderer.dll] ParallelFor(int,TFunctionRef<void __cdecl(int)>,bool,bool) ParallelFor.h:447
[UnrealEditor-Renderer.dll] ComputeAndMarkRelevanceForViewParallel(FRHICommandListImmediate &,const FScene *,FViewInfo &,FViewCommands &,unsigned char,TArray<unsigned char,TConcurrentLinearArrayAllocator<FSceneRenderingBlockAllocationTag> > &,TArray<unsigned char,TConcurrentLinearArrayAllocator<FSceneRenderingBlockAllocationTag> > &) SceneVisibility.cpp:3169
[UnrealEditor-Renderer.dll] FSceneRenderer::ComputeViewVisibility(FRHICommandListImmediate &,Type,TArray<FViewCommands,TSizedInlineAllocator<4,32,TSizedDefaultAllocator<32> > > &,FGlobalDynamicIndexBuffer &,FGlobalDynamicVertexBuffer &,FGlobalDynamicReadBuffer &,FInstanceCullingManager &) SceneVisibility.cpp:4893
[UnrealEditor-Renderer.dll] FDeferredShadingSceneRenderer::BeginInitViews(FRDGBuilder &,const FSceneTexturesConfig &,Type,FILCUpdatePrimTaskData &,FInstanceCullingManager &) SceneVisibility.cpp:5293
[UnrealEditor-Renderer.dll] FDeferredShadingSceneRenderer::Render(FRDGBuilder &) DeferredShadingRenderer.cpp:2638
[UnrealEditor-Renderer.dll] RenderViewFamilies_RenderThread(FRHICommandListImmediate &,const TArray<FSceneRenderer *,TSizedDefaultAllocator<32> > &) SceneRendering.cpp:4410
// ... (remaining frames omitted)
//FDrawCommandRelevancePacket::AddCommandsForMesh
[UnrealEditor-Renderer.dll] FDrawCommandRelevancePacket::AddCommandsForMesh(int,const FPrimitiveSceneInfo *,const FStaticMeshBatchRelevance &__restrict,const FStaticMeshBatch &__restrict,const FScene *__restrict,bool,Type) SceneVisibility.cpp:2236
[UnrealEditor-Renderer.dll] FRelevancePacket::MarkRelevant() SceneVisibility.cpp:2749
[Inlined] [UnrealEditor-Renderer.dll] FRelevancePacket::AnyThreadTask() SceneVisibility.cpp:2388
[UnrealEditor-Renderer.dll] <lambda_a90e2bcd6b6574494bdf974833b26e7a>::operator()(int) SceneVisibility.cpp:3172
[Inlined] [UnrealEditor-Renderer.dll] UE::Core::Private::Function::TFunctionRefBase<UE::Core::Private::Function::FFunctionRefStoragePolicy,void __cdecl(int)>::operator()(int) Function.h:600
[Inlined] [UnrealEditor-Renderer.dll] ParallelForImpl::CallBody(const TFunctionRef<void __cdecl(int)> &,const TArrayView<std::nullptr_t,int> &,int,int) ParallelFor.h:80
[UnrealEditor-Renderer.dll] ParallelForImpl::ParallelForInternal<TFunctionRef<void __cdecl(int)>,<lambda_06f0d5d1c5fc9bccb0e7d1fd1e70ca42>,std::nullptr_t>(const wchar_t *,int,int,TFunctionRef<void __cdecl(int)>,<lambda_06f0d5d1c5fc9bccb0e7d1fd1e70ca42>,EParallelForFlags,const TArrayView<std::nullptr_t,int> &) ParallelFor.h:137
[UnrealEditor-Renderer.dll] ParallelFor(int,TFunctionRef<void __cdecl(int)>,bool,bool) ParallelFor.h:447
[UnrealEditor-Renderer.dll] ComputeAndMarkRelevanceForViewParallel(FRHICommandListImmediate &,const FScene *,FViewInfo &,FViewCommands &,unsigned char,TArray<unsigned char,TConcurrentLinearArrayAllocator<FSceneRenderingBlockAllocationTag> > &,TArray<unsigned char,TConcurrentLinearArrayAllocator<FSceneRenderingBlockAllocationTag> > &) SceneVisibility.cpp:3169
[UnrealEditor-Renderer.dll] FSceneRenderer::ComputeViewVisibility(FRHICommandListImmediate &,Type,TArray<FViewCommands,TSizedInlineAllocator<4,32,TSizedDefaultAllocator<32> > > &,FGlobalDynamicIndexBuffer &,FGlobalDynamicVertexBuffer &,FGlobalDynamicReadBuffer &,FInstanceCullingManager &) SceneVisibility.cpp:4893
[UnrealEditor-Renderer.dll] FDeferredShadingSceneRenderer::BeginInitViews(FRDGBuilder &,const FSceneTexturesConfig &,Type,FILCUpdatePrimTaskData &,FInstanceCullingManager &) SceneVisibility.cpp:5293
[UnrealEditor-Renderer.dll] FDeferredShadingSceneRenderer::Render(FRDGBuilder &) DeferredShadingRenderer.cpp:2638
[UnrealEditor-Renderer.dll] RenderViewFamilies_RenderThread(FRHICommandListImmediate &,const TArray<FSceneRenderer *,TSizedDefaultAllocator<32> > &) SceneRendering.cpp:4410
// ... (remaining frames omitted)
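- Explanation
- In the rendering pipeline, before AddCommandsForMesh is called to render StaticMeshes, the bDrawRelevance of the primitive's (FPrimitiveViewRelevance)ViewRelevance is checked first, and that value is affected by ShowFlag. In other words, if ShowFlag sets StaticMeshes to 0, no draw calls are issued for StaticMeshes.
- bDrawRelevance also affects other parts of the StaticMeshes rendering flow, such as ComputeDynamicMeshRelevance, NumVisibleDynamicMeshElements, and RelevantStaticPrimitives.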
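The gating described above can be reduced to a few lines. The following is a minimal sketch under assumptions: `Primitive`, `ViewFlags`, `ComputeDrawRelevance`, and `BuildDrawCommands` are hypothetical names, and one mesh batch is counted as one draw command, which is a simplification of what the real pipeline does.

```cpp
#include <cassert>
#include <vector>

enum class PrimitiveType { StaticMesh, SkeletalMesh };

struct Primitive {
    PrimitiveType Type;
    int NumMeshBatches;  // simplification: one draw command per batch
};

struct ViewFlags {
    bool StaticMeshes   = true;
    bool SkeletalMeshes = true;
};

// Relevance is derived from the flag that matches the primitive's type.
inline bool ComputeDrawRelevance(const Primitive& InPrimitive, const ViewFlags& Flags) {
    switch (InPrimitive.Type) {
        case PrimitiveType::StaticMesh:   return Flags.StaticMeshes;
        case PrimitiveType::SkeletalMesh: return Flags.SkeletalMeshes;
    }
    return false;
}

// Returns how many draw commands would be queued for the view. Primitives
// without draw relevance are skipped before any command is built, so they
// cost nothing further down the pipeline.
inline int BuildDrawCommands(const std::vector<Primitive>& Scene, const ViewFlags& Flags) {
    int NumCommands = 0;
    for (const Primitive& Item : Scene) {
        if (!ComputeDrawRelevance(Item, Flags)) {
            continue;
        }
        NumCommands += Item.NumMeshBatches;
    }
    return NumCommands;
}
```

With a scene holding static meshes of 1 and 2 batches plus a skeletal mesh of 3 batches, the count is 6 with all flags on and 3 once `StaticMeshes` is turned off.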
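Summary
- As the name suggests, Unreal's ShowFlag commands are mainly used to show or hide specific types of elements in the scene.
- They work by giving each element type a "render switch" that decides whether the rendering pipeline issues draw calls for that type. If drawing is turned off for a type, the pipeline does not execute the corresponding draw calls.
- They mainly affect the RHI Thread, or in other words GPU performance. In some cases they also have some effect on the Draw Thread, and in general they have little effect on the Game Thread.
- To repeat: the test is limited, and the results and conclusions are for reference only.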
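Closing remarks
- Back to the motivation mentioned at the start: why Stat GPU did not work (no detailed data was shown) and why ShowFlag had little effect.
- On OpenGL, GPU timing cannot be retrieved reliably, so Stat GPU shows no data. FrameTime also does not necessarily equal any of the GameThread, RenderThread, or RHIThread times (the GPU cost can be confirmed with Snapdragon Profiler, RenderDoc, or Android GPU Inspector (AGI)).
- A possible reason ShowFlag had little effect: the cost of gfur dominated. Its vertex and triangle counts were so high that they swamped whatever ShowFlag changed on the GPU side. (This has no rigorous basis; it is a conclusion inferred from the results.)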
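**Statement:** This article comes from the WeChat official account GameDevLearning. When reposting, please include a link to the original article and this statement.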