Speaker(s):

Profiling Apache Beam Python pipelines


Often the first version of an Apache Beam pipeline does not perform as well as we would like, and it is not always obvious where to optimize. Sometimes the bottleneck is a function that parses JSON; other times it is an external source or sink, or a very hot key in a group-by-key step. In this talk, we will explore how to profile Apache Beam Python pipelines to identify potential bottlenecks in our code.
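As a small illustration of the JSON-parsing case, a per-element function can be profiled locally before it ever runs inside a pipeline. The sketch below uses Python's standard `cProfile` and `pstats` modules rather than Beam's own profiling support. `parse_record` and `profile_function` are hypothetical names, and the record layout (`user`, `amount`) is an assumption made for the example.

```python
import cProfile
import io
import json
import pstats


# Hypothetical per-element function, standing in for the body of a DoFn
# that parses JSON records (one of the hotspots mentioned above).
def parse_record(line):
    record = json.loads(line)
    return record["user"], record["amount"]


def profile_function(fn, inputs, top_n=5):
    """Apply fn to every input under cProfile.

    Returns the list of results and a text report of the top_n entries
    sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    results = [fn(x) for x in inputs]
    profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top_n)
    return results, buffer.getvalue()


if __name__ == "__main__":
    # Synthetic input: 10,000 small JSON records spread over 10 users.
    lines = [json.dumps({"user": f"u{i % 10}", "amount": i}) for i in range(10_000)]
    results, report = profile_function(parse_record, lines)
    print(report)
```

Profiling a single function in isolation like this only reveals CPU hotspots in that code. It cannot show the cost of an external source or sink, or the skew caused by a hot key, which only appear when the whole pipeline runs.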