You belong together - detecting linked accounts at Ricardo

Ricardo - the largest online marketplace in Switzerland, with over 4 million members - has long faced the challenge of detecting duplicate or linked accounts. With the release of Dataflow Runner v2 for Python and the cost savings offered by FlexRS, we felt confident enough to build our first larger-scale Python-based pipeline in Apache Beam to tackle this problem. This talk covers the lessons learned along the way, from someone who started out using Beam with the Java SDK and fell in love with the Python one.