Speaker(s):

How to handle duplicate data in streaming pipelines using Dataflow and Pub/Sub


This session gives a detailed overview of where duplicates originate in streaming data pipelines built with Pub/Sub and Dataflow. We’ll then go over the techniques the Apache Beam SDK provides for handling duplicate data, along with the technical trade-offs of each option. There will also be a Q&A and a discussion of common mistakes developers make.
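
To give a feel for the kind of trade-off involved, here is a minimal sketch in plain Python of deduplicating by message ID within a bounded time horizon. It is not the Apache Beam API or the session's own code; the class name `Deduplicator`, the `horizon_seconds` parameter, and the sample IDs are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass
class Deduplicator:
    """Hypothetical helper: remembers message IDs for a limited time."""

    horizon_seconds: float
    _seen: Dict[str, float] = field(default_factory=dict)

    def is_new(self, message_id: str, event_time: float) -> bool:
        """Return True the first time an ID is seen within the horizon."""
        # Forget IDs older than the horizon so state stays bounded.
        cutoff = event_time - self.horizon_seconds
        for key in [k for k, t in self._seen.items() if t < cutoff]:
            del self._seen[key]

        if message_id in self._seen:
            return False  # duplicate inside the horizon: drop it
        self._seen[message_id] = event_time
        return True


def deduplicate(
    messages: Iterable[Tuple[str, float]], horizon_seconds: float
) -> List[Tuple[str, float]]:
    """Keep only the messages that are not duplicates within the horizon."""
    dedup = Deduplicator(horizon_seconds)
    return [(mid, ts) for mid, ts in messages if dedup.is_new(mid, ts)]


if __name__ == "__main__":
    # (message_id, event_time_in_seconds); the IDs are made-up sample data.
    stream = [("a", 0.0), ("b", 1.0), ("a", 2.0), ("a", 20.0)]
    print(deduplicate(stream, horizon_seconds=10.0))
```

In this sketch the repeat of "a" at 2 seconds is dropped, while the repeat at 20 seconds gets through because its ID has already expired. A shorter horizon keeps less state but lets late duplicates slip past; a longer one catches more duplicates at the cost of more state.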