Name: Beam Summit 2021
Start: 2021-08-04T11:00-05:00
End: 2021-08-06T11:00-05:00

Speaker(s):

How to handle duplicate data in streaming pipelines using Dataflow and Pub/Sub

This session gives a detailed overview of where duplicates originate in streaming data pipelines built with Pub/Sub and Dataflow. We’ll then go over the techniques the Apache Beam SDK provides for handling such duplicate data, along with the technical trade-offs of each option. There will also be a Q&A and a discussion of common mistakes developers make.

Zeeshan
