  Samza / SAMZA-200

Explore using MySQL changelog as input stream


Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      Samza is designed with good support for database changelogs, but the current open source release is mostly centered around Kafka. It would be good to have out-of-the-box support for some common databases, such as MySQL, as well.

      Databus is LinkedIn's change data capture tool, but its current open source release focuses mainly on Oracle. There is also an open source release of Databus for MySQL, but it is a proof-of-concept implementation, not the one LinkedIn uses in production (LinkedIn's production version requires a patched version of MySQL). The open source Databus uses Open Replicator to connect to a MySQL server as a replication slave and parses the binlog to find inserts, updates and deletes.
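      The description only says that the binlog is parsed to find inserts, updates and deletes. As a rough sketch of how such row events might be turned into a keyed changelog that a Samza job could consume, the example below assumes each table has a single-column primary key and uses a null value to mark a deleted key (a convention borrowed from Kafka log compaction). All type and method names here (`RowChange`, `ChangelogMessage`, `toChangelog`) are hypothetical and are not Databus or Open Replicator APIs. It needs Java 16+ for records.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class BinlogToChangelog {

    // Kinds of row change a binlog parser reports (hypothetical model).
    public enum RowChangeType { INSERT, UPDATE, DELETE }

    // One changed row: before-image (null for INSERT), after-image (null for DELETE).
    public record RowChange(String table, RowChangeType type,
                            Map<String, Object> before, Map<String, Object> after) {}

    // A keyed changelog message; a null value marks the key as deleted.
    public record ChangelogMessage(String key, Map<String, Object> value) {}

    private static String keyOf(String table, Map<String, Object> row, String pkColumn) {
        return table + ":" + row.get(pkColumn);
    }

    // Maps one row change to the changelog messages a downstream consumer would see.
    public static List<ChangelogMessage> toChangelog(RowChange change, String pkColumn) {
        switch (change.type()) {
            case INSERT:
                return List.of(new ChangelogMessage(
                        keyOf(change.table(), change.after(), pkColumn), change.after()));
            case DELETE:
                return List.of(new ChangelogMessage(
                        keyOf(change.table(), change.before(), pkColumn), null));
            case UPDATE:
                String oldKey = keyOf(change.table(), change.before(), pkColumn);
                String newKey = keyOf(change.table(), change.after(), pkColumn);
                ChangelogMessage upsert = new ChangelogMessage(newKey, change.after());
                if (Objects.equals(oldKey, newKey)) {
                    return List.of(upsert);
                }
                // Primary key changed: delete the old key, then write the new one.
                return List.of(new ChangelogMessage(oldKey, null), upsert);
            default:
                throw new IllegalArgumentException("unknown change type: " + change.type());
        }
    }

    public static void main(String[] args) {
        RowChange update = new RowChange("users", RowChangeType.UPDATE,
                Map.of("id", 42, "email", "old@example.com"),
                Map.of("id", 42, "email", "new@example.com"));
        for (ChangelogMessage m : toChangelog(update, "id")) {
            System.out.println(m.key() + " -> " + m.value());
        }
    }
}
```

      Keying by primary key means a consumer can rebuild the current state of a table by keeping only the latest value per key, which is what makes a changelog usable as a Samza input stream rather than just an audit trail.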

      I played around with Open Replicator today and got it working: a small Scala program that receives a real-time feed of all changes happening in a MySQL database. However, I have some doubts about the quality of the library: the code is not very good, it has only very cursory tests, the original maintainer hasn't touched it for 18 months, and there are reports of nasty bugs (e.g. blowing up on any negative number). There don't seem to be any better Java binlog parsers out there. But I did skim the source of Open Replicator, and it's not too complicated; writing a MySQL binlog parser ourselves seems quite feasible.
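      To give a feel for the low-level decoding such a parser involves, here is a minimal sketch (not Open Replicator's code and not a complete parser). It reads the 19-byte common header of a binlog v4 event, whose fields are all little-endian, and decodes a little-endian integer column value. The signedness of an integer column is not encoded in the value's bytes, so a parser has to get it from schema information; the `signed` parameter is a hypothetical stand-in for that lookup. Decoding with the wrong signedness is one easy way to mishandle negative numbers. The class and method names are illustrative only.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class BinlogDecoding {

    public static final int HEADER_LENGTH = 19;

    // Common header of a binlog v4 event (all fields little-endian).
    public record EventHeader(long timestamp, int typeCode, long serverId,
                              long eventLength, long nextPosition, int flags) {}

    public static EventHeader readHeader(byte[] data, int offset) {
        ByteBuffer b = ByteBuffer.wrap(data, offset, HEADER_LENGTH).order(ByteOrder.LITTLE_ENDIAN);
        long timestamp = Integer.toUnsignedLong(b.getInt());    // 4 bytes
        int typeCode = Byte.toUnsignedInt(b.get());             // 1 byte
        long serverId = Integer.toUnsignedLong(b.getInt());     // 4 bytes
        long eventLength = Integer.toUnsignedLong(b.getInt());  // 4 bytes
        long nextPosition = Integer.toUnsignedLong(b.getInt()); // 4 bytes
        int flags = Short.toUnsignedInt(b.getShort());          // 2 bytes
        return new EventHeader(timestamp, typeCode, serverId, eventLength, nextPosition, flags);
    }

    // Decodes a little-endian integer of 1..8 bytes. For width 8 with signed=false,
    // the result is the raw 64-bit pattern (print it with Long.toUnsignedString).
    public static long readInt(byte[] data, int offset, int width, boolean signed) {
        if (width < 1 || width > 8) {
            throw new IllegalArgumentException("width must be 1..8");
        }
        long value = 0;
        for (int i = width - 1; i >= 0; i--) {
            value = (value << 8) | (data[offset + i] & 0xFFL);
        }
        if (signed && width < 8) {
            int shift = 64 - 8 * width;
            value = (value << shift) >> shift; // arithmetic shift sign-extends
        }
        return value;
    }

    public static void main(String[] args) {
        byte[] minusOne = {(byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF};
        System.out.println(readInt(minusOne, 0, 4, true));  // prints -1
        System.out.println(readInt(minusOne, 0, 4, false)); // prints 4294967295
    }
}
```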

      This is still very much at the exploratory stage, but I think it could be really cool to have database changelog support easily available in Samza.


          People

            Assignee: Unassigned
            Reporter: Martin Kleppmann (martinkl)
