Flume / FLUME-160

Event.TAG_REGEX does not match necessary special characters

Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.9.0, 0.9.1
    • Fix Version/s: 0.9.1, 0.9.2
    • Component/s: Sinks+Sources
    • Labels: None

    Description

      I tried to use output bucketing based on the scribe category by specifying
      collectorSink("hdfs://localhost:9000/somepath/%{scribe.category}", "somefile-")
      as the sink.

      However, %{scribe.category} does not get replaced, and shows up literally in the path name.

      After some poking around, it turns out that the regular expression used to match tags is too restrictive:
      final public static String TAG_REGEX = "\\%(\\w|\\%)|\\%\\{(\\w+)\\}";

      The \w character class is equivalent to [a-zA-Z0-9_], so it will never match tags including a dot.
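The mismatch can be reproduced outside Flume with plain java.util.regex. The following is a minimal sketch, not Flume code: the class name TagRegexDemo and the helper containsTag are made up for illustration, and the pattern is assumed to have its opening brace escaped (\\{), since java.util.regex rejects an unescaped one in that position.

```java
import java.util.regex.Pattern;

public class TagRegexDemo {
    // Pattern as quoted in this report, assuming the opening brace is escaped.
    static final String TAG_REGEX = "\\%(\\w|\\%)|\\%\\{(\\w+)\\}";

    /** Hypothetical helper: true if the pattern finds a tag anywhere in the path. */
    static boolean containsTag(String path) {
        return Pattern.compile(TAG_REGEX).matcher(path).find();
    }

    public static void main(String[] args) {
        // A tag made only of word characters is found.
        System.out.println(containsTag("/somepath/%{host}"));            // true
        // \w+ stops at the dot, the closing brace is never reached, no match.
        System.out.println(containsTag("/somepath/%{scribe.category}")); // false
    }
}
```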

      The regex should be expanded to match dots, and possibly also hyphens (underscores are already covered by \w). It could even match any character that is not a closing curly bracket:
      final public static String TAG_REGEX = "\\%(\\w|\\%)|\\%\\{([\\w\\.-]+)\\}";
      or
      final public static String TAG_REGEX = "\\%(\\w|\\%)|\\%\\{([^\\}]+)\\}";
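To compare the two proposed variants, here is a rough sketch that substitutes tags in a path string. It is only an illustration: the expand helper, the class name, and the attribute values are hypothetical and do not reflect how Flume performs the replacement. Both patterns are written with the braces escaped.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagRegexAlternatives {
    // Variant 1: word characters, dots and hyphens inside the braces.
    static final String DOTS_AND_HYPHENS = "\\%(\\w|\\%)|\\%\\{([\\w\\.-]+)\\}";
    // Variant 2: any character except a closing brace.
    static final String ANY_BUT_BRACE = "\\%(\\w|\\%)|\\%\\{([^\\}]+)\\}";

    /**
     * Hypothetical helper: replaces each %{name} tag with its value from attrs.
     * Short %x escapes and unknown tags are left untouched.
     */
    static String expand(String regex, String path, Map<String, String> attrs) {
        Matcher m = Pattern.compile(regex).matcher(path);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(2); // null when the short %x form matched
            String value = (name == null) ? null : attrs.get(name);
            String replacement = (value == null) ? m.group() : value;
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new HashMap<>();
        attrs.put("scribe.category", "web"); // made-up example values
        attrs.put("my tag", "x");

        String path = "hdfs://localhost:9000/somepath/%{scribe.category}";
        System.out.println(expand(DOTS_AND_HYPHENS, path, attrs)); // hdfs://localhost:9000/somepath/web
        System.out.println(expand(ANY_BUT_BRACE, path, attrs));    // hdfs://localhost:9000/somepath/web

        // The variants differ once a tag contains other characters, e.g. a space.
        System.out.println(expand(DOTS_AND_HYPHENS, "/p/%{my tag}", attrs)); // /p/%{my tag}
        System.out.println(expand(ANY_BUT_BRACE, "/p/%{my tag}", attrs));    // /p/x
    }
}
```

Both variants handle the dotted scribe.category tag. The second is more permissive and accepts any tag name that does not contain a closing brace.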

      It could be even more elaborate (e.g. it could allow single or double quotes so that the tags themselves could contain curly brackets), but I guess it's a much better idea to just keep things reasonable.

          People

            flume_dzuelke Disabled imported user