Hive / HIVE-1483

Update AWS S3 log format deserializer regex

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.5.0, 0.6.0, 0.7.0
    • None
    • None

    Description

      From HIVE-693:

      Amazon changed the format of the logs at the beginning of February 2010, so the new regex is:
      static Pattern regexpat = Pattern.compile(
          "(\\S+) (\\S+) \\[(.*?)\\] (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) \"(.+)\" (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) \"(.*)\" \"(.*)\"(?: -)?");

      (The only difference is the addition of (?: -)? at the end.)

      Since Amazon hasn't yet documented the last field, I don't know whether it is OK to use a catch-all regex for that field instead of the very specific one I've added.
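      To make the trade-off concrete, here is a minimal sketch comparing the specific suffix with one possible catch-all. It assumes the deserializer requires the whole line to match (Matcher.matches()). The class name, the CATCH_ALL pattern, and the sample log line are all made up for illustration; none of them come from the Hive source or from real S3 logs.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class S3LogRegexSketch {
    // The 17 capture groups shared by both variants, as given in this issue.
    static final String BASE =
        "(\\S+) (\\S+) \\[(.*?)\\] (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) \"(.+)\" "
        + "(\\S+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) \"(.*)\" \"(.*)\"";

    // Variant proposed in this issue: tolerate an optional literal " -" at the end.
    static final Pattern SPECIFIC = Pattern.compile(BASE + "(?: -)?");

    // Hypothetical catch-all: capture one optional trailing token as group 18.
    static final Pattern CATCH_ALL = Pattern.compile(BASE + "(?: (\\S+))?");

    // Made-up sample line in the pre-February-2010 layout (17 fields).
    static final String OLD_LINE =
        "ownerid mybucket [06/Feb/2010:00:00:38 +0000] 192.0.2.3 requesterid "
        + "3E57427F3EXAMPLE REST.GET.OBJECT photos/cat.jpg "
        + "\"GET /mybucket/photos/cat.jpg HTTP/1.1\" 200 - 2662992 2662992 70 10 "
        + "\"-\" \"curl/7.15.1\"";

    // Same line with the new trailing field, first as "-" and then as a made-up value.
    static final String NEW_LINE = OLD_LINE + " -";
    static final String NEW_LINE_WITH_VALUE = OLD_LINE + " 3HL4kqtJvjVBH40Nrjfkd";

    public static void main(String[] args) {
        String[] lines = {OLD_LINE, NEW_LINE, NEW_LINE_WITH_VALUE};
        for (String line : lines) {
            Matcher specific = SPECIFIC.matcher(line);
            Matcher catchAll = CATCH_ALL.matcher(line);
            boolean catchAllOk = catchAll.matches();
            // group(18) is null when the optional trailing field is absent.
            String extra = catchAllOk ? catchAll.group(18) : null;
            System.out.println("specific=" + specific.matches()
                + " catchAll=" + catchAllOk + " trailing=" + extra);
        }
    }
}
```

      Under these assumptions, the specific suffix accepts both the old layout and a trailing " -", but rejects a line as soon as the new field carries any other value. The catch-all accepts all three and exposes the value as an extra group, at the cost of the deserializer having to decide what to do with an undocumented field.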

          People

            Assignee: Unassigned
            Reporter: Carl Steinbach (cwsteinbach)