Hive / HIVE-662

Add a method to parse apache weblogs

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.4.0
    • Labels: None

      Description

      Apache weblogs are one of the more common formats for people to parse using Hadoop. Unfortunately, the method provided to process these logs in Hive has some issues and seems to be on its way out. See HIVE-519 and the comments on HIVE-520. We should replace that method with something that works better and that can be supported in the future.

        Issue Links

          Activity

          Transition            Time In Source Status   Execution Times   Last Executer    Last Execution Date
          Open -> Resolved      2d 21h 44m              1                 Zheng Shao       24/Jul/09 09:42
          Resolved -> Closed    875d 15h 24m            1                 Carl Steinbach   17/Dec/11 00:07
          Carl Steinbach made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Namit Jain made changes -
          Assignee Zheng Shao [ zshao ]
          Zheng Shao added a comment -

          The example above is from contrib/src/test/queries/clientnegative/serde_regex.q

          Zheng Shao made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Zheng Shao added a comment -

          Fixed as a result of HIVE-167. HIVE-167 adds RegexSerDe, which allows us to do the following:

          CREATE TABLE serde_regex(
            host STRING,
            identity STRING,
            user STRING,
            time STRING,
            request STRING,
            status STRING,
            size STRING,
            referer STRING,
            agent STRING)
          ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
          WITH SERDEPROPERTIES (
            "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\"[^\"]*\") ([^ \"]*|\"[^\"]*\"))?",
            "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s %9$s"
          )
          STORED AS TEXTFILE;
          
          LOAD DATA LOCAL INPATH "../data/files/apache.access.log" INTO TABLE serde_regex;
          LOAD DATA LOCAL INPATH "../data/files/apache.access.2.log" INTO TABLE serde_regex;
          
          SELECT * FROM serde_regex ORDER BY time;
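
To make the column mapping concrete, here is a minimal sketch of how the "input.regex" above splits a log line into the nine columns. It uses Python's `re` module purely for illustration; RegexSerDe itself runs on Java's regex engine, and this sketch assumes a row must match the whole pattern to be parsed. The sample log lines and the `parse_line` helper are hypothetical and are not part of Hive. Note that the pattern is written as the regex engine sees it, after HiveQL string unescaping (`\\[` becomes `\[` and `\"` becomes `"`).

```python
import re

# The pattern from "input.regex", after HiveQL string unescaping.
INPUT_REGEX = (
    r'([^ ]*) ([^ ]*) ([^ ]*) (-|\[[^\]]*\]) ([^ "]*|"[^"]*") '
    r'(-|[0-9]*) (-|[0-9]*)(?: ([^ "]*|"[^"]*") ([^ "]*|"[^"]*"))?'
)

# Column names in the same order as the CREATE TABLE statement.
COLUMNS = ["host", "identity", "user", "time", "request",
           "status", "size", "referer", "agent"]


def parse_line(line):
    """Hypothetical helper: map one log line to a column dict, or None if it does not match."""
    match = re.fullmatch(INPUT_REGEX, line)  # assumption: the whole row must match
    if match is None:
        return None
    # One capturing group per column; the optional trailing groups are None when absent.
    return dict(zip(COLUMNS, match.groups()))


# Illustrative sample lines (not taken from the Hive test data).
common = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
combined = common + ' "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"'

if __name__ == "__main__":
    for sample in (common, combined):
        print(parse_line(sample))
```

Two things to notice in this sketch: the captured `time` and `request` values keep their surrounding brackets and quotes, and the referer and agent groups are optional, so a line in the shorter common log format still matches and leaves those two columns empty.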
          Zheng Shao made changes -
          Link This issue is blocked by HIVE-167 [ HIVE-167 ]
          Zheng Shao made changes -
          Component/s Serializers/Deserializers [ 12312585 ]
          Zheng Shao made changes -
          Field Original Value New Value
          Link This issue relates to HIVE-519 [ HIVE-519 ]
          Zheng Shao added a comment -

          Yes, I will work on adding a SerDe how-to and some examples to the new contrib directory (HIVE-639) today.

          Johan Oskarsson added a comment -

          What is the best route to take here? I would assume a custom SerDe is the way to go?

          Johan Oskarsson created issue -

            People

            • Assignee: Zheng Shao
            • Reporter: Johan Oskarsson
            • Votes: 1
            • Watchers: 2
