Derby
  1. Derby
  2. DERBY-5201

Create tools for reading the contents of the seg0 directory

    Details

    • Type: Task Task
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: 10.9.1.0
    • Fix Version/s: None
    • Component/s: Tools
    • Labels:
      None

      Description

      It would be nice to have tools which read Derby data files (the files in the seg0 directory) without disturbing their contents.

      1. TableSignatureReader.java
        6 kB
        Rick Hillegas
      2. DataFileReader.java
        49 kB
        Rick Hillegas
      3. DataFileReader.java
        52 kB
        Rick Hillegas
      4. DataFileReader.java
        60 kB
        Rick Hillegas
      5. DataFileReader.java
        60 kB
        Rick Hillegas

        Issue Links

          Activity

          Hide
          Rick Hillegas added a comment -

          Attaching DataFileReader.java. This program reads a file in the seg0 directory and streams out its contents as human-readable xml. Verbose printing shows the column data as byte arrays. This can be refined further by giving the tool a row signature--if you do that, the column data is deserialized into objects and the toString() results are shown. The actual database is not booted or otherwise disturbed. A transient in-memory helper database is created if you want to deserialize the column contents.

          More work could be done formatting overflow data.

          I have run the tool on the SYSCONGOMERATES file (c20.dat) in databases created by 10.0, 10.1, 10.2, 10.3, 10.4, 10.5, 10.6, 10.7, 10.8, and 10.9 (trunk). I have also tested the tool on a file containing a UDT.

          Here is the tool's usage message:

          Usage:

          java DataFileReader $dataFileName [ -v ] [ -d $D ] [ -p $P ] [ -n $N ]

          -v Verbose. Print out records and slot tables. Field data appears as byte arrays. If you do not set this flag, the tool just decodes the page headers.
          -d Data signature. This makes a verbose printout turn the field data into objects. $D is a row signature, e.g., "( a int, b varchar( 30 ) )"
          -p Starting page. $P is a number which must be at least 1, the first page to read after the header. Page 0 (the header) is always read.
          -n Number of pages to read. $N is a positive number. Defaults to all subsequent pages.

          For example, the following command deserializes all of the records in the SYSCONGLOMERATES file:

          java DataFileReader db/seg0/c20.dat -v -d "( a char(36), b char(36), c bigint, d varchar( 128), e boolean, f serializable, g boolean, h char( 36 ) )"

          Note the special 'serializable' type in the preceding example. Use 'serializable' for user-defined types and for the system columns which are objects.

          Here are some sample use cases:

          -------------------------------------------

          1) Decode an entire data file, putting the resulting xml in the file z.xml. You can then view that file using a browser like Firefox, which lets you collapse and expand the elements.

          java DataFileReader db/seg0/c20.dat -v -d "( a char(36), b char(36), c bigint, d varchar( 128), e boolean, f serializable, g boolean, h char( 36 ) )" > z.xml

          -------------------------------------------

          2) Pretty-print the file header:

          java DataFileReader db/seg0/c20.dat -n 1
          <?xml version="1.0" encoding="UTF-8" standalone="no"?>
          <dataFile>
          <fileHeader>
          <allocPage number="0">
          <formatableID>118</formatableID>
          <pageHeader>
          <isOverFlowPage>false</isOverFlowPage>
          <status hexvalue="1">
          <flag>VALID_PAGE</flag>
          </status>
          <pageVersion>9</pageVersion>
          <slotsInUse>0</slotsInUse>
          <nextRecordID>6</nextRecordID>
          <pageGeneration>0</pageGeneration>
          <previousGeneration>0</previousGeneration>
          <beforeImagePageLocation>0</beforeImagePageLocation>
          <deletedRowCount>1</deletedRowCount>
          </pageHeader>
          <nextAllocPageNumber>-1</nextAllocPageNumber>
          <nextAllocPageOffset>0</nextAllocPageOffset>
          <containerInfoLength>80</containerInfoLength>
          <containerInfo>
          <formatableID>116</formatableID>
          <containerStatus hexvalue="0">
          </containerStatus>
          <pageSize>4096</pageSize>
          <spareSpace>0</spareSpace>
          <minimumRecordSize>12</minimumRecordSize>
          <initialPages>1</initialPages>
          <preAllocSize>8</preAllocSize>
          <firstAllocPageNumber>0</firstAllocPageNumber>
          <firstAllocPageOffset>0</firstAllocPageOffset>
          <containerVersion>0</containerVersion>
          <estimatedRowCount>71</estimatedRowCount>
          <reusableRecordIdSequenceNumber>0</reusableRecordIdSequenceNumber>
          <spare>0</spare>
          <checksum>2463908068</checksum>
          </containerInfo>
          <allocationExtent>
          <extentOffset>4096</extentOffset>
          <extentStart>1</extentStart>
          <extentEnd>10216</extentEnd>
          <extentLength>8</extentLength>
          <extentStatus hexvalue="30000010">
          <flag>HAS_UNFILLED_PAGES</flag>
          <flag>KEEP_UNFILLED_PAGES</flag>
          <flag>NO_DEALLOC_PAGE_MAP</flag>
          </extentStatus>
          <preAllocLength>7</preAllocLength>
          <reserved1>0</reserved1>
          <reserved2>0</reserved2>
          <reserved3>0</reserved3>
          <freePages totalLength="8" bitsThatAreSet="0"/>
          <unFilledPages totalLength="8" bitsThatAreSet="1"/>
          </allocationExtent>
          </allocPage>
          </fileHeader>
          <pageCount>1</pageCount>
          </dataFile>

          -------------------------------------------

          3) Count the number of pages in a data file:

          java DataFileReader db/seg0/c20.dat | grep pageCount
          <pageCount>9</pageCount>

          -------------------------------------------

          4) Decode 3 pages, starting at page 2. This one is a little tricky because the header page is always decoded. So you need to ask for 4 pages (3 data pages plus 1 header page):

          java DataFileReader db/seg0/c20.dat -v -p 4 -n 3

          Show
          Rick Hillegas added a comment - Attaching DataFileReader.java. This program reads a file in the seg0 directory and streams out its contents as human-readable xml. Verbose printing shows the column data as byte arrays. This can be refined further by giving the tool a row signature--if you do that, the column data is deserialized into objects and the toString() results are shown. The actual database is not booted or otherwise disturbed. A transient in-memory helper database is created if you want to deserialize the column contents. More work could be done formatting overflow data. I have run the tool on the SYSCONGOMERATES file (c20.dat) in databases created by 10.0, 10.1, 10.2, 10.3, 10.4, 10.5, 10.6, 10.7, 10.8, and 10.9 (trunk). I have also tested the tool on a file containing a UDT. Here is the tool's usage message: Usage: java DataFileReader $dataFileName [ -v ] [ -d $D ] [ -p $P ] [ -n $N ] -v Verbose. Print out records and slot tables. Field data appears as byte arrays. If you do not set this flag, the tool just decodes the page headers. -d Data signature. This makes a verbose printout turn the field data into objects. $D is a row signature, e.g., "( a int, b varchar( 30 ) )" -p Starting page. $P is a number which must be at least 1, the first page to read after the header. Page 0 (the header) is always read. -n Number of pages to read. $N is a positive number. Defaults to all subsequent pages. For example, the following command deserializes all of the records in the SYSCONGLOMERATES file: java DataFileReader db/seg0/c20.dat -v -d "( a char(36), b char(36), c bigint, d varchar( 128), e boolean, f serializable, g boolean, h char( 36 ) )" Note the special 'serializable' type in the preceding example. Use 'serializable' for user-defined types and for the system columns which are objects. Here are some sample use cases: ------------------------------------------- 1) Decode an entire data file, putting the resulting xml in the file z.xml. You can then view that file using a browser like Firefox, which lets you collapse and expand the elements. java DataFileReader db/seg0/c20.dat -v -d "( a char(36), b char(36), c bigint, d varchar( 128), e boolean, f serializable, g boolean, h char( 36 ) )" > z.xml ------------------------------------------- 2) Pretty-print the file header: java DataFileReader db/seg0/c20.dat -n 1 <?xml version="1.0" encoding="UTF-8" standalone="no"?> <dataFile> <fileHeader> <allocPage number="0"> <formatableID>118</formatableID> <pageHeader> <isOverFlowPage>false</isOverFlowPage> <status hexvalue="1"> <flag>VALID_PAGE</flag> </status> <pageVersion>9</pageVersion> <slotsInUse>0</slotsInUse> <nextRecordID>6</nextRecordID> <pageGeneration>0</pageGeneration> <previousGeneration>0</previousGeneration> <beforeImagePageLocation>0</beforeImagePageLocation> <deletedRowCount>1</deletedRowCount> </pageHeader> <nextAllocPageNumber>-1</nextAllocPageNumber> <nextAllocPageOffset>0</nextAllocPageOffset> <containerInfoLength>80</containerInfoLength> <containerInfo> <formatableID>116</formatableID> <containerStatus hexvalue="0"> </containerStatus> <pageSize>4096</pageSize> <spareSpace>0</spareSpace> <minimumRecordSize>12</minimumRecordSize> <initialPages>1</initialPages> <preAllocSize>8</preAllocSize> <firstAllocPageNumber>0</firstAllocPageNumber> <firstAllocPageOffset>0</firstAllocPageOffset> <containerVersion>0</containerVersion> <estimatedRowCount>71</estimatedRowCount> <reusableRecordIdSequenceNumber>0</reusableRecordIdSequenceNumber> <spare>0</spare> <checksum>2463908068</checksum> </containerInfo> <allocationExtent> <extentOffset>4096</extentOffset> <extentStart>1</extentStart> <extentEnd>10216</extentEnd> <extentLength>8</extentLength> <extentStatus hexvalue="30000010"> <flag>HAS_UNFILLED_PAGES</flag> <flag>KEEP_UNFILLED_PAGES</flag> <flag>NO_DEALLOC_PAGE_MAP</flag> </extentStatus> <preAllocLength>7</preAllocLength> <reserved1>0</reserved1> <reserved2>0</reserved2> <reserved3>0</reserved3> <freePages totalLength="8" bitsThatAreSet="0"/> <unFilledPages totalLength="8" bitsThatAreSet="1"/> </allocationExtent> </allocPage> </fileHeader> <pageCount>1</pageCount> </dataFile> ------------------------------------------- 3) Count the number of pages in a data file: java DataFileReader db/seg0/c20.dat | grep pageCount <pageCount>9</pageCount> ------------------------------------------- 4) Decode 3 pages, starting at page 2. This one is a little tricky because the header page is always decoded. So you need to ask for 4 pages (3 data pages plus 1 header page): java DataFileReader db/seg0/c20.dat -v -p 4 -n 3
          Hide
          Rick Hillegas added a comment - - edited

          Attaching a new version of DataFileReader. This version adds support for pretty-printing data files in encrypted databases:

          Usage:

          java DataFileReader $dataFileName [ -v ] [ -d $D ] [ -p $P ] [ -n $N ] [ -e $bootPassword $serviceProperties ]

          -v Verbose. Print out records and slot tables. Field data appears as byte arrays. If you do not set this flag, the tool just decodes the page headers.
          -d Data signature. This makes a verbose printout turn the field data into objects. $D is a row signature, e.g., "( a int, b varchar( 30 ) )"
          -p Starting page. $P is a number which must be at least 1, the first page to read after the header. Page 0 (the header) is always read.
          -n Number of pages to read. $N is a positive number. Defaults to all subsequent pages.
          -e If the database is encrypted, you must supply the boot password and the location of service.properties.

          For example, the following command deserializes all of the records in the SYSCONGLOMERATES file:

          java DataFileReader db/seg0/c20.dat -v -d "( a char(36), b char(36), c bigint, d varchar( 128), e boolean, f serializable, g boolean, h char( 36 ) )"

          Note the special 'serializable' type in the preceding example. Use 'serializable' for user-defined types and for the system columns which are objects.

          The following example decrypts and deserializes an entire SYSCONGLOMERATES file, dumping the result into an xml file for inspection:

          java DataFileReader wombat/seg0/c20.dat -v -d "( a char(36), b char(36), c bigint, d varchar( 128), e boolean, f serializable, g boolean, h char( 36 ) )" -e Wednesday wombat/service.properties > z.xml

          Show
          Rick Hillegas added a comment - - edited Attaching a new version of DataFileReader. This version adds support for pretty-printing data files in encrypted databases: Usage: java DataFileReader $dataFileName [ -v ] [ -d $D ] [ -p $P ] [ -n $N ] [ -e $bootPassword $serviceProperties ] -v Verbose. Print out records and slot tables. Field data appears as byte arrays. If you do not set this flag, the tool just decodes the page headers. -d Data signature. This makes a verbose printout turn the field data into objects. $D is a row signature, e.g., "( a int, b varchar( 30 ) )" -p Starting page. $P is a number which must be at least 1, the first page to read after the header. Page 0 (the header) is always read. -n Number of pages to read. $N is a positive number. Defaults to all subsequent pages. -e If the database is encrypted, you must supply the boot password and the location of service.properties. For example, the following command deserializes all of the records in the SYSCONGLOMERATES file: java DataFileReader db/seg0/c20.dat -v -d "( a char(36), b char(36), c bigint, d varchar( 128), e boolean, f serializable, g boolean, h char( 36 ) )" Note the special 'serializable' type in the preceding example. Use 'serializable' for user-defined types and for the system columns which are objects. The following example decrypts and deserializes an entire SYSCONGLOMERATES file, dumping the result into an xml file for inspection: java DataFileReader wombat/seg0/c20.dat -v -d "( a char(36), b char(36), c bigint, d varchar( 128), e boolean, f serializable, g boolean, h char( 36 ) )" -e Wednesday wombat/service.properties > z.xml
          Hide
          Rick Hillegas added a comment -

          The following command decodes the entire SYSCOLUMNS conglomerate:

          java DataFileReader db/seg0/c90.dat -v -d "( a char(36), b char(128), c int, d serializable, e serializable, f char( 36 ), g bigint, h bigint, i bigint )"

          Show
          Rick Hillegas added a comment - The following command decodes the entire SYSCOLUMNS conglomerate: java DataFileReader db/seg0/c90.dat -v -d "( a char(36), b char(128), c int, d serializable, e serializable, f char( 36 ), g bigint, h bigint, i bigint )"
          Hide
          Kathey Marsden added a comment -

          Is this tool useful for printing and understanding indexes. I've been working on a couple cases with index growth and was wondering if the output might provide some insight.

          Show
          Kathey Marsden added a comment - Is this tool useful for printing and understanding indexes. I've been working on a couple cases with index growth and was wondering if the output might provide some insight.
          Hide
          Rick Hillegas added a comment -

          Hi Kathey,

          I think you should be able to run this tool on an index. If you run into trouble, let me know so that we can make the tool better. Thanks.

          Show
          Rick Hillegas added a comment - Hi Kathey, I think you should be able to run this tool on an index. If you run into trouble, let me know so that we can make the tool better. Thanks.
          Hide
          Steve Ferguson added a comment - - edited

          Thanks very much for this. 2 pieces of feedback.

          If I provide a data signature, and a row contains a column with a null value, and Exception is thrown and the program exits. I put a try catch ignore Throwable block around dvd.readExternalFromArray( is ); to circumvent this.

          I think it would nice if a data signature is present to add an XML element to output the name of the column in addition to its index.

          I also found this link helpful in conjunction with the tool to query out my ddl to pass into the tool as a data signature for tables with lots of columns.

          http://wiki.apache.org/db-derby/ListTableColumns

          Show
          Steve Ferguson added a comment - - edited Thanks very much for this. 2 pieces of feedback. If I provide a data signature, and a row contains a column with a null value, and Exception is thrown and the program exits. I put a try catch ignore Throwable block around dvd.readExternalFromArray( is ); to circumvent this. I think it would nice if a data signature is present to add an XML element to output the name of the column in addition to its index. I also found this link helpful in conjunction with the tool to query out my ddl to pass into the tool as a data signature for tables with lots of columns. http://wiki.apache.org/db-derby/ListTableColumns
          Hide
          Rick Hillegas added a comment -

          Attaching a new rev of the DataFileReader. This reworks formatRecords, cribbing logic from StoredPage.recordToString(). I have found this to be a more reliable guide to decoding page data.

          Show
          Rick Hillegas added a comment - Attaching a new rev of the DataFileReader. This reworks formatRecords, cribbing logic from StoredPage.recordToString(). I have found this to be a more reliable guide to decoding page data.
          Hide
          Rick Hillegas added a comment -

          Attaching TableSignatureReader. This class prints out the signature of a table, suitable for passing to DataFileReader. All JAVA_OBJECT types are reported as "serializable". Here's how to use this program:

          Usage:

          java TableSignatureReader connectionURL schemaName tableName

          where

          connectionURL e.g. "jdbc:derby:db1"
          schemaName case-sensitive schema name, e.g. APP
          tableName case-sensitive table name, e.g. T1

          Here's an example use...

          java TableSignatureReader "jdbc:derby:db1" SYS SYSALIASES

          ...which prints out the following result:

          ( "ALIASID" char( 36 ), "ALIAS" varchar( 128 ), "SCHEMAID" char( 36 ), "JAVACLASSNAME" long varchar, "ALIASTYPE" char( 1 ), "NAMESPACE" char( 1 ), "SYSTEMALIAS" boolean, "ALIASINFO" serializable, "SPECIFICNAME" varchar( 128 ) )

          Show
          Rick Hillegas added a comment - Attaching TableSignatureReader. This class prints out the signature of a table, suitable for passing to DataFileReader. All JAVA_OBJECT types are reported as "serializable". Here's how to use this program: Usage: java TableSignatureReader connectionURL schemaName tableName where connectionURL e.g. "jdbc:derby:db1" schemaName case-sensitive schema name, e.g. APP tableName case-sensitive table name, e.g. T1 Here's an example use... java TableSignatureReader "jdbc:derby:db1" SYS SYSALIASES ...which prints out the following result: ( "ALIASID" char( 36 ), "ALIAS" varchar( 128 ), "SCHEMAID" char( 36 ), "JAVACLASSNAME" long varchar, "ALIASTYPE" char( 1 ), "NAMESPACE" char( 1 ), "SYSTEMALIAS" boolean, "ALIASINFO" serializable, "SPECIFICNAME" varchar( 128 ) )
          Hide
          Rick Hillegas added a comment -

          Attaching a new version of DataFileReader. This version incorporates changes made to DataFileVTI on DERBY-6136: the bootPassword argument has been turned into a more general encryptionAttributes argument. Here is the new usage:

          Usage:

          java DataFileReader $dataFileName [ -v ] [ -d $D ] [ -p $P ] [ -n $N ] [ -e $encryptionAttributes $serviceProperties ]

          -v Verbose. Print out records and slot tables. Field data appears as byte arrays. If you do not set this flag, the tool just decodes the page headers.
          -d Data signature. This makes a verbose printout turn the field data into objects. $D is a row signature, e.g., "( a int, b varchar( 30 ) )"
          -p Starting page. $P is a number which must be at least 1, the first page to read after the header. Page 0 (the header) is always read.
          -n Number of pages to read. $N is a positive number. Defaults to all subsequent pages.
          -e If the database is encrypted, you must supply the encryption attributes and the location of service.properties.

          For example, the following command deserializes all of the records in the SYSCONGLOMERATES file:

          java DataFileReader db/seg0/c20.dat -v -d "( a char(36), b char(36), c bigint, d varchar( 128), e boolean, f serializable, g boolean, h char( 36 ) )"

          Note the special 'serializable' type in the preceding example. Use 'serializable' for user-defined types and for the system columns which are objects.

          Here are examples of using this tool on encrypted databases:

          java DataFileReader encryptedDB/seg0/c490.dat -v -d "( a varchar( 50 ), b char( 11 ) )" -e "encryptionKey=abcd1234efab5678" encryptedDB/service.properties > ~/junk/z.xml

          java DataFileReader bootpasswordDB/seg0/c490.dat -v -d "( a varchar( 50 ), b char( 11 ) )" -e "bootPassword=mysecretpassword" bootpasswordDB/service.properties > ~/junk/zz.xml

          Show
          Rick Hillegas added a comment - Attaching a new version of DataFileReader. This version incorporates changes made to DataFileVTI on DERBY-6136 : the bootPassword argument has been turned into a more general encryptionAttributes argument. Here is the new usage: Usage: java DataFileReader $dataFileName [ -v ] [ -d $D ] [ -p $P ] [ -n $N ] [ -e $encryptionAttributes $serviceProperties ] -v Verbose. Print out records and slot tables. Field data appears as byte arrays. If you do not set this flag, the tool just decodes the page headers. -d Data signature. This makes a verbose printout turn the field data into objects. $D is a row signature, e.g., "( a int, b varchar( 30 ) )" -p Starting page. $P is a number which must be at least 1, the first page to read after the header. Page 0 (the header) is always read. -n Number of pages to read. $N is a positive number. Defaults to all subsequent pages. -e If the database is encrypted, you must supply the encryption attributes and the location of service.properties. For example, the following command deserializes all of the records in the SYSCONGLOMERATES file: java DataFileReader db/seg0/c20.dat -v -d "( a char(36), b char(36), c bigint, d varchar( 128), e boolean, f serializable, g boolean, h char( 36 ) )" Note the special 'serializable' type in the preceding example. Use 'serializable' for user-defined types and for the system columns which are objects. Here are examples of using this tool on encrypted databases: java DataFileReader encryptedDB/seg0/c490.dat -v -d "( a varchar( 50 ), b char( 11 ) )" -e "encryptionKey=abcd1234efab5678" encryptedDB/service.properties > ~/junk/z.xml java DataFileReader bootpasswordDB/seg0/c490.dat -v -d "( a varchar( 50 ), b char( 11 ) )" -e "bootPassword=mysecretpassword" bootpasswordDB/service.properties > ~/junk/zz.xml

            People

            • Assignee:
              Unassigned
              Reporter:
              Rick Hillegas
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:

                Development