CouchDB / COUCHDB-259

Ability to store arbitrary data in attachment stubs

    Details

    • Type: Wish
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 0.8.1
    • Fix Version/s: None
    • Component/s: Database Core
    • Labels:
      None
    • Skill Level:
      Regular Contributors Level (Easy to Medium)

      Description

      I suggest adding the ability to store arbitrary data inline with individual attachments to a doc. The intended use is storing metadata about each attachment.

      For example, a current attachment:

      m['_attachments']
      => {"yamanote.jpg"=>{"content_type"=>"image/jpeg", "stub"=>true, "length"=>382613}}

      Desired behaviour is to be able to insert persistent metadata like so:

      m['_attachments']
      => {"yamanote.jpg"=>{"content_type"=>"image/jpeg", "stub"=>true, "length"=>382613, "width" => 800, "height" => 600, "md5" => "95de7a118ee28824afa8d2ad8fe5819f"}}

      Many other use cases are possible, depending on the media type.
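A minimal Ruby sketch of the requested behaviour from the client's side: merge caller-supplied metadata into the stub hash and add an MD5 computed from the attachment bytes. `with_metadata`, `RESERVED_KEYS` and the `"md5"` key are hypothetical names chosen for illustration, not a CouchDB API. Width and height are passed in by the caller rather than read from the image. Whether the server would persist these extra keys is exactly what this issue requests.

```ruby
require 'digest'

# Hypothetical: keys the server manages itself, which caller metadata must not overwrite.
RESERVED_KEYS = %w[content_type stub length].freeze

# Hypothetical helper: merge caller-supplied metadata into an attachment stub
# and add an MD5 of the raw attachment bytes (lowercase hex, 32 characters).
def with_metadata(stub, data, extra = {})
  meta = extra.transform_keys(&:to_s).reject { |k, _| RESERVED_KEYS.include?(k) }
  stub.merge(meta).merge("md5" => Digest::MD5.hexdigest(data))
end

stub = { "content_type" => "image/jpeg", "stub" => true, "length" => 382613 }
data = "placeholder bytes" # stands in for the real yamanote.jpg contents
p with_metadata(stub, data, width: 800, height: 600)
```

Filtering out the reserved keys ensures that user metadata cannot silently contradict the values the server itself reports for `content_type`, `stub` and `length`.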


          Activity

          Robert Newson added a comment -

          +1, I'd like somewhere to store md5 or sha1 values for the attachments too. I attempted a local hack, and while I can manage it on retrieval, my attempts to get it into the stored file failed.

          Robert Newson added a comment -

          Also, should this target 0.9.0? My understanding is that this changes the on-disk format, which would make it a blocker for 0.9 (or else rejected).

          Sho Fukamachi added a comment -

          Robert: If you're game, you could always make a view that includes a hash function in JS and then emit output from that ...

          http://pajhome.org.uk/crypt/md5/md5src.html

          I didn't really know which version to set as the target. I am hoping a core dev will correct it to what it should be...

          Robert Newson added a comment -

          My intention for storing the md5 with the attachment is to detect corruption and to enable an enhancement to keep one copy of any attachment per database (so-called 'single-instance store').

          Doing something as CPU-intensive as md5 or sha1 in JavaScript never even occurred to me...
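To illustrate the "single-instance store" idea, here is an in-memory Ruby sketch that keeps one copy of each distinct attachment body per database, keyed by its digest. `SingleInstanceStore` is a hypothetical class, not part of CouchDB. SHA-1 from Ruby's standard `digest` library is used only as an example; MD5 would work the same way.

```ruby
require 'digest'

# Hypothetical single-instance store: identical attachment bodies share one copy.
class SingleInstanceStore
  def initialize
    @blobs = {} # digest => bytes (one copy per distinct body)
    @refs  = {} # [doc_id, attachment name] => digest
  end

  # Store the body only the first time its digest is seen; always record the reference.
  def put(doc_id, name, data)
    digest = Digest::SHA1.hexdigest(data)
    @blobs[digest] ||= data.dup.freeze
    @refs[[doc_id, name]] = digest
    digest
  end

  def get(doc_id, name)
    @blobs.fetch(@refs.fetch([doc_id, name]))
  end

  def blob_count
    @blobs.size
  end
end

store = SingleInstanceStore.new
store.put("doc1", "yamanote.jpg", "same bytes")
store.put("doc2", "copy.jpg", "same bytes")
puts store.blob_count # prints 1
```

A real implementation would also need reference counting, so a body is freed only once no document points to it. This sketch omits that.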

          Sven Helmberger added a comment (edited) -

          This issue is linked to 217 because if you don't know which attachment changed you'd have to recalculate the md5sum of every attachment just to be sure it hasn't changed.

          Sho Fukamachi added a comment -

          Robert: For small things it's probably OK; JS isn't that slow. I wouldn't want to run a 600M video through it, though. I do think it's a valid approach; the idea of "generative views" is fascinating. For large files I'd consider writing the view in Ruby, which has a C md5/sha1 module.

          Sven: It's relevant to that ticket but I think you're touching on a much bigger subject there, that of revisions for individual attachments, which is an issue in its own right and beyond the scope of this one.

          Robert Newson added a comment -

          Perhaps the couchdb file format already includes checksums for detecting corruption within attachments (but, if it does, I haven't found it). My intention was to put an md5 on every attachment, which could include 600M videos, etc. Specifically, calling crypto:md5() seems preferable, assuming it's native.

          The Ruby idea is interesting but, again, I wanted the digest calculated on the data being added and stored with the attachment, so that corruption can be detected. Generating it in a view is only equivalent if you never regenerate it, which is fragile.
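A minimal Ruby sketch of the approach described above: compute the digest once, from the bytes being written, store it alongside the attachment, and recompute it on read to detect corruption. The `write_attachment` and `read_attachment` helpers and the `"md5"` stub key are hypothetical. For brevity, the body is kept inline in a plain Hash. Ruby's standard-library `Digest::MD5` stands in for the native `crypto:md5()` call mentioned in the comment.

```ruby
require 'digest'

class CorruptAttachment < StandardError; end

# Compute the digest from the incoming bytes at write time and keep it with the attachment.
def write_attachment(store, name, data, content_type)
  store[name] = {
    "content_type" => content_type,
    "length" => data.bytesize,
    "md5" => Digest::MD5.hexdigest(data), # hypothetical stub key
    "data" => data.dup
  }
end

# Recompute on read; a mismatch means the stored bytes changed after the write.
def read_attachment(store, name)
  att = store.fetch(name)
  actual = Digest::MD5.hexdigest(att["data"])
  raise CorruptAttachment, "#{name}: expected #{att['md5']}, got #{actual}" unless actual == att["md5"]
  att["data"]
end

store = {}
write_attachment(store, "report.csv", "a,b\n1,2\n", "text/csv")
puts read_attachment(store, "report.csv") == "a,b\n1,2\n" # prints true
store["report.csv"]["data"][0] = "X" # simulate corruption of the stored bytes
begin
  read_attachment(store, "report.csv")
rescue CorruptAttachment => e
  puts "detected: #{e.message}"
end
```

Because the digest comes from the bytes as they arrived, it remains a fixed reference point. A digest regenerated later (for example, in a view) would simply match whatever corrupted bytes are now on disk.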

          Adam Kocoloski added a comment -

          0.10.0 is out the door; adjusting FixFor on all remaining unresolved issues to 0.11 by default.

          Gabor Ratky added a comment -

          It's been a while since there was any activity on this ticket. I see 1.2 as the Fix Version and, while it's low priority, I would love to see this land in trunk.

          In our scenario, attachments are an important part of the initial data load process: we receive CSV/TSV data from our partners, import it, and create documents from it. The _attachments stubs would be the preferred place to store information such as:

          • Date when the attachment was imported
          • The revpos when the attachment was imported (so we can decide whether a newer file was uploaded since)
          • Other useful information about the content of the attachment itself.
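The second bullet can be sketched in Ruby as follows. This sketch assumes each stub in `_attachments` carries a `revpos` field (the revision position at which that attachment was last changed), as the bullet implies. It also assumes the recorded import revpos lives in a separate metadata hash under a top-level `attachments` key. `newer_upload?` and the `imported_revpos`/`imported_at` keys are hypothetical names.

```ruby
# Returns true if the attachment was re-uploaded after the recorded import.
def newer_upload?(doc, name)
  stub = doc.fetch("_attachments", {})[name]
  meta = doc.fetch("attachments", {})[name]
  return false if stub.nil? # attachment no longer exists
  return true if meta.nil?  # never imported
  stub["revpos"] > meta["imported_revpos"]
end

doc = {
  "_attachments" => { "partners.tsv" => { "content_type" => "text/tab-separated-values", "stub" => true, "revpos" => 5 } },
  "attachments"  => { "partners.tsv" => { "imported_at" => "2012-01-15", "imported_revpos" => 3 } } # example values
}
puts newer_upload?(doc, "partners.tsv") # prints true
```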

          The MD5 and metadata use cases suggested earlier sound like great scenarios as well.

          The easy workaround is to keep this information under doc.attachments instead of doc._attachments, but then it has to be kept in sync manually: if an attachment is deleted, its metadata still remains under doc.attachments.
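With the doc.attachments workaround, the synchronisation problem can be handled client-side before each save. The following minimal Ruby sketch assumes the metadata lives under a top-level `attachments` key, keyed by attachment name. `prune_orphaned_metadata` is a hypothetical helper, not part of any CouchDB client library.

```ruby
# Drop metadata entries whose attachment is no longer present in _attachments.
# Returns a new document hash; the input is not modified.
def prune_orphaned_metadata(doc)
  live = doc.fetch("_attachments", {}).keys
  meta = doc.fetch("attachments", {}).select { |name, _| live.include?(name) }
  doc.merge("attachments" => meta)
end

doc = {
  "_attachments" => { "current.csv" => { "stub" => true } },
  "attachments"  => { "current.csv" => { "imported_at" => "2012-01-15" },  # example values
                      "deleted.csv" => { "imported_at" => "2011-12-01" } }
}
p prune_orphaned_metadata(doc)["attachments"].keys # prints ["current.csv"]
```

This only fixes deletions made through the same client. Attachments removed by other writers would still leave stale entries until the next pruning pass, which is why storing the metadata in the stub itself is the more robust option.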

          Could anybody else watching this share their use cases, to help get this bumped?

          Jan Lehnardt added a comment -

          Remove FixVersion.

          Bernhard Hörmann added a comment -

          +1

          Benjamin Young added a comment -

          Some work on this was done in a couple of GitHub PRs: https://github.com/apache/couchdb/pull/54 https://github.com/apache/couchdb/pull/51
          ASF GitHub Bot added a comment -

          Github user Humbedooh commented on the pull request:

          https://github.com/apache/couchdb/pull/51#issuecomment-35156296

          Should this be closed?


            People

            • Assignee: Dave Cottlehuber
            • Reporter: Sho Fukamachi
            • Votes: 8
            • Watchers: 6
