Views return corrupt data for text fields containing non-BMP characters

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.1.0
    • Component/s: JavaScript View Server
    • Labels:
      None

      Description

      When inserting a non-BMP character (i.e. a character with a Unicode code point above U+FFFF), the content gets corrupted when it is read back from a view. After every instance of such a character, an extra U+FFFD REPLACEMENT CHARACTER is inserted into the text.
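For context, a short sketch of why non-BMP characters are special: JavaScript strings are UTF-16, so any code point above U+FFFF is stored as two code units (a surrogate pair). The helper name `toSurrogatePair` is made up for illustration and is not part of CouchDB.

```javascript
// U+1F604 lies outside the Basic Multilingual Plane, so UTF-16 needs
// two code units (a high and a low surrogate) to represent it.
const smiley = "\u{1F604}";

// Split a code point above U+FFFF into its surrogate halves.
// Hypothetical helper, for illustration only.
function toSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000;
  const high = 0xD800 + (offset >> 10);   // top 10 bits
  const low = 0xDC00 + (offset & 0x3FF);  // bottom 10 bits
  return [high, low];
}

const pair = toSurrogatePair(0x1F604);
console.log(smiley.length);                                // 2
console.log(pair.map((u) => u.toString(16)).join(" "));    // d83d de04
```

Any code that walks such a string one code unit at a time has to treat the two halves as a single character, which is where the corruption reported here can creep in.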

      To reproduce, use the following commands.

      Create the document containing a field with the character U+1F604 SMILING FACE WITH OPEN MOUTH AND SMILING EYES:

      $ curl -X PUT -d '{"type":"foo","value":"😄"}' http://localhost:5984/foo/foo2
      {"ok":true,"id":"foo2","rev":"1-d7da3cd352ef74f6391cc13601081214"}
      

      Get the document to ensure that it was saved properly:

      $ curl -X GET http://localhost:5984/foo/foo2
      {"_id":"foo2","_rev":"1-d7da3cd352ef74f6391cc13601081214","type":"foo","value":"😄"}
      

      Create a view that will return that document:

      $ curl --user user:password -X PUT -d '{"language":"javascript","views":{"v":{"map":"function(doc){if(doc.type===\"foo\")emit(doc._id,doc);}"}}}' http://localhost:5984/foo/_design/bugdemo
      {"ok":true,"id":"_design/bugdemo","rev":"1-817af2dafecb4cf8213aa7063551daac"}
      

      Get the document from the view:

      $ curl -X GET  http://localhost:5984/foo/_design/bugdemo/_view/v
      {"total_rows":1,"offset":0,"rows":[
      {"id":"foo2","key":"foo2","value":{"_id":"foo2","_rev":"1-d7da3cd352ef74f6391cc13601081214","type":"foo","value":"😄�"}}
      ]}
      

      The field value now contains two characters: the original character followed by U+FFFD.
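A minimal sketch of how the symptom could be detected programmatically, assuming the stored value and the view value are already available as strings. The function names `countReplacementChars` and `looksCorrupted` are hypothetical.

```javascript
const REPLACEMENT_CHAR = "\uFFFD";

// Count U+FFFD occurrences, iterating by code point rather than by
// UTF-16 code unit so that surrogate pairs stay intact.
function countReplacementChars(text) {
  return Array.from(text).filter((ch) => ch === REPLACEMENT_CHAR).length;
}

// Hypothetical check: the view value gained replacement characters
// that the stored document value did not have.
function looksCorrupted(stored, fromView) {
  return countReplacementChars(fromView) > countReplacementChars(stored);
}

const stored = "\u{1F604}";          // value as returned by GET /foo/foo2
const fromView = "\u{1F604}\uFFFD";  // value as returned by the view
console.log(looksCorrupted(stored, fromView)); // true
console.log(Array.from(fromView).length);      // 2
```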

          Activity

          paul.joseph.davis Paul Joseph Davis added a comment -

          Here's a simpler reproducer:

          https://gist.github.com/davisp/3cc1a0e5b0de04a3c027f694d5a4bc31

          The contents of the gist are pasted below for posterity, but I dunno how well Jira and Chrome will store the raw byte values:

          repro.js:

          ["reset", {"reduce_limit":"true", "timeout":5000}]
          ["add_fun", "function(doc){if(doc.type===\"foo\")emit(doc._id,doc);}"]
          ["map_doc", {"_id":"foo2","_rev":"1-d7da3cd352ef74f6391cc13601081214","type":"foo","value":"😄"}]

          run.sh:

          cat repro.js | ./bin/couchjs share/server/main.js

          Should have a fix in a few minutes if I'm lucky.

          paul.joseph.davis Paul Joseph Davis added a comment -

          Fixed. PR incoming.

          githubbot ASF GitHub Bot added a comment -

          GitHub user davisp opened a pull request:

          https://github.com/apache/couchdb-couch/pull/202

          Fix CouchJS character replacement

          This was a bad backport from an old bug. We accidentally backed up when
          looking at the second half of a surrogate pair. Instead the backup
          should only happen when we see a low half of a surrogate pair with no
          preceding high half.

          COUCHDB-3173

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/cloudant/couchdb-couch 3173-fix-couchjs-character-replacement

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/couchdb-couch/pull/202.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #202


          commit 37d3778172ca354f124334edf13bc09d9abc28bc
          Author: Paul J. Davis <paul.joseph.davis@gmail.com>
          Date: 2016-10-04T14:45:36Z

          Fix CouchJS character replacement

          This was a bad backport from an old bug. We accidentally backed up when
          looking at the second half of a surrogate pair. Instead the backup
          should only happen when we see a low half of a surrogate pair with no
          preceding high half.

          COUCHDB-3173
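The failure mode described in the commit message can be pictured with a small model. This is a hypothetical JavaScript sketch, not the couchjs C source: the function `decodeUnits` and the `rewindAfterPair` switch are invented here to simulate a decoder that steps back after a valid surrogate pair and therefore sees the low half a second time.

```javascript
const REPLACEMENT = 0xFFFD;
const isHigh = (u) => u >= 0xD800 && u <= 0xDBFF;
const isLow = (u) => u >= 0xDC00 && u <= 0xDFFF;

// Decode UTF-16 code units into code points. `rewindAfterPair` is a
// hypothetical switch that simulates the faulty backup.
function decodeUnits(units, rewindAfterPair) {
  const out = [];
  let i = 0;
  while (i < units.length) {
    const u = units[i++];
    if (isHigh(u) && i < units.length && isLow(units[i])) {
      // Valid pair: combine both halves into one code point.
      out.push(((u - 0xD800) << 10) + (units[i] - 0xDC00) + 0x10000);
      i++;                       // consume the low half
      if (rewindAfterPair) i--;  // simulated bug: low half is visited again
    } else if (isHigh(u) || isLow(u)) {
      out.push(REPLACEMENT);     // unpaired surrogate
    } else {
      out.push(u);               // ordinary BMP code unit
    }
  }
  return out;
}

const units = [0xD83D, 0xDE04]; // UTF-16 code units of U+1F604
console.log(decodeUnits(units, false).map((c) => c.toString(16)).join(" ")); // 1f604
console.log(decodeUnits(units, true).map((c) => c.toString(16)).join(" "));  // 1f604 fffd
```

With the switch on, the model yields the original character followed by U+FFFD, which matches the output reported in the description. A genuinely unpaired surrogate still maps to U+FFFD in both modes.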


          jira-bot ASF subversion and git services added a comment -

          Commit 37d3778172ca354f124334edf13bc09d9abc28bc in couchdb-couch's branch refs/heads/master from Paul Joseph Davis
          [ https://git-wip-us.apache.org/repos/asf?p=couchdb-couch.git;h=37d3778 ]

          Fix CouchJS character replacement

          This was a bad backport from an old bug. We accidentally backed up when
          looking at the second half of a surrogate pair. Instead the backup
          should only happen when we see a low half of a surrogate pair with no
          preceding high half.

          COUCHDB-3173

          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/couchdb-couch/pull/202


            People

            • Assignee: Unassigned
            • Reporter: Loke (loke)
            • Votes: 0
            • Watchers: 4
