  1. CouchDB
  2. COUCHDB-3173

Views return corrupt data for text fields containing non-BMP characters

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.1.0
    • Component/s: JavaScript View Server
    • Labels: None

      Description

      When a document contains a non-BMP character (i.e. a character with a Unicode code point above U+FFFF), its content is corrupted when read back from a view. After every such character, an extra U+FFFD REPLACEMENT CHARACTER is inserted into the text.
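
      For context, here is a minimal sketch of what "non-BMP" means inside a JavaScript string, which is the representation a JavaScript view server works with. The helper names `isNonBmp` and `codePoints` are made up for illustration and are not part of CouchDB. The sketch only illustrates the encoding; it does not claim to show where the corruption is introduced.

```javascript
// U+1F604 lies outside the BMP, so a JavaScript string stores it as
// two UTF-16 code units (a surrogate pair).
const smiley = "\u{1F604}";

// Hypothetical helper: true if the first character of `ch` is above U+FFFF.
function isNonBmp(ch) {
  return ch.codePointAt(0) > 0xFFFF;
}

// Hypothetical helper: list the code points of a string as "U+XXXX" labels.
// Array.from iterates by code point, so a surrogate pair stays together.
function codePoints(text) {
  return Array.from(text, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0"));
}

console.log(smiley.length);                  // 2 (UTF-16 code units)
console.log(isNonBmp(smiley));               // true
console.log(codePoints(smiley));             // [ 'U+1F604' ]
console.log(codePoints(smiley + "\uFFFD"));  // [ 'U+1F604', 'U+FFFD' ]
```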

      To reproduce, use the following commands.

      Create the document containing a field with the character U+1F604 SMILING FACE WITH OPEN MOUTH AND SMILING EYES:

      $ curl -X PUT -d '{"type":"foo","value":"😄"}' http://localhost:5984/foo/foo2
      {"ok":true,"id":"foo2","rev":"1-d7da3cd352ef74f6391cc13601081214"}
      

      Get the document to ensure that it was saved properly:

      $ curl -X GET http://localhost:5984/foo/foo2
      {"_id":"foo2","_rev":"1-d7da3cd352ef74f6391cc13601081214","type":"foo","value":"😄"}
      

      Create a view that will return that document:

      $ curl --user user:password -X PUT -d '{"language":"javascript","views":{"v":{"map":"function(doc){if(doc.type===\"foo\")emit(doc._id,doc);}"}}}' http://localhost:5984/foo/_design/bugdemo
      {"ok":true,"id":"_design/bugdemo","rev":"1-817af2dafecb4cf8213aa7063551daac"}
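
      The map function in the command above is hard to read because it is escaped inside a JSON string. As a rough sketch, the same design document body could be built in JavaScript so that `JSON.stringify` handles the escaping. `buildDesignDoc` is a hypothetical helper, not a CouchDB API, and `emit` is assumed to be supplied by the view server at indexing time.

```javascript
// Map function from the report, written as ordinary JavaScript.
// `emit` is provided by the view server; it is never called in this sketch.
function map(doc) {
  if (doc.type === "foo") emit(doc._id, doc);
}

// Hypothetical helper: wrap a map function in a design document body.
function buildDesignDoc(viewName, mapFn) {
  return JSON.stringify({
    language: "javascript",
    views: { [viewName]: { map: mapFn.toString() } },
  });
}

// The resulting string can be sent as the PUT body for _design/bugdemo.
const body = buildDesignDoc("v", map);
console.log(body);
```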
      

      Get the document from the view:

      $ curl -X GET http://localhost:5984/foo/_design/bugdemo/_view/v
      {"total_rows":1,"offset":0,"rows":[
      {"id":"foo2","key":"foo2","value":{"_id":"foo2","_rev":"1-d7da3cd352ef74f6391cc13601081214","type":"foo","value":"😄�"}}
      ]}
      

      The field value now contains two characters: the original character followed by U+FFFD.
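
      A check like the following could be used to spot this pattern in view output. It is only a sketch: `findStrayReplacementChars` is a hypothetical helper, and it would also flag a U+FFFD that legitimately follows a non-BMP character in the stored document, so the result should be compared against the value returned by a direct GET.

```javascript
// Hypothetical helper: return the positions (counted in code points) where
// a U+FFFD directly follows a character above U+FFFF.
function findStrayReplacementChars(text) {
  const chars = Array.from(text); // split by code point, not by code unit
  const hits = [];
  for (let i = 1; i < chars.length; i++) {
    if (chars[i] === "\uFFFD" && chars[i - 1].codePointAt(0) > 0xFFFF) {
      hits.push(i);
    }
  }
  return hits;
}

const stored = "\u{1F604}";          // value returned by GET /foo/foo2
const fromView = "\u{1F604}\uFFFD";  // value returned by the view

console.log(findStrayReplacementChars(stored));    // []
console.log(findStrayReplacementChars(fromView));  // [ 1 ]
console.log(fromView === stored + "\uFFFD");       // true
```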

              People

              • Assignee: Unassigned
              • Reporter: loke Loke
