Rails And JSON Containing Unicode Characters
Posted on August 27th, 2008 (9 comments)
As I mentioned in a previous blog post, Rails 2.1 natively supports incoming JSON requests. Unfortunately, it still struggles with JSON data containing non-ASCII characters.
According to the JSON spec, JSON fully supports UTF-8 encoded text, so with a few exceptions it generally should not be necessary to escape non-ASCII characters with \u Unicode escape sequences. However, many JSON libraries appear to escape all non-ASCII text in this fashion. This in itself should not be a problem, but ActiveSupport::JSON currently does not properly parse JSON containing \u escapes, resulting in strings with literal \u escape sequences rather than the desired UTF-8 encoded characters. This is especially confusing since ActiveSupport::JSON itself encodes all non-ASCII characters as \u escapes, so one would expect decoding to reverse the transformation and yield the original data. The behavior is likely explained by an odd implementation choice for its decoder: rather than using the json (or json-pure) library, it converts the JSON data to YAML and then uses the YAML library to decode the data into Ruby objects.
Monkey-patching to the rescue! I decided to replace ActiveSupport::JSON::decode with an implementation that uses the json library. The easiest way is to stick the following code into a file named something like activesupport_json_unicode_patch.rb inside the config/initializers/ directory, where Rails will automatically pick it up.

    require 'json'

    module ActiveSupport
      module JSON
        # Delegate decoding to the json library instead of the YAML-based decoder.
        def self.decode(json)
          ::JSON.parse(json)
        end
      end
    end

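The behavior the patch relies on can be checked outside Rails. The following is a minimal sketch, assuming a reasonably recent Ruby with its bundled json library; the `ascii_only` generator option is used here only to produce \u escapes for the round trip and is not part of the patch above.

```ruby
require 'json'

# Single quotes keep the \u escapes literal, as they would arrive over the wire.
escaped = '{"foo":"G\u00fcnter","bar":"El\u00e8ne"}'

# The json library turns \u escapes into UTF-8 encoded characters.
decoded = JSON.parse(escaped)
puts decoded['foo']

# Generating with ascii_only re-escapes the non-ASCII characters; parsing
# that output again gives back the same hash, so the round trip is lossless.
reencoded = JSON.generate(decoded, ascii_only: true)
puts reencoded.ascii_only?
puts JSON.parse(reencoded) == decoded
```

This is the "reverse transformation" one would expect from a decoder: escaped input goes in, UTF-8 strings come out.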
You can verify the fix by adding a test case (I added a file named activesupport_json_test.rb to the test/unit/ directory):

    require File.dirname(__FILE__) + '/../test_helper'

    class ActiveSupportJsonTest < Test::Unit::TestCase
      def test_json_encoding
        unicode_escaped_json = '{"foo":"G\u00fcnter","bar":"El\u00e8ne"}'
        hash = ActiveSupport::JSON.decode(unicode_escaped_json)
        assert_equal({'foo' => 'Günter', 'bar' => 'Elène'}, hash)
      end
    end

This test should fail without the patch and pass after adding it.
In addition to fixing the JSON / Unicode problem, this patch should also provide a nice speed boost, as we’re replacing the somewhat roundabout YAML-based JSON decode method with a dedicated JSON parser (particularly if you’re using the native json implementation rather than json-pure).

9 responses to “Rails And JSON Containing Unicode Characters”

-
Nicely done. I get the feeling there is still a lot most of us (myself especially) have to learn about doing i18n properly in Rails 2.1.
-
I had a similar problem with the flickraw library. Internally it used YAML to parse JSON which completely failed on Unicode.
In my case I modified the library to use JSON. Will email the author once I’m happy it’s not created any new problems.
-
Nice, but the required json gem overrides the to_json methods added by ActiveSupport. So if you are calling to_json (with AS-defined options) on your Rails objects, requiring the gem will break your app. Any solutions or alternatives for that?
See also http://groups.google.com/group/rubyonrails-core/browse_thread/thread/54e5453eaac6687b
-
I found another solution which seems to work. Instead of using the json gem (see my comment above), I’m using a Unicode unescape method found at http://d.hatena.ne.jp/cesar/20070401/p1 and overriding the decode function like this:

    module ActiveSupport
      module JSON
        class << self
          alias orig_decode decode

          def decode(json)
            orig_decode(MyUnicode.unescape(json))
          end
        end
      end
    end

(Note: I renamed the ‘Unicode’ module from the link above to ‘MyUnicode’, just to prevent any naming troubles.)
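For readers who cannot reach the linked page, here is a rough sketch of what such an unescape helper might look like. It is a hypothetical stand-in written for illustration, not the code from the link, and the `MyUnicode.unescape` name is taken only from the comment above.

```ruby
# Hypothetical stand-in for the MyUnicode module mentioned above;
# the linked implementation may differ.
module MyUnicode
  # Replace each \uXXXX escape with the UTF-8 encoding of that code point.
  # Caveats: this does not combine surrogate pairs, and it cannot tell an
  # escaped backslash followed by "u" (\\u00fc) from a real escape.
  def self.unescape(str)
    str.gsub(/\\u([0-9a-fA-F]{4})/) { [$1.hex].pack('U') }
  end
end

puts MyUnicode.unescape('G\u00fcnter')
```

Because of the caveats noted in the comments, a pre-processing step like this is more fragile than handing the input to a full JSON parser.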
-
moelee April 21st, 2009 at 15:17
Looks like this monkey patch breaks the ability of ActiveResource to parse timestamped attributes (i.e. created_at, updated_at) into Time objects. A fix for this is much needed in the next version of Rails.
-
Arghh! They fixed the decoding of Unicode chars, but only to a certain extent…
It still has some issues; the problem is illustrated here: http://pastie.org/497986
-


glenn August 28th, 2008 at 04:07