Advanced Rails - Building Industrial-Strength Web Apps in Record Time


242 | Chapter 8: i18n and L10n


ActiveSupport::Multibyte


In lieu of complete multibyte character support in Ruby 1.8, Rails has created a
workaround. We touched on this solution, ActiveSupport::Multibyte, back in
Chapter 2. Here, we will explore it in more detail.


Recall that the global variable $KCODE determines the current character encoding, and
thus influences how Ruby treats your strings. In Rails 1.2 and later, Initializer sets
$KCODE to 'u', so all processing is assumed to be in UTF-8 unless otherwise specified.


Rails includes a library called ActiveSupport::Multibyte that provides a way to deal
with multibyte characters on top of Ruby. At this time, only UTF-8 is supported. The
encoding is derived from the current value of $KCODE.


Multibyte adds a String#chars instance method, which returns a proxy (of type
ActiveSupport::Multibyte::Chars) to that string. This proxy delegates to a handler,
depending on the current encoding. (Right now, the only handlers are a UTF-8 handler
for $KCODE = 'u' and a pass-through handler for everything else.) The Chars object
uses method_missing to trap unknown calls and send them to the handler. If the
handler cannot deal with them, they are sent to the original String.
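
The delegation order described above (handler first, then the original String) can be sketched in plain Ruby. This is only an illustration, not the Rails source: SketchChars, SketchUtf8Handler, and SketchPassthroughHandler are hypothetical names, and the handler implements just two operations.

```ruby
# Hypothetical UTF-8 handler (not the Rails one): implements only the
# operations that need character awareness. Each method takes the raw string.
module SketchUtf8Handler
  def self.size(str)
    str.unpack("U*").length             # count codepoints, not bytes
  end

  def self.reverse(str)
    str.unpack("U*").reverse.pack("U*") # reverse by codepoint
  end
end

# Hypothetical pass-through handler: implements nothing, so every call
# falls through to the wrapped String.
module SketchPassthroughHandler
end

# Hypothetical stand-in for the Chars proxy.
class SketchChars
  def initialize(string, handler)
    @string  = string
    @handler = handler
  end

  def to_s
    @string
  end

  # Try the handler first; fall back to the original String.
  def method_missing(name, *args, &block)
    if @handler.singleton_methods.include?(name)
      @handler.send(name, @string, *args, &block)
    else
      @string.send(name, *args, &block)
    end
  end

  def respond_to_missing?(name, include_private = false)
    @handler.singleton_methods.include?(name) ||
      @string.respond_to?(name, include_private) || super
  end
end

chars = SketchChars.new("résumé", SketchUtf8Handler)
chars.size              # answered by the handler
chars.start_with?("r")  # not in the handler, so sent to the String
```

Swapping in SketchPassthroughHandler leaves every call to the String itself, which is roughly the role the text gives the pass-through handler.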


The most important feature Multibyte provides is the ability to split strings on
character boundaries, rather than byte boundaries. All you need to do is call the
String#chars method and optionally convert back to a String when you are done:


$KCODE = 'u'

str = "résumé" # => "résumé"

str[0..1] # => "r\303"
str.chars[0..1].to_s # => "ré"
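
To see why the byte slice above ends in "\303", it may help to look at the bytes and codepoints directly. The sketch below uses only String#unpack and Array#pack; the helper names sketch_byte_slice and sketch_char_slice are hypothetical, and this is not how Multibyte itself is implemented.

```ruby
# Slice by raw bytes: returns the byte values in the given range.
def sketch_byte_slice(str, range)
  str.unpack("C*")[range]
end

# Slice by characters: decode UTF-8 into codepoints, slice the array,
# then re-encode. A slice of codepoints always falls on character boundaries.
def sketch_char_slice(str, range)
  str.unpack("U*")[range].pack("U*")
end

str = "résumé"

# "é" is two bytes in UTF-8 (195, 169), so the first two bytes are "r"
# plus the first half of "é". 195 is 0303 in octal, hence "r\303".
sketch_byte_slice(str, 0..1)   # => [114, 195]

sketch_char_slice(str, 0..1)   # => "ré"
```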

Multibyte also provides case conversion, which can differ vastly among languages:


str.upcase # => "RéSUMé"
str.chars.upcase.to_s # => "RÉSUMÉ"
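
A byte-oriented upcase leaves "é" alone because it only knows the ASCII range. A character-aware version needs a case map keyed by codepoint. The following is a minimal sketch under that assumption: SKETCH_UPPERCASE_MAP and sketch_upcase are hypothetical, and the two-entry map stands in for the full Unicode tables.

```ruby
# Hypothetical, deliberately tiny case map: lowercase codepoint => uppercase.
SKETCH_UPPERCASE_MAP = {
  0xE9 => 0xC9, # é => É
  0xE8 => 0xC8  # è => È
}

def sketch_upcase(str)
  mapped = str.unpack("U*").map do |cp|
    if cp >= 0x61 && cp <= 0x7A
      cp - 0x20                          # ASCII a-z
    else
      SKETCH_UPPERCASE_MAP.fetch(cp, cp) # mapped if known, else unchanged
    end
  end
  mapped.pack("U*")
end

sketch_upcase("résumé") # => "RÉSUMÉ"
```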

Method calls on chars can also be chained, because the Chars methods return Chars
objects rather than Strings. Even methods that are proxied back to the original String
have their String return values converted to Chars objects:


str.chars[0..1].upcase.to_s # => "RÉ"
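
Chaining works because String results are wrapped in a new proxy before they are returned. A minimal sketch of that wrapping step, assuming a hypothetical SketchChainChars class rather than the real Chars:

```ruby
# Hypothetical proxy that re-wraps String results so calls can be chained.
class SketchChainChars
  def initialize(string)
    @string = string
  end

  def to_s
    @string
  end

  # Character-aware slice, implemented on codepoints.
  def [](range)
    wrap(@string.unpack("U*")[range].pack("U*"))
  end

  # Everything else goes to the String; String results are wrapped again.
  def method_missing(name, *args, &block)
    wrap(@string.send(name, *args, &block))
  end

  def respond_to_missing?(name, include_private = false)
    @string.respond_to?(name, include_private) || super
  end

  private

  # Strings come back as proxies; other values (integers, booleans) pass through.
  def wrap(result)
    result.is_a?(String) ? self.class.new(result) : result
  end
end

chars = SketchChainChars.new("résumé")
chars[1..3][0..1].to_s # => "és"
chars.upcase.length    # upcase returns a proxy, length returns an Integer
```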

The implementation of Multibyte is itself fascinating; the tables of composition
maps, codepoints, case maps, and other details are generated automatically from
tables at the Unicode Consortium web site and stored in
active_support/values/unicode_tables.dat. The generator can be found in
active_support/multibyte/generators/generate_tables.rb.
