Unicode Explained


Unicode Explained
Jukka K. Korpela
O'Reilly Media, 2006
ISBN 0-596-10121-X
US$ 59.99

Rating: 4/5 (very good)

Now that the IT world is moving towards Unicode, there are usually two ways a computer programmer thinks about the role Unicode plays in his life: the first is "I don't need to know anything, the environment will do everything by itself," while the second is "hey man, it's just a character set, I already know everything I need." Well, there's also a third option, "I don't care and I will use US-ASCII forever," but I presume the reader of this review is at least somewhat interested in Unicode.

Let's talk about the second class of programmers. Unicode is not just a character set; in fact, it's far more than one. Also, the encodings used for Unicode characters (e.g. UTF-8) differ from the older ones in many ways: in most cases they go beyond the old one-byte-per-character relation; moreover, they're becoming the standard everyone is moving towards, so there's no escaping them.
Now to the first way of thinking: unfortunately it's not that easy. The environment may support Unicode, but it's still not the default in most places, so it's up to you to choose the correct settings. Moreover, when migrating to Unicode most of the trouble comes from your legacy data, which is still in Latin-1 or other formats (the good part is that if it's in US-ASCII then it's already valid UTF-8).
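
To make the legacy-data point a bit more concrete, here is a minimal Python sketch (my own illustration, not taken from the book). The helper names `is_valid_utf8` and `latin1_to_utf8` are hypothetical, and the sketch assumes you already know the legacy bytes really are Latin-1:

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes can be decoded as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode bytes that are assumed to be Latin-1 as UTF-8."""
    # Decode with the real source encoding first, then encode to UTF-8.
    return data.decode("latin-1").encode("utf-8")


# Pure US-ASCII data needs no conversion: it is already valid UTF-8.
ascii_data = b"Unicode Explained"
print(is_valid_utf8(ascii_data))

# Latin-1 data with an accented letter (0xE9 is e-acute) is not valid UTF-8.
legacy = b"caf\xe9"
print(is_valid_utf8(legacy))
print(latin1_to_utf8(legacy))
```

Note that any byte sequence decodes "successfully" as Latin-1, so the conversion only gives sensible results if the assumption about the source encoding is right.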

Unicode Explained, in no fewer than 660 pages, tells you everything, and I mean everything, you need (and also some things you don't need) about the Unicode world. By reading the first part of this book you'll also learn something about the "old" character codes, as characters, encodings, fonts and many related concepts are explained in great detail.

The subsequent parts of the book teach Unicode itself, and they're quite useful for destroying some widespread misconceptions, such as that Unicode characters are always 2 bytes long, or that UTF-8 is a newer version of Unicode. These parts are also great at helping you choose the correct Unicode encoding among UTF-8 (the most common choice), UTF-16, UTF-32 and others. There are also suggestions on what should be represented at the character level and what should be left to the markup level: the discussion of this topic is quite interesting, as the decision can sometimes be difficult, depending on the context and the working environment.
Last comes the part related to programming, which is relatively small but shows the status of Unicode in all the best-known languages (including Perl), and contains tips and caveats related to its usage.
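
The "2 bytes per character" misconception is easy to check for yourself. The following is a small Python sketch of my own (not an example from the book); the function name `encoded_lengths` is hypothetical, and the little-endian codec variants are used only so that no byte order mark is counted:

```python
def encoded_lengths(ch: str) -> dict:
    """Return the number of bytes one character takes in each encoding."""
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return {enc: len(ch.encode(enc)) for enc in encodings}


samples = {
    "LATIN CAPITAL LETTER A": "A",
    "LATIN SMALL LETTER E WITH ACUTE": "\u00e9",
    "EURO SIGN": "\u20ac",
    "GRINNING FACE (outside the BMP)": "\U0001F600",
}

for name, ch in samples.items():
    print(name, encoded_lengths(ch))
```

Only UTF-32 uses a fixed number of bytes per character; UTF-8 ranges from 1 to 4 bytes, and UTF-16 needs 4 bytes (a surrogate pair) for characters outside the Basic Multilingual Plane.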

All in all, this book is great for anyone who wants to know the details of Unicode. By reading Unicode Explained you can dig down to various levels of knowledge: some of them are so deep that knowing them is just a matter of personal satisfaction, but most of them are essential knowledge for the programmer who does not want to lag behind.


About this Entry

This page contains a single entry by Michele Beltrame published on October 3, 2006 3:37 PM.
