Dealing with Bilingual Data
As technological advances make it possible for more businesses to enter the global marketplace, it’s increasingly important for corporate databases to support international communications. For employees to read and respond to client e-mails or post Web pages in multiple languages, companies must use a language standard that supports a wide variety of international characters.
To ensure important multilingual data are accurately represented, most companies now use the Unicode Standard when building corporate databases. Databases that are Unicode-compliant support text and symbols from most of the world’s major languages, empowering users to do business with people around the globe.
Although most new databases use Unicode, many legacy databases have a limited capability to recognize foreign characters.
Older databases that store text in the American Standard Code for Information Interchange (ASCII) or Latin-1 (ISO 8859-1, the Western European standard) cannot represent languages such as Chinese or Arabic. Data in those languages turn into noise, making it impossible for support representatives to meet the needs of many international clients.
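To make the "noise" concrete, here is a minimal Python sketch of what can happen when Chinese or Arabic text passes through an ASCII or Latin-1 storage layer. The sample strings and the helper names `store_in_legacy` and `misread_utf8_as_latin1` are illustrative assumptions, not part of any particular database product.

```python
# Sample greetings a client might send in Chinese and Arabic (illustrative data).
samples = ["\u4f60\u597d", "\u0645\u0631\u062d\u0628\u0627"]


def store_in_legacy(text, encoding):
    """Simulate saving text in a legacy column.

    Characters the encoding cannot represent are replaced with '?'.
    """
    return text.encode(encoding, errors="replace").decode(encoding)


def misread_utf8_as_latin1(text):
    """Simulate a Latin-1 application reading bytes that were written as UTF-8."""
    return text.encode("utf-8").decode("latin-1")


for text in samples:
    print("original:       ", text)
    print("ASCII column:   ", store_in_legacy(text, "ascii"))
    print("Latin-1 column: ", store_in_legacy(text, "latin-1"))
    print("misread bytes:  ", repr(misread_utf8_as_latin1(text)))
    # A round trip through UTF-8 keeps the text intact.
    print("UTF-8 round trip:", text.encode("utf-8").decode("utf-8") == text)
```

In the first two cases the original characters are gone for good, which is why data stored this way usually cannot be recovered later.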
In most cases, a simple conversion program is all that’s needed to make a database Unicode-compliant. This upgrade is important for any business hoping to compete on an international scale, said William Wong, director of engineering for Language Weaver, a company that develops translation software.
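As a rough illustration of what such a conversion program does at its core, the Python sketch below re-encodes Latin-1 bytes as UTF-8, one of the standard Unicode encodings. It assumes the source encoding is known to be Latin-1; the function name `latin1_to_utf8` and the sample value are hypothetical, and a real migration would also have to handle schema changes and mixed or mislabeled encodings.

```python
def latin1_to_utf8(raw):
    """Re-encode bytes stored as Latin-1 into UTF-8.

    Assumes the input really is Latin-1; bytes in another encoding
    would be converted without error but would come out wrong.
    """
    return raw.decode("latin-1").encode("utf-8")


# A value as it might sit in a legacy Latin-1 column (illustrative data).
legacy_value = "M\u00fcller caf\u00e9".encode("latin-1")
converted = latin1_to_utf8(legacy_value)

print(len(legacy_value), "bytes in Latin-1")
print(len(converted), "bytes in UTF-8")
print(converted.decode("utf-8"))
```

The text is unchanged, but the accented letters now take two bytes each, so column sizes defined in bytes may need to grow.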
Although the old standards might once have been good enough for doing business, the growth of global commerce is making this conversion necessary for small businesses and large enterprises alike, he said.
“In the past, you had many choices, and none of them were unified,” Wong said. “These days, the database administrator is fortunate enough to be able to use Unicode. In this day and age, even a simple Access database should be multilingual and Unicode-compliant.”
With the right software, the conversion process is relatively easy. Converting the database alone, however, does not solve the problem: the company’s Web sites, workstations, laptops and archives also need to be Unicode-compliant. Even the applications workers use to look at these data (such as a word-processing program) need to read Unicode.
Yet this is not the hardest part. One of the most difficult challenges for a database administrator is dealing with font encoding, Wong said. When a foreign-language e-mail or Web page is created using a customized font, it often creates problems for databases, even when they are Unicode-compliant, he explained. Because there is no standard for font encoding, converting these data can be very complicated and costly.
Unless these data are extremely valuable, organizations generally have to discard the information and start from scratch.
“If somebody wrote something to you in Hindi, and you see it as gibberish in Arial font, they might have written it to you in the font DV-TTYogesh,” he said. “Because your font is Arial, you’re not able to render or draw that font for you to look at in Hindi, and that’s a problem because there is no unifying standard for fonts. So, if you stored all your e-mails in DV-TTYogesh font, those data are completely useless, and it’s very hard to convert back. Basically, you lose a lot of information that way.”
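The usual way to rescue font-encoded text is a lookup table built for one specific font, which is part of why the work is costly. The Python sketch below shows the idea only. The table `HYPOTHETICAL_FONT_MAP` is invented for illustration and is not the real layout of DV-TTYogesh or any other font, and the function name `to_unicode` is likewise an assumption.

```python
# HYPOTHETICAL mapping for an imaginary legacy font. These pairs are invented
# for illustration and do NOT describe DV-TTYogesh or any real font.
HYPOTHETICAL_FONT_MAP = {
    "d": "\u0915",  # DEVANAGARI LETTER KA
    "k": "\u093e",  # DEVANAGARI VOWEL SIGN AA
    "e": "\u092e",  # DEVANAGARI LETTER MA
}


def to_unicode(stored, font_map):
    """Convert font-encoded text to Unicode with a per-font lookup table.

    Returns the converted text and a list of characters that had no mapping.
    Unmapped characters become U+FFFD so the loss stays visible.
    """
    converted = []
    unmapped = []
    for ch in stored:
        if ch in font_map:
            converted.append(font_map[ch])
        else:
            converted.append("\ufffd")
            unmapped.append(ch)
    return "".join(converted), unmapped


text, missing = to_unicode("dkez", HYPOTHETICAL_FONT_MAP)
print(text)
print("characters without a mapping:", missing)
```

A table like this has to be built separately for every custom font, and real converters typically also need reordering rules, because some legacy fonts store vowel signs in visual order rather than the logical order Unicode expects.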
Sorting also causes problems for database administrators working with multilingual data. Languages such as Chinese, whose characters have no alphabetical order, create confusion within the database. Many administrators address this issue by sorting characters according to their position (code point) in the Unicode Standard, but this can offend people whose names end up misplaced on a list, Wong said.
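The effect of code-point sorting is easy to reproduce. The Python sketch below uses accented European names rather than Chinese ones to keep the example small; the names and the helper `accent_insensitive_key` are illustrative assumptions. The second sort is only a crude stand-in for proper language-aware collation, which generally requires a dedicated collation library or the database's own collation settings.

```python
import unicodedata

# Illustrative names; the accented capitals are written as escapes so the
# strings are guaranteed to be precomposed characters.
names = ["Zoe", "\u00c9mile", "Adam", "\u00d6mer"]

# Python compares strings by code point, so accented capitals sort after "Z".
by_code_point = sorted(names)


def accent_insensitive_key(name):
    """Build a sort key that ignores accents and case.

    This is a rough approximation, not a real language-aware collation.
    """
    decomposed = unicodedata.normalize("NFD", name)
    base_letters = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base_letters.casefold()


by_base_letter = sorted(names, key=accent_insensitive_key)

print(by_code_point)   # ['Adam', 'Zoe', 'Émile', 'Ömer']
print(by_base_letter)  # ['Adam', 'Émile', 'Ömer', 'Zoe']
```

For Chinese, the equivalent choice is between orders such as pronunciation or stroke count, which a simple key function like this cannot provide.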
The best way for administrators to deal with many language-based issues is to work with their company’s database vendor, he said. Discovering that a database has inaccurately mapped characters or improperly sorted names can be frustrating for a company that has spent several years developing an enterprise-level database, but Wong said dealing with these problems as they arise is important to ensure business success.
“The question is: Do you really want to look at the 500-pound gorilla in the room?” he said. “You try not to look because you’ve invested so much money into this process, and most of the time it works great, except when these darn Europeans are sending you e-mails. The problem is, most of our business these days isn’t from the U.S. — it’s really balanced by the global business environment and as a result, sooner or later, you’re going to have to make the investment to fix the problem.”