Fixing UNICODE-ASCII-Arabic conversion problems

UNICODE-ASCII-Arabic and conversion problems? Use this algorithm


UNICODE to ASCII

So, you've used Microsoft Office applications like Excel or Access and SQL Server to store and retrieve Arabic text. There are times though, when your data seems to be corrupt or you get dummy Latin characters when you're expecting the correct Arabic text. Providing configuration information regarding how to set SQL Server, Excel or Access to hold Arabic information correctly is not the topic of interest for this blog post. Instead, I'd like to provide you with an algorithm and sample VBA code to help you convert the Latin text back to Arabic.

There are many reasons for the Arabic text to appear in Latin. One of the reasons may be your SQL Server Collation settings, or the fact that you have used a VARCHAR (or in this context CHAR, TEXT, etc.) when you're supposed to use NVARCHAR (or NTEXT, NCHAR, etc.). The N prefix in the type makes sure that the information in the column is stored in UNICODE and hence, your Arabic text will surely display exactly as it is everywhere. However, if you are getting something like ÇáßãÈíæÊÑ when you are supposed to get الكمبيوتر , then it's already too late.

Good news! I have figured out the conversion table between such Latin characters and the corresponding Arabic text in Unicode. The algorithm is simple, there is a difference in the ASCII codes between the Latin text and those in Arabic (based on code page 1256: Arabic Windows). Unfortunately, this is not a constant value, as the order of characters is not the same either. Hence, in order to convert those Latin letters into UNICODE, all you have to do is scan the text taking one character at a time, then get the UNICODE value for the character and add the deficit corresponding the range value from the table below:

Update on July 16th, 2011: Table has been updated with more characters. I have no time at the moment to update the code. But you can easily add the missing ones. I have also attached an Excel Sheet with the conversions.

Conversion Table

ASCII range for Latin Text ASCII range for Latin text (in hex) Deficit between ASCII & Unicode
From To From To
129 129 81 81 1533
138 138 8A 8A 1519
141 141 8D 8D 1529
142 142 8E 8E 1546
143 143 8F 8F 1529
144 144 90 90 1567
152 152 98 98 1553
154 154 9A 9A 1527
159 159 9F 9F 1563
170 170 AA AA 1556
191 191 BF BF 1376
192 192 C0 C0 1537
193 214 C1 D6 1376
216 219 D8 DB 1375
220 223 DC DF 1380
225 225 E1 E1 1379
227 230 E3 E6 1378
236 237 EC ED 1373
240 243 F0 F3 1371
245 246 F5 F6 1370
248 248 F8 F8 1369
250 250 FA FA 1368

Here's a VBA code you can use within Excel or Access to convert the Latin Text to Arabic. I have used Excel as an example:

Sub convert2ara()
  For Each s In Selection
     strNew = ""
     For i = 1 To Len(s.Text)
        j = AscW(Mid(s.Text, i, 1))
        Select Case j
           Case &HFA: j = j + 1368
           Case &HF8: j = j + 1369
           Case &HF5 To &HF6: j = j + 1370
           Case &HF0 To &HF3: j = j + 1371
           Case &HEC To &HED: j = j + 1373
           Case &HC0 To &HD6, &HBF, &HC1: j = j + 1376
           Case &HD8 To &HDB: j = j + 1375
           Case &HE1: j = j + 1379
           Case &HE3 To &HE6: j = j + 1378
           Case &HDC To &HDF: j = j + 1380
           Case &H8A: j = j + 1519
           Case &H9A: j = j + 1527
           Case &H8D, &H8F: j = j + 1529
           Case &H81: j = j + 1533
           Case &HC0: j = j + 1537
           Case &H8E: j = j + 1546
           Case &H98: j = j + 1553
           Case &HAA: j = j + 1556
           Case &H9F: j = j + 1563
           Case &H90: j = j + 1567
        End Select
        strnew = strnew & ChrW(j)
     Next
     Cells(s.Row, s.Column + 1) = strnew
  Next
End Sub

Here's also a T-SQL Stored Procedure that does the same thing:

CREATE PROCEDURE Convert2Ara
  @Latin VarChar(100),
  @Arabic NVarChar(100) = N'' OUTPUT
AS
BEGIN
  DECLARE @ind int, @Len int, @Src int
  SET @Len = Len(@Latin)
  SET @ind = 1
  WHILE @ind <= @Len
  BEGIN
     SET @Src = ASCII(SUBSTRING(@Latin, @ind, 1))
     Set @Src = CASE
        WHEN @Src = 250 THEN @Src + 1368
        WHEN @Src = 248 THEN @Src + 1369
        WHEN @Src BETWEEN 245 AND 246 THEN @Src + 1370
        WHEN @Src BETWEEN 240 AND 243 THEN @Src + 1371
        WHEN @Src BETWEEN 236 AND 237 THEN @Src + 1373
        WHEN @Src BETWEEN 216 AND 219 THEN @Src + 1375
        WHEN @Src BETWEEN 193 AND 214 OR @Src=191 THEN @Src + 1376
        WHEN @Src BETWEEN 227 AND 230 THEN @Src + 1378
        WHEN @Src = 225 THEN @Src + 1379
        WHEN @Src BETWEEN 220 AND 223 THEN @Src + 1380
        WHEN @Src = 138 THEN @Src + 1519
        WHEN @Src = 154 THEN @Src + 1527
        WHEN @Src = 141 OR @Src=143 THEN @Src + 1529
         WHEN @Src = 129 THEN @Src + 1533
         WHEN @Src = 192 THEN @Src + 1537
         WHEN @Src = 142 THEN @Src + 1546
         WHEN @Src = 152 THEN @Src + 1553
         WHEN @Src = 170 THEN @Src + 1556
         WHEN @Src = 159 THEN @Src + 1563
         WHEN @Src = 144 THEN @Src + 1567
     END
     SET @Arabic = @Arabic + NCHAR(@Src)
     SET @ind = @ind + 1
  END
END

Furthermore, below is a T-SQL Scalar-Valued, User-Defined Function that does the same thing:

CREATE FUNCTION [dbo].[Convert2Ara]
(
   @Latin VarChar(255)
)
RETURNS NVarChar(255)
AS
   BEGIN
       DECLARE @ind int, @Len int, @Src int, @Arabic NVarChar(255) = N''
       SET @Len = Len(@Latin)
       SET @ind = 1
       WHILE @ind <= @Len
       BEGIN
           SET @Src = ASCII(SUBSTRING(@Latin, @ind, 1))
           Set @Src = CASE
           WHEN @Src = 250 THEN @Src + 1368
           WHEN @Src = 248 THEN @Src + 1369
           WHEN @Src BETWEEN 245 AND 246 THEN @Src + 1370
           WHEN @Src BETWEEN 240 AND 243 THEN @Src + 1371
           WHEN @Src BETWEEN 236 AND 237 THEN @Src + 1373
           WHEN @Src BETWEEN 216 AND 219 THEN @Src + 1375
           WHEN @Src BETWEEN 193 AND 214 OR @Src=191 THEN @Src + 1376
           WHEN @Src BETWEEN 227 AND 230 THEN @Src + 1378
           WHEN @Src = 225 THEN @Src + 1379
           WHEN @Src BETWEEN 220 AND 223 THEN @Src + 1380
           WHEN @Src = 138 THEN @Src + 1519
           WHEN @Src = 154 THEN @Src + 1527
           WHEN @Src = 141 OR @Src=143 THEN @Src + 1529
           WHEN @Src = 129 THEN @Src + 1533
           WHEN @Src = 192 THEN @Src + 1537
           WHEN @Src = 142 THEN @Src + 1546
           WHEN @Src = 152 THEN @Src + 1553
           WHEN @Src = 170 THEN @Src + 1556
           WHEN @Src = 159 THEN @Src + 1563
           WHEN @Src = 144 THEN @Src + 1567
       END
       SET @Arabic = @Arabic + NCHAR(@Src)
       SET @ind = @ind + 1
   END
   RETURN @Arabic
END