
[DEV] Unicode support


MX_Master

I started working on Unicode support for MTA a week ago and have made some progress. I just want to ask: is anybody else working on this MTA feature right now? Maybe we should join forces. One thing: I'm a PHP programmer, which is why I'm asking for C++ help.

Ideally we will get Unicode support that allows any characters. For example, to use Unicode characters in Lua scripts, we would only need to save the scripts in UTF-8 encoding, that's all (:

Here is my small plan for implementing unicode support:

  • (done) - 100%
    Safely unlock the Unicode symbol ranges for all of the standard MTA fonts. MTA uses Tahoma, Tahoma Bold, Verdana, its own Sans font, its own GTA Gothic font and its own GTA Header font.
    .
  • (done) - 100%
    Replace CEGUI's character injection mechanism with one that converts the 1-byte key codes from the client's keyboard into Unicode characters (UTF16, unsigned int), based on the client's active keyboard layout.
    .
  • (done) - 100%
    Replace CEGUI's string display mechanism with one that converts input text from UTF-8 encoding to UTF16 (unsigned int).
    .
  • (just planned) - 0%
    Investigate the chat display mechanism and add some info about the chat's Unicode support to this post. The only thing I know at the moment is that the chat currently accepts typed characters with codes >= 32.
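
The conversion in the third step can be sketched as follows. This is only an illustration, not the code from the patch: `decodeUtf8` is a hypothetical helper name, it outputs full 32-bit code points (which fit the "unsigned int" mentioned above), and it replaces malformed bytes with U+FFFD. It does not reject overlong forms or surrogate values, which a production decoder should.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper, not part of MTA or CEGUI: decodes UTF-8 bytes into
// Unicode code points. Each malformed byte yields U+FFFD.
// Simplification: overlong forms and surrogate values are not rejected.
std::vector<std::uint32_t> decodeUtf8(const std::string& in) {
    std::vector<std::uint32_t> out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else { out.push_back(0xFFFD); ++i; continue; }  // invalid lead byte

        bool ok = (i + extra < in.size());  // sequence must not be truncated
        for (std::size_t k = 1; ok && k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) ok = false;   // not a continuation byte
            else cp = (cp << 6) | (c & 0x3F);
        }
        if (!ok) { out.push_back(0xFFFD); ++i; continue; }
        out.push_back(cp);
        i += extra + 1;
    }
    assert(out.size() <= in.size());  // every code point consumes at least one byte
    return out;
}
```

For example, the two bytes `D0 96` decode to 1046 (the Russian letter Ж), while a plain ASCII string decodes to the same values it already had.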

Any good comments and C++ help would be great.

P.S.: my native language is Russian

I also want to show some test screenshots:

  1. http://www.mxgames.kz/images/2010-05-14_222350.jpg
  2. http://www.mxgames.kz/images/2010-05-16_232759.jpg
  3. (new) http://mxgames.kz/images/2010-07-07_211358.jpg
  4. (new) http://mxgames.kz/images/2010-07-08_104334.jpg

Edited by Guest
Link to comment

  • 0

Hi MX_Master

I don't know of anyone else working on Unicode support. The matter is complex, and because MTA is hardcoded for ASCII, implementing Unicode is a massive task. However, I have looked into the matter to find the simplest way to get Unicode working in both scripts and chat.

I was hoping to find a solution for the next major MTA release. I devised a plan which would be (imo) the easiest way to get Unicode support, but would be quite hacky:

  • As you said, unlock Unicode for MTA fonts. This would allow dxDrawText to print Unicode.
  • Enable native Unicode display in CEGUI.
  • Enable Unicode input into CEGUI Edit Boxes.
  • Rewrite the MTA chatbox in Lua. We have a complete function set that should allow us to replicate the chatbox exactly (including client settings). This chatbox would have hacks in place to allow Unicode.
  • Implement Unicode identifiers.

In order to allow the chatbox to work under Lua, I was thinking of inventing a hacky Unicode-ASCII syntax which can be encoded, decoded and safely sent via triggerClientEvent/triggerServerEvent. This would, for example, convert "То был" to something like "utf-16[1412,1241,0032,1245,2135,6432,2135]" (the IDs are incorrect, that is just an example). MTA would provide default utfEncode and utfDecode functions. I was unsure how CEGUI and dxDrawText would deal with Unicode, but if it proved to be a problem, we'd use the Unicode-ASCII syntax and decode it at the very lowest level.

To detail how the chatbox would work: since dxDrawText would support Unicode, the chatbox would accept input via CEGUI (most custom chatbox scripts do this already) and draw it onto its input line. When you press Enter, it sends the message in the Unicode-ASCII syntax to the other clients, which then decode it and draw it into the chat history.

Bear in mind, I've actually done nothing towards putting this in place. This was just some brainstorming I did. It's a relief that someone has an interest in getting Unicode working. If you're interested in my proposal for the custom chatbox, I would be happy to begin writing it in Lua for you now, which would save you some time.
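
To make the proposed syntax concrete, here is a rough sketch of the encode/decode pair. Everything in it is an assumption for illustration: the function names `encodeUnicodeAscii` and `decodeUnicodeAscii` are made up (they are not the utfEncode/utfDecode functions mentioned above, which do not exist yet), the numbers are taken to be decimal UTF-16 code units, and the zero-padding to 4 digits is inferred from the "0032" in the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical encoder: writes each UTF-16 code unit as a decimal number,
// zero-padded to 4 digits, inside "utf-16[...]".
std::string encodeUnicodeAscii(const std::u16string& text) {
    std::string out = "utf-16[";
    for (std::size_t i = 0; i < text.size(); ++i) {
        char buf[8];  // at most 5 digits plus the terminator
        int n = std::snprintf(buf, sizeof buf, "%04u", static_cast<unsigned>(text[i]));
        assert(n > 0 && n < static_cast<int>(sizeof buf));
        if (i != 0) out += ',';
        out += buf;
    }
    out += ']';
    return out;
}

// Hypothetical decoder: returns false if the input is not in the expected form.
bool decodeUnicodeAscii(const std::string& in, std::u16string& out) {
    const std::string prefix = "utf-16[";
    out.clear();
    if (in.size() < prefix.size() + 1 ||
        in.compare(0, prefix.size(), prefix) != 0 || in.back() != ']')
        return false;
    std::size_t i = prefix.size();
    const std::size_t end = in.size() - 1;  // index of the closing ']'
    if (i == end) return true;              // "utf-16[]" is the empty string
    while (true) {
        unsigned value = 0;
        std::size_t digits = 0;
        while (i < end && in[i] >= '0' && in[i] <= '9' && digits < 5) {
            value = value * 10 + static_cast<unsigned>(in[i] - '0');
            ++i;
            ++digits;
        }
        if (digits == 0 || value > 0xFFFF) return false;  // empty or out of range
        out.push_back(static_cast<char16_t>(value));
        if (i == end) return true;
        if (in[i] != ',') return false;
        ++i;
    }
}
```

With the real code units, "То был" would encode to `utf-16[1058,1086,0032,1073,1099,1083]`. The result is pure ASCII, so it can travel through any code path that only handles 1-byte characters.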

Lastly, if you're interested, you could be granted SVN access to our Google Code to work on a Unicode branch. That way we could follow your progress and help and provide feedback, and you could use the real MTA code. Just catch me on IRC (irc://irc.multitheftauto.com) and we could discuss it further.

Thanks

Dan

Link to comment
  • 0

If we make Lua support wide chars as the link you referenced said, couldn't we convert other strings in our netcode to UTF-8 too - rather than the hacky method? There can't be a huge number, and support could be added gradually if it really is an issue.

Maybe I'm missing something.

Also from my limited understanding, isn't UTF-8 preferable to UTF-16 in that it's backwards compatible with ASCII?

Link to comment
  • 0

My development priority was only Unicode support in CEGUI. Only after that can we build things like custom chatboxes with Unicode support.

Unicode identifiers are not that important for MTA scripters, because Lua/MTA function names are not hard to remember. I think Lua/MTA identifiers should use only ASCII. That makes Lua code easier for other scripters to understand.

About UTF8 and UTF16:

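  • UTF8 is preferable. UTF8 strings which use only ASCII chars look exactly like ASCII strings (same size, same chars).
  • UTF16 uses two bytes for most chars, even for an ASCII char, and four bytes (a surrogate pair) for chars above U+FFFF.
  • UTF8 uses only 1 byte for an ASCII char and 2 to 4 bytes for other Unicode chars (the original design allowed up to 6 bytes, but the current standard stops at 4).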
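
For reference, the per-character sizes can be computed directly from the code point. This is a small sketch with made-up function names; it follows the current UTF-8 definition (RFC 3629, at most 4 bytes) and the UTF-16 rule that characters above U+FFFF need a surrogate pair.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed to encode one code point in UTF-8 (at most 4 under RFC 3629).
int utf8Length(std::uint32_t cp) {
    assert(cp <= 0x10FFFF);       // highest valid Unicode code point
    if (cp < 0x80) return 1;      // ASCII
    if (cp < 0x800) return 2;     // e.g. Cyrillic, Arabic
    if (cp < 0x10000) return 3;   // e.g. most Chinese and Japanese characters
    return 4;
}

// Bytes needed in UTF-16: 2 inside the Basic Multilingual Plane,
// 4 (a surrogate pair) above it.
int utf16Length(std::uint32_t cp) {
    assert(cp <= 0x10FFFF);
    return cp < 0x10000 ? 2 : 4;
}
```

So Russian text costs 2 bytes per letter in either encoding, ASCII text is half the size in UTF8, and Chinese text is typically 3 bytes per character in UTF8 against 2 in UTF16.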

Link to comment
  • 0

Another question to all:

If we load all Unicode glyphs for all MTA fonts, the MTA client will need ~1 GB of RAM! At the moment I don't know how to load CEGUI fonts dynamically. At client startup we need to load only some of the Unicode glyphs, and not for all MTA fonts. I think that for the most used fonts like Tahoma and Verdana (used for GUI windows) we should load some custom glyph ranges. For SA Gothic and Sans, I think we can load only the glyph range 32..127 (from ASCII).

  • I just want to know which languages MTA must support?
  • I also want to know which glyph ranges we must load. For example, the Russian glyph range is 1040..1103 (the letters Ё and ё, at 1025 and 1105, sit outside it).
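
One way to picture the "custom ranges" idea is a small table of inclusive code ranges that decides which glyphs get loaded at startup. The sketch below is only an assumption about how such a table could look; `GlyphRange`, `defaultGuiRanges`, `isPreloaded` and `glyphCount` are hypothetical names and have nothing to do with the actual CEGUI font API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Inclusive range of character codes to preload for a font.
struct GlyphRange { std::uint32_t first; std::uint32_t last; };

// Hypothetical preload list for the GUI fonts: ASCII 32..127 plus the
// Russian range 1040..1103 mentioned in the post.
std::vector<GlyphRange> defaultGuiRanges() {
    return { {32, 127}, {1040, 1103} };
}

// True if the character code falls inside one of the preloaded ranges.
bool isPreloaded(const std::vector<GlyphRange>& ranges, std::uint32_t cp) {
    for (const GlyphRange& r : ranges) {
        assert(r.first <= r.last);
        if (cp >= r.first && cp <= r.last) return true;
    }
    return false;
}

// Total number of glyphs the ranges would load (assumes no overlaps).
std::size_t glyphCount(const std::vector<GlyphRange>& ranges) {
    std::size_t total = 0;
    for (const GlyphRange& r : ranges) total += r.last - r.first + 1;
    return total;
}
```

With these two ranges only 160 glyphs per font are loaded, and adding a language is a matter of adding one more entry to the table.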

Link to comment
  • 0

Can we load glyphs on demand and keep a cache of them? I believe other systems do this.

Alternatively, perhaps we can have a language chooser that affects the glyph sets that are loaded. That could also lead to localisation eventually.

Link to comment
  • 0

Priorities I guess would be Cyrillic and Arabic. Less important are Chinese (simplified?) and Japanese. Supporting Cyrillic and Arabic would probably offset most of the complaints we have.

Link to comment
  • 0

Just two problems I have now:

I can't merge `trunk` into the `Unicode` branch to bring the latest changes into it. Any type of merge gives errors. Even when the branch is still at the first revision, the one it was created at, merging gives errors. :|

I can't connect to a local test server with the freshly compiled client. The error is "No such mod installed (deathmatch)". I think it's a client error saying that `(MTA dir)/mods/deathmatch/Client.dll` is not compatible (??) with the client. But this DLL was compiled from source too, and no changes were made to the `Client - Deathmatch` project. Any other `Client.dll` just gives me a permanent window with the message `Entering the game ...`. :?

Link to comment
  • 0

This development sounds epic, since there are millions (or more) of Chinese-speaking people playing GTA:SA, and about 300 of them on SA:MP.

I'd like to see Unicode supported in the MTA GUI, so that I get the chance to script in my native language and attract SA:MP players over to MTA.

Link to comment
  • 0

As I can see, you are from China. Can you tell me which Windows code page is most used there? I also need a list of the Chinese Unicode characters, with their Unicode codes, that this code page covers.

The number of GUI font characters is limited by RAM, which is why we can't add all Unicode characters, only some of the most used ones.

Link to comment
  • 0
Sorry, I don't know much about encodings.

Most used: GBK, Big5

Unicode char list/map: http://www.khngai.com/chinese/charmap/tbluni.php?page=0

More info: http://www.khngai.com/chinese/charmap/

It would be cool if any of this information helps.

Link to comment
  • 0

There are 20986 Chinese characters in the CJK charsets, so loading all of them into RAM is very wasteful.

I suggest setting up a cache buffer for font rendering.

I think a buffer size of 512*512 is enough. For a font size of 20*20, a 512*512 buffer holds about 655 characters by area (625 as a 25*25 grid of whole cells). About 2000 Chinese characters are used in daily life, and fewer than 500 are displayed on screen at the same time.

The implementation is very simple. Create a doubly linked list of [buffer index] <---> [unicode] pairs; then you only need to maintain the list according to how frequently each character is used.

This is only my suggestion, for reference.
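
A rough sketch of such a cache is below. It is not MTA or CEGUI code, and all names (`GlyphCache`, `acquire`, `gridSlots`) are hypothetical. Note one substitution: it evicts the least recently used glyph, which is a simple stand-in for the frequency-based ordering suggested above. The doubly linked list is `std::list`, paired with a hash map for lookup by character code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <unordered_map>

// Hypothetical glyph cache: maps a character code to a slot in a fixed-size
// buffer and reuses the least recently used slot when the buffer is full.
class GlyphCache {
public:
    explicit GlyphCache(std::size_t slotCount) : capacity_(slotCount) {
        assert(slotCount > 0);
    }

    // Returns the buffer slot for cp. Sets needsRender to true when the
    // glyph was not cached and the caller must draw it into that slot.
    std::size_t acquire(std::uint32_t cp, bool& needsRender) {
        auto hit = index_.find(cp);
        if (hit != index_.end()) {
            // Move the entry to the front: it is now the most recently used.
            order_.splice(order_.begin(), order_, hit->second);
            needsRender = false;
            return hit->second->slot;
        }
        std::size_t slot;
        if (order_.size() < capacity_) {
            slot = order_.size();        // next unused slot
        } else {
            slot = order_.back().slot;   // reuse the least recently used slot
            index_.erase(order_.back().codePoint);
            order_.pop_back();
        }
        order_.push_front(Entry{cp, slot});
        index_[cp] = order_.begin();
        needsRender = true;
        return slot;
    }

private:
    struct Entry { std::uint32_t codePoint; std::size_t slot; };
    std::size_t capacity_;
    std::list<Entry> order_;  // front = most recently used
    std::unordered_map<std::uint32_t, std::list<Entry>::iterator> index_;
};

// Number of whole square cells that fit on a grid in a square buffer.
std::size_t gridSlots(std::size_t bufferSide, std::size_t cellSide) {
    std::size_t perRow = bufferSide / cellSide;
    return perRow * perRow;
}
```

For a 512*512 buffer and 20*20 cells, `gridSlots(512, 20)` gives 625 usable slots, slightly fewer than the area-based estimate of 655 because partial cells at the edges are lost. That is still more than the roughly 500 characters expected on screen at once.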

Link to comment
