[DEV] Unicode support

MX_Master · May 26, 2010

I became interested in implementing Unicode support in MTA a week ago, and I have made some progress since then. I just want to ask: is anybody else working on this MTA feature at the moment? Maybe we should unite our efforts. One more thing: I'm a PHP programmer, which is why I'm asking for C++ help.

Ideally, we will get Unicode support that allows any characters to be used. For example, to use Unicode characters in Lua scripts, we would only need to save our scripts in UTF-8 encoding, that's all (:

Here is my small plan for implementing unicode support:

(done) - 100%
Safely unlock Unicode symbol ranges for all of the standard MTA fonts. MTA uses TAHOMA, TAHOMA BOLD, VERDANA, its own SANS font, its own GTA GOTHIC font and its own GTA HEADER font.

(done) - 100%
Replace CEGUI's character injection mechanism with one that converts the client keyboard's 1-byte keys to Unicode characters (UTF16, unsigned int) based on the client's active keyboard layout.

(done) - 100%
Replace CEGUI's string display mechanism with one that converts input text from UTF-8 encoding to UTF16 (unsigned int).

(just planned) - 0%
Look into the chat display mechanism and add some information about the chat's Unicode support to this post. The only thing I know at the moment is that in the chat we can currently type characters with codes >= 32.
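A minimal sketch of the UTF-8 decoding in the third step, assuming the text arrives as a UTF-8 `std::string` and the GUI wants one 32-bit value per character. Since an `unsigned int` holds a full code point, that representation is effectively UTF-32 rather than UTF-16. The name `decodeUtf8` is hypothetical and not part of MTA or CEGUI; a real patch might instead rely on CEGUI's own string class.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// U+FFFD REPLACEMENT CHARACTER, emitted for malformed input.
constexpr char32_t kReplacement = 0xFFFD;

// Decode UTF-8 bytes into code points (hypothetical helper). Truncated
// sequences, stray continuation bytes, overlong forms, surrogates and values
// above U+10FFFF each produce U+FFFD; decoding then resumes at the next byte.
std::vector<char32_t> decodeUtf8(const std::string& in)
{
    // Smallest code point that may legally use 1, 2, 3 or 4 bytes.
    static const char32_t kMinValue[] = {0x0, 0x80, 0x800, 0x10000};
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::size_t extra;   // continuation bytes that must follow the lead byte
        char32_t cp;
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else { out.push_back(kReplacement); ++i; continue; }

        bool ok = i + extra < in.size();               // enough bytes left?
        for (std::size_t k = 1; ok && k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) ok = false;        // not 10xxxxxx
            else cp = (cp << 6) | (c & 0x3F);
        }
        if (ok && (cp < kMinValue[extra] || cp > 0x10FFFF ||
                   (cp >= 0xD800 && cp <= 0xDFFF)))
            ok = false;

        if (ok) { out.push_back(cp); i += extra + 1; }
        else    { out.push_back(kReplacement); ++i; }
    }
    return out;
}
```

For example, the two bytes `D0 9F` decode to 1055 (U+041F, П), and the lone byte `D0` followed by `A` decodes to U+FFFD then `A`.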

Any helpful comments and C++ help would be great.

P.S.: my native language is Russian.

I also want to show some test screenshots:

Edited July 13, 2010 by Guest

toneysix · May 26, 2010

Very nice, we really need this support now. I'm Russian too; I hope it will be released soon. Thanks.

Regards, Nikita.

darkdreamingdan · May 26, 2010

Hi MX_Master

I don't know of anyone else working on Unicode support. The matter is complex, and because MTA is hardcoded around ASCII, implementing Unicode is a massive task. However, I have looked into the matter in order to find the simplest way to get Unicode working in both scripts and chat.

I was hoping to find a solution for the next major MTA release. I devised a plan which would be (IMO) the easiest way to get Unicode support, but it would be quite hacky:

As you said, unlock Unicode for MTA fonts. This would allow dxDrawText to print Unicode.
Enable native Unicode display in CEGUI.
Enable Unicode input into CEGUI edit boxes.
Rewrite the MTA chatbox in Lua. We have a complete function set that should allow us to replicate the chatbox exactly (including client settings). This chatbox would have hacks in place to allow Unicode.
Implement Unicode identifiers.

In order to allow the chatbox to work under Lua, I was thinking of inventing a hacky Unicode-ASCII syntax which can be encoded, decoded and safely sent via triggerClientEvent/triggerServerEvent. This would, for example, convert "То был" to something like "utf-16[1412,1241,0032,1245,2135,6432,2135]" (the IDs are incorrect; that was just an example). MTA would provide default utfEncode and utfDecode functions. I was unsure how CEGUI and dxDrawText would deal with Unicode, but if it proved to be a problem, we'd use the Unicode-ASCII syntax and decode it at the very lowest level.
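To make the proposed Unicode-ASCII syntax concrete, here is a C++ sketch of an encoder and decoder for it, assuming each UTF-16 code unit is written as a decimal number inside `utf-16[...]`. The names `utfEncode`/`utfDecode` come from the proposal, but these C++ signatures are my own assumption. Unlike the made-up IDs above, the test string "То был" here uses its real code units.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical wire syntax:  utf-16[1058,1086,32,1073,1099,1083]
// Each UTF-16 code unit becomes a decimal number, so the result is pure
// ASCII and survives any char-based code path unchanged.
std::string utfEncode(const std::u16string& text)
{
    std::string out = "utf-16[";
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (i > 0) out += ',';
        out += std::to_string(static_cast<unsigned>(text[i]));
    }
    out += ']';
    return out;
}

// Parse the syntax back; returns std::nullopt on malformed input.
std::optional<std::u16string> utfDecode(const std::string& s)
{
    const std::string prefix = "utf-16[";
    if (s.size() < prefix.size() + 1 || s.compare(0, prefix.size(), prefix) != 0 ||
        s.back() != ']')
        return std::nullopt;
    std::u16string out;
    std::size_t i = prefix.size();
    const std::size_t end = s.size() - 1;   // index of the closing ']'
    if (i == end) return out;               // "utf-16[]" is the empty string
    while (true) {
        unsigned long value = 0;
        std::size_t digits = 0;
        while (i < end && s[i] >= '0' && s[i] <= '9') {
            value = value * 10 + static_cast<unsigned long>(s[i] - '0');
            if (value > 0xFFFF) return std::nullopt;   // not a 16-bit code unit
            ++i;
            ++digits;
        }
        if (digits == 0) return std::nullopt;          // empty number, e.g. ",,"
        out.push_back(static_cast<char16_t>(value));
        if (i == end) return out;
        if (s[i] != ',') return std::nullopt;
        ++i;
    }
}
```

The cost of this format is size: a Cyrillic letter such as "1058," takes five bytes here versus two in UTF-8.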

To detail how the chatbox would work: since dxDrawText would support Unicode, the chatbox would accept input via CEGUI (most custom chatbox scripts do this already) and draw it onto the chatbox's input line. When you press Enter, it sends the message in the Unicode-ASCII syntax to the other clients, who then decode it and draw it into the chat history.

Bear in mind, I've actually done nothing towards putting this in place; this was just some brainstorming I did. It's a relief that someone is interested in getting Unicode working. If you're interested in my proposal for the custom chatbox, I would be happy to begin writing it in Lua for you now, which would save you some time.

Lastly, if you're interested, you could be granted SVN access to our Google Code to work on a Unicode branch. That way we could follow your progress and help and provide feedback, and you could use the real MTA code. Just catch me on IRC (irc://irc.multitheftauto.com) and we could discuss it further.

Thanks

Dan

eAi · May 27, 2010

If we make Lua support wide chars, as the link you referenced said, couldn't we convert the other strings in our netcode to UTF-8 too, rather than using the hacky method? There can't be a huge number of them, and support could be added gradually if it really is an issue.

Maybe I'm missing something.

Also from my limited understanding, isn't UTF-8 preferable to UTF-16 in that it's backwards compatible with ASCII?

MX_Master · May 27, 2010

My development priority was Unicode support in CEGUI only. Only after that can we make things like our own custom chatboxes with Unicode support.

Unicode identifiers are not that important for MTA scripters, because Lua/MTA functions are not hard to remember. I think Lua/MTA identifiers should use only ASCII; that keeps Lua code easier for other scripters to understand.

About UTF-8 and UTF-16:

UTF-8 is preferable. A UTF-8 string that uses only ASCII characters looks exactly like an ASCII string (same size, same bytes).
UTF-16 uses at least two bytes for every character, even ASCII ones (and four bytes, a surrogate pair, for characters above U+FFFF).
UTF-8 uses only 1 byte for an ASCII character and 2 to 4 bytes for other characters (the original design allowed up to 6, but RFC 3629 limits UTF-8 to 4 bytes).
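The byte counts above can be checked with a small encoder. This sketch follows RFC 3629 (at most four bytes, highest code point U+10FFFF); `encodeUtf8` and `utf16Units` are illustrative names. It also shows where UTF-16 wins: CJK characters take 3 bytes in UTF-8 but 2 in UTF-16, while Cyrillic takes 2 in both.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Encode one code point as UTF-8 (RFC 3629: 1 to 4 bytes, max U+10FFFF).
// Returns an empty string for surrogates or out-of-range values.
std::string encodeUtf8(char32_t cp)
{
    std::string out;
    if (cp < 0x80) {                       // ASCII: 1 byte, identical to ASCII
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // e.g. Cyrillic, Arabic: 2 bytes
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // rest of the BMP, e.g. CJK: 3 bytes
        if (cp >= 0xD800 && cp <= 0xDFFF) return {};   // surrogates are not characters
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {           // supplementary planes: 4 bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// UTF-16 code units needed for one code point: a surrogate pair above U+FFFF.
std::size_t utf16Units(char32_t cp) { return cp > 0xFFFF ? 2 : 1; }
```

For example, `encodeUtf8(0x041F)` (П) yields the two bytes `D0 9F`, and `encodeUtf8(0x4E2D)` (中) yields `E4 B8 AD`.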

MX_Master · June 9, 2010

Another question for everyone:

If we load all of the Unicode glyphs for all MTA fonts, the MTA client will need ~1 GB of RAM! At the moment I don't know how to load CEGUI fonts dynamically. At client startup we need to load only some of the Unicode glyphs, and not for all MTA fonts. I think that for the most used fonts like Tahoma and Verdana (used for GUI windows) we should load some custom ranges of glyphs. For SA GOTHIC and Sans, I think we can load only the glyph range 32..127 (ASCII).

Which languages should MTA support?
And which glyph ranges should we load? For example, the Russian glyph range is 1040..1103 (U+0410..U+044F).
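One way to reason about the RAM question is to list the startup ranges explicitly and estimate the texture size they need. The sketch below assumes every glyph gets its own 20x20 pixel cell in a 32-bit texture; both numbers are illustrative, not MTA's actual settings, and `GlyphRange`, `glyphCount` and `atlasBytes` are hypothetical names. Note that 1040..1103 covers А..я but not Ё (1025) and ё (1105), so they are added as separate ranges.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Inclusive range of code points to pre-render into a font's glyph texture.
struct GlyphRange { char32_t first, last; };

// Hypothetical startup set: printable ASCII plus the full Russian alphabet.
const std::vector<GlyphRange> kRussianSet = {
    {32, 126},      // printable ASCII
    {1025, 1025},   // Ё (U+0401)
    {1040, 1103},   // А..я (U+0410..U+044F)
    {1105, 1105},   // ё (U+0451)
};

// Total number of glyphs across all ranges.
std::size_t glyphCount(const std::vector<GlyphRange>& ranges)
{
    std::size_t n = 0;
    for (const GlyphRange& r : ranges)
        n += static_cast<std::size_t>(r.last - r.first) + 1;
    return n;
}

// Rough texture memory if every glyph gets a cellPx x cellPx cell at
// bytesPerPixel (4 for a 32-bit ARGB texture). Ignores packing overhead.
std::size_t atlasBytes(std::size_t glyphs, std::size_t cellPx, std::size_t bytesPerPixel)
{
    return glyphs * cellPx * cellPx * bytesPerPixel;
}
```

At those assumed sizes the Russian set is 161 glyphs, about 250 KB, whereas all 65,536 code points of the Basic Multilingual Plane would need 104,857,600 bytes (100 MiB) per font and size, which supports loading only selected ranges up front. For reference, the whole Cyrillic block is U+0400..U+04FF and the basic Arabic block is U+0600..U+06FF.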

eAi · June 9, 2010

Can we load glyphs on demand and keep a cache of them? I believe other systems do this.

Alternatively, perhaps we can have a language chooser that affects the glyph sets that are loaded. That could also lead to localisation eventually.

MX_Master · June 9, 2010

At this moment I don't know how to load CEGUI fonts dynamically.

The latest CEGUI can load fonts and their glyphs dynamically. I saw some examples on their official wiki.

And what about the list of languages?

eAi · June 9, 2010

Priorities I guess would be Cyrillic and Arabic. Less important are Chinese (simplified?) and Japanese. Supporting Cyrillic and Arabic would probably offset most of the complaints we have.

MX_Master · June 9, 2010

It would be great if somebody who knows Arabic, Chinese or Japanese could post here the Unicode glyph codes they use.

eAi · June 9, 2010

http://en.wikipedia.org/wiki/Arabic_alphabet

http://en.wikipedia.org/wiki/Cyrillic_alphabet

MX_Master · June 9, 2010

OK. I will also use the Windows Character Map utility.

eAi · June 9, 2010

Those pages tell you the unicode character space for each language - what more do you need?

MX_Master · June 22, 2010

I have just two problems right now:

I can't merge `trunk` into the `Unicode` branch to get the latest changes into `Unicode` too. Every type of merge gives errors. Even with the branch at its first revision (as it was when created), merging still gives errors.

I can't connect to a local testing server with the freshly compiled client. The error is "No such mod installed (deathmatch)". I think it's a client error meaning that `(MTA dir)/mods/deathmatch/Client.dll` is not compatible (??) with the client. But this DLL was compiled from source too, and no changes were made to the `Client - Deathmatch` project. Any other `Client.dll` just gives me a permanent window with the message `Entering the game ...`.

DEFCON1 · June 22, 2010

Why not compile the server as well? Just build the whole solution and it should work.

MX_Master · June 22, 2010

Yes, I compiled the whole solution. But it doesn't matter whether it's the whole solution or not; the problem is still there.

MX_Master · July 3, 2010

Talidan, can you recreate the `Unicode` branch from the `trunk` HEAD? I will add my current changes after that. Some of them are not correct and will be redone properly. Also, the branch's first revision contained errors that were not mine.

yezizhu · July 11, 2010

This development sounds epic, since there are millions (or more) of people playing GTA:SA, and about 300 on SA:MP, in Chinese.

I'd like to see Unicode supported in the MTA GUI, so that I have the chance to script in my native language and attract SA:MP players to MTA.

MX_Master · July 13, 2010

As I can see, you are from China. Can you tell me which Windows encoding code page you use most? I also need a list of the Chinese Unicode characters, with their Unicode codes, that this code page covers.

The number of characters in the GUI fonts is limited by RAM, which is why we can't add all Unicode characters, only some of the most used ones.

yezizhu · July 15, 2010


Sorry, I don't know much about encodings.

Most used: GBK, Big5

Unicode character list/map: http://www.khngai.com/chinese/charmap/tbluni.php?page=0

More info: http://www.khngai.com/chinese/charmap/

I hope this information is helpful.

mk124 · July 19, 2010

As there are 20986 Chinese characters in the CJK character sets, it's very wasteful to load all of them into RAM.

I propose building a cache buffer for font rendering.

I think a 512*512 buffer is enough. If characters are rendered at a 20*20 font size, a 512*512 buffer holds about 655 characters by area (625 in a fixed 25*25 grid). About 2000 Chinese characters are used in daily life, and fewer than 500 are displayed on screen at the same time.

The implementation is simple: create a [buffer index] <---> [unicode] doubly linked list, then maintain the buffer list according to how frequently each character is used.

This is only my suggestion, for reference.
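The cache described above can be sketched as a fixed set of atlas slots with least-recently-used eviction, using a `std::list` as the doubly linked list and a hash map from code point to list node. This is one reading of "frequency of usage" (recency rather than hit counts; an LFU policy would count hits instead), and `GlyphSlotCache`/`atlasSlots` are hypothetical names. Rasterizing a glyph into its slot is left to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <unordered_map>
#include <utility>

// Maps code points to slots in a fixed-size glyph atlas, evicting the least
// recently used glyph when the atlas is full (LRU policy).
class GlyphSlotCache {
public:
    explicit GlyphSlotCache(std::size_t slotCount) : capacity_(slotCount)
    {
        assert(slotCount > 0);
    }

    // Returns the slot for cp and whether the glyph must be (re)rendered into it.
    std::pair<std::size_t, bool> acquire(char32_t cp)
    {
        auto it = index_.find(cp);
        if (it != index_.end()) {
            // Hit: move the node to the front (most recently used).
            order_.splice(order_.begin(), order_, it->second);
            return {it->second->slot, false};
        }
        std::size_t slot;
        if (order_.size() < capacity_) {
            slot = order_.size();              // next never-used slot
        } else {
            const Entry victim = order_.back(); // least recently used
            order_.pop_back();
            index_.erase(victim.cp);
            slot = victim.slot;                // reuse its texture cell
        }
        order_.push_front({cp, slot});
        index_[cp] = order_.begin();
        return {slot, true};
    }

    std::size_t size() const { return order_.size(); }

private:
    struct Entry { char32_t cp; std::size_t slot; };
    std::size_t capacity_;
    std::list<Entry> order_;   // front = most recently used
    std::unordered_map<char32_t, std::list<Entry>::iterator> index_;
};

// Slots in a square atlas of fixed-size cells: floor(atlasPx / cellPx)^2.
constexpr std::size_t atlasSlots(std::size_t atlasPx, std::size_t cellPx)
{
    return (atlasPx / cellPx) * (atlasPx / cellPx);
}
```

With a 512x512 atlas and 20x20 cells, `atlasSlots(512, 20)` gives 625 slots, above the ~500 characters expected on screen at once.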

mk124 · July 21, 2010

Sorry, I forgot.

codepage=936 GBK (Simplified Chinese)

http://www.unicode.org/Public/MAPPINGS/ ... /CP936.TXT

codepage=950 BIG5 (Traditional Chinese)

http://www.unicode.org/Public/MAPPINGS/ ... /CP950.TXT

codepage=932 SJIS (Japanese)

http://www.unicode.org/Public/MAPPINGS/ ... /CP932.TXT

Edited July 21, 2010 by Guest

MX_Master · July 21, 2010

If you know how to implement a caching method, just submit a patch to the source. If not, I don't think we can add all of the Chinese Unicode characters when the fonts are created at startup.

MX_Master · July 22, 2010

Chat strings are always stored as arrays of type char. Which would be better: using wchar_t arrays, or something else?
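One option is to keep the `char` arrays but treat their contents as UTF-8. `wchar_t` is 16 bits (UTF-16) with MSVC on Windows but typically 32 bits on Linux, and MTA servers can also run on Linux, so `wchar_t` text sent over the network would need a fixed encoding anyway. The catch with UTF-8 in `char` arrays is that byte-based length limits can split a character; the sketch below (hypothetical helpers, assuming valid UTF-8 input) cuts only at character boundaries and counts characters instead of bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True for UTF-8 continuation bytes (10xxxxxx).
inline bool isContinuation(unsigned char c) { return (c & 0xC0) == 0x80; }

// Cut a UTF-8 string to at most maxBytes without splitting a multi-byte
// character: if the first excluded byte is a continuation byte, back up to
// the lead byte of the character that straddles the limit and drop it.
std::string truncateUtf8(const std::string& s, std::size_t maxBytes)
{
    if (s.size() <= maxBytes) return s;
    std::size_t cut = maxBytes;
    while (cut > 0 && isContinuation(static_cast<unsigned char>(s[cut])))
        --cut;
    return s.substr(0, cut);
}

// Count code points rather than bytes: every byte that is not a
// continuation byte starts a new character.
std::size_t utf8Length(const std::string& s)
{
    std::size_t n = 0;
    for (char c : s)
        if (!isContinuation(static_cast<unsigned char>(c))) ++n;
    return n;
}
```

For example, truncating "ПП" (four bytes) to three bytes keeps only the first "П" instead of leaving a broken half character.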

darkdreamingdan · August 16, 2010

Hey there, any luck in trying to get dxDrawText to work with unicode characters?

You'd need to look at dxGetTextWidth and such too.

MX_Master · August 17, 2010

It's harder than the GUI, because CEGUI already supported Unicode natively. All of the DX functions/methods use strings that contain only 1-byte chars, but we need at least two bytes per char. It's no small job, and it needs more C++ brains (:
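If the DX text drawing goes through D3DX's `ID3DXFont`, as I assume here, that interface also has a wide-character `DrawTextW` variant, so one route is to keep strings as UTF-8 and convert to UTF-16 only at the draw and measure calls. On Windows, `MultiByteToWideChar` with `CP_UTF8` converts UTF-8 directly; the portable sketch below shows the last step from code points, including the surrogate pairs that make some characters four bytes rather than two. `toUtf16` is a hypothetical helper.

```cpp
#include <cassert>
#include <string>

// Convert code points (UTF-32) to UTF-16, the encoding of wchar_t strings on
// Windows. Code points above U+FFFF need two units (a surrogate pair);
// invalid values (surrogates, > U+10FFFF) are replaced by U+FFFD.
std::u16string toUtf16(const std::u32string& cps)
{
    std::u16string out;
    out.reserve(cps.size());
    for (char32_t cp : cps) {
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            out.push_back(u'\uFFFD');
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));            // one unit
        } else {
            cp -= 0x10000;                                       // 20 bits remain
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));   // high surrogate
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF))); // low surrogate
        }
    }
    return out;
}
```

On Windows, where `wchar_t` is 16 bits, the result can be copied into a `std::wstring` (for example `std::wstring(w.begin(), w.end())`) and passed to a `...W` API.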
