
Lua to machine code compiler - Interest Poll


A better Lua  

18 members have voted

  1. Would you like to see a better Lua compiler?



Recommended Posts

  • Moderators

Hi!

The idea is really good and would be awesome to have something like that in MTA, since .luac doesn't provide full protection (ofc).

But... I think we don't have it for good reasons: bytecode isn't safe and the Lua VM can be exploited "easily".
That's why Roblox removed bytecode support in 2012: https://blog.roblox.com/2012/08/bye-bye-bytecode/

Could these vulnerabilities be fixed, or would they exist at all, with your method of compiling directly to machine code?

  • Like 1
15 hours ago, Patrick said:

Hi!

The idea is really good and would be awesome to have something like that in MTA, since .luac doesn't provide full protection (ofc).

But... I think we don't have it for good reasons: bytecode isn't safe and the Lua VM can be exploited "easily".
That's why Roblox removed bytecode support in 2012: https://blog.roblox.com/2012/08/bye-bye-bytecode/

Could these vulnerabilities be fixed, or would they exist at all, with your method of compiling directly to machine code?

The issue with Lua bytecode is that it can express more than can be written down in Lua syntax. I agree that mucking about with the bytecode will just lead to problems, so I suggest multiple protections against it:

  • Reject execution of unsigned Lua bytecode binaries; instead allow self-signed ones for testing purposes
  • For special cases where "exploit-y" bytecode is used, we slow down execution with safeguards such as Lua stack virtualization with pre- and post-zeroing guards

No Lua VM will be shipped with my approach. I will take personal responsibility and liability for any exploit found within my product that leads to severe consequences such as shellcode execution. I have much experience in this field and am performing research to mitigate any such possibility prior to it occurring. Suffice it to say, I am taking the topic of security very seriously.

Signed code execution is a great security feature. If the MTA team signs all binary scripts that run on both MTA servers and clients, then only code that comes from official compilers, not from binary/machine-code tampering tools, can run on MTA software. It comes with the burden of maintaining security certificates and their recommended properties (expiry dates, strong ciphers, etc.).
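To make the first bullet point concrete, here is a minimal sketch of how a loader could gate precompiled chunks. The Lua 5.1 bytecode header is real (ESC 'L' 'u' 'a'), but the signature flag is a hypothetical stand-in for an actual verification step, and none of this is MTA code:

```cpp
#include <cassert>
#include <string>

// Lua 5.1 precompiled chunks start with the header ESC 'L' 'u' 'a'.
bool is_lua_bytecode(const std::string& chunk)
{
    return chunk.size() >= 4 && chunk.compare(0, 4, "\x1bLua") == 0;
}

// Gate policy: plain source always loads; bytecode loads only when a
// signature check (stubbed here as a flag) has succeeded.
bool allow_chunk(const std::string& chunk, bool signatureIsValid)
{
    if (!is_lua_bytecode(chunk))
        return true;
    return signatureIsValid;
}
```

Self-signed chunks for testing would simply make `signatureIsValid` true against a locally trusted certificate instead of the official one.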

  • Like 1
  • 3 weeks later...
34 minutes ago, kowalski said:

If it can be made to work by all means it could be useful

Of course it can be made to work! In the past weeks I have been working on an optimal internal representation and a calculation dependency graph system for the compiler. I am making good progress on that part and I want to share some internal details I have settled on so far.

(Lua to 80386 machine model example - graphs post - Imgur)

Let's try to compile a very simple Lua program and you will see that it is not such a big deal.

local value

value = 1 + 1

print( value )

return value

If we lex the program into our compiler we can obtain the following graph (pure language structure):

[image: graph of the pure language structure]

There you see that we have created internal objects for each language structure. Each node in the graph above is a memory-allocated object. Our implementation language is C++, which is very fast and powerful, especially in its latest standard. Our next step is to schedule the graph into an operation order:

[image: graph with scheduled operation order]

Now we have assigned each opcode a natural number so that the processor knows in which order to execute the so-called "instructions". This is important because each program is defined to be a strict sequence of instructions. With such an order we know in which regions of the program each variable is used. Thus we know which registers to assign between which operations so that they do not collide. Take a look at this possible 80386 machine model example:

[image: possible 80386 machine model example]

This model is just an example and hence not yet fully detailed, but it should give you a general idea of the feasibility of the compiler. The programs that result from this compilation can be packaged into PE files or into other formats that support electronic signatures. I hope I could convince you of my methods!
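The register-assignment step above can be sketched as a live-range computation: once operations have a strict order, each variable's range is simply its first through last use, and non-overlapping ranges may share a register. The `Op` encoding here is a hypothetical simplification of the real graph, not the compiler's actual data model:

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// One scheduled operation, listing the variables it touches.
struct Op
{
    std::vector<std::string> vars;
};

// For each variable, compute [first use, last use] over the numbered
// operation sequence.
std::map<std::string, std::pair<int, int>>
live_ranges(const std::vector<Op>& ops)
{
    std::map<std::string, std::pair<int, int>> ranges;
    for (int i = 0; i < (int)ops.size(); ++i)
    {
        for (const std::string& v : ops[i].vars)
        {
            auto it = ranges.find(v);
            if (it == ranges.end())
                ranges.emplace(v, std::make_pair(i, i));
            else
                it->second.second = i;   // extend range to the latest use
        }
    }
    return ranges;
}
```

For the example program, "value" is touched by the assignment, the print call and the return, so its range covers the whole sequence and a single register suffices for it.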

---

From the above example you can deduce the required components for this task: lexer, computation graph and transformation rules. I am currently working on the lexer.

  • Thanks 1

Just wondering, how compatible would such a Lua-to-binary compiler be in terms of:

1) Can I take a 10+ year old Lua resource/gamemode (including deprecated functions) and will it convert 100% into binary, with old functions updated?

2) How accurate would the conversion of conditions/statements be? Could there be an issue of misinterpreting long/complex operations?

3) Will it optimize variables/functions and remove unused/redundant code? Will it print out warnings/errors/tips and tricks during conversion for the user?

The idea of signed code running on client is a strong point, giving injectors less chance to be successful.

2 minutes ago, -ffs-PLASMA said:

Just wondering, how compatible would such a Lua-to-binary compiler be in terms of:

1) Can I take a 10+ year old Lua resource/gamemode (including deprecated functions) and will it convert 100% into binary, with old functions updated?

2) How accurate would the conversion of conditions/statements be? Could there be an issue of misinterpreting long/complex operations?

3) Will it optimize variables/functions and remove unused/redundant code? Will it print out warnings/errors/tips and tricks during conversion for the user?

The idea of signed code running on client is a strong point, giving injectors less chance to be successful.

1)

Yes. By completely implementing the Lua 5.1 language, this approach can speed up any 10+ year old Lua resource/gamemode. Don't get me wrong: I want to implement the entirety, including garbage collection, luac binaries and the API layer for applications like MTA. But there will be limits where security is at risk, mainly luac binaries that use sequences of opcodes which cannot be generated by the reference Lua 5.1 bytecode compiler (call opcode with stack-frame intersection, exploits, etc.).

2)

Great care will be taken that the execution order of all statements perfectly matches that of the Lua 5.1 reference compiler. I will go into the old source code if I must to make things clear, and I have already studied its internal workings in great detail. Thus NO, there will be no misinterpreting of long/complex operations. The compiler will be created with great scientific effort and documentation.

3)

Yes. My goal is to optimize the code as much as possible. I will post in this thread in great detail about each optimization technique as I get to it. Removing unused and redundant code is an important optimization I want to make. An example situation: if there is code which does not call C functions and whose effects can be isolated to have no influence on code which otherwise would call the C API, then that code can be removed. This is possible by inspecting the dependencies of language constructs (the arrows in the calculation graph).

I plan to improve the error messages of script compilation and provide warnings where code does not perform well, for example if static analysis finds that two booleans would be added together. In Lua 5.1 this code is allowed to compile, but once it is executed the language throws a runtime error. This behavior will not change, for compatibility, but a warning will be put out to help devs.
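That boolean-addition warning can be sketched as follows, assuming a node shape where operand types are already known from static analysis (the enum and struct are illustrative, not the compiler's real representation):

```cpp
#include <string>
#include <vector>

enum class StaticType { Boolean, Number, Unknown };

// A "+" node whose operand types were inferred by static analysis.
struct AddNode
{
    StaticType left;
    StaticType right;
};

// Compatible behavior: compile anyway, but warn, because Lua 5.1 only
// fails when the addition is actually executed.
std::vector<std::string> warn_boolean_adds(const std::vector<AddNode>& adds)
{
    std::vector<std::string> warnings;
    for (const AddNode& n : adds)
    {
        if (n.left == StaticType::Boolean && n.right == StaticType::Boolean)
            warnings.push_back("warning: adding two booleans raises a runtime error in Lua 5.1");
    }
    return warnings;
}
```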

  • Like 1
  • 2 weeks later...

An important topic for every to-the-metal developer is memory management. While we like to keep things simple, for example by limiting our data models to trees, we have to admit that in the general case software is a complex graph of data nodes held together by so-called "pointers". In the professional world, people like to construct complicated proofs about when to release memory nodes, at the correct time and place, to save both memory footprint and performance. But can we skip all these custom proofs and still get a grip on the issue? Welcome to the world of garbage collection!

[image: garbage collection node graph]

The science behind it is simple. We define a graph as a set of nodes (the circles with the letters), and for each node a set of edges (the arrows pointing away from each circle). We define a set of interesting nodes (the roots). A node is alive if there is a path from one of the interesting nodes to it. Otherwise the node is dead (in tri-color marking terms, it stays white).

Since programs are mutable in nature, the program graph is always changing. New nodes are created that replace older ones with better ones, nodes are converted to more specialized representations, and the older things are put to rest. Now, if the incoming connections of a node change and the graph has reached the desired state, we can check whether the node is still reachable from the set of interesting nodes and, if not, delete it. Multiple approaches are possible:

1) Perform a complete path scan for every node once its incoming connections have changed. The drawback of this method is the performance hit if the graph grows unexpectedly large. On the other hand, this approach saves the most memory because you do not have to remember anything once the query is done.

2) Utilize a data structure which remembers the reachability of memory nodes from each other, so that reachability can dynamically expand as new nodes connect to already reachable ones. This is great for performance because path proofs do not have to be computed all the way to the set of interesting nodes but can be short-cut to reachable nodes, which are a much bigger set. The drawback is the additional memory bookkeeping for each node.

3) Limit your program model to simple things like trees where each node has only one input connection so that entire subtrees can be deleted when a node is deleted.

[image: reachability from multiple roots]

For the Lua machine code compiler I am going to need garbage collection during the lexing, the program model conversion and the Lua runtime. The problems have already been solved using C++ code. I am looking forward to further implementing it!
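Approach 1) can be sketched as a plain reachability mark over the node graph; whatever the scan does not mark is dead and can be freed. This is a minimal sketch with integer node IDs, not the compiler's actual collector:

```cpp
#include <map>
#include <set>
#include <vector>

// Mark every node reachable from the set of interesting nodes (roots).
std::set<int> mark_alive(const std::map<int, std::vector<int>>& edges,
                         const std::vector<int>& roots)
{
    std::set<int> marked(roots.begin(), roots.end());
    std::vector<int> work(roots.begin(), roots.end());
    while (!work.empty())
    {
        int node = work.back();
        work.pop_back();
        auto it = edges.find(node);
        if (it == edges.end())
            continue;
        for (int next : it->second)
        {
            if (marked.insert(next).second)   // newly marked
                work.push_back(next);
        }
    }
    return marked;   // every node not in this set is dead
}
```

Note that an edge pointing *into* a root (like a dead node referencing live data) does not keep its source alive; only paths *from* the roots count.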

  • 2 weeks later...

Some time has passed and I would like to give you another update on the current point of research. As promised in previous posts, I worked on the lexer and have finished it. The lexer is the component of a program which turns human-friendly text into machine-driven data. As an example of its functionality, I have made a small and simple XML parser that could rival even TinyXML, if finished. The context-free grammar (CFG) production rules are as follows:

S -> takefirst spaces (begdesc spaces)^0:1 xmlnode spaces
xmlnode -> '<' spaces begtag:xmltagname spaces attributes spaces '/' spaces '>'
xmlnode -> opentag spaces (child:[<1,e>xmlnode,spaces]|content:text)^0:1 spaces endtag
opentag -> '<' spaces begtag:xmltagname spaces attributes spaces '>'
endtag -> '<' spaces '/' spaces endtag:xmltagname spaces '>'
attribute -> key:keyname spaces '=' spaces value:strtok
attributes -> [<0,e>attribute,spaces]
begdesc -> "<?xml" spaces attributes spaces '?' spaces '>'
spaces -> '\n\r\t '^*
name -> 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_'^1:*
xmltagname -> name
keyname -> name
text -> name
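For illustration, a single production such as the `name` rule above corresponds to a matcher like the following hand-written sketch. The Eir lexer derives the equivalent from the rule itself; this code is my own illustration, not part of it:

```cpp
#include <string>

// Hand-written equivalent of the production
//   name -> 'a-z A-Z 0-9 - _'^1:*
// i.e. one or more characters from the allowed set.
static bool name_char(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9') || c == '-' || c == '_';
}

bool match_name(const std::string& input, size_t& pos, std::string& out)
{
    size_t start = pos;
    while (pos < input.size() && name_char(input[pos]))
        ++pos;
    if (pos == start)
        return false;   // the ^1:* repetition requires at least one character
    out = input.substr(start, pos - start);
    return true;
}
```

On failure the matcher leaves `pos` untouched, which is exactly the "promise not to modify on failure" that the single-step optimization below relies on.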

(File Details: /testcompiler/src/lexing_tests.cpp (head) - eircompile (svn) - Eir Compile - OSDN)

The lexer supports the following optimizations:

  1. single-step-optimization or undeniable-change-optimization: objects are passed directly into rules which promise not to modify them if they fail
  2. late-creation-optimization: objects are only created at the point at which they are first really needed

Suffice it to say, I am very convinced that writing rules for the lexer is better than writing the lexing in C++ code. Not only do you avoid trashing the C++ stack with weirdly named methods, but the lexer also uses internally optimized allocator-based data structures.

---

I am working on the Lua CFG rules but I am not confident enough to share them yet. Instead I want to talk about the next big thing in what makes up a compiler: object-based calculations. I have previously mentioned the calculation dependency graph system, which is just another way to phrase it. To put text into executable form you need to lex it, interpret it and calculate it. Interpreting means applying a template with calculatory meaning to language structures. For example, let's look at how we would implement the Lua + operator (proof-of-concept):

typedef variant <bool, number, object*> TValue;    // garbage-collector ref, since it holds a pointer to object

...

operator + (TValue left, TValue right)
{
    if ((left.index == 0 && right.index == 0) || (left.index == 1 && right.index == 1))
    {
        return ( left.value + right.value );    // bool and number are compiler built-ins
    }
    else if (left.index == 2 || right.index == 2)
    {
        return object_add( left.value, right.value );    // call into the support library of the object type (managed by the compiler)
    }
    else
    {
        throw;
    }
}

In Lua code, each local declaration or slot in a Lua table is a TValue. A TValue is either a boolean, a number, or a pointer to Lua dynamic data. Lua dynamic data is traditionally a string, table, closure, state or userdata.

What does it mean to "apply a template"? Think about the fact that the lexer puts out a tree of language objects which stand in relation to each other. For example, take the statement: a + b. We set left = a and right = b. Then, inside the calculatory tree, we replace the + node with the proof-of-concept code above, binding the variable "a" to the name "left" and "b" to the name "right". Then we can further calculate the tree because we know the meaning behind "if" nodes and "+" nodes that connect to compiler-internal types.

The code above is a proof-of-concept for my proprietary object language that will be used inside the compiler. While it may look very similar to C or C++, it differs in the following ways: there are no strict type bindings, calculations are performed in no strict order other than what correctness requires, types depend on the platform or support-library availability, etc.
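In standard C++ the same idea can be sketched with std::variant. This is my own illustration, adjusted to Lua 5.1 semantics (where adding two booleans raises a runtime error rather than being a built-in), and the object branch, which would dispatch into the compiler-managed support library, is stubbed out:

```cpp
#include <stdexcept>
#include <variant>

struct object {};   // stand-in for Lua dynamic data (string, table, ...)

using TValue = std::variant<bool, double, object*>;

TValue tvalue_add(const TValue& left, const TValue& right)
{
    // Fast path: two numbers become a compiler built-in addition.
    if (std::holds_alternative<double>(left) && std::holds_alternative<double>(right))
        return std::get<double>(left) + std::get<double>(right);

    // Objects would dispatch into the support library (e.g. __add
    // metamethods) - not sketched here.
    if (std::holds_alternative<object*>(left) || std::holds_alternative<object*>(right))
        throw std::runtime_error("object_add dispatch not sketched");

    // Everything else, booleans included, is a Lua 5.1 runtime error.
    throw std::runtime_error("attempt to perform arithmetic on a non-number value");
}
```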

Edited by The_GTA
5 hours ago, Einheit-101 said:

EDIT: is there any speed comparison between this and LuaJIT?

I am willing to test against LuaJIT in the future. Do you have any benchmarks in mind that I can perform the speed comparisons on? I need good and isolated test scripts.

22 minutes ago, Einheit-101 said:

Nope, but I guess a simple for loop with different arguments, operations and functions can do the trick. My preferred test method is my server's live performance, though.

I think that you are not considering the whole picture here. The optimizations performed by the logic transformations themselves will speed up the code. The compiler will warn users if dx functions are provably used outside of onClientRender event handlers. The compiler will pick the best internal representation for each object (strings, tables, etc.) depending on its use (constant string vs. extensible string buffer, hash table vs. sortable-key table, etc.). And much more! Consider that when looking forward to, or even deciding whether to support, my compiler development.

Edited by The_GTA
  • Thanks 1
  • 4 weeks later...

Today I have made significant progress on programming-model maintainability and freedom in the Eir lexer. The great new feature inside the SDK is selector-based steps. This is a very important feature for splitting CFG syntax parsing away from associativity rules, along with the language-rule simplifications that result from using selectors. I dare say that the implementation quality of my lexer is above industry standard. Not only can you use the scientific notations that you know from various books and lectures, but you are also provided with a meaningful set of optimizing deterministic production templates. I have created and used the Eir SDK, which is filled with performant, robust and powerful algorithms and data structures. That is why I can recommend the Eir lexer for professional compilation software from the bottom of my heart.

Parsing general CFG rules suffers from the cost of non-determinism. Not only does it have implications for performance but also for the complexity of programming models. In my lexer I have decided to stay inside the deterministic space of CFG rules for various reasons:

  • guarantee halting and maximum performance for CFG syntaxes that are both deterministic (strong requirement) and progressive-only (weak requirement)
  • building upon the strong and intuitive foundation of no-progress-termination inside CFG syntaxes which paves the way for good CFG language diagnostics
  • non-determinism is hard to understand for developers

Now, to explain to you the difference between non-determinism and determinism I want to give you an example of the binary operator +.

Non-deterministic CFG syntax for operator +:

S -> S '+' S
S -> A | empty

This CFG syntax captures all associative possibilities for bracketing the + operator. But it also derives concatenations of the + token without any A productions. While it contains the essence of binary nodes, it is highly ambiguous in nature and is not supported by the Eir lexer.

Deterministic CFG syntax for operator +:

S -> A
S -> A '+' S

This CFG syntax creates productions of the form A+A+...+A, internally stored as a list of A data nodes with the implication that consecutive nodes are separated by a + operator. While this is not a binary-tree representation by default, it can easily be converted into one by the use of selector-based steps. This grammar is fully supported by the Eir lexer.

The above example of the + operator should explain to you why it is a good idea to stick to deterministic rules in general. I promise you that the Eir lexer is the best toolkit to work with the deterministic space of CFG syntaxes.

Selectors are a simple scientific concept. Imagine that each data node lexed from a CFG sequence is a tuple of child data nodes. The job of the selector is to pick, from this list, the tree arrangement best suited for your program. It is fed each attributed token or data node in sequence. Instead of storing the + operator chain as a list, it can easily create a left- or right-associative tree. This tree can then be transformed into a direct representation in CPU instructions (which often work on two operands to keep things simple).
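A selector that folds the flat operand list into a left-associative tree might look like this sketch (the node layout is an illustrative assumption, not the Eir SDK's actual types):

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Node
{
    std::string value;                 // operand name, or "+" for inner nodes
    std::unique_ptr<Node> left;
    std::unique_ptr<Node> right;
};

std::unique_ptr<Node> leaf(std::string v)
{
    auto n = std::make_unique<Node>();
    n->value = std::move(v);
    return n;
}

// Fold the list A, A, ..., A (implicitly '+'-separated) into ((A + A) + ...).
std::unique_ptr<Node> left_assoc(std::vector<std::unique_ptr<Node>> operands)
{
    std::unique_ptr<Node> tree = std::move(operands.front());
    for (size_t i = 1; i < operands.size(); ++i)
    {
        auto parent = std::make_unique<Node>();
        parent->value = "+";
        parent->left = std::move(tree);
        parent->right = std::move(operands[i]);
        tree = std::move(parent);
    }
    return tree;
}
```

So "a + b + c" yields ((a + b) + c), a shape that maps directly onto a sequence of two-operand CPU additions.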

  • Thanks 1
  • 1 month later...

This sounds like a really big project, and you seem to like major projects, doing loads of work on them but then losing interest near the end. It is a real shame that so much talented work goes to waste, like MTA: Eir. There are much smaller issues that could have a big benefit for all players, like how client FPS is so low. I traced one cause of low client FPS to sounds: https://github.com/multitheftauto/mtasa-blue/issues/1067 Surely finding the root cause (see the newest comment posted on the issue as to how we might be able to do that) and then fixing it could be a fraction of the work of some big project like this, but would benefit every single one of the tens of thousands of MTA players.

37 minutes ago, Arran said:

This sounds like a really big project, and you seem to like major projects, doing loads of work on them but then losing interest near the end. It is a real shame that so much talented work goes to waste, like MTA: Eir. There are much smaller issues that could have a big benefit for all players, like how client FPS is so low. I traced one cause of low client FPS to sounds: https://github.com/multitheftauto/mtasa-blue/issues/1067 Surely finding the root cause (see the newest comment posted on the issue as to how we might be able to do that) and then fixing it could be a fraction of the work of some big project like this, but would benefit every single one of the tens of thousands of MTA players.

Working on MTA Eir was never financially motivated, so finishing it lacked importance. The new compiler will be extended to different languages (Python, PHP, JavaScript, C, etc.) and will ultimately be a commercial project. You seem to misunderstand my drive behind it. You are judging me based on an old hacky project built on a closed-source engine by a studio that programmed that engine with many issues. Please reconsider your statements, as they are clearly unfair. In the end, working on MTA Eir should have inspired a lot of people like you, and just based on that fact you should be happy that I did it. Projects like Magic.TXD have sprung from my research into these terrible old game engines and show what the GTA modding community can accomplish if they truly care about their work. And I do truly care. Thus it truly hurts me that you put such derogatory statements about me on the internet. I was happy to receive CIT support back in the day, but I don't know where this statement today is coming from. Glad to see experienced people like you around, though. The GTA Trilogy DE engine will be fun to work on in the future!

You want to know the ultimate reason why MTA Eir was not finished? I had no original source code, the engine sucks, and the mathematical problems behind rendering had not been properly solved (read: implemented as code) to "remaster" the engine in my own vision! (for example the alpha-blend ordering problem that plagued D3D9 games)

I sense disappointment in your words, but basing such negative statements purely on emotion is damaging in an unfair way. You should have asked why I did it, not tried to provoke an upset answer out of me.

You seem to be unaware that I am doing this Lua compiler project as my full-time effort right now. This was different back in the day, when I was doing it as part-time research with other projects on my mind. It will definitely come out; a release candidate is scheduled for the end of December. Development is a difficult affair, and what motivates me is people who come to me and say how much they appreciate my work, not the kind of stuff I am reading right now. You are talking to a member of the GTA community who is thinking about contributing and spending his time helping people on these forums. Do not try to defame me or the work I am proud of in any way.

My work for the GTA community spans further than you may think. I worked with people like aap, TheLink2012, DK22Pac and NTAuthority, and we greatly inspired each other. I am proud that I did, even though I may not share their ideals about copyright or entrepreneurship. We shared code and bolstered our knowledge to reach the experience that we have today. Those were awesome days, and I like to believe that my contribution to their way of thinking has partially led to better modding quality in general. Thus it is unfair to deride my will to work on projects as a way to shame or defame me. I certainly don't think your statement does me justice.

Edited by The_GTA

I don't think that saying someone is talented is negative or defamatory. Anyway, on topic, from my experience with your work I'm sure you have the talent to complete this project and probably anything else like it.
