Software Code Composition
The ultimate goal of the TextComposerLib library is to systematically generate structured, static text code for arbitrary languages and applications. Just as no single tool can be used to compose all kinds of art, my experience with code generation is that no single method suits all situations. We almost always need to integrate several code generation methods to get the desired results.
There are several ways we create our code:
- We can always write the whole code ourselves. This is the way things were done 15 years ago. Now the very large software systems we create are almost impossible to write manually, so we must use code generation at some point in the development. Nevertheless, the design and engineering of the code is never a task for computers. The structure, correctness, and efficiency of the produced code largely depend on the designer, not the tool.
- Print statements are another alternative for smaller systems with few intricacies. They can be useful for things like adding simple comments to the code, unrolling fixed loops, and implementing C-like macro preprocessors. This approach saves some time but can badly damage the structure and correctness of the code. Every code generator will, in its final stage, depend on print statements, but we need to organize these loose prints into higher-level text generation functions, just as we encapsulated the old Goto statements into If-Then, For, and Try-Catch constructs.
- Template-based code generation has been a popular approach for many years now, especially in database and web software design. When most of our code is symmetric and its main body is fixed, as in HTML pages and the code for data access layers, we create templates containing missing parts that we fill in depending on some data source and configurable parameters. This approach is much better than the direct use of print statements, just as control-flow statements are much better than Goto statements.
- Constructing an Abstract Syntax Tree (AST) and then unparsing it into code is a very powerful approach used by many. It can help in producing code that is correct, at least syntactically, for some target language. It’s usually not practical to use AST-based code generation for whole systems because we need to build the AST by creating and connecting many small node objects, and the readability of the generating code drops as we try to produce more and more code. This approach is also not suited for general code generation because each target language requires its own AST and unparser design. It is best suited for creating snippets or small portions of code for a specific target language, with templates organizing the snippets within the larger body of the remaining code.
- Software Factories are used for producing large software systems by assembling code parts using many different tools under the control of the software designer. A software factory can be set up to produce a family of related software along with its unit tests, documentation, alternative implementations, feature selection, etc. This is the goal in mind for the TextComposerLib and all its composers.
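To make the template-based approach from the list above concrete, here is a minimal sketch of filling a fixed template from a small data source. The `TemplateSketch` class, its `#name#` placeholder syntax, and both method names are assumptions made up for this illustration; they are not part of TextComposerLib.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of TextComposerLib: fills #name# placeholders
// in a fixed template from a dictionary of values.
public static class TemplateSketch
{
    private static readonly Regex Placeholder = new Regex(@"#(\w+)#");

    public static string Fill(string template, IDictionary<string, string> values)
    {
        // Unknown placeholders are left untouched so that missing data is easy to spot.
        return Placeholder.Replace(
            template,
            match => values.TryGetValue(match.Groups[1].Value, out var text)
                ? text
                : match.Value);
    }

    // The fixed body is the template; only the type and the name vary.
    public static string ComposeProperty(string type, string name)
    {
        const string template = "public #type# #name# { get; set; }";

        return Fill(
            template,
            new Dictionary<string, string> { ["type"] = type, ["name"] = name });
    }
}
```

Calling `TemplateSketch.ComposeProperty("int", "Count")` fills both placeholders of the property template. The same `Fill` call works for any template text, which is what makes the approach attractive when the main body of the code is fixed.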
1. Generating Code using Text Composers
The TextComposerLib contains composers that act as “structured print statements”. My intention was to create classes that imitate how I write code manually.
- The LinearComposer is very useful in this context as it captures what I do while writing consecutive lines of code by selecting where to write the next line or how to indent the code.
- The ListComposer and other structured composers capture the symmetry of well-formatted text in many ways. When combined with .NET’s LINQ to Objects, these composers can produce arrays or lists of structured text with arbitrary separators and prefix/suffix text.
- The ParametricComposer is close to a template generator but lacks in-template processing. This composer just substitutes text for the marked placeholders in the template.
- The MappingComposer is also a form of template generator that can be used in conjunction with the ITextExpression interface to add processing capabilities to text templates. We just need to add an interpreter that takes a marked text segment, parses it into a text expression tree, and then interprets the tree into a string to be substituted in the template. We can also apply any other interpretation or transformation of the marked segments’ text we wish.
- The RegionComposer is similar to the MappingComposer but works on line regions of text rather than contiguous text segments. It can be used to mark blocks of code lines for generation while the tag of its slot regions can hold additional information to guide the generated code, perhaps using an ITextExpression interpreter or any other method.
- The FilesComposer can be used to structure generated text into code files and folders as desired with arbitrary order of creation.
- The ProgressComposer can be used to track and generate logs of all these steps in detail for debugging and tracking purposes.
Together these composers can be used to generate highly structured, well-formatted text code files in a way very similar to how we think while writing code manually. We can use the composers to construct larger systems like code preprocessors, code conversion systems, template processing engines, aspect weaving systems, and more.
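The idea of “structured print statements” can be sketched with two small helpers: one that tracks indentation while appending consecutive lines, and one that joins items with a separator, prefix, and suffix. The classes below are hypothetical stand-ins written for this illustration; the real LinearComposer and ListComposer have their own members, which may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical stand-in for a linear composer; the real LinearComposer API may differ.
public sealed class IndentedTextSketch
{
    private readonly StringBuilder _text = new StringBuilder();
    private int _level;

    public IndentedTextSketch IncreaseIndent() { _level++; return this; }

    public IndentedTextSketch DecreaseIndent() { if (_level > 0) _level--; return this; }

    // Each line is written at the current indentation level (4 spaces per level).
    public IndentedTextSketch AppendLine(string line)
    {
        _text.Append(new string(' ', 4 * _level)).Append(line).Append('\n');
        return this;
    }

    public override string ToString() => _text.ToString();
}

public static class StructuredPrintSketch
{
    // Mimics a list composer: items joined by a separator between a prefix and a suffix.
    public static string ComposeList(
        IEnumerable<string> items, string separator, string prefix, string suffix)
        => prefix + string.Join(separator, items) + suffix;

    // Generates a Sum method with the given number of parameters.
    public static string ComposeSumMethod(int count)
    {
        var parameters = Enumerable.Range(1, count).Select(i => "double x" + i);
        var terms = Enumerable.Range(1, count).Select(i => "x" + i);

        return new IndentedTextSketch()
            .AppendLine(ComposeList(parameters, ", ", "public static double Sum(", ")"))
            .AppendLine("{")
            .IncreaseIndent()
            .AppendLine(ComposeList(terms, " + ", "return ", ";"))
            .DecreaseIndent()
            .AppendLine("}")
            .ToString();
    }
}
```

`StructuredPrintSketch.ComposeSumMethod(3)` produces a three-parameter `Sum` method with an indented return statement. The generating code reads in the same order as the generated code, which is the property the composers aim for.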
2. Generating Code using ASTs
Like many software developers, I have a history with many general-purpose programming languages like QBasic, C, C++, VB.NET, C#, and F#, among others, in addition to many domain-specific ones. Each language has its own syntax and semantics that give it the characteristics we know. Generating code in a target language using the AST-unparsing approach requires creating a specific AST for that language and a specific unparser as well. Nevertheless, so many syntactic elements are common among various languages that they can be effectively abstracted into AST nodes shared by many target languages. For example, most C-like languages contain very similar forms of comments, if-else conditions, for loops, try-catch, switch-case, return, continue-break, class and structure definitions, method and function declarations, and many others. The main parts of the if-else construct are the if-condition, the true-statements, and the false-statements. Here we only need to create one general AST node class for the if-else construct to serve several languages at once (C, C++, Java, C#, VB.NET, etc.) and just write the correct unparsing procedure for each specific language. This is the approach I took for creating general AST node classes under the TextComposerLib.Code.SyntaxTree namespace. All node classes implement the ISyntaxTreeElement interface, so we can add more nodes, either general to many target languages or specific to a single one.
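As a rough illustration of one general if-else node served by several unparsers, consider the sketch below. All type names here are hypothetical, and the node keeps its parts as plain strings for brevity, whereas the real node classes implement ISyntaxTreeElement and hold child nodes.

```csharp
using System;

// Hypothetical, simplified node: the condition and the statements are plain strings here,
// whereas a real AST node would hold child nodes.
public sealed class IfElseNodeSketch
{
    public string Condition { get; }
    public string TrueStatement { get; }
    public string FalseStatement { get; }

    public IfElseNodeSketch(string condition, string trueStatement, string falseStatement)
    {
        Condition = condition;
        TrueStatement = trueStatement;
        FalseStatement = falseStatement;
    }
}

// One unparsing procedure per target language, all consuming the same node class.
public interface IIfElseUnparserSketch
{
    string Unparse(IfElseNodeSketch node);
}

public sealed class CSharpUnparserSketch : IIfElseUnparserSketch
{
    public string Unparse(IfElseNodeSketch node) =>
        "if (" + node.Condition + ")\n" +
        "    " + node.TrueStatement + ";\n" +
        "else\n" +
        "    " + node.FalseStatement + ";\n";
}

public sealed class VisualBasicUnparserSketch : IIfElseUnparserSketch
{
    public string Unparse(IfElseNodeSketch node) =>
        "If " + node.Condition + " Then\n" +
        "    " + node.TrueStatement + "\n" +
        "Else\n" +
        "    " + node.FalseStatement + "\n" +
        "End If\n";
}
```

A single `new IfElseNodeSketch("x > 0", "y = 1", "y = -1")` can be passed to either unparser; only the surrounding keywords and punctuation change between the two outputs.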
A significant portion of any language is related to the expressions the language can have. Some languages, specifically functional ones like F#, are built mainly around expressions rather than statements. The SteExpression class under the TextComposerLib.Code.SyntaxTree.Expressions namespace is the single AST node class used to represent all expressions. This class is mainly responsible for storing and manipulating the tree structure of the expression. Many kinds of expressions exist, for example literal values, variables, arithmetic operators, function calls, array element access, etc. The details of the kind of expression are stored in the HeadSpecs member of the SteExpression class. This member is of type ISteExpressionHeadSpecs, the main interface for holding all information about expression kinds.
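The split between a single tree-holding expression class and a separate description of the expression kind can be sketched as follows. The enum, the two classes, and their members are invented for this illustration; the actual SteExpression and ISteExpressionHeadSpecs types carry more information and may be organized differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical kinds; the library's head specs implementations may differ.
public enum HeadKindSketch { Literal, Variable, Operator, FunctionCall }

// Describes what kind of expression a node is, separately from the tree structure.
public sealed class HeadSpecsSketch
{
    public HeadKindSketch Kind { get; }
    public string Text { get; }

    public HeadSpecsSketch(HeadKindSketch kind, string text)
    {
        Kind = kind;
        Text = text;
    }
}

// The single node class: it only stores the head and the argument sub-trees.
public sealed class ExpressionSketch
{
    public HeadSpecsSketch Head { get; }
    public IReadOnlyList<ExpressionSketch> Arguments { get; }

    public ExpressionSketch(HeadSpecsSketch head, params ExpressionSketch[] arguments)
    {
        Head = head;
        Arguments = arguments;
    }

    public static ExpressionSketch Create(
        HeadKindSketch kind, string text, params ExpressionSketch[] arguments)
        => new ExpressionSketch(new HeadSpecsSketch(kind, text), arguments);

    // A simple C-like rendering that depends only on the head kind.
    public override string ToString()
    {
        var args = Arguments.Select(a => a.ToString());

        switch (Head.Kind)
        {
            case HeadKindSketch.Operator:
                return "(" + string.Join(" " + Head.Text + " ", args) + ")";
            case HeadKindSketch.FunctionCall:
                return Head.Text + "(" + string.Join(", ", args) + ")";
            default:
                return Head.Text;
        }
    }
}
```

Because the tree class never changes, supporting a new kind of expression in this sketch only means adding a new head kind and teaching the rendering code about it.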
For any specific target language we need a class to construct the AST. Naturally, not all AST node classes implementing the ISyntaxTreeElement interface are relevant to all languages. The LanguageSyntaxFactory class under the TextComposerLib.Code.Languages namespace can be used, directly or through a derived class, to construct the AST as desired for a specific target language. The LanguageCodeGenerator class is the base for unparsing the AST. This class implements the double-dispatch dynamic visitor pattern to traverse the AST and generate text based on the nodes. We can derive a class from LanguageCodeGenerator to implement an unparser for a selected target language.
The abstract LanguageServer class contains the members SyntaxFactory and CodeGenerator that hold the AST construction and unparsing objects. We can inherit from this class and add whatever members serve our code generation needs for the selected target language. In addition, the simple LanguageInfo class contains some information about the target language, like its name and version. We also have the ILanguageSyntaxConverter interface and the abstract LanguageExpressionConverter class that implements it. The class is intended to convert a given expression from one target language to another. This was required in GMac to convert Mathematica symbolic expressions into target language computational expressions, as explained in the GMacAPI Guide.
All these interfaces and classes are the infrastructure for creating general target ASTs and generating structured code from them. GMac provides a good use case for this infrastructure, and the full details are easily followed through its code. In time, more classes and functions will be added to this infrastructure to serve more target languages and more syntax capabilities while, probably, keeping the same high-level design.
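A double-dispatch dynamic visitor of the kind mentioned for LanguageCodeGenerator can be sketched in a few lines using C#'s `dynamic` keyword. The node and generator classes below are hypothetical and much smaller than the library's; they only show how the overload is chosen from the runtime type of the node while the virtual call selects the language-specific generator.

```csharp
using System;

// Hypothetical node types for illustration only.
public interface INodeSketch { }

public sealed class CommentNodeSketch : INodeSketch
{
    public string Text { get; }
    public CommentNodeSketch(string text) { Text = text; }
}

public sealed class ReturnNodeSketch : INodeSketch
{
    public string Value { get; }
    public ReturnNodeSketch(string value) { Value = value; }
}

public abstract class CodeGeneratorSketch
{
    // Casting to dynamic defers overload resolution to run time, so the runtime type
    // of the node (not its static type INodeSketch) selects the Visit overload.
    public string Generate(INodeSketch node) => Visit((dynamic)node);

    // Fallback for node types this generator does not know about.
    public virtual string Visit(INodeSketch node) =>
        throw new NotSupportedException(node.GetType().Name);

    public abstract string Visit(CommentNodeSketch node);

    public abstract string Visit(ReturnNodeSketch node);
}

// The second dispatch: virtual calls pick the generator for the target language.
public sealed class CSharpGeneratorSketch : CodeGeneratorSketch
{
    public override string Visit(CommentNodeSketch node) => "// " + node.Text;

    public override string Visit(ReturnNodeSketch node) => "return " + node.Value + ";";
}
```

With this design the node classes need no `Accept` method, so new node types can be added without touching existing ones; the cost is that a missing overload is only detected at run time, through the fallback.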
3. Generating Code Libraries
The CodeLibraryComposer class under the TextComposerLib.Code namespace is the base class for generating complex code for a large software library in a selected target language. The GMacAPI Guide explains how GMac extends this class to generate geometric computing code from GA models. This code library generator class should be inherited by a class that acts as an assembly line, containing components that compose our code in a modular, organized way to ease readability and debugging. The full source code of this very important class is shown here:
```csharp
using System;
using System.Text;
using TextComposerLib.Code.Languages;
using TextComposerLib.Files;
using TextComposerLib.Logs.Progress;
using TextComposerLib.Text.Linear;
using TextComposerLib.Text.Parametric;

namespace TextComposerLib.Code
{
    /// <summary>
    /// A base class for structured text code generation process into source code files
    /// </summary>
    public abstract class CodeLibraryComposer : IProgressReportSource
    {
        /// <summary>
        /// The name of this code library composer
        /// </summary>
        public abstract string Name { get; }

        /// <summary>
        /// The description of this code library composer
        /// </summary>
        public abstract string Description { get; }

        /// <summary>
        /// The name of this composer used during reporting progress
        /// </summary>
        public abstract string ProgressSourceId { get; }

        /// <summary>
        /// The progress log object
        /// </summary>
        public abstract ProgressComposer Progress { get; }

        /// <summary>
        /// The language object providing code composition and generation services
        /// </summary>
        public abstract LanguageServer Language { get; }

        /// <summary>
        /// The target language code generator of this library composer
        /// </summary>
        public LanguageCodeGenerator CodeGenerator
        {
            get { return Language.CodeGenerator; }
        }

        /// <summary>
        /// The language syntax elements factory for this code library composer
        /// </summary>
        public LanguageSyntaxFactory SyntaxFactory
        {
            get { return Language.SyntaxFactory; }
        }

        /// <summary>
        /// The Files Composer object where the final text is written during code generation
        /// </summary>
        public FilesComposer CodeFilesComposer { get; private set; }

        /// <summary>
        /// A collection of parametric template composers to be used during code generation
        /// </summary>
        public ParametricComposerCollection Templates { get; private set; }

        /// <summary>
        /// True if the Files Composer has an active file
        /// </summary>
        public bool HasActiveFile
        {
            get { return CodeFilesComposer.HasActiveFile; }
        }

        /// <summary>
        /// The active file composer object
        /// </summary>
        public TextFileComposer ActiveFileComposer
        {
            get { return CodeFilesComposer.ActiveFileComposer; }
        }

        /// <summary>
        /// The active file's linear text composer object
        /// </summary>
        public LinearComposer ActiveFileTextComposer
        {
            get { return CodeFilesComposer.ActiveFileTextComposer; }
        }

        /// <summary>
        /// This property is set to true internally after initializing the parametric templates collection
        /// </summary>
        protected bool TemplatesReady { get; private set; }


        protected CodeLibraryComposer()
        {
            CodeFilesComposer = new FilesComposer(@"\");
            Templates = new ParametricComposerCollection();
            TemplatesReady = false;
        }


        /// <summary>
        /// Initialize parametric text templates used for code generation. This method is called every time
        /// the Initialize() method is called and TemplatesReady = false. The new value for TemplatesReady is
        /// set in the Initialize() method using the returned value from InitializeTemplates()
        /// </summary>
        /// <returns></returns>
        protected abstract bool InitializeTemplates();

        /// <summary>
        /// Initializes any other components of a generator sub-class inherited from this one.
        /// This method is called automatically by the Initialize() method
        /// </summary>
        protected abstract void InitializeOtherComponents();

        /// <summary>
        /// This method verifies that the code generator is ready to start code generation process.
        /// This method is called before initializing the generator to make sure all relevant inputs \ parameters
        /// are ready.
        /// </summary>
        /// <returns></returns>
        protected abstract bool VerifyReadyToGenerate();

        /// <summary>
        /// Perform the actual text generation process after a call to InitializeGenerator() is done.
        /// A call to FinalizeGenerator() must follow the execution of this method.
        /// </summary>
        protected abstract void ComposeTextFiles();

        /// <summary>
        /// Finalizes any other components of a generator sub-class inherited from this one.
        /// This method is called automatically by the Finalize() method
        /// </summary>
        protected abstract void FinalizeOtherComponents();


        /// <summary>
        /// Initializes the text file generation process. This method must be called before any generation process
        /// </summary>
        protected void InitializeGenerator()
        {
            //Call initialize templates if needed
            if (TemplatesReady == false)
                TemplatesReady = InitializeTemplates();

            //For each template, clear all parameters bindings
            foreach (var template in Templates.Values)
                template.ClearBindings();

            //Clear the contents of the files composer
            CodeFilesComposer.Clear();

            //Initialize any other components of a generator sub-class inherited from this one
            InitializeOtherComponents();
        }

        /// <summary>
        /// Finalizes the text file generation process. This method must be called after any generation process
        /// </summary>
        protected void FinalizeGenerator()
        {
            //Finalize any other components of a generator sub-class inherited from this one
            FinalizeOtherComponents();
        }

        /// <summary>
        /// Performs text files generation based on the given AST information.
        /// This method calls InitializeGenerator(), ComposeTextFiles(), then FinalizeGenerator().
        /// </summary>
        public void Generate()
        {
            Generate(ComposeTextFiles);
        }

        /// <summary>
        /// Performs text files generation based on the given AST information.
        /// This method calls InitializeGenerator(), composeTextFilesAction(), then FinalizeGenerator().
        /// </summary>
        /// <param name="composeTextFilesAction"></param>
        public void Generate(Action composeTextFilesAction)
        {
            if (this.SetProgressRunning() == false)
                return;

            if (VerifyReadyToGenerate() == false)
            {
                this.SetProgressNotRunning();
                return;
            }

            InitializeGenerator();

            try
            {
                composeTextFilesAction();
            }
            catch (OperationCanceledException e)
            {
                this.ReportError(e);
            }
            finally
            {
                CodeFilesComposer.FinalizeAllFiles();

                FinalizeGenerator();

                this.SetProgressNotRunning();
            }
        }


        public override string ToString()
        {
            var s = new StringBuilder();

            s.Append(CodeGenerator.LanguageInfo.LanguageSymbol);
            s.Append(" ");
            s.Append(Name);

            return s.ToString();
        }
    }
}
```
The members are straightforward to follow but we need to understand the correct calling sequence for the methods:
- If we focus on the Generate() method, we find that, after marking the progress as running, the first step is to make sure the code library generator is ready to begin the generation process by calling the VerifyReadyToGenerate() method. There we should put any checks we need on the inputs of the library generator to make sure they are sufficient for the process. If the check fails, Generate() returns without generating anything.
- Next, the InitializeGenerator() method is called. This method initializes the parametric text templates used during code generation if they are not ready yet, clears the parameter bindings of all templates, clears the code files composer, and initializes any other components we need for the process.
- Next the composeTextFilesAction delegate, passed as a parameter to the Generate() method, is executed. Here we should write code to use our components to generate the code. There is another overload of the Generate() method taking no arguments. This overload by default calls the ComposeTextFiles() method as the code composition action.
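- Finally, inside a finally block so that it runs even if the composition action throws, the code files composer is finalized to finish its work, the FinalizeGenerator() method is called to finalize any other components used during the code generation process, and the progress is marked as not running.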
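The calling sequence described in the list above can be observed with a simplified stand-in that mirrors only the control flow of Generate(Action). Everything here is an assumption made for illustration: progress reporting, templates, and the files composer are reduced to log entries, and the class names do not exist in TextComposerLib.

```csharp
using System;
using System.Collections.Generic;

// Simplified stand-in that mirrors only the control flow of Generate(Action);
// the real class also reports progress and manages templates and files.
public abstract class GenerationSequenceSketch
{
    public List<string> Log { get; } = new List<string>();

    protected abstract bool VerifyReadyToGenerate();

    protected abstract void ComposeTextFiles();

    public void Generate()
    {
        Generate(ComposeTextFiles);
    }

    public void Generate(Action composeTextFilesAction)
    {
        // Step 1: refuse to start if the inputs are not sufficient.
        if (VerifyReadyToGenerate() == false)
        {
            Log.Add("not ready");
            return;
        }

        // Step 2: stands in for InitializeGenerator().
        Log.Add("initialize");

        try
        {
            // Step 3: the composition action does the actual work.
            composeTextFilesAction();
        }
        catch (OperationCanceledException)
        {
            Log.Add("canceled");
        }
        finally
        {
            // Step 4: finalization runs even if the action was canceled.
            Log.Add("finalize files");
            Log.Add("finalize generator");
        }
    }
}

public sealed class HelloLibrarySketch : GenerationSequenceSketch
{
    public bool InputsReady { get; set; } = true;

    protected override bool VerifyReadyToGenerate() => InputsReady;

    protected override void ComposeTextFiles() => Log.Add("compose");
}
```

After `new HelloLibrarySketch().Generate()`, the log reads initialize, compose, finalize files, finalize generator, in that order. Passing an action that throws OperationCanceledException replaces the compose entry with a canceled entry while both finalize entries still appear.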
In this way, the CodeLibraryComposer class provides a general framework for code generation at a larger scale than the text composers and the AST-unparsing methods alone. It’s up to the software designer to select the specific components to add to this class and how they interact with the data source to compose the final code.