Project Proposal
Big Ideas
The big idea for our project is to make a program that emulates LaTeX but is simpler and easier to pick up. The program will be called aztex. It will take a text file in aztex format and convert it into a nicely formatted PDF document. The minimum viable product is a program that takes a text file with minimal formatting and converts it into a professional-looking PDF using the LaTeX compiler. The following features are highly desired:
- Numbered Lists
- Unordered Lists
- Headers
- Links
- Bold, italic, underline, strike-through
Some stretch goals that we may pursue are writing our own PDF compiler, building an integrated GUI to preview how documents will be formatted, and adding further features such as images.
Learning Goals
Shared:
- Learn more about parsing documents
- Learn more about working in project teams
- Interfacing with other libraries.
Individual:
Jordan Van Duyne:
- Make more in-line comments
- Test more often
Thuc Tran:
- Develop better unit tests.
- Write more efficient code -> Use fewer nested for loops.
Isaac Getto:
- Write more tests
- Organize code better
Implementation Plan
- Find a LaTeX compiling library
- Determine how to simplify the LaTeX language
  a. Develop our own language based on simplified LaTeX
- Learn how to parse our language
- Separate the project into components -> i.e. compiling, parsing, processing
- Develop our own compiler
Project Schedule
Weekly Goals:
- Figure out the program architecture (with UML/diagrams)
- Create our language (unit tests for individual formatting options)
- Parse the text file into an intermediate representation.
- Process the intermediate representation into LaTeX format
- Compile the resulting LaTeX into a PDF
- Stretch Goals + Debugging.
Collaboration Plan
Our overall plan is to work together on the program architecture and on developing the simplified language, so that we are all on the same page about the implementation we will be building. After that, we will divide up parts of the parsing, processing, and compiling once we have functions and subfunctions that do not interact with each other directly. This will let us work in parallel on sections of the project.
Risks
- The project could be more complex than we thought
- Libraries could be nonexistent or difficult to work with.
- Unexpected lack of free work time.
Additional Course Content
- language processing
Preparation and Framing Document
The big idea
Making a version of LaTeX that is simpler to use.
How?
Instead of LaTeX-style syntax, use Markdown-style syntax
What is Markdown?
http://en.wikipedia.org/wiki/Markdown#Example
Formatting that we want to include:
- Math symbols
- Lists (ordered and unordered)
- Size of text
- Bold, underline, strikethrough, italics
- Title
- Headers
The goal is for teachers to be able to make (math) homework documents with it
The user inputs simple code, which our program turns into the analogous LaTeX code.
We then feed this LaTeX code into a LaTeX compiler using the PyLaTeX library: https://github.com/JelteF/PyLaTeX
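As a rough illustration of turning simple user input into the analogous LaTeX code, here is a minimal sketch. The Markdown-style markers (`**bold**`, `*italic*`) and the names `INLINE_RULES` and `to_latex` are assumptions made for this example, not a settled aztex syntax.

```python
import re

# Hypothetical Markdown-style rules, assumed for illustration only.
# Order matters: bold (**) must be handled before italic (*).
INLINE_RULES = [
    (re.compile(r"\*\*(.+?)\*\*"), r"\\textbf{\1}"),
    (re.compile(r"\*(.+?)\*"), r"\\textit{\1}"),
]


def to_latex(line):
    """Rewrite simple inline markers in one line of text as LaTeX commands."""
    for pattern, replacement in INLINE_RULES:
        line = pattern.sub(replacement, line)
    return line


print(to_latex("Show **all** of your work *neatly*."))
```

The generated string could then be handed to whichever LaTeX compiler or library we end up using.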
General compiler architecture
Effectively, the task is to compile a new language into LaTeX.
Generally, a compiler has a front end, a middle end, and a back end.
The front end focuses on converting user input into some intermediate form.
The middle end generally optimizes the code, for example by removing extraneous or unreachable code. This matters more when compiling programs than when compiling text, so we may not end up doing any of this.
The back end focuses on converting the intermediate representation into the needed LaTeX code, and then compiling that code.
Why would we not have a middle end?
- We don't necessarily want to optimize the code, because we want to represent the user's intentions as accurately as possible.
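The front end / back end split without a middle end could be sketched as follows. This is only an illustration under assumed names: `front_end`, `back_end`, `compile_source`, the `# ` heading marker, and the list-of-pairs intermediate form are all hypothetical.

```python
def front_end(source):
    """Turn raw text into an intermediate form: a list of (kind, text) pairs."""
    intermediate = []
    for block in source.split("\n\n"):
        block = block.strip()
        if not block:
            continue
        if block.startswith("# "):
            intermediate.append(("heading", block[2:]))
        else:
            intermediate.append(("paragraph", block))
    return intermediate


def back_end(intermediate):
    """Turn the intermediate form into LaTeX source text."""
    parts = []
    for kind, text in intermediate:
        if kind == "heading":
            parts.append("\\section{" + text + "}")
        else:
            parts.append(text)
    return "\n\n".join(parts)


def compile_source(source):
    # No middle end: the intermediate form reaches the back end unchanged,
    # so the output mirrors what the user wrote.
    return back_end(front_end(source))


print(compile_source("# Homework 1\n\nSolve the problems below."))
```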
Key Questions
- How do we want to represent each character?
- Should attributes like underlining and bolding each be their own class/token, or should they be part of a text class?
- How are the parts (tokenizer and parser) connected?
- Is it parsed all at once or token by token?
- As a stream or with files?
- Do we need a middle end?
- How can we make the language simpler for users? Give context of math teacher
- Do we want a gui?
- How do we make the language simpler for us? Easier to parse
- Should the final outputted pdf document be able to be formatted with an option of different themes? E.g. a math theme, note-taking theme, etc.
- Should we have users type in our program? If not, what text format do we want to use? .txt? .doc?
Agenda for technical review session
The beginning will mostly be a standard PowerPoint presentation of information.
- 2 Minutes on what we want out of the review: affirmation that we’re on the right track and feedback on our key questions
- 1 Minute: Who are we? What are we doing?
- 5 Minutes: Background context with PPT
- 3 Minutes: How we’re structuring the system + UML
- Until we get kicked off: QUESTION SLIDES
This will mostly be done in a conversational format: “Here is our question, here are some thoughts, help us.”
The main questions to be addressed at the review:
- Are elementary school math teachers an appropriate target audience? Any other ideas for possible audiences?
- How are the parts of the program connected (tokenizer, parser)? Are tokens sent to the parser one by one, or all together?
- How should we set up the language itself?
- How can it be easy for the user?
- How can it be easy for us to parse?
- How do you think we might want to store our intermediate representation?
- Should it be a class for each attribute? Or should attributes be under a class?
- Do we need a middle end?
- Any concerns y’all have?
Design Review Reflection
Reflection and Synthesis
Feedback and decisions
Teachers are not necessarily our key audience. In addition, we want to manage the trade-off between simplicity and flexibility in our new language. One compromise discussed was a raw tag that lets the user write LaTeX code directly.
Going Forward:
We are most likely going to transition our target audience to students who are not familiar with LaTeX. In particular, we may focus on either students new to Olin or high school students in general. However, we have also decided that we can make design decisions based on what we want out of our project, not necessarily what our “target audience” would want, since we are making this project for ourselves, not an audience. We may also use Markdown itself as our language.
New Questions:
Should our program come with multiple example LaTeX templates? What features are most important for a student user? Should we use markdown as our exact language?
Review process reflection
How did the review go?
It went well. We got valuable feedback on how we want to think about our target audience going forward, and ideas for ways to implement the language and workflow of our program.
Did you get answers to your key questions?
We did get some feedback. In addition to transitioning to a new audience entirely as a result of feedback, the idea of balance between simplicity and complexity was brought up. We also received an answer to how we should structure our language: some adaptation of Markdown.
Did you provide too much/too little context for your audience?
We provided ample context for one of the technical aspects of our presentation (how compilers are usually split into a front, middle, and back end) in the reading we sent out before our Design Review. As a result, we only briefly addressed this background in our presentation; we assumed our audience would have read the reading and would be familiar with what we were talking about. However, no one actually read the reading, so the small amount of context that we provided in our presentation was not enough for our audience to answer the few technical questions that we had. Besides this one technical aspect, we believe that we provided an appropriate amount of context for our audience.
Did you stick closely to your planned agenda, or did you discover new things during the discussion that made you change your plans?
We stuck fairly closely to our overall agenda, but found that certain questions got more response than others. In particular, we received little feedback on many of the more technical questions. We might address this by framing each question as a choice among the specific options or ideas that we are considering.
What could you do next time to have an even more effective technical review?
For our next presentation, we can change up our presentation style. For example, if we want our audience to help us generate ideas (such as design options), instead of just asking if the audience has any ideas, we could split the audience into smaller groups and have each group ideate for a few minutes and then bring all of the groups back together to look at all of the new ideas. We should also cover all of the background context that we provide in our pre-review readings in our presentation. We could frame our technical questions in a different way to give the audience more knowledge about the effects of different design options.
Code Review
Preparation and Framing Document
Background and context
We are still working towards creating aztex: an easier to use version of LaTeX. However, we have decided that although we have pivoted our user group to be general students (with a focus on high school students or students who are not familiar with LaTeX), we will not be too concerned with creating aztex specifically for our user group.
aztex will work by compiling a txt file written in the aztex language (similar to Markdown) into the analogous LaTeX code, which is then used to output a PDF.
Compilers typically consist of a front, a middle, and a back end.
The front end consists of syntactic and semantic processing and translation of the source code to an internal representation.
In most compilers, the middle end optimizes the internal representation. However, because aztex is so simple compared to actual programming languages, we believe that a middle end is unnecessary for our aztex program.
The back end takes the internal representation and outputs code that a particular processor and OS can understand.
Our architecture:
Our front end is consistent with standard compilers; it is made up of a tokenizer and a parser. We have finished most of the front end and the classes that the front end uses, including:
Tokenizer
The tokenizer uses a regular expression to look for two newlines in a row, which mark the boundary between blocks of text. The tokenizer returns tokens block by block.
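A minimal sketch of a block tokenizer along these lines. The name `tokenize` and the exact pattern are assumptions for illustration, not the code in our repository.

```python
import re

# Two or more newlines in a row separate one block of text from the next.
BLOCK_SEPARATOR = re.compile(r"\n{2,}")


def tokenize(text):
    """Yield the blocks of a document one at a time, skipping empty ones."""
    for block in BLOCK_SEPARATOR.split(text):
        block = block.strip()
        if block:
            yield block


sample = "# Title\n\nFirst paragraph\nstill the first paragraph\n\n\n- item"
for token in tokenize(sample):
    print(repr(token))
```

Because `tokenize` is a generator, a parser could consume blocks one by one rather than waiting for the whole document to be tokenized.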
Parser
The parser converts each token/block into a specific Element object, as defined below, depending on the type of token it is. The parser determines the type by matching the block's string against a regular expression for each type of element.
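The regular-expression matching step might look roughly like this. The patterns, the kind names, and the function `classify_block` are hypothetical, and the block syntax shown is assumed Markdown-style rather than final aztex syntax.

```python
import re

# Hypothetical patterns, checked in order; the first match wins.
BLOCK_PATTERNS = [
    ("heading", re.compile(r"^#+\s+")),
    ("ordered_list", re.compile(r"^\d+\.\s+")),
    ("unordered_list", re.compile(r"^[-*]\s+")),
]


def classify_block(block):
    """Return the kind of element that a block of text should become."""
    for kind, pattern in BLOCK_PATTERNS:
        if pattern.match(block):
            return kind
    # Anything that matches no pattern is treated as ordinary text.
    return "paragraph"


for block in ["## Goals", "1. first\n2. second", "- apples\n- pears", "Just text."]:
    print(classify_block(block))
```

A real parser would go on to construct the matching Element subclass instead of returning a string.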
Element
Elements represent the individual parts of the document. Each part is defined as a subclass of Element, such as HeadingElement or StrikeThroughElement. Each element holds its type and any sub-elements that the type applies to (e.g. the text being bolded, or the text of a heading).
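A small sketch of what such a class hierarchy could look like. HeadingElement and StrikeThroughElement are the names mentioned above; TextElement, the `kind` attribute, the `level` argument, and the `walk` method are assumptions added for this example.

```python
class Element:
    """Base class: an element knows its kind and holds its sub-elements."""

    kind = "element"

    def __init__(self, children=None):
        self.children = list(children or [])

    def walk(self, depth=0):
        """Yield (depth, kind) for this element and everything nested in it."""
        yield depth, self.kind
        for child in self.children:
            yield from child.walk(depth + 1)


class TextElement(Element):
    """Hypothetical leaf element that holds plain text."""

    kind = "text"

    def __init__(self, text):
        super().__init__()
        self.text = text


class StrikeThroughElement(Element):
    kind = "strikethrough"


class HeadingElement(Element):
    kind = "heading"

    def __init__(self, level, children=None):
        super().__init__(children)
        self.level = level


heading = HeadingElement(1, [
    TextElement("Old "),
    StrikeThroughElement([TextElement("title")]),
])
for depth, kind in heading.walk():
    print("  " * depth + kind)
```

Keeping sub-elements as a list on every element is one way to support nesting such as struck-through text inside a heading.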
InternalRepresentation
The InternalRepresentation class represents the structure of the document as a list of Element objects. This will be the program’s internal representation of the document.
Also consistent with standard compilers, our back end will include the following classes:
GenericOutput:
This will be an abstract class that contains handles which child classes, such as LatexOutput or outputs for other languages, can use.
LatexOutput:
LatexOutput will be a subclass of GenericOutput. We are still deciding on the exact implementation of this class. Currently we are debating whether it should be an interface to an existing library or whether we should write our own library for our code.
In order to possibly expand aztex beyond just a LaTeX document creator in the future, to assist in the ease of debugging, and to make parts of our code interchangeable, we have made sure to encapsulate our data and to practice modular programming. Basically, this means that, for example, the details of the Element class are “hidden” from the InternalRepresentation creator.
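To illustrate how an abstract GenericOutput could keep the back end interchangeable, here is a hedged sketch. The method names (`render`, `heading`, `paragraph`) and the (kind, text) input format are assumptions; this version writes LaTeX strings by hand and does not use any library.

```python
from abc import ABC, abstractmethod


class GenericOutput(ABC):
    """Abstract back end: subclasses decide how each kind of block is written."""

    def render(self, blocks):
        """Render (kind, text) pairs using the hooks defined by a subclass."""
        parts = []
        for kind, text in blocks:
            if kind == "heading":
                parts.append(self.heading(text))
            else:
                parts.append(self.paragraph(text))
        return "\n\n".join(parts)

    @abstractmethod
    def heading(self, text):
        """Return the output for a heading."""

    @abstractmethod
    def paragraph(self, text):
        """Return the output for a paragraph."""


class LatexOutput(GenericOutput):
    """Hand-written LaTeX back end (a sketch, not a full implementation)."""

    def heading(self, text):
        return "\\section{" + text + "}"

    def paragraph(self, text):
        return text

    def render(self, blocks):
        body = super().render(blocks)
        return ("\\documentclass{article}\n\\begin{document}\n\n"
                + body + "\n\n\\end{document}\n")


print(LatexOutput().render([("heading", "Intro"), ("paragraph", "Hello.")]))
```

Another output format would only need a new subclass; the code that builds the internal representation would not change.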
Key questions
- Should LatexOutput be an interface to an existing library, or should we write our own library?
- Any edge cases we may not have seen?
- Any concerns that y’all foresee?
- Should we try to implement a GUI for the input?
Agenda for technical review section
- 1 minute - Introduction and extremely high level description of aztex
- 3 minutes - Provide background and context and explain what we have completed (front end).
- 3 minutes - Code architecture description (UML diagram and explaining why we made sure to practice modular programming)
- 5 minutes - Walk through code on a higher level.
- 10 minutes - Have everyone clone our repository, run AztexRunner.py with an input of a text file written in our aztex language. This will help the audience gain a better understanding of our program and will also help us to see if our code does not work given any certain inputs. Since people will be inputting a text file that they create, they might create and test a certain input that we had not already tested. This will help us see possible edge cases that we may have overlooked.
- 3 minutes - Key questions
Feedback and decisions
The feedback we received mainly had to do with errors found in tokenizing our code and with including other features. We also received a lot of feedback about whether or not we should use a library to generate LaTeX output. Not using the library might be a good idea, since we already know how to write code in Python but do not yet know how to integrate this library into our code efficiently. From the feedback, we also learned that we definitely should provide the user with the actual LaTeX code (not just the final PDF) in case they want to edit the LaTeX document themselves to customize their PDF beyond what aztex allows. Furthermore, if we complete our MVP and still have time to spare, working on a GUI that users can choose to use would be nice.
Also we heard that we should include the following features:
- list with subelements (DONE)
- list with sublists or indents
- add argument for file name
- one item list
- nested elements (bold italicized things)
- equations -> tables (DONE)
- check if library supports all things
- if we write to a tex file people can learn latex
- add test suite of example files
- gui with equation buttons and side by side would be nice
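Two of these suggestions (a file name argument, and writing a tex file that people can read) could be combined in a small command-line interface. The sketch below is hypothetical: the option names and the function `parse_args` are assumptions and do not describe the current AztexRunner.py.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse a hypothetical aztex command line: an input file and an optional output path."""
    parser = argparse.ArgumentParser(
        description="Convert an aztex text file to LaTeX.")
    parser.add_argument("input", help="path to the aztex source file")
    parser.add_argument("-o", "--output",
                        help="path for the generated .tex file")
    args = parser.parse_args(argv)
    if args.output is None:
        # Default: same name as the input, with a .tex extension.
        args.output = str(Path(args.input).with_suffix(".tex"))
    return args


args = parse_args(["homework.txt"])
print(args.input, "->", args.output)
```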
Review process reflection
The review went well. We were able to follow our agenda, and we got answers to our key questions, especially about which of our features do not work and which other features we should include. Although we provided a lot of background context about how compilers work and are built, we probably should have provided an example to show people how to use our program. People eventually figured out what our program was supposed to output, but some were confused about what the output was supposed to look like (apparently we did not clearly explain that the output was not supposed to be a nice-looking PDF document). As a result, the actual process of getting people to test our code went differently than expected. We probably would have gotten more feedback and test cases if people had had more information on what to type and what the output would be. Considering that this was also a code review, we should probably come in with more questions and content about actual coding decisions in the future.