Project Proposal

Big Ideas

The big idea for our project is to make a program that emulates LaTeX but is simpler and easier to pick up. The program will be called aztex. It will take a text file written in aztex format and convert it into a nicely formatted PDF document. The minimum viable product is a program that takes a text file with minimal formatting and converts it into a professional-looking PDF using the LaTeX compiler. The following features are highly desired:

  1. Numbered Lists
  2. Unordered Lists
  3. Headers
  4. Links
  5. Bold, italic, underline, strike-through

Some stretch goals that we may pursue are writing our own PDF compiler, building an integrated GUI that previews how documents will be formatted, and adding further features such as images.

Learning Goals

Shared:

  1. Learn more about parsing documents
  2. Learn more about working in project teams
  3. Learn more about interfacing with other libraries

Individual:

Jordan Van Duyne:

  1. Make more in-line comments
  2. Test more often

Thuc Tran:

  1. Develop better unit tests.
  2. Write more efficient code -> Use fewer nested for loops.

Isaac Getto:

  1. Write more tests
  2. Organize code better

Implementation Plan

  1. Find a LaTeX compiling library
  2. Determine how to simplify the LaTeX language
     a. Develop our own language based on simplified LaTeX
  3. Learn how to parse our language
  4. Separate the project into components -> e.g. compiling, parsing, processing
  5. Develop our own compiler

Project Schedule

Weekly Goals:

  1. Figure out the program architecture (with UML/diagrams)
  2. Create our language (unit tests for individual formatting options)
  3. Parse the text file into an intermediate representation
  4. Process the intermediate representation into LaTeX format
  5. Compile the resulting LaTeX into a PDF
  6. Stretch Goals + Debugging.

Collaboration Plan

Our overall plan is to work collaboratively on the program architecture and on developing the simplified language, so that we are all on the same page about the implementation we will be working on. After that, we will divide up parts of the parsing, processing, and compiling wherever we have functions and subfunctions that do not interact with each other directly. This will let us work in parallel on sections of the project.

Risks

Additional Course Content

Preparation and Framing Document

The big idea

Making a version of LaTeX that is simpler to use.

How?

Instead of LaTeX-style syntax, use Markdown-style syntax.

What is Markdown?

http://en.wikipedia.org/wiki/Markdown#Example

Formatting that we want to include:

The goal is for teachers to be able to make (math) homework documents with it

Have the user input simple code, which our program turns into the analogous LaTeX code.
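As a rough illustration of turning simple input into analogous LaTeX, here is a minimal sketch for inline formatting only. The marker syntax (`**bold**`, `*italic*`, `__underline__`, `~~strike-through~~`) and the names `INLINE_RULES` and `inline_to_latex` are assumptions made for this example, not the aztex language itself. The `\sout` command requires the LaTeX `ulem` package.

```python
import re

# Hypothetical Markdown-style markers mapped to LaTeX commands.
# Order matters: the bold rule must run before the italic rule,
# otherwise "**text**" would be read as nested italics.
INLINE_RULES = [
    (re.compile(r"\*\*(.+?)\*\*"), r"\\textbf{\1}"),   # **bold**
    (re.compile(r"\*(.+?)\*"), r"\\textit{\1}"),       # *italic*
    (re.compile(r"__(.+?)__"), r"\\underline{\1}"),    # __underline__
    (re.compile(r"~~(.+?)~~"), r"\\sout{\1}"),         # ~~strike-through~~ (needs ulem)
]


def inline_to_latex(text):
    """Replace each assumed inline marker with its LaTeX command."""
    for pattern, replacement in INLINE_RULES:
        text = pattern.sub(replacement, text)
    return text


print(inline_to_latex("**bold** and *italic*"))
```

This sketch does not escape LaTeX special characters such as `%` or `&`, which a real converter would need to handle.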

We then input this LaTeX code into a LaTeX compiler using the PyLaTeX library: https://github.com/JelteF/PyLaTeX

General compiler architecture

Effectively, the task is to compile a new, simpler language into LaTeX.

Generally, a compiler has a front end, a middle end, and a back end.

Why would we not have a middle end?

Key Questions

  1. How do we want to represent each character?
  2. Should attributes like underlining and bolding each be their own class/token, or should they be part of a text class?
  3. How are the parts (tokenizer and parser) connected?
  4. Is it parsed all at once or token by token?
  5. As a stream or with files?
  6. Do we need a middle end?
  7. How can we make the language simpler for users (in the context of a math teacher)?
  8. Do we want a GUI?
  9. How do we make the language simpler for us, i.e. easier to parse?
  10. Should users be able to choose among different themes for the final PDF document? E.g. a math theme, note-taking theme, etc.
  11. Should we have users type in our program? If not, what text format do we want to use? .txt? .doc?

Agenda for technical review session

The beginning will mostly be a standard PowerPoint presentation of information.

The discussion will mostly be conversational: “Here is our question, here are some thoughts, help us.”

The main questions to be addressed at the review:

  1. Are elementary school math teachers an appropriate target audience? Any other ideas for possible audiences?
  2. How are the parts of the program connected (tokenizer, parser)? Are tokens sent to the parser one by one, or all together?
  3. How should we set up the language itself?
  4. How can it be easy for the user?
  5. How can it be easy for us to parse?
  6. How do you think we might want to store our intermediate representation?
  7. Should there be a class for each attribute, or should attributes belong to a single class?
  8. Do we need a middle end?
  9. Any concerns y’all have?

Design Review Reflection

Reflection and Synthesis

Feedback and decisions

Teachers are not necessarily our key audience. In addition, we want to manage the trade-off between simplicity and flexibility in our new language. One compromise that was discussed is a raw tag that lets the user write LaTeX code directly.

Going Forward:

We are most likely going to transition our target audience to students who are not familiar with LaTeX. In particular, we may focus on either new students at Olin or high school students in general. However, we have also decided that we can make design decisions based on what we want out of our project, not necessarily what our “target audience” would want, since we are making this project for ourselves, not an audience. We may use Markdown directly as our language.

New Questions:

Should our program come with multiple example LaTeX templates? What features are most important for a student user? Should we use Markdown as our exact language?

Review process reflection

How did the review go?

It went well. We got valuable feedback on how we want to think about our target audience going forward, and ideas for ways to implement the language and workflow of our program.

Did you get answers to your key questions?

We did get some feedback. In addition to transitioning to a new audience entirely as a result of feedback, the idea of balance between simplicity and complexity was brought up. We also received an answer to how we should structure our language: some adaptation of Markdown.

Did you provide too much/too little context for your audience?

We provided ample context for one technical aspect of our presentation (how compilers are usually split into a front, middle, and back end) in the reading we sent out before our Design Review. As a result, we only briefly addressed this background in the presentation itself, assuming our audience would have read the reading and would be familiar with what we were talking about. However, no one actually read the reading, so the small amount of context in our presentation was not enough for our audience to answer the few technical questions that we had. Besides this one technical aspect, we believe that we provided an appropriate amount of context for our audience.

Did you stick closely to your planned agenda, or did you discover new things during the discussion that made you change your plans?

We stuck fairly closely to our overall agenda, but found that certain questions got more response than others. In particular, we received little feedback on many of the more technical questions. This might be addressed by framing each question as a choice among the specific options or ideas that we are considering.

What could you do next time to have an even more effective technical review?

For our next presentation, we can change up our presentation style. For example, if we want our audience to help us generate ideas (such as design options), instead of just asking if the audience has any ideas, we could split the audience into smaller groups and have each group ideate for a few minutes and then bring all of the groups back together to look at all of the new ideas. We should also cover all of the background context that we provide in our pre-review readings in our presentation. We could frame our technical questions in a different way to give the audience more knowledge about the effects of different design options.

Code Review

Preparation and Framing Document

Background and context

We are still working towards creating aztex: an easier-to-use version of LaTeX. Although we have pivoted our user group to general students (with a focus on high school students or students who are not familiar with LaTeX), we have decided not to be too concerned with creating aztex specifically for that user group.

aztex will work by compiling a .txt file written in the aztex language (similar to Markdown) into the analogous LaTeX code, which is then used to output a PDF.

Compilers typically consist of a front, a middle, and a back end.

The front end consists of syntactic and semantic processing and translation of the source code to an internal representation.

In most compilers, the middle end optimizes the internal representation. However, because aztex is so simple compared to actual programming languages, we believe that a middle end is unnecessary for our aztex program.

The back end takes the internal representation and outputs code that a particular processor and OS can understand.

Our architecture:

Our front end is consistent with standard compilers; it is made up of a tokenizer and a parser. We have finished most of the front end and the classes that it uses, including:

Tokenizer

The tokenizer uses a regular expression to look for two newlines in a row, which mark the end of each block of text. The tokenizer returns tokens block by block.
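The block-splitting idea can be sketched as follows. This is a minimal illustration, not the project's actual tokenizer: the names `BLOCK_SEPARATOR` and `tokenize` are hypothetical, and allowing whitespace between the two newlines is an assumption.

```python
import re

# Two newlines in a row (optionally with whitespace between them) end a block.
BLOCK_SEPARATOR = re.compile(r"\n\s*\n")


def tokenize(text):
    """Yield each non-empty block of text, one block at a time."""
    for block in BLOCK_SEPARATOR.split(text):
        block = block.strip()
        if block:
            yield block


source = "# Title\n\nFirst paragraph\nstill first\n\n\n\nSecond"
for token in tokenize(source):
    print(repr(token))
```

Because `tokenize` is a generator, a caller can consume tokens one by one, or collect them all at once with `list(tokenize(source))`.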

Parser

The parser converts each token/block into a specific element object, as defined below, depending on the type of token it is. The parser does this by matching the token's string against a regular expression for each type of element.
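A minimal sketch of regex-based matching might look like the following. The patterns assume a Markdown-like syntax, and `BLOCK_PATTERNS` and `classify_block` are hypothetical names. For brevity the function returns the name of the matched type; a full parser would construct the corresponding element object instead.

```python
import re

# Hypothetical block patterns, tried in order; the syntax is an assumption.
BLOCK_PATTERNS = [
    ("heading", re.compile(r"#{1,6}\s+")),         # "# Title"
    ("unordered_list", re.compile(r"[-*]\s+")),    # "- item" or "* item"
    ("numbered_list", re.compile(r"\d+\.\s+")),    # "1. item"
]


def classify_block(block):
    """Return the name of the first pattern matching the start of the block."""
    for kind, pattern in BLOCK_PATTERNS:
        if pattern.match(block):
            return kind
    # Anything that matches no pattern is treated as plain text.
    return "paragraph"


print(classify_block("# Title"))
print(classify_block("1. first\n2. second"))
```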

Element

Elements represent the individual parts of the document. Each part will be defined as a subclass of Element, such as HeadingElement or StrikeThroughElement. Each element stores the type of element it is and any subelements that the element type applies to (e.g. for bolding or a heading).
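One possible shape for this hierarchy is sketched below. Only the names Element, HeadingElement, and StrikeThroughElement come from the description above; `TextElement`, the `children` attribute, and the `level` parameter are assumptions made for illustration.

```python
class Element:
    """Base class: one part of the document plus any nested subelements."""

    def __init__(self, children=None):
        self.children = list(children) if children else []


class TextElement(Element):
    """Hypothetical leaf element holding plain text."""

    def __init__(self, text):
        super().__init__()
        self.text = text


class StrikeThroughElement(Element):
    """Strike-through applied to its subelements."""


class HeadingElement(Element):
    """Heading whose content is its subelements; `level` is an assumed field."""

    def __init__(self, level, children=None):
        super().__init__(children)
        self.level = level


# A level-1 heading containing plain text followed by struck-through text.
heading = HeadingElement(1, [
    TextElement("Intro "),
    StrikeThroughElement([TextElement("old")]),
])
print(type(heading).__name__, len(heading.children))
```

Here the element's type is carried by its class, so code that walks the tree can dispatch on `type(element)` rather than on a separate type field.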

InternalRepresentation

The InternalRepresentation class represents the structure of the document as a list of Element objects. This will be the program’s internal representation of the document.

Also consistent with standard compilers, our back end will include the following classes:

GenericOutput:

This will be an abstract class that contains handles that child classes, such as LatexOutput or classes for other output languages, can use.

LatexOutput:

LatexOutput will be a subclass of GenericOutput. We are still deciding the exact implementation of this class. Currently we are debating whether it should be an interface to an existing library or whether we should write our own LaTeX-generating code.
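To make the abstract-class idea concrete, here is a minimal sketch of the "write our own" option, with no external library. The method names (`heading`, `paragraph`, `document`, `render`) and the `(kind, text)` tuple input are assumptions for this example; the real classes would work from the InternalRepresentation instead.

```python
from abc import ABC, abstractmethod


class GenericOutput(ABC):
    """Abstract back end: subclasses decide how each part is written."""

    @abstractmethod
    def heading(self, text):
        """Return the output for a heading."""

    @abstractmethod
    def paragraph(self, text):
        """Return the output for a paragraph."""

    @abstractmethod
    def document(self, body):
        """Wrap the rendered body in whatever the output format requires."""

    def render(self, blocks):
        """Render (kind, text) pairs; this input format is an assumption."""
        handlers = {"heading": self.heading, "paragraph": self.paragraph}
        body = "\n\n".join(handlers[kind](text) for kind, text in blocks)
        return self.document(body)


class LatexOutput(GenericOutput):
    """Writes LaTeX source directly, without an external library."""

    def heading(self, text):
        return "\\section{" + text + "}"

    def paragraph(self, text):
        return text

    def document(self, body):
        return ("\\documentclass{article}\n\\begin{document}\n"
                + body + "\n\\end{document}\n")


print(LatexOutput().render([("heading", "Intro"), ("paragraph", "Hello")]))
```

Supporting another output language would then mean adding another subclass of GenericOutput, leaving `render` and the front end unchanged.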

In order to possibly expand aztex beyond just a LaTeX document creator in the future, to assist in the ease of debugging, and to make parts of our code interchangeable, we have made sure to encapsulate our data and to practice modular programming. Basically, this means that, for example, the details of the Element class are “hidden” from the InternalRepresentation creator.

Key questions

  1. Should LatexOutput be an interface to an existing library, or should we write our own?
  2. Any edge cases we may not have seen?
  3. Any concerns that y’all foresee?
  4. Should we try to implement a GUI for entering the input?

Agenda for technical review section

Feedback and decisions

The feedback we received mainly had to do with errors found in tokenizing our code and with including other features. We also received a lot of feedback about whether or not we should use a library to generate LaTeX output. Not using the library might be a good idea, since we already know how to write code in Python but do not yet know how to integrate this library into our code efficiently. From the feedback, we have also learned that we definitely should provide the user with the actual LaTeX code (not just the final PDF) in case they want to edit the LaTeX document themselves to customize their PDF beyond what aztex allows. Furthermore, if we complete our MVP and still have time to spare, it would be nice to work on a GUI that users can choose to use.

Also we heard that we should include the following features:

Review process reflection

The review went well. We were able to follow our agenda, and we did get answers to our key questions, especially our questions about whether any of our features do not work and what other features we should include. Although we provided a lot of background context about how compilers work and are built, we probably should have provided an example to show people how to use our program. People eventually figured out what our program was supposed to output, but some were confused about what that output was supposed to look like (apparently we did not clearly explain that the output was not supposed to be a nice-looking PDF document). As a result, the actual process of getting people to test our code went differently than expected. We probably would have gotten more feedback and test cases if people had had more information on what to type and what the output would be. Considering that this was also a code review, we should probably come in with more questions and content about actual coding decisions in the future.