Summary Coqlex Formally Verified Lexer Generation arxiv.org
13,132 words - PDF document - View PDF document
One Line
Coqlex is a formally verified lexer generator implemented in Coq that provides a reliable and efficient solution for generating lexers, although it may have slower execution times compared to other lexer generators.
Slides
Slide Presentation (6 slides)
Key Points
- Coqlex is a formally verified lexer generator implemented in Coq.
- Coqlex uses different directories and files to organize its source code.
- Coqlex includes score computation functions for longest match and shortest match rules.
- Verbatim++ is another lexer generator that uses a trie-based representation of the DFA and memoization of derivative computations for efficiency.
- Coqlex performed reasonably well in a comparison with Verbatim++, OCamllex, and Flex for a JSON lexer benchmark.
- Coqlex provides a formally verified and efficient solution for lexer generation, while Verbatim++ offers additional optimizations but may have slower execution times in certain cases.
Summary
999 word summary
Coqlex is a formally verified lexer generator that has been implemented in Coq, a proof assistant. The source code of Coqlex is organized into different directories and files, each containing specific components of the lexer generation process. The main function of Coqlex is located in the coqlex.ml file. The lexer definition is provided in the CoqlexLexer.v file, while the selection system and data type definition are found in the LexerDefinition.v file. The score computation functions for the longest match rule and shortest match rule are implemented in MachLen.v and ShortestLen.v, respectively. The optimized versions of these score computation functions are present in MachLenSimpl.v and ShortestLenSimpl.v. The regexp simplification process is implemented in the RegexpSimpl.v file. The RValues.v file contains the definition of various regex elements such as strings, characters, identifiers, and numbers. Coqlex also includes an extended version of the regular expression optimization algorithm, which is located in the regexp-opt directory.
Verbatim++ is another lexer generator that has been verified and optimized for performance. It uses a different approach than Coqlex, employing a trie-based representation of the DFA and memoization of derivative computations to improve efficiency. Verbatim++ also incorporates smart constructors that simplify regex expressions on the fly during derivation. However, despite these optimizations, Verbatim++ can still exhibit quadratic behavior in certain worst-case scenarios.
A comparison between Coqlex, Verbatim++, OCamllex, and Flex was conducted using a JSON lexer benchmark. The results showed that Coqlex performed reasonably well, while Verbatim++ had significantly slower execution times compared to OCamllex and Flex. The slower performance of Verbatim++ can be attributed to its more complex implementation choices and the potential for quadratic behavior due to the size increase of regex expressions during derivation.
Overall, Coqlex provides a formally verified and efficient solution for lexer generation, while Verbatim++ offers additional optimizations but may suffer from slower execution times in certain cases. Coqlex is a lexer generator that allows users to generate formally verified lexers. It uses a simple implementation approach using Brzozowski derivatives and provides a Coq library for implementing common features used in lexical analysis. The generated lexers have been proven to be correct and can be used in a verified setting. Coqlex lexers have good enough performance for most use cases, but they are slower than standard lexers and other lexer generators like OCamllex and flex. However, the simplicity of the Coqlex lexer specifications allows for easy proof writing and validation of the lexers. Coqlex also allows users to express potentially non-terminating lexers using the "fuel" technique. Verbatim++ is another lexer generator that is slower than Coqlex but provides additional features like handling semantic actions and labels. However, Verbatim++ lexers cannot ignore parts of the input string. In terms of performance, Coqlex is two orders of magnitude slower than OCamllex and flex, but it meets the "good enough" performance bar for most use cases. Coqlex is a lexer generator that is similar to flex or OCamllex. It provides a formally verified way to generate lexers. Coqlex uses two matching policies: longest match and shortest match. The longest match policy selects the rule with the highest l-score, which is the length of the longest prefix that a regexp can match. The shortest match policy selects the rule with the lowest s-score, which is the length of the shortest prefix that a regexp can match. Coqlex also supports function-based rules, which are selected based on the result of applying a function to the input string. The implementation of Coqlex includes optimizations to improve performance, such as simplifying regexps and reducing the number of character reads. Coqlex has been formally verified using Coq and provides proofs of correctness for its matching policies and scoring functions. Coqlex is a lexer generator that is used to generate formally verified lexers. It is implemented in Coq and allows users to describe lexers using lexical rules. The generated lexers are verified for correctness and can be used in programs. The Coqlex generator has three components: the lexer, the parser, and the code printer. The lexer generates tokens from the input string, the parser generates an abstract representation of the lexer, and the code printer generates the final code in Coq. The generated lexers can be used in various projects, including safety-critical ones. Coqlex provides features such as matching policies, semantic actions, and support for regular expressions. The generated code is human-readable and can be reviewed for correctness. Coqlex has been used in an industrial project for optimizing programs that are executed on a vital computer. Coqlex is a tool that generates formally verified lexers. It allows developers to write lexers using sets of lexical rules and provides a function that interprets these rules and produces a lexing function. The lexing functions generated by Coqlex are correct with respect to a standard specification. Coqlex also implements two selection policies for handling multiple matches in the lexing rules. The generated lexers can be easily integrated into formally verified compiler toolchains. Coqlex outperforms Verbatim++ in terms of execution time and provides better usability compared to OCamllex. The source code of Coqlex can be found on the provided link. Coqlex is a tool that generates formally verified lexers, which are important for lexical analysis in compilers and interpreters. Existing lexer implementations lack formal proof of correctness, making it difficult and time-consuming to implement a lexer from scratch. Coqlex provides safety guarantees and is usable in practice, offering trust and replacing more costly certification approaches. Performance comparisons show that Coqlex is two orders of magnitude faster than Verbatim++ and two orders of magnitude slower than OCamllex. Coqlex is designed to combine simplicity and decent performance, and the generated lexers can be reviewed and further properties can be proven by the user. The tool follows a user experience similar to lex/flex or OCamllex and supports most features of standard lexer generators. Coqlex is part of the effort to formally verify compilers, as the formal verification of each component is crucial in the tool-chain. Overall, Coqlex provides a formally verified implementation of lexers with practical applicability.