CliSpec
This is the documentation page for the CliSpec project. It showcases some proofs of concept around static analysis of shell scripts.
Shell scripts are notoriously dynamic and flexible, which makes them a go-to choice for constructing pipelines of programs. It’s commonplace to write shell scripts that rely on external assumptions, such as programs being installed, environment variables being set, or files existing in specific locations. These external dependencies often lead to debugging trivial errors at runtime and to poor portability. To move such errors to compile time, the system needs to be augmented with more information about programs and their arguments. Such information enables the system to reason about an individual program invocation statically. Take for example the following:
rm file.txt --interactive=true
While the usage of rm is widely known due to its prevalence, it may not be obvious that the above invocation is guaranteed to fail at runtime, simply because “true” is not a valid value for the --interactive option. The only valid values are: ‘never’ | ‘no’ | ‘none’ | ‘once’ | ‘always’ | ‘yes’. Furthermore, even with a valid value, the script alone gives no guarantee that the program will succeed, as it’s unclear whether file.txt exists in this context.
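The reasoning above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `RM_SPEC` table and the `check_invocation` function are hypothetical names invented for this example and do not reflect CliSpec's actual format or API. The sketch covers only long options written as `--name=value`, matches values exactly, and ignores positional arguments and short options.

```python
import shlex

# Hypothetical specification for a small subset of rm's long options.
# Each option maps to its set of allowed values, or to None if it takes no value.
RM_SPEC = {
    "--interactive": {"never", "no", "none", "once", "always", "yes"},
    "--force": None,
    "--recursive": None,
}


def check_invocation(command_line, spec):
    """Return a list of warnings for long options that violate the spec."""
    warnings = []
    tokens = shlex.split(command_line)[1:]  # drop the program name
    for token in tokens:
        if token == "--":
            break  # everything after "--" is a positional argument
        if not token.startswith("--"):
            continue  # positional arguments and short options are not checked here
        name, has_value, value = token.partition("=")
        if name not in spec:
            warnings.append(f"unknown option {name}")
            continue
        allowed = spec[name]
        if allowed is None:
            if has_value:
                warnings.append(f"option {name} does not take a value")
        elif has_value and value not in allowed:
            warnings.append(
                f"invalid value '{value}' for {name}; "
                f"expected one of {sorted(allowed)}"
            )
    return warnings


print(check_invocation("rm file.txt --interactive=true", RM_SPEC))
print(check_invocation("rm file.txt --interactive=once", RM_SPEC))
```

The first call reports a single warning about the value 'true', and the second reports none. Nothing is executed and no file is touched, which is the point: the error is found by inspecting the text of the invocation alone.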
By communicating all of the options and valid uses of rm to a static checker, we can get warnings about these problems ahead of time. This introduces the concept of specifications, which provide a human- and machine-readable way of interfacing with popular CLI programs. With a collection of specifications stored in a database, a static checker can also start to reason about entire shell scripts. In this context there is a whole host of errors that can be checked ahead of time, including errors related to the shell interpreter itself, such as environment variables, control flow, behaviour of builtin commands and basic file I/O.
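A script-level check can be sketched along the same lines. The example below is a rough illustration, not CliSpec's implementation: `SPEC_DB`, `check_script` and `assumed_env` are hypothetical names, and the "database" is just a dictionary whose spec contents are left empty. It handles only simple one-command-per-line scripts, and flags two of the error classes mentioned above: programs with no specification and variables that are used before being assigned. It does not understand quoting rules, control flow or builtins, so a `$NAME` inside single quotes would be flagged incorrectly.

```python
import re
import shlex

# Hypothetical specification database: program name -> spec (contents omitted).
SPEC_DB = {"rm": {}, "echo": {}, "mkdir": {}}

ASSIGNMENT = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)=")
VARIABLE_USE = re.compile(r"\$\{?([A-Za-z_][A-Za-z0-9_]*)")


def check_script(script, spec_db, assumed_env=()):
    """Report programs without a spec and variables used before assignment."""
    defined = set(assumed_env)  # variables the caller promises are set
    warnings = []
    for lineno, line in enumerate(script.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Every variable referenced on this line must already be defined.
        for var in VARIABLE_USE.findall(line):
            if var not in defined:
                warnings.append(f"line {lineno}: variable {var} may be unset")
        match = ASSIGNMENT.match(line)
        if match:
            defined.add(match.group(1))  # plain assignment, no program to check
            continue
        program = shlex.split(line)[0]
        if program not in spec_db:
            warnings.append(f"line {lineno}: no specification for {program}")
    return warnings


script = """TARGET=/tmp/out
mkdir "$TARGET"
frobnicate "$SOURCE"
"""
for warning in check_script(script, SPEC_DB):
    print(warning)
```

For this input the sketch reports that `SOURCE` may be unset and that `frobnicate` has no specification, both on line 3. Passing `assumed_env=("SOURCE",)` is one way to state the external assumption explicitly, which leaves only the missing-specification warning.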