Code Search for Developers
 
 
  

user-guide.tex from CQual at Krugle


Show user-guide.tex syntax highlighted

\documentclass[10pt]{article}

\usepackage{fullpage}

\usepackage{semantic}
\renewcommand{\predicatebegin}{\begin{math}}
\renewcommand{\predicateend}{\end{math}}

\newcommand{\cmdarg}[1]{$\langle$#1$\rangle$}
\newcommand{\carillon}{\textsc{Carillon}}
\newcommand{\centertt}[1]{\begin{center}\tt #1\end{center}}
\newcommand{\chartype}{\texttt{char}}
\newcommand{\confine}{\texttt{confine}}
\newcommand{\const}{\texttt{const}}
\newcommand{\cqual}{\textsc{Cqual}}
\newcommand{\elib}{elib}
\newcommand{\emacs}{\textsc{Emacs}}
\newcommand{\flag}[2]{\smallskip\texttt{#1} & \parbox[t]{4in}{#2} \\}
\newcommand{\inttype}{\texttt{int}}
\newcommand{\iquals}{\textsc{iquals}}
\newcommand{\kernel}{\texttt{\$kernel}}
\newcommand{\locked}{\texttt{\$locked}}
\newcommand{\nonconst}{\texttt{nonconst}}
\newcommand{\option}[2]{\smallskip\texttt{#1} & \parbox[t]{4in}{#2} \\}
\newcommand{\pam}{\textsc{Pam}}
\newcommand{\ptrtype}[1]{\texttt{ptr(}#1\texttt{)}}
\newcommand{\qchartype}[1]{#1\;\texttt{char}}
\newcommand{\qfuntype}[3]{#1\;\texttt{fun (}#2\texttt{) \begin{math}\rightarrow\end{math} } #3}
\newcommand{\qptrtype}[2]{#1\;\texttt{ptr(}#2\texttt{)}}
\newcommand{\qinttype}[1]{#1\;\texttt{int}}
\newcommand{\qvoidtype}[1]{#1\;\texttt{void}}
\newcommand{\restrict}{\texttt{restrict}}
\newcommand{\tainted}{\texttt{\$tainted}}
\newcommand{\textrmit}[1]{\textrm{\textit{#1}}}
\newcommand{\unlocked}{\texttt{\$unlocked}}
\newcommand{\untainted}{\texttt{\$untainted}}
\newcommand{\user}{\texttt{\$user}}
\input{.version}
\newcommand{\volatile}{\texttt{volatile}}
\newcommand{\wf}{\vdash_{\textit{wf}}}

\newenvironment{options}{\bigskip\begin{tabular}{ll}}{\end{tabular}\bigskip}
\newenvironment{flags}{\bigskip\begin{tabular}{ll}}{\end{tabular}\bigskip}

\newcommand{\comment}[1]{\textbf{#1}}
\newcommand{\ignore}[1]{}

\title{\cqual{} User's Guide \\
       Version \version{}}

\author{Jeffrey S. Foster (University of Maryland, College Park) \\
  et al (University of California, Berkeley) \\
        \texttt{cqual@cs.umd.edu}}

\begin{document}

\maketitle{}

Many people have contributed to the development of \cqual{}.  \cqual{}
uses the front-end from David Gay's region compiler \cite{gay:pldi01}
to parse C programs.  Martin Elsman and Alex Aiken worked on
\carillon{} \cite{elsman:carillon99}, an earlier version of this
system written in SML/NJ, which used the type qualifier system of
\cite{foster:pldi99} to find Y2K bugs in C programs.  Umesh Shankar,
Kunal Talwar, and David Wagner used the system to find format-string
bugs in C programs \cite{shankar:usenixsec01}, in the process making a
number of important usability improvements.  Tachio Teruachi and Alex
Aiken helped develop the flow-sensitive portion of \cqual{}.  John
Kodumal and Rob Johnson implemented polymorphic recursive qualifier
inference for \cqual{}, and Rob also enhanced \cqual{} for user/kernel
pointer checking, as well as making numerous other improvements.
Portions of this document are taken from \cite{elsman:carillon99} and
\cite{shankar:usenixsec01}.

\bigskip
\paragraph{Warning:} See
Appendix~\ref{sec:limitations-bugs} for known limitations and bugs.

\vspace*{2in}
\noindent
This documentation is copyright (c) 2001-2003 The Regents of the
University of California.  \cqual{} is distributed without any
warranty.  See the notice in Appendix~\ref{app:copyright} for full
copyright information.

\newpage
{\small \tableofcontents{}}
\newpage

\section{Introduction}

\cqual{} is a type-based analysis tool for finding bugs in C programs.
\cqual{} extends the type system of C with extra user-defined
\textit{type qualifiers}.  The programmer annotates their program in a
few places, and \cqual{} performs \textit{qualifier inference} to
check whether the annotations are correct.  \cqual{} presents the
analysis results either on the command line or in an interactive
\emacs{} buffer.

Earlier versions of \cqual{} written in SML/NJ have been used to
perform \const{}-inference \cite{foster:pldi99} and to find Y2K bugs
\cite{elsman:carillon99}.  The current version of \cqual{} has been
used to detect potential format-string vulnerabilities
\cite{shankar:usenixsec01} and to find locking bugs in the Linux
kernel \cite{foster:pldi02}.  The latest version of \cqual{} adds
support for polymorphic recursive qualifier inference and gated
qualifier edges, which are used for structural qualifier constraints.
This release also includes direct support for checking for user/kernel
pointer errors.

\subsection{cqual and PAM Installation}

The latest version of \cqual{} can be found at
\centertt{http://cqual.sourceforge.net} To unpack \cqual{},
execute the following commands: \centertt{
\begin{tabular}{l}
gunzip cqual-\version{}.tar.gz \\
tar xf cqual-\version{}.tar
\end{tabular}}
\cqual{} will be unpacked into a directory \texttt{cqual-\version{}},
which contains, among other things,
\begin{center}
\begin{tabular}{ll}
\texttt{COPYRIGHT} & The copyright notice \\
\texttt{KERNEL-QUICKSTART} & Guide to using cqual to find user/kernel
errors \\
\texttt{bin}       & Some utilities \\
\texttt{config}    & Sample \cqual{} configuration files \\
\texttt{doc}       & Documentation (contains this file) \\
\texttt{src}       & Source code for \cqual{} \\
\texttt{PAM-3}     & The latest version of PAM
\end{tabular}
\end{center}
The latest version of \pam{} can also be downloaded separately from
\centertt{http://www.cs.berkeley.edu/\~{}chrishtr/pam}.

\noindent
To build \cqual{} and \pam{}, simply \texttt{cd} into the directory,
run \texttt{configure} and then run \texttt{make}: \centertt{
\begin{tabular}{l}
cd cqual-\version{} \\
./configure \\
make
\end{tabular}}
If all goes well, the makefile will build two executables in the
\texttt{src} directory: \texttt{cqual}, the type qualifier inference
system, and \texttt{iquals}, a small tool for experimenting with
qualifier constraints.  The makefile will also build \pam{} and append
some commands to your \texttt{.emacs} file so that you can run
\cqual{} using \pam{}.

While you don't need to run \cqual{} in \pam{} mode, the analysis
results are much easier to understand if you do.  If you wish to run
\cqual{} solely from the command line, you will be able to see some
but not all of the information you could in \pam{} mode.  In
particular, rather than being able to browse through the program on
qualifier paths, you will be presented with paths corresponding to
just the errors.

\subsection{A Small Example}

In this section we present a small example showing how to use \cqual{}
to find a potential format-string vulnerability in a C program.
Consider the following program, which is included in the distribution
as \texttt{examples/taint0.c}:
\centertt{
\begin{tabbing}
char *getenv(const char *name); \\
int printf(const char *fmt, ...); \\
\\
int \= main(void) \\
$\{$ \\
\> char *s, *t; \\
\> s = getenv("LD\_LIBRARY\_PATH"); \\
\> t = s; \\
\> printf(t); \\
$\}$
\end{tabbing}}
This program reads the value of \texttt{LD\_LIBRARY\_PATH} from the
environment and passes it to \texttt{printf} as a format string.  If
an untrusted user can control the environment in which this program is
run, then this program may have a format-string vulnerability.  For
example, if the user sets \texttt{LD\_LIBRARY\_PATH} to a long
sequence of \texttt{\%s}'s, the program will likely seg fault.

By default \cqual{} assumes nothing about the behavior of your
program.\footnote{\cqual{} comes with some default configuration files
  for checking for format-string vulnerabilities and user/kernel
  errors; more on this below.}  In order to start checking for bugs,
we need to annotate the program with extra \textit{type qualifiers}.
For this example we will use two qualifiers.  We will annotate
untrusted strings as \tainted{}, and we will require that
\texttt{printf} take \untainted{} data: \centertt{
\begin{tabbing}
\$tainted char *getenv(const char *name); \\
int printf(\$untainted const char *fmt, ...); \\
\\
int \= main(void) \\
$\{$ \\
\> char *s, *t; \\
\> s = getenv("LD\_LIBRARY\_PATH"); \\
\> t = s; \\
\> printf(t); \\
$\}$
\end{tabbing}}
In \cqual{} all user-defined qualifiers, which we will refer to
\textit{constant qualifiers} or \textit{partial order elements}, begin
with dollar signs.  Notice that we only need to annotate
\texttt{getenv} and \texttt{printf} with type qualifiers.  For this
example \cqual{} will infer that \texttt{s} and \texttt{t} must also
be \tainted{}, and hence will signal a type error: \tainted{} data is
being passed to \texttt{printf}, which requires \untainted{} data.
The presence of a type error indicates a potential format-string
vulnerability.

\subsection{Running cqual}

Assuming that \cqual{} is already installed as described above, you
can run \cqual{} on this example program to see what happens.  From
within \emacs{} type \texttt{M-x cqual} and press return.  Enter the
file name \texttt{examples/taint1.c} (assuming you are in the
top-level \texttt{cqual} directory) and press return.

\cqual{} analyzes the file and brings up a window listing the input
files and the analysis results.  In this case, \cqual{} complains
\begin{verbatim}
/home/rtjohnso/projects/cquals/fields2/cqual/examples/taint1.c:7
incompatible types in assignment
\end{verbatim}
The user interface used to display the analysis results is Program
Analysis Mode (\pam{}), a generic interface for marking up programs in
emacs \cite{harrelson:pam}.  Middle-clicking on a hyperlink with the
mouse or moving the cursor over a hyperlink and pressing \texttt{C-c
  C-l} will follow that link.  Error messages are linked to the
position in the file where the error was generated.

If you middle-click on the error message link you will see a marked-up
display of \texttt{taint1.c}.  Identifiers are colored according to
their inferred qualifiers.  In the default configuration file,
\tainted{} identifiers are colored red, \untainted{} identifiers are
colored green, and identifiers that may contribute to a type error are
colored purple.

Each marked-up identifier is also a hyperlink.  Middle-clicking on an
identifier will show you the type of the identifier, fully annotated
with qualifiers.  For example, middle-clicking on \texttt{t} should
bring up a window showing \centertt{t: \&t ptr (t ptr (*t char))}
The name of the identifier is shown to the left of the colon, and its
inferred type is are shown to the right of the colon.

Here \texttt{t} has the type pointer
to pointer to character.  (We will explain the extra level of
\texttt{ptr} in Section~\ref{sec:l_value}.)  Notice that \cqual{}
writes types from left-to-right using \texttt{ptr} as a type
constructor.

The three hyperlinked names in the type are \textit{qualifier
  variables} (see Section~\ref{sec:quals}).  In this case the
qualifier variable $*t$ (throughout this document we italicize
qualifier variables) is colored purple because it has been inferred to
be both \tainted{} and \untainted{}, an error.

Middle-clicking on a qualifier variable will show you the inferred
value of the qualifier variable and the shortest path on which it was
inferred to have its value.  For example, if you click on $*t$, you
should see the following result:
\begin{verbatim}
*t:  $tainted $untainted

$tainted <= *getenv_ret
         <= *s
         <= *t
         <= *printf_arg1
         <= $untainted
\end{verbatim}
The first line tells us that $*t$ is both \tainted{} and
\untainted{}, an error.  The remaining lines show us an erroneous
path.  We see that $*t$ was tainted from $*s$, which was tainted
from the return type of \texttt{getenv}.  We also see that the error
arises because $*t$ taints the parameter to \texttt{printf}, which
must be untainted.

Middle-clicking on a \texttt{<=} will jump to the source location
where that constraint was generated.  Middle-clicking an a qualifier we
compute the shortest path by which that qualifier was inferred to have
its value.  And shift-middle-clicking on a qualifier will jump to the
source location where the identifier corresponding to that qualifier
was defined.

You can also run \cqual{} on the command line.  If you do so with the
appropriate configuration arguments then \cqual{} will generate the
same error messages, but you will be unable to interactively explore
the analysis results:

\begin{verbatim}
  bash-2.05a$ src/cqual -config config/lattice examples/taint1.c
  Analyzing examples/taint1.c
  examples/taint1.c:1 ``getenv'' used but not defined
  examples/taint1.c:2 ``printf'' used but not defined
  examples/taint1.c:7 incompatible types in assignment
  *s: $tainted $untainted 
  examples/taint1.c:1      $tainted <= *getenv_ret
  examples/taint1.c:7               <= *s
  examples/taint1.c:8               <= *t
  examples/taint1.c:9               <= *printf_arg1
  examples/taint1.c:2               <= $untainted
\end{verbatim}%$
Cqual also lists the globals that are used but not defined; see
Section~\ref{sec:multiple-files}.

\cqual{} comes with a standard \textit{prelude file} that contains
declarations of standard-library functions that have been annotated
with \tainted{} and \untainted{}.  See Section~\ref{sec:prelude} for a
discussion of prelude files, and Section~\ref{sec:change_analysis} for
instructions on how to invoke \cqual{} in \pam{} mode with the
standard prelude file.

\subsection{Another Example:  Structural Qualifiers}

\label{sec:example-structural}

In the previous example, we saw how format-string vulnerabilities can
occur when a program uses untrusted data in certain positions, namely
as format strings.  Strings are particularly simple, because they're
flat sequences of characters.  In programs that have trusted and
untrusted data structures and pointers, the rules are more
complicated, and we need to extend our type qualifiers with additional
\textit{structural} constraints.  As an example of such a system,
consider the Linux operating system kernel, which copies data
structures between user space (which we will annotate with \user{} and
consider untrusted) and kernel space (which we will annotate with
\kernel{} and consider trusted).

The following program (available as \texttt{examples/user0.c}) has a
user/kernel pointer bug, meaning that data copied from user space is
improperly trusted, and hence a malicious local user could breach
security:

\begin{verbatim}
unsigned long copy_from_user(void $user * $kernel to, void * $user from, unsigned long n);
$$a _op_deref ($$a *$kernel x);

struct msg {
  char *buf;
};

void dev_ioctl(long arg)
{
  struct msg m;
  char c;

  copy_from_user(&m, (void*)arg, sizeof (m));
  c = m.buf[0];
}
\end{verbatim}

After the call to \texttt{copy\_from\_user}, the contents of \texttt{m}
are under user control.  Thus \texttt{m.buf} is under user control,
and hence should be sanity checked before being dereferenced.  The
statement ``\texttt{c = m.buf[0];}'' unsafely dereferences
\texttt{m.buf}, and hence is an error.  The annotations capture these
rules:
\begin{itemize}
\item The annotation ``\texttt{void \$user * \$kernel to}'' indicates that
the first argument to \texttt{copy\_from\_user} must be a kernel
pointer, but that its contents are under user control.
\item The annotation ``\texttt{\$\$a \_op\_deref (\$\$a *\$kernel x);}''
declares that only kernel pointers can be dereferenced.  For more on
this annotation, see Section~\ref{sec:operators}.
\end{itemize}
Moreover, once we infer that \texttt{m} has the \user{} qualifier, we
also infer, via structural constraints, that its \texttt{m.buf} must 
also be annotated with \user{}.  For \user{} and \kernel{}, \cqual{} 
enforces \textit{structural} rules
like, ``If a structure is \user{}, then so are all of its fields,''
and ``A \user{} pointer can only point to \user{} data.''  For more on
structural constraints see Section~\ref{sec:structural}.

For more on checking for user/kernel pointer errors in the Linux
kernel, see \texttt{KERNEL-QUICKSTART}.

\subsection{A Flow-Sensitive Example}

The qualifiers \tainted{} and \untainted{} are
\textit{flow-insensitive},  meaning that a variable's taintedness does
not change during program execution.  I.e., if \texttt{x} is inferred
to be \tainted{}, then it is \tainted{} everywhere.

Sometimes this flow-insensitive restriction makes it difficult to
apply type qualifiers to certain checking problems.  For example, if
we want to use qualifiers to keep track of state changes, then we need
\textit{flow-sensitivity}, i.e., we need qualifiers that can change as
the state changes.  For example, consider the following program, which
can be found in \texttt{examples/lock.c}:
\begin{verbatim}
typedef int lock_t;

lock_t lock;

int main(void)
{
  lock = ($unlocked lock_t) 0;
  lock;
  lock = ($locked lock_t) 1;
  lock;
  lock = ($unlocked lock_t) 0;
  lock;
}
\end{verbatim}%$
In this case, we want to use qualifiers \locked{} and \unlocked{} to
keep track of whether this thread last left \texttt{lock} in the
locked or unlocked state.

In order to analyze this example, we need to tell \cqual{} that
\locked{} and \unlocked{} should be modeled flow-sensitively.  That's
already taken care of in the default configuration files, so to try
out this example type \texttt{M-x cqual} within \emacs{}, press
return, and then enter the file name \texttt{examples/lock.c} and
press return again.

As before, \cqual{} analyzes the program.  This time there are no type
errors.  If you click on the file name, \cqual{} will display the
source code colored according to the inferred qualifiers.  In this
case, \cqual{} colors \texttt{lock} green wherever it is unlocked, and
red wherever it is locked.

If you click on the various occurrences of \texttt{lock}, you can see
its type and its qualifiers.  Notice that name of the qualifier
\texttt{lock} points to changes after an assignment.  Initially it is
$\textit{lock}$, then it is $\textit{lock}@0$, and so on.

\textbf{Warning:} This version of cqual has been enhanced with support
for polymorphic recursion, gated qualifier edges for structural
constraints, and better handling of type casts.  These features,
however, are not supported by the flow-sensitive type qualifier
inference system, and as a result flow-sensitive \cqual{} may fail on
some programs.  For the time being, we recommend using version 0.98 of
\cqual{} for flow-sensitive analysis.

\subsection{$l$-values and $r$-values}

\label{sec:l_value}

In C there is an important distinction between $l$-values, which
correspond to memory locations, and $r$-values, which are ordinary
values like integers.  In the C type system, $l$-values and $r$-values
are given the same type.  For example, consider the following code:
\centertt{
\begin{tabular}{l}
int x; \\
x = ...; \\
... = x;
\end{tabular}}
The first line declares that \texttt{x} is a location containing an
integer.  On the second line \texttt{x} is used as an $l$-value: it
appears on the left-hand side of an assignment, meaning that the
location corresponding to \texttt{x} should be updated.  On the third
line \texttt{x} is used as an $r$-value.  Here when we use \texttt{x}
as an $r$-value we are not referring to the location \texttt{x}, but
to \texttt{x}'s contents.  In the C type system, \texttt{x} is given
the type \texttt{int} in both places, and the syntax distinguishes
integers that are $l$-values from integers that are $r$-values.

\cqual{} uses a slightly different approach in which the types
distinguish $l$-values and $r$-values.  In \cqual{}, \texttt{x} is
given the type \texttt{ptr(int)}, meaning that the name \texttt{x} is
a location containing an integer.  When \texttt{x} is used as an
$l$-value its type stays the same---in \cqual{}, the left-hand side of
an assignment is always a \texttt{ptr} type.  When \texttt{x} is used
as an $r$-value the outermost \texttt{ptr} is removed, i.e.,
\texttt{x} as an $r$-value has the type \texttt{int}.  \cqual{} is
implemented in this way both because it makes the implementation
cleaner in a number of ways and because it makes \const{} easier to
understand \cite{foster:pldi99}.

In more concrete terms, if you click on an identifier \texttt{a} that
can be used as an $l$-value you will see \texttt{a}'s type as an
$l$-value, i.e., with an extra \texttt{ptr} at the top-level.  For
most purposes you can safely ignore this extra level of indirection.

\section{Type Qualifiers}

\label{sec:quals}

\cqual{} is a type-based analysis tool.  As described above, to use
\cqual{} the programmer annotates their program with extra type
qualifiers.  \cqual{} type checks the program and warns the programmer
about any inconsistent type qualifier annotations, which indicate
potential bugs.

In the rest of this section we discuss what type qualifiers are and
how \cqual{} checks for inconsistent qualifier annotations.
Section~\ref{sec:c} describes how \cqual{} applies these ideas to C.

\subsection{Qualifiers and Subtyping}

\cqual{} extends the type system of C to work over \textit{qualified
types}, which are the combination of some number of type qualifiers
with a standard C type.  We allow type qualifiers to appear on every
level of a type.  Here are some examples of qualified types:
\begin{center}
\begin{tabular}{ll}
$\inttype{}$ & Integer \\
$\locked{}\;\texttt{lock\_t}$ & Acquired lock \\
$\ptrtype{\untainted{}\;\chartype{}}$ & Pointer to untainted character \\
%$\qptrtype{\untainted{}}{\chartype{}}$ & Untainted pointer to character
$\qptrtype{\user{}}{\chartype{}}$ & User-level pointer to character
\end{tabular}
\end{center}

In general, the rules for checking that type qualifiers are valid can
be arbitrary, and indeed, the source code of \cqual{} can be modified
to support qualifiers with arbitrary meanings.  The key insight behind
\cqual{}, however, is that many kinds of type qualifiers naturally
induce a \textit{subtyping} relationship on qualified types.  The
notion of subtyping most commonly appears in object-oriented
programming.  In Java, for example, if $B$ is a subclass of $A$ (which
we will write $B<A$), then an object of class $B$ can be used wherever
an object of class $A$ is expected.

For example, consider the following program, which uses the \tainted{}
and \untainted{} qualifiers introduced above:
\centertt{
\begin{tabular}{l}
void f(\tainted{} int); \\
\untainted{} int a; \\
f(a); \\
\end{tabular}}
In this program, \texttt{f}, which expects tainted data, is passed
untainted data.  This program should type check.  Intuitively, if a
function can accept tainted data (presumably by doing more checks on
its input), then it can certainly accept untainted data.

Now consider another program:
\centertt{
\begin{tabular}{l}
void g(\untainted{} int); \\
\tainted{} int b; \\
g(b); \\
\end{tabular}}
In this program, \texttt{g} is declared to take an
\texttt{\untainted{} int} as input.  Then \texttt{g} is called with a
\texttt{\tainted{} int} as a parameter.  This program should fail to
type check, since tainted data is being passed to a function that
expects untainted data.

Putting these two examples together, we have the following subtyping
relation:
\[
\untainted{}\;\inttype{} < \tainted{}\;\inttype{}
\]
As in object-oriented programming, if $T_1 \leq T_2$ (read $T_1$ is a
subtype of $T_2$), then $T_1$ can be used wherever $T_2$ is expected,
but not vice-versa.  We write $T_1 < T_2$ if $T_1 \leq T_2$ and $T_1
\neq T_2$.

On the other hand, consider \locked{} and \unlocked{}.  It is an error
for a lock to be in both the \locked{} and \unlocked{} state, so these
qualifiers are in the discrete partial order: Neither
$\locked{}\not<\unlocked{}$ nor $\unlocked{}\not<\locked{}$.
(Alternately, we could add a third qualifiers $\top{}$ and have
$\locked{}<\top$ and $\unlocked<\top$.)

\subsection{Qualified Types}

\label{sec:qtypes}

\cqual{} needs to know not only how integer types with qualifiers
relate but also how qualifiers affect pointer types,
pointer-to-pointer types, function types, and so on.  Fortunately,
well-known results on subtyping tell us how to extend the subtyping on
integers to other data types.

The programmer supplies \cqual{} with a configuration file describing
a partial order of type qualifiers (see Section~\ref{sec:po_file} for
the file format).  Right now cqual supports any partial order that is
a lattice (a lattice is a partial order where for each pair of
elements $x$ and $y$, the least upper bound and greatest lower bound
of $x$ and $y$ both always exist.  For example, the qualifiers
\tainted{} and \untainted{} with the partial order $\untainted{} <
\tainted{}$ form a lattice.)  \cqual{} also supports the discrete
partial orders, and any of the three-point partial orders.  Other
partial orders may or may not work correctly.

Given the partial order configuration file, \cqual{} extends the
partial order on qualifiers to a subtyping relation on qualified
types.  We have already seen one of the subtyping rules:
\[
\inference{q_1 \leq q_2}
          {\qinttype{q_1} \leq \qinttype{q_2}}
\]
This is a natural-deduction style inference rule, read as follows: If
$q_1 \leq q_2$ in the partial order ($q_1$ and $q_2$ are qualifiers),
then $\qinttype{q_1}$ is a subtype of $\qinttype{q_2}$ (note the
overloading of $\leq$).  For our example, it means that
$\qinttype{\untainted{}} \leq \qinttype{\tainted{}}$.  The same kind
of rule applies to any primitive type (\texttt{char}, \texttt{double},
etc.).

For pointer types, we need to be a little careful.  Naively, we might
expect to use the following rule for pointers:
\[
\inference[(Wrong)]{q_1 \leq q_2 & \tau_1 \leq \tau_2}
                 {\qptrtype{q_1}{\tau_1} \leq \qptrtype{q_2}{\tau_2}}
\]
Here the type $\qptrtype{q_1}{\tau_1}$ is a pointer to type $\tau_1$,
and the pointer is qualified with $q_1$.  Unfortunately, this turns
out to be unsound, as illustrated by the following code fragment:
\begin{verbatim}
  tainted char *t;
  untainted char *u;

  t = u;                /* Allowed by (Wrong) */
  *t = <tainted data>;  /* tainted data written into untainted *u */
\end{verbatim}
According to (Wrong), the first assignment \texttt{t = u} typechecks,
because $\ptrtype{\qchartype{\untainted}}$ is a subtype of
$\ptrtype{\qchartype{\tainted}}$.  But then after the assignment
\texttt{*t} is an alias of \texttt{*u}, yet they have different types.
Therefore we can store \tainted{} data into \texttt{*u} by going
through \texttt{*t}, even though \texttt{*u} is supposed to be
untainted.

This is a well-known problem, and the standard solution, which is
followed by \cqual{}, is to use the following rule:
\[
\inference{q_1 \leq q_2 & \tau_1 = \tau_2}
          {\qptrtype{q_1}{\tau_1} \leq \qptrtype{q_2}{\tau_2}}
\]
Here we require $\tau_1 = \tau_2$, which intuitively means that any
two objects that may be aliased must be given exactly the same type.
In particular, if $\tau_1$ and $\tau_2$ are decorated with qualifiers,
the qualifiers must themselves match exactly, too.  This equality,
while sound, is sometimes too conservative in practice.
Section~\ref{sec:deep-const} describes how \const{} can be used to
weaken the equality to an inequality.

For function types, we use the following standard rule:
\[
\inference{q \leq q' &
           \tau'_1 \leq \tau_1 \quad\cdots\quad \tau'_n \leq \tau_n &
           \tau \leq \tau'}
          {\qfuntype{q}{\tau_1, \ldots, \tau_n}{\tau} \leq
                \qfuntype{q'}{\tau'_1, \ldots, \tau'_n}{\tau'}}
\]
Here the type $\qfuntype{q}{\tau_1,\ldots,\tau_n}{\tau}$ is a function,
qualified by $q$, with argument types $\tau_1$ through $\tau_n$ and
result type $\tau$.

\subsection{Structural Qualifiers}

\label{sec:structural}

In Section ~\ref{sec:example-structural}, we saw how some qualifiers
come with well-formedness conditions on the types they decorate.  We
write $\wf \tau$ to mean that type $\tau$ is well-formed; \cqual{}
requires that all types in the program be well-formed.  Each qualifier
partial order comes with a set of well-formedness conditions.  For
example, a partial order may specify that its qualifiers flow from
outer to inner pointer constructors or vice-versa.  Formally, \cqual{}
enforces the rule
\[
\inference{\wf \tau &
           \left\{
             {\begin{array}{rl}
               c \leq q \rightarrow c \leq q' &
               \hbox{if $c$'s p.o. flows down pointers} \\
               q \leq c \rightarrow q' \leq c &
               \hbox{if $c$'s p.o. flows down pointers} \\
               c \leq q' \rightarrow c \leq q &
               \hbox{if $c$'s p.o. flows up pointers} \\
               q' \leq c \rightarrow q \leq c &
               \hbox{if $c$'s p.o. flows up pointers}
             \end{array}}
           \right.
         }
         {\wf \qptrtype{q}{q'\;\tau}}
\]
This rule states that a pointer-to-pointer-to-$\tau$ type is
well-formed if type $\tau$ is well-formed.  Additionally, for any
qualifier $c$ of a partial order whose qualifiers flow ``down''
pointers, constraints on the outer pointer type propagate to the
inner pointer type, and similarly for the case when qualifiers flow
``up'' pointers.  For example, the \user{} qualifier from
Section~\ref{sec:example-structural} flows down pointers.

\cqual{} also provides support for two other well-formedness
conditions.  In a similar manner as above, the user may specify that
qualifiers propagate from pointers-to-aggregates (structures and
unions) to the corresponding pointers-to-fields or vice-versa, and
separately, the user may specify that qualifiers propagate from
aggregates (structures and unions) to their fields or vice-versa.

\subsection{Qualifier Inference}

\label{sec:inference}

Given the partial order configuration file, \cqual{} extends the
qualifier partial order to a subtyping relation among qualified types
as described above.  The next problem is to determine whether a
program is type correct or not, i.e., whether the qualifier
annotations are valid.

\cqual{} checks a program's correctness by performing
\textit{qualifier inference}.  Rather than requiring the programmer to
specify type qualifiers on every type in the program, using \cqual{}
the programmer can sprinkle a few qualifier annotations through the
program, and \cqual{} will infer the remaining qualifiers.  It is this
qualifier inference process that makes \cqual{} easy to use.

\cqual{} begins by adding fresh qualifier variables to every level of
every type in the program.  A qualifier variable stands for an unknown
qualifier.  For any explicit qualifier annotations in the program,
\cqual{} generates the appropriate constraint on the corresponding
qualifier variable (see Section~\ref{sec:po_file}).  Next \cqual{}
walks over the program and generates constraints between qualified
types.  For example, for an assignment \texttt{x = y}, \cqual{}
generates the constraint that the type of \texttt{y} is a subtype of
the type of \texttt{x}.  For a function call \texttt{f(x)}, \cqual{}
generates the constraint that the type of \texttt{x} is a subtype of
the type of the formal parameter of \texttt{f}.

Applying the subtyping rules from above, these constraints between
types yield constraints between qualifiers (variables and partial
order elements).  More formally, we are left with a set of constraints
of the form $q_1 \leq q_2$, where each $q_i$ is either a qualifier
variable or a partial order element.

\subsection{Flow-Sensitive Type Qualifiers}

\label{sec:flow-sensitive}

(If you are only interested in flow-insensitive type qualifiers, such
as used in the tainted/untainted analysis or the user/kernel analysis,
you may safely skip this section.)

The qualifier system described so far is \textit{flow-insensitive}.
For example, if we declare \texttt{x} to be an integer, then the
contents of \texttt{x} is assigned a single type $\qinttype{q}$ for
the whole program execution.  For example, in
\centertt{
\begin{tabular}{l}
/* x\hbox{ has type }$\qinttype{q}$ */ \\
x = ...; \\
/* x\hbox{ still has type }$\qinttype{q}$ */
\end{tabular}
} the contents of \texttt{x} (here we are ignoring the
$l$-value/$r$-value distinction) has the same qualifier $q$ before and
after the assignment.  For checking some properties, such as keeping
track of the state of locks, we need \textit{flow-sensitive} type
qualifiers.

\cqual{} supports flow-sensitive type qualifier inference, as
described in \cite{foster:pldi02}.  Each qualifier partial order may
either be flow-insensitive, the default, or it may be flow-sensitive,
as declared in the partial order configuration file
(Section~\ref{sec:po_file}).

The flow-sensitive analysis consists of two separate passes over the
source code.  In the first pass, \cqual{} performs flow-insensitive
alias analysis and effect inference.  This pass is done at the same
time as flow-insensitive qualifier inference.  In the second pass,
\cqual{} uses the results of the first pass to help perform
flow-sensitive qualifier inference.

During flow-sensitive analysis, qualifiers on variables may change
after an assignment:
\centertt{
\begin{tabular}{l}
/* x\hbox{ has type }$\qinttype{q}$ */ \\
x = ...; \\
/* x\hbox{ now has type }$\qinttype{q'}$ */
\end{tabular}
}

\subsubsection{Aliasing}

Not every assignment in C is a simple variable assignment of the form
shown above---updates can also occur indirectly, through pointers.
\cqual{} performs a unification-based, flow-insensitive alias analysis
to compute an approximation to the aliasing behavior of the program.
The alias analysis computes, for each pointer-valued expression $e$ in
the program, the set of \textit{locations} (either stack variables or
heap memory) to which $e$ may point.  The basic rule of the alias
analysis is that given an assignment between pointers \texttt{x = y},
we unify (equate) the locations to which \texttt{x} and \texttt{y} can
point.  Formally, if \texttt{x} points to location $\rho_x$ and
\texttt{y} points to location $\rho_y$, then upon seeing the
assignment \texttt{x = y} we require $\rho_x = \rho_y$.

By using the results of alias analysis, \cqual{} can track the effect
of indirect updates, e.g.,
\centertt{
\begin{tabular}{l}
y = \&x; \\
/* x\hbox{ has type }$\qinttype{q}$ */ \\[0pt]
*y = ...; \\
/* x\hbox{ now has type }$\qinttype{q'}$ */
\end{tabular}
}

Because the alias analysis is flow-insensitive, sometimes it will
produce unexpected results.  For example, the analysis will assume
that \texttt{y} points to $\rho_x$, the location of \texttt{x}, no
matter where the assignment \texttt{y = \&x} actually occurs.  The
alias analysis does not track null pointers, and hence does not check
for null pointer dereference statically.

In the above example, \texttt{y} pointed to exactly one location
$\rho_x$.  In this case we say that $\rho_x$ is \textit{linear}, and
the flow-sensitive analysis allows \textit{strong updates} on $\rho_x$
(at assignments to $\rho_x$ it is given a new qualifier).

But what if the alias analysis determines that \texttt{y} may point to
more than one location, say both \texttt{x} and \texttt{z}?  Then the
alias analysis will say that \texttt{y} points to $\rho_x$ where
$\rho_x = \rho_z$ ($\rho_x$ is the location of \texttt{x} and $\rho_z$
is the location of \texttt{z}).  In this case, we say that $\rho_x$ is
non-linear, because it may represent more than one location.

Then at the assignment \texttt{*y = ...} we can't distinguish which
location we are updating.  Thus the flow-sensitive inference gives
$\rho_x$, which stands for both \texttt{x} and \texttt{z}, the type
$\qinttype{q''}$ after the assignment with constraints $q\leq q''$ and
$q'\leq q''$.  Intuitively, this means that the qualifier of location
$\rho_x$ is either $q$ or $q'$.  This is called a \textit{weak
  update}.

\subsubsection{Restrict}

\label{sec:restrict}

Clearly weak updates can cause the analysis to lose precision.
\cqual{} supports two language constructs to expose the alias analysis
to programmer control.  The idea behind both constructs is to
introduce a lexical scope in which a non-linear location can locally
be treated as linear.

In an idealized syntax, the \restrict{} construct has the form
\centertt{ restrict x = e1 in e2 } 
In our C notation, this construct will be written using the ANSI C
qualifier \restrict{} \cite{c99}:
\begin{verbatim}
  {
    T *const restrict x = e1;
    e2;
  }
\end{verbatim}
(The \texttt{const} needs to be there so that \texttt{x} is not
modified within the scope of the declaration.)

In this construct, \texttt{e1} is a pointer to some location $\rho$.
The name \texttt{x}, which can be used during evaluation of
\texttt{e2}, is initialized to \texttt{e1}, but it is given a fresh
location $\rho_x$.  At the beginning and end of \restrict{}, $\rho$
and $\rho_x$ have the same type.

The key property of \restrict{} is that within \texttt{e2}, the
location $\rho_x$ may be accessed, but the location $\rho$ may not.
Outside of \texttt{e2} the reverse is true: $\rho$ may be accessed,
but $\rho_x$ may not.  This property is enforced automatically by
\cqual{}.

Since the accesses within \texttt{e2} go through location $\rho_x$,
notice that the flow-sensitive qualifier inference may be able to
treat $\rho_x$ as a linear location even if $\rho$ is non-linear.  When
the scope of \texttt{e2} ends the analysis may need to weakly
update $\rho$.

For example, suppose we want to lock and then unlock a single array
element:
\begin{verbatim}
  spin_lock(&foo[i].lock);
  ...
  spin_unlock(&foo[i].lock);
\end{verbatim}
Then because the alias analysis does not distinguish array elements,
both the lock acquire and release will be weak updates, and the
analysis will conclude that \texttt{foo[i].lock} is both locked and
unlocked, which is an error in the supplied partial order
configuration file.

But we can use \restrict{} to introduce a new name, and hence a new
location, for \texttt{foo[i].lock}:
\begin{verbatim}
  {
    spinlock_t *const restrict l = &foo[i].lock;
    spin_lock(l);
    ...
    spin_unlock(l);
  }
\end{verbatim}
Assuming no other array elements are used within this scope,
\texttt{l} can be strongly updated to first be locked and then be
unlocked.  When the scope ends, the analysis will do a weak update
from the final state of \texttt{l} (unlocked) to the state of the
array.

The name \restrict{} is deliberately chosen to correspond to the ANSI
C qualifier; see \cite{foster:tr01_restrict} for a discussion.

\subsubsection{Confine}

While \restrict{} can be used to locally recover strong updates,
sometimes it is inconvenient, as it requires the programmer to come up
with a new name.  \cqual{} also includes a construct \confine{} that
allows expressions to be restricted without introducing a new name.
The syntax is \centertt{confine (e1) s2} Here \texttt{e1} is an
expression that occurs within statement \texttt{s2}.  As with
\restrict{}, the expression \texttt{e1} must evaluate to a pointer to
some location $\rho$.  Within \texttt{s2}, the analysis treats
occurrences of \texttt{e1} as pointing to a fresh location $\rho'$.
As before, location $\rho$ may only be accessed outside of
\texttt{s2}, and location $\rho'$ may only be accessed within
\texttt{s2}.  At the beginning and end of \confine{}, $\rho$ and
$\rho'$ have the same type.

The key to making this sound is that \texttt{e1} must not contain any
side-effects, and the value of \texttt{e1} must not change during
evaluation of \texttt{s2}.  As before this is checked automatically by
\cqual{}.

Going back to the last example in the previous section, with
\confine{} we can more conveniently annotate the program as
\begin{verbatim}
  confine (&foo[i].lock) {
    spin_lock(&foo[i].lock);
    ...
    spin_unlock(&foo[i].lock);
  }
\end{verbatim}
In this case \cqual{} will check that neither \texttt{foo} nor
\texttt{i} changes during the evaluation of the \texttt{...}'s, and
that none of the other aliases of \texttt{foo[i].lock} is changed in
the scope of \texttt{confine}.

For further explanation of \texttt{restrict} and \texttt{confine}, see
\cite{foster:pldi02, aiken:pldi03}.  As described in
\cite{aiken:pldi03}, \cqual{} also supports automatic inference for
\texttt{restrict} and \texttt{confine}, but only experimentally, and
not currently as part of the regular system.

\subsection{Browsing Qualifier Inference Results with PAM}

\cqual{} represents constraints between qualifiers as a directed graph
whose nodes are qualifier variables and partial order elements.  For
each constraint $q_1 \leq q_2$, there is an edge from $q_1$ to $q_2$
in the graph.  After all the constraints have been generated, \cqual{}
solves the constraints and, if there is an error, performs a second
pass over the constraint graph to isolate the most useful error
messages to display to the user.  For example, if \cqual{} ever
generates the constraint $\tainted{} \leq \untainted{}$ then it will
signal an error.

If you run \cqual{} with \pam{}, then once \cqual{} has completed
qualifier inference you will be able to browse the inference results.
There are a few important things to know about this browsing
interface.

When displaying a source program, \cqual{} colors each identifier
according to its inferred qualifiers.  Currently, \cqual{} colors an
identifier \texttt{x} by computing the colors of all partial order
elements reachable in the constraint graph from any of \texttt{x}'s
qualifier variables.  If there is one such color, \cqual{} uses it to
color \texttt{x}.  If there is more than one such color, \cqual{}
colors \texttt{x} purple.  \cqual{} colors individual qualifiers
similarly.  When computing colors, \cqual{} does not include the
qualifiers on fields of structures and unions, or on argument or
result types of functions.

When you click on a qualifier variable $q$, \cqual{} tries to show you
how $q$'s color was inferred.  If no partial order elements are
reachable from $q$ in the constraint graph, \cqual{} prints \texttt{No
  qualifiers}.  Otherwise \cqual{} displays the shortest path from any
partial order element to $q$, and from $q$ to any partial order
element.

The shortest path algorithm really works best with lattices, and it
should also work with the discrete partial orders.  Your luck with
other partial orders may vary.

\section{Applying Type Qualifiers to C}

\label{sec:c}

\subsection{Names}

As described in Section~\ref{sec:inference}, \cqual{} introduces
qualifier variables at every position in a type.

Qualifier variables are named after the corresponding program
variable.  For an identifier \texttt{x}, the outermost qualifier on
\texttt{x}'s type is given the name \textit{x}.  The names of
qualifiers on nested \texttt{ptr} types are constructed by appending
$'$ to the name of the qualifier from the outer type.  For example,
given the declaration \texttt{char *x}, the $l$-value \texttt{x} is
given the type $\qptrtype{\&x}{\qptrtype{x}{\qchartype{*x}}}$.

The $i^{\hbox{th}}$ argument (starting with one) of function
\texttt{f} has associated qualifier variable \textit{f\_arg}i, and the
return value of function \texttt{f} has qualifier variable
\textit{f\_ret}.

When parsing a C program, \cqual{} assumes that any identifier
beginning with a dollar sign (\texttt{\$}) is a type qualifier (e.g.,
\tainted{}, \untainted{}).  Constant qualifiers appearing in a program
must be declared in the partial order configuration file
(Section~\ref{sec:po_file}).  Qualifier variables are not normally
added to the program explicitly, except in the case of polymorphism
(Section~\ref{sec:polymorphism}).

\subsection{Source Code Considerations}

\cqual{} accepts standard pre-processed source code and performs most
C type checking.  Currently error messages from the parser and
standard C type checker are not displayed in \pam{} mode.  If you wish
to view the parser error messages in \pam{} mode, switch to the
\texttt{*pam-results-buf*} buffer.

\subsubsection{Multiple Files}

\label{sec:multiple-files}

\cqual{} can analyze single files at a time or whole programs at once.
Recall that \cqual{} assigns fresh qualifier variables to every level
of every type in the program.  In particular, if a function \texttt{f}
is declared with no explicit type qualifiers and is not defined
anywhere, \cqual{} assumes that the body of \texttt{f} places no
constraints on \texttt{f}'s type qualifiers.

Thus, in general, it is best to run \cqual{} on a whole program rather
than on individual files, unless you are careful to fully annotate the
types of every declared function.  For example, suppose we have two
source files \texttt{file1.c} and \texttt{file2.c}: \centertt{
\begin{tabular}{|ll|ll|}
\hline
file1: & char *foo(void);               & file2: & char *foo(void) $\{$ \\
       & void bar(void) $\{$            &        & \hspace*{1em}\$tainted char *t; \\
       & \hspace*{1em} char *s = foo(); &        & \hspace*{1em}return t;\\
       & \hspace*{1em} ...              &        & $\}$ \\
       & $\}$                           &        & \\ \hline
\end{tabular}}
Because \texttt{foo}'s type has no explicit qualifiers, we will only
discover that \texttt{s} is tainted if we analyze both files
together.

In order to help avoid this problem, \cqual{} generates a list of
functions that are declared but not defined.  In \pam{} mode this list
is available by clicking on \texttt{Undefined Globals}.  If a function
is declared in a prelude file (Section~\ref{sec:prelude}) then is it
not added to the undefined globals list.

To specify multiple input files to \cqual{}, simply list them on the
command line.  When invoking \cqual{} in \pam{} mode you can only
enter one file.  In this case \cqual{} will expand the input file name
using \texttt{glob}, which allows you to specify a set of input files
using wildcards (for example, you can analyze \texttt{foo/*.c}).

\subsubsection{Pre-Processed Source}

\cqual{} is designed to run on pre-processed source code.  If you
invoke \cqual{} from the command line then you can use standard
pre-processed source from \texttt{gcc -E}.  The standard output from
gcc -E can be processed by cqual, and the line numbers of error
messages will be correct with respect to the original,
non-pre-processed source code.  PAM mode currently ignores
\texttt{\#line} directives in the preprocessed source.  Thus, when
using \pam, you will be browsing the pre-processed source and the
line numbers of errors will, of course, be the line numbers of the
pre-processed code.  In practice this is rarely a problem.

\ignore{
The easiest way remove \texttt{\#line} directives is to edit the
Makefile of your program to invoke the \texttt{bin/gccpreproc} script
to compile your program instead of \texttt{gcc}.  You can usually do
this by looking for a line \texttt{CC = ...} in your Makefile and
replacing it with \centertt{ CC =
  \textit{$\langle$path-to-cqual$\rangle$}/bin/gccpreproc } If you need to compile
your source with a compiler other than \texttt{gcc}, edit the last
line of \texttt{gccpreproc} appropriately.  After you run
\texttt{make} with \texttt{CC = gccpreproc}, you should be left with a
set of \texttt{.i} files containing the preprocessed source for your
program.

Alternately, if \texttt{gccpreproc} doesn't work, you can modify your
Makefile to add a rule for compiling a \texttt{.c} file into a
\texttt{.o} file, in the process saving the pre-processed output in a
\texttt{.i} file:
\begin{verbatim}
   .c.o:
           $(CC) -E $< | remblanks > $*.ii
           perl remquals < $*.ii > $*.i
           $(CC) $(CFLAGS) -c -o $*.o $*.i
           mv -f $*.ii $*.i
\end{verbatim}%$
}

The wrapper script, \texttt{gcqual}, makes cqual behave more like a
regular compiler: it accepts non-pre-processed source files,
preprocesses them, and calls cqual.  It is the preferred interface to
cqual.  For more info, see Section~\ref{sec:gcqual}.  The program
\texttt{remblanks}, provided in the \texttt{bin} directory, strips out
all \texttt{\#line} directives.  The perl script \texttt{remquals},
also provided in the \texttt{bin} directory, strips out all
identifiers beginning with a dollar sign, i.e., anything that might be
a type qualifier.

\subsubsection{Flow-Sensitivity}

In \cqual{}, a set of qualifiers forming a partial order can be
declared to be flow-sensitive in the partial order configuration file
(Section~\ref{sec:po_file}).  Flow-sensitive analysis is an additional
step, so to enable flow-sensitive analysis you also need to run
\cqual{} with the \texttt{-fflow-sensitive} option.  If you do not
need flow-sensitivity, you should probably not use
\texttt{-fflow-sensitive} and you should probably comment \restrict{}
out of your partial order configuration file, because having either of
these will cause \cqual{} to consume more resources.

\cqual{}'s alias analysis is based on the C types, hence casts can
introduce unsoundness into the alias analysis.  E.g., given an
assignment \texttt{x = (void *) y}, we do not assume that \texttt{x}
and \texttt{y} point to the same location.  As with qualifiers, the
locations of structure fields are shared across instances of the same
struct (Section~\ref{sec:structures}).  Thus if you cast a pointer to
a structure to \texttt{void *} and then back to its original type, the
locations of the fields, and their qualifiers, will be preserved.

\cqual{} adds two extra forms to C to make flow-sensitive type
annotations a bit easier:

\begin{itemize}
\item \texttt{change\_type(e, T);} is a statement that updates the type
  of $l$-value \texttt{e} to have type \texttt{T}.  This statement is
  equivalent to the assignment \texttt{e =
    \textit{$\langle$something-of-type-T$\rangle$};}, except that you don't need to
  come up with an expression for the right-hand side, only the type.
\item \texttt{assert\_type(e, T);} is a statement that checks whether
  the $r$-value of \texttt{e} has type \texttt{T}.  Alternately,
  instead of using \texttt{assert\_type} you can declare a variable
  \texttt{x} to have type \texttt{T} and try to initialize it to
  \texttt{e}.
\end{itemize}

The flow-sensitive analysis is monomorphic, hence any polymorphic
qualifier declarations are ignored during flow-sensitive analysis.
The \texttt{-fcasts-preserve} flag also is not implemented for
flow-sensitive analysis.

\subsubsection{Type Casts}

\label{sec:casts}

By default \cqual{} tries to keep track of qualifiers even in the
presence of type casts.  For example, consider the following program:
\centertt{
\begin{tabular}{l}
\$tainted char **s; \\
void *t; \\
\\
t = (void *) s;
\end{tabular}}
When \texttt{s} is used as an $r$-value at the cast, it is used with
the type $\qptrtype{s}{\qptrtype{*\!\!s}{\qchartype{*\!\!*s}}}$.  The
$r$-value of \texttt{t} has the type $\qptrtype{t}{\qvoidtype{*\!t}}$.
Because of the typecast, normally \cqual{} will generate the
constraint $s \leq t$ but it will generate no constraints between
deeper qualifiers of \texttt{s} and \texttt{t}.

In some cases, however, qualifiers should propagate through type
casts.  Each qualifier partial order can optionally be marked as being
\textit{preserved through casts}.  (\textit{Note: This option is
  unavailable with flow-sensitive qualifiers.})  If \cqual{} analyzes
the above program with \tainted{}'s partial order $Q$ marked as
cast-preserved, then \cqual{} will generate the constraints $*s \cap
Q = *\!\!*s \cap Q = *t \cap Q$.  In other words, the \tainted{}
qualifier and all other qualifiers in $Q$ will flow from $*\!\!*\!s$ to
$*s$ and to $*t$.  \cqual{} does not preserve qualifiers on
structure fields, function arguments, or function result types at
casts even with a cast-preserved qualifier, because this introduces
too much imprecision.

Currently each structure or union with a distinct tag that is cast to
a \textit{void} or other primitive type is associated separately.
(Primitive types interact with pointers just like they had type
\texttt{void *}.) In other words, if both \texttt{struct foo *a} and
\texttt{struct bar *b} are stored in \texttt{void *c}, then no
constraints between \texttt{a} and \texttt{b}'s fields are generated,
even if the two structures are related in some way.

In order to provide an escape mechanism from this behavior, if you
cast to a type containing a qualifier $c$, then other qualifiers in
$c$'s partial order will not propagate through that cast even if the
partial order is cast-preserved.  It is highly recommended that you do
not enable \texttt{casts-preserve} for \const{}, since many C programs
will fail to type check if \const{} propagates through casts.

\paragraph{Warning.}  Preserving qualifiers across casts still does
not guarantee soundness.  For example, consider the following code:
\begin{verbatim}
  char *x, *y;
  int a, b;

  a = (int) x;     (1)
  b = a;           (2)
  y = (char *) b;  (3)
\end{verbatim}
For line (1), \cqual{} generates the constraints $*x = x = a$.  For
line (2), \cqual{} generates the constraint $a \leq b$.  And for
line (3), \cqual{} generates the constraints $b = *y = y$.  Notice
that we have $*x \leq *y$ but we do not have $*y \leq *x$.

Finally, \cqual{} handles casts on aggregate types in another manner.
Consider the following program:
\centertt{
\begin{tabular}{l}
struct foo *s, *u; \\
void *t; \\
\\
t = (void *) s; \\
u = (struct foo *) t;
\end{tabular}}
Now \texttt{s} has type $\qptrtype{s}{*\!\!s\;\texttt{struct foo}}$,
where \texttt{struct foo} itself has fields.  In \cqual{}, every
\textit{void} type has an associated set of aggregates it represents.
Upon analyzing the first assignment statement, \cqual{} generates the
constraint $s \leq t$ as before, and it also adds \texttt{struct
  foo} to \texttt{t}'s set of aggregates.  Then at the second
assignment statement the constraint $t \leq u$ is generated as
usual, and additionally \texttt{t}'s \texttt{struct foo} information
is extracted and the appropriate constraints are generated with
\texttt{u}'s fields.

\subsubsection{Structures}

\label{sec:structures}

In \cqual{} each aggregate (i.e. struct or union) is modeled as having
its own collection of fields with their own type qualifiers.  Thus,
for example, the following code type-checks:
\begin{verbatim}
struct buf
{
  char *data;
  int len;
};
void main (void)
{
  struct buf a, b;
  a.data = ($tainted char *)0;
  b.data = ($utainted char *)0;
}
\end{verbatim}
For efficiency reasons, \cqual{} unifies the fields of aggregates that
interact (e.g. by assignments between structs or pointers-to-structs).
Thus there is no subtyping or polymorphism when dealing with fields of
aggregates.

When analyzing multiple files, \cqual{} will match up structure types
from different files field-by-field, and it will complain if a
structure is declared differently in different files.

Finally, structure initializers are not always handled correctly.
\cqual{} requires that the shape on the right-hand side of an
initializer match the shape of the type being initialized.  For
example, \cqual{} won't understand the following code
\begin{verbatim}
    struct foo { char *s; int x; } f[] = {"abc", 3, "def", 4};
\end{verbatim}
unless it is rewritten as
\begin{verbatim}
    struct foo { char *s; int x; } f[] = {{"abc", 3}, {"def", 4}};
\end{verbatim}

\subsubsection{Restrict}

As described in Section~\ref{sec:restrict}, \cqual{} uses the
\restrict{} qualifier to help improve the precision of flow-sensitive
qualifier inference.  Moreover, occurrences of \restrict{} are checked
by \cqual{}.  This means that if you analyze a program that already
contains \restrict{} (for example, newer versions of the standard C
library headers), \cqual{} will attempt to check its uses of
\restrict{}.  Often these uses of \restrict{} will fail to typecheck,
usually because they are not annotated with \const{} and because the
alias analysis is not precise enough.  Warnings about \restrict{}
qualifiers in code you did not write may be ignored.

If you want to disable \restrict{} checking, simply remove \restrict{}
from your partial order configuration file.  Doing so will improve the
resource usage of \cqual{} if you also run without
\texttt{-fflow-sensitive}.

\subsection{Partial Order Configuration File}

\label{sec:po_file}

The qualifier partial order configuration file is specified with a
command-line option of the form \centertt{-config
  $\langle$po-file$\rangle$} All qualifiers except the three standard
C qualifiers \const{}, \volatile{}, and \restrict{} must begin with a
dollar sign.

The partial order configuration file contains a series of partial
order declarations.  For now these partial orders should be lattices,
the discrete partial order, or any three-point partial order.  For
other partial orders the implementation may or may not generate correct
results.

Each partial order is assumed to be orthogonal to any other partial
orders specified in the file.  For example, if $q_1$ and $q_2$ are two
qualifiers from different partial orders, then the constraints $q_1
\leq q_2$ and $q_2 \leq q_1$ are always satisfiable.  More formally,
the qualifier partial order is the product of each of the partial
orders specified in the configuration file \cite{foster:pldi99}.

The full grammar for partial order configuration files is given in
Section~\ref{sec:po_grammar}.  Here we show how to specify
partial orders by example.  As one example, consider the two point
lattice: \centertt{
\begin{tabbing}
partial order $\{$ \\
\hspace*{1cm}\$a < \$b \\
$\}$
\end{tabbing}}
This partial order declaration declares two qualifiers, \texttt{\$a}
and \texttt{\$b}, where $\texttt{\$a} < \texttt{\$b}$.  But now what
should happen when we declare, say \texttt{\$a int x}? Recall that
\texttt{x} is given the type $\qptrtype{\&x}{\qinttype{x}}$.  Where
should the \texttt{\$a} qualifier go?

If not specified, \cqual{} assumes that a qualifier annotates
$r$-types, and that it should be less than or equal to the
corresponding qualifier variable.  In the case of the declaration of
\texttt{x}, \cqual{} adds the constraint $\texttt{\$a} \leq x$.

As another example, consider the qualifiers used for tainting
analysis:
\centertt{\begin{tabbing}
partial order $\{$ \\
\hspace*{1cm} \=
  \$untainted [level = value, color = "pam-color-untainted", sign = neg] \\
\>\$tainted [level = value, color = "pam-color-tainted", sign = pos] \\
\\
\>\$untainted < \$tainted \\
$\}$
\end{tabbing}}
As in the previous example here we define a two-point lattice with
$\untainted{} < \tainted{}$.  Further, we explicitly declare that
$\untainted{}$ and $\tainted{}$ should annotate $r$-types with the
option \texttt{level = value} (the default).  We also specify that
$\tainted{}$ is a positive qualifier (\texttt{sign = pos}), meaning
that it should be made less than the corresponding qualifier variable
when used in a type, the default.  $\untainted{}$ is declared as a
negative qualifier (\texttt{sign = neg}), meaning that it should be
made greater than the corresponding qualifier variable when used in a
type.  For example, if we declare 
\centertt{
\begin{tabular}{l}
\$tainted int t; \\
\$untainted int u;
\end{tabular}}
then \cqual{} generates the constraints $\tainted{} \leq t$ and
$u \leq \untainted{}$.

Finally, the \texttt{color} options specify the colors that should be
used in \pam{} mode to mark-up identifiers that have tainted or
untainted types.

As another example, consider 

\centertt{\begin{tabbing}
partial order [casts-preserve] $\{$ \\
\hspace*{1cm} \=
  \$user \= [level=value, color = "pam-color-6", sign = eq, \\
\> \> ptrflow=down, fieldflow=down, fieldptrflow=all] \\
\> \$kernel \> [level=value, color = "pam-color-4", sign=eq, \\
\> \> fieldptrflow=all] \\
$\}$
\end{tabbing}}
Here \user{} and \kernel{} are in a discrete partial order: neither
$\user{} < \kernel{}$ nor $\kernel{} < \user{}$.  The partial order is declared
as casts-preserving, which means these qualifiers are preserved across 
casts between types with different shapes (see Section~\ref{sec:casts}).
The qualifiers are
declared to be non-variant (\texttt{sign = eq}), meaning that when
they occur in the source code they should be may equal to the
corresponding qualifier variables.  \ignore{Is this true?: Since this 
is a discrete partial
order, making the qualifiers positive or negative will have the same
effect.}  The \texttt{ptrflow=down} option on \user{} means that
\user{} flows from outer to inner pointer levels, and
\texttt{fieldflow=down} means that \user{} also flows from structures
to fields of structures, and finally \texttt{fieldptrflow=all}
means that \user{} flows both to and from pointers to structures and
the pointers to their fields.  (See Section~\ref{sec:structural}.)

As another example, consider ANSI C's \const{}:
\centertt{\begin{tabbing}
partial order $\{$ \\
\hspace*{1cm} \=
  const [level = ref, sign = pos] \\
\>\$nonconst [level = ref, sign = neg] \\
\\
\>\$nonconst < const \\
$\}$
\end{tabbing}}
Here the \texttt{level = ref} options mean that \const{} and
\nonconst{} annotate $l$-types instead of $r$-types.  For example,
given the declaration \texttt{const int x}, \cqual{} will generate the
constraint $\const{} \leq \&x$ (not $\const{} \leq x$ like it would
if \const{} qualified $r$-types).

If \const{} is not declared in the partial order file, \const{}
annotations will be ignored during type qualifier inference.  This is
the recommended usage, since the \const{} inference described in
\cite{foster:pldi99} is not fully implemented in this system.

%Next consider
%\centertt{\begin{tabbing}
%partial order $\{$ \\
%\hspace*{1cm} \= \$YYYY [level = value, color = "pam-color-yyyy", sign = eq] \\
%\> \$YY      [level = value, color = "pam-color-yy", sign = eq] \\
%\> \$NONYEAR [level = value, color = "pam-color-nonyear", sign = eq] \\
%\> \$RCSYEAR [level = value, color = "pam-color-rcsyear", sign = eq] \\
%$\}$
%\end{tabbing}}
%This entry declares the three qualifiers used in Carillon
%\cite{elsman:carillon99}.  Since there are no relations between these
%qualifiers, they are in the discrete partial order.  They are marked
%as being non-variant, which the current version of the code relies on
%to generate correct results.

As another example, consider qualifiers for checking locking:
\centertt{\begin{tabbing}
partial order [flow-sensitive] $\{$ \\
\hspace*{1cm} \= \$locked [level = value, color = "pam-color-locked",
sign = eq] \\
\> \$unlocked [level = value, color = "pam-color-unlocked", sign = eq] \\
$\}$
\end{tabbing}}
Here the \texttt{flow-sensitive} modifier means that \locked{} and
\unlocked{} should be propagated flow-sensitively.

Finally, consider
\centertt{\begin{tabbing}
partial order [nonprop] $\{$ \\
\hspace*{1cm} volatile [sign = eq, level = ref, color = "pam-color-4"] \\
$\}$
\end{tabbing}}
This entry declares that \volatile{} is a non-propagating qualifier,
i.e., it does not flow through the qualifier constraint graph.  In
other words, if \texttt{b} is \volatile{} as we assign \texttt{a = b},
that does not mean that \texttt{a} is \volatile{}.

\subsection{Prelude Files}

\label{sec:prelude}

One way to add annotations to your program, especially annotations for
library functions, is to use prelude files.  One or more prelude files
can be passed as arguments to \cqual{} with the syntax
\centertt{-prelude $\langle$file$\rangle$} If you specify one or more
prelude files with this flag, then these files will be analyzed before
any other files (and in order from left to right).  Additionally,
\cqual{} assumes that any file called \texttt{prelude.cq} is a prelude
file, whether or not it is preceded by \texttt{-prelude}.  Thus a
convenient way to maintain per-project prelude files is to include a
local \texttt{prelude.cq} in the source directory.

The declarations in prelude files override declarations in non-prelude
files.  Therefore if there is some library function you want to give a
polymorphic type (see below), you can give it a type in the prelude
file and not worry about how it's actually declared in the source
files.

\cqual{} comes with some default prelude files:
\begin{itemize}
\item \texttt{config/prelude.cq} can be used to find format-string bugs
in C programs.
\item \texttt{config/proto-noderef.cq} and
\texttt{config/linux-syscalls.cq} can be used to find user/kernel bugs
in the Linux kernel.
\end{itemize}

\subsection{Qualifier Polymorphism}

\label{sec:polymorphism}

One of the important techniques for improving the accuracy of \cqual{}
is to add \textit{polymorphism} to qualified type annotations.
Consider the following simple example code:
\begin{verbatim}
  char id(char x) { return x; }
  ...
  tainted char t;
  untainted char u;
  char a, b;

  a = id(t); /* 1 */
  b = id(u); /* 2 */
\end{verbatim}
Because of call 1, we infer that \texttt{x} is a
$\qchartype{\tainted{}}$, and therefore we also infer that \texttt{a}
is \tainted{}.  Then call 2 type checks (because
$\qchartype{\untainted{}}\leq\qchartype{\tainted{}}$), but we infer
that \texttt{b} must also be \tainted{}.

While this is a sound inference, it is clearly overly conservative.
Even though this simple example looks unrealistic, this problem occurs
in practice, most notably with library functions such as
\texttt{strcpy}.  The problem arises because we are summarizing
multiple stack frames for distinct calls to \texttt{id} with a single
function type---\texttt{x} has to either be untainted everywhere or
tainted everywhere.  The solution to this problem is to introduce
\textit{polymorphism}, which is a form of context-sensitivity.

A function is said to be \textit{polymorphic} if it has more than one
type.  Notice that \texttt{id} behaves the same way no matter what
qualifier is on its argument \texttt{x}: it always returns exactly
\texttt{x}.  Thus we can give \texttt{id} the signature
\centertt{forall $q$ . $\qfuntype{id}{\qchartype{q}}{\qchartype{q}}$}
meaning that \texttt{id}, applied to a \texttt{char} qualified by any
qualifier $q$, returns a \texttt{char} qualified by that same
qualifier $q$.  (\textit{id} is the qualifier on the function
\texttt{id}---think of \textit{id} as qualifying the arrow.)

Operationally, when we call a polymorphic function, we
\textit{instantiate} its type---we make a copy of its type, replacing
all the generic qualifier variables $\alpha$ with fresh qualifier
variables.  Intuitively, this corresponds exactly to inlining the
function, except that instead of making a fresh copy of the function's
code, we make a fresh copy of the function's type.  In this case we
say that \texttt{id} has a \textit{polymorphic type}, which we
constructed by \textit{generalizing} the type variable $q$.

In \cqual{}, if the \texttt{-fpoly} flag is specified on the command
line, then \cqual{} will perform polymorphic type qualifier inference;
for the above example, \texttt{id} will be inferred to have the
polymorphic type as specified.  As you might expect, polymorphic
qualifier inference requires more resources (time and space) than
ordinary monomorphic qualifier inference.

\cqual{} also allows the user to explicitly specify that certain
functions are polymorphic in their qualifiers.  This is useful when
you don't have the code for a function but want to assign it the
correct type (e.g., for library functions).  Inside of a type, if you
use qualifiers beginning with \texttt{\$\_}, they are interpreted as
named qualifier variables.  Names are sequences of integers separated
by \_ (examples below).  Function types containing explicitly named
qualifier variables are generalized.  For example, the declaration
\centertt{\$\_1 int foo(\$\_1 int);} gives \texttt{foo} the type
\centertt{forall \textit{foo\_ret} .  \textit{foo} fun
  (\textit{foo\_ret} int) -> \textit{foo\_ret} int} Whenever
\texttt{foo} is used, the generalized variables in its type will be
instantiated with fresh qualifier variables: \centertt{
\begin{tabular}{ll}
\textrm{Program} & \textrm{Type of \texttt{foo}} \\ \hline
foo(a); & \textit{foo} fun (\textit{foo\_ret@0} int) -> \textit{foo\_ret@0} int \\
... \\
foo(b); & \textit{foo} fun (\textit{foo\_ret@1} int) -> \textit{foo\_ret@1} int \\
\end{tabular}}
In this way the qualifiers from distinct calls to \texttt{foo} are
kept distinct.  It is important to note that on any place on a type
where an explicit qualifier variable is \textit{not} mentioned, the
qualifier in that position will not be generalized.  For example, in
\centertt{\$\_1 int foo(\$\_1 int, float);}, all instances of
\texttt{foo} share the same qualifier on the \texttt{float}, whereas
they get separate instances of the qualifier on the \texttt{int} and
return type.

\cqual{} ignores the definition of any function given a polymorphic
type, i.e., \cqual{} assumes that all polymorphic function
declarations are correct.  The intention is to give polymorphic types
to library functions, e.g., \texttt{strcpy}.

You can write down types containing more complicated constraints
between the qualifiers using special notation.  The declaration
\centertt{\$\_1\_2 int foo(\$\_1 int);} assign \texttt{foo} the type
\centertt{forall \textit{foo\_arg0} \textit{foo\_ret} . \textit{foo}
fun (\textit{foo\_arg0} int) -> \textit{foo\_ret} int} where
$\textit{foo\_arg0} \leq \textit{foo\_ret}$.  In general, explicit
qualifier names are interpreted as sets (not sequences) of integers,
and if the set derived from one qualifier name $q_1$ is a subset of
the set derived from another qualifier name $q_2$, then the constraint
$q_1 \leq q_2$ is added.  So, for example, \centertt{\$\_1\_2 int
foo(\$\_2 int);} is an alternate declaration that assigns \texttt{foo}
the same polymorphic type.

In general we recommend placing declarations of polymorphic functions
in prelude files (see Section~\ref{sec:prelude}).  Since declarations
in prelude files override declarations in regular files, adding a
function declaration to a prelude file has the same effect as
rewriting functions declarations in all the source files.

Currently polymorphism in the flow-sensitive qualifiers is not
supported.  If you use functions given polymorphic signatures in a
flow-sensitive analysis, \cqual{} will simply ignore your polymorphic
declarations during the flow-sensitive portion of the analysis.

\subsection{Deep Subtyping with const}

\label{sec:deep-const}

As described in Section~\ref{sec:qtypes}, we use a conservative rule
for pointer subtyping. This rule can lead to non-intuitive backwards
flow, which often causes false positives. For example, consider the
following code:
\centertt{
\begin{tabular}{l}
f(const char *x); \\
\$tainted char *a; \\
char *b; \\
f(a); \\
f(b); /* b gets tainted */
\end{tabular}}
Here the declaration of \texttt{a} adds the constraint $\tainted{}
\leq *a$ The first function call to \texttt{f} adds the constraints
$a \leq x$ and $*a = *x$ The second function call generates the
constraints $b \leq x$ and $*b = *x$.  Notice that
\[ \tainted{} \leq *a = *b\]
and thus \texttt{*b} is tainted, which is counter-intuitive but
necessary if \texttt{f} writes to \texttt{*x}.

Observe, however, that \texttt{f}'s argument \texttt{x} is of type
\texttt{const char *}, so \texttt{f} cannot taint \texttt{*x} if it is
not tainted in the first place.  We can modify the subtyping rule for
pointers to take advantage of this fact:
\[
\inference{q_1 \leq q_2 & \const{} \leq q_2 & \tau_1 \leq \tau_2}
          {\qptrtype{q_1}{\tau_1} \leq \qptrtype{q_2}{\tau_2}}
\]
For example, for an assignment
\begin{verbatim}
  const char *s;
  char *t;
  ...
  s = t;
\end{verbatim}
\cqual{} generates the constraints $t \leq s$ and $*t \leq *s$.
If \texttt{s} were not \texttt{const}, \cqual{} would generate the
more conservative $*s = *t$.

If the flag \texttt{-fconst-subtyping} is enabled (the default), then
\cqual{} will use deep-subtyping for pointers explicitly qualified in
the source program with \const{}.  I.e., the requirement $\const{}
\leq q_2$ above means that $q_2$ must have been explicitly annotated
with \const{}.  Explicit annotations for \const{} are kept in the
qualifier graph even if \const{} does not appear in the partial order
file.

Currently, \texttt{const} annotations are ignored during the
flow-sensitive portion of the analysis, so there is no deep subtyping
for flow-sensitive type qualifiers.

\subsection{Functions with Variable Numbers of Arguments}

\label{sec:varargs}

C allows functions to be declared to take a variable number of
arguments by specifying a ``rest'' parameter \texttt{...} in a
function declaration.  As in C, by default \cqual{} does not type
check rest arguments (arguments passed to the rest parameter).  For
some analyses to be correct, however, we do need to type check rest
arguments.

\cqual{} extends the syntax of C to allow functions to have a
\textit{rest qualifier}, which is syntactically specified as a type
qualifier on the \texttt{...} of a function.  If a function \texttt{f}
is declared with a rest qualifier and \texttt{f} is called with rest
argument \texttt{p}, then the qualifiers of \texttt{p}'s types are
constrained to be equal to \texttt{f}'s rest qualifier.  More
precisely, for each qualifier $q$ of \texttt{p}, \cqual{} instantiates
a fresh copy of \texttt{f}'s rest qualifier $r$ as \textit{r\_inst}i
with the appropriate constraints and add the constraint $q =
\textit{r\_inst}\textrm{i}$.  If \texttt{f} has no rest qualifier,
then no type constraints are generated for rest arguments.

For example, in the sample prelude file for tainting analysis
\texttt{config/prelude.cq}, the function \texttt{sprintf} is declared as
\centertt{
int sprintf(char \$\_1\_2 *str, const char \$untainted *format, \$\_1 ...);}
This declaration tells \cqual{} to generate constraints $q \leq
\$\_1\_2$ for all qualifiers $q$ on rest arguments to \texttt{sprintf}.

Be aware that the current implementation of varargs annotations is not
completely sound.  Specifically, rest qualifiers may be lost when
varargs functions are stored and retrieved through function pointers.
Also, note that due to limitations with \cqual{}'s parser, at most two
qualifiers or qualifier variables can be specified on \texttt{...},
one to the left and one to the right.

\subsection{Old-Style Functions}

If you declare a function \texttt{f} using the K\&R style, then no
type checking is done to arguments at a call to \texttt{f}.  This
matches the behavior of C, but it can lead to unexpected results.  If
wish to run \cqual{} on a program written in the K\&R style, you can
use the GNU package \texttt{protoize} to ANSIfy the function
definitions and declarations.  \cqual{} will warn about some, but not
all, uses of old-style functions.

\subsection{Operators}
\label{sec:operators}

In some type qualifier-based analyses, the user-defined qualifiers
interact with C operators.  For example, \carillon{} requires strings
that are dereferenced to be qualified with \texttt{\$NONYEAR}.

\cqual{} provides an experimental interface for adding such rules.
You can annotate operators with type qualifiers by declaring special
functions (probably in a prelude file).  For example, to require that
every dereferenced object be a \texttt{\$NONYEAR}, you can declare
\centertt{\$\$a \_op\_deref(\$\$a *\$NONYEAR);}
This declaration says that \texttt{\_op\_deref} is a polymorphic
function that takes a pointer to type \texttt{\$\$a} and returns a
value of type \texttt{\$\$a}, for any type \texttt{\$\$a}.  Further,
that pointer must be qualified with \texttt{\$NONYEAR}.

Currently you can only add a signature to the dereference operator.
The constraints are applied to every dereference, even implicit ones.
For example, the assignment \texttt{y = x;} is interpreted as
dereferencing both \texttt{\&y} and \texttt{\&x}, even though the
dereference operator, \texttt{*}, is never mentioned.

\subsection{iquals}

\cqual{} includes the program \iquals{}, which is a simple interface
to the qualifier constraint solver.  \iquals{} accepts as an option a
partial order configuration file, in the same format as \cqual{}.
\iquals{} reads in a file of qualifier constraints, solves the
constraints, and then outputs the results.

\iquals{} is intended mainly as a debugging tool for the qualifier
constraint solver.

\section{PAM Mode}

\label{sec:pam}

\subsection{The Interface}

In the default configuration, \pam{} is invoked by typing \texttt{M-x
cqual} in emacs.  \pam{} launches \cqual{} as a sub-process.  \cqual{}
analyzes its input files and sends the results back to \pam{}.
\cqual{} then enters an event loop in which it responds to mouse-click
events from \pam{}.

There are five active bindings when in \pam{} mode:

\begin{center}
\begin{tabular}{ll}
\texttt{middle click} & Follow hyperlink \\
\texttt{shift middle click} & Jump to qualifier definition \\
\texttt{C-c C-l} & Follow hyperlink \\
\texttt{C-c C-f} & Run \cqual{} on another file \\
\texttt{C-c C-r} & Exit \pam{} and kill all \pam{} buffers
\end{tabular}
\end{center}

\subsection{Changing the Analysis}

\label{sec:change_analysis}

\pam{} runs the analysis defined by the variable
\texttt{pam-default-analysis}, which is a list of strings, the first
of which is the path name of the executable and the rest of which are
arguments.  \pam{} will interactively ask for the target file and
append it to the argument list.  For example, here is the default
analysis in the author's \texttt{personal.el}.

\centertt{
\begin{tabbing}
(setq pam-default-analysis '(\="/home/jfoster/cqual/bin/cqual" \\
        \>"-fpam-mode" \\
        \>"-hotspots" \\
        \>"10" \\
        \>"-fflow-sensitive" \\
        \>"-config" \\
        \>"/home/jfoster/cqual/config/lattice"))
\end{tabbing}}

You can add extra options to \pam{} mode by inserting them into the
list.  For example, if you want to use the default prelude file for
tainting analysis (Section~\ref{sec:prelude}), change the above to

\centertt{
\begin{tabbing}
(setq pam-default-analysis '(\="/home/jfoster/cqual/bin/cqual" \\
        \>"-fpam-mode" \\
        \>"-hotspots" \\
        \>"10" \\
        \>"-fflow-sensitive" \\
        \>"-prelude" \\
        \>"/home/jfoster/cqual/config/prelude.cq" \\
        \>"-config" \\
        \>"/home/jfoster/cqual/config/lattice"))
\end{tabbing}}
Be sure to re-evaluate your \texttt{personal.el} file (\texttt{M-x
  eval-buffer}) or re-launch \emacs{} after making a change to the
file.

\subsection{Customizing Colors}

You can customize the colors that \pam{} uses by editing your
\texttt{.emacs} file.  By default, \pam{} defines nine type faces:
\texttt{pam-color-}$i$, where $i$ is between 1 and 8, and
\texttt{pam-color-mouse}, the color to use to highlight a link when
the cursor is dragged over it.

If you want to change a color defined by \pam{}, use
\texttt{custom-set-faces}:

\centertt{
\begin{tabbing}
(require 'pam-faces) \\
(custom-set-faces \\
\hspace*{1cm} \='(pam-color-1 ((t (:foreground "Yellow" :underline t))) t) \\
              \> '(pam-color-6 ((t (:foreground "Black" :underline t))) t))
\end{tabbing}}

If you want to add a new type face, use \texttt{pam-add-face}:

\centertt{
\begin{tabbing}
(require 'pam-faces) \\
(pam-add-face pam-color-tainted ((t (:foreground "Red" :underline t)))) \\
(pam-add-face pam-color-untainted ((t (:foreground "Green" :underline t))))
\end{tabbing}}

\section{Reference}

\subsection{cqual}

\paragraph{Synposis}
\centertt{cqual [ options ] source-files $\ldots$}

\paragraph{Description}
Invoke the type qualifier inference on \texttt{source-files}.
\cqual{} accepts all of the standard GCC options, most of which have
no effect on \cqual{}'s behavior.  \cqual{} silently ignores any
options it doesn't understand.

\begin{options}
  \option{-config \cmdarg{file}}{Specifies the partial order
    configuration file to use (Sections~\ref{sec:po_file}
    and~\ref{sec:po_grammar})}
  
  \option{-prelude \cmdarg{file}}{Specifies the prelude file to use
    (Section~\ref{sec:prelude})}
  
  \option{-hotspots \cmdarg{num}}{If specified, generate a list of the
    top \texttt{num} qualifier variables involved in error paths.
    Don't take this information too seriously.}
  
  \option{-program-files \cmdarg{file}}{Add the files listed one
    per-line in \textit{file} to the list of files to be analyzed.}
\end{options}

In addition, there are a number of flags that change cqual's behavior.
If \texttt{-f$\langle$flag-name$\rangle$} appears as an option, the
flag is enabled.  If \texttt{-fno-$\langle$flag-name$\rangle$} appears
as an option, the flag is disabled.

\begin{flags}
  \flag{pam-mode}{Enter into PAM mode after analysis is complete.
    Usually only used if PAM itself is invoking cqual.  Default value
    is off.}
  
  \flag{print-quals-graph}{Generate \texttt{quals.dot}, containing the
    (non-transitively closed) constraints, which is interpretable by
    \texttt{dot}.  Default value is off.}
  
  \flag{strict-const}{Assume anything not marked const is non-const.
    Default value is off.}
  
  \flag{print-results}{Print a summary of the results after the
    analysis is complete.  Intended mostly for regression testing.
    Default value is off.}
  
%  \flag{casts-preserve}{(Section~\ref{sec:casts}) At any cast to a
%    type that is not explicitly qualified, propagate qualifiers
%    through the cast.  The standard behavior is to stop the flow of
%    qualifiers at casts.  This flag cannot be used during
%    flow-sensitive analysis. Default value is off.}
  
  \flag{use-const-subtyping}{(Section~\ref{sec:deep-const}) Use
    \const{} qualifiers to increase the precision of the analysis by
    using subtyping, rather than equality, under a \const{} pointer.
    This flag has no effect on flow-sensitive analysis.  Default value
    is on.}
  
  \flag{flow-sensitive}{(Section~\ref{sec:flow-sensitive}) Perform
    flow-sensitive qualifier inference after flow-insensitive
    qualifier inference.  If you enable this flag the analysis will
    consume more resources, even if no flow-sensitive qualifiers
    appear in the source code.  Hence we recommend you disable it if
    you do not need the flow-sensitive analysis.  Default value is
    off.}

  \flag{ugly}{Display memory addresses next to qualifier variable
    names.  This is mainly useful for big programs that tend to reuse
    local variable names---without using this flag it's hard to tell
    them apart.}

  \flag{explain-errors}{For each error message, display a constraint
    path exhibiting the error.  This is useful when running \cqual{}
    directly from the command line without using \pam{}.  Default
    value is off.}

  \flag{poly}{Enabled polymorphic recursive inference.  This requires
    more time and memory than purely monomorphic inference, but the
    result will be more precise.  Default value is off.}
\end{flags}

\subsection{iquals}

\paragraph{Synposis}
\centertt{iquals [ -config $\langle$file$\rangle$ ] [ -g ]
  constraint-file}

\paragraph{Description}
Solve the qualifier constraints in \texttt{constraint-file}.

\begin{options}
  \option{-config [file]}{Specifies the partial order configuration
    file to use.}  \option{-g}{Generate \texttt{quals.dot}, containing
    the (non-transitively closed) constraints, which is interpretable
    by dot.}
\end{options}

The \texttt{constraint} file should consist of a list of constraints
of the form

\begin{center}
\begin{tabular}{ll}
\texttt{q1 $<$= q2} & inequality \\
\texttt{q1 = q2} & equality \\
\texttt{q1 == q2} & unification \\
\texttt{q1 $<$= q2 ==$>$ q3 $<$= q4} & conditional inequality
\end{tabular}
\end{center}

Here if \texttt{qi} begins with \texttt{\$} it is assumed to be a
partial order element specified in the partial order file.  Otherwise
\texttt{qi} is assumed to be a variable.  Variables may contain
numbers, upper- and lower-case letters, and underscores.

\subsection{gcqual}
\label{sec:gcqual}

\paragraph{Synposis}
\centertt{gcqual [-debug] [-cc <CC>] [-cqual <cqual>] [cqual options] 
                 [-- [CC options]] <file1> <file2> ...}

\paragraph{Description}
Preprocess the given input files with the C preprocessor, then call
cqual on the results.  \texttt{gcqual} doesn't preprocess input files
ending in \texttt{.i}, and doesn't preprocess any input files if the
\texttt{-fpreprocessed} flag is given.  \texttt{gcqual} always
preprocesses prelude files.  If no lattice file is specified on the
command line, \texttt{gcqual} looks in the following places, in order,
until one is found:
\begin{itemize}
\item \texttt{\$CQUAL\_CONFIG\_DIR/lattice}
\item \texttt{./lattice}
\item \texttt{/usr/local/share/cqual/lattice}
\end{itemize}
If no lattice or prelude files are specified on the command line, then
\texttt{gcqual} also loads all prelude files (\texttt{*.cq}) in the 
same directory as the lattice file it finds.

\begin{options}
  \option{-debug}{display commands executed by \texttt{gcqual}}  
  \option{-cc <CC>}{use \texttt{<CC>} instead of \texttt{gcc}}
  \option{-cqual <cqual>}{use \texttt{<cqual>} instead of \texttt{cqual}}
\end{options}

\subsection{Partial Order Configuration File}

\label{sec:po_grammar}

The partial order configuration file should contain a series of
entries defining partial orders.  In the current version of the code
these partial orders should be lattices to generate valid inference
results.  Inference should also work correctly on any discrete partial
order, and on any of the three-point partial orders.  A future version
of \cqual{} will correct this limitation.

Below is the grammar for partial order configuration files.  In this
grammar, \textit{x$^*$} means zero or more occurrences of \texttt{x},
and \texttt{[} \textit{x} \texttt{]}$^?$ means either zero or one
occurrences of \texttt{[} \textit{x} \texttt{]}.

\begin{center}
\begin{tabular}{rcl}
\textit{po-defn} & ::= & \texttt{partial order} 
                \texttt{[} \textit{po-opt$^*$} \texttt{]}$^?$
                   \{ \textit{po-entry$^*$} \} \\
\textit{po-opt}  & ::=    & \texttt{nonprop} \\
                 & $\mid$ & \texttt{flow-sensitive} \\
                 & $\mid$ & \texttt{casts-preserve} \\

\textit{po-entry} & ::=    & \textit{qual-name}
                        \texttt{[} \textit{qual-opt$^*$} \texttt{]}$^?$ \\
                  & $\mid$ & \textit{qual-name} $<$
                             \textit{qual-name} \\
\textit{qual-opt} & ::=    & \texttt{color = "\textrmit{color-name}"} \\
                  & $\mid$ & \texttt{level =} \textit{level} \\
                  & $\mid$ & \texttt{sign =} \textit{sign} \\
                  & $\mid$ & \texttt{ptrflow =} \textit{flow-dir} \\
                  & $\mid$ & \texttt{fieldflow =} \textit{flow-dir} \\
                  & $\mid$ & \texttt{fieldptrflow =} \textit{flow-dir} \\
\textit{level} & ::= & \texttt{ref} $\mid$ \texttt{value} \\
\textit{sign} & ::= & \texttt{pos} $\mid$ \texttt{neg} $\mid$
                        \texttt{eq} \\
\textit{flow-dir} & ::= & \texttt{down} $\mid$ \texttt{up} $\mid$ \texttt{all}
\end{tabular}
\end{center}

In addition, comments beginning with \texttt{/*} and ending with
\texttt{*/} may be added to the configuration file.  Comments may not
be nested, following the C convention.

%\bibliographystyle{alpha}
%\bibliography{jfoster2}

\begin{thebibliography}{STFW01}
  
\bibitem[AFKT03]{aiken:pldi03} Alex Aiken, Jeffrey~S. Foster, John
  Kodumal, and Tachio Terauchi.  \newblock {Checking and Inferring
    Local Non-Aliasing}.  \newblock In {\em Proceedings of the 2003
    ACM SIGPLAN Conference on Programming Language Design and
    Implementation}, pages 129--140, San Diego, California, June 2003.

\bibitem[ANS99]{c99}
ANSI.
\newblock {\em Programming languages -- C}, 1999.
\newblock ISO/IEC 9899:1999.

\bibitem[EFA99]{elsman:carillon99}
Martin Elsman, Jeffrey~S. Foster, and Alexander Aiken.
\newblock {Carillon---A System to Find Y2K Problems in C Programs}, 1999.
\newblock \texttt{http://bane.cs.berkeley.edu/carillon}.

\bibitem[Fos02]{foster:thesis}
Jeffrey~Scott Foster.
\newblock {\em {Type Qualifiers: Lightweight Specifications to Improve Software
  Quality}}.
\newblock PhD thesis, University of California, Berkeley, December 2002.

\bibitem[FA01]{foster:tr01_restrict}
Jeffrey~S. Foster and Alex Aiken.
\newblock {Checking Programmer-Specified Non-Aliasing}.
\newblock Technical Report UCB//CSD-01-1160, University of California,
  Berkeley, October 2001.

\bibitem[FFA99]{foster:pldi99}
Jeffrey~S. Foster, Manuel F{\"{a}}hndrich, and Alexander Aiken.
\newblock {A Theory of Type Qualifiers}.
\newblock In {\em Proceedings of the 1999 ACM SIGPLAN Conference on Programming
  Language Design and Implementation}, pages 192--203, Atlanta, Georgia, May
  1999.

\bibitem[FTA02]{foster:pldi02}
Jeffrey~S. Foster, Tachio Terauchi, and Alex Aiken.
\newblock {Flow-Sensitive Type Qualifiers}.
\newblock In {\em Proceedings of the 2002 ACM SIGPLAN Conference on Programming
  Language Design and Implementation}, Berlin, Germany, June 2002.
\newblock To appear.

\bibitem[GA01]{gay:pldi01}
David Gay and Alexander Aiken.
\newblock {Language Support for Regions}.
\newblock In {\em Proceedings of the 2001 ACM SIGPLAN Conference on Programming
  Language Design and Implementation}, pages 70--80, Snowbird, Utah, June 2001.

\bibitem[PAM]{harrelson:pam}
Christopher Harrelson.
\newblock {Program Analysis Mode}.
\newblock \texttt{http://www.cs.berkeley.edu\slash{}\~{}chrishtr\slash{}pam}.

\bibitem[STFW01]{shankar:usenixsec01}
Umesh Shankar, Kunal Talwar, Jeffrey~S. Foster, and David Wagner.
\newblock {Detecting Format String Vulnerabilities with Type Qualifiers}.
\newblock In {\em Proceedings of the 10th Usenix Security Symposium},
  Washington, D.C., August 2001.

\end{thebibliography}

\appendix{}

\section{Limitations and Bugs}
\label{sec:limitations-bugs}

\begin{itemize}
\item Only some partial orders will work correctly:  any lattice, any
  discrete partial order, and any of the two- or three-point partial
  orders.  Your mileage will vary with other partial orders.
  
\item The constraint graph traversal in \pam{} mode really works best
  if the program is analyzed using only a single, two-point lattice.
  It works for other partial orders, but less reliably.
  
\item \const{} inference (described in \cite{foster:pldi99}, which
  used a previous version of this system written in ML) is not fully
  implemented.  Specifically, the relationship between \const{} fields
  and \const{} structures is not handled fully correctly.
  
\item Only the dereference operator can be annotated with qualifiers
  without hacking the source code.
  
\item If you kill a marked-up buffer in \pam{} mode, then you need to
  re-run \cqual{} to recover the buffer.
  
\item Structure initializers aren't always handled correctly.
  
\item The flow-sensitive analysis does not support polymorphism, gated
  qualifiers, or \texttt{-fconst-subtyping}.
\end{itemize}

\section{Copyright}

\label{app:copyright}

This manual is copyright (C) 2001-2003 The Regents of the University
of California.

\cqual{} includes parts of the RC compiler, which is derived from the GNU
C Compiler.  It is thus

\begin{center}
  Copyright (C) 1987, 88, 89, 92-7, 1998 Free Software Foundation, Inc. \\
  Copyright (C) 2000-2003 The Regents of the University of California.
\end{center}

\cqual{} is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2, or (at your option)
any later version.

\cqual{} is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with cqual; see the file \texttt{COPYING}.  If not, write to
the Free Software Foundation, 59 Temple Place - Suite 330,
Boston, MA 02111-1307, USA.

\end{document}
% LocalWords:  Synposis cqual GCC config hotspots num cqual's fno pam quals ll
% LocalWords:  interpretable const subtyping iquals qi rcl po defn nonprop qual
% LocalWords:  pos neg eq untainted Elsman hndrich ACM SIGPLAN RC Elib SML xf
% LocalWords:  MERCHANTABILITY Umesh Shankar Kunal Talwar usability gzipped src
% LocalWords:  gunzip cd makefile executables el emacs getenv printf fmt LD ptr
% LocalWords:  untrusted hyperlink hyperlinked pre satisfiable nonconst YYYY YY
% LocalWords:  yyyy yy NONYEAR nonyear Polymorphism forall ret strcpy foo's str
% LocalWords:  fconst Varargs sprintf vararg varargs dereferenced deref setq th
% LocalWords:  fpam eval Harrelson chrishtr Usenix dereference hyperlinks versa
% LocalWords:  subtype aliased Globals globals wildcard typechecks wildcards
% LocalWords:  gcc remblanks sed perl remquals protoize ANSIfy xzf traversal
% LocalWords:  fcasts struct Initializers buf initializers initializer inlining
% LocalWords:  polymorphism Operationally instantiate instantiated inst arg IEC
% LocalWords:  instantiates Snowbird Tachio Teruachi README Aliasing STFW fflow
% LocalWords:  Terauchi unsoundness variable's taintedness aliasing gccpreproc
% LocalWords:  preprocessed monomorphic typecheck UCB




See more files for this project here

CQual

CQual - A tool for adding type qualifiers to C

Project homepage: http://sourceforge.net/projects/cqual
Programming language(s): C,Java,Shell Script
License: other

  Makefile.am
  code-guide.txt
  mylatex
  semantic.dtx
  semantic.ins
  semantic.sty
  user-guide.tex