# QEMU VM Escape

tl;dr

This post describes how I exploited CVE-2019-14378, a pointer miscalculation in the network backend of QEMU. The bug is triggered when large fragmented IPv4 packets are reassembled for processing. It was found by code auditing.

## Vulnerability Details

There are two parts to networking within QEMU:

• The virtual network device that is provided to the guest (e.g. a PCI network card).
• The network backend that interacts with the emulated NIC (e.g. puts packets onto the host’s network).

By default, QEMU creates a SLiRP user network backend and an appropriate virtual network device for the guest (e.g. an e1000 PCI card).

The bug was found in the packet reassembly code in SLiRP.

### IP fragmentation

IP fragmentation is an Internet Protocol (IP) process that breaks packets into smaller pieces (fragments), so that the resulting pieces can pass through a link with a smaller maximum transmission unit (MTU) than the original packet size. The fragments are reassembled by the receiving host.

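### Flags:

Flags (3 bits):

• Bit 0: reserved, must be zero
• Bit 1: (DF) 0 = May Fragment, 1 = Don’t Fragment.
• Bit 2: (MF) 0 = Last Fragment, 1 = More Fragments.

Fragment Offset (13 bits): the position of this fragment's data within the original packet, measured in units of 8 bytes.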
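As a rough illustration of how these two fields share one 16-bit word in the IPv4 header, here is a minimal Python sketch. The constant names mirror the ones used in BSD-style headers (`IP_DF`, `IP_MF`, `IP_OFFMASK`); the helper functions are hypothetical and exist only for this example.

```python
import struct

IP_DF = 0x4000       # Don't Fragment
IP_MF = 0x2000       # More Fragments
IP_OFFMASK = 0x1FFF  # 13-bit fragment offset, counted in 8-byte units


def pack_frag_field(df: bool, mf: bool, offset_bytes: int) -> bytes:
    """Build the 16-bit flags + fragment-offset field in network byte order."""
    if offset_bytes % 8:
        raise ValueError("fragment offset must be a multiple of 8 bytes")
    units = offset_bytes // 8
    if units > IP_OFFMASK:
        raise ValueError("fragment offset does not fit in 13 bits")
    value = (IP_DF if df else 0) | (IP_MF if mf else 0) | units
    return struct.pack("!H", value)


def unpack_frag_field(raw: bytes) -> dict:
    """Split the 16-bit field back into DF, MF and a byte offset."""
    (value,) = struct.unpack("!H", raw)
    return {
        "df": bool(value & IP_DF),
        "mf": bool(value & IP_MF),
        "offset_bytes": (value & IP_OFFMASK) * 8,
    }
```

A first fragment therefore carries MF = 1 with offset 0, and the last fragment carries MF = 0 with a non-zero offset.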

The mbuf structure is used to store the received IP-layer data. It has two buffers: m_dat, which lives inside the structure, and m_ext, which is allocated on the heap when m_dat is too small to hold the packet.

For NAT, incoming fragmented packets must be reassembled before they are edited and retransmitted. This reassembly is done by the ip_reass(Slirp *slirp, struct ip *ip, struct ipq *fp) function. ip contains the current IP packet data, and fp is a linked list containing the fragments received so far.

ip_reass does the following:

• If this is the first fragment to arrive (fp==NULL), create a reassembly queue and insert ip into it.
• Check whether the fragment overlaps previously received fragments and, if so, discard the overlapping data.
• Once all fragments have been received, reassemble them and create the header of the new IP packet by modifying the header of the first fragment.
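The steps above can be sketched as a toy reassembly queue. This is a simplified Python model, not the SLiRP code: the `Fragment` and `ReassemblyQueue` names are made up for illustration, and the overlap policy (drop the whole overlapping fragment) is an assumption chosen to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class Fragment:
    offset: int   # byte offset of this fragment's payload
    data: bytes
    more: bool    # MF flag


@dataclass
class ReassemblyQueue:
    frags: list = field(default_factory=list)

    def add(self, frag: Fragment):
        """Queue a fragment; return the full payload once complete, else None."""
        # Simplified policy: drop a fragment that overlaps anything queued.
        for f in self.frags:
            if (frag.offset < f.offset + len(f.data)
                    and f.offset < frag.offset + len(frag.data)):
                return None
        self.frags.append(frag)
        self.frags.sort(key=lambda f: f.offset)

        # Complete when the data starts at 0, has no holes,
        # and the last fragment has MF == 0.
        expected = 0
        for f in self.frags:
            if f.offset != expected:
                return None
            expected += len(f.data)
        if self.frags[-1].more:
            return None
        return b"".join(f.data for f in self.frags)
```

The queue is created when the first fragment arrives, and reassembly only happens on the call that delivers the final missing piece.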

The bug is in the calculation of the variable delta. The code assumes that the first fragment is never stored in the external buffer (m_ext). The calculation q - m->m_dat is valid when the packet data is inside mbuf->m_dat, because q then points inside m_dat (q is the structure containing the linked list of fragments and the packet data). If the m_ext buffer was allocated instead, q points inside the external buffer and the computed delta is wrong.

Later, the newly calculated pointer q is converted into an ip structure and its fields are modified. Because delta was miscalculated, ip points to the wrong location, and ip_src and ip_dst can be used to write controlled data there. This may also crash QEMU if the calculated ip lands in an unmapped area.
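To make the miscalculation concrete, here is a small Python model of the fix-up as described in the text: delta = q - m->m_dat, then q = m->m_ext + delta. The addresses are hypothetical and the model uses unbounded Python integers, so it ignores any integer truncation that happens in the C code.

```python
def buggy_fixup(q_addr: int, m_dat_addr: int, m_ext_addr: int) -> int:
    """Model of the fix-up: delta = q - m->m_dat; q = m->m_ext + delta."""
    delta = q_addr - m_dat_addr
    return m_ext_addr + delta


# Hypothetical addresses, chosen only for illustration.
M_DAT = 0x1000_0060   # inline buffer inside the mbuf
M_EXT = 0x1000_2000   # external heap buffer

# Case 1: q lived in m_dat before the data moved to m_ext.
# The fix-up relocates it correctly.
q_inline = M_DAT + 0x20
ok = buggy_fixup(q_inline, M_DAT, M_EXT)

# Case 2: q already lives in m_ext, so subtracting m_dat is meaningless.
q_external = M_EXT + 0x20
bad = buggy_fixup(q_external, M_DAT, M_EXT)
```

In the second case the result is displaced from the intended location by exactly m_ext - m_dat, which is why the distance between the mbuf and its external buffer matters so much for exploitation.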

## Exploitation

What we are facing:

• If we control delta, we can write controlled data relative to m->m_ext. For that we need precise control over the heap.
• We need leaks to bypass ASLR.
• There are no useful function pointers on the heap to get code execution, so we have to build an arbitrary write.

### Controlling Heap

Let’s look into how heap objects are allocated in slirp.

m_get, m_free, m_inc and m_cat are wrappers that handle dynamic memory allocation. When a new packet arrives, a new mbuf object is allocated. If m_dat is large enough to store the packet data, it is used; otherwise a new external buffer is allocated with m_inc and the data is copied into it.
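The inline-versus-external decision can be modelled in a few lines of Python. This is only a sketch: `Mbuf` and `store_packet` are hypothetical names, and the inline capacity of 0x608 is an assumption taken from the size threshold mentioned in this post, not a value read from the SLiRP headers.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed inline capacity; the real value depends on the SLiRP build.
INLINE_CAPACITY = 0x608


@dataclass
class Mbuf:
    m_dat: bytearray                    # buffer embedded in the structure
    m_ext: Optional[bytearray] = None   # separate heap buffer, if needed

    @property
    def uses_ext(self) -> bool:
        return self.m_ext is not None


def store_packet(payload: bytes, inline_capacity: int = INLINE_CAPACITY) -> Mbuf:
    """Store a payload inline when it fits, otherwise in an external buffer."""
    m = Mbuf(m_dat=bytearray(inline_capacity))
    if len(payload) <= inline_capacity:
        m.m_dat[:len(payload)] = payload
    else:
        m.m_ext = bytearray(payload)    # stands in for the m_inc allocation
    return m
```

The point for exploitation is that the guest chooses the payload size, and therefore whether an extra heap chunk of a chosen size is allocated.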

If the incoming packet is fragmented, a new mbuf object is used to queue the fragments (fp) until all of them arrive. When the next fragment arrives, it is enqueued onto this list.

This gives us a good primitive for allocating chunks of controlled size (> 0x608) on the heap. A few things to keep in mind: an mbuf (0x670) is allocated for every packet, and if the packet is the first fragment, another mbuf is allocated for the fragment queue (fp).

We can use this to spray the heap so that subsequent allocations are taken from the top chunk, which gives us a predictable heap state.

### Getting controlled write on heap

Now that we can control the heap, let’s see how we can use the bug to overwrite something useful.

Assume this heap state

Now delta will be -padding, which is added to m->m_ext, and we can later write at that offset. By controlling this padding, we control delta.

When all the fragments arrive, they are concatenated into one mbuf object with the m_cat function.

m_inc calls realloc, and realloc returns the same chunk if it can accommodate the requested size. So even after the packets are reassembled, we keep the same m->m_ext buffer as the first packet. Note that m_ext is allocated for the first fragment, so q points inside this buffer, and the addition of -padding is also relative to q. This makes things a bit easier.

So after the pointer calculation, q points to the target.

Since we control fp->ipq_src and fp->ipq_dst, which are the source and destination IP addresses of the packet, we can overwrite the target’s contents.

### Arbitrary Write

My initial target was the m_data field, so that the m_cat() call during packet reassembly would give an arbitrary write, but that turned out not to be possible because of alignment and offset issues.

But I was able to overwrite the m_len field of the object. Since m_cat performs no bounds check, we can use m_len to write at an arbitrary offset relative to m_data. This avoids the alignment issue, and we use it to overwrite the m_data of a different object to get an arbitrary write.

• Send a packet with id 0xdead and the MF bit set (1).
• Send a packet with id 0xcafe and the MF bit set (1).
• Trigger the bug to overwrite m_len of 0xcafe so that m_data + m_len points to 0xdead’s m_data.
• Send a packet with id 0xcafe and the MF bit unset (0) to trigger reassembly and overwrite 0xdead’s m_data with the target address.
• Send a packet with id 0xdead and the MF bit unset (0), which writes the content of this packet to m_data.
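The effect of a corrupted m_len on an unchecked append can be shown on a toy heap. The sketch below is a Python simulation only: the heap is a bytearray, offsets stand in for addresses, and the layout constants and the simplified `m_cat` signature are hypothetical.

```python
import struct

heap = bytearray(0x400)      # toy heap; offsets stand in for addresses

A_DATA = 0x040               # where mbuf A's m_data points (id 0xcafe)
B_MDATA_FIELD = 0x200        # where mbuf B's m_data pointer is stored (id 0xdead)
struct.pack_into("<Q", heap, B_MDATA_FIELD, 0x300)   # B.m_data initially 0x300


def m_cat(m_data: int, m_len: int, payload: bytes) -> int:
    """Append payload at m_data + m_len with no bounds check; return new m_len."""
    start = m_data + m_len
    heap[start:start + len(payload)] = payload
    return m_len + len(payload)


# Honest length: the append lands right after A's existing data.
a_len = m_cat(A_DATA, 0x10, b"AAAA")

# Corrupted length: m_data + m_len now lands on B's m_data field,
# so the appended bytes replace B's pointer with a chosen value.
corrupted_len = B_MDATA_FIELD - A_DATA
m_cat(A_DATA, corrupted_len, struct.pack("<Q", 0x380))   # 0x380: chosen target
(b_m_data,) = struct.unpack_from("<Q", heap, B_MDATA_FIELD)

# When B is completed, its payload is written wherever m_data now points.
heap[b_m_data:b_m_data + 4] = b"BBBB"
```

The first append shows normal behaviour; the second shows how a single corrupted length field turns the append into a write onto a neighbouring object's pointer.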

### Getting Leaks

We need leaks to bypass ASLR and PIE. For that we need some way to transfer data back to the guest. It turns out there is a very common service that matches that description exactly: the ICMP echo request. The SLiRP gateway responds to ICMP echo requests, reflecting the payload of the packet (after the ICMP header) back unchanged.

We have an arbitrary write, but where do we write, since no addresses are known at this point?

We can do a partial overwrite of m_data and write data on the heap.

Leaks:

• Use the arbitrary write to create a fake ICMP header on the heap.
• Send an ICMP request with the MF bit set (1).
• Partially overwrite m_data to point to the fake header on the heap.
• Send a packet with the MF bit set to 0 to complete the ICMP request.
• Receive the leaks from the host.
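A partial overwrite works because pointers are stored little-endian on x86-64: replacing only the lowest bytes keeps the randomised upper bytes intact. The following Python sketch illustrates the arithmetic with a hypothetical pointer value; `partial_overwrite` is a helper invented for this example.

```python
import struct


def partial_overwrite(pointer: int, low_bytes: bytes) -> int:
    """Replace only the least-significant bytes of a little-endian 64-bit pointer."""
    if len(low_bytes) > 8:
        raise ValueError("cannot overwrite more than 8 bytes of a 64-bit pointer")
    raw = bytearray(struct.pack("<Q", pointer))
    raw[:len(low_bytes)] = low_bytes
    return struct.unpack("<Q", bytes(raw))[0]


# Hypothetical heap pointer; the upper bytes are the part ASLR randomises.
m_data = 0x0000_7F3A_1C02_5A40
redirected = partial_overwrite(m_data, b"\x00\x0b")   # low 16 bits become 0x0b00
```

The redirected pointer still lies in the same heap region, so it can be aimed at nearby data without knowing any absolute address.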

### Getting Code Execution

Timers (more precisely QEMUTimers) provide a means of calling a given routine (a callback) after a time interval has elapsed, passing an opaque pointer to the routine.

main_loop_tlg is an array in bss that contains the QEMUTimerList associated with the different timers. Each of these contains a list of QEMUTimer structures. QEMU loops through them to check whether any have expired; if so, the cb function is called with the argument opaque.

RIP control:

• Create a fake QEMUTimer with system as the callback and opaque as its argument.
• Create a fake QEMUTimerList that contains our fake QEMUTimer.
• Overwrite the main_loop_tlg entry with the fake QEMUTimerList.
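Why a timer entry is such a convenient target becomes clear from the shape of the dispatch loop. The Python model below is a deliberately simplified sketch: `Timer` and `run_timers` are hypothetical stand-ins, and details of the real QEMU loop (such as removing expired timers from the list) are left out.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Timer:
    expire_time: int
    cb: Callable[[Any], None]   # callback invoked on expiry
    opaque: Any                 # argument handed to the callback


def run_timers(timers: List[Timer], now: int) -> int:
    """Call cb(opaque) for every expired timer; return how many fired."""
    fired = 0
    for t in timers:
        if t.expire_time <= now:
            t.cb(t.opaque)
            fired += 1
    return fired


calls = []
timers = [
    Timer(expire_time=10, cb=calls.append, opaque="tick"),
    Timer(expire_time=99, cb=calls.append, opaque="later"),
]
fired = run_timers(timers, now=50)
```

Both the function that gets called and its single argument come straight from the timer entry, so whoever controls the entry controls the call.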

You can find the full exploit at CVE-2019-14378