CVE-2020-3992 & CVE-2021-21974: Pre-Auth Remote Code Execution in VMware ESXi

March 02, 2021 | Lucas Leong

Last fall, I reported two critical-rated, pre-authentication remote code execution vulnerabilities in the VMware ESXi platform. Both of them reside within the same component, the Service Location Protocol (SLP) service. In October, VMware released a patch to address one of the vulnerabilities, but it was incomplete and could be bypassed. VMware released a second patch in November completely addressing the use-after-free (UAF) portion of these bugs. The UAF vulnerability was assigned CVE-2020-3992. After that, VMware released a third patch in February completely addressing the heap overflow portion of these bugs. The heap overflow was assigned CVE-2021-21974.

This blog takes a look at both bugs and how the heap overflow could be used for code execution.

Service Location Protocol (SLP) is a network service that listens on TCP and UDP port 427 on default installations of VMware ESXi. The implementation VMware uses is based on OpenSLP 1.0.1. VMware maintains its own version and has added some hardening to it.

The service parses network input without authentication and runs as root, so a vulnerability in the ESXi SLP service may lead to pre-auth remote code execution as root. This vector could also be used as a virtual machine escape, since by default a guest can access the SLP service on the host.

The Use-After-Free Bug (CVE-2020-3992)

This bug exists only in VMware’s implementation of SLP. Here is the simplified pseudocode:

At (3), if an SLP_FUNCT_DAADVERT or SLP_FUNCT_SRVREG request is handled correctly, the allocated SLPMessage is saved into the database. However, at (4), the SLPMessage is freed even though the request was handled without error. This leaves a dangling pointer in the database. It is possible the free at (4) was added in the course of fixing some older bugs.
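As a rough illustration of the lifetime problem described above, here is a minimal Python model. It is not VMware's code: `Message`, `free()`, `handle_request()` and the list-based `database` are hypothetical stand-ins, and the `freed` flag only imitates what the allocator would do. The function IDs are the ones defined in RFC 2608.

```python
SLP_FUNCT_SRVREG = 3     # SrvReg function ID (RFC 2608)
SLP_FUNCT_DAADVERT = 8   # DAAdvert function ID (RFC 2608)


class Message:
    """Toy stand-in for SLPMessage; `freed` imitates allocator state."""

    def __init__(self, function_id, body):
        self.function_id = function_id
        self.body = body
        self.freed = False


def free(msg):
    """Mark the message as released; its contents can no longer be trusted."""
    msg.freed = True
    msg.body = None


def handle_request(database, function_id, body, patched=False):
    msg = Message(function_id, body)
    if function_id in (SLP_FUNCT_SRVREG, SLP_FUNCT_DAADVERT):
        database.append(msg)   # the database now holds a reference
        if patched:
            msg = None         # clear the local pointer: ownership moved
    if msg is not None:
        free(msg)              # vulnerable path: frees a stored message


def dangling_entries(database):
    """Entries that are still referenced by the database but already freed."""
    return [m for m in database if m.freed]
```

Calling `handle_request()` with `patched=False` leaves one freed entry in the database, while `patched=True` mirrors the idea of clearing the pointer once the message has been stored.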

Bypassing the First Patch for CVE-2020-3992

The first patch (build-16850804) by VMware was interesting. VMware didn’t make any changes to the vulnerable code shown above. Instead, they added logic to check the source IP address before handling the request. The logic, which is in IsAddrLocal(), allows requests from a source IP address of localhost only.

After looking at this check for a few seconds, you might notice that the service can still be reached from an IPv6 link-local address on the LAN.
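The exact body of IsAddrLocal() is not reproduced here, so the following is only a sketch of the general pitfall, using Python's standard `ipaddress` module. Both function names are hypothetical. The "loose" variant shows one plausible way a locality check could end up admitting link-local peers; the "strict" variant accepts loopback only.

```python
import ipaddress


def is_addr_local_loose(src: str) -> bool:
    """Hypothetical check that treats link-local sources as 'local'."""
    addr = ipaddress.ip_address(src)
    return addr.is_loopback or addr.is_link_local


def is_addr_local_strict(src: str) -> bool:
    """Hypothetical check that accepts loopback sources only."""
    addr = ipaddress.ip_address(src)
    return addr.is_loopback


if __name__ == "__main__":
    for src in ("127.0.0.1", "::1", "fe80::1", "192.168.1.10"):
        print(src, is_addr_local_loose(src), is_addr_local_strict(src))
```

Any host on the same network segment can use an fe80::/10 source address, so a check of the loose kind does not restrict access to the local machine.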

The Second Patch for CVE-2020-3992

Just over two weeks later, the second patch (build-17119627) was released. This time, they improved the IP source address check logic.

This change does eliminate the IPv6 vector. Additionally, they patched the root cause of the UAF bug by clearing the pointer to the SLPMessage after adding it to the database.

The Heap Overflow Bug (CVE-2021-21974)

Like the previous bug, this bug exists only in VMware’s implementation of SLP. Here is the simplified pseudocode:

At (5), srvurl comes from network input, but the function does not terminate srvurl with a NULL byte before using strstr(). The out-of-bounds string search leads to a heap overflow at (6). This happened because VMware did not merge an update from the original OpenSLP project.
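To make the mechanism concrete, here is a small Python model of an unterminated string search followed by an unchecked copy. It is a sketch under stated assumptions, not the slpd code: the flat `heap` bytearray, the offsets, `c_strstr()` and `parse_srv_url()` are hypothetical, and the destination buffer is assumed to be sized from the declared URL length. The optional `check_length` flag stands in for a length check before the copy.

```python
def c_strstr(mem: bytes, start: int, needle: bytes) -> int:
    """Model C strstr(): search from `start` up to the first NUL byte.

    Returns the index of the match, or -1 if there is none.
    """
    nul = mem.find(b"\x00", start)
    end = len(mem) if nul == -1 else nul
    return mem.find(needle, start, end)


def parse_srv_url(heap: bytearray, url_off: int, url_len: int,
                  obuf_off: int, check_length: bool = False) -> int:
    """Copy the text before ':/' into obuf and return the bytes copied.

    obuf is assumed to hold at most `url_len` bytes.
    """
    hit = c_strstr(bytes(heap), url_off, b":/")
    if hit == -1:
        raise ValueError("no ':/' found")
    n = hit - url_off
    if check_length and n > url_len:
        raise ValueError("match lies beyond the declared URL length")
    # memcpy(obuf, srvurl, n): nothing stops n from exceeding url_len
    heap[obuf_off:obuf_off + n] = heap[url_off:url_off + n]
    return n


if __name__ == "__main__":
    # [obuf: 8 bytes][neighbour: 8 bytes][srvurl: 8 bytes, no NUL][other data]
    heap = bytearray(b"\x00" * 8 + b"NEXTCHNK" + b"AAAAAAAA" + b"BBBB:/\x00\x00")
    copied = parse_srv_url(heap, url_off=16, url_len=8, obuf_off=0)
    print(copied, bytes(heap[8:16]))
```

Because the 8-byte URL has no terminator, the search runs into the adjacent data and finds ":/" there, so the copy length is larger than the buffer sized for the URL and the neighbouring bytes are overwritten.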

The Patch for CVE-2021-21974

Six weeks later, the third patch (build-17325551) was released. It addressed the root cause of the heap overflow bug by checking the length before the memcpy at (6).

Exploitation

All Linux exploit mitigations are enabled for /bin/slpd, most notably Position Independent Executable (PIE). This makes it difficult to achieve code execution without first disclosing some addresses from memory. At first, I considered using the UAF, but I could not figure out an effective method to get a memory disclosure. Therefore, I moved my focus to the heap overflow bug instead.

Upgrading the Overflow

SLP uses struct SLPBuffer to handle events that it sends and receives. One SLPBuffer* sendbuf and one SLPBuffer* recvbuf are allocated for each SLPDSocket* connection.

The plan is to partially overwrite the start or curpos pointer in SLPBuffer and leak some memory on the next message reply. However, the sendbuf is emptied and updated before each reply. Fortunately, the select-based socket model leaves a window during which sendbuf survives:

  1. Fill a socket send buffer without receiving until the send buffer is full.
  2. Partially overwrite sendbuf->curpos for that socket.
  3. Start to receive from the socket. The leaked memory will be appended at the end.
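The first step relies on ordinary socket behaviour: when the peer does not read, the writer's kernel buffer fills up and further non-blocking writes fail until the peer starts receiving. The sketch below demonstrates only that generic behaviour with a local socket pair. `fill_send_buffer()` is a hypothetical helper; in the demo, `server` plays the role of a daemon writing replies and `client` plays a peer that delays reading.

```python
import socket


def fill_send_buffer(sock: socket.socket, chunk: bytes = b"A" * 4096) -> int:
    """Send without blocking until the kernel buffer is full.

    Returns the total number of bytes queued.
    """
    sock.setblocking(False)
    total = 0
    while True:
        try:
            total += sock.send(chunk)
        except BlockingIOError:
            # Buffer is full: a select-based server would now wait for
            # the socket to become writable and keep its pending data.
            return total


if __name__ == "__main__":
    server, client = socket.socketpair()
    queued = fill_send_buffer(server)
    print("queued bytes before the peer reads:", queued)

    # Once the peer starts receiving, the queued data drains.
    client.settimeout(2.0)
    received = 0
    while received < queued:
        received += len(client.recv(65536))
    print("received bytes:", received)
    server.close()
    client.close()
```

While the buffer is full, whatever the writer still has pending stays in its own memory, which is the window the steps above take advantage of.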

There are some additional challenges, though:

       -- Due to the use of strstr(), you cannot overflow with a NULL byte.
       -- The overflowed buffer (obuf) will be automatically freed very soon after the return of SLPParseSrvUrl().

Together, this means that the overwrite can only extend partway through the next chunk header. Otherwise, the size of the next free chunk will be set to a very large value (four non-NULL bytes), and shortly after obuf is freed, the process will abort.
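The arithmetic behind this constraint can be sketched as follows, assuming a 4-byte little-endian size field in the chunk header (which is what "four non-NULL bytes" suggests). `overflow_size_field()` and the example sizes are hypothetical and only model how the bytes land.

```python
import struct


def overflow_size_field(size: int, overflow: bytes) -> int:
    """Overwrite the low bytes of a 4-byte little-endian size field.

    The copy is driven by strstr(), so it cannot carry NUL bytes.
    """
    if b"\x00" in overflow:
        raise ValueError("overflow data cannot contain NUL bytes")
    if len(overflow) > 4:
        raise ValueError("this model only covers the size field itself")
    raw = bytearray(struct.pack("<I", size))
    raw[:len(overflow)] = overflow
    return struct.unpack("<I", bytes(raw))[0]


if __name__ == "__main__":
    original = 0x51  # hypothetical chunk: size 0x50 with PREV_INUSE set
    print(hex(overflow_size_field(original, b"\x91")))
    print(hex(overflow_size_field(original, b"\x01\x01\x01\x01")))
```

Stopping after one or two bytes keeps the upper bytes of the original size (which are zero for small chunks), so the new size stays moderate. Overwriting all four bytes forces every byte to be non-zero, which makes the smallest reachable size 0x01010101, roughly 16 MiB.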

The following layout overcomes these challenges:

Assume that the target is sendbuf. In (F1), each chunk marked “IN USE” can be either an SLPBuffer or an SLPDSocket. A hole is prepared for obuf in (F2). After triggering the overflow in (F4), the next freed chunk is enlarged so that it overlaps the target. Next, obuf is freed in (F5). Now, you can allocate a new recvbuf from a new connection to overwrite the target in (F6). This time the overwrite can include NULL bytes.

There is an additional problem:

       -- Many malloc() functions from OpenSLP are replaced with calloc() by VMware.

The recvbuf in (F6) is also allocated from calloc(), which zero-initializes memory. This means that partial pointer overwrites are not possible when recvbuf overlaps the target. There is a trick to get around that, though: You can first overwrite the IS_MMAPPED flag on the freed chunk in (F4). This causes calloc() to skip the zero initialization on the next allocation. This is a general method that is useful in many situations where you want to perform an overwrite on a target.
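The flag trick can be modelled in a few lines. In glibc, the low three bits of a chunk's size field are flags (PREV_INUSE, IS_MMAPPED and NON_MAIN_ARENA), and calloc() does not clear chunks flagged as mmapped because fresh mmap() pages are already zero. The `toy_calloc()` function below is a simplified, hypothetical model of that decision, not the real allocator.

```python
PREV_INUSE = 0x1      # glibc chunk flag bits stored in the size field
IS_MMAPPED = 0x2
NON_MAIN_ARENA = 0x4


def toy_calloc(size_field: int, stale: bytes) -> bytes:
    """Return the contents a caller would see after a calloc()-like call.

    `stale` is whatever was left in the chunk's user data beforehand.
    """
    if size_field & IS_MMAPPED:
        # Assumed to be fresh pages from mmap(), so clearing is skipped.
        return stale
    return bytes(len(stale))  # normal path: zero-filled memory


if __name__ == "__main__":
    stale = b"\xc8\xa4\x93\x08"  # hypothetical leftover pointer bytes
    print(toy_calloc(0x50 | PREV_INUSE, stale))
    print(toy_calloc(0x50 | PREV_INUSE | IS_MMAPPED, stale))
```

With the flag set on a heap chunk that is not really mmapped, the stale bytes survive the allocation, which is what makes a partial pointer overwrite possible again.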

Putting It All Together

  1. Overwrite a connection state (connection->state) to STREAM_WRITE_FIRST. This is necessary so that sendbuf->curpos will get reset to sendbuf->start in preparation for the memory disclosure.
  2. Partially overwrite sendbuf->start with 2 NULL bytes, where sendbuf belongs to the connection mentioned in step 1. Start receiving from the connection. You can then get memory disclosure, including the address of sendbuf.
  3. Overwrite sendbuf->curpos from a new connection to leak the address of a recvbuf, which is allocated from mmap(). Once you have an mmapped address, it becomes possible to infer the libc base address.
  4. Overwrite recvbuf->curpos from a new connection, setting it to the address of free_hook. Start sending on the connection. You can then overwrite free_hook.
  5. Close a connection, invoking free_hook to start the ROP chain.

These steps may not be optimal.
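The effect of the two-NULL-byte overwrite in step 2 is simple pointer arithmetic, sketched below. The addresses are made up for illustration, both helper names are hypothetical, and the model assumes that a reply is sent from curpos (reset to start) up to end.

```python
def zero_low_bytes(ptr: int, n: int) -> int:
    """Return `ptr` with its lowest `n` bytes set to zero."""
    return ptr & ~((1 << (8 * n)) - 1)


def leak_window(start: int, end: int, n: int = 2):
    """Return the new start pointer and the number of bytes now sent.

    Assumes the reply covers the range [start, end).
    """
    new_start = zero_low_bytes(start, n)
    return new_start, end - new_start


if __name__ == "__main__":
    start, end = 0x0893A4C8, 0x0893A500  # hypothetical heap addresses
    new_start, length = leak_window(start, end)
    print(hex(new_start), hex(length))
```

Zeroing the low two bytes moves start down to a 64 KiB boundary, so the next reply also covers whatever heap memory lies between that boundary and the original buffer.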

Privilege Level Obtained

If everything goes well, you can execute arbitrary code with root permission on the target ESXi system. In ESXi 7, a new feature called DaemonSandboxing was prepared for SLP. It uses an AppArmor-like sandbox to isolate the SLP daemon. However, I found that this is disabled by default in my environment.

This suggests that a sandbox escape stage will be required in the future.

Conclusion

VMware ESXi is a popular infrastructure for cloud service providers and many others. Because of its popularity, these bugs may be exploited in the wild at some point. To defend against these vulnerabilities, you can either apply the relevant patches or implement the workaround. You should consider applying both to ensure your systems are adequately protected. Additionally, VMware now recommends disabling the OpenSLP service in ESXi if it is not used.
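As a quick way to see whether a host still exposes the service over TCP, a reachability check along these lines can help. It is only a sketch: `tcp_port_open()` is a hypothetical helper, it does not cover UDP port 427, and an open port says nothing about the patch level of the host.

```python
import socket


def tcp_port_open(host: str, port: int = 427, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, or name resolution failed.
        return False


if __name__ == "__main__":
    import sys

    target = sys.argv[1] if len(sys.argv) > 1 else "127.0.0.1"
    state = "reachable" if tcp_port_open(target) else "not reachable"
    print(f"{target}: TCP port 427 is {state}")
```

Run it only against systems you are authorized to test, and pass the host name or address as the first argument.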

We look forward to seeing other methods to exploit these bugs as well as other ESXi vulnerabilities in general. Until then, you can find me on Twitter @_wmliang_, and follow the team for the latest in exploit techniques and security patches.