tomek7667

ECSC 24 Polish Qualifications - Semantic Security - crypto - easy

The description of the challenge is as follows:

It’s not cryptographically secure random, but I think it’s good enough. (10 solves)

Additionally a netcat service is provided, that runs the following:

import binascii
import random


def xor(*t):
    from functools import reduce
    from operator import xor
    return [reduce(xor, x, 0) for x in zip(*t)]


def main():
    flag = open("flag.txt", 'rb').read()
    while True:
        print("1. Get ciphertext")
        print("2. Exit")
        choice = input(">").strip()
        if choice == "1":
            keystream = [random.randrange(0, 255) for _ in range(len(flag))]
            random.shuffle(keystream)
            print(binascii.hexlify(bytes(xor(flag, keystream))).decode())
        elif choice == "2":
            return
        else:
            print("WTF?")


if __name__ == "__main__":
    main()

The Challenge

We input 1 and we get flag ^ keystream where keystream is a list of random bytes generated by random.randrange(0, 255) and then shuffled. Nothing fancy here, just a simple XOR encryption.

Side note: even though we know that the flag format is ecsc24{...} we won’t be needing it. Of course we could leak the beginning of the keystream and last character, but that wouldn’t give us much.

Vulnerability

The main vulnerability here lies in:

random.randrange(0, 255)

This function generates numbers ranging from 0 to 255, but not including 255 (Also like a standard range() function in python). Why is that a vulnerability? It will be easier (at least was for me) if we look at the binary representation of bytes.

In detail

Let’s assume, that our secret is currently a letter a which after using ord function, gets converted to the byte value 97.

The following a.py script, will create all possible keys from the challenge for the a character and will also create the encrypted output (this is equivalent to what we get from the challenge) and store it in the all_encryptions array. I will additionally print out the whole operations that are ongoing, so we investigate what exactly is done there.

# a = 97 (01100001 in binary)
a = ord("a")
out = ""
all_encryptions = []
# `range(0, 255)` has the same "scope" as `random.randrange(0, 255)` from the challenge
for random_key in range(0, 255):

    # We encrypt the 'a' character the same way the server does, plain xor.
    encrypted_a = a ^ random_key

    # just for nice binary representation:
    b_a = bin(a)[2:].zfill(8)
    b_random_key = bin(random_key)[2:].zfill(8)
    b_encrypted_a = bin(encrypted_a)[2:].zfill(8)
    out += b_a + "^" + b_random_key + "=" + b_encrypted_a + "\n"

    all_encryptions.append(b_encrypted_a)

open("1-255-xor.txt", "w").write(out)

After we run the script with the command:

python a.py

We can see the following file being created:

01100001^00000000=01100001
...
... many rows later...
...
01100001^11111110=10011111

These are all operations performed. I would like to highlight for you especially the following:

"a"   ^   <key>  = <result>
01100001^00000000=01100001
01100001^00000001=01100000
...
01100001^01100000=00000001
01100001^01100001=00000000 << 'a' is xored with 'a', so naturally it will give 0, however this doesn't give us the information we need
01100001^01100010=00000011
...
01100001^11111101=10011100
01100001^11111110=10011111 << This is the last key.
<EOF>

Can you see something suspicious? If you see that the keys are lacking one value, then you are right!

The last key is 254 which is 11111110 in binary. This means, that we can be sure that the server will not generate 11111111 key. So how we can decode our character a?

We know what is the encrypted text, because we know what isn’t.

What I mean by that, is if we were to generate all possible values of a byte (from 0 to 255, or from 0000000 to 11111111), we could check which of the keys is missing from the 254 keys that were given by the server. The following python script will generate those values and will print out, which of the key is not present:

all_binary = []
for i in range(0, 256):
    all_binary.append(bin(i)[2:].zfill(8))

# check which from all_binary is not in the all_encryptions array
for i in all_binary:
    if i not in all_encryptions:
        print(i) # 10011110

and if we were to xor this (only once printed) value with the missing key 11111111, we would get 01100001

10011110
11111111 ^
--------
01100001

which is 97 decimal, and a as a string!

Exploit

So in order to get all possible values, we need to invoke a lot of times the 1. command. For that, a great library comes in with help called pwntools. With it’s help we can fairly easily programtically connect to a service running on netcat, gather as much keys we need, and when we have all 254, we can extract the flag.

(similar principle in gamedev: the bullet knows where it is, because it knows where it isn’t)

from pwn import *
from binascii import hexlify, unhexlify

# utility functions
def unhex(s):
    return unhexlify(s.encode())


def hhex(s):
    return hexlify(s).decode()


def get_bin(i: int) -> str:
    return bin(i)[2:].zfill(8)


# getting a next ciphertext and decoding it to raw bytes
def get_ciphertext(r):
    r.sendlineafter('>', '1')
    ciphertext = r.recvline().strip()    
    return unhex(ciphertext.decode())

# verifying whether the mapper dictionary hass been filled with all keys
def is_mapper_full(mapper):
    for i in mapper:
        if len(mapper[i]) != 255:
            # uncomment for progress
            # print(len(mapper[i]))
            return False
    return True


# after mapper is filled with all keys, we generate all 255 keys and extract which of the are not in the mapper
# after so, we xor the values with `11111111` which is `0xff` in hexadecimal representation, and convert the result
# to a string from value.
def decode_mapper(mapper, r):
    all_binary = []
    for i in range(0, 256):
        all_binary.append(bin(i)[2:].zfill(8))

    flag = ""
    for i in range(len(mapper)):
        for j in all_binary:
            if j not in mapper[i]:
                decoded = int(j, 2) ^ 0xff
                flag += chr(decoded)
                
    print(f'flag={flag}')
    r.close()

def is_not_in_mapper_yet(mapper, index, number):
    if number in mapper[index]:
        return False
    return True

r = remote('semantic.ecsc24.hack.cert.pl', 5102)
cipher = get_ciphertext(r)

# Create empty mapper object with `i` as the index of each flag character
# e.g. if flag was 4 bytes long, the mapper would have 4 keys with an empty array
# inside of each one.
mappers = {}
for i in range(len(cipher)):
    mappers[i] = []


while True:
    cipher = get_ciphertext(r)
    for i in range(len(cipher)):
        binary = get_bin(cipher[i])
        if is_not_in_mapper_yet(mappers, i, binary):
            mappers[i].append(binary)
    if is_mapper_full(mappers):
        print('mapper is full, decoding flag')
        decode_mapper(mappers, r)
        break

# ecsc24{I_hope_that_only_negligible_information_about_the_plaintext_can_be_extracted}

The code and explanation could’ve been done in pure bytes, however I strongly feel that better understanding of the problem is under the binary representation.