Understanding Poll Generators

Dynamically writing polls

Challenge binary interactivity verification within the context of the DARPA Cyber Grand Challenge is implemented via specialized unit tests known as "service polls". Service polls are intended to validate the performance and functionality of challenge binaries in order to test the efficacy and impact of reformulation performed by competitors. Similar in ideals to unit tests, service polls should be written to validate the interactivity and complex state machines implemented by CBs.

Due to the requirements for CB authors to deliver a large number of CB validations that are both deterministic and unique, a service poll generator has been provided that will be used to assist in developing these polls.

Developing services polls using the generator is performed through the use of a weighted directed graph, and an associated python module that implements the state machine that is used to generate the polls.

Directed Graph

The Directed Graph specifies the connection between each individual components within the state machine, allowing a CB author to define the polls as small individual components, and explore the combinatorial permutations of between the different components for CB validation.

The directed graph defines individual nodes, which relate to a method in the provided python class, and edges, which define which nodes could be called upon completion of the execution of a given node.

These components are defined in YAML, at the top level as a dictionary, nodes and edges respectively.

Nodes

The nodes entry is composed of a list of dictionaries, each defining an individual node. Each dictionary must contain the key name, which is a string that defines the name of the given node. The value of the name entry corresponds to the name of the method within the provided python module that should be executed upon accessing that node during the graph traversal.

Within this dictionary, two additional entries are supported: chance and continue. Both chance and continue, if provided, are specified as floats between 0.0 and 1.0.

  • continue allows the developer to specify the likelihood that the state machine should continue processing upon execution of the node.

  • chance allows the developer to specify the likelihood that the state machine should execute the node, or continue traversal of the graph without executing the underlying functionality of the node.

Node names must be unique. If the node name start is provided, then traversal will always begin at this node, otherwise, the traversal will start at a random node in the graph.

Edges

The edges entry is composed of a list of dictionaries, each defining an edge between two nodes. Each dictionary can have up to two entries, with the first defining the start and end nodes for a given edge. The key and value both refer to a node defined in the nodes entry. Within this dictionary, two additional entries are supported: weight and after. weight, if provided, is specified as a float between 0.0 and 1.0. after, if provided, is specified as a float between 0.0 and 1.0.

  • weight allows the developer to specify the likelihood of traversal of a given edge when a node has multiple edges leaving the node within the graph.
  • after allows the developer to specify that a path may only be included in the traversal allows the developer enable code paths be enabled after a specified percentage of polls have been generated.

State Machine

The State Machine is a python class, which defines the implementations of methods that perform the interactions with a service for a given state. The underlying implementation is a subclass of the generator.Actions python class provided as part of the poll-generation package.

This class provides a set of methods that perform specific functions that are reliant for CB interactions performed via the XML DTD used by cb-replay. These methods are:

  • read: creates a read interaction
  • write: creates a write interaction
  • xml: create an XML for all of the existing interactions
  • chance: simple to use wrapper to get a True or False value, specifying the likelihood that True will be taken via a float between 0.0 and 1.0.
  • reset: resets the internal state of the machine, which is called per iteration of the graph traversal.

To create a state machine for use within the generate-polls, provide methods for each node in the directed graph described above. Each method should call self.read and self.write as needed, which will implement read and write interactions to the service.

There is a dictionary state, that is provided by the parent class that can be used for storing arbitrary intermediary values during execution of a given iteration but is reset between each iteration. The state dictionary is reinitialized by the above reset method.

There is also a per-poll magic_page string will be used in the challenge binary, allowing for perfect knowledge of the magic page as discussed in submitting-a-cb.md.

Sample Implementation Walk Through

A sample implementation of a poll generator, ftplite, is provided in the /usr/share/poll-generator/examples/. This generator provides a state machine interaction to an FTP like protocol.

The 9 unique states in the machine, defined as the following:

  • after_login
  • delete
  • fetch
  • file_list
  • login
  • logout
  • password
  • put
  • start

Each of these states have an entry in the graph as well as a method in the python implementation of the state machine.

Looking at the python implementation, there are a few items of note. The methods start and after_login are empty methods. These methods are used as a common point for the directed graph to connect into.

The start method is provided such that each iteration of walking the state machine starts at the same location.

The after_login method provides a common location in the graph that can such that each of the methods intended to be run in an arbitrary combinations once logins have occurred do not need to link to all of the rest of the methods.

In the state dictionary, the key login is used to track if the login state successfully completed. This shows the ability to store and access arbitrary information between states within the graph.

Sample Implementation Walk Through 2

This is a step-by-step walkthrough for the overly simple CADET_00001 Palindrome service. The service is comprised of a banner on the initial connect, a banner requesting the user enter a possible palindrome, the user entering a palindrome, and the service returning back a response. We will build the state machine and the graph for this simple service.

Starting in the CADET_00001 directory edit poller/for-release/machine.py:

Python module dependancies

Service poll generators are required to import the Actions module. Any others are up to the author. In our example we also use the random and string modules.

from generator.actions import Actions
import random
import string

Service poll generation class

The service poll generator expects a class derived from the CGC Actions class. This class will define each individual state in a service poller state machine. This class should also contain an empty start member which serves as the entry to the state machine.

class Palindrome(Actions):
    def start(self):
        pass

The palindrome service always begins with a "Welcome to Palindrome Finder" banner with an extra carriage return before it. Our state machine needs confirm the presence of this banner.

def banner(self):
    # Confirm the initial empty line
    self.read(delim='\n', expect='\n')
    # Confirm the actual banner
    self.read(delim='\n', expect='Welcome to Palindrome Finder\n')

State graph

We've created our first node in the state machine, now we need to connect it in the graph. The state machine graph is constructed in the state-graph.yaml file. It is composed of two sections. 1. Node definitions 2. Node connections

The format of the file looks like

---
nodes:
    <add nodes here>
edges:
    <add edges here>

Initial node definitions

We created two nodes, start and banner, which must be defined in the state tree:

    - name: start
    - name: banner

We also need to define the edge for the state machine to traverse from the fictional start node into the banner:

    - start: banner

Testing

We have a very rudimentary service poller generator that this point which can be tested. Generate one service poll via:

$ generate-polls --count 1 poller/for-release/machine.py poller/for-release/state-graph.yaml poller/for-release
  • Please note that the above generation example will fail with more complex graphs since a single generation won't hit all of the nodes. The count must be expanded for these more complex graphs

A single service poll was created in poller/for-release/gen_000000.xml This service poll can be manually inspected or tested against the challenge binary with:

$ cb-test --cb CADET_00001 --directory bin --xml poller/for-release/gen_000000.xml

The expected output indicates that strings matched:

# service - poller/for-release/gen_000000.xml
ok 1 - match: string
ok 2 - match: string
# passed: 2
# failed: 0
# total passed: 2
# total failed: 0

An alternative testing strategy involves spawning the CB in one window while connecting to it with the cb-test utilility like so:

Window 1: sudo cb-server --insecure --once -p 12345 -d . bin/CADET_00001
Window 2: cb-replay --failure_ok --host 127.0.0.1 --port 12345 poller/for-release/gen_000000.xml

Handling palindrome requests

Following the banner, the service asks the user for possible palindromes. We must parse this request and test it with palindromes and non-palindromes.

def request(self):
    # Skip the empty line
    self.read(delim='\n', expect='\n')
    # Confirm the request
    self.read(length=37, expect='\tPlease enter a possible palindrome: ') 

def palindrome(self):
    halfword = self.random_string(random.randint(1, 16))
    self.write(halfword + halfword[::-1] + "\n")
    self.read(delim='\n', expect='\t\tYes, that\'s a palindrome!\n\n')

def not_palindrome(self):
    word = self.random_string(random.randint(2, 32))
    while self.is_palindrome(word):
            word = self.random_string(random.randint(2, 32))
    self.write(word + "\n")
    self.read(delim='\n', expect='\t\tNope, that\'s not a palindrome\n\n')

Helper functions

def is_palindrome(self, word):
    for i in range(0, len(word) / 2):
            if word[i] != word[-i - 1]:
                    return False
    return True

def random_string(self, size):
    chars = string.letters + string.digits
    return ''.join(random.choice(chars) for _ in range(size))

Node definitions

We created three nodes which must be defined in the state tree: request, palindrome, and not_palindrome.

In the state tree:

    - name: request
      continue: .9
    - name: palindrome
    - name: not_palindrome

Note the continue entry. There is a 90% chance that the poll generator will continue down the state graph on every service poll and a 10% chance that the service poll will stop.

We also need to define the edges for the state machine to traverse these nodes. First we link the previous banner node into the request node and then into the palindrome and not_palindrome nodes.

    - banner: request
    - request: palindrome
      weight: .2
    - request: not_palindrome
      weight: .8

Note that each of the request edges contains a weight. 20% of the time the request will lead to a palindrome and 80% of the time it will lead to a non-palindrom.

Finally we must link both palindrome and not_palindrome back into the request since this is effectively a circular state graph.

    - palindrome: request
    - not_palindrome: request

Evaluating the generated polls

The poll generated creates an edges.png and a nodes.png in the output directory. These should be manually inspected to confirm the desired graph traversals and nodes are reached.

Repeated Polls

A CRS is able to configure a submitted POV to be used more than once per round in case a POV is unreliable. The repeated use of a POV in a given round would allow for trivial identification of POVs via traffic analysis because polls are intended to never be reused through the course of the competition. The command line arguments --repeat and --duplicate provide the ability for the CB author to purposely reuse polls in a given round to mimic the potential of a POV being repeated used.

The command line argument --duplicate specifies the number of polls that should appear multiple times in a round. The command line argument --repeat specifies the maximum number of times a poll that is repeated should appear in a round, which is variable between 1 and the specified amount.

The ability to repeat polls provides the CB author the ability to include repeated content, which reduces the ability of a CRS to identify POV traffic purely by content repetition.