Mocking a Textract LimitExceededException with boto

For s3-ocr issue #21 I needed to write a test that simulates what happens when Amazon Textract returns a "LimitExceededException". When using boto this error presents itself as an exception:

botocore.errorfactory.LimitExceededException: An error occurred (LimitExceededException) when calling the StartDocumentTextDetection operation: Open jobs exceed maximum concurrent job limit

I uses moto to simulate AWS in that test suite, but moto does not yet have a mechanism for simulating Textract errors like this one.

I ended up turning to Python mocks, here provided by the the pytest-mock fixture. Here's the test I came up with:

def test_limit_exceeded_automatic_retry(s3, mocker):
    mocked = mocker.patch("s3_ocr.cli.start_document_text_extraction")
    # It's going to fail the first time, then succeed
    mocked.side_effect = [
        {"JobId": "123"},
    runner = CliRunner()
    result = runner.invoke(cli, ["start", "my-bucket", "--all"])
    assert result.exit_code == 0
    assert result.output == (
        "Found 0 files with .s3-ocr.json out of 1 PDFs\n"
        "An error occurred (Unknown) when calling the StartDocumentTextExtraction operation: Unknown - retrying...\n"
        "Starting OCR for blah.pdf, Job ID: 123\n"

Here I'm patching the function identified by the string "s3_ocr.cli.start_document_text_extraction". This is a new function that I wrote specifically to make this mock easier to apply - it lives in s3_ocr/ and looks like this:

def start_document_text_extraction(textract, **kwargs):
    # Wrapper function to make this easier to mock in tests
    return textract.start_document_text_detection(**kwargs)

The most confusing thing about working with Python mocks is figuring out the string to use to mock the right piece of code. I like this pattern of refactoring the code under test to make it as simple to mock as possible.

The code I am testing here implements automatic retries. As such, I needed the API method I am simulating to fail the first time and then succeed the second time.

Originally I had done this with a side_effect() function - see below - but then @szotten on Twitter pointed out that you can instead set mock.side_effect to a list and it will cycle through those items in turn:

mocked.side_effect = [
    {"JobId": "123"},

Any exception objects in that list will be raised by the mocked function; any other kind of object will be returned.

The hardest thing to figure out was how to simulate the exception. The original error message indicated botocore.errorfactory.LimitExceededException but that's not actually a class you can import and raise.

Instead, I used boto3.client("textract").exceptions.LimitExceededException.

Figuring out that it needed an error_response and operation_name was tricky too. I eventually tracked down the botocore ClientError constructor, which showed me what I needed to provide:

class ClientError(Exception):
        'An error occurred ({error_code}) when calling the {operation_name} '
        'operation{retry_info}: {error_message}'

    def __init__(self, error_response, operation_name):
        retry_info = self._get_retry_info(error_response)
        error = error_response.get('Error', {})
        msg = self.MSG_TEMPLATE.format(
            error_code=error.get('Code', 'Unknown'),
            error_message=error.get('Message', 'Unknown'),
        self.response = error_response
        self.operation_name = operation_name

Using a side effect function

Prior to the tip about setting .side_effect to a list I used a side effect function instead, with a nonlocal variable to change its behaviour the second time it was called.

should_fail = True

def side_effect(*args, **kwargs):
    nonlocal should_fail
    if should_fail:
        should_fail = False
        raise boto3.client("textract").exceptions.LimitExceededException(
        return {"JobId": "123"}

mocked.side_effect = side_effect


Created 2022-08-07T10:37:11-07:00, updated 2022-08-07T12:37:51-07:00 · History · Edit