
proposal: testing: Collect test failure logs on the testing.common struct and expose collected logs via a new method on testing.TB #61860

SophisticaSean opened this issue Aug 8, 2023 · 2 comments

Problem:

During test runtime there is very little programmatic exposure from the go/testing stdlib of what has gone wrong in your tests. Currently, t.Error and the like append to the test log as your tests run, which is adequate for most use cases. The problem arises if you'd like real-time programmatic feedback, similar to how test status is exposed via t.Failed() or t.Skipped(). There is currently no way to ask TB which testing errors have occurred during the run. As such, we lack the ability to customize test output or instrument tests via tracing until the end of the go test execution.

Solution:

The currently drafted solution is to change the API in the following ways:
Public testing API (rough draft; happy to change this API):

+pkg testing, method (*B) FailureMsg() string
+pkg testing, method (*F) FailureMsg() string
+pkg testing, method (*T) FailureMsg() string
+pkg testing, type TB interface, FailureMsg() string

Internal testing API changes:

+testing.common.failureMsg  []byte

We then change testing.common.logDepth to append all incoming log requests to both c.output AND c.failureMsg.
https://github.com/SophisticaSean/go/blob/sophisticasean-add-failure-messages-to-tests/src/testing/testing.go#L1018
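
For illustration, here is a minimal sketch of the idea (simplified and hypothetical; the real logDepth decorates messages with file:line, handles chatty output and the test-finished case, and the final field layout would follow the actual implementation):

package testing // sketch only; this would live inside the testing package

import "sync"

type common struct {
	mu         sync.RWMutex
	output     []byte // existing: full decorated test log
	failureMsg []byte // proposed: accumulated failure/log messages for this test
}

// logDepth appends each incoming message to failureMsg in addition to output.
func (c *common) logDepth(s string, depth int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	line := []byte(s + "\n") // the real code decorates with file:line and indentation
	c.output = append(c.output, line...)
	c.failureMsg = append(c.failureMsg, line...)
}

// FailureMsg returns everything collected so far for this test.
func (c *common) FailureMsg() string {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return string(c.failureMsg)
}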

API changes in a forked golang/go repo here: master...SophisticaSean:go:sophisticasean-add-failure-messages-to-tests

There is a point to be made about where failureMsg should live. I added it to the common struct so that *T, *B, and *F can all take advantage of it. I'm happy to scope this change to just testing.T if that's more appropriate, though I would also love to build tooling around seeing benchmark and fuzz failures in real time.

Proposal Questionnaire:

  • Would you consider yourself a novice, intermediate, or experienced Go programmer?
    • Experienced, ~10 years of production Go experience.
  • What other languages do you have experience with?
    • Javascript, Elixir, Python, Java
  • Would this change make Go easier or harder to learn, and why?
    • Neither; the change only adds an optional method to the testing types.
  • Has this idea, or one like it, been proposed before?
    • Not to my knowledge; I searched past issues and proposals and did not find anything pertinent.
  • If so, how does this proposal differ?
    • N/A
  • Who does this proposal help, and why?
    • Testing library authors, people who want real-time granular error feedback per-test in Go.
  • What is the proposed change?
    • Add an additional field to the testing.common struct and a method to extract only the t.Error messages for the given test.
  • Please describe as precisely as possible the change to the language.
    • Further information is below.
  • What would change in the language spec?
    • Nothing; it only extends the testing API.
  • Please also describe the change informally, as in a class teaching Go.
    • It's not currently possible to get errors from the running test at runtime; the logs only go to the test log internal to testing, which is streamed to the console or a file. This change adds a method to testing that lets you retrieve all of the error messages for the currently running test, and only that test, in real time while the test is running or exiting.
  • Is this change backward compatible?
    • Yes.
  • Show example code before and after the change:
     // Before:
     func TestFoo(t *testing.T) {
     	t.Error("some failure")
     }

     // After:
     func TestFoo(t *testing.T) {
     	t.Error("some failure")
     	fmt.Println(t.FailureMsg() == "some failure") // should print true
     }
  • What is the cost of this proposal? (Every language change has a cost).
    • Very low; most of the code is already completed.
  • How many tools (such as vet, gopls, gofmt, goimports, etc.) would be affected?
    • Only go test will be affected.
  • What is the compile time cost?
    • None.
  • What is the run time cost?
    • Marginal extra memory cost, depending on how many test failures occur in a given run. If this turns out to be significant, we could gate the collection behind a flag.
    • I tested this vs go 1.20.6 and the difference in test runtime on a large test suite (10k tests) was not statistically significant.
  • Can you describe a possible implementation?
    • Handled below.
  • Do you have a prototype? (This is not required.)
    • Yes, described above.
  • How would the language spec change?
    • It would not.
  • Orthogonality: how does this change interact or overlap with existing features?
    • It slightly overlaps with the test log, but not significantly.
  • Is the goal of this change a performance improvement?
    • No.
  • Does this affect error handling?
    • No.
  • Is this about generics?
    • No.

Prior Art

I think #43620, and its adoption, is very similar: a rather simple new method added to extend the functionality of testing.B.
I spent a couple of hours perusing proposals and haven't found any that get as granular and actionable as the proposal set forth here.

Further information

I'm continuing the work in this library: https://github.com/DataDog/dd-sdk-go-testing, providing real-time test tracing functionality to Go testing suites (I would love to get this working in OpenTelemetry as well). Since Go does not expose failure messages/logs from tests until the end of execution, it's infeasible to provide real-time feedback to users on their test traces. Currently, everything but test log information is easily surfaced.
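
As a rough illustration of how a tracing integration could consume this (a sketch only — the span interface and TraceTest helper below are hypothetical stand-ins, not the dd-sdk-go-testing API, and FailureMsg is the method proposed in this issue):

package tracing // hypothetical example package

import "testing"

// span is a placeholder for whatever tracing span type the library uses.
type span interface {
	SetTag(key, value string)
}

// TraceTest attaches the collected failure text to the test's span when the test finishes.
func TraceTest(t *testing.T, s span) {
	t.Cleanup(func() {
		if t.Failed() {
			s.SetTag("test.status", "fail")
			s.SetTag("error.message", t.FailureMsg()) // proposed method; does not exist yet
		} else {
			s.SetTag("test.status", "pass")
		}
	})
}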

We're able to extract actionable test stacktraces (on t.Fatal or t.FailNow or stretchr/require.*):
[screenshot]
While the stacktrace here is indeed helpful, it doesn't describe or expose what the issue was with this particular line. If this were a require.Equal(t, someStruct, someOtherStruct), we'd have no feedback to discern what the difference was in this test, and we'd have to comb through the test logs to find the output.

The issue gets worse if you use non-Fatal/FailNow constructs like t.Error. Since these errors don't stop test execution, we can't get a useful stack trace at the end of the test to point the engineer at which lines may have failed. Worse yet, due to their non-blocking nature, we can't give the user actionable info about what went wrong during the test; that information only exists in testing.(*common).log. For a test like this:

func TestDELETEME_SHOULD_FAIL_ONLY_ASSERTS(t *testing.T) {
	t.Parallel()

	output := 0
	if output != 1 {
		t.Error("output was not 1")
	}
	assert.Equal(t, 0, 1)
	assert.True(t, false)
	assert.False(t, true)
	assert.True(t, true)
}

We receive almost no information about the failure logs and violated conditions for this test during runtime, resulting in no error messages and no stack trace to help the user identify and fix the test.
[screenshot]

With the changes proposed here, the error output (for the same test TestDELETEME_SHOULD_FAIL_ONLY_ASSERTS above) is much improved. We are now able to use t.FailureMsg() to retrieve all testing.Error messages accumulated during that test's runtime. We then use the output of t.FailureMsg() and pass it up as the ErrorMsg on the span/trace of that test.
[screenshot]

This enables useful test-table output as well:

func TestDELETEME_SHOULD_ASSERTS_TEST_TABLE(t *testing.T) {
	t.Parallel()

	tests := map[string]struct {
		input          int
		expectedResult int
		shouldBreak    bool
	}{
		"ok": {
			input:          2,
			expectedResult: 1,
		},
		"invalid": {
			input:          1,
			expectedResult: 1,
		},
		"broken": {
			input:          1,
			expectedResult: 1,
			shouldBreak:    true,
		},
	}
	for name, tt := range tests {
		tt := tt
		t.Run(name, func(t *testing.T) {
			assert := assert.New(t)

			result := tt.input - 1
			if result != tt.expectedResult {
				t.Errorf("output was not %v", tt.expectedResult)
			}
			assert.Equal(result, tt.expectedResult)

			if tt.shouldBreak {
				assert.True(false)
			}
		})
	}
}

[screenshot]

This work would also allow other testing libraries, like ginkgo or testify/assert/suite, to collate failures and provide real-time feedback to their consumers.
It would also let users add a wrapper that logs test failure messages with whatever metadata they'd like (extra timestamps, arbitrary logging tags, JSON test log output to Datadog, Splunk, logstash, etc.), and it would provide an API for people concerned about issues like #28544.
It also allows users to log and track long-running integration test suites in real time.
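
For example, such a wrapper might emit one structured JSON log line per failed test (a hedged sketch; logFailureJSON is a hypothetical helper and FailureMsg is the proposed method):

package testlog // hypothetical example package

import (
	"encoding/json"
	"fmt"
	"testing"
	"time"
)

// logFailureJSON emits one structured log entry per failed test, which could
// then be shipped to Datadog, Splunk, logstash, etc.
func logFailureJSON(t *testing.T) {
	t.Cleanup(func() {
		if !t.Failed() {
			return
		}
		entry := map[string]any{
			"test":      t.Name(),
			"timestamp": time.Now().UTC().Format(time.RFC3339Nano),
			"failure":   t.FailureMsg(), // proposed method; does not exist yet
		}
		b, _ := json.Marshal(entry)
		fmt.Println(string(b))
	})
}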

This work has no backward-compatibility issues, as we're only adding functionality to the testing package.

Alternative approaches to solving this problem:

I've been considering this for some time, and there are two possible alternative routes to solve this.

  1. Wrap any downstream t.Error/t.Fatal/etc. calls in a custom library that takes the msg argument and appends it to a bespoke struct internal to that library, which can then expose that information associated with the currently running test (a minimal sketch of such a wrapper follows this list). We'd then need to write a linter that disallows any non-wrapped t.Error usage in our applications, which would be complicated. Without the linter, any vanilla usage of the Go testing error functions, or of third-party libraries, would produce errors invisible to the wrapper, giving us mixed visibility into our test results. This approach is non-optimal because it effectively disallows popular testing libraries like stretchr/assert without an explicit, complicated, and hard-to-maintain wrapper around any and all testing libraries.

  2. Write a higher level tool that wraps go test -json. This tool would have to do the following:

    • Wrap and run go test.
    • Force trace collection to pause and not flush traces in realtime (extra memory footprint for test tracing).
    • Filter through the resulting go test json and inject the testing logs from a given test into their staged traces in memory.
    • Flush updated traces.
      This approach is non-optimal because it's a fragile interface that would be prone to failures and misalignment with the underlying toolset.
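
To make alternative 1 concrete, here is a minimal sketch of such a wrapper (the W type and its bookkeeping are hypothetical names invented for illustration):

package wraptest // hypothetical example package

import (
	"fmt"
	"sync"
	"testing"
)

// W records failure messages while forwarding them to the real *testing.T.
type W struct {
	*testing.T
	mu       sync.Mutex
	failures []string
}

func (w *W) Errorf(format string, args ...any) {
	msg := fmt.Sprintf(format, args...)
	w.mu.Lock()
	w.failures = append(w.failures, msg)
	w.mu.Unlock()
	w.T.Errorf("%s", msg)
}

// Anything that calls t.Error/t.Errorf on the underlying *testing.T directly
// (stretchr/assert, helpers written against testing.TB) bypasses this
// bookkeeping entirely, which is why the linter would be required.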

Additional Considerations

I've been writing Go for the better part of 10 years, and as such I don't often believe in copying features or design patterns from other languages into Go.
In languages like Elixir or Python, it's rather trivial to monkey-patch or write a macro to solve this problem. Go's design constraints make this much harder to solve outside of the go/testing package itself, which is why I believe this feature is helpful and necessary.

gopherbot added this to the Proposal milestone Aug 8, 2023
bcmills (Contributor) commented Aug 8, 2023

Huh. Would it be feasible to run the test with -test.v=test2json and plumb the output back to stdin via pipes to and from a cmd/test2json subprocess? 🤔

SophisticaSean (Author) commented Aug 8, 2023

Huh. Would it be feasible to run the test with -test.v=test2json and plumb the output back to stdin via pipes to and from a cmd/test2json subprocess? 🤔

That's a spicy idea. I can try that.

I think the issue will still remain that the parent process is disconnected from the test runtime, and thus from the test trace. We have duration, test suite name, test name, etc. available on t right now, so we can wrap the test in a trace without a problem, but I'm not sure how you'd expose that wrapped trace info up to the parent cmd/test2json process (similar to how gotestsum works currently).

Walking through how cmd/test2json works, we'd have to implement a custom test runner to elevate the traces up to the same level. We'd have to run tests one at a time in order to keep per test trace granularity working as well.
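
For reference, a sketch of what consuming `go test -json` events in such a wrapper tool might look like (the testEvent fields follow the documented test2json output format; collectOutput is a hypothetical helper):

package runner // hypothetical example package

import (
	"encoding/json"
	"io"
)

// testEvent mirrors the JSON events emitted by `go test -json` / cmd/test2json.
type testEvent struct {
	Action  string
	Package string
	Test    string
	Output  string
	Elapsed float64
}

// collectOutput groups output lines by package/test as events stream in.
func collectOutput(r io.Reader) (map[string][]string, error) {
	logs := map[string][]string{}
	dec := json.NewDecoder(r)
	for {
		var ev testEvent
		if err := dec.Decode(&ev); err != nil {
			if err == io.EOF {
				return logs, nil
			}
			return nil, err
		}
		if ev.Test != "" && ev.Action == "output" {
			key := ev.Package + "/" + ev.Test
			logs[key] = append(logs[key], ev.Output)
		}
	}
}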

Looking through test2json output, it doesn't seem to report Elapsed precisely enough. Most tests I'm seeing have an Elapsed of 0, whereas running a timer wrapper around each test gives very granular, microsecond-precision timing:
[screenshot]

While the 0.0 granularity is mostly human-friendly, once you have hundreds to thousands of tests, those ~1ms tests start to add up, and that time is not reflected in the test2json output I've tested.

Maybe I'm missing something though!
