Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime/trace: TestTraceStress failed #9729

Closed
reusee opened this issue Jan 29, 2015 · 22 comments
Closed

runtime/trace: TestTraceStress failed #9729

reusee opened this issue Jan 29, 2015 · 22 comments
Milestone

Comments

@reusee
Copy link

reusee commented Jan 29, 2015

-> go version
go version devel +0e80b2e Thu Jan 29 13:07:30 2015 +0000 linux/amd64

-> go test -run TraceStress -v
=== RUN TestTraceStress
--- FAIL: TestTraceStress (1.71s)
trace_test.go:207: failed to parse trace: p 1 is running before start (offset 128960, time 1475855)
FAIL
exit status 1
FAIL runtime/pprof 1.718s

complete trace file: https://drive.google.com/file/d/0B7KKArVsnMd2WldQRDB4VXlOQXc/view?usp=sharing

@adg
Copy link
Contributor

adg commented Jan 29, 2015

@dvyukov

@dvyukov
Copy link
Member

dvyukov commented Jan 30, 2015

What hardware do you use?
The first thing that I would imagine is that cputicks are broken on some kind of broken virtual hardware.

@reusee
Copy link
Author

reusee commented Jan 30, 2015

-> uname -a
Linux reuspc 3.18.2-2-ARCH #1 SMP PREEMPT Fri Jan 9 07:37:51 CET 2015 x86_64 GNU/Linux

-> cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 107
model name : AMD Athlon(tm) 64 X2 Dual Core Processor 5200+
stepping : 1
cpu MHz : 1800.000
cache size : 512 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow rep_good nopl extd_apicid pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy 3dnowprefetch lbrv vmmcall
bugs : apic_c1e fxsave_leak
bogomips : 3601.74
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc 100mhzsteps

processor : 1

@dvyukov
Copy link
Member

dvyukov commented Jan 30, 2015

I see, thank you.
This may be due to the processor, maybe it does not support synchronization of RDTSC or something.
I don't know what to do with it yet...

@adnaan
Copy link

adnaan commented Feb 3, 2015

On Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz

$go version
go version devel +3c4be23 Tue Feb 3 00:28:27 2015 +0000 linux/amd64

$go test -run TraceStress -v
=== RUN TestTraceStress
--- FAIL: TestTraceStress (0.30s)
trace_test.go:203: failed to parse trace: p 2 is running before start (offset 50963, time 2945641)
FAIL
exit status 1
FAIL runtime/pprof 0.300s

$./all.bash
...

ok runtime/debug 0.012s
--- FAIL: TestTraceStress (0.47s)
trace_test.go:203: failed to parse trace: g 6 is not runnable before start (offset 64687, time 1754450)
FAIL
FAIL runtime/pprof 6.979s
? runtime/race [no test files]

....

$uname -a
Linux adnaan 3.17.6-1-ARCH #1 SMP PREEMPT Sun Dec 7 23:43:32 UTC 2014 x86_64 GNU/Linux

$ cat /proc/cpuinfo
https://gist.github.com/adnaan/efa69f6d1de3ea12c113

@mewmew
Copy link
Contributor

mewmew commented Feb 5, 2015

It is also failing for me, although with another error message ("failed to parse trace: bogus scanning end"). The same error was reported by one of the build bots:
https://build.golang.org/log/c73afafef42941323274c760361014004af0f40f

$ go test -run TestTraceStress -v
=== RUN TestTraceStress
--- FAIL: TestTraceStress (1.14s)
    trace_test.go:203: failed to parse trace: p 1 is not running before stop (offset 64491, time 9069336)
FAIL
exit status 1
FAIL    runtime/pprof   1.147s

$ go test -run TestTraceStress -v
=== RUN TestTraceStress
--- FAIL: TestTraceStress (1.59s)
    trace_test.go:203: failed to parse trace: g 10 is not runnable before start (offset 128937, time 1722066)
FAIL
exit status 1
FAIL    runtime/pprof   1.600s

$ go test -run TestTraceStress -v
=== RUN TestTraceStress
--- FAIL: TestTraceStress (1.08s)
    trace_test.go:203: failed to parse trace: p 3 is running before start (offset 180061, time 3248992)
FAIL
exit status 1
FAIL    runtime/pprof   1.085s

$ go test -run TestTraceStress -v
=== RUN TestTraceStress
--- FAIL: TestTraceStress (1.65s)
    trace_test.go:203: failed to parse trace: g 4 is not running while GoPreempt (offset 129604, time 1657874)
FAIL
exit status 1
FAIL    runtime/pprof   1.660s
$ go version
go version devel +cb4b28e Thu Feb 5 10:23:35 2015 +0000 linux/amd64
$ uname -a
Linux x61s 3.18.5-1-ARCH #1 SMP PREEMPT Fri Jan 30 07:31:50 CET 2015 x86_64 GNU/Linux
$ cat /proc/cpuinfo 
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 15
model name  : Intel(R) Core(TM)2 Duo CPU     L7500  @ 1.60GHz
stepping    : 11
microcode   : 0xba
cpu MHz     : 1601.000
cache size  : 4096 KB
physical id : 0
siblings    : 2
core id     : 0
cpu cores   : 2
apicid      : 0
initial apicid  : 0
fpu     : yes
fpu_exception   : yes
cpuid level : 10
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm ida dtherm tpr_shadow vnmi flexpriority
bugs        :
bogomips    : 3193.78
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor   : 1
vendor_id   : GenuineIntel
cpu family  : 6
model       : 15
model name  : Intel(R) Core(TM)2 Duo CPU     L7500  @ 1.60GHz
stepping    : 11
microcode   : 0xba
cpu MHz     : 800.000
cache size  : 4096 KB
physical id : 0
siblings    : 2
core id     : 1
cpu cores   : 2
apicid      : 1
initial apicid  : 1
fpu     : yes
fpu_exception   : yes
cpuid level : 10
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm ida dtherm tpr_shadow vnmi flexpriority
bugs        :
bogomips    : 3193.78
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

@quinte17
Copy link

similar problem

ok      runtime/debug   0.017s
--- FAIL: TestTraceStress (3.78s)
    trace_test.go:203: failed to parse trace: g 48 is not runnable before start (offset 130776, time 1006292)
FAIL
FAIL    runtime/pprof   11.042s
?       runtime/race    [no test files]
$ uname -a
Linux monkey 3.16.0-30-generic #40~14.04.1-Ubuntu SMP Thu Jan 15 17:45:15 UTC 2015 i686 i686 i686 GNU/Linux
$ go version
go version devel +52dadc1 Sun Feb 15 08:52:14 2015 +0000 linux/386
$ cat /proc/cpuinfo 
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 14
model name  : Genuine Intel(R) CPU           T2500  @ 2.00GHz
stepping    : 8
microcode   : 0x39
cpu MHz     : 1000.000
cache size  : 2048 KB
physical id : 0
siblings    : 2
core id     : 0
cpu cores   : 2
apicid      : 0
initial apicid  : 0
fdiv_bug    : no
f00f_bug    : no
coma_bug    : no
fpu     : yes
fpu_exception   : yes
cpuid level : 10
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx constant_tsc arch_perfmon bts aperfmperf pni monitor vmx est tm2 xtpr pdcm dtherm
bogomips    : 3990.20
clflush size    : 64
cache_alignment : 64
address sizes   : 32 bits physical, 32 bits virtual
power management:

processor   : 1
vendor_id   : GenuineIntel
cpu family  : 6
model       : 14
model name  : Genuine Intel(R) CPU           T2500  @ 2.00GHz
stepping    : 8
microcode   : 0x39
cpu MHz     : 1000.000
cache size  : 2048 KB
physical id : 0
siblings    : 2
core id     : 1
cpu cores   : 2
apicid      : 1
initial apicid  : 1
fdiv_bug    : no
f00f_bug    : no
coma_bug    : no
fpu     : yes
fpu_exception   : yes
cpuid level : 10
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx constant_tsc arch_perfmon bts aperfmperf pni monitor vmx est tm2 xtpr pdcm dtherm
bogomips    : 3990.20
clflush size    : 64
cache_alignment : 64
address sizes   : 32 bits physical, 32 bits virtual
power management:

@mdempsky
Copy link
Member

@dvyukov In case it helps, I can reproduce this issue 100% on my temporary Z620 in TOK... but not all on my permanent one back in SFO.

SFO (does NOT repro):

$ uname -a
Linux mdempsky-z620.sfo.corp.google.com 3.13.0-44-generic #73-Ubuntu SMP Tue Dec 16 00:22:43 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

$ sed '/^$/q' /proc/cpuinfo 
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 45
model name  : Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz
stepping    : 7
microcode   : 0x710
cpu MHz     : 1200.000
cache size  : 20480 KB
physical id : 0
siblings    : 16
core id     : 0
cpu cores   : 8
apicid      : 0
initial apicid  : 0
fpu     : yes
fpu_exception   : yes
cpuid level : 13
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips    : 5786.06
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

TOK (DOES repro):

$ uname -a
Linux mdempsky.tok.corp.google.com 3.13.0-44-generic #73-Ubuntu SMP Tue Dec 16 00:22:43 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

$ sed '/^$/q' /proc/cpuinfo 
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 62
model name  : Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
stepping    : 4
microcode   : 0x428
cpu MHz     : 1200.000
cache size  : 25600 KB
physical id : 0
siblings    : 20
core id     : 0
cpu cores   : 10
apicid      : 0
initial apicid  : 0
fpu     : yes
fpu_exception   : yes
cpuid level : 13
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms
bogomips    : 5586.16
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

Let me know if granting you remote access would be helpful in investigating. I'll have it for at least two more weeks.

@dvyukov
Copy link
Member

dvyukov commented Feb 17, 2015

I will appreciate access to a machine that can reproduce this. I will contact you offline.
What's weird is that my desktop is z620, however with slightly different CPU E5-2690 @ 2.90GHz. I also turned off frequency scaling, so it always runs at 2.9 as opposed to your 1.2. But I would say that lowever frequency should make it less reproducible...

@dvyukov
Copy link
Member

dvyukov commented Feb 17, 2015

I can reproduce it on your machine.

I wrote the following program to investigate it.
There are 2 problems:

  1. The first is easier. We need to issue lfence/mfence before rdtsc. Without it rdtsc gives slightly skewed values even with synchronized clocks:
21-11: BROKEN 21949834462233 -> 21949834462016 (diff=217)
..
21-13: BROKEN 21949862105390 -> 21949862105216 (diff=174)
.........................................
23-11: BROKEN 21950170054450 -> 21950170054299 (diff=151)

But diffs are never larger than ~250.

  1. The harder one. The machine has non-synchronized clocks:
30-28: BROKEN 4421327030055133 -> 4421326924776785 (diff=105278348)
.
30-29: BROKEN 4421327045449761 -> 4421326927940720 (diff=117509041)
..
31-1: BROKEN 4421327005789964 -> 4421326997345549 (diff=8444415)
.
31-2: BROKEN 4421327024981920 -> 4421327022851416 (diff=2130504)
.
31-3: BROKEN 4421327044067804 -> 4421327020646358 (diff=23421446)
.
31-4: BROKEN 4421327062301860 -> 4421327047504514 (diff=14797346)

I hoped this problem is left in stone age. Linux must do its best to synchronize clocks today, so maybe it's bad BIOS or processor microcode.

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <pthread.h>
#include <sched.h>
#include <errno.h>

void* thr(void *arg);
uint64_t sig, ticks0, ticks1;

int main() {
    cpu_set_t mask0;
    if (sched_getaffinity(0, sizeof(mask0), &mask0)) {
        perror("sched_getaffinity");
        exit(1);
    }
    int i;
    for (i = 0; i < CPU_SETSIZE; i++) {
        if (!CPU_ISSET(i, &mask0))
            break;
        int j;
        for (j = 0; j < CPU_SETSIZE; j++) {
            if (i == j || !CPU_ISSET(j, &mask0))
                break;
            printf(".");
            fflush(0);
            sig = ticks0 = ticks1 = 0;

            pthread_t th[2];
            pthread_attr_t attr[2];
            cpu_set_t mask[2];

            CPU_ZERO(&mask[0]);
            CPU_SET(i, &mask[0]);
            pthread_attr_init(&attr[0]);
            pthread_attr_setaffinity_np(&attr[0], sizeof(attr[0]), &mask[0]);
            pthread_create(&th[0], &attr[0], thr, (void*)1);

            CPU_ZERO(&mask[1]);
            CPU_SET(j, &mask[1]);
            pthread_attr_init(&attr[1]);
            pthread_attr_setaffinity_np(&attr[1], sizeof(mask[1]), &mask[1]);
            pthread_create(&th[1], &attr[1], thr, 0);

            pthread_join(th[0], 0);
            pthread_join(th[1], 0);
            pthread_attr_destroy(&attr[0]);
            pthread_attr_destroy(&attr[1]);

            if (ticks0 != 0)
                printf("\n%d-%d: BROKEN %lu -> %lu (diff=%lu)\n", i, j, ticks0, ticks1, ticks0 - ticks1);
        }
    }
    printf("\n");
    return 0;
}

uint64_t cputicks() {
    uint64_t low, high;
    __asm__ __volatile__("lfence; rdtsc" : "=a" (low), "=d" (high));
    return (high << 32) | low;
}

void* thr(void *arg) {
    int i;
    for (i = 0; i < 10000; i++) {
        if (arg != 0) {
            uint64_t t0 = cputicks();
            __atomic_store_n(&sig, 1, __ATOMIC_RELEASE);
            while (__atomic_load_n(&sig, __ATOMIC_RELAXED) == 1) {
            }
            uint64_t t1 = __atomic_load_n(&sig, __ATOMIC_ACQUIRE);
            if (t1 <= t0) {
                ticks0 = t0;
                ticks1 = t1;
            }
        } else {
            while (__atomic_load_n(&sig, __ATOMIC_ACQUIRE) != 1) {
            }
            __atomic_store_n(&sig, cputicks(), __ATOMIC_RELEASE);
        }
    }
    return 0;
}

@dvyukov
Copy link
Member

dvyukov commented Feb 17, 2015

We can manully synchronize RDTSC on different cores using RDTSCP instruction. Namely, measure RTT between all cores and offsetting values of RDTSCP by computed deltas.
RDTSCP is available on quite old AMD processors. But Intel started supporting it only since Nehalem (2008).
I am not sure whether it is worth it.

dvyukov added a commit that referenced this issue Feb 20, 2015
See the following issue for context:
#9729 (comment)
In short, RDTSC can produce skewed results without preceding LFENCE/MFENCE.
Information on this matter is very scrappy in the internet.
But this is what linux kernel does (see rdtsc_barrier).
It also fixes the test program on my machine.

Update #9729

Change-Id: I3c1ffbf129fdfdd388bd5b7911b392b319248e68
Reviewed-on: https://go-review.googlesource.com/5033
Reviewed-by: Ian Lance Taylor <iant@golang.org>
@capnm
Copy link

capnm commented Mar 1, 2015

FYI the test still 100% fails on this AMD cpu:

git rev-parse HEAD
bc674b3d16cd9e7d25f8b23d0562c954b7a979be

--- FAIL: TestTraceStress (2.32s)
    trace_test.go:204: failed to parse trace: p 2 is running before start (offset 22, time 4773116)
--- FAIL: TestTraceStressStartStop (0.32s)
    trace_test.go:343: failed to parse trace: g 59 is not runnable before start (offset 58555, time 8890466)
FAIL
FAIL    runtime/pprof   9.485s

cat /proc/cpuinfo 
processor   : 0
vendor_id   : AuthenticAMD
cpu family  : 15
model       : 107
model name  : AMD Athlon(tm) Dual Core Processor 4850e
stepping    : 2
cpu MHz     : 1000.000
cache size  : 512 KB
physical id : 0
siblings    : 2
core id     : 0
cpu cores   : 2
apicid      : 0
initial apicid  : 0
fpu     : yes
fpu_exception   : yes
cpuid level : 1
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow rep_good nopl extd_apicid pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy 3dnowprefetch lbrv vmmcall
bogomips    : 2004.55
TLB size    : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc 100mhzsteps

processor   : 1
...

dmesg, if that means something:
[    0.000000] hpet clockevent registered
[    0.000000] tsc: Fast TSC calibration using PIT
[    0.000000] tsc: Detected 2505.691 MHz processor
[    0.000000] tsc: Marking TSC unstable due to TSCs unsynchronized

./tcc-sync-test
1-0: BROKEN 68817623561008 -> 68817620807021 (diff=2753987)
1-0: BROKEN 68821314770710 -> 68821312016488 (diff=2754222)
1-0: BROKEN 68822550699192 -> 68822547945079 (diff=2754113)
...

LKML: "Brunner, Richard": TSC and Power Management Events on AMD Processors
https://lkml.org/lkml/2005/11/4/173

"this patch syncs the TSCs every 20 seconds. That is enough to sync
up AMD CPUs where the TSC slows down slightly"
https://lkml.org/lkml/2008/9/26/35

@bradfitz
Copy link
Contributor

bradfitz commented Mar 1, 2015

Also saw on OpenBSD-amd64:

https://storage.googleapis.com/go-build-log/114bbd11/openbsd-amd64-gce56_7da78022.log

--- FAIL: TestTraceStressStartStop (0.05s)
    trace_test.go:343: failed to parse trace: EvStack has bad number of frames at offset 0x1622: 16045725885737590445
FAIL
FAIL    runtime/pprof   7.977s

@mewmew
Copy link
Contributor

mewmew commented Mar 13, 2015

FYI. Tried again today using a revision which includes the "fix cputicks on x86" patch, and it still fails.

[~/go/src]$ go version
go version devel +3eb84c8 Fri Mar 13 05:22:55 2015 +0000 linux/amd64

[~/go/src/runtime/pprof]$ go test
--- FAIL: TestTraceSymbolize (7.63s)
    trace_stack_test.go:127: failed to parse trace: p 0 is running before start (offset 215, time 701683846)
--- FAIL: TestTraceStress (23.24s)
    trace_test.go:219: failed to parse trace: p 2 is running before start (offset 64596, time 67699491)
--- FAIL: TestTraceStressStartStop (0.87s)
    trace_test.go:358: failed to parse trace: g 38 is not runnable before start (offset 242017, time 1447)
FAIL
exit status 1
FAIL    runtime/pprof   69.825s

This is on the same hardware as last time.

[~]$ uname -a
Linux x61s 3.18.6-1-ARCH #1 SMP PREEMPT Sat Feb 7 08:44:05 CET 2015 x86_64 GNU/Linux
[~]$ cat /proc/cpuinfo 
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 15
model name  : Intel(R) Core(TM)2 Duo CPU     L7500  @ 1.60GHz
stepping    : 11
microcode   : 0xba
cpu MHz     : 800.000
cache size  : 4096 KB
physical id : 0
siblings    : 2
core id     : 0
cpu cores   : 2
apicid      : 0
initial apicid  : 0
fpu     : yes
fpu_exception   : yes
cpuid level : 10
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm ida dtherm tpr_shadow vnmi flexpriority
bugs        :
bogomips    : 3193.91
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor   : 1
vendor_id   : GenuineIntel
cpu family  : 6
model       : 15
model name  : Intel(R) Core(TM)2 Duo CPU     L7500  @ 1.60GHz
stepping    : 11
microcode   : 0xba
cpu MHz     : 800.000
cache size  : 4096 KB
physical id : 0
siblings    : 2
core id     : 1
cpu cores   : 2
apicid      : 1
initial apicid  : 1
fpu     : yes
fpu_exception   : yes
cpuid level : 10
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm ida dtherm tpr_shadow vnmi flexpriority
bugs        :
bogomips    : 3193.91
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

@nqv
Copy link
Contributor

nqv commented Mar 19, 2015

Same error on my very old AMD machine:

~/go/src/runtime/pprof$ go version
go version devel +b90638e Wed Mar 18 23:52:24 2015 +0000 linux/amd64

~/go/src/runtime/pprof$ go test 
--- FAIL: TestTraceSymbolize (3.99s)
        trace_stack_test.go:127: failed to parse trace: g 47 is not running while GoSysBlock (offset 305, time 0)
--- FAIL: TestTraceStress (4.61s)
        trace_test.go:219: failed to parse trace: g 67 is not runnable before start (offset 247, time 50276463)
--- FAIL: TestTraceStressStartStop (1.08s)
        trace_test.go:358: failed to parse trace: g 86 is not runnable before start (offset 236, time 16670)
--- FAIL: TestTraceFutileWakeup (1.71s)
        trace_test.go:422: failed to parse trace: p 0 is running before start (offset 237, time 5824664)
~$ uname -a
Linux compaq 3.2.0-4-amd64 #1 SMP Debian 3.2.65-1+deb7u2 x86_64 GNU/Linux
~$ cat /proc/cpuinfo
processor   : 0
vendor_id   : AuthenticAMD
cpu family  : 17
model       : 3
model name  : AMD Athlon Dual-Core QL-60
stepping    : 1
microcode   : 0x2000032
cpu MHz     : 950.000
cache size  : 512 KB
physical id : 0
siblings    : 2
core id     : 0
cpu cores   : 2
apicid      : 0
initial apicid  : 0
fpu     : yes
fpu_exception   : yes
cpuid level : 1
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy 3dnowprefetch osvw skinit hw_pstate lbrv svm_lock nrip_save
bogomips    : 3799.80
TLB size    : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

processor   : 1
vendor_id   : AuthenticAMD
cpu family  : 17
model       : 3
model name  : AMD Athlon Dual-Core QL-60
stepping    : 1
microcode   : 0x2000032
cpu MHz     : 950.000
cache size  : 512 KB
physical id : 0
siblings    : 2
core id     : 1
cpu cores   : 2
apicid      : 1
initial apicid  : 1
fpu     : yes
fpu_exception   : yes
cpuid level : 1
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy 3dnowprefetch osvw skinit hw_pstate lbrv svm_lock nrip_save
bogomips    : 3800.19
TLB size    : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

@bradfitz
Copy link
Contributor

Dmitry, reassign as necessary.

@rsc
Copy link
Contributor

rsc commented Apr 17, 2015

I don't believe this is the same problem as #10476. TestTraceStress had real (non-RDTSC-related) synchronization problems that Austin and Rick found and fixed. The failures in TestTraceStressStartStop are different, I believe. They're always about system calls, which are still not traced quite right. To avoid confusion, I'm going to close this and reopen #10476.

@rsc rsc closed this as completed Apr 17, 2015
@hyangah
Copy link
Contributor

hyangah commented Apr 20, 2015

Is it fixed?
I still see TestTraceStress failures in a docker container running on my mac (os x yosemite+boot2docker).

Dockerfile is in https://go-review.googlesource.com/#/c/9077/

In the container, the go version is:

devel +3f4de49 Mon Apr 20 15:56:56 2015 +0000

root@0e6edb498432:/go/src/runtime/pprof# go test -run TraceStress -v
=== RUN TestTraceStress
--- FAIL: TestTraceStress (1.08s)
trace_test.go:224: failed to parse trace: p 2 is not running before stop (offset 46931, time 264675913)
=== RUN TestTraceStressStartStop
--- SKIP: TestTraceStressStartStop (0.00s)
trace_test.go:232: test is unreliable; issue #10476
FAIL
exit status 1
FAIL runtime/pprof 1.092s

@dvyukov
Copy link
Member

dvyukov commented Apr 20, 2015

No, it is not resolved.
Does it fail outside of container? If yes, then there is something wrong with OS and/or CPU. If no, then is something wrong with the container.

@hyangah
Copy link
Contributor

hyangah commented Apr 20, 2015

It doesn't fail outside of container (btw that is darwin/amd64). I don't see any failure reports from the build dashboard or my personal ubuntu machine currently. So, it's probably the docker container in my case.

@capnm
Copy link

capnm commented May 23, 2015

Does it fail outside of container? If yes, then there is something wrong with OS and/or CPU. If no, then is something wrong with the container.

Yes, it fails on some (arguably misdesigned, see e.g. https://lkml.org/lkml/2005/11/4/173 ) CPUs. Nothing changed since the original report. This is bare Ubuntu LTS 14.04:

../bin/go version
go version devel +eeb64b7 Sat May 23 05:22:57 2015 +0000 linux/amd64

--- FAIL: TestTraceSymbolize (0.02s)
    trace_stack_test.go:130: failed to parse trace: g 5 is not waiting during syscall exit (offset 463, time 0)
--- FAIL: TestTraceStress (1.78s)
    trace_test.go:224: failed to parse trace: g 5 is not waiting during syscall exit (offset 19, time 719)
--- FAIL: TestTraceFutileWakeup (0.04s)
    trace_test.go:432: failed to parse trace: g 135 is not runnable before start (offset 33487, time 17751)
FAIL
FAIL    runtime/pprof   8.399s

@alexbrainman
Copy link
Member

What is the processor? And OS version?

It is vmware. Inside VM it says Intel Xeon CPU E5-2680 0 @ 2.70GHz, Windows 7.

We need to port this program to windows:
#9729 (comment)
It will allow to verify whether it is an issue with hardware/os, or hardware/os are ok and there is an issue with > tracer itself.

It looks like my pc has 1 CPU only. Your program seems like it needs multiple CPUs. Isn't it? Can I use Go?

Alex

@mikioh mikioh changed the title runtime/pprof: TestTraceStress failed runtime/trace: TestTraceStress failed Jul 31, 2015
@golang golang locked and limited conversation to collaborators Aug 5, 2016
@rsc rsc unassigned dvyukov Jun 23, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests