runtime: efficiency of collection 1.5.3 vs 1.6 #15068

Closed
noxiouz opened this issue Apr 1, 2016 · 20 comments
noxiouz commented Apr 1, 2016

  1. What version of Go are you using (go version)?
    go 1.5.3/1.6 linux amd64
  2. What operating system and processor architecture are you using (go env)?
    Trusty
  3. What did you do?
    I have an HTTPS proxy which uses cgo to work with a storage system.
    If the app is built with 1.5.3, memory usage is much lower than when go1.6 is used.
    gctraces were collected for both cases and are provided below. In both of them runtime.GC() is called every second. Even gocheck=2 does not lead to a crash.
    1.5.3:
gc 4 @0.714s 0%: 0.16+0.50+3.8+9.9+0.17 ms clock, 0.64+0.50+0+0.007/3.4/10+0.69 ms cpu, 3->3->3 MB, 3 MB goal, 32 P
gc 5 @0.731s 0%: 0.029+0.98+1.5+8.6+0.28 ms clock, 0.11+0.98+0+0/0.018/4.4+1.1 ms cpu, 4->4->3 MB, 4 MB goal, 32 P
gc 6 @0.755s 0%: 0.47+0.29+2.5+2.9+0.30 ms clock, 1.9+0.29+0+0.031/0.036/4.7+1.2 ms cpu, 5->5->4 MB, 5 MB goal, 32 P
gc 7 @0.787s 0%: 0.50+0.41+4.5+0.18+0.35 ms clock, 2.0+0.41+0+0/0.021/4.3+1.4 ms cpu, 7->7->6 MB, 7 MB goal, 32 P
gc 8 @0.826s 0%: 0.50+0.36+1.8+9.9+0.40 ms clock, 2.0+0.36+0+0.015/0.34/8.9+1.6 ms cpu, 9->9->7 MB, 8 MB goal, 32 P
gc 9 @0.867s 0%: 0.49+0.57+0.026+10+0.32 ms clock, 1.9+0.57+0+0/5.2/3.9+1.3 ms cpu, 10->11->10 MB, 10 MB goal, 32 P
gc 10 @0.920s 0%: 0.082+0.74+2.4+1.7+0.36 ms clock, 0.33+0.74+0+0/3.1/8.7+1.4 ms cpu, 13->13->11 MB, 13 MB goal, 32 P
gc 11 @0.991s 0%: 0.57+0.34+0.83+2.4+0.63 ms clock, 2.2+0.34+0+0.007/8.8/4.9+2.5 ms cpu, 16->16->13 MB, 16 MB goal, 32 P
gc 12 @1.128s 0%: 0.072+0.39+2.8+0.12+0.44 ms clock, 0.29+0.39+0+0.028/0.008/5.2+1.7 ms cpu, 20->20->13 MB, 20 MB goal, 32 P
gc 13 @1.286s 0%: 0.090+0.32+0.024+6.0+0.67 ms clock, 0.36+0.32+0+0/6.5/3.9+2.7 ms cpu, 19->19->11 MB, 20 MB goal, 32 P
gc 14 @1.425s 0%: 0.046+0.31+0.014+1.6+0.33 ms clock, 0.18+0.31+0+0.007/1.9/4.2+1.3 ms cpu, 17->17->9 MB, 17 MB goal, 32 P
gc 15 @1.570s 0%: 0.047+0.42+0.036+2.7+0.60 ms clock, 0.19+0.42+0+0/2.5/3.1+2.4 ms cpu, 14->14->8 MB, 14 MB goal, 32 P
gc 16 @1.668s 0%: 0.046+0.48+0.012+2.3+0.43 ms clock, 0.18+0.48+0+0/3.4/6.2+1.7 ms cpu, 12->12->8 MB, 13 MB goal, 32 P
gc 17 @1.772s 0%: 0.045+0.32+5.0+0.19+0.29 ms clock, 0.18+0.32+0+0/0.068/6.3+1.1 ms cpu, 12->12->7 MB, 12 MB goal, 32 P
gc 18 @1.846s 0%: 0.046+0.34+0.094+3.1+0.42 ms clock, 0.18+0.34+0+0.022/8.1/3.9+1.6 ms cpu, 10->10->6 MB, 10 MB goal, 32 P
gc 19 @1.922s 0%: 0.046+0.32+2.1+0.14+0.34 ms clock, 0.18+0.32+0+0/0.017/6.3+1.3 ms cpu, 9->9->5 MB, 9 MB goal, 32 P
gc 20 @1.978s 0%: 0.079+0.42+0.022+2.2+0.48 ms clock, 0.31+0.42+0+0/4.3/2.2+1.9 ms cpu, 7->7->4 MB, 7 MB goal, 32 P
gc 21 @2.029s 0%: 0.076+0.29+0.019+2.4+0.34 ms clock, 0.30+0.29+0+0/2.4/4.5+1.3 ms cpu, 6->6->3 MB, 6 MB goal, 32 P
gc 22 @2.069s 0%: 0.052+0.36+3.0+2.5+0.35 ms clock, 0.20+0.36+0+0.56/5.9/8.4+1.4 ms cpu, 5->5->4 MB, 5 MB goal, 32 P
gc 23 @2.118s 0%: 0.064+0.31+0.047+2.3+0.30 ms clock, 0.25+0.31+0+0.099/2.2/4.2+1.2 ms cpu, 5->5->3 MB, 5 MB goal, 32 P
gc 24 @2.150s 0%: 0.39+2.5+7.6+0.16+13 ms clock, 1.5+2.5+0+2.5/0.027/10+52 ms cpu, 5->5->3 MB, 5 MB goal, 32 P
gc 25 @2.241s 0%: 7.4+4.0+0.009+0.25+0.43 ms clock, 29+4.0+0+6.0/0.030/0.020+1.7 ms cpu, 14->14->13 MB, 5 MB goal, 32 P
gc 26 @2.255s 0%: 0.90+1.3+1.8+0.23+0.40 ms clock, 3.6+1.3+0+1.5/0.14/6.9+1.6 ms cpu, 14->15->14 MB, 19 MB goal, 32 P
gc 27 @2.270s 0%: 0.10+3.3+0.48+0.19+1.7 ms clock, 0.42+3.3+0+0.76/0.20/7.2+6.9 ms cpu, 16->17->17 MB, 21 MB goal, 32 P
gc 28 @2.285s 0%: 8.2+0.90+0.046+8.4+0.66 ms clock, 33+0.90+0+8.6/49/0.035+2.6 ms cpu, 28->39->39 MB, 24 MB goal, 32 P
gc 29 @2.305s 0%: 0.24+8.3+0.59+0.24+0.52 ms clock, 0.98+8.3+0+15/0.010/0.58+2.1 ms cpu, 40->72->72 MB, 42 MB goal, 32 P
gc 30 @2.321s 0%: 0.67+5.0+0.45+0.27+0.76 ms clock, 2.7+5.0+0+12/0.001/2.0+3.0 ms cpu, 105->116->106 MB, 60 MB goal, 32 P
gc 31 @2.331s 0%: 0.21+1.5+11+1.0+0.95 ms clock, 0.85+1.5+0+0.057/0.002/18+3.8 ms cpu, 107->118->118 MB, 142 MB goal, 32 P
gc 32 @2.385s 0%: 0.19+2.1+0.004+3.5+0.44 ms clock, 0.78+2.1+0+0/4.0/8.9+1.7 ms cpu, 130->130->70 MB, 160 MB goal, 32 P
gc 33 @2.533s 0%: 0.073+6.0+0.016+1.2+0.52 ms clock, 0.29+6.0+0+0/1.8/11+2.1 ms cpu, 98->98->73 MB, 105 MB goal, 32 P
gc 34 @2.595s 0%: 16+12+0.95+0.51+0.75 ms clock, 66+12+0+7.7/0.006/1.8+3.0 ms cpu, 116->146->131 MB, 110 MB goal, 32 P
gc 35 @2.629s 0%: 7.9+6.2+0.87+0.33+0.63 ms clock, 31+6.2+0+3.6/0.008/12+2.5 ms cpu, 142->144->132 MB, 152 MB goal, 32 P
gc 36 @2.694s 0%: 0.24+16+4.1+0.19+0.93 ms clock, 0.99+16+0+8.2/0.036/9.7+3.7 ms cpu, 184->185->151 MB, 196 MB goal, 32 P
gc 37 @2.829s 0%: 0.066+6.2+0.16+0.13+0.58 ms clock, 0.26+6.2+0+0/0.012/18+2.3 ms cpu, 227->227->165 MB, 226 MB goal, 32 P
gc 38 @2.979s 0%: 0.12+4.8+1.5+6.9+5.6 ms clock, 0.50+4.8+0+0.20/21/16+22 ms cpu, 245->246->125 MB, 247 MB goal, 32 P
gc 39 @3.055s 0%: 15+2.9+5.7+3.4+0.85 ms clock, 61+2.9+0+17/12/0.020+3.4 ms cpu, 190->190->144 MB, 187 MB goal, 32 P
gc 40 @3.150s 0%: 8.7+0+0+0+7.5 ms clock, 34+0+0+17/12/0.020+30 ms cpu, 176->176->141 MB, 176 MB goal, 32 P (forced)
gc 41 @3.225s 0%: 6.5+17+0.080+0.24+0.64 ms clock, 26+17+0+19/0.042/27+2.5 ms cpu, 206->217->170 MB, 211 MB goal, 32 P
gc 42 @3.402s 0%: 4.0+8.7+5.8+0.38+0.68 ms clock, 16+8.7+0+7.7/0.17/20+2.7 ms cpu, 229->240->169 MB, 239 MB goal, 32 P
gc 43 @3.587s 0%: 6.8+6.8+1.4+0.42+0.59 ms clock, 27+6.8+0+10/0.012/20+2.3 ms cpu, 233->234->161 MB, 237 MB goal, 32 P
gc 44 @3.816s 0%: 5.2+29+0.29+0.18+0.66 ms clock, 20+29+0+26/0.005/11+2.6 ms cpu, 243->274->101 MB, 241 MB goal, 32 P
gc 45 @3.860s 0%: 0.73+9.1+0.51+0.29+0.78 ms clock, 2.9+9.1+0+9.3/0.016/10+3.1 ms cpu, 102->103->90 MB, 104 MB goal, 32 P
gc 46 @3.973s 0%: 6.1+3.3+11+0.30+0.84 ms clock, 24+3.3+0+20/0.67/8.5+3.3 ms cpu, 134->135->89 MB, 133 MB goal, 32 P
gc 47 @4.167s 0%: 0.054+0+0+0+9.3 ms clock, 0.21+0+0+20/0.67/8.5+37 ms cpu, 119->119->58 MB, 119 MB goal, 32 P (forced)
gc 48 @4.240s 0%: 5.6+4.0+0.68+0.36+0.42 ms clock, 22+4.0+0+6.7/1.2/6.0+1.6 ms cpu, 90->90->87 MB, 86 MB goal, 32 P
gc 49 @4.380s 0%: 4.8+5.2+0.54+0.35+0.60 ms clock, 19+5.2+0+4.5/0.006/7.2+2.4 ms cpu, 124->125->58 MB, 130 MB goal, 32 P
gc 50 @4.482s 0%: 0.060+5.8+0.017+0.33+0.50 ms clock, 0.24+5.8+0+0/0.11/13+2.0 ms cpu, 86->86->29 MB, 87 MB goal, 32 P
gc 51 @4.606s 0%: 4.4+5.6+0.096+0.15+0.59 ms clock, 17+5.6+0+3.4/0.004/12+2.3 ms cpu, 52->53->48 MB, 43 MB goal, 32 P
gc 52 @4.688s 0%: 0.071+6.6+0.23+0.20+0.55 ms clock, 0.28+6.6+0+0/0.010/25+2.2 ms cpu, 61->62->49 MB, 71 MB goal, 32 P
gc 53 @4.715s 0%: 5.2+3.8+1.1+0.26+0.51 ms clock, 21+3.8+0+3.7/0.023/8.7+2.0 ms cpu, 69->70->68 MB, 73 MB goal, 32 P
gc 54 @4.750s 0%: 6.5+7.5+0.11+0.22+0.34 ms clock, 26+7.5+0+11/0.014/0.30+1.3 ms cpu, 100->100->79 MB, 101 MB goal, 32 P
gc 55 @4.892s 0%: 6.1+10+0.10+0.53+1.6 ms clock, 24+10+0+7.5/0.026/8.1+6.5 ms cpu, 124->134->78 MB, 118 MB goal, 32 P
gc 56 @4.939s 0%: 0.068+2.7+2.0+0.24+0.75 ms clock, 0.27+2.7+0+0.011/0.025/27+3.0 ms cpu, 90->90->77 MB, 102 MB goal, 32 P
gc 57 @5.033s 0%: 6.2+8.4+0.74+0.39+0.60 ms clock, 25+8.4+0+7.2/0.37/8.7+2.4 ms cpu, 123->133->97 MB, 116 MB goal, 32 P
gc 58 @5.162s 0%: 5.6+10+1.3+0.26+0.62 ms clock, 22+10+0+4.1/0.025/5.8+2.4 ms cpu, 124->126->79 MB, 131 MB goal, 32 P
gc 59 @5.182s 0%: 0.092+0+0+0+6.8 ms clock, 0.37+0+0+4.1/0.025/5.8+27 ms cpu, 79->79->79 MB, 79 MB goal, 32 P (forced)
gc 60 @5.281s 0%: 0.13+2.9+1.2+10+0.81 ms clock, 0.52+2.9+0+2.9/30/11+3.2 ms cpu, 123->144->99 MB, 118 MB goal, 32 P
gc 61 @5.330s 0%: 0.11+3.2+1.6+0.44+0.38 ms clock, 0.44+3.2+0+0.015/0.93/13+1.5 ms cpu, 102->102->80 MB, 118 MB goal, 32 P
gc 62 @5.438s 0%: 4.6+4.4+9.6+0.41+0.53 ms clock, 18+4.4+0+7.9/0.018/10+2.1 ms cpu, 126->127->100 MB, 121 MB goal, 32 P
gc 63 @5.640s 0%: 0.29+9.9+0.63+0.15+0.48 ms clock, 1.1+9.9+0+0.12/0.049/21+1.9 ms cpu, 143->145->85 MB, 150 MB goal, 32 P
gc 64 @5.716s 0%: 6.1+9.1+4.4+0.27+0.45 ms clock, 24+9.1+0+6.6/0.025/8.7+1.8 ms cpu, 130->140->113 MB, 125 MB goal, 32 P
gc 65 @5.869s 0%: 0.13+5.8+0.030+0.47+0.45 ms clock, 0.53+5.8+0+0/0.67/8.2+1.8 ms cpu, 148->148->51 MB, 154 MB goal, 32 P
gc 66 @5.935s 0%: 1.4+12+4.6+0.43+0.69 ms clock, 5.7+12+0+8.4/0.056/15+2.7 ms cpu, 84->95->70 MB, 76 MB goal, 32 P
gc 67 @5.959s 0%: 6.3+10+0.039+0.77+0.42 ms clock, 25+10+0+12/1.2/1.8+1.6 ms cpu, 80->81->58 MB, 89 MB goal, 32 P
gc 68 @6.051s 0%: 5.4+5.0+3.3+0.35+0.82 ms clock, 21+5.0+0+9.0/0.022/3.0+3.2 ms cpu, 91->92->58 MB, 86 MB goal, 32 P
gc 69 @6.148s 0%: 2.3+11+2.3+0.16+0.42 ms clock, 9.5+11+0+1.5/0.023/35+1.6 ms cpu, 81->91->66 MB, 85 MB goal, 32 P
gc 70 @6.180s 0%: 5.5+3.1+8.6+0.74+2.5 ms clock, 22+3.1+0+13/0.070/3.6+10 ms cpu, 87->98->98 MB, 85 MB goal, 32 P
gc 71 @6.204s 0%: 1.1+0+0+0+12 ms clock, 4.5+0+0+13/0.070/3.6+51 ms cpu, 98->98->87 MB, 98 MB goal, 32 P (forced)
gc 72 @6.263s 1%: 5.0+16+0.10+0.91+0.56 ms clock, 20+16+0+28/1.2/2.3+2.2 ms cpu, 122->133->90 MB, 131 MB goal, 32 P
gc 73 @6.297s 1%: 8.9+27+0.12+0.23+0.78 ms clock, 35+27+0+17/0.021/8.8+3.1 ms cpu, 110->143->141 MB, 118 MB goal, 32 P
gc 74 @6.344s 1%: 3.3+6.5+4.8+0.61+0.82 ms clock, 13+6.5+0+6.6/0.27/19+3.2 ms cpu, 151->152->128 MB, 162 MB goal, 32 P
gc 75 @6.502s 1%: 4.0+15+7.9+6.4+0.95 ms clock, 16+15+0+9.7/0.60/0.72+3.8 ms cpu, 193->234->149 MB, 191 MB goal, 32 P
gc 76 @6.541s 1%: 5.6+28+5.6+1.6+0.86 ms clock, 22+28+0+26/0.001/6.7+3.4 ms cpu, 160->222->222 MB, 163 MB goal, 32 P
gc 77 @6.585s 1%: 4.8+33+0.84+0.22+0.86 ms clock, 19+33+0+19/0.003/1.7+3.4 ms cpu, 232->256->215 MB, 239 MB goal, 32 P
gc 78 @6.718s 1%: 2.9+15+0.12+0.47+0.59 ms clock, 11+15+0+17/0.006/18+2.3 ms cpu, 292->313->212 MB, 287 MB goal, 32 P
gc 79 @6.957s 1%: 4.2+7.9+1.1+0.21+0.52 ms clock, 16+7.9+0+14/0.034/1.0+2.0 ms cpu, 290->291->98 MB, 287 MB goal, 32 P
gc 80 @7.178s 1%: 2.1+7.1+6.9+0.49+0.67 ms clock, 8.7+7.1+0+5.1/0.20/21+2.6 ms cpu, 147->157->89 MB, 147 MB goal, 32 P
gc 81 @7.223s 1%: 0.15+0+0+0+8.4 ms clock, 0.63+0+0+5.1/0.20/21+33 ms cpu, 90->90->78 MB, 90 MB goal, 32 P (forced)
gc 82 @7.324s 1%: 4.7+6.4+0.091+0.23+0.72 ms clock, 19+6.4+0+5.1/0.19/12+2.8 ms cpu, 115->115->88 MB, 117 MB goal, 32 P
gc 83 @7.485s 1%: 2.0+4.4+0.39+0.22+0.47 ms clock, 8.0+4.4+0+4.2/0.011/5.5+1.8 ms cpu, 133->133->88 MB, 131 MB goal, 32 P
gc 84 @7.576s 1%: 2.7+10+0.16+0.32+0.59 ms clock, 11+10+0+6.3/0.009/15+2.3 ms cpu, 132->143->108 MB, 131 MB goal, 32 P
gc 85 @7.654s 1%: 2.9+7.0+0.045+0.20+0.45 ms clock, 11+7.0+0+6.0/0.058/9.8+1.8 ms cpu, 139->140->97 MB, 146 MB goal, 32 P
gc 86 @7.744s 1%: 0.16+5.7+5.9+0.21+0.60 ms clock, 0.64+5.7+0+0.42/0.001/19+2.4 ms cpu, 141->152->99 MB, 145 MB goal, 32 P
gc 87 @7.797s 1%: 4.0+6.4+0.008+0.25+0.40 ms clock, 16+6.4+0+1.0/0.071/15+1.6 ms cpu, 129->130->108 MB, 132 MB goal, 32 P
gc 88 @7.932s 1%: 3.2+6.3+0.10+0.17+0.48 ms clock, 12+6.3+0+1.1/0.028/16+1.9 ms cpu, 164->164->108 MB, 161 MB goal, 32 P
gc 89 @8.236s 1%: 0.070+0+0+0+9.1 ms clock, 0.28+0+0+1.1/0.028/16+36 ms cpu, 147->147->47 MB, 147 MB goal, 32 P (forced)
gc 90 @8.270s 1%: 3.5+14+0.13+0.38+1.0 ms clock, 14+14+0+22/0.005/1.5+4.2 ms cpu, 70->72->70 MB, 70 MB goal, 32 P
gc 91 @8.499s 1%: 2.7+17+0.73+0.28+0.58 ms clock, 10+17+0+21/0.008/8.0+2.3 ms cpu, 107->128->87 MB, 102 MB goal, 32 P
gc 92 @8.523s 1%: 0.088+10+0.17+0.17+0.37 ms clock, 0.35+10+0+0.30/0.042/12+1.5 ms cpu, 88->89->89 MB, 100 MB goal, 32 P
gc 93 @8.644s 1%: 5.6+8.6+0.27+8.9+0.71 ms clock, 22+8.6+0+14/0.013/1.2+2.8 ms cpu, 135->145->89 MB, 131 MB goal, 32 P
gc 94 @8.899s 1%: 2.0+6.8+0.26+0.16+0.51 ms clock, 8.1+6.8+0+4.1/0.016/7.0+2.0 ms cpu, 116->117->47 MB, 118 MB goal, 32 P
gc 95 @8.999s 1%: 1.5+6.7+1.0+0.61+0.50 ms clock, 6.3+6.7+0+6.8/0.020/15+2.0 ms cpu, 73->74->49 MB, 69 MB goal, 32 P
gc 96 @9.097s 1%: 5.7+8.9+0.89+0.20+0.68 ms clock, 23+8.9+0+6.3/0.005/14+2.7 ms cpu, 73->74->48 MB, 73 MB goal, 32 P
gc 97 @9.133s 1%: 4.9+8.7+8.1+0.33+0.75 ms clock, 19+8.7+0+14/0.52/1.4+3.0 ms cpu, 69->79->79 MB, 71 MB goal, 32 P
gc 98 @9.200s 1%: 1.3+12+7.1+0.34+0.55 ms clock, 5.4+12+0+8.6/0.031/2.6+2.2 ms cpu, 100->112->59 MB, 102 MB goal, 32 P
gc 99 @9.246s 1%: 0.29+0+0+0+8.4 ms clock, 1.1+0+0+8.6/0.031/2.6+33 ms cpu, 60->60->48 MB, 60 MB goal, 32 P (forced)
gc 100 @9.400s 1%: 4.4+9.0+0.029+0.14+1.1 ms clock, 17+9.0+0+5.7/0.011/9.6+4.7 ms cpu, 75->85->58 MB, 72 MB goal, 32 P
gc 101 @9.419s 1%: 4.1+18+0.057+0.31+1.1 ms clock, 16+18+0+11/0.033/13+4.4 ms cpu, 68->89->79 MB, 72 MB goal, 32 P
gc 102 @9.446s 1%: 7.5+9.3+1.7+0.18+0.59 ms clock, 30+9.3+0+12/0.026/0.95+2.3 ms cpu, 89->89->59 MB, 87 MB goal, 32 P
gc 103 @9.568s 1%: 0.33+4.2+3.9+8.0+0.72 ms clock, 1.3+4.2+0+4.4/24/14+2.8 ms cpu, 84->86->34 MB, 88 MB goal, 32 P
gc 104 @9.660s 1%: 3.4+6.2+1.1+0.31+0.65 ms clock, 13+6.2+0+6.6/0.21/13+2.6 ms cpu, 51->52->42 MB, 47 MB goal, 32 P
gc 105 @9.706s 1%: 4.1+3.5+1.4+0.89+0.53 ms clock, 16+3.5+0+0.32/1.4/11+2.1 ms cpu, 63->63->49 MB, 63 MB goal, 32 P
gc 106 @9.775s 1%: 2.7+19+0.21+0.20+0.58 ms clock, 11+19+0+7.8/0.014/23+2.3 ms cpu, 70->91->59 MB, 73 MB goal, 32 P
gc 107 @9.800s 1%: 0.37+6.9+0.059+0.20+1.1 ms clock, 1.4+6.9+0+4.7/0.19/5.0+4.6 ms cpu, 60->60->49 MB, 58 MB goal, 32 P
gc 108 @9.840s 1%: 7.0+15+0.21+0.24+0.54 ms clock, 28+15+0+13/0.43/6.0+2.1 ms cpu, 82->93->72 MB, 74 MB goal, 32 P
gc 109 @9.870s 1%: 3.1+17+8.1+0.36+0.86 ms clock, 12+17+0+9.0/0.048/0.96+3.4 ms cpu, 83->104->92 MB, 92 MB goal, 32 P
gc 110 @9.920s 1%: 3.0+8.1+3.1+0.27+0.50 ms clock, 12+8.1+0+5.6/0.050/7.6+2.0 ms cpu, 102->102->78 MB, 105 MB goal, 32 P
gc 111 @10.033s 1%: 4.2+16+1.9+6.0+0.74 ms clock, 16+16+0+8.4/0.043/7.3+2.9 ms cpu, 123->145->48 MB, 118 MB goal, 32 P
gc 112 @10.065s 1%: 5.0+34+0.70+4.9+3.1 ms clock, 20+34+0+23/4.8/0.94+12 ms cpu, 58->60->60 MB, 41 MB goal, 32 P
gc 113 @10.134s 1%: 10+6.3+1.5+1.3+0.59 ms clock, 43+6.3+0+16/1.3/2.0+2.3 ms cpu, 93->94->73 MB, 88 MB goal, 32 P

1.6:

gc 4 @0.846s 0%: 1.0+7.4+0.32 ms clock, 4.0+0/0.83/2.3+1.2 ms cpu, 2->2->2 MB, 3 MB goal, 32 P
gc 5 @0.859s 0%: 0.14+7.9+0.82 ms clock, 0.56+0/1.6/3.6+3.3 ms cpu, 2->2->2 MB, 3 MB goal, 32 P
gc 6 @0.872s 0%: 0.073+2.6+0.42 ms clock, 0.29+0.47/2.2/2.0+1.7 ms cpu, 3->3->2 MB, 4 MB goal, 32 P
gc 7 @0.885s 0%: 0.022+11+0.39 ms clock, 0.089+0/12/3.9+1.5 ms cpu, 4->5->4 MB, 5 MB goal, 32 P
gc 8 @0.903s 0%: 0.013+2.2+0.30 ms clock, 0.053+0.068/4.0/0.16+1.2 ms cpu, 4->4->4 MB, 5 MB goal, 32 P
gc 9 @0.928s 0%: 0.017+2.7+0.30 ms clock, 0.070+0.007/2.4/2.6+1.2 ms cpu, 6->6->5 MB, 7 MB goal, 32 P
gc 10 @0.960s 0%: 0.018+5.4+0.34 ms clock, 0.073+0.13/1.4/5.5+1.3 ms cpu, 7->8->6 MB, 8 MB goal, 32 P
gc 11 @1.003s 0%: 0.029+3.1+0.30 ms clock, 0.11+0/5.8/1.0+1.2 ms cpu, 10->10->8 MB, 11 MB goal, 32 P
gc 12 @1.060s 0%: 0.041+4.9+0.36 ms clock, 0.16+0.71/6.0/0.73+1.4 ms cpu, 12->12->10 MB, 13 MB goal, 32 P
gc 13 @1.122s 0%: 3.4+3.1+0.36 ms clock, 13+2.6/2.9/0.078+1.4 ms cpu, 15->15->12 MB, 16 MB goal, 32 P
gc 14 @1.297s 0%: 0.028+2.3+0.33 ms clock, 0.11+0/5.1/0.98+1.3 ms cpu, 19->19->12 MB, 20 MB goal, 32 P
gc 15 @1.497s 0%: 0.027+1.8+0.14 ms clock, 0.11+0/6.0/2.3+0.59 ms cpu, 18->18->10 MB, 19 MB goal, 32 P
gc 16 @1.656s 0%: 0.028+2.1+0.30 ms clock, 0.11+1.4/3.6/2.3+1.2 ms cpu, 15->15->9 MB, 16 MB goal, 32 P
gc 17 @1.800s 0%: 0.029+2.7+0.36 ms clock, 0.11+2.0/2.5/1.7+1.4 ms cpu, 13->13->6 MB, 14 MB goal, 32 P
gc 18 @1.895s 0%: 0.025+3.2+0.16 ms clock, 0.10+0.19/2.6/3.1+0.64 ms cpu, 10->10->6 MB, 11 MB goal, 32 P
gc 19 @2.001s 0%: 0.027+7.6+0.32 ms clock, 0.10+0.21/2.5/7.4+1.2 ms cpu, 9->10->5 MB, 10 MB goal, 32 P
gc 20 @2.086s 0%: 0.025+2.3+1.0 ms clock, 0.10+0/4.2/3.1+4.2 ms cpu, 7->7->3 MB, 8 MB goal, 32 P
gc 21 @2.135s 0%: 0.024+2.0+0.16 ms clock, 0.097+0/6.4/0.78+0.67 ms cpu, 5->5->4 MB, 6 MB goal, 32 P
gc 22 @2.199s 0%: 0.023+2.7+0.30 ms clock, 0.094+0.26/9.5/1.9+1.2 ms cpu, 6->6->3 MB, 7 MB goal, 32 P
gc 23 @2.497s 0%: 0.019+1.9+0.34 ms clock, 0.079+0/6.8/2.0+1.3 ms cpu, 5->5->3 MB, 6 MB goal, 32 P
gc 24 @2.531s 0%: 0.38+5.0+0.33 ms clock, 1.5+0.61/7.1/0+1.3 ms cpu, 5->6->3 MB, 6 MB goal, 32 P
gc 25 @2.619s 0%: 0.016+1.9+0.50 ms clock, 0.066+1.2/4.0/0.97+2.0 ms cpu, 4->5->3 MB, 5 MB goal, 32 P
gc 26 @2.624s 0%: 0.28+2.8+0.56 ms clock, 1.1+3.3/4.3/0.37+2.2 ms cpu, 13->13->13 MB, 14 MB goal, 32 P
gc 27 @2.629s 0%: 0.14+3.0+0.62 ms clock, 0.59+0.032/7.0/0+2.4 ms cpu, 13->23->23 MB, 19 MB goal, 32 P
gc 28 @2.633s 0%: 0.17+3.7+0.68 ms clock, 0.70+0/6.2/0.084+2.7 ms cpu, 24->24->24 MB, 25 MB goal, 32 P
gc 29 @2.653s 0%: 0.10+6.1+0.71 ms clock, 0.40+1.1/9.2/0.35+2.8 ms cpu, 35->36->35 MB, 36 MB goal, 32 P
gc 30 @2.662s 0%: 0.24+8.7+0.41 ms clock, 0.96+1.0/28/0.085+1.6 ms cpu, 45->56->56 MB, 53 MB goal, 32 P
gc 31 @2.673s 0%: 0.12+6.9+0.49 ms clock, 0.49+0.062/8.0/0.10+1.9 ms cpu, 57->69->68 MB, 68 MB goal, 32 P
gc 32 @2.683s 0%: 0.45+12+0.62 ms clock, 1.8+0.081/23/0+2.5 ms cpu, 79->90->89 MB, 85 MB goal, 32 P
gc 33 @2.701s 0%: 0.42+4.2+0.53 ms clock, 1.7+2.3/17/0.36+2.1 ms cpu, 111->121->120 MB, 118 MB goal, 32 P
gc 34 @2.756s 0%: 0.36+25+6.5 ms clock, 0.72+0/31/0+13 ms cpu, 168->170->168 MB, 169 MB goal, 32 P
gc 35 @2.877s 0%: 16+18+1.0 ms clock, 66+0.067/25/0.087+4.0 ms cpu, 295->296->293 MB, 296 MB goal, 32 P
gc 36 @2.997s 0%: 0.27+10+1.1 ms clock, 1.1+0.26/22/0.12+4.4 ms cpu, 395->417->391 MB, 440 MB goal, 32 P
gc 37 @3.080s 0%: 0.31+9.3+0.91 ms clock, 1.2+0/23/0.096+3.6 ms cpu, 526->547->520 MB, 559 MB goal, 32 P
gc 38 @3.322s 0%: 0.12+9.2+0.92 ms clock, 0.50+0/20/0.11+3.7 ms cpu, 722->732->678 MB, 751 MB goal, 32 P
gc 39 @3.531s 0%: 1.2+0+11 ms clock, 5.1+0/20/0.11+46 ms cpu, 786->786->747 MB, 786 MB goal, 32 P (forced)
gc 40 @3.972s 0%: 0.13+10+1.6 ms clock, 0.53+0/46/0.16+6.4 ms cpu, 1102->1113->1034 MB, 1120 MB goal, 32 P
gc 41 @4.544s 0%: 0.17+0+15 ms clock, 0.70+0/46/0.16+60 ms cpu, 1371->1371->1180 MB, 1371 MB goal, 32 P (forced)
gc 42 @5.366s 0%: 0.53+27+1.6 ms clock, 2.1+0.005/52/0.25+6.5 ms cpu, 1750->1760->1555 MB, 1771 MB goal, 32 P
gc 43 @5.570s 0%: 8.4+0+15 ms clock, 33+0.005/52/0.25+61 ms cpu, 1711->1711->1647 MB, 1711 MB goal, 32 P (forced)
gc 44 @6.595s 0%: 0.10+0+21 ms clock, 0.40+0.005/52/0.25+87 ms cpu, 2346->2346->1998 MB, 2346 MB goal, 32 P (forced)
gc 45 @7.617s 0%: 0.12+0+18 ms clock, 0.49+0.005/52/0.25+72 ms cpu, 2587->2587->2313 MB, 2587 MB goal, 32 P (forced)
gc 46 @8.647s 0%: 1.1+0+23 ms clock, 4.6+0.005/52/0.25+95 ms cpu, 2747->2747->2449 MB, 2747 MB goal, 32 P (forced)
gc 47 @9.672s 0%: 0.39+0+22 ms clock, 1.5+0.005/52/0.25+89 ms cpu, 2786->2786->2507 MB, 2786 MB goal, 32 P (forced)
gc 48 @10.695s 0%: 3.1+0+28 ms clock, 12+0.005/52/0.25+113 ms cpu, 2939->2939->2670 MB, 2939 MB goal, 32 P (forced)
gc 49 @11.741s 0%: 0.19+0+31 ms clock, 0.76+0.005/52/0.25+127 ms cpu, 3143->3143->2823 MB, 3143 MB goal, 32 P (forced)
gc 50 @12.773s 0%: 0.28+0+29 ms clock, 1.1+0.005/52/0.25+117 ms cpu, 3153->3153->2886 MB, 3153 MB goal, 32 P (forced)
gc 51 @13.804s 0%: 0.20+0+31 ms clock, 0.81+0.005/52/0.25+125 ms cpu, 3365->3365->3054 MB, 3365 MB goal, 32 P (forced)
gc 52 @14.836s 0%: 0.34+0+29 ms clock, 1.3+0.005/52/0.25+119 ms cpu, 3637->3637->3391 MB, 3637 MB goal, 32 P (forced)
gc 53 @15.866s 0%: 0.21+0+26 ms clock, 0.84+0.005/52/0.25+107 ms cpu, 3725->3725->3438 MB, 3725 MB goal, 32 P (forced)
gc 54 @16.910s 0%: 1.6+0+34 ms clock, 6.6+0.005/52/0.25+137 ms cpu, 3996->3996->3641 MB, 3996 MB goal, 32 P (forced)
gc 55 @17.968s 0%: 0.15+0+31 ms clock, 0.60+0.005/52/0.25+124 ms cpu, 4020->4020->3700 MB, 4020 MB goal, 32 P (forced)
gc 56 @19.009s 0%: 0.31+0+32 ms clock, 1.2+0.005/52/0.25+128 ms cpu, 3927->3927->3668 MB, 3927 MB goal, 32 P (forced)
gc 57 @20.042s 0%: 0.28+0+31 ms clock, 1.1+0.005/52/0.25+127 ms cpu, 3864->3864->3541 MB, 3864 MB goal, 32 P (forced)
gc 58 @21.075s 0%: 0.18+0+35 ms clock, 0.75+0.005/52/0.25+140 ms cpu, 3790->3790->3446 MB, 3790 MB goal, 32 P (forced)
gc 59 @22.110s 0%: 0.28+0+30 ms clock, 1.1+0.005/52/0.25+123 ms cpu, 4117->4117->3853 MB, 4117 MB goal, 32 P (forced)
gc 60 @23.142s 0%: 0.13+0+31 ms clock, 0.52+0.005/52/0.25+125 ms cpu, 4381->4381->4005 MB, 4381 MB goal, 32 P (forced)
gc 61 @24.190s 0%: 0.23+0+29 ms clock, 0.92+0.005/52/0.25+116 ms cpu, 4438->4438->4163 MB, 4438 MB goal, 32 P (forced)
gc 62 @25.235s 0%: 0.12+0+29 ms clock, 0.48+0.005/52/0.25+119 ms cpu, 4596->4596->4258 MB, 4596 MB goal, 32 P (forced)
gc 63 @26.294s 0%: 0.39+0+30 ms clock, 1.5+0.005/52/0.25+123 ms cpu, 4708->4708->4405 MB, 4708 MB goal, 32 P (forced)
gc 64 @27.326s 0%: 0.72+0+49 ms clock, 2.9+0.005/52/0.25+198 ms cpu, 4731->4731->4417 MB, 4731 MB goal, 32 P (forced)
gc 65 @28.377s 0%: 6.9+0+45 ms clock, 27+0.005/52/0.25+180 ms cpu, 4817->4817->4499 MB, 4817 MB goal, 32 P (forced)
gc 66 @29.430s 0%: 0.85+0+40 ms clock, 3.4+0.005/52/0.25+163 ms cpu, 4954->4954->4680 MB, 4954 MB goal, 32 P (forced)
gc 67 @30.472s 0%: 0.24+0+37 ms clock, 0.96+0.005/52/0.25+150 ms cpu, 5049->5049->4756 MB, 5049 MB goal, 32 P (forced)
gc 68 @31.510s 0%: 0.22+0+35 ms clock, 0.91+0.005/52/0.25+140 ms cpu, 5099->5099->4777 MB, 5099 MB goal, 32 P (forced)
gc 69 @32.609s 0%: 0.39+0+39 ms clock, 1.5+0.005/52/0.25+157 ms cpu, 5235->5235->4965 MB, 5235 MB goal, 32 P (forced)
gc 70 @33.649s 0%: 0.32+0+37 ms clock, 1.3+0.005/52/0.25+149 ms cpu, 5483->5483->5203 MB, 5483 MB goal, 32 P (forced)
gc 71 @34.692s 0%: 0.31+0+42 ms clock, 1.2+0.005/52/0.25+169 ms cpu, 5467->5467->5148 MB, 5467 MB goal, 32 P (forced)
gc 72 @35.736s 0%: 0.30+0+40 ms clock, 1.2+0.005/52/0.25+162 ms cpu, 5531->5531->5213 MB, 5531 MB goal, 32 P (forced)
gc 73 @36.777s 0%: 0.18+0+40 ms clock, 0.74+0.005/52/0.25+162 ms cpu, 5567->5567->5305 MB, 5567 MB goal, 32 P (forced)
gc 74 @37.819s 0%: 0.31+0+41 ms clock, 1.2+0.005/52/0.25+165 ms cpu, 5814->5814->5501 MB, 5814 MB goal, 32 P (forced)
gc 75 @38.889s 0%: 0.28+0+46 ms clock, 1.1+0.005/52/0.25+187 ms cpu, 6072->6072->5749 MB, 6072 MB goal, 32 P (forced)
gc 76 @39.954s 0%: 0.25+0+41 ms clock, 1.0+0.005/52/0.25+164 ms cpu, 6093->6093->5766 MB, 6093 MB goal, 32 P (forced)
gc 77 @40.996s 0%: 0.15+0+37 ms clock, 0.62+0.005/52/0.25+151 ms cpu, 6053->6053->5779 MB, 6053 MB goal, 32 P (forced)
gc 78 @42.035s 0%: 0.28+0+38 ms clock, 1.1+0.005/52/0.25+153 ms cpu, 6163->6163->5836 MB, 6163 MB goal, 32 P (forced)
gc 79 @43.074s 0%: 0.27+0+43 ms clock, 1.1+0.005/52/0.25+172 ms cpu, 6378->6378->6073 MB, 6378 MB goal, 32 P (forced)
gc 80 @44.118s 0%: 0.24+0+40 ms clock, 0.98+0.005/52/0.25+160 ms cpu, 6312->6312->6031 MB, 6312 MB goal, 32 P (forced)
gc 81 @45.158s 0%: 0.30+0+43 ms clock, 1.2+0.005/52/0.25+174 ms cpu, 6252->6252->5909 MB, 6252 MB goal, 32 P (forced)
gc 82 @46.203s 0%: 0.32+0+42 ms clock, 1.3+0.005/52/0.25+169 ms cpu, 6308->6308->6037 MB, 6308 MB goal, 32 P (forced)
gc 83 @47.280s 0%: 0.30+0+44 ms clock, 1.2+0.005/52/0.25+178 ms cpu, 6399->6399->6060 MB, 6399 MB goal, 32 P (forced)
gc 84 @48.325s 0%: 0.26+0+45 ms clock, 1.0+0.005/52/0.25+180 ms cpu, 6530->6530->6262 MB, 6530 MB goal, 32 P (forced)
gc 85 @49.371s 0%: 0.21+0+46 ms clock, 0.87+0.005/52/0.25+186 ms cpu, 6536->6536->6215 MB, 6536 MB goal, 32 P (forced)
gc 86 @50.419s 0%: 0.25+0+42 ms clock, 1.0+0.005/52/0.25+171 ms cpu, 6583->6583->6318 MB, 6583 MB goal, 32 P (forced)
gc 87 @51.476s 0%: 0.18+0+46 ms clock, 0.73+0.005/52/0.25+186 ms cpu, 6646->6646->6353 MB, 6646 MB goal, 32 P (forced)
gc 88 @52.524s 0%: 0.25+0+46 ms clock, 1.0+0.005/52/0.25+184 ms cpu, 6717->6717->6401 MB, 6717 MB goal, 32 P (forced)
gc 89 @53.570s 0%: 0.38+0+41 ms clock, 1.5+0.005/52/0.25+164 ms cpu, 6748->6748->6465 MB, 6748 MB goal, 32 P (forced)
gc 90 @54.621s 0%: 1.0+0+46 ms clock, 4.2+0.005/52/0.25+187 ms cpu, 6697->6697->6365 MB, 6697 MB goal, 32 P (forced)
gc 91 @55.688s 0%: 0.21+0+52 ms clock, 0.86+0.005/52/0.25+208 ms cpu, 6744->6744->6438 MB, 6744 MB goal, 32 P (forced)
gc 92 @56.780s 0%: 0.38+0+56 ms clock, 1.5+0.005/52/0.25+225 ms cpu, 7125->7125->6846 MB, 7125 MB goal, 32 P (forced)
gc 93 @57.837s 0%: 8.1+0+43 ms clock, 32+0.005/52/0.25+172 ms cpu, 7122->7122->6812 MB, 7122 MB goal, 32 P (forced)
gc 94 @58.916s 0%: 0.25+0+42 ms clock, 1.0+0.005/52/0.25+170 ms cpu, 7228->7228->6881 MB, 7228 MB goal, 32 P (forced)
gc 95 @59.979s 0%: 0.18+0+43 ms clock, 0.72+0.005/52/0.25+174 ms cpu, 7308->7308->6979 MB, 7308 MB goal, 32 P (forced)
gc 96 @61.023s 0%: 0.23+0+58 ms clock, 0.95+0.005/52/0.25+235 ms cpu, 7315->7315->7042 MB, 7315 MB goal, 32 P (forced)
gc 97 @62.110s 0%: 0.68+0+45 ms clock, 2.7+0.005/52/0.25+183 ms cpu, 7510->7510->7259 MB, 7510 MB goal, 32 P (forced)
gc 98 @63.192s 0%: 0.46+0+43 ms clock, 1.8+0.005/52/0.25+172 ms cpu, 7555->7555->7278 MB, 7555 MB goal, 32 P (forced)
gc 99 @64.236s 0%: 0.27+0+46 ms clock, 1.1+0.005/52/0.25+186 ms cpu, 7760->7760->7492 MB, 7760 MB goal, 32 P (forced)
gc 100 @65.284s 0%: 0.89+0+57 ms clock, 3.5+0.005/52/0.25+229 ms cpu, 7848->7848->7526 MB, 7848 MB goal, 32 P (forced)
gc 101 @66.343s 0%: 0.28+0+56 ms clock, 1.1+0.005/52/0.25+227 ms cpu, 7912->7912->7612 MB, 7912 MB goal, 32 P (forced)
gc 102 @67.419s 0%: 0.21+0+45 ms clock, 0.87+0.005/52/0.25+183 ms cpu, 8058->8058->7781 MB, 8058 MB goal, 32 P (forced)
gc 103 @68.465s 0%: 0.89+0+52 ms clock, 3.5+0.005/52/0.25+210 ms cpu, 8076->8076->7716 MB, 8076 MB goal, 32 P (forced)
gc 104 @69.527s 0%: 0.26+0+47 ms clock, 1.0+0.005/52/0.25+189 ms cpu, 8104->8104->7776 MB, 8104 MB goal, 32 P (forced)
gc 105 @70.594s 0%: 0.29+0+48 ms clock, 1.1+0.005/52/0.25+192 ms cpu, 7967->7967->7669 MB, 7967 MB goal, 32 P (forced)
gc 106 @71.643s 0%: 0.19+0+48 ms clock, 0.79+0.005/52/0.25+194 ms cpu, 7868->7868->7547 MB, 7868 MB goal, 32 P (forced)
gc 107 @72.715s 0%: 0.32+0+55 ms clock, 1.2+0.005/52/0.25+221 ms cpu, 7932->7932->7647 MB, 7932 MB goal, 32 P (forced)
gc 108 @73.788s 0%: 0.32+0+71 ms clock, 1.3+0.005/52/0.25+287 ms cpu, 8075->8075->7776 MB, 8075 MB goal, 32 P (forced)
gc 109 @74.861s 0%: 0.39+0+52 ms clock, 1.5+0.005/52/0.25+208 ms cpu, 8012->8012->7736 MB, 8012 MB goal, 32 P (forced)
gc 110 @75.914s 0%: 0.51+0+52 ms clock, 2.0+0.005/52/0.25+208 ms cpu, 7934->7934->7639 MB, 7934 MB goal, 32 P (forced)
gc 111 @76.967s 0%: 0.29+0+52 ms clock, 1.1+0.005/52/0.25+210 ms cpu, 7980->7980->7685 MB, 7980 MB goal, 32 P (forced)
gc 112 @78.020s 0%: 3.7+0+54 ms clock, 15+0.005/52/0.25+216 ms cpu, 7954->7954->7625 MB, 7954 MB goal, 32 P (forced)
gc 113 @79.088s 0%: 0.38+0+44 ms clock, 1.5+0.005/52/0.25+178 ms cpu, 8045->8045->7723 MB, 8045 MB goal, 32 P (forced)
  4. What did you expect to see?

Similar memory usage is expected, as the app works under the same conditions and the same load.

  5. What did you see instead?

Memory usage differs by about 100 times.

noxiouz changed the title from "gc: efficiency of collectionion 1.5.3 vs 1.6" to "gc: efficiency of collection 1.5.3 vs 1.6" on Apr 1, 2016
ianlancetaylor changed the title from "gc: efficiency of collection 1.5.3 vs 1.6" to "runtime: efficiency of collection 1.5.3 vs 1.6" on Apr 1, 2016
ianlancetaylor added this to the Go1.7 milestone on Apr 1, 2016
@ianlancetaylor
Contributor

CC @RLH @aclements

@aclements
Member

It looks like for some reason your application is actually retaining a lot more memory in Go 1.6, since the STW ("forced") GC lines should be showing precise retained memory (there's no floating garbage and STW GC doesn't have any control systems that can get into bad feedback loops). The GC could be incorrectly retaining something, but that seems unlikely, which would mean this isn't a GC problem per se.

Could you try using the heap profiler (https://godoc.org/runtime/pprof#Profile) to see what objects are being retained in Go 1.6?
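
For anyone following along, a minimal sketch of one way to write such a heap profile with runtime/pprof; the output file name is arbitrary and this is not the proxy's actual code:

package main

import (
	"log"
	"os"
	"runtime"
	"runtime/pprof"
)

func dumpHeapProfile(path string) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	runtime.GC() // get up-to-date statistics before writing the profile
	return pprof.Lookup("heap").WriteTo(f, 0)
}

func main() {
	if err := dumpHeapProfile("heap.pprof"); err != nil {
		log.Fatal(err)
	}
}

The resulting file can then be inspected with go tool pprof against the same binary that produced it.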

@bradfitz
Contributor

bradfitz commented Apr 2, 2016

Any difference if you disable HTTP/2? Set:

GODEBUG=http2client=0,http2server=0

And if so, try to narrow it down to either client or server.
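
Not part of the suggestion above, just a hedged sketch of how one might confirm from inside the proxy whether HTTP/2 is actually in play after setting those GODEBUG knobs; the handler, route and certificate paths are placeholders:

package main

import (
	"log"
	"net/http"
)

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// r.Proto is "HTTP/2.0" when HTTP/2 was negotiated and "HTTP/1.1"
		// otherwise, so this shows whether the GODEBUG setting took effect
		// on the server side.
		log.Printf("serving %s over %s", r.URL.Path, r.Proto)
		w.Write([]byte("ok"))
	})
	// cert.pem / key.pem are placeholder paths; HTTP/2 is only negotiated over TLS.
	log.Fatal(http.ListenAndServeTLS(":9091", "cert.pem", "key.pem", nil))
}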

@gopherbot

CL https://golang.org/cl/21451 mentions this issue.

@bioothod
Contributor

bioothod commented Apr 2, 2016

With http2server=0,http2client=0 it still leaks; the load is much lower at the moment, but the server managed to leak 1.5 GB after 2 minutes of work.

Heap dump attached.
heap.txt

@bioothod
Contributor

bioothod commented Apr 2, 2016

Here is go1.6 top10 dump:

Showing top 10 nodes out of 54 (cum >= 372)
      flat  flat%   sum%        cum   cum%
     26398 40.45% 40.45%      26398 40.45%  github.com/bioothod/elliptics-go/elliptics._Cfunc_session_get_routes
     11694 17.92% 58.37%      11694 17.92%  net/http.newBufioReader
     10923 16.74% 75.11%      10923 16.74%  crypto/tls.(*halfConn).newBlock
      6554 10.04% 85.15%       6554 10.04%  runtime.deferproc.func1
      2979  4.56% 89.71%       2979  4.56%  crypto/aes.NewCipher
      2979  4.56% 94.28%      31398 48.11%  net/http.(*conn).serve
      1982  3.04% 97.32%       1982  3.04%  crypto/tls.(*block).reserve
       910  1.39% 98.71%        910  1.39%  github.com/bioothod/backrunner/bucket.(*BucketMsgpack).ExtractMsgpack
       385  0.59% 99.30%        385  0.59%  net/http.newBufioWriterSize
       372  0.57% 99.87%        372  0.57%  net/textproto.(*Reader).ReadMIMEHeader

and go1.5.3 (virtually the same code, just different compiler):

46140 of 46142 total (  100%)
Dropped 228 nodes (cum <= 230)
Showing top 10 nodes out of 59 (cum >= 280)
      flat  flat%   sum%        cum   cum%
     16385 35.51% 35.51%      16385 35.51%  runtime.deferproc.func1
      8875 19.23% 54.74%       8875 19.23%  github.com/bioothod/elliptics-go/elliptics._Cfunc_session_get_routes
      8195 17.76% 72.50%       8195 17.76%  github.com/bioothod/elliptics-go/elliptics.NewReadSeekerKeyOffsetSize
      4681 10.14% 82.65%       4681 10.14%  crypto/sha1.New
      4096  8.88% 91.53%       4096  8.88%  net/textproto.(*Reader).ReadLine
      2730  5.92% 97.44%       2730  5.92%  runtime.malg
       513  1.11% 98.55%        513  1.11%  net/http.newBufioWriterSize
       385  0.83% 99.39%        385  0.83%  net/http.newBufioReader
       280  0.61%   100%        280  0.61%  crypto/tls.(*block).reserve
         0     0%   100%        280  0.61%  bufio.(*Writer).Write

or, a bit earlier, the same 1.5.3:

32048 of 32048 total (  100%)
Dropped 218 nodes (cum <= 160)
Showing top 10 nodes out of 41 (cum >= 5193)
      flat  flat%   sum%        cum   cum%
     10695 33.37% 33.37%      10695 33.37%  github.com/bioothod/elliptics-go/elliptics._Cfunc_session_get_routes
      8192 25.56% 58.93%       8192 25.56%  runtime.deferproc.func1
      4681 14.61% 73.54%       4681 14.61%  crypto/sha1.New
      4096 12.78% 86.32%       4096 12.78%  net/textproto.(*Reader).ReadLine
      2730  8.52% 94.84%       2730  8.52%  runtime.malg
       627  1.96% 96.80%        627  1.96%  crypto/tls.(*block).reserve
       514  1.60% 98.40%        514  1.60%  net/http.newBufioWriterSize
       513  1.60%   100%        513  1.60%  net/http.newBufioReader
         0     0%   100%       4681 14.61%  crypto/hmac.New
         0     0%   100%       5193 16.20%  crypto/tls.(*Conn).Handshake

@bioothod
Contributor

bioothod commented Apr 2, 2016

For comparison, heap profile dump with 1.5.3 build
heap.1.5.3.txt

@aclements
Member

@bioothod, thanks for the heap dumps. I'm not sure why your top10s are reporting deferproc and _Cfunc_session_get_routes; those don't appear anywhere as allocation sites in the raw heap dump files. Are you sure you ran pprof with the same binary that generated each profile?

From the raw heap dumps, it's clear that the vast majority of your memory in both cases is coming from 10MB allocations at github.com/bioothod/elliptics-go/elliptics.NewReadSeekerKeyOffsetSize (github.com/bioothod/elliptics-go/elliptics/reader.go:89). What's not clear is why there are far more of those sticking around in Go 1.6 than there were in Go 1.5.3. What happens with the ReadSeeker constructed there?

@randall77, would it be possible to use the heap dumper to figure out why this allocation is being retained in 1.6 and not in 1.5.3?

@bioothod
Contributor

bioothod commented Apr 4, 2016

I probably used a different container. I've redone it and made sure the binary matches; top10 indeed shows elliptics.NewReadSeekerKeyOffsetSize() as the main allocation site, but the profiler doesn't show excessive memory usage, and 1.5.3 and 1.6 look quite similar.
The 1.5.3 version has been running for quite a while (days under load) and its memory usage is about 1.1 GB; the 1.6 version ran for a couple of minutes and sucked up 1.5 GB.

1.6

159.70MB of 164.73MB total (96.95%)
Dropped 243 nodes (cum <= 0.82MB)
Showing top 10 nodes out of 56 (cum >= 3MB)
      flat  flat%   sum%        cum   cum%
  137.92MB 83.73% 83.73%   137.92MB 83.73%  github.com/bioothod/elliptics-go/elliptics.NewReadSeekerKeyOffsetSize
   13.88MB  8.43% 92.15%    13.88MB  8.43%  crypto/tls.(*block).reserve
    4.51MB  2.74% 94.89%     4.51MB  2.74%  net/http.newBufioReader
    2.01MB  1.22% 96.11%     2.01MB  1.22%  net/http.newBufioWriterSize
    0.88MB  0.53% 96.64%     0.88MB  0.53%  bytes.makeSlice
    0.50MB   0.3% 96.95%   161.85MB 98.25%  net/http.(*conn).serve
         0     0% 96.95%    12.38MB  7.51%  bufio.(*Writer).Write
         0     0% 96.95%     0.88MB  0.53%  bytes.(*Buffer).Write
         0     0% 96.95%     0.88MB  0.53%  bytes.(*Buffer).grow
         0     0% 96.95%        3MB  1.82%  crypto/tls.(*Conn).Handshake

1.5.3

292.83MB of 300.33MB total (97.50%)
Dropped 448 nodes (cum <= 1.50MB)
Showing top 10 nodes out of 53 (cum >= 4MB)
      flat  flat%   sum%        cum   cum%
  262.97MB 87.56% 87.56%   262.97MB 87.56%  github.com/bioothod/elliptics-go/elliptics.NewReadSeekerKeyOffsetSize
   20.57MB  6.85% 94.41%    20.57MB  6.85%  crypto/tls.(*block).reserve
    3.01MB  1.00% 95.41%     3.01MB  1.00%  net/http.newBufioWriterSize
    2.74MB  0.91% 96.32%     2.74MB  0.91%  bytes.makeSlice
    2.51MB  0.84% 97.16%     2.51MB  0.84%  net/http.newBufioReader
    1.03MB  0.34% 97.50%    19.60MB  6.53%  io.copyBuffer
         0     0% 97.50%    18.57MB  6.18%  bufio.(*Writer).Write
         0     0% 97.50%     2.74MB  0.91%  bytes.(*Buffer).Write
         0     0% 97.50%     2.74MB  0.91%  bytes.(*Buffer).grow
         0     0% 97.50%        4MB  1.33%  crypto/tls.(*Conn).Handshake

@aclements
Member

By "memory usage", do you mean what's being reported by gctrace or what's being reported by the OS? The profiles are showing only 300MB and 165MB in use, respectively, which should match reasonably closely with what gctrace says. If you still have the raw heap profile files from your new run, could you post them?

Are both versions under the same amount of load?

@bioothod
Contributor

bioothod commented Apr 4, 2016

Attached are the 1.5.3 and 1.6 gctraces and heap dumps from a clean run of about 2 minutes under the same load. top10 shows about a 10x difference, but the OS reported 768 MB (1.5.3) vs 2.9 GB (1.6).

1.6
1.6.gctrace.txt
1.6.heap.txt

2.28GB of 2.30GB total (99.15%)
Dropped 257 nodes (cum <= 0.01GB)
Showing top 10 nodes out of 26 (cum >= 0.02GB)
      flat  flat%   sum%        cum   cum%
    2.26GB 98.26% 98.26%     2.26GB 98.26%  github.com/bioothod/elliptics-go/elliptics.NewReadSeekerKeyOffsetSize
    0.02GB  0.89% 99.15%     0.02GB  0.89%  crypto/tls.(*block).reserve
         0     0% 99.15%     0.02GB  0.76%  bufio.(*Writer).Write
         0     0% 99.15%     0.02GB  0.76%  crypto/tls.(*Conn).Write
         0     0% 99.15%     0.02GB  0.79%  crypto/tls.(*Conn).writeRecord
         0     0% 99.15%     0.02GB  0.83%  crypto/tls.(*block).resize
         0     0% 99.15%     2.28GB 99.04%  github.com/bioothod/backrunner/bucket.(*BucketCtl).Stream
         0     0% 99.15%     2.26GB 98.26%  github.com/bioothod/elliptics-go/elliptics.NewReadSeekerOffsetSize
         0     0% 99.15%     0.02GB  0.79%  io.Copy
         0     0% 99.15%     0.02GB  0.76%  io.CopyBuffer

1.5.3
1.5.3.gctrace.txt
1.5.3.heap.txt

226.58MB of 230.59MB total (98.26%)
Dropped 226 nodes (cum <= 1.15MB)
Showing top 10 nodes out of 37 (cum >= 2MB)
      flat  flat%   sum%        cum   cum%
  206.65MB 89.62% 89.62%   206.65MB 89.62%  github.com/bioothod/elliptics-go/elliptics.NewReadSeekerKeyOffsetSize
   13.92MB  6.04% 95.65%    13.92MB  6.04%  crypto/tls.(*block).reserve
    2.51MB  1.09% 96.74%     2.51MB  1.09%  net/http.newBufioWriterSize
       2MB  0.87% 97.61%        2MB  0.87%  github.com/bioothod/elliptics-go/elliptics._Cfunc_session_get_routes
    1.51MB  0.65% 98.26%     1.51MB  0.65%  net/http.newBufioReader
         0     0% 98.26%    13.92MB  6.04%  bufio.(*Writer).Write
         0     0% 98.26%    13.92MB  6.04%  crypto/tls.(*Conn).Write
         0     0% 98.26%    13.92MB  6.04%  crypto/tls.(*Conn).writeRecord
         0     0% 98.26%    13.92MB  6.04%  crypto/tls.(*block).resize
         0     0% 98.26%        2MB  0.87%  github.com/bioothod/backrunner/bucket.(*BucketCtl).BucketStatUpdate

@aclements
Member

Thanks.

top10 is somewhat approximate, but the OS memory load you reported matches pretty closely with the peak heap size in both gctraces (which, in both cases, was a bit before the end of the trace, which also contributes to why top10 is showing a somewhat lower number).

Is it possible for me to run this locally to reproduce it (or, at least run some analysis)?

@bioothod
Contributor

bioothod commented Apr 6, 2016

Here are steps to reproduce the problem, though with a synthetic load:

  • download elliptics.tar.gz and unpack it into $dir
  • start the container:
    docker run -t -i -v $dir/elliptics:/mnt/elliptics -p 9091:9091 bioothod/test:gc.1.6 /bin/bash -c /mnt/elliptics/etc/start.sh
    or
    docker run -t -i -v $dir/elliptics:/mnt/elliptics -p 9091:9091 bioothod/test:gc.1.5.3 /bin/bash -c /mnt/elliptics/etc/start.sh
  • upload a test file into the storage:
    curl -d "test" http://localhost:9091/upload/b1/test.txt
  • start ab: ab -n 10000 -c 1000 http://localhost:9091/get/b1/test.txt
  • monitor memory usage. Note that the proxy will eat a huge amount of memory (gigabytes for this particular run) and will not return it to the system (at least on fedora 23), although gctrace will tell you that everything is OK. I haven't investigated this further yet (and it looks like debian wheezy is OK with memory usage in this test). The container with the proxy built with the 1.6 compiler will eat far more memory than the one built with 1.5.3.
  • to get a heap dump, run curl http://localhost:9091/profile/heap

@davecheney
Contributor

> monitor memory usage, please note that proxy will eat huge amount of memory (gigabytes for this particular run) and will not return it back to the system (at least on fedora 23), although gctrace will tell you that everything is ok

How are you monitoring memory usage? top will not tell you the truth. You should monitor memory usage via the /debug/pprof web endpoint.

Also, Go cannot tell the operating system to take back virtual address space; it can only inform the operating system that a particular page is not needed, so if the operating system wants to use the page for something else, it does not need to restore the original contents. Operating systems can and frequently do ignore this advice (that's why the syscall is called madvise(2)) when they are not under sufficient memory pressure.
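
A minimal sketch of exposing that /debug/pprof endpoint via net/http/pprof, assuming the program uses the default mux; the port here is made up:

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on http.DefaultServeMux
)

func main() {
	// The heap can then be inspected with:
	//   go tool pprof http://localhost:6060/debug/pprof/heap
	log.Fatal(http.ListenAndServe("localhost:6060", nil))
}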


@bioothod
Contributor

bioothod commented Apr 6, 2016

Of course top will tell you the plain bald truth, as will /proc/<pid>/maps and friends. The pprof heap dump, on the other hand, will not tell you the truth, since there are quite a few layers that have yet to commit memory back to the kernel.

You may also detect that something is wrong by trying to move your mouse and watching kswapd go crazy, but that is of course not the real truth either, since gctrace will tell you something like the dump below while the proxy eats 4+ GB of resident RAM.

gc 191 @17.189s 17%: 1.3+6.1+5.4+4.5+4.3 ms clock, 10+6.1+0+7.8/0.020/0.019+34 ms cpu, 703->833->802 MB, 1039 MB goal, 8 P
gc 192 @17.221s 17%: 2.9+7.0+11+4.7+4.5 ms clock, 23+7.0+0+6.5/2.0/0.001+36 ms cpu, 752->933->512 MB, 993 MB goal, 8 P
gc 193 @17.435s 17%: 0.11+0+0+0+4.3 ms clock, 0.93+0+0+6.5/2.0/0.001+34 ms cpu, 111->111->0 MB, 111 MB goal, 8 P (forced)
gc 194 @18.440s 16%: 0.072+0+0+0+2.9 ms clock, 0.58+0+0+6.5/2.0/0.001+23 ms cpu, 0->0->0 MB, 0 MB goal, 8 P (forced)
gc 195 @19.443s 15%: 0.18+0+0+0+3.2 ms clock, 1.4+0+0+6.5/2.0/0.001+25 ms cpu, 0->0->0 MB, 0 MB goal, 8 P (forced)
gc 196 @20.447s 15%: 0.12+0+0+0+2.8 ms clock, 0.99+0+0+6.5/2.0/0.001+22 ms cpu, 0->0->0 MB, 0 MB goal, 8 P (forced)

As you may expect, the real truth as told by the heap dump and pprof is a little misleading, since the Go proxy has eaten the memory and has not returned it; at the very least kswapd and overall OS performance degrade heavily. It could be that this is memory used by the C++ libraries, but I doubt it, since the C++ library makes nowhere near that many allocations for 10k requests even if it leaks like hell. Second, a subsequent run of the same test eats noticeably less additional memory.
Running this test (gc.1.5.3) on wheezy results in 2 GB of resident memory after the first run and 2.7 GB after the second.
Running it on fedora23 ends up at 4.3/4.4 GB, but the fedora box is a bit faster; I suspect the concurrency in the web server is what drives that memory consumption.

(pprof) top10
2048.59kB of 2048.59kB total (  100%)
Dropped 82 nodes (cum <= 10.24kB)
      flat  flat%   sum%        cum   cum%
 1536.56kB 75.01% 75.01%  1536.56kB 75.01%  runtime.malg
  512.03kB 24.99%   100%   512.03kB 24.99%  runtime.deferproc.func1
         0     0%   100%   512.19kB 25.00%  runtime.mcommoninit
         0     0%   100%   512.19kB 25.00%  runtime.mpreinit

@bioothod
Contributor

bioothod commented Apr 6, 2016

@aclements I have a different idea about how this leak came to be, but I need your opinion on how to debug it. There are two parts, both related to the leak but from quite different perspectives.

The first part is about the leak itself, i.e. the memory used by the Go proxy. It is unrelated to the Go version (at least 1.5.1, 1.5.3 and 1.6 behave similarly, just with different volumes of leaked RAM).

The containers and test I crafted above lead to substantial (or plain huge) memory consumption because of a rather ugly 10 MB allocation per request. That design choice is out of the scope of this post. I modified the proxy to allocate a 10 MB slice on every call to the HTTP handler, read data from a small file on disk into that 10 MB area, and then return it to the HTTP client via http.ResponseWriter.Write(10m_slice[:size_of_the_small_file_read]), and I observe the same huge memory leak of 4.4 GB when running on the fedora system, even with go 1.5.1.

It looks like go 1.6 introduced something new in the web server (or goroutine scheduling, or something else) that works with more parallelism, which ends up calling the HTTP handler more times in parallel, and each parallel call allocates a 10 MB buffer. I want to highlight that there is no C++ code involved in this new test, only a plain Go HTTP handler which does the following (error checks omitted to keep the post short):

x := make([]byte, 10 * 1024 * 1024)
f, err := os.Open(key)
defer f.Close()

n, err := f.Read(x)

w.WriteHeader(http.StatusOK)
w.Write(x[:n])
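
For reference, a self-contained version of such a handler might look like the sketch below; the file path, port and route are placeholders rather than the proxy's real values:

package main

import (
	"log"
	"net/http"
	"os"
)

// get allocates a fresh 10 MB buffer on every request, mirroring the snippet above.
func get(w http.ResponseWriter, r *http.Request) {
	x := make([]byte, 10*1024*1024)

	f, err := os.Open("/tmp/test.txt") // placeholder path for the small file
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	defer f.Close()

	n, err := f.Read(x)
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}

	w.WriteHeader(http.StatusOK)
	w.Write(x[:n])
}

func main() {
	http.HandleFunc("/get/", get)
	log.Fatal(http.ListenAndServe(":9091", nil))
}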

Since the amount of memory used by this test is the same as in my previous C++/elliptics reader test, I suspect that this new test compiled with go 1.6 will end up eating more memory, just like the previous test.

The heap dump, pprof top10 and gctrace show that Go returns memory to the system and that there is no leak. Maybe that is true, but this behaviour is really unfriendly to the client. One virtually cannot allocate a somewhat big buffer in an HTTP handler, since this memory will be leaked, and by leaked I mean it will not be returned to the OS kernel, and eventually the proxy will be OOM-killed if the number of parallel requests allocating large amounts of RAM is somewhat big (like thousands of requests).
The problem here is not the amount of memory allocated (1k rps times 10 MB ends up at 10 GB), which is OK, but the fact that it is never returned to the OS kernel.

Which brings us to the second problem.
After the test completes, gctrace shows that memory consumption drops to very small values like 0-1 megabytes, i.e. Go thinks it has freed the memory. But 1.6 releases memory much more slowly than 1.5; here are examples:

1.5

gc 553 @65.881s 14%: 3.2+0+0+0+7.4 ms clock, 25+0+0+15/0.002/0+59 ms cpu, 883->883->862 MB, 883 MB goal, 8 P (forced)
gc 554 @65.900s 14%: 8.5+72+17+18+9.7 ms clock, 68+72+0+35/0.41/0+78 ms cpu, 944->1825->1293 MB, 1294 MB goal, 8 P
gc 555 @66.027s 14%: 1.3+37+5.0+4.6+6.1 ms clock, 11+37+0+31/0.26/0+49 ms cpu, 1213->1584->1563 MB, 1804 MB goal, 8 P
gc 556 @66.088s 14%: 3.9+28+18+7.8+7.4 ms clock, 31+28+0+19/0.082/0+59 ms cpu, 1313->1747->1622 MB, 1879 MB goal, 8 P
gc 557 @66.162s 14%: 3.4+32+7.0+5.6+10 ms clock, 27+32+0+24/0.12/0.78+81 ms cpu, 523->913->580 MB, 723 MB goal, 8 P
gc 558 @66.222s 14%: 1.2+25+11+7.4+7.0 ms clock, 10+25+0+23/0.086/0.078+56 ms cpu, 390->752->750 MB, 571 MB goal, 8 P
gc 559 @66.276s 14%: 1.3+31+16+8.6+5.6 ms clock, 11+31+0+37/0.27/0+45 ms cpu, 400->851->830 MB, 585 MB goal, 8 P
gc 560 @66.342s 14%: 1.4+17+5.2+7.4+6.0 ms clock, 11+17+0+32/0.12/0+48 ms cpu, 439->681->669 MB, 644 MB goal, 8 P
gc 561 @66.381s 14%: 2.5+21+7.0+7.1+6.8 ms clock, 20+21+0+21/0.065/0.099+55 ms cpu, 198->509->508 MB, 283 MB goal, 8 P
gc 562 @66.427s 14%: 1.2+7.8+0.15+1.6+3.5 ms clock, 10+7.8+0+10/0.38/0.005+28 ms cpu, 478->538->538 MB, 702 MB goal, 8 P
gc 563 @66.448s 14%: 1.3+7.3+1.3+8.1+7.3 ms clock, 11+7.3+0+8.7/0.006/0.044+58 ms cpu, 448->599->517 MB, 582 MB goal, 8 P
gc 564 @66.475s 14%: 2.5+22+6.0+8.0+7.1 ms clock, 20+22+0+14/0.082/4.5+57 ms cpu, 257->558->557 MB, 356 MB goal, 8 P
gc 565 @66.523s 14%: 1.2+11+6.4+7.3+7.9 ms clock, 9.7+11+0+11/0.080/0.81+63 ms cpu, 317->537->537 MB, 460 MB goal, 8 P
gc 566 @66.557s 14%: 1.2+14+9.8+5.1+8.0 ms clock, 9.9+14+0+15/0.39/1.7+64 ms cpu, 417->667->666 MB, 610 MB goal, 8 P
gc 567 @66.597s 14%: 1.2+38+10+8.7+7.5 ms clock, 10+38+0+9.9/0.14/4.3+60 ms cpu, 566->1037->1026 MB, 834 MB goal, 8 P
gc 568 @66.665s 14%: 1.2+25+13+11+6.4 ms clock, 10+25+0+11/0.035/1.6+51 ms cpu, 796->1207->1206 MB, 1179 MB goal, 8 P
gc 569 @66.724s 14%: 1.3+4.9+7.0+4.5+6.2 ms clock, 10+4.9+0+7.9/0.029/0.037+50 ms cpu, 895->1018->1005 MB, 1328 MB goal, 8 P
gc 570 @66.754s 14%: 4.7+11+4.9+11+7.5 ms clock, 38+11+0+7.1/0.12/0.026+60 ms cpu, 245->465->364 MB, 308 MB goal, 8 P
gc 571 @66.795s 14%: 1.0+20+8.3+13+6.0 ms clock, 8.7+20+0+11/0.028/0.020+48 ms cpu, 314->655->644 MB, 457 MB goal, 8 P
gc 572 @66.844s 14%: 1.2+16+7.0+7.7+5.2 ms clock, 10+16+0+7.8/0.019/0.12+42 ms cpu, 584->805->794 MB, 861 MB goal, 8 P
gc 573 @66.884s 14%: 2.3+6.5+1.7+4.9+7.1 ms clock, 19+6.5+0+4.7/1.1/0+57 ms cpu, 674->796->764 MB, 981 MB goal, 8 P
gc 574 @66.910s 14%: 2.0+0+0+0+4.7 ms clock, 16+0+0+4.7/1.1/0+38 ms cpu, 264->264->133 MB, 264 MB goal, 8 P (forced)
gc 575 @66.918s 14%: 2.6+15+5.2+7.4+7.3 ms clock, 21+15+0+7.8/0.37/2.7+59 ms cpu, 163->413->322 MB, 199 MB goal, 8 P
gc 576 @66.957s 14%: 1.1+11+1.0+2.0+8.3 ms clock, 9.0+11+0+5.2/0.003/3.6+66 ms cpu, 292->443->442 MB, 424 MB goal, 8 P
gc 577 @66.982s 14%: 1.2+6.6+5.3+10+4.0 ms clock, 9.7+6.6+0+13/0.15/0.23+32 ms cpu, 312->492->491 MB, 453 MB goal, 8 P
gc 578 @67.010s 14%: 2.3+2.5+0.032+0.17+2.1 ms clock, 18+2.5+0+4.1/0.013/0.11+17 ms cpu, 421->432->431 MB, 602 MB goal, 8 P

test completed

gc 579 @67.918s 14%: 0.26+0+0+0+6.8 ms clock, 2.0+0+0+4.1/0.013/0.11+55 ms cpu, 242->242->0 MB, 242 MB goal, 8 P (forced)
gc 580 @68.925s 14%: 0.088+0+0+0+2.8 ms clock, 0.70+0+0+4.1/0.013/0.11+22 ms cpu, 0->0->0 MB, 0 MB goal, 8 P (forced)
gc 581 @69.928s 14%: 0.071+0+0+0+2.8 ms clock, 0.57+0+0+4.1/0.013/0.11+23 ms cpu, 0->0->0 MB, 0 MB goal, 8 P (forced)
gc 582 @70.932s 13%: 0.23+0+0+0+2.7 ms clock, 1.8+0+0+4.1/0.013/0.11+22 ms cpu, 0->0->0 MB, 0 MB goal, 8 P (forced)
gc 583 @71.935s 13%: 0.17+0+0+0+2.9 ms clock, 1.3+0+0+4.1/0.013/0.11+23 ms cpu, 1->1->0 MB, 1 MB goal, 8 P (forced)
gc 584 @72.938s 13%: 0.13+0+0+0+2.9 ms clock, 1.1+0+0+4.1/0.013/0.11+23 ms cpu, 0->0->0 MB, 0 MB goal, 8 P (forced)

1.6 is quite different: it allocates memory much faster and releases it much more slowly. I had to interrupt the test since it ate 10+ GB of memory and the computer started lagging heavily. Below I show the whole test duration, i.e. 36 seconds, compared to the 72+ seconds above where the test completed, while below it is far from complete. Also, the 1.6 timings are much worse, but that should be related to the memory consumption: since it got worse, it takes more time to allocate the buffer, which in turn makes handling an HTTP request take longer.

gc 1 @1.110s 0%: 0.005+0+2.1 ms clock, 0.010+0/0/0+4.2 ms cpu, 0->0->0 MB, 0 MB goal, 8 P (forced)
gc 2 @2.113s 0%: 0.11+0+1.7 ms clock, 0.22+0/0/0+3.4 ms cpu, 0->0->0 MB, 0 MB goal, 8 P (forced)
gc 3 @3.115s 0%: 0.14+0+1.8 ms clock, 0.43+0/0/0+5.5 ms cpu, 0->0->0 MB, 0 MB goal, 8 P (forced)
gc 4 @4.118s 0%: 0.043+0+1.7 ms clock, 0.21+0/0/0+8.7 ms cpu, 0->0->0 MB, 0 MB goal, 8 P (forced)
gc 5 @5.120s 0%: 0.017+0+1.5 ms clock, 0.10+0/0/0+9.2 ms cpu, 0->0->0 MB, 0 MB goal, 8 P (forced)
gc 6 @6.122s 0%: 0.024+0+1.6 ms clock, 0.16+0/0/0+11 ms cpu, 0->0->0 MB, 0 MB goal, 8 P (forced)
gc 7 @7.124s 0%: 0.024+0+1.5 ms clock, 0.17+0/0/0+10 ms cpu, 0->0->0 MB, 0 MB goal, 8 P (forced)
gc 8 @8.125s 0%: 0.16+0+1.5 ms clock, 1.1+0/0/0+10 ms cpu, 0->0->0 MB, 0 MB goal, 8 P (forced)
gc 9 @9.127s 0%: 0.006+0+1.8 ms clock, 0.055+0/0/0+14 ms cpu, 0->0->0 MB, 0 MB goal, 8 P (forced)
gc 10 @10.129s 0%: 0.007+0+1.6 ms clock, 0.057+0/0/0+13 ms cpu, 0->0->0 MB, 0 MB goal, 8 P (forced)
gc 11 @10.151s 0%: 0.11+0.91+1.2 ms clock, 0.58+0.60/1.4/0+6.0 ms cpu, 21->21->20 MB, 22 MB goal, 8 P
gc 12 @10.154s 0%: 0.37+0.95+1.2 ms clock, 0.37+1.0/1.4/0.080+1.2 ms cpu, 31->51->50 MB, 52 MB goal, 8 P
gc 13 @10.157s 0%: 3.1+2.5+3.1 ms clock, 22+0.081/4.6/0.080+22 ms cpu, 60->141->140 MB, 76 MB goal, 8 P
gc 14 @10.166s 0%: 0.14+2.6+1.3 ms clock, 0.89+0.51/1.9/0+8.0 ms cpu, 151->161->160 MB, 211 MB goal, 8 P
gc 15 @10.171s 0%: 0.21+1.1+1.7 ms clock, 1.2+0.86/1.5/0+10 ms cpu, 191->281->281 MB, 241 MB goal, 8 P
gc 16 @10.174s 0%: 0.24+1.8+1.2 ms clock, 1.2+0.95/2.4/0+6.1 ms cpu, 301->381->381 MB, 421 MB goal, 8 P
gc 17 @10.188s 0%: 0.077+2.2+3.2 ms clock, 0.62+0.27/3.6/0+25 ms cpu, 401->422->421 MB, 571 MB goal, 8 P
gc 18 @10.196s 0%: 0.12+15+1.6 ms clock, 0.97+0.036/4.2/0+12 ms cpu, 521->582->581 MB, 632 MB goal, 8 P
gc 19 @10.216s 0%: 0.28+2.1+1.7 ms clock, 2.2+1.6/3.9/0+13 ms cpu, 752->802->802 MB, 872 MB goal, 8 P
gc 20 @10.226s 0%: 0.92+3.5+1.7 ms clock, 7.4+2.0/6.3/0+13 ms cpu, 1082->1162->1162 MB, 1203 MB goal, 8 P
gc 21 @10.243s 0%: 0.20+3.9+1.7 ms clock, 1.6+0.38/6.1/0+14 ms cpu, 1563->1643->1643 MB, 1743 MB goal, 8 P
gc 22 @10.488s 0%: 0.17+3.9+2.6 ms clock, 1.4+0/6.5/0+21 ms cpu, 2284->2304->2093 MB, 2464 MB goal, 8 P
gc 23 @11.132s 0%: 0.022+0+5.6 ms clock, 0.18+0/6.5/0+45 ms cpu, 2684->2684->2093 MB, 2684 MB goal, 8 P (forced)
gc 24 @11.303s 0%: 0.18+3.3+2.7 ms clock, 1.4+2.0/5.6/0+22 ms cpu, 3005->3035->2924 MB, 3140 MB goal, 8 P
gc 25 @11.340s 0%: 0.048+17+3.3 ms clock, 0.38+4.2/7.3/0+26 ms cpu, 4267->4307->4296 MB, 4387 MB goal, 8 P
gc 26 @12.147s 0%: 0.031+0+9.0 ms clock, 0.25+4.2/7.3/0+72 ms cpu, 6039->6039->5277 MB, 6039 MB goal, 8 P (forced)
gc 27 @13.162s 0%: 0.82+0+11 ms clock, 6.5+4.2/7.3/0+94 ms cpu, 7641->7641->6729 MB, 7641 MB goal, 8 P (forced)
gc 28 @14.145s 0%: 0.19+13+8.2 ms clock, 1.5+0/25/0+66 ms cpu, 9894->9904->9002 MB, 10094 MB goal, 8 P
gc 29 @14.196s 0%: 0.18+0+8.5 ms clock, 1.4+0/25/0+68 ms cpu, 9032->9032->9032 MB, 9032 MB goal, 8 P (forced)
gc 30 @15.222s 0%: 0.19+0+12 ms clock, 1.5+0/25/0+99 ms cpu, 9964->9964->8982 MB, 9964 MB goal, 8 P (forced)
gc 31 @16.237s 0%: 0.071+0+12 ms clock, 0.56+0/25/0+101 ms cpu, 9954->9954->9022 MB, 9954 MB goal, 8 P (forced)
gc 32 @17.259s 0%: 0.050+0+15 ms clock, 0.40+0/25/0+120 ms cpu, 10925->10925->9994 MB, 10925 MB goal, 8 P (forced)
gc 33 @18.295s 0%: 3.6+0+14 ms clock, 29+0/25/0+115 ms cpu, 10925->10925->10014 MB, 10925 MB goal, 8 P (forced)
gc 34 @19.315s 1%: 0.041+0+13 ms clock, 0.33+0/25/0+106 ms cpu, 10655->10655->9733 MB, 10655 MB goal, 8 P (forced)
gc 35 @20.337s 1%: 0.062+0+12 ms clock, 0.49+0/25/0+102 ms cpu, 9734->9734->8762 MB, 9734 MB goal, 8 P (forced)
gc 36 @21.359s 1%: 0.070+0+13 ms clock, 0.56+0/25/0+105 ms cpu, 8763->8763->7791 MB, 8763 MB goal, 8 P (forced)
gc 37 @22.373s 1%: 0.16+0+11 ms clock, 1.3+0/25/0+93 ms cpu, 7792->7792->6820 MB, 7792 MB goal, 8 P (forced)
gc 38 @23.385s 1%: 0.069+0+10 ms clock, 0.55+0/25/0+86 ms cpu, 6820->6820->5848 MB, 6820 MB goal, 8 P (forced)
gc 39 @24.398s 1%: 0.23+0+10 ms clock, 1.8+0/25/0+84 ms cpu, 5849->5849->4887 MB, 5849 MB goal, 8 P (forced)
gc 40 @25.417s 1%: 0.060+0+9.7 ms clock, 0.48+0/25/0+77 ms cpu, 4888->4888->3896 MB, 4888 MB goal, 8 P (forced)
gc 41 @26.433s 1%: 0.20+0+8.7 ms clock, 1.6+0/25/0+70 ms cpu, 3897->3897->2934 MB, 3897 MB goal, 8 P (forced)
gc 42 @27.448s 1%: 0.075+0+8.1 ms clock, 0.60+0/25/0+65 ms cpu, 2935->2935->1973 MB, 2935 MB goal, 8 P (forced)
gc 43 @28.457s 1%: 0.21+0+7.6 ms clock, 1.7+0/25/0+60 ms cpu, 1974->1974->982 MB, 1974 MB goal, 8 P (forced)
gc 44 @29.465s 1%: 0.12+0+6.4 ms clock, 0.99+0/25/0+51 ms cpu, 983->983->1 MB, 983 MB goal, 8 P (forced)
gc 45 @30.471s 1%: 0.039+0+3.9 ms clock, 0.31+0/25/0+31 ms cpu, 1->1->1 MB, 1 MB goal, 8 P (forced)
gc 46 @31.476s 1%: 0.043+0+3.6 ms clock, 0.34+0/25/0+29 ms cpu, 1->1->1 MB, 1 MB goal, 8 P (forced)
gc 47 @32.480s 1%: 0.030+0+2.9 ms clock, 0.24+0/25/0+23 ms cpu, 1->1->1 MB, 1 MB goal, 8 P (forced)

Basically, the second problem is that after some changes in 1.6, the web server started calling HTTP handlers with much more parallelism, and Go releases memory much more slowly than 1.5 did.

And to recap, the first problem is that after Go releases memory, i.e. the GC magic tells us in gctrace that the memory is no longer used, it is not returned to the kernel, and the Go proxy continues to occupy all that memory somewhere. This could be a glibc allocator cache, or some other cache if Go uses a custom allocator.

Anyway, this all ends up in a situation where allocating a big slice in an HTTP handler leads to an OOM kill after a while if the number of requests handled in parallel is large. 1.6 makes this problem much sharper.

I will cook up a much smaller test with this allocation strategy.

@bioothod
Contributor

bioothod commented Apr 6, 2016

Or it could be that there were no changes that led to more parallel HTTP handlers, and it is all down to the fact that go 1.6 releases memory much more slowly, so it just looks like more handlers are allocating more memory in parallel.

Please notice that in the 1.6 gctrace above, Go stopped invoking automatic GC at some point; only the handcrafted runtime.GC() calls in the server remain. I may have already stopped the test at that point, but still, the number of automatic GC calls is hugely smaller than in 1.5.

@bioothod
Contributor

bioothod commented Apr 6, 2016

A fun fact is that with smaller concurrency on the client, Go eats/leaks less memory; it looks like allocated memory is being reused. If this memory resides in a glibc cache (does Go use the glibc allocator? can I change it to something else by exporting LD_PRELOAD pointing to a library with different malloc/free symbols like tcmalloc/jemalloc?), this could partially explain the problem (the allocation part, i.e. the first problem above), though without a solution.

I will cook up a boost::asio test with the same allocation strategy to check whether it is a system problem, but I do believe that test will release memory to the kernel.
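
Unrelated to the glibc question, but for the "allocated memory is reused" observation: on the Go side the usual way to get that reuse deliberately is to recycle the big buffers rather than allocate 10 MB per request. A hedged sketch with sync.Pool, with the buffer size assumed from the test above:

package main

import (
	"fmt"
	"sync"
)

const bufSize = 10 * 1024 * 1024 // assumed 10 MB buffer from the test above

var bufPool = sync.Pool{
	// New is called only when the pool has no free buffer to hand out.
	New: func() interface{} { return make([]byte, bufSize) },
}

func main() {
	buf := bufPool.Get().([]byte) // reuse a previously returned buffer if available
	defer bufPool.Put(buf)        // hand it back for the next request
	fmt.Println("buffer length:", len(buf))
}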

@rsc
Contributor

rsc commented May 18, 2016

Timed out.

rsc closed this as completed May 18, 2016
@aclements
Member

@bioothod, to answer some of your questions: Go has its own memory manager from top to bottom (it doesn't use glibc at all, and garbage collection is so tightly integrated with allocation that it's probably impossible to build a good garbage collector on top of glibc malloc). The numbers reported by gctrace (and the heap profiler, etc.) and by top are different because they are answering different questions, not because either of them is wrong. In particular, while Go's memory manager does return memory to the system, there are a few minutes of hysteresis built into that mechanism (because there's overhead to returning memory to the system and then asking for it back), so the numbers in top should roughly match the peak memory usage over the last five minutes or so. There's also a difference between heap memory and total memory, since there are non-heap allocations such as stacks and runtime-internal memory (which should be small, but #15319 suggests they may not always be as small as they're supposed to be).

If you can create a small Go program that reproduces the issue, we may have a handle on debugging this (though it may have to wait for 1.8). Right now there's simply too much going on in this issue to effectively debug it.
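
For completeness, a hedged sketch of looking at both views from inside a process: HeapAlloc is roughly the live-heap number gctrace reports, HeapSys and HeapReleased show what has been obtained from and returned to the OS, and debug.FreeOSMemory() forces a scavenge immediately instead of waiting out the hysteresis described above:

package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

func printMem(tag string) {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("%s: HeapAlloc=%d MB HeapSys=%d MB HeapReleased=%d MB Sys=%d MB\n",
		tag, m.HeapAlloc>>20, m.HeapSys>>20, m.HeapReleased>>20, m.Sys>>20)
}

func main() {
	printMem("before")
	// Run a GC and return as much memory to the OS as possible right away.
	debug.FreeOSMemory()
	printMem("after FreeOSMemory")
}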

golang locked and limited conversation to collaborators May 18, 2017