builders: taking too long to create new pods? #22598

Closed
rsc opened this issue Nov 6, 2017 · 6 comments
Labels
Builders x/build issues (builders, bots, dashboards) FrozenDueToAge
@rsc
Contributor

rsc commented Nov 6, 2017

https://farmer.golang.org/try?commit=c007a205 has a bunch of linux builders on "running 18 minutes". Here's one log:

linux-amd64 rev c007a205 (trybot set for I4ddf6e1); running; http://10.0.4.156 Kube Pod: buildlet-linux-kubestd-rn4187cd8, 18m43.343322571s ago
  2017-11-06T16:21:07Z checking_for_snapshot 
  2017-11-06T16:21:07Z finish_checking_for_snapshot after 164.4ms
  2017-11-06T16:21:07Z get_buildlet 
  2017-11-06T16:21:07Z creating_kube_pod buildlet-linux-kubestd-rn4187cd8
  2017-11-06T16:21:07Z pod_creating 
  2017-11-06T16:35:48Z pod_created 
  2017-11-06T16:35:48Z got_pod_info waiting_for_buildlet...
  2017-11-06T16:35:48Z finish_get_buildlet after 14m41.1s
  2017-11-06T16:35:48Z using_buildlet 10.0.4.156:80

Note the 14 minute gap between pod_creating and pod_created.
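
As a rough check of that gap, here is a minimal sketch that parses the timestamps from the log above and subtracts them. The helper names (`event_times`, `gap_seconds`) are hypothetical and not part of any coordinator tooling; the log lines are copied from the excerpt.

```python
from datetime import datetime

# Lines copied from the trybot log above (leading whitespace dropped).
LOG = """\
2017-11-06T16:21:07Z checking_for_snapshot
2017-11-06T16:21:07Z get_buildlet
2017-11-06T16:21:07Z creating_kube_pod buildlet-linux-kubestd-rn4187cd8
2017-11-06T16:21:07Z pod_creating
2017-11-06T16:35:48Z pod_created
2017-11-06T16:35:48Z using_buildlet 10.0.4.156:80
"""


def event_times(log):
    """Map each event name to the timestamp of its first occurrence."""
    times = {}
    for line in log.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        stamp = datetime.strptime(parts[0], "%Y-%m-%dT%H:%M:%SZ")
        times.setdefault(parts[1], stamp)
    return times


def gap_seconds(log, start_event, end_event):
    """Seconds elapsed between the first start_event and the first end_event."""
    times = event_times(log)
    return (times[end_event] - times[start_event]).total_seconds()


print(gap_seconds(LOG, "pod_creating", "pod_created"))  # 881.0, i.e. 14m41s
```

The result agrees with the `finish_get_buildlet after 14m41.1s` line, so essentially all of the wait is pod creation.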

Do we need to add more capacity?

/cc @bradfitz

@rsc rsc added this to the Unreleased milestone Nov 6, 2017
@bradfitz
Contributor

bradfitz commented Nov 6, 2017

Likely. I haven't looked at the builders since I've been back. Will do later today.

/cc @adams-sarah @andybons: have either of you noticed this getting bad lately?

@bradfitz bradfitz added the Builders x/build issues (builders, bots, dashboards) label Nov 6, 2017
@s-mang
Contributor

s-mang commented Nov 6, 2017

The last time we talked about upping capacity, we decided not to, because the fluctuations in builder workload are such that we’ll “always” run into the limit at some point.
Other than that tidbit, I have no opinion, and I have not seen anything out of the ordinary (these fluctuations seem to be ordinary).

@rsc
Contributor Author

rsc commented Nov 6, 2017

Are there any metrics about median pod-creation time over the past few months?

@bradfitz
Contributor

bradfitz commented Nov 6, 2017

Yeah, it's all logged to BigQuery. Somebody just needs to write an SQL query.

@bradfitz
Contributor

bradfitz commented Nov 6, 2017

Notes to self, but others can play:

https://bigquery.cloud.google.com/results/symbolic-datum-552:bquijob_48ba253e_15f9301aacf?pli=1

SELECT
  Builder,
  YEAR(StartTime)*100+WEEK(StartTime) AS YYYYWW,
  NTH(501, QUANTILES(Seconds, 1001)) AS P50,
  NTH(951, QUANTILES(Seconds, 1001)) AS P95,
  COUNT(*) AS Count
FROM
  builds.Spans
WHERE
  Seconds > 0 AND Event = "write_snapshot_tar" AND
  Builder = "linux-arm" AND IsTry = TRUE
GROUP BY
  Builder, YYYYWW
ORDER BY
  Builder, YYYYWW
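
For readers unfamiliar with the `NTH(501, QUANTILES(Seconds, 1001))` idiom: `QUANTILES(x, 1001)` yields 1001 boundaries from the minimum to the maximum, so the 501st is the median and the 951st is the 95th percentile. The sketch below imitates that with an exact floor-rank computation. It is only a stand-in, since BigQuery's `QUANTILES` is approximate, and the function names and sample durations are made up for illustration.

```python
def quantile_boundaries(values, buckets):
    """Return `buckets` boundaries: the minimum, evenly spaced quantiles, the maximum.

    Exact floor-rank stand-in (an assumption); BigQuery's QUANTILES is approximate.
    """
    ordered = sorted(values)
    last = len(ordered) - 1
    return [ordered[i * last // (buckets - 1)] for i in range(buckets)]


def nth(n, items):
    """1-based element access, mirroring NTH(n, ...)."""
    return items[n - 1]


# Hypothetical week: 90 builds get a buildlet in about 5s, 10 wait 600s.
seconds = [5.0] * 90 + [600.0] * 10
boundaries = quantile_boundaries(seconds, 1001)

p50 = nth(501, boundaries)
p95 = nth(951, boundaries)
print(p50, p95)  # 5.0 600.0
```

A small slow tail leaves P50 untouched while P95 jumps, which is why both columns are worth tracking.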

The other interesting Event type is get_buildlet:

SELECT
  Builder,
  YEAR(StartTime)*100+WEEK(StartTime) AS YYYYWW,
  NTH(501, QUANTILES(Seconds, 1001)) AS P50,
  NTH(951, QUANTILES(Seconds, 1001)) AS P95,
  COUNT(*) AS Count
FROM
  builds.Spans
WHERE
  Seconds > 0 AND Event = "get_buildlet" AND
  Builder = "linux-amd64" AND IsTry = TRUE
GROUP BY
  Builder, YYYYWW
ORDER BY
  Builder, YYYYWW

Row | Builder | YYYYWW | P50 (s) | P95 (s) | Count
-- | -- | -- | -- | -- | --
1 | linux-amd64 | 201641 | 5.159103486 | 5.502129208 | 27
2 | linux-amd64 | 201642 | 5.172505746 | 11.472289336 | 233
3 | linux-amd64 | 201643 | 5.155149073 | 637.49585534 | 383
4 | linux-amd64 | 201644 | 5.230459826 | 1282.983028915 | 449
5 | linux-amd64 | 201645 | 5.161286177 | 10.162791389 | 305
6 | linux-amd64 | 201646 | 5.166425553 | 10.906560458 | 140
7 | linux-amd64 | 201647 | 5.158037605 | 10.151957412 | 114
8 | linux-amd64 | 201648 | 5.024652822 | 5.323652471 | 775
9 | linux-amd64 | 201649 | 5.024453722 | 5.386522303 | 268
10 | linux-amd64 | 201650 | 5.127671524 | 5.500965332 | 110
11 | linux-amd64 | 201651 | 5.134561463 | 5.471936887 | 98
12 | linux-amd64 | 201652 | 5.113210525 | 5.230564361 | 65
13 | linux-amd64 | 201653 | 5.168110496 | 5.500492794 | 28
14 | linux-amd64 | 201701 | 5.162542146 | 5.532547664 | 78
15 | linux-amd64 | 201702 | 5.139340988 | 11.064165749 | 125
16 | linux-amd64 | 201703 | 5.093717237 | 5.348959758 | 55
17 | linux-amd64 | 201704 | 5.221774831 | 407.048154548 | 104
18 | linux-amd64 | 201705 | 5.26305578 | 143.882530348 | 238
19 | linux-amd64 | 201706 | 5.165460737 | 10.645747147 | 319
20 | linux-amd64 | 201707 | 5.053820078 | 17.197521418 | 472
21 | linux-amd64 | 201708 | 5.109949546 | 21.868360725 | 219
22 | linux-amd64 | 201709 | 5.116179421 | 513.852034865 | 308
23 | linux-amd64 | 201710 | 5.1281634480000005 | 10.59649347 | 220
24 | linux-amd64 | 201711 | 5.181949398 | 35.604151069 | 225
25 | linux-amd64 | 201712 | 5.12220003 | 130.162668457 | 294
26 | linux-amd64 | 201713 | 5.173122457 | 661.305827716 | 252
27 | linux-amd64 | 201714 | 5.116764695 | 395.375861803 | 216
28 | linux-amd64 | 201715 | 5.118663224 | 17.636606107 | 212
29 | linux-amd64 | 201716 | 5.105974973 | 514.580457604 | 314
30 | linux-amd64 | 201717 | 5.0830206 | 10.343634243 | 278
31 | linux-amd64 | 201718 | 5.06051674 | 10.75187722 | 146
32 | linux-amd64 | 201719 | 5.043152229 | 5.257027995 | 156
33 | linux-amd64 | 201720 | 5.066021215 | 5.421258958 | 113
34 | linux-amd64 | 201721 | 5.050594972 | 5.236382346 | 150
35 | linux-amd64 | 201722 | 5.057175634 | 5.658022007 | 64
36 | linux-amd64 | 201723 | 5.046606781 | 16.844070255 | 159
37 | linux-amd64 | 201724 | 5.040313456 | 15.580424937 | 153
38 | linux-amd64 | 201725 | 5.042714123 | 501.915287187 | 114
39 | linux-amd64 | 201726 | 5.054075412 | 5.254687633 | 115
40 | linux-amd64 | 201727 | 5.046645584 | 5.248499417 | 60
41 | linux-amd64 | 201728 | 5.06583029 | 20.040213734 | 79
42 | linux-amd64 | 201729 | 5.058221665 | 20.378777127 | 117
43 | linux-amd64 | 201730 | 5.064584957 | 5.477534545 | 53
44 | linux-amd64 | 201731 | 5.096214737 | 21.151727401 | 105
45 | linux-amd64 | 201732 | 140.06589253 | 2756.808954302 | 394
46 | linux-amd64 | 201733 | 10.190890034 | 655.301846045 | 396
47 | linux-amd64 | 201734 | 5.723519224 | 640.355835846 | 272
48 | linux-amd64 | 201735 | 5.23396237 | 190.699682693 | 231
49 | linux-amd64 | 201736 | 5.187438133 | 16.807368066 | 131
50 | linux-amd64 | 201737 | 5.130365261 | 15.39816034 | 187
51 | linux-amd64 | 201738 | 5.116743211 | 552.502376851 | 229
52 | linux-amd64 | 201739 | 5.103093751 | 20.191637861 | 180
53 | linux-amd64 | 201740 | 5.344963095 | 502.212763144 | 315
54 | linux-amd64 | 201741 | 5.160710547 | 190.370843482 | 304
55 | linux-amd64 | 201742 | 11.210894478 | 1382.560308294 | 364
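
One way to skim a result set like this is to flag the weeks whose P95 crosses some limit. The sketch below does that for a handful of rows copied from the table above. The 600-second limit and the `slow_weeks` helper are assumptions chosen for illustration, not thresholds anyone in this thread proposed.

```python
# (YYYYWW, P50 seconds, P95 seconds) for a few rows copied from the table above.
ROWS = [
    (201730, 5.064584957, 5.477534545),
    (201731, 5.096214737, 21.151727401),
    (201732, 140.06589253, 2756.808954302),
    (201733, 10.190890034, 655.301846045),
    (201741, 5.160710547, 190.370843482),
    (201742, 11.210894478, 1382.560308294),
]


def slow_weeks(rows, p95_limit_s):
    """Return the weeks whose P95 get_buildlet time exceeds p95_limit_s seconds."""
    return [week for week, _p50, p95 in rows if p95 > p95_limit_s]


# 600s (10 minutes) is an arbitrary illustrative limit.
print(slow_weeks(ROWS, 600.0))  # [201732, 201733, 201742]
```

Week 201742 stands out alongside 201732 and 201733; the issue was opened on Nov 6, 2017.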

@bradfitz
Contributor

bradfitz commented Mar 8, 2019

Closing, as we no longer use GKE for builds.

@bradfitz bradfitz closed this as completed Mar 8, 2019
@golang golang locked and limited conversation to collaborators Mar 7, 2020
4 participants