x/build: add LUCI linux-loong64 builder #65398

Open
abner-chenc opened this issue Jan 31, 2024 · 32 comments
Labels: arch-loong64, Builders, NeedsFix, new-builder, OS-Linux

Comments

@abner-chenc
Contributor

Proposal Details

@gopherbot, please add label new-builder

@abner-chenc
Contributor Author

mauri870 added the Builders, arch-loong64, and OS-Linux labels and removed the Proposal label on Jan 31, 2024
mauri870 added this to the Unreleased milestone on Jan 31, 2024
@mauri870
Member

Thanks for working on this!

@mknyszek
Contributor

I'm on rotation so I'll try to get to this this week.

mknyszek self-assigned this on Jan 31, 2024
mknyszek added the NeedsFix label on Jan 31, 2024
@dr2chase
Contributor

dr2chase commented Feb 5, 2024

Does anyone know offhand if this is a slower-than-usual builder? (I'm on rotation this week, in the middle of doing the addition.)

@dr2chase
Contributor

dr2chase commented Feb 5, 2024

And does it have network access?

@mknyszek
Contributor

mknyszek commented Feb 5, 2024

@dr2chase The SLOW_HOST list you're referring to can be updated later if we deem it necessary.

NO_NETWORK has nothing to do with whether the builder has network access. It lists platforms where we set up a special environment for tests that explicitly need to run without a network.

mknyszek assigned dr2chase and unassigned mknyszek on Feb 5, 2024
@gopherbot

Change https://go.dev/cl/561339 mentions this issue: main.star: add linux-loong64 builder

gopherbot pushed a commit to golang/build that referenced this issue Feb 5, 2024
For golang/go#65398.

Change-Id: Ideeea51070055f154b8720ee811dd43563b544ec
Reviewed-on: https://go-review.googlesource.com/c/build/+/561339
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Bypass: Michael Knyszek <mknyszek@google.com>
@abner-chenc
Contributor Author

I haven't fully configured the loong64 builder yet, and some additional settings may be needed for it to work properly.
Sorry, I'm on vacation for about a week. I'll try to finish it sooner if possible.

@dr2chase
Contributor

dr2chase commented Feb 6, 2024

The certificate you provided is for "linux-loong64-builder". Could it instead be for "linux-loong64"? That would match the pattern used by all the other builders. Thanks much, and sorry for the hiccup. No rush if you are enjoying your vacation.

@abner-chenc
Contributor Author

abner-chenc commented Feb 18, 2024

linux-loong64.csr.txt

This is the certificate signing request I regenerated with the hostname "linux-loong64". Please use linux-loong64.csr.txt in place of linux-loong64-builder.csr.txt.

Thanks

@abner-chenc
Contributor Author

Gentle ping.
Are the certificates ready?

@prattmic
Member

@abner-chenc
Contributor Author

Thank you for the certificate.

I've tried to launch the swarming bot, but I came across this error:

luci_machine_tokend -backend luci-token-server.appspot.com -cert-pem /tmp/linux-loong64-1708541172.cert.txt -pkey-pem /home/golang/linux-loong64.key -token-file=/var/lib/luci_machine_tokend/token.json
[I2024-02-22T15:15:37.646633+08:00 1526740 0 iface.go:167] tsmon is disabled because no endpoint is configured
[I2024-02-22T15:15:37.649157+08:00 1526740 0 main.go:237] The token is valid, skipping the update
/tmp/bootstrapswarm -hostname linux-loong64 
2024/02/22 15:15:42 Bootstrapping the swarming bot with certificate authentication
2024/02/22 15:15:42 retrieving the luci-machine-token from the token file /var/lib/luci_machine_tokend/token.json (default path for GOOS != windows)
2024/02/22 15:15:42 Downloading the swarming bot
2024/02/22 15:15:44 Starting the swarming bot /root/.swarming/swarming_bot.zip
1526774 2024-02-22 07:22:00.529 E: Unable to open given url, https://chromium-swarm.appspot.com/swarming/api/v1/bot/server_ping, after 30 attempts or 360 timeout.
HTTPSConnectionPool(host='chromium-swarm.appspot.com', port=443): Max retries exceeded with url: /swarming/api/v1/bot/server_ping (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fffe0619050>: Failed to establish a new connection: [Errno 101] Network is unreachable'))
1526774 2024-02-22 07:22:00.530 E: No response from server_ping
1526774 2024-02-22 07:26:05.915 E: Unable to open given url, https://chromium-swarm.appspot.com/swarming/api/v1/bot/handshake, after 30 attempts or 240 timeout.
HTTPSConnectionPool(host='chromium-swarm.appspot.com', port=443): Max retries exceeded with url: /swarming/api/v1/bot/handshake (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fffe0634d90>: Failed to establish a new connection: [Errno 101] Network is unreachable'))
1526774 2024-02-22 07:26:05.915 E: Failed to contact for handshake, retrying in 0 sec...
1526774 2024-02-22 07:30:10.905 E: Unable to open given url, https://chromium-swarm.appspot.com/swarming/api/v1/bot/handshake, after 30 attempts or 240 timeout.
HTTPSConnectionPool(host='chromium-swarm.appspot.com', port=443): Max retries exceeded with url: /swarming/api/v1/bot/handshake (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fffe0644ad0>: Failed to establish a new connection: [Errno 101] Network is unreachable'))
1526774 2024-02-22 07:30:10.905 E: Failed to contact for handshake, retrying in 1 sec...
1526774 2024-02-22 07:34:19.128 E: Unable to open given url, https://chromium-swarm.appspot.com/swarming/api/v1/bot/handshake, after 30 attempts or 240 timeout.
HTTPSConnectionPool(host='chromium-swarm.appspot.com', port=443): Max retries exceeded with url: /swarming/api/v1/bot/handshake (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fffe05fe490>: Failed to establish a new connection: [Errno 101] Network is unreachable'))
1526774 2024-02-22 07:34:19.128 E: Failed to contact for handshake, retrying in 3 sec...

Is there something I'm doing wrong?
Thanks

@prattmic
Member

Can you access https://chromium-swarm.appspot.com/swarming/api/v1/bot/server_ping? That shouldn't require authentication or anything. It seems like there may be an issue with your network connection?

@abner-chenc
Contributor Author

> Can you access https://chromium-swarm.appspot.com/swarming/api/v1/bot/server_ping? That shouldn't require authentication or anything. It seems like there may be an issue with your network connection?

I can access https://chromium-swarm.appspot.com/swarming/api/v1/bot/server_ping normally. Fetching it with curl returns "Server up".

This is the content of swarming_bot.log:

[root@loongson-3a5000-01 logs]# cat swarming_bot.log 
977 2024-02-26 09:45:34.248 I: start_bot with args: []
977 2024-02-26 09:45:34.735 I: importing bot_main: /root/.swarming/swarming_bot.1.zip, d859187601d58b8378b1618189bcac3594a5269f12dd6b32a46eeee1bd54c1d6
977 2024-02-26 09:45:34.737 I: [singleton] acquire: /root/.swarming/swarming.lck = <_io.BufferedRandom name='/root/.swarming/swarming.lck'>
977 2024-02-26 09:45:34.744 D: Starting new HTTPS connection (1): chromium-swarm.appspot.com:443
977 2024-02-26 09:46:34.823 W: Unable to open url https://chromium-swarm.appspot.com/swarming/api/v1/bot/server_ping on attempt 0: HTTPSConnectionPool(host='chromium-swarm.appspot.com', port=443): Max retries exceeded with url: /swarming/api/v1/bot/server_ping (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fffe2748850>: Failed to establish a new connection: [Errno 101] Network is unreachable'))
977 2024-02-26 09:46:36.771 W: Retrying request https://chromium-swarm.appspot.com/swarming/api/v1/bot/server_ping, attempt 1/30...
977 2024-02-26 09:46:36.772 D: Starting new HTTPS connection (2): chromium-swarm.appspot.com:443
977 2024-02-26 09:47:36.825 W: Unable to open url https://chromium-swarm.appspot.com/swarming/api/v1/bot/server_ping on attempt 1: HTTPSConnectionPool(host='chromium-swarm.appspot.com', port=443): Max retries exceeded with url: /swarming/api/v1/bot/server_ping (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fffe2749d10>: Failed to establish a new connection: [Errno 101] Network is unreachable'))
977 2024-02-26 09:47:38.603 W: Retrying request https://chromium-swarm.appspot.com/swarming/api/v1/bot/server_ping, attempt 2/30...
977 2024-02-26 09:47:38.604 D: Starting new HTTPS connection (3): chromium-swarm.appspot.com:443
977 2024-02-26 09:48:38.684 W: Unable to open url https://chromium-swarm.appspot.com/swarming/api/v1/bot/server_ping on attempt 2: HTTPSConnectionPool(host='chromium-swarm.appspot.com', port=443): Max retries exceeded with url: /swarming/api/v1/bot/server_ping (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fffe274ae10>: Failed to establish a new connection: [Errno 101] Network is unreachable'))

Of the endpoints listed under "Network access requirements" at https://go.dev/wiki/DashboardBuilders, the only one I cannot access is https://remotebuildexecution.googleapis.com. Everything else is fine.
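The symptoms reported here (curl returns "Server up" while the bot logs `[Errno 101] Network is unreachable`) would be consistent with one client going through a proxy and the other connecting directly. As a minimal sketch, assuming Python 3 is available on the builder, a direct TCP probe that ignores any proxy settings can help tell the two cases apart. The function name `can_connect_directly` and the default timeout are illustrative and are not part of the swarming bot.

```python
import socket


def can_connect_directly(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a direct TCP connection to host:port succeeds.

    This deliberately ignores http_proxy/https_proxy, so on a machine that
    only reaches the internet through a proxy it is expected to return False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Network is unreachable" (errno 101), refused connections,
        # DNS failures, and timeouts.
        return False
```

Calling `can_connect_directly("chromium-swarm.appspot.com")` on the builder and comparing the result with what curl reports is one way to check whether only proxied traffic gets through.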

@abner-chenc
Contributor Author

abner-chenc commented Feb 27, 2024

@prattmic The problem is that I configured the proxy environment variables https_proxy and http_proxy, but swarming_bot did not use them, so it could not reach the network.

I found nothing relevant in the swarming_bot documentation or code, so I don't know how to make these environment variables take effect.

@prattmic
Member

I am not sure what is the correct way to configure a proxy for the swarming bot. The source code for the swarming bot is at https://source.chromium.org/chromium/infra/infra/+/main:luci/appengine/swarming/swarming_bot/bot_code/bot_main.py;l=1785;drc=0d534b7e80a34a79a114fa0a05e5e984190fdbf7 (this line is, I believe, where the server_ping call comes from).

I am not very familiar with this code, but digging down several layers to https://source.chromium.org/chromium/infra/infra/+/main:luci/client/utils/net.py;l=810;drc=f9002fe7a0ae6b2837c9c48112b87d820bff9ffa, it looks like this ultimately uses Python requests, which looks like it should respect the environment variables: https://requests.readthedocs.io/en/latest/user/advanced/#proxies. Perhaps there is some additional configuration I missed.

@abner-chenc
Contributor Author

> I am not sure what is the correct way to configure a proxy for the swarming bot. The source code for the swarming bot is at https://source.chromium.org/chromium/infra/infra/+/main:luci/appengine/swarming/swarming_bot/bot_code/bot_main.py;l=1785;drc=0d534b7e80a34a79a114fa0a05e5e984190fdbf7 (this line is, I believe, where the server_ping call comes from).
>
> I am not very familiar with this code, but digging down several layers to https://source.chromium.org/chromium/infra/infra/+/main:luci/client/utils/net.py;l=810;drc=f9002fe7a0ae6b2837c9c48112b87d820bff9ffa, it looks like this ultimately uses Python requests, which looks like it should respect the environment variables: https://requests.readthedocs.io/en/latest/user/advanced/#proxies. Perhaps there is some additional configuration I missed.

The problem is confirmed: swarming_bot ignores the system proxy environment variables. Someone has already reported this in luci/luci-py#323. After I applied this CL, I can see the loong64 builder at https://7419-34ac013-dot-chromium-swarm.appspot.com/bot?id=linux-loong64

[Screenshot: 2024-02-28_20-01]

Here are my changes:

diff --git a/api/os_utilities.py b/api/os_utilities.py
index 445d3f7..3e51f9f 100644
--- a/api/os_utilities.py
+++ b/api/os_utilities.py
@@ -252,6 +252,9 @@ def get_cpu_type():
     return 'arm64'
   if machine == 'mips64':
     return 'mips'
+  if machine == 'loongarch64':
+    return 'loong64'
+
   return machine
 
 
diff --git a/client/cipd.py b/client/cipd.py
index 316fd06..97f39bb 100644
--- a/client/cipd.py
+++ b/client/cipd.py
@@ -307,6 +307,8 @@ def get_platform():
     arch = 'armv6l' if python_bits == 32 else 'arm64'
   elif arch == 'powerpc64':  # OpenBSD's name for ppc64
     arch = 'ppc64'
+  elif arch == 'loongarch64':
+    arch = 'loong64'
 
   elif not arch and os_name == 'windows':
     # On some 32bit Windows7, platform.machine() returns None.
diff --git a/utils/net.py b/utils/net.py
index 35392d0..8a403ef 100644
--- a/utils/net.py
+++ b/utils/net.py
@@ -808,8 +808,14 @@ class RequestsLibEngine:
   def __init__(self):
     super(RequestsLibEngine, self).__init__()
     self.session = requests.Session()
+    #proxies = {
+    #   'http':'http://x.x.x.x:yy',
+    #   'https':'http://x.x.x.x:yy',
+    #}
+    #self.session.proxies.update(proxies)
+
     # Configure session.
-    self.session.trust_env = False
+    self.session.trust_env = True
     self.session.verify = tools.get_cacerts_bundle()
     # Configure connection pools.
     for protocol in ('https://', 'http://'):
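
The `trust_env` flag changed in the diff above belongs to Python requests. As a rough standard-library analogue (not the swarming bot's code path), the sketch below contrasts a client that picks up `http_proxy`/`https_proxy` from the environment with one that ignores them. The helper names `effective_proxies` and `make_opener` are hypothetical and exist only for this illustration.

```python
import urllib.request


def effective_proxies(trust_env: bool) -> dict:
    """Return the proxy mapping a client would use.

    With trust_env=True the mapping comes from the environment
    (http_proxy, https_proxy, ...); with trust_env=False it is empty,
    which means every connection is attempted directly.
    """
    if trust_env:
        return urllib.request.getproxies()
    return {}


def make_opener(trust_env: bool) -> urllib.request.OpenerDirector:
    """Build a urllib opener that honors or ignores the proxy environment."""
    handler = urllib.request.ProxyHandler(effective_proxies(trust_env))
    return urllib.request.build_opener(handler)
```

On a host where `https_proxy` is set, `effective_proxies(True)` should include an `https` entry while `effective_proxies(False)` is empty. The second case corresponds to the direct connections that failed with "Network is unreachable" in the logs earlier in this thread.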

@abner-chenc
Contributor Author

I have enabled the system proxy environment settings and added Loong64 support to swarming-bot.

CL:
https://chromium-review.googlesource.com/c/infra/luci/luci-py/+/5334944
https://chromium-review.googlesource.com/c/infra/luci/luci-py/+/5334945

hubot pushed a commit to luci/luci-py that referenced this issue Mar 12, 2024
In Golang, the GOARCH defined for loongarch64 is loong64.

Related links:
	golang/go#46229
	golang/go#65398

Change-Id: Idd99c006e8c069f5936654bfafcccf4168f51876
Reviewed-on: https://chromium-review.googlesource.com/c/infra/luci/luci-py/+/5334945
Reviewed-by: Chan Li <chanli@chromium.org>
Commit-Queue: Vadim Shtayura <vadimsh@chromium.org>
Reviewed-by: Vadim Shtayura <vadimsh@chromium.org>
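
The mapping this commit describes can be sketched in isolation as follows. This is a simplified illustration, not the luci-py implementation: the table and the function name `normalize_cpu_type` are hypothetical, and only the `loongarch64` and `mips64` entries are taken from the diff posted earlier in this thread.

```python
import platform

# Hypothetical lookup table for illustration. Only these two entries
# appear in the diff from this thread; the real code has more cases.
_MACHINE_TO_CPU_TYPE = {
    "loongarch64": "loong64",  # Go's GOARCH name for LoongArch64
    "mips64": "mips",
}


def normalize_cpu_type(machine: str) -> str:
    """Map a platform.machine() string to its short name.

    Unknown machine strings are returned unchanged.
    """
    return _MACHINE_TO_CPU_TYPE.get(machine, machine)


if __name__ == "__main__":
    # On a LoongArch64 host, platform.machine() is expected to report
    # "loongarch64", which this sketch maps to "loong64".
    print(normalize_cpu_type(platform.machine()))
```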
@abner-chenc
Contributor Author

> I have enabled the system proxy environment settings and added Loong64 support to swarming-bot.
>
> CL: https://chromium-review.googlesource.com/c/infra/luci/luci-py/+/5334944 https://chromium-review.googlesource.com/c/infra/luci/luci-py/+/5334945

@prattmic These two CLs have been merged. Can you update swarming_bot.zip?
Thanks!

@abner-chenc
Contributor Author

This error occurred while running the task: https://7419-34ac013-dot-chromium-swarm.appspot.com/task?id=685629cf2b55df10&w=true

@prattmic
Member

cc @golang/release

@abner-chenc
Contributor Author

I found the following error after the service ran for a while:
HTTPSConnectionPool(host='chromium-swarm.appspot.com', port=443): Max retries exceeded with url: /swarming/api/v1/bot/rbe/session/update (Caused by ProxyError('Unable to connect to proxy', RemoteDisconnected('Remote end closed connection without response')))

However, I have confirmed that the network is OK. Here is the detailed log: swarming_bot.log

@abner-chenc
Contributor Author

> This error occurred while running the task: https://7419-34ac013-dot-chromium-swarm.appspot.com/task?id=685629cf2b55df10&w=true

This problem seems to be happening again.

@abner-chenc
Contributor Author

I am ready to connect multiple builders, but how can I distinguish the BOT_ID? The environment variable SWARMING_BOT_ID seems to be deprecated.

@abner-chenc
Contributor Author

> I am ready to connect multiple builders, but how can I distinguish the BOT_ID? The environment variable SWARMING_BOT_ID seems to be deprecated.

Thanks, I know how to do this.

@abner-chenc
Contributor Author

> This error occurred while running the task: https://7419-34ac013-dot-chromium-swarm.appspot.com/task?id=685629cf2b55df10&w=true
>
> This problem seems to be happening again

Gentle ping.

@abner-chenc
Contributor Author

Can a task be triggered on this builder? If everything goes smoothly, I plan to take all the builders that are currently connected via buildlet offline and switch them to LUCI.

Thanks.

@abner-chenc
Contributor Author

abner-chenc commented Apr 9, 2024

Friendly ping. @golang/release

Can you resolve this error (https://7419-34ac013-dot-chromium-swarm.appspot.com/task?id=68c4af6ef5417c10)?
I want to switch the builder's access method to LUCI.

Thanks

@cagedmantis
Contributor

I'm working on resolving that error message. As soon as it is resolved, I will respond to this issue. Thanks for pushing this along.

@abner-chenc
Contributor Author

> I'm working on resolving that error message. As soon as it is resolved, I will respond to this issue. Thanks for pushing this along.

Gentle ping.
Thanks.

joedian self-assigned this on Apr 24, 2024
@dmitshur
Contributor

We're still working on this on our side, and will update this after it's resolved or more progress is made. Thanks for your patience and for pushing this forward.

Projects
Status: In Progress

9 participants