x/crypto/ssh: client.NewSession can hang indefinitely #26643

Open
mborsz opened this issue Jul 27, 2018 · 6 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.

@mborsz

mborsz commented Jul 27, 2018

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

1.9.3

Does this issue reproduce with the latest release?

I'm not able to verify this.

What operating system and processor architecture are you using (go env)?

linux, amd64

What did you do?

In the Kubernetes e2e tests we use SSH to fetch logs from Kubernetes nodes.
In kubernetes/kubernetes#66609 we see that it quite frequently hangs for ~90 minutes in the client.NewSession call (the stack trace is there).

Relevant code is available here: https://github.com/kubernetes/kubernetes/blob/master/test/e2e/framework/log_size_monitoring.go#L245

What did you expect to see?

An attempt to create a session with NewSession should return an error if the node doesn't respond on the SSH connection.

What did you see instead?

The attempt to create a session with NewSession hung for ~90 minutes.

Relevant stacktraces are available in:

@gopherbot gopherbot added this to the Unreleased milestone Jul 27, 2018
ymmt2005 added a commit to cybozu-go/cke that referenced this issue Oct 24, 2018
golang.org/x/crypto/ssh has some known issues that block clients
indefinitely when SSH server dies.

Ref: golang/go#26643
     golang/go#21420

To work around the problem, this commit creates the TCP connection
to the server by itself and then passes it to ssh.NewClientConn to
control the underlying TCP connection directly.

It sets deadlines on the TCP connection before any SSH
activity.  It also enables TCP keepalive with a short period.
@agnivade
Contributor

agnivade commented Jan 7, 2019

/cc @hanwen

@agnivade agnivade added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jan 7, 2019
@hanwen
Contributor

hanwen commented Jan 7, 2019

What is the problem here? Your stack trace suggests it's waiting for the remote end to acknowledge the SSH session. If you want timeouts, you should implement them separately.

Arguably, the SSH package should support contexts to do this neatly, but I think it might be an invasive change, API wise.

@mborsz
Author

mborsz commented Jan 7, 2019

Thanks for the response!

> What is the problem here? Your stack trace suggests it's waiting for the remote end to acknowledge the SSH session. If you want timeouts, you should implement them separately.

The problem is that there is no way (AFAIK) to prevent openChannel from blocking for hours in case the remote end never acks the SSH session.

I would like to see some timeout mechanism there. Could you give me a hint on how I can implement a timeout there?

> Arguably, the SSH package should support contexts to do this neatly, but I think it might be an invasive change, API wise.

@ymmt2005

ymmt2005 commented Jan 7, 2019

@mborsz
We could avoid the problem by making a raw TCP net.Conn first and wrapping it
with ssh.NewClientConn. You can set any deadline on the raw connection before
calling SSH methods.

https://github.com/cybozu-go/cke/pull/81/files is the fix.

@hanwen
Contributor

hanwen commented Jan 7, 2019

The SSH state machine doesn't support timeouts on a channel level. The only thing you can do is tear down the entire SSH connection if there is an error.

@EthanHemo

You can see the problem on a much smaller scale than k8s.

  1. Run command nc -lv 8090 locally
  2. Execute the following code:
sshConfig := &ssh.ClientConfig{
	HostKeyCallback: ssh.InsecureIgnoreHostKey(),
	Timeout:         10 * time.Second,
}
_, err := ssh.Dial("tcp", "<local_ip>:8090", sshConfig)
if err != nil {
	panic(err)
}

This will run indefinitely, since the logic at crypto@v0.13.0/ssh/transport.go:332:

_, err := io.ReadFull(r, buf[:])
if err != nil {
	return nil, err
}

is waiting for input to read, but it will never get any, since the local nc listener accepts the connection and sends nothing.

While this issue is not likely to occur by accident, it could be used for a DoS attack.
