Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/sys/windows/svc: no windows service recovery settings configuration #23239

Closed
madiganz opened this issue Dec 25, 2017 · 28 comments
Closed

x/sys/windows/svc: no windows service recovery settings configuration #23239

madiganz opened this issue Dec 25, 2017 · 28 comments
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. OS-Windows
Milestone

Comments

@madiganz
Copy link

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

1.9.2

What did you expect to see?

Some way to configure a windows service with recovery settings when creating the service.

What did you see instead?

No configuration settings for this.

The Microsoft structures

SERVICE_FAILURE_ACTIONS
https://msdn.microsoft.com/en-us/library/windows/desktop/ms685939(v=vs.85).aspx

SC_ACTION
https://msdn.microsoft.com/en-us/library/windows/desktop/ms685126(v=vs.85).aspx

@madiganz madiganz changed the title Add windows service recovery settings configuration No windows service recovery settings configuration Dec 25, 2017
@alexbrainman
Copy link
Member

Thank you for creating the issue.

Some way to configure a windows service with recovery settings when creating the service.

What sort of configuration do you require? What cannot you do with the current code? How do you propose to change the package API to implement your requirement?

Alex

@alexbrainman alexbrainman changed the title No windows service recovery settings configuration x/sys/windows/svc: no windows service recovery settings configuration Dec 25, 2017
@gopherbot gopherbot added this to the Unreleased milestone Dec 25, 2017
@madiganz
Copy link
Author

madiganz commented Dec 25, 2017

Windows services can be set with options for the computer to respond when the service fails. These are all manually configurable through Window's services, but it would be great if these configurations would be included in the API.

These options include, actions taken after first failure, second failure, and subsequent failures. The possible actions are no action, restart the service, restart the computer, or to run a program.

Other settings include when to reset this failure count, and how long after the service fails should the action be triggered.

Currently, none of these options can be set with the current code. I propose to make it so these are configurable options that may or may not be used when creating the service. This would mean the fields for these settings would need to be added to the current configuration struct.

A different option would be to allow a user to create a service, and then add these recovery settings after. As Alex has mentioned that it is not good to modify the QUERY_SERVICE_CONFIG structure, this seems like the most viable choice.

@alexbrainman
Copy link
Member

@madiganz thank you very much for explaining what service recovery settings are available on Windows. If we decide to implement this, what API changes would x/sys/windows/svc or x/sys/windows/svc/mgr require to implement them? Please provide all new functions. methods, structures and whatever else you are planing to add or change including documentation to implement these.

Thank you

Alex

@madiganz
Copy link
Author

madiganz commented Dec 26, 2017

These two structures based on the windows structures will be added to windows/services.go

https://msdn.microsoft.com/en-us/library/windows/desktop/ms685939(v=vs.85).aspx

type SERVICE_FAILURE_ACTIONS struct {
	ResetPeriod  uint32
	RebootMsg    *uint16
	Command      *uint16
	ActionsCount uint32
	Actions      uintptr
}

https://msdn.microsoft.com/en-us/library/windows/desktop/ms685126(v=vs.85).aspx

type SC_ACTION struct {
	Type  uint32
	Delay uint32
}

These constants will also be added:

SC_ACTION_NONE        = 0 // This is default so no errors will be thrown if user did not change value
SC_ACTION_RESTART     = 1
SC_ACTION_REBOOT      = 2
SC_ACTION_RUN_COMMAND = 3

Let's assume we go with the 2nd choice that I mentioned above.

function SetServiceFailureActions will be added to windows/svc/mgr/mgr.go and will set the recovery settings based on parameters based in. This function will be able to be called independently of CreateService, but the windows handle will need to be passed in.

func SetServiceFailureActions(handle windows.Handle, firstFailureAction uint32, secondFailureAction uint32, subsequentFailureAction uint32, restartDelay uint32, resetFailCount uint32) error {}

These names can change, but the firstFailureAction, secondFailureAction, and subsequentFailureAction will be one of the actions that are mentioned below and will be defined in windows/svc/mgr/config.go.


NoRestart      = windows.SC_ACTION_NONE        // the service will not restart on failure
ServiceRestart = windows.SC_ACTION_RESTART     // the service will restart on failure
ComputerReboot = windows.SC_ACTION_REBOOT      // the computer will reboot on service failure
RunCommand     = windows.SC_ACTION_RUN_COMMAND // run a program on service failure

retartDelay is the Delay field in the SC_ACTION struct and is defined as the time to wait before performing the specified action, in millisecond.

resetFailCount is the ResetPeriod field in SERVICE_FAILURE_ACTIONS and is defined as the time after which to reset the failure count to zero if there are no failures, in seconds.

The array of SC_ACTIONS will be created inside the function SetServiceFailureActions, which is why those actions types should be passed into the function.

It would also be possible to create a RecoveryConfig struct, but this would not be based on any windows structure and would allow a user to set these values and pass this struct into the function.

Let me know if anything is unclear.

@madiganz
Copy link
Author

madiganz commented Dec 26, 2017

After looking at https://technet.microsoft.com/en-us/library/cc742019
Some sort of RecoveryConfig struct seems like the best implementation as it allows a user to more easily add different delays for each action.

This structure could have multiple implementations, but what seems like the easiest for the user would look something like this:

type RecoveryConfig struct {
    FirstFailureAction    uint32
    SecondFailureAction    uint32
    SubsequentFailureAction    uint32
    FirstFailureDelay    uint32
    SecondFailureDelay    uint32
    SubsequentFailureDelay    uint32
    ResetCount    uint32
}

@alexbrainman
Copy link
Member

the easiest for the user would look something like this:

type RecoveryConfig struct {
FirstFailureAction uint32
SecondFailureAction uint32
SubsequentFailureAction uint32
FirstFailureDelay uint32
SecondFailureDelay uint32
SubsequentFailureDelay uint32
ResetCount uint32
}

I would not know how to use this. You would need to write documentation explaining how to use that - whatever documentation will go in the source file. Can you do that?

There is some functionality missing from https://msdn.microsoft.com/en-us/library/windows/desktop/ms685939(v=vs.85).aspx in your implementation, for example lpRebootMsg. What if someone wants to use lpRebootMsg?

RecoveryConfig is just a struct. How are you going to use this struct to set service parameters? You probably need some functions or methods for that. What will they be? You will also need some way to read service parameters back into RecoveryConfig too. How are you going to do that? Some services will not have any recovery actions set (see https://msdn.microsoft.com/en-us/library/windows/desktop/ms685937(v=vs.85).aspx ) - how are you going to indicate that service is not going to have any recovery actions?

I am also concerned that this functionality might take too much room in the package. And people who read package documentation will not be able to find basic things among all that "recovery actions" documentations.

Alex

@madiganz
Copy link
Author

madiganz commented Dec 28, 2017

You would need to write documentation explaining how to use that - whatever documentation will go in the source file. Can you do that?

When the time comes, I should be able to do this.

There is some functionality missing from https://msdn.microsoft.com/en-us/library/windows/desktop/ms685939(v=vs.85).aspx in your implementation, for example lpRebootMsg. What if someone wants to use lpRebootMsg?

This will be something that will need to be added to the design, although I do not believe it adds much complexity.

How are you going to use this struct to set service parameters? You probably need some functions or methods for that. What will they be?

func (s *Service) SetFailureSettings(rc RecoveryConfig) error {
	t := []windows.SC_ACTION{
		{Type: rc.FirstFailureAction, Delay: rc.FirstFailureDelay},
		{Type: rc.SecondFailureAction, Delay: rc.SecondFailureDelay},
		{Type: rc.SubsequentFailureAction, Delay: rc.SubsequentFailureDelay},
	}

	m := windows.SERVICE_FAILURE_ACTIONS{ResetPeriod: rc.ResetCount, ActionsCount: uint32(3), Actions: uintptr(unsafe.Pointer(&t[0]))}	

	return windows.ChangeServiceConfig2(s.Handle, windows.SERVICE_CONFIG_FAILURE_ACTIONS, (*byte)(unsafe.Pointer(&m)))
}

Note that this does not currently include a way to set lpRebootMsg.

You will also need some way to read service parameters back into RecoveryConfig too. How are you going to do that?

This should be done using windows.QueryServiceConfig2

how are you going to indicate that service is not going to have any recovery actions?

The link you provided is does not set the recovery settings, but allows any recovery settings set to be used when the service has an error vs just failing. By default, a windows service is set to take no action, so if a user chooses not to set these values, it will act just as before.

I am also concerned that this functionality might take too much room in the package. And people who read package documentation will not be able to find basic things among all that "recovery actions" documentations.

I am hoping that there will not be an excessive amount of documentation.

@alexbrainman
Copy link
Member

When the time comes, I should be able to do this.

We cannot start implementing things without agreeing on external API, on the way your changes would look to package users. So you need to provide full interface changes to the package you are going to do including documentation. It is OK to skip not essential documentation, but we will need at least minimum required for new user to understand how to use all new features.

lthough I do not believe it adds much complexity.

Then include that in your proposed API.

This should be done using windows.QueryServiceConfig2

I am not asking how to implement "reading service parameters back into RecoveryConfig". I want to see the package API for that functionality - how would package user discover service "recovery actions"? Including documentation.

By default, a windows service is set to take no action, so if a user chooses not to set these values, it will act just as before.

That is fine. But if I have a service managed by my program, how would I discover if the service "is set to take no action" or "is set to take some action" on failure ?

I am hoping that there will not be an excessive amount of documentation.

Lets see all extra new API / new documentation and compare it with existing API / documentation. If there is too much extra reading / understanding for package users, then I am against this change.

Alex

@alexbrainman
Copy link
Member

Pushed wrong button. Sorry.

Alex

@madiganz
Copy link
Author

madiganz commented Jan 7, 2018

All external API will be in windows/svc/mgr/config.go

constants:

NoAction       = windows.SC_ACTION_NONE        // the service will not restart on failure
ServiceRestart = windows.SC_ACTION_RESTART     // the service will restart on failure
ComputerReboot = windows.SC_ACTION_REBOOT      // the computer will reboot on service failure
RunCommand     = windows.SC_ACTION_RUN_COMMAND // run a program on service failure

structs:

// Structure to hold a service's recovery actions when a failure occurs.
// Specifies one or more recoveru actions and their delay times (in milliseconds).
// Actions can either be NoAction, ServiceRestart, ComputerReboot, or RunCommand.
type RecoveryConfig struct {
	FirstFailureAction      uint32
	SecondFailureAction     uint32
	SubsequentFailureAction uint32
	FirstFailureDelay       uint32
	SecondFailureDelay      uint32
	SubsequentFailureDelay  uint32
	RebootMsg               *uint16 // specifies the message to be broadcast when a service fails. Used with ComputerReboot
	Command                 *uint16 // specifies the command-line command to be run when the specified service fails. Used with RunCommand action
	ResetCount              uint32  // specifies the length of the period (in seconds) with no failures after which the failure count should be reset to 0
}

functions:

// RecoveryConfig retrieves service's recovery configuration settings.
// This should be used to see current recovery configurations.
func (s *Service) RecoveryConfig() (RecoveryConfig, error)

// Updates service's recovery settings when a failure occurs.
// By default, a service will take no action when failures occur.
// Use RecoveryConfig rc to specify service recovery settings.
// Not all services allow changes to their failure options - Some run as part of a service set.
func (s *Service) SetFailureSettings(rc RecoveryConfig) error

@alexbrainman
Copy link
Member

Thank you for the doco. It is a good start. See my comments below.

// Specifies one or more recoveru actions and their delay times (in milliseconds).

This documentation is not very useful. It needs to explain to the person new to this what can be done here and how to achieve it.

Also remove " (in milliseconds)". Just make all correspondent time intervals time.Duration, and everyone will know how to deal with them without any explanation and extra comments.

FirstFailureAction uint32
SecondFailureAction uint32
SubsequentFailureAction uint32
FirstFailureDelay uint32
SecondFailureDelay uint32
SubsequentFailureDelay uint32

Why don't you do what Windows API do, something like

type RecoveryAction struct {
    Type int
    Delay time.Duration
}

type RecoveryConfig struct {
    Action []RecoveryAction
    ...
}

?

RebootMsg *uint16 // specifies the message to be broadcast when a service fails. Used with ComputerReboot
Command *uint16 // specifies the command-line command to be run when the specified service fails. Used with RunCommand action

Make these string.

ResetCount uint32 // specifies the length of the period (in seconds) with no failures after which the failure count should be reset to 0

This will be confusing to new users. You say "the failure count" here, but you never even explained what "the failure count" is and how to use it.

Again make it time.Duration.

// Not all services allow changes to their failure options - Some run as part of a service set.

I do not understand what "- Some run as part of a service set" here means. If you are just saying that sometimes RecoveryConfig() can fail, then you don't have to say that - this is implied.

func (s *Service) SetFailureSettings(rc RecoveryConfig) error

I would expect this to be called SetRecoveryConfig. Why SetFailureSettings? We could use FailureSettings everywhere instead of RecoveryConfig, but lets be consistent.

Also your API does not address my comment from #23239 (comment):

But if I have a service managed by my program, how would I discover if the service "is set to take no action" or "is set to take some action" on failure ?

I suppose we should also deal with "I want my service to just fail as it happens without me fiddling with any of these settings" scenario.

Alex

@madiganz
Copy link
Author

structs:

// Type can either be NoAction, ServiceRestart, ComputerReboot, or RunCommand.
// Each type should have a delay for when the action will be performed after the service fails
type RecoveryAction struct {
	Type 	int
	Delay	time.Duration
}

// Structure to hold a service's recovery actions when a failure occurs.
// These settings specify the actions that will be taken when a service fails.
// There can be up to 3 RecoveryActions. These correspond to the action taken after the
// first failure, second failure, and then any subsequent failures.
// A service keeps a failure count, or number of times the service has failed. ResetFailureCount can be used to reset
// this count to 0, so that the action taken if the service fails again will be the first action.
type RecoveryConfig struct {
	Action 		  []RecoveryAction
	RebootMsg 	  string // specifies the message to be broadcast when a service fails
	Command 	  string // specifies the command-line command to be run when the specified service fails
	ResetFailureCount time.Duration // specifies the length of the period (in seconds) with no failures after which the failure count should be reset to 0
}

functions:

// Updates service's recovery settings when a failure occurs.
// By default, a service will take no action when failures occur.
// Use RecoveryConfig rc to specify service recovery settings.
// If this is not used, recovery settings will be set normally,
// and the service will take no action if it fails.
func (s *Service) SetRecoverySettings(rc RecoveryConfig) error

Thank you for your feedback. Made changes to the structs and function above. If you want me to show everything still, I can do that.

But if I have a service managed by my program, how would I discover if the service "is set to take no action" or "is set to take some action" on failure ?

You would use RecoveryConfig() function as specified before, and then look at the struct and determine what the action types are.

I suppose we should also deal with "I want my service to just fail as it happens without me fiddling with any of these settings" scenario.

I address this in the comments for SetRecoverySettings, but in order to have it fail as normal (take no action), then this function should not be used.

@alexbrainman
Copy link
Member

// There can be up to 3 RecoveryActions. These correspond to the action taken after the
// first failure, second failure, and then any subsequent failures.

Why 3? Why not to allow to let user decide how many actions it wants to use?

If you want me to show everything still, I can do that.

Yes, please show full documentation. Thank you.

I address this in the comments for SetRecoverySettings

I do not see what you are referring to. Anyway, how about

// ClearRecoverySettings deletes all service s failure actions.
// Service s will not try to recover after failure after ClearRecoverySettings call.
func (s *Service) ClearRecoverySettings() error

?

And I just realized that our design does not include SERVICE_FAILURE_ACTIONS_FLAG https://msdn.microsoft.com/en-us/library/windows/desktop/ms685937(v=vs.85).aspx
Do you want to accommodate SERVICE_FAILURE_ACTIONS_FLAG somehow in your documentation?

Thank you.

Alex

@madiganz
Copy link
Author

madiganz commented Jan 22, 2018

Why 3? Why not to allow to let user decide how many actions it wants to use?

Windows service only allows for 3 (first, second, and subsequent)

windows/svc/mgr/config.go

constants:

NoAction       = windows.SC_ACTION_NONE        // the service will not restart on failure
ServiceRestart = windows.SC_ACTION_RESTART     // the service will restart on failure
ComputerReboot = windows.SC_ACTION_REBOOT      // the computer will reboot on service failure
RunCommand     = windows.SC_ACTION_RUN_COMMAND // run a program on service failure

structs:

// Type can either be NoAction, ServiceRestart, ComputerReboot, or RunCommand.
// Each type should have a delay for when the action will be performed after the service fails
type RecoveryAction struct {
	Type 	int
	Delay	time.Duration
}

// Structure to hold a service's recovery actions when a failure occurs.
// These settings specify the actions that will be taken when a service fails.
// There can be up to 3 RecoveryActions. These correspond to the action taken after the
// first failure, second failure, and then any subsequent failures.
// A service keeps a failure count, or number of times the service has failed. ResetFailureCount can be used to reset
// this count to 0, so that the action taken if the service fails again will be the first action.
type RecoveryConfig struct {
	Action 			  []RecoveryAction
	RebootMsg 		  string // specifies the message to be broadcast when a service fails
	Command 		  string // specifies the command-line command to be run when the specified service fails
	ResetFailureCount time.Duration // specifies the length of the period (in seconds) with no failures after which the failure count should be reset to 0
}

functions:

// RecoveryConfig retrieves service's recovery configuration settings.
// This should be used to see current recovery configurations.
func (s *Service) RecoveryConfig() (RecoveryConfig, error)

// Updates service's recovery settings when a failure occurs.
// By default, a service will take no action when failures occur.
// Use RecoveryConfig rc to specify service recovery settings.
// If this is not used, recovery settings will be set normally,
// and the service will take no action if it fails.
func (s *Service) SetRecoverySettings(rc RecoveryConfig) error

// Deletes all service s failure actions.
// Service s will not try to recover after failure after ClearRecoverySettings call.
func (s *Service) ClearRecoverySettings() error

// Sets whether or not recovery actions will occur when service s has a non crash failure.
func (s *Service) SetRecoveryActionFlag(bool flag) error

Windows API uses a ServiceFailureActionFlag struct, but I think it would be more clear and easier to use if it was just a boolean. I am not sure it would be useful to query for this flag, but if it is, then we should probably use ServiceFailureActionFlag

@alexbrainman
Copy link
Member

Windows service only allows for 3 (first, second, and subsequent)

I don't see where it says that. I am looking at
https://msdn.microsoft.com/en-us/library/windows/desktop/ms685939(v=vs.85).aspx
. Where do you look?

Windows API uses a ServiceFailureActionFlag struct, but I think it would be more clear and easier to use if it was just a boolean. I am not sure it would be useful to query for this flag, but if it is, then we should probably use ServiceFailureActionFlag

Lets drop SERVICE_FAILURE_ACTIONS_FLAG for now.

Also I tried to see how we would use our new API, and that did not work very well. For example, SERVICE_FAILURE_ACTIONS allows to "delete broadcast massage and have no message broadcast" - you set lpRebootMsg to "", lpsaActions to NULL and lpCommand to NULL. Similarly if you just want to change lpCommand, you set lpCommand to "blah blah blah", lpRebootMsg to NULL and lpCommand to NULL. How are we going to implement these requests?

Maybe we should have this API instead:

const (
	// Possible recovery actions that the service control manager can perform.
	NoAction       = windows.SC_ACTION_NONE        // no action
	ComputerReboot = windows.SC_ACTION_REBOOT      // reboot the computer
	ServiceRestart = windows.SC_ACTION_RESTART     // restart the service
	RunCommand     = windows.SC_ACTION_RUN_COMMAND // run a command
)

// RecoveryAction represents an action that the service control manager can perform when service fails.
// A service is considered failed when it terminates without reporting a status of SERVICE_STOPPED to the service controller.
type RecoveryAction struct {
	Type  int           // one of NoAction, ComputerReboot, ServiceRestart or RunCommand
	Delay time.Duration // the time to wait before performing the specified action
}

// RecoveryActions returns actions that service controller performs when service fails.
// The service control manager counts the number of times service s has failed since the system booted.
// The count is reset to 0 if the service has not failed for ResetPeriod seconds.
// When the service fails for the Nth time, the service controller performs the action specified in element [N-1] of returned slice.
// If N is greater than slice length, the service controller repeats the last action in the slice.
func (s *Service) RecoveryActions() ([]RecoveryAction, error)

// SetRecoveryActions sets actions that service controller performs when service fails.
// Pass nil to SetRecoveryActions to delete all recovery actions.
func (s *Service) SetRecoveryActions([]RecoveryAction) error

// RebootMessage is broadcast to server users before rebooting in response to the ComputerReboot service controller action.
func (s *Service) RebootMessage() (string, error)

// SetRebootMessage sets service s reboot message.
// If msg is "", the reboot message is deleted and no message is broadcast.
func (s *Service) SetRebootMessage(msg string) error

// ResetPeriod is the time after which to reset the service failure
// count to zero if there are no failures, in seconds.
func (s *Service) ResetPeriod() (uint32, error)

// SetResetPeriod sets the time after which to reset the service failure count to zero if there are no failures, in seconds.
// Specify INFINITE to indicate that service failure count should never be reset.
func (s *Service) SetResetPeriod(uint32) error

// RecoveryCommand is the command line of the process to execute in response to the RunCommand service controller action. This process runs under the same account as the service.
func (s *Service) RecoveryCommand() (string, error)

// SetRecoveryCommand sets the command line of the process to execute in response to the RunCommand service controller action.
// If cmd is "", the command is deleted and no program is run when the service fails.
func (s *Service) SetRecoveryCommand(cmd string) error

I think this will allow us to implement everything properly.

It gets bit too long, but I don't see how this can be made shorter. Another alternative, we could implement all this code in a external repository altogether - mgr.Service.Handle can be used to call any Windows API.

What do you think?

Alex

@madiganz
Copy link
Author

madiganz commented Feb 3, 2018

https://social.technet.microsoft.com/Forums/windows/en-US/3db76753-4607-4a20-97a0-790c73e379cc/the-actions-after-system-service-failure?forum=winserver8gen
From reading this and other things like this (including the examples in https://technet.microsoft.com/en-us/library/cc742019, more than 3 actions are never specified.

// When the service fails for the Nth time, the service controller performs the action specified in element [N-1] of returned slice.
// If N is greater than slice length, the service controller repeats the last action in the slice

Based on above comments, this will be something that will need to be fully tested.

For testing func (s *Service) SetResetPeriod(uint32) error I tried:
image
This somehow needs to be incorporated into func (s *Service) SetRecoveryActions([]RecoveryAction) error.
Maybe something like:
func (s *Service) SetRecoveryActions([]RecoveryAction, resetPeriod uint32) error

I have tested both RecoveryCommand and RebootMessage and they both can be individually set.

Is there a reason why you used uint32 for ResetPeriod instead of time.duration?

It gets bit too long, but I don't see how this can be made shorter. Another alternative, we could implement all this code in a external repository altogether - mgr.Service.Handle can be used to call any Windows API.

I agree with you that separating some of these calls out will allow the API to be more user friendly. I don't think it should be but in an external repository as these are specific to the configuration of a windows service so it makes perfect sense for them to be in windows/svc/mgr/config.go.

@alexbrainman
Copy link
Member

https://social.technet.microsoft.com/Forums/windows/en-US/3db76753-4607-4a20-97a0-790c73e379cc/the-actions-after-system-service-failure?forum=winserver8gen
From reading this and other things like this (including the examples in https://technet.microsoft.com/en-us/library/cc742019, more than 3 actions are never specified

We will be using Windows API to implement this. So we don't need to limit our options to 3 actions.

Based on above comments, this will be something that will need to be fully tested.

Sure thing. We can make appropriate test.

This somehow needs to be incorporated into func (s *Service) SetRecoveryActions([]RecoveryAction) error.
Maybe something like:
func (s *Service) SetRecoveryActions([]RecoveryAction, resetPeriod uint32) error

Sounds good to me. But, please, provide a documentation for this function before we start implementing it. So we can be clear about what it is going to do.

Is there a reason why you used uint32 for ResetPeriod instead of time.duration?

According to the documentation "... dwResetPeriod
The time after which to reset the failure count to zero if there are no failures, in seconds. ...". If we will allow for ResetPeriod to be time.Duration, and user passes time.Millisecond, what will you set dwResetPeriod parameter to? 0? 1? 2?

it makes perfect sense for them to be in windows/svc/mgr/config.go

Sure. They don't need to be in windows/svc/mgr/config.go file, just in windows/svc/mgr package.

Please provide final version of the doco (including all changes suggested by you), and then we could decide about implementing this thing.

Thank you.

Alex

@madiganz
Copy link
Author

windows/svc/mgr/recovery_config.go

const (
	// Possible recovery actions that the service control manager can perform.
	NoAction       = windows.SC_ACTION_NONE        // no action
	ComputerReboot = windows.SC_ACTION_REBOOT      // reboot the computer
	ServiceRestart = windows.SC_ACTION_RESTART     // restart the service
	RunCommand     = windows.SC_ACTION_RUN_COMMAND // run a command
)

// RecoveryAction represents an action that the service control manager can perform when service fails.
// A service is considered failed when it terminates without reporting a status of SERVICE_STOPPED to the service controller.
type RecoveryAction struct {
	Type  int           // one of NoAction, ComputerReboot, ServiceRestart or RunCommand
	Delay time.Duration // the time to wait before performing the specified action
}

// RecoveryActions returns actions that service controller performs when service fails.
// The service control manager counts the number of times service s has failed since the system booted.
// The count is reset to 0 if the service has not failed for ResetPeriod seconds.
// When the service fails for the Nth time, the service controller performs the action specified in element [N-1] of returned slice.
// If N is greater than slice length, the service controller repeats the last action in the slice.
func (s *Service) RecoveryActions() ([]RecoveryAction, error)

// SetRecoveryActions sets actions that service controller performs when service fails and
// the time after which to reset the service failure count to zero if there are no failures, in seconds.
// Specify INFINITE to indicate that service failure count should never be reset.
// In order for the reset period to be applied, recovery actions must also be set.
// Pass nil to SetRecoveryActions to delete all recovery actions.
func (s *Service) SetRecoveryActions([]RecoveryAction, resetPeriod uint32) error

// RebootMessage is broadcast to server users before rebooting in response to the ComputerReboot service controller action.
func (s *Service) RebootMessage() (string, error)

// SetRebootMessage sets service s reboot message.
// If msg is "", the reboot message is deleted and no message is broadcast.
func (s *Service) SetRebootMessage(msg string) error

// ResetPeriod is the time after which to reset the service failure
// count to zero if there are no failures, in seconds.
func (s *Service) ResetPeriod() (uint32, error)

// SetResetPeriod sets the time after which to reset the service failure count to zero if there are no failures, in seconds.
// Specify INFINITE to indicate that service failure count should never be reset.
func (s *Service) SetResetPeriod(uint32) error

// RecoveryCommand is the command line of the process to execute in response to the RunCommand service controller action. This process runs under the same account as the service.
func (s *Service) RecoveryCommand() (string, error)

// SetRecoveryCommand sets the command line of the process to execute in response to the RunCommand service controller action.
// If cmd is "", the command is deleted and no program is run when the service fails.
func (s *Service) SetRecoveryCommand(cmd string) error

@alexbrainman
Copy link
Member

// In order for the reset period to be applied, recovery actions must also be set.

Remove that line. I think this is implied. You cannot pass one parameter without passing the other.

// SetResetPeriod sets the time after which to reset the service failure count to zero if there are no failures, in seconds.
// Specify INFINITE to indicate that service failure count should never be reset.
func (s *Service) SetResetPeriod(uint32) error

I was under impression that we agreed not include SetResetPeriod, and use SetRecoveryActions to set ResetPeriod. No?

I think our API looks fine now. If you agree, you can start implementing it. I would start with just implementing RecoveryActions, ResetPeriod and SetRecoveryActions first (including new tests to verify that new code works. If that change is good, we could do RebootMessage / SetRebootMessage and RecoveryCommand / SetRecoveryCommand.

What do you think?

Alex

@madiganz
Copy link
Author

Sorry, I forgot to remove that from my document. I think that is a solid plan.

@madiganz
Copy link
Author

Would you like me to add these changes to https://go-review.googlesource.com/c/sys/+/85316 or make a new one?

@alexbrainman
Copy link
Member

Would you like me to add these changes to https://go-review.googlesource.com/c/sys/+/85316 or make a new one?

Whatever is easier for you.

Alex

@madiganz
Copy link
Author

In order for someone to use ComputerReboot for one of the service failure actions, the process needs to gain privilege to rebooting the computer.

https://msdn.microsoft.com/en-us/library/aa376871(VS.85).aspx

I believe the currently functionality in windows/security_windows.go does not support adjusting the security token. Is this something I should implement as well?

@alexbrainman
Copy link
Member

Is this something I should implement as well?

Not as a solution to this current issue.

Alex

@andybons andybons added the NeedsFix The path to resolution is known, but the work has not been done. label Mar 26, 2018
@madiganz
Copy link
Author

madiganz commented Apr 4, 2018

@gopherbot
Copy link

Change https://golang.org/cl/104635 mentions this issue: windows, windows/svc/mgr: Add ability to recovery from service failure.

gopherbot pushed a commit to golang/sys that referenced this issue Jul 4, 2018
…ings.

Added configuration options for a windows service recovery settings.
Current configurations include modifying actions taken when a service
fails, setting the reset period, and getting the current recovery
settings.

Updates golang/go#23239

Change-Id: I4e91b2068122731e6eba3332afb0fe300b298c97
Reviewed-on: https://go-review.googlesource.com/104635
Run-TryBot: Alex Brainman <alex.brainman@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
@gopherbot
Copy link

Change https://golang.org/cl/122579 mentions this issue: windows, windows/svc/mgr: add ability to set a reboot message and command when a service fails.

@alexbrainman
Copy link
Member

This issue is closed by golang/sys@ac767d6 I do not know why Github did not close this issue automatically.

Alex

josephlr pushed a commit to josephlr/sys that referenced this issue Aug 14, 2018
… a service fails

Added configuration options for a windows service recovery settings.
New configurations include modifying the reboot message, or command
to be run when a service fails, and getting the current reboot message
or command.

Fixes golang/go#23239

Change-Id: I3e501d66e97745b7536fd654aee2bba488083e6d
Reviewed-on: https://go-review.googlesource.com/122579
Run-TryBot: Alex Brainman <alex.brainman@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
@golang golang locked and limited conversation to collaborators Jul 20, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. OS-Windows
Projects
None yet
Development

No branches or pull requests

4 participants