proposal: testing: report energy consumption figures #30108

Closed
Mahdi89 opened this issue Feb 6, 2019 · 21 comments

Comments

@Mahdi89 commented Feb 6, 2019

I propose that Go's benchmarking machinery report energy consumption figures. This would give Go developers an estimate of the energy consumed by their programs; it has been shown that data movement accounts for 62.7% of energy usage in consumer devices. With energy becoming an increasingly important factor, it has been argued that the impact of high-level code (e.g. choosing more suitable data structures) on energy is orders of magnitude higher than that of low-level/OS management. For the impact of high-level code on performance, see https://github.com/dgryski/go-perfbook/blob/master/performance.md

Given the allocated bytes and ns/op that benchmarks already report, I believe energy figures can be derived easily; however, the architecture in use could have a significant impact on these figures.

For instance, the following snippets show two implementations: the first uses local variables, the second uses channels instead. In terms of performance, the first beats the second by a factor of ~1000x (measured with Go's benchmarking tool). However, the second avoids allocating large data arrays and exploits serialisation (less random access), so I assume it would show better energy figures in real-world benchmarks:

// WithVariable computes two sums using plain local variables.
func WithVariable() (int, int) {
	a := 2 + 3
	b := a + 3
	c := a + 5
	return b, c
}

// WithoutVariable computes the same results, but passes the intermediate
// value through a buffered channel instead of a local variable.
func WithoutVariable() (int, int) {
	a := make(chan int, 2)
	a <- 2 + 3
	a <- 2 + 3
	return (<-a + 3), (<-a + 5)
}
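
For reference, a minimal benchmark sketch that would compare the two with go test -bench=. (the package name and the sink variable are illustrative additions, not part of the original snippet):

package vartest

import "testing"

// sink keeps the compiler from optimising the calls away.
var sink int

func BenchmarkWithVariable(b *testing.B) {
	for i := 0; i < b.N; i++ {
		x, y := WithVariable()
		sink = x + y
	}
}

func BenchmarkWithoutVariable(b *testing.B) {
	b.ReportAllocs() // also report B/op and allocs/op
	for i := 0; i < b.N; i++ {
		x, y := WithoutVariable()
		sink = x + y
	}
}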
gopherbot added this to the Proposal milestone Feb 6, 2019
@seebs (Contributor) commented Feb 6, 2019

I can't even imagine the second one showing better energy figures. Actual CPU time is a fairly solid proxy for energy consumption.

@ianlancetaylor (Contributor) commented:

I agree with @seebs. I do not know of any reliable energy measurement that significantly differs from CPU time. And if there were one, it could differ based on the smallest details of the specific processor implementation, external memory, system bus, and so on: details that the Go standard library does not currently check and that may not even be available.

So I think this proposal cannot be adopted without a clear understanding and explanation of how to implement it.

ianlancetaylor changed the title from "Proposal: testing: report energy consumption figures" to "proposal: testing: report energy consumption figures" Feb 6, 2019
@mvdan (Member) commented Feb 6, 2019

This also depends on the target machine's power saving strategy. For example, most recent Android devices will try to go into "deep sleep" as often as possible, so any app or process that wakes up the device regularly is going to use more battery. On the other hand, laptops don't generally do deep sleep, but they do spin their fans and throttle when heat increases due to CPU usage spikes.

Seems to me like the best way to measure energy consumption is to actually run your program for a while under realistic conditions and measure the amount of energy drawn.

@Mahdi89 (Author) commented Feb 6, 2019

@mvdan 'run your program for a while': isn't that what go test -run=x -bench=. does?
@ianlancetaylor agreed; as I mentioned in the proposal, it's going to be architecture-dependent. What I would like to see is rough energy figures. The reported performance figures aren't accurate either, are they? It all depends on how the OS schedules the underlying threads.
@seebs the above snippet is just an example to show how serialisation via channels vs. random access could affect performance and energy. The effect might not be as bad as 1000x when big arrays are compared.

@Mahdi89 (Author) commented Feb 6, 2019

I can't even imagine the second one showing better energy figures. Actual CPU time is a fairly solid proxy for energy consumption.

The energy equation: Energy (J) = Power (W) × Time (s). Yes, reducing time implies a reduction in the energy consumed, but the Power variable is not a constant. I have seen work showing that time and energy can move in opposite directions.
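
A worked example with made-up numbers (purely illustrative, not measurements of the snippets above) showing how a faster run can still consume more energy if its average power draw is higher:

package main

import "fmt"

func main() {
	// Hypothetical averages: A is slower but draws less power than B.
	energyA := 5.0 * 2.0  // 5 W for 2.0 s  = 10.0 J
	energyB := 25.0 * 0.5 // 25 W for 0.5 s = 12.5 J
	fmt.Println(energyA, energyB) // the faster run (B) uses more energy
}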

@seebs (Contributor) commented Feb 6, 2019

I would expect that in general, channels are going to be much more expensive than direct operations -- they have synchronization overhead, etcetera.

What I think you're getting at is a general contrast between holding large amounts of memory to operate on and streaming operations that touch only a small amount of data at a time and therefore use less storage?

I think it might be useful-ish, where possible, to expose performance data based on CPU performance counters or OS-level measurements of CPU time actually spent executing a process, as opposed to wall-clock time. Energy estimation might be much, much harder: we don't actually have particularly solid information on things like the relative energy consumption of individual instructions in multicore systems, or a way to get it, apart from overall measurements of power consumption when running various workloads.

In practice, I think as long as you measure total CPU time used by the process, you're going to be fairly close to an accurate measure of approximate energy consumption. (Less so if, say, a GPU is also involved -- offloading things to a GPU might make runtime faster but cost significant energy to do so.)

See also: https://github.com/aclements/perflock, and perf.
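
To illustrate the "CPU time actually spent" idea, here is a minimal sketch (Linux/macOS only, and it uses testing.B.ReportMetric, which arrived in Go 1.13, after this thread) that reads the process's user+system CPU time via getrusage and reports it as a custom per-op metric:

package vartest

import (
	"syscall"
	"testing"
	"time"
)

// cpuTime returns the user+system CPU time consumed by this process so far.
func cpuTime() time.Duration {
	var ru syscall.Rusage
	if err := syscall.Getrusage(syscall.RUSAGE_SELF, &ru); err != nil {
		return 0
	}
	return time.Duration(ru.Utime.Nano() + ru.Stime.Nano())
}

func BenchmarkCPUTimePerOp(b *testing.B) {
	start := cpuTime()
	for i := 0; i < b.N; i++ {
		WithoutVariable() // channel ops aren't eliminated by the compiler
	}
	b.ReportMetric(float64(cpuTime()-start)/float64(b.N), "cpu-ns/op")
}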

@ianlancetaylor (Contributor) commented:

@Mahdi89 The reported performance figures are as accurate as the OS provides. Dependencies on thread scheduling are an issue with how repeatable the numbers are, not how accurate they are. And we have tooling to deal with repeatability, such as https://godoc.org/golang.org/x/perf/cmd/benchstat.
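
For context, a typical benchstat workflow looks roughly like this (file names are placeholders, and this is just the usual pattern rather than anything new in the proposal):

# collect several runs before and after a change
go test -run=NONE -bench=. -count=10 > old.txt
# ...apply the change...
go test -run=NONE -bench=. -count=10 > new.txt
# compare the samples and report statistically significant deltas
benchstat old.txt new.txt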

But unlike performance, which we can measure directly using tools provided by the OS and the CPU, I do not know of any way to measure energy. My comments about CPU details were intended to point out that estimating energy use is infeasible. So if we can't measure it, and we can't estimate it, I don't see what useful information we can provide.

@josharian (Contributor) commented:

This also depends on the target machine's power saving strategy.

Both iOS and macOS prefer to sleep when possible. To that end, they offer APIs for approximate timers (“do this every five minutes or so”) so that they can coalesce timers, wake up, do lots of work, and go back to sleep. It might be interesting to investigate Go support for those, particularly for iOS.

But to repeat Ian, I don’t see how to measure power consumption even moderately reliably such that it could be reported at the end of a benchmark.

@Mahdi89 (Author) commented Feb 6, 2019

@ianlancetaylor we might not be able to measure it, but I think we can estimate it, e.g. using an analytical model for the Power (W) variable based on allocated memory, ns/op, and the number of memory operations vs. non-memory operations; a rough sketch of what such a model could look like follows this comment. Combined with the Time (s) variable we already have, that would give total energy.
@josharian how many power-saving strategies are out there? Aggressive, moderate, and none is, I assume, what a developer would want to know about.
@seebs exactly; by streaming the data, less storage is needed.
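
A rough sketch of the kind of analytical model being suggested; the coefficients are placeholders invented purely for illustration, and nothing like this exists in package testing:

package energymodel

// Hypothetical per-event energy costs in nanojoules. These values are made
// up; real ones would have to be calibrated per architecture, which is the
// hard part.
const (
	njPerCPUNanosecond = 0.5 // baseline core power while running
	njPerByteAllocated = 2.0 // allocation and the data movement it implies
	njPerMemoryOp      = 1.0 // loads/stores that miss the cache
)

// EstimateJoules derives a very rough energy figure from quantities a
// benchmark already reports (ns/op, B/op) plus a count of memory operations.
func EstimateJoules(nsPerOp, bytesPerOp, memOpsPerOp float64) float64 {
	nj := nsPerOp*njPerCPUNanosecond +
		bytesPerOp*njPerByteAllocated +
		memOpsPerOp*njPerMemoryOp
	return nj * 1e-9 // nanojoules -> joules
}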

@ianlancetaylor (Contributor) commented:

@Mahdi89 I do not believe that an estimate would be useful, for various reasons mentioned above by various people. If you want to propose measuring allocated memory, ns/op, and number of memory operations, propose that. Let's not try to pretend that we are measuring energy use when we are measuring something else.

@seebs (Contributor) commented Feb 6, 2019

The amount of memory in use by a single program has very little effect on energy consumption. The amount of memory physically installed affects power consumption; the amount of memory allocated doesn't directly matter much. We can't take memory out when it's not in use, and I don't think any current systems can power down individual blocks of memory. So if you're on a machine with 8GB of memory, it draws about the same amount of power no matter how much of it is "allocated".

@Mahdi89 (Author) commented Feb 7, 2019

@ianlancetaylor I don't think that's pretending; it's more of an approximation that should be made as accurate as possible by collecting more and more parameters. These would need to be observed by the testing library over time, I believe. It should give programmers a better notion of the energy consumed by their programs.

@Mahdi89 (Author) commented Feb 7, 2019

@seebs regarding execution time I just did this simplistic experiment to show how the concept of using channels gets interesting when dealing with bigger arrays: https://github.com/Mahdi89/vartest

@seebs (Contributor) commented Feb 7, 2019

Can you clarify why you think that memory usage is lower for make(chan int, 10000) than for make([]int, 10000)?

@seebs (Contributor) commented Feb 7, 2019

Okay, looking over this, I think I understand. The underlying contention is that reduced memory usage would result in reduced energy consumption. I don't think that's generally true for specific computers. Reducing the overall memory footprint of everything on a system might let you build systems with less memory in them, reducing energy consumption a bit, but reducing memory usage for individual apps is unlikely to significantly impact energy consumption in most cases. (There are some edge cases, like memory usage that causes a system to start swapping.)

But there's a secondary issue, which is that you keep using buffered channels to illustrate your contention. But a buffered channel still has that large storage space. So all you're doing is replacing simpler operations with more complex operations -- you aren't even reducing the amount of storage in use.

I also think you're seriously underestimating the cost of channels compared to the underlying operations. They're cheap by the standards of concurrency systems, but they're still very expensive compared to the underlying computations.

@dhobsd (Contributor) commented Feb 7, 2019

May I also suggest that, while example A may use more watt-hours than example B, that's not a very interesting observation? If B is 1000 times slower, it is not a better solution from an energy-efficiency perspective unless it also uses 1000 times less power. To illustrate this, consider that this implementation likely uses less power than both versions originally posted:

func SleepyVariable() (int, int) {
	a := 2 + 3
	time.Sleep(1*time.Second)
	b := a + 3
	time.Sleep(1*time.Second)
	c := a + 5
	time.Sleep(1*time.Second)
	return b, c
}

It's also relatively more expensive, because you still have to power the machine to do basically nothing most of the time. To that end, I see the more interesting measurement as computation performed per watt-hour. If you use instructions retired as a proxy for watt-hour usage, then what you should be looking for is maximizing instructions-per-cycle efficiency. That will give you the most cost-effective implementation.
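
For what it's worth, on Linux these counters can be read with perf; newer Intel CPUs also expose RAPL energy counters, though availability and permissions vary, so treat the following as a sketch rather than a recipe:

# build the benchmark binary, then count instructions and cycles while it runs
go test -c -o bench.test
perf stat -e instructions,cycles ./bench.test -test.bench=.

# RAPL package-energy counters (Intel only) are measured system-wide
perf stat -a -e power/energy-pkg/ sleep 5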

@Mahdi89 (Author) commented Feb 7, 2019

@seebs you're right, that would lead to systems with less memory in them. For example, in FPGA-based systems buffered channels get mapped onto on-chip FIFOs, so data remains local to the computation and data movement to/from off-chip memories (e.g. DRAM) is significantly reduced.
There is another aspect to using channels vs. variables/arrays: random access patterns. Channels force processes to communicate serially. That reduces the degree of randomness and, hence, data movement across the system, similar to storage-less or 'hot-potato' communication: https://users.ece.cmu.edu/~omutlu/pub/bless_isca09.pdf

@Mahdi89 (Author) commented Feb 8, 2019

@dhobsd I agree with this: 'If B is 1000 times slower, it is not a better solution from an energy efficiency perspective unless it also uses 1000 times less power'. Now, why do I think B could be more power-efficient than A: I believe channels utilise on-chip memories (caches) better than random-access arrays allocated in off-chip memory (DRAM). Because channels force serial communication, they should have better cache utilisation, so less data movement would be required when dealing with bigger data. This idea stems from the fact that in consumer devices almost two-thirds of total energy usage is due to data movement.
By the way, this 1000x gap starts to shrink when bigger data is considered: https://github.com/Mahdi89/vartest

@rsc (Contributor) commented Feb 13, 2019

Package testing reports what the CPU or operating system can tell us. If the CPU or operating system can start giving us accurate energy estimates (for what happened during a two-second benchmark), then great, let's have a discussion about how to report that number. If not, there does not seem to be a compelling reason for us to try to develop our own attempts at an accurate estimate. (They won't be accurate!)

rsc closed this as completed Feb 13, 2019
@Mahdi89 (Author) commented Feb 15, 2019

@rsc agreed, our estimates won't be accurate enough; however, as a developer I'd like to see some estimates to get an idea of what the implications of my data-structure choices are for energy. For instance, I've noticed people use channel-based implementations of free lists to keep objects around and avoid GC. So I assume channels are cache-friendlier and can reduce data movement across the system (which is the main source of energy consumption).
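
For reference, the channel-based free-list pattern mentioned here typically looks something like the sketch below (a common idiom along the lines of the "leaky buffer" example in Effective Go, not code from this issue; sync.Pool is the standard-library alternative):

package freelist

// buffer is whatever object we want to recycle instead of reallocating.
type buffer [4096]byte

// freeList recycles buffers through a buffered channel: Get reuses one if
// available, Put hands one back, and overflow simply falls through to the GC.
type freeList chan *buffer

func newFreeList(size int) freeList { return make(freeList, size) }

// Get returns a recycled buffer if one is available, otherwise allocates.
func (f freeList) Get() *buffer {
	select {
	case b := <-f:
		return b
	default:
		return new(buffer)
	}
}

// Put returns a buffer to the list, or drops it for the GC if the list is full.
func (f freeList) Put(b *buffer) {
	select {
	case f <- b:
	default:
	}
}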

@mvdan (Member) commented Feb 15, 2019

@Mahdi89 It seems to me like you're basing your hypothesis on pure guesses. In practice, channels are a bad example, as they're expensive compared to simpler structures like slices or arrays.

The proposal has been declined, so I'd suggest you continue the discussion elsewhere, like golang-nuts. If you'd like to open another proposal or bug in the future, perhaps try to gather pieces of code which reliably differ in CPU cost and energy cost. You'll need hard evidence if you want a proposal this ambitious to be considered.

golang locked and limited conversation to collaborators Feb 15, 2020