
proposal: x/mobile audio #13432

Closed
rakyll opened this issue Nov 30, 2015 · 24 comments

Comments

@rakyll
Contributor

rakyll commented Nov 30, 2015

In the scope of the Go mobile project, an audio package that supports decoding and playback is a top priority. The current status of audio support under x/mobile is limited to OpenAL bindings and an experimental high-level audio player that is backed by OpenAL.

The current experimental audio package fails to

  • provide high-level abstractions to represent audio and audio processors,
  • implement a memory-efficient playback model,
  • implement decoders (e.g. an mp3 decoder),
  • support live streaming or other networking audio sources.

In order to address these concerns, I am proposing core abstractions and a minimal set of features based on the proposed abstractions to provide decoding and playback support.

See the proposal document for further details.

@rakyll rakyll self-assigned this Nov 30, 2015
@gopherbot

CL https://golang.org/cl/17262 mentions this issue.

@mattetti
Contributor

mattetti commented Dec 1, 2015

It looks great!

How about adding an AudioFormat string to the Clip interface? For instance, a WAV audio file can be encoded in a variety of coding formats. It might not belong in the interface, since it's a detail that maybe only the decoder cares about, but I can think of a few examples where it would be useful.

WAV and AIFF both support float PCM values even though the use of float values is unpopular. Should we consider supporting float values?

I think we should; any kind of audio processing would more than likely be done in float values, so using float right away seems to be a better move IMHO (not a deal breaker though).

Decoding on desktop. The package will use the system codec libraries provided by Android and iOS on mobile devices. It is not possible to provide feature parity for desktop envs in the scope of decoding.

We can implement the decoders in pure Go code for when codecs are not available. I have partially implemented such decoders and while they might not be as fast as dedicated/optimized codecs, I think they should work well enough.

@mattetti
Contributor

mattetti commented Dec 1, 2015

L117 small typo, Clips represents PCM audio should read Clips represent PCM audio

@mattetti
Contributor

mattetti commented Dec 1, 2015

Question, why would we have a generic DecodeWAVBytes decoder function? It seems very specific to wav decoders only, doesn't it? DecodeBytes might be a better choice (?)

@mattetti
Contributor

mattetti commented Dec 1, 2015

I created a quick-and-dirty but conformant AIFF decoder to get a feel for the API: https://github.com/mattetti/exp/blob/master/audio/aiff/decoder.go

I think it might be missing an API to get frames out of a clip. The current proposal goes all the way to getting the clip and frame info, but we can't get the actual PCM data (which would probably be some sort of [][int]float format). Depending on the use case, we might not want to load the entire audio file into memory, so the API might want to take that into consideration.

What do you think?

@rakyll
Contributor Author

rakyll commented Dec 1, 2015

Thanks for the feedback!

How about adding an AudioFormat string to the Clip interface? For instance, a WAV audio file can be encoded in a variety of coding formats. It might not belong in the interface, since it's a detail that maybe only the decoder cares about, but I can think of a few examples where it would be useful.

The AudioFormat would represent the coding format of the original source, not the Clip's. The decoders won't work with Clips but with arbitrary encoded streams of bytes, converting them into a Clip. The decoder can return the audio format of the original source. Consider the following decoder function:

func Decode(src io.ReadSeeker) (clip Clip, format string, err error)

I think we should; any kind of audio processing would more than likely be done in float values, so using float right away seems to be a better move IMHO (not a deal breaker though).

This unfortunately comes with a performance cost. We need to find a way to support float values but also allow integer-only values for faster processing for those who don't care about float-level precision.

We can implement the decoders in pure Go code for when codecs are not available.

We probably won't have enough human resources to reimplement decoders in Go, so using the available codecs is going to be our initial step. The proposal is not against a decoder being implemented in vanilla Go, though.

Question, why would we have a generic DecodeWAVBytes decoder function? It seems very specific to wav decoders only, doesn't it? DecodeBytes might be a better choice (?)

It was a sample for decoder implementations; I was not proposing it to be in the audio package. DecodeBytes is a better name in any case; I will update the proposal.

I think it might be missing an API to get frames out of a clip.

We had an earlier debate about providing APIs that work with frames rather than byte slices. The previous API was along the lines of the following.

type Frame []int64

type Clip interface {
     Read(f []Frame, offset int64) (n int, err error)
}

There are two arguments against the frame-based APIs:

  • It is easy to calculate how long a frame is, therefore working with byte slices doesn't require much additional work.
  • Read(f []Frame, offset int64) (n int, err error) almost reads like io.ReadSeeker. Not introducing a new interface type allows us to reuse utility functions from the stdlib on Clip instances (such as io.Copy).

@mattetti
Contributor

mattetti commented Dec 1, 2015

My only concern is that a frame from a WAV file can contain data using different encodings, and non-PCM formats will definitely not have PCM frames. For instance, if one were to write an mp3 decoder, would the decoder have to provide a clip with an io.ReadSeeker that returns PCM data only?
If that's the case, you might want to make that a little bit clearer.

@rakyll
Contributor Author

rakyll commented Dec 2, 2015

For instance, if one were to write an mp3 decoder, would the decoder have to provide a clip with an io.ReadSeeker that returns PCM data only?

Correct. You can read only PCM-formatted data from clips. A decoder is responsible for decoding any input down to PCM-formatted frames. I am going to update the document to make this clearer.

gopherbot pushed a commit to golang/proposal that referenced this issue Dec 8, 2015
Updates golang/go#13432.

Change-Id: I718006e8f039c476d456c1276c54132bd66d9410
Reviewed-on: https://go-review.googlesource.com/17262
Reviewed-by: Burcu Dogan <jbd@google.com>
@mewmew
Contributor

mewmew commented Dec 9, 2015

provide high-level abstractions to represent audio and audio processors,

Some work has already been done in this domain, by the audio package of Azul3d. It is intended to define interfaces analogous to the Image interface of the standard library for audio.

Perhaps some ideas could be stolen from its design. @slimsag is the original developer, and he may have some insight into problems encountered when designing the interface, and how they were solved by Azul3d.

implement decoders (e.g. an mp3 decoder),

There exist at least two FLAC decoders in pure Go.

  1. https://github.com/eaburns/flac (GoDoc)
  2. https://github.com/mewkiz/flac (GoDoc)

The second FLAC package is conceptually a back-end for decoding FLAC audio files, for which one or more front-ends may be implemented. Currently a front-end has been implemented for the Azul3d audio abstraction, and it should be trivial to implement another front-end for the Go mobile audio abstraction. Analogously to the image package of the standard library, to decode a FLAC audio stream it suffices to have

import _ "azul3d.org/audio/flac.v0"

in a program's main package.

Below follows a short example program for decoding FLAC audio files with Azul3d:

package main

import (
    "flag"
    "fmt"
    "log"
    "os"

    "azul3d.org/audio.v1"
    _ "azul3d.org/audio/flac.v0" // Add FLAC decoding support.
)

func main() {
    flag.Parse()
    for _, path := range flag.Args() {
        err := parseFLAC(path)
        if err != nil {
            log.Fatal(err)
        }
    }
}

func parseFLAC(path string) error {
    // Open file.
    f, err := os.Open(path)
    if err != nil {
        return err
    }
    defer f.Close()

    // Create decoder.
    dec, _, err := audio.NewDecoder(f)
    if err != nil {
        return err
    }

    // Decode audio stream.
    for {
        samples := make(audio.PCM32Samples, 1024)
        n, err := dec.Read(samples)
        if err != nil {
            if err == audio.EOS {
                return nil
            }
            return err
        }
        fmt.Println(samples[:n])
    }
}

I hope this gives some ideas and helps improve the API and design of the Go mobile audio package.

Cheers /u

@rsc rsc changed the title proposal: mobile audio proposal: x/mobile audio Dec 28, 2015
@rsc rsc added this to the Proposal milestone Dec 28, 2015
@rakyll
Contributor Author

rakyll commented Feb 16, 2016

Thanks for the input. The proposal is currently on hold and requires more work on a variety of topics:

  • The interfaces need to consider VBR compression.
  • Seeking is often expensive; we may adopt an io.ReadSeeker-like interface to make Seek a standalone operation and put more emphasis on its cost.
  • It is not clear how big a buffer is required for a frame of VBR-encoded media. A Frame type may need to be introduced.
  • We need working prototype decoders for AIFF, FLAC, MP3 and OGG before finalizing the proposal.

@rakyll
Contributor Author

rakyll commented Jul 15, 2016

@mattetti and I have been working on audio during GopherCon and will update the proposal soon with our most recent ideas.

rakyll referenced this issue in golang/proposal Jul 15, 2016
Seeking within the audio source is not always a trivial problem; e.g.
in cases where the underlying source is encoded with a VBR
compression algorithm, optimistic seeking and slurping the entire
data source may be required.

Separation of the Seek method allows us to explain the requirements
from the implementors more easily. Since different encoders will have
different strategies to Seek, it is also easy to document the cost and
the underlying algorithm to the users.

Also, the separation helps us document the requirements of
implementing an efficient Clip.Frames. We are expecting decoders
to have a cursor on the data source and move forward as new Frames
calls arrive. We only expect them to modify the cursor if seeking
is required. Without a separate Seek method, implementors need to
check whether the offset matches their current internal cursor
and seek if it does not. Separation saves them from figuring out the
conditional case for a requirement of seeking.

Change-Id: I0b15d56df9457a462953aaaf915445e268462f97
Reviewed-on: https://go-review.googlesource.com/24702
Reviewed-by: David Crawshaw <crawshaw@golang.org>
@mewmew
Contributor

mewmew commented Aug 11, 2016

@mattetti and I have been working on audio during GopherCon and will update the proposal soon with our most recent ideas.

@rakyll Lovely to hear. Any updates on this? Would be interested to see what you have been playing with.

@mattetti
Contributor

@mewmew we decided to reduce the scope of the proposal to an interface allowing reading and writing of PCM data. We are currently working on various test implementations covering multiple use cases (audio analysis, processing and playback) with a few different formats and platforms.

The big challenge is defining an interface that is multi-purpose yet flexible enough to support various kinds of codecs. As we started implementing our ideas, we started finding issues/challenges and went back to the drawing board. As soon as we have decent implementations validating our design, we will submit an updated proposal.

@mewmew
Contributor

mewmew commented Aug 11, 2016

@mewmew we decided to reduce the scope of the proposal to an interface allowing reading and writing of PCM data. We are currently working on various test implementations covering multiple use cases (audio analysis, processing and playback) with a few different formats and platforms.

The big challenge is defining an interface that is multi-purpose yet flexible enough to support various kinds of codecs. As we started implementing our ideas, we started finding issues/challenges and went back to the drawing board. As soon as we have decent implementations validating our design, we will submit an updated proposal.

Sounds great! Glad to hear that you are fleshing out these details. I know quite a few Gophers in the community that would be happy to stress test the API of these interfaces. Are the preliminary interfaces available in a public repository by any chance?

Edit: To provide some additional background. The audio project of the Azul3D engine would be happy to see where and how these interfaces fit into the "generic" audio package, which is analogous to the image package of the standard Go library. My brother and I would also be interested in looking into how difficult it would be to add support for FLAC decoding, targeting these interfaces. Then there are quite a few music players, audio visualizers and other such software in the community which would like to use the audio API. So you have a few stress testers lined up :)

@rsc
Contributor

rsc commented Dec 19, 2016

On hold for #17244.

@hraban

hraban commented Dec 29, 2016

Hi, bit late to the party but I have a small extra data point: the Opus API. (I maintain a wrapper around xiph.org's standard implementations, libopus and libopusfile; https://github.com/hraban/opus. The API mirrors the C API, in Go flavor.)

One note I didn't see explicitly covered (sorry if I missed it) is the difference between a wrapper format like Ogg, and an actual encoding like Opus or Vorbis. If I get a .opus file, that's not raw opus data, that's an ogg stream with opus data inside it. However, if I pass a bunch of PCM frames to an Opus encoder, I don't get an opus stream: I get opus bytes. This is particularly relevant if you don't control the source; if something is feeding you raw (unwrapped) bytes of an encoded audio stream you can't pass it to a decoder expecting it to be wrapped in Ogg or whatever.

On floats: they're great when you're mixing audio, because they don't clip. You can mix up, down, sideways, and scale back without special checks or losing data. They have their place; it's nice to have an extra FramesFloat32() option that decodes to float32. The Opus decoder supports this natively.

And a final question: is the byte slice returned by Frames() raw PCM? Then it should be []int16 (the float debate notwithstanding).

@mattetti
Contributor

Update: with the help of @rakyll and @campoy I started working on a more generic audio buffer interface that could be used by codecs, analyzers, effects, etc. The code isn't done, but the idea is to create a proposal around the minimal useful interface: https://github.com/go-audio/audio
I implemented WAV and RIFF codecs under the same organization: https://github.com/go-audio

@hraban I'd love to hear your perspective on the audio.Buffer interface and if it would work for Opus. Ideally all audio libraries should be able to work nicely together by exchanging audio buffers.

@bradfitz
Contributor

The API mirrors the C API, in Go flavor

It's pretty Go-like, but still fairly C-like in places. Ditch the underscore variables, ditch the ALL_CAPS errors and APPLICATION_VOIP etc (https://github.com/golang/go/wiki/CodeReviewComments#mixed-caps), don't have exported variables of unexported type (opusError, opusFileError), use shorter variables names when it's obvious (func NewEncoder(...., application Application) could be app Application, etc).

@hraban

hraban commented Dec 29, 2016

@bradfitz thanks for the great feedback. I've made the changes to the API. I'll look at the private variables another time.

@mattetti I've been thinking about your API and I realised a mistake in my assumptions: PCM doesn't always have to be 16-bit. On the flip side, though, it almost always is. I'm generally very eager to have these codecs be as efficient as possible, because for my specific use case I was trying to handle as many streams as I could (mixing multiple audio streams; it was CPU-bound). This is why I'm very keen on "hot potato" buffer handling, meaning that I wanted to allocate two circular buffers when a client connects (a []float32 for PCM and a []byte for encoded data), keep reusing them, and pass them to the codec. xiph.org's libopus(file) API allows for this very nicely, and I was never doing any copying or manual converting. If you're going to make a fundamental audio API for all Go programs to use, I'd definitely urge you to allow implementations to support float PCM and int (of any width) PCM without copying.

That said, I'll have to take a closer look another time, and actually try and implement it to see where it gets me. I only know Vorbis and Opus by xiph.org, and it's very clear they learned from their (numerous) mistakes in the former when designing the API for the latter.

@mattetti
Contributor

@hraban you are absolutely right that performance is important in a lot of cases; that's why I implemented two buffers: an int buffer and a float buffer. You can easily implement a byte buffer and exchange the data between the two. I used float64 instead of float32 mainly because there is little overhead and because for signal processing we might as well use float64. That said, we could add a float32 buffer if that made sense. The buffers are designed to be reused so we don't have to reallocate them. I believe I provided a few examples with the PCM codecs (in most cases we don't want to load an entire file into memory, and prefer to read chunks at a time, dropping the PCM data into a reused buffer which can then be processed before being reused).

@hraban

hraban commented Dec 30, 2016

What I mean is you want to feed your buffers to the codec and tell it to decode directly into them. A float64 buffer won't fly with libopus or libvorbis†, because they use float32. That means you will be doing at least one conversion loop there (for i, f32 := range encodedBuf { goBuf[i] = float64(f32) }) for every audio stream you're handling, and two if you're encoding the data again to send back. Also, float64 will double everything in size, which can be bad for your cache, and even disregarding cache it will just make you wait on memory that much longer. Especially for 16-bit PCM (which is pretty common), float32 already gives you quite a lot of headroom (the mantissa + sign bit alone are 24 bits, which gives you a factor of 256).

Of course, libopus and libvorbis are just two libraries written in C; they're not gospel. Once you get a pure Go implementation you can do what you want... I'm just not sure that even for a pure Go implementation it would be wise to go float64. I don't think you will ever need that precision, and it comes at quite a steep cost: halving the performance of everything capped by memory (cache, RAM bandwidth).

I'm coming at this from the perspective of a server-side app dealing with a lot of simultaneous audio streams. Think TeamSpeak. For this I can tell you: you want that data in and out ASAP, and every byte you can squeeze out of a session's footprint is room for another session to cram in. If you're having to convert between your codec's float32 array and an internal float64 array, that's immediately a cost you'd rather forego. Same for int vs int16.

For mobile, I have no idea. Never touched mobile dev in my life.

† I can't mention libvorbis here without clarifying that it doesn't actually support this *pcm buffer passing: it allocates a buffer internally which you have to copy out of before your next frame decode. Still, that's a memcpy rather than a for loop with type conversion. And if you really, really wanted, you could have your libvorbis wrapper do the same, using unsafe. Hopefully not, but the point is it's nice to leave people the choice. When it comes to audio, every cycle counts.
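The conversion cost described above can be made concrete. This is an illustrative sketch, not real codec code: the `copy` call stands in for a hypothetical decode-into-buffer call, and the float64 loop is the per-stream tax a float64-only API would impose:

```go
package main

import "fmt"

func main() {
	// A codec such as libopus hands back float32 PCM samples.
	decoded := []float32{0.1, -0.2, 0.3}

	// If the shared API insists on float64, every stream pays a per-sample
	// conversion loop on every read, plus double the memory traffic.
	f64 := make([]float64, len(decoded))
	for i, s := range decoded {
		f64[i] = float64(s)
	}
	fmt.Println(f64[0] == float64(decoded[0])) // true: values survive, cycles don't

	// Decoding straight into a preallocated float32 buffer avoids both the
	// loop and the extra memory: the same slice is handed back to the codec
	// on every iteration ("hot potato" style). copy is a stand-in for a
	// hypothetical dec.Decode(buf) call.
	buf := make([]float32, 1024)
	n := copy(buf, decoded)
	fmt.Println(n) // 3
}
```

Nothing is lost in the float64 path except time and memory bandwidth, which is exactly the trade-off under discussion.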

@mattetti
Contributor

@hraban those are fair points; many DAWs and audio plugins do process the audio signal in float64, and the Go standard library is quite partial to this format. That said, I agree that float32 is very common, and I added it to the WIP proposal. That change required that I update the interface to:

type Buffer interface {
	// PCMFormat is the format of the buffer (describing the buffer content/format).
	PCMFormat() *Format
	// NumFrames returns the number of frames contained in the buffer.
	NumFrames() int
	// AsFloatBuffer returns a float64 buffer from this buffer.
	AsFloatBuffer() *FloatBuffer
	// AsFloat32Buffer returns a float32 buffer from this buffer.
	AsFloat32Buffer() *Float32Buffer
	// AsIntBuffer returns an int buffer from this buffer.
	AsIntBuffer() *IntBuffer
	// Clone creates a clean clone that can be modified without
	// changing the source buffer.
	Clone() Buffer
}

This is because an API that accepts the generic audio.Buffer interface might need to convert it to float32. An API like yours, on the other hand, might accept only audio.Float32Buffer to avoid any lookup or potential conversion code. Here is the definition of audio.Float32Buffer:

// Float32Buffer is an audio buffer with its PCM data formatted as float32.
type Float32Buffer struct {
	// Format is the representation of the underlying data format
	Format *Format
	// Data is the buffer PCM data as floats
	Data []float32
}

As you can see, the buffer is just a wrapper around a slice. The buffer has a few convenient methods to convert its data but it is designed so you can reuse it as often as you want. For instance you could allocate the float buffer of a certain size and keep decoding into it, send it over to a transform method and get the buffer back to then re-encode it.

My goal here is to define a common API that all audio libraries could use. @rakyll and I very quickly realized this is really hard because of the various use cases and concerns involved. But wouldn't it be amazing if we could use the same interface and chain all those audio libs together?

I do believe this change should address your concern, but please let me know if that's not the case. Suggestions welcome too!
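The allocate-once, reuse-forever pattern described above can be sketched with the Float32Buffer definition quoted in this comment. The `Format` struct here is trimmed to two fields for the sketch, and `fakeDecode` and `gain` are invented stand-ins for a real decoder and transform:

```go
package main

import "fmt"

// Format is a trimmed stand-in for the audio.Format type.
type Format struct {
	NumChannels int
	SampleRate  int
}

// Float32Buffer mirrors the type quoted above: a format plus a PCM slice.
type Float32Buffer struct {
	Format *Format
	Data   []float32
}

// fakeDecode stands in for a real decoder: it fills buf.Data in place and
// reports how many samples it wrote, so the same buffer is reused each pass.
func fakeDecode(buf *Float32Buffer, chunk int) int {
	n := 0
	for i := 0; i < len(buf.Data) && n < chunk; i++ {
		buf.Data[i] = float32(i) / 10
		n++
	}
	return n
}

// gain is a trivial transform that processes the buffer in place.
func gain(buf *Float32Buffer, g float32) {
	for i := range buf.Data {
		buf.Data[i] *= g
	}
}

func main() {
	// Allocate once, then keep decoding into and transforming the same
	// buffer: no per-pass allocation, exactly the reuse the comment describes.
	buf := &Float32Buffer{
		Format: &Format{NumChannels: 1, SampleRate: 44100},
		Data:   make([]float32, 4),
	}
	for pass := 0; pass < 2; pass++ {
		n := fakeDecode(buf, len(buf.Data))
		gain(buf, 0.5)
		fmt.Println(pass, buf.Data[:n])
	}
}
```

A real chain would swap `fakeDecode` for a codec's decode call and `gain` for any effect, with the buffer handed along unchanged.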

mattetti referenced this issue in go-audio/audio Dec 30, 2016
@hraban

hraban commented Dec 30, 2016 via email

@rakyll
Contributor Author

rakyll commented Jan 4, 2017

Closing this proposal in favor of #18497; let's keep the conversation on the new proposal.

@rakyll rakyll closed this as completed Jan 4, 2017
@golang golang locked and limited conversation to collaborators Jan 4, 2018
@rsc rsc unassigned rakyll Jun 23, 2022