Processing audio on the Atari VCS

Notes from an unrealized project

I've been intrigued by the hardware oddities of the Atari VCS (aka Atari 2600) ever since taking a course on it and reading Nick Montfort and Ian Bogost's book on the subject. It has an extremely low-level programming interface, where the distance between code and physical voltages is rarely far: whether on wires going to the television display, which you have to control a scan-line at a time; on time-integrated voltages coming from the strange potentiometer-based paddle controllers; or on the sound chip, which skips the niceties of a DAC and feeds digital square waves of several flavors directly to the speakers.

An idea I've been kicking around for a while, but whose feasibility I haven't had a chance to test, is using the VCS as an audio processor/effects/visualization device. There are several projects using the VCS as a synthesizer, but as far as I can find, none that take audio input and process it. I think it should be possible. Here are some notes on possible approaches I've collected so far, at varying levels of plausibility.

Audio input

There are two possible ways of getting input to the Atari, and both ought to be capable of reading data at audio rates, despite not really being intended for it, thanks to their pretty direct coupling to the line voltages.

The joystick ports are probably the simplest. There are two 4-bit ports, memory-mapped by the Peripheral Interface Adapter (PIA), an auxiliary chip that runs at the same clock as the VCS's main processor, 1.19 MHz. Since normal joystick input is obviously much slower than that, for normal game use there's optional hardware support for latching, where a direction, once registered, will stay registered until it's read. But with latching turned off, the voltage level appears to be directly wired to the I/O address at the PIA clock rate, so it should be possible to treat this as a voltage sampler at a data rate as fast as we can read it. A CD-audio sampling rate of 44.1 kHz means sampling the port about every 27 CPU cycles (1.19 MHz / 44.1 kHz ≈ 27), which leaves enough time to copy the samples, though perhaps not to really do anything with them.
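To make the cycle budget concrete, here is a small sketch of the arithmetic. It's in Python rather than 6502 assembly, purely to check the numbers. The 1.19 MHz figure is the approximate clock rate quoted above, and the function names are my own.

```python
CPU_CLOCK_HZ = 1_190_000  # approximate VCS CPU clock, as quoted in the text


def cycles_per_sample(sample_rate_hz: float) -> int:
    """CPU cycles between consecutive samples, rounded to a whole cycle."""
    return round(CPU_CLOCK_HZ / sample_rate_hz)


def actual_rate_hz(cycles: int) -> float:
    """Sampling rate we actually get when sampling every `cycles` cycles."""
    return CPU_CLOCK_HZ / cycles


if __name__ == "__main__":
    for target in (44_100, 22_050, 8_000):
        n = cycles_per_sample(target)
        print(f"{target} Hz target: {n} cycles/sample, "
              f"actual rate {actual_rate_hz(n):.0f} Hz")
```

Because the loop has to take a whole number of cycles, the achievable rates are quantized; at 27 cycles the real rate lands slightly under 44.1 kHz.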

We have a choice of wiring up to one joystick port for 4-bit sampling, or both of them for 8-bit. Due to the limited processing capability (we only have 8-bit arithmetic), and perhaps to allow us to keep one joystick plugged in for control of something-or-other, 4-bit input is probably more feasible.

The other option is the paddle inputs. These are quite strange. They're implemented as potentiometers (essentially variable resistors) which charge a capacitor inside the Atari. The capacitor sets an I/O address to 1 when it passes a "charged" threshold. The programmer can write to another address to dump the capacitor's charge. By measuring the time between the capacitor being dumped and reaching charge again, the programmer can figure out the average charge rate, and therefore, via a logarithmic relationship, the paddle position.
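As a rough model of that measurement, here is a sketch assuming an ideal RC circuit: the capacitor voltage rises as 1 - exp(-t/RC), so the time to reach the threshold is R*C times a constant that contains the logarithm. The component values and threshold fraction in the example are placeholders I picked for illustration, not measured VCS values.

```python
import math


def charge_time_s(resistance_ohm: float, capacitance_f: float,
                  threshold_fraction: float) -> float:
    """Time for an ideal RC circuit to charge from 0 to a fraction of supply."""
    return -resistance_ohm * capacitance_f * math.log(1.0 - threshold_fraction)


def resistance_from_time(time_s: float, capacitance_f: float,
                         threshold_fraction: float) -> float:
    """Invert charge_time_s: recover the resistance from a measured time."""
    return time_s / (-capacitance_f * math.log(1.0 - threshold_fraction))


if __name__ == "__main__":
    # Placeholder values: 500 kOhm pot setting, 0.1 uF capacitor, 50% threshold.
    t = charge_time_s(500e3, 0.1e-6, 0.5)
    print(f"charge time: {t * 1e3:.2f} ms")
    print(f"recovered R: {resistance_from_time(t, 0.1e-6, 0.5):.0f} ohm")
```

Under this idealized model the measured time is proportional to the resistance, so once the constants are calibrated the timing count maps directly onto the paddle position.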

For audio input, this could be used to read pulse-density-modulated audio input. Hooking up the circuit and getting everything calibrated is likely considerably more complex than with the joystick port, though. I haven't worked out what kind of input fidelity could be achieved with this method; it may be higher.

To make a long story short, let's sample one of the 4-bit joystick ports at some sampling rate (depending on how much other processing we want to do) for our audio input.

Simple digital filters

Assuming we can read some audio samples every few dozen clocks, can we do anything with them? The easiest thing to do is to implement some simple digital filters; we can't store a lot of samples in our 128 bytes of RAM, so any filter has to operate "online" on a handful of recent samples. FIR filters are probably the easiest; they store the past N samples, and compute the output via an equation of the form y[n] = a0*x[n] + a1*x[n-1] + ... for N terms. IIR filters, which can feed the outputs y[n] back into the filter, are only slightly harder to implement here, though numerical errors might start accumulating with 8-bit arithmetic. Note that all of this needs to be implemented so that it takes a predictable number of cycles, so that we sample at a constant rate. Possibilities range from something really small in 27 cycles for almost-CD-rate audio, all the way down to a telephony-style 8 kHz sampling rate, at perhaps 149 cycles per sample. Realistically it's probably not possible to fit any real math into 27 cycles, so sampling rates will be somewhere south of that.
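As an illustration of the simplest case, here is a sketch of a 4-tap moving-average FIR filter on unsigned 4-bit samples, modeled in Python rather than 6502 assembly. All coefficients are 1/4, so the "multiplies" collapse into one right shift, which matters because 6502-family CPUs have no multiply instruction. The function name and the choice of a moving average are mine, just to show the shape of the computation.

```python
from collections import deque


def make_moving_average(taps: int = 4):
    """Return a step function computing y[n] = (x[n] + ... + x[n-taps+1]) / taps.

    `taps` must be a power of two (at most 16) so that the division is a
    shift and the running sum of 4-bit samples still fits in one byte.
    """
    if taps < 1 or taps > 16 or taps & (taps - 1):
        raise ValueError("taps must be a power of two between 1 and 16")
    shift = taps.bit_length() - 1
    history = deque([0] * taps, maxlen=taps)  # the only state we keep in RAM

    def step(sample: int) -> int:
        history.append(sample & 0x0F)  # keep only the 4 input bits
        acc = sum(history)             # at most 15 * 16 = 240, fits in 8 bits
        return acc >> shift            # divide by taps, back to 4 bits

    return step


if __name__ == "__main__":
    smooth = make_moving_average(4)
    print([smooth(x) for x in (15, 15, 15, 15, 0, 0, 0, 0)])
```

Feeding it a step input shows the expected ramp up and back down over four samples. On real hardware the sum would presumably be kept as a running total (add the new sample, subtract the oldest) to keep the cycle count fixed.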

The audio can then be output again in two ways. One is the way we got it in, via the other joystick port, as a 4-bit digital signal (joystick ports can be set to output as well as input, and can be set independently). Outputting one sample every time we read a sample should keep the audio rate constant (and keep our RAM needs for buffering samples fixed). The other way is to output via the VCS's audio facilities. Despite being usually characterized by the various 8-bit square-wave tones that make up its signature sound, it's also possible to output digital audio via this route. One of the audio voices is just a constant DC tone, which we can use to output arbitrary waveforms by varying the audio volume quickly enough to reach audio sampling rates; this technique was used in some later VCS games.

Something in this space is almost certainly feasible, at least at lower sampling rates, though overall it doesn't feel VCS-y as an idea; really it's just using the Atari as an old CPU with some I/O ports, vying for the world's least convenient DSP engine, without using any of its signature weird audio/visual hardware.

Analysis/resynthesis

One idea is to try to analyze something about the input signal, and use that to cross-synthesize with one of the VCS's characteristic "instruments" (rather than using the digital-sound-output trick). A classic idea is vocoding: estimate frequencies from an input signal and resynthesize it using a different source signal. For example, a metallic robot type sound can be had by estimating energy content in a number of frequency bands, and then resynthesizing them with a single sinusoid per frequency band. Clearly we can't do anything quite that fancy, because the VCS has only two output voices, and we have nowhere near enough RAM or processing power to do a Fourier transform to get a frequency spectrum.

However, it might just be on the edge of feasibility, at low sample rates and with wild numerical inaccuracy, to extract a few frequency components. The Goertzel algorithm is a second-order IIR filter that in effect combines a bandpass and boxcar filter to give the magnitude in one specific frequency band over a time window (essentially the value of one DFT frequency bin). As a low-order IIR filter it can be computed online, unlike the FFT, which needs to accumulate a whole frame of samples first; and it's more efficient if only a handful of frequencies are needed, rather than the whole spectrum, requiring just one multiply, one add, and one subtract per sample. For this application, a possibility is to start with two specific frequencies, and resynthesize their magnitude with two fixed-frequency VCS voices. If leftover processing power were somehow squeezed out, we could check several more frequencies and pick the two strongest to resynthesize. This would make a really rudimentary vocoder.
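For reference, here is a floating-point sketch of the Goertzel recurrence in Python, checked against a directly computed DFT bin. This is the textbook form of the algorithm, not anything VCS-specific; the function names are mine.

```python
import cmath
import math


def goertzel_power(samples, k: int) -> float:
    """Squared magnitude of DFT bin k, computed with the Goertzel recurrence."""
    n = len(samples)
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2  # one multiply, one add, one subtract
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def dft_bin_power(samples, k: int) -> float:
    """Squared magnitude of DFT bin k, computed directly for comparison."""
    n = len(samples)
    acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
              for i, x in enumerate(samples))
    return abs(acc) ** 2


if __name__ == "__main__":
    n = 64
    tone = [math.sin(2.0 * math.pi * 5 * i / n) for i in range(n)]
    for k in (5, 9):
        print(k, goertzel_power(tone, k), dft_bin_power(tone, k))
```

Only the two state values `s1` and `s2` persist between samples, which is what makes this attractive with so little RAM. The final magnitude computation happens once per window rather than once per sample.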

A major implementation problem would be whether we can get usable data out of a Goertzel filter using 8-bit arithmetic. It operates in the real domain, and can't really be constrained to integer values while producing useful results; so we probably need to use either a 6.2 or a 5.3 fixed-point arithmetic scheme, and hope we don't overflow. We might have to reduce audio input depth to 3 bits to make that work. Alternatively it's theoretically possible to use 8.8 fixed-point, with one 8-bit integer each for the fractional and integral parts, but doing the bigint-type multiplies on that would kill performance. So the jury is even more out on this one than the rest of this speculation.
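To get a feel for how quickly overflow bites, here is a Python sketch of the Goertzel recurrence with the state held in a signed 8-bit fixed-point format. With `frac_bits=3` it models the 5.3 scheme and with `frac_bits=2` the 6.2 scheme. The helper names and the choice to saturate on overflow are my own assumptions; a real 6502 routine would also need a wider intermediate for the product, which this model gets for free from Python's integers.

```python
def to_fixed(value: float, frac_bits: int) -> int:
    """Quantize a real number to a fixed-point integer with frac_bits bits."""
    return round(value * (1 << frac_bits))


def fixed_mul(a: int, b: int, frac_bits: int) -> int:
    """Multiply two fixed-point numbers (arithmetic shift rounds toward -inf)."""
    return (a * b) >> frac_bits


def goertzel_fixed(samples, coeff: float, frac_bits: int):
    """Run the Goertzel recurrence on integer samples with 8-bit signed state.

    Returns (s1, s2, overflowed); s1 and s2 are fixed-point state values and
    `overflowed` reports whether any value had to be saturated to [-128, 127].
    """
    c = to_fixed(coeff, frac_bits)
    s1 = s2 = 0
    overflowed = False
    for x in samples:
        s0 = (x << frac_bits) + fixed_mul(c, s1, frac_bits) - s2
        if not -128 <= s0 <= 127:
            overflowed = True
            s0 = max(-128, min(127, s0))  # saturate instead of wrapping
        s2, s1 = s1, s0
    return s1, s2, overflowed


if __name__ == "__main__":
    # A constant full-scale input at the DC bin (coeff = 2) overflows at once.
    print(goertzel_fixed([7] * 8, 2.0, frac_bits=3))
    # A single small impulse with coeff = 0 stays in range.
    print(goertzel_fixed([1, 0, 0, 0], 0.0, frac_bits=3))
```

In this model a resonant input blows past the 8-bit range within a couple of samples, which is consistent with the worry above that the input depth or the window length would have to shrink.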

Music visualization

Finally, another possibility is to forget about outputting a sound, and use the cycles to draw some visual effects that are somehow music-related instead. Before the VCS, Atari in 1976 briefly manufactured a trippy device called Atari Video Music, a box of analog electronics that produced music-synced video effects. Is anything at all possible on the VCS's digital hardware?

On the minus side, doing video output adds a bunch of cycle-related complications, because now there are horizontal and vertical syncs and miscellaneous screen management, rather than a free-running 6507 clock that just looks at the I/O ports, as it did in the digital-filter case. But on the plus side, we can purposely drop blocks of audio, since many visual effects can be approximated by sampling some audio a few times a second, rather than continuously. We don't have a lot of RAM for a buffer, though, so this is still not easy, but even a small buffer would let us lag a bit behind the input up to a fixed window size.

What to actually do here? Some kind of Goertzel-esque frequency analysis at a handful of frequencies again, perhaps. But it may be more feasible, in terms of CPU budget, to try some really simple time-domain things, like a moving average of loudness, or the zero-crossing rate.
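Both of those time-domain measures are cheap enough to sketch in a few lines. The Python below assumes unsigned 4-bit samples centered on 8, which is my assumption about how the input would be biased; the function names are also mine.

```python
def zero_crossings(samples, midpoint: int = 8) -> int:
    """Count how often the signal crosses the midpoint within a block."""
    if not samples:
        return 0
    count = 0
    prev_high = samples[0] >= midpoint
    for x in samples[1:]:
        high = x >= midpoint
        if high != prev_high:
            count += 1
        prev_high = high
    return count


def mean_loudness(samples, midpoint: int = 8) -> int:
    """Integer mean of the absolute deviation from the midpoint."""
    if not samples:
        return 0
    return sum(abs(x - midpoint) for x in samples) // len(samples)


if __name__ == "__main__":
    block = [0, 15, 0, 15, 8, 8, 8, 8]
    print(zero_crossings(block), mean_loudness(block))
```

Neither needs a multiply, and each needs only a byte or two of state per block, so a loudness value could drive something like a bar height and the crossing rate a color, with both updated a few times a second.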

* * *

The first TODO is still to figure out if this works at all—whether my hypothesis that you can read audio-rate data through the joystick ports is correct. If yes, then it'll be much more interesting to figure out what's possible to do in a few dozen cycles per sample with that audio.

If anyone else has already investigated anything along these lines, or has ideas or information, I'd love to hear about it!