Upload a photo of a building. We count its windows and build a sequencer that fits the facade: each window is a beat, each row is a sound.
Upload a building photo and we map this grid onto its real windows.