Database

Sirene uses PocketBase as its database, file storage, and real-time engine. PocketBase runs on port 8090 and provides an admin UI at http://localhost:8090/_/.

Collections

Collections are created automatically via migrations in db/pb_migrations/.

voices

| Field | Type | Description |
| --- | --- | --- |
| name | text | Voice name |
| description | text | Description |
| language | text | Language code (en, fr, es, ...) |
| avatar | file | Voice avatar image |
| builtIn | bool | Built-in model voice vs custom |
| samples | relation[] | Link to voice_samples (multi-select, cascade delete) |

voice_samples

| Field | Type | Description |
| --- | --- | --- |
| audio | file | Audio file (WAV/MP3) |
| transcript | text | Sample transcription (manual or auto via Whisper) |
| duration | number | Duration in seconds |

generations

| Field | Type | Description |
| --- | --- | --- |
| voice | relation | Link to voices |
| model | text | Model ID (e.g., kokoro-v1) |
| text | text | Source text |
| language | text | Generation language |
| audio | file | Generated audio file |
| duration | number | Duration in seconds |
| speed | number | Generation speed |
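
For client code, the three collections could be mirrored as TypeScript types. The sketch below is inferred from the field tables only and is not taken from the Sirene codebase. It assumes PocketBase's usual API representation, where file fields come back as stored filenames and relation fields as record IDs. The helper `totalSampleSeconds` is a hypothetical name added for illustration.

```typescript
// Record shapes inferred from the field tables (assumption, not Sirene source).
// File fields are filenames; relation fields are record IDs.
interface VoiceRecord {
  id: string;
  name: string;
  description: string;
  language: string;  // language code, e.g. "en", "fr", "es"
  avatar: string;    // filename of the avatar image
  builtIn: boolean;  // true for built-in model voices, false for custom ones
  samples: string[]; // IDs of voice_samples records
}

interface VoiceSampleRecord {
  id: string;
  audio: string;     // filename of the WAV/MP3 sample
  transcript: string;
  duration: number;  // seconds
}

interface GenerationRecord {
  id: string;
  voice: string;     // ID of a voices record
  model: string;     // e.g. "kokoro-v1"
  text: string;
  language: string;
  audio: string;     // filename of the generated audio
  duration: number;  // seconds
  speed: number;
}

// Hypothetical helper: total reference audio attached to a voice, in seconds.
function totalSampleSeconds(samples: VoiceSampleRecord[]): number {
  return samples.reduce((sum, s) => sum + s.duration, 0);
}
```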

Voice Creation Workflow

1. User clicks "Create Voice"
   → Dialog: name, description, language, avatar

2. Upload 1-N audio samples (WAV/MP3, 5-30s each)
   → Waveform preview (wavesurfer.js), play/pause, duration
   → Transcription: text field + auto-transcribe button (Whisper)
   → Upload to voice_samples collection

3. Hono server:
   a. Creates the voice in PocketBase (voices collection)
   b. Stores samples in PocketBase (voice_samples, file field)
   c. Forwards samples to the Python service for preprocessing

4. Python service:
   a. Decodes and normalizes samples (resample, mono, RMS)
   b. Prepares optimized reference audio for each compatible backend
   c. Stores processed files in the volume

5. The voice is available for generation
   → Client sees the voice appear in real time via PocketBase SSE
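
The sample constraints from step 2 (WAV/MP3, 5-30s each) could be checked before upload with a small helper like the one below. This is a minimal sketch: it assumes the 5-30s range is enforced as a hard limit, which the workflow does not state, and that the format is judged by file extension. All names (`validateSample`, `SampleCandidate`, the constants) are hypothetical.

```typescript
// Limits taken from step 2 of the workflow; treating them as hard limits is an assumption.
const MIN_SAMPLE_SECONDS = 5;
const MAX_SAMPLE_SECONDS = 30;
const ALLOWED_EXTENSIONS = ["wav", "mp3"];

interface SampleCandidate {
  fileName: string;
  durationSeconds: number;
}

// Returns a list of problems; an empty list means the sample passes both checks.
function validateSample(sample: SampleCandidate): string[] {
  const problems: string[] = [];

  // Extract the extension after the last dot, case-insensitively.
  const dot = sample.fileName.lastIndexOf(".");
  const ext = dot >= 0 ? sample.fileName.slice(dot + 1).toLowerCase() : "";
  if (ALLOWED_EXTENSIONS.indexOf(ext) === -1) {
    problems.push(`unsupported format: ${ext || "(none)"}`);
  }

  // Both bounds are inclusive.
  if (
    sample.durationSeconds < MIN_SAMPLE_SECONDS ||
    sample.durationSeconds > MAX_SAMPLE_SECONDS
  ) {
    problems.push(
      `duration ${sample.durationSeconds}s is outside ${MIN_SAMPLE_SECONDS}-${MAX_SAMPLE_SECONDS}s`,
    );
  }
  return problems;
}
```

Returning a list rather than throwing lets the upload dialog show every problem with a sample at once.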

Real-time Updates

PocketBase provides real-time subscriptions via Server-Sent Events (SSE). The client subscribes to collection changes to receive instant updates when:

  • A new generation is created
  • A voice is added or modified

Download progress updates are also pushed to the client, but they are delivered via Hono SSE, not PocketBase.
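
A collection subscription might look like the sketch below. It is written against a minimal interface rather than the SDK itself so that it stays self-contained. The interface mirrors the PocketBase JS SDK call `pb.collection(name).subscribe("*", callback)`, which resolves to an unsubscribe function. `watchCollections` and the type names are hypothetical, and the Hono download-progress stream is not covered here.

```typescript
// Minimal structural types mirroring the part of the PocketBase JS SDK used here.
interface RealtimeEvent {
  action: string; // "create", "update" or "delete"
  record: { id: string };
}
type Unsubscribe = () => Promise<void>;
interface RealtimeClient {
  collection(name: string): {
    subscribe(
      topic: string,
      callback: (e: RealtimeEvent) => void,
    ): Promise<Unsubscribe>;
  };
}

// Subscribes to all records ("*") in each named collection and forwards
// every event to onChange. Resolves to a function that cancels all subscriptions.
async function watchCollections(
  client: RealtimeClient,
  names: string[],
  onChange: (collection: string, e: RealtimeEvent) => void,
): Promise<Unsubscribe> {
  const unsubscribers = await Promise.all(
    names.map((name) =>
      client.collection(name).subscribe("*", (e) => onChange(name, e)),
    ),
  );
  return async () => {
    await Promise.all(unsubscribers.map((unsubscribe) => unsubscribe()));
  };
}
```

With the real SDK, the client would presumably be created with `new PocketBase("http://localhost:8090")` and passed as `watchCollections(pb, ["voices", "generations"], handler)`, assuming the SDK's types line up with the interface above.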