Robopanda Hacks and mods

sevik

Let's feed it with garbage :))

http://sevik.org/robopanda/test_codec_state.wav
http://sevik.org/robopanda/test_codec_state.aud

Made a test chunk with many magic-string (silence) frames and runs of 1, 2, 3, 4, 5 and many copies of one frame from chunk 837.

For different numbers of frames, it seems the first, last and all internal frames are the same.

A sample with only one data frame consists of a first and a last frame, with no internal frames.

Update: captured again at a 48 kHz sample rate to better match the 8 kHz sampling.

milw

sevik said:
Let's feed it with garbage :))

Great minds think alike :P That would be my very next request!

Thanks Sevik!

milw

What do you make of the frequency peaks at multiples of ~555 Hz? They show up really clearly in the bursts in test_codec_state.wav but are also present in the phrase spectrum above. Could they be due to DCT or vector quantization steps?
I also noticed each tone burst is about 14 ms longer than the corresponding multiple of 16 ms (one 32-byte frame per 16 ms); e.g. one repeat is 30 ms, 2 repeats are 46 ms, 4 repeats are 79 ms. So the decoder must be looking ahead by about 14-16 ms (roughly one frame, ~28-32 bytes) in order to smooth transitions?
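A quick sketch of the arithmetic above, using only the burst lengths quoted in this post and the frame rate of 32 bytes per 16 ms (2 bytes of encoded data per ms). The helper name `excess_ms` is just for illustration.

```python
# Burst durations (ms) quoted above, keyed by the number of repeated frames.
measured_ms = {1: 30, 2: 46, 4: 79}

FRAME_MS = 16                  # one 32-byte frame decodes to 16 ms of audio
BYTES_PER_MS = 32 / FRAME_MS   # 2 bytes of encoded data per ms


def excess_ms(repeats, duration_ms):
    """How much longer a burst is than repeats * 16 ms."""
    return duration_ms - repeats * FRAME_MS


for n, d in sorted(measured_ms.items()):
    extra = excess_ms(n, d)
    print(f"{n} frame(s): {d} ms, excess {extra} ms ~ {extra * BYTES_PER_MS:.0f} bytes")
```

At 2 bytes per ms, the 14-15 ms excess corresponds to roughly 28-30 bytes of encoded data, i.e. nearly one whole frame.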

Original data is very likely 8 kHz/16-bit; that's the format required by both utilities from the voice tools.

Which two utilities do you mean specifically? I ended up with the following executables:
SACM2000.exe (v1.1)
sacm2000.exe (v1.2.5)
scfmV32.exe
sacm.exe
cel.exe
CmpTool.exe
ADPEN.exe

Oh, also: if you see another eBay FPGA board that's compatible with your code, I'm interested in acquiring one!

sevik

I haven't analyzed the spectrum data thoroughly... so these 555 Hz peaks could be anything :))

Try a different window function; it could be a ghost rather than a real signal in the original sample...
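To illustrate why the window choice matters, here is a small pure-Python sketch (not how Audacity computes its spectrum) that compares a rectangular window with a Hann window on a synthetic tone falling between DFT bins. With the rectangular window, energy leaks into bins far from the tone, which is the kind of "ghost" a tapered window suppresses. The tone frequency and the probed bin are arbitrary choices for the demo.

```python
import cmath
import math


def dft_magnitude(x, k):
    """Magnitude of DFT bin k of the sequence x (plain O(N) sum)."""
    n = len(x)
    return abs(sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n)))


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


N = 128  # one 16 ms frame at 8 kHz
# synthetic tone at 10.5 cycles per frame, i.e. halfway between two bins
tone = [math.sin(2 * math.pi * 10.5 * i / N) for i in range(N)]

rect_leak = dft_magnitude(tone, 40)
hann_leak = dft_magnitude([s * w for s, w in zip(tone, hann(N))], 40)
print(f"bin 40 leakage: rectangular {rect_leak:.3f}, Hann {hann_leak:.5f}")
```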

About the bursts, see my previous post :)) It seems that each output frame is a linear combination of the previous and current decoded frames:

output(t) = decode(t-16ms)*(16ms - t%16ms)/16ms + decode(t)*(t%16ms)/16ms

Let's assume 8 samples per decoded frame (really 128, i.e. 16 ms at 8 kHz).

silence data: [0, 0, 0, 0, 0, 0, 0, 0]
signal data: [5, 0,-5, 0, 5, 0,-5, 0]

current frame multipliers:  [0, 1/8, 2/8, 3/8, 4/8, 5/8, 6/8, 7/8]
previous frame multipliers: [8/8, 7/8, 6/8, 5/8, 4/8, 3/8, 2/8, 1/8]

So for the sequence [silence, data, silence] we get 2 non-silent frames (values below are in units of 1/8):

[0,0,0,0,0,0,0,0] (silence decreasing + silence increasing)
[0*8 + 5*0, 0*7 + 0*1, 0*6 + -5*2, ... ] (silence decreasing + data increasing)
[5*8 + 0*0, 0*7 + 0*1, ...] (data decreasing + silence increasing)
[0,0,0,0,0,0,0,0] (silence decreasing + silence increasing)

For the sequence [silence, data, data, silence] we get 3 non-silent frames:
silence + silence
silence + data
data + data
data + silence

So in the result we have 4 types of frames:
silence
increasing data (first)
original data (internal)
decreasing data (last)

Each burst of n data frames therefore consists of one first frame, n-1 internal frames and one last frame.

When the data frames change, we will never see the original data in the output, only its combinations with other frames.

According to this, applying frequency analysis to the full sample is not practical :)) Real frequency-domain data is meaningful only for the internal (original) frames.

And on the first and last frames we will get ghosts...
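A minimal sketch of the crossfade model described above, using the same toy 8-sample frames. It assumes decoding starts from silence and that the blend is exactly output = (prev*(L-i) + cur*i)/L per sample i; the function names are hypothetical.

```python
def crossfade(decoded_frames, frame_len):
    """Each output frame ramps the previous decoded frame down
    (weights L/L .. 1/L) while ramping the current one up (0/L .. (L-1)/L)."""
    prev = [0] * frame_len  # assumption: decoding starts from silence
    out = []
    for cur in decoded_frames:
        out.append([(prev[i] * (frame_len - i) + cur[i] * i) / frame_len
                    for i in range(frame_len)])
        prev = cur
    return out


L = 8  # toy frame length from the example (really 128)
silence = [0] * L
data = [5, 0, -5, 0, 5, 0, -5, 0]


def burst(n):
    """Output for [silence, silence, data * n, silence, silence]."""
    return crossfade([silence] * 2 + [data] * n + [silence] * 2, L)


for n in range(1, 4):
    frames = burst(n)
    non_silent = sum(any(v != 0 for v in f) for f in frames)
    print(n, "data frame(s) ->", non_silent, "non-silent output frames")

print(burst(1)[2])  # silence decreasing + data increasing
print(burst(1)[3])  # data decreasing + silence increasing
```

Under this model, n data frames give n+1 non-silent output frames (about 16*(n+1) ms of sound), and an internal data+data frame reproduces the original data exactly.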

sevik

About the utilities: I haven't used them for a long time and I have only 2:

sacm2000.exe 266240 bytes
s485372c.exe 45056 bytes

Possibly I deleted the FM-related utilities, or didn't open all the archives...

sevik

About FPGA boards.

There are two ways :))

You can get any prototyping board with a big enough FPGA and memory, and you will get docs, tools and schematics in the package.

But for a really big FPGA it costs a lot :))

A prototyping board with a 500k Spartan costs ~$300; one with a 1.6M Virtex-II, ~$1500.

I'm going another way :))

I look for used boards with big FPGAs, which are usually sold for gold recovery or for desoldering the chips.

And then I try to get them working :)) So it's not a guaranteed-result deal :)) I bought 2 for exactly this reason: it's possible that the first will die in the first 5 minutes :))

But I hope to have eight 2M Virtexes for $200 including shipping :) (It still hasn't arrived :)) )

http://cgi.ebay.com/ws/eBayISAPI.dll?ViewItem&item=370049432698

As for the VHDL code: there is independent code that will work anywhere (CPU, SPI sniffer, SD reader), and very specific code, like the DRAM controller and the top level with pin assignments.

If you get a board with soldered RAM, you'll need to interface with exactly that RAM type.

In my code the memory controller interface is simple enough, so reimplementing it for another memory type is not a hard task.

sevik

Heh, installed setup.exe from the voice tools and found all your utilities :))

Now trying to encode with them :))

sevik

Tried all these utilities; only the sacm2000 output has a framed structure similar to the Robopanda files...

sevik

Heh :))

Used Audacity some more :))

It seems there is something interesting in spectrum mode... :)) (set the FFT size to 1024 in the settings)

The mirroring around 4 kHz is a product of the 8 kHz sample rate.
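Assuming the panda's DAC output is not sharply low-pass filtered, a tone at f Hz produced at an 8 kHz sample rate also appears at k*8000 ± f, so in a 48 kHz capture the spectrum looks mirrored around 4 kHz (and 12 kHz, 20 kHz). A small sketch that lists those image frequencies; `image_frequencies` is a hypothetical helper name.

```python
def image_frequencies(f, fs=8000, limit=24000):
    """Frequencies (Hz) at which a tone f sampled at fs reappears as
    k*fs - f and k*fs + f, up to `limit` (24 kHz for a 48 kHz capture)."""
    out = set()
    k = 0
    while k * fs - f <= limit:
        for g in (k * fs - f, k * fs + f):
            if 0 < g <= limit:
                out.add(g)
        k += 1
    return sorted(out)


print(image_frequencies(1000))  # 1000 Hz is mirrored to 7000 Hz around 4 kHz
```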

sevik

It seems this resource could be helpful for our current problem :))

http://www.hydrogenaudio.org/forums/index.php?act=idx

milw

Are you using the newest version of Audacity? I just got the beta (v1.2?) the other day. I tried several window sizes for the spectra and still see the harmonics of ~555 Hz; those are probably the horizontal bands in your first FFT image?
This link from that forum may also be helpful (downloadable book): http://www.dspguide.com/ (or maybe not so great, now that I've looked at it more closely)
Will you be trying to get someone from that forum to take a look over here?

sevik

Heh :)) One more test done:

test_bits.wav

test_bits.aud

and the spectrum image:

In this chunk I flip one bit of the magic sequence, a different bit every 10 frames.

It seems that the first 16 nibbles encode the volume of each of the 16 subbands...
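A sketch of reading and writing those 16 volume nibbles. The helper names are hypothetical, and the nibble order is an assumption borrowed from the generator code later in the thread (`bnum, shift = divmod(i,2)`; `shift *= 4`): band 2k is taken as the low nibble and band 2k+1 as the high nibble of byte k. The example frame is frame1 from the generator script.

```python
def band_volumes(frame):
    """Hypothetical helper: 16 main band volumes from the first 8 bytes of a
    32-byte frame, assuming band 2k = low nibble, band 2k+1 = high nibble."""
    if len(frame) != 32:
        raise ValueError("expected a 32-byte frame")
    return [(frame[band // 2] >> (4 * (band % 2))) & 0xF for band in range(16)]


def set_band_volume(frame, band, volume):
    """Hypothetical helper: write a 4-bit volume for one band in place."""
    byte, shift = divmod(band, 2)
    shift *= 4
    frame[byte] = (frame[byte] & ~(0xF << shift) & 0xFF) | ((volume & 0xF) << shift)


frame1 = bytearray.fromhex(
    "71 7B 5B 89 B9 EC DC DB 70 91 CF 68 B8 D8 82 F2 "
    "B4 92 32 39 52 0A C1 2E 03 A7 DB 06 42 73 D2 D8")
print(band_volumes(frame1))

blank = bytearray(32)
set_band_volume(blank, 3, 14)  # band 3 -> high nibble of byte 1
print(blank[:2].hex())
```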

milw

And 16 x ~500 Hz = 8 kHz; it all works out so neatly!
Just so I'm clear, you use the term 'frame' to refer to the 32 bytes taken at each read, correct?

sevik

Frame: yes, I mean 32 bytes (for codec 7).

sevik

As for 16 x 500 = 8000: it's not so.

With 8 kHz sampling, the maximum frequency is 4 kHz.

I have less than 1 hour left for experimenting today :)) So I'll try to quickly acquire as many samples as possible, so you can analyze them thoroughly :)))

sevik

We need to find the band frequencies, volume levels and anything else you can find :))

sevik

And the last one for today :)) Have a good night :))

test_bits_for_band.wav (big file: 43M)

test_bits_for_band.aud

This chunk is similar to test_bits, but covers the remaining bits of each band.

sevik

Code used to generate each test:


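#!/usr/bin/env python3

import sys
import random

def pack(data):
    # "07 80" -> [0x07, 0x80]: whitespace-separated hex bytes to a list of ints
    return [int(a, 16) for a in data.split()]

def pack2(data):
    # list of byte values -> raw bytes for writing to the .aud file
    return bytes(data)

hdr = pack2(pack("07 80"))         # chunk header
foot = pack2(pack("FF FF 00 00"))  # chunk footer

# "magic string" frame that decodes to silence (32 bytes)
silence = pack("00 00 00 00 00 00 00 00 18 11 11 11 18 11 11 11 18 11 11 11 18 11 55 18 11 55 58 88 88 88 88 88")
# a real data frame (32 bytes)
frame1 = pack("71 7B 5B 89 B9 EC DC DB 70 91 CF 68 B8 D8 82 F2 B4 92 32 39 52 0A C1 2E 03 A7 DB 06 42 73 D2 D8")

# one data frame surrounded by silence
data = silence*10 + frame1 + silence*10

#test_bits: flip one bit of the magic frame at a time
if 0:
    for i in range(len(silence)):       # byte index within the frame
        for j in range(8):              # bit index within the byte
            test_frame = silence[:]
            mask = 1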
sevik

Looking at test_bits_for_band, it seems there is some aliasing of the same bits across different bands.

But training.png shows the same situation: the central frequencies of several bands move together.

milw

Here are the main band frequencies (Hz) from the test_volumes.wav file; each band has two major peaks separated by ~160 Hz.

(all values in Hz)

band   peak1   peak2   delta
0      120     -       -
1      252     -       -
2      497     692     195
3      813     987     174
4      998     1185    187
5      1244    1434    190
6      1558    1699    141
7      1808    1933    125
8      2050    2185    135
9      2272    2431    159
10     2558    2682    124
11     2745    2891    146
12     3044    3181    137
13     3245    3379    134
14     3496    3651    155
15     3743    3961    218

avg delta: 159
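To recompute the delta column and its average from the measured peaks (numbers copied from the table; nothing new is measured here):

```python
# Peak pairs (Hz) for bands 2-15 from the table; bands 0 and 1 show a single peak.
peaks = {
    2: (497, 692), 3: (813, 987), 4: (998, 1185), 5: (1244, 1434),
    6: (1558, 1699), 7: (1808, 1933), 8: (2050, 2185), 9: (2272, 2431),
    10: (2558, 2682), 11: (2745, 2891), 12: (3044, 3181), 13: (3245, 3379),
    14: (3496, 3651), 15: (3743, 3961),
}

# delta = peak2 - peak1 for each band
deltas = {band: p2 - p1 for band, (p1, p2) in peaks.items()}
avg_delta = sum(deltas.values()) / len(deltas)

print(deltas)
print(f"avg delta: {avg_delta:.1f} Hz")
```

The average comes out at 158.6 Hz, which rounds to the 159 Hz in the table.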

And a plot:

milw

Sevik, do you think you could capture test_volumes and test_bits_for_band again with a higher gain (especially test_bits; the amplitude is quite low)? I can only see the 10 or so highest volume levels; the rest are too low (maybe it's a log volume scale). Lots of data to chew on here!

milw

By 'aliasing of the same bits for different bands', do you mean that the same bit doesn't affect different bands the same way, or (maybe the same thing) that different bits cause similar changes in each band? It looks like some of these bits affect the volume of sub-band frequencies. Maybe we should call the first 16 nibbles the main bands, and these the 'sub-bands'?

sevik

The two peaks for each band are really caused by the magic string. It seems to carry some default values for the band coefficients, which get masked by the 0 volume.

I have recaptured test_volumes and test_bits_for_band using a string of 0s as silence, and got clean sine waves at frequencies of 250*n Hz. Yes, band 0 is 0 Hz :)).
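The 250*n result fits 16 bands spread over 0-4 kHz, the Nyquist range for 8 kHz sampling discussed above. A small sketch of that arithmetic; the constant names are hypothetical, and the cycles-per-frame figure is only a side observation (each band tone fits 4n whole periods into one 128-sample, 16 ms frame).

```python
SAMPLE_RATE = 8000                    # Hz, decoder output rate
NYQUIST = SAMPLE_RATE // 2            # 4000 Hz, highest representable frequency
NUM_BANDS = 16
BAND_SPACING = NYQUIST // NUM_BANDS   # 250 Hz
FRAME_SAMPLES = 128                   # 16 ms at 8 kHz


def band_frequency(n):
    """Tone frequency (Hz) observed for band n with an all-zero background frame."""
    return n * BAND_SPACING


def cycles_per_frame(n):
    """Number of sine periods of band n's tone in one 128-sample frame."""
    return band_frequency(n) * FRAME_SAMPLES / SAMPLE_RATE


for n in (0, 1, 8, 15):
    print(n, band_frequency(n), "Hz", cycles_per_frame(n), "cycles/frame")
```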

By aliasing, I mean that the same bits affect different bands. See bands 4 and 5, for example: they share the same bits for the central frequency offset. (Update: that was an error; there is actually no aliasing.)

About the exact values of the low volume levels: I don't think they're too important, since we should be able to guess the expression from the top values. It's very likely logarithmic.
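One way to test the "very likely logarithmic" guess from the top levels only: fit a straight line to 20*log10(amplitude) against the volume level; a good fit with a constant slope (dB per step) means a logarithmic scale. A minimal least-squares sketch; the amplitudes below are made up (exactly 3 dB per step) purely to show the call, and would be replaced by peak heights read from the captures.

```python
import math


def db_per_step(levels, amplitudes):
    """Least-squares slope of 20*log10(amplitude) versus volume level."""
    ys = [20 * math.log10(a) for a in amplitudes]
    n = len(levels)
    mean_x = sum(levels) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(levels, ys))
    den = sum((x - mean_x) ** 2 for x in levels)
    return num / den


# Made-up amplitudes for levels 6..15 following exactly 3 dB per step.
levels = list(range(6, 16))
amps = [10 ** (3 * v / 20) for v in levels]
print(round(db_per_step(levels, amps), 3))
```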

sevik

As for the meaning of the bits, it seems there are several groups for each band:

 At the start of the frame, right after the main volumes:
  1 bit: inverts the main tone
  3 bits: frequency offset of the main tone
  7 groups of additional tones with fixed offsets and phases:
    4 bits: volume of the additional tone

 At the end of the frame, counting backwards:
  4 bits for something not yet identified

For the first 4 bands there are distinct groups of bits for each band; for the later bands, some bit groups are shared between bands.
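The per-band layout listed above adds up to 1 + 3 + 7*4 = 32 bits, exactly 4 bytes, which is the size of the "18 11 11 11" groups that bands 0-2 occupy in the magic frame. A sketch of a reader for one such group. Both the nibble order (low nibble first) and the placement of the invert flag in the top bit of the first nibble are assumptions, and `parse_band_group` is a hypothetical name.

```python
def parse_band_group(group_bytes):
    """Hypothetical reader for a full 32-bit band group (bands 0-2):
    1 bit invert + 3 bits frequency offset, then 7 x 4-bit extra-tone volumes.
    Assumes nibbles are read low nibble first, and that in the first nibble
    the top bit is the invert flag and the low 3 bits are the offset."""
    if len(group_bytes) != 4:
        raise ValueError("expected 4 bytes (32 bits)")
    nibbles = []
    for b in group_bytes:
        nibbles += [b & 0xF, b >> 4]
    head, tones = nibbles[0], nibbles[1:]
    return {"invert": head >> 3, "offset": head & 0x7, "tone_volumes": tones}


print(parse_band_group(bytes.fromhex("18 11 11 11")))
```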

sevik

Heh, this explains almost everything!
Don't listen to it in public places :))

test_bits_for_all_bands.wav

test_bits_for_all_bands.aud

Generator code:

silence = pack("00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00")

#test_bits_for_all_bands
if 1:
    test_frame_base = silence[:]
    for i in range(16): # band
        # set volume for band
        bnum, shift = divmod(i,2)
        shift *= 4
        test_frame_base[bnum] |= 14 
sevik
sevik's picture

Magic frame decomposition:

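import re

def pack_annotated(text):
    # drop "#..." comments, then join all remaining hex digits; single-nibble
    # entries such as "8" pair up with their neighbours into whole bytes
    digits = "".join(re.sub(r"#[^\n]*", "", text).split())
    return list(bytes.fromhex(digits))

silence = pack_annotated("""
00 00 00 00 00 00 00 00 #main band volumes
18 11 11 11 #0 band
18 11 11 11 #1
18 11 11 11 #2
18 11 55 #3
18 11 55 #4
58 #5
8 #6
8 #7
8 #8
8 #9
8 #10
8 #11
8 #12
8 #13
8 #14
8 #15
""")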
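A sketch that splits the whole 32-byte magic frame into the per-band groups shown in the decomposition. The per-band nibble counts are read off the decomposition (16 volume nibbles, then 8, 8, 8, 6, 6, 2 and ten single nibbles, 64 in total). Reading each byte low nibble first is an assumption; under it, every band's group starts with the same nibble 8, which is at least consistent with a per-band header nibble. All names are hypothetical.

```python
MAGIC = bytes.fromhex(
    "00 00 00 00 00 00 00 00 18 11 11 11 18 11 11 11 18 11 11 11 "
    "18 11 55 18 11 55 58 88 88 88 88 88")

# Nibbles used by each band (0-15) after the 16 main-volume nibbles.
BAND_NIBBLES = [8, 8, 8, 6, 6, 2] + [1] * 10


def nibbles_low_first(data):
    """Split bytes into 4-bit values, low nibble first (an assumption)."""
    out = []
    for b in data:
        out += [b & 0xF, b >> 4]
    return out


def split_frame(frame):
    """Return (16 main volumes, list of 16 per-band nibble groups)."""
    nib = nibbles_low_first(frame)
    if len(nib) != 16 + sum(BAND_NIBBLES):
        raise ValueError("expected a 32-byte frame")
    volumes, rest, groups, pos = nib[:16], nib[16:], [], 0
    for count in BAND_NIBBLES:
        groups.append(rest[pos:pos + count])
        pos += count
    return volumes, groups


volumes, groups = split_frame(MAGIC)
for band, g in enumerate(groups):
    print(band, g)
```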
