Song captured: http://sevik.org/robopanda/00710.wav
Let's feed it with garbage :))
Made a test chunk with a lot of magic-string (silence) frames and sequences of 1, 2, 3, 4, 5 and many instances of one frame from chunk 837.
For different numbers of frames, it seems the first, last and all internal frames are the same.
A sample with only one data frame consists of a first and a last frame, without internal frames.
Update: captured again at a 48 kHz sample rate for a better match with the 8 kHz sampling
Let's feed it with garbage :))
great minds think alike :P that would be my very next request!
What do you make of the frequency peaks at multiples of ~555 Hz? It shows up really clearly in the bursts in test_codec_state.wav but is also present in the phrase spectrum above. Would that be due to DCT or vector quantization steps?
Also noticed each tone burst is about 14 milliseconds longer than the multiple of 32 bytes per 16 msec; e.g. one repeat is 30 msec, 2 is 46 msec, 4 is 79 msec. So the decoder must be looking ahead by about 14-16 bytes in order to smooth transitions?
Original data is very likely 8kHz/16 bit - it's the format required by both utilities from voice tools.
which two utilities do you mean specifically? I ended up with the following executables:
oh also, if you see another ebay fpga board that's compatible with your code, I'm interested to acquire one!
I haven't analyzed the spectrum data thoroughly... So these 555 Hz peaks can be anything :))
Try some other window function - it could be a ghost and not a real signal in the original sample...
About the bursts - see my previous post :)) It seems that each output frame is a linear combination of the previous and the current decoded frame:
output(t) = decode(t-16ms)*(16ms - t%16ms)/16ms + decode(t)*(t%16ms)/16ms
Let's assume 8 samples for one decoded frame (really 128)
silence data: [0, 0, 0, 0, 0, 0, 0, 0]
signal data: [5, 0,-5, 0, 5, 0,-5, 0]
current frame multipliers: [0, 1/8, 2/8, 3/8, 4/8, 5/8, 6/8, 7/8]
previous frame multipliers: [8/8, 7/8, 6/8, 5/8, 4/8, 3/8, 2/8, 1/8]
so for a sequence of [silence, data, silence] we get 2 non-silent frames:
[0,0,0,0,0,0,0,0] (silence decreasing + silence increasing)
[0*8 + 5*0, 0*7 + 0*1, 0*6 + -5*2, ... ] (silence decreasing + data increasing)
[5*8 + 0*0, 0*7 + 0*1, ...] (data decreasing + silence increasing)
[0,0,0,0,0,0,0,0] (silence decreasing + silence increasing)
for a sequence of [silence, data, data, silence] we get 3 non-silent frames:
silence + silence
silence + data
data + data
data + silence
and as a result we have 4 types of frames (silence, plus these three):
increasing data (first)
original data (internal)
decreasing data (last)
and each burst consists of one first frame, n-1 internal frames and one last frame.
For changing data frames we will never see the original data in the output - only its combinations with other frames.
According to this, applying frequency analysis to the full sample is not practical :)) Real freq domain data makes sense only on internal (original) frames.
And on the first and last frames we will get ghosts...
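The blending rule above can be checked with a few lines of Python 3. This is only a sketch of the hypothesis from this post, not the real decoder: the helper names `crossfade` and `count_audible` are made up for illustration, and it assumes the decoder starts from silence and uses plain linear ramps over an 8-sample frame.

```python
def crossfade(decoded_frames):
    """Blend every decoded frame with the one before it using linear ramps.

    Hypothetical helper: the previous frame fades out (n/n ... 1/n) while
    the current frame fades in (0/n ... (n-1)/n), as in the 8-sample example.
    """
    n = len(decoded_frames[0])
    prev = [0] * n  # assumption: the decoder starts from silence
    out = []
    for cur in decoded_frames:
        out.append([prev[k] * (n - k) / n + cur[k] * k / n for k in range(n)])
        prev = cur
    return out


def count_audible(frames):
    """Number of output frames containing at least one non-zero sample."""
    return sum(1 for f in frames if any(f))


silence = [0] * 8
signal = [5, 0, -5, 0, 5, 0, -5, 0]

one = crossfade([silence, signal, silence, silence])
two = crossfade([silence, signal, signal, silence, silence])

print(one[1])  # fade-in frame (silence decreasing + data increasing)
print(one[2])  # fade-out frame (data decreasing + silence increasing)
print(count_audible(one), count_audible(two))
```

Under this model n data frames always give n+1 non-silent output frames, so a burst should come out roughly one frame (16 ms) longer than its data, which is in the same range as the ~14 ms excess measured on the captures.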
About utilities - I haven't used them for a long time and I have only 2:
sacm2000.exe 266240 bytes
s485372c.exe 45056 bytes
Possibly I have deleted the FM-related utilities, and haven't opened all the archives...
About fpga boards.
There are two ways :))
You can get any prototyping board with a big enough FPGA and memory, and you will get docs, tools and schematics in the package.
But for a really big FPGA it costs a lot :))
A prototyping board for a 500k Spartan costs ~$300, for a 1.6M Virtex2 ~$1500.
I'm going another way :))
I look for used boards with big FPGAs, which are usually sold for their gold or for desoldering the chips.
And I try to get them working :)) So it's not a deal with a guaranteed result :)) I have bought 2 for exactly this reason - it's possible that the first will die in the first 5 minutes :))
But I hope to have 8 two-million Virtexes for $200 including shipping :) (They still haven't arrived :)) )
As for the VHDL code - there is independent code which will work anywhere (cpu, spi sniffer, sd reader), and very specific code, like the DRAM controller and the toplevel with pin assignments.
If you get a board with soldered RAM, you'll need to interface with exactly that RAM type.
In my code the interface of the memory controller is simple enough, so reimplementing it for another memory type is not a hard task.
Heh, installed setup.exe from voice tools - all your utilities are found :))
Now trying to encode using them :))
Tried all these utilities; only the sacm2000 output has a framed structure similar to the robopanda files...
Used Audacity some more :))
It seems there is something interesting in spectrum mode... :)) (set the FFT size to 1024 in settings)
The mirroring around 4 kHz is a product of the 8 kHz sample rate.
It seems this resource can be helpful for our current problem :))
Are you using the newest version of audacity? I just got the beta (v1.2?) the other day. I tried several window sizes for the spectra and still see the harmonics of ~555 Hz; that's probably the horizontal bands in your first FFT image?
This link from that forum may also be helpful (downloadable book) http://www.dspguide.com/ (or maybe not so great, now that i've looked at it closer)
Will you be trying to get someone from that forum to take a look over here?
heh :)) one more test done:
and spectra image:
In this chunk I change 1 bit of the magic sequence every 10 frames.
It seems that the first 16 nibbles encode the volume of each of 16 subbands...
and 16 x ~500 Hz = 8kHz, it all works out so neatly!
Just so I'm clear, you use the term 'frame' to refer to the 32 bytes taken at each read, correct?
Yes, it's me again :))
Frame - yes, I mean 32 bytes (for codec 7)
As for 16x500 = 8000, it's not so:
for 8 kHz sampling the max frequency is 4 kHz.
I have less than 1 hour left for experimenting today :)) So I'll try to quickly acquire the max number of samples, so you can analyze them thoroughly :)))
We need to find the band freqs, volume levels and anything else you can find :))
And the last one for today :)) Have a good night :))
test_bits_for_band.wav (big file: 43M)
This chunk is similar to test_bits, but for the remaining bits for each of the bands.
Code used for generation of each test:
def pack(data):
    # parse a string of space-separated hex bytes into a list of ints
    return [int(a,16) for a in data.split()]

def pack2(data):
    # turn a list of byte values into a raw byte string (Python 2 str)
    return "".join([chr(a) for a in data])

hdr = pack2(pack("07 80"))
foot = pack2(pack("FF FF 00 00"))
silence = pack("00 00 00 00 00 00 00 00 18 11 11 11 18 11 11 11 18 11 11 11 18 11 55 18 11 55 58 88 88 88 88 88")
frame1 = pack("71 7B 5B 89 B9 EC DC DB 70 91 CF 68 B8 D8 82 F2 B4 92 32 39 52 0A C1 2E 03 A7 DB 06 42 73 D2 D8")
data = silence*10 + frame1 + silence*10
for i in range(len(silence)):    # every byte of the frame
    for j in range(8):           # every bit of that byte
        test_frame = silence[:]
        mask = 1 << j
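The loop in the post stops partway through, so here is a minimal Python 3 sketch of how such a one-bit sweep could be generated, based on the description "change 1 bit of the magic sequence every 10 frames". The names `parse_hex` and `bit_sweep` are made up for illustration, and flipping the bit with XOR is an assumption (the original may have set or cleared it instead). How the header and footer wrap the data is not shown in the post, so that part is left out.

```python
def parse_hex(text):
    """Parse space-separated hex bytes into a list of ints (same job as pack in the post)."""
    return [int(tok, 16) for tok in text.split()]


# The "magic" silence frame from the post, 32 bytes.
SILENCE = parse_hex(
    "00 00 00 00 00 00 00 00 18 11 11 11 18 11 11 11 "
    "18 11 11 11 18 11 55 18 11 55 58 88 88 88 88 88"
)


def bit_sweep(base, repeats=10):
    """For every bit of every byte, emit `repeats` copies of base with that bit flipped.

    Returns one flat list of byte values.
    """
    out = []
    for i in range(len(base)):        # every byte of the frame
        for j in range(8):            # every bit of that byte
            frame = base[:]
            frame[i] ^= 1 << j        # assumption: flip the bit
            out.extend(frame * repeats)
    return out


payload = bit_sweep(SILENCE)
print(len(payload))  # 32 bytes * 8 bits * 10 repeats * 32 bytes per frame
```

In the capture, each group of 10 identical frames then lasts 160 ms, which makes it easy to match a change in the spectrum to a byte and bit index.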
Looking at test_bits_for_band - it seems there is some aliasing of the same bits for different bands.
But looking at training.png, there is the same situation - the central frequencies of several bands move together.
Here's the main band frequencies (Hz) from the test_volumes.wav file, each band has two major peaks separated by ~160 Hz.
and a plot
sevik, do you think you could capture test_volumes and test_bits_for_band again, with a higher gain (esp. for test_bits, the amplitude is quite low)? I can only see the 10 or so highest volume levels, the rest are too low (maybe it's a log volume scale). Lots of data to chew on here!
by 'aliasing of same bits for different bands' do you mean how the same bit is not affecting different bands the same way, or (maybe the same thing) different bits cause similar changes in each band? It looks like some of these bits are affecting the volume of sub-band frequencies. Maybe we should refer to the first 16 nibbles as the main bands, and these as 'sub-bands'?
The two peaks for each band are really caused by the magic string. It seems it has some default values for the band coefficients, which get masked by 0 volume.
I have recaptured test_volumes and test_bits_for_band using a string of 0s for silence - and got clear sine waves with 250*n frequencies. Yes - band 0 is 0 freq :)).
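The 250*n result fits the earlier point about the 4 kHz limit. A small arithmetic sketch in Python 3, assuming the 16 bands are spaced evenly from 0 Hz up to half the 8 kHz sample rate (the constant names are only for illustration):

```python
SAMPLE_RATE_HZ = 8000   # sample rate discussed in the thread
NUM_BANDS = 16          # one volume nibble per band

nyquist_hz = SAMPLE_RATE_HZ / 2            # highest representable frequency
band_spacing_hz = nyquist_hz / NUM_BANDS   # assumption: evenly spaced bands
band_freqs_hz = [band_spacing_hz * n for n in range(NUM_BANDS)]

print(band_spacing_hz)
print(band_freqs_hz)
```

So 16 bands times 250 Hz covers 0 to 4 kHz, not 0 to 8 kHz, and band 0 sits at 0 Hz as observed.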
By aliasing I mean that the same bits affect different bands. See bands 4 and 5 for example - they share the same bits for the central freq offset. (Update: it was an error, there is no aliasing really.)
About exact values of the low volume levels - I don't think it's too important; we should be able to guess the expression based on the top values. It's very likely logarithmic.
As for the meaning of the bits, it seems there are several groups for each band:
at the start of the frame, starting after the main volumes:
  1 bit: inversion of the main tone
  3 bits: freq offset of the main tone
  7 groups of additional tones with fixed offsets and phases:
    4 bits: volume of the additional tone
at the end of the frame, counting backwards:
  4 bits for something
For the first 4 bands each band has its own groups of bits; for the next bands some bit groups are shared between bands.
Recaptured with an all-zero silence frame
test_bits_for_band_0.wav (big file: 43M)
Heh, this explains almost all!
Don't listen to it in public places :))
silence = pack("00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00")
test_frame_base = silence[:]
for i in range(16): # band
    # set volume for band: two bands per byte, one nibble each
    bnum, shift = divmod(i,2)
    shift *= 4
    test_frame_base[bnum] |= 14 << shift
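To make the nibble addressing easier to play with, here is a small Python 3 sketch with a setter and a reader for the 16 main-band volumes. The helper names are hypothetical, and the nibble order (even bands in the low nibble, odd bands in the high nibble) is an assumption taken from the `divmod(i,2)` line in the snippet above, not something confirmed on the device.

```python
FRAME_SIZE = 32   # bytes per frame, as used throughout the thread
NUM_BANDS = 16    # one 4-bit volume per band in the first 8 bytes


def set_band_volume(frame, band, level):
    """Return a copy of frame with the volume nibble of `band` set to `level` (0..15)."""
    if not 0 <= band < NUM_BANDS or not 0 <= level <= 15:
        raise ValueError("band must be 0..15 and level 0..15")
    byte_index, odd = divmod(band, 2)
    shift = odd * 4  # assumption: even band -> low nibble, odd band -> high nibble
    out = list(frame)
    out[byte_index] = (out[byte_index] & ~(0xF << shift) & 0xFF) | (level << shift)
    return out


def band_volumes(frame):
    """Read back the 16 volume nibbles from the first 8 bytes of a frame."""
    return [(frame[b // 2] >> ((b % 2) * 4)) & 0xF for b in range(NUM_BANDS)]


zero_frame = [0] * FRAME_SIZE
# One test frame per band, each with only that band's volume set to 14.
per_band = [set_band_volume(zero_frame, band, 14) for band in range(NUM_BANDS)]

print(band_volumes(per_band[5]))
```

Unlike the loop in the post, which keeps adding bands to the same frame, this builds a separate frame per band, so each tone can be captured on its own.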
magic frame decomposition:
silence = pack("""
00 00 00 00 00 00 00 00 #main band volumes
18 11 11 11 #0 band
18 11 11 11 #1
18 11 11 11 #2
18 11 55 #3
18 11 55 #4