Robopanda Hacks and mods

milw

Nocturnal, do you have a high res image of the panda cpu showing all the pin names? The corner I can see in your article has a good match to the pins of the SunPlus GPCE (here's one datasheet link). My apologies if you've already figured that out, my eyes get tired of reading orange on black!
ps also can you see the xtal frequency?

Nocturnal

Close, that's actually almost identical to the one I think it probably is; you're just missing the ICE port and a little bit of memory.

It's actually a pity that the resolution of the photos in the FCC filings is so low, otherwise we would be able to read the markings on the chip. Unfortunately, as they are, they're a little too blurry.

The gallery limits images to 900px, so it's no surprise you can't read the labeling. I have a photo of the top and bottom. If you want better quality and more detail I'll have to pull out the scanner.

I have also made available copies of the datasheets for the probable cpu, probable SPI ROM, bus extenders, possible SND01A chip, and the eeprom here. The unlabeled epoxy chip is probably an audio amplifier, but I haven't confirmed that.

Oh, I almost forgot, I can't quite read the labeling on the crystal (I'll need to pull out a magnifying glass), but it looks like a standard 32.768 kHz watch crystal.

BTW, have you ever heard of SACM_S200?

sevik

I found a set of audio utilities for SunPlus somewhere on Chinese sites.

I have encoded sample files, but they do not look similar (none of them has 32/48 byte frames).

I'll post the sample files and utilities this evening when I'm home.

Found it again here: http://211.64.32.2/dzxywlx/shiyanshi/index0/pic/faming/

Nocturnal

*Nods* I found that as well. The difference would be that SACM_S480 is a much higher quality codec.

sevik

For codec 07, 32 bytes are read every 12.5 ms, which is 20 kbit/s...

32/0.0125*8=20480

Codec 09: 48 bytes every 16 ms, which is 24 kbit/s

48/0.016*8=24000

So it looks like the A2000 codec, but the byte patterns are very different...
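
The two calculations above are the same formula, bytes per read times 8 divided by the read period. A minimal sketch in Python (the function name is just illustrative, and the 12.5 ms and 16 ms periods are the figures used in the formulas above, not something measured independently):

```python
def bitrate_bps(frame_bytes, frame_period_s):
    """Bits per second for a stream that reads frame_bytes every frame_period_s."""
    return frame_bytes * 8 / frame_period_s

# Figures taken from the two formulas in this post.
print(round(bitrate_bps(32, 0.0125)))  # codec 07
print(round(bitrate_bps(48, 0.016)))   # codec 09
```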

milw

heh, after reading GPCE data sheets last night, I'd settled on the GPCE061AV11 as well.

milw

Nocturnal said:
...Oh, I almost forgot, I can't quite read the labeling on the crystal (I'll need to pull out a magnifying glass), but it looks like a standard 32.768khz watch crystal.
BTW, have you ever heard of SACM_S200?

That'd make sense, since the crystal pads are marked 'X32', which fits for a watchdog timer. Some of the datasheets make it look like the 'SACM' are voice synthesis algorithms rather than codecs. Sevik, which of the links on that page are you referring to for the utilities? Thx also for the links to images etc!

Nocturnal

Look near the bottom of the page, on the left hand side, in the box labeled " Voice TOOLS"

sevik

I have tried different codec values.

The sound for codecs 4, 5, 6 and 7 looks similar and noise-like, with increasing frequency.

The sound for codecs 8 and 9 is very different and sounds like random beeps.

Looking at the rates and frame sizes, codecs 4-7 could be 2-bit ADPCM or something like it, with 5, 6, 7 and 8 kHz sample rates.

And judging by the sound, codecs 8 and 9 are some kind of framed, bit-packed codec, something like mp3/lpc/celp/etc.

But the similar-looking 00 00 .. 88 sequences across very different codecs support the idea that the audio data is scrambled.

8000 254+34X reset          no sound
8001 254+34X reset          no sound
8002 254+34X reset          no sound
8003 254+34X reset          no sound

8004 254+128 20/16ms = 10k  sound
8005 254+132 24/16ms = 12k  sound
8006 254+136 28/16ms = 14k  sound
8007 254+140 32/16ms = 16k  sound

8008 254+148 40/16ms = 20k  another sound
8009 254+156 48/16ms = 24k  yet another sound

800A 254+34X reset          no sound
800B 254+34X reset          no sound
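
The rates in the table, and the 2-bit ADPCM guess for codecs 4-7, can be cross-checked with a few lines of Python. This is only a sketch: the frame sizes are copied from the table, the 16 ms period is the one the table uses, and the idea that a frame is nothing but fixed-width 2-bit samples is an assumption made purely for the arithmetic.

```python
FRAME_PERIOD_S = 0.016  # 16 ms per read, as in the table above

# Bytes read per frame for each header value, copied from the table above.
FRAME_BYTES = {0x8004: 20, 0x8005: 24, 0x8006: 28, 0x8007: 32, 0x8008: 40, 0x8009: 48}

def bitrate_bps(frame_bytes):
    """Bit rate implied by reading frame_bytes every FRAME_PERIOD_S."""
    return round(frame_bytes * 8 / FRAME_PERIOD_S)

def implied_sample_rate_hz(frame_bytes, bits_per_sample=2):
    """Sample rate if a frame held only fixed-width samples (an assumption)."""
    samples_per_frame = frame_bytes * 8 // bits_per_sample
    return round(samples_per_frame / FRAME_PERIOD_S)

for header, size in FRAME_BYTES.items():
    print(f"{header:04X}: {bitrate_bps(size)} bit/s, "
          f"{implied_sample_rate_hz(size)} Hz at 2 bits/sample")
```

For 8004 to 8007 this gives 5, 6, 7 and 8 kHz, which is where the 2-bit ADPCM guess comes from. The 8008 and 8009 rows do not continue that 4-byte step, which fits them being a different kind of codec.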

Update:
More experimenting with more regular data shows no principal difference between the different codecs...

It seems the difference depended on the frame size of the test audio and its relation to the tested codec's frame size (the test file was a 20k sample encoded with the SunPlus tools).

An all-zero data file produces silence with codecs 7 and 8 (others not tested), so the scrambling idea is not confirmed...

milw

A question re stringing together the audio chunks: are they read sequentially, or is there anything that indicates the sequence of chunks? Also, did you guys look at the 'SPCE061' section on that page? That seems to be a closer match than the GPCE061 imho. Not that knowing the exact mcu would help much, as I have painfully learned!

sevik

There is CPU bytecode for starting playback of chunk number X.

Code used for testing the player:

0020: 0004   push      #0004  // push 0004 to stack                                                                                       
0021: 291A   volume           // pop volume from stack                                                                                    
0022: 0002   push      #0002  // push 0002 to stack                                                                                      
0023: 2928   play             // pop audio index from stack and start play
       L0_1:                  // xrefs: 0026                                                                                               
0024: 0000   push      #0000  // push 0000 to stack                                                                                        
0025: 297E   drop             // pop and drop value from stack                                                                             
0026: E7FE   rjump     L0_1   // jump to 0024                                    

The push #2 at 0022 is the index of the audio chunk in the table at the start of the cartridge.

The index is 1-based; the table starts at 0005, with 3 bytes per chunk address.
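
The table lookup described above can be sketched like this. Two things are assumptions here and not stated in the post: that 0005 is a byte offset into the cartridge image, and that the 3-byte address is little-endian (swap the `byteorder` argument if a real image says otherwise). The function name is only illustrative.

```python
INDEX_TABLE_OFFSET = 0x0005  # table start, from the post above (byte offset assumed)
ENTRY_SIZE = 3               # bytes per chunk address, from the post above

def chunk_address(image: bytes, index: int, byteorder: str = "little") -> int:
    """Return the address stored for a 1-based audio chunk index.

    byteorder is an assumption; the thread does not state it.
    """
    if index < 1:
        raise ValueError("chunk index is 1-based")
    start = INDEX_TABLE_OFFSET + (index - 1) * ENTRY_SIZE
    entry = image[start:start + ENTRY_SIZE]
    if len(entry) != ENTRY_SIZE:
        raise ValueError("index entry lies outside the image")
    return int.from_bytes(entry, byteorder)

# Synthetic image for illustration only: 5 filler bytes, then two entries.
demo = bytes(5) + bytes([0x00, 0x01, 0x00]) + bytes([0x34, 0x12, 0x00])
print(hex(chunk_address(demo, 2)))
```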

sevik

I tried shifting the data 1 byte right, XORing with the magic string before and after; this produces a Darth Vader-like sound :))

Playing codec 7 audio data with codec 6 produces an R2-D2-like sound :))

Seems there is Star Wars inside :))

http://sevik.org/robopanda/test_codecs.wav

Sounds:
original sound
shifted 1 byte sound
original sound
codec 06
original sound
codec 08
original sound
codec 09

Code:

0020: 0004   push      #0004  // push 0004 to stack
0021: 291A   volume           // pop volume from stack
0022: 0001   push      #0001  // push 0001 to stack
0023: 2928   play             // pop audio index from stack and start play
0024: 2923   wait_play        // wait for end of play
0025: 0002   push      #0002  // push 0002 to stack
0026: 2928   play             // pop audio index from stack and start play
0027: 2923   wait_play        // wait for end of play
0028: 0001   push      #0001  // push 0001 to stack
0029: 2928   play             // pop audio index from stack and start play
002A: 2923   wait_play        // wait for end of play
002B: 0003   push      #0003  // push 0003 to stack
002C: 2928   play             // pop audio index from stack and start play
002D: 2923   wait_play        // wait for end of play
002E: 0001   push      #0001  // push 0001 to stack
002F: 2928   play             // pop audio index from stack and start play
0030: 2923   wait_play        // wait for end of play
0031: 0004   push      #0004  // push 0004 to stack
0032: 2928   play             // pop audio index from stack and start play
0033: 2923   wait_play        // wait for end of play
0034: 0001   push      #0001  // push 0001 to stack
0035: 2928   play             // pop audio index from stack and start play
0036: 2923   wait_play        // wait for end of play
0037: 0005   push      #0005  // push 0005 to stack
0038: 2928   play             // pop audio index from stack and start play
0039: 2923   wait_play        // wait for end of play
       L0_1:                  // xrefs: 003C
003A: 0000   push      #0000  // push 0000 to stack
003B: 297E   drop             // pop and drop value from stack
003C: E7FE   rjump     L0_1   // jump to 003A
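
Since the listing is just the same three words repeated per chunk, a test program like this can be generated. The sketch below only reuses the opcode words visible in the listings in this thread; nothing else about the instruction set is assumed, and `build_test_program` is a made-up helper name, not part of the posted tools.

```python
# Opcode words copied from the listings above.
OP_VOLUME = 0x291A
OP_PLAY = 0x2928
OP_WAIT_PLAY = 0x2923
OP_DROP = 0x297E
OP_RJUMP_BACK_2 = 0xE7FE  # the "rjump L0_1" word seen in both listings

def push(value):
    """'push #n' appears in the listings as the bare word n (seen for 0..5 only)."""
    if not 0 <= value <= 5:
        raise ValueError("only immediates 0..5 appear in the listings")
    return value

def build_test_program(volume, chunk_indices):
    """Set the volume, play each chunk and wait for it, then idle forever."""
    words = [push(volume), OP_VOLUME]
    for index in chunk_indices:
        words += [push(index), OP_PLAY, OP_WAIT_PLAY]
    words += [push(0), OP_DROP, OP_RJUMP_BACK_2]  # idle loop at the end
    return words

program = build_test_program(4, [1, 2, 1, 3, 1, 4, 1, 5])
for address, word in enumerate(program, start=0x0020):
    print(f"{address:04X}: {word:04X}")
```

With the chunk order 1, 2, 1, 3, 1, 4, 1, 5 this gives 29 words, from 0020 to 003C, the same span as the listing.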

milw

hm interesting... so is 'touch my arm to see what i've learned' a single audio chunk, and is it normally the first chunk in the index? or have you just stored it there for convenience in your testing? I'm still not clear on whether multiple chunks are played in sequential order when panda says something, or is each chunk a complete saying/recording?

sevik

It's the first one in the original black cartridge.

For testing I compiled a test cartridge with 5 different variants of this chunk:
original - index 1
shifted - index 2
with 06 80 instead of 07 80 in header - index 3
08 80 - index 4
09 80 - index 5

and the code from my previous message as the cpu code.

http://sevik.org/robopanda/test_codecs.bin

sevik

The chunks I checked are complete phrases.

milw

ok, so a whole sentence is in one chunk then, at least most of the time. Would it be possible for you to capture the speaker output as a wav file, so we have a 'before' and 'after' to compare? (I know it won't be perfect because your capture would only be an approximation of what was fed to the dac).

sevik

I have not dissected my panda yet :)) You can write this cartridge image to any SPI ROM and check for yourself :))

As for before and after, I think they will be very different.

Real analysis of the audio data needs many small tests, one for each bit position in the frame, to identify the bit-field boundaries and their meaning.

I think there is one general principle behind all the codecs implemented in Robopanda and some of the codecs implemented by "Voice Tools". They must have a comparable number of coefficients, but with different resolution, and with some coefficients omitted at lower bitrates.

If you check the Voice Tools output for the 20k codec, there is a 96-byte frame with a similar structure: a low count of "1" bits at the start, in random positions, and many 10..0 sequences at the end.
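
One cheap way to look for that kind of structure is to count how often each bit position of a frame is set across many frames. This is only a sketch of the idea, not a decoder: the caller has to strip any chunk header first and supply the frame size (32, 48, 96, ...), and treating bit 0 as the most significant bit of each byte is an arbitrary choice made here.

```python
def ones_per_bit_position(data: bytes, frame_bytes: int):
    """Count, for each bit position in a frame, how many frames have it set."""
    frame_count = len(data) // frame_bytes  # a trailing partial frame is ignored
    counts = [0] * (frame_bytes * 8)
    for f in range(frame_count):
        frame = data[f * frame_bytes:(f + 1) * frame_bytes]
        for byte_index, byte in enumerate(frame):
            for bit in range(8):
                # bit 0 is taken as the MSB of each byte (an assumption)
                if byte & (0x80 >> bit):
                    counts[byte_index * 8 + bit] += 1
    return frame_count, counts

# Tiny synthetic example: two 2-byte frames.
frames, counts = ones_per_bit_position(bytes([0x80, 0x00, 0x81, 0x00]), 2)
print(frames, counts)
```

Positions that are almost always 0 or almost always 1 would hint at padding or fixed fields, and sharp changes in density between neighbouring positions are candidates for bit-field boundaries.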

sevik

I think the magic string is tightly related to the bit-field boundaries :))

sevik

Attached to DAC1 output...

On the scope there are no visible remains of the sampling rate or of PWM of any sort.

Will check further using the line input of a sound card.

sevik

anybody still alive? :))

tools and emu_logs tarballs updated with new disassembler.

milw

sorry, just getting off work... where in the world are you, Sevik? I'm in Midwest USA. you seem to work about 0xA times faster than anyone else I know!

Nocturnal

Still alive, just waiting for some suitable spi flash memory chips I ordered to arrive. I should already have something suitable, but I can't seem to locate them.

sevik

:))

I'm in Ukraine (UTC+3).

I'm waiting for a new FPGA board from ebay too :)) Just in case, I ordered 2 of them :))

When it arrives I will slow down with Robopanda for some time :))

sevik

For now the most important task with Robopanda is decoding the audio data, but I don't have enough experience with this...

Anybody with the needed skills wanna help? :))

According to the webserver logs about 10 people have downloaded *.wav and *.aud :)) So don't hide, we know you are there :))

Nocturnal

I didn't think you were in the US, not with the times you were posting. I'm in Australia (UTC+10).

About 17 people have downloaded the binary module images in the last 30 days and we haven't heard a peep out of them, so we're probably not going to hear a peep out of the people who downloaded your audio file (one of them would have been me), but we can always hope there is an expert audio engineer lurking out there.

I can think of a few things to try with the audio formats, but it's outside my skill set. I've only ever worked with audio at a high level, never low level.

sevik

I believe in humanity :))

We cannot be lost in soundless universe :))

milw

ditto deleted
I'm another who downloaded from both your sites, and in a similar situation vis a vis skillsets... but it's never too late to learn, if we can find a source to learn from. I think the university level computer science courses have audio modules that may be helpful, I'm looking for a local one today.

milw

hi Sevik, looking at your capture of the 00792.wav (and the others), there are pronounced dips in the spectrum at 8 and 16 kHz. You captured these at 44100 Hz, right? To me, this looks like filtering of the original sampling frequency, so I'd suggest the audio was originally sampled at 8k before compressing. Nocturnal, have you looked at the circuitry between the DAC outputs and speaker? I'm curious if only one or both DACs are in use?

Just playing with the data rates, the 00792 chunk from the black cartridge is 7845 bytes, or about 245 reads x 32 bytes. At 16 msec per read that's 3.9225 seconds to read the whole thing, which correlates pretty well with the length of your recording. That amount of time recorded in 16 bit PCM at 8 kHz would be 62760 bytes, so that means an 8-fold compression ratio (which also pops out if you assume a read of 8 kHz x 16 bit sound every 16 msec requires 256 bytes per read, or exactly 8 x the 32 bytes actually read). Ah well, off to troll the library!
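
The arithmetic in that paragraph, written out as a small Python sketch. The 8 kHz / 16 bit source format is the guess being tested here, not something confirmed from the hardware, and any header bytes inside the 7845-byte chunk are ignored.

```python
chunk_bytes = 7845          # size of the 00792 chunk, from the post
frame_bytes = 32            # bytes per read
frame_period_s = 0.016      # 16 ms per read
pcm_rate_hz = 8000          # assumed original sample rate
pcm_bytes_per_sample = 2    # assumed 16-bit PCM

reads = chunk_bytes / frame_bytes            # about 245.16 reads
duration_s = reads * frame_period_s          # roughly 3.92 s
pcm_bytes = duration_s * pcm_rate_hz * pcm_bytes_per_sample
ratio = pcm_bytes / chunk_bytes

print(f"{reads:.2f} reads, {duration_s:.4f} s, "
      f"{pcm_bytes:.0f} PCM bytes, ratio {ratio:.1f}")
```

The ratio does not depend on the chunk size at all: 16 ms of 8 kHz 16-bit audio is 256 bytes against 32 bytes read, so it is 8 either way.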

sevik

The original data is very likely 8 kHz/16 bit; that's the format required by both utilities from Voice Tools.

If you take spectra of short samples, you will see peaks doubled around 8 kHz and doubled again around 16 kHz. It's a classic example of a sampled-signal spectrum.
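
The doubled peaks are what you would expect if the DAC output is not filtered much: a tone at f sampled at fs shows up again at k*fs - f and k*fs + f. A small sketch of where those images land, assuming an 8 kHz sample rate, a tone below fs/2, and a capture limited to 22050 Hz (the 1 kHz tone is just an example value):

```python
def image_frequencies(tone_hz, sample_rate_hz, max_hz):
    """Frequencies up to max_hz where a sampled tone appears:
    the tone itself plus its images at k*fs - f and k*fs + f."""
    found = {tone_hz}
    k = 1
    while k * sample_rate_hz - tone_hz <= max_hz:
        found.add(k * sample_rate_hz - tone_hz)
        if k * sample_rate_hz + tone_hz <= max_hz:
            found.add(k * sample_rate_hz + tone_hz)
        k += 1
    return sorted(found)

print(image_frequencies(1000, 8000, 22050))
# -> [1000, 7000, 9000, 15000, 17000]
```

So a 1 kHz component gives a pair of peaks on either side of 8 kHz and another pair on either side of 16 kHz.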

The second DAC output seems to be unconnected, see this photo

I'm going to capture one of the songs (codec 9) to see what we get from it :))
