Tuesday, April 21, 2009

impure_data

Some bugs are definitely easier to track than others. And bugs that appear in standard libraries are definitely the trickiest you're likely to encounter, because you know little of what's inside the library, after all ^^"

This specific bug happened in _fread_r, the re-entrant version of fread, meaning that neither PC (the current program location, within _fread_r) nor LR (the register holding the last return address, within fread) was of any help. If 10 years of C programming taught me one thing, it is that bugs are not in the standard libraries, but in how you invoke the library. fread, strcpy and friends trust you to give them pointers where pointers are due and will not try anything fancy to check that you're actually entitled to read from or write to those memory locations. Once you've got that wired into /dev/brain, you can start debugging.

Dear French-speaking readers: this is a technical post about debugging programs on the DS console, how to interpret the contents of the stack, the disassembled code and all that. If you think you already have the technical background (basics of assembly, registers, stack, addresses) but English is holding you back, leave a comment and I will translate...

As usual, my starting point is the guru meditation screen of the DS, but this time, registers are of little help: what really matters is the stack dump just below.

00000011 00000001
00000e0c 00000000
00000000 00000000
00000000 0b003bd6
00000001 00000011
00000000 020299c7
00000000 ffffffff
0b003c20 00000000
00000000 020362cf
0b003bd6 0200c721

2029894 : crash within _fread_r
_fread_r internal variables
_fread_r saving registers
20299c6 : call by fread
fread internal variables
fread saving registers
20362ce : call by FileRead::read
FileRead::read saving registers
200c720 : call by XMtransport::load

Unfortunately, such "stack unwinding" is more complicated on ARM CPUs than on the x86 architecture, because the first parameters of a function call are typically kept in registers (r0..r3 under the usual ARM calling convention) rather than pushed on the stack, and because the code here maintains nothing like the x86 "base frame pointer". How many words each function takes on the stack can only be deduced by disassembling the corresponding function with "arm-eabi-objdump -drl <file.arm9.elf>", e.g.

fread():
# address code disassembled
20299ac: b570 push {r4, r5, r6, lr}
20299ae: 1c16 adds r6, r2, #0
20299b0: 4a07 ldr r2, [pc, #28] (20299d0 <.text+0x296d0>)
20299b2: 1c0d adds r5, r1, #0
20299b4: b082 sub sp, #8
20299b6: 1c04 adds r4, r0, #0
20299b8: 1c21 adds r1, r4, #0
20299ba: 6810 ldr r0, [r2, #0]
20299bc: 9300 str r3, [sp, #0]
20299be: 1c2a adds r2, r5, #0
20299c0: 1c33 adds r3, r6, #0
20299c2: f7ff ff4f bl 2029864 <_fread_r>
20299c6: b002 add sp, #8
20299c8: bc70 pop {r4, r5, r6}
20299ca: bc02 pop {r1} ; retrieve LR
20299cc: 4708 bx r1 ; return
20299ce: 0000 lsls r0, r0, #0
20299d0: ebe8 0204 undefined
teaches us that fread saves 4 registers on the stack before it starts executing and that it needs 8 bytes of local storage for its own use, i.e. 6 words in total. The order of the registers in a push instruction is a bit confusing, but it works like this: lr is pushed first, then r6, then r5, and r4 ends up at the top of the stack by the time the processor is ready to execute the next instruction at 20299ae. Don't be confused by the fact that sub sp, #8 means "reserve 8 bytes for local variables": the stack grows downwards on virtually every CPU architecture.
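The frame layout just described can be checked against the stack dump itself. Here is a minimal Python sketch, meant to run on the PC rather than on the DS. It assumes the dump lists words by increasing address (left to right, top to bottom), and the helper name frame_words is made up for the illustration.

```python
# Stack dump words as shown on the guru meditation screen,
# assumed to be listed by increasing address.
DUMP = """
00000011 00000001
00000e0c 00000000
00000000 00000000
00000000 0b003bd6
00000001 00000011
00000000 020299c7
00000000 ffffffff
0b003c20 00000000
00000000 020362cf
0b003bd6 0200c721
"""
words = [int(w, 16) for w in DUMP.split()]


def frame_words(pushed_regs, local_bytes):
    """Words a function puts on the stack: saved registers plus locals."""
    return len(pushed_regs) + local_bytes // 4


# fread: push {r4, r5, r6, lr} followed by sub sp, #8
fread_frame = frame_words(["r4", "r5", "r6", "lr"], 8)

# 020299c7 is the return address into fread saved by _fread_r
# (20299c6 with bit 0 set); fread's own frame starts at the next word.
start = words.index(0x020299C7) + 1
frame = words[start:start + fread_frame]

# Lowest addresses first: the two local words, then r4, r5, r6, lr.
names = ["local", "local", "r4", "r5", "r6", "lr"]
for name, value in zip(names, frame):
    print(f"{name:5s} {value:08x}")
```

The last word of the frame is 020362cf, which matches the "call by FileRead::read" entry in the annotated dump, so the 6-word estimate seems to hold for this crash.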
Now, I have to admit that this is quite a tedious way to go, so you're more likely to just check the output of arm-eabi-objdump -h arm9/runme.arm9.elf :
Idx Name          Size      VMA       LMA       File off  Algn
  0 .init         000002dc  02000000  02000000  00008000  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .text         00040520  02000300  02000300  00008300  2**6
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  2 .fini         0000000c  02040820  02040820  00048820  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
that tells us, plain and simple, that everything between addresses 02000300 and 02040820 is our code. If you spot any value within that range in your stack dump, it's very likely to be a value of lr that has been pushed, and you can just call arm-eabi-addr2line to figure out where it is in your program. Note, however, that addr2line doesn't manage to extract function names from the standard library components, which will just be referred to as "crtstuff:0" ... however, as soon as your own code is reached, you usually know by reading line 95 of FileRead::read that the library function you're calling is actually fread...
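That filtering is easy to automate. The following Python sketch (again a PC-side helper, with names of my own choosing) keeps only the stack words that fall inside .text, using the VMA and Size columns from the objdump -h output above. It also clears bit 0, which is set in the saved values because the code runs in Thumb mode.

```python
TEXT_START = 0x02000300            # VMA of .text from objdump -h
TEXT_END = TEXT_START + 0x40520    # VMA + Size = 0x02040820


def candidate_return_addresses(words):
    """Return code addresses (bit 0 cleared) for words that point into .text."""
    found = []
    for w in words:
        addr = w & ~1              # bit 0 only flags Thumb mode
        if TEXT_START <= addr < TEXT_END:
            found.append(addr)
    return found


# The words of the stack dump shown earlier, in reading order.
stack = [
    0x00000011, 0x00000001, 0x00000E0C, 0x00000000,
    0x00000000, 0x00000000, 0x00000000, 0x0B003BD6,
    0x00000001, 0x00000011, 0x00000000, 0x020299C7,
    0x00000000, 0xFFFFFFFF, 0x0B003C20, 0x00000000,
    0x00000000, 0x020362CF, 0x0B003BD6, 0x0200C721,
]

for addr in candidate_return_addresses(stack):
    print(f"{addr:08x}")
```

Each printed address can then be handed to addr2line, for instance arm-eabi-addr2line -e arm9/runme.arm9.elf 020362ce. A data word that happens to fall in the code range would be a false positive, so treat the result as a list of candidates rather than a verified call chain.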

But what actually puzzled me in this debug session was a suspicious data address that objdump resolved to "impure_data". It did not appear in the program map (build/.map) and was co-located with _impure_ptr in lib_a-impure.o ... As usual, Google turns up tons of bug reports that mention impure_data, but few clues. So I downloaded the sources of newlib (the libc used in the devkitPro project) and almost immediately located newlib/libc/reent/impure.c and

struct _reent __ATTRIBUTE_IMPURE_DATA__ _impure_data = _REENT_INIT(_impure_data);
struct _reent *__ATTRIBUTE_IMPURE_PTR__ _impure_ptr = &_impure_data;

Of course, I should have guessed that right from the start: "impure" is just a nickname for the "reentrant data structure" of the C library in its newlib incarnation, following the precept that "A pure function is one with no side effects; an impure function is any other". It is data that must be kept apart from the functions so that, e.g., you can call strtok within an interrupt handler even if strtok itself has been interrupted, as long as each context works on its own struct _reent.
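To make the idea concrete, here is a toy analogy in Python. It is not newlib code: tok, tok_r and the Reent class are invented for the illustration. They only mimic the difference between a function that hides its state in a global and one that receives its state from the caller, the way the _r variants receive a struct _reent.

```python
_last = None  # hidden shared state, in the spirit of classic strtok


def tok(text=None):
    """Impure: remembers its position in a module-level variable."""
    global _last
    if text is not None:
        _last = text.split()
    return _last.pop(0) if _last else None


class Reent:
    """Explicit per-caller state, playing the role of struct _reent."""

    def __init__(self):
        self.last = None


def tok_r(reent, text=None):
    """Re-entrant: all state lives in the structure the caller passes in."""
    if text is not None:
        reent.last = text.split()
    return reent.last.pop(0) if reent.last else None


# Main code starts tokenizing, then an "interrupt" tokenizes something else.
first = tok("a b c")
tok("x y")                  # the interruption clobbers the shared state
broken = tok()              # main code now continues in the wrong string

main_ctx, irq_ctx = Reent(), Reent()
first_r = tok_r(main_ctx, "a b c")
tok_r(irq_ctx, "x y")       # separate state, nothing is clobbered
fixed = tok_r(main_ctx)     # main code continues where it left off

print(broken, fixed)
```

With the shared global, the main code gets a token from the interrupting string; with one state object per context, it gets the next token of its own string.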

Btw, Peter Schraut wrote a nice blog entry about DS "guru meditation" screens and how to handle them.

2 comments:

PypeBros said...

Note:
- the value of LR is a _return_ address, i.e. the instruction following the call and not the call itself
- here, LR usually has an _odd_ value even though all instructions are 2 or 4 bytes long. In fact, bit #0 indicates that the code is in "Thumb" mode, which trades a bit of the ARM processor's expressiveness for better code density.

http://www.simplemachines.it/doc/ARM_COMBO_ap01.html
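As a small illustration of both points, this Python sketch splits a saved LR value into its Thumb flag, the return address, and the address of the call itself. The function name is made up, and the default call size of 4 bytes assumes the call was a Thumb bl, which takes two halfwords as at 20299c2 in the fread listing; other call forms may be shorter.

```python
def call_site(saved_lr, call_size=4):
    """Split a saved LR into (is_thumb, return address, address of the call)."""
    is_thumb = bool(saved_lr & 1)      # bit 0 set: caller runs in Thumb mode
    return_addr = saved_lr & ~1        # instruction following the call
    return is_thumb, return_addr, return_addr - call_size


thumb, ret, call = call_site(0x020299C7)
print(thumb, f"{ret:07x}", f"{call:07x}")
```

For the value 020299c7 found on the stack, this gives the return address 20299c6 and the call at 20299c2, which is where the listing shows bl 2029864 <_fread_r>.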

PypeBros said...

I was puzzled by all those ARM instructions that end with an S: adds, movs, lsls ... It is not an indicator of the size of the operation but a flag specifying that the contents of the "status register" must be updated.

In fact, in "Thumb" mode, there is generally no way to forbid an instruction from modifying the flags, or to force it to do so in the cases where it is designed to leave them alone.