(Note: Others have reported problems on Windows, and/or with Nvidia cards. It crashes on Linux with a non-Nvidia card too. And I don't remember any crashes before 3.2.6.0, although I haven't played RnD in a long time.)
Well, the game was too unstable to be playable, so I decided to debug this. I'm posting this here in hopes it may be helpful to Holger, and also that someone might be able to explain this to me, because I certainly don't understand the result.
I concluded the editor crashes and game crashes probably have the same origin, and decided to investigate the editor, as that's much easier to reproduce.
Using gdb, I found out that a floating point exception occurs in tools.c:942, in function getSizedGraphicSource, because width_div is zero. (Sometimes it's height_div.) But how can it be zero? Maybe offset_calc_pos is out of offset_calc's bounds? Nope. The actual content of offset_calc changed, and it contains completely bogus values.
Using a conditional breakpoint at tools.c:934 (with an empty temporary function: if(offset_calc[0].width_div!=16) tmp_func();):
Code: Select all
$ gdbtui ./rocksndiamonds
[...]
Breakpoint 1, tmp_func () at tools.c:917
(gdb) up
#1 0x08056be2 in getSizedGraphicSource (graphic=765, frame=0, tilesize_raw=16, bitmap=0xbfcfeee0, x=0xbfcfeedc, y=0xbfcfeed8)
at tools.c:936
(gdb) print offset_calc
$1 = {{width_mult = 15, width_div = 512, height_mult = 640, height_div = 0}, {width_mult = 178999928, width_div = 0,
height_mult = 178999928, height_div = -1209309548}, {width_mult = 179026168, width_div = 179026168, height_mult = -1076892104,
height_div = -1209533302}, {width_mult = 179026168, width_div = -1076892052, height_mult = 179026168, height_div = -1076891968}, {
width_mult = -1076891896, width_div = 548, height_mult = 6, height_div = -1221870324}, {width_mult = -1076891240,
width_div = -1076891080, height_mult = -1076892040, height_div = -1209309548}}
Code: Select all
$ valgrind -v ./rocksndiamonds
[...]
--4033-- Command line
--4033-- ./rocksndiamonds
--4033-- Startup, with flags:
--4033-- -v
--4033-- Contents of /proc/version:
--4033-- Linux version 2.6.27-tbARCH (root@kalafunos) (gcc version 4.3.2 (GCC) ) #1 SMP PREEMPT Wed Dec 24 13:26:11 CET 2008
--4033-- Arch and hwcaps: X86, x86-sse1-sse2
[...valgrind initialization...]
[...rocksndiamonds starts, but when I scroll left in the editor:...]
==4415== Thread 1:
==4415== Invalid write of size 4
==4415== at 0x8056BC1: getSizedGraphicSource (tools.c:932)
==4415== by 0x8056EEF: getMiniGraphicSource (tools.c:980)
==4415== by 0x8057519: DrawMiniGraphicExt (tools.c:1111)
==4415== by 0x80574C6: DrawMiniGraphic (tools.c:1102)
==4415== by 0x8059979: DrawMiniElementOrWall (tools.c:1813)
==4415== by 0x80AC7EE: ScrollMiniLevel (editor.c:5186)
==4415== by 0x80BC19C: HandleControlButtons (editor.c:10970)
==4415== by 0x80DA71F: HandleGadgets (gadgets.c:1859)
==4415== by 0x80D91A2: ClickOnGadget (gadgets.c:1426)
==4415== by 0x80BD1D3: HandleLevelEditorKeyInput (editor.c:11462)
==4415== by 0x8054905: HandleKey (events.c:875)
==4415== by 0x805386B: HandleKeyEvent (events.c:370)
==4415== Address 0xbef7704c is just below the stack ptr. To suppress, use: --workaround-gcc296-bugs=yes
==4415==
==4415== Conditional jump or move depends on uninitialised value(s)
==4415== at 0x4153D8B: SDL_UpperBlit (in /usr/lib/libSDL-1.2.so.0.11.2)
==4415== by 0x80EDB8C: SDLCopyArea (sdl.c:380)
==4415== by 0x80D34AF: sysCopyArea (system.c:326)
==4415== by 0x80D3464: BlitBitmap (system.c:520)
==4415== by 0x8057557: DrawMiniGraphicExt (tools.c:1112)
==4415== by 0x80574C6: DrawMiniGraphic (tools.c:1102)
[...clipped, much more errors...]
at 0x8056BC1: getSizedGraphicSource (tools.c:932)
Note that this is the first reported error. No memory corruption issues were reported prior to it.
Code: Select all
$ gdbtui ./rocksndiamonds
[...]
(gdb) x 0x8056BC1
0x8056bc1 <getSizedGraphicSource+61>: 0x158ba5f3
(gdb) disassemble 0x8056BC1
Dump of assembler code for function getSizedGraphicSource:
0x08056b84 <getSizedGraphicSource+0>: push %ebp
0x08056b85 <getSizedGraphicSource+1>: mov %esp,%ebp
0x08056b87 <getSizedGraphicSource+3>: push %edi
0x08056b88 <getSizedGraphicSource+4>: push %esi
0x08056b89 <getSizedGraphicSource+5>: sub $0xf0,%esp
0x08056b8f <getSizedGraphicSource+11>: lea -0xb8(%ebp),%eax
0x08056b95 <getSizedGraphicSource+17>: mov %eax,-0xc0(%ebp)
0x08056b9b <getSizedGraphicSource+23>: movl $0x8152900,-0xc4(%ebp)
0x08056ba5 <getSizedGraphicSource+33>: movl $0x18,-0xc8(%ebp)
0x08056baf <getSizedGraphicSource+43>: mov -0xc0(%ebp),%edi
0x08056bb5 <getSizedGraphicSource+49>: mov -0xc4(%ebp),%esi
0x08056bbb <getSizedGraphicSource+55>: mov -0xc8(%ebp),%ecx
0x08056bc1 <getSizedGraphicSource+61>: rep movsl %ds:(%esi),%es:(%edi)
[...clipped...]
(gdb) info line tools.c:933
Line 933 of "tools.c" starts at address 0x8056bc3 <getSizedGraphicSource+63> and ends at 0x8056bd8 <getSizedGraphicSource+84>.
The last command says that line 933 starts at getSizedGraphicSource+63, that is, the invalid write from getSizedGraphcSource+61 occurs before line 933.
So it seems there's only one statement where it can be:
Code: Select all
/* tools.c:920 */
struct
{
int width_mult, width_div;
int height_mult, height_div;
}
offset_calc[6] =
{
{ 15, 16, 2, 3 }, /* 1 x 1 */
{ 7, 8, 2, 3 }, /* 2 x 2 */
{ 3, 4, 2, 3 }, /* 4 x 4 */
{ 1, 2, 2, 3 }, /* 8 x 8 */
{ 0, 1, 2, 3 }, /* 16 x 16 */
{ 0, 1, 0, 1 }, /* 32 x 32 */
};
Next, I tried looking at gcc's assembler output:
Code: Select all
$ gcc -o tools.S -S tools.c -DTARGET_SDL `sdl-config --cflags`
/* tools.S:2275..2320 */
.section .rodata
.align 32
.type C.241.10079, @object
.size C.241.10079, 96
C.241.10079:
.long 15
.long 16
.long 2
[...initial offset_calc data snipped...]
.long 1
.long 0
.long 1
.text
.globl getSizedGraphicSource
.type getSizedGraphicSource, @function
getSizedGraphicSource:
pushl %ebp
movl %esp, %ebp
pushl %edi
pushl %esi
subl $240, %esp
leal -184(%ebp), %eax
movl %eax, -192(%ebp)
movl $C.241.10079, -196(%ebp)
movl $24, -200(%ebp)
movl -192(%ebp), %edi
movl -196(%ebp), %esi
movl -200(%ebp), %ecx
rep movsl
I don't have any idea what's going on. I don't know assembler, I don't know gcc's internals, I don't know rocksndiamonds code in much depth. If anyone finds out what's happening, I'm very interested in finding out.
To conclude, I found a workaround: declare offset_calc as const (that is, replace "struct" with "const struct") and the "invalid write" disappears, along with all the FPEs and display bugs. However, I strongly discourage simply using this "fix" and forgetting about the bug. I think it signals a deeper issue, and should be fixed once and for all.
Thank you for reading this, and sorry if it was too long. Information in the first paragraph was probably better suited for the Bug Reports section, but the bulk of the post belongs to Programmer's Corner.