Using GDB to Diagnose a Crash

This is a repost of an article I wrote ages ago. I'm posting it here now just for posterity.

Ok. So you're fiddling around one day with the latest and greatest nifty new feature for your MUD. You've labored for hours adding the code, playing with it to get things just right. You've compiled it, and GCC didn't raise any complaints. You're home free... except... wait. What the hell does "Segmentation fault (core dumped)" mean, and why won't the MUD boot?

Chances are at some point in your coding career you'll be greeted with this dreadful scenario. All of us have been there at one time or another. All of us know what it feels like to scratch our heads wondering what happened.

My background in coding is primarily with Smaug MUDs, and specifically with the AFKMud project. I've had my fair share of things go wrong over the years and I'm no stranger to core dumps. I also find it's best to cover these things with real examples, so I'll share one I just caused in my own code today.

In order for GDB to provide you with meaningful information, your MUD needs to be compiled with debug information. This is generally done by passing the -g flag to GCC. I tend to stick with -g2 or better. You'll usually find this on one of the flag lines in your Makefile.

We're in the process of moving AFKMud to use C++ code, and some of you may be aware of the pitfalls involved. I just got done making descriptors into a class and am still shaking things down. Lo and behold, I reboot, run a command, and am greeted with:

[samson@boralis: ~/Alsherok/src] Segmentation fault (core dumped)


Uh oh, looks like I fubared something. The first thing you need to do when a core dump happens is determine where your core file is. With Smaug, the core file will usually end up in your area directory, so change into that directory and type something like:

gdb -c core ../src/smaug


In my case, AFKMud moves the core to the same directory as the source code, so I would do this:

[samson@boralis: ~/Alsherok/src] gdb -c core afkmud


Upon doing so, I am greeted by a whole bunch of output:
GNU gdb Red Hat Linux (5.3post-0.20021129.18rh)
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...
Core was generated by `../src/afkmud 9500'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libcrypt.so.1...done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /usr/lib/libz.so.1...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /usr/lib/libstdc++.so.5...done.
Loaded symbols for /usr/lib/libstdc++.so.5
Reading symbols from /lib/tls/libm.so.6...done.
Loaded symbols for /lib/tls/libm.so.6
Reading symbols from /lib/libgcc_s.so.1...done.
Loaded symbols for /lib/libgcc_s.so.1
Reading symbols from /lib/tls/libc.so.6...done.
Loaded symbols for /lib/tls/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
Reading symbols from /lib/libnss_nisplus.so.2...done.
Loaded symbols for /lib/libnss_nisplus.so.2
Reading symbols from /lib/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libnss_dns.so.2...done.
Loaded symbols for /lib/libnss_dns.so.2
Reading symbols from /lib/libresolv.so.2...done.
Loaded symbols for /lib/libresolv.so.2
#0  0x080fafa5 in descriptor_data::compressEnd() (this=0x8a5a2f0) at features.c:158
158        if( !mccp->out_compress )


Wow. Ok. So it's basically told me it loaded all of the symbols for everything the MUD uses. What does all that mean? Generally not a great deal. Everything above where it says #0 is system libraries you won't need to worry about. It's the stuff after that you need to pay attention to.

So now you have a general idea of what caused the problem. Something in the compressEnd() function did something it wasn't supposed to do. This however is generally not enough information to go on. You probably want to know what led up to this problem. So with that in mind, you'll want to trace the history of what caused this. Fortunately GDB makes that easy with the bt, or backtrace, command.

(gdb) bt
#0  0x080fafa5 in descriptor_data::compressEnd() (this=0x8a5a2f0) at features.c:158
#1  0x080f2634 in ~descriptor_data (this=0x8a5a2f0) at descriptor.c:136
#2  0x0814ece7 in rent_adjust_pfile(char*) (argument=0x8a573df "******") at rent.c:1542
#3  0x0814fcc9 in rent_update() () at rent.c:2108
#4  0x0814530f in do_pfiles (ch=0x8a0b570, argument=0xbfffbf8c "tar -cf ../player/pfiles.tar ../player/*") at pfiles.c:550
#5  0x08113e61 in interpret(char_data*, char*) (ch=0x8a0b570, argument=0xbfffd9f6 "") at interp.c:907
#6  0x080dc9e9 in game_loop() () at comm.c:785
#7  0x080dd651 in main (argc=2, argv=0x8a0bcf8) at comm.c:1233
#8  0x40154758 in __libc_start_main () from /lib/tls/libc.so.6
Current language:  auto; currently c++
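
As a rough illustration of why the frames come out in that order, here is a minimal C++ sketch that models the call stack by hand. Everything in it (FrameGuard, backtrace_like, run_main and the rest) is hypothetical and has nothing to do with how GDB or AFKMud actually work internally. It only shows that the most recent call ends up at index 0, like frame #0 above, with main() at the far end.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the real call stack: each function pushes its
// name on entry and pops it on exit, the way stack frames come and go.
static std::vector<std::string> g_calls;

struct FrameGuard
{
   explicit FrameGuard( const std::string & name ) { g_calls.push_back( name ); }
   ~FrameGuard() { g_calls.pop_back(); }
};

// Returns the active calls innermost first, the same order "bt" prints them.
std::vector<std::string> backtrace_like()
{
   return std::vector<std::string>( g_calls.rbegin(), g_calls.rend() );
}

// Three nested calls, named after frames in the backtrace above.
std::vector<std::string> compress_end()
{
   FrameGuard frame( "compressEnd" );
   return backtrace_like();   // snapshot taken at the "crash" site
}

std::vector<std::string> game_loop()
{
   FrameGuard frame( "game_loop" );
   return compress_end();
}

std::vector<std::string> run_main()
{
   FrameGuard frame( "main" );
   return game_loop();
}
```

Calling run_main() returns the names with compressEnd first and main last, which is the same shape as the listing GDB printed.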


The backtrace is listed in reverse order: frame #0 is the function the MUD was in when it crashed, and the highest-numbered frames are where it started. In this case, execution began in main() and ended in compressEnd(). So why did it crash? You find that out by entering the stack "frames", one per function call, and asking GDB what certain things were at the time. So in this case, we'll check frame 1, which is in descriptor.c on line 136:

(gdb) frame 1
#1  0x080f2634 in ~descriptor_data (this=0x8a5a2f0) at descriptor.c:136
136        compressEnd( );


Here you can see the call to compressEnd(). Ok, that's not enough info yet. Let's look at the call that killed it, in frame 0:

(gdb) frame 0
#0  0x080fafa5 in descriptor_data::compressEnd() (this=0x8a5a2f0) at features.c:158
158        if( !mccp->out_compress )


Aha, this is a hint - something in features.c on line 158 is amiss. Start checking this line methodically. Begin by asking it what "mccp" was equal to at the time:

(gdb) print mccp
$1 = (mccp_data *) 0x0


This tells you that the "mccp" pointer was NULL. 0x0 stands for NULL, basically the absence of any data. Nothing, zero, zilch, etc. In this particular case, it's telling us that the structure which holds this person's mccp_data doesn't exist. It was never allocated. Reading a member through a NULL pointer, which is what mccp->out_compress does, will result in a crash, and that's what happened.
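
To make the failure concrete, here is a minimal sketch of the same test written so that it cannot crash. The mccp_data below is a hypothetical stand-in with only the one field named on the crashing line, and the type of out_compress is an assumption. compression_active() is likewise an invented helper, not something from AFKMud.

```cpp
#include <cassert>

// Hypothetical stand-in for the real mccp_data. Only the field named on the
// crashing line is modelled, and its type is an assumption.
struct mccp_data
{
   bool out_compress = false;
};

// Safe version of the test on line 158: check the pointer itself before
// reading anything through it.
bool compression_active( const mccp_data * mccp )
{
   if( mccp == nullptr )   // the case GDB reported: mccp = 0x0
      return false;
   return mccp->out_compress;
}
```

The difference from the crashing line is only the first check: the pointer is compared against null before the -> operator ever touches it.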

Now that you know what happened, let's exit GDB.

(gdb) quit


You should return to a shell prompt. It's time to go fix your bug and try again.

Hopefully this article has proven useful. There are more advanced things you can do with GDB, but this should cover the basics of investigating a crash after the fact.

For reference, this is the code which crashed:

descriptor_data::~descriptor_data()
{
   close( descriptor );
   DISPOSE( host );
   delete [] outbuf;
   DISPOSE( pagebuf );
   STRFREE( client );

   compressEnd( );
   DISPOSE( mccp );
}


And this is what fixes it:

descriptor_data::~descriptor_data()
{
   close( descriptor );
   DISPOSE( host );
   delete [] outbuf;
   DISPOSE( pagebuf );
   STRFREE( client );

   if( mccp != NULL )
      compressEnd( );
   DISPOSE( mccp );
}


Note that in the second version we verify mccp isn't NULL before ending compression.
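
An alternative worth considering is to put the check inside compressEnd() itself, so that every caller is protected and not just the destructor. The sketch below is a stripped-down, hypothetical version of the class: the bool return value, the member layout, and the use of plain delete in place of the DISPOSE macro are all assumptions made to keep it self-contained, not the actual AFKMud code.

```cpp
#include <cassert>

// Hypothetical, stripped-down stand-ins for the real AFKMud types.
struct mccp_data
{
   bool out_compress = false;
};

class descriptor_data
{
 public:
   mccp_data *mccp = nullptr;

   descriptor_data() = default;
   // The class owns a raw pointer, so copying is disabled in this sketch.
   descriptor_data( const descriptor_data & ) = delete;
   descriptor_data & operator=( const descriptor_data & ) = delete;

   // Returns true only if compression was actually shut down.
   bool compressEnd()
   {
      if( mccp == nullptr )       // nothing allocated, nothing to end
         return false;
      if( !mccp->out_compress )   // allocated, but compression never started
         return false;
      mccp->out_compress = false;
      return true;
   }

   ~descriptor_data()
   {
      compressEnd();   // safe to call unconditionally now
      delete mccp;     // deleting a null pointer is a no-op
      mccp = nullptr;
   }
};
```

Guarding in the caller, as the fix above does, is the smaller change. Guarding in the callee costs one extra comparison per call but means a future caller can't reintroduce the same crash.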

For a much more in-depth article on the use of GDB which also covers things like setting breakpoints and debugging while the game is running, check out Nick Gammon's GDB Guide.

       

Posted on Mar 8, 2010 1:53 am by Samson | 16 comment(s) [Closed]
Comments
Nicely written, maybe a link to Nick Gammon's GDB Guide is worth mention as well? ;)

       
Samson said:

We're in the process of moving AFKMud to use C++ code


Heh. Ages ago indeed.

You so totally need to be able to hide long ass posts behind folds.

       
Well, I wasn't posting this here new, just in the interests of preservation mostly. Even though I published it at MudBytes too. Nick wrote his a year after I wrote mine, so that should tell you something. I'd have backdated it, but that would have driven it immediately to the bottom of the pile :)

Hiding long posts behind folds.... yes... I was thinking about that with the Legalese post. I guess I'll have to raid Wordpress to find out how to do that.

       
Regina [Anon] said:
Comment #4 Mar 8, 2010 3:49 pm
Hi all, it's me - I can't seem to get the login to work, so I'm going to stop fighting it for now.

Anyway, Samson, I sent you an email about a week ago - did you see it? Speaking of crashes, my computer just developed a corrupt file system. A reformat and reinstall will fix it, but this was a good reminder that the computer's about to come to the end of its warranty and one more major error or component break will be the death of it. So I emailed you (and Whir) looking for some advice. I'm probably going to make the purchase within the week so I'd love to hear from you if you have a chance.

       
Er, that's because for some reason your password update didn't take. I reissued it. I guess I should probably put in a way for users to do that themselves too :P

I'll have to see if your email is in here somewhere. I buried myself under a mountain of error reports from the site because of all the hackery I'm doing.

Corrupt filesystems are no fun. Probably about as enjoyable as the failing harddrive controller issue I had a couple months back.

       
Being able to hide long posts (or posts you're just not interested in) behind folds would be really cool, but I'm not sure how much use I'd really get from it since I tend to go read every new post/comment anyway. :lol: :redface:

Oddly enough, on this thread (this visit?) instead of seeing smilies, I'm seeing their file names:
Samson said:

Well, I wasn't posting this here new, just in the interests of preservation mostly. Even though I published it at MudBytes too. Nick wrote his a year after I wrote mine, so that should tell you something. I'd have backdated it, but that would have driven it immediately to the bottom of the pile smile.gif

Maybe it's my internet acting up again, but see how it actually says "smile.gif" at the end of that line? My post above that also ends in a wink.gif, and your post right above this one has a line ending with tongue.png as well. :( (These don't have image placeholders like an image just didn't load either, just the filename as plain text like it's part of what was typed.)

Welcome back, again, Regina. Sorry to hear that your password reset didn't work right and that your file system got corrupted, I'll second Samson's sentiment, they are no fun at all.

Putting in a "lost password" sort of password reset option might make sandbox start looking awfully professional.. are you sure you want to go there? :P

I can imagine all the emails all your recent changes must be driving into your inbox right now.. maybe you could have your box sort by sender if you haven't got the mails being filtered/sorted to folders or some such otherwise?

       
Apparently some page speed optimizations aren't worth the added code hassle. I'll take ease of adding an emoticon over browser page flow speeds any day.

       
Yay! My little emoticon buddies are back. :D

So, what'd you change that caused them to go away last night? (Given that it sounds like you're saying they'd vanished due to some sort of page flow thing.)

       
I tried to add code that would insert their IMG tag size attributes but it backfired and isn't worth the hassle of writing even more code to fix. It's not important enough for what little gain there would be.

       
Ah, that actually makes sense. I can't argue the logic either, for what little potential benefit that sounds likely to offer (at least on the first surface glance - I might not be thinking that one all the way through), it's probably not worth having to come up with code to make the code work, it's not like any of them are really needing fine tuning in their display anyway. Though I did notice that you added :ninja: since you'd added my :shrug: and :nuke: sometime today.. I think I know what the ninja one is about, but what's up with the nuke? (Not that I ever figured out what :cyclops: :ghostface: :robot: or :unclesam: were really intended for... but, hey, might as well ask about the newest member of the emoticon gang while he's still a fresh recruit, right? ;))

       
Jesus, probably a good thing I retrieved this when I did. Whoever "Deimos" is has informed me that the article published at MudBytes was deleted without notice. I should probably make sure the MCCP article gets preserved as well just in case.

       
Now that's ridiculous. Davion's ego or whatever aside, that's uncalled for. It's a good article that predates Nick's and one of only a few that exist out there for new mud admins to use to try to guess at why their mud just crashed on them. :mad:

If they're going to be that way over there, yeah, maybe you should make an effort to preserve the MCCP article along with the others you had a hand in like the one about IMC too. :headbang:

       
Oh yeah, good point, guess I should go snag the IMC protocol one too since it was work done by the AFKMud team.

I already did a raw source grab on the MCCP one. I still have the old originals at mccp.smaugmuds.org but the updated one on mudbytes is formatted better and on a single page.

       
Maybe while you're there you should see if any of the other articles that haven't been deleted yet are likely to be due to your contributions as well. (Though it's really sad that they'd shoot themselves this way to begin with.)

       
The rest of them are all silly things I had little or nothing to do with and probably wouldn't be missed if they vanished.

       
Edited by Samson on Mar 22, 2010 3:06 pm
In that case, no worries, but it's still odd that they'd go all out to trash their own site this way, regardless of how they may feel about you. Deleting your account was undoubtedly disruptive enough, but to actually go and remove major content just because you'd contributed it... Of course, they're naturally going to claim that either you had requested it be done with the removal of your other submissions or that they expected your request to be forthcoming and were acting preemptively. :rolleyes:

       