2007/09/12
IRC [07:07] <partymola> Dossy: got it! got a functional core dump! :O
IRC [07:25] <Dossy> oh? cool!
IRC [07:25] <Dossy> let's take a look?
IRC [08:00] <partymola> yep
IRC [08:00] <partymola> go!
IRC [08:02] <partymola> Dossy: new corefile is nsd.14336.core
IRC [08:02] <partymola> I have just copied it in your HOME
IRC [08:08] <Dossy> OK, gdb 6.1.1 gets a partial stacktrace, but gdb 6.6 doesn't. How did you build gdb 6.6?
IRC [08:10] <partymola> from ports
IRC [08:10] <partymola> let me see
IRC [08:10] <partymola> normal compilation, not any special flags
IRC [08:11] <partymola> CONFIGURE_ARGS= --program-suffix=${PORTVERSION:S/.//g} \
IRC [08:11] <partymola> --enable-target=all \
IRC [08:11] <partymola> --enable-tui --with-libiconv-prefix=${LOCALBASE}
IRC [08:11] <Dossy> Is this AOLserver 4.5 from CVS HEAD?
IRC [08:11] <partymola> yes
IRC [08:12] <partymola> aolserver-HEAD-20070909
IRC [08:13] <Dossy> which server dropped this corefile?
IRC [08:13] <Dossy> grepping through the server logs, I don't see it ...
IRC [08:14] <partymola> this morning webstats was down, and i had this corefile
IRC [08:14] <Dossy> [12/sep/2007:12:59:56][14336.134800896][-sched:3-] Notice: Ns_PgExec: Trying to
IRC [08:14] <Dossy> reopen database connection
IRC [08:14] <Dossy> nsd in free(): error: chunk is already free
IRC [08:14] <partymola> around the time i told you
IRC [08:15] <partymola> ooops... when reopening db connection? why?
IRC [08:15] <Dossy> there. that's webstats.
IRC [08:15] <Dossy> double-free bug -- are you using the latest nspostgres?
IRC [08:16] <partymola> hmmm. curious
IRC [08:16] <partymola> yes, i took it from cvs
IRC [08:16] <partymola> let me check what version i have exactly
IRC [08:16] <Dossy> Jim Lynch recently made some changes to it ...
IRC [08:16] <partymola> 2007-06-12 tag nspostgres_v4_r1 << this is the last entry in the changelog
IRC [08:17] <Dossy> ok, good
IRC [08:18] <partymola> ok, i have just found out in our activity log, that postgresql-server was restarted around that time to reload configuration (my partner did it)
IRC [08:19] <partymola> he did it remotely, so he wasn't here to tell me
IRC [08:19] <partymola> and nspostgres failed reconnecting
IRC [08:19] <partymola> so this is not the kind of problem i really had before
IRC [08:21] <Dossy> who knows.
IRC [08:21] <Dossy> *shrug*
IRC [08:21] <Dossy> but yes, I didn't see this kind of log message before
IRC [08:21] <Dossy> but--if there's a crash bug relating to PostgreSQL restarts, it'd be nice to get that fixed too :)
IRC [08:21] <Dossy> I'm going to file this at Sourceforge, just in case.
IRC [08:22] <partymola> ok, this crash happened because postgresql went down while executing a series of commands
IRC [08:22] <partymola> yes, do it please, this is a bug after all
IRC [08:22] <partymola> but it happened only in this server
IRC [08:22] <partymola> the other aolservers are working nice
IRC [08:56] <Dossy> Possible double-free crash in nspostgres_v4_r1: http://aolserver.com/sf/bug/1793118
IRC [09:01] <partymola> el servidor ha cerrado la conexión inesperadamente,
IRC [09:01] <partymola> probablemente porque terminó de manera anormal
IRC [09:01] <partymola> antes o durante el procesamiento de la petición. << Translation: server closed the connection unexpectedly, probably because it ended abnormally before or while processing the request.
IRC [09:10] <partymola> i sent you an e-mail with logs, Dossy
IRC [09:19] <partymola> http://www.youtube.com/watch?v=-8cniOcAbDU << lmao
IRC [09:44] <partymola> i am leaving to take an exam... will be back in 3-4 hours. later ;P
IRC [09:49] <Dossy> good luck
IRC [13:03] *** holymoly joined the chat.
IRC [13:31] *** jim joined the chat.
IRC [13:32] <jim> about bug #1793118
IRC [13:32] <jim> it looks like postgres was in the process of shutting down
IRC [13:33] <jim> not sure if there's a lot I can do about that :)
IRC [13:35] <partymola> jim: it was me who reported that
IRC [13:35] <partymola> i am leaving atm
IRC [13:35] <partymola> but if you want to, i can give you all the information later
IRC [13:35] <partymola> and hi holycow :D
IRC [13:36] <jim> sure, np... but do I have that right?
IRC [13:36] <jim> was postgres shutting down?
IRC [13:36] <jim> or
IRC [13:36] <jim> had postgres shut down during the run of aolserver?
IRC [13:36] <jim> either one of those?
IRC [14:07] <holycow> mornin
IRC [14:08] *** holycow parted the chat.
IRC [15:29] *** holycow joined the chat.
IRC [15:34] *** holycow parted the chat.
IRC [16:40] *** holycow joined the chat.
IRC [17:28] <partymola> jim: postgres was being shut down while the select statement was being executed
IRC [17:34] *** holycow parted the chat.
IRC [19:04] <jim> partymola: hi...
IRC [19:04] <partymola> hi
IRC [19:04] <jim> that might be beyond my driver :)
IRC [19:04] <partymola> u mean ns_db , right?
IRC [19:05] <jim> nspostgres
IRC [19:05] <partymola> i mean the problem may be in nsdb
IRC [19:05] <jim> well ok, not -my- driver
IRC [19:05] <jim> think of what would have to happen in order to restore these connections
IRC [19:06] <partymola> idk about how aolserver manages it internally ;)
IRC [19:06] <jim> (1) the -existing- handles would have to reconnect
IRC [19:06] <jim> (therefore these handles would have to know their data source)
IRC [19:06] <partymola> the crash only happens if there's a running query
IRC [19:07] <partymola> if there's not a query running, it doesn't crash
IRC [19:07] <jim> but why was the db being shut down?
IRC [19:08] <partymola> to reload configuration
IRC [19:08] <partymola> change in permissions
IRC [19:08] <partymola> another host was given permission to access a database
IRC [19:08] <partymola> so pg_hba.conf was changed, and the server had to be restarted
IRC [19:10] <jim> let me ask you this... if there were no query,
IRC [19:10] <jim> and you pulled pg down and restarted it,
IRC [19:10] <jim> then ran another query on one of the pooled connection handles,
IRC [19:11] <jim> would that work?
IRC [19:12] <partymola> yes, it works
IRC [19:12] <jim> how many connections do you have in your pool?
IRC [19:12] <partymola> 2-4
IRC [19:12] <partymola> at the moment of the restart
IRC [19:12] <partymola> i had 4 aolservers connected to that database
IRC [19:12] <partymola> only the one running the query failed
IRC [19:12] <partymola> the other ones reconnected after the database was up again
IRC [19:13] <partymola> but the one doing the query died badly
IRC [19:13] <jim> interesting... so the -existing- connections worked
IRC [19:13] <jim> is there any way to tell aolserver to "sleep" for a moment?
IRC [19:13] <partymola> sleep() ? lol
IRC [19:14] <jim> well kinda like that
IRC [19:14] <partymola> you can also enter in a loop waiting for the connection to be up again
IRC [19:14] <jim> but what I really mean is pause until an unpause is received
IRC [19:14] <partymola> there may be a way to suspend a thread...
IRC [19:15] <partymola> idk... i've not done threads programming yet :S
IRC [19:15] <jim> I mean suspend all of aolserver including all threads
IRC [19:15] <partymola> looool
IRC [19:15] <partymola> stop even serving static pages?
IRC [19:15] <partymola> that feels like too much...
IRC [19:15] <jim> because,,, if I recall correctly, your issue is about the time it takes to restart aolserver
IRC [19:15] <jim> yes?
IRC [19:16] <partymola> the issue is that i don't want aolserver to crash
IRC [19:16] <partymola> database may go down anytime... and having aolserver crashing is not a funny thing
IRC [19:17] <jim> well hmm
IRC [19:17] <partymola> i think it should handle a failed database query better... it actually dies because nsd frees a memory address that is already freed
IRC [19:17] <partymola> so it looks like something failed, but the caller didn't check if the operation was right or not
IRC [19:17] <partymola> so it frees what it shouldn't
IRC [19:17] <jim> what would be a better response from aolserver is to block if any problem happens
IRC [19:18] <partymola> giving an error code back is a good solution imo
IRC [19:18] <partymola> if not, putting the thread that renders that request into some kind of loop waiting for the db to be up again, and retrying the query afterwards
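The two options just described (return an error to the caller, or loop and retry) could be combined roughly as in the Tcl sketch below. `retry_script` is a hypothetical helper, not part of AOLserver or nspostgres, and the attempt count and delay are arbitrary. It assumes the failing call raises an ordinary Tcl error that `catch` can see; it would not help with a crash inside nsd itself, such as the double free discussed here.

```tcl
# Hypothetical helper (not an AOLserver API): run a script in the caller's
# scope, retrying a fixed number of times if it raises a Tcl error.
proc retry_script {attempts delay_ms script} {
    for {set i 1} {$i <= $attempts} {incr i} {
        # catch returns 0 on success; the script's result or the
        # error message is stored in $result either way.
        if {![catch {uplevel 1 $script} result]} {
            return $result
        }
        if {$i < $attempts} {
            # Sleep in the calling thread only, then try again.
            after $delay_ms
        }
    }
    # Out of attempts: hand the caller an error instead of failing silently.
    error "gave up after $attempts attempts: $result"
}
```

Inside the scheduled proc this might be called as `retry_script 5 2000 { ns_db select $db $sql }`, assuming the pooled handle can be reused once the database is back up, as reported for the idle connections in this conversation.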
IRC [19:19] <jim> this requirement that the database server be allowed to go down is kinda tough... it's in the throes of creating a tabular data structure to receive the data... the data starts coming... and boom
IRC [19:19] <partymola> yes, i know it's complex
IRC [19:20] <partymola> i am not requiring it to be done
IRC [19:20] <partymola> i just reported it :)
IRC [19:20] <partymola> in fact, we were looking for another bug when this one just popped in our face :)
IRC [19:20] <jim> I understand that, I used the word requirement because it looks like one in your work context
IRC [19:21] <partymola> i've been lately in contact with Dossy, because he's been trying to track down some crashes on one of my aolservers
IRC [19:21] <partymola> i find it pretty annoying for aolserver to crash suddenly with strange messages
IRC [19:22] <partymola> at least, the reason for this crash is known...
IRC [19:22] <partymola> i mean this database restarting one
IRC [19:22] <partymola> i am trying to focus on helping to get a more stable aolserver
IRC [19:23] <jim> that would make it more stable all right
IRC [19:26] <jim> what you'd have to do is connect up the exceptional conditions
IRC [19:26] <jim> btw, is this your query that you are running? one you wrote?
IRC [19:27] <jim> err, that your aolserver ran at the time of the crash
IRC [19:27] <partymola> yes
IRC [19:27] <partymola> it runs a PL/pgSQL function
IRC [19:28] <partymola> refresh_matview($table)
IRC [19:28] <jim> ok, does that function write?
IRC [19:28] <jim> or just read?
IRC [19:28] <jim> are you running it inside a catch?
IRC [19:28] <partymola> let me see
IRC [19:29] <partymola> i do: select refresh_matview('$table');
IRC [19:29] <partymola> just that, and the database knows what to do
IRC [19:29] <partymola> it returns nothing
IRC [19:29] <partymola> and no, i am not running it inside a catch
IRC [19:29] <jim> how are you running that query?
IRC [19:29] <partymola> let me see
IRC [19:29] <jim> my assumption right now is:
IRC [19:30] <jim> "somewhere in your tcl code, you are running this query"
IRC [19:30] <partymola> ok, it's inside a catch
IRC [19:30] <partymola> foreach tabla {fechas datetime} {
IRC [19:30] <partymola> set sql "select refresh_matview ('$tabla')"
IRC [19:30] <partymola>
IRC [19:30] <partymola> catch { ns_db select $db $sql }
IRC [19:30] <partymola> }
IRC [19:30] <jim> where do you get $db?
IRC [19:31] <partymola> set db [ns_db gethandle "postgres_pool"]
IRC [19:31] <jim> you do this right before the catch?
IRC [19:32] <partymola> yes
IRC [19:32] <partymola> proc procesarlogs::actualizar_vistas_materializadas {} \
IRC [19:32] <partymola> {
IRC [19:32] <partymola> # Connect to the database
IRC [19:32] <partymola> set db [ns_db gethandle "postgres_pool"]
IRC [19:32] <partymola> # Refresh the materialized views
IRC [19:32] <partymola> foreach tabla {fechas datetime} {
IRC [19:32] <partymola> set sql "select refresh_matview ('$tabla')"
IRC [19:32] <partymola>
IRC [19:32] <partymola> catch { ns_db select $db $sql }
IRC [19:32] <partymola> }
IRC [19:33] <jim> is that the end of the func?
IRC [19:33] <partymola> no
IRC [19:33] <jim> could you show the next few things you do?
IRC [19:33] <partymola> it runs like 5-6 more statements
IRC [19:33] <partymola> yes
IRC [19:33] <partymola> wait, i'll post the whole function in pastebin
IRC [19:34] <jim> ok, cool
IRC [19:34] <partymola> http://pastebin.ca/694756
IRC [19:34] <partymola> nothing special actually
IRC [19:38] <jim> is there any way you could turn this into one query? you have this double foreach and the body of the inner one hits the db
IRC [19:38] <jim> these are constants?
IRC [19:38] <partymola> yes, they're constants
IRC [19:39] <partymola> it does 9 queries total, you know
IRC [19:39] <partymola> 3 loops in the outer foreach, and 3 more in the inner
IRC [19:39] <jim> do you do a loop exactly like this, only with different constants, anywhere else?
IRC [19:40] <partymola> no
IRC [19:40] <partymola> where do you want to get?
IRC [19:40] <jim> you -might- consider moving this loop to plpgsql
IRC [19:41] <partymola> hmmm... not a bad idea actually
IRC [19:41] <jim> if you do that, 9 or 10 db hits become one
IRC [19:42] <jim> I'm also saying this won't solve the crash problem
IRC [19:42] <partymola> but that has nothing to do with the aolserver crash, does it? lol
IRC [19:42] <partymola> yeah
IRC [19:42] <jim> (but maybe, randomly, it might make it harder to hit)
IRC [19:43] <jim> database hits over the net are spennnnsive :)
IRC [19:43] <partymola> really? very much?
IRC [19:44] <partymola> 10ms overhead for each call?
IRC [19:44] <jim> yeah, and -really- bad if db is not on same machine as aolserver
IRC [19:44] <jim> then it's not lo you're using
IRC [19:44] <partymola> well, actually it's the same machine
IRC [19:44] <partymola> but someday we'll have to move it away
IRC [22:28] *** jim parted the chat.