2005/01/31
IRC [02:25] *** booyaa joined the chat.
IRC [03:54] *** frankie joined the chat.
IRC [05:05] *** torbutte joined the chat.
IRC [09:36] *** holycow parted the chat.
IRC [11:17] <Dossy> Morning everyone.
IRC [11:21] <frankie> Dossy: hi
IRC [11:21] <frankie> Dossy: do you remember about the amd64 linuxthreads problem on debian?
IRC [11:22] <frankie> i just patched to check for __x86_64__ to manage the thing, i could follow up in the bug tracker on sf.net if you like
IRC [11:24] <Dossy> frankie: Is that really the right fix though?
IRC [11:24] <Dossy> I'm still concerned that this is a problem with Debian in some way --
IRC [11:24] <Dossy> why does the ia32 build of LinuxThreads have the function, but not the amd64 build?
IRC [11:25] <Dossy> not LinuxThreads, but NPTL rather.
IRC [11:25] <Dossy> NPTL 0.60 to remain backwards-compatible with LinuxThreads defines the symbol and the function should be a no-op.
IRC [11:37] <frankie> dunno, i'm not an arch expert
IRC [11:38] <frankie> i only know that function is obsolete nowadays, probably it's around for compatibility in the debian lib
IRC [11:39] <frankie> i'll forward your question to debian-glibc
IRC [11:51] <Dossy> please do --
IRC [11:52] <Dossy> as I said in my reply to your email, I'm running glibc 2.3 and NPTL 0.60, but I have that symbol while you don't
IRC [11:52] <Dossy> to me, that's a debian-specific problem, likely
IRC [11:54] <frankie> but your build is not amd64, is it?
IRC [11:55] <frankie> that's currently an unsupported version in debian
IRC [11:55] <Dossy> no
IRC [11:55] <Dossy> my build is ia32
IRC [11:55] <frankie> unfortunately that's an amd64-specific issue
IRC [11:55] <Dossy> right
IRC [11:55] <Dossy> and what I'm saying is, it means the amd64 build is broken, not AOLserver
IRC [11:56] <Dossy> if they've done some debian-specific hackery to glibc that removed that symbol, then they need to "fix" that
IRC [11:56] <Dossy> esp. for the amd64 platform kind of thing
IRC [11:56] <frankie> i don't really know why amd64 is using that kind of implementation
IRC [11:56] <frankie> anyway i'll ask the libc people
IRC [11:58] <Dossy> please do - i'm interested to know what the actual story is here
IRC [11:58] <Dossy> i mean, if i go and download the regular glibc 2.3 source, and grep for pthread_kill_other_threads_np() ... will I find it defined as a no-op?
IRC [11:59] <Dossy> if so, then the debian amd64 build of glibc is broken if it doesn't define that symbol
IRC [11:59] <Dossy> does that make sense?
IRC [12:00] <Dossy> although google'ing around seems to indicate that newer glibc may not define pthread_kill_other_threads_np() at all -- that would be interesting
IRC [12:01] <Dossy> However, on my box running libc6 2.3.2, I definitely see it:
IRC [12:01] <Dossy> # nm -A /lib/tls/libpthread-0.60.so | grep pthread_kill_other_threads_np
IRC [12:01] <Dossy> /lib/tls/libpthread-0.60.so:0000ab30 t __pthread_kill_other_threads_np
IRC [12:02] <Dossy> /lib/tls/libpthread-0.60.so:0000ab30 T pthread_kill_other_threads_np@GLIBC_2.0
IRC [12:02] <Dossy> So, without a doubt, at least the Debian libc6 2.3.2.ds1-18 package definitely defines that symbol.
IRC [12:02] <frankie> afaik NPTL obsoleted pthread_kill_other_threads_np() which was a workaround for an inner architectural bug of old linuxthreads
IRC [12:02] <frankie> probably it's still around in i386 for back-compatibility
IRC [12:02] <Dossy> frankie: Yes, absolutely. However, NPTL, to provide backwards-compatibility with LinuxThreads, defines the symbol and makes it a no-op.
IRC [12:03] <Dossy> think: pthread_kill_other_threads_np() {}
IRC [12:03] <frankie> god knows why it is removed in amd64
IRC [12:03] <Dossy> exactly. that's what you need to find out :)
IRC [12:03] <frankie> btw, i'll ask on the list
IRC [12:04] <Dossy> At some future date, when nobody's running LinuxThreads anymore, we can completely remove the pthread_kill_other_threads_np() call. But today, I know for certain that AOL still runs RH AS 2.1 which IIRC is still LinuxThreads.
IRC [12:04] <Dossy> And, I'd rather not #ifdef around the call to pthread_kill_other_threads_np() if the upstream glibc/NPTL _does_ define pthread_kill_other_threads_np() and only the amd64 Debian package doesn't.
IRC [12:05] <Dossy> There. I hope my position makes sense and is clear. Of course, frankie, you're welcome to do whatever's necessary to get an amd64 Debian package of AOLserver, but I'm not keen on integrating the change upstream for the reason I just mentioned - make sense?
IRC [12:10] <frankie> yes
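
The guard being debated above can be sketched in C. This is a minimal illustration only: it is not frankie's patch and not AOLserver code. Instead of testing `__x86_64__`, it assumes a hypothetical build-time macro, `HAVE_PTHREAD_KILL_OTHER_THREADS_NP`, that a configure step would define only where the C library actually exports the symbol. The wrapper name `kill_other_threads_compat` is also invented for this example.

```c
#include <pthread.h>

/*
 * HAVE_PTHREAD_KILL_OTHER_THREADS_NP is a hypothetical macro for this
 * sketch. A build system would define it only after confirming that the
 * threading library provides the symbol.
 */
#ifdef HAVE_PTHREAD_KILL_OTHER_THREADS_NP
extern void pthread_kill_other_threads_np(void);
#endif

/*
 * Hypothetical wrapper: calls the LinuxThreads-era function where it
 * exists and does nothing elsewhere.
 * Returns 1 if the real function was called, 0 if the call was skipped.
 */
static int kill_other_threads_compat(void)
{
#ifdef HAVE_PTHREAD_KILL_OTHER_THREADS_NP
    pthread_kill_other_threads_np();
    return 1;
#else
    return 0;
#endif
}
```

With the macro undefined, the wrapper reduces to a no-op that returns 0, which mirrors the no-op behavior Dossy describes for NPTL. Keying the guard on the presence of the symbol rather than on the architecture would also cover any other platform whose library drops it, though it still adds the conditional Dossy would prefer to keep out of upstream.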
IRC [12:11] *** frankie parted the chat.
IRC [13:07] *** cnk joined the chat.
IRC [13:07] <Dossy> hey cnk :)
IRC [13:07] * Dossy bounces. Any interesting updates?
IRC [13:07] <cnk> hello http://salinger.caltech.edu/autobench/output/NoACSwDB.jpg
IRC [13:08] <Dossy> scaling factor on the graph makes it kinda hard to appreciate :)
IRC [13:08] <cnk> the red/green is 4 db handles and default threads. The purple/blue is your suggested 16 handles and max = min = 50 threads
IRC [13:08] <Dossy> so, more threads and handles means better response time even at higher load
IRC [13:08] <Dossy> which makes total sense.
IRC [13:09] <cnk> sort of. But it shows that just a little tuning can give much better performance ;)
IRC [13:09] <Dossy> nice to see it stay under 200msec even at the high end
IRC [13:09] <cnk> exactly
IRC [13:09] <Dossy> still, except for the anomalous spikes, staying under 600msec is respectable for most sites
IRC [13:09] <Dossy> did you say the machine under test was a dual xeon?
IRC [13:09] <cnk> yes
IRC [13:10] <Dossy> if you really want to make the oracle db cry, try making conns in the db pool = maxthreads. or at least maxthreads * 2/3
IRC [13:10] <Dossy> but still, 8000 req/sec in under 200msec is "more than adequate" :)
IRC [13:10] <Dossy> but this graph definitely points to the bottleneck being somewhere in ACS code. now, it'd be way interesting to see if there's any single place that's contributing 90% of the slowdown, etc.
IRC [13:11] <Dossy> i'm thinking it's bound to be the session stuff, but that's a WAG
IRC [13:11] <cnk> yes - and don't forget, you can double that - 16,000 req/sec - I am hitting it with 2 load generators simultaneously
IRC [13:11] <Dossy> it could be the underlying "request processor" framework which I've looked at a few times, and each time it's given me a bad headache.
IRC [13:11] <Dossy> nice, 16K req/sec - ha
IRC [13:11] <Dossy> how many concurrent connections are you actually getting? are you tweaking max open fd's to be >1024 ?
IRC [13:12] <cnk> yes. I tried hacking some of the ACS stuff out - just commenting it out. But I did not get the boost I was expecting
IRC [13:12] <Dossy> cnk: yeah ... my hunch is that the wacky overhead of all the "request processor" stuff is probably just too fat.
IRC [13:13] <cnk> I didn't do anything that would alter fd's. Let me check the error log - nsmain: max files: FD_SETSIZE = 1024, rl_cur = 1024, rl_max = 1024
IRC [13:14] <Dossy> nod - ok
IRC [13:14] <Dossy> it'd be interesting to crank that to 8192 and see if it has any measurable impact
IRC [13:15] <Dossy> my hunch is it'd only make the high-end respond faster as you may be maxing out the number of fd's
IRC [13:15] <cnk> where is that set? is that a compile time param?
IRC [13:15] <Dossy> but then, you'd probably want/need to bump threads >50 ...
IRC [13:15] <Dossy> it's a runtime param, look at 'ulimit -Hn'
IRC [13:16] <Dossy> although in previous versions of AOLserver (maybe even current), it limits you to max 1024
IRC [13:16] <Dossy> Ah, in 4.0.10 I changed it from an enforced max of FD_SETSIZE to a logged warning if you set the limit >FD_SETSIZE
IRC [13:16] <cnk> ulimit -Hn = 1024; so that is an OS limit
IRC [13:16] <cnk> ?
IRC [13:17] <Dossy> yeah.
IRC [13:17] <Dossy> default is FD_SETSIZE (1024) so that apps which use select() don't break.
IRC [13:17] <Dossy> AOLserver doesn't use select() when it doesn't have to (it uses poll()) but some parts of the Tcl core still unfortunately use select(). Beware.
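
The relationship between the descriptor limit and FD_SETSIZE can be checked from C. The following is a minimal sketch, assuming a POSIX system; the helper names `nofile_exceeds_fd_setsize` and `fd_safe_for_select` are hypothetical and do not come from AOLserver. It reads the same soft and hard limits that `ulimit -n` and `ulimit -Hn` report and flags the case where descriptors could be numbered beyond what select() can track.

```c
#include <stddef.h>
#include <sys/resource.h>
#include <sys/select.h>

/*
 * Hypothetical helper: reads the open-file limits for this process.
 * Stores the soft and hard limits through the pointers when non-NULL.
 * Returns 1 if the soft limit exceeds FD_SETSIZE (select() users are
 * at risk), 0 if it does not, and -1 if getrlimit() fails.
 */
static int nofile_exceeds_fd_setsize(rlim_t *soft, rlim_t *hard)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
        return -1;
    }
    if (soft != NULL) {
        *soft = rl.rlim_cur;
    }
    if (hard != NULL) {
        *hard = rl.rlim_max;
    }
    if (rl.rlim_cur == RLIM_INFINITY) {
        return 1;
    }
    return rl.rlim_cur > (rlim_t)FD_SETSIZE ? 1 : 0;
}

/*
 * Hypothetical helper: an fd_set only has room for descriptors numbered
 * 0 .. FD_SETSIZE-1, so anything outside that range must not be passed
 * to FD_SET() or select().
 */
static int fd_safe_for_select(int fd)
{
    return (fd >= 0 && fd < FD_SETSIZE) ? 1 : 0;
}
```

A server that raises the limit above FD_SETSIZE could run a check like `fd_safe_for_select` before handing a descriptor to any select()-based code path and log a warning otherwise, which is in the spirit of the logged warning Dossy mentions for 4.0.10.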
IRC [13:20] <cnk> OK so for most folks, messing with FD_SETSIZE is probably more dangerous than worth it. I think rather than seeing how fast we can make AOLserver scream with static pages or very simple db selects, we need to sort out what the sweet point between fast and lots of cool API stuff is.
IRC [13:21] <cnk> seems that ACS 3 was on the bad side of too much cool API - but I would like to sort out where it got 'too heavy'
IRC [13:22] <Dossy> indeed.
IRC [13:22] <Dossy> I wish I could help there ...
IRC [13:22] <Dossy> Also, what version of AOLserver are your benchmarks running against? 3.x? 4.x?
IRC [13:22] <cnk> right now 3.3.1+ad13
IRC [13:23] <cnk> is there a high level summary of changes between 3.x and 4.x?
IRC [13:24] <Dossy> Not really, that I'm aware of.
IRC [13:27] <cnk> OK coffee and then I'll start writing this stuff up
IRC [13:44] <Dossy> yay!
IRC [13:44] <Dossy> you rock.
IRC [14:22] *** frankie joined the chat.
IRC [15:19] *** bartt joined the chat.
IRC [15:28] *** torbutte parted the chat.
IRC [15:57] *** bartt parted the chat.
IRC [16:37] *** bartt joined the chat.
IRC [16:52] *** bartt parted the chat.
IRC [18:16] *** frankie parted the chat.
IRC [22:20] *** booyaa joined the chat.
IRC [23:18] *** pingdashf joined the chat.
IRC [23:18] *** pingdashf parted the chat.