#linuxcnc-devel | Logs for 2012-03-28

[00:17:08] -!- joe9 has quit [Quit: leaving]
[00:17:13] -!- maximilian_h has quit [Quit: Leaving.]
[00:27:39] -!- seb_kuzminsky has quit [Ping timeout: 245 seconds]
[00:35:49] -!- JT-Shop has quit [Quit: ChatZilla 0.9.88.1 [Firefox 10.0.2/20120215223356]]
[00:59:04] -!- seb_kuzminsky [[email protected]] has joined #linuxcnc-devel
[01:13:24] -!- rob_h has quit [Ping timeout: 260 seconds]
[01:19:59] -!- Valen| has quit [Quit: Bye]
[01:58:12] -!- jtr [[email protected]] has joined #linuxcnc-devel
[02:01:48] -!- phantoxe has quit [Read error: Connection reset by peer]
[02:08:37] -!- z-log has quit [Remote host closed the connection]
[02:54:04] -!- Tom_L has quit []
[02:54:14] -!- atom1 has quit [Quit: Leaving]
[03:02:42] -!- Nick001 has quit [Read error: Connection reset by peer]
[03:12:39] -!- sumpfralle has quit [Ping timeout: 245 seconds]
[03:34:54] -!- koax [[email protected]] has joined #linuxcnc-devel
[03:38:19] -!- koax_ has quit [Ping timeout: 265 seconds]
[03:48:04] -!- demacus has quit [Ping timeout: 245 seconds]
[04:17:54] -!- ve7it has quit [Remote host closed the connection]
[04:29:51] -!- seb_kuzminsky has quit [Ping timeout: 252 seconds]
[04:48:29] -!- pcw_ has quit [Ping timeout: 244 seconds]
[04:48:38] -!- pcw__ has quit [Ping timeout: 240 seconds]
[04:50:11] -!- seb_kuzminsky [[email protected]] has joined #linuxcnc-devel
[05:08:32] -!- psha[work] [psha[work][email protected]] has joined #linuxcnc-devel
[05:35:21] -!- mhaberler [[email protected]] has joined #linuxcnc-devel
[05:50:11] -!- cevad has quit [Quit: Leaving]
[05:51:13] <mhaberler> seb_kuzminsky: a) thanks! b) q: what's the update trigger on buildbot.linuxcnc.org/clang ? time? your motivation ;-?
[05:56:14] <seb_kuzminsky> mhaberler: hi :-)
[05:56:24] <seb_kuzminsky> what do you mean by update trigger?
[05:56:36] <mhaberler> I mean, what causes the next run?
[05:56:46] <seb_kuzminsky> oh - the buildbot does a clang build every time anyone pushes to git.linuxcnc.org
[05:57:10] <seb_kuzminsky> "my motivation"?! then it'd never happen! :-P
[05:57:35] <seb_kuzminsky> i'm the slackingest hacker of all time
[05:57:59] <mhaberler> oh. looked at the wrong commit. they *are* gone in the error listing. sorry, my bad
[05:58:28] <seb_kuzminsky> yay for dead bugs
[05:58:40] <mhaberler> yep, that struck me as an unlikely trigger
[05:59:02] <seb_kuzminsky> "every time seb opens a beer, the buildbot does a clang build"
[05:59:39] <mhaberler> do you have an SNMP trap for that or what?
[06:00:18] <mhaberler> I'm tickling the libmodbus folks to update the precise package and inject a lucid package; we'll see
[06:00:36] <seb_kuzminsky> the buildbot sends me a text message "i'm starting clang build, get a beer stat!"
[06:00:46] <mhaberler> oh, polling.
[06:00:53] <seb_kuzminsky> thanks for dealing with the modbus people
[06:00:58] <mhaberler> sure
[06:01:09] <seb_kuzminsky> i bet there wont be a lucid package, at least in the canonical archive
[06:01:15] <mhaberler> why?
[06:01:28] <seb_kuzminsky> too old and frozen
[06:01:39] <seb_kuzminsky> i'd love to be proven wrong
[06:01:46] <mhaberler> NB: I'm a total package illiterate
[06:01:59] <mhaberler> frozen means only bugfixes, I assume
[06:02:01] <seb_kuzminsky> i have two modbus vfds i want to write drivers for, and i'd love to use the real modbus lib
[06:02:03] <seb_kuzminsky> yes
[06:02:08] <seb_kuzminsky> "NB"?
[06:02:12] <mhaberler> oh, which type?
[06:02:26] <mhaberler> nota bene
[06:02:35] <seb_kuzminsky> polish?
[06:02:40] <mhaberler> latin
[06:02:47] <seb_kuzminsky> i'm a latin illiterate ;-)
[06:02:49] <seb_kuzminsky> and polish
[06:03:42] <seb_kuzminsky> one's a hitachi sj200, the other's a yaskawa v7-4x
[06:03:43] <mhaberler> I'm working with robh on a multi-slave modbus driver; maybe that is a better starting point if you have several identical devices: http://git.mah.priv.at/gitweb/emc2-dev.git/shortlog/refs/heads/modbus-generic-driver
[06:03:50] <mhaberler> ah, then no point
[06:05:31] <mhaberler> I found the toughest part is making sense of the docs; I wrote a little utility to manually investigate the device over modbus to come up with something remotely coherent: http://git.mah.priv.at/gitweb/modio.git
[06:07:46] <mhaberler> the absolute highlight was the following register description (chinese servo driver): "Pn201 The first electronic gear molecule"
[06:07:53] <seb_kuzminsky> i'll check it out when i start working on the vfd driver, thanks! :-)
[06:07:59] <seb_kuzminsky> haha
[06:50:47] -!- vladimirek has quit [Remote host closed the connection]
[07:06:47] -!- capricorn_one has quit [Remote host closed the connection]
[07:13:39] -!- maximilian_h [[email protected]] has joined #linuxcnc-devel
[07:14:12] -!- Valen has quit [Ping timeout: 245 seconds]
[08:02:45] -!- pjm__ has quit [Quit: TTFO]
[08:57:20] -!- rob_h [[email protected]] has joined #linuxcnc-devel
[09:07:29] -!- Guest62612 has quit [Quit: Miranda IM! Smaller, Faster, Easier. http://miranda-im.org]
[09:22:25] <CIA-51> mhaberler master * re77f5987c693 /configs/sim/remap/ (6 files in 6 dirs): configs/sim/remap: remove references to *_LD_PRELOAD
[09:22:27] <CIA-51> mhaberler master * r3bf31d63b40f /debian/configure: Merge branch 'master' of git://git.linuxcnc.org/git/emc2
[09:22:29] <CIA-51> mhaberler master * rd21a488a9e82 / (3 files in 2 dirs): configure: test for libgl1-mesa-dri bug and workaround
[09:24:39] <mhaberler> ok, the *_LD_PRELOAD requirement for ini files with configs using Python in the interp is history
[09:29:18] <CIA-51> mhaberler master * rb10bb4de1b3c /docs/src/remap/structure.txt: docs: make note in manual that the explicit *_LD_PRELOAD workarounds are history
[09:41:46] -!- mhaberler has quit [Quit: mhaberler]
[09:48:37] -!- mhaberler [[email protected]] has joined #linuxcnc-devel
[09:50:36] <mhaberler> uh, there seems to be an issue on amd64 with the workaround check
[09:52:25] -!- seb_kuzminsky has quit [Ping timeout: 265 seconds]
[10:04:47] <mhaberler> could some kind soul with an amd64 box try this and mail me the output? http://static.mah.priv.at/public/testbug.sh
[10:10:23] -!- seb_kuzminsky [[email protected]] has joined #linuxcnc-devel
[10:10:57] <seb_kuzminsky> mhaberler: http://buildbot.linuxcnc.org/buildbot-admin/builders/precise-amd64-sim/builds/358/steps/configuring/logs/stdio
[10:11:01] <seb_kuzminsky> good night
[10:11:36] <mhaberler> thanks.. I pushed a branch which is more verbose on stdio
[10:53:48] -!- koax has quit [Ping timeout: 265 seconds]
[11:08:55] -!- maximilian_h has quit [Quit: Leaving.]
[11:12:30] -!- maximilian_h [[email protected]] has joined #linuxcnc-devel
[11:22:27] -!- Radium has quit []
[11:38:47] -!- Radium has quit []
[11:53:25] -!- Radium has quit []
[11:55:38] -!- sumpfralle has quit [Read error: Operation timed out]
[11:57:49] -!- jthornton [[email protected]] has joined #linuxcnc-devel
[12:15:25] -!- maximilian_h has quit [Quit: Leaving.]
[12:20:02] -!- Radium has quit [Ping timeout: 246 seconds]
[12:27:35] -!- Sulfur has quit [Remote host closed the connection]
[12:36:32] -!- jthornton has quit [Quit: ChatZilla 0.9.88.1 [Firefox 11.0/20120312181643]]
[12:40:45] -!- JT-Shop [[email protected]] has joined #linuxcnc-devel
[12:40:51] -!- joe9 has quit [Read error: Connection reset by peer]
[12:41:12] -!- JT-Shop [[email protected]] has parted #linuxcnc-devel
[12:41:15] -!- JT-Shop has quit [Client Quit]
[12:41:28] -!- JT-Shop [[email protected]] has joined #linuxcnc-devel
[12:41:50] -!- JT-Shop [[email protected]] has parted #linuxcnc-devel
[12:41:52] -!- JT-Shop has quit [Client Quit]
[12:42:19] -!- JT-Shop [[email protected]] has joined #linuxcnc-devel
[12:42:28] -!- JT-Shop has quit [Client Quit]
[12:42:37] -!- JT-Shop [[email protected]] has joined #linuxcnc-devel
[12:49:02] <jepler> mhaberler: 64-bit 8.04: http://pastebin.com/97xpKksL
[12:49:34] <mhaberler> oh man, a prehistoric g++
[12:49:37] <mhaberler> thanks
[12:50:07] <jepler> 64-bit 10.04: http://pastebin.com/QXFXEeY8
[12:51:18] <jepler> 64-bit Debian sid: http://pastebin.com/8ZWHx1qk
[12:51:20] <mhaberler> could you try inserting -fPIC to "g++ -c ldpreload_crash.cpp -o ldpreload_crash.o" and see whether this fixes it?
[12:52:28] <jepler> ldpreload_crash.o: %.o: %.cpp
[12:52:28] <jepler> - $(CXX) -c $< -o $@
[12:52:28] <jepler> + $(CXX) -fPIC -c $< -o $@
[12:52:33] <jepler> [foo]
[12:52:41] <jepler> that fixes it on the last system
[12:52:43] <mhaberler> yes
[12:53:06] <jepler> and on the 8.04 system, so probably on all of them.
[12:53:20] <mhaberler> great, thanks a lot! I'll fix it and push
[12:54:59] <jepler> it's funny to read gcc documentation
[12:55:00] <jepler> Position-independent code requires special support, and therefore
[12:55:01] <jepler> works only on certain machines. For the 386, GCC supports PIC for
[12:55:01] <jepler> System V but not for the Sun 386i.
[12:55:23] <jepler> (yeah, I guess that "Sun 386i" and "System V" just about wraps up the "386" systems that people are using..)
[12:55:41] <mhaberler> ok, still works on i386 lucid
[12:56:12] <mhaberler> ;)
[12:59:07] <CIA-51> mhaberler master * r8c84ee066355 /scripts/test-libgl-bug.sh: config: fix test script for libgl-mesa-dri bug workaround for amd64
[12:59:38] <mhaberler> now for the buildbot fallout ;)
[13:01:35] -!- pcw__ has quit [Ping timeout: 272 seconds]
[13:01:55] -!- pcw_ has quit [Ping timeout: 264 seconds]
[13:02:23] -!- ries has quit [Ping timeout: 246 seconds]
[13:02:35] <mhaberler> did you use pbuilder, or do you have separate VMs for releases?
[13:03:05] -!- bedah has quit [Ping timeout: 246 seconds]
[13:03:10] <jepler> When I was release manager, I used a combination of real machines and VMs to do the building
[13:03:28] <jepler> I think cradek and seb_kuzminsky are working together so that in the 2.5 series the buildbot will produce the debs that go in the main repository.
[13:03:45] <mhaberler> and this test?
[13:04:23] <mhaberler> pretty quick for a reboot
[13:04:42] <jepler> oh, the three systems I tested on? those are real physical systems I have ssh access to
[13:05:06] <mhaberler> aja, great heaters, those amd's
[13:05:56] <mhaberler> I had one once, and I could turn down the room heating..
[13:06:22] <jepler> oh, you're referring to the uname? "uname -m" is amd64 even if the CPU manufacturer is Intel (as it was in 2 out of those 3 systems)
[13:06:44] <mhaberler> no, heat dissipation
[13:10:17] <jepler> I'm not much of a fanboi but I don't think there's a substantial difference in TDP between AMD and Intel's desktop chips these days. Core i7 has TDP from 95W to 130W, and Phenom II has TDP from 65W to 125W.
[13:41:31] -!- Radium has quit [Ping timeout: 265 seconds]
[13:44:21] -!- joe9 [[email protected]] has joined #linuxcnc-devel
[13:47:11] -!- Paragon-ws has quit [Ping timeout: 272 seconds]
[13:48:50] -!- psha[work] has quit [Quit: Lost terminal]
[13:58:26] -!- Radium has quit [Ping timeout: 252 seconds]
[13:59:57] -!- ewidance [[email protected]] has joined #linuxcnc-devel
[14:07:54] -!- Loetmichel has quit [Ping timeout: 260 seconds]
[14:07:59] Cylly is now known as Loetmichel
[14:12:37] -!- Radium has quit [Read error: Connection reset by peer]
[14:20:19] -!- pcw__ has quit [Ping timeout: 244 seconds]
[14:21:57] -!- ewidance [[email protected]] has parted #linuxcnc-devel
[15:04:19] -!- SWPLinux has quit [Ping timeout: 245 seconds]
[15:07:19] -!- phantoxe has quit [Ping timeout: 264 seconds]
[15:38:39] -!- maximilian_h [[email protected]] has joined #linuxcnc-devel
[15:38:40] -!- maximilian_h [[email protected]] has parted #linuxcnc-devel
[15:42:10] -!- nots has quit [Ping timeout: 260 seconds]
[15:45:46] -!- skunkworks [[email protected]] has joined #linuxcnc-devel
[15:47:36] -!- nots has quit [Remote host closed the connection]
[15:52:30] -!- nots has quit [Ping timeout: 265 seconds]
[15:55:12] -!- Thetawaves has quit [Quit: This computer has gone to sleep]
[15:58:08] -!- phantoxe has quit [Read error: Connection reset by peer]
[15:58:14] -!- nots has quit [Remote host closed the connection]
[16:05:25] -!- ve7it [[email protected]] has joined #linuxcnc-devel
[16:07:55] -!- nots has quit [Ping timeout: 264 seconds]
[16:12:38] -!- nots has quit [Ping timeout: 252 seconds]
[16:18:11] -!- nots has quit [Ping timeout: 250 seconds]
[16:23:35] -!- nots has quit [Ping timeout: 260 seconds]
[16:28:57] -!- nots has quit [Remote host closed the connection]
[16:30:43] -!- WalterN has quit [Read error: Connection reset by peer]
[16:33:26] -!- PCW has quit [Ping timeout: 246 seconds]
[16:33:30] PCW___ is now known as PCW
[16:35:06] -!- Thetawaves has quit [Quit: This computer has gone to sleep]
[16:37:02] <mhaberler> I think I found a pretty cool improvement for RFL - interpreter watchpoints. Here's a preview (this is only interp support, no UI/task support yet - proof of concept)
[16:37:20] <mhaberler> http://git.mah.priv.at/gitweb/emc2-dev.git/shortlog/refs/heads/interpreter-watchpoints
[16:37:33] -!- n2diy has quit [Quit: Ex-Chat]
[16:38:18] <mhaberler> a set of arbitrary python expressions, evaluated after each block - the condition set will eventually ride down to task on interplist
[16:38:24] -!- SWPLinux [[email protected]] has joined #linuxcnc-devel
[16:38:59] <mhaberler> using compiled python expressions is pretty fast - about 500 ns for a single compare
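The idea mhaberler describes here (a set of Python expressions, compiled once and evaluated after each block) can be sketched roughly as follows. This is only an illustration and not the code in the interpreter-watchpoints branch: the `Watchpoints` class, its `check` method, and the state names `line` and `x` are all hypothetical.

```python
class Watchpoints:
    """Hold a set of Python expressions and evaluate them against a state dict."""

    def __init__(self, expressions):
        # Compile each expression once so per-block evaluation stays cheap.
        self._compiled = [(src, compile(src, "<watchpoint>", "eval"))
                          for src in expressions]

    def check(self, state):
        """Return the source text of every expression that is true for `state`."""
        hits = []
        for src, code in self._compiled:
            # Builtins are emptied so expressions see only the supplied state.
            if eval(code, {"__builtins__": {}}, state):
                hits.append(src)
        return hits


if __name__ == "__main__":
    wp = Watchpoints(["line >= 3", "x > 10.0"])
    # Hypothetical per-block state; a real interpreter would supply its own.
    for state in ({"line": 1, "x": 0.0},
                  {"line": 3, "x": 2.5},
                  {"line": 4, "x": 12.0}):
        print(state["line"], wp.check(state))
```

Compiling up front is what keeps the per-block cost down to the evaluation itself; the sketch makes no claim about the actual timing quoted above.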
[16:51:36] -!- n2diy has quit [Client Quit]
[16:53:36] -!- n2diy has quit [Client Quit]
[16:57:11] -!- n2diy has quit [Client Quit]
[17:04:11] -!- jepler has quit [Read error: Operation timed out]
[17:04:23] -!- jepler [jepler!~jepler@emc/developer/pdpc.professional.jepler] has joined #linuxcnc-devel
[17:07:10] -!- maximilian_h [[email protected]] has joined #linuxcnc-devel
[17:11:32] -!- factor has quit [Quit: Leaving]
[17:13:20] -!- psha [[email protected]] has joined #linuxcnc-devel
[17:28:13] -!- iwoj has quit [Quit: Computer has gone to sleep.]
[17:31:46] -!- mhaberler has quit [Quit: mhaberler]
[17:39:43] -!- mhaberler [[email protected]] has joined #linuxcnc-devel
[17:54:22] -!- phantoxe has quit []
[17:57:03] -!- IchGuckLive [[email protected]] has joined #linuxcnc-devel
[17:58:09] <IchGuckLive> the mailing list is now spammed with questions. i would like to unsubscribe, is there a howto for this?
[18:03:18] <IchGuckLive> ok im off Thanks
[18:03:28] -!- IchGuckLive [[email protected]] has parted #linuxcnc-devel
[18:15:32] -!- IchGuckLive has quit [Quit: ChatZilla 0.9.87 [Firefox 10.0.2/20120216080748]]
[18:22:35] -!- n2diy has quit [Quit: Ex-Chat]
[18:29:55] -!- n2diy has quit [Client Quit]
[18:44:52] <joe9> hello, wondering if anyone has any suggestions: this is my latency times http://codepad.org/aCWop4f1
[18:45:13] <joe9> i am trying to figure out why those spikes in latency and what could be causing them.
[18:45:55] <joe9> though they are in the sub-10 microsecond range, the reason I am interested in them is that whenever I have a disk access they sometimes jump to 100 microsec range.
[18:55:43] <mhaberler> what kind of disk interface?
[18:55:50] <mhaberler> sata?
[19:02:24] -!- vladimirek has quit [Remote host closed the connection]
[19:24:12] -!- psha has quit [Quit: Lost terminal]
[19:57:14] -!- bedah has quit [Quit: Ex-Chat]
[20:19:38] <joe9> mhaberler: yes, sata, the linux module is pata_amd on raid-1
[20:20:02] <mhaberler> oh, raid
[20:20:15] <joe9> i am thinking that I should start using periodic mode instead of one-shot mode as periodic mode seems to have a higher frequency.
[20:20:22] <joe9> yes, software raid.
[20:20:25] <joe9> mdadm stuff.
[20:21:04] <joe9> do you have any links on how the calibration process of rtai works?
[20:21:13] <mhaberler> you could try isolating it by moving stuff to a ramdisk
[20:21:15] <mhaberler> no
[20:21:35] <joe9> mhaberler: that is a brilliant idea and I have been thinking about it too.
[20:21:44] <mhaberler> wel...
[20:21:47] <joe9> i should probably try the periodic mode before.
[20:23:49] <joe9> i may have found out the issue. It appears that "no Forced preemption" seems to help.
[20:23:52] <mhaberler> my gut feeling is the mdadm raid could be a cause - pure conjecture
[20:24:37] <mhaberler> try disabling one drive, it should keep running
[20:24:41] <joe9> I think the whole voluntary preemption thing does not seem to sit well with rtai.
[20:25:04] <mhaberler> what kernel are you using
[20:25:22] <joe9> the problem is that it does not happen all the time. it only happens for some random disk access once in a while.
[20:25:37] <joe9> other than those 5000 latencies which happen frequently.
[20:25:40] <joe9> 2.6.38.8
[20:25:47] <joe9> and vulcano rtai.
[20:25:56] <mhaberler> self-built?
[20:28:59] <mozmck> joe9: if you'll document your findings it may be of help when getting the 12.04 kernel working with rtai.
[20:29:13] <joe9> yes, sir.
[20:29:20] <joe9> sure, not a problem at all.
[20:29:34] <joe9> for one thing, deadline IO scheduler > CFQ
[20:30:04] <joe9> and, another, No Forced Preemption (Server) > voluntary preemption (Desktop) mode.
[20:30:17] <mozmck> I'm guessing the mdadm raid may be causing some of the latency, especially since you mentioned that you see spikes with disk accesses.
[20:30:22] <joe9> mhaberler: yes.
[20:30:58] <mozmck> CFQ I guess is the Completely Fair scheduler? I'm not real familiar with either...
[20:31:42] <mozmck> Those are good to know. It seems off the top of my head that I used voluntary preemption for the 10.04 kernel.
[20:32:58] -!- mhaberler has quit [Ping timeout: 245 seconds]
[20:33:02] <mozmck> I played with different settings and tested on 3 or 4 different computers and got some other people to run tests at some point as well.
[20:37:23] <joe9> mozmck: yes, cfq and deadline are the IO schedulers.
[20:37:54] <joe9> mozmck: voluntary preemption was giving me one large value once in a while.
[20:38:18] <mozmck> interesting
[20:38:42] <joe9> i think in the mailing list, there is a recommendation that "No Forced preemption" is better by paolo, the man.
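For anyone comparing the deadline and CFQ I/O schedulers as joe9 does above, the active scheduler of a block device is exposed in sysfs at `/sys/block/<device>/queue/scheduler`, with the active entry shown in square brackets. A minimal sketch for reading it follows; the function names and the device name `sda` are assumptions for illustration only.

```python
from pathlib import Path


def parse_scheduler(text):
    """Split a sysfs scheduler line such as 'noop deadline [cfq]'.

    Returns (active, available); the active entry is the bracketed one.
    """
    active = None
    available = []
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        available.append(token)
    return active, available


def read_scheduler(device):
    """Read the scheduler line for a block device name such as 'sda'."""
    path = Path("/sys/block") / device / "queue" / "scheduler"
    return parse_scheduler(path.read_text())


if __name__ == "__main__":
    # Example line only; the real contents depend on the kernel configuration.
    print(parse_scheduler("noop deadline [cfq]"))
```

Writing a scheduler name to the same sysfs file as root switches the scheduler at run time, which makes it possible to compare the two without rebuilding the kernel.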
[20:40:05] -!- syyl has quit [Quit: Leaving]
[20:40:58] <mozmck> Looks like I used the Low Latency Desktop option
[20:42:47] -!- JT-Shop_ [[email protected]] has joined #linuxcnc-devel
[20:43:16] -!- DJ9DJ has quit [Quit: bye]
[20:43:55] -!- JT-Shop has quit [Ping timeout: 264 seconds]
[20:43:58] JT-Shop_ is now known as JT-Shop
[20:54:07] <joe9> mozmck: i take that back about the "no forced preemption". not sure yet.
[20:54:13] <joe9> seeing some high numbers now.
[21:01:19] -!- Radium has quit []
[21:06:29] -!- SolarNRG has quit [Read error: Connection reset by peer]
[21:13:29] <mozmck> Are you stress testing the system? I would run glxgears, firefox, and other things all at once to see the effects on latency.
[21:13:51] -!- mhaberler [[email protected]] has joined #linuxcnc-devel
[21:22:56] -!- skunkworks has quit []
[21:36:44] -!- vladimirek has quit [Remote host closed the connection]
[21:47:08] -!- Tom_L has quit [Client Quit]
[21:49:53] <joe9> http://www.hotaboutlinux.com/2010/01/tuning-the-linux-kernels-completely-fair-scheduler/ good article.
[21:50:06] <joe9> mozmck: yes, glxgears, dd if=/dev/zero bs=100M | gzip | gzip -d | gzip | gzip -d | gzip | gzip -d >| /dev/null
[21:50:39] <joe9> but, I find that some kind of random disk access is responsible for the once-in-a-while max figures.
[21:51:13] <joe9> it normally runs within 100ns to 10,000ns. but, sometimes, jumps to 400,000 ns oslt.
[21:51:19] <joe9> and i find that confusing.
[21:51:31] <joe9> and, for some reason, function tracing does not work.
[21:51:42] <joe9> not sure if it is because of the rtai patches.
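When readings mostly sit in one range and only occasionally jump, as joe9 describes, it can help to report the typical value and the rare spikes separately instead of a single maximum. A minimal sketch follows; the sample values and the spike threshold are invented for illustration and are not taken from the pastes linked in this log.

```python
import statistics


def summarize_latency(samples_ns, spike_threshold_ns):
    """Separate the typical latency from rare spikes in a list of samples (ns)."""
    spikes = [s for s in samples_ns if s > spike_threshold_ns]
    return {
        "count": len(samples_ns),
        "median_ns": statistics.median(samples_ns),
        "max_ns": max(samples_ns),
        "spike_count": len(spikes),
        "spike_fraction": len(spikes) / len(samples_ns),
    }


if __name__ == "__main__":
    # Hypothetical readings: mostly small, with one large outlier.
    samples = [100, 5000, 120, 9800, 400000, 150, 110, 5000]
    print(summarize_latency(samples, spike_threshold_ns=100000))
```

A setting that lowers the median without changing the maximum has improved the typical case only, and the worst case is the figure that matters for a software step generator.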
[21:54:02] -!- Fox_Muldr has quit [Ping timeout: 265 seconds]
[21:55:08] -!- seb_kuzminsky has quit [Ping timeout: 240 seconds]
[21:55:18] <joe9> http://doc.opensuse.org/products/draft/SLES/SLES-tuning_sd_draft/cha.tuning.taskscheduler.html
[22:02:26] -!- maximilian_h has quit [Ping timeout: 246 seconds]
[22:08:53] -!- Radium has quit []
[22:19:49] -!- maximilian_h [[email protected]] has joined #linuxcnc-devel
[22:23:11] <cradek> joe9: (I haven't read back) I think the regular linux scheduler runs when the rtai threads are idle. so it seems like while it might have some effect on your rtai latency, it can't fix whatever basic problem there is with your hardware that doesn't let rtai run when it wants to.
[22:24:27] <cradek> it = kernel scheduler settings
[22:29:53] <joe9> cradek: any thoughts on how I can figure out the reasons behind once-in-a-while spikes of rtai latency.
[22:30:05] <joe9> cradek: what you say makes sense.
[22:30:43] <joe9> but, I do see a change in latency values when changing the kernel settings.
[22:32:33] <cradek> I don't doubt you, but I wonder if you're really changing the worst possible number. you might be changing the worst typical/frequent number.
[22:32:49] <cradek> and sadly the only way I know to debug latency is guess and check
[22:33:08] <cradek> by far the most common correct guesses are bios settings and video hardware
[22:33:24] -!- kbarry has quit [Client Quit]
[22:33:27] <cradek> I really have not heard of a disk-related problem at all.
[22:33:43] <joe9> i disabled all the agp and graphics stuff. no usb or firewire either.
[22:33:51] <joe9> it is a headless system.
[22:34:05] <cradek> in the very old days (emc1 days on rtlinux) rumor said scsi was bad, but I had no trouble with my scsi systems.
[22:34:58] <cradek> if it's clearly disk-related I'm not surprised none of that helped.
[22:35:21] <cradek> software raid makes your system unique or nearly unique amongst our users
[22:35:36] <cradek> also (obviously) your kernel/rtai build
[22:36:07] <cradek> did you try different sata/pata emulation bios settings?
[22:38:02] <joe9> cradek, I have dma in my bios for the harddisk and it has options for "auto, swdma[012],udma[0-5]" and so on.
[22:38:12] <joe9> i tried the swdma and it did not help.
[22:38:58] <joe9> the only processes from the kernel tracing (irqsoff and preemptoff) that seemed to be taking too much time are the disk stuff.
[22:39:40] <joe9> and, I can see in the tracing that there are some situations where the rt delay_usec does not happen at the time that it is supposed to.
[22:39:47] <joe9> some disk activity would be going on there.
[22:40:07] <joe9> hence, my thinking that rtai is somewhat dependent on the kernel scheduling.
[22:40:46] <cradek> you already obviously know more about it than I do...
[22:41:41] <joe9> i wish i could get the function tracing to work. but, it does not. i am guessing some bug in the rtai code regarding that.
[22:43:58] <mozmck> You know more about it than me as well. How do you do the tracing to see whats going on?
[22:44:11] -!- pjm has quit [Read error: Connection reset by peer]
[22:45:27] -!- sumpfralle has quit [Ping timeout: 245 seconds]
[22:47:07] <joe9> cradek, this is an irqsoff trace http://codepad.org/3pZe1HhL
[22:47:16] <joe9> mozmck: ^^
[22:47:43] <joe9> sshd-390 0d.s1. 17us+: delay_tsc <-__const_udelay is the rtai timer waiting for the interrupt moment to fire off
[22:49:41] <joe9> the busy waiting subroutine, I think.
[22:50:01] <joe9> this is stuff I picked from reading here and there. I could be totally offbase here, mind you.
[22:52:05] -!- iwoj has quit [Ping timeout: 260 seconds]
[22:55:01] <joe9> http://www.mjmwired.net/kernel/Documentation/trace/ftrace.txt
[22:57:04] -!- seb_kuzminsky [[email protected]] has joined #linuxcnc-devel
[23:08:40] <joe9> http://publib.boulder.ibm.com/infocenter/lnxinfo/v3r0m0/index.jsp?topic=%2Fliaai%2Fsaptuning%2Fsaptuningadjust.htm -- interesting article
[23:09:23] <joe9> kernel.sched_wakeup_granularity_ns = 5000 -- seems to be doing the trick for me, until now.
[23:10:33] <joe9> rtai "busy wait" helps a lot, imho..
[23:14:57] <joe9> or, maybe not: http://codepad.org/yujyLs3o
[23:15:11] -!- SolarNRG has quit []
[23:16:38] -!- linuxcnc-build has quit [Ping timeout: 246 seconds]
[23:17:07] -!- hm2-buildmaster has quit [Ping timeout: 245 seconds]
[23:17:50] <joe9> it went fine for 15 mins, before that sudden spike.
[23:21:34] <joe9> mhaberler: do you think I can fit linuxcnc on a ramdisk?
[23:22:03] <joe9> i think I should get away from using any hard disk and move to a ramdisk
[23:22:05] <mhaberler> I dont think you need to, just go for the parts which are accessed
[23:22:38] <joe9> mhaberler: what do you mean by "just go for the parts which are accessed"?
[23:23:05] <mhaberler> files - like the linuxcnc stuff
[23:23:34] <mhaberler> whatever's written to/read from - maybe the home directory
[23:24:41] <joe9> mhaberler: sorry, i do not understand what you meant.
[23:25:06] <joe9> are you saying that I do not need the ramdisk stuff?
[23:25:56] <mhaberler> no. you'll have a working set of files which are accessed - move the likely suspects to ramdisk
[23:26:15] <joe9> oh, ok. just move what I need to the ramdisk. oh, ok.
[23:26:31] <joe9> like the linuxcnc executable and those files.
[23:26:34] <mhaberler> home directory; /usr/bin; wherever emc lives; that kind of thing
[23:26:45] <mhaberler> replace by links
[23:27:30] <joe9> do a chroot or switch_root(?) to the new directory, and, before that unmount the raid and harddisk, correct/
[23:27:40] <joe9> s,/,?,
[23:27:45] <mhaberler> no, that doesnt change anything
[23:28:02] <mhaberler> that just changes filesystem namespace
[23:28:02] <joe9> i cannot use links. if i use links the hard disk and raid would still be on.
[23:28:23] <mhaberler> soft links should be cached
[23:28:58] <joe9> mhaberler: oh, I think I understand what you are saying. thanks, will try that.
[23:29:36] <joe9> btw, another fundamental question, is it a bad idea to not worry about those once-in-a-while max latency reads?
[23:29:45] <mhaberler> it should be possible to run linux on a r/o root with only /tmp ; /var; /home as r/w filesystems; the embedded systems all do that
[23:29:58] -!- linuxcnc-build [[email protected]] has joined #linuxcnc-devel
[23:30:20] -!- hm2-buildmaster [[email protected]] has joined #linuxcnc-devel
[23:30:37] <mhaberler> what do you want, a servo or a host-stepgen, or an onboard-stepgen type system
[23:30:46] <joe9> i have a gecko g540
[23:30:55] <joe9> that I am trying to interface with this system.
[23:31:00] <joe9> stepper motors.
[23:31:03] <mhaberler> thats a stepper driver I guess
[23:31:07] <joe9> yes.
[23:31:24] <mhaberler> do you stepgen with hal, or with, say, a mesa card?
[23:32:28] <joe9> i have no idea at this point. i do not have a mesa card, so, i guess hal.
[23:32:39] <mhaberler> so hal stepgen
[23:32:46] <mhaberler> with a base thread
[23:33:05] <mhaberler> what kind of motherboard is that?
[23:33:41] <joe9> let me check, one minute. cpuinfo: http://codepad.org/camxmwwq
[23:34:16] <joe9> lspci -v: http://codepad.org/zzif3d8p
[23:34:42] <mhaberler> ok, throw out the nvidia card
[23:35:05] <mhaberler> oops, ATI
[23:35:20] <joe9> yes, i am not using that.
[23:35:59] <joe9> ok, can get rid of it.
[23:36:41] <mhaberler> hold on. what do you mean by 'not using it' - I would assume the kernel recognizes it during startup and loads a driver?
[23:37:01] <joe9> http://codepad.org/fwpv4XWq is my latency and I have been running it for the last 15 mins with glxgears and dd if=/dev/zero bs=100M | gzip | gzip -d | gzip | gzip -d | gzip | gzip -d >| /dev/null
[23:37:15] <joe9> mhaberler: no, I removed the driver from the kernel.
[23:37:23] <mhaberler> I see
[23:37:29] <joe9> the kernel has no agp or drm driver.
[23:37:49] <joe9> http://codepad.org/0AbmTQXq cat /proc/interrupts
[23:38:43] <joe9> when I try to access something that I have not accessed before, such as a write/read from something in the /sys folder or disk, i get a big latency reading.
[23:38:58] <joe9> other than that I get around 10us consistently.
[23:39:48] <mhaberler> from /sys - well that would be an inode lookup, maybe a single disk access, rest should be kernel memory access only
[23:40:12] <joe9> without the glxgears and the dd command, the latency is < 100, with 5000ns once for every 10 readings
[23:40:28] <mhaberler> that is in fact a pretty good value
[23:41:19] <mhaberler> can you reproduce the spike with access to other paths as well?
[23:41:34] <joe9> this is without any load (no glxgears and dd): http://codepad.org/ZbIb6gvu
[23:41:36] <mhaberler> I mean paths which are definitely on-disk
[23:41:45] <joe9> http://codepad.org/a1OjdToZ
[23:42:04] <joe9> the problem is I cannot reproduce the spike reliably.
[23:42:13] <joe9> I can say that it happens with a disk access.
[23:42:26] <joe9> but, the same file after a reboot would not show a spike.
[23:43:23] <mhaberler> do you have space for a spare filesystem on your disk?
[23:43:28] <joe9> yes
[23:43:52] <joe9> it is a raid-1 disk, could be a reason too.
[23:44:07] <joe9> raid-1 disk with 2 devices.
[23:44:28] <mhaberler> ok, I'd suggest you try: make a fs on disk, and a ramdisk; mount both; and see whether you can reproduce the spikes on both
[23:44:28] <joe9> http://codepad.org/ULnMaP1g cat /proc/mdstat
[23:44:53] <joe9> ok, let me do that. thanks.
[23:45:13] <joe9> will have to read up on the ramdisk stuff. will get back to you in the next few hours with the results or tomorrow.
[23:45:16] <mhaberler> any chance to connect another disk, or flash stick or somesuch?
[23:45:18] <joe9> will you be around?
[23:45:30] <joe9> i have flash stick.
[23:45:56] <joe9> will it help, but, I do not think the machine can boot off of a flash stick.
[23:46:07] <joe9> i think the bios does not support booting off of usb.
[23:46:25] <mhaberler> play with the flash stick and see if you can tickle the spike
[23:46:26] <mhaberler> no, good enough to read/write/unmount/read/write, the like; just as a user fs
[23:46:26] <joe9> http://codepad.org/yn0YQ8FT meminfo
[23:46:32] <joe9> i have around 1 gb of ram.
[23:47:17] <joe9> the flash stick size == ram size, in my case.
[23:47:31] <mhaberler> ?
[23:47:53] <mhaberler> just mount the flash stick as a filesystem, and see if i/o to it causes a spike
[23:48:03] <joe9> oh, that is what you meant.
[23:48:27] <joe9> will have to recompile the kernel with the drivers. i removed everything from my kernel. will do that and keep you posted.
[23:48:42] <joe9> mhaberler: thanks for your help.
[23:48:43] <mhaberler> leave it in as modules
[23:48:48] <joe9> ok, will do.
[23:49:46] <mhaberler> my gut feeling is the mdadm error checking on disc access doesnt like being interrupted, but I might be totally off the bat
[23:50:47] <mhaberler> i suggest excluding that first by using a fs which doesnt use mdadm
[23:57:10] <mhaberler> triage: mdadm, sata, or neither of the two
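The triage mhaberler suggests (generate I/O on a plain disk filesystem, a ramdisk, and a flash stick in turn, and see which one coincides with a latency spike) could be driven by a small script like the one below while the latency test runs in another terminal. This is a sketch under assumptions: the function name `exercise_fs` and the default sizes are made up, and the directories to test are whatever mount points are passed on the command line.

```python
import os
import sys
import tempfile
import time


def exercise_fs(directory, size_bytes=1 << 20, rounds=5):
    """Write, fsync, and read back a scratch file in `directory`.

    Returns the elapsed wall-clock seconds for each round.
    """
    payload = os.urandom(size_bytes)
    timings = []
    for _ in range(rounds):
        start = time.monotonic()
        fd, path = tempfile.mkstemp(dir=directory)
        try:
            with os.fdopen(fd, "wb") as f:
                f.write(payload)
                f.flush()
                # Ask the kernel to push the data to the device,
                # not just into the page cache.
                os.fsync(f.fileno())
            with open(path, "rb") as f:
                data = f.read()
            assert data == payload
        finally:
            os.remove(path)
        timings.append(time.monotonic() - start)
    return timings


if __name__ == "__main__":
    # Pass one or more mount points, e.g. a disk fs, a ramdisk, a flash stick.
    for directory in sys.argv[1:]:
        print(directory, exercise_fs(directory))
```

The read-back is likely to be served from the page cache, so it is mainly the write and fsync path that exercises the device. The timings printed here are only a rough guide; the number to watch is still the one from the latency test.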