#linuxcnc-devel | Logs for 2013-02-15

[00:07:52] -!- rob__H [[email protected]] has joined #linuxcnc-devel
[00:07:52] -!- rob_h has quit [Read error: Connection reset by peer]
[00:13:40] -!- asdfasd has quit [Ping timeout: 256 seconds]
[00:18:38] -!- robh__ [[email protected]] has joined #linuxcnc-devel
[00:21:02] -!- alex_joni has quit [Ping timeout: 252 seconds]
[00:21:18] -!- alex_joni [alex_joni!~alex_joni@emc/board-of-directors/alexjoni] has joined #linuxcnc-devel
[00:21:18] -!- mode/#linuxcnc-devel [+v alex_joni] by ChanServ
[00:21:23] -!- archivist_herron has quit [Ping timeout: 255 seconds]
[00:21:23] -!- zomg has quit [Ping timeout: 255 seconds]
[00:21:56] -!- rob__H has quit [Ping timeout: 256 seconds]
[00:22:19] zomg is now known as Guest59563
[00:24:48] -!- V0idExp has quit [Quit: Leaving.]
[00:31:00] Guest59563 is now known as zomg
[00:34:44] -!- Nick001-Shop has quit [Remote host closed the connection]
[00:47:38] -!- zzolo has quit [Quit: zzolo]
[00:49:05] -!- robh__ has quit [Ping timeout: 248 seconds]
[00:58:34] -!- ybon has quit [Quit: WeeChat 0.3.8]
[01:07:38] -!- micges has quit [Quit: Leaving]
[01:11:58] -!- andypugh has quit [Quit: andypugh]
[01:15:51] -!- Keknom has quit [Read error: Connection reset by peer]
[01:17:22] -!- Sendoushi has quit [Read error: Connection reset by peer]
[01:34:59] -!- mhaberler has quit [Quit: mhaberler]
[01:45:22] -!- Youdaman has quit []
[02:00:53] -!- maximilian_h [[email protected]] has joined #linuxcnc-devel
[02:03:12] -!- Tom_itx has quit [Ping timeout: 252 seconds]
[02:05:23] -!- maximilian_h has quit [Ping timeout: 248 seconds]
[02:26:43] -!- mattions has quit [Ping timeout: 248 seconds]
[02:29:04] -!- adb has quit [Ping timeout: 256 seconds]
[02:29:10] -!- zzolo has quit [Quit: zzolo]
[02:36:28] -!- Wildhoney has quit [Ping timeout: 256 seconds]
[02:52:14] -!- nots has quit [Ping timeout: 240 seconds]
[02:52:35] -!- Tom_itx has quit [Ping timeout: 255 seconds]
[02:54:10] -!- ve7it has quit [Remote host closed the connection]
[02:59:00] -!- sumpfralle has quit [Quit: Leaving.]
[03:04:16] -!- Tom_itx has quit [Ping timeout: 252 seconds]
[03:09:23] -!- ravenlock has quit [Ping timeout: 248 seconds]
[03:16:19] -!- ravenlock_ has quit [Ping timeout: 248 seconds]
[03:23:25] -!- Sendoushi has quit [Remote host closed the connection]
[04:04:13] -!- Keknom has quit [Quit: Leaving.]
[04:08:20] -!- skunkworks has quit [Remote host closed the connection]
[04:14:46] -!- kb8wmc has quit [Quit: ChatZilla 0.9.89 [Firefox 18.0.2/20130201233328]]
[04:18:20] -!- AR_ has quit [Ping timeout: 252 seconds]
[04:33:12] -!- dgarr has quit [Ping timeout: 256 seconds]
[04:48:20] -!- fragalot has quit [Ping timeout: 246 seconds]
[04:48:57] -!- FinboySlick has quit [Quit: Leaving.]
[04:49:39] fragalot is now known as Guest27773
[05:30:55] -!- Gene34 has quit []
[05:31:12] -!- Valen has quit [Quit: Leaving.]
[05:34:21] -!- L33TG33KG34R has quit [Ping timeout: 276 seconds]
[05:47:43] -!- wildbilldonovan has quit [Quit: EOT]
[06:02:35] -!- Fox_Muldr has quit [Ping timeout: 260 seconds]
[06:07:47] -!- mhaberler [[email protected]] has joined #linuxcnc-devel
[06:18:57] -!- Loetmichel has quit []
[06:22:13] -!- cmorley1 [[email protected]] has joined #linuxcnc-devel
[06:24:35] -!- cmorley has quit [Ping timeout: 260 seconds]
[06:26:17] -!- kwallace1 [[email protected]] has parted #linuxcnc-devel
[06:32:16] -!- psha[work] [psha[work][email protected]] has joined #linuxcnc-devel
[06:35:58] -!- tjb1 has quit [Quit: tjb1]
[06:46:18] -!- tjb1 has quit [Client Quit]
[06:54:01] -!- Tom_itx has quit [Ping timeout: 276 seconds]
[06:56:44] -!- cmorley1 has quit [Ping timeout: 240 seconds]
[06:57:05] -!- cmorley [[email protected]] has joined #linuxcnc-devel
[07:04:22] -!- Youdaman has quit []
[07:22:44] -!- cmorley1 [[email protected]] has joined #linuxcnc-devel
[07:24:23] -!- cmorley has quit [Ping timeout: 255 seconds]
[07:55:32] -!- cmorley [[email protected]] has joined #linuxcnc-devel
[07:57:14] -!- cmorley1 has quit [Ping timeout: 240 seconds]
[08:05:11] -!- racycle has quit [Quit: racycle]
[08:18:43] -!- mhaberler has quit [Quit: mhaberler]
[08:23:22] -!- mhaberler [[email protected]] has joined #linuxcnc-devel
[08:40:59] -!- emel has quit [Excess Flood]
[08:46:01] -!- rob_h [[email protected]] has joined #linuxcnc-devel
[08:53:29] -!- pingufan has quit [Quit: Konversation terminated!]
[09:11:54] -!- odogono has quit [Read error: Connection reset by peer]
[10:28:49] -!- Tom_itx has quit [Ping timeout: 248 seconds]
[10:29:23] -!- mattions_ has quit [Client Quit]
[10:39:40] -!- uw has quit [Ping timeout: 272 seconds]
[10:41:55] uw is now known as Guest2135
[11:23:22] -!- maximilian_h [[email protected]] has joined #linuxcnc-devel
[11:28:43] -!- toudi_ [[email protected]] has joined #linuxcnc-devel
[11:28:47] toudi_ is now known as micges
[11:32:59] -!- maximilian_h has quit [Quit: Leaving.]
[11:40:14] -!- pikeaero has quit [Remote host closed the connection]
[11:45:24] -!- mhaberler has quit [Quit: mhaberler]
[11:50:22] -!- mattions_ has quit [Quit: Leaving]
[11:53:35] -!- maximilian_h [[email protected]] has joined #linuxcnc-devel
[11:53:40] -!- maximilian_h has quit [Client Quit]
[11:55:21] -!- phantoxeD has quit [Read error: Connection reset by peer]
[11:59:43] -!- mackerski has quit [Quit: mackerski]
[12:09:03] -!- micges has quit [Quit: Leaving]
[12:26:34] -!- Gene34 has quit [Ping timeout: 244 seconds]
[12:40:19] -!- toudi_ [[email protected]] has joined #linuxcnc-devel
[12:40:38] toudi_ is now known as micges
[12:48:48] -!- Gene34 has quit [Ping timeout: 264 seconds]
[13:11:26] -!- ravenlock has quit [Remote host closed the connection]
[13:23:33] -!- mhaberler [[email protected]] has joined #linuxcnc-devel
[13:24:22] -!- phantoxeD has quit [Read error: Connection reset by peer]
[13:24:51] -!- jpk has quit [Ping timeout: 248 seconds]
[13:31:27] -!- dgarr [[email protected]] has joined #linuxcnc-devel
[13:33:20] -!- mk0 has quit [Quit: Leaving]
[13:37:50] -!- b_b has quit [Changing host]
[13:47:27] -!- skunkworks [[email protected]] has joined #linuxcnc-devel
[13:48:28] -!- Sendoushi has quit [Read error: Connection reset by peer]
[13:54:43] -!- jpk has quit [Ping timeout: 248 seconds]
[14:00:43] -!- psha[work] has quit [Quit: Lost terminal]
[14:16:40] -!- mattions_ has quit [Quit: Leaving]
[14:33:04] -!- mackerski has quit [Quit: mackerski]
[14:33:25] <skunkworks> logger[psha],
[14:45:46] -!- Sendoushi has quit [Read error: Connection reset by peer]
[14:45:51] -!- Sendoush_ has quit [Read error: Connection reset by peer]
[14:58:29] <seb_kuzminsky> i've been running the new t0 tests in a loop, on a heavily loaded vm, all night, and so far: no failures :-/
[14:58:40] <seb_kuzminsky> i rebuilt the tip of master on the buildbot and it too passed
[14:58:49] <seb_kuzminsky> the bug went into hiding
[15:02:13] <skunkworks> yeck
[15:02:38] <skunkworks> this was the missed/out of order mdi commands?
[15:05:10] <seb_kuzminsky> yeah
[15:09:37] <seb_kuzminsky> but the good news is both 2.5 and master will probably pass their tests most of the time now - the bug rarely bites after that "set wait done" workaround
[15:11:49] <seb_kuzminsky> but the bad news is, i wonder how often this bug bites in real life (as opposed to in the tests)
[15:12:14] <skunkworks> can't say I have ever run into it..
[15:12:33] <seb_kuzminsky> me too
[15:12:43] <skunkworks> *that I know of ;)
[15:12:47] <seb_kuzminsky> heh
[15:19:49] -!- jthornton_ [[email protected]] has joined #linuxcnc-devel
[15:19:50] -!- JT-Shop-2 [[email protected]] has joined #linuxcnc-devel
[15:19:50] -!- V0idExp has quit [Read error: Connection reset by peer]
[15:19:50] -!- JT-Shop has quit [Read error: Connection reset by peer]
[15:19:50] -!- jthornton has quit [Read error: Connection reset by peer]
[15:22:56] -!- odogono has quit [Read error: Connection reset by peer]
[15:25:37] JT-Shop-2 is now known as JT-Shop
[15:39:11] -!- PCW_ [[email protected]] has joined #linuxcnc-devel
[15:40:30] -!- PCW has quit [Ping timeout: 256 seconds]
[15:40:34] PCW_ is now known as PCW
[15:47:34] -!- Youdaman has quit []
[15:48:58] -!- kwallace [[email protected]] has joined #linuxcnc-devel
[15:52:57] -!- skunkworks has quit [Read error: Connection reset by peer]
[15:56:03] -!- kwallace has quit [Read error: Connection reset by peer]
[15:56:26] -!- kwallace [[email protected]] has joined #linuxcnc-devel
[15:57:52] -!- L84Supper has quit [Ping timeout: 246 seconds]
[15:59:45] -!- mephux has quit [Excess Flood]
[16:07:16] -!- wboykinm has quit [Remote host closed the connection]
[16:07:32] -!- James628 has quit [Ping timeout: 245 seconds]
[16:07:35] -!- pingufan has quit [Quit: Konversation terminated!]
[16:08:34] -!- L84Supper [L84Supper!~Larch@unaffiliated/l84supper] has joined #linuxcnc-devel
[16:11:02] -!- smsfail has quit [Remote host closed the connection]
[16:13:42] -!- juma has quit [Client Quit]
[16:15:07] -!- tayy has quit [Remote host closed the connection]
[16:21:47] -!- odogono has quit [Read error: Connection reset by peer]
[16:38:24] <seb_kuzminsky> the MDI command gets to task, but task drops it on the floor because the mdi_input_queue is full (4 entries)
[16:38:46] <seb_kuzminsky> emctaskmain.cc around line 1400, "case EMC_TASK_PLAN_EXECUTE_TYPE:"
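A minimal Python model of the failure described above, assuming a fixed-capacity MDI queue. The class and method names (`MdiInputQueue`, `offer`, `pop`) are hypothetical and are not taken from emctaskmain.cc. The point of the sketch is that a full queue should reject a command visibly, so the sender can react, rather than drop it silently.

```python
from collections import deque


class MdiInputQueue:
    """Hypothetical fixed-capacity MDI queue (a model, not LinuxCNC's code)."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self._items = deque()
        self.rejected = []  # commands that did not fit, kept so nothing vanishes silently

    def offer(self, command):
        """Queue a command; return False (and record it) if the queue is full."""
        if len(self._items) >= self.capacity:
            self.rejected.append(command)
            return False
        self._items.append(command)
        return True

    def pop(self):
        """Hand the oldest queued command to the interpreter, or None if empty."""
        return self._items.popleft() if self._items else None

    def __len__(self):
        return len(self._items)


if __name__ == "__main__":
    q = MdiInputQueue(capacity=4)
    print([q.offer(f"m100 P{i}") for i in range(6)])  # [True, True, True, True, False, False]
    print(q.rejected)                                   # ['m100 P4', 'm100 P5']
```

Returning a boolean here stands in for whatever NAK or error status a real implementation would send back to the UI.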
[16:42:25] ybon is now known as ybonlog
[16:44:56] -!- dgarr has quit [Quit: Leaving.]
[16:46:32] -!- ybonlog has quit [Quit: WeeChat 0.3.8]
[16:47:35] -!- ybon has quit [Client Quit]
[16:51:48] <cradek> 4 seems like a surprisingly small number
[16:51:54] -!- skunkworks [[email protected]] has joined #linuxcnc-devel
[16:52:06] <cradek> the mdi queueing is newish and probably has never been beaten on much
[16:52:57] ybon- is now known as ybon
[16:52:59] <cradek> I'm impressed that you found the problem
[17:01:55] -!- dhoovie has quit [Ping timeout: 246 seconds]
[17:05:36] <skunkworks> logger[psha],
[17:06:56] <skunkworks> Probably didn't think about automated stuff getting sent through the mdi channel
[17:07:43] <cradek> yes, someone queuing mdi commands at the console of a running machine isn't going to type too far ahead...
[17:07:54] <skunkworks> right
[17:08:08] <skunkworks> (although I have done a few ahead.
[17:08:09] <skunkworks> _
[17:08:11] <skunkworks> )
[17:08:28] <cradek> yeah, one or two is something I commonly do too
[17:08:58] <cradek> I especially do that with drill cycles
[17:09:43] <skunkworks> right
[17:09:59] <skunkworks> when I am too lazy to create a program ;)
[17:10:07] -!- toudi_ [[email protected]] has joined #linuxcnc-devel
[17:10:54] <skunkworks> seen any videos of the russian asteroid? Pretty spectacular
[17:11:27] <cradek> archivist's find: http://cs6081.userapi.com/v6081385/508f/hhp8_8Hlg7g.jpg
[17:11:49] <skunkworks> heh
[17:12:09] <cradek> my find: http://rt.com/politics/zhirinovsky-meteorite-american-weapon-316/
[17:12:35] <cradek> surprise, the US doesn't have all the nutjobs
[17:13:19] <skunkworks> wow - I thought I was reading the onion for a second
[17:13:35] -!- micges has quit [Ping timeout: 252 seconds]
[17:13:41] <skunkworks> The joke here was that we are lucky we didn't get nuked
[17:14:46] <cradek> yes, glad it happened at a relatively peaceful time (well US vs. russia peace anyway)
[17:14:54] <skunkworks> right
[17:15:03] <cradek> and in the age of dash-cams
[17:15:15] <skunkworks> lots of glass replacement
[17:18:00] toudi_ is now known as micges
[17:18:37] <skunkworks> wonder where psha is
[17:20:35] -!- phantoxeD has quit [Read error: Connection reset by peer]
[17:22:07] -!- dway has quit [Quit: NOOOOOOooooooooo……]
[17:30:36] -!- ve7it [[email protected]] has joined #linuxcnc-devel
[17:35:01] -!- spiderdijon has quit [Ping timeout: 245 seconds]
[17:35:33] <mhaberler> the MDI queue size is configurable.
[17:35:44] -!- Guest2135 has quit [Quit: Leaving]
[17:36:28] <seb_kuzminsky> mhaberler: right, that's good
[17:36:44] <seb_kuzminsky> i guess increasing it to 100 or something would hide the bug for now, and maybe forever
[17:36:56] jthornton_ is now known as jthornton
[17:37:03] <mhaberler> it happens during your linuxcncrsh test I guess?
[17:37:15] <mhaberler> which bug?
[17:37:31] <seb_kuzminsky> i think you saw it during the linuxcncrsh test
[17:37:50] <seb_kuzminsky> it's been happening most often recently with the new t0 tests
[17:38:07] <seb_kuzminsky> any test that uses sim (as opposed to sai) and does a lot of mdi is susceptible
[17:38:15] <mhaberler> not sure that was the case, because with the wait_done it ran fine; that means all of the commands were executed
[17:38:54] <seb_kuzminsky> i think wait_done waits for the NML message to get to Task and the reply to get back from Task to the UI
[17:39:06] <seb_kuzminsky> that's not the same as the mdi queue in Task to drain
[17:39:30] <seb_kuzminsky> so wait_done might slow things down enough that the bug bites less often, but it's not an actual fix, if i understand things correctly
[17:39:32] <mhaberler> the proper way to fix this is to adapt linuxcncrsh such that it considers queue full
[17:39:52] <cradek> iirc, there are two kinds of wait: wait for task receive, and wait for execute completion
[17:40:06] <seb_kuzminsky> oooh, that sounds promising
[17:40:14] <mhaberler> let me look at the patch, it's been a while
[17:40:23] <mhaberler> any sha?
[17:40:48] <cradek> EMC_WAIT_RECEIVED and EMC_WAIT_DONE
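To make the difference between the two wait kinds concrete, here is a toy discrete-time model, not LinuxCNC code. Mode "received" loosely corresponds to EMC_WAIT_RECEIVED (the UI moves on as soon as task has seen the command) and mode "done" to EMC_WAIT_DONE (the UI waits for execution to finish). The function name, the tick model and the `exec_ticks` parameter are all hypothetical assumptions for illustration.

```python
from collections import deque


def simulate(commands, capacity, wait_mode, exec_ticks=3):
    """Toy model of a UI feeding MDI commands to task; returns (executed, dropped).

    wait_mode "received": the UI sends one command per tick, waiting only for
    task to see it. wait_mode "done": the UI sends the next command only once
    the previous one has finished executing (lock-step).
    For simplicity the running command counts against the queue capacity.
    """
    if wait_mode not in ("received", "done"):
        raise ValueError(f"unknown wait mode: {wait_mode!r}")
    pending = deque(commands)  # commands the UI has not sent yet
    queue = deque()            # task's MDI queue, head = running command
    executed, dropped = [], []
    busy = 0                   # ticks left on the running command
    while pending or queue:
        # UI side: in "done" mode only send when task is completely idle.
        if pending and (wait_mode == "received" or not queue):
            cmd = pending.popleft()
            if len(queue) < capacity:
                queue.append(cmd)
            else:
                dropped.append(cmd)  # queue full: command silently lost
        # Task side: spend one tick on the head command.
        if queue:
            if busy == 0:
                busy = exec_ticks
            busy -= 1
            if busy == 0:
                executed.append(queue.popleft())
    return executed, dropped
```

With ten commands, capacity 4 and mode "received", this model loses `c5`, `c7` and `c8`; with mode "done" even a capacity of 1 loses nothing, because the sender never has more than one command outstanding.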
[17:41:12] <seb_kuzminsky> cradek: yes, you're right
[17:41:13] <mhaberler> ah. I adapted Axis to test for queue full, but I hadn't thought of linuxcncrsh
[17:41:44] <cradek> do you know which one your set wait thing does?
[17:41:44] <seb_kuzminsky> cradek: both of those are configurable in linuxcncrsh
[17:41:59] <seb_kuzminsky> brb
[17:42:08] <cradek> sounds like mhaberler is on the path
[17:42:45] <mhaberler> the proper way to fix this is to adapt the Axis behavior to linuxcncrsh
[17:43:06] <mhaberler> see second diff: http://git.mah.priv.at/gitweb?p=emc2-dev.git;a=blobdiff;f=src/emc/usr_intf/axis/scripts/axis.py;h=d893b9346f4e15a45aeaf4b1628df75948b96f46;hp=f8338f3b3348173a2d40e0f4a5205d466be58158;hb=55d93a8fea53bb5963af47143e44d00d58a664bd;hpb=394be3e65914908ac20cd9bdf3153432d17b28e5
[17:44:34] <mhaberler> waiting for RCS_DONE is probably the easier path
[17:45:21] <mhaberler> or its equivalent at the task input side, maybe EMC_WAIT_DONE
[17:46:05] <cradek> yeah if that still does what I think, it's the simple fix for seb's problem. he doesn't need or want queueing of these commands.
[17:46:23] <mhaberler> seb: what was the kw you used, set wait_done?
[17:47:05] <mhaberler> curious why that doesn't work
[17:47:44] <mhaberler> it probably does, the problem is on the input side ramming down MDI commands and not considering the queue limit
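The sender-side fix being discussed (have the input side respect the queue limit) can be sketched as: before each send, check the queue depth that task reports and wait while it is at capacity. This is a hypothetical sketch; `send` and `queue_length` are stand-in callables, not real linuxcncrsh, NML or Axis functions, and it does not reproduce how the Axis patch actually detects a full queue.

```python
import time


def send_respecting_queue(commands, send, queue_length, capacity,
                          poll_interval=0.01, timeout=5.0):
    """Send each command only when the task-side queue has room.

    `send(cmd)` issues one MDI command; `queue_length()` returns the current
    queue depth as reported by status. Both are hypothetical hooks.
    Raises TimeoutError if the queue stays full for `timeout` seconds.
    """
    for cmd in commands:
        deadline = time.monotonic() + timeout
        # Back off while task's queue is full instead of ramming commands in.
        while queue_length() >= capacity:
            if time.monotonic() > deadline:
                raise TimeoutError(f"MDI queue still full, could not send {cmd!r}")
            time.sleep(poll_interval)
        send(cmd)
```

The timeout turns a stuck queue into an explicit error instead of an endless wait.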
[17:48:20] -!- psha [[email protected]] has joined #linuxcnc-devel
[17:48:36] <seb_kuzminsky> an unrelated bug in linuxcncrsh means the waiting fails when the mdi command fails (for example G10 L1 P0 is not allowed, so the mdi returns an error and the linuxcncrsh wait never returns)
[17:48:37] <mhaberler> that is some sad code
[17:48:59] Guest27773 is now known as fragalot
[17:49:04] -!- fragalot has quit [Changing host]
[17:49:23] <seb_kuzminsky> when using 'set wait done', the problem happens less often
[17:49:37] <seb_kuzminsky> when not using 'set wait done', the problem happens more often
[17:50:04] <mhaberler> let me understand the sequence: send MDI command, command fails, and then what?
[17:50:45] <seb_kuzminsky> look at tests/t0/shared-test.sh
[17:50:57] <mhaberler> you mean a wait condition doesnt terminate if theres an error?
[17:50:59] <mhaberler> ok
[17:51:07] <seb_kuzminsky> linuxcncrsh sends an mdi command and calls 'set wait done'
[17:51:14] <seb_kuzminsky> the mdi command fails because it's invalid
[17:51:26] <seb_kuzminsky> the error comes back to linuxcncrsh, but doesn't trigger a return from the wait
[17:51:29] <seb_kuzminsky> test hangs
[17:51:56] <seb_kuzminsky> so that's a bug that needs to be fixed, and seems pretty easy to fix
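A sketch of a wait loop that cannot hang on a failed command: it returns on completion, on error, or after a timeout. `poll()` and its three state strings ("executing", "done", "error") are hypothetical stand-ins for the status linuxcncrsh reads over NML; the names are not from emcrsh.cc.

```python
import time
from enum import Enum


class Outcome(Enum):
    DONE = "done"
    ERROR = "error"
    TIMEOUT = "timeout"


def wait_for_command(poll, timeout=10.0, interval=0.05):
    """Poll until the command finishes, fails, or the timeout expires.

    `poll()` is a hypothetical hook returning "executing", "done" or "error".
    """
    deadline = time.monotonic() + timeout
    while True:
        state = poll()
        if state == "done":
            return Outcome.DONE
        if state == "error":
            return Outcome.ERROR  # an error must end the wait too, or the caller hangs
        if time.monotonic() >= deadline:
            return Outcome.TIMEOUT
        time.sleep(interval)
```

An invalid command such as `G10 L1 P0` would then come back as `Outcome.ERROR` instead of leaving the test hung.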
[17:52:17] <seb_kuzminsky> another bug:
[17:52:52] <seb_kuzminsky> with 'set wait done' after most (but not all) mdi commands that are known to not fail, sometimes mdi commands still get lost
[17:53:35] <seb_kuzminsky> for example: http://buildbot.linuxcnc.org/buildbot/builders/hardy-amd64-sim/builds/777
[17:54:03] <seb_kuzminsky> that's with 'set wait done' after most (but not all) mdi commands
[17:54:32] <seb_kuzminsky> the mdi command that got dropped, and the one before it, both had 'set wait done' right after issuing the mdi commands
[17:54:41] <mhaberler> the t0 test in that bb link?
[17:54:47] <seb_kuzminsky> yeah
[17:54:56] <seb_kuzminsky> search for 'checkresult' in the runtest stdio
[17:56:42] <mhaberler> reading the emcrsh.cc spec for set_wait done, it looks like that's a bug: with the lockstepping you have, it should never overrun the queue even if the sending side didn't check for it
[17:57:31] <seb_kuzminsky> i didnt understand what you just said
[17:57:55] <mhaberler> make me understand: is the problem exclusively with lines 37-51 here: http://git.linuxcnc.org/gitweb?p=linuxcnc.git;a=blob;f=tests/t0/shared-test.sh;h=f4e8c4c0ca93b7b68ffe2f942c5ca569ece318b7;hb=04a76281d98670c0e8265695bc15dc035143be5c ?
[17:58:09] <mhaberler> ok, this is why:
[17:58:20] <mhaberler> you do set mdi yadayada
[17:58:25] <mhaberler> then set wait done
[17:58:50] <mhaberler> if it actually waited for the previous command to complete it couldnt overrun a queue of size 1 even
[17:58:58] <mhaberler> thats what I mean by lock-step
[17:59:06] <seb_kuzminsky> ah, i see
[17:59:35] <mhaberler> gut feeling: it waits for command received, but not done. lets see
[17:59:47] <seb_kuzminsky> the introspect function runs m100 with some numbered parameter, m100 appends the value of the numbered parameters to a file, and at the end of the test that file is compared to the expected values
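When the final comparison fails, it helps to know exactly which expected lines went missing, since each one points at a dropped MDI command. A minimal sketch, assuming both files hold one value per line; `missing_lines` is a hypothetical helper and not part of tests/t0.

```python
from collections import Counter


def missing_lines(expected, actual):
    """Return expected lines absent from actual, respecting duplicates and order."""
    remaining = Counter(actual)
    missing = []
    for line in expected:
        if remaining[line] > 0:
            remaining[line] -= 1  # matched one occurrence
        else:
            missing.append(line)  # no m100 output for this expected line
    return missing
```

Usage would be something like `missing_lines(open("expected").read().splitlines(), open("result").read().splitlines())`, where both file names are placeholders.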
[18:00:19] <seb_kuzminsky> the problem is that sometimes, some of the mdi's in introspect dont get run, so the corresponding lines are missing from the output file
[18:00:32] <seb_kuzminsky> and that causes a test failure, like the bbot one i just linked above
[18:01:31] <mhaberler> 'dont get run' means the shell script dont execute, I assume?
[18:01:43] <mhaberler> the M100 script I mean
[18:01:55] <seb_kuzminsky> i think so
[18:02:09] <seb_kuzminsky> the actual effect that's observable is that the line doesn't appear in the output file
[18:02:23] <mhaberler> background: I want to know whether this is related to shell execution, or the command not hitting the interp at all
[18:02:25] <seb_kuzminsky> the most likely cause of that effect, i think, is that the mdi m100 script doesn't get run
[18:02:58] <seb_kuzminsky> i think linuxcnc sends the mdi to task, and i think task doesn't send it to interp
[18:03:03] <mhaberler> jeesh, I've never used emcrsh
[18:03:04] <seb_kuzminsky> but i'm not 100% sure yet
[18:03:11] <seb_kuzminsky> DONT USE IT!!! it sucks
[18:03:25] <mhaberler> talk to me. just reading that code..
[18:03:28] -!- cmorley1 [[email protected]] has joined #linuxcnc-devel
[18:04:01] <seb_kuzminsky> i'm using it because i want to script mdi commands along with some other stuff
[18:04:05] <mhaberler> ok, so lets grind one cat at a time, first the wait; I'm sure that can be trapped with gdb in task
[18:04:12] <mhaberler> sure
[18:04:22] <mhaberler> well it better work even if it is appalling
[18:04:50] <seb_kuzminsky> it was the first thing i found, but i think there's a python module that does a similar thing, i might switch to that in the future
[18:04:54] <seb_kuzminsky> but anyway, back to this cat
[18:05:24] <mhaberler> it seems somebody invented yet another ascii string parsing mess of a protocol here
[18:05:53] -!- andypugh [andypugh!~andy2@cpc16-basl9-2-0-cust685.20-1.cable.virginmedia.com] has joined #linuxcnc-devel
[18:06:06] -!- cmorley has quit [Ping timeout: 264 seconds]
[18:06:20] <mhaberler> oh my, that talks to this linuxcncserver thingie
[18:06:32] <mhaberler> and that talks to task
[18:06:37] <mhaberler> what a rube goldberg
[18:06:57] <seb_kuzminsky> wait what, someone talks to linuxcncsvr? i thought it just created the NML buffers
[18:07:14] -!- PCW_ [[email protected]] has joined #linuxcnc-devel
[18:07:17] <seb_kuzminsky> the only reason anyone should talk to linuxcncsvr is to find out where the NML buffers are, right? it's a startup thing
[18:08:03] <mhaberler> hm, need to read that a bit
[18:08:35] -!- PCW has quit [Ping timeout: 248 seconds]
[18:08:39] PCW_ is now known as PCW
[18:10:14] <andypugh> In the UK we use "Heath Robinson" instead of "Rube Goldberg". http://3.bp.blogspot.com/-5mmakxIPIJs/Tzu5UkjP3cI/AAAAAAAAFyo/IDB6mM0ur9E/s1600/HeathRobinson.jpg
[18:10:43] <mhaberler> need to pull and build master and see what that thing is actually doing..
[18:10:58] <mhaberler> it clearly warrants more than one name ;)
[18:13:07] <skunkworks> http://www.youtube.com/watch?v=qybUFnY7Y8w
[18:15:43] <mhaberler> spot on
[18:16:18] <mhaberler> design consensus: http://mah.priv.at/zenphoto/index.php?album=diesdas/album35&image=gerhard_gepp_narrenturmgruppe.jpg
[18:17:33] -!- micges has quit [Quit: Leaving]
[18:17:43] <mhaberler> ok, we have: emcsh talks to emcrsh over a tcp socket with a homegrown ascii text proto. emcrsh actually talks NML to task.
[18:18:05] <seb_kuzminsky> i agree emcrsh/linuxcncrsh talks NML to task
[18:18:36] <seb_kuzminsky> i dont know anything about emcsh, i talk to linuxcncrsh directly via TCP (echo | nc)
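Roughly what `echo ... | nc host port` does, written as a Python sketch for scripting linuxcncrsh directly over TCP. The default port 5007 is my understanding of linuxcncrsh's default and should be checked against its `--port` option; the function name is hypothetical, and the command strings in the usage note are illustrative only.

```python
import socket


def send_script(lines, host="localhost", port=5007, timeout=5.0):
    """Send newline-terminated commands over TCP and return the server's replies.

    Port 5007 is assumed to be the linuxcncrsh default; adjust as needed.
    """
    payload = "".join(line + "\n" for line in lines).encode("ascii")
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
        sock.shutdown(socket.SHUT_WR)  # like nc at end of stdin: no more input
        try:
            while True:
                data = sock.recv(4096)
                if not data:  # server closed the connection
                    break
                chunks.append(data)
        except socket.timeout:
            pass  # server kept the connection open; return what arrived
    return b"".join(chunks).decode("ascii", errors="replace")
```

For example, `send_script(["hello EMC sketch 1.0", "set enable EMCTOO", "set mode mdi", "set mdi g4 p1", "set wait done"])`. Sending everything in one burst is exactly the pattern under suspicion in this discussion, so results may differ from typing the commands interactively.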
[18:19:15] -!- micges [[email protected]] has joined #linuxcnc-devel
[18:19:24] <mhaberler> in shcom.cc, which is linked to emcrsh.cc you have sendMdi() which evaluates the waittype
[18:19:52] <mhaberler> lets think of some really long running MDI command
[18:20:10] <mhaberler> so I can gdb into sendMDI and see when it returns
[18:20:19] <skunkworks> g4p10000?
[18:20:23] <seb_kuzminsky> m100 with sleep 100?
[18:20:28] <seb_kuzminsky> oh, skunkworks suggestion is better
[18:20:31] <mhaberler> ah, whats that wait, Mx
[18:20:51] <seb_kuzminsky> g4
[18:20:53] <mhaberler> dwell or somesuch
[18:21:00] <mhaberler> is that dwell?
[18:21:31] <mhaberler> ah, I see
[18:21:37] <mhaberler> ok, thats the one
[18:21:43] <mhaberler> let's see
[18:23:19] -!- motioncontrol has quit [Quit: Sto andando via]
[18:23:40] <mhaberler> you need to enable the emcrsh server in the ini I guess?
[18:23:42] -!- Loetmichel has quit [Ping timeout: 256 seconds]
[18:24:19] <mhaberler> ah, you use it as DISPLAY
[18:35:46] Cylly is now known as Loetmichel
[18:39:14] <mhaberler> it has nothing to do with the MDI queueing; there's never more than 1 command queued even if you ram several down with set set_wait_none, and the default queue size is 10
[18:40:05] <seb_kuzminsky> i've seen task claim that it has an mdi queue length of 4
[18:41:33] <mhaberler> still that wouldnt be a queue drop
[18:42:05] <mhaberler> do you mean this msg: mdi_execute_hook: MDI command 'g4p6' done (remaining: 0)
[18:42:11] <mhaberler> (remaining: 4) ?
[18:44:11] <mhaberler> if you copy and paste a few mdi commands into telnet, with set_wait none only the last one actually reaches task!
[18:44:44] -!- cmorley [[email protected]] has joined #linuxcnc-devel
[18:45:47] <mhaberler> with set_wait done, they are queued only after finishing, so emcrsh pushes one at a time
[18:47:31] -!- cmorley1 has quit [Ping timeout: 260 seconds]
[18:47:39] <seb_kuzminsky> mhaberler: MDI: queueing 'm100 P2 Q#5422' (queue len=4)
[18:48:13] <seb_kuzminsky> this is without any 'set wait'
[18:49:36] <mhaberler> then lets try this with M100's
[18:52:37] <mhaberler> I can't reproduce this; is this the sequence from tests/linuxcncrsh?
[18:53:54] <seb_kuzminsky> i've seen it with that test and also with the t0 tests
[18:54:27] <seb_kuzminsky> anything that uses linuxcncrsh and a full linuxcnc config seems susceptible (sai tests, like most of our tests, seem immune)
[18:54:49] <seb_kuzminsky> i have to load the system very heavily and run the test in a loop until failure
[18:55:27] -!- sumpfralle has quit [Ping timeout: 260 seconds]
[18:55:55] <seb_kuzminsky> with 'set wait done' in introspect(), the bug is much less likely to bite
[18:56:16] <seb_kuzminsky> without 'set wait done' in introspect(), the bug is more likely to bite (but it still takes many repetitions of the test to see it on my test VM)
[18:57:01] <mhaberler> I paste this into emcrsh telnet : http://hastebin.com/ruleyalipi.pas
[18:57:23] <mhaberler> this is the commands from test.sh before the blob is sent
[18:57:53] <mhaberler> I get Can't issue MDI command when not homed ??? (not no sleep)
[18:57:53] <mhaberler>
[18:58:47] <seb_kuzminsky> you need to wait after sending 'set home', to let the simulated machine home
[18:59:25] <seb_kuzminsky> maybe 'set wait done' there would do it? i dont know, in my test script i have a sleep after sending the home commands
[18:59:42] <mhaberler> lets try
[19:00:26] <mhaberler> it already barfs in line 4 (mode manual)
[19:00:33] <mhaberler> despite set wait done
[19:02:02] <mhaberler> can you try to reproduce pasting manually into telnet?
[19:02:29] <mhaberler> 1-3 are ok, set mode manual fails
[19:03:22] <mhaberler> (SET MODE NAK)
[19:03:40] <mhaberler> this modeswitch doesnt seem to reach task?
[19:04:15] <mhaberler> wtf...
[19:05:20] -!- cmorley1 [[email protected]] has joined #linuxcnc-devel
[19:05:27] <seb_kuzminsky> mhaberler: i can't test right now
[19:05:30] <seb_kuzminsky> later this afternoon
[19:05:31] <mhaberler> I see
[19:05:46] <mhaberler> what are you trying to verify with the test?
[19:05:58] <seb_kuzminsky> but the linuxcncrsh and t0 tests almost always work for me
[19:06:19] <seb_kuzminsky> the linuxcncrsh test checks for an input handling bug in linuxcncrsh (since fixed)
[19:07:20] <mhaberler> well my idea of remote scripting would be rather to use python 'import linuxcnc' and run commands from there; this is going to be a morass
[19:08:09] <mhaberler> the python mdi stuff and status lockstep works - all the uis use it
[19:08:32] -!- cmorley has quit [Ping timeout: 252 seconds]
[19:08:47] <seb_kuzminsky> sounds much better than linuxcncrsh
[19:09:20] <mhaberler> I mean before you put effort into this, deleting emcsh/emcrsh etc and replacing it by a minimal server frame around python/import linuxcnc is bound to be less work and promises to actually work
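A sketch of the lock-step MDI pattern through the `linuxcnc` Python module. As far as I know, `linuxcnc.command()` (with `mode()`, `mdi()`, `wait_complete()`), `linuxcnc.stat()` (with `poll()` and `interp_state`), `linuxcnc.error_channel()` and the constants `MODE_MDI`, `INTERP_IDLE`, `NML_ERROR` and `OPERATOR_ERROR` exist in that module. Treating `INTERP_IDLE` as "finished" and the error-kind check are my assumptions, and there may be a short window right after `wait_complete()` in which `interp_state` still reads idle. The module is passed in as a parameter so the function can be exercised without a running LinuxCNC; `run_mdi_lockstep` itself is a hypothetical name.

```python
import time


def run_mdi_lockstep(lc, lines, timeout=30.0, interval=0.05):
    """Run MDI lines one at a time; `lc` is the imported `linuxcnc` module.

    Each line is sent only after the interpreter is idle again, so at most one
    MDI command is outstanding. Errors and timeouts raise instead of hanging.
    """
    cmd, stat, err = lc.command(), lc.stat(), lc.error_channel()
    cmd.mode(lc.MODE_MDI)
    cmd.wait_complete()  # wait for task to acknowledge the mode switch
    for line in lines:
        cmd.mdi(line)
        cmd.wait_complete()  # task has accepted the command
        deadline = time.monotonic() + timeout
        while True:
            stat.poll()
            error = err.poll()  # None, or (kind, text)
            if error and error[0] in (lc.NML_ERROR, lc.OPERATOR_ERROR):
                raise RuntimeError(f"{line!r} failed: {error[1]}")
            if stat.interp_state == lc.INTERP_IDLE:
                break
            if time.monotonic() > deadline:
                raise TimeoutError(f"{line!r} did not finish within {timeout} s")
            time.sleep(interval)
```

Usage would be `import linuxcnc` followed by `run_mdi_lockstep(linuxcnc, ["g4 p1", "m100 p2 q#5422"])`.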
[19:09:55] <mhaberler> any idea of emcrsh heavy uses? I fear the tcl binding uses it
[19:10:02] <mhaberler> not sure though
[19:10:44] <mhaberler> halrmt got into bitrot and I think it was removed - same style, and that was a good thing
[19:11:42] <mhaberler> so tests/linuxcncrsh is specifically there to validate a bugfix in emcrsh?
[19:14:02] -!- Tom_itx has quit [Ping timeout: 255 seconds]
[19:14:02] <mhaberler> well emcsh does not use emcrsh, it uses shcom.cc directly so no need for the ascii proto there
[19:14:11] -!- mattions has quit [Ping timeout: 248 seconds]
[19:14:42] -!- mattions_ has quit [Ping timeout: 256 seconds]
[19:20:17] <mhaberler> micges: congratulations on the 7i80 rtnet project! was that with John's 3.5.7 kernel?
[19:20:49] -!- Tom_itx has quit [Client Quit]
[19:23:15] -!- motioncontrol has quit [Quit: Sto andando via]
[19:28:33] <micges> nope
[19:28:36] <micges> with 3.2.21
[19:28:40] -!- cmorley [[email protected]] has joined #linuxcnc-devel
[19:29:13] -!- cmorley1 has quit [Ping timeout: 240 seconds]
[19:29:44] <micges> mhaberler: I'll try with 3.5.7 later today
[19:29:59] <mhaberler> super; shouldnt be much of a difference
[19:33:29] <micges> I hope so
[19:34:23] <mhaberler> the rtnet build is fine so far; need to check for a card
[19:34:24] <micges> pretty stable 34us latency on rtnet up on 3.2.21 xeno
[19:34:51] <mhaberler> what board?
[19:35:11] <micges> asus P41-c31
[19:35:28] <micges> 20 us without lcnc loaded
[19:36:16] <mhaberler> hm, when it's seen some use it might be an option to integrate it into the rtos branch; for you that means a debian package for rtnet ;)
[19:36:50] <mhaberler> and configure ..
[19:37:24] <micges> relax, there is somewhere packaging patch for rtnet (I've seen it)
[19:37:32] <mhaberler> good
[19:40:24] <andypugh> Argh! Why do folk always say things like "LinuxCNC wouldn't start, with a bunch of error messages" and assume that it's unimportant to list the error messages? The latest says the error message is "can not find..."
[19:41:01] <cradek> this behavior is everywhere. it's not just our users.
[19:41:13] -!- motioncontrol has quit [Quit: Sto andando via]
[19:41:33] <cradek> many people do not have any skills that make them good at troubleshooting or other methodical things.
[19:42:12] <andypugh> It has occurred to me that error messages that say how to fix what is wrong would be more useful than ones that just tell the programmer where they messed up. And I am at least as guilty of that as anyone.
[19:42:40] <cradek> you mean messages that GUESS how to fix what is wrong?
[19:42:46] <andypugh> Yes.
[19:42:53] <cradek> firefox sometimes tells me to reboot my computer
[19:43:04] <andypugh> Sometimes you can make a very good guess. Sometimes not.
[19:43:05] <cradek> from my perspective, those error messages suck
[19:43:25] <cradek> say what is wrong AND guess how to fix it, maybe
[19:43:40] <andypugh> Oh, yes, absolutely.
[19:44:22] <andypugh> A good example is saying "Your password is wrong, by the way do you know you have caps-lock on?"
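The "say what is wrong AND guess how to fix it" pattern agreed on here can be sketched in a few lines: the statement of the problem always comes first and stands alone, and the guess is appended only when there is a cheap, usually-right one. Function names are hypothetical.

```python
def format_error(problem, hint=None):
    """State what is wrong first; add a best-guess fix only as a labelled hint."""
    if hint:
        return f"{problem} Hint: {hint}"
    return problem


def password_error(caps_lock_on):
    # The caps-lock check is the "very good guess" case: cheap and usually right.
    hint = "Caps Lock is on." if caps_lock_on else None
    return format_error("Password incorrect.", hint)
```

Keeping the hint separate means a wrong guess never hides the actual error.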
[19:44:49] <cradek> looking for the star trek episode with the race who say "he is smart! he will make it go!"...
[19:45:24] <cradek> all I'm getting is stuff about prayer, which is also funny
[19:47:10] <mhaberler> micges: re comment: //rest of init must be done in rt context
[19:47:33] <mhaberler> I have never tried, but I see a few ways to leave it in init
[19:47:45] <micges> I'm listening
[19:48:29] <mhaberler> what you could do is spawn an RT thread during rtapi_app_main and wait until its done
[19:49:33] <mhaberler> not sure if one can switch domains for a while
[19:50:05] <mhaberler> in case you haven't tried, I'd really suggest you ask the xenomai list, there's very helpful folk there
[19:50:47] <mhaberler> oh, you do that anyway with "probe"
[19:50:56] <mhaberler> oops
[19:51:03] <mhaberler> strike out
[19:51:40] <mhaberler> I rest my case ;)
[19:53:06] <mhaberler> dont the debug printfs kill RT and cause a switch to secondary domain?
[19:58:28] <micges> I've no idea
[20:00:06] -!- tjb1 has quit [Quit: tjb1]
[20:00:13] -!- zzolo has quit [Quit: zzolo]
[20:01:02] <micges> yeah I don't know why but spawning tasks and waiting didn't work
[20:01:08] <micges> under RTAI
[20:01:21] -!- rob__H [[email protected]] has joined #linuxcnc-devel
[20:02:32] <mhaberler> well typically plain system calls like anything with a fd (files, sockets, etc) will cancel rt scheduling for that thread and switch it to linux cattle class scheduling
[20:02:57] <mhaberler> but you get a SIGXCPU and a backtrace if that happens
[20:04:05] -!- rob_h has quit [Ping timeout: 255 seconds]
[20:07:05] -!- Sendoushi has quit [Remote host closed the connection]
[20:08:55] <mhaberler> hm, interesting - somebody revived miniemc2 on a raspberry: http://www.linuxcnc.org/index.php/english/forum/18-computer/20514-emc2-running-on-raspberry-pi/30121
[20:09:46] -!- Motioncontrol1 has quit [Client Quit]
[20:10:03] <andypugh> I am not sure how much the RPi is doing there.
[20:10:09] -!- motioncontrol has quit [Ping timeout: 248 seconds]
[20:10:18] -!- Motioncontrol1 has quit [Client Quit]
[20:11:16] <andypugh> And I suspect keystick is a lower overhead solution.
[20:15:20] -!- Wildhoney has quit [Ping timeout: 256 seconds]
[20:16:01] -!- mattions has quit [Ping timeout: 248 seconds]
[20:16:28] -!- mattions_ has quit [Ping timeout: 256 seconds]
[20:16:52] -!- toastydeath has quit [Read error: Connection reset by peer]
[20:23:42] -!- mhaberler has quit [Read error: Operation timed out]
[20:36:07] -!- AR_ has quit [Read error: Connection reset by peer]
[20:42:50] -!- mhaberler [[email protected]] has joined #linuxcnc-devel
[20:44:33] -!- psha has quit [Quit: leaving]
[20:51:41] -!- holgi has quit [Ping timeout: 245 seconds]
[21:02:34] -!- syyl_ws has quit [Quit: Verlassend]
[21:21:31] -!- tmcw has quit [Remote host closed the connection]
[21:23:00] -!- mrsun has quit [Ping timeout: 264 seconds]
[21:25:15] -!- motioncontrol has quit [Quit: Sto andando via]
[21:35:21] -!- FinboySlick has quit [Quit: Leaving.]
[21:35:31] -!- jpk has quit [Ping timeout: 248 seconds]
[21:36:01] -!- Tom_itx has quit [Ping timeout: 248 seconds]
[21:36:42] <skunkworks> I have 2 intel systems now that exhibit the same issue: >100us latency
[21:36:51] -!- DJ9DJ has quit [Quit: bye]
[21:37:44] <skunkworks> zultron, are you sure the smi fix is working? It seems to have no effect.
[21:37:55] <skunkworks> but - we will have to re-visit this next week.
[21:38:06] <skunkworks> although it detects smi
[21:38:31] <zultron> I'm reasonably sure that smictrl reads/writes the SMI register.
[21:38:52] <zultron> Do you suspect SMI is the problem?
[21:39:14] <skunkworks> well.. I don't know.
[21:39:40] <skunkworks> we might just have to do an in-depth work-out of one of these systems
[21:40:44] <zultron> Ok. Next week, let's collect information about it. I might want to ask the Xeno guys before delving in too deep.
[21:40:48] <skunkworks> This is a dell optiplex - I even installed a pci video card.
[21:40:54] <skunkworks> right
[21:41:06] <zultron> They know more about the SMI business and smictrl than I do.
[21:41:46] <skunkworks> it seems to happen pretty quickly - but I don't know if a do-nothing loop would help (have not gotten that far yet.) idle=poll doesn't work
[21:41:51] <zultron> What does the SMI register contain before and after smictrl?
[21:42:29] -!- cmorley1 [[email protected]] has joined #linuxcnc-devel
[21:45:13] -!- cmorley has quit [Ping timeout: 240 seconds]
[21:47:01] <skunkworks> zultron, http://pastebin.ca/2314259
[21:47:41] <zultron> So, in this case, global SMIs appear to be locked down in the BIOS, same as my Dell.
[21:47:57] <zultron> See the last digit is odd? The final bit is the global SMI flag.
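zultron's "last digit is odd" test is just a check of bit 0 of the register value: a hex number ends in an odd digit exactly when its lowest bit is set. A minimal sketch of that check follows. It assumes the value is available as a hex string like the ones smictrl prints (the actual pastebin contents are not reproduced here, so the sample values are made up). The names `GBL_SMI_EN_BIT`, `global_smi_enabled`, and `parse_smi_reg` are hypothetical. On Intel ICH-family chipsets, bit 0 of SMI_EN is documented as GBL_SMI_EN, but as zultron says, confirm this for the specific chipset before relying on it.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Bit 0 of the SMI control/enable register: the global SMI enable flag
   (per zultron's reading; verify against the chipset datasheet). */
#define GBL_SMI_EN_BIT 0x1u

/* True when the global SMI enable bit is set, i.e. the value is odd. */
bool global_smi_enabled(uint32_t smi_en)
{
    return (smi_en & GBL_SMI_EN_BIT) != 0;
}

/* Parse a register value printed in hex, with or without a leading "0x". */
uint32_t parse_smi_reg(const char *text)
{
    return (uint32_t)strtoul(text, NULL, 16);
}
```

For example, a hypothetical reading of `0x0000000b` ends in the odd digit `b`, so `global_smi_enabled(parse_smi_reg("0x0000000b"))` is true, while `0x0000000a` gives false.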
[21:48:34] <skunkworks> oh
[21:48:48] <skunkworks> I looked through the bios and didn't see anything
[21:48:54] <zultron> I'm not 100% sure I'm right about it, so we can ask the Xeno guys.
[21:48:55] -!- Sendoushi has quit [Remote host closed the connection]
[21:49:03] <zultron> Yeah, it's nothing you can control, AFAIK.
[21:49:22] <zultron> If you have an RTAI install to try out, it could be informative.
[21:49:29] <skunkworks> The atom board though seems to set it correctly... Right?
[21:49:38] <skunkworks> (that still have problems)
[21:49:44] <skunkworks> has
[21:49:47] <zultron> Check whether you have the same latency issues, and check smictrl to see the register values.
[21:50:05] <zultron> I think it did, yes, but I'm a bit hazy.
[21:50:06] -!- Tecan has quit [Remote host closed the connection]
[21:50:18] <skunkworks> Pretty sure.
[21:50:23] <skunkworks> oh well - next week ;)
[21:50:32] <zultron> Alright. Have a great w/e!
[21:50:34] -!- Tecan has quit [Changing host]
[21:51:51] -!- skunkworks has quit [Read error: Connection reset by peer]
[22:00:32] -!- V0idExp has quit [Ping timeout: 255 seconds]
[22:03:49] -!- dr00bie has quit [Ping timeout: 256 seconds]
[22:05:24] -!- cmorley [[email protected]] has joined #linuxcnc-devel
[22:06:57] -!- ve7it has quit [Read error: Operation timed out]
[22:07:25] -!- ve7it [[email protected]] has joined #linuxcnc-devel
[22:07:50] -!- cmorley1 has quit [Ping timeout: 252 seconds]
[22:28:01] -!- cmorley1 [[email protected]] has joined #linuxcnc-devel
[22:28:57] -!- odogono has quit [Quit: odogono]
[22:30:14] -!- cmorley has quit [Ping timeout: 255 seconds]
[22:32:46] -!- wboykinm has quit [Remote host closed the connection]
[22:33:20] -!- mephux has quit [Excess Flood]
[22:44:58] -!- Tecan has quit [Remote host closed the connection]
[22:47:51] -!- cmorley [[email protected]] has joined #linuxcnc-devel
[22:50:51] -!- cmorley1 has quit [Ping timeout: 256 seconds]
[22:53:42] -!- Tecan has quit [Changing host]
[22:59:38] -!- dr00bie has quit [Read error: Operation timed out]
[23:07:29] -!- tmcw has quit [Remote host closed the connection]
[23:08:25] -!- cmorley1 [[email protected]] has joined #linuxcnc-devel
[23:09:46] -!- mhaberler has quit [Quit: mhaberler]
[23:11:34] -!- cmorley has quit [Ping timeout: 256 seconds]
[23:12:05] -!- skunkworks [skunkworks!~chatzilla@str-broadband-ccmts-ws-26.dsl.airstreamcomm.net] has joined #linuxcnc-devel
[23:15:22] -!- ravenlock has quit [Remote host closed the connection]
[23:16:18] -!- Tom_itx has quit [Ping timeout: 264 seconds]
[23:17:42] -!- zzolo has quit [Quit: zzolo]
[23:23:37] -!- jfrmilner has quit [Quit: bye]
[23:25:45] -!- servos4ever has quit [Quit: ChatZilla 0.9.85 [SeaMonkey 2.0.11/20101206162726]]
[23:36:52] -!- racycle has quit [Quit: racycle]
[23:46:43] -!- Nick001-Shop has quit [Ping timeout: 276 seconds]
[23:46:43] Nick001-Shop_ is now known as Nick001-Shop