#linuxcnc-devel | Logs for 2012-12-14

[00:00:26] <mhaberler> that doesnt follow from the tes
[00:00:30] <mhaberler> test
[00:00:44] <kb8wmc> cradek: hello, how have you been sir?
[00:01:07] <mhaberler> the thread isnt 'interrupted', weird underlying OS scheduling makes the test fail
[00:01:30] <mhaberler> its all about relative timing, not preemption
[00:03:18] <mhaberler> it is clean on real machines, just fails on virtualbox in a sim config, but if it fails I rather would like to know which assumption is violated
[00:06:15] <mhaberler> I mean I can fudge the test, but thats not a real solution
[00:06:44] <mhaberler> anyway, we'll figure eventually - cu, folks
[00:06:48] -!- mhaberler has quit [Quit: mhaberler]
[00:06:49] -!- logger[mah] has quit [Remote host closed the connection]
[00:06:55] -!- logger[mah] [logger[mah][email protected]] has joined #linuxcnc-devel
[00:11:01] hdokes is now known as hdokes|werkin
[00:36:51] -!- ybon has quit [Ping timeout: 256 seconds]
[00:40:36] -!- adb has quit [Ping timeout: 252 seconds]
[00:41:26] -!- kwallace [[email protected]] has parted #linuxcnc-devel
[00:47:46] -!- andypugh has quit [Quit: andypugh]
[00:54:21] -!- kmiyashiro has quit [Ping timeout: 265 seconds]
[00:58:55] -!- Nick001-Shop has quit [Quit: ChatZilla 0.9.89 [Firefox 17.0.1/20121128204232]]
[01:16:27] -!- ybon has quit [Ping timeout: 276 seconds]
[01:23:55] -!- ve7it has quit [Remote host closed the connection]
[01:32:49] -!- pikeaero has quit [Read error: Connection reset by peer]
[02:06:00] -!- rob_h has quit [Ping timeout: 250 seconds]
[03:10:54] -!- sumpfralle has quit [Ping timeout: 260 seconds]
[03:14:04] zz_satyag is now known as satyag
[03:20:12] <jepler> mhaberler: (hope you see this later) the manpage hal_create_thread (the text of which was copied either from the hal design documents or docstrings in the C code) says the magic phrase...
[03:20:17] <jepler> "rate monotonic priority scheduling"
[03:20:51] <jepler> two minutes of googling gives me the *impression* that this means the lower priority thread is never preempted by the higher priority thread
[03:21:27] <jepler> I emphasize that this is a quick impression about what rate-monotonic scheduling is and may be mistaken
[03:21:39] <jepler> but if you want the statement of original intent in the design of hal, I think that's the magic phrase you're looking for.
[03:23:31] <jepler> 'night
[03:25:52] <cradek> where you said priority, I think you meant period
[04:01:39] -!- Keknom has quit [Quit: Leaving.]
[04:05:02] satyag is now known as zz_satyag
[04:17:53] -!- Thetawaves has quit [Client Quit]
[04:22:41] -!- Loetmichel has quit [Ping timeout: 265 seconds]
[04:32:19] -!- dzig has quit [Quit: Page closed]
[04:51:25] -!- kmiyashiro has quit [Quit: kmiyashiro]
[06:02:59] -!- Fox_Muldr has quit [Ping timeout: 260 seconds]
[06:05:13] -!- kmiyashiro has quit [Quit: kmiyashiro]
[06:39:58] zz_satyag is now known as satyag
[06:54:24] -!- cmorley has quit [Quit: Leaving.]
[06:57:21] -!- vladimirek [[email protected]] has joined #linuxcnc-devel
[07:07:50] -!- tjb1 has quit [Quit: tjb1]
[07:12:11] -!- phantoneD has quit [Ping timeout: 260 seconds]
[07:23:13] -!- Cylly has quit []
[07:46:14] -!- archivist_herron has quit [Ping timeout: 260 seconds]
[08:05:22] -!- dhoovie has quit [Ping timeout: 246 seconds]
[08:09:52] -!- adb [[email protected]] has joined #linuxcnc-devel
[08:10:09] -!- adb_ [[email protected]] has joined #linuxcnc-devel
[08:13:26] -!- adb has quit [Client Quit]
[08:13:40] adb_ is now known as adb
[08:16:57] -!- kha_ has quit [Remote host closed the connection]
[08:22:23] satyag is now known as zz_satyag
[08:31:02] -!- archivist_herron has quit [Ping timeout: 252 seconds]
[08:39:50] -!- Thetawaves has quit [Quit: This computer has gone to sleep]
[08:40:49] -!- mhaberler [[email protected]] has joined #linuxcnc-devel
[08:46:21] -!- psha[work] [psha[work][email protected]] has joined #linuxcnc-devel
[08:57:54] -!- racycle has quit [Quit: racycle]
[08:58:28] -!- theos has quit [Ping timeout: 245 seconds]
[09:20:05] -!- toraxe has quit [Quit: Page closed]
[09:20:29] -!- kb8wmc has quit [Remote host closed the connection]
[09:34:55] e-ndy- is now known as e-ndy
[09:54:26] -!- bmwyss has quit [Ping timeout: 250 seconds]
[09:54:26] bmwyss__ is now known as bmwyss
[09:56:04] -!- bmwyss_ has quit [Ping timeout: 248 seconds]
[10:07:06] -!- mackerski has quit [Ping timeout: 252 seconds]
[10:07:06] mackerski_ is now known as mackerski
[10:10:44] -!- sumpfralle has quit [Ping timeout: 244 seconds]
[10:21:05] -!- mackerski has quit [Remote host closed the connection]
[10:21:18] -!- MercuryRising has quit [Ping timeout: 265 seconds]
[10:24:42] -!- _ink has quit [Ping timeout: 252 seconds]
[10:25:44] -!- rob_h [[email protected]] has joined #linuxcnc-devel
[10:39:02] -!- mal`` has quit [Ping timeout: 245 seconds]
[10:40:21] -!- mal`` [mal``!~mal``@li125-242.members.linode.com] has joined #linuxcnc-devel
[10:57:52] zz_satyag is now known as satyag
[11:01:53] -!- sumpfralle1 has quit [Ping timeout: 244 seconds]
[11:04:39] -!- Valen has quit [Quit: Leaving.]
[11:09:11] -!- automata_ has quit [Read error: Connection reset by peer]
[11:13:18] -!- phantoxeD has quit [Ping timeout: 250 seconds]
[11:18:17] <alex_joni> cradek: the literature term refers to priority
[11:18:38] <alex_joni> as threads aren't really common. each task has a scheduling period and priority
[11:18:51] <alex_joni> and you can have tasks with higher priority but lower period
[11:19:21] <alex_joni> in emc2's case we only have threads which are prioritized by their period (faster thread higher priority)
[11:20:49] satyag is now known as zz_satyag
[11:30:39] -!- automata_ has quit [Read error: Connection reset by peer]
[11:30:43] -!- asdfasd has quit [Ping timeout: 260 seconds]
[11:32:59] zz_satyag is now known as satyag
[11:36:05] -!- holst has quit [Ping timeout: 244 seconds]
[11:44:03] -!- cncbasher has quit [Remote host closed the connection]
[11:44:04] -!- cncbasher_ has quit [Read error: Connection reset by peer]
[11:53:44] -!- automata_ has quit [Read error: Connection reset by peer]
[12:21:51] satyag is now known as zz_satyag
[12:46:41] zz_satyag is now known as satyag
[13:04:51] -!- skunkworks has quit [Remote host closed the connection]
[13:07:33] -!- karavanjoW has quit [Ping timeout: 276 seconds]
[13:08:58] <jepler> cradek: rate monotonic scheduling makes the thread with the shorter deadline the higher-priority one. "The static priorities are assigned on the basis of the cycle duration of the job: the shorter the cycle duration is, the higher is the job's priority." -- wikipedia
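The rule quoted from Wikipedia above (shorter cycle duration, higher static priority) can be sketched in C as follows. This is only an illustration: `struct rm_thread` and `rm_assign_priorities` are hypothetical names, not HAL or RTAPI structures, and the convention that 0 is the highest priority is an assumption made for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical thread descriptor for illustration; not a HAL/RTAPI type. */
struct rm_thread {
    long period_ns;  /* cycle duration of the thread */
    int priority;    /* assigned below; 0 is assumed to be the highest */
};

/* qsort comparator: ascending by period. */
static int by_period(const void *a, const void *b)
{
    const struct rm_thread *x = a;
    const struct rm_thread *y = b;
    return (x->period_ns > y->period_ns) - (x->period_ns < y->period_ns);
}

/* Rate-monotonic assignment: sort by period, then number the threads so
 * that the shortest period receives priority 0 (highest). */
void rm_assign_priorities(struct rm_thread *t, size_t n)
{
    size_t i;

    assert(t != NULL);
    qsort(t, n, sizeof *t, by_period);
    for (i = 0; i < n; i++)
        t[i].priority = (int)i;
}
```

Note that this only fixes static priorities. It says nothing about whether the underlying thread system releases the threads at fixed relative rates.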
[13:12:07] <jepler> mhaberler: please read what I said in scroolback about rate monotonic scheduling
[13:12:13] <jepler> (scroolback?)
[13:12:13] <jepler> bbl
[13:12:25] <mhaberler> right, saw it - thanks
[13:12:31] <mhaberler> just reading up
[13:20:00] <mhaberler> I guess the question one should test for is 'was a deadline missed'
[13:20:23] <mhaberler> NB: the whole thing only happens on Vbox with bizarre scheduling of the kernel
[13:21:23] <mhaberler> the next question IMO is - does threads.0 test for a missed deadline
[13:22:03] <mhaberler> obviously not, because there are runs of say 1..17 meaning the slow thread deadline was missed and it isnt flagged as an error
[13:22:56] <mhaberler> what is actually flagged is an 'early' release of the slow thread (or a very late one) so you have 2,3,4,0,1...
[13:23:54] <mhaberler> no, not very late. 2,3,4,0 must be an early release of the slow thread
[13:25:00] <mhaberler> I mean all this period and priority calculation is wonderful, what I am missing is how this relates to scheduling deadlines provided (or not provided) by a given threading system
[13:26:18] <mhaberler> if this is to be whats called 'harmonic' (fixed invocation rates based on thread periods) then it's hard to see how separate OS threads can guarantee that
[13:27:26] <mhaberler> if it were a single fast thread scheduling the slow thread when the deadline approaches it's clear how harmonic can be ascertained even for the vbox case, but not by a pretty random underlying thread system
[13:27:47] <mhaberler> (alone)
[13:29:28] <jepler> the result of "0" in any captured sample means that the guarantees of rate monotonic scheduling have been violated, because a slower / lower priority thread interrupted a faster one
[13:29:57] <mhaberler> right
[13:30:24] <jepler> but a result of 17 is fishy too, as that implies either the fast thread is running too often, or the slow thread has missed deadlines
[13:30:39] <jepler> .. but we don't expect deadlines to be met on emulated machines in the first place; they have broken realtime and that's no surprise.
[13:31:40] <mhaberler> my question is: where is the connection between RM scheduling as implied by the comment you cited, and the semantics of the underlying thread system
[13:31:46] <jepler> so maybe there should be two tests: one that always fails if 0 is seen and is expected to pass everywhere, and one that is expected to fail on non-realtime systems but tests that the 'top' number is always close to ten or tests the long-term average?
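The two-test split jepler proposes could look roughly like the C sketch below. It is not the real tests/threads.0/checkresult; `no_zero_seen` and `peaks_near` are hypothetical names. It assumes the captured samples are values of a counter that the fast thread increments and the slow thread resets, and that the expected peak is ten, as in "close to ten" above.

```c
#include <assert.h>
#include <stddef.h>

/* Check 1 (expected to pass everywhere): returns 1 if no sample is 0,
 * i.e. the slow thread was never seen running ahead of the fast one. */
int no_zero_seen(const int *s, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (s[i] == 0)
            return 0;
    return 1;
}

/* Check 2 (expected to fail on non-realtime systems): a "peak" is a sample
 * followed by a smaller one, i.e. the last count before a reset. Returns 1
 * if every peak lies within expected +/- tol. */
int peaks_near(const int *s, size_t n, int expected, int tol)
{
    size_t i;

    for (i = 0; i + 1 < n; i++) {
        if (s[i + 1] < s[i]) {
            int d = s[i] - expected;
            if (d < -tol || d > tol)
                return 0;
        }
    }
    return 1;
}
```

With these definitions a run of 1..17 passes the first check but fails the second, while 2,3,4,0,1 fails the first.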
[13:32:41] <mhaberler> I would think it matters for 'sim' anyway; I want to understand the link between that comment and thread system X semantics
[13:33:35] <mhaberler> the way I read the code: there's a missing link - or maybe I'm overlooking an assumption or guarantee of RTAI, xenomai and rt-preempt threads
[13:34:53] <mhaberler> vbox is 'emulated emulated' - rtapi threads over linux over OSx in my case;)
[13:36:10] <mhaberler> reword: I would think it does *not* matter for sim anyway
[13:37:36] -!- automata_ has quit [Ping timeout: 250 seconds]
[13:39:05] <mhaberler> Assume the missing link exists: it could well be the case that thread systems used so far, including the new ones, are precise enough that the test doesnt trigger; but that doesnt prove its harmonic, just that it looks harmonic
[13:39:56] <mhaberler> IMO if you were to do a harmonic RM scheduler, it must be one OS base thread scheduling all other threads; then you can guarantee fixed rates
[13:41:29] <mhaberler> turning around, such a scheduler would look harmonic even in the vbox scenario wrt to relative invocation rates; the absolute timing would still be lousy but that is a different result
[13:44:00] <mhaberler> having said that, I'm not aware of the semantics of any HAL components depending on the RM behaviour (quality of output of course does, but it isnt a fatal failure AFAICT)
[13:45:22] <mhaberler> hal/rtapi works even though conceptually say the mesa cards have their own timing and hence the 'mesa card rate' isnt harmonic wrt hal threads to start with
[13:45:47] <mhaberler> in fact cannot be since there is no common timing source
[13:47:11] <mhaberler> iow: fine, we have that comment in the code. Now, does that mean it is relevant for correctness, or just for quality of results? IMO it's the latter only
[13:49:23] <mhaberler> if the answer is 'quality only', then the way to deal with threads.0 is to flag failure as error only in non-sim modes
[13:49:42] -!- skunkworks [[email protected]] has joined #linuxcnc-devel
[13:54:33] satyag is now known as zz_satyag
[13:55:19] <mhaberler> maybe JMK has an opinion on this
[13:55:48] <jepler> there are a number of components that are intended to have functions running in two separate threads
[13:56:14] <jepler> do they exchange data in a way that is 'safe' if the slow thread interrupts the fast one?
[13:56:25] <mhaberler> right, so the question is - will they fail if relative invocation rates are off
[13:56:54] <mhaberler> I am not sure that what we are seeing here is preemption
[13:56:56] <jepler> I wish I remembered history better
[13:57:07] <jepler> .. exactly which system caused seb to weaken the test
[13:57:18] <mhaberler> oh.
[13:57:26] <mhaberler> this test?
[13:57:41] <jepler> yeah, looking at the git history of threads.0.
[13:58:12] <jepler> probably we discussed it in irc at the time but I haven't made the effort to locate the logs yet
[13:58:51] <mhaberler> I see, yes seb changed it
[13:59:18] <mhaberler> oh, that was stricter before
[14:00:17] <mhaberler> I'll ask him
[14:00:35] <mhaberler> seb_kuzminsky: around?
[14:01:20] <mhaberler> well, he'll show up. Gotta leave, cu
[14:01:47] <jepler> :qa
[14:01:48] <jepler> argh
[14:08:54] -!- Loetmichel has quit [Ping timeout: 256 seconds]
[14:08:55] -!- jthornton has quit [Read error: Connection reset by peer]
[14:09:24] -!- jthornton [[email protected]] has joined #linuxcnc-devel
[14:11:36] -!- Simooon has quit [Ping timeout: 244 seconds]
[14:20:14] -!- adb has quit [Ping timeout: 256 seconds]
[14:20:32] -!- adb [[email protected]] has joined #linuxcnc-devel
[14:22:58] -!- djheinz has quit [Ping timeout: 265 seconds]
[14:24:10] -!- psha[work] has quit [Quit: Lost terminal]
[14:26:43] -!- kb8wmc [[email protected]] has joined #linuxcnc-devel
[14:36:26] -!- oterral has quit [Read error: Connection reset by peer]
[14:41:34] -!- Cylly has quit []
[14:54:44] -!- Loetmichel has quit [Ping timeout: 248 seconds]
[15:03:35] -!- Loetmichel has quit []
[15:10:16] -!- Youdaman has quit []
[16:05:35] -!- riz_ [riz_!62dd7d6e@gateway/web/freenode/ip.98.221.125.110] has joined #linuxcnc-devel
[16:05:56] <riz_> Hello all!
[16:06:18] -!- yuvipanda has quit [Ping timeout: 244 seconds]
[16:06:23] <riz_> In tpRunCycle, does anyone know the difference between the reference to primary and secondary?
[16:06:47] <riz_> One example is primary_displacement and secondary_displacement
[16:08:52] -!- yuvipanda_ has quit [Ping timeout: 248 seconds]
[16:12:00] <awallin> looks like it has to do with blending
[16:12:29] <awallin> those names are relatively new I am guessing, not been there 'forever' (years) ?
[16:13:01] <mhaberler> riz: if you're trying to locate an author, try 'git blame file'
[16:14:40] <riz_> How do I contact them? Are there emails?
[16:16:13] <mhaberler> well for instance tp.c here: http://git.mah.priv.at/gitweb/emc2-dev.git/tree/c20e22fb5f44a6b64454a08aa14aeb2eccba48ff:/src/emc/kinematics
[16:16:17] <awallin> I bet most people won't touch tp.c even with a stick :)
[16:16:27] <mhaberler> try clicking 'blame'
[16:17:28] <mhaberler> you find a commit id next to the line in question, which hints at the author
[16:17:32] <awallin> mhaberler: is that doxygen documentation? is that also in master?
[16:17:44] <mhaberler> no, standard gitweb
[16:18:03] <mhaberler> not sure if its enabled in linuxcnc.org. let me see
[16:18:18] <mhaberler> nope
[16:18:38] <awallin> hm maybe I am looking at 2.5_branch then over here, it seems to have only c-style comments
[16:18:40] <mhaberler> it does create load, probably thats why
[16:19:28] <mhaberler> this http://git.mah.priv.at/gitweb/emc2-dev.git/shortlog/refs/heads/v2.5_branch is tracked nightly and has blame on
[16:20:04] <mhaberler> for instance this: http://git.mah.priv.at/gitweb/emc2-dev.git/blame/d25bb2d179b6784642842051447f319ebdd34a4a:/src/move-if-change
[16:20:23] <awallin> hm, only tc.c seems to have doxygen-like comments, not tp.c for example
[16:20:28] <mhaberler> gives commit (clickable) and JE - initials of author - Jeff Epler
[16:21:34] <mhaberler> not sure what you mean by doxygen-like comments, but this works: http://git.mah.priv.at/gitweb/emc2-dev.git/blame/04df8081666c4f15c74da6c2e320aab97469d2b9:/src/emc/kinematics/tp.c
[16:22:31] <riz_> It says Chris Radek
[16:22:48] <mhaberler> that was to be expected ;)
[16:26:06] -!- bmwyss has quit [Quit: bmwyss]
[16:26:50] <riz_> I see cradek has a + next to his name, does that mean he is an administrator?
[16:27:00] <riz_> Is he not here?
[16:27:17] <awallin> still morning(ish) in the US..
[16:27:20] <skunkworks> He is around.. He is a board member
[16:27:49] <mhaberler> didnt know one follows the other;)
[16:28:31] <mhaberler> which + are you referring to?
[16:30:06] <riz_> it shows up as +cradek next to his name in this chat
[16:30:17] <seb_kuzminsky> jepler, mhaberler, about the threads.0 test...
[16:30:30] <skunkworks> heh - that is why I thought he had a + by his name ;)
[16:30:41] <mhaberler> yeah, I'd be curious about the need to relax it
[16:31:18] <seb_kuzminsky> the old test failed on the virtual machines that the buildbot uses
[16:31:49] <mhaberler> I succeeded in making the new one fail, too
[16:31:53] <seb_kuzminsky> scheduling issue, similar to what you're reporting on virtualbox
[16:32:24] <seb_kuzminsky> jmk (or was it jepler) said at the time that the original test was testing something that was too strict, and my proposed change was accepted
[16:32:53] <mhaberler> I guess we need to ask jmk to clear this up
[16:32:57] <seb_kuzminsky> i think the original intent was something in between "test if threads are working" and "test if threads are working well enough to reasonably run a machine"
[16:33:13] <seb_kuzminsky> and the decision was made to dial it back towards the former
[16:33:52] <seb_kuzminsky> what do you want to ask jmk? he has mostly drifted to other things, we dont see him around much anymore
[16:34:37] <mhaberler> about the intended meaning for rate monotonic scheduling, and how that is mapped onto the thread system available - I dont see the connection
[16:35:32] <mhaberler> it is one thing to write that in a comment, but I dont see how multiple independent OS threads are told to remain in harmonic ratio
[16:36:01] <seb_kuzminsky> i don't think rate monotony guarantees harmonic scheduling ratio
[16:36:02] <mhaberler> most likely not at all, and we see a libth artefact, and precise enough timing in RT builds it doesnt show
[16:36:19] <seb_kuzminsky> i think it only says, "higher-frequency threads run at higher priority"
[16:36:44] <mhaberler> then I am at loss what threads.0 actually is supposed to prove
[16:37:07] <mhaberler> right, shorter interval higher priority
[16:37:22] <seb_kuzminsky> yeah
[16:37:25] <mhaberler> libpth artefact
[16:38:35] <seb_kuzminsky> well, threads.0 does test basic thread creation and (somewhat) interaction
[16:38:50] <seb_kuzminsky> it may not be what the README says, but it's still a useful test
[16:39:31] <mhaberler> right, but in reality all it says is 'both threads run' in sim mode; I am unsure the error criterion makes any sense in sim
[16:39:39] -!- Simon3 has quit [Read error: Connection reset by peer]
[16:40:25] <seb_kuzminsky> i bet jmk intended primarily (exclusively?) to test realtime behavior
[16:41:23] <mhaberler> likely so, but still I'd want to know if the comment about RM scheduling was intent, requirement or just a plan
[16:41:58] -!- ve7it [[email protected]] has joined #linuxcnc-devel
[16:42:04] <mhaberler> it is a fairly central part of the whole animal so I want to assure we dont blunder here
[16:43:05] <mhaberler> do you remember if the buildbot vm failures happened in sim, rt or both?
[16:43:30] <seb_kuzminsky> i don't remember, sorry
[16:43:46] <seb_kuzminsky> you could push a branch based on 2.5 with that one threads.0 test reverted and see what happens?
[16:43:52] <mhaberler> hm, we could back that change out,, right
[16:43:59] <mhaberler> 'dynamite fishing'
[16:48:44] <KGB-linuxcnc> git threads-0-test b98f7d5 emc2 tests/threads.0/checkresult
[16:48:44] <KGB-linuxcnc> Revert "This changes the "pass" criterion to be more forgiving of slow hardware."
[16:49:28] <jepler> I checked out v2.5.1-141-ga71f29a and then added the version of tests/threads.0/checkresult from ref d786bd95db48a9446b2770e8d1e4abf97e6615bc
[16:49:31] <jepler> it passes on sim
[16:49:44] <seb_kuzminsky> linuxcnc-build: notify list
[16:49:44] <linuxcnc-build> The following events are being notified: []
[16:49:48] <seb_kuzminsky> linuxcnc-build: notify list/win 6
[16:49:48] <linuxcnc-build> try 'notify on|off <EVENT>'
[16:49:52] <seb_kuzminsky> whoops
[16:50:28] <jepler> that's sim in a real ubuntu 10.04 desktop
[16:50:37] <seb_kuzminsky> linuxcnc-build: notify finished
[16:50:38] <linuxcnc-build> try 'notify on|off <EVENT>'
[16:50:41] <seb_kuzminsky> err
[16:50:45] <seb_kuzminsky> linuxcnc-build: help notify
[16:50:45] <linuxcnc-build> Usage: notify on|off|list [<EVENT>] ... - Notify me about build events. event should be one or more of: 'started', 'finished', 'failure', 'success', 'exception' or 'xToY' (where x and Y are one of success, warnings, failure, exception, but Y is capitalized)
[16:50:49] <mhaberler> the one I just pushed passes under vbox too..
[16:50:54] <seb_kuzminsky> linuxcnc-build: notify on finished
[16:50:54] <linuxcnc-build> The following events are being notified: ['finished']
[16:50:57] <seb_kuzminsky> ok
[16:51:49] <mhaberler> is that buildbot Eliza?
[16:52:02] <seb_kuzminsky> i wish it were that smart
[16:52:18] <mhaberler> well, does it ask about your mother?
[16:52:26] <mhaberler> if so, it clearly lacks clue
[16:54:25] <seb_kuzminsky> https://www.youtube.com/watch?v=Umc9ezAyJv0
[16:57:36] <mhaberler> haha
[17:03:42] <linuxcnc-build> build #652 of lucid-rtai-i386-clang is complete: Success [build successful] Build details are at http://buildbot.linuxcnc.org/buildbot/builders/lucid-rtai-i386-clang/builds/652
[17:04:23] -!- V0idExp has quit [Quit: Leaving.]
[17:05:31] <seb_kuzminsky> linuxcnc-build: describe, in single words, only the good things that come into your mind about your mother
[17:08:29] -!- tronwizard has quit [Ping timeout: 255 seconds]
[17:09:26] -!- oterral1 has quit [Quit: Leaving.]
[17:13:14] <mhaberler> did I miss the runtests results there? dont see them
[17:13:30] <linuxcnc-build> build #452 of precise-amd64-sim-clang is complete: Success [build successful] Build details are at http://buildbot.linuxcnc.org/buildbot/builders/precise-amd64-sim-clang/builds/452
[17:14:39] <seb_kuzminsky> the two builds that just finished are clang builds, i dont runtests on those
[17:14:55] <seb_kuzminsky> the next couple of tests are built by gcc (which is how we ship), and those will run the tests
[17:15:08] <mhaberler> aha
[17:22:37] -!- ink has quit [Disconnected by services]
[17:22:55] <linuxcnc-build> build #656 of precise-amd64-sim is complete: Success [build successful] Build details are at http://buildbot.linuxcnc.org/buildbot/builders/precise-amd64-sim/builds/656
[17:23:01] <linuxcnc-build> build #654 of precise-i386-sim is complete: Success [build successful] Build details are at http://buildbot.linuxcnc.org/buildbot/builders/precise-i386-sim/builds/654
[17:23:12] <seb_kuzminsky> these will have runtests results
[17:23:15] <seb_kuzminsky> and they're green!
[17:23:19] <seb_kuzminsky> both are sim
[17:24:35] <mhaberler> hm, I dont expect the rt runtests to fail - strange
[17:24:49] <linuxcnc-build> build #652 of lucid-i386-sim is complete: Success [build successful] Build details are at http://buildbot.linuxcnc.org/buildbot/builders/lucid-i386-sim/builds/652
[17:25:01] <linuxcnc-build> build #652 of lucid-amd64-sim is complete: Success [build successful] Build details are at http://buildbot.linuxcnc.org/buildbot/builders/lucid-amd64-sim/builds/652
[17:25:05] <seb_kuzminsky> even rt on a virtual machine? i think that's where i had trouble
[17:25:09] <seb_kuzminsky> it might have been intermittent
[17:25:11] <mhaberler> aja
[17:25:14] <linuxcnc-build> build #654 of hardy-amd64-sim is complete: Failure [failed runtests] Build details are at http://buildbot.linuxcnc.org/buildbot/builders/hardy-amd64-sim/builds/654 blamelist: Michael Haberler <[email protected]>
[17:25:19] <mhaberler> ah!
[17:25:46] <seb_kuzminsky> http://buildbot.linuxcnc.org/buildbot/builders/hardy-amd64-sim/builds/654/steps/runtests/logs/stdio
[17:26:05] <linuxcnc-build> build #652 of hardy-amd64-realtime-rip is complete: Success [build successful] Build details are at http://buildbot.linuxcnc.org/buildbot/builders/hardy-amd64-realtime-rip/builds/652
[17:26:25] <linuxcnc-build> build #653 of lucid-i386-realtime-rip is complete: Success [build successful] Build details are at http://buildbot.linuxcnc.org/buildbot/builders/lucid-i386-realtime-rip/builds/653
[17:26:28] <seb_kuzminsky> heh, it was a totally different test that failed
[17:26:44] <mhaberler> right,
[17:27:12] <mhaberler> I'd think its not supposed to
[17:28:41] <linuxcnc-build> build #653 of hardy-i386-sim is complete: Success [build successful] Build details are at http://buildbot.linuxcnc.org/buildbot/builders/hardy-i386-sim/builds/653
[17:28:48] <mhaberler> I get that one once in a while too, would be worth drilling down but thats not the one I'm after
[17:29:05] <linuxcnc-build> build #653 of hardy-i386-realtime-rip is complete: Success [build successful] Build details are at http://buildbot.linuxcnc.org/buildbot/builders/hardy-i386-realtime-rip/builds/653
[17:29:05] <linuxcnc-build> build #651 of checkin is complete: Failure [failed] Build details are at http://buildbot.linuxcnc.org/buildbot/builders/checkin/builds/651 blamelist: Michael Haberler <[email protected]>
[17:29:20] -!- zz_satyag has quit [Read error: Connection reset by peer]
[17:29:48] <seb_kuzminsky> that's the 'checkin' build, it just monitors all the compile & test builders, it failed because hardy-amd64-sim failed up above
[17:30:12] <seb_kuzminsky> so, no real data out of that run
[17:30:49] <seb_kuzminsky> i bet it was an intermittent problem
[17:31:05] <KGB-linuxcnc> TODO: deletor threads-0-test b98f7d5 emc2 . * branch deleted
[17:31:10] <seb_kuzminsky> bbl
[17:31:42] <mhaberler> intermittent problems with threads is code red
[17:32:08] -!- mackerski has quit [Quit: mackerski]
[17:45:16] <jepler> the only thing I can find from the irc logs about threads.0 is in http://emc.mah.priv.at/irc/%23linuxcnc-devel/2008-11-10.html
[17:45:25] <jepler> [20:57:22] <jepler> I should review what the threads test is trying to test
[17:45:30] <jepler> [21:01:59] <jepler> threads.0 may also turn out to be a test of realtime latency, and it may test a stronger condition than rtapi threads provide
[17:45:47] <jepler> this after we had noted failures in the buildbot waterfall, probably of rtai tests
[17:46:21] <jepler> It seems like somewhere in #emc-devel logs should be the cia message about the commit
[17:46:29] <jepler> I suppose it's git, it could have been pushed somewhat after november 14..?
[17:48:40] <jepler> a few more hits exist ... http://emc.mah.priv.at/irc/search?q=threads.0&channel=linuxcnc-devel&go=Go
[17:53:27] <jepler> I'd look closest at the few components that have functions intended to run in different threads. src/hal/components/encoder.c:capture() vs update(), for instance
[17:53:40] <jepler> if capture can run concurrently with update, is the code still right?
[17:53:57] <jepler> (or is it
[17:54:02] <jepler> errr
[17:59:44] <mhaberler> ok, what we might try to do is use the pth pthreads emulation and see if that changes things back to previous behaviour; even if it does though, I dont think this clears up anything wrt to this discussion here; we make sim threads.0 compatible but without a clear rationale
[18:01:46] -!- Simooon has quit [Ping timeout: 246 seconds]
[18:18:06] -!- yuvipanda has quit [Ping timeout: 264 seconds]
[18:31:29] -!- mattions has quit [Ping timeout: 244 seconds]
[19:00:06] -!- sumpfralle has quit [Ping timeout: 264 seconds]
[19:09:22] -!- jthornton_ [[email protected]] has joined #linuxcnc-devel
[19:09:23] -!- jthornton has quit [Read error: Connection reset by peer]
[19:17:05] -!- andypugh [andypugh!~andy2@cpc2-basl1-0-0-cust639.basl.cable.virginmedia.com] has joined #linuxcnc-devel
[19:19:00] -!- micges [[email protected]] has joined #linuxcnc-devel
[19:21:16] -!- Bruce has quit [Quit: Page closed]
[19:22:32] -!- IchGuckLive has quit [Quit: ChatZilla 0.9.87 [Firefox 16.0.2/20121025205401]]
[19:23:36] -!- L33TG33KG34R has quit [Ping timeout: 265 seconds]
[19:30:40] -!- motioncontrol has quit [Ping timeout: 252 seconds]
[19:36:53] <KGB-linuxcnc> seb encoder-modparam f25efee emc2 tests/encoder/ (44 files in 8 dirs) * add a test of encoder module loading
[19:36:53] <KGB-linuxcnc> seb encoder-modparam 7ffb387 emc2 src/hal/components/encoder.c * fix a module parameter parsing bug in encoder
[19:48:09] <mhaberler> jepler: can you share what abs.0 and your comment here means: http://linuxcnc.mah.priv.at/irc/%23emc-devel/2008-11-10.html#20:57:22
[19:48:17] <mhaberler> "supposed to fail" ?
[19:48:25] -!- riz_ has quit [Quit: Page closed]
[19:49:40] <linuxcnc-build> build #653 of lucid-rtai-i386-clang is complete: Success [build successful] Build details are at http://buildbot.linuxcnc.org/buildbot/builders/lucid-rtai-i386-clang/builds/653
[19:55:41] <seb_kuzminsky> linuxcnc-build: notify list
[19:55:41] <linuxcnc-build> The following events are being notified: ['finished']
[19:55:45] <seb_kuzminsky> linuxcnc-build: notify on failure
[19:55:46] <linuxcnc-build> The following events are being notified: ['failure', 'finished']
[19:55:52] <seb_kuzminsky> linuxcnc-build: notify off finished
[19:55:52] <linuxcnc-build> The following events are being notified: ['failure']
[20:03:51] -!- dgarr [[email protected]] has joined #linuxcnc-devel
[20:05:15] <dgarr> seb_kuzminsky: i think the bug reported for encoder may be more fundamental, for consideration:
[20:05:17] <dgarr> http://www.panix.com/~dgarrett/stuff/0001-rtapi.h-RTAPI_MP_ARRAY_-should-pass-num.patch
[20:07:11] <seb_kuzminsky> dgarr: i just pushed an experimental branch to fix it: http://git.linuxcnc.org/gitweb?p=linuxcnc.git;a=shortlog;h=refs/heads/encoder-modparam
[20:07:15] <seb_kuzminsky> i'll check out your patch
[20:07:36] <seb_kuzminsky> ah, i see
[20:07:43] <andypugh> So, we have been cheerfully passing a number to RTAPI_MP_ARRAY which has been equally cheerfully always ignored?
[20:07:51] <seb_kuzminsky> yes
[20:07:55] <seb_kuzminsky> and it's not wrong to do so
[20:08:15] <seb_kuzminsky> because not every rtos supported by rtapi provides you with that number, so you can't rely on it
[20:08:47] <andypugh> So why not miss it out of the #define altogether?
[20:09:07] <andypugh> (It's a bit late now, I guess)
[20:09:52] <seb_kuzminsky> dgarr: i think your patch is wrong
[20:10:54] <seb_kuzminsky> see how a pointer to __dummy_##var is an argument to module_param_array? later versions of linux tell you how many array entries were initialized at load-time, by setting that variable
[20:11:10] <seb_kuzminsky> assigning it at declaration-time does no good, i think
[20:11:49] -!- micges has quit [Quit: Leaving]
[20:12:47] <seb_kuzminsky> here's the blurb about module_param_array from the ldd3 book: http://www.makelinux.net/ldd3/chp-2-sect-8
[20:17:11] <andypugh> Is the problem that param_array will happily fill every element in the array, but our code looks for a null in the last place. (I do that a lot)
[20:17:28] yuvipanda is now known as rmoen|away
[20:17:33] <seb_kuzminsky> yes, exactly
[20:17:55] <seb_kuzminsky> if every element of the array has a non-null value, then the old code would read off the end of the array and get, who knows what
[20:18:33] <seb_kuzminsky> i wrote some tests to try the boundary conditions (in the first of the two commits on that branch)
[20:18:45] <seb_kuzminsky> it should be easy to adapt to our other modules
[20:18:56] <dgarr> seb_kuzminsky: i defer to your knowledge
[20:19:05] <andypugh> 5 extra program lines? Code bloat!
[20:19:10] <seb_kuzminsky> hah
[20:19:43] rmoen|away is now known as MaxSem
[20:19:52] <seb_kuzminsky> yeah, all those stupid error checks really bloat our code
[20:20:13] MaxSem is now known as Guest43866
[20:20:14] Guest43866 is now known as rmoen|away
[20:20:27] <andypugh> for(i=0; names[i] * (MAX_CHAN - i); i++) {howmany = i+1;}
[20:20:35] <andypugh> :-)
[20:21:03] <seb_kuzminsky> did you mean , instead of *?
[20:21:15] rmoen|away is now known as rrnoen
[20:21:23] <andypugh> No, I was being deliberately bad
[20:21:28] <seb_kuzminsky> i guess it doesnt matter
[20:21:42] <andypugh> Is a comma valid?
[20:21:46] rrnoen is now known as rmoen|away
[20:21:55] <seb_kuzminsky> in C it makes an aggregate statement
[20:22:00] <seb_kuzminsky> i think
[20:22:06] <seb_kuzminsky> i don't use it, there's no need
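On the comma question: in C the comma operator evaluates its left operand, discards that result, and yields the value of the right operand. A loop condition written as `names[i], (MAX_CHAN - i)` would therefore test only the bound and ignore whether the entry is NULL. A minimal sketch of that behaviour follows; the helper name `comma_condition` and its parameters are invented here for illustration and do not come from the encoder source.

```c
#include <stddef.h>

/* Mimics a loop condition of the form `name, remaining`.
 * The left operand is evaluated and thrown away (the cast to void
 * only silences "unused value" warnings); the result of the whole
 * expression is the right operand, `remaining`. */
static int comma_condition(const char *name, int remaining)
{
    return ((void)name, remaining) != 0;
}
```

With this helper, a NULL name and a non-zero `remaining` still give a true condition, which is why a comma would not be a drop-in replacement for `&&` in the loop above.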
[20:23:42] <andypugh> If I was actually coding that expression I would use for(i=0; names[i] != NULL && i < MAX_CHAN; i++) {howmany = i+1;}
[20:23:58] <seb_kuzminsky> optimize for folk reading the code
[20:24:08] <andypugh> But that looks untidy, so would end up with your version I reckon
[20:24:24] <andypugh> Let me finish, dude!
[20:24:26] <seb_kuzminsky> and it reads off the end of the array still
[20:25:10] <andypugh> Reading off the end is OK isn't it? As long as you don't use that value.
[20:25:17] <seb_kuzminsky> depends what's after it
[20:25:39] <seb_kuzminsky> if it's an invalid virtual address, you'll segfault if you're in userspace and you'll panic if you're in the kernel
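The read past the end in the `names[i] != NULL && i < MAX_CHAN` form comes from the order of the two tests: `names[i]` is dereferenced before the bound is checked. Because `&&` short-circuits, putting the bound first avoids the out-of-range read. The sketch below is not the code from the encoder commit; the function name `count_names` and the `max` parameter are assumptions made for the example.

```c
#include <stddef.h>

/* Counts leading non-NULL entries in names[], looking at no more than
 * `max` slots. The bound is tested first, so when every slot is filled
 * the loop stops at i == max without ever reading names[max]. */
static int count_names(const char *const names[], int max)
{
    int howmany = 0;
    for (int i = 0; i < max && names[i] != NULL; i++) {
        howmany = i + 1;
    }
    return howmany;
}
```

A completely filled array returns `max`, and an array whose first slot is NULL returns 0, which covers the two boundary conditions discussed above.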
[20:25:51] <andypugh> FPGA code, where reading has side-effects, I guess.
[20:26:33] <andypugh> Anyway, I was deliberately coding badly to cause amusement.
[20:26:51] <seb_kuzminsky> ok :-)
[20:27:15] <seb_kuzminsky> i feel like i'm drowning in bad code all around, the humor misfired on me :-/
[20:27:17] <jepler> mhaberler: Many items in the testsuite rely on halsampler to record the data which is the point of the test.
[20:28:00] <jepler> mhaberler: but sampler has the possibility of overrunning and giving a failed test result when the real 'problem' is that for whatever reason userspace couldn't pull samples out of the halsampler shared memory fast enough
[20:28:33] <jepler> so I built into the runtests script a general "detect overruns in halsampler" facility
[20:28:43] <jepler> mhaberler: then it occurred to me that I needed to test if it worked
[20:28:47] -!- mas_ has quit [Quit: Page closed]
[20:28:49] -!- vladimirek has quit [Remote host closed the connection]
[20:29:19] <jepler> mhaberler: I originally wrote the test 'overrun' in such a way that it *always* overran, and then marked it as "expected to fail" (the run_without_overruns function in runtests gives up after 10 tries)
[20:29:50] <jepler> so at that time, a failure in 'overrun' was expected; the 'failure' report indicated that the testing infrastructure was working
[20:30:13] <mhaberler> and that has changed?
[20:30:31] <jepler> that was kind of stupid, so later I made the 'overrun' test work in a way that did not act as an (expected) failure
[20:30:41] <jepler> so if 'overrun' is failing now, it is a failure that needs to be looked at
[20:30:52] -!- RoyOnWheels has quit [Ping timeout: 246 seconds]
[20:30:57] <mhaberler> oh bwoy
[20:31:01] <jepler> it looks like I made this change in 2010 (5aa42c)
[20:31:25] rmoen|away is now known as YuviPanda
[20:31:31] <mhaberler> so likely cause - servo thread too slow?
[20:31:43] <jepler> overrun test doesn't even use realtime now
[20:32:10] <jepler> hm I guess it was commit 82235c that changed the test around, but there was a trivial problem I didn't fix until 5aa42c
[20:33:00] <jepler> anyway, that's tests/overrun
[20:33:27] <jepler> tests/abs.0 looks pretty self-explanatory .. not sure what I can say about it
[20:35:29] -!- gentux has quit [Ping timeout: 265 seconds]
[20:35:29] -!- emel has quit [Ping timeout: 265 seconds]
[20:35:29] -!- theos has quit [Ping timeout: 265 seconds]
[20:35:40] <mhaberler> uhum. I will record 'self-explanatory' ..
[20:36:23] <jepler> it's testing that the abs component works .. e.g., that abs(-64) is 64
[20:36:29] emel- is now known as emel
[20:36:46] <mhaberler> right. Now: tests/overrun succeeds.
[20:36:56] <mhaberler> abs.0 fails with overruns.
[20:37:03] <jepler> is there a specific runtests result I should be looking at?
[20:37:07] <mhaberler> what does this suggest?
[20:37:34] <mhaberler> on both arm platforms I currently have:
[20:37:35] <mhaberler> --- /home/mah/emc2-dev/tests/abs.0: overrun detected in sampler, re-running test
[20:37:36] <mhaberler> --- /home/mah/emc2-dev/tests/abs.0: 10 overruns detected, giving up
[20:37:37] <mhaberler> *** /home/mah/emc2-dev/tests/abs.0: FAIL: test run exited with 1
[20:37:59] <mhaberler> Running test: /home/mah/emc2-dev/tests/overrun -- fine
[20:39:38] <jepler> what's in "result"?
[20:39:56] <jepler> if it's empty that tends to indicate that realtime is nice and broken
[20:40:03] <jepler> also might make sense to look at "stderr"
[20:40:11] <jepler> all left in tests/abs.0 after the test fails
[20:40:24] <mhaberler> overrun
[20:40:25] <mhaberler> -64.000000 64.000000
[20:40:26] <mhaberler> overrun
[20:40:26] <mhaberler> -64.000000 64.000000
[20:41:19] <mhaberler> really repeats of last expected
[20:42:25] <jepler> the intent is that sampler_usr.c prints "overrun\n" on stdout when it finds that the FIFO has overflowed since userspace last read it out
[20:45:20] <jepler> you could increase depth= if the problem is that userspace doesn't get to run 'often enough'
[20:45:28] <jepler> since as it is, it only gets 35ms before overflow
[20:45:38] <jepler> or increase period or both
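The reasoning behind "increase depth or period" is that the time userspace has before the sampler FIFO overflows is roughly the FIFO depth multiplied by the thread period. A minimal sketch of that arithmetic follows. The function name is hypothetical, and the numbers used afterwards (350 entries, 100000 ns) are assumptions picked only because their product matches the 35 ms mentioned above; they are not read from the test's HAL file.

```c
#include <stdint.h>

/* Time in nanoseconds until a FIFO of `depth` samples fills up when
 * one sample is written every `period_ns` and nothing is read out. */
static uint64_t overflow_budget_ns(uint32_t depth, uint64_t period_ns)
{
    return (uint64_t)depth * period_ns;
}
```

Under those assumed values, `overflow_budget_ns(350, 100000)` gives 35000000 ns (35 ms), and doubling either argument doubles the budget.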
[20:46:26] <mhaberler> well the arm is running via NFS root and a slow server, so that could be an issue then
[20:47:45] <mhaberler> ah. thread period x 10 -> test succeeds.
[20:48:22] <mhaberler> ok, so that depends on a certain loading speed of things it seems
[20:50:29] <mhaberler> doubling period already fixes it. uh - race conditions in unit test..
[20:51:25] <mhaberler> I wonder if it can be modified to avoid the race. Anyway, now I understand what's happening and I don't feel guilty ;)
[20:51:28] <mhaberler> thanks!
[20:52:33] <mhaberler> fixes it on both platforms.
[20:55:38] <mhaberler> ok, with the exception of the vbox scheduling/threads.0 issue rtos-integration-preview2 passes on all thread styles and platforms, including beaglebone and raspberry. getting closer.
[20:57:42] <skunkworks> Mile stone!
[20:58:21] <mhaberler> Operation leatherbutt ;)
[21:04:46] <seb_kuzminsky> ugh
[21:05:05] <seb_kuzminsky> any advice on merging translation files?
[21:05:29] <seb_kuzminsky> i'm trying to merge 2.5 into master, and getting a conflict on src/po/fr.po
[21:05:55] <seb_kuzminsky> this is auto-generated by some crazy tool somewhere?
[21:06:16] <seb_kuzminsky> or rather, the list of original strings is autogenerated, and the translations are created by hand by tissf?
[21:06:40] <mhaberler> jepler: do you have an idea how to close that race condition?
[21:07:29] <andypugh> seb_kuzminsky: I think a tool creates the strings by looking for _(" sequences?
[21:07:40] <seb_kuzminsky> that seems right
[21:09:17] <andypugh> I guess that if a string changes in the source code, then the translation changes later, then the merge won't know where to insert the delta.
[21:19:26] abetusk is now known as Guest44898
[21:19:26] -!- Guest44898 has quit [Killed (card.freenode.net (Nickname regained by services))]
[21:19:26] abetusk_ is now known as abetusk
[21:25:59] <jepler> seb_kuzminsky: there's some automated-ish way to merge po files, but it's been so long that I've forgotten how to do it
[21:26:12] <jepler> ah, look at git-merge-po in src/lib/po
[21:26:19] <seb_kuzminsky> thx!
[21:26:46] <jepler> I don't recall how well it actually worked :-/ but it dealt better with the po format than the builtin
[21:27:10] -!- dway has quit [Quit: dway]
[21:28:15] <jepler> mhaberler: neat
[21:29:17] -!- skunkworks has quit [Read error: Connection reset by peer]
[21:30:19] -!- YuviPanda has quit [Quit: YuviPanda]
[21:35:31] -!- yuvipanda has quit [Ping timeout: 260 seconds]
[21:36:47] <seb_kuzminsky> jepler: i don't know if it did the right thing, but there were many fewer conflicts, thanks
[21:37:03] <seb_kuzminsky> i wish tissf was here to proof-read all of linuxcnc after this merge
[21:43:04] <jepler> mhaberler: I think if increasing period1 by 10x is enough to make it pass consistently, do that and stop worrying about it
[21:44:12] <jepler> since it runs for 7 realtime periods, that means adding about 900us*7 to the minimum runtime, which is not much in the grand scheme of anything
[21:44:38] <jepler> the last two lines of abs.0/test.hal are this:
[21:44:39] <jepler> start
[21:44:39] <jepler> loadusr -w halsampler -n 7
[21:44:50] <jepler> realtime is started before it starts the process of loading the halsampler userspace component
[21:45:06] <jepler> .. and once halsampler has 7 samples, it exits, causing halcmd to exit, causing realtime to be torn down
[21:45:31] <jepler> I think it's now possible to change this design by using
[21:45:33] <jepler> loadusr halsampler -n 7
[21:45:34] <jepler> start
[21:45:36] <jepler> waitusr halsampler
[21:45:45] <jepler> except I don't know if halsampler has a predictable component name
[21:46:07] <jepler> snprintf(comp_name, sizeof(comp_name), "halsampler%d", getpid());
[21:46:10] <jepler> unfortunately it doesn't
[21:46:32] <jepler> so the steps would be: add halsampler flag to specify exact component name; use it in loadusr and waitusr, with the start between loadusr and waitusr
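The first step, a flag that fixes the component name, could look roughly like the sketch below. It keeps the existing `"halsampler%d"` default quoted above and only overrides it when a name is supplied. The function `build_comp_name` and the `requested` argument are hypothetical; no such option exists in halsampler as far as this log shows, and how the name would be parsed from the command line is left out.

```c
#include <stdio.h>
#include <stddef.h>

/* Fills buf with the HAL component name to register.
 * requested: name from a hypothetical command-line option, or NULL.
 * pid: process id used for the existing default name.
 * Returns what snprintf returns (length the name needs). */
static int build_comp_name(char *buf, size_t size,
                           const char *requested, int pid)
{
    if (requested != NULL && requested[0] != '\0') {
        /* Predictable name, so a later waitusr can refer to it. */
        return snprintf(buf, size, "%s", requested);
    }
    /* Existing behaviour: name depends on the pid. */
    return snprintf(buf, size, "halsampler%d", pid);
}
```

With a predictable name, the HAL file can load the sampler, start realtime, and then wait on that name, so sampler setup time no longer counts against the overflow budget.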
[21:47:58] <KGB-linuxcnc> seb v2.5_branch f25efee emc2 tests/encoder/ (44 files in 8 dirs) * add a test of encoder module loading
[21:47:58] <KGB-linuxcnc> seb v2.5_branch 7ffb387 emc2 src/hal/components/encoder.c * fix a module parameter parsing bug in encoder
[21:48:13] <jepler> then your budget of 35ms until overflow no longer applies to the time that halsampler is setting up..
[21:48:51] <KGB-linuxcnc> seb questionable-merge 93f5cbc emc2 (9 files in 8 dirs) * Merge branch 'v2.5_branch'
[21:49:36] <KGB-linuxcnc> TODO: deletor encoder-modparam 7ffb387 emc2 . * branch deleted
[21:51:47] -!- wboykinm has quit [Remote host closed the connection]
[21:52:36] -!- tom3p has quit [Quit: Ex-Chat]
[21:56:37] Guest25190 is now known as OoBIGeye
[21:58:02] -!- FinboySlick has quit [Quit: Leaving.]
[22:05:19] <mhaberler> what about a start ping for these comps
[22:17:08] -!- sumpfralle has quit [Ping timeout: 252 seconds]
[22:17:42] -!- DJ9DJ has quit [Quit: bye]
[22:18:41] <seb_kuzminsky> addf?
[22:24:57] -!- motioncontrol has quit [Quit: Sto andando via]
[22:31:44] <mhaberler> that might do
[22:33:23] -!- chillly has quit [Quit: Leaving]
[22:33:44] <jepler> you need a way to finish the test
[22:34:02] <jepler> that has to be loadusr -w or waitusr, I think...
[22:41:50] <mhaberler> what about this: http://hastebin.com/bebujobena.coffee
[22:42:08] <mhaberler> does *not* fail on arms at full thread speed
[22:46:15] <mhaberler> I can go down to the limit of thread speed where I get overrun errors from rtapi/kernel
[22:50:14] -!- opticdelusion has quit [Ping timeout: 256 seconds]
[23:07:59] -!- Thetawaves has quit [Quit: This computer has gone to sleep]
[23:21:33] -!- servos4ever has quit [Quit: ChatZilla 0.9.85 [SeaMonkey 2.0.11/20101206162726]]
[23:26:22] -!- mhaberler has quit [Quit: mhaberler]
[23:34:50] -!- dgarr has parted #linuxcnc-devel
[23:46:15] -!- racycle has quit [Quit: racycle]