Separate System from *Method #345

Open
wants to merge 3 commits into base: master

Conversation

sixpearls

This is a running PR to separate the System class from the *Method classes, as discussed over email and on Gitter. My suggestion is that System holds onto the vector of states, the vector of "force" expressions (or is it a generalized impulse?), and optionally a mass matrix. The changes are small, but the decoupling allows a system to be created from manually defined equations, as shown in the Van der Pol example.
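
For illustration, here is a minimal sketch of what that Van der Pol setup could look like; the constructor keywords shown (states, forcing, mass_matrix) are assumptions about the proposed interface, not a final API:

import sympy as sm
from sympy.physics.mechanics import dynamicsymbols

mu = sm.symbols('mu')
x, v = dynamicsymbols('x v')

# Van der Pol oscillator: x'' - mu*(1 - x**2)*x' + x = 0, written in first
# order form with states [x, v] and an identity mass matrix.
states = sm.Matrix([x, v])
forcing = sm.Matrix([v, mu*(1 - x**2)*v - x])
mass_matrix = sm.eye(2)

# Hypothetical usage of the decoupled System (argument names assumed):
# sys = System(states, forcing, mass_matrix, constants={mu: 1.0})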

If this looks good, I can go ahead and complete the PR (tests and documentation). Actually, I may need help formulating the tests.

Right now, System can still be instantiated using the old signature of passing a *Method instance, as long as *Method.q and *Method.u form the states, *Method.forcing_full forms the RHS, and *Method.mass_matrix_full forms the mass matrix.

  • There are no merge conflicts.
  • If there is a related issue, a reference to that issue is in the commit message.
  • Unit tests have been added for the new feature.
  • The PR passes tests both locally (run nosetests) and on Travis CI.
  • All public methods and classes have docstrings. (We use the numpydoc format.)
  • An explanation has been added to the online documentation. (docs directory)
  • The code follows PEP8 guidelines. (Use a linter, e.g. pylint, to check your code.)
  • The new feature is documented in the Release Notes.
  • The code is backwards compatible. (All public methods/classes must follow deprecation cycles.)
  • All reviewer comments have been addressed.

M(q, p) * u' = f(q, u, t, r, p)
q' = g(q, u, t)
M(x, p) * x'_d = f(x, t, r, p)
x'_k = g(x, t)
Member

For equations formed with Kane's method you have the ability to specify explicit expressions for the kinematical equations, q' = g(q, u, t), above. You have to solve for q' given the potentially complex expressions used when you define u. These are generally solved explicitly for q' at the symbolic stage, and in the KanesMethod class they are accessed via the .kindiffdict() method. For other methods (Newton-Euler, Lagrange, Hamilton, etc.) you simply have the simplest case for the first order form: q' = u, i.e. g(q, u, t) = u. With q' already available in explicit form you only need to numerically solve for u'. And if you use any method besides Kane's you have q' in explicit form as well. This means you only ever have to invert the n x n mass matrix.
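
(For reference, a minimal sketch assuming SymPy's mechanics API, showing the explicit kinematical equations coming out of kindiffdict(); the nontrivial speed definition u = l*q' is just an illustration.)

import sympy as sm
from sympy.physics.mechanics import dynamicsymbols, ReferenceFrame, KanesMethod

l = sm.symbols('l')
q, u = dynamicsymbols('q u')
t = dynamicsymbols._t
N = ReferenceFrame('N')

# Define the generalized speed as u = l*q', so the kinematical equation
# solved explicitly for q' is q' = u/l.
kd_eqs = [u - l*q.diff(t)]
kane = KanesMethod(N, q_ind=[q], u_ind=[u], kd_eqs=kd_eqs)
print(kane.kindiffdict())  # {Derivative(q(t), t): u(t)/l}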

I'm not sure what your changes represent here. Why the change in notation?

Author

I was trying to indicate the difference between the "dynamic" states x'_d and the "kinematic" states x'_k, which was my attempt to generalize away from the generalized coordinates and speeds. I'm happy to revert; it just seemed a bit odd to keep q and u since they aren't actually kept separately. But I have no strong opinion on the notation.

I do want to keep the minimum mass matrix form usable; however, I don't recall the use case for it and I couldn't see how that form could be used based on the current implementation. If there is an example using this form, I can make sure it works.

Member

The tests for ode_function_generator cover this case. The use case is any system for which you have q' = g(x, t) explicitly defined (which is nearly all systems). For systems with many degrees of freedom this form should provide much faster evaluation of x' because you only have to invert an n x n matrix instead of a matrix larger than n x n. We should also support a form like:

M(q, p) u' = f(q, u, t, r, p)
A(q, p) q' = g(q, u, t)

Come to think of it, that would be more complete.

This last form is the preferred form when computational speed matters.

Member

If you change the notation it needs to be:

M(x_q, p) x'_u = f(x, r, p, t)
x'_q = g(x, p, t)

where x = [x_u, x_q]^T

Author

Great, I'll work on the notation.

For the case with an A matrix and g vector, won't most solvers take advantage of the block-matrix nature if M and A were combined?

Do you have an example of a System generating the minimum mass matrix form?

Member

I don't know of any linear system solvers that check for that blocking, except maybe sparse solvers. I think you typically just pass the blocks to individual solve commands.

Any system can be put in the minimum mass matrix form.

@moorepants
Member

So the solution I had in mind was something like this:

class MBodyDiffEquations(object):
    """Holds information about the ODEs or DAEs that describe a multibody
    rigid body system, but can also hold more general systems if needed;
    some methods may not apply to non-multibody systems."""

    def __init__(self, right_hand_side, coordinates, speeds, constants,
                 mass_matrix=None, coordinate_derivatives=None,
                 specifieds=None):
        # All of the args to __init__ become attributes.
        self.right_hand_side = right_hand_side
        self.coordinates = coordinates
        self.speeds = speeds
        self.constants = constants
        self.mass_matrix = mass_matrix
        self.coordinate_derivatives = coordinate_derivatives
        self.specifieds = specifieds

    def rhs(self):
        """Solve for the explicit right hand side."""

    def rhs_of_accelerations(self):
        """Returns u' = M^-1 * F."""

    def states(self):
        """Returns the states in the order matching the result of rhs()."""

    def solve_kinematical_differential_equations(self):
        """Returns q' = g(q, u, t)."""

This class can be populated manually with the necessary information and would carry methods to do general symbolic manipulations of the ODE/DAEs. It could detect whether they are ODEs or DAEs for example. It could provide the equations in different forms from the most general form. I think there are a number of methods that will be useful to associate with this.

Now the *Method classes could either inherit from this class or produce objects of this type. This object will hold as much information as we may want to know about the symbolic ODE/DAEs and can be used by the classes in PyDy, like System and ODEFunctionGenerator.

This gives us an object that can be passed around, one that holds and can produce more information than simply a collection of SymPy expressions, symbols, and functions.

So:

>>> kane = KanesMethod(...)
>>> odes = kane.odes()
>>> type(odes)
<MBodyDiffEquations>
>>> odes.states
(q1, q2, u1, u2)
>>> odes.rhs()
Matrix([[...], [...], [...], [...]])
>>> sys = System(odes, constants={m: 5.0, c: 1.0, k: 2.0}, ...)
>>> trajectory = sys.integrate()
>>> sys.plot(trajectory)

@sixpearls
Author

@moorepants, when you have a minute, could you point me to some examples where the knowledge of speeds, coordinates, and minimum mass form would be useful at the integration stage? That would help me think about the abstractions more.

@moorepants
Member

For any multibody system you will be able to form symbolic expressions for M, F, and G in:

M u' = F
q' = G

where q are the generalized coordinates and u are the generalized speeds (Kane-speak) where len(q) does not necessarily equal len(u).

But generally one needs to pass H to the ode integrator (odepack, sundials, etc) where:

x' = H and x = [q, u]^T (transpose)

H is a function of M, F, and G. If you have analytical expressions for H (the explicit form of the first order odes) then you hand a function to the ode integration routine that evaluates H given the system state, i.e. it computes the derivatives of the states.

But there are different types of ODE integrators, not all accept H. Some accept M* and F* where:

M* x' = F*

i.e. they accept this implicit form of the ODEs. Other routines might accept M* x' - F*.

Lastly, computing inv(M) or inv(M*) [1] is the killer here. Finding x' or u' from M, F, M*, or F* is bounded by O(n**3) where n is the size of M or M*. To be most efficient you should only ever solve for u' because n is smaller for M than for M*. This can be done symbolically (limits are around n=20 in SymPy right now) or, better yet, you should always do this numerically. The ODEFunctionGenerator does this operation numerically for two of the cases that it supports.

ODEFunctionGenerator (case 3) generates a numerical function that evaluates M, F, and G given the current state. It then uses LAPACK's LU decomposition to solve for u' and then forms x' to pass to ODEPACK's lsoda (scipy.integrate.odeint). So it tries to be efficient at this computation because it has great bearing on integration speed, especially when you have a stiff system.

[1] You never explicitly invert M or M*; you use LU decomposition or Cholesky decomposition to solve for u' or x'.
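
(A rough numerical sketch of the idea, not the actual ODEFunctionGenerator code; eval_M_F_G is a hypothetical placeholder for the generated numeric functions.)

import numpy as np

def rhs(x, t, r, p, eval_M_F_G, n_q):
    """Assemble x' = [q', u'] without ever forming inv(M); x is a 1d array."""
    q, u = x[:n_q], x[n_q:]
    M, F, G = eval_M_F_G(q, u, t, r, p)  # numeric arrays for this state
    xdot = np.empty_like(x)
    xdot[:n_q] = G                       # q' = g(q, u, t), already explicit
    xdot[n_q:] = np.linalg.solve(M, F)   # u' from M u' = F via LU, no inverse
    return xdot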

@moorepants
Member

The above also skips the case when you have Lagrange Multipliers due to constraints. See issue #127. We also need to support that case.

@sixpearls
Author

This matches my current understanding of the integration portion of what System should help with. It seems to me that on modern computers, the setup cost of computing M^-1 numerically may make the difference between O(n^3) and O(m^3) negligible.

For this reason, I would suggest that the parent class for the generic dynamical system (if you're still interested in including it) just hold the M, x, and F. I am interested in the case where you do have the distinct minimum mass matrix form, although I'm not sure there will be a real performance improvement.

For the Lagrange multiplier case, isn't it possible to augment the mass matrix and F expressions with an extra column? I believe implicit ODE/DAE solvers can handle that case.

@sixpearls
Author

** an extra ROW for the mass matrix and F expression to handle Lagrange multipliers

@moorepants
Member

For the basic case where n = len(u) = len(q), so that M.shape = (n, n) and M*.shape = (2*n, 2*n), you have O(n^3) and O((2*n)^3) for inverting M and M* respectively with LU decomposition. So the computation time of inverting M* is always 8 times greater than inverting M. For example, if it takes 10 microseconds to invert M, then it will take 80 microseconds to invert M*. If your integrator evaluates x' one million times, this means a difference in simulation time of 80 seconds versus 10 seconds. You will always have an ~8 times slower integration time if you invert M* instead of just inverting M. This is a huge deal for real time simulation needs and shooting-based optimization needs. There is some overhead associated with populating the x' vector with the results of solve(M, F) and G, but definitely nothing compared to this 8x difference. I don't see how this is negligible as you suggest.

@sixpearls
Author

I think a significant portion of the 10 microseconds to invert an n x n matrix would be setup. Let's say it's 5-8 microseconds for setup. In that case, we're looking at 45 to 24 microseconds for the 2n matrix vs 10. Similarly for the overall simulation, I suspect there are much longer operations that would make the difference in the actual matrix inversion negligible in "normal" use cases. Certainly shaving the extra time off makes sense for real-time applications.

This type of speed-up would primarily be used in the case where M is singular AND we only have explicit solvers, so we need to solve M(t) x' = F(t) at every time step? I was trying to find out which Python integrators support mass matrices / implicit ODEs / DAEs. It looks like we'd have to use a separate tool from the SciPy integrators, which is unfortunate.

@sixpearls
Author

Of course, you can shift the balance back to minimizing M if len(q) > len(u). I'm not sure which is typically bigger.

@moorepants
Member

What do you mean by setup? I'm not sure what you are referring to. There really isn't any setup in evaluating the right hand side.

For complex multibody systems the most time-consuming portion of evaluating x' is the evaluation of M, F, and G, which are very large expressions. For most multibody systems solved with this kind of software n <= 20, so the numerical inversion of M and M* is fast (I'm seeing 32 microseconds and 48 microseconds on my machine for M and M* with n=20, using np.linalg.solve() and random matrices). This doesn't follow the theory above, showing only a 1.5X difference in computation speed.

So if you have a preallocated array for x', evaluate G to find x'[:n], evaluate M and F, and then solve for x'[n:], the computation time will be bounded by either the evaluation time of M, F, and G or the LU decomposition for n <= 20, depending on which of those takes longer. For n greater than about 20 my guess is that the evaluation of M, F, and G starts to dominate. I can do a demo later when I have time.
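
(A quick way to check the solve timings quoted above, using random dense matrices; the exact numbers will of course vary by machine.)

import timeit
import numpy as np

n = 20
M, F = np.random.random((n, n)), np.random.random(n)
M_star, F_star = np.random.random((2*n, 2*n)), np.random.random(2*n)

# Time 10000 solves of the n x n and 2n x 2n systems.
print(timeit.timeit(lambda: np.linalg.solve(M, F), number=10000))
print(timeit.timeit(lambda: np.linalg.solve(M_star, F_star), number=10000))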

@sixpearls
Author

That 1.5x computation speed is about what I would expect for inverting M and M*. By setup, I'm referring to literally putting the matrix in memory, setting up the register blocks, etc., which is virtually independent of the size of the problem. It seems that a huge portion of the solution time is just this setup, even more than I thought.

I would be interested to see if the literal evaluation of the expressions plays an even bigger role than the LU decomposition setup. I suspect it might, in which case the biggest performance improvement would come from CSE features.

Are you using (or planning on using) this for real time control? That would be pretty sweet...

@moorepants
Member

This script compares the two for a 20 DoF system:

import numpy as np
import sympy as sm
from pydy.models import n_link_pendulum_on_cart
from pydy.codegen.ode_function_generators import CythonODEFunctionGenerator

sys = n_link_pendulum_on_cart(20)
args = np.random.random(42), 0.0, np.random.random(1), np.random.random(42)
M = sys.eom_method.mass_matrix
F = sys.eom_method.forcing
M_star = sys.eom_method.mass_matrix_full
F_star = sys.eom_method.forcing_full
g = CythonODEFunctionGenerator(F, sys.coordinates, sys.speeds,
                               sys.constants_symbols, mass_matrix=M,
                               coordinate_derivatives=sm.Matrix(sys.speeds),
                               specifieds=sys.specifieds_symbols)
f = g.generate()
g_star = CythonODEFunctionGenerator(F_star, sys.coordinates, sys.speeds,
                                    sys.constants_symbols, mass_matrix=M_star,
                                    specifieds=sys.specifieds_symbols)
f_star = g_star.generate()

Results:

In [41]: %timeit f(*args)
1000 loops, best of 3: 605 µs per loop

In [42]: %timeit f_star(*args)
1000 loops, best of 3: 613 µs per loop

So, nothing huge for n=20, where the expressions for M, F, and G are huge and G is in its simplest form.

For n = 2 and simple expressions for M, F, G I get:

In [53]: %timeit f(*args)
The slowest run took 4.20 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 565 µs per loop

In [54]: %timeit f_star(*args)
The slowest run took 5.51 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 527 µs per loop

Interestingly the F* form is faster. I'm not sure why I'm getting the caching warning though.

So this exercise seems to indicate that you don't gain much from having the M, F, G form.

@moorepants
Member

> That 1.5x computation speed is about what I would expect for inverting M and M*.

Why would you expect that when the theory is very clear about what the difference is?

For the "setup" the memory for the matrix can be preallocated before any evaluation of x' happens. Maybe if you profile the above code you can demonstrate what the setup you are talking about is.

This is what I'm seeing with cProfile:

         7540003 function calls (7260003 primitive calls) in 8.211 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   180000    0.803    0.000    2.120    0.000 basic.py:278(__eq__)
180000/100000    0.623    0.000    1.119    0.000 evalf.py:1274(evalf)
  1660000    0.575    0.000    0.575    0.000 {built-in method builtins.hasattr}
   740000    0.495    0.000    0.932    0.000 <frozen importlib._bootstrap>:996(_handle_fromlist)
40000/20000    0.314    0.000    5.809    0.000 expr.py:3094(round)
    10000    0.278    0.000    0.547    0.000 linalg.py:296(solve)
    20000    0.277    0.000    0.715    0.000 version.py:208(__init__)
   180000    0.277    0.000    0.564    0.000 sympify.py:53(sympify)
60000/40000    0.269    0.000    1.123    0.000 function.py:458(_eval_evalf)
   830000    0.244    0.000    0.244    0.000 {built-in method builtins.isinstance}
    40000    0.240    0.000    2.193    0.000 evalf.py:1329(evalf)
    10000    0.238    0.000    1.042    0.000 ode_function_generators.py:376(_parse_old_style_extra_args)
    10000    0.196    0.000    6.100    0.001 ode_function_generators.py:409(_convert_constants_dict_to_array)
60000/40000    0.138    0.000    0.784    0.000 evalf.py:1422(_to_mpmath)
    20000    0.129    0.000    0.406    0.000 basic.py:398(atoms)
    20000    0.124    0.000    0.137    0.000 version.py:353(_cmpkey)
    20000    0.122    0.000    2.670    0.000 expr.py:3218(_mag)
   180000    0.120    0.000    0.684    0.000 sympify.py:329(_sympify)
    20000    0.118    0.000    0.118    0.000 {method 'search' of '_sre.SRE_Pattern' objects}
60000/40000    0.116    0.000    0.191    0.000 expr.py:322(is_number)
    20000    0.116    0.000    0.396    0.000 symbol.py:158(as_real_imag)
   120000    0.109    0.000    0.145    0.000 numbers.py:526(__hash__)
    10000    0.104    0.000    0.104    0.000 {pydy_codegen_3.eval}
100000/60000    0.096    0.000    0.157    0.000 basic.py:1778(_preorder_traversal)
    10000    0.089    0.000    8.132    0.001 ode_function_generators.py:538(rhs)
        1    0.079    0.079    8.211    8.211 <string>:1(<module>)
    80000    0.079    0.000    0.079    0.000 version.py:217(<genexpr>)
    40000    0.079    0.000    0.131    0.000 libmpf.py:64(dps_to_prec)
    10000    0.078    0.000    0.739    0.000 ode_function_generators.py:723(base_rhs)
    10000    0.078    0.000    0.082    0.000 ode_function_generators.py:454(_parse_specifieds)
    60000    0.064    0.000    0.505    0.000 evalf.py:240(evalf_re)
   220000    0.063    0.000    0.063    0.000 basic.py:102(__hash__)
    20000    0.062    0.000    5.905    0.000 expr.py:181(__int__)
    20000    0.062    0.000    0.091    0.000 expr.py:114(__abs__)

> Are you using (or planning on using) this for real time control? That would be pretty sweet...

Yes, we already do. I have some applications for which this stuff is important. We are working on an 11 DoF stiff human arm model that is executed in real time to give a virtual arm to a paralyzed person. This requires fast integration, which my past PI solved with implicit integration methods.

@moorepants
Member

Here is a better profile using the pure array arg types from:

%prun for i in range(10000): f(x, t, r, p)

         350003 function calls in 0.625 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    10000    0.198    0.000    0.348    0.000 linalg.py:296(solve)
    10000    0.182    0.000    0.182    0.000 {pydy_codegen_5.eval}
    10000    0.045    0.000    0.581    0.000 ode_function_generators.py:723(base_rhs)
    10000    0.028    0.000    0.050    0.000 linalg.py:139(_commonType)
    10000    0.019    0.000    0.019    0.000 {method 'astype' of 'numpy.ndarray' objects}
        1    0.019    0.019    0.625    0.625 <string>:1(<module>)
    20000    0.017    0.000    0.038    0.000 linalg.py:106(_makearray)
    10000    0.016    0.000    0.605    0.000 ode_function_generators.py:606(rhs)
    20000    0.014    0.000    0.018    0.000 numeric.py:406(asarray)
    10000    0.012    0.000    0.018    0.000 linalg.py:209(_assertNdSquareness)
    30000    0.009    0.000    0.013    0.000 linalg.py:111(isComplexType)
    10000    0.009    0.000    0.009    0.000 linalg.py:101(get_linalg_error_extobj)
    10000    0.008    0.000    0.008    0.000 ode_function_generators.py:528(slice_x)
    10000    0.008    0.000    0.009    0.000 linalg.py:198(_assertRankAtLeast2)
    20000    0.008    0.000    0.011    0.000 linalg.py:124(_realType)
    50000    0.006    0.000    0.006    0.000 {built-in method builtins.issubclass}
    10000    0.005    0.000    0.187    0.000 ode_function_generators.py:789(<lambda>)
    20000    0.004    0.000    0.004    0.000 {built-in method numpy.core.multiarray.array}
    10000    0.004    0.000    0.004    0.000 {built-in method builtins.max}
    20000    0.003    0.000    0.003    0.000 {built-in method builtins.getattr}
    20000    0.003    0.000    0.003    0.000 {method 'get' of 'dict' objects}

For n=20 we see that the evaluation of M and F takes about the same time as the solve(M, F) command does.

@moorepants
Member

And with:

%prun for i in range(10000): f_star(x, t, r, p)

         350003 function calls in 0.765 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    10000    0.352    0.000    0.506    0.000 linalg.py:296(solve)
    10000    0.189    0.000    0.189    0.000 {pydy_codegen_6.eval}
    10000    0.028    0.000    0.050    0.000 linalg.py:139(_commonType)
    10000    0.020    0.000    0.020    0.000 {method 'astype' of 'numpy.ndarray' objects}
        1    0.019    0.019    0.765    0.765 <string>:1(<module>)
    10000    0.017    0.000    0.746    0.000 ode_function_generators.py:606(rhs)
    10000    0.016    0.000    0.716    0.000 ode_function_generators.py:712(base_rhs)
    20000    0.016    0.000    0.040    0.000 linalg.py:106(_makearray)
    20000    0.016    0.000    0.020    0.000 numeric.py:406(asarray)
    10000    0.013    0.000    0.013    0.000 ode_function_generators.py:528(slice_x)
    10000    0.012    0.000    0.018    0.000 linalg.py:209(_assertNdSquareness)
    10000    0.009    0.000    0.009    0.000 linalg.py:101(get_linalg_error_extobj)
    30000    0.009    0.000    0.012    0.000 linalg.py:111(isComplexType)
    10000    0.009    0.000    0.010    0.000 linalg.py:198(_assertRankAtLeast2)
    20000    0.008    0.000    0.011    0.000 linalg.py:124(_realType)
    50000    0.006    0.000    0.006    0.000 {built-in method builtins.issubclass}

You can see now that solving the linear system of size 40 takes twice as long as the M*, F* evaluation.

@moorepants
Member

I forgot this detail; to get the fastest speeds and remove as much overhead as possible:

import numpy as np
import sympy as sm
from pydy.models import n_link_pendulum_on_cart
from pydy.codegen.ode_function_generators import CythonODEFunctionGenerator

sys = n_link_pendulum_on_cart(20)
args = np.random.random(42), 0.0, np.random.random(1), np.random.random(42)
M = sys.eom_method.mass_matrix
F = sys.eom_method.forcing
M_star = sys.eom_method.mass_matrix_full
F_star = sys.eom_method.forcing_full
g = CythonODEFunctionGenerator(F, sys.coordinates, sys.speeds,
                               sys.constants_symbols, mass_matrix=M,
                               coordinate_derivatives=sm.Matrix(sys.speeds),
                               specifieds=sys.specifieds_symbols,
                               constants_arg_type='array',
                               specifieds_arg_type='array')
f = g.generate()
g_star = CythonODEFunctionGenerator(F_star, sys.coordinates, sys.speeds,
                                    sys.constants_symbols, mass_matrix=M_star,
                                    specifieds=sys.specifieds_symbols,
                                    constants_arg_type='array',
                                    specifieds_arg_type='array')
f_star = g_star.generate()

Now we see the 15 microsecond gain from the M, F, G form:

In [76]: %timeit f(x, t, r, p)
The slowest run took 4.07 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 49.5 µs per loop

In [77]: %timeit f_star(x, t, r, p)
10000 loops, best of 3: 64.9 µs per loop

And the change is due to the speed of the solver as shown in the above two profiles.

I'm having déjà vu. I did all these tests when I wrote this code and implemented the three cases because of these differences. The last one to check is the symbolic LU decomposition.

@moorepants
Member

For the 5 link pendulum, if you solve for x' symbolically you get speed-ups (F**):

In [2]: %timeit f(x, t, r, p)
The slowest run took 16.94 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 28 µs per loop

In [3]: %timeit f_star(x, t, r, p)
10000 loops, best of 3: 28.7 µs per loop

In [4]: %timeit f_star_star(x, t, r, p)
The slowest run took 5.20 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 4.26 µs per loop

Since n is small, the linear system solve doesn't affect the timing like it does for larger n.

%prun for n in range(10000): f(x, t, r, p)
         350003 function calls in 0.415 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    10000    0.125    0.000    0.270    0.000 linalg.py:296(solve)
    10000    0.051    0.000    0.051    0.000 {pydy_codegen_0.eval}
    10000    0.048    0.000    0.373    0.000 ode_function_generators.py:723(base_rhs)
    10000    0.028    0.000    0.050    0.000 linalg.py:139(_commonType)
    10000    0.018    0.000    0.018    0.000 {method 'astype' of 'numpy.ndarray' objects}
        1    0.017    0.017    0.415    0.415 <string>:1(<module>)
    20000    0.017    0.000    0.037    0.000 linalg.py:106(_makearray)
    10000    0.016    0.000    0.398    0.000 ode_function_generators.py:606(rhs)
    20000    0.013    0.000    0.017    0.000 numeric.py:406(asarray)
    10000    0.011    0.000    0.017    0.000 linalg.py:209(_assertNdSquareness)
    30000    0.010    0.000    0.013    0.000 linalg.py:111(isComplexType)
    10000    0.009    0.000    0.010    0.000 linalg.py:198(_assertRankAtLeast2)
    10000    0.008    0.000    0.008    0.000 ode_function_generators.py:528(slice_x)
    20000    0.008    0.000    0.011    0.000 linalg.py:124(_realType)
    10000    0.007    0.000    0.007    0.000 linalg.py:101(get_linalg_error_extobj)
    50000    0.006    0.000    0.006    0.000 {built-in method builtins.issubclass}
    10000    0.005    0.000    0.056    0.000 ode_function_generators.py:789(<lambda>)
    10000    0.004    0.000    0.004    0.000 {built-in method builtins.max}
    20000    0.004    0.000    0.004    0.000 {built-in method numpy.core.multiarray.array}
    20000    0.003    0.000    0.003    0.000 {built-in method builtins.getattr}
    20000    0.003    0.000    0.003    0.000 {method 'get' of 'dict' objects}
    10000    0.002    0.000    0.002    0.000 {built-in method builtins.min}
    10000    0.002    0.000    0.002    0.000 {method '__array_prepare__' of 'numpy.ndarray' objects}
    10000    0.001    0.000    0.001    0.000 {built-in method builtins.len}
In [6]: %prun for n in range(10000): f_star(x, t, r, p)

         350003 function calls in 0.400 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    10000    0.148    0.000    0.290    0.000 linalg.py:296(solve)
    10000    0.047    0.000    0.047    0.000 {pydy_codegen_1.eval}
    10000    0.028    0.000    0.049    0.000 linalg.py:139(_commonType)
        1    0.018    0.018    0.400    0.400 <string>:1(<module>)
    10000    0.017    0.000    0.017    0.000 {method 'astype' of 'numpy.ndarray' objects}
    20000    0.017    0.000    0.037    0.000 linalg.py:106(_makearray)
    10000    0.017    0.000    0.382    0.000 ode_function_generators.py:606(rhs)
    10000    0.013    0.000    0.354    0.000 ode_function_generators.py:712(base_rhs)
    20000    0.012    0.000    0.016    0.000 numeric.py:406(asarray)
    10000    0.012    0.000    0.012    0.000 ode_function_generators.py:528(slice_x)
    10000    0.011    0.000    0.017    0.000 linalg.py:209(_assertNdSquareness)
    30000    0.009    0.000    0.013    0.000 linalg.py:111(isComplexType)
    10000    0.008    0.000    0.009    0.000 linalg.py:198(_assertRankAtLeast2)
    20000    0.008    0.000    0.010    0.000 linalg.py:124(_realType)
    10000    0.007    0.000    0.007    0.000 linalg.py:101(get_linalg_error_extobj)
    50000    0.006    0.000    0.006    0.000 {built-in method builtins.issubclass}
    10000    0.004    0.000    0.051    0.000 ode_function_generators.py:789(<lambda>)
    20000    0.004    0.000    0.004    0.000 {built-in method builtins.getattr}
    10000    0.004    0.000    0.004    0.000 {built-in method builtins.max}
    20000    0.004    0.000    0.004    0.000 {built-in method numpy.core.multiarray.array}
    20000    0.003    0.000    0.003    0.000 {method 'get' of 'dict' objects}
    10000    0.002    0.000    0.002    0.000 {built-in method builtins.min}
    10000    0.002    0.000    0.002    0.000 {method '__array_prepare__' of 'numpy.ndarray' objects}
    10000    0.001    0.000    0.001    0.000 {built-in method builtins.len}
In [7]: %prun for n in range(10000): f_star_star(x, t, r, p)
         40003 function calls in 0.055 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    10000    0.025    0.000    0.025    0.000 {pydy_codegen_2.eval}
        1    0.010    0.010    0.055    0.055 <string>:1(<module>)
    10000    0.008    0.000    0.045    0.000 ode_function_generators.py:606(rhs)
    10000    0.008    0.000    0.008    0.000 ode_function_generators.py:528(slice_x)
    10000    0.003    0.000    0.029    0.000 ode_function_generators.py:789(<lambda>)
        1    0.000    0.000    0.055    0.055 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

@sixpearls
Author

Thanks for running all the profiling. Perhaps these results should go into a separate write-up for whoever eventually goes on to improve the code generation efficiency?

Another path toward efficiency is to use a sparse linear solver. I think most of the mass matrices will have structure if they're not minimized (i.e., have a diagonal block or something), which could be exploited for a system with many states. It seems that with many states, having efficient expression evaluation (which can be improved with CSE) will be about as important as efficient inversion.

By the way, how long did it take to invert the mass matrix symbolically?

As far as the inversion timing not matching theory: as I mentioned, I think of this as the "setup" for memory allocation, blocking, etc. Most recently, I saw this explained in Prof. Boyd's lectures on convex optimization, but I believe we touched on it in my numerical analysis course as well. It also fits my experience when testing chunks of my MS thesis.

@moorepants
Member

This info should be added here: https://github.com/pydy/pydy/wiki/PyDy-Speed

These matrices are not sparse, they are dense. Sparse solvers are generally slower given a dense matrix. They typically use slower iterative methods. I don't think sparse solvers apply to this.

Note that all the profiling I show above has CSE implemented. The CythonODEFunctionGenerator uses common subexpression elimination to speed up execution.
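
(As a tiny illustration of what that CSE step does, using plain SymPy rather than the PyDy code generator itself:)

import sympy as sm

x, y = sm.symbols('x y')
exprs = [sm.sin(x + y)**2, sm.sin(x + y)*sm.cos(x + y)]

# cse pulls out repeated subexpressions so they are only evaluated once.
replacements, reduced = sm.cse(exprs)
print(replacements)  # the extracted shared subexpressions
print(reduced)       # the original expressions rewritten in terms of them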

Speed of symbolic rhs:

In [1]: from pydy.models import n_link_pendulum_on_cart

In [2]: %time sys = n_link_pendulum_on_cart(5)
CPU times: user 1.06 s, sys: 0 ns, total: 1.06 s
Wall time: 1.05 s

In [3]: %time rhs = sys.eom_method.rhs()
CPU times: user 292 ms, sys: 0 ns, total: 292 ms
Wall time: 289 ms

In [4]: %time sys = n_link_pendulum_on_cart(10)
CPU times: user 7.04 s, sys: 12 ms, total: 7.06 s
Wall time: 7.05 s

In [5]: %time rhs = sys.eom_method.rhs()
CPU times: user 1.75 s, sys: 0 ns, total: 1.75 s
Wall time: 1.74 s

In [6]: %time sys = n_link_pendulum_on_cart(20)
CPU times: user 59.3 s, sys: 52 ms, total: 59.3 s
Wall time: 59.3 s

In [7]: %time rhs = sys.eom_method.rhs()

The last one has been running for minutes and is still going. We've just recently got some demos that speed this stuff up at least 10X using SymEngine as a backend to SymPy.

Note that there is very little memory allocation going on in the evaluation and I don't think that is the issue; this is computational overhead. In the rhs evaluation there is memory allocation for the temporary variables from CSE, but that could be pre-allocated outside of the function very easily. The LAPACK-backed solve can also be set up so that minimal memory allocation happens there; it can be set to simply operate on the existing arrays. See the overwrite_a and overwrite_b options in http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.linalg.solve.html.
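
(A small example of those overwrite options; this is just scipy.linalg.solve as documented, not PyDy code.)

import numpy as np
from scipy.linalg import solve

M = np.random.random((20, 20))
F = np.random.random(20)

# Allow the solver to work in place on M and F instead of copying them;
# the contents of M and F may be destroyed after this call.
u_dot = solve(M, F, overwrite_a=True, overwrite_b=True)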

@sixpearls
Author

Isn't the difference between a minimum mass matrix formulation and a full mass matrix formulation of the same system that the full mass matrix is block diagonal, and one of those blocks is identity?
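
(For concreteness, the block structure referred to here, in the simplest case q' = u, is:

M* = [ I  0 ]      F* = [ u ]
     [ 0  M ]           [ F ]

so M* x' = F* splits into q' = u and M u' = F.)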

As far as "setup time" affecting the timing, when I say memory I'm more talking about the registers, etc. LAPACK does a lot of super fancy stuff at a super low level and I have only a very vague idea of what is actually happening. But I do know that it means that numerical inversion is not something I generally need to worry about until n gets very very large.

Is there a guide to getting SymEngine as the backend to SymPy? I was helping a professor transition a wavelets class to Python. His main complaint was how slow the symbolic portion was.

@moorepants
Member

> Isn't the difference between a minimum mass matrix formulation and a full mass matrix formulation of the same system that the full mass matrix is block diagonal, and one of those blocks is identity?

This is only the case if you have the simple definitions of the kinematical equations. Using Kane's method you can specify the lower block to be as complicated a function as you desire as long as it is linear in q'.

As far as "setup time" affecting the timing, when I say memory I'm more talking about the registers, etc. LAPACK does a lot of super fancy stuff at a super low level and I have only a very vague idea of what is actually happening. But I do know that it means that numerical inversion is not something I generally need to worry about until n gets very very large.

I don't know what LAPACK's solvers do internally. All I know is that they are fast and return stable, correct results. I just demonstrated that with n=20 and this "simple" pendulum problem the LU decomposition takes substantial time relative to the evaluation of M, F, M*, and F*. 20 is not very very large. Numerical inversion is something that I need to worry about for the small DoF problems I solve, if I want to ensure maximum rhs evaluation speed. It is very easy to create a multibody model with many more than 20 degrees of freedom, so I think this matters a lot for our use cases. I understand that there are use cases where it doesn't matter, but I'd like this to handle as large of DoF problems as we can manage while retaining minimal overhead. There are many other ways to speed things up for specific problems, but this basic definition should be as fast as it can be. Another nice thing to do would be to remove the Python callback overhead for the call to LAPACK and do that in C or Fortran.

> Is there a guide to getting SymEngine as the backend to SymPy? I was helping a professor transition a wavelets class to Python. His main complaint was how slow the symbolic portion was.

No, there isn't. The SymEngine and SymPy interoperability is currently being figured out and developed. It is nowhere close to a drop-in replacement (except for some of the core). You can ask on the SymPy or SymEngine mailing lists how to get things going. We have a student working on getting SymEngine running as a backend for sympy.physics.mechanics right now, so look for some results soon.

@moorepants
Member

Also, you should report any performance issues with SymPy to the issue tracker on that repo. We will look into it. We can also add a benchmark to our benchmarking app to track the performance of the symbolic operations.

@moorepants
Member

@sixpearls Just came across this: https://github.com/bjodah/pyodesys, which you may find more suitable than PyDy's System class.

@sixpearls
Author

Thanks! I'm just taking a look, but it does seem more aligned with what I'm looking for. Thanks for pointing it out!
