• Higher Order Logic Programming and Autograd

    From Mild Shock@21:1/5 to All on Sat Mar 15 16:35:51 2025
    Somehow I shied away from implementing call/n for
    my new Prolog system. I thought that since my new Prolog
    system has only monomorphic caches, I would never be able
    to replicate what I did for my old Prolog system with its
    arity-polymorphic caches. This changed when I had the idea
    to dynamically add a cache for the duration of a higher
    order loop such as maplist/n, foldl/n etc.

    So this is the new implementation of maplist/3:

    % maplist(+Closure, +List, -List)
    maplist(C, L, R) :-
       sys_callable_cacheable(C, D),
       sys_maplist(L, D, R).

    % sys_maplist(+List, +Closure, -List)
    sys_maplist([], _, []).
    sys_maplist([X|L], C, [Y|R]) :-
       call(C, X, Y),
       sys_maplist(L, C, R).
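
    For comparison, here is a minimal sketch of the textbook
    maplist/3 without the reordering. With the closure in the
    first argument position, first argument indexing cannot
    select a clause by the list argument, so a choice point is
    left on every call:

    % textbook version, shown only for comparison
    maplist(_, [], []).
    maplist(C, [X|L], [Y|R]) :-
       call(C, X, Y),
       maplist(C, L, R).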

    The new maplist/3 is similar to the SWI-Prolog implementation
    in that it reorders the arguments for better first argument
    indexing. But the new thing is sys_callable_cacheable/2,
    which prepares the closure to be called more efficiently.
    The invocation of the closure is already quite fast, since
    call/3 is implemented natively, but the cache adds a touch
    more speed. Here are some measurements that I did:

    /* SWI-Prolog 9.3.20 */
    ?- findall(X,between(1,1000,X),L), time((between(1,1000,_),
          maplist(succ,L,_),fail; true)), fail.
    % 2,003,000 inferences, 0.078 CPU in 0.094 seconds

    /* Scryer Prolog 0.9.4-350 */
    ?- findall(X,between(1,1000,X),L), time((between(1,1000,_),
          maplist(succ,L,_),fail; true)), fail.
    % CPU time: 0.318s, 3_007_105 inferences

    /* Dogelog Player 1.3.1 */
    ?- findall(X,between(1,1000,X),L), time((between(1,1000,_),
          maplist(succ,L,_),fail; true)), fail.
    % Zeit 342 ms, GC 0 ms, Lips 11713646, Uhr 10.03.2025 09:18

    /* Trealla Prolog 2.64.6-2 */
    ?- findall(X,between(1,1000,X),L), time((between(1,1000,_),
          maplist(succ,L,_),fail; true)), fail.
    % Time elapsed 1.694s, 15004003 Inferences, 8.855 MLips

    Not surprisingly, SWI-Prolog is fastest. What was a little
    surprise is that Scryer Prolog can do it quite fast; possibly
    because they heavily use maplist/n all over the place, they
    came up with things like '$fast_call' etc. in their call/n
    implementation. Trealla Prolog is a little bit disappointing
    at the moment.

  • From Mild Shock@21:1/5 to Mild Shock on Sat Mar 15 16:36:48 2025
    What can we do with these new toys? We can implement vector
    operations and matrix operations, and then apply them, for
    example, to layered neural networks by representing them as:

    /**
     * Network is represented as [N0,M1,N1,...,Mn,Nn]
     * - Where N0 is the input neuron vector
     * - Where N1 .. Nn-1 are the hidden neuron vectors
     * - Where Nn is the output neuron vector
     * - Where M1 .. Mn are the transition weight matrices
     */

    ?- mknet([3,2], X).
    X = [''(-1, 1, 1), ''(''(1, 1, -1), ''(1, 1, -1)), ''(-1, 1)].
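
    The post does not show mknet/2 itself; a minimal sketch that
    produces the same shape of term, assuming the ''(..) representation
    above and a random weight initialization in [-1,1) (the actual
    initialization scheme is an assumption), could look like this:

    mknet([N|Ns], [V|Net]) :-
       randvec(N, V),
       sys_mknet(Ns, N, Net).

    sys_mknet([], _, []).
    sys_mknet([N|Ns], K, [M,V|Net]) :-
       length(Rows, N),             % N rows of K weights each
       maplist(randvec(K), Rows),
       M =.. [''|Rows],
       randvec(N, V),
       sys_mknet(Ns, N, Net).

    randvec(N, V) :-
       length(Ws, N),
       maplist(randweight, Ws),
       V =.. [''|Ws].

    randweight(W) :- random(R), W is 2*R-1.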

    The model evaluation at a data point
    is straightforward:

    eval([V], [V]) :- !.
    eval([V,M,_|L], [V,M|R]) :- !,
       matmul(M, V, H),
       vecact(H, expit, J),
       eval([J|L], R).

    The backward calculation of the deltas
    is straightforward:

    back([V], U, [D]) :- !,
       vecact(U, V, sub, E),
       vecact(E, V, mulderiv, D).
    back([V,M,W|L], U, [D2,M,D|R]) :-
       back([W|L], U, [D|R]),
       mattran(M, M2),
       matmul(M2, D, E),
       vecact(E, V, mulderiv, D2).

    You can use this to compute weight changes
    and drive a gradient algorithm.
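
    The helper predicates matmul/3, vecact/3, vecact/4 and mattran/2
    are not shown in the post. A minimal sketch of them, assuming the
    ''(..) representation from the mknet/2 example and covering only
    the matrix-vector product that eval and back need, could be:

    vec_list(V, L) :- V =.. [''|L].

    % vecact(+Vector, +Closure, -Vector): apply a closure elementwise
    vecact(V, F, R) :-
       vec_list(V, Xs),
       maplist(F, Xs, Ys),
       vec_list(R, Ys).

    % vecact(+Vector, +Vector, +Closure, -Vector): combine elementwise
    vecact(V, W, F, R) :-
       vec_list(V, Xs),
       vec_list(W, Ys),
       maplist(F, Xs, Ys, Zs),
       vec_list(R, Zs).

    % matmul(+Matrix, +Vector, -Vector): one dot product per row
    matmul(M, V, R) :-
       vec_list(M, Rows),
       maplist(dot(V), Rows, Zs),
       vec_list(R, Zs).

    dot(V, W, Z) :-
       vec_list(V, Xs),
       vec_list(W, Ys),
       sys_dot(Xs, Ys, 0, Z).

    sys_dot([], [], Z, Z).
    sys_dot([X|Xs], [Y|Ys], A, Z) :-
       B is A+X*Y,
       sys_dot(Xs, Ys, B, Z).

    % mattran(+Matrix, -Matrix): transpose
    mattran(M, T) :-
       vec_list(M, Rows),
       maplist(vec_list, Rows, Ls),
       sys_transpose(Ls, Ts),
       maplist(vec_list, Cols, Ts),
       vec_list(T, Cols).

    sys_transpose([], []).
    sys_transpose([[]|_], []) :- !.
    sys_transpose(Ls, [Hs|Rest]) :-
       maplist(sys_head_tail, Ls, Hs, Tls),
       sys_transpose(Tls, Rest).

    sys_head_tail([H|T], H, T).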

  • From Mild Shock@21:1/5 to Mild Shock on Sat Mar 15 16:37:37 2025
    But where is autograd, the automatic derivation from
    some symbolic input? In general you can objectify
    neural networks, which I already did with the Prolog
    list, and routines such as back/3 are pure Prolog.
    Basically you could symbolically derive expit
    (the activation), mulderiv (the product with the derivative
    of the activation) and mattran (the Jacobian without
    activation) from a DAG of vector functions. In a linear
    neural network, the Jacobian without activation is
    the same as the weights, and expit has a simple derivative
    that is based on the expit result itself, which is
    already stored as the activation:

    /* g(x) = logistic function */
    expit(X, Y) :- Y is 1/(1+exp(-X)).

    /* g'(x) = g(x)*(1-g(x)) */
    mulderiv(X, Y, Z) :- Z is X*Y*(1-Y).
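
    The elementwise difference used by back/3 via vecact/4 is not
    shown either; presumably it is just the following (an assumption):

    /* e = u - v, target minus output */
    sub(X, Y, Z) :- Z is X-Y.
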
    See also:

    A Gentle Introduction to torch.autograd https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html

  • From Mild Shock@21:1/5 to Mild Shock on Sun Mar 16 23:00:47 2025
    Ok, some progress report here. I currently have a
    library(linear) in the works, which is only a few
    lines of code, but it provides vectors and matrices.
    One can use the library to define matrix exponentiation:

    matexp(M, 1, M) :- !.
    matexp(M, N, R) :- N mod 2 =:= 0, !,
       I is N // 2,
       matexp(M, I, H),
       matmul(H, H, R).
    matexp(M, N, R) :-
       I is N-1,
       matexp(M, I, H),
       matmul(H, M, R).

    And then do fancy stuff, like answering the question:
    what are the last 8 digits of fibonacci(1000000)?

    ?- time((fib(1000000, _X), Y is _X mod 10^8)).
    % Zeit 28 ms, GC 0 ms, Lips 88857, Uhr 16.03.2025 22:48
    Y = 42546875

    The 28 ms execution time is not bad, given that the modulo was
    not integrated into matexp/3, so it computes the full
    fibonacci(1000000) before taking the modulo. Not sure whether
    JavaScript bigint is faster or slower than GMP?
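
    The fib/2 predicate itself is not shown; one way to get it from
    matexp/3, assuming matmul/3 also handles matrix-matrix products and
    the ''(..) representation, is the standard 2x2 Fibonacci matrix
    [[1,1],[1,0]], whose N-th power holds fib(N) off the diagonal:

    fib(N, X) :-
       matexp(''(''(1,1), ''(1,0)), N, M),
       arg(1, M, Row),      % first row is ''(fib(N+1), fib(N))
       arg(2, Row, X).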

    So what can we do with library(linear) besides implementing
    eval/3 and back/3? We can finally update a neural network,
    and do this iteratively, using a very simple random pick
    to choose a training data sample:

    update([V], _, [V]) :- !.
    update([V,M|L], [_,M3|R], [V,M4|S]) :-
       maplist(maplist(compose(add,mul(0.1))), M3, M, M4),
       update(L, R, S).

    iter(0, _, N, N) :- !.
    iter(I, Z, N, M) :-
       random(R), K is floor(R*4)+1,
       call_nth(data(Z, X, Y), K),
       eval(N, X, U),
       back(U, Y, V),
       update(U, V, W),
       J is I-1,
       iter(J, Z, W, M).

    Disclaimer: This is only a proof of concept. It most likely
    doesn't have all the finesse of Python's torch.autograd. Also,
    it uses a very simple update of the weights via μ·Δwij with
    μ = 0.1. But you can already use it to learn an AND

    or to learn an XOR.
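
    The data/3 facts and the starting network are not shown; hypothetical
    AND training data matching the four-sample random pick above, with an
    extra input fixed to 1 simulating the bias, might look like this:

    data(and, ''(0,0,1), ''(0)).
    data(and, ''(0,1,1), ''(0)).
    data(and, ''(1,0,1), ''(0)).
    data(and, ''(1,1,1), ''(1)).

    /* hypothetical training run: 1000 iterations from a random 3-to-1 net */
    ?- mknet([3,1], N), iter(1000, and, N, M).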



  • From Mild Shock@21:1/5 to Mild Shock on Sun Mar 16 23:10:55 2025
    What made me do the little prototype? Try this one,
    it has a little Java code. But it's a somewhat ancient
    technology, using the sigmoid activation function. And
    it seems to me it uses some graph data structure:

    Neural Networks
    Rolf Pfeifer et al. - 2012

    https://www.ifi.uzh.ch/dam/jcr:00000000-7f84-9c3b-ffff-fffffb34b58a/NN20120315.pdf

    I guess it corresponds to this here, which is a SWI-Prolog and C
    hybrid, when using FANN_SIGMOID:

    FANN - Fast Artificial Neural Network
    Package for SWI-Prolog - 2018
    https://www.swi-prolog.org/pack/list?p=plfann

    Translating the Java code from the Pfeifer paper to Prolog,
    into linear algebra using vectors and matrices, I now have
    a little piece of pure Prolog code, which also runs in the
    browser, that can already learn an AND, and it's using the
    ReLU activation function, i.e. not the FANN_SIGMOID
    activation function anymore.

    I simulated the bias by an extra input neuron
    which is always 1, because I was too lazy to have
    a bias in the model. Sample output:

    A --  0.99 ---\
                   \
    B --  0.99 -----+-- ReLU -->
                   /
    1 -- -0.98 ---/
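
    With ReLU instead of the logistic function, the two activation
    hooks from the earlier post would presumably become something
    like the following (the ReLU version of the code is not shown,
    so this is an assumption):

    /* g(x) = max(0, x), replacing expit/2 */
    relu(X, Y) :- Y is max(0, X).

    /* g'(x) = 1 for x > 0 else 0, replacing mulderiv/3; the
       derivative is read off the stored activation Y */
    relu_mulderiv(X, Y, Z) :- (Y > 0 -> Z = X ; Z = 0).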

    It can also learn an XOR. Libraries such as PyTorch
    cooperate with optimizer libraries that provide a
    variety of gradient search methods. One needs
    to study how these libraries are architected so that
    they provide plug and play. Maybe one can bring the
    same architecture to Prolog:

    A Gentle Introduction to torch.autograd

    Next, we load an optimizer, in this case SGD with a
    learning rate of 0.01 and momentum of 0.9. We register all
    the parameters of the model in the optimizer.

    optim = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)

    https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html
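
    One possible direction, sketched here only as an idea: generalize
    update/3 from the earlier post so the step rule is a closure that
    is plugged in via call/n, in the spirit of torch.optim (the names
    update/4 and sgd/4 are hypothetical, and G stands for the per-weight
    gradient matrices):

    /* plain SGD step with learning rate Mu */
    sgd(Mu, G, W0, W1) :- W1 is W0+Mu*G.

    update(_, [V], _, [V]) :- !.
    update(Opt, [V,M|L], [_,G|R], [V,M2|S]) :-
       maplist(maplist(Opt), G, M, M2),
       update(Opt, L, R, S).

    /* used in iter/4 as: update(sgd(0.1), U, V, W) */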
