Bayes Factor – The Rambling Hadron

I am using R for a project of mine; I had used R a few years ago in a very elementary way, but I had never gone into it seriously.

Thanks to a statistician—ESR of the AMVA4NewPhysics network, Grzegorz Kotkowski—who did an internship with my supervision at the Universidad de Oviedo last year, I got acquainted with RStudio, and decided to give it a try.

I had a few troubles at the beginning, mostly to figure out the peculiarities of R with respect to other languages, and nowadays I mostly fight with ggplot for tweaking the graphics of my plots.

However, today I was unit-testing my code for a plot I made that was highly suspicious; the variable to be plotted appeared to have a value of Inf most of the times.

Digging inside the code, I figured out that the Bayes Factor in these cases was… negative!!! Now, this is bad in a huge way, because the Bayes Factor is supposed to be positive-defined (at least in usual Bayesian statistics, that obeys the Kolmogorov axioma); it turns out that the negative sign came from the Gauss hypergeometric function ${}_2 F_1(a, b, c; x)$ that I was importing from the package Appell 0.0-4.

According to the package documentation , two methods for computing the function are used, taken from literature: the Forrey method and the Michel-Stoitsov method, the latter being the default.

Now, what I am interested in is ${}_2 F_1(0.5, 76.5, 31.5; -1)$ , so I naively tried to change the default method:

hyp2f1(0.5, 76.5, 31.5, -1, algorithm='michel.stoitsov')
[1] -3.575286e-19+0i
> hyp2f1(0.5, 76.5, 31.5; -1, algorithm='forrey')
[1] NaN+NaNi

As you can see, the alternative method (Forrey) doesn’t even work. However, I understand the input parameters have relatively large values, so I was not worried about that. What worried me was still the negative value for the real part as given by the Appell implementation of the Michel-Stoitsov algorithm!

I then cross-checked with Wolfram-Alpha:

> hypergeometric2f1(30.5,76.5,31.5,-1)
7.84840 × 10^-22

Indeed, the result is positive. What to do? Should I assume Wolfram-Alpha is the correct one? I had no idea, nor I wanted to dig into the details of the two calculations, so I thought of cross-checking with Python. Why Python? Because in Python ${}_2 F_1(a, b, c; x)$ is implemented in both mpmath and scipy, two packages that are routinely used and debugged by lots of people;

> library(reticulate)
> mpmath <- import('mpmath')
> scipy_special <- import('scipy.special')
> scipy_special$hyp2f1(30.5, 76.5, 31.5, -1)
[1] 7.848397e-22
> 
> mpmath$hyp2f1(30.5, 76.5, 31.5, -1)
7.84839654445994e-22
>

OK, it definitely looks like the Appell implementation has an issue!

It might also be that Wolframalpha, mpmath, and scipy all have the wrong implementation, but they are far more under scrutiny (active developers, etc) than Appell, and the value I get from them is the one that yields a meaningful result (positive probability…); at this point I would not bet on Appell’s implementation being correct.

Now the situation forks into a practical solution and a proper solution. The practical solution is that my plot now uses the scipy implementation (the mpmath one had some issues in the R vectorization, whereas scipy works fine).

The proper solution is that I tried to file a bug report: however, the CRAN page for the Appell package does not provide any means of filing a bug. A quick google search pointed me to a CRAN read-only github repository, but it is not possible to open an issue (the interface has not been activated, apparently), and I am not really in this moment available to debug the (FORTRAN, ouch!) code and prepare a pull request.

I therefore wrote an email to Daniel, the package maintainer, and to Gábor, which if I understand correctly is the CRAN responsible for committing the package (if I interpreted correctly the Github repository and commits).

Gábor actually answered, but he is not involved with the package itself, whereas the email to Daniel bounced back, because of “inexisting email address”. I think I’ll just wait until I will have time and willingness to stick the head into the FORTRAN code, and eventually prepare a pull request, but I don’t know when this will happen, nor I know if and when I will get an answer back from the maintainers.

This post is a bit of a public bug report, and a bit of an attempt of getting in touch with the maintainer; if you know of a (more proper?) way of reporting a bug for an R package, please let me know in the comments below!

Tag: Bayes Factor

Possibly a bug in the R package “Appell”?