{"id":117,"date":"2023-04-18T02:00:00","date_gmt":"2023-04-18T02:00:00","guid":{"rendered":"https:\/\/renegrothmann.de\/?p=117"},"modified":"2023-05-09T20:08:19","modified_gmt":"2023-05-09T20:08:19","slug":"computing-primes-with-python-and-emt","status":"publish","type":"post","link":"https:\/\/renegrothmann.de\/?p=117","title":{"rendered":"Computing Primes with Python and EMT"},"content":{"rendered":"\n<p>Another old posting from my previous blog.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p>Continuing to get acquainted with Python, I looked at the book \u201eIntroduction to Data Science\u201c by Joel Grus. I have it in German as an electronic book, which is unfortunately no fun to read, although the content is solid \u2013 with restrictions. The most important problem is the formatting of the Python code, which does not even work in landscape mode. The printed book uses a smaller font and lines are completely in one line. You can and should download the examples. There are also some misprints. I found one in the introduction to the Bayes formula, where the fraction is reversed. But most importantly, I have my doubts if the book is really a help for the starting data scientist. The presentation is very short. You will need a good background in math and programming, or have someone to ask if you do not understand the theory or the code. The code gets pretty advance pretty fast.<\/p>\n\n\n\n<p>However, this is not a review of the book. But I take an example of the book to show how Python can be used from within Euler Math Toolbox. The following code reads the bible and prints the 10 most common words.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>infile = open(\"bible.txt\",\"r\")\n\nfrom collections import Counter\n\nnum_words = 10\n\ncounter = Counter(word.lower()\n    for line in infile\n    for word in line.strip().split()\n    if word)\n\nfor word, count in counter.most_common(num_words):\n    print(str(count)+\"\\t\"+word)\n\ninfile.close()<\/code><\/pre>\n\n\n\n<p>I downloaded the file \u201ebible.txt\u201c from the net and put it into the folder, where the IPython notebook resides. The output is the following.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>64183\tthe\n51379\tand\n34746\tof\n13643\tto\n12799\tthat\n12559\tin\n10263\the\n9840\tshall\n8987\tunto\n8835\tfor<\/code><\/pre>\n\n\n\n<p>Now, let us do this programming exercise in Euler Math Toolbox via Python.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>&gt;&gt;&gt; from collections import Counter\n&gt;function python testbible (filename) ...\n$\"\"\" 10 most common words in the bible\n$\"\"\"\n$infile = open(filename,\"r\")\n$num_words = 10\n$counter = Counter(word.lower()\n$    for line in infile\n$    for word in line.strip().split()\n$    if word)\n$res = \"\"\n$for word, count in counter.most_common(num_words):\n$    res = res+str(count)+\"\\t\"+word+\"\\n\"\n$infile.close()\n$return res\n$endfunction\n&gt;testbible(\"bible.txt\")\n 64183   the\n 51379   and\n 34746   of\n 13643   to\n 12799   that\n 12559   in\n 10263   he\n 9840    shall\n 8987    unto\n 8835    for\n \n&gt;&gt;&gt; help(testbible)\n Help on function testbible in module __main__:\n \n testbible(filename)\n     10 most common words in the bible\n<\/code><\/pre>\n\n\n\n<p>The import was done with a direct Python command. The function returns a string with the answer. But it could just as well use \u201eeumat.dump(\u201e\u2026.\u201c)\u201c for a direct output. Note, that you do not need to indent the Python definition. That is done automatically by EMT.<\/p>\n\n\n\n<p>The \u201e$\u201c-signs in the function definition allow copying and pasting the code into an EMT notebook. They are not visible in the EMT window.<\/p>\n\n\n\n<p>Here is another example, done in IPython.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>def sieve(N):\n    v = &#91;True for _ in range(N+1)]\n    for k in range(2,N+1):\n        if k*k&gt;N:\n            break\n        if v&#91;k]:\n            m=2*k\n            while m&lt;=N:\n                v&#91;m]=False\n                m=m+k\n    return &#91;k for k in range(N+1) if v&#91;k] and k&gt;1]\n\ns=sieve(100_000_000)\n\nfrom math import log\nx=&#91;k for k in range(1000,len(s),1000)]\ny=&#91;(s&#91;k]\/(k*log(k))-1-(log(log(k))-1)\/log(k))*log(k)**2 for k in \n     range(1000,len(s),1000)]\n\nimport matplotlib as mpl\nimport matplotlib.pyplot as plt\nfig, ax = plt.subplots(figsize=(8,8))\nax.plot(x,y)\n<\/code><\/pre>\n\n\n\n<p>This code computes the primes up to 100 million and plots the actual size of the k-th prime, divided by a known asymptotic for the size of the k-th prime.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img fetchpriority=\"high\" decoding=\"async\" width=\"483\" height=\"479\" src=\"https:\/\/renegrothmann.de\/wp-content\/uploads\/2023\/04\/Download.png\" alt=\"\" class=\"wp-image-118\" srcset=\"https:\/\/renegrothmann.de\/wp-content\/uploads\/2023\/04\/Download.png 483w, https:\/\/renegrothmann.de\/wp-content\/uploads\/2023\/04\/Download-300x298.png 300w, https:\/\/renegrothmann.de\/wp-content\/uploads\/2023\/04\/Download-150x150.png 150w\" sizes=\"(max-width: 483px) 100vw, 483px\" \/><\/figure>\n\n\n\n<p>We only use Python to do the sieve. For the plot, we use Euler Math Toolbox.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>&gt;function python sieve (N) ...\n$v = &#91;True for _ in range(N+1)]\n$for k in range(2,N+1):\n$    if k*k&gt;N:\n$         break\n$    if v&#91;k]:\n$         m=2*k\n$         while m&lt;=N:\n$             v&#91;m]=False\n$             m=m+k\n$return &#91;k for k in range(N+1) if v&#91;k] and k&gt;1]\n$endfunction\n&gt;tic; s = sieve(100&nbsp;000&nbsp;000); toc;\n Used 29.407 seconds\n&gt;n = length(s)\n 5761455\n&gt;k = 1000:1000:n;\n&gt;y = (s&#91;k]\/(k*log(k))-1-(log(log(k))-1)\/log(k))*log(k)^2;\n&gt;plot2d(k,y):<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"717\" height=\"717\" src=\"https:\/\/renegrothmann.de\/wp-content\/uploads\/2023\/04\/Primes.png\" alt=\"\" class=\"wp-image-119\" srcset=\"https:\/\/renegrothmann.de\/wp-content\/uploads\/2023\/04\/Primes.png 717w, https:\/\/renegrothmann.de\/wp-content\/uploads\/2023\/04\/Primes-300x300.png 300w, https:\/\/renegrothmann.de\/wp-content\/uploads\/2023\/04\/Primes-150x150.png 150w\" sizes=\"(max-width: 717px) 100vw, 717px\" \/><\/figure>\n\n\n\n<p>As you see, the sieve takes about 30 seconds on my computer (Core I7, 4.0GHz). Of course, it is much faster to do that in C.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>&gt;function tinyc csieve (N) ...\n$ARG_DOUBLE(N);\n$int n=(int)N;\n$CHECK(n&gt;10,\"Need a positive integer.\");\n$char *v = (char *)getram((n+1)*sizeof(char));\n$CHECK(v,\"Out of heap space.\");\n$\n$for (int i=0; i&lt;n+1; i++) v&#91;i]=1; \n$for (int k=2; k&lt;n+1; k++)\n${\n$   if (k*k&gt;n) break;\n$   if (v&#91;k])\n$   {\n$       int m=2*k;\n$       while (m&lt;=n)\n$       {\n$           v&#91;m]=0; m+=k;\n$       }\n$   }\n$}\n$int count=0;\n$for (int k=2; k&lt;n+1; k++) count+=v&#91;k];\n$\n$header *out=new_matrix(1,count);\n$CHECK(out,\"Out of heap space\");\n$double *res=matrixof(out);\n$\n$int j=0;\n$for (int k=2; k&lt;n+1; k++)\n$   if (v&#91;k])\n$       res&#91;j++]=k;\n$\n$moveresults(out);\n$endfunction\n&gt;s=csieve(100)\n &#91;2,  3,  5,  7,  11,  13,  17,  19,  23,  29,  31,  37,  41,  43,  47,\n 53,  59,  61,  67,  71,  73,  79,  83,  89,  97]\n&gt;tic; s=csieve(100&nbsp;000&nbsp;000); toc;\n Used 1.459 seconds\n<\/code><\/pre>\n\n\n\n<p>The procedures to communicate from EMT to Tiny C are a bit involved. But the result is 15 times faster. The basic code is a direct translation of the Python code.<\/p>\n\n\n\n<p>Depending on your heap space of Euler Math Toolbox, you can do more primes. If you compress the sieve using bits, you could to a lot more. But for this code and the default size, 100 million is the limit. Increasing the stack size to 2048 MB, I could do all primes up to 1 billion in 15 seconds.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"717\" height=\"717\" src=\"https:\/\/renegrothmann.de\/wp-content\/uploads\/2023\/04\/Primes-1.png\" alt=\"\" class=\"wp-image-120\" srcset=\"https:\/\/renegrothmann.de\/wp-content\/uploads\/2023\/04\/Primes-1.png 717w, https:\/\/renegrothmann.de\/wp-content\/uploads\/2023\/04\/Primes-1-300x300.png 300w, https:\/\/renegrothmann.de\/wp-content\/uploads\/2023\/04\/Primes-1-150x150.png 150w\" sizes=\"(max-width: 717px) 100vw, 717px\" \/><\/figure>\n\n\n\n<p>Published\u00a0<time datetime=\"2022-03-09T18:06:51+01:00\">2022\/03\/09<\/time>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Another old posting from my previous blog. Continuing to get acquainted with Python, I looked at the book \u201eIntroduction to Data Science\u201c by Joel Grus. I have it in German as an electronic book, which is unfortunately no fun to read, although the content is solid \u2013 with restrictions. The most important problem is the [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6,7],"tags":[],"class_list":["post-117","post","type-post","status-publish","format-standard","hentry","category-euler-math-toolbox","category-python"],"_links":{"self":[{"href":"https:\/\/renegrothmann.de\/index.php?rest_route=\/wp\/v2\/posts\/117","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/renegrothmann.de\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/renegrothmann.de\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/renegrothmann.de\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/renegrothmann.de\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=117"}],"version-history":[{"count":3,"href":"https:\/\/renegrothmann.de\/index.php?rest_route=\/wp\/v2\/posts\/117\/revisions"}],"predecessor-version":[{"id":196,"href":"https:\/\/renegrothmann.de\/index.php?rest_route=\/wp\/v2\/posts\/117\/revisions\/196"}],"wp:attachment":[{"href":"https:\/\/renegrothmann.de\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=117"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/renegrothmann.de\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=117"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/renegrothmann.de\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=117"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}