EzDev.org

incanter

Incanter: a Clojure-based, R-like statistical computing and graphics environment for the JVM


Importing a CSV with different row widths into Incanter?

I'm trying to import a CSV file with rows of many different lengths into Incanter using the read-dataset function. Unfortunately, it appears to truncate the rows down to the length of the first row. Short of reordering the dataset, or searching for the largest row and adding a row at the top of that width, is there a way to solve this problem? The documentation doesn't seem to offer any optional parameters to read-dataset.
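The padding workaround mentioned above (find the widest row, then make every row that wide) can be done as a preprocessing step before the file reaches read-dataset. The following is a minimal sketch in Python, not Incanter code; the function name pad_rows and the empty-string fill value are assumptions made for illustration.

```python
import csv
import io

def pad_rows(text, fill=""):
    """Pad every CSV row with `fill` so all rows match the widest row."""
    rows = list(csv.reader(io.StringIO(text)))
    width = max((len(r) for r in rows), default=0)
    return [r + [fill] * (width - len(r)) for r in rows]

# Made-up sample data with rows of width 2, 4 and 1.
ragged = "a,b\n1,2,3,4\n5\n"
padded = pad_rows(ragged)
```

Writing the padded rows back out with csv.writer would give a rectangular file, at the cost of one extra pass over the data.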


Source: (StackOverflow)

Is there something like Incanter for Haskell?

Incanter is an R-like library for Clojure. Is there anything like this for Haskell?


Source: (StackOverflow)

Using Incanter and Clojure Soup together

I am learning Clojure - it's a lot of fun! I am trying to use Incanter and Clojure Soup in the same file:

(require '[jsoup.soup :as soup])
(use '(incanter core stats io charts datasets))

And I get the following error:

CompilerException java.lang.IllegalStateException: $ already refers to: #'jsoup.soup/$ in namespace: user, compiling

I think I understand why, but how can I solve this problem? Appreciate this website and all the gurus on it!

Thanks.


Source: (StackOverflow)

How can I modify a column in an Incanter dataset?

I'd like to be able to transform an individual column in an Incanter dataset and save the resulting dataset to a new CSV file. What is the simplest way to do that?

Essentially, I'd like to be able to map a function over a column in the dataset and replace the original column with the result.
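To make the intended operation concrete, here is a language-neutral illustration written in Python with only the standard library. It is not Incanter code; map_column, the column names, and the sample data are all made up for the example.

```python
import csv
import io

def map_column(rows, column, f):
    """Return new rows with `f` applied to `column`; other columns untouched."""
    return [{**row, column: f(row[column])} for row in rows]

# Made-up input: heights in metres, to be converted to whole centimetres.
source = "name,height\nann,1.60\nbob,1.75\n"
rows = list(csv.DictReader(io.StringIO(source)))
converted = map_column(rows, "height", lambda h: round(float(h) * 100))

# Write the transformed rows back out as CSV text.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["name", "height"], lineterminator="\n")
writer.writeheader()
writer.writerows(converted)
result = out.getvalue()
```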


Source: (StackOverflow)

Fast vector math in Clojure / Incanter

I'm currently looking into Clojure and Incanter as an alternative to R. (Not that I dislike R, but it is interesting to try out new languages.) I like Incanter and find the syntax appealing, but vectorized operations are quite slow compared with, e.g., R or Python.

As an example, I wanted to get the first-order difference of a vector using Incanter vector operations, Clojure map, and R. Below is the code and timing for all versions. As you can see, R is clearly faster.

Incanter and Clojure:

(use '(incanter core stats)) 
(def x (doall (sample-normal 1e7))) 
(time (def y (doall (minus (rest x) (butlast x))))) 
"Elapsed time: 16481.337 msecs" 
(time (def y (doall (map - (rest x) (butlast x))))) 
"Elapsed time: 16457.850 msecs"

R:

rdiff <- function(x){ 
   n = length(x) 
   x[2:n] - x[1:(n-1)]} 
x = rnorm(1e7) 
system.time(rdiff(x)) 
   user  system elapsed 
  1.504   0.900   2.561
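
For reference, both snippets above compute the first-order difference, that is, each element minus its predecessor. A minimal, dependency-free Python sketch of the same quantity follows; the name first_difference is illustrative, and the sketch makes no claim about speed.

```python
def first_difference(xs):
    """Return [xs[1]-xs[0], xs[2]-xs[1], ...]; one element shorter than xs."""
    return [b - a for a, b in zip(xs, xs[1:])]

diffs = first_difference([1, 4, 9, 16])
```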

So I was wondering: is there a way to speed up the vector operations in Incanter/Clojure? Solutions involving the use of loops, Java arrays, and/or libraries from Clojure are also welcome.

I have also posted this question to the Incanter Google group, with no responses so far.

UPDATE: I have marked Jouni's answer as accepted, see below for my own answer where I have cleaned up his code a bit and added some benchmarks.


Source: (StackOverflow)

Scientific dataset manipulation in Clojure -- reading ByteBuffers into matrices

I'm looking to use Clojure and Incanter for processing of a large scientific dataset; specifically, the 0.5 degree version of this dataset (only available in binary format).

My question is, what recommendations do you have for elegant ways to deal with this problem in Java/Clojure? Is there a simple way to get this dataset into Incanter, or some other java matrix package?

I managed to read the binary data into a java.nio.ByteBuffer using the following code:

(defn to-float-array [^String str]
  (-> (io/to-byte-array (io/to-file str))
      java.nio.ByteBuffer/wrap
      (.order java.nio.ByteOrder/LITTLE_ENDIAN)))

Now, I'm really struggling with how I can begin to manipulate this ByteBuffer as an array. I've been using Python's NumPy, which makes it very easy to manipulate these huge datasets. Here's the Python code for what I'm looking to do:

import numpy as np

# Reshape the flat row vector into (time, lat_slices, lon_slices),
# then keep every other slice along the first (time) axis.
rain_data = np.fromfile("path/to/file", dtype="f")
rain_data = rain_data.reshape(24, 360, 720)
rain_data = rain_data[0:23:2, :, :]

After this slicing, I want to return a vector of these twelve arrays. (I need to manipulate them each separately as future function inputs.)
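
To spell out the index arithmetic behind the reshape and slice (which any ByteBuffer-based port would also need), here is a sketch using only the Python standard library. It assumes little-endian 32-bit floats stored with the time axis varying slowest, as the ByteBuffer code and the NumPy reshape suggest; read_time_slices and the tiny stand-in data are hypothetical.

```python
import struct

def read_time_slices(raw, n_time, n_lat, n_lon, step=2):
    """Decode little-endian float32 bytes into [lat][lon] grids,
    keeping every `step`-th time slice."""
    count = n_time * n_lat * n_lon
    flat = struct.unpack(f"<{count}f", raw)
    plane = n_lat * n_lon  # number of values in one time slice
    slices = []
    for t in range(0, n_time, step):
        start = t * plane
        grid = [list(flat[start + r * n_lon : start + (r + 1) * n_lon])
                for r in range(n_lat)]
        slices.append(grid)
    return slices

# Tiny stand-in for the real file: 4 time steps of a 2 x 3 grid.
raw = struct.pack("<24f", *range(24))
slices = read_time_slices(raw, n_time=4, n_lat=2, n_lon=3)
```

With n_time=24, n_lat=360, n_lon=720 the same loop would yield the twelve grids described above, although nested lists at that size are far less compact than a NumPy array.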

So, any advice on how to get this dataset into Incanter would be much appreciated.


Source: (StackOverflow)

Step-by-step procedure for nested logistic regression in Incanter

After finding this enormously helpful guide in R, I started wondering how I might do something similar in Incanter. I'm relatively new to Incanter, so it would be lovely if someone could reproduce this answer.

In addition to illustrating a nested model, the discussion on that answer also covered how to iteratively generate a list of un-nested models. I'd be curious to know the most idiomatic way of doing that in Clojure/Incanter.


Source: (StackOverflow)

Incanter-numpy interop

I would like to use Clojure's Incanter, but I'd like to mix in calls to Python's extensive Numpy/Scipy numerical libraries. Is there an interoperability bridge between Incanter and Numpy that allows an embedded runtime of CPython to be run from Clojure and that interconverts Numpy's and Incanter's matrix data structures?

Jython isn't sufficient since Numpy requires CPython.

I am aware of (but have never used) http://jepp.sourceforge.net/, which allows Java programs to control an embedded CPython runtime -- but Numpy/Incanter matrix interconversion is still needed.

I'm looking for something similar to https://github.com/jolby/rincanter (which I have also not yet used) but for CPython/Numpy instead of R.


Source: (StackOverflow)

Clojure: incanter.stats Linear Regression Model Not Working

I am following the linear regression example here:

(use '(incanter core stats datasets))
(def plant-growth (to-matrix (get-dataset :plant-growth) :dummies true))
(def y (sel plant-growth :cols 0))
(def x (sel plant-growth :cols [1 2]))
(def lm (linear-model y x))

However I get this error:

=> (def lm (linear-model y x))
ClassCastException clojure.lang.LazySeq cannot be cast to java.lang.Number clojure.lang.Numbers.lt (Numbers.java:219)

What is going on here?

Update: This example from the latest 1.4.1 (stable) docs does not work either:

(use '(incanter core stats datasets charts))
(def iris (to-matrix (get-dataset :iris) :dummies true))
(def y (sel iris :cols 0))
(def x (sel iris :cols (range 1 6)))
(def iris-lm (linear-model y x)) ; with intercept term

Output:

=>  (def iris-lm (linear-model y x))
ClassCastException clojure.lang.LazySeq cannot be cast to java.lang.Number  clojure.lang.Numbers.lt (Numbers.java:219)

I'm using Clojure 1.5.1 and Incanter 1.4.1. Is this a bug that needs fixing? Where can I find authoritative, working examples?


Source: (StackOverflow)

Struggling with BFGS minimization algorithm for Logistic regression in Clojure with Incanter

I'm trying to implement a simple logistic regression example in Clojure using the Incanter data analysis library. I've successfully coded the Sigmoid and Cost functions, but Incanter's BFGS minimization function seems to be causing me quite some trouble.

(ns ml-clj.logistic
  (:require [incanter.core :refer :all]
            [incanter.optimize :refer :all]))


(defn sigmoid
  "compute the inverse logit function, large positive numbers should be
close to 1, large negative numbers near 0,
z can be a scalar, vector or matrix.
sanity check: (sigmoid 0) should always evaluate to 0.5"
  [z]
  (div 1 (plus 1 (exp (minus z)))))

(defn cost-func
  "computes the cost function (J) that will be minimized
   inputs:params theta X matrix and Y vector"
  [X y]
  (let
      [m (nrow X)
       init-vals (matrix (take (ncol X) (repeat 0)))
       z (mmult X init-vals)
       h (sigmoid z)
       f-half (mult (matrix (map - y)) (log (sigmoid (mmult X init-vals))))
       s-half (mult (minus 1 y) (log (minus 1 (sigmoid (mmult X init-vals)))))
       sub-tmp (minus f-half s-half)
       J (mmult (/ 1 m) (reduce + sub-tmp))]
    J))

When I try (minimize (cost-func X y) (matrix [0 0])), giving minimize a function and starting params, the REPL throws an error.

ArityException Wrong number of args (2) passed to: optimize$minimize  clojure.lang.AFn.throwArity (AFn.java:437)

I'm very confused as to what exactly the minimize function is expecting.

For reference, I rewrote it all in Python, and all of the code runs as expected, using the same minimization algorithm.

import numpy as np
import scipy as sp
data = np.loadtxt('testSet.txt', delimiter='\t')

X = data[:,0:2]
y = data[:, 2]


def sigmoid(X):
    return 1.0 / (1.0 + np.e**(-1.0 * X))

def compute_cost(theta, X, y):
    m = y.shape[0]
    h = sigmoid(X.dot(theta.T))
    J = y.T.dot(np.log(h)) + (1.0 - y.T).dot(np.log(1.0 - h))
    cost = (-1.0 / m) * J.sum() 
    return cost

def fit_logistic(X,y):
    initial_thetas = np.zeros((len(X[0]), 1))
    myargs = (X, y)
    theta = sp.optimize.fmin_bfgs(compute_cost, x0=initial_thetas,
                                     args=myargs)
    return theta

outputting

Current function value: 0.594902
         Iterations: 6
         Function evaluations: 36
         Gradient evaluations: 9
array([ 0.08108673, -0.12334958])

I don't understand why the Python code can run successfully, but my Clojure implementation fails. Any suggestions?
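
One difference visible in the two listings: compute_cost takes theta as its first argument, whereas cost-func takes only X and y and builds a zero parameter vector internally, so its result cannot vary with parameters supplied from outside. Whether that accounts for the errors is not established here. The dependency-free Python sketch below only shows the shape of a cost function that an optimizer can re-evaluate at different parameter values; the sample X and y are made up.

```python
import math

def sigmoid(z):
    """Inverse logit for a scalar z."""
    return 1.0 / (1.0 + math.exp(-z))

def compute_cost(theta, X, y):
    """Mean negative log-likelihood of logistic regression at `theta`."""
    m = len(y)
    total = 0.0
    for row, label in zip(X, y):
        h = sigmoid(sum(t * x for t, x in zip(theta, row)))
        total += label * math.log(h) + (1.0 - label) * math.log(1.0 - h)
    return -total / m

# Made-up data: three observations with two features each.
X = [[1.0, 2.0], [2.0, 0.5], [0.5, 1.5]]
y = [1, 0, 1]
cost_at_zero = compute_cost([0.0, 0.0], X, y)
cost_elsewhere = compute_cost([1.0, -1.0], X, y)
```

At theta = 0 every prediction is 0.5, so the cost equals log 2 regardless of the data, which makes it a convenient sanity check.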

Update

Rereading the docstring for minimize, I've been trying to calculate the derivative of cost-func, which throws a new error.

(def grad (gradient cost-func (matrix [0 0])))
(minimize cost-func (matrix [0 0]) (grad (matrix [0 0]) X))
ExceptionInfo throw+: {:exception "Matrices of different sizes cannot be differenced.", :asize [2 1], :bsize [1 2]}  clatrix.core/- (core.clj:950)

Using trans to transpose the matrix just yields the same error with the sizes swapped.

:asize [1 2], :bsize [2 1]}

I'm pretty lost here.


Source: (StackOverflow)

How to manipulate legend in Incanter chart

I'm trying to include a legend in an Incanter chart, but I'm having some trouble getting what I want:

  1. I want to be able to instantiate a chart with no data first (using [] [] as my x y arguments), then add the data points in a separate step. However, the only way to add a legend is to specify :legend true after the initial x y points are given in the constructor. I cannot specify :legend true without x y arguments, and I have not found any add-legend function.

  2. The legend option captures the code I use when adding the chart data, which means that if I don't want ugly code to appear in the legend, I have to create nice-looking vars for the X and Y points rather than just calling a function inline.

  3. Therefore the legend that is created includes the [][] used when creating the blank plot, the function calls used when getting the data for the points, and the name-mangled anonymous function (fn*[p1__3813#](second p1__3813#)), which is uninformative to consumers of my chart.

  4. I just want to be able to associate a string with each group of points in the legend, as in MATLAB, Excel, etc.

Here is my current code:

(def lux-ratios-plot
   (doto (scatter-plot [] [] :legend true
                             :title  "Lux/CH0 vs. CH1/CH0"
                             :x-label "CH1/CH0"
                             :y-label "Lux/CH0")
     (view)))

(doseq [dut [incs hals cfls leds]]
  (add-points lux-ratios-plot (get-vals :CH1/CH0 dut) (get-vals :Lux/CH0 dut) :points true))

; Show the trend line for each bulb
(doseq [fit [inc-fit hal-fit cfl-fit led-fit]]
  (add-lines lux-ratios-plot (map #(second %) (:x fit)) (:fitted fit)))

So, is there any way in Incanter plots to specify a legend string with each (add-lines ...) or (add-points ...) call?

Thanks a lot

Michael


Source: (StackOverflow)

Troubles Importing Clojure Libs in Paradise

I occasionally get this problem, and generally work around it, but it's rather frustrating.

I have all of Incanter (check it out if you don't know it: it's superb) on my classpath. I try to import it (through a Slime REPL) like this: user> (use 'incanter.core), but fail.

Doing this: user> (use 'clojure.contrib.def) works just fine, and this file is in the same place–on my classpath.

Regardless, the error isn't anything about classpath: it's this:

Don't know how to create ISeq from: clojure.lang.Symbol
  [Thrown class java.lang.IllegalArgumentException] 

You can see my entire terminal here (a screenshot.)

I don't know what's going on here, and it's really frustrating, as I would really like to use Incanter, and I can from the Incanter binary's REPL. I definitely don't want to develop from that, and this should work.

Any help would be much appreciated.

EDIT:

It appears as though Incanter requires Clojure 1.2, and lein swank gives me Clojure 1.1. This might be the cause of my problems: if so, is there a way to continue to use Swank & Lein with Clojure 1.2?

Thanks again!

EDIT:

Apparently if you start using Clojure-1.1 and lein swank, you're stuck with it unless you make a new project.

If future readers have this problem, this article helped me out, but, at least for me, you must also start a new lein project if you began it using lein swank and Clojure 1.1. Simply changing your project.clj file and then running lein swank again doesn't work.


Source: (StackOverflow)

Incanter dependency

I am trying to follow the tutorial at http://data-sorcery.org/category/pca/ and found myself stuck trying to load the necessary Incanter libraries, i.e.

(use '(incanter core stats charts datasets))

The only dependency that I have for Incanter is [incanter "1.5.4"]. Is this enough to load the libraries, or am I just missing something?

I am not really sure how to load the four highlighted libraries in the link. Note that I have been able to use Incanter previously in the REPL.

Edit: My text editor has the following

(ns my-namespace.filename
(:use [incanter.core]
      [incanter.stats]
      [incanter.charts]
      [incanter.datasets]))


(def iris (to-matrix (get-dataset :iris))) 
(view iris)

which returns the error CompilerException javax.net.ssl.SSLProtocolException: handshake alert: unrecognized_name, compiling:(pca.clj:11:22)

The error seems to stem from the inner part, namely the get-dataset... which I am unsure how to fix.


Source: (StackOverflow)

Idiomatic way to add error bars to plot in Incanter

I'm creating a plot of a robot's belief of its distance to a landmark. The x-axis is number of measurements, and the y-axis is distance to landmark, which should include error bars to indicate the confidence in this estimate.

I haven't been able to find a good way to add error bars to the plot based on a value for the variance. Currently I'm creating a box-plot at each measurement by generating sample data about the mean with my value for the variance. This is clearly not ideal, in that it is computationally inefficient and is an imprecise representation of the information I'm trying to display.

Any ideas for how to do this? Ideally it would be on an xy-plot, and it could be done without having to resort to JFreeChart commands.
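
Whatever the plotting call turns out to be, the bar endpoints themselves can be computed directly from each mean and variance, with no sampling. A minimal Python sketch follows, assuming the bars should span k standard deviations on each side of the mean; error_bar_bounds, k, and the sample numbers are illustrative and not part of Incanter.

```python
import math

def error_bar_bounds(means, variances, k=1.0):
    """Return (low, high) per measurement: mean -/+ k standard deviations."""
    return [(m - k * math.sqrt(v), m + k * math.sqrt(v))
            for m, v in zip(means, variances)]

# Made-up distance estimates and their variances.
bounds = error_bar_bounds([10.0, 9.0], [4.0, 1.0])
```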


Source: (StackOverflow)

Clojure / Incanter Data Transformations Capabilities

I'm considering Clojure/Incanter as an alternative to R and am wondering whether Clojure/Incanter can do the following:

  1. Import the result of a SQL statement as a dataset (I do this in R using dbGetQuery).
  2. Reshape the dataset, turning rows into columns, also known as "pivot"/"unpivot" (I do this in R using the reshape and reshape2 packages; in the R world it's called melting and casting data).
  3. Save the reshaped dataset to a SQL table (I do this in R using the dbWriteTable function in RMySQL).
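
For readers unfamiliar with the "melt" half of step 2, the sketch below shows the wide-to-long transformation on a tiny made-up table. It is plain Python rather than Incanter or R code, and the function name melt plus the "variable" and "value" column names are borrowed from reshape2's vocabulary purely for illustration.

```python
def melt(rows, id_cols):
    """Turn each non-id column of each row into its own (variable, value) row."""
    long_rows = []
    for row in rows:
        ids = {k: row[k] for k in id_cols}
        for key, value in row.items():
            if key not in id_cols:
                long_rows.append({**ids, "variable": key, "value": value})
    return long_rows

# Made-up wide table: one row per id, one column per measurement.
wide = [{"id": 1, "q1": 10, "q2": 20}, {"id": 2, "q1": 30, "q2": 40}]
long_rows = melt(wide, id_cols=["id"])
```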

Cheers !


Source: (StackOverflow)