dummy-link

JudyDicts

Judy Array for Julia

Readme

Judy arrays are fast associative arrays with low memory usage.

This is a wrapper over the Judy C library at http://judy.sourceforge.net/, and also provides array like syntax for ease of use.

Note: This package is deprecated since Julia 0.4.

Speed comparison (output of test/time_test.jl):

time_test.jl 40000

items 40000 compare: JudyDict{Int, Int} vs. Dict{Int64, Int64}
set => dict: 0.003764559, judy: 0.007593068
get => dict: 0.003972227, judy: 0.008115711

items 40000 compare: JudyDict{String, Int} vs. Dict{String, Int64} vs. Trie{Int64}
set => dict: 0.026546593, trie: 1.617041746, judy: 0.024602444
get => dict: 0.025077909, trie: 0.333609694, judy: 0.011907627

items 40000 compare: JudyDict{String, ASCIIString} vs. Dict{String, ASCIIString} vs. Trie{ASCIIString}
set => dict: 0.016444282, trie: 1.928200929, judy: 0.079096568
get => dict: 0.018696934, trie: 0.447669909, judy: 0.018319763

time_test.jl

items 1000000 compare: JudyDict{Int, Int} vs. Dict{Int64, Int64}
set => dict: 0.39570294, judy: 0.206779173
get => dict: 0.232123599, judy: 0.236621416

items 1000000 compare: JudyDict{String, Int} vs. Dict{String, Int64} vs. Trie{Int64}
not comparing Trie for so many entries
set => dict: 1.486466472, trie: NaN, judy: 0.647465437
get => dict: 0.817610285, trie: NaN, judy: 0.352016509

items 1000000 compare: JudyDict{String, ASCIIString} vs. Dict{String, ASCIIString} vs. Trie{ASCIIString}
not comparing Trie for so many entries
set => dict: 1.454996025, trie: NaN, judy: 5.155431162
get => dict: 0.714579466, trie: NaN, judy: 0.369600479

These tests are just indicative and extensive testing hasn't been done yet. JudyDict seems better performing when the key is a String.

JudyDict with Julia objects as value type actually also hold the object references in an internal Dict to prevent them being gc'd. It thus takes a performance penalty during set operations. It could be faster if it could somehow indicate certain object_ids to gc as protected, but unfortunately there doesn't seem to be a way to do that.

Example (simple):

julia> using JudyDicts

julia> ja = JudyDict{Int, Int}()
JudyDict{Int64,Int64} (empty)

julia> ja[1] = 100
100

julia> ja[2] = 200
200

julia> println(ja[2] / ja[1])
2.0

julia> ja[2] * ja[1]
0x0000000000004e20

julia> ja = JudyDict{String, Int}()
JudyDict{String,Int64} (empty)

julia> ja["First"] = 100
100

julia> ja["One More"] = 200
200

julia> println(ja["One More"] / ja["First"])
2.0

julia> for x in ja
         println(x)
       end
(0x0000000000000064,"First")
(0x00000000000000c8,"One More")

Other APIs:

  • ju_mem_used: Return bytes of memory used currently by the judy array
  • ju_set: Set value (val) at index (idx). Return a pointer to the value. the pointer is valid till the next call to a judy method. Return C_NULL on error
  • ju_unset: Unset value at index. Return 1 if index was previously set (successful), otherwise 0 if the index was already unset (unsuccessful).
  • ju_get: Get the value at index (if set). For Judy1 arrays, return 1 if index's bit was set, 0 if it was unset (index was absent). Otherwise, return a tuple (value, pointer to value) if index's bit was set, (undefined, C_NULL) if it was not. The pointer is valid till the next call to a judy method.
  • ju_count: Count the number of indexes present between idx1 and idx2 (inclusive).
  • ju_by_count: locate the nth index that is present (n starts wih 1)
  • ju_first, ju_next, ju_last, ju_prev: iterators over populated indices
  • ju_first_empty, ju_next_empty, ju_last_empty, ju_prev_empty: iterators over empty slots

First Commit

03/21/2013

Last Touched

over 2 years ago

Commits

23 commits

Requires:

Used By: