Skip to content

Julia has stolen my heart

Julia is a modern data science programming language that has clean syntax, the ease of use of an interpreted environment, like R, with the speed of a compiled script like C.

julialang.org

Table of Contents

Most of the time, I'm using the SAT Verbal hemisphere, but I also like exercising the SAT Math side. Here's my first beyond Hello, World! effort in the Julia programming language. For most of the last 17 years, I've been using R, which displaced Python for most purposes. R has a wealth of statistical treasures and an impeccable pedigree.

Breeding doesn't get better than this, trust me

But it is a crappy procedural language and it's ducktyped. It is functional as it presents in the REPL and that is a great advantage to the procedural/imperative style of C and its progeny. But unlike Haskell, which is also a functional language, it's not strongly typed. That means that objects tend to get composed of linked lists and can be tedious to navigate. But Haskell is kinda geeky. When I discovered Julia I was smote.

But, soft! what light through yonder window breaks?It is the east, and Julia is the sun. stability.ai

The problem that spurred me here is this: A warehouse in England ships pallets to various postcodes (the Brits are crazy, they have like 1.7 million ZIP+5 equivalents) and are charged by the pallet with discounts given for multiple pallets upto a standard lorryful of 26 pallets. How can they tell if they are being billed by the carrier correctly?

How, indeed, using spreadsheets? Thousands of shipments to scores of addresses in various load sizes. Wasn't gonna happen. So, here's what I came up with.

  
# audit.jl
# compare invoiced shipping costs to UK postcodes
# given number of pallets against the rate card
# provided by carrier; returns 0 if no differences
# otherwise use `truth_table` vector to subsect 
# DataFrame `obj`
# author: Richard Careaga
# Date: 2024-02-28

#------------------------------------------------------------------

# libraries

using CSV
using DataFrames
using Dates

#------------------------------------------------------------------

# constant

objs = "/Users/me/postal/objs/"

#------------------------------------------------------------------

# data

# single origin shipments to separate destinations
df   = CSV.read(objs * "/PE330JF.csv",DataFrame)
# =
5_000 records
```markdown
| variable    | eltype    |
|-------------|-----------|
| Symbol      | DataType  |
| dt          | DateTime  |
| from        | String15  |
| to          | String15  |
| pallets     | Int64     |
| cost        | Float64   |
| key         | String7   |
| origin      | String7   |
| destination | String15  |
| meters      | Float64   |
| miles       | Float64   |
```
=#

# corresponding charges per pallet
card = CSV.read(objs * "rate_card.csv", DataFrame, header = false)

#=
100 records
```markdown
| variable | eltype  |
|----------|---------|
| Symbol   | DataType|
| Column1  | String3 |
| Column2  | Float64 |
| Column3  | Float64 |
      ⋮         ⋮    
| Column27 | Float64 |
```
=#

#------------------------------------------------------------------

# preprocessing

df.dt = DateTime.(df.dt, "m/d/yy H:M") + Year(2000)

# Create a new column to hold the first two letters of the 'key' column
# the keep only the columns for key, number of pallets and cost

df[!, :key_group] = first.(df.key, 2)
grp               = unique(df[:,[:key_group,:pallets,:cost]])
rename!(grp,[:key,:pallets,:cost])
sort!(grp, :key)

# discard unused rows from rate card

ours      = unique(df.key_group)
theirs    = card.Column1
common    = intersect(ours,theirs)
underlap  = setdiff(ours,theirs)
short     = filter(row -> in(row.Column1, common), card)
new_names = Symbol[:key; [Symbol("p", i) for i in 1:26]]
short     = DataFrame(short[!, 1:27], new_names)

# principal object

obj =  leftjoin(grp,short, on = :key)
obj = obj[completecases(obj),:]

# Generate column names to compare against 'cost'
# prefix the value of the pallets column with the letter p
# to correspond to p1 … p26, the cost for a given number of pallets

obj.p_column = "p" .* string.(obj.pallets)

#------------------------------------------------------------------

# main

# Perform comparison and create truth table

truth_table = [obj[i, :cost] != obj[i, Symbol(obj[i, :p_column])] for i in 1:size(obj, 1)]

# Result will equal 0 if all costs correspond to the rate card for that 
# number of pallets

sum(truth_table)
  

I cheated. I had the help of Claude, an AI bot who answered my simple questions about syntax and greatly assisted in the real work, which is properly framing the question systematically. When you go back to school algebra and start thinking $y = f(x)$ pieces start falling into place easily.

$x$ is what is at hand, $y$ is what is desired and $f$ is the operator or script that will get you there. Any of these may be, and in fact usually are, composite. So you start doing things like $y = g(f(x)$, etc. Approaching it that way is like having a conversation:

$y$, darling, tell me what you need to be happy
$x$, sweetheart, do you think I could give nicknames to your beautiful eyes?

so, $f$ if I make this little change can you get me closer?

Far too much formal education is wasted trying to find the answer, which is too bad. Answers change all the time, but the good questions are durable.

Latest

Counting chickens

Counting chickens

Refining polling models. It's the received wisdom of the ages—don't rely on events that haven't happened yet. On the other hand, don't bring a knife to a gunfight, either.

Members Public