Tuesday, October 16, 2012

CouchDB, Part 2 - Data Structure and JSON

CouchDB stores documents in a native Erlang format.  When it runs a view function, it first JSON-encodes the native Erlang object (using an encoder from the mochiweb framework) so that the JavaScript interpreter can parse it.  It runs the function on that document, then takes the output of emit and turns it back into a native Erlang object.
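To make that round trip concrete, here is a minimal sketch of the JavaScript side only.  The emit stub and the runView helper are hypothetical stand-ins I made up for what CouchDB's query server provides; only the shape of the map function (a function that takes a doc and calls emit with a key and a value) comes from CouchDB.

```javascript
// Hypothetical stand-in: CouchDB supplies the real emit() to view functions.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// A map function as it might appear in a design document.
function map(doc) {
  if (doc.type === "article") {
    emit(doc.uri, doc.title);
  }
}

// Hypothetical helper: the document arrives as JSON text, is parsed,
// handed to the map function, and the emitted rows go back out as JSON.
function runView(jsonText) {
  rows.length = 0; // start each run with no emitted rows
  map(JSON.parse(jsonText));
  return JSON.stringify(rows);
}

const output = runView(
  '{"type":"article","title":"Phelps Wins!","uri":"/news/phelps-wins"}'
);
console.log(output);
```

Everything that crosses the boundary in either direction is JSON text, which is why the Erlang side needs a predictable way to represent objects, arrays, and strings.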

In this post, I will do my best to explain how the native Erlang object translates to JSON, and JSON to native Erlang.  This matters because you need to know the structure in order to get values out of non-trivial JSON objects.  Most examples only show how to get one level deep (proplists:get_value).

First, a short explanation of Erlang objects.  There are three things you need to know about how Erlang structures data:  the tuple, the list, and the bit string.  See this reference manual for Erlang data types.

The Tuple
Tuples are defined as "Compound data type with a fixed number of terms."  You'll see later that they are very similar to Lists, except the number of terms is fixed.  As far as I can tell, when storing JSON objects in Erlang, Tuples are mostly used to represent a name/value pair.  The other use is to contain lists.

The List
Lists are defined as "Compound data type with a variable number of terms."  Lists are obviously very handy for representing arrays.  They are also used to hold all of the name/value pairs stored in Tuples.

Bit String
"A bit string is used to store an area of untyped memory."  In CouchDB, all strings are stored as bit strings.  Bit strings look like this:  <<"foobar">>.  Empty strings look like either this:  <<>>, or this:  <<"">>.  Bit strings actually have quite a few more uses, but for this post we'll just stick to using them for strings.
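One way to see what sits inside a string's bit string is to look at its bytes.  In Erlang, <<"foobar">> is shorthand for <<102,111,111,98,97,114>>.  The sketch below prints the same bytes from JavaScript.  The toBinaryLiteral helper is a name I made up for this illustration, and it assumes the string is encoded as UTF-8.

```javascript
// Hypothetical helper: render a JavaScript string as an Erlang binary
// literal made of its byte values (assumes UTF-8 encoding).
function toBinaryLiteral(text) {
  const bytes = Array.from(new TextEncoder().encode(text));
  return "<<" + bytes.join(",") + ">>";
}

console.log(toBinaryLiteral("foobar")); // <<102,111,111,98,97,114>>
console.log(toBinaryLiteral(""));       // <<>>
```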

Putting it together
So how does a JSON object look in Erlang?  Let's start with a simple example.

In JavaScript:


{
_id: "article_A9CC7889-B880-1CEE-2493-1B7605619241",
_rev: "21-500f57210e2856f2bc24368555f67209",
type: "article",
title: "Phelps Wins!",
uri: "/news/phelps-wins"
}

This would look like this in Erlang:


{
[
{<<"_id">>, <<"article_A9CC7889-B880-1CEE-2493-1B7605619241">>},
{<<"_rev">>, <<"21-500f57210e2856f2bc24368555f67209">>},
{<<"type">>, <<"article">>},
{<<"title">>, <<"Phelps Wins!">>},
{<<"uri">>, <<"/news/phelps-wins">>}
]
}

Pretty simple so far.  What about something a bit more complicated, like an article with a list of tags:

JavaScript:


{
_id: "article_A9CC7889-B880-1CEE-2493-1B7605619241",
_rev: "21-500f57210e2856f2bc24368555f67209",
type: "article",
title: "Phelps Wins!",
uri: "/news/phelps-wins",
tags: [ "news", "olympics", "swim" ]
}

Erlang:
{[
{<<"_id">>, <<"article_A9CC7889-B880-1CEE-2493-1B7605619241">>},
{<<"_rev">>, <<"21-500f57210e2856f2bc24368555f67209">>},
{<<"type">>, <<"article">>},
{<<"title">>, <<"Phelps Wins!">>},
{<<"uri">>, <<"/news/phelps-wins">>},
{<<"tags">>, [<<"news">>, <<"olympics">>, <<"swim">>]}
]}


And with a property that has another object as a value:

JavaScript:

{
_id: "article_A9CC7889-B880-1CEE-2493-1B7605619241",
_rev: "21-500f57210e2856f2bc24368555f67209",
type: "article",
title: "Phelps Wins!",
uri: "/news/phelps-wins",
tags: [ "news", "olympics", "swim" ],
metas: {
pubDate: "2012-10-02T15:11:34Z",
author: "asmith",
description: "Phelps wins again in the Olympics"
}
}

Erlang:
{
[
{<<"_id">>, <<"article_A9CC7889-B880-1CEE-2493-1B7605619241">>},
{<<"_rev">>, <<"21-500f57210e2856f2bc24368555f67209">>},
{<<"type">>, <<"article">>},
{<<"title">>, <<"Phelps Wins!">>},
{<<"uri">>, <<"/news/phelps-wins">>},
{<<"tags">>, [<<"news">>, <<"olympics">>, <<"swim">>]},
{<<"metas">>, {[
{<<"pubDate">>, <<"2012-10-02T15:11:34Z">> },
{<<"author">>, <<"asmith">> },
{<<"description">>, <<"Phelps wins again in the Olympics">> }
]}}
]
}

There are a few simple rules when going from JSON to an Erlang object.  First, the entire structure is a tuple containing a single list.  Each item in that list is a tuple with 2 items.  The first is always a bit string (the property name), and the second (the value) is, in the examples above, a bit string, a list, or a tuple.  A list represents an array; in these examples that is a list of bit strings.  A tuple represents a nested object, so it again contains a list, and each item in that list is a tuple with 2 items.  Wash, rinse, repeat.
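The rules are recursive, so they fit in a few lines of code.  Here is a rough sketch in JavaScript that prints a parsed JSON value in the Erlang notation used in this post.  The toErlangTerm function is a hypothetical helper of mine, not anything CouchDB ships; it only produces text for reading, and it deliberately covers just strings, arrays, and objects.

```javascript
// Hypothetical helper: print a parsed JSON value the way this post writes
// Erlang terms. Illustration only; this is not CouchDB code.
function toErlangTerm(value) {
  if (typeof value === "string") {
    // Strings become bit strings. JSON.stringify supplies the quotes and
    // escapes, which is fine for simple text (an assumption of this sketch).
    return "<<" + JSON.stringify(value) + ">>";
  }
  if (Array.isArray(value)) {
    // Arrays become lists.
    return "[" + value.map(toErlangTerm).join(", ") + "]";
  }
  if (value !== null && typeof value === "object") {
    // Objects become a tuple holding a list of {Name, Value} tuples.
    const pairs = Object.keys(value).map(function (name) {
      return "{" + toErlangTerm(name) + ", " + toErlangTerm(value[name]) + "}";
    });
    return "{[" + pairs.join(", ") + "]}";
  }
  // Numbers, booleans, and null are outside the scope of this sketch.
  throw new Error("value type not covered by this sketch: " + String(value));
}

const sample = {
  type: "article",
  tags: ["news", "olympics"],
  metas: { author: "asmith" }
};
console.log(toErlangTerm(sample));
```

Running it against a trimmed-down version of the article document prints a single line with the same nesting as the hand-written Erlang above:  the outer tuple and list, one 2-item tuple per property, a plain list for tags, and another tuple-wrapped list for metas.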


One last tip:  read your CouchDB log.  It is EXTREMELY verbose and will dump the object that the view failed on.  This helps greatly when trying to figure out how the object is structured.

In my next post, I'll show you some patterns I've used in Erlang to get data out of non-trivial objects, as well as some ways to protect your view against bad data.


Monday, October 15, 2012

You've got to tell them resources is people

Bob Schatz, a.k.a. "Scrum Bob," told me once that "People are not resources."  He was very emphatic and brought up some very good points.  Namely, you don't want management to think of your top talent (and even your middle and low talent) as something they can harvest like a mineral or a tree.  They should be thought of as people with the same needs and motivators as any other person.

Fighting the urge to call people "resources" is hard.  The further you go up the chain, the harder it is to change this behavior.  The more disconnected you are from the people who would be considered resources, the harder it gets to see them as people.  Upper-level management, executives, VPs, even directors must focus on the bigger picture.  This is especially true in a larger company.

So why fight it?  Why not work with it?

What prompted this line of thought?  I saw The Lorax with my kids.  If you've not seen it, let's just say that it is very transparent about its underlying message:  unregulated corporate greed is killing the environment.  The main message is that if you overutilize your resources (in this case, the Truffula tree) your business will fail once the resource is depleted.  No more resources means no more product.

So as long as managers are aware of this and understand the concept of sustainability, the company will not fail (well, at least not due to depleted resources).  I'm quite alright with being called a resource as long as I'm not mined/logged/fracked into extinction (or near extinction).

This invites a lot of analogies.

Some resources are like a forest.  During critical times, a forest can be heavily logged to provide wood for fire, shelter, furniture, musical instruments, etc.  The problem is, if you take trees out of the forest as fast as you can, the forest doesn't have time to regrow.  Eventually you run out of trees.  If you don't take the time to plant new trees you will wipe out an entire ecosystem.

The analogy here is that new, talented employees might start with a whole forest full of ideas.  If we constantly harvest those ideas without taking the time to plant new ones or allow old ones to germinate and grow new ideas, our new talent will start showing diminishing returns.  100% utilization is not sustainable.  Creativity and new ideas need time to grow.

So go ahead and call people resources, as long as you understand that people and resources both need to be treated responsibly and used in a sustainable manner.