Shift8 Creative Graphic Design and Website Development

lithium

Getting Started with Minerva

Posted by Tom on Tue, Jul 19 2011 09:18:00

Minerva was only ever a few ideas I threw around. However, I noticed people were following it, so I wanted to write a short blog post to help people get started and see how it works. By no means should you attempt to use this for a production site. I mean, you could... But I'm telling you, don't. I don't want you to be disappointed when something doesn't work the way you'd expect or when you run into a problem that you can't figure out. Further, there's really no documentation. Lots of comments within the code... But no documentation.

That said, I would absolutely love for others to get involved and fork the CMS. I would love to hear ideas and feedback.

First, the requirements... You need to be running PHP 5.3+ and using MongoDB. I know some of you would love to use MySQL or some other database, but Minerva relies on the schemaless nature of MongoDB, so it's required. Sorry. You'll also need a few libraries from the community. Don't worry, we'll go through those below. Personally, I'm ok with these requirements. There are plenty of good CMSs out there that use MySQL and older versions of PHP, and it's not my intention to re-invent the wheel.

So, Minerva works as a plugin that can sit on top of any of your Lithium apps. With a working copy of Lithium, clone Minerva to your app/libraries folder. In fact, you may want to use git submodules. So really, you'd want to run something like this (you may need the -f flag to do this, by the way):

git submodule add git@github.com:tmaiaroto/minerva.git app/libraries/minerva

Then do the same for li3_access and li3_flash_message.

git submodule add git@github.com:tmaiaroto/li3_access.git app/libraries/li3_access
git submodule add code@dev.lithify.me:li3_flash_message.git app/libraries/li3_flash_message

After that, you would run two more submodule commands in order to get the files you need (note if you cloned Lithium from github, your lithium library directory should also be a submodule so you'd need to do this for that as well):

git submodule init
git submodule update 

If you would like Facebook support (the ability to login to the CMS with your Facebook user) then you'll also need the li3_facebook library.

git submodule add git@github.com:tmaiaroto/li3_facebook.git app/libraries/li3_facebook

Note: With the li3_facebook library, there is a submodule for the Facebook SDK within the library. So you will need to run a git submodule init and update within app/libraries/li3_facebook in order to retrieve it.

That's it. All the files will be ready to go. Of course, now you need to add the libraries within your app/config/bootstrap/libraries.php file.

Minerva is designed so that you don't need to touch any of its files that are cloned from the repository. You can pass your config options within your own libraries.php file when you call Libraries::add(). The order in which you add these libraries is going to matter a little bit. You need to have li3_access loaded before Minerva. This will work:

Libraries::add('li3_access');
Libraries::add('li3_flash_message');
Libraries::add('minerva');

That's basically it. Note that if you're using Facebook integration, you will not need to add the li3_facebook library yourself since the minerva library will do that for you. However, you will need to specify values for the 'appId' and 'secret' keys in the minerva library configuration under a 'facebook' key, i.e. Libraries::add('minerva', array('facebook' => array('appId' => 123, 'secret' => 123))); Some other handy configuration keys include 'connections', which is an array with 'production' and 'development' keys, each holding a normal connection settings array. There are also 'show_errors' and 'development_errors' keys, both with boolean values. More on this later.

Now you can go to yoursite.com/minerva. Actually, I use minerva.local, so I'll reference that from now on. Just replace that with whatever domain name you have set up. You'll see the welcome page. It has some copy, and some of the instructions in it are actually dated and wrong. It will tell you that you can edit files within the Minerva library in order to change templates... I am changing that to add another template path which would be under your main app directory. Again, this is so you don't need to touch any files within the minerva library. No git ignore, etc.

You likely want to see the backend though. You'll need a user account to do that. The first user account will become admin. No other users registering after that will have an admin role, so don't worry. I do plan to create an installer in the future. For now, go to: http://minerva.local/minerva/users/register

This will be the front-end registration form. It'll register the first user as an administrator role. So you can go back to: http://minerva.local/minerva/admin which will redirect you to login. 

Now you should be logged in and see the CMS backend. You can add users, pages, etc. However, you'll likely want to take a look at the minerva_blog library to see how pages can be hooked into and extended to create new types of pages. You'd add that after the minerva library, again, with Libraries::add('minerva_blog'); You could also add that as a submodule.

At this point, I think you get a general idea of the direction of the CMS. Keeping it modular is important, but dependency issues could arise. If you tried Minerva without having the li3_flash_message library, or configured it for use with Facebook without the li3_facebook library, then it would show you a warning message about missing dependencies. I hope to expand upon that, not just to maybe make it look a little nicer, but to allow all libraries to run through some sort of dependency check process.

Along with keeping things modular, the other goal is to keep things clean. "Minerva Plugins" we'll call them, which are really just libraries, interact with the CMS without touching any of the core code. So you add and remove them very easily while still being able to stay up to date with the latest version of Minerva. The goal is to ensure backward compatibility, but of course problems could crop up with that as well. I'd preferably like it to warn you before an upgrade and I would like to have upgrades from the CMS control panel. So, there's a lot to consider and a lot left to do.

There are really no test cases for the CMS and things are in a state of change. However, most of the foundation is pretty solid. I don't imagine I'll be changing how plugins work and how you can extend core models to add new schema, validation, etc. The template paths are pretty much solid now too. Again the idea is that you put your templates outside the minerva library. 

Feedback, questions, and pull requests always welcome! Hopefully I can put up a roadmap soon so I can take this into more of an alpha status.


As noted above, there are a few configuration options. These will grow more robust, but for now the important ones are for including Facebook and for debugging purposes. Here's a snippet from my app/config/bootstrap/libraries.php as an example (note: these are all optional settings, and the default database names are minerva and minerva_dev):

Libraries::add('li3_access');
Libraries::add('li3_flash_message');
Libraries::add('minerva', array(
    'connections' => array(
        'production' => array(
            'host' => 'localhost',
            'database' => 'minerva_cms'
        ),
        'development' => array(
            'host' => 'localhost',
            'database' => 'minerva_cms_dev'
        )
    ),
    'facebook' => array(
        'appId' => 'XXXXXX',
        'secret' => 'XXXXX'
    ),
    'show_errors' => true,
    'development_errors' => true
));
Libraries::add('minerva_blog');

To the connections array you can add all the same settings you would normally. So, if you were running MongoDB in replica set mode, you'd want to have a key 'replicaSet' set to true. You would also want your 'host' key to have a comma separated list of host names.

How About Just Reduce in Lithium?

Posted by Tom on Tue, May 24 2011 23:32:00

So after realizing my big problem with schema design for my little metrics example, I've changed the design a bit. Obviously if you store all this visitor data on one document, you're going to exceed the 4MB limit. So this was not just an example for me, this was to be actual code! Yikes! I'm glad I caught it a few hours later. So now imagine a schema where your "metrics" collection has a bunch of documents with all the same recorded information. Illustrated like so:

/* 0 */
{
  "_id": {
    "$oid": "4ddc78d27f8b9ada44000002"
  },
  "browser": "Firefox",
  "browser_ver": 4,
  "os": "Win7",
  "mobile": false,
  "lang": "en-us",
  "project_url": "example",
  "_key": "192.168.126.2@example"
}

/* 1 */
{
  "_id": {
    "$oid": "4ddc78d97f8b9ade44000004"
  },
  "browser": "Chrome",
  "browser_ver": 11,
  "os": "Win7",
  "mobile": false,
  "lang": "en-us",
  "project_url": "example",
  "_key": "192.168.126.3@example"
}

/* 2 */
{
  "_id": {
    "$oid": "4ddc78dd7f8b9adc44000001"
  },
  "browser": "Chrome",
  "browser_ver": 11,
  "os": "Win7",
  "mobile": false,
  "lang": "en-us",
  "project_url": "example",
  "_key": "192.168.126.1@example"
}

Again this example is specific for some of my needs. I still want unique metrics on a per project basis in the system. So the _key field has a unique index on it. That simply takes care of that, whereas before I was using the $set operator, now I just save without worry. The slight difference is this time there won't be any updating of information if the user came back with a different browser later on. Like before the metrics aren't skewed because each "unique" visitor only has one entry per page/project tracked. Also like before, the downside is this isn't truly unique...It's more of a "per household" or a "per office" situation.
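To make the unique-index idea concrete, here is a minimal in-memory sketch in JavaScript. It is not MongoDB code: the Map only stands in for the collection's unique index on _key, and the helper names makeKey and recordVisit are made up for illustration.

```javascript
// In-memory stand-in for a collection with a unique index on "_key".
// makeKey() and recordVisit() are hypothetical helper names.
const metrics = new Map();

function makeKey(ip, projectUrl) {
  // Same shape as the "_key" values in the documents above.
  return ip + '@' + projectUrl;
}

function recordVisit(ip, projectUrl, info) {
  const key = makeKey(ip, projectUrl);
  if (metrics.has(key)) {
    // A unique index would reject this insert, so the first entry wins.
    return false;
  }
  metrics.set(key, Object.assign({ project_url: projectUrl, _key: key }, info));
  return true;
}

const first = recordVisit('192.168.126.2', 'example', { browser: 'Firefox' });
const again = recordVisit('192.168.126.2', 'example', { browser: 'Chrome' });
console.log(first, again, metrics.size); // true false 1
```

The second call is dropped rather than updating the stored browser, which matches the trade-off described above.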

I now just have to make a very simple query to get all documents where the project_url is "example" in this case. This will eventually return a huge number of documents. Far too many to just pull up and loop using PHP each time I want an aggregate. So we don't need a map reduce now. We could use one, but it's overkill. We just want "half" of that...The reduce part. 

So we use the group() command and it kinda all works out to be similar in nature as the map/reduce. Again with Lithium, we want to get the connection using $db = ProjectMetric::connection(); and then we can run $db->connection->command() calls. Also don't forget the MongoCode class. We'll need that again for the reduce and finalize part of this.

Here's the juicy stuff.

$db = ProjectMetric::connection();

// We just need a reduce, not a map reduce in this case, it's actually a group() command
// We're also going to use a finalizer to format the data afterward.
$reduce = new \MongoCode("function(doc, prev) { ".
    // count browsers
    // first if the browser doesn't exist, set a key name for it with a value of 0
    "if(typeof(prev.browsers[doc.browser]) == 'undefined') {".
       "prev.browsers[doc.browser] = 0;".
    "}".
    // then increment
    "prev.browsers[doc.browser] += 1;".
    
    // count operating systems
    "if(typeof(prev.operating_systems[doc.os]) == 'undefined') {".
       "prev.operating_systems[doc.os] = 0;".
    "}".
    "prev.operating_systems[doc.os] += 1;".
    
    // count mobile devices... this is easier it's just one value
    "if(doc.mobile === true) {".
        "prev.mobile += 1;".
    "}".
    
    // count languages
    "if(typeof(prev.languages[doc.lang]) == 'undefined') {".
       "prev.languages[doc.lang] = 0;".
    "}".
    "prev.languages[doc.lang] += 1;".
"}");

// Make the output pretty, we don't want any spaces in key names for example...
// So we're taking those key names and setting them as values of a new "name" key.
// What if extract() was used or something? Those spaces would be an issue.
$finalize = new \MongoCode("function(prev) { ".
    // browsers
    "for (var x in prev.browsers) {".
        "prev.browsers.push({ name: x, count: prev.browsers[x] });".
        "delete prev.browsers[x];".
    "}".
    
    // operating systems
    "for (var x in prev.operating_systems) {".
        "prev.operating_systems.push({ name: x, count: prev.operating_systems[x] });".
        "delete prev.operating_systems[x];".
    "}".
    
    // languages
    "for (var x in prev.languages) {".
        "prev.languages.push({ name: x, count: prev.languages[x] });".
        "delete prev.languages[x];".
    "}".
    
    // mobile is different and is fine as is
    
    // return the data
    "return prev;".
"}");

// Run the command
$metrics = $db->connection->command(array(
    'group' => array(
        'ns' => 'project_metrics',
        'key' => 'project_url',
        'initial' => array('browsers' => array(), 'operating_systems' => array(), 'languages' => array(), 'mobile' => 0),
        '$reduce' => $reduce,
        'condition' => array('project_url' => $url),
        'finalize' => $finalize
    )
));

// Set the results, they'll be formatted how we want them to be
$results = $metrics['retval'][0];

// We can get unique visitors basically by the total number of documents pulled back
// because there is a unique index on the collection, It's on the "_key" field which
// takes the visitor ip and combines it with the "project_url" value. So there can't
// be duplicate entries so unique visitors is simply the total count.
$results['unique_visitors'] = $metrics['count'];

I commented it pretty well (honestly for my own benefit haha), but I'll briefly explain. The $db->connection->command(array('group'... part. We need to define "ns" which is the collection name, we also need to specify the "key" which in this case is the project url. This key is what to group the documents on.  So all documents with a "project_url" key of a given value are going to be grouped together. If we wanted to retrieve aggregate data for multiple URLs then this would come in handy. In this example, the "condition" key is going to kinda ruin the whole grouping thing because we're only going to return documents of the value we're grouping by. So one group always. Not several groups of data per URL.

The magic parts are the $reduce, initial, and finalize keys. The reduce is actually another reduce function using the MongoCode class, so that's very similar to map/reduce. Same goes for the finalize function. The initial key is new. That essentially sets some default values for us to use. These will persist, so as we loop we can easily keep adding on to them. In the most basic situation we have this "mobile" key set to 0. In the reduce function we're just incrementing it by one every time the mobile value from the document (also named "mobile") is true. Simple. The others are also just counters.
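If it helps to see the "initial" value and the reduce function working together outside of MongoDB, here is a rough imitation in plain JavaScript. It is only a sketch: the loop stands in for what group() does on the server, the bump() helper is my own addition, and the initial counters are plain objects rather than the arrays used in the PHP above.

```javascript
// Three documents shaped like the ones in the "metrics" collection above.
const docs = [
  { browser: 'Firefox', os: 'Win7', mobile: false, lang: 'en-us' },
  { browser: 'Chrome', os: 'Win7', mobile: false, lang: 'en-us' },
  { browser: 'Chrome', os: 'Win7', mobile: false, lang: 'en-us' }
];

// Hypothetical helper: create the counter on first sight, then increment it.
function bump(counts, name) {
  if (typeof counts[name] === 'undefined') {
    counts[name] = 0;
  }
  counts[name] += 1;
}

// Same signature as the reduce function passed to group(): (doc, prev).
function reduce(doc, prev) {
  bump(prev.browsers, doc.browser);
  bump(prev.operating_systems, doc.os);
  bump(prev.languages, doc.lang);
  if (doc.mobile === true) {
    prev.mobile += 1;
  }
}

// "initial" is the starting value of prev; it persists across every document.
const prev = { browsers: {}, operating_systems: {}, languages: {}, mobile: 0 };
docs.forEach(function (doc) {
  reduce(doc, prev);
});

console.log(prev.browsers); // { Firefox: 1, Chrome: 2 }
```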

The final results returned have array key names that have spaces in them. Ew. We don't want that in PHP. So I just made a simple finalize function that went through and moved the value (the count) to an array for each item to count that has a "count" key and a "name" key. The value for the "name" key of course can have spaces in it and there won't be any problems for PHP. It is possible to have spaces in the key names, but if we wanted to run extract() or something, there'd be issues.
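Here is the same name/count reshaping as a standalone JavaScript sketch. It is a variation on the finalize function above rather than a copy of it: it builds a fresh list instead of pushing onto the object it is looping over, and toNamedCounts is a helper name I made up. The sample counts are invented to show keys with spaces.

```javascript
// Hypothetical helper: turn { 'Mac OS X': 3 } into [{ name: 'Mac OS X', count: 3 }].
function toNamedCounts(counts) {
  const list = [];
  for (const name in counts) {
    if (Object.prototype.hasOwnProperty.call(counts, name)) {
      list.push({ name: name, count: counts[name] });
    }
  }
  return list;
}

// Same signature as the finalize function passed to group(): (prev).
function finalize(prev) {
  prev.browsers = toNamedCounts(prev.browsers);
  prev.operating_systems = toNamedCounts(prev.operating_systems);
  prev.languages = toNamedCounts(prev.languages);
  // "mobile" is a single number and is fine as is.
  return prev;
}

const result = finalize({
  browsers: { 'Internet Explorer': 2, Chrome: 1 },
  operating_systems: { 'Mac OS X': 3 },
  languages: { 'en-us': 3 },
  mobile: 0
});

console.log(result.browsers[0]); // { name: 'Internet Explorer', count: 2 }
```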

That's about it. Another example. I'm not sure exactly how practical these examples are, but they do show you how you can make these operations in PHP when using Lithium. Once I wrap my head around things a bit more, I may try to give back a little and take a stab at making methods within the MongoDb adapter for these things.

On a final note. The group() command is designed to be executed in "real time" whereas map/reduce is more of a background task. So, back to practicality. This may still not serve my needs as the collection grows because the query could end up taking a long time. At which point I guess I'll have another map/reduce example to show for.

Map/Reduce in Lithium for Visitor Metrics

Posted by Tom on Tue, May 24 2011 11:01:00

Update: So after I got a little further along with this very example in a real life project I realized that while it makes for a very simplistic illustration of map/reduce (that I personally found helpful when learning how to perform map/reduce), it is not a very good real life example. The reason being... the document size limit in MongoDB. Doh! You couldn't store metrics like this. However, ignoring the purpose of this, you can still continue reading about how to perform a map/reduce within Lithium.

Original Blog Entry

I'll start off by saying I love MongoDB and map/reduce after putting it off for some time. I dreaded learning the map reduce functions big time. It turns out, it's not that bad. A friend asked me to explain it in 10 words or less. So I did. It's not really all-encompassing of the features, but it's a really good example of what map/reduce can do for you.

Use JavaScript to identify/"map" data to loop it to aggregate/"reduce."

Ok, so that's 12 words technically; I cheated by adding slashes and combining two words. It's also really poor grammar. Anyway, that's the idea. In this example, I wanted to collect information about visitors on a web app. Obviously I'm not a masochist, I'd use Google Analytics if I could... Sadly, I could not. So what to do? Well, we can use MongoDB to record all this data and then use map/reduce to get some totals.

I may eventually turn this into a Lithium library (especially because there's a good browscap and language detection class that I'm not illustrating here), but for now I'm going over things at a high level and focusing on the actual map/reduce process.

That said, imagine a data set like this:

"metrics": {
    "pageviews": 63,
    "visitors": {
      "192-168-126-1": {
        "ip_address": "192.168.126.1",
        "browser": "Chrome",
        "browser_major_version": 11,
        "operating_system": "Win7",
        "mobile_device": false,
        "primary_language": "en-us"
      },
      "192-168-126-2": {
        "ip_address": "192.168.126.2",
        "browser": "Firefox",
        "browser_major_version": 4,
        "operating_system": "Win7",
        "mobile_device": false,
        "primary_language": "en-us"
      },
      "192-168-126-3": {
        "ip_address": "192.168.126.3",
        "browser": "Chrome",
        "browser_major_version": 11,
        "operating_system": "Win7",
        "mobile_device": false,
        "primary_language": "en-us"
      }
    }
}

Now, we have this "metrics" field wherever you like, but in my case it's on a document that contains some other information. Why not a separate "metrics" collection? We could do that, and then we could also put in things like page URLs that were hit on the site to start getting analytic information about the pages on our site. In my case, I just wanted to get a sense for some high level information about my visitors. For now.

So the first thing here that you'll notice (and I've written about the $set operator before) is that each IP address is the key for each entry. The dots have been replaced with dashes so that it works as a key. Otherwise, I'd have a pretty deep object on my hands.
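The dot-to-dash swap matters because MongoDB reads dots in a field path as nesting. A tiny JavaScript sketch of the idea follows; ipToKey and visitorPath are made-up helper names, and this only handles IPv4 addresses like the ones in the example.

```javascript
// Hypothetical helper: make an IPv4 address safe to use as a field name.
function ipToKey(ip) {
  return ip.split('.').join('-');
}

// Hypothetical helper: the field path a $set on one visitor entry would target.
function visitorPath(ip) {
  return 'metrics.visitors.' + ipToKey(ip);
}

console.log(ipToKey('192.168.126.1'));     // 192-168-126-1
console.log(visitorPath('192.168.126.1')); // metrics.visitors.192-168-126-1
```

Without the swap, the path would read as metrics.visitors.192.168.126.1, which is four levels of nesting instead of one key.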

So each time a page is loaded the pageviews count goes up and the visitor's browser information is captured using $set so that if the user from the same IP address came back again with a different browser, it would update. My metrics would not be skewed. Yes, it's sad that we don't realize when/if the user actually uses two different browsers...More sad that we're likely counting entire office buildings as one user, but that's just how the cookie crumbles in this case.

Ok, so we have that data and we have some controller action in our Lithium project that's going to return to us an array that we'll pass to the view template to make some pretty pie charts. Why not pie charts? I love pie charts, they give everyone a sense of satisfaction that looking at numbers is really fun! ...Or something like that.

We'll dive right in. Here's the entire action I'm using with the map/reduce code. Note that Lithium's MongoDb adapter does not have any options for map/reduce in the find() or any other method. I may write something in the future for that myself if I end up doing enough of these (and I likely will). However, we can make straight up command() calls from it.

 public function metrics($url=null) {
        if(empty($url)) {
            return false;
        }
        
        $db = Project::connection();
        
        // construct map and reduce functions
        $map = new \MongoCode("function() { ".
            "emit(this.metrics.visitors, this.metrics.visitors);".
        "}");
        
        $reduce = new \MongoCode("function(k, vals) { ".
            "var visitors = vals[0];".
            "var unique_visitors = 0;".
            "var b_counts = new Array();".
            "var browsers = new Array();".
            "var os_counts = new Array();".
            "var operating_systems = new Array();".
            "var mobile_devices = 0;".
            "var ln_counts = new Array();".
            "var languages = new Array();".
            
            // loop all the emitted visitor metrics to aggregate some data
            "for (var i in visitors) {".
                // count browsers
                "if(typeof(b_counts[visitors[i].browser]) == 'undefined') {".
                    "b_counts[visitors[i].browser] = 0;".
                "}".
                "b_counts[visitors[i].browser] += 1;".
                
                // count operating systems
                "if(typeof(os_counts[visitors[i].operating_system]) == 'undefined') {".
                    "os_counts[visitors[i].operating_system] = 0;".
                "}".
                "os_counts[visitors[i].operating_system] += 1;".
                
                // count the primary languages
                "if(typeof(ln_counts[visitors[i].primary_language]) == 'undefined') {".
                    "ln_counts[visitors[i].primary_language] = 0;".
                "}".
                "ln_counts[visitors[i].primary_language] += 1;".
                
                // count the number of mobile devices
                "if(visitors[i].mobile_device == true) {".
                    "mobile_devices += 1;".
                "}".
                
                // count the number of unique visitors
                "unique_visitors += 1;".
            "}".
            
            // loop browsers counted and set for output
            "for (var x in b_counts) {".
                "browsers.push({ name: x, count: b_counts[x] });".
            "}".
            
            // loop operating systems counted and set for output
            "for (var x in os_counts) {".
                "operating_systems.push({ name: x, count: os_counts[x] });".
            "}".
            
            // loop languages counted and set for output
            "for (var x in ln_counts) {".
                "languages.push({ name: x, count: ln_counts[x] });".
            "}".
            
            // return the output
            "return { 'browsers': browsers, 'operating_systems': operating_systems, 'languages': languages, 'mobile_devices' : mobile_devices, 'unique_visitors': unique_visitors }; }");
        
        $metrics = $db->connection->command(array(
            'mapreduce' => 'projects', 
            'map' => $map,
            'reduce' => $reduce,
            'out' => array('merge' => 'mapReduceMetrics')
        ));
        
        $cursor = $db->connection->selectCollection($metrics['result'])->find()->limit(1);
        foreach ($cursor as $doc) {
            $results = $doc['value'];
        }
        
        // Get the total page views for this project
        $pageviews = Project::find('first', array('fields' => array('metrics.pageviews'), 'conditions' => array('url' => $url)));
        $results['pageviews'] = $pageviews->data('metrics.pageviews');
        
        return $results;
    }

Yeah, it's not the prettiest to look at. It's my first run through and it's literally based off an example from php.net, so that's why there are all those lines concatenated together like that. I wouldn't normally do that. Nor would I use heredoc... But something a little nicer, at least single quotes instead of double. Anyway, with that you will be returned a nice array (in $results) that will show all the counts for browsers and such. Note, I did not take into account the browser major versions here in this example. Also note that I separately stored a pageview count on the document which does not require a map/reduce to retrieve.

Now let's look at it deeper. There are a lot of good articles on map/reduce, and if you spend time with them, they should be pretty clear. Here is a good one. Then you can also look at the MongoDB Cookbook site's example. Also php.net's example. You'll see that you can use map/reduce for many things. Let's go over how I'm using it.

First, the map function. Pretty simple. In fact, you likely wouldn't do what I'm doing here. The idea of it is to basically grab keys and values for a given collection. Those keys should be unique. So in my case metrics.visitors are unique keys. They are also the values that I need. What this does is returns the values to a reduce function.

The reduce function. More complex, but it's all nice friendly JavaScript. Here you're just looping the values that are passed and simply counting some of them. As a disclaimer, my example could have probably been written a lot better and cleaner. I only loop once which is what I was concerned about mainly. The rest can be refactored later.
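For comparison, here is a more conventional map/reduce shape as a standalone JavaScript sketch. It is not the code from the action above and it does not talk to MongoDB: the mapReduce() runner is a made-up stand-in for the server, and emit is passed in as an argument here, whereas in MongoDB the map function uses this and a global emit(). It only counts browsers.

```javascript
// Visitor entries shaped like the "metrics.visitors" object above.
const visitors = {
  '192-168-126-1': { browser: 'Chrome', operating_system: 'Win7' },
  '192-168-126-2': { browser: 'Firefox', operating_system: 'Win7' },
  '192-168-126-3': { browser: 'Chrome', operating_system: 'Win7' }
};

// "map": emit one (key, value) pair per visitor.
function map(visitor, emit) {
  emit(visitor.browser, 1);
}

// "reduce": collapse all values emitted for one key into a single value.
function reduce(key, values) {
  return values.reduce(function (sum, n) { return sum + n; }, 0);
}

// Hypothetical stand-in for the server: group emitted values by key, then reduce.
function mapReduce(items, mapFn, reduceFn) {
  const groups = {};
  items.forEach(function (item) {
    mapFn(item, function (key, value) {
      if (!groups[key]) {
        groups[key] = [];
      }
      groups[key].push(value);
    });
  });
  const out = {};
  Object.keys(groups).forEach(function (key) {
    out[key] = reduceFn(key, groups[key]);
  });
  return out;
}

const browserCounts = mapReduce(Object.values(visitors), map, reduce);
console.log(browserCounts); // { Chrome: 2, Firefox: 1 }
```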

At the end of whatever you decide to do with all that data, you'll return your values. I'm returning an object here with all the counts. Here's what PHP gets back in $results:

array
  'browsers' => 
    array
      0 => 
        array
          'name' => string 'Chrome' (length=6)
          'count' => float 2
      1 => 
        array
          'name' => string 'Firefox' (length=7)
          'count' => float 1
  'operating_systems' => 
    array
      0 => 
        array
          'name' => string 'Win7' (length=4)
          'count' => float 3
  'languages' => 
    array
      0 => 
        array
          'name' => string 'en-us' (length=5)
          'count' => float 3
  'mobile_devices' => float 0
  'unique_visitors' => float 3
  'pageviews' => int 256

...And there ya have it. What I would do next is actually cache this data so each time I called the action, it didn't have to run the map reduce which could be quite expensive over time with a lot of data.
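The caching idea could look something like this small time-based cache, sketched in JavaScript. Everything here is an assumption for illustration: makeCache, the 60 second lifetime, and the fake clock are mine, and in a real Lithium app you would more likely reach for the framework's own cache layer.

```javascript
// Hypothetical helper: remember a computed value per key for ttlMs milliseconds.
function makeCache(ttlMs, now) {
  now = now || Date.now;
  const entries = new Map();
  return function cached(key, compute) {
    const hit = entries.get(key);
    if (hit && now() - hit.at < ttlMs) {
      return hit.value; // still fresh, skip the expensive work
    }
    const value = compute();
    entries.set(key, { at: now(), value: value });
    return value;
  };
}

// A fake clock so the example does not need to wait for real time to pass.
let clock = 0;
let runs = 0;
const cached = makeCache(60000, function () { return clock; });

// Stand-in for the expensive map/reduce call.
function expensive() {
  runs += 1;
  return { unique_visitors: 3 };
}

cached('example', expensive); // computes
cached('example', expensive); // served from cache
clock = 61000;                // more than 60 seconds later
cached('example', expensive); // computes again

console.log(runs); // 2
```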

Cool note: In this example you see the $metrics = $db->connection->command(...) part? Run a var_dump() on $metrics. It will have some handy information for you. It could tell you about an error when it comes to parsing your functions (though I'm not sure how to actually debug things, sorry). It also will tell you if everything was ok and ran successfully. You may wish to check this before returning data. It's on my to do list myself. Also, it will show you how long the operation took which is very handy. You might need/want to index some fields and cache results based on how long things are taking.
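Checking the reply before using it could be sketched like this in JavaScript. The sample object is hand-written to resemble what a map/reduce command reply looks like (an ok flag, an errmsg on failure, the output collection name and the time taken); treat those field names as assumptions and compare them with your own var_dump() output. checkCommandResult is a made-up helper name.

```javascript
// Hypothetical helper: fail loudly unless the command reported success.
function checkCommandResult(result) {
  if (!result || Number(result.ok) !== 1) {
    const reason = result && result.errmsg ? result.errmsg : 'unknown error';
    throw new Error('map/reduce failed: ' + reason);
  }
  return { collection: result.result, millis: result.timeMillis };
}

// Hand-written sample replies, not captured from a real server.
const sampleOk = { result: 'mapReduceMetrics', timeMillis: 12, ok: 1 };
const sampleBad = { errmsg: 'could not parse reduce function', ok: 0 };

const info = checkCommandResult(sampleOk);
console.log(info.collection, info.millis); // mapReduceMetrics 12

let failure = null;
try {
  checkCommandResult(sampleBad);
} catch (err) {
  failure = err.message;
}
console.log(failure); // map/reduce failed: could not parse reduce function
```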

Another note: With map/reduce you're actually outputting to a collection. So you're going to pick up your results with another query to that (temporary or not so temporary) collection. This changed in MongoDB version 1.8.0. You now have to specify that 'out' key in the command() call. Here's more information on that.

Hopefully these snippets will be of some help to people. I didn't want to go too far in depth with explaining everything, I think there's other really good articles on that out there. My hope is that seeing an example, as it works within the Lithium framework, will be helpful.

Family Spoon and MongoDB

Posted by Tom on Fri, Mar 25 2011 08:15:00

I recently soft-launched my personal project, Family Spoon. You can go to it and use it, it works. It's a recipe sharing website for you, your family and friends. Basically it allows you to create recipes and share them (or keep them private). There's a bit more to it and the site will continue to grow over time. However, with just the sharing there were some considerable hurdles to overcome.

Database Schema
The juicy schema details? Look at the bottom for an example document structure, but for readability I'm not going to paste that here. I'll briefly go over the architecture a little bit. The site was developed in PHP using the Lithium framework. The use of MongoDB was critical to the way in which the site was built, especially for speed reasons. Each "recipe" on the site actually sits in a "page" collection in MongoDB. It's saved there by a "Page" model. Nearly all pages (except static pages and other form pages) are treated this way on the site. Built but not yet active is a "blog" section for the site, and that also uses this Page model and collection. Obviously, a recipe and a blog entry have very different information. It's due to MongoDB's schemaless design that these two "ideas," which are really the same thing (they are web pages with content), can exist in the same space. Both have titles, but one has ingredients and the other some body copy (and yes, some more fields). So right there that saved a ton of time. You need to be aware of your database schema, but you don't need to meticulously design (and overdesign) it and ensure it's properly normalized, etc. I can now, worry free, add a new model that extends the Page model to add a completely new section to the site and I know it's going to be stored in the same place in the same manner. Done.

In the past? With something like MySQL I would have many, many tables to ensure I properly optimized and normalized things. Then when I wanted to add a new section, it would mean either adjusting existing tables or, if I did a really, really good job, adding at least one more table for the new section. Now, what happens if (and I will) in the future I need to add more than just prep time, cook time, etc. on the recipe document? No problem! I can easily stick in another field and not even think twice. I can also remove them. Again, with other databases, I'd have to watch and worry about schema changes. It would take much longer to not only build a site, but also change it. MongoDB makes maintaining a database for a site easy.

Furthermore, and leading into the next section, there are data types. Along with database schema, we would traditionally have to think about exactly what kind of data was stored in these tables. Reserving too much is a bad thing and we're sworn off from things like BLOB, so you have to play this delicate planning game. Am I really going to have a user that has a recipe title more than 255 characters? Nah, no way. Really? Oh, well they can't. They just can't. The user will deal with it. The user will live. Eh... ok... But not great. We had to draw the line in the past, but no more! (except for maybe 4MB for now)

The other big thing here with schema and data types is tagging. We love to tag these days. Photos, articles, people, etc. Now, recipes. Family Spoon allows you to tag your recipes on the site in order to be able to find them more easily in the future. These are custom user entered phrases that get stored as an array in the database. Before, we would immediately create a "tags" table to ensure the same word wasn't repeated God forbid. We would join a million times and a week later, we'd have our results. Ok, with memcached back seat driving we'd get there faster. So with MongoDB, we just store those tags in the natural manner we would expect. Again, done and done.

Searching & Filtering
I'll focus on permissions and search (and filtering) next. First, search. The site features a search box at the top of each page and depending on what area you're in, it will search for different things (the search button copy changes to indicate this too). These are regex searches on the database. It was the use of MongoDB that allowed for this whereas FULLTEXT searches with MySQL wouldn't really work so well. Then again we have problems with searching tags and all those JOINs. No good.

So MongoDB is allowing Family Spoon to run a regex search on recipe titles and tags. It's extremely easy to adjust this to search for other things...Say, ingredients?? Then perhaps filter by ingredient? So let's take a detour. Say we want to add a feature to the site that shows a user only recipes that do not contain shrimp because they have an allergic reaction to that food. Done. It's extremely simple to build this query and have it working on the site. A query for MySQL would work too, with WHERE, NOT IN() and all that good jazz. Sure, but it's not just about the query. It's about being able to handle that tall order. If I were using MySQL for Family Spoon and wanted to show people recipes that did not contain shrimp...I would have to put all sorts of caching strategies in place and, if the feature were popular enough, might need additional hardware to support the increased load. In fact, I wouldn't do it. I'd use a search engine like Solr. Ok, so now something else to set up and configure. Yes, we know how to do it, but time, time, time.
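
Here's a minimal sketch of that shrimp filter. Both function names are made up, and it assumes each ingredient is stored as an embedded document with an "ingredient" key, like in the example document at the end of this post. Note that $nin only excludes exact name matches, so "jumbo shrimp" would still slip through unless you normalize ingredient names first.

```php
<?php
// Hypothetical helper: conditions for public recipes that contain none of
// the given ingredients (exact name matches only).
function withoutIngredients(array $excluded)
{
    return array(
        'page_type' => 'recipe',
        'public' => true,
        // Dot notation reaches into each embedded ingredient document.
        'ingredients.ingredient' => array('$nin' => array_values($excluded)),
    );
}

// The same rule applied in PHP to a document that was already loaded.
function isFreeOf(array $recipe, array $excluded)
{
    foreach ($recipe['ingredients'] as $item) {
        if (in_array($item['ingredient'], $excluded, true)) {
            return false;
        }
    }
    return true;
}

$recipe = array('ingredients' => array(
    array('ingredient' => 'boneless chicken breasts'),
    array('ingredient' => 'milk'),
    array('ingredient' => 'salt'),
));

$conditions = withoutIngredients(array('shrimp'));
var_dump(isFreeOf($recipe, array('shrimp'))); // bool(true)
var_dump(isFreeOf($recipe, array('milk')));   // bool(false)
```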

Back to regex searches. Again, you simply wouldn't do it with MySQL. You'd set up a search engine. I may eventually need one for Family Spoon (for things like weighting), but right now, no. There's no additional search engine. It's just running queries on MongoDB. We're also filtering. We're going to roll right into permissions here with that.

Permissions & Access
So, we have filtering in addition to search because we can show "your" recipes, "public" recipes, or "your family" recipes. This is determined by the access that each recipe gets assigned by its creator. So in addition to searching, we're narrowing the search to specific recipes. While searching for public recipes, we mean only return recipes that are also flagged as being shared with the public. When searching your own, it's obviously the ones where the owner id is yours. When searching or accessing your family recipes...This is a little different. We now have to determine exactly which recipes you have permission to see.

Hey remember that ACL thing in MySQL? Haha, yea you know the one where you have all those tables and numbers and JOINs? Kiss that one bye bye. The access rules are extremely simple for Family Spoon and without some awkward tree table to screw up, they're also more reliable. It's a faster query too. In fact, most of the time just from loading the recipe page document you have all the information you need to determine access. So why would we want to then jump through a bunch of hoops to determine access? Be efficient...And should access be allowed, you also now have the data to display. One query. The user's data is cached in the session. So... What more can I say? Yes, there's a bit more to it and not all situations allow for that. Family Spoon also uses Facebook and you can share recipes with your Facebook friends. So that's another call to somewhere else, but for the most part, we aren't really talking about a lot of strain on the database.

Family Spoon can get a bit more specific too. You can share recipes with specific individuals. Again, an array of ids is stored and a simple $in takes care of it. Or, returning the entire field and using PHP's in_array() function takes care of it. How many people can a recipe be shared with? How many ids could be in there? A lot less than the number of rows that would be in ARO and ACO tables. So, we're efficient too when using MongoDB. Not just fast, but efficient.
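
As a rough illustration of both approaches, here's a sketch using the field names from the example document at the end of this post. The function names are made up, and I'm assuming "owner_ids" is the array of ids a recipe is shared with. The real access rules on the site have more to them than this.

```php
<?php
// Hypothetical access check on a recipe document that was already loaded.
function canView(array $recipe, $userId)
{
    if (!empty($recipe['public'])) {
        return true; // shared with everyone
    }
    if (isset($recipe['owner_id']) && $recipe['owner_id'] === $userId) {
        return true; // the creator can always see it
    }
    $sharedWith = isset($recipe['owner_ids']) ? $recipe['owner_ids'] : array();
    return in_array($userId, $sharedWith, true);
}

// Query-side equivalent for listing the recipes a user may see.
function visibleToConditions($userId)
{
    return array('$or' => array(
        array('public' => true),
        array('owner_ids' => array('$in' => array($userId))),
    ));
}

$recipe = array(
    'public' => false,
    'owner_id' => 'xxxxx',
    'owner_ids' => array('xxxxx', '123456'),
);

var_dump(canView($recipe, '123456')); // bool(true)
var_dump(canView($recipe, '999'));    // bool(false)
```

The nice part is that canView() needs nothing beyond the document you already fetched to display the page.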

The really "rapid" part that MongoDB helps us with is when it comes back to the code. Building forms and saving the data down to the collection is far easier than it would otherwise be with an ACL system inside a relational database. There's less being written to the database and less code to do it. Sure, your framework could give you a wonderful API for working with permissions, but the minute you need to deviate from that, you could be opening a can of worms.

Ratings & Voting
Along the same lines is something like ratings. On recipes we can have a star rating. Anything that we need to tally and aggregate, or even increment, MongoDB makes easy. So ratings alone aren't a show stopper for any database or system, but when you then say, "Ah yes, but I only want each person, user, IP, to vote one time only." Then you have to put your thinking cap on. You're going to end up with some more JOINs... Also, more code. You're going to have to make the query and check to ensure that some person isn't trying to vote twice. With MongoDB, you can use the $set operator and simply keep adding these ratings to a field, making the key equal to the user id or IP and the value their rating. If they vote again, they aren't adding a new rating, they are simply updating their own. Ta da. Now we've reduced the number of tables, data, and code required to set up a rating system. I've actually posted another blog entry about this here.

Conclusion
So, for many reasons, using MongoDB has made building Family Spoon a much quicker process. Trust me, I actually did it both ways. Family Spoon has been completely re-built. While the other version never really appeared out in the wild, it did exist (and was online). Previously, Family Spoon was built using the CakePHP framework with MySQL for the database. So I can most definitely tell you the differences in the amount of code and time planning between the two versions. I rebuilt the site much faster not just because I knew what I was going to do the second time around, but also because I had less to think about when it came to the database schema. Yes, you need to be aware of "schema" and you can't go hog wild, but you also get more forgiveness and MongoDB works with you to solve your problems. It's very flexible. It's not something that you need to work around, it's something that you get to work with. Anytime that you have a situation like that as a developer, your day is going to be much happier and more productive.

Do I hate MySQL? No. Definitely not. Years and years' worth of use and relationship aren't easily erased. It's comfortable and it's familiar and just fine. In fact, it's still preferable under certain circumstances. Just not mine. Let's make this clear: I am the only person developing Family Spoon. Just one developer and, without using MongoDB and the Lithium framework, I can tell you that it would have taken me a lot longer to get the site online. It also would have taken longer to provide some of the advanced features that people will be looking for (filtering, searching, etc.). Both MongoDB and Lithium not only helped me get the project up and working rapidly (with all the core functionality that I needed), but these technologies will also serve me into the future by letting me rapidly grow the site and add new features.

Last, so this didn't interrupt your reading pleasure, this is an example document from the database. It's just an example and not complete, but I wanted to highlight the schema and how things like tagging and ingredients worked. Each ingredient is broken out and that's going to go a long way for filtering. Filtering without JOINs.

{
  "_id": "4d6564cb9bae6c1066000000",
  "created": "Wed, 23 Feb 2011 11:49:31 GMT -08:00",
  "directions": "Place the chicken in the crockpot. ...",
  "ingredients": [
    {
      "ingredient": "boneless chicken breasts",
      "quantity": "3",
      "measurement": "lbs"
    },
    {
      "ingredient": "milk",
      "quantity": "1",
      "measurement": "cup"
    },
    {
      "ingredient": "salt"
    }
  ],
  "modified": "Fri, 11 Mar 2011 09:46:31 GMT -08:00",
  "owner_id": "xxxxx",
  "owner_ids": [
    "xxxxx",
    "123456"
  ],
  "page_type": "recipe",
  "public": true,
  "public_rating": {
    "127-0-0-1": "4",
    "255-255-255-0": "5"
  },
  "published": true,
  "serves": "",
  "share_with_friends": false,
  "tags": [
    "chicken",
    "crockpot",
    "broccoli"
  ],
  "time": {
    "prep": {
      "amount": "15",
      "unit": "minutes"
    },
    "cook": {
      "amount": "4-6",
      "unit": "hours"
    }
  },
  "title": "Crockpot Broccoli Chicken",
  "url": "crockpot-broccoli-chicken"
}

Creating a Simple and Efficient Rating System with MongoDB

Posted by Tom on Fri, Feb 18 2011 14:51:00

I wrote a previous article on using MongoDB's non-locking updates and its $inc operator in order to allow for a simple counter that would create minimal impact on your database. Related to that, let's suppose we want to make a rating system. Say it's a star rating; however, it can be any kind of voting or polling system really.

So here's the trick. You're storing this star rating on a piece of content (a document) and you already queried that so you can display the data on the page. We're in NoSQL land so we don't use JOINs and we really don't want to query another "rating" collection of some sort. There's no point. So we keep under this document a "ratings" array. Now when you load the document you can simply count and divide to get your average with the language of your choice. You could also make another count() query to MongoDB but again we want one query.

So how do you store ratings? You want to do a non-locking update to the database and, much like incrementing, we want the impact on the database to be low. It's not that MongoDB isn't an amazing database that is more than capable of handling these updates, even if you need to make a read first, but we want to be efficient. Let's assume we're making millions of queries and this is all adding up to save us more than a few pennies in hosting costs.

Ok, so let's take a look at the $set operator. It's a simple one, nothing magical. It's just some cleverness with what we are setting. Under this "ratings" array we're going to set each user's IP address (substitute a user id if you have some sort of authentication system you'd like to rely on instead) as a key, with their rating as the value.

$set will set the value or update it. So you don't have to worry about repeat voters! They simply will change their vote and not skew your results. If you don't want the users to be able to vote twice then you can simply disable the link to vote on the page by looking for the IP in the array of ratings. Which, yes, is a simple deterrent and the user could technically lift the URL to hit to make the request to vote again...But again, your results aren't skewed and if you need further protection you can come up with something else. 

Let's apply this to my favorite framework here, Lithium. Here's what your query might look like:

// $rating is the submitted star value; $some_id is the _id of the document being rated.
$query = array(
  '$set' => array('rating.' . str_replace('.', '-', $_SERVER['REMOTE_ADDR']) => $rating)
);

$conditions = array('_id' => $some_id);

$result = Page::update($query, $conditions, array('atomic' => false));

Do you see the gotcha? It's the IP address. You can't have a key name with dots in it. When passed to MongoDB it's going to translate to object hierarchy. So you'd end up with something like: "rating" : { "192" : { "168" : { "126" : { "1" : "3" } } } } ... No good. So if we replace the dots with dashes or underscores, or hash it, etc., then you'll end up with something like this instead: "rating" : { "192-168-126-1" : "5", "192-168-126-2" : "3" }
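
If you'd rather hash the key than swap characters, a small helper can cover both options. This is only a sketch: the function name, the 1 to 5 star range, and the choice of md5() are assumptions for illustration, and it stores the rating as an integer rather than a string.

```php
<?php
// Hypothetical helper: builds the $set update document for one vote.
// $voter is an IP address or user id; $hash picks md5() over dash replacement.
function ratingUpdate($voter, $stars, $hash = false)
{
    $stars = (int) $stars;
    if ($stars < 1 || $stars > 5) {
        // Assumed range for a five star widget.
        throw new InvalidArgumentException('Rating must be between 1 and 5.');
    }
    // Either way the key ends up without dots, so it stays a single field name.
    $key = $hash ? md5($voter) : str_replace('.', '-', $voter);

    return array('$set' => array('rating.' . $key => $stars));
}

$update = ratingUpdate('192.168.126.1', '5');
// $update is array('$set' => array('rating.192-168-126-1' => 5))
```

Hashing has the small side benefit of not storing raw IP addresses on the document, at the cost of keys you can't read at a glance.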

Now if the same IP address voted again it would simply change the value in the "rating" array on the document. This way a user couldn't skew the rating by constantly clicking on a star rating widget, submitting a form, etc. Of course you can further limit the impact to your database and server by setting a cookie.

Now it's just simple math to get the average. If you're using PHP, you'll run a count() on the ratings array and then add up the values and divide. Simple!
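
In PHP that math might look something like this. The function name is made up and the sample values are just illustrative ratings keyed by dashed IP addresses.

```php
<?php
// Average a map of voter key => rating, as loaded from the document.
function averageRating(array $ratings)
{
    if (count($ratings) === 0) {
        return null; // no votes yet, avoid dividing by zero
    }
    // array_sum() treats numeric strings such as "4" as numbers.
    return array_sum($ratings) / count($ratings);
}

$ratings = array('127-0-0-1' => '4', '255-255-255-0' => '5');
echo averageRating($ratings); // 4.5
```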

What would you have to do alternatively? Well, you might store another field with all IPs or user ids that voted in addition to a rating field... But now you've disassociated users from their votes. You could also store a different field for each option and then use $addToSet to add the IP to each option. Great, but then you allow a user to vote once for each option. You won't be completely skewed, but you wouldn't be as accurate. What about MySQL and relational database land? Well, you probably don't need to hear about all the ways you can use JOIN and how many rows you'll be scanning through to get your results and what kind of indexes you need to make sure you build.

Hope you found this little schema example useful in a schemaless world.

Minerva: Closer and Closer

Posted by Tom on Tue, Feb 15 2011 21:49:00

Minerva, my CMS built using the Lithium framework for PHP 5.3+, is really coming along. Or at least I think it is... I want to keep increasing awareness of it, but I don't want anyone getting the wrong idea. It's still just a starting point for a CMS and not actually a finished and usable product. However, it is getting pretty close.

I just hope that enough time goes by to give people the chance to weigh in and give me their feedback and thoughts before I get too far along and commit to something irreversible (or something difficult to reverse). I've used many CMSs over the years, far too many to list (or remember), so I've drawn from that. I've added what I believe are the best parts of the CMSs I've seen and used...I've also simplified.

I truly believe there is no such thing as a 100% single serving CMS. People choose different CMSs for different reasons and to achieve different goals. Then you have to edit them and either code or find add-ons in order to make them complete...That's just for those who actually use a readily available open source CMS. There are also the custom applications that get built. I've been there too. I've done it all and, in the end, it's all just a good waste of time. I'm hoping to strike a middle ground between a jam packed fully featured CMS and a collection of a few classes that you put together using a framework. Using the Lithium framework, I think there's a flexibility for the CMS that can't be found elsewhere.

If you aren't familiar or haven't read a blog post of mine in the past about it...It's a CMS for developers. It's not for just anyone to grab and use and I doubt there'd ever be a "for dummies" book out there for it. That said, I hope that I'm not the only one that will benefit from it, though I definitely think I will benefit from it so that means it has some value.

I've revamped it a bit and consolidated a lot of code. It has an admin interface and a system for adding templates to help you build out the front end as well as gracefully modify the admin interface. There's also access control, so technically it has most (if not all) of the features you'd want in a CMS except some of the "friendly" things like a bunch of helpers and tinyMCE. Again though, it's under heavy construction.

Give it a look on GitHub and clone it. It's usable to the degree that you will get the idea and understand the general direction of it. I'm really hoping to get some feedback...Otherwise maybe it'll only end up being of value to me, but I hope not. Speak now or forever hold your peace. :)

Auto Tagging Content

Posted by Tom on Fri, Jan 21 2011 11:07:00

It's 2011, and I firmly believe that we should have systems smart enough to automatically tag our content. Our tweets, our Facebook wall posts, our blog posts, etc. There are even face detection APIs available out there. So surely getting context and keywords from copy should be easy, right?

There are a few major players in this arena and sure enough there exist some free APIs for getting keywords and context from copy. Reuters' OpenCalais, Alchemy, and Yahoo's term extractor are all great services to use. However, how current are they?

I have a new project where I want to auto tag content (I also plan to add auto-tagging to my Minerva CMS), but there are a few issues. First, the content is going to be very, very recent so these services may not have had a chance to pick up some of the terms. For example, "iPad" doesn't come back when using the Alchemy service. It doesn't know that "iPad", like "iPhone" (which does come back as a hit), belongs in the Technology category and should be a keyword. Obviously this will change as time goes on, but as of Jan 2011, it's apparently not in their index. Yahoo's keyword extractor does seem to pick up "iPad" interestingly enough. Maybe you have to use multiple services for the best coverage.

So since you've found my site, you may or may not know that I'm in love with PHP, the Lithium framework, and MongoDB. Well, I am. So I took the Alchemy API for PHP and converted it into a Lithium library. Basically this involved namespacing the classes, breaking them out into separate files, and making sure they could depend on each other via "use." I then added two methods in the API classes (there's a CURL class and a "normal" one) that help out a bit. One grabs config information set when you call Libraries::add() so that you don't need to set your API key every time you go to make a call, and the other method is just for convenience so you can call the API statically. It's really not completely static of course. I didn't go through and re-write the API, it's just a static wrapper. Now you can simply do something like:

use alchemy\AlchemyAPI;
$data = AlchemyAPI::call('TextGetRankedKeywords', array('The iPad will feature a slot for reading SD cards unlike the iPhone', 'json'));
var_dump(json_decode($data));

With that, you'd then get back a nice array of data from the service. Again, I just like calling things like this statically, but you could also say $alchemy = new AlchemyAPI() and yada yada yada. If anyone wants this Alchemy library for Lithium, let me know because right now I'm unsure if I'm going to use it myself. It wasn't a lot of work to modernize these classes, but it might save you some time. I'll probably also end up making something for OpenCalais and Yahoo to work with Lithium as well.

But back to our issue at hand. How do I go about getting relevant keywords? I'm half ready to spend an insane amount of time trying to use an artificial neural network or something (cool fun fact, there's a few for PHP actually)...Perhaps keep a list/dictionary in MongoDB and search against it? I'm not sure. There has to be something clever that MongoDB can help out with. Hopefully someone out there might know and leave a nice comment. :)

For now, I hope that maybe I gave you a few ideas and some resources to go look at if you've never attempted this before (or even if you have and were unaware of a few of the mentioned services). Keep an eye out for some possible Lithium libraries for this stuff, and definitely expect to see it in the CMS I'm working on.

Lithium Quick Tip: Incrementing

Posted by Tom on Thu, Jan 20 2011 14:13:00

As was pointed out to me today in the wonderful #li3 channel, MongoDB has an $inc operator, awesome! One I didn't see before (there are oh so many handy ones). So how do you write this in Lithium land? Well, because we like being efficient and all, let's also look at how to do this without first reading the record (i.e. with an "update()" call).

Model::update(
  // query
  array(
    '$inc' => array(
      'field_name' => 1
    )
  ),
  // conditions
  array(
    '_id' => $id
  ),
  // last but not least
  array(
    'atomic' => false
  )
);

There you have it. To decrement you'd put -1. Now also note that in my case (and it's worth mentioning) the value I had to increment was within an array actually. To get at that field you can simply use dot syntax to jump down into it. So the query would be more like:

array(
  '$inc' => array(
    'field_name.value_within' => 1
  )
)

Putting it all together, you can very nicely increment or decrement values in pretty much one line, one call. Since MongoDB is so wonderful this gives you a very easy way to keep, say, a real time hit count or something for content within your Lithium application.

Lithium Quick Tip: Displaying Dates

Posted by Tom on Fri, Jan 14 2011 10:34:00

So if you're using Lithium and MongoDB you can set in your model the $_schema property which will allow you to set data types for your fields. So things may look like this: 

protected $_schema = array(
  '_id' => array('type' => 'id'),
  'created' => array('type' => 'date')
);

With that, the "created" field will be of type "date" which will be a MongoDB date object. This date object isn't going to display properly when you go to print it back out in your view templates. You'll end up with some weird number value. The timestamp is actually within this date object that's returned, under the "sec" key. So it would be $document->created->sec in your view template (assuming $document was your result from the database).

How about formatting to a pretty date? If you're coming from CakePHP or some other frameworks you may be spoiled and have a time or date helper. Lithium is pretty lean and leaves that as one of your responsibilities. Fortunately, alkemann has created some helpers that mimic some from CakePHP for Lithium. These include a Time helper. So in your view template, you can use:

echo $this->time->to('nice', $document->created->sec); 

That will output something more readable like, "Fri, Jan 14th 2011, 02:29" instead now. Don't have or want those helpers? You may be crazy, but you can also use:

echo date('Y-m-d H:i:s', $document->created->sec);

However, I would definitely give those AL13 helpers a look, they are pretty nice. The important thing to remember here though is that when using MongoDB date objects, your timestamp is under this "sec" key. Of course, you can always store timestamp integers or string values, but then you're really crazy.
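
If you do go without the helpers, a tiny function of your own keeps the view templates tidy. This is just a sketch: the function name is made up, a stdClass stands in for the MongoDB date object so the snippet runs on its own, and gmdate() is used so the output doesn't depend on the server's timezone.

```php
<?php
// Hypothetical helper: accepts either an object exposing a "sec" property
// (as the MongoDB date object does) or a plain integer timestamp.
function formatCreated($value, $format = 'D, M jS Y, H:i')
{
    $timestamp = is_object($value) ? $value->sec : (int) $value;
    return gmdate($format, $timestamp);
}

$created = new stdClass();    // stand-in for a MongoDB date object
$created->sec = 1294972140;   // 2011-01-14 02:29:00 UTC
echo formatCreated($created); // Fri, Jan 14th 2011, 02:29
```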
