Friday, December 17, 2010

It's not about scrum or agile, it's about people

One of my ongoing pet peeves is the marketing over substance of the usual "agile" and "scrum" proponents.

I've found, near constantly, that the focus of its advocates is not to facilitate a team but to prove that the process (i.e. scrum-based agile) is the best (and typically only) way to run a technical project and write software.

Agile? ......... sounds more rigid to me.

I find this ironic, as the pushers (in my experience) rarely appear to come from a software architecture, or even a development, background. It looks more like a thin veil over typical old-school management tactics.

As in part demonstrated by this post :

Ignoring the apparent "I'm right, you're wrong" title (which sums up what I'm talking about), I'll go straight to some of the contents of the post that support my points.

(Points they are ironically trying to distance themselves from, reading Arnold Mindell would help so many .....).
"Will Jessup notes that developers tend to be highly intelligent people and often won’t just do something because we say so. He suggests a dialogue."
A dialogue ? You mean actually communicate with a team, share a vision and try to focus on a desired outcome, as opposed to just telling people what to do ? ......... now there is a strange approach that will never catch on ......... sheesh, is there that much ego about still ?

I'll ignore the "developers tend to be highly intelligent people" as anyone can see that for how it's meant.

There is really little surprise that there is resistance to the technique when this approach is used.
"Kevin Shine suggests that we should move from selling to getting them to buy in of their own accord: “There is a big mind shift between these 2 thinking approaches. One is collaborative and cooperative and the other is more command and control."
The obvious issue here is that the very common C&C driven approach is much more accepted and the norm, as well as being from the middle ages. The premise of "us and them" is set up by the method even before the person has met the "software team".

The situation is walked into with the mindset :
"I'm going to have to tell them how to do it and what is right. If they resist it's because I've not talked enough and told them what to do. Developers are difficult, I really need to dominate to prove myself." 
as opposed to :
"I'm here to facilitate an outcome, that of a cohesive efficient directed team. The team will deliver a finished product. I'm not actually building it or solving the logic, though I do have a role to support the team how ever I can, and interface with the business. My role here is to understand the context of the work within time lines and communicate that to both areas. This is we, not us and them"
If the focus were on an outcome as opposed to being right, the focus would be on humble facilitation and progress, not on proving process. One of the biggest failings of the application of agile (not the method itself) is the tenet that it's pushed as:
"Agile right - developers and experience wrong".
It's a top-down hierarchy: "Developers are stupid and argumentative, treat them as such". Again, no surprise there is resistance.

But the focus of dealing with that resistance is always the developers? ......... how about the cause of it, the manner in which the change is being applied, and by whom ........ an applier of agile who is complaining about resistance maybe .......

That it seems to be some kind of revelation to the usual pushers of this technique that teamwork should be collaborative and cooperative speaks volumes.

This has nothing to do with agile or scrum. The title of that piece should have been : "Why being difficult, condescending and telling people what to do without engaging them will cause difficulties".

An example :
"In addition he points out that a new Scrum Master needs time to gain the respect and trust of  the team members."
Isn't this kind of thing blindingly obvious? Sure, I might have trained as a counsellor to help me understand group work, communication and respect, let alone how to be client focused (at work I treat the project as the "client"), which helps me take ego out of it.

But that these kinds of points are being talked about as if they are new, or some kind of magic approach, is very disheartening and explains a lot about the failure of enforcers of scrum & agile.
Lacking a clear and compelling and a sense of urgency the team doesn’t commit. Ashish goes onto suggest 1-1 coaching, root cause analysis and rigorous use of retrospectives to help team members discover issues on their own.
You don't need urgency, though I sense that is just code for "panic and rush everything".

What you do need is an understanding of timing in a business context.

The second point, which this quote doesn't address, is the assumption that it's always the developers or "other team members" who have the issues. Wholly arrogant in my view. It's just as common that the people trying to enforce agile/scrum, or management themselves, have issues, just as we all do. It's called being human.


Being overwhelmed by ego and the need for the method to be right makes most view anything that challenges them as "wrong" or "difficult".

That is as opposed to seeing the method as a way of facilitating a better outcome for all. How long are we going to have to endure this kind of thing being top down and prescribed?

Little wonder a lot of successful software houses now are small developer-led teams. And why ? Because it's fluid, communication is done when necessary, and the subject experts are allowed to engage as opposed to being manipulated. Let alone the lack of interruptions .......

Leadership is about communication and facilitation; process is just a way of managing that.

Resistance is a good thing, it means there is passion and thought, there is a reason and likely something that hasn't been considered. You investigate it, you don't try to bully or scare it away, as some would seem to advise.

I think a lot of this is tied into typical business people's lack of ability to listen, either to a team to hear what is actually going on or, more importantly, to themselves to detect the real root cause of the resistance ....... ego.

A good summary here :


Also note ............ I've not said agile is bad anywhere here, only the way it's commonly pushed. I actually believe that it's well placed in some projects and can help, though it's just a tool.

It is suitable in some places, though you can do a lot of damage with the "right" process tool in the wrong place. And a lot more damage using the wrong tool in the wrong place, in the wrong way!


As always, it's about people interacting and cooperating, and the core essence of that is self-awareness for each member, and quality communication.

It's not rocket science, though for most it's not easy and it's something we all have to work on. Process is nice, though people are more important.


Tuesday, December 14, 2010

XenForo and CodeIgniter Integration


For a current project I am experimenting with XenForo and integrating CI (CodeIgniter) to build some extra functionality.

These are other areas of the site (not forum specific). I just needed to be sure that the user (Visitor in XenForo terminology) was logged in, and whether they were a super admin, while having access to the libraries and MVC of CI.


The setup for XenForo is minor: there is none. Just install it and you're done. For this example it's installed in :


URL being :


I've installed CI (a default 1.7.3) in a separate directory for the moment, just for ease of example (structure your site as needed, as always):


Example URL being :

Browsing to that URL gives us the obligatory all OK from the CI welcome controller :


I've shared the CI library on GitHub :

As with any CI library, download that and put the config/xf_auth.php config file in :


and the library/xf_auth.php in :


The two config settings in there being the URL for the forum and the path for the XenForo install, only the file path is used at the moment, so :

$config['xfAuth']['fileDir'] = '/var/www/ci';

No harm in setting the forum URL, I've just not written the functions to use it yet.

xF_auth is a wrapper for the XenForo_Visitor class, which is part of the XenForo core. It handles getting an instance of the Visitor and checking permissions for various actions and access.
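
To give a feel for what a wrapper like this has to do, here is a minimal sketch. It is not the xF_auth source. The function name xf_sketch_get_visitor and the class XfVisitorSketch are made up for illustration, and the bootstrap sequence is the commonly used XenForo 1.x one, which may well change while XenForo is still in beta. $fileDir is assumed to be the XenForo install path from the config above.

```php
<?php
// Sketch only: bootstrap XenForo 1.x from outside the forum and return the
// current visitor. Assumes $fileDir is the root of a XenForo install.
function xf_sketch_get_visitor($fileDir)
{
    $startTime = microtime(true);

    require_once $fileDir . '/library/XenForo/Autoloader.php';
    XenForo_Autoloader::getInstance()->setupAutoloader($fileDir . '/library');

    XenForo_Application::initialize($fileDir . '/library', $fileDir);
    XenForo_Application::set('page_start_time', $startTime);

    // Reads the forum session cookie, so the CI pages have to be served
    // from somewhere that cookie is valid for.
    XenForo_Session::startPublicSession();

    return XenForo_Visitor::getInstance();
}

// Hypothetical wrapper: takes any object offering getUserId() and get(),
// which is what XenForo_Visitor provides.
class XfVisitorSketch
{
    private $visitor;

    public function __construct($visitor)
    {
        $this->visitor = $visitor;
    }

    public function getUserId()
    {
        return (int) $this->visitor->getUserId();
    }

    public function get($field)
    {
        return $this->visitor->get($field);
    }

    // Assumption: guests carry a user id of 0.
    public function isLoggedIn()
    {
        return $this->getUserId() > 0;
    }
}
```

Passing the visitor in through the constructor keeps the XenForo bootstrap in one place, so if it changes between releases only that one function needs touching and the CI application never notices.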

Hello World

With CI, XenForo and xF_auth all installed and configured, time to test CI.

The default welcome controller for CI is :


Edit this file and add the library load to the constructor :

class Welcome extends Controller {
    function Welcome()
    {
        parent::Controller();
        // Load xF_auth; CI exposes it as $this->xf_auth
        $this->load->library('xf_auth');
    }

And then for the sake of testing, echo some debug from the controller as opposed to creating a new view. That can be done once you get into development.

    function index()
    {
        // Assumes a guest's user id is 0 (falsy)
        if ($this->xf_auth->getUserId()) {
            echo "You're logged in to XenForo !" .
                "<br />User id :: " . $this->xf_auth->getUserId() .
                "<br />username :: " . $this->xf_auth->get('username') .
                "<br />email :: " . $this->xf_auth->get('email');
        } else {
            echo "You are not logged in";
        }
    }
}

/* End of file welcome.php */
/* Location: ./system/application/controllers/welcome.php */

Now you will see :

(as long as you've logged into XF that is!)

You have now bootstrapped XF in another environment and can build functionality elsewhere on the site using familiar CI MVC techniques.

Future Development

XenForo at time of writing is in beta, and approaching RC. Though having a library abstract the bootstrapping means that any changes can be hidden from a CI application and implementation.

Usergroups - The next functions to be added will check what usergroups and permissions the user/Visitor has so the C of MVC in CI can build the views correctly.

Templates - Being able to call XF templates within CI (view integration) while using the XF context and phrasing is a goal.


Sunday, December 12, 2010

How to get involved with PHP and support open source software

I noticed a post from a past colleague, Scott MacVicar, about getting involved with and helping out PHP. Personally I've always thought about it and wanted to, though I've always focused on the hardcore C, thinking :

"Helping PHP, means writing hardcore C, and building libraries"

The more I thought about that statement, and the way I typically think about software, Open Source and communities, I realized what nonsense that is. There is so much more going on in a project and community, therefore there are many more ways to help.

Developing a technology & community is a holistic activity. Some of us are writers, framework developers, application developers, business decision makers, power users, clients making requirements, community members, and so on. Though a lot of us fill more than one role, it takes us all and there is always room for more.

Though you could use these points for any OSS project, I'm just focused on PHP. Here are 10 I've thought of, in some kind of progressive intensity.

  • 0. Talking about PHP
  • 1. Actually using PHP
  • 2. Online Community
  • 3. Writing about PHP
  • 4. In Person Community
  • 5. Presenting about PHP
  • 6. Helping Open Source Projects
  • 7. Documentation for PHP
  • 8. Test Cases
  • 9. Developing the PHP source

0. Talking about PHP

Quite the opposite of fight club here. By talking about PHP I mean when you are in a development environment, where decisions are being made. Or a business environment where people are nervous of open source (yes places like that still exist......). A common rebuttal I hear is "It's just a scripting language, is it really up to the job" ........ this by people who are on Facebook/Digg/various forums at the same time!

Combat the fear and ignorance with some numbers.
It's out there, and en masse. So remind nervous people it's all good, all the cool kids are doing it ;)

1. Actually using PHP

Talked the talk? Now walk the walk.

Knowing PHP is one thing, but by actually using it you're creating more software and systems in it. So there is more of it in the world, making it more ubiquitous. When things become more prevalent, they typically become more accepted.

There are a few aspects of acceptance, talking about something doesn't mean it's accepted. When something is used and considered de facto, it's accepted. I believe it's akin to evolving development methods in teams (which is something I focus on).

Just because I advise a team about a development technique doesn't mean it is accepted and the standard. When it is used daily without question, it is.

There are various ways of using and supporting PHP, not only writing raw PHP code at work. You can also use a project (growing its install base), where the project itself is written in PHP. Giving lots of cash to SAP or ....... has SugarCRM been evaluated?

    2. Online Community

    The whole point of the internet is information sharing and connection. The resources and communities out there are vast. A quick Google of "PHP development forum" will give you pages to go through.

    And that is just for the language itself, there are lots of communities that are focused on the development of something specific (which you might be more interested in) that just happen to use PHP.

    So you can approach it from either side, language first or interest first.

    Most of the greatest works are done in teams, not by super star individuals (some are, though it's rare). We all have our areas of specialty, and more importantly our areas of interest. For instance there is little I don't know about migrating data between forums and CMSs, and I still get emails and IMs about various issues. Being able to share the information, pass it on, get feedback and make it better is a core philosophy of what we do.

    I'm quite the fan of : Cathedral and the Bazaar by Eric S. Raymond for community and OSS.

    3. Writing about PHP

    I suppose that is, to a degree, what I'm doing here. Make some noise, let people know what is going on, get more people on board. There are always things to do, as well as people joining and leaving their roles in the community.

    You could be writing some kind of howto for the technical people, or an introduction targeted at new developers.

    If you're a business type maybe a paper, or review of how using PHP in your business helped things - positive engagement with OSS. Enterprise might move slowly, though when it does, it turns up with big cheque books and looks for people who know what they are talking about.

    More importantly people who've done it before and can demonstrate that success.

    Maybe you've seen a blog post that you disagree with, or that you want to add to. I did that a little while ago with another Top 10 list of PHP developer improvements, just because I wanted to add my own 2 cents.

    4. In Person Community

    As I mentioned above great things get done in teams, and communication is key to this. We are usually surrounded by like minded people and just think we aren't because we've never met them ! A bit of a truism if there ever was one.  is a perfect example of how to get involved with a local community, or show and tell. has links to specific groups (a few broken ones in there, though you can Google around that !).

    Not only are you going to potentially make new friends, which is rarely a bad thing, you'll get to share your ideas, see others' ideas, get to ask questions, feel good when you answer some and even network for jobs and contracts.

    5. Presenting about PHP

    Personally this terrifies me. When I trained as a counselor I found out that public speaking is the number 1 fear, 2nd is flying.

    Though it gets better with practice and is a lot easier when you're talking about something you're involved with and passionate about.

    Just about all people want to hear what you have to say or they wouldn't be there, and even if they don't they'll be tweeting away on their laptops.

    Presenting doesn't have to be to hundreds of people at a big conference either, starting at that point would be brave and more than a little scary. Though once you've talked to 2 or more people at work or say a MeetUp, you are presenting, you're most of the way there. The rest is just scale and organisation.

    Find your passion and go for it, Rasmus Lerdorf  is still at it.

    6. Helping Open Source Projects 

    Just about all of the ten points here you could apply to any OSS project.

    The spirit of the movement is the reason we have PHP in the first place. It's because of people (starting with Rasmus Lerdorf in 1995) putting in the effort to solve a problem and then sharing it for everyone.

    If there is an OSS project you use, there are very likely instructions on how to help or where to engage. Or even better, maybe there is an app or tool that you've developed and use yourself because you couldn't find anything else. In all likelihood you're not alone, so why not share it?

    GitHub is an ideal place for this, I've recently had a few ideas myself and have put them there for a current project. I just need some more hours in the day now .......

    7. Documentation for PHP

    I think the documentation of a project is likely one of (if not the) main criteria for success, aside from the technology doing what it is supposed to of course! Documentation in coding standards is one of the things I always encourage and support (read enforce!) when I'm leading a project.

    Also it's a very valuable skill for employers who know the value of quality technology. Delivering a system fully documented is a great thing.

    It's the first port of call for new people, and then for everyone as they learn, look up errors, check arguments, etc. That is one of the main reasons I think closed book syntax tests for developers are completely pointless, it's not how we work.

    The comments and feedback are another excellent example of a way to help. Many times I've seen an example or how-to in a comment that's saved me hours.

    Rather than just reword Scott for how to help out with documentation, I'll just quote him :
    "To get involved with documentation, the only thing you need to know is that the format used is DocBook. While it can be somewhat complex to understand, it does a good job of abstracting the actual documentation from the output format so the same source can produce the PDF, HTML and the Windows compiled formats. There is a live editor you can use anonymously to generate patches. There is also a wiki page that explains more."

    8. Test Cases

    Believing something works and being correct most of the time is nice, though it's a cowboy way of doing things and best left to the fly by night hackers. It's not engineering. If you want to be sure of something you should be able to prove it, and prove it continuously. Confidence is key.

    My personal favorite on PHP projects themselves is : for continuous integration. There is no real excuse for not doing it on any sizable PHP project, or in any mainstream language really.

    Personally I use ZendStudio and it's built in, even wizard driven. Though all this is to test applications, which is great, being happy with Test Driven Development and knowing its value is great for quality.

    Though there are test cases for PHP itself, same idea, just one layer down in the stack. Is the language doing what it says on the tin ? Over at PHP QA, they have a great How to Help page explaining it all, and what to do.

    Being asked in an interview "Do you use or understand TDD" and being able to say yes is one thing. Saying that you help, or are part of the team that does it for the language itself ! .... well that is mighty kung fu indeed.

    And to think I didn't like mathematical proofs in university .......

    9. Developing the PHP source

    The point of this blog post was more to do with the activities you can engage in without actually programming for the source.

    Though who knows, one day!

    Getting happy working with the source, compiling it and writing your own extensions would likely be a good start, here is a book for that : Extending and Embedding PHP. Though in all practical terms finding and reproducing bugs is a good way of helping and finding out how things work :

    And if you've gotten this far you don't need my suggestions any more.

    First rule of PHP club

    We all know the first rule of PHP club..... is talk about PHP club.

    Talk is good, though action is better. PHP has helped give lots of us great jobs, careers and opportunities. What  are you going to do to give back and help PHP?


    Wednesday, December 8, 2010

    The worst job interview ever for software & systems developers

    Recently I've been meeting with a local company to assess what opportunities they have, how they operate and more importantly, how they treat people.

    I wanted the first hand information as opposed to going on their reputation.

    I met with them 3 times (have to give them a fair run and all that) and each time there was something strange going on. The first time I was put in a 14x14 room with paper and pen to do a memory syntax test, to assess what I'm like as a "developer" ...... really .... The second time was a whole day with a 15 min break, and I was given the last 30 seconds of each meeting to ask my own questions. Then when I was called in for the 3rd interview my mind was made up, and that is the experience I'll use as an example here.

    For the cost of a few hours of my life (the 3rd interview) and something to reflect on, it turned into a gold mine of "How not to interview software people".

    Everyone was late, there was no interview process, I'd been given the wrong job spec, the interviewers ranged from completely uninterested (one didn't say more than 5 words) to aggressive and baiting, and so on.

    I looked about and found a "top 5" worst questions list and to my amazement I was asked all of them in one interview, and mostly by the same person. I'm not sure if the company realizes how damaging it is to let people like that interview.

    Knowing that the organisation deals with people in this manner confirmed my hunches that it's bad, really bad. I wonder how they attempt to attract or keep talent.

    I thought I'd go over the questions and try and think of some alternatives to pull some good out of it all.

    1. "So then, tell me about yourself"

    The number 1 worst question you can ask, especially when you have someone's resume in front of you.

    Also if you're sitting at a computer 5 minutes before the interview, how about putting their name into Google? If they work in IT and specifically the web, you should be able to find them. Not doing this shows a lack of imagination, skills or planning for the interview. Again, who wants to work with someone like that?

    On a side note - One of the main things I'm noted for by friends is my curiosity. I always have to be learning, doing courses about random subjects, and trying to see the world from different perspectives. I attempted to answer this question by explaining that. I see having a curious mind as a good thing, and I've actually trained at learning.

    I listed off a few of the topics that interest me and that I've studied, and received the answer :

    "No no no, don't tell me about your professional career, we've talked about that. School is for getting a job that is all, tell me about the real you, I want to know who you are".

    I couldn't believe what I was hearing, though seeing as an interview is a two way process, they all got on the fail boat, sailed off and sunk at that point !!


    What is it you're actually trying to learn? If you've any clue you want to get them to talk about something they are passionate about, see if they have the spark. You're likely after 3 skills, problem solving, interpersonal and ability to execute.

    Can they figure "it" out, do it in a team, and get it done?

    2. "Where do you see your career in X years"

    Err ............ building things? Given the speed tech and decent companies move at, who knows. Also if anyone is planning that far ahead they are going to limit their opportunities and decisions in the here and now. Which is the important bit.


    Why do you want them to look into a crystal ball ? Unless the company has interesting projects that are engaging, people will leave. Asking them to lie about wanting to build a career with a company they don't know yet,  is mostly pointless.

    3. "What are you bad at, where have you failed"

    Once again a pointless exercise in humiliation, and a dead end question.


    How about asking when a foreseen risk was last averted, and how ?

    4. "How do you work under stress, can you deal with a lot"

    Basically you're explaining that you work in an abusive environment and that you want the person being interviewed to accept that as the norm, so you can treat them that way later.


    How about realizing that focused, healthy and happy people are far more productive?

    5. "Tell me why you want to work for company X"

    Ego driven and nonsensical. This is nothing more than an attempt to hear lots of good things about the company the interviewer works for. And it's never actually the company you work for, it's a project you work with. You could choose any mid size organisation, and it's a collection of groups doing different things, there is no magic Kool-Aid dispenser.


    You're likely attempting to discover motivation for being there ........ ultimately it's to get paid, deal with it. So why not focus on why a person does the kind of work they do. In the IT case, why the certain area of coding they work in. They could look at my resume and legitimately  ask "Why LAMP for 13 years, what keeps you engaged with it that long", for someone else it could be the internals of a database, or network programming, system design, etc.


    Just goes to show that 60 seconds on Google before an interview can save both sides a whole lot of headache and wasted time. Also don't say "Thanks for coming, we'll be in touch" if you don't follow up, as you're just proving yourself to be a liar. Say "We'll contact you if we want to take this further" ....... it's called being forthright and honest!

    I'm happy to have had a lucky escape, learning what I did before getting involved.

    As per "Interviewing & Dating for the Techie : Playing the game", remember that this is a two way process: you have to assess the situation and what is on offer from both sides.

    It's good to know I'm not alone, and there is a lot (unfortunately) of this going on :

    Take care people !


    Monday, December 6, 2010

    More myths of software outsourcing

    1. It will reduce costs, it's cheaper 

    It's the most common excuse I hear. It's also typically the most short-sighted, and the one that most likens software development to manufacturing widgets.

    Firstly creating software isn't akin to producing widgets, it's a creative process that depends highly on communication, and the quality of that communication. I don't just mean spoken language there, as the issues with that are all but obvious, but cultural as well. The culture and approach of the company, as well as the work ethics of the teams must be the same, or there will be a lot of friction.

    The speed at which decisions can be made and acted on is dependent on two things.

    Firstly, on how the decisions are actually made (i.e. process, which is why smaller teams with less management overhead and programmer-led projects will always beat the top-down, meeting-driven interruption culture of bigger organisations). Secondly, and more important here, on the speed at which a decision can be captured, understood, written down (requirements, design, ticket etc), scheduled and communicated.

    If you have an implementation team that is far removed from that, it will be slower and the quality of the communication will be worse. Clarification, updates and corrections will sap time, and time is very expensive.

    Support - Once the system is built you're beholden to external costs for change and management of it. If you choose to bring it in house you do so with little or no implementation knowledge, so the learning curve for that will drive up costs massively as the internal team get up to speed.

    Quality - The external team's main focus is making money, not making you a quality product, it has to be, it's a business equation. So negotiating for the best price (i.e. cheapest up front) is going to ensure high technical debt that will lower quality and drive up costs.

    Ownership - There is a legal overhead for protecting your IP and possibly any company secrets (though if you're mad enough to outsource that ! ....... then well, you're likely not reading this).

    2. Developers are all the same, it's all about numbers

    The difference between bad or average system architects, designers and developers and good or great ones is massive; I think that argument was proved a long time ago. Productivity and quality are an order of magnitude higher from high quality developers.

    As Michael Bean mentions :

    "But writing innovative software cannot be done on an assembly line. It requires hard-to-find development and design skills. Farming out development to legions of programmers overseas will not create a differentiation advantage. When a technology company outsources software development, that company loses its capacity to innovate and its competitive advantage."

    And never a truer word said I think, what is it that is trying to be achieved: creative & innovative software, or cheaper and faster widgets?

    If you are (essentially) a software company, meaning that what you offer customers is software or a service that relies on software, then it's the core of what the business is. Viewing it from such a distance means you will most definitely reap what you sow.

    Creating good software & systems involves passion and creativity with a good healthy mix of "getting things done" attitude. Outsourcing to a Dickensian sweatshop instantly removes all of these factors.

    3. We'll bring it in house when it's done, and then look after it

    I've worked on several clean up jobs and have walked away from several more because of this myth.

    The reason the company changes its mind and realizes that outsourcing is wrong is because it goes badly, the relationship is sour and what is being produced is substandard and mediocre.

    Typically that is because of a combination of the reasons mentioned above: poor communication, and cost over quality.

    On the point of things going wrong ......... even if you have the best person in the world whose responsibility within the company is to manage the development, their effectiveness is going to be massively reduced by having to deal with external teams.

    They effectively turn into nothing more than a requirements channel and point of blame for the project, with little or no ability to influence proceedings.

    I recently interviewed with a company and only learnt about this exact thing in the interview ....... I couldn't wait to get out of there ! I've learnt my lesson: trying to pull one of these projects out of the technical grave led to the experience and inspiration for the burnout post.

    4. Can't find the staff, it's all the local market's fault, we have to do it

    Would you open what you want to be a high quality restaurant with no chefs and then outsource everything in the kitchen to a remote fast food joint?

    OK, the logistics are different but the analogy points out the lack of sanity in it all.

    Building technology is what a technology company does, and if it outsources that it gives up the ability to control the quality and innovation at source.

    On the point of "can't find the staff" ........ notice I'm not talking about physical location. I've nothing against remote working, personally I've done it for years. The original vBulletin team (that I was a part of) which built version 3.0 until 3.6, were remote working for the majority of the time. And the quality in that was very high (I'm proud to say), we weren't outsourced development, we were a part of the company.

    You can build a virtual team and maintain high quality , this is 2010, the internet is the primary form of environment & communication for most of us in development. Just about all developers and designers I know prefer it.

    Which is another blog post in itself, about mental health and social interaction!

    37signals and Peopleware are forever talking about methods of working, and how typical offices are actually bad for most creative software work (though they have their place).

    5. We can use agile, get versions back quickly & keep track

    Which in itself is no reason to outsource. If you really want to be "agile", then having the team as close as possible to the feedback and decisions (i.e. a part of them) is best, not the opposite!

    While you might see results sooner, you're still going to be plagued by all the above points.

    6. When it goes wrong we can sort it out

    When it goes wrong with outsourcing ... it gets contractual; with an internal team it gets intense.

    Though at least they are part of the company, and the company is the people and technology. So in the latter scenario the focus is to fix it and be successful, not to engage in the blame game, hide behind contracts and quote the effects of poor to each other.

    Further reading :

    Update : As per a comment by Michael Thore over at, I'd agree with adapting the title to "offshoring" as opposed to the generic and possibly misleading term "outsourcing". But I still advocate having development as an integral & core part of the business if and where possible. Ideally from the outset and definitely in the long run.

    Sunday, December 5, 2010

    OffLog : High volume reporting on a limited system

    • How do you log a lot of data without killing your limited setup ?
    • But how can you scale the system or change components easily, i.e. have good architecture ?
    • How do you avoid overloading production databases, so logging doesn't kill the actual system you're running ?
    Questions I've dealt with a few times, and a pet project for logging on a game I'm building that I thought I'd share.

    The situation

    You've been told to report on "everything" though not given the kind of back-end (or cash) that Facebook/Google/Amazon have for systems like MapReduce or Pentaho.

    You have at least one server (virtual or real) for "reporting", or at least an allowance of processing time and storage somewhere.

    Though if the system gets big, you want to be able to swap out the back end or add more capacity without having to do anything in the application.

    There are likely some front end machine(s) doing some kind of scripted work, for an application or game (LAMP in this example) and a database back-end for the actual application.

    But the main thing you have is a LOT of data to log and process.

    Given how much time and effort has gone into building the application and making it efficient, the thought of slowing it down with logging irks you.

    Lots of tight code & APC/memcache, and now you're going to slow it down by logging to a database, dragging everything down to disk speed and synchronous calls ? Hell no.

    And rightly so.

    Personally I believe :
    1. Reporting shouldn't go in the "main database".
    2. Reporting shouldn't slow down the application, or as one grows so does the other, and it all grinds to a halt.
    3. Good architecture at the start helps avert problems along the way, and not only mitigates some nasty problems, but also helps in long term maintenance and evolution of the system.

    The components

    This example is LAMP (and memcache), with the addition of Gearman. That's it.

    And if you haven't played with Gearman, the sooner the better.

    The set up

    I'm using two servers, though I can do everything on one and move to two when I need or want to, transparently to the client machines.

    • Cn - Client script uses GearmanClient::doBackground to hand off logging actions to Gearman, so as not to interrupt the main script
    • Wn - Worker, maintains a persistent connection to memcache on the logging server, to update action counts
    • 0-5 = 6 * 10 min time slices in memcache (the ring buffer), workers log to current time slice
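
    A rough sketch of the Cn hand-off described in the list above, assuming the PECL gearman extension is installed. The job name offlog_action, the JSON payload layout and the server address are assumptions for illustration; they are not taken from the OffLog code.

```php
<?php
// Build the workload string handed to Gearman. The JSON layout is an
// assumption for illustration, not OffLog's actual wire format.
function buildLogWorkload(int $userId, int $actionId, string $extra = ''): string
{
    return json_encode([
        'userid'   => $userId,
        'actionid' => $actionId,
        'extra'    => $extra,
        'ts'       => time(),
    ]);
}

// Fire-and-forget: GearmanClient::doBackground() queues the job and returns
// a job handle straight away, without waiting for a worker to process it.
function handOffLog(GearmanClient $client, int $userId, int $actionId, string $extra = '')
{
    // 'offlog_action' is a hypothetical job name the workers would register.
    return $client->doBackground('offlog_action', buildLogWorkload($userId, $actionId, $extra));
}

// Usage, on a machine with the gearman extension and a job server running
// (address and port are assumptions):
//
//   $client = new GearmanClient();
//   $client->addServer('127.0.0.1', 4730);
//   handOffLog($client, 1, ACTION_USER_LOGON, 'test string');
```

    The main script only pays for serialising a small payload and queueing it; everything slower happens in the worker.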

    So what happens ?

    The key part is that the workers from Gearman are filling a ring buffer in memcache. In this example, there are 6 time slices in the hour, so 10 min sections.

    If a logging action is created (or incremented) at minute 6 of the hour, it goes in slice 0 (the 6 slices are numbered 0-5); at 15 past the hour it's slice 1, and so on.

    Much like the sections of a Trivial Pursuit board game piece.

    As one slice is filled and the clock ticks on, so it moves onto the next slice every 10 mins.
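
    The slice arithmetic above can be sketched as follows. The function and constant names are made up for illustration, and the slice boundaries are taken from the Unix timestamp, so they line up with the top of the hour.

```php
<?php
const SLICE_MINUTES = 10;   // width of one slice, as in the example above

// Map a Unix timestamp to its slice in the hour: 0-5 for 10 min slices.
function sliceIndex(int $timestamp): int
{
    $secondsIntoHour = $timestamp % 3600;
    return intdiv($secondsIntoHour, SLICE_MINUTES * 60);
}

echo sliceIndex(gmmktime(12, 6, 0, 12, 5, 2010)), "\n";   // minute 6
echo sliceIndex(gmmktime(12, 15, 0, 12, 5, 2010)), "\n";  // minute 15
```

    Because the index depends only on the clock, every worker picks the same slice without having to coordinate with the others.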

    This is the first place we balance accuracy with "getting the job done", we're pre-aggregating. We can break down to 10 min sections, but beyond that the detail is lost.

    A fair price for getting a good enough idea of what is going on I think.

    If there are users or actions that must be logged in detail, then have the workers log just those data points separately. As it's all going through Gearman, the client doesn't know nor care (and shouldn't) what happens.

    The use of Gearman here allows the front end scripts to hand off logging data and keep running at maximum speed. Gearman can then buffer and pass off data as quickly as the workers can process it. At this point, if memcache is on another machine, it's down to network speed, though only for logging, not your main application.

    The collection & the register

    So as we've just read, the Gearman workers on the front end servers are filling up the slices in memcache (with persistent connections to alleviate start up and tear down).

    But now we have to collect that data and store it.

    Each of the workers does two things: it takes log action data from a front end script and puts the details in memcache, and it also makes sure that the userid and action id are logged in a register.

    This is because we can't "SELECT *" from memcache to get all the users that have done something in that 10 min section.
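
    A minimal sketch of those two worker duties. To keep it runnable anywhere, it uses a small in-memory stand-in (FakeCache) instead of a real memcache connection, and the key layout (count:slice:user:action, plus one comma-separated register key per slice) is an assumption for illustration, not necessarily how OffLog stores things.

```php
<?php
// In-memory stand-in for memcache so the sketch runs without a server.
// It loosely mimics add(), increment() and append(); TTLs are accepted
// but ignored here.
class FakeCache
{
    public array $data = [];

    public function add(string $key, $value, int $ttl = 0): bool
    {
        if (array_key_exists($key, $this->data)) {
            return false;               // add() never overwrites an existing key
        }
        $this->data[$key] = $value;
        return true;
    }

    public function increment(string $key, int $by = 1)
    {
        if (!array_key_exists($key, $this->data)) {
            return false;               // incrementing a missing key fails
        }
        return $this->data[$key] += $by;
    }

    public function append(string $key, string $value): bool
    {
        if (!array_key_exists($key, $this->data)) {
            return false;
        }
        $this->data[$key] .= $value;
        return true;
    }
}

const SLICE_TTL = 35 * 60;              // 35 min time out, as used in the post

function recordAction(FakeCache $cache, int $slice, int $userId, int $actionId): void
{
    // 1. Bump the counter for this user and action in the current slice.
    $countKey = "count:$slice:$userId:$actionId";
    $isNew = $cache->add($countKey, 0, SLICE_TTL);
    $cache->increment($countKey);

    // 2. First time this pair is seen in the slice: note the user and the
    //    action in the registers so the collector knows what to ask for.
    if ($isNew) {
        $cache->add("reg:$slice:users", '', SLICE_TTL);
        $cache->append("reg:$slice:users", "$userId,");
        $cache->add("reg:$slice:actions", '', SLICE_TTL);
        $cache->append("reg:$slice:actions", "$actionId,");
    }
}
```

    The registers can hold duplicates (a user who performs two different actions is appended twice), so whatever reads them should de-duplicate.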

    This is where cron comes in, running once every slice, which is 10 mins in this example and the default implementation.

    First it gets the list of users that have done something in that slice, and the list of actions. Then, with a loop through the users and all the actions, all the counts are grabbed from memcache.

    There will be some misses here: given we have a list of all the users and all the actions, not all users will have performed all actions. Though we'll accept the misses to make sure we get it all.
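
    The collection loop could look something like this. It is a sketch under assumptions: the registers are taken to be comma-separated id lists stored under reg:slice:users and reg:slice:actions, the counts live under count:slice:user:action, and $get stands for any lookup that returns false on a miss, as memcache clients do. None of these names come from the OffLog code.

```php
<?php
// Turn a raw register string such as "7,8,7," into a de-duplicated id list.
function registerList($raw): array
{
    if ($raw === false || $raw === '') {
        return [];
    }
    $ids = array_filter(explode(',', $raw), fn($s) => $s !== '');
    return array_values(array_unique(array_map('intval', $ids)));
}

// Collect one finished slice into rows ready to be written to the database.
function collectSlice(callable $get, int $slice): array
{
    $users   = registerList($get("reg:$slice:users"));
    $actions = registerList($get("reg:$slice:actions"));

    $rows = [];
    foreach ($users as $userId) {
        foreach ($actions as $actionId) {
            $count = $get("count:$slice:$userId:$actionId");
            if ($count === false) {
                continue;   // a miss: this user never did this action
            }
            $rows[] = ['userid' => $userId, 'actionid' => $actionId, 'count' => (int) $count];
        }
    }
    return $rows;
}

// Usage with a plain array standing in for memcache:
$cache = [
    'reg:2:users'   => '7,8,7,',
    'reg:2:actions' => '100,200,',
    'count:2:7:100' => 3,
    'count:2:8:200' => 1,
];
$rows = collectSlice(fn(string $key) => $cache[$key] ?? false, 2);
```

    With two users and two actions the loop does four lookups and two of them miss, which is the trade-off described above.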

    After which we have all the data from that slice and it can now be left, as the memcache time out is 35 mins so it will have evaporated before the workers come around again with fresh data.

    Also given the ring buffer size and time dimensions each cron has 35 mins to complete aggregating and pushing the data to the database before the data times out in memcache.

    Now in processing time terms, 35 mins is a veritable ice age and if you run over that there is something else going on!

    Integration to an app

    Using the PHP client, we can instantiate the logging client at the beginning of our main application script, so we can log multiple items throughout the script if needed.

    $loggingClient = new offLogClient($offLogCfg);
    $loggingClient->logSimpleAction(1, ACTION_USER_LOGON, 'test string');

    The first param is the userid. The second is the ACTION_USER_LOGON constant, a number which is defined in the config along with all the others needed.

    The third is an arbitrary string that can be used for all manner of extra information, though for the purposes of action counting it is ignored. It is used for a watched action or watched user, where we want specifics.


    The aggregation happens in the slices of the buffer of course, each piece of data being a userid and an actionid (key) pointing to a count (value) :
    • Userid X did action Y, Z times in this time slice.
    Once the cron job has gone through and collected all the data from the time slice and put it in the database, you're back in the realm of SQL, where you can generate reports, archive data, mine, spot trends, and graph out.

    On the topic of graphs, I use : and will be adding a few simple scripts to the code by way of example as well.


    As long as you have a LAMP environment, all you need to install is a Gearman server and the support in PHP; there is an overview of that here.

    The only other thing needed is the code, and I've put that on GitHub here :

    Obviously a work in progress and an idea I'm playing with and developing, now that I've thought it through and written the blog piece I can get on with coding it up!

    Given the use of Gearman, if the logging requirements change (level of detail, volume, etc.), using a different back end system is easily doable.

    An example of why I think components in a system should exhibit the same two qualities that good OO code does: good encapsulation and low coupling.