Knowledge Management


In an earlier blog we discussed how data became knowledge after being grouped and contextualised.  But what value does knowledge have?

Knowledge is part of a firm’s intangible assets, which also include brand, reputation and its unique business processes.  However, knowledge that cannot be shared is next to useless.  If specific critical knowledge exists only in one employee’s head, then the organisation is at risk of that knowledge walking out the door if they leave.  Only when it is possible to share knowledge throughout the organisation, and learn from it, does knowledge have value.


Types of Knowledge

Explicit Knowledge is documented knowledge, whether that’s in structured form, like documents, or unstructured form, such as emails, voicemail, graphics, etc.

Tacit Knowledge is less quantifiable.  It’s undocumented – generally stored in an employee’s head.  It’s knowledge learned from experience, and you’re often not aware you know it.  For example, when something isn’t working with a program, it can trigger a memory; that suggests something to try, which often resolves the problem.  You may have no knowledge of the system being coded, but a similar experience with another system can be enough to draw on.

Knowledge Management

Knowledge Management is the process of capturing, developing, sharing and using organisational knowledge in an effective manner.1

There are known steps in the Knowledge Management value chain.  These steps allow an organisation to learn and change its business processes to reflect that learning.

KM Value Chain

  1. Acquire Knowledge.
    • Repositories of structured & unstructured data.
    • Online expert networks within the organisation.
    • Discovering patterns in data.
    • Pertinent external data.
    • Internal transaction data.
  2. Store Knowledge.
    • Storage that can be accessed by employees as needed – Indexed, tagged, digitised, etc.
    • Expert systems that can help organisations save past knowledge & build new.
    • Rewarding employees for keeping knowledge up to date.
  3. Disseminate Knowledge.
    • Options available for sharing information – wikis, blogs, E-mail, Instant Messaging, Collaboration Tools
    • Key learning routes to help managers deal with important information.
  4. Apply Knowledge.
    • Management support for process, product & service changes as a result of new knowledge.
    • If an organisation doesn’t change as a result of new knowledge, it gets no return on its investment.



People are the holders of knowledge.  They develop, share & use knowledge.  For all the Knowledge Management systems available to companies, it’s their people who are the most important store of knowledge.  Unless there is a culture of sharing that knowledge within your company, you might as well save your money on the IT spend – it will fail.  Only when the company culture is right will you gain benefits from Knowledge Management systems.





Laudon K & Laudon J. (2014) ‘Management Information Systems: Managing the Digital Firm’ Pearson, Edinburgh.

Enterprise Systems

Enterprise System diagram


Enterprise Systems are application packages which allow an organisation to integrate its entire IT system.  In essence, company-wide access to required business knowledge becomes available, rather than independent systems which don’t talk to each other and duplicate data, or manual input of data to reports & spreadsheets.  They support the business processes and flow of information in an organisation, while also enabling easier reporting and analytics within the company.  The final goals are efficient running of the company, increasing work quality, saving employee time, and possibly reducing costs.

Real World.

At this point many would think ‘Sounds good, but why would my business need it?’

Let me give you a real-life example.  My Dad was looking to purchase some Potash for his fruit trees recently.  He rang around the local garden shops & agricultural stores to find the best price-by-weight.  One particular store was about half the price of the others, and even though it was 10 miles further out of town, he considered it worth the drive.  (Considering we were only talking €8 for 2.5kg versus €8 for 1.2kg, it might not have been a decision I would have made!)

Decision made, he drove the extra ten miles to the agricultural store.  He ordered his Potash, paid for it, was issued with an Invoice & receipt, went down to the store to hand in his invoice, and was told they had none in stock!  He then had to return to the Reception area, get a refund for the purchase he wasn’t able to make, then leave for another store.  Not a happy customer.

If we look at the interactions that took place between customer and retailer, how many places could we have made the interaction better?

Now let’s look at the scenario where the company had an Enterprise System in place.  In this case all the company’s information would be stored in one database, including stock levels & pricing.  The person who took the original phone call could have seen the stock levels at the time they were checking the price.  As a result they could have informed my Dad they didn’t have the stock in place, and he wouldn’t have had a wasted drive & the bad customer experience.

An even better result would be if the company had agreed minimum stock levels saved in the system – say 10 bags of Potash.  When the 10th-last bag in stock is sold, the system automatically sends an order through to the supplier, and the cash accounts reflect the order.  No interaction is required by the store staff to log a low stock level.  More importantly, no customers are turned away unhappy.  In addition, the managers of the store would always have analytics available on the use of each product, and could perfect the minimum stock levels to ensure they don’t have too much stock on hand at any time.
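That reorder trigger can be sketched in a few lines of code.  This is purely a hypothetical illustration – the product name, minimum level and reorder quantity below are made up, not taken from any real system:

```python
# Hypothetical sketch of an automatic reorder trigger.
# Product names, minimums and quantities are illustrative only.
MIN_STOCK = {"potash_2.5kg": 10}
REORDER_QTY = {"potash_2.5kg": 20}

def record_sale(stock, supplier_orders, product, qty=1):
    """Reduce stock on a sale; raise a supplier order at the minimum level."""
    stock[product] -= qty
    if stock[product] <= MIN_STOCK.get(product, 0):
        supplier_orders.append((product, REORDER_QTY[product]))
    return stock, supplier_orders

stock, orders = {"potash_2.5kg": 11}, []
record_sale(stock, orders, "potash_2.5kg")
print(orders)  # selling the 10th-last bag raised a supplier order
```

The point is that the sale and the reorder happen in one system, with no staff member having to notice the low stock level.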

Sounds like a better system all round ?!


Enterprise System

An Enterprise System is essentially a centralised database, and a number of software modules which integrate together.  All the different divisions of the company write to the database.  This data is then available for the other software modules to use within the organisation.

In our picture above, the sales forecast from the Sales & Marketing department can inform the production schedules for the Manufacturing & Production department.  This in turn informs the material requirements, which require financing from the Finance & Accounting department.  None of these conversations require passing paper reports between departments.
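As a rough sketch of that flow – one shared store that every module writes to and reads from.  The module and field names here are invented for illustration, not from any real package:

```python
# Toy model of an Enterprise System's shared database: each
# "module" is just a function reading or writing the one store.
database = {}

def sales_forecast(units):
    # Sales & Marketing writes the forecast to the shared database
    database["forecast_units"] = units

def production_schedule():
    # Manufacturing reads the same record - no paper report needed.
    # Assume a made-up buffer of 10 extra units for illustration.
    return database["forecast_units"] + 10

sales_forecast(100)
print(production_schedule())  # 110
```

The design point is the single source of truth: Manufacturing never receives a stale copy of the forecast, because there is only one copy.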

Company Management can obtain point-in-time information on company operations as required.

Enterprise Systems from large vendors include SAP NetWeaver & Oracle Fusion, with options available on the market for small & medium companies, or those who want to use cloud services.

All such systems include customisable tables, so you can adapt the system to the way your company works, e.g. you can organise by location, product range, etc.  If the customisable tables are not enough to get the system to work the way you want it to, you could code changes to the system.  However, this is inadvisable.  Firstly, these are tricky systems to change – you could affect other areas without knowing.  Secondly, these applications are built around best-practice principles, and you should look at changing your own business processes first.


I think Enterprise Systems are brilliant.  Having one reliable and accurate source of company data is the Holy Grail for somebody who has worked with multiple legacy systems.  However, therein lies the major problem I see with them – legacy systems & the willingness to do away with old systems.  It’s fine for a small company that wants to grow into the future – not too much data or too many business processes to bring across from old systems.  Or even for major multi-national companies with money behind them, e.g. Coca-Cola switching to a SAP enterprise system to leverage their buying power for raw materials (Laudon & Laudon, 2014, pg 372).  However, if your company has been around for a while, you probably have a number of systems and business processes in place.  Are you willing to spend the time & money doing a full analysis of your systems?  Identifying what can go, what needs to change, and what is absolutely required for your company going forward?  Only then can you even think about switching to an enterprise system.  If you’re not going to take this on fully, I see it as wasted money, as inevitably the system will be customised to the point where it no longer functions as well as it could.

The second major issue to be looked at is security & data access.  This is obviously a general concern for all IT systems; however, it’s worth spending extra time considering the security policy of an Enterprise System.  In a case where HR had their own personnel databases, they could control in-house who had access to employee records.  In an Enterprise System, the personnel data is held on the centralised database.  This means identity management needs to be put in place to define who can have access to which data, and what level of access they have, e.g. read, write, or update access.  The concern I would have is that you’ll need somebody on the staff who knows how to customise this access, meaning the company needs IT resources or else must pay consultants to do the customisation.  This is an extra cost that needs to be taken into account when going forward with Enterprise Systems.

Overall, Enterprise Systems seem a logical way to look after a company’s data & provide great advantages to an organisation.


Laudon K & Laudon J. (2014) ‘Management Information Systems: Managing the Digital Firm’ Pearson, Edinburgh.


Hadoop

Hadoop is an open source suite of programs, procedures & tools created by the Apache Software Foundation.  They are designed to facilitate the analysis of very large datasets of structured & unstructured data.

Hadoop makes it possible to work across thousands of nodes involving many terabytes, or even petabytes, of data.  This makes analysis of big data substantially easier.  It also has rapid data transfer rates among nodes, meaning if one goes down, its work can be transferred to another.  This gives it a high degree of fault tolerance, reducing the risk of failures slowing processing.

So, how does it work?  At its simplest, Hadoop takes a large big data analysis problem, and breaks it down into smaller problems.  It then distributes the smaller problems to inexpensive distributed computers or servers for parallel processing.  It then combines the results for easy analysis, or further processing.
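That divide-and-conquer idea can be illustrated with a toy word count – the classic MapReduce example.  This is only a sketch of the concept in Python, with a thread pool standing in for cluster nodes; it is not how you would actually drive a Hadoop cluster:

```python
# Toy MapReduce: split the job into chunks, "map" each chunk on a
# separate worker, then "reduce" the partial counts into one result.
from collections import Counter
from multiprocessing.dummy import Pool  # threads stand in for nodes

chunks = ["big data big answers", "big clusters", "data data data"]

def map_count(chunk):
    return Counter(chunk.split())       # each worker counts its own chunk

with Pool(3) as pool:
    partials = pool.map(map_count, chunks)

totals = sum(partials, Counter())       # the reduce step combines results
print(totals["data"], totals["big"])    # 4 3
```

Hadoop does the same thing at scale: the map tasks run where the data lives, and only the small partial results travel over the network to be combined.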


Hadoop was inspired by Google’s MapReduce white paper.  This paper described why they created MapReduce – to be able to index the huge increase in data on the internet.  Hadoop was released in 2005 by Apache Software Foundation, as an open source product.  Although one of the more interesting facts about its history is the name.  Doug Cutting was one of the creators of Hadoop, and it was named after his child’s stuffed toy elephant.  Digital Marketers all over the world are despairing as a result…


Hadoop Ecosystem:

Hadoop was originally composed of the core components of MapReduce, and HDFS (Hadoop Distributed File System).  However, as further components were added for specific needs, the number of components increased.  These are now generally referred to as the Hadoop Ecosystem.

Let’s look at the base modules:

Hadoop distributed file system (HDFS)

File System that keeps track of data across large number of linked storage.  It can be accessed by any computer using a supported operating system.  It will accept any type of data, you just put it in the cluster, and leave it there until you decided how you want to process it.

Hadoop actually supports many different file systems; for example, Amazon Web Services integrates Hadoop with its own S3 file system.  But HDFS is the Hadoop version.


MapReduce

MapReduce is the default data processing engine.  It’s Java-based.  As Hadoop is not a relational database, or indeed a database at all, you cannot use SQL to get answers from the data.  As a result, you use NoSQL approaches instead.

Hadoop Common

Tools and libraries needed for other Hadoop modules.


YARN

YARN manages the resources of the systems which store the data & run the analysis.


Hadoop ecosystem

There are now also a number of additional modules you can use for different processing on top of these base modules.


Advantages:

– Very flexible.  Easy to scale up or down on inexpensive computers or servers.

– Popular.  It’s widely used in the Big Data industry, and as it’s a mature product offering, support is available.

– Free, as it’s open source.  It also means that if software experts make enhancements, they’re fed back into the development community for general use.

– It will process any amount of data – petabytes and above.  The data can be in any form – structured, unstructured, emails, patents, voicemails, etc.


Disadvantages:

– In its basic state, Hadoop can be complex to use.  As a result, commercial versions with simplified use are being created – for example, Cloudera, Hortonworks, MapR, etc.  You’ll pay for support & consultation with many of these.

– MapReduce is not a good match for all analysis.  It’s quite a complicated piece to work with, and can often only handle one problem at a time.  This means you either need experts to run it, or must look at alternative options such as Hive (which has a broader range of skilled practitioners available).

– Hadoop is the more mature option when it comes to open source Big Data processing, but it’s not necessarily the best.  A lot is being said about Spark these days, which is quickly finding its place in industry.  Just because Hadoop was out first does not make it the best for your requirements.  Maybe you need both?  Either way, it’s a decision that needs thought put into it: what do you actually want to process, and how?


First off, a disclaimer: I’ve never used Hadoop.  My opinion is formed from articles on the internet, and the content of text books.  So, feel free to disagree.

It seems to me Hadoop was the best option at the time, and it still does some things well.  However, it’s difficult to use, and as a result people are adding bits on top which go against the original premise.  For example, putting relational technology (SQL) on top of Hadoop, which is neither a database nor relational, seems daft.  The whole concern with big data is that existing models of processing can’t cope.  Why then try to restrict it backwards with processing we’re already up to speed with?  If training, and a lack of experts, is the problem, then train them up.  IT professionals of any reasonable skill level are adept at learning new computer languages and platforms.  They are inherently interested in learning new ways to do cool things.  (And yes, that includes myself.)  If big data processing is the means to the future of data analytics, then be open to the alternative ways of working that come with it.

Secondly, it’s all very well loading data into your Hadoop cluster until you’re ready to work with it.  But that brings security risks, and expense if you’re on cloud computing services.  It’s hard to see how data governance can be enforced if you’re not even sure where your data is, or what is in it.  We’re told again and again that computer piracy is on the up, and to do as much as you can to protect your data, especially customer & core organisational data.  However, the risk with big data is that we get overwhelmed by it, and as a result don’t go to the same lengths to protect it.  This is more a big data issue than a Hadoop issue.  But reading these articles, it’s seen as a major advantage of Hadoop to be able to throw anything into the cluster until you’re ready to use it.  One even suggested thinking of Hadoop as a big bucket!  As long as you remember you need to lock that big bucket in a safe, inside a safe, inside a castle with a big moat around it, you may be OK.

I’m a big fan of open source programs.  Apart from the fact they’re free, there is generally a whole community of developers willing to help you use them to the fullest.  The downside is they’re often not as user-friendly as those created for profit.  I believe this is part of the issue with Hadoop.  The original developers came up with a great solution.  All the add-ons since then have either been solutions to other problems, using the core functionality of Hadoop as their foundation, or have made more user-friendly options available.  At its core it’s a great solution to the processing of very large data.  We shouldn’t forget the core value of that solution when it’s overtaken by all the add-ons.



Data Star Trek

What is Data?

According to Wikipedia, Data is “uninterpreted information”.  It then goes on to ask what type of Data are you interested in?  Options include ‘a fictional android from Star Trek’, a book by Euclid, a moth, a British Drum & Bass musician, and a ‘non-governmental organisation founded by Bono’! 1   For this blog I’m interested in Computing Data, although the android lived an interesting life too …

Data is raw facts – numbers, text, images, symbols, etc.  On its own, data means nothing.  Data needs to be interpreted, grouped & processed in order to become usable information.  After all, ‘Colonel Mustard, in the library, with the candlestick’ doesn’t mean anything.  However, if you know you’re playing Cluedo, it makes all the difference!


What is information?

This leaves us with the understanding that information is interpreted data.  Information is what we use to make decisions & decide future actions.

If you’ve ever applied for a mortgage, you know the amount of financial information you’re required to hand in to the bank – salary information; bank accounts, balances & statements for 6 months; any loans; details of current rent, etc.  At its basic level it’s just figures on a page.  The bank then groups this data together into earnings & outgoings.  This information is then used to decide if a mortgage approval will be given, and for how much.
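As a toy illustration of that grouping step – raw figures becoming information once they’re categorised.  The transactions below are invented, not real bank data:

```python
# Raw figures (data) grouped into earnings & outgoings (information).
transactions = [
    ("salary", 3200), ("rent", -1100), ("car loan", -250), ("salary", 3200),
]

earnings = sum(amount for _, amount in transactions if amount > 0)
outgoings = -sum(amount for _, amount in transactions if amount < 0)
print(earnings, outgoings)  # 6400 1350
```

The individual amounts mean little on their own; the earnings/outgoings totals are what the bank actually makes a decision on.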

In order for the decision to be useful, the processed data must be timely, accurate and complete:

If we go back to our mortgage example: if the information is 2 years old, it’s not timely enough to enable us to make an informed decision on a customer’s current financial situation.  We may not know about a new gambling addiction that would rule out approving a mortgage.  In the same way, if we believe a customer has an outstanding loan of €1,000, but it’s actually €10,000, that inaccuracy makes a big difference to a decision.  Lastly, if the bank only records one salary rather than two for a couple, the financial picture will not be complete and the decision cannot be correct.

In days gone by, all of the above mortgage decision-making was done on paper in your bank, where they knew your name.  If they were still unsure, you’d get an important member of the community to give you a reference, i.e. the priest, doctor, etc.!  These days it’s all on computers, for good or ill!  As an IT professional, I’d say it’s progress, and point out the amazing things we can now do with IT.  But then, you couldn’t have put a country billions in debt the old way …

What is Knowledge?

Knowledge is the understanding of the context of information – how to interpret the information.



Reverting back to our mortgage example for the last time: looking at one person’s income & expenditure only gives the scoring team so much information to make a decision.  However, if they have the same information for 1,000 previous scoring decisions, and the outcomes of the mortgages granted, then they can look at the information with experienced eyes.  Looking at your expenditure and seeing a BetFair or PaddyPower online account regularly used – that is just information.  However, if the last 20 people they gave a mortgage to with the same outgoings history got into financial difficulties, then that’s a concern.  Only by having that context on the information can the bank acquire the knowledge to look for the red flags.


We now know that data is the building block for knowledge.  So, how does a computer store data?

That is to be continued in a later blog…






Can R tell us who is the best female tennis player of all time?

What is R?

First off, let’s look at R itself.  R is a programming language.  It was, and is, designed specifically for data analysis.  It allows you to manipulate, calculate and graph data.  It allows you to model statistics and save your results to many standard file types, e.g. PDF, JPEG, etc.

It’s completely open source, i.e. free, unlike equivalent statistical packages, e.g. SAS.  There’s also a thriving R community to support your use, and expand what’s possible in R. 1

Where to start with R?

I started out with the free Try R course, which gives you a lovely badge on completion of the course.

The R Graphics Cookbook, and its companion website, was also useful starting off.

However, I found the only way to get going was to choose some data, download R, and try it out.  Many R advice websites were hugely beneficial in helping me out when I got stuck.  The main thing I discovered is that there is never just one way to do something in R, so keep trying and learning!

Let’s try it!

As I write this blog (05/08/2015), Serena Williams has recently won Wimbledon again.  The US Open is coming up.  The question is being asked in numerous media & blogs, who is the greatest Female Tennis Player of all time?  The consensus seemed to be it was between Serena Williams and Steffi Graf, but are they right? 2 3 4

Generally, we can’t judge people of a different era together.  They may not have had the same opportunities to travel and compete, or maybe their competitors were not as good as other eras.  However, we have win records; maybe the statistics can help throw light on the argument?  Or could it muddy the waters further?  Let’s see what R can tell us…

To start, I need data.  I searched online, and came up with the names of 11 female tennis players who regularly appear in the ‘Top 10 Best Female Players‘ type posts.  Using that list, I went to the WTA (Women’s Tennis Association) website, and retrieved player stats for the players I’d chosen.  These were then saved to a CSV file, and loaded into R.

Load Data

What can we tell from the data?

How do we classify “the best”?

Single Titles

Let’s look first at who won the most Singles Titles.  These are generally considered the benchmark for the best players.  So, who won the most?

Single Slams Bar

Margaret Court won 24, ahead of all the competition.  Case closed, correct?  The data is irrefutable?!  However, she won the majority of her titles at the Australian Open, in an era when the majority of her competition didn’t travel that far.  So, if the best of her competition wasn’t even playing, do her numbers hold up to the current standard?  Possibly not.  So, what else can we look at?

Total Prize Winnings

In this age, whoever won the most prize money could be considered a true marker of greatness.  In golf they even have the ‘Race to Dubai’ every year, which is based purely on prize money during the year.  Let’s see if that gives us a true answer.

Total Prize Winnings

Serena Williams: outright winner.  Which, let’s face it, is another unfair comparison.  The prize winnings on offer were nowhere near current levels when Billie Jean King was playing.  Even in the era of Steffi Graf, female tennis winnings were not on a par with male winnings.  The only legitimate comparison we could make in this case is Serena against her sister, Venus, as they are competing in the same era prize-money-wise.

Career Win Stats

Another line of comparison is the career win stats for each player – that is, their win/loss record expressed as a percentage.
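The charts in this post were produced in R, but the calculation itself is trivial in any language.  Here is a quick Python sketch, with made-up win/loss records rather than the real WTA figures:

```python
# Win percentage = wins / (wins + losses) * 100.
# The records below are illustrative, not the real WTA stats.
records = {"Player A": (839, 145), "Player B": (548, 98)}

def win_pct(wins, losses):
    return round(100 * wins / (wins + losses), 1)

for player, (wins, losses) in records.items():
    print(player, win_pct(wins, losses))
```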

Win Stats

Which, as you can see, proves Venus Williams is the worst player on the list!  Obviously, the statistics don’t lie.  But … how can we stand over that?  Especially when Venus has won more slams & more prize money than Martina Hingis – her closest competitor.  It doesn’t seem right.  Maybe there is no “best” player in this case.

However, let’s have one more stab at this.

All round player

As it currently stands, fewer and fewer players are competing in doubles matches as well as the singles competitions.5  There are a few reasons for this: less prize money, concentrating effort on the singles prizes, and a different skillset being required.  There are certainly more serve & volley skills required in doubles than in the average singles match.  There is an argument which says that to be the best all-round tennis player you should be winning both singles & doubles matches.  Have we a player who stands out in both?

This graph groups the slam wins by player:

Total Slams Won

Finally, this graph shows the slams stacked for each player:

Total Slams Won

So, Martina Navratilova is the winner for best player!  Serena Williams comes up a close second, and as she is still playing she could still reach the top.  I’m happy with that result, but then that shows my personal preferences.


As you can see, depending on how we ask, we get a different answer.  The phrase “Lies, damned lies and statistics” comes to mind.  Let’s look at a summary of the players:


In this case, there is no single correct answer with the data I’ve entered.  Martina Navratilova comes out top in more categories than Serena Williams, with Chris Evert and Margaret Court coming up next behind them.  Surprisingly, Steffi Graf is a little behind.  However, that could say more about the fact she gave up tennis at a relatively young age, or the quality of her opposition – but who’s to say?  There are alternative possible means of getting an answer in this case; however, I won’t be continuing with this analysis further.

Alternative Suggestions

1. You could look at total career titles, not just the slams.  This would be over their entire career, and not just the headline grabbing main competitions.

2. You could look at the players’ whole careers, rank the quality of their opposition, and, using the resulting quality scores, analyse who was more successful.

3. You could look at match stats, such as unforced errors, serving stats, etc.


I may not have confirmed who the best ever female tennis player is, but I acquired a good understanding of a subsection of R.  The TryR course was a good starting point, but I didn’t feel very confident with my knowledge immediately afterwards.  As with most programming languages, actually working with real-world data makes it easier to learn.  In addition, you gain from working through the frustration of figuring out something that won’t work.  The community sharing help for R makes it even easier, as long as you put the work in.

I feel I’ve only scratched the surface of what is possible in R.  It’s worth considering other R courses, or training available online, to advance my knowledge.  For example, there is a free R Programming course from Johns Hopkins on Coursera to learn more of the options available within the environment.  Interesting assignment.  Thank you.










Google Fusion Tables

Fusion Tables is an experimental offering from Google for use in data management.  It allows you to gather, visualise & share data.1

Data can be converted into Charts, Network graphs, Scatterplots, Timelines, & Geographical Maps.

For this assignment, we were asked to use Fusion Tables to create a heat map of the Republic of Ireland using the 2011 census data.


So, how do we go about this task?

First we have to find the data.  The population data came from the Central Statistics Office website.  This I saved as a comma-delimited CSV file.

The Irish KMZ data file, which gave the borders for each county, came from a separate source online.

I loaded both files into Google Drive, and cleaned up any anomalies in the data which weren’t helpful in this case.  For example, we were only interested in Dublin county as a whole, not each council area.  In addition, ‘Galway City & County’ would not match to ‘Galway’ in the KML file; we were only interested in the entire County Galway population.  Another anomaly I didn’t spot until I’d created my geographic map was Tipperary.  Tipperary North & South were in the CSV file, but only the entire county boundary for Tipperary was in the KML file.  This meant I had a blank area for the whole of Tipperary.  Once I merged the two Tipperary figures, the gap in the image was filled in.

Anyway, once you’re happy with the data, you need to merge the tables.  To do this, open one of the spreadsheets, then click File > Merge…; this opens another window where you can choose the name of the other table you want to merge.  Then choose how you want to merge the data – in our case, the Name field from the map_lead table and the County field in the County Population table.  You’ll then have a merged table, which I’d advise renaming to something more appropriate.

In the ‘Map of geometry’ tab you’ll now see the merged data, albeit all in one colour.  To make this a useful image, we want to graduate the colours used based on population counts.  To do this, choose ‘Change feature styles …’ on the left-hand side of the screen.  On the subsequent pop-up window, choose ‘Fill Color’ under Polygons, and the ‘Buckets’ tab.  I divided this into 5 buckets.  The system will automatically break up the population figures if you let it.  However, if you try that, you’ll end up with Dublin as one colour and the rest of the country lagging behind.  While this may be a fair reflection, it doesn’t make a great image.  So, I broke the bands up as follows: 0 – 70,000: Blue; 70,000 – 100,000: Green; 100,000 – 140,000: Yellow; 140,000 – 190,000: Orange; 190,000 – 1,273,070: Red.
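The same banding logic can be expressed in a few lines of code.  The bucket boundaries are the ones chosen above; the function itself is just an illustration of what the ‘Buckets’ feature does for you:

```python
# Assign a county's population to one of the five colour buckets
# used on the map. Boundaries match the bands chosen above.
BUCKETS = [
    (70_000, "Blue"),
    (100_000, "Green"),
    (140_000, "Yellow"),
    (190_000, "Orange"),
]

def bucket_colour(population):
    for upper_bound, colour in BUCKETS:
        if population < upper_bound:
            return colour
    return "Red"  # the top band: 190,000 - 1,273,070

print(bucket_colour(65_000), bucket_colour(150_000))  # Blue Orange
```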

I then made the map Public on the internet. This allowed me to publish the map on this blog.

All of which leaves you with this:



The map dramatically shows the distribution of the Republic’s population around the big cities – Dublin, Cork, Limerick & Galway.  There are whole swathes of Connaught and the Border areas which are underpopulated as a result.  Even without knowing anything about the country’s infrastructure, you could surmise that most of the jobs are in these urban areas.

So, how could we make the map more useful?  What could we add?


It’s apparent that the majority of the motorways start in Dublin and spread out to the main cities like the spokes of a bicycle wheel.  Limerick – Cork or Galway don’t have motorways the whole way.  Going from Galway to anywhere north of the Galway-Dublin line is also not on motorways.  Maybe if the infrastructure existed for industry to get in and out of Sligo or Leitrim, for example, we wouldn’t have population black spots there.


What percentage of each county is unemployed?  Or has suffered the most from emigration?  Say there is 5% unemployment in Dublin, but 25% in Leitrim, for the sake of argument.  Using these figures, Enterprise Ireland or IDA Ireland could prioritise the areas with the worst employment figures for future employment opportunities.


Where are hospitals based around the country?  And what specialities do they have?  If you’re injured in Roscommon on a Sunday at 8pm, where do you have to go for A&E?  If you’re living in Donegal, where do you have to go for cancer treatment?  Are these distances appropriate for Ireland and its population?  Or are all the resources being spent in the capital?


Fusion Tables is a really easy product to use.  There is no need to know anything about mapping tools to use it.  It’s part of the Google suite of applications, and if you’re already working with those tools, it’s handy to take data you’ve already stored on Google Drive and work with it.

However, I’m not a fan for one reason: there are so many better applications out there at the moment.

What I mean by that is, it’s neither as fun, as interactive, nor as useful for routes as, say, Google Earth & Google Maps 4, at one end of the user spectrum.  Nor is it capable of handling sizeable data: it currently has a size limit of 250MB per table, 1GB in total, and can handle no more than 350,000 features on a map.  Which means it’s out for big data, or any serious data analysis.  Students studying for a PhD in Science, for example, would often have more than that, never mind industry.
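Those limits are easy to sanity-check before you even attempt an upload.  A minimal sketch, assuming the 250MB-per-table and 350,000-feature limits quoted above (the function name and limits-as-constants are mine, not part of any Google API):

```python
import os

# Limits as quoted above for Fusion Tables at the time of writing.
MAX_TABLE_BYTES = 250 * 1024 * 1024  # 250MB per table
MAX_MAP_FEATURES = 350_000           # features per map

def fits_in_fusion_tables(path: str, n_features: int) -> bool:
    """Return True if a dataset stays inside the quoted limits."""
    return (os.path.getsize(path) <= MAX_TABLE_BYTES
            and n_features <= MAX_MAP_FEATURES)
```

A PhD-scale dataset with, say, a million features fails the feature check before the file size even comes into it.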

In my personal opinion, even publishing your map onto another webpage, or a WordPress blog (!), is more difficult than it is with QGIS, which is Open Source.

To be fair to Google, it is documented as an ‘Experimental’ application.  So, maybe it’s the starting point for something that will eventually take on ArcGIS, QGIS, or many of the other mapping applications, open source or not 3.  Not forgetting the functionality already available in Excel, e.g. pivot tables, charts, etc.

It’ll be interesting to see where they go with it into the future.








Tesco Clubcard Data

Tesco was one of the leading lights of customer data analytics in the world of retail.  They first looked into the Clubcard idea in 1994, trialling it in a few stores in partnership with a customer science company called Dunnhumby.  Reportedly, the then Chairman of Tesco, Lord MacLaurin, commented “What scares me about this is that you know more about my customers after three months than I know after 30 years.” 1

Based on the feedback and information accumulated from the trials, they rolled it out across the Tesco business in 1995.  This data was then used to target customer marketing and optimise the supply chain, driving profits across the company.  Up until the early 2010s this was the textbook case study for customer data analysis driving profit:



As late as 2013, Tesco “stressed that they do not sell their loyalty data to third parties”. 3

In January 2015, Goldman Sachs were brought on board to prepare the Tesco-owned Dunnhumby for sale, with a reported £2 billion price tag attached.4

So, what happened?

Tesco announced a pre-tax loss of £6.38 billion for the year to the end of February 2015.  5

Depending on what you’re reading, various causes are blamed:

  • A £250 million profit overstatement,
  • A more competitive marketplace,
  • An unsuccessful attempt to expand into the American market with ‘Fresh & Easy’ convenience stores,
  • Property & share losses,
  • Tesco themselves blame a shift to online shopping, which, while successful in itself, has seen profits fall.

I suspect a perfect storm of all of the above, the worldwide recession, and Tesco taking their eye off the core business was to blame.

So, heads have rolled, and non-core businesses are up for sale or have been sold, including the Clubcard consumer data.



Tesco reportedly have a database of one billion worldwide shoppers, and the experience to analyse & use this data for customer marketing.

“Major food and drinks companies like Coca-Cola, Unilever and Nestle, are all willing to pay Tesco for access to the data and the consumer insight and shopping habits it provides.” 6

Google and WPP, an advertising and marketing company, are also in the mix.

At this point, you’ll read all over the internet about how Tesco were so focused on Data they lost sight of their customers, and maybe that’s true.

Harvard Business Review

LinkedIn Post

Tesco will also have to try and learn the lessons and turn the business around fast.

However, I’d like to cover two other points instead:

  1. Who would want to buy the data?
  2. Who owns the data?


Tesco have been working with this data for 15 to 20 years now.  They’ve maximised savings and learned quite a bit about consumer habits from it.  But it didn’t stop them from faltering and losing customers.  So, is it worth spending £2 billion to buy it?

The low-cost competitors who are receiving some of the blame don’t have loyalty systems.  They spend a fortune on marketing to get customers through the doors, but they don’t market directly to specific segments of their customers.  Even so, I don’t think many people would believe they’re not doing customer analytics just because they lack loyalty schemes.  If they can do it alone, why can’t others?  So, what will the new buyers of the database gain from buying the company?

They’ll get expertise in the form of expert staff.  They’ll get access to a database they’ve never seen before, and may be able to derive new data from it.  However, the data is in the past.  Technology and recession have changed buyer behaviour.  So, is the data valid into the future?

Is it possible that the data itself was part of Tesco’s downfall?  Was it prepared for the fact that I can order coffee capsules in bulk online, and will never need coffee from the supermarket?  Or that if my next-door neighbour does their shopping online, they’re not tempted by items they don’t need, or by their child’s pester power?  What about the statistical increase in one- and two-person households, in Ireland at least?  Their shopping baskets are bound to be smaller than expected. 7

I agree that if the new owners ask the right questions of the data, they may get new answers that hadn’t been considered before.  However, from my perspective, the employees may be the greatest asset in the sale.


OK, let’s be clear: Tesco own the data, for now.  As a customer, if you had a Clubcard, you were clearly swapping your purchasing-habit data for ‘Points’.  (The fact that supermarkets, credit card companies, etc. collect it regardless of this permission is a rant for another day.)

However, I believe this was on the understanding that it stayed with Tesco for their own use.  The terms may now say they can anonymise the data and share it with third parties, but I suspect that wasn’t there in 1995/96!

Is this a breach of customer trust?  Can I refuse to have my data transferred?  Opt out?  Or should I just accept it’s going to happen and forget about it?  Frankly, I shop in Tesco about twice a year, so I’m not all that concerned myself.  However, it’s the principle of a ‘fair’ swap that seems to be being phased out in this world of consumer marketing.

Of course, as a Data Analyst, I’d love to see what I could find in the data!


From one of the original innovators in retail customer data analysis, Tesco are now having to go a new route.  The data will move on, and both Tesco and the new owners will have to re-think existing marketing strategies.

I think one of the clearest pieces of knowledge to come out of this whole thing is that customers know what they want.  When you stop delivering it, they’ll move on.  You need to keep up with the changing world, but ensure you’re continuing to put customers first & foremost in your plans.  Investing in sidelines, and in things your customers don’t value, is a recipe for failure.

In the end, data can only try to give you an answer to a question.  If it’s the correct question, you’re onto a winner.  If it’s not, you can go down the wrong path completely.