Quick note on automation

Automation should result in organizational flexibility.

However, this is not a necessary consequence of automation.

Let me back up.

Prior to computers, all work needed to be performed by a human being. Humans, in general, are pretty smart: even the lowest paid workers usually have some ideas about how to do their job better (faster, more easily, with less error, etc). In general, management and organization involves taking those ideas; evaluating them; distributing them; and ensuring compliance.

Each step involves work by a human. However, people only have so much work they can perform in a day. This varies to some extent by person, and varies significantly based on the environment they’re in and the process that supports them, but is ultimately finite — and, in my opinion, quite low. I think you can get 3-5 hours of cognitively challenging / attentive work, per day, out of people on an ongoing basis.

Ideally, automating any part of a business’ process should act to reduce the amount of cognitively challenging work that employees do per day — enabling them to spend that portion on other work. This can either be directly value-adding, or kept as “flex” time for emergencies and one-off issues (which end up being surprisingly common).

Unfortunately, this does not always occur. I lay considerable blame at the feet of people conflating automation with enforcing compliance: a computer system can reduce repetitive work, but it can also ensure that everyone follows a strict, unyielding process.

While process compliance is important, people tend to underestimate the number and importance of one-off exceptions to that process, leading to strife when employees need to make an exception the system does not allow for.

Similarly, computers can greatly increase the audit trail or visibility into the work employees do. However, adding visibility for its own sake is rarely value-adding, and in my opinion increasing the amount of data entry employees do can be a net drain on their productivity.

Instead, I think that software should be introduced as a way of creating found time: reducing the amount of boring, repetitive work that people perform, and freeing them to tackle important-but-not-urgent items on their task list. I would prefer to see success measured by a decrease in the time spent achieving broader corporate goals, or by an increase in the number of business objectives – especially those not considered business-critical – that an organization accomplishes within a year.

Technology frequently has a holistic impact on an organization. As such, a key metric should be holistic in nature. This has all sorts of problems – confounding being chief among them – but many of the more visible metrics act to distort the goals of implementation.

Microdecisions and Typing

I remember learning to touch type when I was in high school. It was really useful – I could look at the screen instead of my keyboard when I typed, and that allowed me to monitor what was going on.

That is, I learned to touch type through instant messenger and online games (where you don’t necessarily want to look down for a few seconds).

The problem with learning to touch type in that fashion was twofold:

  1. First, accuracy was less important than speed. This led to a high(ish) error rate.
  2. Second, speed came only on “familiar” words: I got good at typing common words, and at the keys those words used.

That was good enough through college – getting 50 words per minute was fine for essays, and my error rate was under 10%.

More recently, however, I’ve become aware of how frustrating it is to have a high error rate in typing.

Errors are nasty things. You need to watch closely for them, and identifying one requires you to stop your chain of thought, correct it, and then move on. It’s mentally “draining,” particularly for words you’re not very experienced with.

Worse, the less familiar you are with typing and the keyboard, the more you have to focus on articulating what’s in your brain – you need to recall where the keys are, how to spell the word, and so on.

It’s not much – and in most cases, barely noticeable – but it adds friction to the process of getting things out of your head and onto the computer.

If you feel more comfortable with a pen and paper than with a keyboard, that’s probably because of the additional marginal energy you need to exert to push your thoughts through your input device (mouse/keyboard).

And, of course, the problem is more apparent when programming. Brackets, equal signs, and the like – rarely used in English prose – are very, very necessary in programming.

So, a few months ago I endeavored to learn to touch type. The good news: it’s not very difficult, doesn’t take that long, and works pretty well. The bad news: your typing speed drops a lot initially.

TypingWeb

For general purpose “learning to type” I found TypingWeb to be the best. The interface is clean, the lessons helpful.

Typing.io

For special characters, typing.io is excellent. You can type through code from open source projects, and – in the premium version – upload your own code and type through that.

Typing.io also has a really useful grading summary. Unlike so many other typing tools, it counts backspaces and eliminated characters to develop an efficiency score. That’s great – an error rate of, say, 2% doesn’t fully capture how much trouble you go to in order to fix the errors. Counting incorrectly typed characters, collaterally typed characters, and backspaces does account for that trouble – and from experience, I’ll say that an error rate of 5% can lead to an overall efficiency of 80–85%. That’s not very good.

The above lesson was a little easier than most – Python, using variable names I’d typed quite a bit before – but coding feels so much smoother when the unproductive keystrokes are under 5% (usually, e.g. for JavaScript or PHP, I’m at ~60 WPM with 9–12% unproductive keystrokes).
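
Out of curiosity, I put together a rough model of how an error rate turns into an efficiency score. To be clear, this is my own back-of-the-envelope sketch, not typing.io’s actual formula: it assumes each error costs the mistyped key plus a backspace, and that a late-noticed error takes a few correct “collateral” characters with it.

```python
def typing_efficiency(error_rate, collateral=1):
    """Rough model of an efficiency score (not typing.io's real formula).

    Each error costs the wrong keystroke plus a backspace; if noticed
    late, `collateral` correct characters get deleted and retyped too.
    Efficiency = productive keystrokes / total keystrokes.
    """
    overhead = error_rate * (2 + 2 * collateral)  # extra keystrokes per character
    return 1 / (1 + overhead)

for rate in (0.02, 0.05, 0.09):
    print(f"{rate:.0%} errors -> ~{typing_efficiency(rate):.0%} efficiency")
# 2% errors -> ~93% efficiency
# 5% errors -> ~83% efficiency
# 9% errors -> ~74% efficiency
```

Even with a single collateral character per slip, a 5% error rate lands right in that 80–85% band.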

I’m going to keep practicing until I can get to > 75 WPM with < 3% error rate across multiple languages (PHP, JavaScript, C, Clojure, Python, etc). At that point I think I won’t have to worry about what to type – it’ll just be seamless – and I’ll reduce, very slightly, the friction of getting things from inside my head onto the computer.

The Future is Freelancing?

Recently, Shane Snow (CCO of Contently) penned an article called “Half of us May Soon Be Freelancing: 6 Compelling Reasons Why.”

Shane believes that the future of journalism is freelancing and is working to make that happen – he is the CCO of Contently, which was founded to help freelance journalists succeed. Fortunately, he doesn’t adopt the position that all business will become freelance-based – he says, “I don’t believe the majority of businesses will ever become completely freelance or remote (core staff need to be in-house and work in proximity at any company of a certain size; local service-based businesses need people on site, though those can be freelancers).”

Quite right: there are reasons to have people on site, and to have employees on payroll.

To understand that, let me outline a different perspective on freelancing.

My go-to for understanding the formation and structure of businesses is Ronald Coase; specifically, his article “The Nature of the Firm.” The basic premise is that a firm exists where it is cheaper to do transactions within a company than outside of it. As a crude example, if you have a graphic designer in house, you can simply ask them to do something; if you go outside the company, you normally have to deal with asking for a quote (which entails generating an RFQ, etc.); additional overhead in billing; less commitment on resource allocation; difficulty meeting deadlines; and so on.

As a note: “transaction costs” include pretty much everything – the cost of locating a freelancer, vetting them, the risk of getting the wrong person, the cost of communicating, and so on.

For a company, hiring an employee full time makes sense as long as you have the work to justify it – a company is basically negotiating a lower rate for buying in bulk, and committing to future purchases. Employees agree because they decrease their risk (of not having business) and increase their utilization (no more accounts receivable, marketing, lead generation, contract negotiation, etc.).

So, there’s a natural place for freelancing: it’s where companies want less-than-FTE work of a certain kind, the transaction costs are sufficiently low, and the freelancer is not risk-averse and/or is in high demand.
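
To make that concrete, here’s a toy Coasean comparison in Python. The numbers, and the single lump-sum “transaction cost,” are entirely made up for illustration:

```python
def cheaper_to_freelance(hours_needed, freelance_rate, fte_annual_cost,
                         transaction_cost):
    """Toy Coase-style comparison: buy the work outside the firm, or hire?

    transaction_cost lumps together search, vetting, contracting,
    billing overhead, and the risk of a bad match (all hypothetical).
    """
    outside = hours_needed * freelance_rate + transaction_cost
    return outside < fte_annual_cost

# 400 hours/year of design work: freelancing wins easily...
print(cheaper_to_freelance(400, 75, 90_000, 5_000))   # True
# ...but at near-full-time volume, the bulk discount of hiring wins.
print(cheaper_to_freelance(1800, 75, 90_000, 5_000))  # False
```

Anything that shrinks `transaction_cost` pushes the crossover point toward freelancing – which is the next point.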

Consequently, anything that reduces transaction costs will increase the rate of freelancing (or, if you’re feeling extra fancy, the “natural rate of freelancing”): online marketplaces that make it easy to establish contracts and monitor work; online portfolios that show work done; ratings in both directions (freelancers rating companies and companies rating freelancers); and so on.

The fun factor is the internet, because the internet effectively expands the market – it removes much of the impact of geographical location. Not completely, because people still prefer in-person contact, but in general you’d expect that preference to be factored into rates or business allocation (so a freelancer who is nearby will get selected over a remote freelancer).

Overall, I don’t think this is a particularly notable change in theory – but it certainly is notable in practice.

How much time in a college degree?

I’ve been going through some university lectures recently (Stanford SEE, iTunes U, and MIT OpenCourseWare) and, of course, I’ve created a spreadsheet to allow me to track completeness and prioritize.

Currently, I have 13 courses set up in my Excel spreadsheet, for a total of 308 lectures and 319 hours.

I wasn’t sure how to contextualize that number, so I did a rough check on lecture hours during my college years (something, oddly enough, I never did in college).

If you assume 18 credits per semester – where each credit is meant to map to one hour per week of lecture time – and 8 semesters, each averaging 13 weeks, that gives us 1,872 hours of lecture time (the recommended 15 credits per semester works out to 1,560 hours).
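
The arithmetic, for anyone who wants to tweak the assumptions:

```python
weeks_per_semester = 13
semesters = 8

def lecture_hours(credits_per_semester):
    # One credit is meant to map to one hour of lecture per week.
    return credits_per_semester * weeks_per_semester * semesters

print(lecture_hours(18))  # 1872 hours
print(lecture_hours(15))  # 1560 hours
```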

That’s pretty inexact – for instance, many of my 4-credit courses at Skidmore had only two 1:20 classes per week, for under 3 hours per week. Others were spot-on (two 1:50 classes per week), and others were more (e.g. with a lab).

If we round that number up and assume 2,000 hours of work – well, for starters, that’s close to the number of working hours in a year. It’s interesting to compare the learning value from one year of work with the learning value of all classes you attended in college. I understand that (i) it’s not directly comparable (building skills vs. knowledge) and (ii) work at college includes homework (5 hours per week? 10 hours? 20?).

Still, it’s a helpful benchmark in my mind, particularly when moving into a new domain that you’re unfamiliar with.

Employee Investment Payoff

Earlier today, I posted in a forum on the topic of employees using their own money to purchase equipment that they could then use at work.

An example was a second monitor, which has documented productivity improvements.

My contribution was to observe two things:

  1. Pre-Tax vs. Post-Tax: A company purchases equipment at a discount relative to the employee, since the company deducts the cost before taxes.
  2. It’s a tiny expense: As an example, I calculated that spending $600 on a dual-monitor setup for an office employee earning the average income for “Professional and Business Services” pays for itself if the employee saves as little as 2 minutes per day (over a 3-year time period).

It’s a pretty simple calculation (sketched in code after the list):

  • The average hourly wage is $25.13. But that’s not the cost to the employer of having an employee there – benefits account for 27.8% of total compensation. The cost per hour is actually $34.83.
  • Dual monitors are unlikely to last only one year. Let’s assume a 3-year replacement period.
  • The breakeven point is the employee saving 5 hours 45 minutes each year.
  • If we assume 250 working days per year (50 weeks), that’s about 1 minute 23 seconds per day.
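
Here’s the same arithmetic in a form you can tweak, using the figures above:

```python
wage = 25.13                     # average hourly wage, $
benefits_share = 0.278           # benefits as a share of total compensation
cost_per_hour = wage / (1 - benefits_share)   # ~$34.80 fully loaded

setup_cost = 600.0               # dual-monitor setup, $
years = 3                        # replacement period
working_days = 250               # 50 weeks x 5 days

breakeven_hours_per_year = setup_cost / years / cost_per_hour
breakeven_seconds_per_day = breakeven_hours_per_year * 3600 / working_days

print(f"{breakeven_hours_per_year:.2f} hours/year")    # ~5.75
print(f"{breakeven_seconds_per_day:.0f} seconds/day")  # ~83
```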

Of course, you could argue that (i) increases in employee productivity don’t map directly to profit, and (ii) that’s very difficult to measure.

But it’s interesting to note how fast some businesses are to waste time, and how slow to authorize relatively minor expenses (e.g. $600) to improve employee productivity.

That’s a failure of accounting, in my mind.

I’ve included a very basic spreadsheet below.

Disruption Is Not Nitpicking

Suw Charman-Anderson, a UK self-published author and social media pioneer who also writes for Forbes occasionally, believes that Amazon is ripe for a fundamental disruption in its business model.

[image]

Her article – linked above – describes her argument. I’ll summarize it, briefly.

  1. Amazon’s review system is fundamentally broken; customers find it unreliable and sufficiently harmful to make them look elsewhere for reviews. This “will habituate them to looking outside Amazon for information on books and bring Amazon’s position as the canonical reference for books under threat.”
  2. Amazon’s affiliate program is not the only option, and bloggers will shift to other programs or offer multiple links. Amazon currently has market share, but no competitive moat.
  3. Amazon doesn’t provide enough data to book publishers so they can make informed marketing decisions. Publishers cannot forge a direct relationship with their customers (and they need to).
  4. Publishers can’t bundle books and ebooks through Amazon.

Additionally, Suw asks:

Can Amazon be sure to maintain its dominant position purely through its catalogue, reach and discount? Is that really enough to keep it secure?

I believe there are two major problems with this viewpoint: first, Amazon’s advantages are not limited to its catalogue, reach and discount; second, the problems Suw identifies are not actually disruptive.

Conceiving of Amazon as an online version of WalMart (or ASDA) is a mistake. The core operating principle of Amazon – as far as I can see – is to lower the friction of each marginal purchase. Once you have an account on Amazon, each additional order becomes easier. If you buy enough things that an Amazon Prime subscription pays off (say, one purchase a month), the friction lowers even more. And if you buy a Kindle, the friction for certain media drops further still – down to instant gratification.

This is one of the reasons why Amazon has so strenuously defended its 1-Click patent: it regards the ease of purchase as a serious competitive advantage.

And, while Amazon doesn’t kick data back to its suppliers, it uses every scrap of data it can get its hands on. If you have an Amazon account, and you visit Amazon to check out a product, Amazon will note that and then email you later with an offer. The time delay of the email, and the quantity of the emails, is – I’m confident guessing – variable on a per-person basis to maximize read-and-clickthrough rate. This is one of the smallest things Amazon does.

In fact, I wouldn’t be surprised if the industry upset over how Amazon has been managing reviews has to do with Amazon choosing what reviews to feature / hide / etc based on how that maximizes conversion rates and minimizes returns. What seems to be random deletions and arbitrary rules is more likely to be Amazon being very aggressive about managing KPIs for conversion rates and customer satisfaction.

The combination of Amazon’s near-obsessive focus on making marginal purchase decisions easy – i.e. their conversion rate – and their well-known focus on using data to make decisions means that Amazon does have a sustainable competitive advantage with their affiliate system.

An independent website, seeking to monetize links to a retailer, will base its choice on one primary thing: what makes it the most money. I would wager that the conversion rate from Amazon Affiliate links is much higher than the conversion rate from other links.

Which means that (i) Amazon is likely the preferred affiliate link, and (ii) adding more affiliate links to a page will lead to a net decrease in revenue (if for no other reason than that multiple links for the same thing decrease total clickthrough rate – the cognitive load of deciding which link to click on is more than most people care to exert).
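
The back-of-the-envelope a blogger would run looks something like this. The conversion rates and commissions here are invented for illustration, not real program terms:

```python
def expected_affiliate_revenue(clicks, conversion_rate, avg_order, commission):
    # Revenue per link placement: what the blogger actually optimizes for.
    return clicks * conversion_rate * avg_order * commission

# Hypothetical numbers: a lower commission can still win on conversion rate.
print(expected_affiliate_revenue(1000, 0.08, 30, 0.05))  # Amazon-ish: ~$120
print(expected_affiliate_revenue(1000, 0.02, 30, 0.10))  # other shop: ~$60
```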

Unfortunately, I also don’t find Suw’s criticism of Amazon’s poor data-sharing habits compelling. Oh, not that the data Amazon makes available is good – it’s not, if you want good ROI numbers for your marketing work – but publishers don’t care. They may complain about it, but for their entire existence, book publishers have sold through multiple channels. No brick and mortar bookstore kicks back sufficiently detailed information to know if customers purchased a book based on a TV ad, a Facebook ad, or a news article. I doubt they collect it themselves.

Publishers, in other words, are not facing anything new.

I imagine that’s why Suw’s solution is not for publishers to exert themselves such that Amazon sees the error of its ways.

No: Suw’s solution is more radical.

Think about this: These days, authors have to do a lot of their own promotional work. Contrary to popular belief, just chucking a book on Amazon doesn’t mean that it’s going to get found and bought. And that’s especially true for new entrants with no reviews, no or low sales, and a price below £2.49. You have to promote, and promote hard. Doesn’t matter if you’re self-published or an author with a traditional publishing house, at some point you have to reach out to your audience and say, “Here is where to buy my stuff”.

That means you can choose where to send them. Will you link to Amazon, where your sale goes into a data black hole, or will you send them to your own webshop, or your publisher’s, where that information can be captured and you can provide a few little extras to keep your readers sweet?

Why is this wrong? Because it’s all about the sales.

Fundamentally, people – publishers or authors – can use detailed data to maximize profit. There are two ways to do this: increase revenue, or decrease costs. More detailed data allows them to spend marketing dollars where they have the highest ROI (increasing revenue), and it also allows them to identify losses more quickly – cutting off spend.

But sales data does most of that. And, because book publishers have diversified sales channels, they can get a pretty good idea of which advertising channels have higher ROIs. They get rough geography, research reports can give them broad demographics (who shops where), etc.

Could they use more data? Sure, everyone could. They could, for example, use it to identify customers who are most likely to be interested in a new book, and send them personalized emails.

Except Amazon does that already.

What’s the real benefit in doing that kind of personalized marketing work yourself, if your retail partners do it for you? You’re taking on extra cost, and unless you somehow extract more money from the chain – e.g., by charging higher prices – it’s not worthwhile.

It might make sense, of course, if your retail partners aren’t doing any of that marketing work. Thus, it makes sense to advocate for publishers to do a lot of that if you also want them to replace their retail partners:

Suw explains:

I don’t know how much more information a major publisher gets out of Amazon, though I’m guessing it isn’t half as much as they’d get if they ran their own retail operations. And ecommerce is a problem that has been solved. There are plenty of off-the-shelf solutions for inventory tracking, sales, fulfilment (both digital and physical), the whole nine yards. There’s absolutely no innovation needed for publishers to start their own retail outlets online. They could get going tomorrow if they so chose.

Apart from significantly underestimating the costs for running an ecommerce website, I’m not sure Suw understands that she’s making an argument for vertical integration (or that many publishers used to offer purchases through their own website, and some still do).

It would be a significant industry shift. A publisher makes money by (i) identifying excellent books, and then (ii) selling those books everywhere they can. They only need a fraction of the books they select to be really successful, and they can lose money on the others (no judgment is going to be perfect). They can incentivize authors to publish with them by offering marketing campaigns, editing services, etc to improve the quality of the product. They can sign long-term deals (multiple years, books) in recognition that there are non-trivial setup costs for a name-brand author that will sell via name alone.

It’s an entirely different arrangement from classic retail stores, who try to (i) identify great books after they are ready, and then (ii) make it easy to buy them.

Indeed, you could argue – as Suw does – that “Amazon now risks exactly the same disintermediation that it perpetrated a decade ago.” Amazon would be ‘disintermediated’ from the sales process.

Unfortunately, I don’t think Amazon did disintermediate anyone. Rather, Amazon disrupted the cost structure of traditional retail.

Traditional brick & mortar:

  1. Print book
  2. Store at warehouse
  3. Ship to store
  4. Customer shops at store

Amazon:

  1. Print book
  2. Store at warehouse
  3. Customer shops at website
  4. Ship to customer

There are two things to note: first, that a website has high upfront costs but low marginal costs with tiny increments. There’s a reason that Amazon was founded in 1994 and turned its first profit over 7 years later.

In contrast, brick and mortar stores face a step function for growth. To open a new market, they need to create a new store which is a non-trivial expense. Additionally, their market was bounded by the geographic area within which people are comfortable travelling. Nor are stores cheap to maintain.

You can buy from Amazon nearly anywhere. Not so with Barnes & Noble, Waterstones, etc.

Second, Amazon faces lower risk in carrying a title. Amazon ships a book only after a customer has paid for it; a brick & mortar store pays to bring the book to the store first. And since Amazon can ship from any warehouse, it needs less duplicated stock.

This means that Amazon can also offer a far larger catalog, and then limit what it presents to people based on sales. A traditional bookstore can only have a limited supply of books in stock.

The disruption was away from having a physical store – a significant capital investment, both in real estate and inventory – and towards a virtual store, which doesn’t operate with the same limitations.

Suw suggests no disruption in either cost or limitations, just a shifting of the existing cost structure – vertical integration, in other words. However, there’s no real reason to expect that a vertically integrated publisher would do better.

Of course, Suw could be right. Some retailers have publishing arms – Barnes & Noble owns Sterling Publishing Co.; Amazon has Kindle Direct Publishing, as well as a string of publishing imprints (47North, AmazonCrossing, AmazonEncore, Thomas & Mercer, Montlake Romance, and Amazon Children’s Publishing). But those don’t seem to me to be major players – despite Sterling Publishing being around since 1949. (Although Kindle Direct Publishing could be …bad… for publishers long-term. That’s a serious disruption play: publish first, filter later. Publishers operate as gatekeepers; Amazon gets rid of the gatekeeper and just hides books that don’t sell.)

Suw Charman-Anderson has not, in my mind, predicted any market disruption. Worse, I think the weaknesses Suw identified are more nitpicks than true weaknesses – and even if they were, I see no reason why Amazon could not eliminate them with relatively minor effort. There’s nothing stopping Amazon from “solving” any of the issues Suw brings up except that Amazon doesn’t want to. Is its reasoning correct? Perhaps not – but when Amazon is operating by choice, and not under an exogenous limitation, the possibility of disruption is slim indeed.

Problems with Printing

Or: Leave it to an expert, because amateurs make mistakes.

I’m formalizing a freelance business for data analysis. It’s part market research (what do companies need? what are companies currently doing?), partly networking (can I get a job?), and partly for revenue purposes (rent/student loans).

I’ve been doing some freelance web development for some time, but I don’t think I’d enjoy a career in it. I like many of the technical components – and have gotten rather good at HTML/CSS/JS + PHP – but my original interest has always been in knowledge-building (my major in epistemology, etc.).

So: I did the first thing any self-respecting freelancer with too much time on their hands does – I designed a business card.

It was meant to look like this:

[image]

Unfortunately, it seems that when I changed all of the colors from RGB mode to CMYK mode, I missed one. Which one?

Only the most important one:

[image]

Yes, the back of the card has been rendered unreadable because there is no close CMYK match to the RGB color I used. It was meant to match the website – not yet finished – but instead it’s an unreadable mess.
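
For the curious, the naive RGB-to-CMYK math looks like this. Note the trap: the formula happily converts any RGB value, and it’s only the physical gamut of ink on paper – which the formula knows nothing about – that turns a vivid screen color into mud. That’s why you proof against the printer’s ICC profile rather than trusting the arithmetic.

```python
def rgb_to_cmyk(r, g, b):
    """Textbook RGB -> CMYK conversion (no gamut awareness at all)."""
    r, g, b = r / 255, g / 255, b / 255
    k = 1 - max(r, g, b)
    if k == 1:                     # pure black
        return 0.0, 0.0, 0.0, 1.0
    c = (1 - r - k) / (1 - k)
    m = (1 - g - k) / (1 - k)
    y = (1 - b - k) / (1 - k)
    return c, m, y, k

# A vivid screen green converts "cleanly" in math - and still prints badly.
print(rgb_to_cmyk(0, 255, 100))  # (1.0, 0.0, ~0.61, 0.0)
```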

Is there a lesson to this?

Probably: experts matter, because they can catch mistakes; printing is hard; and you should test multiple times in small batches before committing a large order.

All things I thought I knew. Just not well enough, apparently.

Hiring for Talent Development

Labor is an asset, but it’s different in kind from most asset classes.

I’m currently in the job market. I’m not (yet) looking very hard – I make some money doing freelance web development, and am involved with a couple of companies part-time doing data analysis, which means I make enough money to get by. The downside, of course, is that it makes me lazy.

However, it’s caused me to think about “clearing the labor market” in a somewhat different way.

Typically, I tend to fall back on an economics-centric way of looking at things. Companies have a set of things that need to be done; some known to the company, some unknown. Hiring talent (labor) to perform those duties is a matter of evaluating whether the candidate can perform the known tasks well, and ideally identify unknown tasks that add further value to the company.

Each candidate is willing to accept a salary range, where the range maps to a fixed quality of life score for the candidate (e.g. a candidate may accept a lower salary if they prefer the work, or it has fewer hours, less travel, etc).

It’s neat, simple, and – for me – entirely misses the point.

While I’m not looking hard for a position, that mainly boils down to not actively applying to many jobs (a couple a week? Less?) as opposed to not looking at options. I receive targeted emails from a few websites (e.g. Indeed.com) as well as recruiters who randomly email me or message me on LinkedIn.

I find it quite astonishing that so few positions seem to appeal to me.

And then I realize why.

I’m not looking for a job, per se – I’m looking for the opportunity to apply, and develop, my skillset and knowledgebase. I’ve been relatively active at expanding my skillset, and it’s something I’m very interested in developing further in a few targeted ways.

Specifically, I want to use statistical programming languages (e.g. R) to analyze diverse data sets (normalize, explore, hypothesis testing, etc.) and create a narrative to explain the data and influence decision-making in the right direction (report writing, presentations), and then automating that analysis where possible (using Python, C#, etc) and/or creating web-based dashboards and tools. Ideally, I’d use a sizable fraction of my (targeted) skillset.

I’ve developed the skeleton for this skillset over time, such that a moderate investment would really allow me to flesh out those skills and become something of an expert.

Unfortunately, figuring out where I can do this is damnably difficult.

I know, I know, the answer is informational interviewing and meeting up with people in various fields. Most jobs aren’t posted online, particularly the good ones. Or so I’ve been told.

But.

The jobs that I do see online, that I read through in my email every day, are written exclusively from the standpoint of current skills and previous experience. What they don’t tell you is anything about the opportunities for development.

Sure, I get the rationale that companies are interested in what candidates can do for them. I even understand it from the perspective that, once the candidate pool has been thinned down, the company and the candidates can discuss career/personal development more directly, since that’s likely to be more specific to the candidate. And hey, I also understand that this is a bad time – candidates are aplenty, so companies have less incentive to outline what they can offer, since there are plenty of good candidates without going to the extra step of spelling out the benefits the company offers the candidate.

Except…

I hear – from people hiring – that good candidates are damnably difficult to find. Yes, it’s quite possible – probable – that in response to increased labor availability, companies increased hiring requirements and lowered compensation to thin the flood of applications.

I’m simply of the opinion that the best people to have around are those who want to push their boundaries. To learn more; to become more capable; to have a greater impact on the company.

All of these job descriptions seem, implicitly, to want someone who is currently capable of the job and no more. Who can step in, fulfill the tasks outlined by the company, and very little else. Perhaps even people who are less interested in developing themselves, and more interested in getting the job and getting out.

After all, turnover is expensive, and if you can find someone who can do their job well and stay for an extended period of time, you save money. Sort of.

The difference between labor and other asset classes is that labor is changeable, and appreciates in value over time (usually). For each worker, there is an opportunity cost for taking a job below his/her skillset, failing to develop skills, etc.

The difference for the company is that a known resource – capable of a fixed amount – will never outperform.

The most insightful epistemological statement made by Donald Rumsfeld as Secretary of Defense is that:

[T]here are known knowns; there are things we know that we know.

There are known unknowns; that is to say there are things that we now know we don’t know.

But there are also unknown unknowns – there are things we do not know we don’t know.

A company hiring a known resource will fulfill the known knowns very well. They can also hire – with much more difficulty – in an effort to deal with known unknowns (“We don’t have someone capable of fixing this old Fortran code? Then find someone who can!”).

But a company is limited to its current resources to identify, and deal with, unknown unknowns – the most dangerous, and potentially most lucrative, form.

Hiring candidates who will continue to develop over some time period (say, three years) seems, to me, much more likely to surface the difficult problems – the unknown unknowns.

I’d prefer to see job descriptions that provide a list of requirements, and then outline the areas where a candidate would be expected to improve – or, listing the areas where significant improvement in an employee would have disproportionate effects for the company, for that position.

I mean, an employee can invest in all sorts of skills that have limited returns to a company’s bottom line (office politics, anyone?); for a given company, and a given position, the number of areas of improvement with a significant potential impact is likely to be quite small.

Oh well. At bottom, this is all just wishful thinking – stuff that would make my life easier by displacing work from me to companies seeking to hire employees.

Still, it would be nice. Reading these job descriptions becomes terribly banal after a while.

Two Things

  1. It’s unbelievable: I haven’t added a blog post in nearly a year. Worse, I have no excuse – writing is an activity I both enjoy and want to improve at; additionally, I’ve had a sufficiently interesting year to provide me with decent writing material.
  2. MediaTemple Grid Hosting is slow. Really, unforgivably slow. Really: I (i) opened Word, (ii) logged into my blog account, and (iii) typed this before the Add New Post page finished loading. I realize I don’t get much traffic – 100 hits a month? – but that’s unforgivable. Frankly, I should move off them entirely for simple hosting like this, but I might shift to a (ve) server, since migrating would be a helluva lot easier.

SEO and Talk: The Non-Story

In the past few weeks, there have been a number of stories presaging a shift in how Google and other search engines rank content.

The contention is simple enough: current search engine technology is limited because people game the system. Lately, people have been gaming the system a bit too successfully; the unreasonable success of both Demand Media (recent IPO) and The Huffington Post (just sold to AOL) has been well documented.

“Gaming the system” is, in the industry, called “search engine optimization.” And – due to the recent scrutiny – a number of people are taking the opportunity to forecast the death of search engine optimization.

Farhad Manjoo makes the argument in Slate that content farms are doomed to an ignoble death. According to him, the problem is that “Google’s weaknesses aren’t permanent.”

There are a few problems with this forecast. To illustrate them, let me back up a bit and discuss search engines and content farms.

Originally, a search engine existed to answer the question “Which pages on the internet have to do with my query?” But as their power and success have grown (and by “their” I mean “Google’s”), so has the scope of their ambition.

Now, a search engine exists to provide a (satisfactory) answer to a searcher.

The history of search engines means that this has taken the form of links (10 to a page!) and, occasionally, in-line answers (Bing’s “Instant Answers” and the like). The current form of search engine results as a set of blue links casts the search engine strictly as an intermediary between the searcher and the answer.

The history of search engines also led to the rise of content farms.

From an objective perspective, content farms fulfill the same goals as search engines. That is, a content farm aims to have a page ready for every question an individual may have. Whether it’s How to Train My Hamster, How to Kill Flies in Your House with Mushrooms, or How to Set Up a Super Bowl Party, eHow has a page ready.

How do they know what pages people want? Well, it’s pretty simple: they mine search engine data. (Well, plus some guesswork and interpolation).

It’s a pretty ingenious idea: people are looking for things on Google, so why not create the very things they’re looking for? You already know what they’re looking for, since they’re already searching.

Really, it’s a wonder search engines didn’t think of the idea themselves (Oh, wait, they did).

Now, due to revenue concerns, most of the content on content farms is crap. You pay writers very little, and then rake in the advertising revenue. As advertising revenue scales with traffic, and traffic comes almost exclusively from search engines, content farms sink a huge amount of effort into finding (i) what search engines consider important, and (ii) doing more of it.

This is problematic for a few reasons, all of which can be summed up by: “Google isn’t perfect.”

More specifically, search engines like Google measure proxies of quality and use them as indicators of quality. This is expressed most clearly in Google’s big breakthrough – PageRank. PageRank assumes that (i) humans link to web pages, and (ii) on average, a link is a vote: someone considers the page valuable, or no one would be linking to it. Certainly, Google uses other variables (Bing uses over a thousand, so we can infer that Google uses at least that many. Sort of.). But (almost) all of them are proxies for value – proxies for the reality of the situation.
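
For the unfamiliar, here’s the idea of PageRank in miniature – a simplified power-iteration sketch of the published algorithm, nothing like Google’s production system:

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank: each page's link is a (damped) vote.

    links: dict mapping page -> list of pages it links to.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if not outgoing:  # dangling page: spread its vote evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
            else:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
        rank = new_rank
    return rank

web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
print(pagerank(web))  # "c" collects the most votes and ranks highest
```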

When content farms engage in search engine optimization, they are “tricking” Google into thinking that their content is higher quality than it actually is.

To an extent, this is purely a trick. Google must respond – and has in the past – by devaluing signals that people begin to manipulate consciously.

But in another sense, this is good: some of the things Google tracks actually do have to do with quality. The announcement last year that Google would begin using page load speed as a ranking factor is an example – sites that load faster are more pleasant to use, so sites that work for a higher Google ranking will also benefit their users.

In that sense, search engine optimization is similar to grammar. Yes, a writer with poor grammar can make himself understood – but it’s considerably easier to read someone with a grasp of good grammar.

This explanation of search engines and content farms suggests a couple of things.

First, that search engine optimization isn’t going anywhere (nor has grammar, to the dismay of many).

Second, that as Google becomes better at providing links to high quality answers, content farms will provide higher-quality answers.

It’s a terribly symbiotic relationship. Since content farms subsist on advertising revenue, the more accurately Google (and others) can reflect user judgment, the higher the quality of the content that content farms will produce.

Unless both run out of money, which is rather unlikely – after all, it’s Google (AdSense) that’s paying the bills for the content farms, and it’s Google AdWords which is paying the bills for search.