Soliciting Advice: highly concurrent, available, non-blocking server

I’m seeking feedback on a language or platform for a highly reliable, low-latency web service or application.

Assumption

Bottlenecks in a web service are usually related to data retrieval and storage, and eventually bandwidth and latency. 
Highly concurrent, lightweight threads provide options for reliability, load distribution, and perceived performance that would otherwise not be available. 
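
To make the assumption a little more concrete, here is a minimal sketch using Python's asyncio, chosen only for brevity and not as a recommendation among the candidates. The function names, source labels and delays are all hypothetical, and asyncio.sleep merely stands in for a non-blocking database, cache or queue round-trip.

```python
import asyncio
import time


async def fetch(source, delay):
    # asyncio.sleep stands in for a non-blocking I/O call (hypothetical backend).
    await asyncio.sleep(delay)
    return f"{source}: done"


async def handle_request():
    # The three lookups wait concurrently, so the request takes roughly as
    # long as the slowest lookup rather than the sum of all three.
    return await asyncio.gather(
        fetch("mysql", 0.05),
        fetch("cache", 0.01),
        fetch("queue", 0.03),
    )


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(handle_request())
    elapsed = time.perf_counter() - start
    print(results, f"{elapsed:.2f}s")
```

asyncio.gather returns results in the order the calls were passed in, regardless of which finishes first. If the bottleneck really is data retrieval, this is where lightweight concurrency pays off: one slow backend does not hold up work that does not depend on it.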

General Requirements 

  • Easy to use (build, deploy, monitor)
  • Plentiful external, stable, pre-integrated libraries
  • Use case: distributed, non-blocking web services
  • Quite a bit of message and job queueing
  • Multiple databases, caching

Top candidates

A very incomplete list of pros and cons, with some of my thoughts highlighted.

  • Scala
    • pros
    • cons
      • new language syntax, paradigm learning curve
      • doubts about JVM memory efficiency and stability as resources are constrained
  • Server-side JavaScript – via node.js
    • pros
      • redis integration for caching
      • fast, lightweight, easy language. 
      • some custom js would be portable to browsers (coolness)
    • cons
      • very new
      • performance
      • not as many external libs?
  • Tornado
    • pros
      • Python
        • as many libs as Scala
    • cons
      • narrower use cases
      • performance

 Would love comments, but a more complete list is presented in a survey:

Field Collapsing in SOLR

Field Collapsing
Field collapsing allows something akin to a “group by” in SOLR, so that the number of results returned reflects a logical grouping rather than the total number of matching documents.
Faceting can be used in conjunction. Facet counts reflect subsets within results, whereas collapse counts are group-by counts.
This means that Field Collapsing could be used for certain analytics, as well as the common use-case of nesting and grouping results. To use it effectively, I found it helpful to “pre-collapse” certain fields, so that a new, unique string was created that could be used to easily group, since I believe you can only field collapse on a single field. (If I’m wrong, please let me know!)
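
As a rough illustration of the “pre-collapse” idea, the sketch below builds one composite string from several fields before a document is indexed. It is not the code used at BetterLesson: the function name, the sample documents and the field name subject_grade_key are all hypothetical, and the new field would still need to be declared in the index schema.

```python
def collapse_key(doc, fields, sep="|"):
    """Join several field values into one string that can be collapsed on."""
    parts = []
    for name in fields:
        # Normalise case and whitespace so equivalent values group together.
        value = str(doc.get(name, "")).strip().lower()
        # Drop the separator from values so the key stays unambiguous.
        parts.append(value.replace(sep, " "))
    return sep.join(parts)


# Hypothetical documents; in practice these would come from the data import.
docs = [
    {"id": 1, "subject": "Math", "grade": "5"},
    {"id": 2, "subject": "math ", "grade": 5},
    {"id": 3, "subject": "Science", "grade": "5"},
]

for doc in docs:
    # "subject_grade_key" is an assumed field name, not part of any real schema.
    doc["subject_grade_key"] = collapse_key(doc, ["subject", "grade"])
```

Documents 1 and 2 end up with the same key, so collapsing on that single field groups them as if the collapse had been done on subject and grade together.
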
Special Setup
This assumes you are or will be running a development version of SOLR (trunk via SVN).
Field collapsing is not yet available in SOLR trunk, so you must apply a patch file to SOLR and build again.
If pulling in from trunk, download the patch found at https://issues.apache.org/jira/browse/SOLR-236 into your SOLR source code directory.

wget https://issues.apache.org/jira/secure/attachment/12440108/SOLR-236-trunk.patch
patch -p 1 -i SOLR-236-trunk.patch

Then rebuild using the supplied Apache Ant scripts.
Read more on these sites or in the comprehensive “Solr 1.4, Enterprise Search Server” page 191.
Why SOLR
SOLR has been a great tool for BetterLesson.org. Because our primary database is MySQL, we looked at around 8 full-text indexers, but the two finalists were Sphinx [1] and SOLR. Sphinx had very tight integration with MySQL, so the learning curve seemed gentler. SOLR required a JVM, an app server, and quite a lot of configuration.
An excellent SOLR book came out just when we were choosing. Further, the SOLR IRC channel and mailing list are friendly and quite active. We even had the option of commercial support through Massachusetts’ own Lucid Imagination. So I dove in. While the configuration is non-trivial, the configuration parameters have proven very powerful.
More background:
I had written a half-dozen or so custom faceted search interfaces, almost entirely using MySQL, and one even used Sesame (an RDF store, which eventually worked pretty well). Skipping the stories of pain, confusion and suffering on the road to enlightenment: SOLR has been great. It is used extensively at Netflix.com, Zappos.com, CitySearch.com, Reddit.com, Wego.com, Whitehouse.gov, Drupalgardens.com and others [2]. Supported by Apache and based on Lucene, SOLR provides scalable, distributed search and has good data import from MySQL, including delta queries.
[1] – Sphinx is used by Craigslist and others; see http://www.sphinxsearch.com/powered.html

Why I’m starting a personal tech blog

Note – I’ve merged a couple of blogs and this post was from another site.
Welcome.

 

Supercalafragilisticexpialadocio.us is for technologists, like myself, who enjoy the challenges that come with taking interesting ideas and bringing them to a wide audience.
Currently I’m the CTO of a “lean” startup – http://betterlesson.org.

 

The work I’ll be posting is done in the context of startups. Startups are great. The definition I’ll use of “startup” is informal. By my definition, a startup is the adventure of working to make a sufficiently new idea self-sustaining.

 

Building a laboratory and getting a fair shot at a large audience are no longer barriers to entry.** Often, a technologist’s education comes almost entirely from peers on the Internet, piecing together technology that yesterday may have been out of reach. With low barriers to entry comes a tremendous amount of competition. What differentiates is not only technical skill, but the ability to successfully navigate the technology world. Propaganda, pain, tunnel vision, and joy dot the fast-changing technology blogosphere. Execution is tough when the more ambitious ideas you have would be obsolete by the time you finish them. When do you keep exploring? When do you settle on a technology and dive in?

 

I plan to dive into technology specifics, so this post is to set the tone. If I stray from my own ideas, tell me. I need your help.

 

  • Technology alone is not enough:
    • Make sure you love something you consider to be not technology. Loving code and assembling lego blocks works for some of the best, but for mortals, having other interests can serve as a compass to guide the “why” behind what you do.
    • Courage is not only in tackling cutting-edge technologies, but also in shepherding the transition to new technology for the larger world we serve. Yes, I believe that world domination is a goal of every startup. But domination means we want to serve as many people as possible, and do it well. “Don’t be evil” is a mantra I reflect on in this context.
  • Be open:
    • Isolation is dangerous. Open source, open communities, and an open style of development keep you healthy. Some “enemy” may occasionally out-calculate you because you showed your hand. But you’ll lose a few games of poker in order to win the innovation game. Chances are that if your idea is something someone _can_ steal simply because you shared it with them, then someone else is going to independently invent it anyway. Don’t live in fear that someone might be doing something newer or better.
    • Your reputation is important, but if someone can trash your idea easily it’s either because the idea is actually bad, or they are simply wrong.
Looking forward to sharing the adventure.

 

Cheers,
Jonathan

 

** At least with a sufficiently modern computer and a high-speed internet connection.

 

In defense of the Semantic Web, Again.

With the official announcement that RDF is in Drupal core and the Semantic Web conference in DC, I wanted to take time to respond to “tales of a semantic web skeptic”. Healthy criticism, and a good read.

This piece is to defend the vision, if not the execution.

I helped get RDF into Drupal and spoke on the topic at two DrupalCons (one in Brussels and the other in Barcelona). No credit beyond that belongs to me; I’ve done no development on it since.

Arguments about the semantic web are mostly semantic. The computer science is done, and the technology is used in real-world applications in genetics, law, and the military.

What is perhaps needed is a PR shift: differentiating the upper-case and lower-case semantic web.

The semantic web:

  • a data exchange standard for graph-based metadata and logical metadata
  • a web service with a standardized API
  • a graph database, or other specialized store
  • consumers or agents

The Semantic Web (a la W3C):

  • RDF(S), RDFa, OWL(S), etc
  • REST / SPARQL
  • Sesame, Jena, YARS, Redland, etc
  • Semantic Agents

Microformats and popularizations are all good. Folksonomy instead of Taxonomy – Clay Shirky, or rather, the mob (you and I) he describes, is hard to argue with. To mash up verified, trusted content in federated queries from heterogeneous data sources is cool to me, but not everyone.

Tim Berners-Lee’s talk at TED changes the term to “Linked Data”. That makes sense. I think there’s a struggle to create a revolution and an industry again, something with as big an impact as the web. Linked data is the web Sir Web wants/wanted. But the first web didn’t happen because a few folks wanted it. We needed it. As the YCombinator mantra goes, “make something people want”. Making Semantic Web software has, in the past, made Semantic Web people happy… but not too many others (I have first-hand experience in this).

A final two points:

Maybe it’s fair to say the community may be too top-down. Luckily, freedom of speech extends to computer code.

Not everyone is going to be inspired and “believe” in grand visions. Artificial intelligence is a perfect analogy. Our culture has adopted the term, for better or worse, to mean lots of things.

Norman Borlaug – Nobel Peace Prize “Green Revolution”

Norman Borlaug recently passed away. He has perhaps saved millions of lives by improving food crop productivity. Fifty years on, a tremendous amount of science has been done. The Green in Green Revolution might not be appropriate any more.

Contributions to science can be viewed as non-political and amoral. But there is a polarization and politicization of organic and “industrial” agriculture. My opinion is that the fundamental issue is sustainability. I think the course correction for industry still needs to be sustainability (economic, environmental and social).

If you are into agriculture – (food sources in general) – the video below is an interesting interview with Norman’s neighbors on the impact of chemical farming on soil and sustainability.

As a technologist from rural Maine, I like to keep in touch with ecology/agriculture topics. (My post at Hacker News.)

Considering voting – technology and new paths towards democracy

I voted today, just an hour ago. The outcome of the election is unknown. There is no reason for me to be cynical at the moment. In this moment, I’m happy.

Voting is one part of a participatory democracy: a clear path to being involved, to belonging to something larger. However, Democracy, for me, is to be independent, educated, and creative, and to unify around specific causes. These causes change depending on the world around us. This perspective on the purpose and meaning of democracy leads me to question voting as the best answer we have to creating democracy. What follows is a brief outline of how the internet and related technology offer new options for what government is, and does.

  1. Open Source voting machines:

    Paying private companies to write bad software on unsecured hardware is obviously crazy when we are talking about one of the most basic components of the infrastructure of democracy.

  2. Semantic Web and Open Government:

    If I am honest with myself, I have to admit that no matter how much research I do, I doubt I really know what is going on in our government. I’m pretty sure there is waste, but the waste is likely systemic as much as it is caused by corruption. There are complex organizations spending most of their energy just being complex. Institutional complexity can be reduced when there is insight into what that organization is even up to, i.e. “transparent government”.
    Putting all government documents online isn’t enough to make Government transparent. When single laws are 800 pages, being able to search through the mountain of data is critical. But even more difficult than finding what you want is summarizing and correlating data. I won’t elaborate on specifics here, but if the whole of our public government is open, searchable, and easy to collect and reorganize into digestible pieces of knowledge, then we’re all better off.

  3. Game Theory:

    Voting is not simple. Votes aren’t just counted, they are grouped, reassigned, allocated, recounted, averaged, rounded, molded. The game theory behind different voting mechanisms is a very well researched field. William Poundstone’s book gives a great intro into other voting systems. Hopefully on this day, November 4th, 2008, we overcome our flaws.
    Check out Ubuntu/OLPC contributor Benjamin Mako Hill’s online voting site Selectricity to try out other voting systems, and the Ruby Vote voting software.

  4. Taxes:

    The internet can make where money flows more dynamic and accountable.
    Paying into the system is not, currently, the same thing as buying into it. My libertarian friends feel the pain most acutely. With 1 billion dollars spent on the election, each vote cost roughly $8. A billion dollars seems like a lot, but when all those voters pay $10,000 or more in taxes a year, that’s a pretty good profit margin.
    So why can’t we vote on every little thing our money goes towards? Perhaps because the world is too complex, or perhaps because taxes are as sure as death. Death usually isn’t questioned. Well, with progress in genetics, there are folks questioning death too. I don’t have an answer, but I do think it’s fair to question our tax system and how the money gets spent when technology could make the flow of money more impactful, and provably so.
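
To illustrate the point in item 3 that votes aren’t just counted, here is a small sketch comparing two well-known counting rules on the same ballots: plurality (first choices only) and instant-runoff. The ballots are made up, the alphabetical tie-break for elimination is a simplifying assumption, and this is not how Selectricity or Ruby Vote implement anything.

```python
from collections import Counter


def plurality(ballots):
    """Winner by first choices only."""
    return Counter(b[0] for b in ballots).most_common(1)[0][0]


def instant_runoff(ballots):
    """Repeatedly eliminate the candidate with the fewest top-ranked votes."""
    remaining = {c for b in ballots for c in b}
    while True:
        counts = Counter({c: 0 for c in remaining})
        for b in ballots:
            # Each ballot counts for its highest-ranked remaining candidate.
            top = next((c for c in b if c in remaining), None)
            if top is not None:
                counts[top] += 1
        total = sum(counts.values())
        leader, votes = counts.most_common(1)[0]
        if votes * 2 > total or len(remaining) == 1:
            return leader
        # Ties for last place are broken alphabetically (a simplification).
        loser = min(remaining, key=lambda c: (counts[c], c))
        remaining.remove(loser)


# Nine hypothetical ranked ballots, most-preferred candidate first.
ballots = (
    [["A", "B", "C"]] * 4
    + [["B", "C", "A"]] * 3
    + [["C", "B", "A"]] * 2
)

print(plurality(ballots))       # A
print(instant_runoff(ballots))  # B
```

The same nine ballots elect A under plurality and B under instant-runoff, because the two C voters prefer B to A once C is eliminated. The choice of mechanism is itself part of the outcome.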

If you are cynical, I get it. However, while technology is nothing without people changing behavior, having safe and trusted options can open the door.

Amazon: from cloud computing to cloud forest

Cloud computing. A great idea, unlocking new markets, new opportunities for internet startups to have access to computing scale and power. Head in the clouds? Like clean electric cars, the electricity still comes from somewhere. Maybe the cloud isn’t puffy and white – it might just be black.

At a recent O’Reilly Ignite Boston, Tim O’Reilly gave the company spiel, mixed with a little extra enthusiasm and praise for technologists, a population thought of as family at O’Reilly, if not flock. The latter half of the talk uncovered the motivation for the emotion. Reminding me of Dennis Hopper’s Californian dramatics, he pleaded with us to do something that mattered. What mattered? The environment and education. Work on that, do something that matters.

Mr Big O. recounted (or perhaps therapeutically re-lived, in what could be interpreted as post-traumatic stress disorder) a meeting with the chief researcher for the still partly secret International Report on Climate Change of a UN agency. Tim’s question was on humanity’s chances of surviving. The answer given: “we’re fucked”. Don’t trust the UN? The Pentagon thinks so too.

Even if climate change isn’t “real”, the game still has to be played out because just maybe we are heading to the land of FAIL. Yes, Fuckdom. Not fuckdom like, “hey, I like to scare people”, but fuckdom like inheriting the worst code you’ve ever seen which depends on closed source. There’s a better chance of climate change being a big problem than of you ever succeeding significantly in a startup. Personally, I want to maintain the legacy app called Earth….

Well, so speaking of startups, try out CO2 Stats. Simply place a widget, like Google Analytics, on your site. It measures CO2 emissions based on available data about the servers and the clients (that’s you). Optionally you can have them automatically purchase carbon offsets for you, or, as in the case of this site, advertisers pay for your … gasses.

And if you don’t like CO2 Stats, do you have a better idea?

Review of Clayton Christensen’s book Disrupting Class

Disrupting Class was an excellent resource in providing a technology and business vocabulary which is applicable to the deep challenges facing public education today.

To be honest, my view of education is emotive, as it represents so many formative years of my life. I cannot claim to bring objectivity to the dialogue. With a relative having written “Freedom to Learn” and my mother being a retired special education teacher … I’m biased.

But it is as a technologist, sitting on the cusp of big change, that I can read with a sense of calling – knowing that as a CTO of Better Lesson I’m privileged to be in the kind of position coveted by catalysts of renaissance, and admirers of diversity – where once there was only one word – philosophy. Besides the need to avenge my childhood (where I was to have skipped two grades, but also had the diplomacy of a Tasmanian Devil combined with winning the award for “Teachers Pest”, an award created just for me if I remember), maybe now I can make a lot of folks happy.

This is the kind of book that makes me happy. Happy to me is a holistic thing. I want to make a lot of money without others suffering. I want to pursue knowledge that sustainably changes the world.

Introspection on setting and striving for goals.

When you work on something really hard for a really long time, you learn a lot. But you might not be achieving the goals you set out to achieve. Something near the end is holding you back from success; invisible and powerful.

Over time, it’s easy to lose track of your goals. To more accurately describe that process for me, I pursue my goals stubbornly, with each subgoal towards achieving a larger goal becoming its own journey. I enjoy the journey, and my ethics, my principles and imagination keep me oriented in that process.

Gandhi said “The means is indivisible from the ends”. This is one of my favorite sayings. I do not believe the ends justify the means.

Sometimes I’ll make my goals too high intentionally, so I can learn more, push harder. But pursuing success does eventually create a point of intersection between your knowledge and your abilities. That maturation point can happen at a lot of different stages.

Sticking with your goal may well lead you to the familiar difficult point at the end, when everything gets really difficult. Then, all of a sudden, you realize it’s only one thing holding you back, and it only requires you to admit you were wrong about something.

When building a chain of interdependent tasks, there will inevitably be a weak or unfinished link. A chain with 10,000 links breaks with one weak link, and it seems so hard to fix because it is hard to find. That’s why knowing and admitting when you are wrong is so valuable, as is getting rid of what you don’t really need in life. After this general milestone is passed, things get easier….