Soliciting Advice: highly concurrent, available, non-blocking server

I’m seeking feedback on a language or platform for a highly reliable, low-latency web service / application.

Assumption

Bottlenecks in a web service are usually related to data retrieval and storage, and eventually bandwidth and latency. 
Highly concurrent, lightweight threads provide options for reliability, load distribution, and perceived performance that would otherwise not be available. 
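To make the assumption concrete, here is a minimal sketch of the idea in Python, using asyncio coroutines as a stand-in for lightweight threads. The backend names (cache, primary_db, search_index) and the delays are hypothetical, and `asyncio.sleep` only simulates a non-blocking data call. The point is that the three lookups overlap, so the request waits roughly as long as the slowest one rather than the sum of all three.

```python
import asyncio


async def fetch(source, delay):
    # Stand-in for a non-blocking database, cache, or search call.
    await asyncio.sleep(delay)
    return f"{source}: ok"


async def handle_request():
    # Start all three lookups at once; gather returns results
    # in the order the calls were passed in.
    return await asyncio.gather(
        fetch("cache", 0.01),
        fetch("primary_db", 0.03),
        fetch("search_index", 0.02),
    )


results = asyncio.run(handle_request())
print(results)
```

Any of the candidates below could express the same pattern; Python is used here only because Tornado is on the list.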

General Requirements 

  • Easy to use (build, deploy, monitor)
  • Plentiful stable, pre-integrated external libraries
  • Use case: distributed, non-blocking web services
  • Quite a bit of message and job queueing
  • Multiple databases, caching

Top candidates

A very incomplete list of pros and cons, with some of my thoughts highlighted.

  • Scala
    • pros
    • cons
      • new language syntax, paradigm learning curve
      • doubts about JVM memory efficiency and stability as resources are constrained
  • Server-side JavaScript – via node.js
    • pros
      • redis integration for caching
      • fast, lightweight, easy language. 
      • some custom JS would be portable to browsers (coolness)
    • cons
      • very new
      • performance
      • not as many external libs?
  • Tornado
    • pros
      • Python
        • as many libs as Scala
    • cons
      • narrower use cases
      • performance

 Would love comments, but a more complete list is presented in a survey:

Field Collapsing in SOLR

Field Collapsing
Field collapsing allows something akin to a “group by” in SOLR, so that the number of results returned reflects a logical grouping rather than the total number of matching documents.
Faceting can be used in conjunction. Facet counts reflect subsets within results, whereas collapse counts are group-by counts.
This means that Field Collapsing could be used for certain analytics, as well as the common use-case of nesting and grouping results. To use effectively, I found it helpful to “pre-collapse” certain fields, so that a new, unique string was created that could be used to easily group, since I believe you can only field collapse on a single field. (If I’m wrong, please let me know!)
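As an illustration of the “pre-collapse” idea, here is a rough Python sketch that builds one combined string from several fields before indexing. The field names (subject, grade, author), the `subject_grade_key` field, and the `collapse_key` helper are all hypothetical, and the counting at the end only mimics in plain Python what collapse counts and facet counts would report; it does not call SOLR.

```python
from collections import Counter


def collapse_key(doc, fields, sep="|"):
    """Join several field values into one normalized string for a single collapse field."""
    return sep.join(str(doc.get(f, "")).strip().lower() for f in fields)


# Hypothetical documents, as they might look just before indexing.
docs = [
    {"id": 1, "subject": "Math", "grade": "5", "author": "alice"},
    {"id": 2, "subject": "math", "grade": "5", "author": "bob"},
    {"id": 3, "subject": "Math", "grade": "6", "author": "alice"},
]

# Add the pre-collapsed field to every document.
for doc in docs:
    doc["subject_grade_key"] = collapse_key(doc, ["subject", "grade"])

# Group-by style counts on the combined key (what collapsing would report).
group_counts = Counter(doc["subject_grade_key"] for doc in docs)

# Facet style counts on a separate field, for comparison.
facet_counts = Counter(doc["author"] for doc in docs)
```

Normalizing case and whitespace when building the key keeps “Math” and “math” in the same group, which is usually what you want from a grouping field.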
Special Setup
This assumes you are or will be running a development version of SOLR (trunk via SVN).
Field collapsing is not yet available in SOLR trunk, so you must apply a patch file to SOLR and build again.
If pulling in from trunk, download the patch found at https://issues.apache.org/jira/browse/SOLR-236 into your solr source code directory.

wget https://issues.apache.org/jira/secure/attachment/12440108/SOLR-236-trunk.patch
patch -p 1 -i SOLR-236-trunk.patch

Then rebuild using the supplied Apache Ant scripts.
Read more on these sites or in the comprehensive “Solr 1.4 Enterprise Search Server”, page 191.
Why SOLR
SOLR has been a great tool for BetterLesson.org. Because our primary database is MySQL, we looked at around 8 full-text indexers, but the two finalists were Sphinx [1] and SOLR. Sphinx had very tight integration with MySQL, so the learning curve seemed gentler. SOLR required a JVM, an app server, and quite a lot of configuration.
An excellent SOLR book came out just as we were choosing. Further, the SOLR IRC channel and mailing list are friendly and quite active. We even had the option of commercial support through Massachusetts’ own Lucid Imagination. So I dove in. While the configuration is non-trivial, the configuration parameters have proven very powerful.
More background:
I had written a half-dozen or so custom faceted search interfaces, almost entirely using MySQL, and one even used Sesame (an RDF store, and it eventually worked pretty well). Skipping the stories of pain, confusion and suffering on the road to enlightenment: SOLR has been great. It is used extensively at Netflix.com, Zappos.com, CitySearch.com, Reddit.com, Wego.com, Whitehouse.gov, Drupalgardens.com and others [2]. Supported by Apache and based on Lucene, SOLR provides scalable, distributed search and has good data import from MySQL, including delta queries.
[1] – Sphinx is used by Craigslist and others; see http://www.sphinxsearch.com/powered.html

Why I’m starting a personal tech blog

Note – I’ve merged a couple of blogs and this post was from another site.
Welcome.

 

Supercalafragilisticexpialadocio.us is for technologists, like myself, who enjoy the challenges that come with taking interesting ideas and bringing them to a wide audience.
Currently I’m the CTO of a “lean” startup – http://betterlesson.org.

 

The work I’ll be posting is done in the context of startups. Startups are great. The definition I’ll use of “startup” is informal. By my definition, a startup is the adventure of working to make a sufficiently new idea self-sustaining.

 

Building a laboratory and having a fair shot at a large audience is no longer a barrier to entry.** Often, a technologist’s education comes almost entirely from peers on the Internet, piecing together technology that yesterday may have been out of reach. With low barriers to entry comes a tremendous amount of competition. What differentiates is not only technical skill, but the ability to successfully navigate the technology world. Propaganda, pain, tunnel vision, and joy dot the fast-changing technology blogosphere. Execution is tough when your more ambitious ideas would be obsolete by the time you finish them. When do you keep exploring? When do you settle on a technology and dive in?

 

I plan to dive into technology specifics, so this post is to set the tone. If I stray from my own ideas, tell me. I need your help.

 

  • Technology alone is not enough:
    • Make sure you love something you consider to be not technology. Loving code and assembling lego blocks works for some of the best, but for mortals, having other interests can serve as a compass to guide why you do what you do.
    • Courage is not only in tackling cutting-edge technologies, but also in shepherding the transition to new technology for the larger world we serve. Yes, I believe that world domination is a goal of every startup. But domination means we want to serve as many people as possible, and do it well. “Don’t be evil” is a mantra I reflect on in this context.
  • Be open:
    • Isolation is dangerous. Open source, open communities, and an open style of development keep you healthy. Some “enemy” may occasionally out-calculate you because you showed your hand. But you’ll lose a few games of poker in order to win the innovation game. Chances are that if your idea is something someone _can_ steal simply because you shared it with them, then someone else is going to independently invent it anyway. Don’t live in fear that someone might be doing something newer or better.
    • Your reputation is important, but if someone can trash your idea easily, it’s either because the idea is actually bad, or they are simply wrong.
Looking forward to sharing the adventure.

 

Cheers,
Jonathan

 

** at least with a sufficiently modern computer and a high-speed internet connection.