FlumeBase at OSCON Data

I’m pleased to announce that OSCON Data has accepted a talk submission about FlumeBase!

I’ll be presenting FlumeBase, its motivation, design and implementation, and some example use cases at the OSCON Data conference (collocated with the main OSCON event) on Monday, July 25th.

Session information:

  • Title: Real-time Streaming Analysis for Hadoop and Flume
  • Date/Time: 07/25/2011, 3:30pm – 4:10pm PDT
  • Room: C124

Here’s your chance to learn about FlumeBase in person, discuss use cases and needed features, and see where we sit in the broader “big data ecosystem”, which will be well represented by the many other talks at the conference.

If you’re interested, learn more and register for OSCON Data on their site at http://www.oscon.com/oscon2011/public/content/data. The conference runs July 25–27 in Portland, Oregon.

See you there!


FlumeBase 0.2.0 Released

Today I’m pleased to announce a new FlumeBase release, version 0.2.0. This version includes a substantial number of new features and bug fixes compared with the first FlumeBase release.

This article provides some details on the new features we’ve added.


Introducing FlumeBase

Welcome to FlumeBase.org!

With this post, we’re pleased to announce the first release of a new open source tool called FlumeBase. FlumeBase is a SQL-based streaming data analysis platform built on top of Flume — Cloudera’s open source data ingestion engine for Hadoop.

Why did we build this? We use a lot of Hadoop, and talk with a lot of other Hadoop users. Just getting data into Hadoop is one of the biggest challenges facing users. Flume has come a long way toward making this process straightforward. But as soon as you’ve got streams of data entering Hadoop, you’ll want to process this data even faster. MapReduce batch processing can handle a lot of analysis, but sometimes you need results in seconds–or less. To bridge the gap, we’ve seen a lot of custom solutions out there, but we think that Flume provides the foundation for a general-purpose tool for filtering, ETL, alerting, and more operations on-the-fly.

Like Flume, FlumeBase is licensed under the Apache 2.0 software license: it’s free to use and to extend. We believe this constitutes a core piece of infrastructure in the emerging big data ecosystem, and think that the community as a whole will benefit from the project — and that the project will benefit from involvement by the broader community.

The first release of FlumeBase is definitely “beta” software. We’ve been working on this for a few months and are pretty proud of the result, but there’s still a lot more work to be done. If this sounds interesting to you, hop on over to GitHub and grab a copy of the source code.

Over the coming weeks, you can expect more blog posts to follow. We’ll show you some tips and tricks, explore use cases, and demonstrate how to integrate FlumeBase into your data analysis pipeline.

Ready to get started?
