Generic JDBC Queries on EMR Zeppelin

The EMR (Elastic MapReduce) service on Amazon has some nice packages that come pre-installed, and one of them is Apache Zeppelin, a Jupyter-like notebook interface for Spark. Zeppelin has interpreters for spark, pyspark, spark-sql and others, but if you want to run spark-sql code on a PostgreSQL database, you first need to install the JDBC interpreter and add some extra configuration to Zeppelin. The JDBC adapter supports a wide variety of database engines, and it allows you to configure multiple database connections, which makes data exploration much easier....

April 3, 2019 · 5 min · Thiago Araujo

Fast ElasticSearch Indexing with Apache Spark on EMR (overview)

I’ve been building the data infrastructure for a project, and I needed to efficiently query, merge, process and clean terabytes of structured data and then index hundreds of millions of documents on Elasticsearch. The problem is that querying and joining data on an RDBMS like Postgres is very painful when you have more than a few terabytes of data. You’re going to spend a huge amount of time tuning your database, reading query plans, adding indexes, sharding, and slowly moving data around until you have something decent that takes hours, days, maybe weeks to run....

September 6, 2018 · 4 min · Thiago Araujo

Machinations

An interesting quote about the unexpected effects of actions and the limitations of rationality and the planning fallacy: A net set up to catch fish may snare a duck; a mantis hunting an insect may itself be set upon by a sparrow. Machinations are hidden within machinations; changes arise beyond changes. So how can wit and cleverness be relied upon? – Back to Beginnings, Reflections on the Tao by Huanchu Daoren, translated by Thomas Cleary....

September 6, 2018 · 1 min · Thiago Araujo
xkcd: The Strong Collatz Conjecture states that this holds for any set of obsessively-hand-applied rules

Collatz Conjecture

The Collatz Conjecture is a simple mathematical problem that still has no formal proof, so it’s an open problem. This is how it works: Choose any positive integer x. While x > 1, do: if x is even, divide it by 2 (x = x/2); if x is odd, multiply it by 3 and add 1 (x = 3x + 1). The Collatz Conjecture states that no matter what value of x you start with, the sequence will always reach x = 1....
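
As a minimal Python sketch of the rules in this excerpt: the function name `collatz_sequence` is made up for illustration, and the loop is assumed to stop once x reaches 1, since from 1 the rules would otherwise cycle through 4, 2, 1 forever.

```python
def collatz_sequence(x: int) -> list[int]:
    """Return the Collatz sequence starting at x and ending at 1.

    Illustrative helper, not taken from the original post.
    """
    if x < 1:
        raise ValueError("x must be a positive integer")

    sequence = [x]
    # Assumption: stop at 1, because 1 -> 4 -> 2 -> 1 repeats indefinitely.
    while x != 1:
        if x % 2 == 0:
            x = x // 2      # even: divide by 2
        else:
            x = 3 * x + 1   # odd: multiply by 3 and add 1
        sequence.append(x)
    return sequence


print(collatz_sequence(6))  # [6, 3, 10, 5, 16, 8, 4, 2, 1]
```

Whether this loop terminates for every positive integer is exactly what the conjecture claims, so the function is only known to return for starting values that have been checked.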

October 15, 2015 · 2 min · Thiago Araujo

This is a little story about my first Super Nintendo console, and how I became a smuggler (of video games) just to get what I wanted. I wanted to share this fun and short story to let members of the group know a bit more about myself, my family, and my love for computers and video games. When I was 7 years old in 1992, I had a simple dream: I wanted a Super Nintendo console for my birthday....

4 min · Thiago Araujo