Video Transcript


Dataclips Power Insights at Heroku

Every organization needs to be data-driven in order to be successful. Whether you're tracking an application's performance, incoming support tickets, or revenue rates, different components of any company depend on metrics that inform the health of the business.

At Heroku, we're hackers to the core, but that doesn't mean we're all programmers. We build on top of our own platform for everything we do, and one of the products we often use is Heroku Dataclips. If you haven't heard of them before, Heroku Dataclips allow you to create SQL queries in a web GUI that run on your Heroku Postgres database. The unique dataclip URL can then be shared internally or externally, and the results are even accessible through an HTTP endpoint that returns data in either CSV or JSON.

An image showing different views of the Heroku Dataclips GUI
Web GUI for Heroku Dataclips

We gathered a few stories from Heroku employees across various departments that show just how powerful these queries can be.

The easiest way to see what matters

With all of your data already stored in a production database, you have everything you need to start extracting useful insights into how your users are interacting with your application. Traditionally, pulling this information out has been restricted to individuals with access to these systems—namely, engineers or DBAs. If you need to find out how many new users created accounts, grouped by the month, and you're part of a department without production access, you're stuck asking someone else to gather that information for you.

We've seen first hand that one of the benefits of Dataclips has been to reduce bottlenecks and democratize access to this information. If that same SELECT query is stored as a dataclip, the person asking for the information can get at fresh data any time they want. Individuals can grab the unique URL generated by a dataclip and share it through email or Slack; rather than repeatedly asking for updates, they’re empowered to get the data themselves.

Even better, every dataclip also exposes a URL that represents the resulting data set in JSON format. You can choose to create a simple, lightweight dashboard app (as some of our customers have done) that periodically pings this endpoint to draw charts that visualize the live data.

Identifying product usage

Product managers ask a common question when building a new feature: "how many people are using it?" When we launched the Heroku Button, a one-click process to deploy an application directly from an open-source project, we wanted to be able to track its usage. With each new button generated, we knew how many were being created, because new rows were generated in the database. And, since clicking the button created a new row in our deployments table, we were able to distinguish precisely which came from the CLI and which came from the button. Our product and marketing teams could then take that generic SELECT and create their own analytics. In fact, our Heroku Buttons page is powered by a dataclip; we’re able to automate this presentation by periodically fetching JSON data from a dataclip URL.

Another opportunity for Dataclips has been to track the use of deprecated features. For example, we recently began the end-of-life window for Cedar-14. Our team was easily able to query all the applications still relying on this outdated stack and generate an email campaign that notified all of our users of its imminent sunset. Doing this with a tool that didn’t have access to our production database would require emailing a list around, and would likely fall woefully out of date.

With the transparency around production data available through a dataclip, we've also been better able to investigate and triage suspicious behavior. For example, if a massive wave of signups is being generated from the same location, we have a reasonably high certainty that it's a pattern of abuse. Engineers can write queries that pinpoint the sources of these bursts of activity and share the results in a Slack room. Being able to drop a link that fosters a transparent decision-making process and makes for a quick turn around to solving a problem, as multiple people are able to assess and discuss it. It's much more nerve-wracking to hop into a production database and run a specific query to drop a banhammer without first getting several validating approvals from your colleagues.

Teaching the entire organization

Being a data-driven organization means that everyone in the organization should be able to get the information that they need. We recognize that while dataclips grant accessibility, there's still a massive barrier to using them in the form of a three letter acronym that can conjure dread in even the most senior developer: SQL.

To help individuals become self-sufficient, our Business Operations team hosts an Office Hours session designed to teach Herokai about the fundamentals of writing a query using SQL and our internal data schema. Often, someone will join a session with a single question—"How can I get a list of all the add-ons that a customer is using?"—and inevitably, this leads to another one—"Can you get me a list of all the add-ons from Customer Y, too?" Rather than deliver one-off answers, it's much better to teach people how they can craft a query on their own. This has a network effect of spreading knowledge. It also gives them a new technical skill that they can use forever. We are answering ten future generations of questions by teaching the basics of writing that first query.

How it operates

The Heroku platform that we build on is exactly the same that's available for our users. Here's a little insight into how we've configured our dataclips to support querying all of our data safely.

Queries from Dataclips can run on any Heroku Postgres database available to your organization (except Shield Tier plans). But bad SQL queries can lock up a table and potentially impact production performance. To mitigate this, we instead run all of our queries against a Postgres follower database. A follower database is a read-only replica of your main, or leader, database. Writes made to the leader database propagate to any follower databases in close to real-time. This means that any dataclip query can be safely made on a database with production data without any impact on an application's systems.

People can be given permission to read from this follower database, while the leader database is kept restricted to individuals for whom this access is necessary for their job.

Finding out more about Dataclips

We've only scratched the surface of what Dataclips can do—did you know you can embed them directly into Google Sheets, too?

To learn more about Heroku Dataclips, check out our article on Dev Center. We also recorded two episodes recently about Dataclips on Code[ish], our podcast exploring technology and the lives of modern developers. The first one is about how Kajabi uses Dataclips by integrating results into their Slack chatbot, and the second is an interview with Becky Jaimes, the product manager for Heroku Dataclips, on recent improvements and the future of Dataclips. Be sure to give those a listen!

Originally published: July 15, 2019

Browse the archives for engineering or all blogs Subscribe to the RSS feed for engineering or all blogs.