{"id":9683,"date":"2012-09-19T08:00:31","date_gmt":"2012-09-19T12:00:31","guid":{"rendered":"http:\/\/blog.shareaholic.com\/?p=9683"},"modified":"2014-10-06T11:40:15","modified_gmt":"2014-10-06T15:40:15","slug":"shareaholic-big-data-visualization","status":"publish","type":"post","link":"https:\/\/www.shareaholic.com\/blog\/shareaholic-big-data-visualization\/","title":{"rendered":"[Tech] How to Visualize Big Data"},"content":{"rendered":"<p><img decoding=\"async\" loading=\"lazy\" class=\"wp-image-9721 alignright\" title=\"Shareaholic Data Visualization\" src=\"https:\/\/www.shareaholic.com\/blog\/wp-content\/uploads\/2012\/09\/IMAG0213-1024x612.jpg\" alt=\"Shareaholic Data Visualization\" width=\"311\" height=\"186\" srcset=\"https:\/\/www.shareaholic.com\/blog\/wp-content\/uploads\/2012\/09\/IMAG0213-1024x612.jpg 1024w, https:\/\/www.shareaholic.com\/blog\/wp-content\/uploads\/2012\/09\/IMAG0213-669x400.jpg 669w, https:\/\/www.shareaholic.com\/blog\/wp-content\/uploads\/2012\/09\/IMAG0213-300x179.jpg 300w, https:\/\/www.shareaholic.com\/blog\/wp-content\/uploads\/2012\/09\/IMAG0213-768x459.jpg 768w, https:\/\/www.shareaholic.com\/blog\/wp-content\/uploads\/2012\/09\/IMAG0213-400x239.jpg 400w\" sizes=\"(max-width: 311px) 100vw, 311px\" \/><\/p>\n<p style=\"text-align: left;\">With a publisher network of 200,000 websites that reach 300 million people every month, Shareaholic\u00a0has access to\u00a0<em>quite<\/em> a bit of data. 
You&#8217;ve read about it before in our <a href=\"https:\/\/www.shareaholic.com\/blog\/category\/data-reports\">previous data reports<\/a>, but it&#8217;s sometimes hard to grasp the full magnitude of our data set without seeing it in action.<\/p>\n<p>As we are planning to go to the <a href=\"http:\/\/career-fair.mit.edu\/index.php\">MIT Career Fair<\/a> later this week (psst&#8230;<a href=\"https:\/\/www.shareaholic.com\/careers\">we&#8217;re hiring<\/a>), we thought it would be a great chance to give you all a glimpse into the world of big data by creating a visualization of our own. Check it out below along with a behind-the-scenes\u00a0look at the data visualization process from our\u00a0Data Scientist,\u00a0<a href=\"https:\/\/www.shareaholic.com\/blog\/author\/jkibe\/\">Joseph Kibe<\/a>:<\/p>\n<h3>What does the data visualization show?<\/h3>\n<p style=\"text-align: center;\"><img decoding=\"async\" loading=\"lazy\" class=\" wp-image-9731 aligncenter\" title=\"Data visualization\" src=\"https:\/\/www.shareaholic.com\/blog\/wp-content\/uploads\/2012\/09\/Data-visualization1-1024x574.png\" alt=\"\" width=\"535\" height=\"300\" srcset=\"https:\/\/www.shareaholic.com\/blog\/wp-content\/uploads\/2012\/09\/Data-visualization1-1024x574.png 1024w, https:\/\/www.shareaholic.com\/blog\/wp-content\/uploads\/2012\/09\/Data-visualization1-700x392.png 700w, https:\/\/www.shareaholic.com\/blog\/wp-content\/uploads\/2012\/09\/Data-visualization1-300x168.png 300w, https:\/\/www.shareaholic.com\/blog\/wp-content\/uploads\/2012\/09\/Data-visualization1-768x431.png 768w, https:\/\/www.shareaholic.com\/blog\/wp-content\/uploads\/2012\/09\/Data-visualization1-400x224.png 400w, https:\/\/www.shareaholic.com\/blog\/wp-content\/uploads\/2012\/09\/Data-visualization1.png 1871w\" sizes=\"(max-width: 535px) 100vw, 535px\" \/><\/p>\n<blockquote><p>A few weeks ago, our team tasked me with a project: whip up something interesting to show the fresh-faced folks visiting our booth at the <a 
href=\"http:\/\/career-fair.mit.edu\/index.php\">MIT Career Fair<\/a> this week. We threw ideas back and forth until settling on something fairly simple: a data visualization summarizing the pageview data we receive.<\/p>\n<p>The result lets us observe part of the state of our pageview data stream. We see, for a given window, the total pageviews processed; the velocity of pageviews over time; the density of pageviews by country; and a few dozen of the top-viewed pages across our network.<\/p><\/blockquote>\n<h3>How does the data visualization work?<\/h3>\n<blockquote><p>Unfortunately, it wasn&#8217;t feasible to hook my data visualization into our raw data stream to do a truly real-time visualization \u2014\u00a0that&#8217;s a topic for another post \u2014\u00a0but I was able to do the next best thing: stream in the raw data logs we store in S3, which weigh in at roughly a gigabyte per hour, uncompressed.<\/p>\n<p>Both the streamer \u2014 the component that reads the log files \u2014\u00a0and the visualizer, which accepts the data and presents it, are written in Python. The former component is really quite simple: a script that opens, reads and parses the log file data, and then pushes it to the visualizer.<\/p>\n<p>The visualizer is a bit more complicated. It uses the Tornado web framework, which provides built-in support for WebSockets to &#8220;push&#8221; data from the visualizer server to any client browsers rendering the visualization. On the input side, it accepts the pageview data as it is streamed in. The summary data is simply stored in memory, though I use count-min sketching to track the top-viewed pages. The visualization layer is all done via HTML, JavaScript and CSS with a lot of help from the excellent d3.js library for working with data-driven documents.<\/p>\n<p>All of this went quite smoothly until I decided to make the new data load with some degree of elegance in the user-facing HTML, CSS, JavaScript amalgam. 
Doing a bit of design to present the data was not especially problematic. My many hours spent perusing art museums have done me some good. Making new data enter the page properly, by contrast, proved a relative challenge. Perhaps because it was so much more unpleasant, whether due to inexperience or by virtue of some intrinsic property of doing animations, the animation bit felt as though it took 90% of the time I spent on the data visualization.<\/p>\n<p>But in the end, I pulled it out, and we wound up with something neat to show off the growing stream of data we process.<\/p><\/blockquote>\n<p>If you&#8217;re an MIT student, come chat big data with <a href=\"https:\/\/www.shareaholic.com\/about\/team\">Joseph Kibe<\/a>, our Data Scientist, and <a href=\"https:\/\/www.shareaholic.com\/about\/team\">Robby Grossman<\/a>, our Tech Lead, on Friday, September 21 at the <a href=\"https:\/\/www.shareaholic.com\/careers\">MIT Career Fair<\/a>. We&#8217;d love to meet you!<\/p>\n<div class=\"callout\">\n<div class=\"logo\">Shareaholic<\/div>\n<p>Intrigued by big data? We&#8217;re always looking for talented people to join our team! <a href=\"https:\/\/www.shareaholic.com\/careers?utm_campaign=Shareaholic_Blog&amp;utm_source=Blog_CTA&amp;utm_medium=Bottom_CTA&amp;utm_content=shareaholic_big_data_visualization\">Click here to see our latest opportunities.<\/a><\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>With a publisher network of 200,000 websites that reach 300 million people every month, Shareaholic\u00a0has access to\u00a0quite a bit of data. 
You&#8217;ve read about it&hellip;&nbsp;<br \/><a class=\"continue-reading\" href=\"https:\/\/www.shareaholic.com\/blog\/shareaholic-big-data-visualization\/\">Continue Reading<\/a><\/p>\n","protected":false},"author":40,"featured_media":9756,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[525],"tags":[528,529],"_links":{"self":[{"href":"https:\/\/www.shareaholic.com\/blog\/wp-json\/wp\/v2\/posts\/9683"}],"collection":[{"href":"https:\/\/www.shareaholic.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.shareaholic.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.shareaholic.com\/blog\/wp-json\/wp\/v2\/users\/40"}],"replies":[{"embeddable":true,"href":"https:\/\/www.shareaholic.com\/blog\/wp-json\/wp\/v2\/comments?post=9683"}],"version-history":[{"count":0,"href":"https:\/\/www.shareaholic.com\/blog\/wp-json\/wp\/v2\/posts\/9683\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.shareaholic.com\/blog\/wp-json\/wp\/v2\/media\/9756"}],"wp:attachment":[{"href":"https:\/\/www.shareaholic.com\/blog\/wp-json\/wp\/v2\/media?parent=9683"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.shareaholic.com\/blog\/wp-json\/wp\/v2\/categories?post=9683"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.shareaholic.com\/blog\/wp-json\/wp\/v2\/tags?post=9683"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}