Knowledge Discovery: Using Infinit.e to Explore Traffic Data

Aug 06, 2012
Andrew Strite

Case Study

Transportation congestion costs the American people $200 billion a year, according to the U.S. Department of Transportation (link: http://www.dot.gov/stratplan2011/redcong.htm). When trying to improve a system as complex as our nation’s transportation infrastructure, it is all too easy to get bogged down in data while working to understand the local conditions that affect traffic in a given region. At a time when “state budgets are stretched thin, and gasoline taxes are becoming untenable as long-term sources of funding,” the people charged with improving our nation’s infrastructure cannot afford to make ill-informed decisions about costly, long-term infrastructure projects.

To tackle this problem, we used Infinit.e to ingest a sample of historical New York traffic data from August and September 2011, when Hurricane Irene hit the East Coast. Infinit.e is an open source analytic platform that can ingest, analyze, and visualize data from a variety of structured and unstructured formats. Infinit.e does not just put tools in the hands of analysts and decision makers; it puts the right tools in their hands. The platform drives the knowledge discovery process, allowing users to isolate relevant information and identify trends and patterns. Most importantly, its flexible framework also helps users validate their insights about the forces affecting traffic in specific regions.

Methodology

For this study, we ingested more than 21,000 New York traffic records from August and September 2011 into Infinit.e for document analysis and for harvesting both structured and unstructured data. Each event identified in the documents was categorized both as an entity (the physical event) and as an association (the event affecting traffic). These associations were tied directly to other extracted entities, such as the roads in the area. Other sources can be processed the same way to fuse multiple data sources into a single, unified data model (e.g., a road in one source is the same road in another). These entities and associations let us run complex queries on the data and visualize it across multiple dimensions.
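
The entity/association model can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not Infinit.e's actual document schema: the record fields (`event_type`, `road`), the `Entity` and `Association` classes, and the `normalize_road` rule are all assumptions made for the example. The point it shows is fusion: two sources that spell the same road differently resolve to a single road entity.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical, simplified data model; not Infinit.e's real schema.


def normalize_road(name: str) -> str:
    """Canonicalize interstate names so 'I 87', 'i-87' and 'Interstate 87' share one key."""
    key = name.strip().upper().replace("INTERSTATE", "I").replace(" ", "").replace("-", "")
    if key.startswith("I") and key[1:].isdigit():
        return f"I-{key[1:]}"
    return key  # other names are only upper-cased with spaces and dashes removed


@dataclass(frozen=True)
class Entity:
    kind: str  # e.g. "event" or "road"
    key: str   # normalized identifier shared across sources


@dataclass(frozen=True)
class Association:
    verb: str        # e.g. "affects"
    subject: Entity  # the traffic event
    obj: Entity      # the road it affects


def extract(record: dict) -> tuple[set[Entity], list[Association]]:
    """Turn one raw record into entities plus an 'event affects road' association."""
    event = Entity("event", record["event_type"].strip().lower())
    road = Entity("road", normalize_road(record["road"]))
    return {event, road}, [Association("affects", event, road)]


def fuse(records: list[dict]) -> tuple[set[Entity], list[Association]]:
    """Merge records from any number of sources into one entity set."""
    entities: set[Entity] = set()
    associations: list[Association] = []
    for record in records:
        ents, assocs = extract(record)
        entities |= ents  # identical (kind, key) pairs collapse into one entity
        associations.extend(assocs)
    return entities, associations


# Two hypothetical sources that spell the same road differently.
source_a = [{"event_type": "Accident", "road": "I-87"}]
source_b = [{"event_type": "Road maintenance", "road": "Interstate 87"}]
entities, associations = fuse(source_a + source_b)
roads = {e for e in entities if e.kind == "road"}
print(len(roads))  # 1: both sources resolve to the same I-87 entity
```

Because entities are frozen dataclasses, equality and hashing come from their fields, so deduplication is just set union.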

Visualization & Analysis

Traffic data naturally lends itself to geographic visualization, so we started there, plotting the traffic events across the New York area as a heat map. Red areas showed the highest concentrations of events, quickly highlighting clusters of activity for additional analysis. Not surprisingly, this high-level view showed that traffic events were most frequent around population centers during the reporting period. The default heat map considers only the number of events in an area, not their severity, but thanks to Infinit.e’s flexible visualization suite it could have been modified to measure other dimensions.
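
The difference between a count-only and a severity-weighted heat map comes down to how the underlying grid is filled. The sketch below is an assumption, not Infinit.e's rendering code: it bins hypothetical `lat`/`lon` event records into square cells, adding 1 per event by default, or an optional weight (here a hypothetical `severity` field) instead.

```python
import math
from collections import Counter


def heat_grid(events, cell_deg=0.1, weight=None):
    """Bin events into square cells of cell_deg degrees of latitude/longitude.

    With weight=None each event adds 1 (a count-only heat map); passing a
    function such as a severity lookup produces a weighted map instead.
    """
    grid = Counter()
    for ev in events:
        cell = (math.floor(ev["lat"] / cell_deg), math.floor(ev["lon"] / cell_deg))
        grid[cell] += weight(ev) if weight is not None else 1
    return grid


# Hypothetical events: two near Albany, one in New York City.
events = [
    {"lat": 42.65, "lon": -73.75, "severity": 1},
    {"lat": 42.66, "lon": -73.76, "severity": 3},
    {"lat": 40.71, "lon": -74.00, "severity": 2},
]
counts = heat_grid(events)                                     # Albany cell 2, NYC cell 1
weighted = heat_grid(events, weight=lambda ev: ev["severity"])  # Albany cell 4, NYC cell 2
```

Degree-based cells are not equal-area (a degree of longitude shrinks toward the poles), so a production heat map would more likely use a map projection or a kernel density estimate.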

We refined our results further using Infinit.e’s query builder. By adding dimensions to a query, individual users or groups can retrieve just the data that is relevant to their specific research interests. For this example, we decided to focus on results around Albany, NY. We added a geographic boundary with the graphical tools in the user interface, but this could also have been done directly in the query bar.
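
Adding a geographic dimension to a query amounts to filtering results by a bounding box. The following sketch applies such filters locally in Python; it is not Infinit.e's query syntax, and the record fields, the `BBox` class, the `query` function, and the rough Albany-area coordinates are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Axis-aligned latitude/longitude box (assumes it does not cross the antimeridian)."""
    south: float
    west: float
    north: float
    east: float

    def contains(self, lat: float, lon: float) -> bool:
        return self.south <= lat <= self.north and self.west <= lon <= self.east


def query(events, bbox=None, event_types=None):
    """Yield events matching every supplied filter; None means 'no constraint'."""
    for ev in events:
        if bbox is not None and not bbox.contains(ev["lat"], ev["lon"]):
            continue
        if event_types is not None and ev["type"] not in event_types:
            continue
        yield ev


# Rough, illustrative box around Albany, NY (not the boundary used in the study).
ALBANY = BBox(south=42.5, west=-74.0, north=42.9, east=-73.6)

events = [
    {"type": "accident", "lat": 42.65, "lon": -73.75},
    {"type": "road maintenance", "lat": 42.70, "lon": -73.80},
    {"type": "accident", "lat": 40.71, "lon": -74.00},  # New York City
]
albany_events = list(query(events, bbox=ALBANY))
albany_accidents = list(query(events, bbox=ALBANY, event_types={"accident"}))
```

Each filter is applied only when supplied, so further dimensions (a date range, a road name) could be stacked the same way.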

Adding the geographic bound reduced the data returned by the query, allowing a closer look at Albany traffic incidents. The new heat map showed that most of the 339 traffic events in this region over the two-month period affected I-87. Other visualizations offer additional ways to interpret the data. For instance, a query metrics visualization broke down the entities and associations returned by the Albany query, such as the types of events found in the records.
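
A query metrics breakdown is essentially a frequency count over the entities in a result set. As a hedged sketch (hypothetical `type` and `road` fields and sample records; not Infinit.e's implementation), Python's `collections.Counter` produces the same kind of summary:

```python
from collections import Counter


def query_metrics(events):
    """Summarize a result set by how often each event type and each road appears."""
    return {
        "event_types": Counter(ev["type"] for ev in events),
        "roads": Counter(ev["road"] for ev in events),
    }


# Hypothetical records standing in for the Albany result set.
albany_results = [
    {"type": "accident", "road": "I-87"},
    {"type": "road maintenance", "road": "I-87"},
    {"type": "accident", "road": "I-90"},
]
metrics = query_metrics(albany_results)
print(metrics["roads"].most_common(1))     # [('I-87', 2)]
print(metrics["event_types"]["accident"])  # 2
```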

Next, we narrowed the Albany query further by limiting the geographic boundary to events around the intersection of I-87 and I-90, near Albany, NY. At this level, it made more sense to look at discrete events than at a heat map, because the query returned only 42 events. Infinit.e’s default geographic visualizations let users view discrete events at any level, but filter many of them out when zoomed out to keep the display uncluttered. The entity significance visualization revealed the most prominent event types affecting traffic at the intersection of I-87 and I-90. In this case, road maintenance and accidents were the largest contributors, but a few other candidates for investigation were identified as well.
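
This post does not describe how Infinit.e computes entity significance, so the sketch below swaps in a simple, plainly different measure: rank event types by local count and report their lift, i.e. how over-represented each type is at the intersection relative to the whole data set. The data and field names are hypothetical.

```python
from collections import Counter


def rank_event_types(local, corpus):
    """Rank event types in a local result set by count, with lift versus the whole corpus.

    lift = (share of the type locally) / (share of the type in the corpus);
    values above 1 mean the type is over-represented at this location.
    Assumes `local` is a subset of `corpus`, so every local type occurs in it.
    """
    local_counts = Counter(ev["type"] for ev in local)
    corpus_counts = Counter(ev["type"] for ev in corpus)
    n_local = sum(local_counts.values())
    n_corpus = sum(corpus_counts.values())
    ranking = []
    for etype, count in local_counts.most_common():
        lift = (count / n_local) / (corpus_counts[etype] / n_corpus)
        ranking.append((etype, count, round(lift, 2)))
    return ranking


# Hypothetical data: three events at the intersection, seven elsewhere.
intersection = [
    {"type": "road maintenance"},
    {"type": "road maintenance"},
    {"type": "accident"},
]
elsewhere = [{"type": "accident"}] * 3 + [{"type": "congestion"}] * 4
ranking = rank_event_types(intersection, intersection + elsewhere)
# ranking == [('road maintenance', 2, 3.33), ('accident', 1, 0.83)]
```

Raw counts answer "what happens most here," while lift highlights types that are unusually common at this location; both are useful when choosing candidates for further investigation.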

The fourth most frequent event type in this localized region was fallen trees. We initially speculated that this could have been caused by Hurricane Irene. Because this study covered only the two months affected by the storm, we could not say whether this exceeded historical norms for the area. However, viewing the data across several visualizations and emphasizing specific dimensions gave us greater insight into the problem. We searched the entire data set for fallen tree events and then plotted them on a timeline. Of the 28 fallen tree events over the two months, 25 occurred on August 21 and the remaining 3 on August 28. A separate web search showed that heavy thunderstorms hit the region on August 21, whereas Irene made landfall in New York on August 28. We expected a hurricane to cause more damage to trees, but the data suggests that either the August 21 storm was more damaging or fallen tree events were underreported during Irene, possibly because of more pressing concerns. Further analysis would be required to conclude one way or the other.
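
The timeline step reduces to grouping a single event type by calendar day. This sketch assumes hypothetical records with a `type` field and an ISO 8601 `time` string; the sample data is synthetic and much smaller than the 28 events in the study.

```python
from collections import Counter
from datetime import date, datetime


def daily_timeline(events, event_type):
    """Return (day, count) pairs for one event type, in chronological order."""
    per_day = Counter(
        datetime.fromisoformat(ev["time"]).date()
        for ev in events
        if ev["type"] == event_type
    )
    return sorted(per_day.items())


# Synthetic sample records (not the study's data).
sample = [
    {"type": "fallen tree", "time": "2011-08-21T15:10:00"},
    {"type": "fallen tree", "time": "2011-08-21T16:45:00"},
    {"type": "fallen tree", "time": "2011-08-21T18:05:00"},
    {"type": "fallen tree", "time": "2011-08-28T09:30:00"},
    {"type": "accident", "time": "2011-08-28T11:00:00"},
]
timeline = daily_timeline(sample, "fallen tree")
# timeline == [(date(2011, 8, 21), 3), (date(2011, 8, 28), 1)]

irene_landfall = date(2011, 8, 28)
print(dict(timeline).get(irene_landfall, 0))  # 1
```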

Infinit.e can also help automate complex insight discovery with Apache Hadoop and MapReduce plugins. These analytic tools let users track, aggregate, and analyze large data sets over time to discover meaningful trends. Combined with the flexibility of Infinit.e’s visualization framework, they let users query and explore data across multiple dimensions to discover knowledge that informs decision-making.
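
To make the MapReduce idea concrete, here is the pattern in plain, single-process Python, counting events per road per day. It does not use Hadoop's Java API or Infinit.e's plugin interface; the function names and record fields are hypothetical. On a cluster, the framework would run the map and reduce functions in parallel across many machines and perform the shuffle itself.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map: emit ((road, day), 1) for every event record."""
    yield (record["road"], record["time"][:10]), 1  # time[:10] is the YYYY-MM-DD date


def shuffle(pairs):
    """Shuffle: group all emitted values by key (Hadoop does this between phases)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse each key's values into a single total."""
    return key, sum(values)


def run_job(records):
    pairs = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


records = [
    {"road": "I-87", "time": "2011-08-21T15:10:00"},
    {"road": "I-87", "time": "2011-08-21T17:20:00"},
    {"road": "I-90", "time": "2011-08-28T09:30:00"},
]
totals = run_job(records)
# totals == {("I-87", "2011-08-21"): 2, ("I-90", "2011-08-28"): 1}
```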

Conclusion

We followed only one vein in an otherwise complicated and rich traffic data set. Mining the data for every possible insight into traffic congestion would be extremely time- and resource-intensive for any organization. Infinit.e addresses this by offering a powerful, scalable, open source platform with API support that allows organizations to:

  • Fuse and store data from structured and unstructured sources in a semi-structured format that makes it easy to retrieve and visualize;
  • Query data and segment the results in ways that make large data sets relevant to specific regions or interests;
  • Automate pattern and trend discovery with Hadoop and MapReduce technologies;
  • Validate insights to ensure they provide value to decision makers balancing competing interests.

To learn more about how Infinit.e can help your organization, visit the IKANOW website, request a demo, or download the Infinit.e platform.

About the Author:

Andrew Strite – Solutions Architect

As a Solutions Architect, Andrew works directly with clients and IKANOW’s delivery team to create analytic solutions, and manages their execution from start to finish. Andrew has 6 years of experience in program management, strategic analysis, and requirements development. Before joining IKANOW, Andrew was a U.S. Air Force Intelligence Officer. Andrew holds an M.A. in Intelligence Studies from American Military University and a B.A. in History from the University of Delaware. Andrew describes his hobbies: “I’m an avid gamer; I especially love strategy games or those with open worlds. Lately, I’ve also been wedding planning with my fiancée.”
