

Cloudera and Platfora Leveraged to Address Hard Challenge: What do “they” know about my network?


Editor’s note: This guest post by Wayne Wheeles focuses on a topic I’ve struggled with for over 15 years and shows great promise in addressing challenges no one else has tackled.  Wayne is a Network Forensics Analytic/Enrichment Developer at Six3 Systems. – bg

For a decade now, many network forensics analysts, network security engineers, and cyber security professionals have pondered that most interesting of questions: What do “they” know about my network? From time to time over the years, discussions about determining what external entities may know about a network’s attack surface have started up and then fizzled out. Often, organizations collect and store a great deal of data to piece together a defensive view of a network, but they do not piece together what external entities know about, or have shown interest in, on that same network. Big Data offers the potential to evaluate this question in ways that were unimaginable just five years ago. New technologies and techniques enable organizations to ask: what is the known attack surface of my network? I addressed this question head-on using a variety of cyber security data sets, enrichment techniques, Cloudera CDH 4 (a Hadoop distribution), and Platfora, a relative newcomer that is one of the most powerful tools I have worked with in some time.

In this day and age it is amazing how little is known about what activities are occurring on our networks. The “they” alluded to earlier refers to external entities that engage in scanning and network mapping, seeking to learn more about all aspects of a target network: what devices reside on the network, what ports are open, and what potential avenues for exploitation exist. This scanning occurs at a scale that is almost unimaginable and often goes unnoticed. For those who wonder whether this kind of network scanning is common: in the working data set used for this article, I determined that over 4000 large-scale scans of the target network occurred each year, originating from at least 95 countries worldwide.
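A per-country figure like the one above depends on geographic enrichment of the scan sources. The following is a minimal sketch of that idea, not the method used for this article: the `GEO_TABLE` lookup, the country codes, and the function names are all hypothetical, and the addresses come from reserved documentation ranges. A real deployment would load a GeoIP database instead of a hand-written table.

```python
import ipaddress
from collections import defaultdict

# Hypothetical CIDR-to-country table (assumption for illustration only).
# The networks are reserved documentation ranges, not real scan sources.
GEO_TABLE = [
    (ipaddress.ip_network("192.0.2.0/24"), "AA"),
    (ipaddress.ip_network("198.51.100.0/24"), "BB"),
    (ipaddress.ip_network("203.0.113.0/24"), "CC"),
]


def country_for(ip):
    """Return the country code for an IP address, or "unknown"."""
    addr = ipaddress.ip_address(ip)
    for network, country in GEO_TABLE:
        if addr in network:
            return country
    return "unknown"


def scans_by_country(scan_sources):
    """Count scans per country of origin from a list of source IPs."""
    counts = defaultdict(int)
    for ip in scan_sources:
        counts[country_for(ip)] += 1
    return dict(counts)


# Made-up sample: one source IP per observed scan.
sample_sources = ["192.0.2.10", "192.0.2.77", "198.51.100.5", "10.0.0.1"]
country_counts = scans_by_country(sample_sources)
distinct_countries = len([c for c in country_counts if c != "unknown"])
```

The number of distinct keys in `country_counts` (excluding `unknown`) gives the kind of "originating from N countries" figure quoted above.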

As always, the real story is told through the data, in this case netflow data with port and geographic enrichment. To share the tale more effectively at scale, we worked with Platfora to explore and visualize the data. The screen shot below is of the Platfora Data Catalog, which makes it easy to look at all of the data sets available in the cluster. The data catalog provides the instrument for defining data sets and the relationships between different classes of data within the cluster.


Next, we loaded into the Platfora Data Catalog a series of derivative data sets that captured all of the major scans on the network during 2013. From the Platfora Data Catalog, we generated a series of lenses, or views of the data. When creating lenses, Platfora provides a wide range of functions, operators and aggregates for working with data, which were really helpful in generating the visualizations in this blog.

Platfora provided a wide range of capabilities for preparing the data for analysis, which considerably reduced data preparation time. After the data was prepared, the emphasis shifted to exploring and understanding it using a variety of visualization techniques. In the Platfora VizBoard below, the point of interest was not the fact that high ports (x-axis) were scanned, but rather the number of times (indicated by the color of the bars) that they were scanned by the same source IP address (y-axis).  Each of the source IP addresses in the set below scanned ports of the targeted network over 1000 times in a 90-day timeframe.
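The counts behind a view like this can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Platfora's lens logic: the `(timestamp, src_ip, dst_port)` record layout, the function name, and the sample records are hypothetical. Only the 90-day window and the 1000-scan threshold come from the text above.

```python
from collections import Counter
from datetime import datetime, timedelta


def heatmap_counts(flows, window_end, window_days=90, min_scans=1000):
    """Count scans per (src_ip, dst_port) cell inside the time window.

    Only sources with more than `min_scans` scans in the window are kept.
    `flows` is an iterable of (timestamp, src_ip, dst_port) tuples, which
    is an assumed record layout rather than a real netflow schema.
    """
    window_start = window_end - timedelta(days=window_days)
    cell_counts = Counter()
    per_source = Counter()
    for ts, src_ip, dst_port in flows:
        if window_start <= ts <= window_end:
            cell_counts[(src_ip, dst_port)] += 1
            per_source[src_ip] += 1
    heavy = {ip for ip, n in per_source.items() if n > min_scans}
    return {cell: n for cell, n in cell_counts.items() if cell[0] in heavy}


# Tiny made-up sample; the threshold is lowered so the example stays small.
end = datetime(2013, 12, 31)
sample = [
    (datetime(2013, 12, 1), "192.0.2.10", 61000),
    (datetime(2013, 12, 2), "192.0.2.10", 61000),
    (datetime(2013, 12, 3), "192.0.2.10", 61001),
    (datetime(2013, 12, 4), "192.0.2.10", 61002),
    (datetime(2013, 12, 5), "198.51.100.5", 61000),
    (datetime(2013, 1, 1), "192.0.2.10", 61000),  # outside the 90-day window
]
cells = heatmap_counts(sample, end, min_scans=3)
```

Each key of `cells` corresponds to one cell of the heat map (source IP on one axis, destination port on the other), and the value is what the color would encode.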


The heat map above shows that the source IP addresses (y-axis) not only scanned large numbers of destination ports (x-axis) on the target network but in many instances returned to the same port between four and six times during the observation period.  When building the data sets, we defined references that describe the relationships between the different types of data resident in the cluster.  In the graphic above, when port 61000 is highlighted, the netflow information that served as the base data set has been augmented with information from other data sets: known exploits for a given port, Intrusion Detection Signature information for a given port over time, and Intrusion Detection Signature information for a given IP address.  Platfora was very useful for “following where the data will lead”, enabling the analyst to pivot to all of the details on a port or IP address, such as bytes and packets, and to generate new derivative lenses with two clicks of a button.
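The enrichment described above amounts to joining the base netflow records against reference data sets keyed by port or by IP address. A rough sketch follows, assuming simple in-memory dictionaries in place of the cluster's data sets; every name, field, and value here is a hypothetical placeholder, not real exploit or signature data.

```python
# Hypothetical reference data sets keyed by port and by source IP.
KNOWN_EXPLOITS = {61000: ["example-exploit-A"]}
IDS_BY_PORT = {61000: ["example-signature-1"]}
IDS_BY_IP = {"192.0.2.10": ["example-signature-2"]}


def enrich(flow):
    """Return a copy of a flow record augmented with reference data."""
    enriched = dict(flow)
    enriched["known_exploits"] = KNOWN_EXPLOITS.get(flow["dst_port"], [])
    enriched["ids_port_signatures"] = IDS_BY_PORT.get(flow["dst_port"], [])
    enriched["ids_ip_signatures"] = IDS_BY_IP.get(flow["src_ip"], [])
    return enriched


def revisited_cells(cell_counts, low=4, high=6):
    """List (src_ip, dst_port) pairs seen between `low` and `high` times."""
    return sorted(c for c, n in cell_counts.items() if low <= n <= high)


# Made-up base record with the byte and packet fields mentioned above.
flow = {"src_ip": "192.0.2.10", "dst_port": 61000, "bytes": 120, "packets": 2}
detail = enrich(flow)

# Made-up per-cell scan counts, as a heat map would hold them.
counts = {
    ("192.0.2.10", 61000): 5,
    ("192.0.2.10", 61001): 1,
    ("198.51.100.5", 61000): 4,
}
revisits = revisited_cells(counts)
```

Keeping the reference data keyed by port and by IP mirrors the references defined between data sets: the base record is never modified, and a pivot on either key is a single lookup.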

In review, what do “they” know about my network? Based on the analysis of the aforementioned actors, the following observations were made: over 300 scans a month occurred; roughly 4000 large scans (sweeping scans covering a large number of ports) occurred each year; in all, over 22,500 ports were probed; and of those, no fewer than twelve ports were revisited up to ten times.  Based on the analysis using Platfora, several areas were identified for additional investigation, and recommendations were made to improve the overall network security posture.

To put this article together, we used a four-node Hadoop cluster built using Cloudera CDH 4, IBM Pure Data for Analytics 2001 and Platfora’s exploratory BI tool for Hadoop.

Based on what I had read previously, my view of Platfora was that it was just a visualization package, but to my surprise it turned out to be a complete end-to-end data integration and visualization platform, fully integrated with Hadoop and Hive.

Finally, I would like to thank two contributors, Keith McClellan and Six3 Systems, for helping me pull this off, and Bob Gourley (CTO Vision) for posting my blog.


More Stories By Bob Gourley

Bob Gourley writes on enterprise IT. He is a founder of Crucial Point and publisher of CTOvision.com
