
F09 - Spark Analytics for Database Professionals

Session Number: 3266
Track: Ala Carte I
Session Type: Podium Presentation
Primary Presenter: Anjali Khatri [IBM]

Pecos => Wed, May 25, 2016 (08:00 AM - 09:00 AM)

Speaker Bio: Anjali Khatri is a client technical professional in IBM's Big Data & Analytics group, focusing on products such as BigInsights, IBM's Hadoop solution, and Streams, its real-time analytics solution. In a pre-sales role for the last 2 years, she has worked with customers, vendors, and business partners on product overviews, implementation, and enablement in support of IBM's analytics initiative. Previously, she was an IBM consultant focused on application performance and network management for customers in industries including media, retail, insurance, and finance.
Audience experience level: Intermediate
Presentation Category: Emerging Technology
Presentation Platform: Cross Platform
Audiences this presentation will apply to: Application Developers, Data Architects, Database Administrators, Systems Programmers, New Users, IT Managers
Technical areas this presentation will apply to: User Experiences
If Tools and Utilities was selected which products: Apache Spark and Hadoop Platform
Objective 1: Introduce Apache Spark technology and Hadoop platform
Objective 2: Introduce analytics over multiple data formats (structured/unstructured) and bridge the knowledge gap of database professionals
Co-speaker Name: Maria N Schwenger
Co-speaker Bio: Maria N. Schwenger is a Program Manager in IBM Innovations, IBM Watson Group, where she works on the strategy and framework architectures of the IBM Watson cognitive technologies. For the last 7 years Maria has worked on the leading edge of IBM's most innovative technologies, such as the DB2 SQL compatibility feature, DB2 pureScale, and the PureSystems family. She is an author of US and international publications and patents on innovative IBM technologies such as DB2 BLU and IBM BigInsights. Maria sees the big data and cognitive analytics space as the next big area for innovative development under IBM’s leadership.

Abstract: Apache Spark is an open source parallel processing framework for large-scale data analytics that runs across large compute clusters. Spark became a top-level project of the Apache Software Foundation in 2014, and version 1.0 was released in May 2014. Spark provides insight into data with faster time-to-value because large volumes of real-time or archived data are queried in memory. In this session we will outline the most common use cases of Spark and compare them to another popular platform, the Hadoop space as an Open Data Platform. We will also compare how Spark applications can be written in Python or Scala and provide a comparative analysis of the two programming languages across several Spark extensions, such as SQL, machine learning, and streaming analysis.
