SAP BusinessObjects Predictive Analysis is a statistical analysis and data mining solution that enables you to build predictive models, interact visually with your data to discover hidden insights and relationships, and thereby lay the basis for making predictions about future events.
Recently I had my first exposure to SAP Predictive Analysis 1.0. I feel that predictive analytics, whilst not new, is certainly an area of Enterprise Performance Management applications that has not been exploited to its fullest potential. I was intrigued to understand more about this SAP product offering, especially in conjunction with SAP HANA.
My first impression of SAP Predictive Analysis 1.0 is that it is an easy-to-use application. The tool is installed on a client machine (user laptop/PC) and, like SAP HANA Studio, is built on the Eclipse framework and has a similar feel. My prediction is that this tool will eventually be embedded into SAP HANA Studio; we have reached our limit of installing a new SAP client tool every time a new product is released.
The predictive analytics process is simple, consisting of four key steps:
• Data acquisition (Data Readers)
• Data Preparation
• Algorithms (SAP, R or PAL)
• Data Writers
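The four steps can be pictured as a simple chain: read, prepare, run an algorithm, write. The following is a minimal Python sketch of that idea only, assuming CSV input; the function names (read_csv, prepare, mean_by_column, write_csv) are hypothetical, it does not reflect how SAP Predictive Analysis is implemented or any SAP API, and the column-mean step merely stands in for a real algorithm.

```python
import csv
import io
from typing import Callable, Iterable

# Hypothetical record type: column name -> numeric value.
Record = dict[str, float]


def read_csv(text: str) -> list[Record]:
    """Data reader: parse CSV text into numeric records."""
    return [{k: float(v) for k, v in row.items()} for row in csv.DictReader(io.StringIO(text))]


def prepare(records: Iterable[Record], keep: Callable[[Record], bool]) -> list[Record]:
    """Data preparation: keep only the records that pass a filter predicate."""
    return [r for r in records if keep(r)]


def mean_by_column(records: list[Record]) -> Record:
    """Stand-in 'algorithm': column means. Assumes at least one record."""
    columns = records[0].keys()
    return {c: sum(r[c] for r in records) / len(records) for c in columns}


def write_csv(result: Record) -> str:
    """Data writer: serialise the result back to CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(result))
    writer.writeheader()
    writer.writerow(result)
    return out.getvalue()


if __name__ == "__main__":
    rows = read_csv("sales,units\n100,10\n0,0\n300,30\n")
    clean = prepare(rows, keep=lambda r: r["sales"] != 0)  # drop the zero-value record
    print(write_csv(mean_by_column(clean)))
```

Keeping each step as a separate function mirrors the idea that readers, preparation, algorithms and writers can be swapped independently.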
This first release of SAP Predictive Analysis allows the user to connect to SAP BOBJ universes (.unv only), several databases via JDBC, Excel, CSV and, of course, SAP HANA. I feel that BOBJ .unx universes should have been supported from the first version; I am unsure why they were left out. Moreover, I was very surprised not to see connections to SAP BW via OLAP BAPI, MDX or even BICS; this is surely something that will come in future releases.
The ‘Data Readers’ available in version 1.0, including the databases supported for JDBC connections:
I noted that when the ‘HANA Reader’ is selected, no other reader type can be inserted into the analysis flow. With the other readers it is possible to have multiple readers in one data flow, e.g. CSV, Universe and JDBC.
Depending on the ‘data reader’ selected, different preparation options are made available: selecting the HANA Reader presents only ‘filter’ and ‘sample’, whereas the other reader types add ‘Data Type Definition’ and ‘Formula’:
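The reader behaviour described above can be captured as a small rule table. This is a hypothetical Python sketch that encodes only what I observed in version 1.0; the reader labels and the check_flow function are my own names, not part of the product. It rejects a flow that mixes the HANA Reader with other readers and returns the preparation options offered.

```python
# Hypothetical encoding of the version 1.0 behaviour observed above (not an SAP API).
FULL_PREP = frozenset({"Filter", "Sample", "Data Type Definition", "Formula"})

PREP_OPTIONS: dict[str, frozenset[str]] = {
    "HANA": frozenset({"Filter", "Sample"}),
    "Universe": FULL_PREP,
    "JDBC": FULL_PREP,
    "Excel": FULL_PREP,
    "CSV": FULL_PREP,
}


def check_flow(readers: list[str]) -> frozenset[str]:
    """Validate the readers in one analysis flow and return the preparation options offered."""
    if not readers:
        raise ValueError("an analysis flow needs at least one reader")
    # The HANA Reader must be the only reader in the flow.
    if "HANA" in readers and len(readers) > 1:
        raise ValueError("the HANA Reader cannot be combined with other readers")
    # Options valid for every reader in the flow; all non-HANA readers share the same set.
    return frozenset.intersection(*(PREP_OPTIONS[r] for r in readers))


if __name__ == "__main__":
    print(sorted(check_flow(["CSV", "Universe", "JDBC"])))
    print(sorted(check_flow(["HANA"])))
```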
The filter is particularly useful because many of the algorithms fail if zero values are present. I would recommend using the ‘filter’ to remove zero-value records; this is certainly useful when using the regression-based algorithms.
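One plausible reason zero values break a regression (my assumption; I have not checked how each SAP algorithm handles them) is that an exponential model y = a * exp(b * x) is commonly fitted by least squares on ln(y), and ln(0) is undefined. The Python sketch below uses hypothetical helper names; it shows the fit failing on a zero value and succeeding once zero-value records are filtered out.

```python
import math


def fit_exponential(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Fit y = a * exp(b * x) by ordinary least squares on ln(y).

    math.log raises ValueError for y <= 0, so zero values must be removed first.
    """
    logs = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_l = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, logs))
    b = sxy / sxx
    a = math.exp(mean_l - b * mean_x)
    return a, b


def drop_zero_records(xs: list[float], ys: list[float]) -> tuple[list[float], list[float]]:
    """Hypothetical 'filter' step: drop records whose target value is exactly zero."""
    kept = [(x, y) for x, y in zip(xs, ys) if y != 0]
    return [x for x, _ in kept], [y for _, y in kept]


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0]
    ys = [2.0, 0.0, 2.0 * math.exp(1.0), 2.0 * math.exp(1.5)]  # y = 2 * exp(0.5 * x), one zero
    try:
        fit_exponential(xs, ys)
    except ValueError as err:
        print("fit failed:", err)
    print(fit_exponential(*drop_zero_records(xs, ys)))  # approximately (2.0, 0.5)
```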
Algorithms (SAP, R or PAL)
As mentioned, I am interested in the HANA algorithms, so after selecting the HANA Reader I expected to see all the PAL algorithms displayed; I was bitterly disappointed. Only two of the possible seven PAL functions are available. I was really hoping to see the C4.5 Decision Tree algorithm; no such luck!
With this said, I am positive SAP will incorporate more PAL functions in later releases. I hear rumours that SAP HANA SP4 will deliver many more built-in PAL functions, and I do expect to see these incorporated.
I had a quick play with the HANA K-Means algorithm on a dataset of over 100 million records and was pleasantly surprised: the result was returned in less than 2 seconds, which is fantastic:
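For readers unfamiliar with K-Means, the generic idea is to alternate between assigning each record to its nearest centroid and moving each centroid to the mean of its assigned records. The Python sketch below is a plain, single-threaded version of this standard (Lloyd's) algorithm under my own assumptions (random initial centroids, a fixed iteration cap); it is not how HANA PAL implements K-Means, and the function name is hypothetical.

```python
import math
import random

Point = tuple[float, ...]


def kmeans(points: list[Point], k: int, iterations: int = 100,
           seed: int = 0) -> tuple[list[Point], list[int]]:
    """Generic Lloyd's K-Means: returns (centroids, cluster label per point)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return centroids, labels


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(kmeans(data, k=2))
```

Every iteration touches every record, which is why pushing this work into the database, rather than a laptop, matters so much at 100 million records.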
However, with only two PAL functions to play with, I quickly became, let’s say, less than enthused. Also note that when the HANA Reader is used to read the data, the SAP and R specific algorithms currently cannot be used; bummer.
So I tried the JDBC reader instead, connecting directly to a SAP HANA table to see which standard SAP algorithms were available to play with. Great: seven standard SAP algorithms to choose from:
I chose the ‘Nearest Neighbour Outlier’ algorithm on a dataset of about 50K records from within SAP HANA. I expected the result set to come back a little slower, as the algorithm would be executed on my laptop and not in the SAP HANA database. I thought a couple of minutes would be sufficient to execute the predictive analysis process, so off I went to have a coffee. When I came back around 30 minutes later it was still running; I thought something had gone wrong, but the status still said ‘running’. Finally, after exactly 4,170,192 milliseconds (almost 70 minutes), the result came back. I was disappointed (again), as all that was returned was a table of data and a summary page telling me it had found 10 outliers; not impressed:
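I do not know which nearest-neighbour outlier variant SAP implements, but one common definition (assumed here) scores every record by its distance to its k-th nearest neighbour and reports the highest-scoring records. A naive version compares every record with every other one, so its cost grows quadratically: 50K records give about 1.25 billion distinct pairs, which may help explain the long runtime on a laptop (4,170,192 ms is roughly 69.5 minutes). The Python sketch below is hypothetical (function names are mine) and only illustrates that scaling.

```python
import heapq
import math

Point = tuple[float, ...]


def knn_outliers(points: list[Point], k: int, top_n: int) -> list[int]:
    """Return indices of the top_n points with the largest distance to their k-th nearest neighbour."""
    scores = []
    for i, p in enumerate(points):
        # Brute force: n - 1 distance evaluations for every point, n * (n - 1) in total.
        others = (math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(heapq.nsmallest(k, others)[-1])  # distance to the k-th nearest neighbour
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_n]


def unique_pairs(n: int) -> int:
    """Number of distinct point pairs, n * (n - 1) / 2."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    print(knn_outliers(pts, k=1, top_n=1))  # [4]
    print(unique_pairs(50_000))             # 1249975000
```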
Not deterred by this, I experimented with the other SAP algorithms, and to be fair these all executed in under 15 seconds on the same dataset of 50K records; they also produced some nice graphical outputs for analysis.
I have not yet had a chance to install the ‘R’ libraries; however, from reading the documentation, the number of supported ‘R’ algorithms is limited. In total, SAP Predictive Analysis provides approximately 20 predictive algorithms (R, SAP and PAL).
There is functionality to export your models as PMML or XML for use in other statistical applications; there is, however, no functionality to export your predictive analysis as a PDF document or similar.
By using the ‘Data Writers’ you are able to push the calculated result set back into SAP HANA, another supported database or a CSV file for further analysis and reporting. For example, by pushing the calculated result set back into SAP HANA, reporting and PDF printing can be achieved via Web Intelligence, Crystal Reports or Advanced Analysis.
Whilst this is a good feature, if, for example, we want a daily update of predictions based on the most current data, we would have to execute the process manually every day. There is no automation, i.e. no scheduling or triggering of the process. I very much hope we will see this in the future (think BW data mining, the Analysis Process Designer and process chains).
Whilst SAP’s hardware specifications state that a 2.0 GHz Pentium 4-class processor and 2 GB RAM will suffice for running SAP Predictive Analysis, from my simple experiments I would recommend at least 4 GB RAM, if not more, as you can see (noting that we also had some other applications running):
My future predictions (made without the need for SAP Predictive Analysis) are:
• Integration into SAP HANA Studio and/or a web-based client (zero footprint)
• Access to all SAP HANA PAL and BFL algorithms
• A connection to SAP BW (OLAP BAPI or MDX)
• Support for BOBJ .unx universes
• An increased number of PAL / BFL functions (50+)
• Tighter integration with ‘R’ algorithms (more choice)
• Enhanced graphical outputs directly within the studio
• Ability to export analysis as PDF (not just PMML or XML)
• Automation / scheduling of predictive analysis processes
For die-hard data scientists I would say stick with SPSS; however, in 12 months’ time I am sure I will be changing my recommendation, especially in light of SAP HANA + predictive analytics. This combination will allow the predictive algorithms to be executed fully in the database in a matter of seconds (or less).