- Download the XGBoost and sklearn libraries to all nodes of your Greenplum cluster (using `gpscp -f hostfile <source> =:<destination>`).
- Compile and install the XGBoost and sklearn libraries on all nodes (using `gpssh -f hostfile` followed by the compile and install commands).
- Run the above SQL file (it will create a schema called `xgbdemo`).
- Invoke the UDFs as shown in the sample snippet.

The XGBoost implementation at https://github.com/dmlc/xgboost is not compatible with Python 2.6, so I recommend cloning my version from https://github.com/vatsan/xgboost and using it instead. It is Python 2.6 compatible and works with PL/Python on Greenplum/HAWQ.
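
The copy and build steps above can be sketched as a small dry-run helper that only assembles and prints the `gpscp` and `gpssh` command lines, so you can review them before running anything on the cluster. This is a minimal sketch, not a tested deployment script: the helper name `distribution_commands`, the archive name, the destination directory, and the placeholder build command are all assumptions for illustration. Substitute your own paths and the real compile and install commands for your environment.

```python
import shlex


def distribution_commands(hostfile, source, destination, build_command):
    """Build the gpscp and gpssh argument lists for the copy and build steps.

    Nothing is executed here; the caller decides whether to run the commands.
    """
    # gpscp copies <source> to <destination> on every host listed in hostfile;
    # the "=:" prefix stands for "each host in the hostfile".
    copy = ["gpscp", "-f", hostfile, source, "=:" + destination]
    # gpssh runs the given command string on every host listed in hostfile.
    build = ["gpssh", "-f", hostfile, build_command]
    return [copy, build]


# Hypothetical values: replace with your own hostfile, archive, and commands.
commands = distribution_commands(
    hostfile="hostfile",
    source="xgboost.tar.gz",          # assumed archive name
    destination="/tmp",               # assumed destination directory
    build_command="echo 'replace with your compile and install commands'",
)

for cmd in commands:
    # Print a shell-safe rendering of each command for review (dry run).
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Once the printed commands look right, they can be pasted into a shell on the master node, or passed to `subprocess.check_call` one at a time.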
