Start a Flink Cluster on Amazon EMR

  1. Connect to the Amazon EMR cluster. Follow the link for ConnectToEmrCluster in the output section of the CloudFormation template to connect to the cluster.

    We are using AWS Systems Manager Session Manager to connect to the cluster through the browser, without exposing the cluster directly to the public internet. Amazon EMR is already preconfigured with this capability. To enable it, we just need to add permissions to use the capability to the cluster.

  2. Change to the correct user by issuing the following command:

    sudo su - hadoop

  3. Apache Flink is included in the Amazon EMR distribution and has been installed on the cluster. To start the Flink JobManager, execute the following command:

    flink-yarn-session -n 2 -s 4 -tm 16GB -d

  4. To connect to the Flink UI, first open the Amazon EMR console.

  5. Navigate to the details of the cluster by clicking on the cluster named beam-workshop, then copy the Master public DNS.

  6. Open Firefox on the Windows instance and navigate to the Hadoop ResourceManager by appending the port :8088 to the DNS name you have just copied.

    You can only access the ResourceManager from the Windows instance; it is not available from the public internet. If you have problems connecting, make sure that you have appended the port correctly and that you are connecting from the Windows development environment.

  7. Follow the link labeled ApplicationMaster to access the Flink Dashboard.

  8. You are now presented with the Apache Flink dashboard, which allows you to interact with the different components of the runtime.
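
If the dashboard does not come up, the steps above can also be checked from a command line. The following is a minimal sketch, not part of the workshop material: the function names are hypothetical, and it assumes that the `yarn` client is available to the hadoop user on the cluster and that `curl` is installed on a host that can reach port 8088 on the master node. Nothing runs until one of the functions is called.

```shell
#!/usr/bin/env bash
# Sketch: command-line checks that mirror steps 3 to 7.
# All function names are hypothetical helpers, not part of Amazon EMR or Flink.

# Build the ResourceManager URL by appending port 8088 to the
# Master public DNS copied in step 5 (see step 6).
resource_manager_url() {
  local master_dns="$1"
  printf 'http://%s:8088\n' "$master_dns"
}

# List running YARN applications. The Flink session started in step 3
# should show up in this list. Run on the cluster as the hadoop user.
list_running_yarn_apps() {
  yarn application -list -appStates RUNNING
}

# Ask the ResourceManager REST API for cluster info to confirm that
# port 8088 is reachable. Assumes curl and network access to the cluster.
check_resource_manager() {
  local master_dns="$1"
  curl --silent --show-error --fail --max-time 10 \
    "$(resource_manager_url "$master_dns")/ws/v1/cluster/info"
}
```

For example, `resource_manager_url <master-public-dns>` prints the address to open in Firefox, and `check_resource_manager <master-public-dns>` exits with a non-zero status if the ResourceManager cannot be reached from the current host.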