Get Started with AWS Flink: Your First Real-Time App in Minutes

Ready to process streaming data on AWS but don’t want to manage servers? Amazon Managed Service for Apache Flink is your answer. This guide cuts through the theory and shows you how to launch your first real-time data processing application quickly.

APACHE FLINK

Pushpender Pannu

5/8/2024

We’ll use Flink SQL via AWS Flink Studio — the simplest way to start.

What You’ll Need (Prerequisites)
  • An AWS Account: If you don’t have one, sign up.

  • An S3 Bucket: We need one for output. Create a new S3 bucket in your preferred region (e.g., my-flink-output-bucket).

  • A Kinesis Data Stream: This will be our data source. Go to the Kinesis console and create a Data Stream (e.g., my-flink-input-stream) with one shard for simplicity.

  • An IAM Role: Your Flink application needs permission to read from Kinesis and write to S3.

      • Go to the IAM console and create a new role.

      • Choose ‘AWS service’ and select ‘Kinesis Analytics’ (the older name for Managed Service for Apache Flink, still used in IAM).

      • Attach the AmazonKinesisFullAccess and AmazonS3FullAccess policies (fine for this demo, but restrict to least privilege in production!).

      • Give it a name (e.g., FlinkAppRole) and create it.

Step 1: Launch Flink Studio
  1. Navigate to Amazon Managed Service for Apache Flink in the AWS Console.

  2. Click Studio notebooks.

  3. Click Create Studio notebook.

  4. Give it a name (e.g., MyFirstFlinkApp).

  5. Select the IAM role you created (FlinkAppRole).

  6. Choose Apache Zeppelin and click Create Studio notebook.

  7. Wait for it to become ‘Running’, then click Open in Apache Zeppelin.

Step 2: Write and Run Flink SQL

Inside the Zeppelin notebook:

  1. Create a Kinesis Source Table: This tells Flink how to read your Kinesis stream. Paste this into a notebook paragraph (make sure %flink.ssql is at the top) and run it:
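
For example, a minimal source table definition might look like the paragraph below. The ticker, price, and event_time columns are placeholders for whatever fields your JSON records actually contain, and us-east-1 is assumed as the region, so adjust the schema, stream name, and region to match your setup.

%flink.ssql

-- Source table backed by the Kinesis stream created in the prerequisites.
CREATE TABLE input_stream (
    ticker      VARCHAR(6),
    price       DOUBLE,
    event_time  TIMESTAMP(3)
) WITH (
    'connector' = 'kinesis',            -- read from Kinesis Data Streams
    'stream' = 'my-flink-input-stream', -- the stream you created earlier
    'aws.region' = 'us-east-1',         -- change to your stream's region
    'scan.stream.initpos' = 'LATEST',   -- start from the newest records
    'format' = 'json'                   -- incoming records are JSON objects
);

Once the table exists, you can sanity-check it in a new paragraph with a continuous query such as SELECT * FROM input_stream; (use %flink.ssql(type=update) at the top of that paragraph so the results keep refreshing as records arrive).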