The many faces of “consistency”

The word “consistency” is widely used in IT nowadays, especially regarding databases, but what does that really mean? Let’s find out!

Formal definition

In database systems, consistency refers to the requirement that any given database transaction must change affected data only in allowed ways.

Wikipedia (https://en.wikipedia.org/wiki/Consistency_(database_systems))

So basically, if you have specific rules about your data (invariants), every database transaction should preserve those rules. For example, in a cinema ticket booking system, every successful booking should decrement the number of available tickets, which must never drop below zero. If a transaction starts with data that is valid according to these invariants, any changes made during the transaction should preserve that validity. Seems straightforward, right? Nevertheless, there are a few super confusing but extremely important acronyms that contain a “C” (for consistency) yet have very little in common with the formal statement above.
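
To make this concrete, here is a minimal sketch of such a transaction in TypeScript with the pg client, assuming a hypothetical screenings/bookings schema: the booking row is inserted and the seat counter is decremented in the same transaction, and the WHERE clause keeps the counter from ever dropping below zero.

import { Client } from "pg";

// Hypothetical schema: screenings(id, available_seats), bookings(screening_id, user_id).
async function bookSeat(client: Client, screeningId: number, userId: number): Promise<void> {
  try {
    await client.query("BEGIN");
    // Decrement only while seats remain; this preserves the "never below zero" invariant.
    const updated = await client.query(
      "UPDATE screenings SET available_seats = available_seats - 1 WHERE id = $1 AND available_seats > 0",
      [screeningId]
    );
    if (updated.rowCount === 0) {
      throw new Error("No seats available");
    }
    await client.query(
      "INSERT INTO bookings (screening_id, user_id) VALUES ($1, $2)",
      [screeningId, userId]
    );
    await client.query("COMMIT"); // both changes become visible together
  } catch (err) {
    await client.query("ROLLBACK"); // neither change is applied
    throw err;
  }
}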

A C I D

The first, and probably the biggest (and oldest), acronym is ACID, which refers to the properties of database transactions:

  • A – Atomicity
  • C – Consistency
  • I – Isolation
  • D – Durability

I would rather shrink it to a more realistic “AI” (Atomicity and Isolation), but let’s save Durability for the next post and focus on the C (Consistency) here.

So, according to Wikipedia:

Consistency in ACID transactions ensures that a transaction can only bring the database from one consistent state to another, preserving database invariants

The problem with that statement is that the idea of consistency mostly belongs to the application’s business logic. It is the application code’s responsibility to construct database transactions that preserve the invariants. The database itself is largely unable to guarantee consistency (yes, there are some very simple database constraint checks, like uniqueness), so if you write bad data (e.g. create a cinema booking but do not reduce the number of available seats), the database will not hold you back! It is up to the application to decide what data is valid and what is not. So, strictly speaking, the “C” does not really belong in the “ACID” acronym.
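
As a quick illustration of that split in responsibility (same hypothetical schema as above): the database can enforce simple, declarative rules on its own, but the cross-table rule “every booking decrements the seat counter” only holds if the application performs both writes together.

import { Client } from "pg";

// The database CAN enforce simple, declarative rules on its own:
async function createSchema(client: Client): Promise<void> {
  await client.query(`
    CREATE TABLE screenings (
      id INT PRIMARY KEY,
      available_seats INT NOT NULL CHECK (available_seats >= 0) -- the DB rejects violations of this
    )`);
  // But the database CANNOT know that every row inserted into "bookings" must be paired
  // with a decrement of "available_seats". Only the application code can preserve that
  // invariant, e.g. by doing both writes in one transaction, as in the bookSeat() sketch above.
}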

C A P

Another example of a misleading acronym is CAP, which refers to a theorem about guarantees in distributed data stores:

  • C – Consistency
  • A – Availability
  • P – Partition tolerance

Yet another example of too many letters in an acronym: a network partition is part of the problem statement, something that will happen whether you like it or not. So the theorem is better read as “choose either Consistency or Availability in case of a Partition”.

So, according to Wikipedia:

Consistency in CAP theorem means that every read receives the most recent write or an error.

That description directly references the concept of Linearizability.

The formal definition of linearizability is quite subtle, but the basic idea is that as soon as one client successfully completes a write operation, all clients reading from the database should be able to see the value just written, as if it were a single register (even if in reality there are multiple replicas/nodes/partitions).

Example of a non-linearizable system

So after John booked the last seat for the “Men in Black” movie, Jane and Mike both tried to check availability, but they saw different results, because of asynchronous replication and network delay while the change was being propagated from the master database.

Example of a linearizable system behavior

In the example above, there is a point in time at which Mike’s delete becomes visible to John’s read operation; after that point, all readers should see the same version as John.
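
To make the “single register” intuition a bit more concrete, here is a toy sketch (not a real linearizability checker; the event model is invented purely for illustration): once John’s booking of the last seat has completed, any read that starts after that moment must report zero available seats, no matter which replica serves it.

// Toy model: one completed write and a set of later reads of the same value.
interface ReadResult {
  startedAt: number;       // when the read operation started
  availableSeats: number;  // the value the reader observed
}

// A necessary condition for linearizability in this scenario: no read that starts
// after the write has completed may still observe the old value.
function violatesLinearizability(writeCompletedAt: number, reads: ReadResult[]): boolean {
  return reads.some(r => r.startedAt > writeCompletedAt && r.availableSeats !== 0);
}

// Jane reads after John's booking completed at t=100 but still sees 1 seat -> violation.
console.log(violatesLinearizability(100, [{ startedAt: 120, availableSeats: 1 }])); // true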

Other consistency examples

The word consistency is terribly overloaded across the broad IT domain; apart from the examples above, you can also find it used in many other areas.

Final thoughts

This article was an attempt to highlight the problem of people using just the word “consistency” without clearly stating what exactly they mean, which actually introduces inconsistency into the discussion 🙂

If you are interested in learning more about the topics described above, and other faces of consistency, I highly recommend the book “Designing Data-Intensive Applications” by Martin Kleppmann.

How to prevent deletion of your AWS RDS backups

Do you have a backup of your database? Is it stored in a safe place? Do you have a plan B in case someone deletes your database and backups, accidentally or intentionally?

This is the list of questions I faced after a recent security breach. Let’s imagine a situation where we have a database in the cloud and someone accidentally removes that database together with all its backups/snapshots. Sounds like an impossible disaster, right? Well, even if it is pretty hard to do accidentally, that doesn’t mean it’s impossible. And what if someone “from outside” gets access to your cloud production environment and intentionally removes your database? That sounds far more realistic. As we all know, it is rather a question of “when?” than “what if?”.

Regardless of the scenario, it is always a good idea to have a plan B. So what can we do about it? Probably the most obvious option is to store the database snapshots in a different place from where your database lives. Yes, I think it is a good option. Nevertheless, it is always worth remembering that every additional environment, framework or technology requires additional engineering time to support.

Another way is to restrict anyone (regardless of permissions) from removing database snapshots. So even if someone intentionally wants to remove the database, there is always the possibility of a (relatively) quick restore. Fortunately, Amazon S3 provides the ability to set an object lock on an S3 bucket. There are two retention modes for Object Lock in the AWS S3 service (AWS doc source):

  • Governance mode
  • Compliance mode

Governance mode allows objects to be deleted if you have special permissions, while Compliance mode does not allow an object to be removed at all (not even by the root user) for a particular period of time.
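
For reference, applying a default Compliance-mode retention to a bucket could look roughly like this with the AWS SDK for JavaScript v2 (the same aws-sdk used in the Lambda later in this post); the bucket name and retention period are placeholders, and the bucket must have been created with Object Lock enabled:

import * as AWS from "aws-sdk";

const s3 = new AWS.S3({ region: "eu-west-1" });

// Apply a default retention rule: new objects cannot be deleted (even by root) for 31 days.
async function lockSnapshotBucket(): Promise<void> {
  await s3.putObjectLockConfiguration({
    Bucket: "my-database-snapshot-backups", // placeholder bucket name
    ObjectLockConfiguration: {
      ObjectLockEnabled: "Enabled",
      Rule: {
        DefaultRetention: {
          Mode: "COMPLIANCE", // or "GOVERNANCE" to allow deletion with special permissions
          Days: 31
        }
      }
    }
  }).promise();
}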

In addition to that, AWS recently introduced the ability to export RDS database snapshots to an S3 bucket (AWS doc source). From the AWS Console we can easily do that by clicking the Export to Amazon S3 button:

AWS Console – Export to S3 bucket option

So we can combine exporting snapshots with non-deletable files to protect database snapshots from deletion.

The last thing: even though this is pretty simple to do manually through the AWS Console, it is always better to have such an important process automated, so our database is safely stored even while we are chilling on the beach during our long-awaited summer holidays 🙂. To do that, we can subscribe to RDS snapshot creation events and, via a Lambda execution, initiate the export of each newly created snapshot to the non-deletable S3 bucket.

The architecture will look like this:

Below you can find the Serverless Framework file for creating that infrastructure:

service:
 name: rds-s3-exporter

plugins:
 - serverless-pseudo-parameters
 - serverless-plugin-lambda-dead-letter
 - serverless-prune-plugin

provider:
 name: aws
 runtime: nodejs14.x
 timeout: 30
 stage: dev${env:USER, env:USERNAME}
 region: eu-west-1
 deploymentBucket:
   name: ${opt:stage, self:custom.default-vars-stage}-deployments
 iamRoleStatements:
   - Effect: Allow
     Action:
       - "KMS:GenerateDataKey*"
       - "KMS:ReEncrypt*"
       - "KMS:DescribeKey"
       - "KMS:Encrypt"
       - "KMS:CreateGrant"
       - "KMS:ListGrants"
       - "KMS:RevokeGrant"
     Resource: "*"
   - Effect: Allow
     Action:
       - "IAM:Passrole"
       - "IAM:GetRole"
     Resource:
       - { Fn::GetAtt: [ snapshotExportTaskRole, Arn ] }
   - Effect: Allow
     Action:
       - ssm:GetParameters
     Resource: "*"
   - Effect: Allow
     Action:
       - sqs:SendMessage
       - sqs:ReceiveMessage
       - sqs:DeleteMessage
       - sqs:GetQueueUrl
     Resource:
       - { Fn::GetAtt: [ rdsS3ExporterQueue, Arn ] }
       - { Fn::GetAtt: [ rdsS3ExporterFailedQ, Arn ] }
   - Effect: Allow
     Action:
       - lambda:InvokeFunction
     Resource:
       - "arn:aws:lambda:#{AWS::Region}:#{AWS::AccountId}:function:${opt:stage}-rds-s3-exporter"
   - Effect: Allow
     Action:
       - rds:DescribeDBClusterSnapshots
       - rds:DescribeDBClusters
       - rds:DescribeDBInstances
       - rds:DescribeDBSnapshots
       - rds:DescribeExportTasks
       - rds:StartExportTask
     Resource: "*"

 environment:
   CONFIG_STAGE: ${self:custom.vars.configStage}
   REDEPLOY: "true"
custom:
 stage: ${opt:stage, self:provider.stage}
 region: ${opt:region, self:provider.region}
 default-vars-stage: ppe
 vars: ${file(./vars.yml):${opt:stage, self:custom.default-vars-stage}}
 version: ${env:BUILD_VERSION, file(package.json):version}
 rdsS3ExporterQ: ${self:custom.stage}-rds-s3-exporter
 rdsS3ExporterFailedQ: ${self:custom.stage}-rds-s3-exporterFailedQ
 databaseSnapshotCreatedTopic: ${self:custom.stage}-database-snapshotCreated
 rdsS3ExporterBucket: "${self:custom.stage}-database-snapshot-backups"

functions:
 backup:
   handler: dist/functions/backup.main
   reservedConcurrency: ${self:custom.vars.lambdaReservedConcurrency.backup}
   timeout: 55
   events:
     - sqs:
         arn: "arn:aws:sqs:#{AWS::Region}:#{AWS::AccountId}:${self:custom.rdsS3ExporterQ}"
         batchSize: 1
   environment:
     CONFIG_STAGE: ${self:custom.vars.configStage}
     DATABASE_BACKUPS_BUCKET: ${self:custom.rdsS3ExporterBucket}
     IAM_ROLE: "arn:aws:iam::#{AWS::AccountId}:role/${opt:stage}-rds-s3-exporter-role"
     KMS_KEY_ID: alias/lambda
     REGION: "eu-west-1"

resources:
 Description: Lambda to handle upload database backups to S3 bucket
 Resources:
   rdsS3ExporterQueue:
     Type: AWS::SQS::Queue
     Properties:
       QueueName: "${self:custom.rdsS3ExporterQ}"
       MessageRetentionPeriod: 1209600 # 14 days
       RedrivePolicy:
         deadLetterTargetArn:
           Fn::GetAtt: [ rdsS3ExporterFailedQ, Arn ]
         maxReceiveCount: 5
       VisibilityTimeout: 60
   rdsS3ExporterFailedQ:
     Type: "AWS::SQS::Queue"
     Properties:
       QueueName: "${self:custom.rdsS3ExporterFailedQ}"
       MessageRetentionPeriod: 1209600 # 14 days
   databaseSnapshotCreatedTopic:
     Type: AWS::SNS::Topic
     Properties:
       TopicName: ${self:custom.databaseSnapshotCreatedTopic}
   snapshotCreatedTopicQueueSubscription:
     Type: "AWS::SNS::Subscription"
     Properties:
       TopicArn: arn:aws:sns:#{AWS::Region}:#{AWS::AccountId}:${self:custom.databaseSnapshotCreatedTopic}
       Endpoint:
         Fn::GetAtt: [ rdsS3ExporterQueue, Arn ]
       Protocol: sqs
       RawMessageDelivery: true
     DependsOn:
       - rdsS3ExporterQueue
       - databaseSnapshotCreatedTopic

   snapshotCreatedRdsTopicSubscription:
     Type: "AWS::RDS::EventSubscription"
     Properties:
       Enabled: true
       EventCategories : [ "creation"]
       SnsTopicArn : arn:aws:sns:#{AWS::Region}:#{AWS::AccountId}:${self:custom.databaseSnapshotCreatedTopic}
       SourceType : "db-snapshot"
     DependsOn:
       - databaseSnapshotCreatedTopic

   rdsS3ExporterQueuePolicy:
     Type: AWS::SQS::QueuePolicy
     Properties:
       Queues:
         - Ref: rdsS3ExporterQueue
       PolicyDocument:
         Version: "2012-10-17"
         Statement:
           - Effect: Allow
             Principal: "*"
             Action: [ "sqs:SendMessage" ]
             Resource:
               Fn::GetAtt: [ rdsS3ExporterQueue, Arn ]
             Condition:
               ArnEquals:
                 aws:SourceArn: arn:aws:sns:#{AWS::Region}:#{AWS::AccountId}:${self:custom.databaseSnapshotCreatedTopic}
  
   rdsS3ExporterBucket:
     Type: AWS::S3::Bucket
     DeletionPolicy: Retain
     Properties:
       BucketName: ${self:custom.rdsS3ExporterBucket}
       AccessControl: Private
       VersioningConfiguration:
         Status: Enabled
       ObjectLockEnabled: true
       ObjectLockConfiguration:
         ObjectLockEnabled: Enabled
         Rule:
           DefaultRetention:
             Mode: COMPLIANCE
             Days: "${self:custom.vars.objectLockRetentionPeriod}"
       LifecycleConfiguration:
         Rules:
           - Id: DeleteObjectAfter31Days
             Status: Enabled
             ExpirationInDays: ${self:custom.vars.expireInDays}
       PublicAccessBlockConfiguration:
         BlockPublicAcls: true
         BlockPublicPolicy: true
         IgnorePublicAcls: true
         RestrictPublicBuckets: true
       BucketEncryption:
         ServerSideEncryptionConfiguration:
           - ServerSideEncryptionByDefault:
               SSEAlgorithm: AES256

   snapshotExportTaskRole:
     Type: AWS::IAM::Role
     Properties:
       RoleName: ${opt:stage}-rds-s3-exporter-role
       Path: /
       AssumeRolePolicyDocument:
         Version: '2012-10-17'
         Statement:
           - Effect: Allow
             Principal:
               Service:
                 - rds-export.aws.internal
                 - export.rds.amazonaws.com
             Action:
               - "sts:AssumeRole"
       Policies:
           - PolicyName: ${opt:stage}-rds-s3-exporter-policy
             PolicyDocument:
               Version: '2012-10-17'
               Statement:
                 - Effect: Allow
                   Action:
                     - "s3:PutObject*"
                     - "s3:ListBucket"
                     - "s3:GetObject*"
                     - "s3:DeleteObject*"
                     - "s3:GetBucketLocation"
                   Resource:
                     - "arn:aws:s3:::${self:custom.rdsS3ExporterBucket}"
                     - "arn:aws:s3:::${self:custom.rdsS3ExporterBucket}/*"

package:
 include:
   - dist/**
   - package.json
 exclude:
   - "*"
   - .?*/**
   - src/**
   - test/**
   - docs/**
   - infrastructure/**
   - postman/**
   - offline/**
   - node_modules/.bin/**

Now it’s time to add lambda handler code:

import { Handler } from "aws-lambda";
import { v4 as uuidv4 } from "uuid";
import middy = require("middy");
import * as AWS from "aws-sdk";

export const processEvent: Handler<any, void> = async (request: any) => {
    console.info("rds-s3-exporter.started");

    // With batchSize: 1 the SQS event contains a single record whose body is the raw RDS event message.
    const record = request.Records[0];
    const event = JSON.parse(record.body);

    // Start an export task that copies the newly created snapshot into the locked S3 bucket.
    const rds = new AWS.RDS({ region: process.env.REGION });
    await rds.startExportTask({
        ExportTaskIdentifier: `database-backup-${uuidv4()}`,
        SourceArn: event["Source ARN"],
        S3BucketName: process.env.DATABASE_BACKUPS_BUCKET || "",
        IamRoleArn: process.env.IAM_ROLE || "",
        KmsKeyId: process.env.KMS_KEY_ID || ""
    }).promise().then(data => {
        console.info("rds-s3-exporter.status", data);
        console.info("rds-s3-exporter.success");
    });
};

export const main = middy(processEvent)
    .use({
        // Log the error and rethrow it so the message is retried and eventually lands in the DLQ.
        onError: (context, callback) => {
            console.error("rds-s3-exporter.error", context.error);
            callback(context.error);
        }
    });

As there is no built-in way to flexibly filter which databases we want to export, it is always possible to add some custom filtering in the Lambda itself, as sketched below. You can find an example of such logic, as well as the whole codebase, in the GitHub repository.
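
As a rough sketch of what such a filter could look like (the DATABASES_TO_EXPORT environment variable is hypothetical, and it assumes the RDS event message carries the snapshot identifier in its “Source ID” field):

// Returns true if the snapshot should be exported, based on a comma-separated list of
// database identifiers in a (hypothetical) DATABASES_TO_EXPORT environment variable.
function shouldExport(rdsEvent: Record<string, string>): boolean {
  const allowedPrefixes = (process.env.DATABASES_TO_EXPORT || "")
    .split(",")
    .filter(Boolean);
  const sourceId = rdsEvent["Source ID"] || ""; // snapshot identifier from the RDS event message

  // If no filter is configured, export everything; otherwise require a matching identifier.
  return allowedPrefixes.length === 0 || allowedPrefixes.some(prefix => sourceId.includes(prefix));
}

Calling if (!shouldExport(event)) return; at the top of processEvent would then skip unwanted snapshots before startExportTask is invoked.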

Right before publishing this post I found that AWS has actually recently (Oct 2021) implemented AWS Backup Vault Lock, which does the same thing out of the box. You can read more about it on the AWS documentation website. Nevertheless, at the time of publishing this post, AWS Backup Vault Lock had not yet been assessed by third-party organisations for compliance with SEC 17a-4(f) and CFTC regulations.

ASP.NET MVC (Not Core) in Windows Docker Container

Recently I faced the task of dockerizing legacy applications written in ASP.NET MVC (on the full .NET Framework, not Core). Developers use IIS for the local development environment, but when a new teammate has to set up the local environment, it takes ~8 hours of mumbo-jumbo to make everything work. So we decided to move the legacy application to Docker, at least for local development purposes at the beginning (aka “a journey of a thousand miles begins with a single step”).

Before I start, it is worth mentioning that I struggled to find much information about containerizing ASP.NET Framework apps in Windows containers. Almost all of the threads I found were related to Linux containers and .NET Core. So I decided to share my own experience with such a task.

So the requirements after the dockerization were:

  • The application should still be hosted in IIS, due to internal app configuration
  • Changes in code files should be visible in the application without additional actions (except a local build, if needed)
  • Code should be debuggable
  • Deployed apps in containers should have custom hostnames
  • “One command run” process (instead of 8 hours of configuration)
  • Some apps use the legacy AngularJS framework, with Bower, etc., so Node.js should be available inside the containers
  • The Application should work 🙂

As a base, I’m going to use the mcr.microsoft.com/windows/servercore/iis image; it is lighter than the mcr.microsoft.com/dotnet/framework/aspnet image, and the smaller, the better.

The code below downloads the Node.js distribution archive, saves it in the image, extracts it to a folder and adds this folder to the PATH variable, which makes the node and npm commands available on the command line. The last step is the cleanup of the downloaded zip archive.

ADD https://nodejs.org/dist/v12.4.0/node-v12.4.0-win-x64.zip /nodejs.zip
RUN powershell -command Expand-Archive nodejs.zip -DestinationPath C:\; 
RUN powershell Rename-Item "C:\\node-v12.4.0-win-x64" c:\nodejs
RUN SETX PATH C:\nodejs
RUN del nodejs.zip

The next part of the code does a similar thing, but here the Remote Tools for Visual Studio are downloaded and installed; we will need them later for debugging.

ADD https://aka.ms/vs/16/release/RemoteTools.amd64ret.enu.exe /VS_RemoteTools.exe
RUN VS_RemoteTools.exe /install /quiet /norestart
RUN del VS_RemoteTools.exe

Now, as IIS is used as the hosting server, we need to remove the default content from the inetpub\wwwroot folder. Later we will use this folder for our own code.

RUN powershell -command Remove-Item -Recurse C:\inetpub\wwwroot\*

To be able to run an ASP.NET application in IIS, we have to install the required Windows features:

RUN powershell -command Install-WindowsFeature NET-Framework-45-ASPNET
RUN powershell -command Install-WindowsFeature Web-Asp-Net45

For IIS to use the content of the inetpub\wwwroot folder, it is necessary to grant permission to access the files. As we are using Docker for development purposes and it is isolated, it is OK to grant Everyone access to the files with this command:

RUN icacls "c:/inetpub/wwwroot" /grant "Everyone:(OI)(CI)M"

Now we tell Docker which directory to use as the working directory:

WORKDIR /inetpub/wwwroot

Last, but not least, we have to run the msvsmon.exe tool to allow remote debugging from Visual Studio. It is important to run it with the lowest possible access restrictions, just to avoid firewall exceptions, auth issues, etc. (but remember that this is not acceptable for any kind of publicly accessible deployment).

ENTRYPOINT ["C:\\Program Files\\Microsoft Visual Studio 16.0\\Common7\\IDE\\Remote Debugger\\x64\\msvsmon.exe", "/noauth", "/anyuser", "/silent", "/nostatus", "/noclrwarn", "/nosecuritywarn", "/nofirewallwarn", "/nowowwarn"]

The whole Dockerfile:

FROM mcr.microsoft.com/windows/servercore/iis

ADD https://nodejs.org/dist/v12.4.0/node-v12.4.0-win-x64.zip /nodejs.zip
RUN powershell -command Expand-Archive nodejs.zip -DestinationPath C:\; 
RUN powershell Rename-Item "C:\\node-v12.4.0-win-x64" c:\nodejs
RUN SETX PATH C:\nodejs
RUN del nodejs.zip

ADD https://aka.ms/vs/16/release/RemoteTools.amd64ret.enu.exe /VS_RemoteTools.exe
RUN VS_RemoteTools.exe /install /quiet /norestart
RUN del VS_RemoteTools.exe

RUN powershell -command Remove-Item -Recurse C:\inetpub\wwwroot\*
RUN powershell -command Install-WindowsFeature NET-Framework-45-ASPNET
RUN powershell -command Install-WindowsFeature Web-Asp-Net45

WORKDIR /inetpub/wwwroot

RUN icacls "c:/inetpub/wwwroot" /grant "Everyone:(OI)(CI)M"

ENTRYPOINT ["C:\\Program Files\\Microsoft Visual Studio 16.0\\Common7\\IDE\\Remote Debugger\\x64\\msvsmon.exe", "/noauth", "/anyuser", "/silent", "/nostatus", "/noclrwarn", "/nosecuritywarn", "/nofirewallwarn", "/nowowwarn"]

Obviously, we could run this with the docker run command, but let’s imagine we also need to create a database container for our application, so to fulfill the “one command run” requirement I’ve created a docker-compose file:

version: "3.8"

services:
  app:
    build: .
    image: samp.compose:latest
    ports:
      - "83:80"
      - "84:443"
    networks:
      app-network:
        ipv4_address: 10.5.0.5
    depends_on:
      - db
    hostname: samp
    
  db:
    image: microsoft/mssql-server-windows-express
    networks:
      - app-network
    ports:
      - "1433:1433"
    environment:
      - sa_password=test123
      - ACCEPT_EULA=Y
    hostname: dockerdb
    
networks:
  app-network:
    driver: nat
    ipam:
      config:
        - subnet: 10.5.0.0/16
          gateway: 10.5.0.1

As you can see, the docker-compose file contains two services and one network, almost nothing special. But two things are worth highlighting:

  • Creating a network with an explicitly specified gateway and subnet. This allows setting a predefined IP address for our containers (we will later use it for mapping to the host). If you don’t do that, then every time a container is created, a new IP address is automatically assigned from the pool of available IPs.
  • Using hostname in the db service. This allows us to use the hostname in connection strings outside of the container and the app-network. For example, it is now possible to connect to this SQL Server from your Windows machine with SSMS, using the hostname as the server name (a code-level connection sketch is shown below).
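
The same hostname also works when connecting from code. Here is a rough sketch with the mssql npm package (the database name is a placeholder; the credentials come from the compose file above):

import * as sql from "mssql";

// Connect to the dockerized SQL Server by hostname, just as SSMS would.
async function checkDbConnection(): Promise<void> {
  const pool = await sql.connect({
    server: "dockerdb",   // hostname defined in docker-compose.yml
    user: "sa",
    password: "test123",
    database: "master",   // placeholder database
    options: { trustServerCertificate: true }
  });

  const result = await pool.request().query("SELECT @@VERSION AS version");
  console.log(result.recordset[0].version);
  await pool.close();
}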

So now we can run our application with a single command: docker-compose up

Now just open the browser, enter the IP of the app service from the docker-compose file (in my case 10.5.0.5), and you should see your application up and running.

Now you can change your code and rebuild the project (if needed) on your local machine to see the changes on the website.

Let’s check whether our container has Node.js installed (which could potentially be needed for some frontend build actions).

To do that, run the docker ps command and copy the container ID of the app container:

Now run docker exec -it {YOUR CONTAINER ID} cmd. This command connects to the container and starts cmd inside it. Now just run node --version to check whether Node.js is installed.

The Node.js version is displayed, so now we can use it (together with npm) for further development.

The next thing is debugging. As you remember, we’ve added the execution of msvsmon.exe as the entry point of our container, which allows us to use Visual Studio remote debugging. For that, just press Ctrl + Alt + P, select Remote (no authentication) and click the Find button:

Next, select the app container from the list:

The last thing is to check Show processes for all users and select the IIS process w3wp.exe. (You may need to open your website in the browser first to start the lazily initialized IIS worker process.)

After VS attaches to the process, you get the regular debugging experience.

The last requirement from the list is that applications in containers should run under a custom hostname. To do that, we have to open the C:\Windows\System32\drivers\etc\hosts file (with admin rights), append a line to the end and save the file.

10.5.0.5       samp.local.com

Where 10.5.0.5 is the IP from the docker-compose file, and samp.local.com is your custom hostname.

After that, just open the browser (you may have to restart the browser and run the ipconfig /flushdns command) and go to http://samp.local.com

A link to the full codebase is in the GitHub repository. There you can also find an additional Dockerfile for a multistage build.

I hope this article is helpful for people who are trying to migrate legacy Windows applications to containers, step by step.