{
  "cells": [
    {
      "cell_type": "markdown",
      "source": [
        "## **Assignment: Alternatives to Pandas for Processing Large Datasets (Datatable)**\n",
        "\n",
        "**Group 7**\n",
        "\n",
        "Group Members: \n",
        "\n",
        "\n",
        "1.   Madina Suraya binti Zharin       A20EC0203\n",
        "2.   Nur Izzah Mardhiah binti Rashidi A20EC0116\n",
        "3. Tan Yong Sheng                     A20EC0157\n",
        "4. Chloe Racquelmae Kennedy           A20EC0026"
      ],
      "metadata": {
        "id": "H6uT3iqxJws1"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "from google.colab import drive\n",
        "drive.mount('/content/drive')"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "pk23EnIwODd8",
        "outputId": "9f7707bd-1957-4547-f972-4fd0e8529b45"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Mounted at /content/drive\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "dataset = '/content/drive/MyDrive/Colab Notebooks/Big Data - Asg2/Rate.csv'"
      ],
      "metadata": {
        "id": "05VR3diiOGV-"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Installing Data Table\n",
        "\n",
        "To use the package, we need to install it first"
      ],
      "metadata": {
        "id": "nJu58ywELAfP"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "!pip install datatable"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "61jqgal7JlZb",
        "outputId": "daa42e25-5c9d-4630-ecf8-9a8946a3a468"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
            "Collecting datatable\n",
            "  Downloading datatable-1.0.0-cp38-cp38-manylinux_2_12_x86_64.whl (96.6 MB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m96.6/96.6 MB\u001b[0m \u001b[31m11.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hInstalling collected packages: datatable\n",
            "Successfully installed datatable-1.0.0\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "After installing, we need to import the package. Then, it is ready to be used"
      ],
      "metadata": {
        "id": "b0T1QS3eLWNO"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "import time\n",
        "import datatable as dt\n",
        "print(dt.__version__)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 34
        },
        "id": "oHvuduA0Jmlb",
        "outputId": "e2c38200-11fa-4421-ba89-84aa580b90ba"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/plain": [
              "<IPython.core.display.HTML object>"
            ],
            "text/html": [
              "<style type='text/css'>\n",
              ".datatable table.frame { margin-bottom: 0; }\n",
              ".datatable table.frame thead { border-bottom: none; }\n",
              ".datatable table.frame tr.coltypes td {  color: #FFFFFF;  line-height: 6px;  padding: 0 0.5em;}\n",
              ".datatable .bool    { background: #DDDD99; }\n",
              ".datatable .object  { background: #565656; }\n",
              ".datatable .int     { background: #5D9E5D; }\n",
              ".datatable .float   { background: #4040CC; }\n",
              ".datatable .str     { background: #CC4040; }\n",
              ".datatable .time    { background: #40CC40; }\n",
              ".datatable .row_index {  background: var(--jp-border-color3);  border-right: 1px solid var(--jp-border-color0);  color: var(--jp-ui-font-color3);  font-size: 9px;}\n",
              ".datatable .frame tbody td { text-align: left; }\n",
              ".datatable .frame tr.coltypes .row_index {  background: var(--jp-border-color0);}\n",
              ".datatable th:nth-child(2) { padding-left: 12px; }\n",
              ".datatable .hellipsis {  color: var(--jp-cell-editor-border-color);}\n",
              ".datatable .vellipsis {  background: var(--jp-layout-color0);  color: var(--jp-cell-editor-border-color);}\n",
              ".datatable .na {  color: var(--jp-cell-editor-border-color);  font-size: 80%;}\n",
              ".datatable .sp {  opacity: 0.25;}\n",
              ".datatable .footer { font-size: 9px; }\n",
              ".datatable .frame_dimensions {  background: var(--jp-border-color3);  border-top: 1px solid var(--jp-border-color0);  color: var(--jp-ui-font-color3);  display: inline-block;  opacity: 0.6;  padding: 1px 10px 1px 5px;}\n",
              "</style>\n"
            ]
          },
          "metadata": {}
        },
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "1.0.0\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Reading Data"
      ],
      "metadata": {
        "id": "vmPhAKcIBR8R"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "`fread` is the function to read the files in datatable module. From the documentation,fread provides fast and convenient parsing of text (csv) files."
      ],
      "metadata": {
        "id": "ypPC1sk5Lv2T"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "start = time.time()\n",
        "input_file_name = dataset\n",
        "dt_df = dt.fread(input_file_name)\n",
        "end = time.time()\n",
        "print(end - start)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "FLioexD5KQSl",
        "outputId": "ae9d16a7-5290-4bf1-d3f8-7e471ab948c2"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "13.901981592178345\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Dataset Properties"
      ],
      "metadata": {
        "id": "2FCFNGgPBYNy"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "dt_df.head()"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 486
        },
        "id": "9OJDOgAyBhQb",
        "outputId": "53dbc6de-7995-4905-ecae-ee046415ff47"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "<Frame#7f0b081133f0 10x24>"
            ],
            "text/html": [
              "<div class='datatable'>\n",
              "  <table class='frame'>\n",
              "  <thead>\n",
              "    <tr class='colnames'><td class='row_index'></td><th>BusinessYear</th><th>StateCode</th><th>IssuerId</th><th>SourceName</th><th>VersionNum</th><th>ImportDate</th><th>IssuerId2</th><th>FederalTIN</th><th>RateEffectiveDate</th><th>RateExpirationDate</th><th class='vellipsis'>&hellip;</th><th>PrimarySubscriberAndThreeOrMoreDependents</th><th>CoupleAndOneDependent</th><th>CoupleAndTwoDependents</th><th>CoupleAndThreeOrMoreDependents</th><th>RowNumber</th></tr>\n",
              "    <tr class='coltypes'><td class='row_index'></td><td class='int' title='int32'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='str' title='str32'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='int' title='int32'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='str' title='str32'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='int' title='int32'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='time' title='time64'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='int' title='int32'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='str' title='str32'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='time' title='date32'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='time' title='date32'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td></td><td class='float' title='float64'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='float' title='float64'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='float' title='float64'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='float' title='float64'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='int' title='int32'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td></tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr><td class='row_index'>0</td><td>2014</td><td>AK</td><td>21989</td><td>HIOS</td><td>6</td><td>2014-03-19<span class=sp>T</span>07:06:49</td><td>21989</td><td>93-0438772</td><td>2014-01-01</td><td>2014-12-31</td><td class=vellipsis>&hellip;</td><td><span class=na>NA</span></td><td><span class=na>NA</span></td><td><span class=na>NA</span></td><td><span class=na>NA</span></td><td>14</td></tr>\n",
              "    <tr><td class='row_index'>1</td><td>2014</td><td>AK</td><td>21989</td><td>HIOS</td><td>6</td><td>2014-03-19<span class=sp>T</span>07:06:49</td><td>21989</td><td>93-0438772</td><td>2014-01-01</td><td>2014-12-31</td><td class=vellipsis>&hellip;</td><td>107.61</td><td>144.56</td><td>144.56</td><td>144.56</td><td>14</td></tr>\n",
              "    <tr><td class='row_index'>2</td><td>2014</td><td>AK</td><td>21989</td><td>HIOS</td><td>6</td><td>2014-03-19<span class=sp>T</span>07:06:49</td><td>21989</td><td>93-0438772</td><td>2014-01-01</td><td>2014-12-31</td><td class=vellipsis>&hellip;</td><td>107.61</td><td>144.56</td><td>144.56</td><td>144.56</td><td>15</td></tr>\n",
              "    <tr><td class='row_index'>3</td><td>2014</td><td>AK</td><td>21989</td><td>HIOS</td><td>6</td><td>2014-03-19<span class=sp>T</span>07:06:49</td><td>21989</td><td>93-0438772</td><td>2014-01-01</td><td>2014-12-31</td><td class=vellipsis>&hellip;</td><td><span class=na>NA</span></td><td><span class=na>NA</span></td><td><span class=na>NA</span></td><td><span class=na>NA</span></td><td>15</td></tr>\n",
              "    <tr><td class='row_index'>4</td><td>2014</td><td>AK</td><td>21989</td><td>HIOS</td><td>6</td><td>2014-03-19<span class=sp>T</span>07:06:49</td><td>21989</td><td>93-0438772</td><td>2014-01-01</td><td>2014-12-31</td><td class=vellipsis>&hellip;</td><td><span class=na>NA</span></td><td><span class=na>NA</span></td><td><span class=na>NA</span></td><td><span class=na>NA</span></td><td>16</td></tr>\n",
              "    <tr><td class='row_index'>5</td><td>2014</td><td>AK</td><td>21989</td><td>HIOS</td><td>6</td><td>2014-03-19<span class=sp>T</span>07:06:49</td><td>21989</td><td>93-0438772</td><td>2014-01-01</td><td>2014-12-31</td><td class=vellipsis>&hellip;</td><td>107.61</td><td>144.56</td><td>144.56</td><td>144.56</td><td>16</td></tr>\n",
              "    <tr><td class='row_index'>6</td><td>2014</td><td>AK</td><td>21989</td><td>HIOS</td><td>6</td><td>2014-03-19<span class=sp>T</span>07:06:49</td><td>21989</td><td>93-0438772</td><td>2014-01-01</td><td>2014-12-31</td><td class=vellipsis>&hellip;</td><td>94.5</td><td>126.95</td><td>126.95</td><td>126.95</td><td>17</td></tr>\n",
              "    <tr><td class='row_index'>7</td><td>2014</td><td>AK</td><td>21989</td><td>HIOS</td><td>6</td><td>2014-03-19<span class=sp>T</span>07:06:49</td><td>21989</td><td>93-0438772</td><td>2014-01-01</td><td>2014-12-31</td><td class=vellipsis>&hellip;</td><td><span class=na>NA</span></td><td><span class=na>NA</span></td><td><span class=na>NA</span></td><td><span class=na>NA</span></td><td>17</td></tr>\n",
              "    <tr><td class='row_index'>8</td><td>2014</td><td>AK</td><td>21989</td><td>HIOS</td><td>6</td><td>2014-03-19<span class=sp>T</span>07:06:49</td><td>21989</td><td>93-0438772</td><td>2014-01-01</td><td>2014-12-31</td><td class=vellipsis>&hellip;</td><td><span class=na>NA</span></td><td><span class=na>NA</span></td><td><span class=na>NA</span></td><td><span class=na>NA</span></td><td>18</td></tr>\n",
              "    <tr><td class='row_index'>9</td><td>2014</td><td>AK</td><td>21989</td><td>HIOS</td><td>6</td><td>2014-03-19<span class=sp>T</span>07:06:49</td><td>21989</td><td>93-0438772</td><td>2014-01-01</td><td>2014-12-31</td><td class=vellipsis>&hellip;</td><td>94.5</td><td>126.95</td><td>126.95</td><td>126.95</td><td>18</td></tr>\n",
              "  </tbody>\n",
              "  </table>\n",
              "  <div class='footer'>\n",
              "    <div class='frame_dimensions'>10 rows &times; 24 columns</div>\n",
              "  </div>\n",
              "</div>\n"
            ]
          },
          "metadata": {},
          "execution_count": 12
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "It can be observed that, the method fread return datatable Frame object."
      ],
      "metadata": {
        "id": "0Kge46CDL5fP"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "type(dt_df)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "LgVO79CZD_51",
        "outputId": "f43d3019-daf5-48c5-d4b8-f3f64e1900c2"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "datatable.Frame"
            ]
          },
          "metadata": {},
          "execution_count": 5
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "dt_df.shape"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "YfNEIAuvF6yX",
        "outputId": "e8f19c5d-eb8d-46c2-e8b2-7bc18fa3f8f2"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "(12694445, 24)"
            ]
          },
          "metadata": {},
          "execution_count": 13
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "dt_df.names"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "iDFQ-y9jEcpw",
        "outputId": "b9e54655-7552-40f0-b179-a599e5fd304a"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "('BusinessYear',\n",
              " 'StateCode',\n",
              " 'IssuerId',\n",
              " 'SourceName',\n",
              " 'VersionNum',\n",
              " 'ImportDate',\n",
              " 'IssuerId2',\n",
              " 'FederalTIN',\n",
              " 'RateEffectiveDate',\n",
              " 'RateExpirationDate',\n",
              " 'PlanId',\n",
              " 'RatingAreaId',\n",
              " 'Tobacco',\n",
              " 'Age',\n",
              " 'IndividualRate',\n",
              " 'IndividualTobaccoRate',\n",
              " 'Couple',\n",
              " 'PrimarySubscriberAndOneDependent',\n",
              " 'PrimarySubscriberAndTwoDependents',\n",
              " 'PrimarySubscriberAndThreeOrMoreDependents',\n",
              " 'CoupleAndOneDependent',\n",
              " 'CoupleAndTwoDependents',\n",
              " 'CoupleAndThreeOrMoreDependents',\n",
              " 'RowNumber')"
            ]
          },
          "metadata": {},
          "execution_count": 6
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "dt_df.stypes"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "CJzId4xhEvs4",
        "outputId": "92fbbae9-d106-4983-d2b0-9bdb79151fc6"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "(stype.int32,\n",
              " stype.str32,\n",
              " stype.int32,\n",
              " stype.str32,\n",
              " stype.int32,\n",
              " stype.time64,\n",
              " stype.int32,\n",
              " stype.str32,\n",
              " stype.date32,\n",
              " stype.date32,\n",
              " stype.str32,\n",
              " stype.str32,\n",
              " stype.str32,\n",
              " stype.str32,\n",
              " stype.float64,\n",
              " stype.float64,\n",
              " stype.float64,\n",
              " stype.float64,\n",
              " stype.float64,\n",
              " stype.float64,\n",
              " stype.float64,\n",
              " stype.float64,\n",
              " stype.float64,\n",
              " stype.int32)"
            ]
          },
          "metadata": {},
          "execution_count": 9
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Looking at the number of unique values per columns"
      ],
      "metadata": {
        "id": "EY7Qts3PMGud"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "dt_df.nunique()"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 120
        },
        "id": "WGWbVeeFFuGo",
        "outputId": "564f77f8-9bd9-420e-ee39-a93254478442"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "<Frame#7f0b08225fc0 1x24>"
            ],
            "text/html": [
              "<div class='datatable'>\n",
              "  <table class='frame'>\n",
              "  <thead>\n",
              "    <tr class='colnames'><td class='row_index'></td><th>BusinessYear</th><th>StateCode</th><th>IssuerId</th><th>SourceName</th><th>VersionNum</th><th>ImportDate</th><th>IssuerId2</th><th>FederalTIN</th><th>RateEffectiveDate</th><th>RateExpirationDate</th><th class='vellipsis'>&hellip;</th><th>PrimarySubscriberAndThreeOrMoreDependents</th><th>CoupleAndOneDependent</th><th>CoupleAndTwoDependents</th><th>CoupleAndThreeOrMoreDependents</th><th>RowNumber</th></tr>\n",
              "    <tr class='coltypes'><td class='row_index'></td><td class='int' title='int64'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='int' title='int64'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='int' title='int64'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='int' title='int64'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='int' title='int64'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='int' title='int64'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='int' title='int64'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='int' title='int64'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='int' title='int64'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='int' title='int64'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td></td><td class='int' title='int64'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='int' title='int64'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='int' title='int64'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='int' title='int64'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='int' title='int64'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td></tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr><td class='row_index'>0</td><td>3</td><td>39</td><td>910</td><td>3</td><td>23</td><td>266</td><td>910</td><td>335</td><td>14</td><td>26</td><td class=vellipsis>&hellip;</td><td>7632</td><td>7268</td><td>7733</td><td>8197</td><td>60444</td></tr>\n",
              "  </tbody>\n",
              "  </table>\n",
              "  <div class='footer'>\n",
              "    <div class='frame_dimensions'>1 row &times; 24 columns</div>\n",
              "  </div>\n",
              "</div>\n"
            ]
          },
          "metadata": {},
          "execution_count": 14
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Data Slicing"
      ],
      "metadata": {
        "id": "zbY1KIh1JWIS"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "dt_df[0:10,0:4]"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 293
        },
        "id": "09axbVguJYb9",
        "outputId": "16f183b9-37c1-45e5-a91d-14ddb7d82f85"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "<Frame#7f0b081139c0 10x4>"
            ],
            "text/html": [
              "<div class='datatable'>\n",
              "  <table class='frame'>\n",
              "  <thead>\n",
              "    <tr class='colnames'><td class='row_index'></td><th>BusinessYear</th><th>StateCode</th><th>IssuerId</th><th>SourceName</th></tr>\n",
              "    <tr class='coltypes'><td class='row_index'></td><td class='int' title='int32'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='str' title='str32'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='int' title='int32'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='str' title='str32'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td></tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr><td class='row_index'>0</td><td>2014</td><td>AK</td><td>21989</td><td>HIOS</td></tr>\n",
              "    <tr><td class='row_index'>1</td><td>2014</td><td>AK</td><td>21989</td><td>HIOS</td></tr>\n",
              "    <tr><td class='row_index'>2</td><td>2014</td><td>AK</td><td>21989</td><td>HIOS</td></tr>\n",
              "    <tr><td class='row_index'>3</td><td>2014</td><td>AK</td><td>21989</td><td>HIOS</td></tr>\n",
              "    <tr><td class='row_index'>4</td><td>2014</td><td>AK</td><td>21989</td><td>HIOS</td></tr>\n",
              "    <tr><td class='row_index'>5</td><td>2014</td><td>AK</td><td>21989</td><td>HIOS</td></tr>\n",
              "    <tr><td class='row_index'>6</td><td>2014</td><td>AK</td><td>21989</td><td>HIOS</td></tr>\n",
              "    <tr><td class='row_index'>7</td><td>2014</td><td>AK</td><td>21989</td><td>HIOS</td></tr>\n",
              "    <tr><td class='row_index'>8</td><td>2014</td><td>AK</td><td>21989</td><td>HIOS</td></tr>\n",
              "    <tr><td class='row_index'>9</td><td>2014</td><td>AK</td><td>21989</td><td>HIOS</td></tr>\n",
              "  </tbody>\n",
              "  </table>\n",
              "  <div class='footer'>\n",
              "    <div class='frame_dimensions'>10 rows &times; 4 columns</div>\n",
              "  </div>\n",
              "</div>\n"
            ]
          },
          "metadata": {},
          "execution_count": 27
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Data Filtering"
      ],
      "metadata": {
        "id": "5Jygl2pYHDx-"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "from datatable import *"
      ],
      "metadata": {
        "id": "-VKjpVJgGFtG"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "Filter out the data where the Business Year data is 2015"
      ],
      "metadata": {
        "id": "lXSUvbhKMT5F"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "year = dt_df[f.BusinessYear == 2015,:]\n",
        "year.head(5)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 293
        },
        "id": "4jrPmcR4GX_Q",
        "outputId": "90f743ff-7fcc-465d-df04-2c0b1e84d0a4"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "<Frame#7f0b0820e900 5x24>"
            ],
            "text/html": [
              "<div class='datatable'>\n",
              "  <table class='frame'>\n",
              "  <thead>\n",
              "    <tr class='colnames'><td class='row_index'></td><th>BusinessYear</th><th>StateCode</th><th>IssuerId</th><th>SourceName</th><th>VersionNum</th><th>ImportDate</th><th>IssuerId2</th><th>FederalTIN</th><th>RateEffectiveDate</th><th>RateExpirationDate</th><th class='vellipsis'>&hellip;</th><th>PrimarySubscriberAndThreeOrMoreDependents</th><th>CoupleAndOneDependent</th><th>CoupleAndTwoDependents</th><th>CoupleAndThreeOrMoreDependents</th><th>RowNumber</th></tr>\n",
              "    <tr class='coltypes'><td class='row_index'></td><td class='int' title='int32'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='str' title='str32'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='int' title='int32'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='str' title='str32'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='int' title='int32'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='time' title='time64'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='int' title='int32'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='str' title='str32'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='time' title='date32'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='time' title='date32'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td></td><td class='float' title='float64'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='float' title='float64'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='float' title='float64'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='float' title='float64'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='int' title='int32'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td></tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr><td class='row_index'>0</td><td>2015</td><td>AK</td><td>21989</td><td>HIOS</td><td>4</td><td>2014-08-08<span class=sp>T</span>08:53:29</td><td>21989</td><td>93-0438772</td><td>2015-01-01</td><td>2015-12-31</td><td class=vellipsis>&hellip;</td><td><span class=na>NA</span></td><td><span class=na>NA</span></td><td><span class=na>NA</span></td><td><span class=na>NA</span></td><td>14</td></tr>\n",
              "    <tr><td class='row_index'>1</td><td>2015</td><td>AK</td><td>21989</td><td>HIOS</td><td>4</td><td>2014-08-08<span class=sp>T</span>08:53:29</td><td>21989</td><td>93-0438772</td><td>2015-01-01</td><td>2015-12-31</td><td class=vellipsis>&hellip;</td><td><span class=na>NA</span></td><td><span class=na>NA</span></td><td><span class=na>NA</span></td><td><span class=na>NA</span></td><td>15</td></tr>\n",
              "    <tr><td class='row_index'>2</td><td>2015</td><td>AK</td><td>21989</td><td>HIOS</td><td>4</td><td>2014-08-08<span class=sp>T</span>08:53:29</td><td>21989</td><td>93-0438772</td><td>2015-01-01</td><td>2015-12-31</td><td class=vellipsis>&hellip;</td><td><span class=na>NA</span></td><td><span class=na>NA</span></td><td><span class=na>NA</span></td><td><span class=na>NA</span></td><td>16</td></tr>\n",
              "    <tr><td class='row_index'>3</td><td>2015</td><td>AK</td><td>21989</td><td>HIOS</td><td>4</td><td>2014-08-08<span class=sp>T</span>08:53:29</td><td>21989</td><td>93-0438772</td><td>2015-01-01</td><td>2015-12-31</td><td class=vellipsis>&hellip;</td><td><span class=na>NA</span></td><td><span class=na>NA</span></td><td><span class=na>NA</span></td><td><span class=na>NA</span></td><td>17</td></tr>\n",
              "    <tr><td class='row_index'>4</td><td>2015</td><td>AK</td><td>21989</td><td>HIOS</td><td>4</td><td>2014-08-08<span class=sp>T</span>08:53:29</td><td>21989</td><td>93-0438772</td><td>2015-01-01</td><td>2015-12-31</td><td class=vellipsis>&hellip;</td><td><span class=na>NA</span></td><td><span class=na>NA</span></td><td><span class=na>NA</span></td><td><span class=na>NA</span></td><td>18</td></tr>\n",
              "  </tbody>\n",
              "  </table>\n",
              "  <div class='footer'>\n",
              "    <div class='frame_dimensions'>5 rows &times; 24 columns</div>\n",
              "  </div>\n",
              "</div>\n"
            ]
          },
          "metadata": {},
          "execution_count": 18
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Data Aggregation"
      ],
      "metadata": {
        "id": "AEQBgqQZHGdm"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "Find the average of Individual Rate based on the Age"
      ],
      "metadata": {
        "id": "YFGrUGITMZlE"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "avgIndividualRate = dt_df[:,dt.mean(f.IndividualRate),dt.by(f.Age)]\n",
        "avgIndividualRate"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 528
        },
        "id": "nLHfpScVGxAZ",
        "outputId": "e8e5eb7a-e145-432d-ac30-c3ff0d783c1b"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "<Frame#7f0b08213510 47x2>"
            ],
            "text/html": [
              "<div class='datatable'>\n",
              "  <table class='frame'>\n",
              "  <thead>\n",
              "    <tr class='colnames'><td class='row_index'></td><th>Age</th><th>IndividualRate</th></tr>\n",
              "    <tr class='coltypes'><td class='row_index'></td><td class='str' title='str32'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='float' title='float64'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td></tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr><td class='row_index'>0</td><td>0-20</td><td>122.256</td></tr>\n",
              "    <tr><td class='row_index'>1</td><td>21</td><td>4082.84</td></tr>\n",
              "    <tr><td class='row_index'>2</td><td>22</td><td>4082.91</td></tr>\n",
              "    <tr><td class='row_index'>3</td><td>23</td><td>4082.98</td></tr>\n",
              "    <tr><td class='row_index'>4</td><td>24</td><td>4083.08</td></tr>\n",
              "    <tr><td class='row_index'>5</td><td>25</td><td>4083.62</td></tr>\n",
              "    <tr><td class='row_index'>6</td><td>26</td><td>4087.14</td></tr>\n",
              "    <tr><td class='row_index'>7</td><td>27</td><td>4091.29</td></tr>\n",
              "    <tr><td class='row_index'>8</td><td>28</td><td>4097.99</td></tr>\n",
              "    <tr><td class='row_index'>9</td><td>29</td><td>4103.55</td></tr>\n",
              "    <tr><td class='row_index'>10</td><td>30</td><td>4106.36</td></tr>\n",
              "    <tr><td class='row_index'>11</td><td>31</td><td>4110.55</td></tr>\n",
              "    <tr><td class='row_index'>12</td><td>32</td><td>4114.73</td></tr>\n",
              "    <tr><td class='row_index'>13</td><td>33</td><td>4117.35</td></tr>\n",
              "    <tr><td class='row_index'>14</td><td>34</td><td>4120.15</td></tr>\n",
              "    <tr><td class='row_index'>&#x22EE;</td><td class='hellipsis'>&#x22EE;</td><td class='hellipsis'>&#x22EE;</td></tr>\n",
              "    <tr><td class='row_index'>42</td><td>62</td><td>4409.51</td></tr>\n",
              "    <tr><td class='row_index'>43</td><td>63</td><td>4423.06</td></tr>\n",
              "    <tr><td class='row_index'>44</td><td>64</td><td>4431.28</td></tr>\n",
              "    <tr><td class='row_index'>45</td><td>65 and over</td><td>4431.41</td></tr>\n",
              "    <tr><td class='row_index'>46</td><td>Family Option</td><td>28.5286</td></tr>\n",
              "  </tbody>\n",
              "  </table>\n",
              "  <div class='footer'>\n",
              "    <div class='frame_dimensions'>47 rows &times; 2 columns</div>\n",
              "  </div>\n",
              "</div>\n"
            ]
          },
          "metadata": {},
          "execution_count": 22
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Data Sorting"
      ],
      "metadata": {
        "id": "wBVi00bqIdQi"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "Sort the data based on Age (increasing)"
      ],
      "metadata": {
        "id": "9S2xllkRMhg5"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "sortedData = dt_df[:,:,dt.sort(f.Age)]\n",
        "sortedData.head(3)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 215
        },
        "id": "W-ucvObKHwJ9",
        "outputId": "63f35b48-c7a7-4acb-c4ad-1f6e5e4ebf11"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "<Frame#7f0b1057d930 3x24>"
            ],
            "text/html": [
              "<div class='datatable'>\n",
              "  <table class='frame'>\n",
              "  <thead>\n",
              "    <tr class='colnames'><td class='row_index'></td><th>BusinessYear</th><th>StateCode</th><th>IssuerId</th><th>SourceName</th><th>VersionNum</th><th>ImportDate</th><th>IssuerId2</th><th>FederalTIN</th><th>RateEffectiveDate</th><th>RateExpirationDate</th><th class='vellipsis'>&hellip;</th><th>PrimarySubscriberAndThreeOrMoreDependents</th><th>CoupleAndOneDependent</th><th>CoupleAndTwoDependents</th><th>CoupleAndThreeOrMoreDependents</th><th>RowNumber</th></tr>\n",
              "    <tr class='coltypes'><td class='row_index'></td><td class='int' title='int32'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='str' title='str32'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='int' title='int32'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='str' title='str32'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='int' title='int32'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='time' title='time64'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='int' title='int32'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='str' title='str32'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='time' title='date32'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='time' title='date32'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td></td><td class='float' title='float64'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='float' title='float64'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='float' title='float64'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='float' title='float64'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='int' title='int32'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td></tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr><td class='row_index'>0</td><td>2014</td><td>AK</td><td>21989</td><td>HIOS</td><td>6</td><td>2014-03-19<span class=sp>T</span>07:06:49</td><td>21989</td><td>93-0438772</td><td>2014-01-01</td><td>2014-12-31</td><td class=vellipsis>&hellip;</td><td><span class=na>NA</span></td><td><span class=na>NA</span></td><td><span class=na>NA</span></td><td><span class=na>NA</span></td><td>14</td></tr>\n",
              "    <tr><td class='row_index'>1</td><td>2014</td><td>AK</td><td>21989</td><td>HIOS</td><td>6</td><td>2014-03-19<span class=sp>T</span>07:06:49</td><td>21989</td><td>93-0438772</td><td>2014-01-01</td><td>2014-12-31</td><td class=vellipsis>&hellip;</td><td><span class=na>NA</span></td><td><span class=na>NA</span></td><td><span class=na>NA</span></td><td><span class=na>NA</span></td><td>60</td></tr>\n",
              "    <tr><td class='row_index'>2</td><td>2014</td><td>AK</td><td>21989</td><td>HIOS</td><td>6</td><td>2014-03-19<span class=sp>T</span>07:06:49</td><td>21989</td><td>93-0438772</td><td>2014-01-01</td><td>2014-12-31</td><td class=vellipsis>&hellip;</td><td><span class=na>NA</span></td><td><span class=na>NA</span></td><td><span class=na>NA</span></td><td><span class=na>NA</span></td><td>106</td></tr>\n",
              "  </tbody>\n",
              "  </table>\n",
              "  <div class='footer'>\n",
              "    <div class='frame_dimensions'>3 rows &times; 24 columns</div>\n",
              "  </div>\n",
              "</div>\n"
            ]
          },
          "metadata": {},
          "execution_count": 23
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Sort the data based on Age (decreasing)"
      ],
      "metadata": {
        "id": "iFkSFi3wMj8i"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "sortedData = dt_df[:,:,dt.sort(-f.Age)]\n",
        "sortedData.head(3)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 215
        },
        "id": "Df2TnmkNIk3X",
        "outputId": "61283176-6ca3-416a-8143-ea1d4251cadd"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "<Frame#7f0b0820e270 3x24>"
            ],
            "text/html": [
              "<div class='datatable'>\n",
              "  <table class='frame'>\n",
              "  <thead>\n",
              "    <tr class='colnames'><td class='row_index'></td><th>BusinessYear</th><th>StateCode</th><th>IssuerId</th><th>SourceName</th><th>VersionNum</th><th>ImportDate</th><th>IssuerId2</th><th>FederalTIN</th><th>RateEffectiveDate</th><th>RateExpirationDate</th><th class='vellipsis'>&hellip;</th><th>PrimarySubscriberAndThreeOrMoreDependents</th><th>CoupleAndOneDependent</th><th>CoupleAndTwoDependents</th><th>CoupleAndThreeOrMoreDependents</th><th>RowNumber</th></tr>\n",
              "    <tr class='coltypes'><td class='row_index'></td><td class='int' title='int32'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='str' title='str32'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='int' title='int32'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='str' title='str32'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='int' title='int32'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='time' title='time64'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='int' title='int32'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='str' title='str32'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='time' title='date32'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='time' title='date32'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td></td><td class='float' title='float64'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='float' title='float64'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='float' title='float64'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='float' title='float64'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td><td class='int' title='int32'>&#x25AA;&#x25AA;&#x25AA;&#x25AA;</td></tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr><td class='row_index'>0</td><td>2014</td><td>AK</td><td>21989</td><td>HIOS</td><td>6</td><td>2014-03-19<span class=sp>T</span>07:06:49</td><td>21989</td><td>93-0438772</td><td>2014-01-01</td><td>2014-12-31</td><td class=vellipsis>&hellip;</td><td>107.61</td><td>144.56</td><td>144.56</td><td>144.56</td><td>14</td></tr>\n",
              "    <tr><td class='row_index'>1</td><td>2014</td><td>AK</td><td>21989</td><td>HIOS</td><td>6</td><td>2014-03-19<span class=sp>T</span>07:06:49</td><td>21989</td><td>93-0438772</td><td>2014-01-01</td><td>2014-12-31</td><td class=vellipsis>&hellip;</td><td>107.61</td><td>144.56</td><td>144.56</td><td>144.56</td><td>15</td></tr>\n",
              "    <tr><td class='row_index'>2</td><td>2014</td><td>AK</td><td>21989</td><td>HIOS</td><td>6</td><td>2014-03-19<span class=sp>T</span>07:06:49</td><td>21989</td><td>93-0438772</td><td>2014-01-01</td><td>2014-12-31</td><td class=vellipsis>&hellip;</td><td>107.61</td><td>144.56</td><td>144.56</td><td>144.56</td><td>16</td></tr>\n",
              "  </tbody>\n",
              "  </table>\n",
              "  <div class='footer'>\n",
              "    <div class='frame_dimensions'>3 rows &times; 24 columns</div>\n",
              "  </div>\n",
              "</div>\n"
            ]
          },
          "metadata": {},
          "execution_count": 24
        }
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.8.5"
    },
    "colab": {
      "provenance": []
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}