mirea-projects/Third term/Artificial intelligence systems and big data/2.ipynb
2024-09-24 02:22:33 +03:00

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Workbook No. 2"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd \n",
"from sklearn.preprocessing import MinMaxScaler, StandardScaler"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1.3.1 Task\n",
"\n",
"Create an 8x8 matrix and fill it with zeros and ones in a \n",
"checkerboard pattern."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[0, 1, 0, 1, 0, 1, 0, 1],\n",
"       [1, 0, 1, 0, 1, 0, 1, 0],\n",
"       [0, 1, 0, 1, 0, 1, 0, 1],\n",
"       [1, 0, 1, 0, 1, 0, 1, 0],\n",
"       [0, 1, 0, 1, 0, 1, 0, 1],\n",
"       [1, 0, 1, 0, 1, 0, 1, 0],\n",
"       [0, 1, 0, 1, 0, 1, 0, 1],\n",
"       [1, 0, 1, 0, 1, 0, 1, 0]])"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.array([[(x + y) % 2 for x in range(8)] for y in range(8)])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1.3.2 Task\n",
"\n",
"Create a 5x5 matrix whose rows contain the values 0 through 4. Use the \n",
"arange function to build it."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[0, 1, 2, 3, 4],\n",
"       [0, 1, 2, 3, 4],\n",
"       [0, 1, 2, 3, 4],\n",
"       [0, 1, 2, 3, 4],\n",
"       [0, 1, 2, 3, 4]])"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.array([np.arange(0, 5) for _ in range(5)])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1.3.3 Task\n",
"\n",
"Create a 3x3x3 array with random values."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[[0.34392799, 0.718808  , 0.49594499],\n",
"        [0.12167775, 0.56056024, 0.59003049],\n",
"        [0.12481231, 0.79707319, 0.66605017]],\n",
"\n",
"       [[0.11550937, 0.29438156, 0.69728858],\n",
"        [0.3432886 , 0.35701781, 0.72659151],\n",
"        [0.73779222, 0.09585279, 0.40705831]],\n",
"\n",
"       [[0.23874481, 0.80360945, 0.53127737],\n",
"        [0.85959837, 0.16119215, 0.78824553],\n",
"        [0.53977056, 0.71800074, 0.93729907]]])"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.random.random((3, 3, 3))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1.3.4 Task\n",
"\n",
"Create a matrix with 0 inside and 1 on the border."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[1, 1, 1, 1, 1, 1, 1, 1],\n",
"       [1, 0, 0, 0, 0, 0, 0, 1],\n",
"       [1, 0, 0, 0, 0, 0, 0, 1],\n",
"       [1, 0, 0, 0, 0, 0, 0, 1],\n",
"       [1, 0, 0, 0, 0, 0, 0, 1],\n",
"       [1, 0, 0, 0, 0, 0, 0, 1],\n",
"       [1, 0, 0, 0, 0, 0, 0, 1],\n",
"       [1, 1, 1, 1, 1, 1, 1, 1]])"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.array([[int((x in [0, 7]) or (y in [0, 7])) for x in range(8)] for y in range(8)])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1.3.5 Task\n",
"\n",
"Create an array and sort it in descending order."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"arr = np.arange(0, 10)\n",
"np.sort(arr)[::-1]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1.3.6 Task\n",
"\n",
"Create a matrix and print its shape, size, and number of dimensions."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(8, 10) 2 80\n"
]
}
],
"source": [
"arr = np.array([np.arange(0, 10) for _ in range(8)])\n",
"\n",
"print(arr.shape, arr.ndim, arr.size)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.3.1 Task\n",
"\n",
"Find the Euclidean distance between two Series (points) a and b without \n",
"using a built-in formula."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"3.1622776601683795\n"
]
}
],
"source": [
"from math import sqrt\n",
"\n",
"first_dot = pd.Series([1, 3])\n",
"second_dot = pd.Series([4, 2])\n",
"\n",
"# Accumulate the squared difference along each coordinate\n",
"s = 0\n",
"for dim in range(first_dot.size):\n",
"    s += (first_dot.array[dim] - second_dot.array[dim]) ** 2\n",
"\n",
"print(sqrt(s))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.3.2 Task\n",
"\n",
"Find a link to any CSV file on the Internet and build a data frame from \n",
"it (for example, a collection of datasets can be found \n",
"here: https://github.com/akmand/datasets)."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Airline</th>\n",
" <th>Flight</th>\n",
" <th>AirportFrom</th>\n",
" <th>AirportTo</th>\n",
" <th>DayOfWeek</th>\n",
" <th>Time</th>\n",
" <th>Length</th>\n",
" <th>Delay</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>CO</td>\n",
" <td>269</td>\n",
" <td>SFO</td>\n",
" <td>IAH</td>\n",
" <td>3</td>\n",
" <td>15</td>\n",
" <td>205</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>US</td>\n",
" <td>1558</td>\n",
" <td>PHX</td>\n",
" <td>CLT</td>\n",
" <td>3</td>\n",
" <td>15</td>\n",
" <td>222</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>AA</td>\n",
" <td>2400</td>\n",
" <td>LAX</td>\n",
" <td>DFW</td>\n",
" <td>3</td>\n",
" <td>20</td>\n",
" <td>165</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>AA</td>\n",
" <td>2466</td>\n",
" <td>SFO</td>\n",
" <td>DFW</td>\n",
" <td>3</td>\n",
" <td>20</td>\n",
" <td>195</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>AS</td>\n",
" <td>108</td>\n",
" <td>ANC</td>\n",
" <td>SEA</td>\n",
" <td>3</td>\n",
" <td>30</td>\n",
" <td>202</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>539378</th>\n",
" <td>CO</td>\n",
" <td>178</td>\n",
" <td>OGG</td>\n",
" <td>SNA</td>\n",
" <td>5</td>\n",
" <td>1439</td>\n",
" <td>326</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>539379</th>\n",
" <td>FL</td>\n",
" <td>398</td>\n",
" <td>SEA</td>\n",
" <td>ATL</td>\n",
" <td>5</td>\n",
" <td>1439</td>\n",
" <td>305</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>539380</th>\n",
" <td>FL</td>\n",
" <td>609</td>\n",
" <td>SFO</td>\n",
" <td>MKE</td>\n",
" <td>5</td>\n",
" <td>1439</td>\n",
" <td>255</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>539381</th>\n",
" <td>UA</td>\n",
" <td>78</td>\n",
" <td>HNL</td>\n",
" <td>SFO</td>\n",
" <td>5</td>\n",
" <td>1439</td>\n",
" <td>313</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>539382</th>\n",
" <td>US</td>\n",
" <td>1442</td>\n",
" <td>LAX</td>\n",
" <td>PHL</td>\n",
" <td>5</td>\n",
" <td>1439</td>\n",
" <td>301</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>539383 rows × 8 columns</p>\n",
"</div>"
],
"text/plain": [
" Airline Flight AirportFrom AirportTo DayOfWeek Time Length Delay\n",
"0 CO 269 SFO IAH 3 15 205 1\n",
"1 US 1558 PHX CLT 3 15 222 1\n",
"2 AA 2400 LAX DFW 3 20 165 1\n",
"3 AA 2466 SFO DFW 3 20 195 1\n",
"4 AS 108 ANC SEA 3 30 202 0\n",
"... ... ... ... ... ... ... ... ...\n",
"539378 CO 178 OGG SNA 5 1439 326 0\n",
"539379 FL 398 SEA ATL 5 1439 305 0\n",
"539380 FL 609 SFO MKE 5 1439 255 0\n",
"539381 UA 78 HNL SFO 5 1439 313 1\n",
"539382 US 1442 LAX PHL 5 1439 301 1\n",
"\n",
"[539383 rows x 8 columns]"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"url = 'https://raw.githubusercontent.com/akmand/datasets/refs/heads/main/airlines.csv'\n",
"\n",
"df = pd.read_csv(url)\n",
"\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.3.3 Task\n",
"\n",
"Apply the same operations as in examples 2.2.5-2.2.7 to the data frame \n",
"obtained in the previous task."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 2.2.5\n",
"\n",
"Analyze the characteristics of the data frame."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Airline</th>\n",
" <th>Flight</th>\n",
" <th>AirportFrom</th>\n",
" <th>AirportTo</th>\n",
" <th>DayOfWeek</th>\n",
" <th>Time</th>\n",
" <th>Length</th>\n",
" <th>Delay</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>CO</td>\n",
" <td>269</td>\n",
" <td>SFO</td>\n",
" <td>IAH</td>\n",
" <td>3</td>\n",
" <td>15</td>\n",
" <td>205</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>US</td>\n",
" <td>1558</td>\n",
" <td>PHX</td>\n",
" <td>CLT</td>\n",
" <td>3</td>\n",
" <td>15</td>\n",
" <td>222</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Airline Flight AirportFrom AirportTo DayOfWeek Time Length Delay\n",
"0 CO 269 SFO IAH 3 15 205 1\n",
"1 US 1558 PHX CLT 3 15 222 1"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head(2)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Airline</th>\n",
" <th>Flight</th>\n",
" <th>AirportFrom</th>\n",
" <th>AirportTo</th>\n",
" <th>DayOfWeek</th>\n",
" <th>Time</th>\n",
" <th>Length</th>\n",
" <th>Delay</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>539380</th>\n",
" <td>FL</td>\n",
" <td>609</td>\n",
" <td>SFO</td>\n",
" <td>MKE</td>\n",
" <td>5</td>\n",
" <td>1439</td>\n",
" <td>255</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>539381</th>\n",
" <td>UA</td>\n",
" <td>78</td>\n",
" <td>HNL</td>\n",
" <td>SFO</td>\n",
" <td>5</td>\n",
" <td>1439</td>\n",
" <td>313</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>539382</th>\n",
" <td>US</td>\n",
" <td>1442</td>\n",
" <td>LAX</td>\n",
" <td>PHL</td>\n",
" <td>5</td>\n",
" <td>1439</td>\n",
" <td>301</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Airline Flight AirportFrom AirportTo DayOfWeek Time Length Delay\n",
"539380 FL 609 SFO MKE 5 1439 255 0\n",
"539381 UA 78 HNL SFO 5 1439 313 1\n",
"539382 US 1442 LAX PHL 5 1439 301 1"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.tail(3)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(539383, 8)"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.shape"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Flight</th>\n",
" <th>DayOfWeek</th>\n",
" <th>Time</th>\n",
" <th>Length</th>\n",
" <th>Delay</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>539383.000000</td>\n",
" <td>539383.000000</td>\n",
" <td>539383.000000</td>\n",
" <td>539383.000000</td>\n",
" <td>539383.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>2427.928630</td>\n",
" <td>3.929668</td>\n",
" <td>802.728963</td>\n",
" <td>132.202007</td>\n",
" <td>0.445442</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>2067.429837</td>\n",
" <td>1.914664</td>\n",
" <td>278.045911</td>\n",
" <td>70.117016</td>\n",
" <td>0.497015</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>1.000000</td>\n",
" <td>1.000000</td>\n",
" <td>10.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>712.000000</td>\n",
" <td>2.000000</td>\n",
" <td>565.000000</td>\n",
" <td>81.000000</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>1809.000000</td>\n",
" <td>4.000000</td>\n",
" <td>795.000000</td>\n",
" <td>115.000000</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>3745.000000</td>\n",
" <td>5.000000</td>\n",
" <td>1035.000000</td>\n",
" <td>162.000000</td>\n",
" <td>1.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>7814.000000</td>\n",
" <td>7.000000</td>\n",
" <td>1439.000000</td>\n",
" <td>655.000000</td>\n",
" <td>1.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Flight DayOfWeek Time Length \n",
"count 539383.000000 539383.000000 539383.000000 539383.000000 \\\n",
"mean 2427.928630 3.929668 802.728963 132.202007 \n",
"std 2067.429837 1.914664 278.045911 70.117016 \n",
"min 1.000000 1.000000 10.000000 0.000000 \n",
"25% 712.000000 2.000000 565.000000 81.000000 \n",
"50% 1809.000000 4.000000 795.000000 115.000000 \n",
"75% 3745.000000 5.000000 1035.000000 162.000000 \n",
"max 7814.000000 7.000000 1439.000000 655.000000 \n",
"\n",
" Delay \n",
"count 539383.000000 \n",
"mean 0.445442 \n",
"std 0.497015 \n",
"min 0.000000 \n",
"25% 0.000000 \n",
"50% 0.000000 \n",
"75% 1.000000 \n",
"max 1.000000 "
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.describe()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 2.2.6\n",
"\n",
"Select individual rows or slices of the data frame."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Airline</th>\n",
" <th>Flight</th>\n",
" <th>AirportFrom</th>\n",
" <th>AirportTo</th>\n",
" <th>DayOfWeek</th>\n",
" <th>Time</th>\n",
" <th>Length</th>\n",
" <th>Delay</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>US</td>\n",
" <td>1558</td>\n",
" <td>PHX</td>\n",
" <td>CLT</td>\n",
" <td>3</td>\n",
" <td>15</td>\n",
" <td>222</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>AA</td>\n",
" <td>2400</td>\n",
" <td>LAX</td>\n",
" <td>DFW</td>\n",
" <td>3</td>\n",
" <td>20</td>\n",
" <td>165</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>AA</td>\n",
" <td>2466</td>\n",
" <td>SFO</td>\n",
" <td>DFW</td>\n",
" <td>3</td>\n",
" <td>20</td>\n",
" <td>195</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Airline Flight AirportFrom AirportTo DayOfWeek Time Length Delay\n",
"1 US 1558 PHX CLT 3 15 222 1\n",
"2 AA 2400 LAX DFW 3 20 165 1\n",
"3 AA 2466 SFO DFW 3 20 195 1"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.iloc[1:4]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 2.2.7\n",
"\n",
"Select rows of the data frame that satisfy a given \n",
"condition."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Airline</th>\n",
" <th>Flight</th>\n",
" <th>AirportFrom</th>\n",
" <th>AirportTo</th>\n",
" <th>DayOfWeek</th>\n",
" <th>Time</th>\n",
" <th>Length</th>\n",
" <th>Delay</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>US</td>\n",
" <td>1558</td>\n",
" <td>PHX</td>\n",
" <td>CLT</td>\n",
" <td>3</td>\n",
" <td>15</td>\n",
" <td>222</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>US</td>\n",
" <td>498</td>\n",
" <td>DEN</td>\n",
" <td>CLT</td>\n",
" <td>3</td>\n",
" <td>55</td>\n",
" <td>179</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Airline Flight AirportFrom AirportTo DayOfWeek Time Length Delay\n",
"1 US 1558 PHX CLT 3 15 222 1\n",
"15 US 498 DEN CLT 3 55 179 0"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[df['Airline'] == 'US'].head(2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3.3.2 Task\n",
"\n",
"Load the data frame from: \n",
"https://raw.githubusercontent.com/akmand/datasets/master/iris.csv. \n",
"Normalize the first numeric feature (sepal_length_cm) using min-max \n",
"scaling, and the second one (sepal_width_cm) using z-score \n",
"standardization."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>sepal_length_cm</th>\n",
" <th>sepal_width_cm</th>\n",
" <th>petal_length_cm</th>\n",
" <th>petal_width_cm</th>\n",
" <th>species</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>5.1</td>\n",
" <td>3.5</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>4.9</td>\n",
" <td>3.0</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>4.7</td>\n",
" <td>3.2</td>\n",
" <td>1.3</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4.6</td>\n",
" <td>3.1</td>\n",
" <td>1.5</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5.0</td>\n",
" <td>3.6</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>145</th>\n",
" <td>6.7</td>\n",
" <td>3.0</td>\n",
" <td>5.2</td>\n",
" <td>2.3</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>146</th>\n",
" <td>6.3</td>\n",
" <td>2.5</td>\n",
" <td>5.0</td>\n",
" <td>1.9</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>147</th>\n",
" <td>6.5</td>\n",
" <td>3.0</td>\n",
" <td>5.2</td>\n",
" <td>2.0</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>148</th>\n",
" <td>6.2</td>\n",
" <td>3.4</td>\n",
" <td>5.4</td>\n",
" <td>2.3</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>149</th>\n",
" <td>5.9</td>\n",
" <td>3.0</td>\n",
" <td>5.1</td>\n",
" <td>1.8</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>150 rows × 5 columns</p>\n",
"</div>"
],
"text/plain": [
" sepal_length_cm sepal_width_cm petal_length_cm petal_width_cm \n",
"0 5.1 3.5 1.4 0.2 \\\n",
"1 4.9 3.0 1.4 0.2 \n",
"2 4.7 3.2 1.3 0.2 \n",
"3 4.6 3.1 1.5 0.2 \n",
"4 5.0 3.6 1.4 0.2 \n",
".. ... ... ... ... \n",
"145 6.7 3.0 5.2 2.3 \n",
"146 6.3 2.5 5.0 1.9 \n",
"147 6.5 3.0 5.2 2.0 \n",
"148 6.2 3.4 5.4 2.3 \n",
"149 5.9 3.0 5.1 1.8 \n",
"\n",
" species \n",
"0 setosa \n",
"1 setosa \n",
"2 setosa \n",
"3 setosa \n",
"4 setosa \n",
".. ... \n",
"145 virginica \n",
"146 virginica \n",
"147 virginica \n",
"148 virginica \n",
"149 virginica \n",
"\n",
"[150 rows x 5 columns]"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"url = 'https://raw.githubusercontent.com/akmand/datasets/master/iris.csv'\n",
"\n",
"iris_df = pd.read_csv(url)\n",
"\n",
"iris_df"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"# Scalers expect a 2D array, so reshape the column to (n_samples, 1)\n",
"length_feature = iris_df['sepal_length_cm'].to_numpy().reshape(-1, 1)\n",
"\n",
"minmax_scale = MinMaxScaler(feature_range=(0, 1))\n",
"scaled_sepal_length = minmax_scale.fit_transform(length_feature)\n",
"iris_df['sepal_length_cm'] = scaled_sepal_length.ravel()"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"# Z-score standardization: subtract the mean, divide by the standard deviation\n",
"width_feature = iris_df['sepal_width_cm'].to_numpy().reshape(-1, 1)\n",
"\n",
"z_scale = StandardScaler()\n",
"scaled_sepal_width = z_scale.fit_transform(width_feature)\n",
"iris_df['sepal_width_cm'] = scaled_sepal_width.ravel()"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>sepal_length_cm</th>\n",
" <th>sepal_width_cm</th>\n",
" <th>petal_length_cm</th>\n",
" <th>petal_width_cm</th>\n",
" <th>species</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0.222222</td>\n",
" <td>1.032057</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0.166667</td>\n",
" <td>-0.124958</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>0.111111</td>\n",
" <td>0.337848</td>\n",
" <td>1.3</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>0.083333</td>\n",
" <td>0.106445</td>\n",
" <td>1.5</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>0.194444</td>\n",
" <td>1.263460</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>145</th>\n",
" <td>0.666667</td>\n",
" <td>-0.124958</td>\n",
" <td>5.2</td>\n",
" <td>2.3</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>146</th>\n",
" <td>0.555556</td>\n",
" <td>-1.281972</td>\n",
" <td>5.0</td>\n",
" <td>1.9</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>147</th>\n",
" <td>0.611111</td>\n",
" <td>-0.124958</td>\n",
" <td>5.2</td>\n",
" <td>2.0</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>148</th>\n",
" <td>0.527778</td>\n",
" <td>0.800654</td>\n",
" <td>5.4</td>\n",
" <td>2.3</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>149</th>\n",
" <td>0.444444</td>\n",
" <td>-0.124958</td>\n",
" <td>5.1</td>\n",
" <td>1.8</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>150 rows × 5 columns</p>\n",
"</div>"
],
"text/plain": [
" sepal_length_cm sepal_width_cm petal_length_cm petal_width_cm \n",
"0 0.222222 1.032057 1.4 0.2 \\\n",
"1 0.166667 -0.124958 1.4 0.2 \n",
"2 0.111111 0.337848 1.3 0.2 \n",
"3 0.083333 0.106445 1.5 0.2 \n",
"4 0.194444 1.263460 1.4 0.2 \n",
".. ... ... ... ... \n",
"145 0.666667 -0.124958 5.2 2.3 \n",
"146 0.555556 -1.281972 5.0 1.9 \n",
"147 0.611111 -0.124958 5.2 2.0 \n",
"148 0.527778 0.800654 5.4 2.3 \n",
"149 0.444444 -0.124958 5.1 1.8 \n",
"\n",
" species \n",
"0 setosa \n",
"1 setosa \n",
"2 setosa \n",
"3 setosa \n",
"4 setosa \n",
".. ... \n",
"145 virginica \n",
"146 virginica \n",
"147 virginica \n",
"148 virginica \n",
"149 virginica \n",
"\n",
"[150 rows x 5 columns]"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"iris_df"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.0"
}
},
"nbformat": 4,
"nbformat_minor": 2
}