{ "cells": [ { "cell_type": "markdown", "id": "spanish-casting", "metadata": {}, "source": [ "# 1、数据集说明\n", "\n", "这是一份来自 Johns Hopkins University 在github 开源的全球新冠肺炎 [COVID-19](https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_time_series) 数据集,每日时间序列汇总,包括确诊、死亡和治愈。所有数据来自每日病例报告。数据持续更新中。\n", ">由于数据集中没有美国的治愈数据,所以在统计全球的现有确诊人员和治愈率的时候会有很大误差,代码里面先不做这个处理,期待数据集的完善。\n" ] }, { "cell_type": "markdown", "id": "rapid-robertson", "metadata": {}, "source": [ "# 2、数据清洗" ] }, { "cell_type": "code", "execution_count": 1, "id": "informational-poland", "metadata": {}, "outputs": [], "source": [ "import pandas as pd \n", "confirmed_data = pd.read_csv('time_series_covid19_confirmed_global.csv')\n", "deaths_data = pd.read_csv('time_series_covid19_deaths_global.csv')\n", "recovered_data = pd.read_csv('time_series_covid19_recovered_global.csv')" ] }, { "cell_type": "code", "execution_count": 2, "id": "southwest-brunei", "metadata": {}, "outputs": [], "source": [ "# 美国的名称格式化\n", "confirmed_data['Country/Region']=confirmed_data['Country/Region'].apply(lambda x: 'United States' if x == 'US' else x)\n", "deaths_data['Country/Region']=deaths_data['Country/Region'].apply(lambda x: 'United States' if x == 'US' else x)\n", "recovered_data['Country/Region']=recovered_data['Country/Region'].apply(lambda x: 'United States' if x == 'US' else x)\n", "\n", "# 将台湾的数据归到中国\n", "idx = confirmed_data[confirmed_data['Country/Region'] == 'Taiwan*'].index[0]\n", "confirmed_data.loc[idx, 'Province/State'] = 'Taiwan'\n", "confirmed_data.loc[idx, 'Country/Region'] = 'China'\n", "\n", "idx = deaths_data[deaths_data['Country/Region'] == 'Taiwan*'].index[0]\n", "deaths_data.loc[idx, 'Province/State'] = 'Taiwan'\n", "deaths_data.loc[idx, 'Country/Region'] = 'China'\n", "\n", "idx = recovered_data[recovered_data['Country/Region'] == 'Taiwan*'].index[0]\n", "recovered_data.loc[idx, 'Province/State'] = 'Taiwan'\n", "recovered_data.loc[idx, 'Country/Region'] = 'China'" ] }, { "cell_type": "code", "execution_count": 3, "id": "every-entry", "metadata": {}, "outputs": [], "source": [ "# 增加 Country/Region 和 Province/State 的中文冗余列 Country/Region_zh 、Province/State_zh\n", "country_map = {\n", " 'Singapore Rep.': '新加坡', 'Dominican Rep.': '多米尼加', 'Palestine': '巴勒斯坦', 'Bahamas': '巴哈马', 'Timor-Leste': '东帝汶',\n", " 'Afghanistan': '阿富汗', 'Guinea-Bissau': '几内亚比绍', \"Côte d'Ivoire\": '科特迪瓦', 'Siachen Glacier': '锡亚琴冰川',\n", " \"Br. Indian Ocean Ter.\": '英属印度洋领土', 'Angola': '安哥拉', 'Albania': '阿尔巴尼亚', 'United Arab Emirates': '阿联酋',\n", " 'Argentina': '阿根廷', 'Armenia': '亚美尼亚', 'French Southern and Antarctic Lands': '法属南半球和南极领地', 'Australia': '澳大利亚',\n", " 'Austria': '奥地利', 'Azerbaijan': '阿塞拜疆', 'Burundi': '布隆迪', 'Belgium': '比利时', 'Benin': '贝宁', 'Burkina Faso': '布基纳法索',\n", " 'Bangladesh': '孟加拉国', 'Bulgaria': '保加利亚', 'The Bahamas': '巴哈马', 'Bosnia and Herz.': '波斯尼亚和黑塞哥维那', 'Belarus': '白俄罗斯',\n", " 'Belize': '伯利兹', 'Bermuda': '百慕大', 'Bolivia': '玻利维亚', 'Brazil': '巴西', 'Brunei': '文莱', 'Bhutan': '不丹',\n", " 'Botswana': '博茨瓦纳', 'Central African Rep.': '中非', 'Canada': '加拿大', 'Switzerland': '瑞士', 'Chile': '智利',\n", " 'China': '中国', 'Ivory Coast': '象牙海岸', 'Cameroon': '喀麦隆', 'Dem. Rep. Congo': '刚果民主共和国', 'Congo': '刚果',\n", " 'Colombia': '哥伦比亚', 'Costa Rica': '哥斯达黎加', 'Cuba': '古巴', 'N. Cyprus': '北塞浦路斯', 'Cyprus': '塞浦路斯', 'Czech Rep.': '捷克',\n", " 'Germany': '德国', 'Djibouti': '吉布提', 'Denmark': '丹麦', 'Algeria': '阿尔及利亚', 'Ecuador': '厄瓜多尔', 'Egypt': '埃及',\n", " 'Eritrea': '厄立特里亚', 'Spain': '西班牙', 'Estonia': '爱沙尼亚', 'Ethiopia': '埃塞俄比亚', 'Finland': '芬兰', 'Fiji': '斐',\n", " 'Falkland Islands': '福克兰群岛', 'France': '法国', 'Gabon': '加蓬', 'United Kingdom': '英国', 'Georgia': '格鲁吉亚',\n", " 'Ghana': '加纳', 'Guinea': '几内亚', 'Gambia': '冈比亚', 'Guinea Bissau': '几内亚比绍', 'Eq. Guinea': '赤道几内亚', 'Greece': '希腊',\n", " 'Greenland': '格陵兰', 'Guatemala': '危地马拉', 'French Guiana': '法属圭亚那', 'Guyana': '圭亚那', 'Honduras': '洪都拉斯',\n", " 'Croatia': '克罗地亚', 'Haiti': '海地', 'Hungary': '匈牙利', 'Indonesia': '印度尼西亚', 'India': '印度', 'Ireland': '爱尔兰',\n", " 'Iran': '伊朗', 'Iraq': '伊拉克', 'Iceland': '冰岛', 'Israel': '以色列', 'Italy': '意大利', 'Jamaica': '牙买加', 'Jordan': '约旦',\n", " 'Japan': '日本', 'Kazakhstan': '哈萨克斯坦', 'Kenya': '肯尼亚', 'Kyrgyzstan': '吉尔吉斯斯坦', 'Cambodia': '柬埔寨', 'Korea': '韩国',\n", " 'Kosovo': '科索沃', 'Kuwait': '科威特', 'Lao PDR': '老挝', 'Lebanon': '黎巴嫩', 'Liberia': '利比里亚', 'Libya': '利比亚',\n", " 'Sri Lanka': '斯里兰卡', 'Lesotho': '莱索托', 'Lithuania': '立陶宛', 'Luxembourg': '卢森堡', 'Latvia': '拉脱维亚', 'Morocco': '摩洛哥',\n", " 'Moldova': '摩尔多瓦', 'Madagascar': '马达加斯加', 'Mexico': '墨西哥', 'Macedonia': '马其顿', 'Mali': '马里', 'Myanmar': '缅甸',\n", " 'Montenegro': '黑山', 'Mongolia': '蒙古', 'Mozambique': '莫桑比克', 'Mauritania': '毛里塔尼亚', 'Malawi': '马拉维',\n", " 'Malaysia': '马来西亚', 'Namibia': '纳米比亚', 'New Caledonia': '新喀里多尼亚', 'Niger': '尼日尔', 'Nigeria': '尼日利亚',\n", " 'Nicaragua': '尼加拉瓜', 'Netherlands': '荷兰', 'Norway': '挪威', 'Nepal': '尼泊尔', 'New Zealand': '新西兰', 'Oman': '阿曼',\n", " 'Pakistan': '巴基斯坦', 'Panama': '巴拿马', 'Peru': '秘鲁', 'Philippines': '菲律宾', 'Papua New Guinea': '巴布亚新几内亚',\n", " 'Poland': '波兰', 'Puerto Rico': '波多黎各', 'Dem. Rep. Korea': '朝鲜', 'Portugal': '葡萄牙', 'Paraguay': '巴拉圭',\n", " 'Qatar': '卡塔尔', 'Romania': '罗马尼亚', 'Russia': '俄罗斯', 'Rwanda': '卢旺达', 'W. Sahara': '西撒哈拉', 'Saudi Arabia': '沙特阿拉伯',\n", " 'Sudan': '苏丹', 'S. Sudan': '南苏丹', 'Senegal': '塞内加尔', 'Solomon Is.': '所罗门群岛', 'Sierra Leone': '塞拉利昂',\n", " 'El Salvador': '萨尔瓦多', 'Somaliland': '索马里兰', 'Somalia': '索马里', 'Serbia': '塞尔维亚', 'Suriname': '苏里南',\n", " 'Slovakia': '斯洛伐克', 'Slovenia': '斯洛文尼亚', 'Sweden': '瑞典', 'Swaziland': '斯威士兰', 'Syria': '叙利亚', 'Chad': '乍得',\n", " 'Togo': '多哥', 'Thailand': '泰国', 'Tajikistan': '塔吉克斯坦', 'Turkmenistan': '土库曼斯坦', 'East Timor': '东帝汶',\n", " 'Trinidad and Tobago': '特里尼达和多巴哥', 'Tunisia': '突尼斯', 'Turkey': '土耳其', 'Tanzania': '坦桑尼亚', 'Uganda': '乌干达',\n", " 'Ukraine': '乌克兰', 'Uruguay': '乌拉圭', 'United States': '美国', 'Uzbekistan': '乌兹别克斯坦', 'Venezuela': '委内瑞拉',\n", " 'Vietnam': '越南', 'Vanuatu': '瓦努阿图', 'West Bank': '西岸', 'Yemen': '也门', 'South Africa': '南非', 'Zambia': '赞比亚',\n", " 'Zimbabwe': '津巴布韦', 'Comoros': '科摩罗'\n", "}\n", "\n", "province_map = {\n", " 'Anhui': '安徽', 'Beijing': '北京', 'Chongqing': '重庆', 'Fujian': '新疆', 'Gansu': '甘肃', 'Guangdong': '广东',\n", " 'Guangxi': '广西', 'Guizhou': '贵州', 'Hainan': '海南', 'Hebei': '河北', 'Heilongjiang': '黑龙江', 'Henan': '河南',\n", " 'Hong Kong': '香港', 'Hubei': '湖北', 'Hunan': '湖南', 'Inner Mongolia': '内蒙古', 'Jiangsu': '江苏',\n", " 'Jiangxi': '江西', 'Jilin': '吉林', 'Liaoning': '辽宁', 'Macau': '澳门', 'Ningxia': '宁夏', 'Qinghai': '青海',\n", " 'Shaanxi': '陕西', 'Shandong': '山东', 'Shanghai': '上海', 'Shanxi': '山西', 'Sichuan': '四川', 'Tianjin': '天津',\n", " 'Tibet': '西藏', 'Xinjiang': '新疆', 'Yunnan': '云南', 'Zhejiang': '浙江', 'Fujian':'福建', 'Taiwan': '台湾'\n", "}\n", "\n", "confirmed_data['Country/Region_zh'] = confirmed_data['Country/Region'].apply(lambda x: country_map.get(x, x))\n", "deaths_data['Country/Region_zh'] = deaths_data['Country/Region'].apply(lambda x: country_map.get(x, x))\n", "recovered_data['Country/Region_zh'] = recovered_data['Country/Region'].apply(lambda x: country_map.get(x, x))\n", "\n", "confirmed_data['Province/State_zh'] = confirmed_data['Province/State'].apply(lambda x: province_map.get(x, x))\n", "deaths_data['Province/State_zh'] = deaths_data['Province/State'].apply(lambda x: province_map.get(x, x))\n", "recovered_data['Province/State_zh'] = recovered_data['Province/State'].apply(lambda x: province_map.get(x, x))\n", "\n", "# 调整字段顺序\n", "confirmed_data = confirmed_data[['Province/State_zh', 'Country/Region_zh'] + confirmed_data.columns[:-2].to_list()]\n", "deaths_data = deaths_data[['Province/State_zh', 'Country/Region_zh'] + deaths_data.columns[:-2].to_list()]\n", "recovered_data = recovered_data[['Province/State_zh', 'Country/Region_zh'] + recovered_data.columns[:-2].to_list()]" ] }, { "cell_type": "markdown", "id": "incomplete-memorial", "metadata": {}, "source": [ "# 3、数据分析可视化 \n", "## 3.1 全球新冠疫情情况\n", "### 3.1.1 全球疫情现状" ] }, { "cell_type": "code", "execution_count": 4, "id": "minute-correlation", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", "
(3/20/21)全球疫情情况
\n", "由于数据集没有美国的治愈数据,所以治愈人数和治愈率都远低于实际,等待数据集完善
\n", "确诊人数 | \n", "死亡人数 | \n", "治愈人数 | \n", "死亡率 | \n", "治愈率 | \n", "
---|---|---|---|---|
122813796 | \n", "2709639 | \n", "69523087 | \n", "2.21% | \n", "56.61% | \n", "
(3/20/21)中国疫情情况
\n", "\n", "
确诊人数 | \n", "死亡人数 | \n", "治愈人数 | \n", "死亡率 | \n", "治愈率 | \n", "现有确诊人数 | \n", "
---|---|---|---|---|---|
102523 | \n", "4849 | \n", "97167 | \n", "4.73% | \n", "94.78% | \n", "507 | \n", "