{
"cells": [
{
"cell_type": "markdown",
"id": "human-franchise",
"metadata": {},
"source": [
"# 前言\n",
"\n",
"数据源是来自[和鲸社区](https://www.kesci.com/mw/dataset/5ffac64f3441fd001538228b/file)的 3 份互不相关的电商数据集,所以分成 3 部分,每部分只对其中的一个数据集进行分析。\n",
"\n",
"\n",
"## part 1\n",
"\n",
"tmall_order_report.csv 这个数据集是订单数据,可供挖掘的纬度有订单时间、省份(收货地址),指标则有销售量、销售额、退款金额、退货率、成交率、地区分布、下单时间趋势等。\n",
"\n",
"### 1、数据理解与处理"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "hawaiian-tournament",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" 订单编号 | \n",
" 总金额 | \n",
" 买家实际支付金额 | \n",
" 收货地址 | \n",
" 订单创建时间 | \n",
" 订单付款时间 | \n",
" 退款金额 | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1 | \n",
" 178.8 | \n",
" 0.0 | \n",
" 上海 | \n",
" 2020-02-21 00:00:00 | \n",
" NaN | \n",
" 0.0 | \n",
"
\n",
" \n",
" 1 | \n",
" 2 | \n",
" 21.0 | \n",
" 21.0 | \n",
" 内蒙古自治区 | \n",
" 2020-02-20 23:59:54 | \n",
" 2020-02-21 00:00:02 | \n",
" 0.0 | \n",
"
\n",
" \n",
" 2 | \n",
" 3 | \n",
" 37.0 | \n",
" 0.0 | \n",
" 安徽省 | \n",
" 2020-02-20 23:59:35 | \n",
" NaN | \n",
" 0.0 | \n",
"
\n",
" \n",
" 3 | \n",
" 4 | \n",
" 157.0 | \n",
" 157.0 | \n",
" 湖南省 | \n",
" 2020-02-20 23:58:34 | \n",
" 2020-02-20 23:58:44 | \n",
" 0.0 | \n",
"
\n",
" \n",
" 4 | \n",
" 5 | \n",
" 64.8 | \n",
" 0.0 | \n",
" 江苏省 | \n",
" 2020-02-20 23:57:04 | \n",
" 2020-02-20 23:57:11 | \n",
" 64.8 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" 订单编号 总金额 买家实际支付金额 收货地址 订单创建时间 订单付款时间 \\\n",
"0 1 178.8 0.0 上海 2020-02-21 00:00:00 NaN \n",
"1 2 21.0 21.0 内蒙古自治区 2020-02-20 23:59:54 2020-02-21 00:00:02 \n",
"2 3 37.0 0.0 安徽省 2020-02-20 23:59:35 NaN \n",
"3 4 157.0 157.0 湖南省 2020-02-20 23:58:34 2020-02-20 23:58:44 \n",
"4 5 64.8 0.0 江苏省 2020-02-20 23:57:04 2020-02-20 23:57:11 \n",
"\n",
" 退款金额 \n",
"0 0.0 \n",
"1 0.0 \n",
"2 0.0 \n",
"3 0.0 \n",
"4 64.8 "
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"\n",
"data = pd.read_csv('tmall_order_report.csv')\n",
"data.head() # 退款金额应该就是客户退货后,返还给客户的退款金额"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "dependent-analyst",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"RangeIndex: 28010 entries, 0 to 28009\n",
"Data columns (total 7 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 订单编号 28010 non-null int64 \n",
" 1 总金额 28010 non-null float64\n",
" 2 买家实际支付金额 28010 non-null float64\n",
" 3 收货地址 28010 non-null object \n",
" 4 订单创建时间 28010 non-null object \n",
" 5 订单付款时间 24087 non-null object \n",
" 6 退款金额 28010 non-null float64\n",
"dtypes: float64(3), int64(1), object(3)\n",
"memory usage: 1.5+ MB\n"
]
}
],
"source": [
"data.info() # 数据集情况 28010 条,6个字段"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "central-green",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Index(['订单编号', '总金额', '买家实际支付金额', '收货地址', '订单创建时间', '订单付款时间', '退款金额'], dtype='object')"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.columns = data.columns.str.strip() # 列名有空格,需要处理下\n",
"data.columns"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "decreased-sector",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"订单编号 0\n",
"总金额 0\n",
"买家实际支付金额 0\n",
"收货地址 0\n",
"订单创建时间 0\n",
"订单付款时间 0\n",
"退款金额 0\n",
"dtype: int64"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data[data.duplicated()].count() # 没有完全重复的数据"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "american-chuck",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"订单编号 0\n",
"总金额 0\n",
"买家实际支付金额 0\n",
"收货地址 0\n",
"订单创建时间 0\n",
"订单付款时间 3923\n",
"退款金额 0\n",
"dtype: int64"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.isnull().sum() # 付款时间存在空值,表示订单未付款"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "caring-flavor",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array(['上海', '内蒙古', '安徽', '湖南', '江苏', '浙江', '天津', '北京', '四川', '贵州', '辽宁',\n",
" '河南', '广西', '广东', '福建', '海南', '江西', '甘肃', '河北', '黑龙江', '云南', '重庆',\n",
" '山西', '吉林', '山东', '陕西', '湖北', '青海', '新疆', '宁夏', '西藏'], dtype=object)"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data['收货地址'] = data['收货地址'].str.replace('自治区|维吾尔|回族|壮族|省', '') # 对省份做个清洗,便于可视化\n",
"data['收货地址'].unique()"
]
},
{
"cell_type": "markdown",
"id": "rocky-employee",
"metadata": {},
"source": [
"### 2、数据分析可视化"
]
},
{
"cell_type": "markdown",
"id": "continent-alpha",
"metadata": {},
"source": [
"#### 2.1 整体情况"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "spanish-aerospace",
"metadata": {},
"outputs": [],
"source": [
"result = {}\n",
"result['总订单数'] = data['订单编号'].count() \n",
"result['已完成订单数'] = data['订单编号'][data['订单付款时间'].notnull()].count() \n",
"result['未付款订单数'] = data['订单编号'][data['订单付款时间'].isnull()].count() \n",
"result['退款订单数'] = data['订单编号'][data['退款金额'] > 0].count() \n",
"result['总订单金额'] = data['总金额'][data['订单付款时间'].notnull()].sum() \n",
"result['总退款金额'] = data['退款金额'][data['订单付款时间'].notnull()].sum() \n",
"result['总实际收入金额'] = data['买家实际支付金额'][data['订单付款时间'].notnull()].sum() "
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "surgical-settle",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'总订单数': 28010,\n",
" '已完成订单数': 24087,\n",
" '未付款订单数': 3923,\n",
" '退款订单数': 5646,\n",
" '总订单金额': 2474823.0700000003,\n",
" '总退款金额': 572335.9199999999,\n",
" '总实际收入金额': 1902487.1500000001}"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"result"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "direct-diana",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
" \n",
" \n",
"
整体情况
\n",
"
\n",
"
\n",
" \n",
" \n",
" 总订单数 | \n",
" 总订单金额 | \n",
" 已完成订单数 | \n",
" 总实际收入金额 | \n",
" 退款订单数 | \n",
" 总退款金额 | \n",
" 成交率 | \n",
" 退货率 | \n",
"
\n",
" \n",
" \n",
" \n",
" 28010 | \n",
" 247.48 万 | \n",
" 24087 | \n",
" 190.25 万 | \n",
" 5646 | \n",
" 57.23 万 | \n",
" 85.99% | \n",
" 23.44% | \n",
"
\n",
" \n",
"
\n",
"
\n",
"\n"
],
"text/plain": [
""
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from pyecharts import options as opts\n",
"from pyecharts.charts import Map, Bar, Line\n",
"from pyecharts.components import Table\n",
"from pyecharts.options import ComponentTitleOpts\n",
"from pyecharts.faker import Faker\n",
"\n",
"table = Table()\n",
"\n",
"headers = ['总订单数', '总订单金额', '已完成订单数', '总实际收入金额', '退款订单数', '总退款金额', '成交率', '退货率']\n",
"rows = [\n",
" [\n",
" result['总订单数'], f\"{result['总订单金额']/10000:.2f} 万\", result['已完成订单数'], f\"{result['总实际收入金额']/10000:.2f} 万\",\n",
" result['退款订单数'], f\"{result['总退款金额']/10000:.2f} 万\", \n",
" f\"{result['已完成订单数']/result['总订单数']:.2%}\",\n",
" f\"{result['退款订单数']/result['已完成订单数']:.2%}\",\n",
" ]\n",
"]\n",
"table.add(headers, rows)\n",
"table.set_global_opts(\n",
" title_opts=ComponentTitleOpts(title='整体情况')\n",
")\n",
"table.render_notebook()"
]
},
{
"cell_type": "markdown",
"id": "young-insert",
"metadata": {},
"source": [
"#### 2.2 地区分析"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "still-cradle",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"\n",
" \n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"result2 = data[data['订单付款时间'].notnull()].groupby('收货地址').agg({'订单编号':'count'})\n",
"result21 = result2.to_dict()['订单编号']\n",
"c = (\n",
" Map()\n",
" .add(\"订单量\", [*result21.items()], \"china\", is_map_symbol_show=False)\n",
" .set_series_opts(label_opts=opts.LabelOpts(is_show=True))\n",
" .set_global_opts(\n",
" title_opts=opts.TitleOpts(title='地区分布'),\n",
" visualmap_opts=opts.VisualMapOpts(max_=1000), \n",
" )\n",
")\n",
"c.render_notebook()"
]
},
{
"cell_type": "markdown",
"id": "hungry-bangkok",
"metadata": {},
"source": [
"#### 2.3 时间分析"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "behavioral-mumbai",
"metadata": {},
"outputs": [],
"source": [
"data['订单创建时间'] = pd.to_datetime(data['订单创建时间'])\n",
"data['订单付款时间'] = pd.to_datetime(data['订单付款时间'])"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "rough-torture",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"\n",
" \n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"result31 = data.groupby(data['订单创建时间'].apply(lambda x: x.strftime(\"%Y-%m-%d\"))).agg({'订单编号':'count'}).to_dict()['订单编号']\n",
"c = (\n",
" Line()\n",
" .add_xaxis(list(result31.keys()))\n",
" .add_yaxis(\"订单量\", list(result31.values()))\n",
" .set_series_opts(\n",
" label_opts=opts.LabelOpts(is_show=False),\n",
" markpoint_opts=opts.MarkPointOpts(\n",
" data=[\n",
" opts.MarkPointItem(type_=\"max\", name=\"最大值\"),\n",
" ]\n",
" ),\n",
" )\n",
" .set_global_opts(title_opts=opts.TitleOpts(title=\"每日订单量走势\"))\n",
")\n",
"c.render_notebook()"
]
},
{
"cell_type": "markdown",
"id": "nasty-priority",
"metadata": {},
"source": [
"从上图来看,2月份上半月由于受新冠疫情影响,订单量比较少,随着复工开展,下半月的订单量增长明显。"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "existing-hungarian",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"\n",
" \n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"result32 = data.groupby(data['订单创建时间'].apply(lambda x: x.strftime(\"%H\"))).agg({'订单编号':'count'}).to_dict()['订单编号']\n",
"x = [*result32.keys()]\n",
"y = [*result32.values()]\n",
"c = (\n",
" Bar()\n",
" .add_xaxis(x)\n",
" .add_yaxis(\"订单量\", y)\n",
" .set_global_opts(title_opts=opts.TitleOpts(title=\"每小时订单量走势\"))\n",
" .set_series_opts(\n",
" label_opts=opts.LabelOpts(is_show=False),\n",
" markpoint_opts=opts.MarkPointOpts(\n",
" data=[\n",
" opts.MarkPointItem(type_=\"max\", name=\"峰值\"),\n",
" opts.MarkPointItem(name=\"第二峰值\", coord=[x[15], y[15]], value=y[15]),\n",
" opts.MarkPointItem(name=\"第三峰值\", coord=[x[10], y[10]], value=y[10]),\n",
" ]\n",
" ),\n",
" )\n",
")\n",
"c.render_notebook()"
]
},
{
"cell_type": "markdown",
"id": "protecting-berkeley",
"metadata": {},
"source": [
"从每小时订单量走势来看,一天中有3个高峰期(10点、15点、21点),其中21点-22点之间是一天中订单量最多的时候,这个结果和之前 [1 亿条淘宝用户行为数据分析](https://github.com/TurboWay/bigdata_analyse/blob/main/UserBehaviorFromTaobao_Batch/用户行为数据分析.md) 的结果是一致的。对于卖家的指导意义就是,为了提高订单量,高峰期时应该尽量保证客服的回复速度,尤其是晚上21点-22点之间,所以很多做电商的基本都有夜班。"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "functional-yeast",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"7.7399046511949745"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"s = data['订单付款时间'] - data['订单创建时间']\n",
"s[s.notnull()].apply(lambda x: x.seconds / 60 ).mean() # 从下单到付款的平均耗时为 7.7 分钟"
]
},
{
"cell_type": "markdown",
"id": "running-conspiracy",
"metadata": {},
"source": [
"## part2\n",
"\n",
"双十一淘宝美妆数据.csv 这个数据集是美妆店铺的双十一销售数据,可以挖掘的纬度有日期、店铺,指标则有销售量、销售额、评论数等。\n",
"### 1、数据理解与处理"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "fresh-episode",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" update_time | \n",
" id | \n",
" title | \n",
" price | \n",
" sale_count | \n",
" comment_count | \n",
" 店名 | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 2016/11/14 | \n",
" A18164178225 | \n",
" CHANDO/自然堂 雪域精粹纯粹滋润霜50g 补水保湿 滋润水润面霜 | \n",
" 139.0 | \n",
" 26719.0 | \n",
" 2704.0 | \n",
" 自然堂 | \n",
"
\n",
" \n",
" 1 | \n",
" 2016/11/14 | \n",
" A18177105952 | \n",
" CHANDO/自然堂凝时鲜颜肌活乳液120ML 淡化细纹补水滋润专柜正品 | \n",
" 194.0 | \n",
" 8122.0 | \n",
" 1492.0 | \n",
" 自然堂 | \n",
"
\n",
" \n",
" 2 | \n",
" 2016/11/14 | \n",
" A18177226992 | \n",
" CHANDO/自然堂活泉保湿修护精华水(滋润型135ml 补水控油爽肤水 | \n",
" 99.0 | \n",
" 12668.0 | \n",
" 589.0 | \n",
" 自然堂 | \n",
"
\n",
" \n",
" 3 | \n",
" 2016/11/14 | \n",
" A18178033846 | \n",
" CHANDO/自然堂 男士劲爽控油洁面膏 100g 深层清洁 男士洗面奶 | \n",
" 38.0 | \n",
" 25805.0 | \n",
" 4287.0 | \n",
" 自然堂 | \n",
"
\n",
" \n",
" 4 | \n",
" 2016/11/14 | \n",
" A18178045259 | \n",
" CHANDO/自然堂雪域精粹纯粹滋润霜(清爽型)50g补水保湿滋润霜 | \n",
" 139.0 | \n",
" 5196.0 | \n",
" 618.0 | \n",
" 自然堂 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" update_time id title price \\\n",
"0 2016/11/14 A18164178225 CHANDO/自然堂 雪域精粹纯粹滋润霜50g 补水保湿 滋润水润面霜 139.0 \n",
"1 2016/11/14 A18177105952 CHANDO/自然堂凝时鲜颜肌活乳液120ML 淡化细纹补水滋润专柜正品 194.0 \n",
"2 2016/11/14 A18177226992 CHANDO/自然堂活泉保湿修护精华水(滋润型135ml 补水控油爽肤水 99.0 \n",
"3 2016/11/14 A18178033846 CHANDO/自然堂 男士劲爽控油洁面膏 100g 深层清洁 男士洗面奶 38.0 \n",
"4 2016/11/14 A18178045259 CHANDO/自然堂雪域精粹纯粹滋润霜(清爽型)50g补水保湿滋润霜 139.0 \n",
"\n",
" sale_count comment_count 店名 \n",
"0 26719.0 2704.0 自然堂 \n",
"1 8122.0 1492.0 自然堂 \n",
"2 12668.0 589.0 自然堂 \n",
"3 25805.0 4287.0 自然堂 \n",
"4 5196.0 618.0 自然堂 "
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"data2 = pd.read_csv('双十一淘宝美妆数据.csv')\n",
"data2.head()"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "annoying-guidance",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"RangeIndex: 27598 entries, 0 to 27597\n",
"Data columns (total 7 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 update_time 27598 non-null object \n",
" 1 id 27598 non-null object \n",
" 2 title 27598 non-null object \n",
" 3 price 27598 non-null float64\n",
" 4 sale_count 25244 non-null float64\n",
" 5 comment_count 25244 non-null float64\n",
" 6 店名 27598 non-null object \n",
"dtypes: float64(3), object(4)\n",
"memory usage: 1.5+ MB\n"
]
}
],
"source": [
"data2.info() # 数据集情况 28010 条,6个字段"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "silent-taxation",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"update_time 86\n",
"id 86\n",
"title 86\n",
"price 86\n",
"sale_count 82\n",
"comment_count 82\n",
"店名 86\n",
"dtype: int64"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data2[data2.duplicated()].count() # 有86条完全重复数据 "
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "neural-toner",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"update_time 0\n",
"id 0\n",
"title 0\n",
"price 0\n",
"sale_count 2350\n",
"comment_count 2350\n",
"店名 0\n",
"dtype: int64"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data2.drop_duplicates(inplace=True) # 删除重复数据\n",
"data2.reset_index(drop=True, inplace=True) # 重建索引\n",
"data2.isnull().sum() # 查看空值 ,销售数量和评论数有空值"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "binding-handle",
"metadata": {},
"outputs": [],
"source": [
"data2.fillna(0, inplace=True) # 空值填充\n",
"data2['update_time'] = pd.to_datetime(data2['update_time']).apply(lambda x: x.strftime(\"%Y-%m-%d\")) # 日期格式化,便于统计"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "radio-verse",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" update_time | \n",
" id | \n",
" title | \n",
" price | \n",
" sale_count | \n",
" comment_count | \n",
" 店名 | \n",
"
\n",
" \n",
" \n",
" \n",
" 27042 | \n",
" 2016-11-05 | \n",
" A541190557158 | \n",
" Herborist/佰草集新美肌梦幻曲面贴膜3片 保湿补水 | \n",
" 1.0 | \n",
" 1.0 | \n",
" 0.0 | \n",
" 佰草集 | \n",
"
\n",
" \n",
" 1494 | \n",
" 2016-11-10 | \n",
" A538981087285 | \n",
" 【双II预售】资生堂 新透白色控霜 30ml | \n",
" 390.0 | \n",
" 1.0 | \n",
" 0.0 | \n",
" 资生堂 | \n",
"
\n",
" \n",
" 24148 | \n",
" 2016-11-09 | \n",
" A540190519057 | \n",
" 【娇兰盛典】腮红亲密容和肌肤 裸妆感 自然持久玫瑰闰色腮红 | \n",
" 420.0 | \n",
" 1.0 | \n",
" 0.0 | \n",
" 娇兰 | \n",
"
\n",
" \n",
" 24147 | \n",
" 2016-11-09 | \n",
" A540189922026 | \n",
" 【娇兰盛典】丝柔蜜粉饼 营造细致透明妆感 柔滑细腻贴肤美颜 | \n",
" 480.0 | \n",
" 1.0 | \n",
" 1.0 | \n",
" 娇兰 | \n",
"
\n",
" \n",
" 16974 | \n",
" 2016-11-05 | \n",
" A541166044768 | \n",
" L'OREAL欧莱雅卓韵霜时尚魅棕系列染发霜 富含炫闪因子蜜茶棕红棕 | \n",
" 79.0 | \n",
" 1.0 | \n",
" 0.0 | \n",
" 欧莱雅 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" update_time id title price \\\n",
"27042 2016-11-05 A541190557158 Herborist/佰草集新美肌梦幻曲面贴膜3片 保湿补水 1.0 \n",
"1494 2016-11-10 A538981087285 【双II预售】资生堂 新透白色控霜 30ml 390.0 \n",
"24148 2016-11-09 A540190519057 【娇兰盛典】腮红亲密容和肌肤 裸妆感 自然持久玫瑰闰色腮红 420.0 \n",
"24147 2016-11-09 A540189922026 【娇兰盛典】丝柔蜜粉饼 营造细致透明妆感 柔滑细腻贴肤美颜 480.0 \n",
"16974 2016-11-05 A541166044768 L'OREAL欧莱雅卓韵霜时尚魅棕系列染发霜 富含炫闪因子蜜茶棕红棕 79.0 \n",
"\n",
" sale_count comment_count 店名 \n",
"27042 1.0 0.0 佰草集 \n",
"1494 1.0 0.0 资生堂 \n",
"24148 1.0 0.0 娇兰 \n",
"24147 1.0 1.0 娇兰 \n",
"16974 1.0 0.0 欧莱雅 "
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data2[data2['sale_count']>0].sort_values(by=['sale_count']).head() # 从数据来看,sale_count 是销售量"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "exterior-headquarters",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" update_time | \n",
" id | \n",
" title | \n",
" price | \n",
" sale_count | \n",
" comment_count | \n",
" 店名 | \n",
" sale_amount | \n",
"
\n",
" \n",
" \n",
" \n",
" 27042 | \n",
" 2016-11-05 | \n",
" A541190557158 | \n",
" Herborist/佰草集新美肌梦幻曲面贴膜3片 保湿补水 | \n",
" 1.0 | \n",
" 1.0 | \n",
" 0.0 | \n",
" 佰草集 | \n",
" 1.0 | \n",
"
\n",
" \n",
" 1494 | \n",
" 2016-11-10 | \n",
" A538981087285 | \n",
" 【双II预售】资生堂 新透白色控霜 30ml | \n",
" 390.0 | \n",
" 1.0 | \n",
" 0.0 | \n",
" 资生堂 | \n",
" 390.0 | \n",
"
\n",
" \n",
" 24148 | \n",
" 2016-11-09 | \n",
" A540190519057 | \n",
" 【娇兰盛典】腮红亲密容和肌肤 裸妆感 自然持久玫瑰闰色腮红 | \n",
" 420.0 | \n",
" 1.0 | \n",
" 0.0 | \n",
" 娇兰 | \n",
" 420.0 | \n",
"
\n",
" \n",
" 24147 | \n",
" 2016-11-09 | \n",
" A540189922026 | \n",
" 【娇兰盛典】丝柔蜜粉饼 营造细致透明妆感 柔滑细腻贴肤美颜 | \n",
" 480.0 | \n",
" 1.0 | \n",
" 1.0 | \n",
" 娇兰 | \n",
" 480.0 | \n",
"
\n",
" \n",
" 16974 | \n",
" 2016-11-05 | \n",
" A541166044768 | \n",
" L'OREAL欧莱雅卓韵霜时尚魅棕系列染发霜 富含炫闪因子蜜茶棕红棕 | \n",
" 79.0 | \n",
" 1.0 | \n",
" 0.0 | \n",
" 欧莱雅 | \n",
" 79.0 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 17470 | \n",
" 2016-11-10 | \n",
" A24304992630 | \n",
" 德国妮维雅男士洗面奶控油祛痘印保湿去黑头去油清洁面乳液护肤品 | \n",
" 42.0 | \n",
" 1827562.0 | \n",
" 200154.0 | \n",
" 妮维雅 | \n",
" 76757604.0 | \n",
"
\n",
" \n",
" 17339 | \n",
" 2016-11-11 | \n",
" A24304992630 | \n",
" 2瓶更划算*妮维雅男士洗面奶控油祛痘印保湿去黑头去油洁面乳护肤 | \n",
" 35.0 | \n",
" 1886100.0 | \n",
" 199532.0 | \n",
" 妮维雅 | \n",
" 66013500.0 | \n",
"
\n",
" \n",
" 17228 | \n",
" 2016-11-12 | \n",
" A24304992630 | \n",
" 德国妮维雅男士洗面奶控油祛痘印保湿去黑头去油清洁面乳液护肤品 | \n",
" 37.9 | \n",
" 1920083.0 | \n",
" 199062.0 | \n",
" 妮维雅 | \n",
" 72771145.7 | \n",
"
\n",
" \n",
" 17126 | \n",
" 2016-11-13 | \n",
" A24304992630 | \n",
" 德国妮维雅男士洗面奶控油祛痘印保湿去黑头去油清洁面乳液护肤品 | \n",
" 37.9 | \n",
" 1921582.0 | \n",
" 198774.0 | \n",
" 妮维雅 | \n",
" 72827957.8 | \n",
"
\n",
" \n",
" 17026 | \n",
" 2016-11-14 | \n",
" A24304992630 | \n",
" 德国妮维雅男士洗面奶控油祛痘印保湿去黑头去油清洁面乳液护肤品 | \n",
" 42.0 | \n",
" 1923160.0 | \n",
" 197949.0 | \n",
" 妮维雅 | \n",
" 80772720.0 | \n",
"
\n",
" \n",
"
\n",
"
24398 rows × 8 columns
\n",
"
"
],
"text/plain": [
" update_time id title price \\\n",
"27042 2016-11-05 A541190557158 Herborist/佰草集新美肌梦幻曲面贴膜3片 保湿补水 1.0 \n",
"1494 2016-11-10 A538981087285 【双II预售】资生堂 新透白色控霜 30ml 390.0 \n",
"24148 2016-11-09 A540190519057 【娇兰盛典】腮红亲密容和肌肤 裸妆感 自然持久玫瑰闰色腮红 420.0 \n",
"24147 2016-11-09 A540189922026 【娇兰盛典】丝柔蜜粉饼 营造细致透明妆感 柔滑细腻贴肤美颜 480.0 \n",
"16974 2016-11-05 A541166044768 L'OREAL欧莱雅卓韵霜时尚魅棕系列染发霜 富含炫闪因子蜜茶棕红棕 79.0 \n",
"... ... ... ... ... \n",
"17470 2016-11-10 A24304992630 德国妮维雅男士洗面奶控油祛痘印保湿去黑头去油清洁面乳液护肤品 42.0 \n",
"17339 2016-11-11 A24304992630 2瓶更划算*妮维雅男士洗面奶控油祛痘印保湿去黑头去油洁面乳护肤 35.0 \n",
"17228 2016-11-12 A24304992630 德国妮维雅男士洗面奶控油祛痘印保湿去黑头去油清洁面乳液护肤品 37.9 \n",
"17126 2016-11-13 A24304992630 德国妮维雅男士洗面奶控油祛痘印保湿去黑头去油清洁面乳液护肤品 37.9 \n",
"17026 2016-11-14 A24304992630 德国妮维雅男士洗面奶控油祛痘印保湿去黑头去油清洁面乳液护肤品 42.0 \n",
"\n",
" sale_count comment_count 店名 sale_amount \n",
"27042 1.0 0.0 佰草集 1.0 \n",
"1494 1.0 0.0 资生堂 390.0 \n",
"24148 1.0 0.0 娇兰 420.0 \n",
"24147 1.0 1.0 娇兰 480.0 \n",
"16974 1.0 0.0 欧莱雅 79.0 \n",
"... ... ... ... ... \n",
"17470 1827562.0 200154.0 妮维雅 76757604.0 \n",
"17339 1886100.0 199532.0 妮维雅 66013500.0 \n",
"17228 1920083.0 199062.0 妮维雅 72771145.7 \n",
"17126 1921582.0 198774.0 妮维雅 72827957.8 \n",
"17026 1923160.0 197949.0 妮维雅 80772720.0 \n",
"\n",
"[24398 rows x 8 columns]"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data2['sale_amount'] = data2['price'] * data2['sale_count'] # 增加一列销售额\n",
"data2[data2['sale_count']>0].sort_values(by=['sale_count'])"
]
},
{
"cell_type": "markdown",
"id": "supreme-performer",
"metadata": {},
"source": [
"### 2、数据分析与可视化"
]
},
{
"cell_type": "markdown",
"id": "traditional-third",
"metadata": {},
"source": [
"#### 2.1 每日整体销售量走势"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "unnecessary-ownership",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"\n",
" \n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"result = data2.groupby('update_time').agg({'sale_count':'sum'}).to_dict()['sale_count']\n",
"c = (\n",
" Line()\n",
" .add_xaxis(list(result.keys()))\n",
" .add_yaxis(\"销售量\", list(result.values()))\n",
" .set_series_opts(\n",
" areastyle_opts=opts.AreaStyleOpts(opacity=0.5),\n",
" label_opts=opts.LabelOpts(is_show=False),\n",
" markpoint_opts=opts.MarkPointOpts(\n",
" data=[\n",
" opts.MarkPointItem(type_=\"max\", name=\"最大值\"),\n",
" opts.MarkPointItem(type_=\"min\", name=\"最小值\"),\n",
" opts.MarkPointItem(type_=\"average\", name=\"平均值\"),\n",
" ]\n",
" ),\n",
" )\n",
" .set_global_opts(title_opts=opts.TitleOpts(title=\"每日整体销售量走势\"))\n",
")\n",
"c.render_notebook()"
]
},
{
"cell_type": "markdown",
"id": "technical-vitamin",
"metadata": {},
"source": [
"#### 2.2 谁家的化妆品卖的最好"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "blessed-wagon",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['2016-11-05',\n",
" '2016-11-06',\n",
" '2016-11-07',\n",
" '2016-11-08',\n",
" '2016-11-09',\n",
" '2016-11-10',\n",
" '2016-11-11',\n",
" '2016-11-12',\n",
" '2016-11-13',\n",
" '2016-11-14']"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dts = list(data2['update_time'].unique())\n",
"dts.reverse()\n",
"dts"
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "olive-accident",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"\n",
" \n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from pyecharts import options as opts\n",
"from pyecharts.charts import Map, Timeline, Bar, Line, Pie\n",
"from pyecharts.components import Table\n",
"from pyecharts.options import ComponentTitleOpts\n",
"\n",
"tl = Timeline()\n",
"tl.add_schema(\n",
"# is_auto_play=True,\n",
" is_loop_play=False,\n",
" play_interval=500,\n",
" )\n",
"for dt in dts:\n",
" item = data2[data2['update_time'] <= dt].groupby('店名').agg({'sale_count': 'sum', 'sale_amount': 'sum'}).sort_values(by='sale_count', ascending=False)[:10].sort_values(by='sale_count').to_dict()\n",
" bar = (\n",
" Bar()\n",
" .add_xaxis([*item['sale_count'].keys()])\n",
" .add_yaxis(\"销售量\", [round(val/10000,2) for val in item['sale_count'].values()], label_opts=opts.LabelOpts(position=\"right\", formatter='{@[1]/} 万'))\n",
" .add_yaxis(\"销售额\", [round(val/10000/10000,2) for val in item['sale_amount'].values()], label_opts=opts.LabelOpts(position=\"right\", formatter='{@[1]/} 亿元'))\n",
" .reversal_axis()\n",
" .set_global_opts(\n",
" title_opts=opts.TitleOpts(\"累计销售量排行 TOP10\")\n",
" )\n",
" )\n",
" tl.add(bar, dt)\n",
"tl.render_notebook()"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "positive-basement",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"\n",
" \n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"item = data2.groupby('店名').agg({'sale_count': 'sum'}).sort_values(by='sale_count', ascending=False)[:10].to_dict()['sale_count']\n",
"item = {k: round(v/10000, 2) for k, v in item.items()}\n",
"c = (\n",
" Pie()\n",
" .add(\"销量\", [*item.items()])\n",
" .set_series_opts(label_opts=opts.LabelOpts(formatter=\"{b}: {c} 万({d}%)\"))\n",
")\n",
"c.render_notebook()"
]
},
{
"cell_type": "markdown",
"id": "confident-mattress",
"metadata": {},
"source": [
"#### 2.4 谁家的化妆品最贵"
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "acute-wesley",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"\n",
" \n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"item = data2.groupby('店名').agg({'price': 'mean'}).sort_values(by='price', ascending=False)[:20].sort_values(by='price').to_dict()\n",
"c = (\n",
" Bar()\n",
" .add_xaxis([*item['price'].keys()])\n",
" .add_yaxis(\"销售量\", [round(v, 2) for v in item['price'].values()], label_opts=opts.LabelOpts(position=\"right\"))\n",
" .reversal_axis()\n",
" .set_global_opts(\n",
" title_opts=opts.TitleOpts(\"平均价格排行 TOP20\")\n",
" )\n",
")\n",
"c.render_notebook()"
]
},
{
"cell_type": "markdown",
"id": "virgin-purple",
"metadata": {},
"source": [
"## part3\n",
"\n",
"日化.xlsx 这个数据集是美妆类商品的订单数据,从数量来看,应该是批发类的订单。包含两个 sheet 页(订单表和商品表),可以挖掘的纬度有日期、地区、商品,指标则有销售量、销售额、增长率等。\n",
"### 1、数据理解与处理"
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "otherwise-drama",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd \n",
"fact_order = pd.read_excel('日化.xlsx', sheet_name='销售订单表')\n",
"dim_product = pd.read_excel('日化.xlsx', sheet_name='商品信息表')"
]
},
{
"cell_type": "markdown",
"id": "underlying-registrar",
"metadata": {},
"source": [
"#### 1.1 商品表数据清洗"
]
},
{
"cell_type": "code",
"execution_count": 28,
"id": "sexual-austin",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" 商品编号 | \n",
" 商品名称 | \n",
" 商品小类 | \n",
" 商品大类 | \n",
" 销售单价 | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" X001 | \n",
" 商品1 | \n",
" 面膜 | \n",
" 护肤品 | \n",
" 121 | \n",
"
\n",
" \n",
" 1 | \n",
" X002 | \n",
" 商品2 | \n",
" 面膜 | \n",
" 护肤品 | \n",
" 141 | \n",
"
\n",
" \n",
" 2 | \n",
" X003 | \n",
" 商品3 | \n",
" 面膜 | \n",
" 护肤品 | \n",
" 168 | \n",
"
\n",
" \n",
" 3 | \n",
" X004 | \n",
" 商品4 | \n",
" 面膜 | \n",
" 护肤品 | \n",
" 211 | \n",
"
\n",
" \n",
" 4 | \n",
" X005 | \n",
" 商品5 | \n",
" 面膜 | \n",
" 护肤品 | \n",
" 185 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" 商品编号 商品名称 商品小类 商品大类 销售单价\n",
"0 X001 商品1 面膜 护肤品 121\n",
"1 X002 商品2 面膜 护肤品 141\n",
"2 X003 商品3 面膜 护肤品 168\n",
"3 X004 商品4 面膜 护肤品 211\n",
"4 X005 商品5 面膜 护肤品 185"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dim_product.head()"
]
},
{
"cell_type": "code",
"execution_count": 29,
"id": "assumed-ideal",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" 销售单价 | \n",
"
\n",
" \n",
" \n",
" \n",
" count | \n",
" 122.000000 | \n",
"
\n",
" \n",
" mean | \n",
" 156.155738 | \n",
"
\n",
" \n",
" std | \n",
" 58.454619 | \n",
"
\n",
" \n",
" min | \n",
" 56.000000 | \n",
"
\n",
" \n",
" 25% | \n",
" 102.250000 | \n",
"
\n",
" \n",
" 50% | \n",
" 158.000000 | \n",
"
\n",
" \n",
" 75% | \n",
" 210.750000 | \n",
"
\n",
" \n",
" max | \n",
" 253.000000 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" 销售单价\n",
"count 122.000000\n",
"mean 156.155738\n",
"std 58.454619\n",
"min 56.000000\n",
"25% 102.250000\n",
"50% 158.000000\n",
"75% 210.750000\n",
"max 253.000000"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dim_product.describe()"
]
},
{
"cell_type": "code",
"execution_count": 30,
"id": "organizational-tract",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"商品编号 0\n",
"商品名称 0\n",
"商品小类 0\n",
"商品大类 0\n",
"销售单价 0\n",
"dtype: int64"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dim_product[dim_product.duplicated()].count() # 没有完全重复的数据"
]
},
{
"cell_type": "code",
"execution_count": 31,
"id": "diagnostic-western",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"商品编号 0\n",
"商品名称 0\n",
"商品小类 0\n",
"商品大类 0\n",
"销售单价 0\n",
"dtype: int64"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dim_product[dim_product['商品编号'].duplicated()].count() # ID 唯一没有重复"
]
},
{
"cell_type": "code",
"execution_count": 32,
"id": "atomic-challenge",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"商品编号 0\n",
"商品名称 0\n",
"商品小类 0\n",
"商品大类 0\n",
"销售单价 0\n",
"dtype: int64"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dim_product.isnull().sum() # 没有空值 "
]
},
{
"cell_type": "markdown",
"id": "olive-massage",
"metadata": {},
"source": [
"#### 1.2 订单表数据清洗"
]
},
{
"cell_type": "code",
"execution_count": 33,
"id": "indie-profit",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" 订单编码 | \n",
" 订单日期 | \n",
" 客户编码 | \n",
" 所在区域 | \n",
" 所在省份 | \n",
" 所在地市 | \n",
" 商品编号 | \n",
" 订购数量 | \n",
" 订购单价 | \n",
" 金额 | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" D31313 | \n",
" 2019-05-16 00:00:00 | \n",
" S22796 | \n",
" 东区 | \n",
" 浙江省 | \n",
" 台州市 | \n",
" X091 | \n",
" 892 | \n",
" 214 | \n",
" 190888.0 | \n",
"
\n",
" \n",
" 1 | \n",
" D21329 | \n",
" 2019-05-14 00:00:00 | \n",
" S11460 | \n",
" 东区 | \n",
" 安徽省 | \n",
" 宿州市 | \n",
" X005 | \n",
" 276 | \n",
" 185 | \n",
" 51060.0 | \n",
"
\n",
" \n",
" 2 | \n",
" D22372 | \n",
" 2019-08-26 00:00:00 | \n",
" S11101 | \n",
" 北区 | \n",
" 山西省 | \n",
" 忻州市 | \n",
" X078 | \n",
" 1450 | \n",
" 116 | \n",
" 168200.0 | \n",
"
\n",
" \n",
" 3 | \n",
" D31078 | \n",
" 2019-04-08 00:00:00 | \n",
" S10902 | \n",
" 北区 | \n",
" 吉林省 | \n",
" 延边朝鲜族自治州 | \n",
" X025 | \n",
" 1834 | \n",
" 102 | \n",
" 187068.0 | \n",
"
\n",
" \n",
" 4 | \n",
" D32470 | \n",
" 2019-04-11 00:00:00 | \n",
" S18696 | \n",
" 北区 | \n",
" 北京市 | \n",
" 北京市 | \n",
" X010 | \n",
" 887 | \n",
" 58 | \n",
" 51446.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" 订单编码 订单日期 客户编码 所在区域 所在省份 所在地市 商品编号 订购数量 订购单价 \\\n",
"0 D31313 2019-05-16 00:00:00 S22796 东区 浙江省 台州市 X091 892 214 \n",
"1 D21329 2019-05-14 00:00:00 S11460 东区 安徽省 宿州市 X005 276 185 \n",
"2 D22372 2019-08-26 00:00:00 S11101 北区 山西省 忻州市 X078 1450 116 \n",
"3 D31078 2019-04-08 00:00:00 S10902 北区 吉林省 延边朝鲜族自治州 X025 1834 102 \n",
"4 D32470 2019-04-11 00:00:00 S18696 北区 北京市 北京市 X010 887 58 \n",
"\n",
" 金额 \n",
"0 190888.0 \n",
"1 51060.0 \n",
"2 168200.0 \n",
"3 187068.0 \n",
"4 51446.0 "
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fact_order.head()"
]
},
{
"cell_type": "code",
"execution_count": 34,
"id": "bibliographic-island",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"RangeIndex: 31452 entries, 0 to 31451\n",
"Data columns (total 10 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 订单编码 31452 non-null object \n",
" 1 订单日期 31452 non-null object \n",
" 2 客户编码 31452 non-null object \n",
" 3 所在区域 31450 non-null object \n",
" 4 所在省份 31450 non-null object \n",
" 5 所在地市 31452 non-null object \n",
" 6 商品编号 31451 non-null object \n",
" 7 订购数量 31450 non-null object \n",
" 8 订购单价 31448 non-null object \n",
" 9 金额 31448 non-null float64\n",
"dtypes: float64(1), object(9)\n",
"memory usage: 2.4+ MB\n"
]
}
],
"source": [
"fact_order.info()"
]
},
{
"cell_type": "code",
"execution_count": 35,
"id": "intensive-houston",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"订单编码 6\n",
"订单日期 6\n",
"客户编码 6\n",
"所在区域 6\n",
"所在省份 6\n",
"所在地市 6\n",
"商品编号 6\n",
"订购数量 6\n",
"订购单价 6\n",
"金额 6\n",
"dtype: int64"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fact_order[fact_order.duplicated()].count() # 没有完全重复的数据"
]
},
{
"cell_type": "code",
"execution_count": 36,
"id": "amber-wedding",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"订单编码 0\n",
"订单日期 0\n",
"客户编码 0\n",
"所在区域 2\n",
"所在省份 2\n",
"所在地市 0\n",
"商品编号 1\n",
"订购数量 2\n",
"订购单价 4\n",
"金额 4\n",
"dtype: int64"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fact_order.drop_duplicates(inplace=True) # 删除重复数据\n",
"fact_order.reset_index(drop=True, inplace=True) # 重建索引\n",
"fact_order.isnull().sum() # 查看空值,有几条数据缺失"
]
},
{
"cell_type": "code",
"execution_count": 37,
"id": "suspected-rhythm",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"订单编码 0\n",
"订单日期 0\n",
"客户编码 0\n",
"所在区域 0\n",
"所在省份 0\n",
"所在地市 0\n",
"商品编号 0\n",
"订购数量 0\n",
"订购单价 0\n",
"金额 0\n",
"dtype: int64"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fact_order.fillna(method='bfill', inplace=True) # 空值填充\n",
"fact_order.fillna(method='ffill', inplace=True) # 空值填充\n",
"fact_order.isnull().sum() # 查看空值,有几条数据缺失"
]
},
{
"cell_type": "code",
"execution_count": 38,
"id": "golden-photograph",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" 订单编码 | \n",
" 订单日期 | \n",
" 客户编码 | \n",
" 所在区域 | \n",
" 所在省份 | \n",
" 所在地市 | \n",
" 商品编号 | \n",
" 订购数量 | \n",
" 订购单价 | \n",
" 金额 | \n",
"
\n",
" \n",
" \n",
" \n",
" 20797 | \n",
" D26533 | \n",
" 2050-06-09 | \n",
" S21396 | \n",
" 北区 | \n",
" 河北省 | \n",
" 石家庄市 | \n",
" X022 | \n",
" 759 | \n",
" 158 | \n",
" 119922.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" 订单编码 订单日期 客户编码 所在区域 所在省份 所在地市 商品编号 订购数量 订购单价 金额\n",
"20797 D26533 2050-06-09 S21396 北区 河北省 石家庄市 X022 759 158 119922.0"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fact_order['订单日期'] = fact_order['订单日期'].apply(lambda x: pd.to_datetime(x, format='%Y#%m#%d') if isinstance(x, str) else x)\n",
"fact_order[fact_order['订单日期'] > '2021-01-01'] # 有一条脏数据"
]
},
{
"cell_type": "code",
"execution_count": 39,
"id": "lined-throw",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(Timestamp('2019-09-30 00:00:00'), Timestamp('2019-01-01 00:00:00'))"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fact_order = fact_order[fact_order['订单日期'] < '2021-01-01'] # 过滤掉脏数据\n",
"fact_order['订单日期'].max(), fact_order['订单日期'].min() # 数据区间在 2019-01-01 到 2019-09-30 之间"
]
},
{
"cell_type": "code",
"execution_count": 40,
"id": "coated-boston",
"metadata": {},
"outputs": [],
"source": [
"fact_order['订购数量'] = fact_order['订购数量'].apply(lambda x: x.strip('个') if isinstance(x, str) else x).astype('int')\n",
"fact_order['订购单价'] = fact_order['订购单价'].apply(lambda x: x.strip('元') if isinstance(x, str) else x).astype('float')\n",
"fact_order['金额'] = fact_order['金额'].astype('float')"
]
},
{
"cell_type": "code",
"execution_count": 41,
"id": "regular-colorado",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Int64Index: 31445 entries, 0 to 31445\n",
"Data columns (total 10 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 订单编码 31445 non-null object \n",
" 1 订单日期 31445 non-null datetime64[ns]\n",
" 2 客户编码 31445 non-null object \n",
" 3 所在区域 31445 non-null object \n",
" 4 所在省份 31445 non-null object \n",
" 5 所在地市 31445 non-null object \n",
" 6 商品编号 31445 non-null object \n",
" 7 订购数量 31445 non-null int32 \n",
" 8 订购单价 31445 non-null float64 \n",
" 9 金额 31445 non-null float64 \n",
"dtypes: datetime64[ns](1), float64(2), int32(1), object(6)\n",
"memory usage: 2.5+ MB\n"
]
}
],
"source": [
"fact_order.info()"
]
},
{
"cell_type": "code",
"execution_count": 42,
"id": "potential-recording",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array(['浙江', '安徽', '山西', '吉林', '北京', '云南', '广东', '广西', '内蒙古', '新疆', '湖北',\n",
" '江苏', '甘肃', '四川', '河南', '福建', '陕西', '辽宁', '山东', '江西', '重庆', '河北',\n",
" '湖南', '上海', '贵州', '天津', '海南', '宁夏', '黑龙江'], dtype=object)"
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fact_order['所在省份'] = fact_order['所在省份'].str.replace('自治区|维吾尔|回族|壮族|省|市', '') # 对省份做个清洗,便于可视化\n",
"fact_order['所在省份'].unique()"
]
},
{
"cell_type": "code",
"execution_count": 43,
"id": "usual-suspect",
"metadata": {},
"outputs": [],
"source": [
"fact_order['客户编码'] = fact_order['客户编码'].str.replace('编号', '')"
]
},
{
"cell_type": "markdown",
"id": "speaking-angle",
"metadata": {},
"source": [
"### 2、数据分析与可视化\n",
"#### 2.1 每月订购情况"
]
},
{
"cell_type": "code",
"execution_count": 44,
"id": "french-entry",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"\n",
" \n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from pyecharts import options as opts\n",
"from pyecharts.charts import Map, Bar, Line\n",
"from pyecharts.components import Table\n",
"from pyecharts.options import ComponentTitleOpts\n",
"from pyecharts.faker import Faker\n",
"\n",
"fact_order['订单月份'] = fact_order['订单日期'].apply(lambda x: x.month) \n",
"item = fact_order.groupby('订单月份').agg({'订购数量': 'sum', '金额': 'sum'}).to_dict()\n",
"x = [f'{key} 月' for key in item['订购数量'].keys()]\n",
"y1 = [round(val/10000, 2) for val in item['订购数量'].values()]\n",
"y2 = [round(val/10000/10000, 2) for val in item['金额'].values()]\n",
"c = (\n",
" Bar()\n",
" .add_xaxis(x)\n",
" .add_yaxis(\"订购数量(万件)\", y1, is_selected=False)\n",
" .add_yaxis(\"金额(亿元)\", y2)\n",
" .set_global_opts(title_opts=opts.TitleOpts(title=\"每月订购情况\"))\n",
" .set_series_opts(\n",
" label_opts=opts.LabelOpts(is_show=True),\n",
" )\n",
")\n",
"c.render_notebook()"
]
},
{
"cell_type": "markdown",
"id": "sitting-yacht",
"metadata": {},
"source": [
"#### 2.2 哪里的人最爱美"
]
},
{
"cell_type": "code",
"execution_count": 45,
"id": "incomplete-maria",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"\n",
" \n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"execution_count": 45,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"item = fact_order.groupby('所在地市').agg({'订购数量': 'sum'}).sort_values(by='订购数量', ascending=False)[:20].sort_values(by='订购数量').to_dict()['订购数量']\n",
"\n",
"c = (\n",
" Bar()\n",
" .add_xaxis([*item.keys()])\n",
" .add_yaxis(\"订购量\", [round(v/10000, 2) for v in item.values()], label_opts=opts.LabelOpts(position=\"right\", formatter='{@[1]/} 万'))\n",
" .reversal_axis()\n",
" .set_global_opts(\n",
" title_opts=opts.TitleOpts(\"订购数量排行 TOP20\")\n",
" )\n",
")\n",
"c.render_notebook()"
]
},
{
"cell_type": "markdown",
"id": "broken-estimate",
"metadata": {},
"source": [
"#### 2.3 什么类型的美妆需求量最大"
]
},
{
"cell_type": "code",
"execution_count": 46,
"id": "seasonal-particle",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" 订单编码 | \n",
" 订单日期 | \n",
" 客户编码 | \n",
" 所在区域 | \n",
" 所在省份 | \n",
" 所在地市 | \n",
" 商品编号 | \n",
" 订购数量 | \n",
" 订购单价 | \n",
" 金额 | \n",
" 订单月份 | \n",
" 商品名称 | \n",
" 商品小类 | \n",
" 商品大类 | \n",
" 销售单价 | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" D31313 | \n",
" 2019-05-16 | \n",
" S22796 | \n",
" 东区 | \n",
" 浙江 | \n",
" 台州市 | \n",
" X091 | \n",
" 892 | \n",
" 214.0 | \n",
" 190888.0 | \n",
" 5 | \n",
" 商品91 | \n",
" 粉底 | \n",
" 彩妆 | \n",
" 214 | \n",
"
\n",
" \n",
" 1 | \n",
" D26674 | \n",
" 2019-05-01 | \n",
" S15128 | \n",
" 东区 | \n",
" 江苏 | \n",
" 南通市 | \n",
" X091 | \n",
" 1133 | \n",
" 214.0 | \n",
" 242462.0 | \n",
" 5 | \n",
" 商品91 | \n",
" 粉底 | \n",
" 彩妆 | \n",
" 214 | \n",
"
\n",
" \n",
" 2 | \n",
" D23381 | \n",
" 2019-09-22 | \n",
" S17133 | \n",
" 东区 | \n",
" 江苏 | \n",
" 宿迁市 | \n",
" X091 | \n",
" 1136 | \n",
" 214.0 | \n",
" 243104.0 | \n",
" 9 | \n",
" 商品91 | \n",
" 粉底 | \n",
" 彩妆 | \n",
" 214 | \n",
"
\n",
" \n",
" 3 | \n",
" D29060 | \n",
" 2019-09-10 | \n",
" S14106 | \n",
" 东区 | \n",
" 江苏 | \n",
" 常州市 | \n",
" X091 | \n",
" 544 | \n",
" 214.0 | \n",
" 116416.0 | \n",
" 9 | \n",
" 商品91 | \n",
" 粉底 | \n",
" 彩妆 | \n",
" 214 | \n",
"
\n",
" \n",
" 4 | \n",
" D21234 | \n",
" 2019-07-03 | \n",
" S17197 | \n",
" 东区 | \n",
" 湖北 | \n",
" 十堰市 | \n",
" X091 | \n",
" 342 | \n",
" 214.0 | \n",
" 73188.0 | \n",
" 7 | \n",
" 商品91 | \n",
" 粉底 | \n",
" 彩妆 | \n",
" 214 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 31439 | \n",
" D30482 | \n",
" 2019-06-05 | \n",
" S11033 | \n",
" 东区 | \n",
" 浙江 | \n",
" 金华市 | \n",
" X118 | \n",
" 551 | \n",
" 238.0 | \n",
" 131138.0 | \n",
" 6 | \n",
" 商品118 | \n",
" 蜜粉 | \n",
" 彩妆 | \n",
" 238 | \n",
"
\n",
" \n",
" 31440 | \n",
" D29542 | \n",
" 2019-05-01 | \n",
" S12446 | \n",
" 东区 | \n",
" 江苏 | \n",
" 南通市 | \n",
" X118 | \n",
" 165 | \n",
" 238.0 | \n",
" 39270.0 | \n",
" 5 | \n",
" 商品118 | \n",
" 蜜粉 | \n",
" 彩妆 | \n",
" 238 | \n",
"
\n",
" \n",
" 31441 | \n",
" D24798 | \n",
" 2019-06-26 | \n",
" S16170 | \n",
" 南区 | \n",
" 福建 | \n",
" 泉州市 | \n",
" X118 | \n",
" 62 | \n",
" 238.0 | \n",
" 14756.0 | \n",
" 6 | \n",
" 商品118 | \n",
" 蜜粉 | \n",
" 彩妆 | \n",
" 238 | \n",
"
\n",
" \n",
" 31442 | \n",
" D31831 | \n",
" 2019-08-13 | \n",
" S22214 | \n",
" 北区 | \n",
" 黑龙江 | \n",
" 佳木斯市 | \n",
" X118 | \n",
" 795 | \n",
" 238.0 | \n",
" 189210.0 | \n",
" 8 | \n",
" 商品118 | \n",
" 蜜粉 | \n",
" 彩妆 | \n",
" 238 | \n",
"
\n",
" \n",
" 31443 | \n",
" D28890 | \n",
" 2019-07-21 | \n",
" S15066 | \n",
" 西区 | \n",
" 四川 | \n",
" 南充市 | \n",
" X118 | \n",
" 148 | \n",
" 238.0 | \n",
" 35224.0 | \n",
" 7 | \n",
" 商品118 | \n",
" 蜜粉 | \n",
" 彩妆 | \n",
" 238 | \n",
"
\n",
" \n",
"
\n",
"
31444 rows × 15 columns
\n",
"
"
],
"text/plain": [
" 订单编码 订单日期 客户编码 所在区域 所在省份 所在地市 商品编号 订购数量 订购单价 金额 \\\n",
"0 D31313 2019-05-16 S22796 东区 浙江 台州市 X091 892 214.0 190888.0 \n",
"1 D26674 2019-05-01 S15128 东区 江苏 南通市 X091 1133 214.0 242462.0 \n",
"2 D23381 2019-09-22 S17133 东区 江苏 宿迁市 X091 1136 214.0 243104.0 \n",
"3 D29060 2019-09-10 S14106 东区 江苏 常州市 X091 544 214.0 116416.0 \n",
"4 D21234 2019-07-03 S17197 东区 湖北 十堰市 X091 342 214.0 73188.0 \n",
"... ... ... ... ... ... ... ... ... ... ... \n",
"31439 D30482 2019-06-05 S11033 东区 浙江 金华市 X118 551 238.0 131138.0 \n",
"31440 D29542 2019-05-01 S12446 东区 江苏 南通市 X118 165 238.0 39270.0 \n",
"31441 D24798 2019-06-26 S16170 南区 福建 泉州市 X118 62 238.0 14756.0 \n",
"31442 D31831 2019-08-13 S22214 北区 黑龙江 佳木斯市 X118 795 238.0 189210.0 \n",
"31443 D28890 2019-07-21 S15066 西区 四川 南充市 X118 148 238.0 35224.0 \n",
"\n",
" 订单月份 商品名称 商品小类 商品大类 销售单价 \n",
"0 5 商品91 粉底 彩妆 214 \n",
"1 5 商品91 粉底 彩妆 214 \n",
"2 9 商品91 粉底 彩妆 214 \n",
"3 9 商品91 粉底 彩妆 214 \n",
"4 7 商品91 粉底 彩妆 214 \n",
"... ... ... ... ... ... \n",
"31439 6 商品118 蜜粉 彩妆 238 \n",
"31440 5 商品118 蜜粉 彩妆 238 \n",
"31441 6 商品118 蜜粉 彩妆 238 \n",
"31442 8 商品118 蜜粉 彩妆 238 \n",
"31443 7 商品118 蜜粉 彩妆 238 \n",
"\n",
"[31444 rows x 15 columns]"
]
},
"execution_count": 46,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"order = pd.merge(fact_order, dim_product, on='商品编号',how='inner') # 表关联\n",
"order"
]
},
{
"cell_type": "code",
"execution_count": 47,
"id": "amended-intensity",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" | \n",
" 订购数量 | \n",
"
\n",
" \n",
" 商品大类 | \n",
" 商品小类 | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 彩妆 | \n",
" 口红 | \n",
" 2013024 | \n",
"
\n",
" \n",
" 粉底 | \n",
" 1188621 | \n",
"
\n",
" \n",
" 睫毛膏 | \n",
" 587399 | \n",
"
\n",
" \n",
" 眼影 | \n",
" 296599 | \n",
"
\n",
" \n",
" 蜜粉 | \n",
" 45534 | \n",
"
\n",
" \n",
" 护肤品 | \n",
" 面膜 | \n",
" 5451914 | \n",
"
\n",
" \n",
" 面霜 | \n",
" 4566905 | \n",
"
\n",
" \n",
" 爽肤水 | \n",
" 3523687 | \n",
"
\n",
" \n",
" 眼霜 | \n",
" 3350743 | \n",
"
\n",
" \n",
" 隔离霜 | \n",
" 2488124 | \n",
"
\n",
" \n",
" 防晒霜 | \n",
" 2388610 | \n",
"
\n",
" \n",
" 洁面乳 | \n",
" 1928020 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" 订购数量\n",
"商品大类 商品小类 \n",
"彩妆 口红 2013024\n",
" 粉底 1188621\n",
" 睫毛膏 587399\n",
" 眼影 296599\n",
" 蜜粉 45534\n",
"护肤品 面膜 5451914\n",
" 面霜 4566905\n",
" 爽肤水 3523687\n",
" 眼霜 3350743\n",
" 隔离霜 2488124\n",
" 防晒霜 2388610\n",
" 洁面乳 1928020"
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"order.groupby(['商品大类','商品小类']).agg({'订购数量': 'sum'}).sort_values(by=['商品大类', '订购数量'], ascending=[True, False])"
]
},
{
"cell_type": "markdown",
"id": "changed-sentence",
"metadata": {},
"source": [
"#### 2.4 哪些省份的美妆需求量最大"
]
},
{
"cell_type": "code",
"execution_count": 48,
"id": "opponent-buying",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"\n",
" \n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"item = fact_order.groupby('所在省份').agg({'订购数量': 'sum'}).to_dict()['订购数量']\n",
"c = (\n",
" Map()\n",
" .add(\"订购数量\", [*item.items()], \"china\", is_map_symbol_show=False)\n",
" .set_series_opts(label_opts=opts.LabelOpts(is_show=True))\n",
" .set_global_opts(\n",
" title_opts=opts.TitleOpts(title='省份分布'),\n",
" visualmap_opts=opts.VisualMapOpts(max_=1000000), \n",
" )\n",
")\n",
"c.render_notebook()"
]
},
{
"cell_type": "markdown",
"id": "distinguished-mercy",
"metadata": {},
"source": [
"#### 2.5 通过 RFM 模型挖掘客户价值\n",
"\n",
"RFM 模型是衡量客户价值和客户创利能力的重要工具和手段,其中由3个要素构成了数据分析最好的指标,分别是:\n",
"* R-Recency(最近一次购买时间)\n",
"* F-Frequency(消费频率)\n",
"* M-Money(消费金额)\n",
"\n",
"设定一个计算权重,比如 R-Recency 20% F-Frequency 30% M-Money 50% ,最后通过这个权重进行打分,量化客户价值,后续还可以基于分数进一步打标签,用来指导二次营销的策略。"
]
},
{
"cell_type": "code",
"execution_count": 49,
"id": "alpine-warning",
"metadata": {},
"outputs": [],
"source": [
"data_rfm = fact_order.groupby('客户编码').agg({'订单日期': 'max', '订单编码': 'count', '金额': 'sum'})\n",
"data_rfm.columns = ['最近一次购买时间', '消费频率', '消费金额']"
]
},
{
"cell_type": "code",
"execution_count": 50,
"id": "absent-automation",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" 最近一次购买时间 | \n",
" 消费频率 | \n",
" 消费金额 | \n",
" R | \n",
" F | \n",
" M | \n",
"
\n",
" \n",
" 客户编码 | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" S11609 | \n",
" 2019-09-30 | \n",
" 42 | \n",
" 7326027.0 | \n",
" 0.980148 | \n",
" 0.796399 | \n",
" 0.903970 | \n",
"
\n",
" \n",
" S19828 | \n",
" 2019-09-30 | \n",
" 21 | \n",
" 2642275.0 | \n",
" 0.980148 | \n",
" 0.356879 | \n",
" 0.306556 | \n",
"
\n",
" \n",
" S17166 | \n",
" 2019-09-30 | \n",
" 17 | \n",
" 3627037.0 | \n",
" 0.980148 | \n",
" 0.261311 | \n",
" 0.478301 | \n",
"
\n",
" \n",
" S22925 | \n",
" 2019-09-30 | \n",
" 31 | \n",
" 3449117.0 | \n",
" 0.980148 | \n",
" 0.591413 | \n",
" 0.457987 | \n",
"
\n",
" \n",
" S10469 | \n",
" 2019-09-30 | \n",
" 30 | \n",
" 4198071.0 | \n",
" 0.980148 | \n",
" 0.570175 | \n",
" 0.564174 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" S16503 | \n",
" 2019-04-07 | \n",
" 14 | \n",
" 1682893.0 | \n",
" 0.004617 | \n",
" 0.198061 | \n",
" 0.146814 | \n",
"
\n",
" \n",
" S17547 | \n",
" 2019-03-14 | \n",
" 10 | \n",
" 1784531.0 | \n",
" 0.003232 | \n",
" 0.087258 | \n",
" 0.163435 | \n",
"
\n",
" \n",
" S20864 | \n",
" 2019-03-14 | \n",
" 8 | \n",
" 1118752.0 | \n",
" 0.003232 | \n",
" 0.039243 | \n",
" 0.047091 | \n",
"
\n",
" \n",
" S11908 | \n",
" 2019-03-09 | \n",
" 9 | \n",
" 1552311.0 | \n",
" 0.001847 | \n",
" 0.060942 | \n",
" 0.125577 | \n",
"
\n",
" \n",
" S11611 | \n",
" 2019-03-03 | \n",
" 10 | \n",
" 1487966.0 | \n",
" 0.000923 | \n",
" 0.087258 | \n",
" 0.113573 | \n",
"
\n",
" \n",
"
\n",
"
1083 rows × 6 columns
\n",
"
"
],
"text/plain": [
" 最近一次购买时间 消费频率 消费金额 R F M\n",
"客户编码 \n",
"S11609 2019-09-30 42 7326027.0 0.980148 0.796399 0.903970\n",
"S19828 2019-09-30 21 2642275.0 0.980148 0.356879 0.306556\n",
"S17166 2019-09-30 17 3627037.0 0.980148 0.261311 0.478301\n",
"S22925 2019-09-30 31 3449117.0 0.980148 0.591413 0.457987\n",
"S10469 2019-09-30 30 4198071.0 0.980148 0.570175 0.564174\n",
"... ... ... ... ... ... ...\n",
"S16503 2019-04-07 14 1682893.0 0.004617 0.198061 0.146814\n",
"S17547 2019-03-14 10 1784531.0 0.003232 0.087258 0.163435\n",
"S20864 2019-03-14 8 1118752.0 0.003232 0.039243 0.047091\n",
"S11908 2019-03-09 9 1552311.0 0.001847 0.060942 0.125577\n",
"S11611 2019-03-03 10 1487966.0 0.000923 0.087258 0.113573\n",
"\n",
"[1083 rows x 6 columns]"
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_rfm['R'] = data_rfm['最近一次购买时间'].rank(pct=True) # 转化为排名 百分比,便于后续切片\n",
"data_rfm['F'] = data_rfm['消费频率'].rank(pct=True)\n",
"data_rfm['M'] = data_rfm['消费金额'].rank(pct=True)\n",
"data_rfm.sort_values(by='R', ascending=False) "
]
},
{
"cell_type": "code",
"execution_count": 51,
"id": "transsexual-frost",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" 最近一次购买时间 | \n",
" 消费频率 | \n",
" 消费金额 | \n",
" R | \n",
" F | \n",
" M | \n",
" score | \n",
"
\n",
" \n",
" 客户编码 | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" S17476 | \n",
" 2019-09-30 | \n",
" 69 | \n",
" 10325832.0 | \n",
" 0.980148 | \n",
" 0.986611 | \n",
" 0.987073 | \n",
" 98.6 | \n",
"
\n",
" \n",
" S22326 | \n",
" 2019-09-30 | \n",
" 62 | \n",
" 10074609.0 | \n",
" 0.980148 | \n",
" 0.973223 | \n",
" 0.984303 | \n",
" 98.0 | \n",
"
\n",
" \n",
" S11581 | \n",
" 2019-09-28 | \n",
" 79 | \n",
" 10333668.0 | \n",
" 0.918283 | \n",
" 0.996768 | \n",
" 0.987996 | \n",
" 97.7 | \n",
"
\n",
" \n",
" S12848 | \n",
" 2019-09-29 | \n",
" 66 | \n",
" 9673572.0 | \n",
" 0.944598 | \n",
" 0.980609 | \n",
" 0.980609 | \n",
" 97.3 | \n",
"
\n",
" \n",
" S19095 | \n",
" 2019-09-26 | \n",
" 81 | \n",
" 11031632.0 | \n",
" 0.864728 | \n",
" 0.999077 | \n",
" 0.996307 | \n",
" 97.1 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" S12690 | \n",
" 2019-05-07 | \n",
" 7 | \n",
" 917233.0 | \n",
" 0.012927 | \n",
" 0.022622 | \n",
" 0.024931 | \n",
" 2.2 | \n",
"
\n",
" \n",
" S11176 | \n",
" 2019-06-09 | \n",
" 7 | \n",
" 614134.0 | \n",
" 0.036011 | \n",
" 0.022622 | \n",
" 0.009234 | \n",
" 1.9 | \n",
"
\n",
" \n",
" S18379 | \n",
" 2019-07-05 | \n",
" 4 | \n",
" 400195.0 | \n",
" 0.071099 | \n",
" 0.003232 | \n",
" 0.004617 | \n",
" 1.7 | \n",
"
\n",
" \n",
" S13259 | \n",
" 2019-06-01 | \n",
" 6 | \n",
" 645925.0 | \n",
" 0.025854 | \n",
" 0.011542 | \n",
" 0.011080 | \n",
" 1.4 | \n",
"
\n",
" \n",
" S12463 | \n",
" 2019-04-11 | \n",
" 7 | \n",
" 345919.0 | \n",
" 0.005540 | \n",
" 0.022622 | \n",
" 0.000923 | \n",
" 0.8 | \n",
"
\n",
" \n",
"
\n",
"
1083 rows × 7 columns
\n",
"
"
],
"text/plain": [
" 最近一次购买时间 消费频率 消费金额 R F M score\n",
"客户编码 \n",
"S17476 2019-09-30 69 10325832.0 0.980148 0.986611 0.987073 98.6\n",
"S22326 2019-09-30 62 10074609.0 0.980148 0.973223 0.984303 98.0\n",
"S11581 2019-09-28 79 10333668.0 0.918283 0.996768 0.987996 97.7\n",
"S12848 2019-09-29 66 9673572.0 0.944598 0.980609 0.980609 97.3\n",
"S19095 2019-09-26 81 11031632.0 0.864728 0.999077 0.996307 97.1\n",
"... ... ... ... ... ... ... ...\n",
"S12690 2019-05-07 7 917233.0 0.012927 0.022622 0.024931 2.2\n",
"S11176 2019-06-09 7 614134.0 0.036011 0.022622 0.009234 1.9\n",
"S18379 2019-07-05 4 400195.0 0.071099 0.003232 0.004617 1.7\n",
"S13259 2019-06-01 6 645925.0 0.025854 0.011542 0.011080 1.4\n",
"S12463 2019-04-11 7 345919.0 0.005540 0.022622 0.000923 0.8\n",
"\n",
"[1083 rows x 7 columns]"
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_rfm['score'] = data_rfm['R'] * 20 + data_rfm['F'] * 30 + data_rfm['M'] * 50\n",
"data_rfm['score'] = data_rfm['score'].round(1)\n",
"data_rfm.sort_values(by='score', ascending=False) "
]
},
{
"cell_type": "markdown",
"id": "fixed-turkish",
"metadata": {},
"source": [
"根据这个分数结果,我们可以对客户打上一些标签,比如大于 80 分的,标志为优质客户,在资源有限的情况下,可以优先服务好优质客户。"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": true,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {
"height": "calc(100% - 180px)",
"left": "10px",
"top": "150px",
"width": "384px"
},
"toc_section_display": true,
"toc_window_display": true
}
},
"nbformat": 4,
"nbformat_minor": 5
}