{"id":5107,"date":"2022-04-01T23:00:49","date_gmt":"2022-04-01T15:00:49","guid":{"rendered":"https:\/\/egonlin.com\/?p=5107"},"modified":"2022-04-03T06:52:38","modified_gmt":"2022-04-02T22:52:38","slug":"05-%e6%95%b0%e6%8d%ae%e5%88%86%e6%9e%90%e5%b7%a5%e5%85%b7%e5%8c%85data","status":"publish","type":"post","link":"https:\/\/egonlin.com\/?p=5107","title":{"rendered":"05 Data Analysis Toolkit Data"},"content":{"rendered":"<h1>1  Preface<\/h1>\n<p>data.table is a very versatile, high-performance R package. It is simple and convenient to use and fast, and it is very popular in the R community: it is downloaded more than 400,000 times a month, and nearly 650 CRAN and Bioconductor packages use it. If you use R, you have probably already used the data.table package.<\/p>\n<p>Python users have a counterpart, a package named datatable, which focuses on big-data support, high performance on both in-memory and out-of-memory datasets, and multithreaded algorithms. To some extent, datatable can be described as the data.table of Python.<\/p>\n<h1>2  Introduction to datatable<\/h1>\n<p><div class='fancybox-wrapper lazyload-container-unload' data-fancybox='post-images' href='https:\/\/egonlin.com\/wp-content\/uploads\/2022\/04\/\u5ab2\u7f8epandas\u7684\u6570\u636e\u5206\u6790\u5de5\u5177\u53051.png'><img class=\"lazyload lazyload-style-2\" src=\"data:image\/svg+xml;base64,PCEtLUFyZ29uTG9hZGluZy0tPgo8c3ZnIHdpZHRoPSIxIiBoZWlnaHQ9IjEiIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy8yMDAwL3N2ZyIgc3Ryb2tlPSIjZmZmZmZmMDAiPjxnPjwvZz4KPC9zdmc+\"  
data-original=\"https:\/\/egonlin.com\/wp-content\/uploads\/2022\/04\/\u5ab2\u7f8epandas\u7684\u6570\u636e\u5206\u6790\u5de5\u5177\u53051.png\" src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsQAAA7EAZUrDhsAAAANSURBVBhXYzh8+PB\/AAffA0nNPuCLAAAAAElFTkSuQmCC\" alt=\"\" \/><\/div><\/p>\n<p>To build more accurate models, machine learning applications now routinely have to process large volumes of data and generate many features. The datatable module for Python provides good support for this problem: it performs big-data operations (up to 100 GB) on a single-node machine at the maximum possible speed. Development of the datatable package is sponsored by H2O.ai, and its first user was Driverless.ai.<\/p>\n<h2>2.1  Installation<\/h2>\n<blockquote>\n<p>Mac OS<\/p>\n<\/blockquote>\n<pre><code class=\"language-python\">pip install datatable<\/code><\/pre>\n<blockquote>\n<p>Linux<\/p>\n<\/blockquote>\n<p>On Linux, installation is done from a binary distribution:<\/p>\n<pre><code class=\"language-python\"># If you have Python 3.5\npip install https:\/\/s3.amazonaws.com\/h2o-release\/datatable\/stable\/datatable-0.8.0\/datatable-0.8.0-cp35-cp35m-linux_x86_64.whl\n# If you have Python 3.6\npip install https:\/\/s3.amazonaws.com\/h2o-release\/datatable\/stable\/datatable-0.8.0\/datatable-0.8.0-cp36-cp36m-linux_x86_64.whl<\/code><\/pre>\n<p>Unfortunately, the datatable package does not yet work on Windows, but work is under way to add Windows support. For more information, see the Build instructions.<\/p>\n<p><a href=\"https:\/\/datatable.readthedocs.io\/en\/latest\/install.html\">https:\/\/datatable.readthedocs.io\/en\/latest\/install.html <\/a><\/p>\n<h2>2.2  Reading Data<\/h2>\n<p>The dataset used here is the Lending Club Loan Data dataset from Kaggle. It contains complete loan data for all borrowers from 2007 to 2015, including the current loan status (Current, Late, Fully Paid, etc.) and the latest payment information. The file contains 2.26 million rows and 145 columns, a size well suited to demonstrating what the datatable package can do.<\/p>\n<p>Dataset:<\/p>\n<pre><code class=\"language-python\">&quot;&quot;&quot;\nLink: https:\/\/pan.baidu.com\/s\/1_vVviJWj6A9I05F7bmQNlg  Password: y4jd\n&quot;&quot;&quot;<\/code><\/pre>\n<pre><code class=\"language-python\">import numpy as np\nimport pandas as pd\nimport datatable as dt<\/code><\/pre>\n<p>First, load the data into a Frame object. The basic unit of analysis in datatable is the Frame, which is the same concept as a pandas DataFrame or an SQL table: data arranged in a two-dimensional array of rows and columns.<\/p>\n<blockquote>\n<p><strong>Reading the data with datatable<\/strong><\/p>\n<\/blockquote>\n<pre><code class=\"language-python\">%%time\ndft = dt.fread(&#039;loan.csv&#039;)\n\nCPU times: user 23.8 s, sys: 2.32 s, total: 26.1 s                              
\nWall time: 2.54 s<\/code><\/pre>\n<p>This dataset has 2.26 million rows and 145 columns, nearly 1.2 GB of data, and datatable read it in only 2.54 s.<\/p>\n<p><div class='fancybox-wrapper lazyload-container-unload' data-fancybox='post-images' href='https:\/\/egonlin.com\/wp-content\/uploads\/2022\/04\/\u5ab2\u7f8epandas\u7684\u6570\u636e\u5206\u6790\u5de5\u5177\u53052.png'><img class=\"lazyload lazyload-style-2\" src=\"data:image\/svg+xml;base64,PCEtLUFyZ29uTG9hZGluZy0tPgo8c3ZnIHdpZHRoPSIxIiBoZWlnaHQ9IjEiIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy8yMDAwL3N2ZyIgc3Ryb2tlPSIjZmZmZmZmMDAiPjxnPjwvZz4KPC9zdmc+\"  data-original=\"https:\/\/egonlin.com\/wp-content\/uploads\/2022\/04\/\u5ab2\u7f8epandas\u7684\u6570\u636e\u5206\u6790\u5de5\u5177\u53052.png\" src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsQAAA7EAZUrDhsAAAANSURBVBhXYzh8+PB\/AAffA0nNPuCLAAAAAElFTkSuQmCC\" alt=\"\" \/><\/div><br \/>\nAs shown above, fread() is a powerful and fast function that automatically detects and parses most parameters of a text file. Supported inputs include .zip files, URLs, Excel files, and more. In addition, the datatable parser offers the following features:<\/p>\n<ul>\n<li>Automatically detects separators, headers, column types, quoting rules, and so on.<\/li>\n<li>Reads data from multiple sources, including files, URLs, shell commands, raw text, archives, and globs.<\/li>\n<li>Provides multithreaded file reading for maximum speed.<\/li>\n<li>Includes a progress indicator when reading large files.<\/li>\n<li>Can read both RFC4180-compliant and non-compliant files.<\/li>\n<\/ul>\n<blockquote>\n<p><strong>Reading the data with pandas<\/strong><\/p>\n<\/blockquote>\n<div style='color:red;font-size:28px'>!!! Note: because the dataset is so large, reading it with pandas often crashes the service, so you may want to test with a somewhat smaller dataset.<\/div>\n<pre><code class=\"language-python\">%%time\ndf = pd.read_csv(&#039;loan.csv&#039;)\n\nCPU times: user 27.3 s, sys: 4.68 s, total: 31.9 s\nWall time: 28.5 s<\/code><\/pre>\n<p>The results show that datatable clearly outperforms pandas when reading large datasets: pandas needs nearly 30 seconds to read this data, while datatable needs just over 2 seconds.<\/p>\n<h2>2.3 Frame Conversion<\/h2>\n<p>An existing Frame can be converted to a NumPy array or a pandas DataFrame, as shown below:<\/p>\n<pre><code class=\"language-python\">numpy_df = dft.to_numpy()\npandas_df = dft.to_pandas()<\/code><\/pre>\n<p>Next, convert the Frame read by datatable into a pandas DataFrame and compare the time required:<\/p>\n<p>Because the Lending Club Loan Data dataset is so large, calling to_pandas() on it tends to crash the Jupyter service, so a smaller dataset is used for this test.<\/p>\n<pre><code class=\"language-python\">%%time\ndft = dt.fread(&#039;baba.csv&#039;)\npandas_df = dft.to_pandas()\n\nCPU times: user 2.44 ms, sys: 287 \u00b5s, total: 2.72 ms\nWall time: 2.62 ms<\/code><\/pre>\n<p>Reading the data with datatable and converting it to a pandas DataFrame takes 2.62 ms in total.<\/p>\n<pre><code class=\"language-python\">%%time\ndf = pd.read_csv(&#039;baba.csv&#039;)\n\nCPU times: user 7.95 ms, sys: 3.18 ms, total: 11.1 ms\nWall time: 14.4 ms\n<\/code><\/pre>\n<p>Reading the data with pandas alone takes 14.4 ms.<\/p>\n<p>It appears that reading a file as a datatable Frame and then converting it to a pandas DataFrame takes less time than reading it directly into a pandas DataFrame. Importing large data files with datatable and then converting them to pandas DataFrames is therefore a good idea.<\/p>\n<h2>2.4 Basic Frame Properties<\/h2>\n<p>Here are some basic properties of a datatable Frame, which are similar to those of a pandas DataFrame.<\/p>\n<pre><code class=\"language-python\">print(dft.shape)       # (nrows, ncols)\nprint(dft.names[:5])   # top 5 column names\nprint(dft.stypes[:5])  # column types(top 5)\n______________________________________________________________\n(2260668, 145)\n(&#039;id&#039;, &#039;member_id&#039;, &#039;loan_amnt&#039;, &#039;funded_amnt&#039;, 
&#039;funded_amnt_inv&#039;)\n(stype.bool8, stype.bool8, stype.int32, stype.int32, stype.float64)\n<\/code><\/pre>\n<p>You can also use the head method to display the first n rows, as shown below:<\/p>\n<pre><code class=\"language-python\">dft.head(10)\n<\/code><\/pre>\n<p><div class='fancybox-wrapper lazyload-container-unload' data-fancybox='post-images' href='https:\/\/egonlin.com\/wp-content\/uploads\/2022\/04\/\u5ab2\u7f8epandas\u7684\u6570\u636e\u5206\u6790\u5de5\u5177\u53053.png'><img class=\"lazyload lazyload-style-2\" src=\"data:image\/svg+xml;base64,PCEtLUFyZ29uTG9hZGluZy0tPgo8c3ZnIHdpZHRoPSIxIiBoZWlnaHQ9IjEiIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy8yMDAwL3N2ZyIgc3Ryb2tlPSIjZmZmZmZmMDAiPjxnPjwvZz4KPC9zdmc+\"  data-original=\"https:\/\/egonlin.com\/wp-content\/uploads\/2022\/04\/\u5ab2\u7f8epandas\u7684\u6570\u636e\u5206\u6790\u5de5\u5177\u53053.png\" src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsQAAA7EAZUrDhsAAAANSURBVBhXYzh8+PB\/AAffA0nNPuCLAAAAAElFTkSuQmCC\" alt=\"\" \/><\/div><br \/>\nNote: colors indicate the data types here: red for strings, green for integers, and blue for floats.<\/p>\n<h2>2.5 Summary Statistics<\/h2>\n<p>In pandas, computing summary statistics is a very memory-intensive process, but with datatable it is straightforward. The following datatable methods compute per-column statistics:<\/p>\n<pre><code class=\"language-python\">dft.sum()      dft.nunique()\ndft.sd()       dft.max()\ndft.mode()     
dft.min()\ndft.nmodal()   dft.mean()\n<\/code><\/pre>\n<p>Next, compute the mean of every column with both datatable and pandas and compare the running times.<\/p>\n<blockquote>\n<p><strong>Computing the mean with datatable<\/strong><\/p>\n<\/blockquote>\n<pre><code class=\"language-python\">%%time\ndft.mean()\n__________________________________________________________________\nCPU times: user 3.56 s, sys: 5.35 ms, total: 3.56 s\nWall time: 302 ms\n<\/code><\/pre>\n<blockquote>\n<p><strong>Computing the mean with pandas<\/strong><\/p>\n<\/blockquote>\n<pre><code class=\"language-python\">pandas_df.mean()\n__________________________________________________________________\nThrows memory error.\n<\/code><\/pre>\n<p>The pandas computation throws a memory error.<\/p>\n<h1>3 Data Manipulation<\/h1>\n<p>Like a DataFrame, a datatable Frame is a columnar data structure. In datatable, the primary tool for all of these operations is square-bracket notation, which is inspired by traditional matrix indexing but offers more functionality. The same DT[i, j] notation is used for matrix indexing in mathematics, C\/C++, R, pandas, and NumPy. The following sections show how to perform some common data-processing tasks with datatable.<\/p>\n<p><div class='fancybox-wrapper lazyload-container-unload' data-fancybox='post-images' href='https:\/\/egonlin.com\/wp-content\/uploads\/2022\/04\/\u5ab2\u7f8epandas\u7684\u6570\u636e\u5206\u6790\u5de5\u5177\u53054.png'><img class=\"lazyload lazyload-style-2\" 
src=\"data:image\/svg+xml;base64,PCEtLUFyZ29uTG9hZGluZy0tPgo8c3ZnIHdpZHRoPSIxIiBoZWlnaHQ9IjEiIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy8yMDAwL3N2ZyIgc3Ryb2tlPSIjZmZmZmZmMDAiPjxnPjwvZz4KPC9zdmc+\"  data-original=\"https:\/\/egonlin.com\/wp-content\/uploads\/2022\/04\/\u5ab2\u7f8epandas\u7684\u6570\u636e\u5206\u6790\u5de5\u5177\u53054.png\" src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsQAAA7EAZUrDhsAAAANSURBVBhXYzh8+PB\/AAffA0nNPuCLAAAAAElFTkSuQmCC\" alt=\"\" \/><\/div><\/p>\n<h2>Selecting a Subset of Rows\/Columns<\/h2>\n<p>The following code selects all rows and the funded_amnt column from the dataset:<\/p>\n<pre><code class=\"language-python\">dft[:,&#039;funded_amnt&#039;]\n<\/code><\/pre>\n<p>The following selects the first 5 rows and first 3 columns of the dataset:<\/p>\n<pre><code class=\"language-python\">dft[:5,:3]\n<\/code><\/pre>\n<h2>Sorting a Frame<\/h2>\n<ul>\n<li>Sorting with datatable<\/li>\n<\/ul>\n<p>In datatable, a Frame is sorted by a specific column as follows:<\/p>\n<pre><code class=\"language-python\">%%time\ndft.sort(&#039;funded_amnt_inv&#039;)\n\nCPU times: user 1.47 s, sys: 77.1 ms, total: 1.55 s\nWall time: 147 ms\n<\/code><\/pre>\n<ul>\n<li>Sorting with pandas<\/li>\n<\/ul>\n<pre><code class=\"language-python\">%%time\npandas_df.sort_values(by = &#039;funded_amnt_inv&#039;)\n___________________________________________________________________\nCPU times: user 8.76 s, sys: 2.87 s, total: 11.6 s\nWall time: 12.4 
s\n<\/code><\/pre>\n<p>The two packages differ markedly in sorting time.<\/p>\n<h2>Deleting Rows\/Columns<\/h2>\n<p>The following deletes the member_id column:<\/p>\n<pre><code class=\"language-python\">del dft[:, &#039;member_id&#039;]\n<\/code><\/pre>\n<h2>Grouping (GroupBy)<\/h2>\n<p>Like pandas, datatable supports GroupBy. The following shows how to compute the sum of the funded_amnt column grouped by grade in both datatable and pandas:<\/p>\n<ul>\n<li><strong>Grouping with datatable<\/strong><\/li>\n<\/ul>\n<pre><code class=\"language-python\">%%time\nfor i in range(100):\n    dft[:, dt.sum(dt.f.funded_amnt), dt.by(dt.f.grade)]\n\nCPU times: user 9.45 s, sys: 643 ms, total: 10.1 s\nWall time: 861 ms\n<\/code><\/pre>\n<ul>\n<li><strong>Grouping with pandas<\/strong><\/li>\n<\/ul>\n<pre><code class=\"language-python\">%%time\nfor i in range(100):\n    pandas_df.groupby(&quot;grade&quot;)[&quot;funded_amnt&quot;].sum()\n____________________________________________________________________\nCPU times: user 12.9 s, sys: 859 ms, total: 13.7 s\nWall time: 13.9 s\n<\/code><\/pre>\n<h2>What .f Stands For<\/h2>\n<p>In datatable, f stands for frame_proxy, which provides a simple way to refer to the Frame currently being operated on. In the example above, dt.f simply stands for dft.<\/p>\n<h2>Filtering Rows<\/h2>\n<p>In datatable, the syntax for filtering rows is very similar to that of GroupBy. The following selects the loan_amnt values in rows where loan_amnt is greater than funded_amnt:<\/p>\n<pre><code class=\"language-python\">dft[dt.f.loan_amnt&gt;dt.f.funded_amnt,&quot;loan_amnt&quot;]\n<\/code><\/pre>\n<h2>Saving a Frame<\/h2>\n<p>In datatable, the contents of a Frame can also be written to a CSV file for later use, as shown below:<\/p>\n<pre><code class=\"language-python\">dft.to_csv(&#039;output.csv&#039;)\n<\/code><\/pre>\n<p>For more on data manipulation, see the datatable documentation.<\/p>\n<blockquote>\n<p>URL: <a href=\"https:\/\/datatable.readthedocs.io\/en\/latest\/using-datatable.html\">https:\/\/datatable.readthedocs.io\/en\/latest\/using-datatable.html<\/a><\/p>\n<\/blockquote>\n<h1>Summary<\/h1>\n<p>In data science, the datatable module executes faster than the default pandas package, which is a major advantage when working with large datasets. In terms of functionality, however, datatable is not yet as complete as pandas. As it continues to improve, datatable can be expected to become more capable in the near future.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>1 Preface data.table is a very versatile, high-performance R package. It is simple and convenient to use and fast, and in the R community 
[&hellip;]<\/p>\n","protected":false},"author":3,"featured_media":5110,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[371,378],"tags":[],"_links":{"self":[{"href":"https:\/\/egonlin.com\/index.php?rest_route=\/wp\/v2\/posts\/5107"}],"collection":[{"href":"https:\/\/egonlin.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/egonlin.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/egonlin.com\/index.php?rest_route=\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/egonlin.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=5107"}],"version-history":[{"count":0,"href":"https:\/\/egonlin.com\/index.php?rest_route=\/wp\/v2\/posts\/5107\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/egonlin.com\/index.php?rest_route=\/wp\/v2\/media\/5110"}],"wp:attachment":[{"href":"https:\/\/egonlin.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=5107"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/egonlin.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=5107"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/egonlin.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=5107"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}