Exploratory Data Analysis

Anonymous user   2021-5-29 23:15   526   0
<article style="font-size: 16px;">
<p>Exploratory Data Analysis</p>
<div>
  <section>
   <div>
    <div>
     <p>When we hear about data science or analytics, the first things that come to mind are modelling, tuning, and so on. But one of the most important steps, and one that comes before all of these, is Exploratory Data Analysis (EDA).</p>
     <figure style="display:block;text-align:center;">
      <div>
       <div>
        <div>
         <div style="text-align: center;">
          <img alt="Image for post" height="121" src="https://beijingoptbbs.oss-cn-beijing.aliyuncs.com/cs/5606289-e910213530a483987372f0ade2fbf9e0.png" style="outline: none;" width="415">
         </div>
        </div>
       </div>
      </div>
      <figcaption>
       Exploratory data analysis (Machine learning process steps)
      </figcaption>
     </figure>
     <h1> <strong>Why EDA</strong></h1>
     <p>One of the major problems data scientists and analysts face today is data quality. Because we rely on multiple sources for data, its quality is often compromised. The quality of the data determines the quality of the models we build on it. As the adage goes: garbage in, garbage out. That holds very true in data science.</p>
     <p>We cannot build the Empire State Building or the Burj Khalifa on a shaky foundation!</p>
     <p>That is why data scientists are often said to spend 60–80% of their time on data gathering and data preparation.</p>
     <p>When we work with data, Exploratory Data Analysis (EDA) is the most important step. It is very important to gather as much information and insight from the data as we can before processing it, and EDA is how we do that. EDA also helps us analyse the underlying trends and patterns in the data and formulate our problem statement better.</p>
     <p><strong>“Well begun is half done”</strong></p>
     <p>Exploratory Data Analysis helps us understand the data better and understand what the data is telling us. This can be done through visual analysis as well as other kinds of analysis. EDA also helps us distinguish what should be pursued further from what is not worth following up.</p>
     <p><strong>Exploratory Data Analysis</strong></p>
     <p>Let’s explore the steps of exploratory data analysis using a bank loan data set.</p>
     <p><em>Import the Libraries:</em></p>
     <p>To perform the initial analysis, we need libraries such as NumPy, pandas, Seaborn and Matplotlib. NumPy is an array-processing package for numerical computation. pandas is used for data manipulation and analysis. Matplotlib is a plotting library, and Seaborn is a statistical visualization library built on top of it; both are used for data visualization.</p>
     <figure style="display:block;text-align:center;">
      <div>
       <div>
        <div>
         <div style="text-align: center;">
          <img alt="Image for post" height="191" src="https://beijingoptbbs.oss-cn-beijing.aliyuncs.com/cs/5606289-e8e34b3071a6a8f99e4a0055bfc7b167.png" style="outline: none;" width="289">
         </div>
        </div>
       </div>
      </div>
     </figure>
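     <p>Since the import step appears only as a screenshot, here is a minimal text sketch of what such imports typically look like. The aliases (<code>np</code>, <code>pd</code>, <code>plt</code>, <code>sns</code>) are common community conventions and an assumption here, not necessarily what the original screenshot used.</p>

```python
# Conventional aliases (an assumption; any alias works).
import numpy as np               # array processing and numerical computation
import pandas as pd              # data manipulation and analysis
import matplotlib.pyplot as plt  # general-purpose plotting
import seaborn as sns            # statistical visualization built on Matplotlib

# Print the installed versions as a quick sanity check of the environment.
print("numpy", np.__version__)
print("pandas", pd.__version__)
print("seaborn", sns.__version__)
```

     <p>Printing the versions is optional; it is only there to confirm that all four libraries are installed before the analysis starts.</p>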
     <p><em>Import Dataset:</em></p>
     <p>Data is stored in csv file format, hence we are