Tasks for Analysis of Data Set
Task 1
a.
Fig (1): Random selection of 250 samples in Excel
To select 250 samples from the 2000 records in the data sheet, we combined a manual procedure with Excel's RAND function. Picking 250 records purely by hand would be slow and would not be truly random, so we used '=RAND()' to attach a random value to every record and then sorted on those values. So we can say a manual, function-based random selection was used to draw the 250 random samples from the data sheet. The process we followed is stated below.
The steps were carried out using functions and sort commands in Microsoft Excel:
- Insert a new column at the leftmost side of the data sheet.
- In the first cell of that column, enter '=RAND()'; the random value should come from the function rather than be picked by hand, so this cell now holds a random variable.
- Select the column down to row 2001; pressing 'F2' returns the cursor to the topmost cell containing the '=RAND()' formula.
- Fill the formula down the entire column so every row receives its own random value, as shown in the figure above for the random selection of the 250-record sample.
- Sort the data sheet by this column from lower to higher; because the keys are random, the rows end up in random order, and the first 250 rows can then be taken as the sample. This is what we actually did for the random selection.
- The sample can be taken from anywhere in the sheet, since sorting on the random keys reshuffles the whole sheet; pressing 'F9' regenerates the random values, and the ordering of the data sheet changes accordingly.
This is how the 250 random samples were selected from the data sheet. The selected data sheet is submitted with this report so that the technique and the selection can be verified.
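As an aside, the RAND-and-sort idea can be sketched outside Excel in a few lines of Python. This is only an illustration: the population below is a hypothetical stand-in for the real data sheet, and the gender field shows that the method also works with non-numeric data.

```python
import random

# Hypothetical population standing in for the 2000-row data sheet:
# each record is (household_id, gender).
population = [(i, random.choice("MF")) for i in range(1, 2001)]

# Mirror Excel's RAND()-and-sort approach: attach a random key to every
# row, sort on the key, then keep the first 250 rows as the sample.
keyed = [(random.random(), row) for row in population]
keyed.sort(key=lambda pair: pair[0])
sample = [row for _, row in keyed[:250]]

print(len(sample))  # 250
```

Re-running the script (like pressing F9 in Excel) produces a different random sample each time.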
In our opinion, the sampling technique used here is one of the quickest and most reliable ways to draw a sample from a larger data sheet. Excel offers alternatives, such as the RANDBETWEEN function or the Sampling tool in the Data Analysis add-in, but those techniques have a key limitation: they do not work with non-numeric data. Excel's analysis tools can only read numeric values, so to sample with the Data Analysis method, any non-numeric fields would first have to be recoded as numbers. For a data sheet containing categorical characteristics such as the gender of a household head, the RAND-and-sort method avoids this problem, which is why we chose it.
B.
Fig (2): Descriptive statistics and box-and-whisker plotting
Fig (3): Box-and-whisker plot
Table (1): Descriptive statistics results
                   | Alcohol   | Meals     | Fuel      | Phone    |
Minimum            | 0         | 0         | 0         | 0        |
Mean               | 1158.9960 | 1107.4538 | 2106.6345 | 1413.430 |
Median             | 809       | 780       | 1620      | 1140     |
Maximum            | 6518      | 6000      | 36000     | 16500    |
Standard deviation | 1303.495  | 1139.9    | 2792.025  | 1498.277 |
Sum                | 288590    | 275756    | 524552    | 351944   |
Count              | 250       | 250       | 250       | 250      |
Variance           | 1699100   | 1299372   | 7795403   | 2244833  |
The table above summarises the descriptive statistics for alcohol, meals, fuel and phone expenditure, computed with the Descriptive Statistics tool in Excel's Data Analysis add-in. The box-and-whisker plots are drawn from these figures. Since the mean and median of each variable are fairly close, the boxes appear compressed in the plot.
C. With the descriptive statistics in hand, we can say a good deal about the parameters under consideration: expenditure on alcohol, meals, fuel and phone. The minimum of every category is 0, which tells us that some families consume no alcohol, some record no meal expenditure, some spend nothing on fuel, and some have no phone expenditure at all. Looking at the averages, expenditure on alcohol, meals and phone are reasonably close to one another, while the mean expenditure on fuel is considerably higher than the rest. From this we can say that people in this region spend the most on fuel.
Expenditure on alcohol and meals is almost the same, while expenditure on phone is higher than both, as can be seen from the phone column's larger mean and sum.
These are the basic findings from the descriptive statistics on the 250 randomly sampled records. A larger sample would likely give more reliable results; with 500 sample records, for example, our conclusions would stand on firmer ground.
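For readers who want to reproduce such a summary outside Excel, a minimal sketch with Python's standard statistics module follows; the expenditure values are placeholders, not the report's sampled data.

```python
import statistics

# Illustrative expenditure figures (placeholders, not the sampled data).
alcohol = [0, 320, 809, 1500, 2100, 6518]

# The same measures Excel's Descriptive Statistics tool reports.
summary = {
    "Minimum": min(alcohol),
    "Mean": statistics.mean(alcohol),
    "Median": statistics.median(alcohol),
    "Maximum": max(alcohol),
    "Standard deviation": statistics.stdev(alcohol),  # sample st. dev.
    "Sum": sum(alcohol),
    "Count": len(alcohol),
    "Variance": statistics.variance(alcohol),         # sample variance
}
for name, value in summary.items():
    print(f"{name}: {value}")
```

Note that statistics.stdev and statistics.variance are the sample (n-1) versions, matching what Excel's STDEV.S and VAR.S report.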
Task 2
A.
Table (2): Descriptive statistics for the top 10% of annual after-tax incomes
                   | Top 10%     |
Mean               | 146908.6834 |
Standard Error     | 4162.969    |
Median             | 129652      |
Mode               | 418490      |
Standard Deviation | 58725.9     |
Skewness           | 3.179543    |
Range              | 314574      |
Minimum            | 103916      |
Maximum            | 418490     |
Sum                | 29234828    |
Count              | 200         |
Here is what we actually did: first we copied the annual after-tax income column for the entire data sheet, then sorted it from maximum to minimum. From this sorted list, 10% of the 2000 values is 200 values, so we took the top 200 values as the highest incomes and, separately, the bottom 200 values as the lowest incomes of 200 families. Descriptive statistics were then computed on each selection.
For the top 10%, the mean income is about 146908.68, while the maximum income is 418490 and the minimum is 103916, so there is considerable variation in after-tax incomes even within this group; the range is 314574. Looking closely, some families in the top 10% have a much lower after-tax income than others. Since the mean lies far closer to the minimum than to the maximum, and the skewness is strongly positive (3.18), most families in the top 10% earn towards the lower end of this range, while a small number of very high earners pull the maximum up.
Table (3): Descriptive statistics for the bottom 10% of annual after-tax incomes
                   | Bottom 10% |
Mean               | 12346.8944 |
Standard Error     | 180.7245   |
Median             | 12501      |
Mode               | 12220      |
Standard Deviation | 2549.433   |
Sample Variance    | 6499608    |
Range              | 11796      |
Minimum            | 5000       |
Maximum            | 16796      |
Sum                | 2457032    |
Count              | 200        |
The table above repeats the earlier analysis, with one difference: instead of the top 10% of after-tax incomes, we now consider the families whose after-tax incomes fall in the bottom 10%.
Here the mean after-tax income is about 12346.89. Among the bottom 10% of families, the maximum after-tax income is 16796 and the minimum is 5000. The gap between the mean and the maximum is modest, while the gap down to the minimum is much larger, so within the bottom 10% most families earn towards the upper end of the range, and only a few have a very low after-tax income.
This completes the discussion of this part of the problem for our data sheet.
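The sort-then-slice procedure behind both tables can be sketched in Python as follows; the incomes are randomly generated placeholders, since the real data sheet is not reproduced here.

```python
import random
import statistics

# Randomly generated placeholder incomes for 2000 households (the report
# works on the real data sheet, which is not reproduced here).
random.seed(1)
incomes = [random.randint(5000, 420000) for _ in range(2000)]

incomes.sort(reverse=True)        # largest income first
k = len(incomes) // 10            # 10% of 2000 rows = 200 rows
top10 = incomes[:k]               # top 10% of after-tax incomes
bottom10 = incomes[-k:]           # bottom 10% of after-tax incomes

print(len(top10), len(bottom10))  # 200 200
print(statistics.mean(top10) > statistics.mean(bottom10))  # True
```

Each slice can then be summarised with the same descriptive measures as before (mean, median, standard deviation, and so on).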
B. The data sheet records whether or not each family owns its house, so we need to find the proportion of households that own their house and the proportion that do not.
A proportion is simply the ratio of the count of interest to the total count. We sorted the own-house column of the data sheet and counted the families coded 1 and those coded 0, where 1 represents owning a house and 0 represents not owning one.
The number of families owning their house is 1443 and the number not owning is 557. The proportion of home-owning families is therefore 1443/2000 = 0.7215, and the proportion of families not owning their house is 557/2000 = 0.2785. What exactly do these proportions mean, and what do they signify here?
They mean that 72.15% of the families own their house and 27.85% do not.
Given these proportions, 72.15% (own house) and 27.85% (do not own), consider manually drawing any five households from the sheet at random. Different draws of 5 will give different results: one draw might contain 3 owners and 2 non-owners, while another contains 2 owners and 3 non-owners.
Fig (4): Procedure for finding the probability
After drawing 5 samples at random, we calculated the probability of home ownership within the draw: 3 of the 5 sampled households owned their house, giving a sample probability of 60%.
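As a complementary check (not part of the Excel procedure above), the theoretical chance of drawing exactly 3 owners in 5 households can be sketched with a binomial model, using the ownership proportion computed from the sheet:

```python
from math import comb

# Ownership proportion taken from the data-sheet counts in the text.
p_own = 1443 / 2000
print(round(p_own, 4))  # 0.7215

# Model a draw of 5 households as binomial (a reasonable approximation,
# since 5 is tiny relative to 2000): probability of exactly 3 owners.
p_3_of_5 = comb(5, 3) * p_own**3 * (1 - p_own)**2
print(round(p_3_of_5, 4))  # 0.2913
```

So a 3-owners-out-of-5 draw is a likely outcome but far from certain, which is why repeated draws of 5 give varying results.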
C. In this part of the assignment we used the 'LN' function to compute the natural logarithm of Texp and of ATaxInc, then used a scatter plot to examine the relation between ln(Texp) and ln(ATaxInc); Excel generated the scatter plot and the equation describing the relationship between the two quantities.
Fig (5): Plotting the natural-logarithm quantities
Fig (6): Plot of ln(Texp) vs ln(ATaxInc) with the coefficient of correlation
The scatter plot is quite dense because the entire data sheet was used to produce it. The relation between the two quantities is almost exactly linear: visually, the plotted points fall along a straight line.
The fitted relationship between ln(Texp) and ln(ATaxInc) is y = 0.998x + 0.018, with R² = 0.998. The coefficient of correlation R can be obtained from R² by hand, or directly by running a correlation or regression analysis; here the coefficient of correlation is R = 0.9989.
When fitting a linear relation between two variables, the coefficient of correlation measures the strength of the relation between the independent and dependent variable. In simple regression analysis it ranges from -1.00 to +1.00: a value close to +1 signifies a strong positive relationship, a value close to -1.00 signifies a strong negative relationship, and a value near 0 signifies a weak relationship. Since our coefficient of correlation of 0.9989 is very close to +1.00, the relationship between ln(Texp) and ln(ATaxInc) is very strong.
Task 3
A. In this part of the assignment we need to construct a contingency table. Now, what is a contingency table? It is a table used to summarise survey data from a large data sheet by tabulating the frequencies of two variables against each other; the two variables are laid out in matrix form, which keeps the table easy to read. For our case we built the contingency table in Excel with the 'COUNTIFS' function. The variables shown are the gender of the household heads and their level of education. From this table we can read off the total number of males and of females, and the number of each gender at every educational level, which is exactly what our problem statement requires.
Table (4): Contingency table between gender and level of education
      | P (Primary) | S (Secondary) | I (Intermediate) | B (Bachelor) | M (Master) | Total |
M     | 201         | 218           | 214              | 174          | 193        | 1000  |
F     | 206         | 200           | 197              | 219          | 178        | 1000  |
Total | 407         | 418           | 411              | 393          | 371        | 2000  |
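The COUNTIFS step amounts to a frequency count over (gender, education level) pairs, which can be sketched like this; the records below are a tiny hypothetical stand-in for the 2000-row sheet.

```python
from collections import Counter

# Tiny hypothetical (gender, education) records standing in for the sheet.
records = [("M", "P"), ("F", "B"), ("M", "M"),
           ("F", "P"), ("M", "M"), ("F", "S")]

# Excel's COUNTIFS step is just a frequency count over every pair.
table = Counter(records)
levels = ["P", "S", "I", "B", "M"]
for gender in ("M", "F"):
    row = [table[(gender, lvl)] for lvl in levels]
    print(gender, row, "total:", sum(row))
```

Each printed row corresponds to one row of the contingency table, with the row total at the end.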
B. The problem statement asks for the probability that the head of the household is male and holds a Master's (M) as the highest degree. Since the genders in the data list are those of the household heads, the contingency table directly gives the number of male household heads whose highest degree is a Master's.
There are 193 male household heads with a Master's as the highest degree, out of 2000 households in total, so the probability that a household head is male with a Master's degree is (193/2000) × 100 = 9.65%.
Fig (7): Probability (male, Master's degree, head of household)
C. In this part the problem is to find the probability that the head of the household is male, given that the highest degree held is a Master's. For this we need only two quantities: the total number of people holding a Master's degree, and the number of males holding a Master's degree.
For this,
The total number of people with a Master's degree = 371
The total number of males with a Master's degree = 193
Therefore, the probability that the household head is male, among Master's-degree holders only, is (193/371) × 100 = 52.0216%.
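The conditional-probability arithmetic can be verified directly from the table counts:

```python
# Conditional probability P(male | Master's) straight from the
# contingency-table counts quoted in the text.
males_with_masters = 193
total_with_masters = 371

p_male_given_masters = males_with_masters / total_with_masters
print(round(p_male_given_masters * 100, 4))  # 52.0216
```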
Fig (8): Probability of a male head of household among Master's-degree holders
D. In this part the problem is to find the proportion of females whose highest degree is a Bachelor's.
This is simply the ratio of the number of females with a Bachelor's degree to the total number of females: 219/1000 = 0.2190.
Fig (9): Proportion of female household heads with a Bachelor's degree
E. Here the problem is to decide whether the events "the household head is female" and "the household head's education level is Primary" are independent. Two events A and B are independent when P(A and B) = P(A) × P(B). From the contingency table, P(female) = 1000/2000 = 0.5, P(Primary) = 407/2000 = 0.2035, and P(female and Primary) = 206/2000 = 0.1030. The product 0.5 × 0.2035 = 0.10175 is very close to 0.1030, so the two events can be regarded as approximately independent: knowing that the head is female scarcely changes the chance that her education level is Primary.
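Numerically, the independence question comes down to comparing the joint probability with the product of the marginal probabilities, using only counts from the contingency table:

```python
# Independence check from the contingency-table counts: events A and B
# are independent when P(A and B) equals P(A) * P(B).
total = 2000
p_female = 1000 / total    # P(head is female)
p_primary = 407 / total    # P(highest level is Primary)
p_joint = 206 / total      # P(female AND Primary), read from the table

print(p_female * p_primary)  # product of the marginals
print(p_joint)               # joint probability

# The two values are very close, so the events are approximately
# independent; an exact match would mean exact independence.
```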