Data Analysis Report on Transport for London’s ‘Santander’
Introduction:
In this project we used Microsoft Excel and Microsoft Word to carry out personal and business work to a professional standard. Working with these applications taught us essential skills for presenting ideas clearly and for verifying data, skills that will serve us not only at Regent’s University but also in pursuing a clear objective in the wider business world.
The project centred on Microsoft Excel during computer lab classes. On completing it we became familiar with the common methods and terminology used to analyze a business, which will help us judge and evaluate the methods used by other organizations.
Aims:
This project covers two areas: business mathematics and statistics. The first gives an overview of linear and non-linear functions, an introduction to financial mathematics, linear programming, and the solution of linear equations. The second emphasizes statistics, from descriptive statistics to an introduction to statistical inference. We used a variety of mathematical and statistical functions in Microsoft Excel throughout.
Learning outcomes:
From this project we can state some key outcomes, which are as follows:
- We learned the basic mathematical computations used in running a business.
- We learned descriptive statistics, probability, statistical inference, correlation, sampling, basic regression, and forecasting, the statistical tools used for data analysis.
- We applied the concepts of financial mathematics: simple and compound interest (discrete and continuous), present value, and depreciation.
- We summarized large data sets quantitatively and interpreted the results.
- We learned to use the instruments commonly applied in statistical inference for business.
- We used a range of media and information-technology tools, and learned to write quantitatively for a target audience.
- We can describe how particular Excel features are used to analyze business situations.
Task 1:
a) First we need to calculate the mean and the median of the number of bikes hired each day. We can either compute them directly from the data sheet in Excel or work them out by hand. Finding the mean and the median by hand would be very tedious for such a huge data sheet.
Number of days = 1494
Number of bikes hired for this entire period = 47368404
That means the mean number of bikes hired per day is 47368404 / 1494 ≈ 31705.76.
To find the median we need the mid-point of the data sheet. Arranging the number of bikes hired in ascending or descending order makes this much easier, so we sorted the column in ascending order and read off the median.
The maximum number of bikes hired in one day over the whole period is 61690, and the minimum is 3593. From the Excel data sheet we found the median to be 33103.
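The mean and median steps above can be sketched in Python as well. This is an illustrative sketch, not the original workbook: the totals are the figures reported above, while the small sample list used for the median is hypothetical.

```python
# Sketch of the mean/median calculation done in Excel.
import statistics

total_hires = 47368404   # total bikes hired over the whole period (from the report)
num_days = 1494          # number of days in the data set (from the report)

mean_per_day = total_hires / num_days
print(round(mean_per_day, 2))   # ≈ 31705.76

# Median on an illustrative (hypothetical) sample: sort, take the middle value.
sample = [3593, 28000, 33103, 45000, 61690]
print(statistics.median(sample))   # 33103, the middle of the sorted list
```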
As evidence we have included some screenshots, as required.

b) Here we determine whether the number of bicycles hired is normally distributed. To do this we analyze the whole data sheet in terms of how the hire counts vary with temperature: the number of bikes hired is the dependent variable and the temperature is the independent variable.
For this purpose the best tool is the histogram, a plot showing how often the values fall into each interval. Here the two quantities involved are temperature and the number of bikes hired; plotting them on the two axes in MS Excel generates a histogram automatically.

As evidence we have also included a snapshot of the histogram produced in Excel.

Fig: The histogram, built over the whole data sheet from the two variables temperature and number of bikes hired.
This kind of histogram is needed to judge normality, because it gives a clear picture of the distribution's shape. A distribution is approximately normal if the histogram's curve is symmetric; otherwise it is not. At first glance our histogram looks symmetric, with equal intervals, which would suggest a normal distribution. On closer inspection, however, it shows a great deal of variation in the number of bikes hired, so we conclude that the data is not normally distributed.
c) Normality is very important when analyzing any data sheet, because normally distributed data form a symmetric pattern, an arrangement that matches many natural phenomena. Blood pressure, height, and measurement errors, for example, are approximately normally distributed; preparing a data sheet with this in mind makes the analysis easier and the results symmetric.
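The histogram step can be sketched by bucketing values into equal-width bins, which is essentially what Excel does when it draws a histogram. The hire counts below are hypothetical; a roughly symmetric set of bin frequencies suggests (but does not prove) normality.

```python
# Sketch: bucket daily hire counts into equal-width bins, as a histogram does.
from collections import Counter

hires = [3593, 15000, 22000, 28000, 31000, 33000, 35000, 41000, 48000, 61690]  # hypothetical
bin_width = 10000

def bin_of(x, width=bin_width):
    """Return the lower edge of the bin containing x."""
    return (x // width) * width

freq = Counter(bin_of(v) for v in hires)
for edge in sorted(freq):
    print(f"{edge}-{edge + bin_width - 1}: {freq[edge]}")
```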
Task 2
a) Here the requirement is to forecast the number of bikes required for the next 7 days, given the previous 3 weeks' data. From those 3 weeks we need to find a linear function relating the dates to the number of bikes hired. The previous 3 weeks' data sheet is given below:
| Season | Days | Date | Temperature | No of bikes hired |
| Summer | Friday | 10/08/2018 | 22 | 41242 |
| Summer | Saturday | 11/08/2018 | 24 | 45275 |
| Summer | Sunday | 12/08/2018 | 25 | 49212 |
| Summer | Monday | 13/08/2018 | 20 | 35340 |
| Summer | Tuesday | 14/08/2018 | 22 | 42263 |
| Summer | Wednesday | 15/08/2018 | 26 | 61418 |
| Summer | Thursday | 16/08/2018 | 25 | 51937 |
| Summer | Friday | 17/08/2018 | 21 | 40019 |
| Summer | Saturday | 18/08/2018 | 24 | 45931 |
| Summer | Sunday | 19/08/2018 | 25 | 47862 |
| Summer | Monday | 20/08/2018 | 23 | 43206 |
| Summer | Tuesday | 21/08/2018 | 23 | 44261 |
| Summer | Wednesday | 22/08/2018 | 21 | 38998 |
| Summer | Thursday | 23/08/2018 | 21 | 40250 |
| Summer | Friday | 24/08/2018 | 15 | 20000 |
| Summer | Saturday | 25/08/2018 | 16 | 35973 |
| Summer | Sunday | 26/08/2018 | 22 | 41971 |
| Summer | Monday | 27/08/2018 | 22 | 41394 |
| Summer | Tuesday | 28/08/2018 | 22 | 42170 |
| Summer | Wednesday | 29/08/2018 | 21 | 40081 |
| Summer | Thursday | 30/08/2018 | 23 | 43305 |
Expressing the relationship between the two variables, date versus number of bikes hired, as an equation lets us estimate the number of bikes required on any particular date. For this purpose we prepared a scatter plot with its fitted linear equation, from which the forecast values can be read off.

Fig: Scatter plotting for finding out the linear equation regarding forecasting data.
The scatter plot varies the date and observes the corresponding number of bikes required. Here the value of R² is 0.108, which is low; a power fit would match the data better, but since the task requires a linear function we used a linear trendline to obtain the forecast values.
Substituting the dates into the equation shown on the plot gives the forecast number of bikes required. The forecast results are as follows:
| Date | No of bikes required |
| 31/08/2018 | 38148 |
| 01/09/2018 | 37781 |
| 02/09/2018 | 37417 |
| 03/09/2018 | 36972 |
| 04/09/2018 | 36705 |
| 05/09/2018 | 36346 |
| 06/09/2018 | 35996 |
This table gives the forecast number of bikes required to maintain the supply chain.

Fig: Evidence that we have applied the procedure to find out the result for the given 21 days’ data
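The trendline fit can be sketched as an ordinary least-squares line over the 21 daily counts from the table. Note one assumption: here x is a simple day index 1..21, whereas the workbook used Excel date serials, so the intercept differs even though the slope and forecasts are comparable.

```python
# Sketch: least-squares line over the 21-day table, extrapolated one day ahead.
hires = [41242, 45275, 49212, 35340, 42263, 61418, 51937,
         40019, 45931, 47862, 43206, 44261, 38998, 40250,
         20000, 35973, 41971, 41394, 42170, 40081, 43305]
n = len(hires)
xs = range(1, n + 1)          # assumed encoding: day index instead of date serial

mean_x = sum(xs) / n
mean_y = sum(hires) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, hires)) \
        / sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x

forecast_day_22 = intercept + slope * 22   # 31/08/2018 in the report's table
print(round(slope, 1), round(forecast_day_22))   # slope ≈ -407, forecast ≈ 38002
```

The negative slope matches the gently declining forecasts in the table above.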
b) The result was found with the help of a scatter plot of two variables, date and number of bikes hired. The result is not exact, because the series fluctuates strongly, and with such fluctuation the true relationship is unlikely to be linear. We nevertheless fitted a linear equation as required, and from it obtained an approximate forecast of the bikes required for the next 7 days.
In my view, fitting the same kind of model with temperature as the explanatory variable instead of the date would give a much better result. All of this data is in the data sheet, and any extra verification can be carried out there.
Task 3
a) Here we use the RANDBETWEEN function to generate random values in the range of the daily bike requirement. With this command we generate 500 random values between the minimum and maximum number of bikes hired per day, which the database shows to be 3593 and 61690 respectively. Since the result is a large table of 25 rows and 20 columns, we do not reproduce it in this report; as proof of concept and evidence we attach a snapshot of the generated random values.

The procedure for generating random values in an Excel sheet is to enter the function with its lower and upper bounds; the displayed results all lie within that range. If we are not satisfied with them, pressing F9 regenerates the random values automatically.
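The RANDBETWEEN step can be sketched in Python with `random.randint`, which, like Excel's RANDBETWEEN, is inclusive on both ends. The seed is only there so the sketch is reproducible.

```python
# Sketch of the RANDBETWEEN step: 500 uniform integers in the observed range.
import random

LOW, HIGH = 3593, 61690          # min and max daily hires from the data set
random.seed(42)                  # fixed seed for reproducibility
sample = [random.randint(LOW, HIGH) for _ in range(500)]

print(len(sample), min(sample) >= LOW, max(sample) <= HIGH)
```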
b) Using this sample we can find the mean of the 500 random values. The Excel-generated mean of the 500 random values is 32428.42.

Fig: Values sorted from low to high for finding the median
For these 500 random values the median is the middle value. But since the total count is 500, an even number, there is no single middle position: the 250th and 251st values are both central, so the median is the average of the 250th and 251st values.
That gives a median of 34643 for the 500 sample values.
For these 500 random values between the minimum and maximum, the standard deviation is 16553.42.

Fig: Finding the standard deviation of the 500 random values
So the mean, median and standard deviation of the 500 random sample values are 32428.42, 34643 and 16553.42 respectively.
There are some basic differences between the mean, median and standard deviation. The mean is the average of the data: the total divided by the number of observations.
The median is the middle value of a data sheet whose entries are arranged in ascending or descending order.
The standard deviation is found by subtracting the mean from each value, squaring those deviations, averaging the squares, and taking the square root of the result.
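The standard-deviation recipe just described can be sketched directly and checked against the standard library. The values are hypothetical; this uses the population form of the formula (divide by N), which is what Excel's STDEV.P computes.

```python
# Sketch of the standard-deviation definition: mean -> squared deviations
# -> average -> square root, checked against statistics.pstdev.
import math
import statistics

values = [31000, 28000, 35000, 40000, 22000]   # hypothetical daily counts
mean = sum(values) / len(values)
variance = sum((v - mean) ** 2 for v in values) / len(values)
std_dev = math.sqrt(variance)

assert abs(std_dev - statistics.pstdev(values)) < 1e-9
print(round(std_dev, 2))
```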
Task 4
a) Here we prepared a pivot table over the entire database to report the number of bikes hired, broken down by day of the week (Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday).
From this pivot table we can see the average number of bikes hired on each day of the week.

From here we obtain the average number of bikes hired per weekday, listed in the last row.
| Days | Avg number of bikes hired |
| Sunday | 341701.1429 |
| Monday | 349390 |
| Tuesday | 312240 |
| Wednesday | 291165.273 |
| Thursday | 278176.1 |
| Friday | 305789 |
| Saturday | 2058191.48 |
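What the pivot table does here can be sketched as a group-by: collect the hire counts under each weekday, then average each group. The records below are hypothetical.

```python
# Sketch of the pivot-table step: group records by weekday, average each group.
from collections import defaultdict

records = [("Monday", 35340), ("Tuesday", 42263), ("Monday", 43206),
           ("Tuesday", 44261), ("Wednesday", 61418), ("Wednesday", 38998)]

groups = defaultdict(list)
for day, hires in records:
    groups[day].append(hires)

averages = {day: sum(vals) / len(vals) for day, vals in groups.items()}
print(averages)   # e.g. Monday -> (35340 + 43206) / 2 = 39273.0
```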
b) Here we do the same thing, but when building the pivot table we take the season as the variable instead of the day of the week.

This pivot table also gives a clear view of the entire database. It shows the total number of bikes hired in each season (Autumn, Spring, Summer, Winter), along with the grand total, and the average hires are calculated in the Excel sheet. As a result we can summarize the data as follows:
| Season | Avg number of bikes hired |
| Autumn | 12272771.1 |
| Spring | 1186646 |
| Summer | 1342158 |
| Winter | 843803 |
From this result we can say that the pivot table is a very useful tool for this kind of analysis, because it condenses a very large table into a simple, short one, as we see here.
Task 5
a) Here, with the help of correlation analysis, we found the fundamental relationship between the temperature and the number of bikes hired. The correlation coefficient is determined for two cases: first over the whole database, and second over the last 365 days only.
Table: Finding out correlation coefficient for whole datasheet
| | temperature | bikes hired |
| temperature | 1 | 0.900764586 |
| bikes hired | 0.900764586 | 1 |
Table: Finding out correlation coefficient for last 365 days
| | temperature | bikes hired |
| temperature | 1 | 0.93590783 |
| bikes hired | 0.93590783 | 1 |
In both cases the coefficient of correlation is positive: about 0.900 for the whole database and about 0.936 for the last 365 days. A positive correlation coefficient says that as the temperature increases, the number of bikes hired increases. This is the value of correlation analysis: it also supports a forecasting model, so that by anticipating upcoming weather and temperatures, a business of this kind can prepare in advance for periods of high demand.

Since the scatter plots are not fully legible for our data sheet (the lists of temperatures and corresponding hire counts are huge), we base our conclusion on the correlation coefficients.
The overall correlation coefficient is lower than the coefficient for the last year alone, which indicates something important: the demand for hired bikes is growing.
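The correlation coefficients quoted above can be sketched from Pearson's definition on a small hypothetical sample of (temperature, hires) pairs:

```python
# Sketch of Pearson's r between temperature and hires, from its definition.
import math

temps = [15, 16, 20, 22, 24, 26]                     # hypothetical sample
hires = [20000, 35973, 35340, 41242, 45275, 61418]   # hypothetical sample

n = len(temps)
mean_t = sum(temps) / n
mean_h = sum(hires) / n
cov = sum((t - mean_t) * (h - mean_h) for t, h in zip(temps, hires))
r = cov / math.sqrt(sum((t - mean_t) ** 2 for t in temps)
                    * sum((h - mean_h) ** 2 for h in hires))
print(round(r, 3))   # positive: hotter days see more hires
```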
a)
SUMMARY OUTPUT
| Regression Statistics | |
| Multiple R | 0.900765 |
| R Square | 0.811377 |
| Adjusted R Square | 0.81125 |
| Standard Error | 4911.065 |
| Observations | 1493 |
ANOVA
| | df | SS | MS | F | Significance F |
| Regression | 1 | 1.55E+11 | 1.55E+11 | 6413.65 | 0 |
| Residual | 1491 | 3.6E+10 | 24118558 | | |
| Total | 1492 | 1.91E+11 | | | |
| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0% |
| Intercept | 7859.648 | 323.7644 | 24.27583 | 5.6E-110 | 7224.566 | 8494.73 | 7224.566 | 8494.73 |
| Temperature | 1575.954 | 19.67845 | 80.08526 | 0 | 1537.353 | 1614.554 | 1537.353 | 1614.554 |
Now we need to run the same kind of regression analysis, but this time considering only the data for the last 365 days; we can then comment on the regression equation, the R-square value and the F statistic.
The regression output below is the analysis for the last 365 days of data.
SUMMARY OUTPUT
| Regression Statistics | |
| Multiple R | 0.935908 |
| R Square | 0.875923 |
| Adjusted R Square | 0.875581 |
| Standard Error | 4310.024 |
| Observations | 364 |
ANOVA
| | df | SS | MS | F | Significance F |
| Regression | 1 | 4.75E+10 | 4.75E+10 | 2555.554 | 4.1E-166 |
| Residual | 362 | 6.72E+09 | 18576310 | | |
| Total | 363 | 5.42E+10 | | | |
| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0% |
| Intercept | 7019.632 | 570.4063 | 12.30637 | 2.58E-29 | 5897.906 | 8141.358 | 5897.906 | 8141.358 |
| Temperature | 1651.731 | 32.67358 | 50.55249 | 4.1E-166 | 1587.477 | 1715.985 | 1587.477 | 1715.985 |
So, for the first case, where the whole data sheet is considered:
- The regression equation is ŷ = 7859.65 + 1575.95x, so the slope of the regression line is positive. The p-value is also very small, which signifies a linear relationship between x and y.
- The correlation coefficient is R = 0.900765 and the coefficient of determination is R² = 0.811377, the proportion of the variation in y explained by the regression on x. That means 81.13% of the variation in y is explained by x.
For the second case (last 365 days):
- The regression equation is ŷ = 7019.63 + 1651.73x, so the slope is again positive, and the very small p-value again signifies a linear relationship between x and y.
- The correlation coefficient is R = 0.935908 and the coefficient of determination is R² = 0.875923, so 87.59% of the variation in y is explained by x.
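The link between the two statistics quoted above can be checked directly: for simple linear regression the coefficient of determination is the square of Pearson's r.

```python
# Sketch: R-square is the square of the correlation coefficient in simple
# linear regression. Values are the multiple-R figures from the report.
r_full = 0.900765      # whole data set
r_year = 0.935908      # last 365 days

print(round(r_full ** 2, 6))   # ≈ 0.811377, matching the first R Square
print(round(r_year ** 2, 6))   # ≈ 0.875923, matching the second R Square
```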
Of these two regressions, according to our findings the first is the more reliable: it was fitted on the whole data sheet, whereas the second used only the last year's data. With more data behind it, the regression analysis is naturally better supported.
b) The data available in this data sheet is enough to calculate what is required, but for a much clearer analysis we would need some additional important parameters or variables; only then would the regression analysis be more reliable.
Conclusion:
This report covers our work on the TfL case study. With the help of a huge data sheet we performed many analytical procedures to reach our conclusions, including a detailed performance analysis. From the data sheet we drew many conclusions, such as the relationships between the variables, and we identified the dependent and independent variables. The regression analysis gave us the most detailed parameters relevant to this assignment. Overall, the project gave us a great deal of experience in data handling and simulation.