R project for statistical analysis
Thursday, September 19, 2013
Saturday, April 6, 2013
Session # 10 : Video Making Assignment
IT BAL Video Assignment:
The link to the video is :
10 things you should know about Android
This video is made by
Rajib Layek (12BM60030)
Rohit Joshi (12BM60031)
Saturday, March 30, 2013
Session # 10 : 3D Plot on R
Assignment 1:
#Create 3 vectors, x, y, z and choose any random values for them, ensuring they are of equal length.
T<- cbind(x,y,z)
#Create 3 dimensional plot of the same.
> sample<-rnorm(50,25,6)
> sample
 [1] 23.10381 25.85777 22.04959 43.53180 12.11174 37.23922 38.92648 22.77181 17.80844 30.41365 32.32586 37.09651 24.55097 19.85470 31.01534 29.70007 31.72610 22.26199
[19] 19.85826 36.94503 23.50247 18.00116 24.50004 27.57822 20.34054 17.32243 30.26892 19.03535 16.14514 28.81016 29.45099 23.10639 25.49178 35.95906 19.35419 23.04064
[37] 25.20819 18.83031 30.75433 19.14759 28.11077 25.91251 28.03618 33.34057 30.19792 25.07813 25.08856 26.12123 24.15002 22.09888
> x<-sample(sample,10)
> y<-sample(sample,10)
> z<-sample(sample,10)
> x
 [1] 22.77181 31.01534 27.57822 22.09888 24.15002 19.85826 28.03618 17.32243 31.72610 19.35419
> y
 [1] 19.85826 26.12123 30.19792 23.10639 31.72610 22.04959 24.55097 27.57822 28.03618 31.01534
> z
 [1] 28.11077 31.72610 28.81016 12.11174 30.41365 23.50247 24.55097 22.04959 29.45099 30.26892
> T<-cbind(x,y,z)
> T
             x        y        z
 [1,] 22.77181 19.85826 28.11077
 [2,] 31.01534 26.12123 31.72610
 [3,] 27.57822 30.19792 28.81016
 [4,] 22.09888 23.10639 12.11174
 [5,] 24.15002 31.72610 30.41365
 [6,] 19.85826 22.04959 23.50247
 [7,] 28.03618 24.55097 24.55097
 [8,] 17.32243 27.57822 22.04959
 [9,] 31.72610 28.03618 29.45099
[10,] 19.35419 31.01534 30.26892
> library(rgl)  # plot3d() comes from the rgl package
> plot3d(T)
> plot3d(T,col=rainbow(1000))
> plot3d(T,col=rainbow(1000),type='s')  # type='s' draws the points as spheres
Assignment 2:
Create 2 random variables. Create 3 plots:
> # 1. X-Y and X-Y|Z (introduce a third variable z with 5 different categories and cbind it to x and y)
> x<-rnorm(1500,100,10)
> y<-rnorm(1500,85,5)
> z1<-sample(letters,5)
> z2<-sample(z1,1500,replace=TRUE)
> z<-as.factor(z2)
> t<-cbind(x,y,z)
> library(ggplot2)  # qplot() comes from the ggplot2 package
> qplot(x,y)
> qplot(x,z)
> qplot(x,z,alpha=I(1/10))
> qplot(x,y,geom=c("point","smooth"))
> qplot(x,y,colour=z)
> qplot(log(x),log(y),colour=z)
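The X-Y|Z plot asked for in the assignment is a conditioning plot: the same x-y scatter drawn once per category of z. A minimal sketch using base R's coplot() is given here as an alternative to the qplot calls in the session (with ggplot2, adding facet_wrap(~ z) to a scatter plot gives a similar result). The data frame name d and the seed are assumptions made for illustration. A data frame is used instead of cbind() because cbind() would coerce the factor z to its integer codes.

```r
set.seed(123)                        # assumed seed, only for reproducibility
x <- rnorm(1500, 100, 10)
y <- rnorm(1500, 85, 5)
z <- factor(sample(sample(letters, 5), 1500, replace = TRUE))

# A data frame keeps z as a factor; cbind(x, y, z) would store integer codes.
d <- data.frame(x = x, y = y, z = z)

# One x-y panel per level of z (the "X-Y | Z" plot).
coplot(y ~ x | z, data = d)
```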
Saturday, March 23, 2013
Session # 9: InfoGraphics Tool on Facebook
Tool : WolframAlpha ( http://www.wolframalpha.com/facebook/)
Review:
What it does:
Basically, it collects, analyses and presents all the data from your Facebook account via the Wolfram apps.
A feature Wolfram Alpha introduced back in September, Wolfram Alpha’s Facebook report delves into your profile and breaks down all of your activity into easy-to-digest graphs. It’s surprisingly comprehensive, so data such as times of interaction, word maps, relationship stats and network structure are all visualized for your convenience.
This is mostly for fun since it’s only your personal account that’s visualized, but if Facebook is your main source of interaction (with subscribers, friends, etc.), you will have a lot of information to help you improve.
What’s cool:
The kind of insightful data it gives is awesome in every sense. It starts from your birth date and goes on to friends' locations, relationship status breakdown, most liked post/pic/video and much more. A few features are shown below.
Personal Information:
Activity History:-
Post Statistics:-
Most Frequent word used:-
Most Liked Post:-
Weekly Status:-
Most Liked Pic:-
Friends Statistics:-
Friend Age Distribution:-
Friend's Location:-
Geographic Extreme:-
Friend's Network:-
These clusters show different friend circles, as below:
Drawbacks:
1. While sharing on Facebook/ Google+ the pic is not shown properly.
2. Too much information to read while sharing.
Skill level: Beginner
Runs on: Any Web browser.
Learn more: See the video below on how to get all the data from Facebook.
Friday, March 15, 2013
Session # 8: Panel Data analysis with R
Assignment
Calculate the values for all 3 models and decide which model best fits the data set for panel estimation.
Panel Data:
Panel data is a combination of time series data and cross-sectional data, i.e. it contains observations on the same units (here, states) at several points in time.
R for Panel Data:
The basic function used for panel data estimation is plm. The data set we have used in this session is "Produc".
Its description is as follows. It contains the following variables:
- state : the state
- year : the year
- pcap: public capital stock
- hwy : highway and streets
- water: water and sewer facilities
- util: other public buildings and structures
- pc: private capital stock
- gsp: gross state product
- emp: labor input measured by the employment in non-agricultural payrolls
- unemp: state unemployment rate
Download and load the "plm" package.
Use the data set "Produc", a panel data set within the plm package, for panel estimations.
Step 1: Calculating value for pooling model
Step 2: Calculating value for fixed model
Step 3: Calculating value for random model
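The three steps above can be sketched as follows. The object names pooled, fixed1 and random1 are taken from the test commands used later in this post; the regression formula is an assumption (it is the one commonly used with "Produc" in the plm documentation examples), not necessarily the one used in the session.

```r
library(plm)
data("Produc", package = "plm")

# Assumed specification: log output on log capital stocks, log employment and unemployment.
form <- log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp

# Step 1: pooling model (plain OLS on the stacked data)
pooled  <- plm(form, data = Produc, index = c("state", "year"), model = "pooling")
# Step 2: fixed effects ("within") model
fixed1  <- plm(form, data = Produc, index = c("state", "year"), model = "within")
# Step 3: random effects model
random1 <- plm(form, data = Produc, index = c("state", "year"), model = "random")

summary(pooled)
summary(fixed1)
summary(random1)
```

The within model has no intercept, so it reports one coefficient fewer than the pooling and random models.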
Now to choose the best model that fits the data set "Produc" , we need to run pairwise hypothesis tests among the 3 models and select the best fit in the end.
Test 1: Between pooling and fixed model
Command used: pFtest(fixed1, pooled)
Test details:
H0 (null hypothesis): the individual index and time based parameters are all zero
Alternative hypothesis: at least one of the index and time based parameters is non-zero
The test indicates that the effects in the alternative hypothesis are significant.
As the p-value is very low, the null hypothesis is rejected.
Hence the fixed model is better than the pooling model.
Test 2: Between pooling and random model
Command used: plmtest(pooled)
Test details:
H0 (null hypothesis): the individual index and time based parameters are all zero (pooling model)
Alternative hypothesis: at least one of the index and time based parameters is non-zero (random model)
The test indicates that the effects in the alternative hypothesis are significant.
As the p-value is very low, the null hypothesis is rejected.
Hence the random model is better than the pooling model.
Test 3: Between fixed and random model
We use the Hausman test:
Command used: phtest(random1, fixed1)
Test details:
H0 (null hypothesis): the individual effects are not correlated with any regressor (random model)
Alternative hypothesis: the individual effects are correlated with the regressors (fixed model)
The test suggests that one of the models is inconsistent; under the alternative, that is the random model.
As the p-value is very low, the null hypothesis is rejected.
Hence the fixed model is better than the random model.
Conclusion:
After the series of tests, we can conclude that the fixed model best fits the "Produc" data set for panel data estimation, i.e. the individual effects are significant and are correlated with the regressor variables. Hence we would choose the "fixed" model to estimate the panel data presented by the "Produc" data set.
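The pairwise reasoning above can be written down as a small decision rule. This is only a sketch: the function name choose_panel_model, its arguments and the 5% significance level are assumptions made for illustration. In practice the three p-values would come from the test objects, e.g. pFtest(fixed1, pooled)$p.value.

```r
# Hypothetical helper: pick a panel model from the three test p-values.
#   p_f       : p-value of the F test (fixed vs pooling)
#   p_lm      : p-value of the Lagrange multiplier test (random vs pooling)
#   p_hausman : p-value of the Hausman test (fixed vs random)
choose_panel_model <- function(p_f, p_lm, p_hausman, alpha = 0.05) {
  fixed_beats_pooling  <- p_f  < alpha
  random_beats_pooling <- p_lm < alpha

  if (!fixed_beats_pooling && !random_beats_pooling) return("pooling")
  if (fixed_beats_pooling && !random_beats_pooling)  return("fixed")
  if (!fixed_beats_pooling && random_beats_pooling)  return("random")

  # Both beat pooling: the Hausman test decides between them.
  if (p_hausman < alpha) "fixed" else "random"
}

# All three p-values very low, as in this session:
choose_panel_model(p_f = 1e-10, p_lm = 1e-10, p_hausman = 1e-4)
```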
Wednesday, February 13, 2013
Session # 6
Assignment 1
1) Create log of returns data (from 01.01.2012 to 01.01.2013) and calculate historical volatility
2) Create ACF plot for the log returns data, perform adf test and interpret.
Commands:
> stockprice<-read.csv(file.choose(),header=T)
> head(stockprice)
> closingprice<-stockprice[,5]
> closingprice.ts<-ts(closingprice,frequency=252)
> returns<-(closingprice.ts-lag(closingprice.ts,k=-1))/lag(closingprice.ts,k=-1)
> z<-scale(returns)+10
> logreturns<-log(z)
> logreturns
> acf(logreturns)
From the above graph, we can see that the autocorrelations lie within the 95% confidence bands, so there is no significant serial correlation. This is consistent with a stationary time series.
> T=252^0.5
> historicalvolatility<-sd(logreturns)*T
> historicalvolatility
> library(tseries)  # adf.test() comes from the tseries package
> adf.test(logreturns)
Augmented Dickey-Fuller Test
data: logreturn
Dickey-Fuller = -5.656, Lag order = 6, p-value = 0.01
alternative hypothesis: stationary
Warning message:
In adf.test(logreturn) : p-value smaller than printed p-value
Interpretation:
Since the p-value is less than 0.05 (i.e. 1 - 0.95), the null hypothesis of a unit root is rejected. Hence the time series is stationary, so data analysis can be done.
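The commands above standardise the simple returns and shift them by 10 before taking logs. For comparison, the conventional definition of a log return is log(P_t / P_{t-1}), which is diff(log(price)) in R. A minimal sketch on simulated prices follows; the price series and the seed are hypothetical stand-ins for the closing-price column of the CSV file, and 252 trading days per year is the same assumption as in the session.

```r
set.seed(1)                                   # assumed seed, for reproducibility
# Hypothetical closing prices (one trading year), standing in for stockprice[,5]
closingprice <- 100 * cumprod(1 + rnorm(252, mean = 0, sd = 0.01))

# Log return: log(P_t / P_{t-1})
logreturns <- diff(log(closingprice))

# Annualised historical volatility: daily standard deviation times sqrt(252)
historicalvolatility <- sd(logreturns) * sqrt(252)
historicalvolatility

# Autocorrelation plot of the log returns
acf(logreturns)
```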
Thursday, February 7, 2013
Session #5
Assignment 1
1. Find the returns for NSE data covering more than 6 months, taking the 10th data point as the start and the 95th data point as the end.
2. Plot those returns.
The file consists of S&P CNX Nifty data from January 2012 to July 2012.
Code
Extra Commands
Returns
Plot
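A minimal sketch of this assignment, under stated assumptions: the vector nifty_close is simulated here as a hypothetical stand-in for the closing prices in the Nifty file, and simple (not log) returns are computed. The commands actually used in the session are not reproduced in this post.

```r
set.seed(42)                                  # assumed seed, for reproducibility
# Hypothetical closing prices standing in for the S&P CNX Nifty series
nifty_close <- 5000 * cumprod(1 + rnorm(130, mean = 0, sd = 0.01))

# Keep the 10th to the 95th data point
window_close <- nifty_close[10:95]

# Simple returns: (P_t - P_{t-1}) / P_{t-1}
returns <- diff(window_close) / head(window_close, -1)

# Plot of the returns
plot(returns, type = "l", xlab = "Observation", ylab = "Return")
```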
Assignment 2
Data points 1-700 are available. Predict data points 701-850 using GLM estimation with logit analysis.
Solution:
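The fit-then-predict workflow can be sketched as below. Everything about the data is an assumption: the data frame dat, the predictors x1 and x2 and the binary response y are simulated for illustration, since the assignment's data set is not reproduced in this post.

```r
set.seed(7)                                   # assumed seed, for reproducibility
n <- 850
# Hypothetical data: two predictors and a 0/1 response
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(0.5 + 1.2 * dat$x1 - 0.8 * dat$x2))

train <- dat[1:700, ]                         # rows with known outcome
test  <- dat[701:850, ]                       # rows to be predicted

# Logit model estimated on the first 700 rows
fit <- glm(y ~ x1 + x2, data = train, family = binomial(link = "logit"))

# Predicted probabilities and 0/1 classes for rows 701-850
pred_prob  <- predict(fit, newdata = test, type = "response")
pred_class <- as.integer(pred_prob > 0.5)

# Compare with the simulated outcomes (possible only because they are simulated)
table(predicted = pred_class, actual = test$y)
```

The 0.5 cut-off is a common default, not a requirement; it can be moved depending on the cost of each type of error.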
Annexures: