R project for statistical analysis

Thursday, September 19, 2013

http://www.youtube.com/watch?v=rulIUAN0U3w&list=TLKWxYP9aX0g4

Saturday, April 6, 2013

Session # 10 : Video Making Assignment

IT BAL Video Assignment:

The link to the video is:
10 things you should know about Android

This video was made by:

Rajib Layek (12BM60030)
Rohit Joshi (12BM60031)

Saturday, March 30, 2013

Session # 10 : 3D Plot on R


Assignment 1:

#Create 3 vectors, x, y, z and choose any random values for them, ensuring they are of equal length.

T<- cbind(x,y,z)

#Create a 3-dimensional plot of the same.

> sample<-rnorm(50,25,6)
> sample
 [1] 23.10381 25.85777 22.04959 43.53180 12.11174 37.23922 38.92648 22.77181 17.80844 30.41365 32.32586 37.09651 24.55097 19.85470 31.01534 29.70007 31.72610 22.26199
[19] 19.85826 36.94503 23.50247 18.00116 24.50004 27.57822 20.34054 17.32243 30.26892 19.03535 16.14514 28.81016 29.45099 23.10639 25.49178 35.95906 19.35419 23.04064
[37] 25.20819 18.83031 30.75433 19.14759 28.11077 25.91251 28.03618 33.34057 30.19792 25.07813 25.08856 26.12123 24.15002 22.09888
> x<-sample(sample,10)
> y<-sample(sample,10)
> z<-sample(sample,10)
> x
 [1] 22.77181 31.01534 27.57822 22.09888 24.15002 19.85826 28.03618 17.32243 31.72610 19.35419
> y
 [1] 19.85826 26.12123 30.19792 23.10639 31.72610 22.04959 24.55097 27.57822 28.03618 31.01534
> z
 [1] 28.11077 31.72610 28.81016 12.11174 30.41365 23.50247 24.55097 22.04959 29.45099 30.26892
> T<-cbind(x,y,z)
> T
             x        y        z
 [1,] 22.77181 19.85826 28.11077
 [2,] 31.01534 26.12123 31.72610
 [3,] 27.57822 30.19792 28.81016
 [4,] 22.09888 23.10639 12.11174
 [5,] 24.15002 31.72610 30.41365
 [6,] 19.85826 22.04959 23.50247
 [7,] 28.03618 24.55097 24.55097
 [8,] 17.32243 27.57822 22.04959
 [9,] 31.72610 28.03618 29.45099
[10,] 19.35419 31.01534 30.26892

> library(rgl)   # plot3d() is provided by the rgl package
> plot3d(T)

> plot3d(T,col=rainbow(1000))

> plot3d(T,col=rainbow(1000),type='s')
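A minimal reproducible script version of Assignment 1, as a sketch. It assumes an arbitrary seed (42) and renames the 50-value vector to the hypothetical name `pool`, so that it no longer shares a name with base R's `sample()` function. `rainbow(nrow(T))` gives one distinct colour per point; the plot is only drawn in an interactive session with rgl installed.

```r
# Reproducible version of Assignment 1 (sketch; seed and the name 'pool' are assumptions).
set.seed(42)                             # fixed seed so the values can be regenerated
pool <- rnorm(50, mean = 25, sd = 6)     # 50 draws, as in the session
x <- sample(pool, 10)                    # three vectors of equal length (10)
y <- sample(pool, 10)
z <- sample(pool, 10)
T <- cbind(x, y, z)                      # 10 x 3 matrix with columns x, y, z
print(T)

# Draw the 3D scatter of spheres only when a display and rgl are available.
if (interactive() && requireNamespace("rgl", quietly = TRUE)) {
  rgl::plot3d(T, col = rainbow(nrow(T)), type = "s")
}
```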

Assignment 2:

Create 2 random variables and create 3 plots:

> #1. X-Y, X-Y|Z (introduce a variable z with 5 different categories and cbind it to x and y)
> x<-rnorm(1500,100,10)
> y<-rnorm(1500,85,5)
> z1<-sample(letters,5)
> z2<-sample(z1,1500,replace=TRUE)
> z<-as.factor(z2)
> t<-cbind(x,y,z)

> library(ggplot2)   # qplot() is provided by the ggplot2 package
> qplot(x,y)

> qplot(x,z)

> qplot(x,z,alpha=I(1/10))

> qplot(x,y,geom=c("point","smooth"))

> qplot(x,y,colour=z)

> qplot(log(x),log(y),colour=z)
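One subtlety when combining x, y and the factor z: cbind() returns a numeric matrix, so the factor is stored as its integer codes (1 to 5), while data.frame() keeps z as a factor. A minimal base-R check of this, with an arbitrary seed and the hypothetical names `m` and `d`:

```r
# Compare cbind() and data.frame() when one column is a factor (sketch).
set.seed(1)                                    # arbitrary seed, for reproducibility
x  <- rnorm(1500, 100, 10)
y  <- rnorm(1500, 85, 5)
z1 <- sample(letters, 5)                       # 5 distinct category labels
z  <- as.factor(sample(z1, 1500, replace = TRUE))

m <- cbind(x, y, z)       # numeric matrix: z becomes integer codes
d <- data.frame(x, y, z)  # data frame: z stays a factor with its letter labels

str(m)
str(d)
```

For ggplot2, a data frame is the natural container, since the factor's labels then appear in legends.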

Saturday, March 23, 2013

Session # 9: InfoGraphics Tool on Facebook

Session # 9: InfoGraphics Tool

Tool: WolframAlpha (http://www.wolframalpha.com/facebook/)

Review:

What it does:
It collects, analyzes and presents the data from your Facebook account using Wolfram apps.

Introduced by Wolfram Alpha back in September, the Facebook report delves into your profile and breaks your activity down into easy-to-digest graphs. It is surprisingly comprehensive: times of interaction, word maps, relationship statistics and network structure are all visualized.

This is mostly for fun since it’s only your personal account that’s visualized, but if Facebook is your main source of interaction (with subscribers, friends, etc.), you will have a lot of information to help you improve.

What’s cool:
The insights it gives are impressive. They start from your birth date and go on to your friends' locations, a breakdown of relationship status, your most liked post, picture or video, and much more. A few of the features are shown below.

Personal Information:

Activity History:

Post Statistics:

Most Frequently Used Words:

Most Liked Post:

Weekly Status:

Most Liked Picture:

Friend Statistics:

Friend Age Distribution:

Friends' Locations:

Geographic Extremes:

Friend Network:

These clusters show different circles of friends:

Drawbacks:

1. When shared on Facebook or Google+, the picture is not displayed properly.
2. There is too much information to read when sharing.

Skill level: Beginner

Runs on: Any Web browser.

Learn more: The video below shows how to get all the data from Facebook.

Friday, March 15, 2013

Session # 8: Panel Data analysis with R

Session # 8

Assignment

Calculate the values for all 3 models and decide which model best fits the data set for panel estimation.

Panel Data:

Panel data combines time-series and cross-sectional data: it contains observations on the same set of units (for example, states) over several time periods.

R for Panel Data:

The basic function used for panel data estimation is plm(), from the plm package.

The data set used in this session is "Produc". It contains the following variables:

- state: the state
- year: the year
- pcap: public capital stock
- hwy: highway and streets
- water: water and sewer facilities
- util: other public buildings and structures
- pc: private capital stock
- gsp: gross state product
- emp: labor input measured by employment in non-agricultural payrolls
- unemp: state unemployment rate

Install and load the "plm" package.
Use the data set "Produc", a panel data set shipped with the plm package, for the panel estimations.
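A minimal sketch of loading the data and declaring its panel structure, assuming the plm package is installed; the object name `pProduc` is hypothetical. State is the individual (cross-section) index and year is the time index.

```r
# Load the Produc panel shipped with plm and inspect its structure (sketch).
library(plm)
data("Produc", package = "plm")

# Declare state as the individual index and year as the time index.
pProduc <- pdata.frame(Produc, index = c("state", "year"))
pdim(pProduc)   # reports whether the panel is balanced, with n, T and N

head(Produc[, c("state", "year", "gsp", "pcap", "pc", "emp", "unemp")])
```

Produc is a balanced panel: 48 states observed over 17 years (1970 to 1986), giving 816 rows.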

Step 1: Calculating the values for the pooling model

Step 2: Calculating the values for the fixed model

Step 3: Calculating the values for the random model
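The commands for these three steps are not shown in the post. A minimal sketch follows, assuming the specification used in the plm package's own Produc examples (log(gsp) regressed on log(pcap), log(pc), log(emp) and unemp). The object names `pooled`, `fixed1` and `random1` match those used in the tests below.

```r
# Fit the pooling, fixed (within) and random effects models (sketch).
library(plm)
data("Produc", package = "plm")

# Assumed specification, taken from plm's Produc examples.
form <- log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp
idx  <- c("state", "year")   # individual index, time index

pooled  <- plm(form, data = Produc, index = idx, model = "pooling")  # Step 1
fixed1  <- plm(form, data = Produc, index = idx, model = "within")   # Step 2
random1 <- plm(form, data = Produc, index = idx, model = "random")   # Step 3

summary(pooled)
summary(fixed1)
summary(random1)
```

The within model drops the intercept because it is absorbed by the state effects, so it reports 4 coefficients while the other two report 5.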

To choose the model that best fits the data set "Produc", we run pairwise hypothesis tests among the 3 models and select the best fit at the end.

Test 1: Between pooling and fixed model

pFtest(fixed1, pooled)

Test details :

H0 (null): the individual-index and time-based effects are all zero (pooling model)
H1 (alternative): at least one of these effects is non-zero (fixed model)

The p-value is very low, so the null hypothesis is rejected: the effects are significant.

Hence the fixed model is better than the pooling model.
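For reference, the F test for individual effects compares the residual sums of squares of the two models. In its standard form, with N units, T periods and K regressors:

```latex
F = \frac{\left(\mathrm{RSS}_{\text{pool}} - \mathrm{RSS}_{\text{FE}}\right)/(N-1)}
         {\mathrm{RSS}_{\text{FE}}/(NT - N - K)}
\;\sim\; F_{N-1,\; NT-N-K} \quad \text{under } H_0
```

For example, with N = 48 states, T = 17 years and, if four regressors are used, K = 4, the degrees of freedom are 47 and 816 - 48 - 4 = 764.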

Test 2: Between pooling and random model

Command used:
plmtest(pooled)

Test details :

H0 (null): the individual-index and time-based effects are all zero (pooling model)
H1 (alternative): at least one of these effects is non-zero (random model)

The p-value is very low, so the null hypothesis is rejected: the random effects are significant.

Hence the random model is better than the pooling model.

Test 3: Between fixed and random model

We use the Hausman test:

phtest(random1, fixed1)

Test details :

H0 (null): the individual effects are not correlated with any regressor (random model)
H1 (alternative): the individual effects are correlated with the regressors (fixed model)

The p-value is very low, so the null hypothesis is rejected: the random-effects estimator is inconsistent, while the fixed-effects estimator remains consistent.

Hence the fixed model is better than the random model.
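For reference, the Hausman statistic compares the fixed- and random-effects coefficient vectors directly. In its standard form, with k the number of coefficients compared:

```latex
H = \left(\hat\beta_{FE} - \hat\beta_{RE}\right)^{\top}
    \left[\widehat{\operatorname{Var}}(\hat\beta_{FE}) - \widehat{\operatorname{Var}}(\hat\beta_{RE})\right]^{-1}
    \left(\hat\beta_{FE} - \hat\beta_{RE}\right)
\;\sim\; \chi^2_{k} \quad \text{under } H_0
```

Under H0 both estimators are consistent and random effects is efficient, so the two coefficient vectors should be close; a large H (low p-value) points to the fixed model.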

Conclusion:

After this series of tests, we conclude that the fixed model best fits the "Produc" data set for panel estimation: the individual (state) effects are significant and correlated with the regressors.
Hence we would choose the "Fixed" model to estimate the panel data in the "Produc" data set.
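The three pairwise tests can be combined into one decision rule. A minimal sketch follows, assuming the specification from plm's Produc examples and a 5% significance level. `choose_model()` is a hypothetical helper that encodes a simplified version of the reasoning above: keep pooling only if neither test finds individual effects; otherwise, let the Hausman test choose between fixed and random.

```r
# Run the three tests and apply a simplified decision rule (sketch).
library(plm)
data("Produc", package = "plm")

form <- log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp   # assumed specification
idx  <- c("state", "year")
pooled  <- plm(form, data = Produc, index = idx, model = "pooling")
fixed1  <- plm(form, data = Produc, index = idx, model = "within")
random1 <- plm(form, data = Produc, index = idx, model = "random")

p_fe_vs_pool <- pFtest(fixed1, pooled)$p.value   # H0: no individual effects (pooling)
p_re_vs_pool <- plmtest(pooled)$p.value          # H0: no random individual effects
p_hausman    <- phtest(fixed1, random1)$p.value  # H0: random effects consistent

# Hypothetical helper: simplified model-selection rule at level alpha.
choose_model <- function(p_f, p_lm, p_h, alpha = 0.05) {
  if (p_f >= alpha && p_lm >= alpha) return("pooling")  # no evidence of effects
  if (p_h < alpha) return("fixed")                      # random effects inconsistent
  "random"                                              # effects present, RE consistent
}

cat("Selected model:", choose_model(p_fe_vs_pool, p_re_vs_pool, p_hausman), "\n")
```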

Wednesday, February 13, 2013

Session # 6

Assignment 1

1) Create log returns data (from 01.01.2012 to 01.01.2013) and calculate the historical volatility.
2) Create an ACF plot for the log returns data, perform the ADF test and interpret the result.

Commands:
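> library(tseries)   # adf.test() is provided by the tseries package
> stockprice<-read.csv(file.choose(),header=T)
> head(stockprice)
> closingprice<-stockprice[,5]   # 5th column: closing price
> closingprice.ts<-ts(closingprice,frequency=252)   # about 252 trading days per year
> returns<-(closingprice.ts-lag(closingprice.ts,k=-1))/lag(closingprice.ts,k=-1)   # simple daily returns
> z<-scale(returns)+10   # standardized returns, shifted by +10 so the values are positive
> logreturns<-log(z)     # note: log of the shifted standardized returns, not ln(P_t/P_(t-1))
> logreturns
> acf(logreturns)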

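From the above graph, the autocorrelations at all non-zero lags lie within the 95% confidence bands, so there is no significant autocorrelation. This is consistent with a stationary series; the ADF test below checks stationarity formally.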
> T=252^0.5   # annualization factor: square root of 252 trading days
> historicalvolatility<-sd(logreturns)*T
> historicalvolatility
> adf.test(logreturns)

Augmented Dickey-Fuller Test

data: logreturn
Dickey-Fuller = -5.656, Lag order = 6, p-value = 0.01
alternative hypothesis: stationary

Warning message:
In adf.test(logreturn) : p-value smaller than printed p-value

Interpretation:

Since the p-value (0.01, and the warning says the true value is even smaller) is below the 0.05 significance level (1 - 0.95), the null hypothesis of a unit root is rejected. Hence the time series is stationary and further analysis can be done.
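For reference, the conventional definitions are the log return r_t = ln(P_t / P_(t-1)) and annualized volatility sd(r) * sqrt(252). A minimal base-R sketch follows, with made-up prices and hypothetical helper names. The resulting `r` could then be passed to acf() and tseries::adf.test() as in the session, given enough observations.

```r
# Conventional log returns and annualized historical volatility (sketch).
# 'prices' is a numeric vector of closing prices in date order (hypothetical input).
log_returns <- function(prices) diff(log(prices))   # r_t = ln(P_t / P_(t-1))

historical_volatility <- function(prices, periods_per_year = 252) {
  sd(log_returns(prices)) * sqrt(periods_per_year)  # scale daily sd to a yearly figure
}

# Usage with made-up prices:
prices <- c(100, 101.5, 100.8, 102.3, 101.9, 103.4)
r  <- log_returns(prices)
hv <- historical_volatility(prices)
print(round(r, 5))
print(hv)
```

A series that grows by the same percentage every day has constant log returns and therefore zero historical volatility, which is a quick sanity check for the functions.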

Thursday, February 7, 2013

Session #5

Assignment 1

1. Find the returns of NSE data covering more than 6 months, taking the 10th data point as the start and the 95th data point as the end.

2. Plot those returns.

The file consists of S&P CNX Nifty data from January 2012 to July 2012.

Code

Extra Commands

Returns

Plot
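The code and output for this assignment are not reproduced here. A minimal base-R sketch of the computation follows, using simulated prices in place of the Nifty file. The vector name `nifty_close` and its values are hypothetical; with the real file it would be the closing-price column.

```r
# Returns over the window from the 10th to the 95th data point (sketch).
set.seed(7)                                                  # arbitrary seed
nifty_close <- 5000 * exp(cumsum(rnorm(130, mean = 0, sd = 0.01)))  # simulated daily closes

window_prices <- nifty_close[10:95]                          # 86 prices: points 10 to 95
daily_returns <- diff(window_prices) / head(window_prices, -1)  # 85 simple daily returns
period_return <- nifty_close[95] / nifty_close[10] - 1      # one return from point 10 to 95
print(period_return)

# Plot the daily returns (only in an interactive session).
if (interactive()) {
  plot(daily_returns, type = "l", xlab = "Day in window", ylab = "Daily return",
       main = "Daily returns, data points 10 to 95")
}
```

Compounding the 85 daily returns reproduces the single period return, since the intermediate prices cancel.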

Assignment 2

Data for rows 1-700 are available. Predict the data for rows 701-850 using GLM estimation with logit analysis (logistic regression).
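The data set for this assignment is not given. A minimal sketch of the train/predict split follows, using simulated data. The column names `y`, `x1`, `x2`, the coefficients used to simulate them, and the 0.5 cutoff are all hypothetical. With real data, rows 701-850 may have no observed outcome, in which case the final table() line would be dropped.

```r
# Logistic regression (GLM, logit link): fit on rows 1-700, predict rows 701-850 (sketch).
set.seed(123)                                     # arbitrary seed
n  <- 850
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * x1 - 0.8 * x2))  # simulated 0/1 outcome
dat <- data.frame(y, x1, x2)

train <- dat[1:700, ]     # rows with known outcomes
test  <- dat[701:850, ]   # rows to predict

fit <- glm(y ~ x1 + x2, family = binomial(link = "logit"), data = train)
summary(fit)

prob <- predict(fit, newdata = test, type = "response")  # predicted P(y = 1)
pred <- ifelse(prob > 0.5, 1, 0)                         # classify with a 0.5 cutoff
table(predicted = pred, actual = test$y)
```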
