r - Troubleshooting ddply() script
I am developing a censored dependent variable for use in survival analysis. My goal is to find the last time ("time") at which a person answered a question in a survey (i.e. the point where "q.time" is coded as "1", and "q.time + 1" and q at all later times are coded as "0").
By this logic, the last question answered (q.time) should be coded as "1". The first question that was NOT answered (q.time + 1) should be coded as "0", and all questions after the first unanswered question should be coded as "NA". I then want to remove all rows where DV = NA from my dataset.
A very generous colleague helped me develop the following code, but she is now on leave and it needs a little more lovin'. The code is as follows:
    library(plyr)   # for ddply
    library(stats)  # for reshape(...)

    # From above
    dat <- data.frame(id = c(1, 2, 3, 4),
                      q.1 = c(1, 1, 0, 0), q.2 = c(1, 0, 1, 0),
                      dv.1 = c(1, 1, 1, 1), dv.2 = c(1, 1, 0, 1))

    # From wide to long
    long <- reshape(dat, direction = 'long',
                    varying = c('q.1', 'q.2', 'dv.1', 'dv.2'))

    ddply(long, .(id), function(df) {
      # figure out the dropoff time
      answered <- subset(df, q == 1)
      last.q = max(answered$time)
      subs <- subset(df, time <= last.q + 1)
      # set all the new dvs as desired
      new.dv <- rep(last.q, 1)
      if (last.q < max(df$time)) new.dv <- c(0, last.q)
      subs$dv <- new.dv
      subs
    })

Unfortunately, this returns the following error message:
    Error in `$<-.data.frame`(`*tmp*`, "dv", value = c(0, -Inf)) :
      replacement has 2 rows, data has 0

Any ideas? The problem seems to be located in the "rep" command, but I am a newbie to R. Thanks a lot!
Update: See the explanations below, and then refer to the follow-up question.
Hello there - I totally followed you, and really appreciate the time you took to help me. I went back to my data and coded in a dummy question where all respondents have a value of "1", but in doing so I discovered where the error might actually lie. In my actual data set, I have 30 questions (i.e. 30 times in the long format). After changing the dataset so that q == 1 occurs for all ID variables, the error message is:

    Error in `$<-.data.frame`(`*tmp*`, "newvar", value = c(0, 29)) :
      replacement has 2 rows, data has 31

If the problem is with the number of rows assigned to subs, then is the source of the error coming from ...
    subs <- subset(df, time <= last.q + 1)

i.e., is time <= last.q + 1 setting the number of rows to EQUAL last.q + 1?

Update 2: What, ideally, I'd like my new variable to look like:

    id  time  q  dv
     1     1  1   1
     1     2  1   1
     1     3  1   1
     1     4  1   1
     1     5  0   0
     1     6  0  NA
     2     1  1   1
     2     2  1   1
     2     3  0   0
     2     4  0  NA
     2     5  0  NA
     2     6  0  NA

Please note that "q" can vary between "0" and "1" over time (see the observations for id = 1), but due to the nature of survival analysis, "DV" cannot. What I need to do is create a variable that finds the LAST time "q" changes from "1" to "0", and is censored accordingly. After step 4, my data should look like this:
    id  time  q  dv
     1     1  1   1
     1     2  1   1
     1     3  1   1
     1     4  1   1
     1     5  0   0
     2     1  1   1
     2     2  1   1
     2     3  0   0
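The coding rule described in Update 2 can be sketched in base R as follows. This is only an illustration under stated assumptions, not the colleague's code: the helper name censor_one and the variable first.missed are hypothetical, the sample data are the q values from the table in Update 2, and a respondent who never answered anything is assumed to keep only a first row with dv = 0.

```r
# Hypothetical helper (not from the original post): recode dv for one respondent.
censor_one <- function(df) {
  df <- df[order(df$time), ]
  answered <- df$time[df$q == 1]
  # Assumption: a respondent who never answered is treated as dropping out
  # before the first question, so only the first row is kept, with dv = 0.
  last.q <- if (length(answered) > 0) max(answered) else min(df$time) - 1
  # dv is 1 up to and including the last answered time, NA afterwards.
  df$dv <- ifelse(df$time <= last.q, 1, NA_real_)
  # The first row after the last answered time is the censoring row (dv = 0).
  first.missed <- which(df$time > last.q)[1]  # NA if nothing was missed
  if (!is.na(first.missed)) df$dv[first.missed] <- 0
  # Drop every row that is still NA.
  df[!is.na(df$dv), ]
}

# The q values from the table in Update 2, in long format.
long <- data.frame(
  id   = rep(1:2, each = 6),
  time = rep(1:6, times = 2),
  q    = c(1, 1, 1, 1, 0, 0,
           1, 1, 0, 0, 0, 0)
)

# Apply the helper to each respondent and stack the pieces again.
result <- do.call(rbind, lapply(split(long, long$id), censor_one))
rownames(result) <- NULL
print(result)
```

For the two respondents above this keeps times 1 to 5 for id 1 and times 1 to 3 for id 2. With plyr loaded, the split/lapply/rbind line could presumably be replaced by ddply(long, .(id), censor_one).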
.(id) in plyr is equivalent to

    > dum <- split(long, long$id)
    > dum[[4]]
        id time q dv
    4.1  4    1 0  1
    4.2  4    2 0  1

Your problem is in your fourth split. In your function you do

    answered <- subset(df, q == 1)

This is an empty set, because there is no dum[[4]]$q taking the value 1. If you just want to ignore this split, then something like

    ans <- ddply(long, .(id), function(df) {
      # figure out the dropoff time
      answered <- subset(df, q == 1)
      # skip respondents who never answered a question
      if (length(answered$q) == 0) {
        return()
      }
      last.q = max(answered$time)
      subs <- subset(df, time <= last.q + 1)
      # set all the new dvs as desired
      new.dv <- rep(last.q, 1)
      if (last.q < max(df$time)) new.dv <- c(0, last.q)
      subs$dv <- new.dv
      subs
    })
    > ans
      id time q dv
    1  1    1 1  2
    2  1    2 1  2
    3  2    1 1  0
    4  2    2 0  1
    5  3    1 0  2
    6  3    2 1  2

would be the result.
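To see where the -Inf and the "data has 0" in the first error message come from, the empty split can be reproduced in isolation. This is a minimal sketch using the sample data from the question; the names n.answered, empty.max and outcome are made up for illustration.

```r
dat <- data.frame(id = c(1, 2, 3, 4),
                  q.1 = c(1, 1, 0, 0), q.2 = c(1, 0, 1, 0),
                  dv.1 = c(1, 1, 1, 1), dv.2 = c(1, 1, 0, 1))
long <- reshape(dat, direction = "long",
                varying = c("q.1", "q.2", "dv.1", "dv.2"))

# Count the answered questions per respondent; a count of 0 marks a split
# for which subset(df, q == 1) is empty.
n.answered <- tapply(long$q == 1, long$id, sum)
print(n.answered)

# max() of a zero-length vector returns -Inf (with a warning), which is
# where the -Inf in the error message comes from.
empty.max <- suppressWarnings(max(numeric(0)))
print(empty.max)

# Assigning a length-2 vector to a column of a 0-row data frame fails;
# tryCatch captures the error message instead of stopping the script.
subs <- long[0, ]
outcome <- tryCatch({
  subs$dv <- c(0, empty.max)
  "ok"
}, error = function(e) conditionMessage(e))
print(outcome)
```

Running a per-id count like n.answered on the real data before calling ddply is one way to spot respondents who would trigger the error.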