Working in R with a very large data set
I am working with a very large data set which I am downloading from an Oracle database. The data frame has approximately 21 million rows and 15 columns. My OS is Windows XP (32-bit) and I have 2 GB of RAM. I cannot upgrade my RAM or my OS (a decent PC is on the way, but it will take months before I get it).

    library(RODBC)
    sqlQuery(Channel1, "select * from table1", stringsAsFactors = FALSE)

Here I get stuck with an error like "cannot allocate vector of size x MB". I got a suggestion to use the ff package. I would like to know if someone familiar with the ff package can tell me whether it would help in my case. Do you know another way to get around the memory problem? Would a 64-bit solution help? Thanks for your suggestions.
If you are working with package ff and have your data in SQL, you can easily get it into ff using package ETLUtils; see its documentation for an example of using it with ROracle.
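Roughly, that looks like the sketch below. Since you already connect over ODBC, it uses ETLUtils' read.odbc.ffdf; the DSN, credentials and chunk sizes are placeholders, and argument names may differ slightly between ETLUtils versions:

    library(ff)
    library(ETLUtils)

    ## Read the query result straight into an ffdf, fetching it in chunks
    ## so the full 21 million rows never have to sit in RAM at once.
    dat <- read.odbc.ffdf(
      query = "select * from table1",
      odbcConnect.args = list(dsn = "my_dsn", uid = "user", pwd = "pwd"),
      first.rows = 100000,   # rows fetched in the first chunk
      next.rows  = 500000,   # rows fetched in each following chunk
      VERBOSE = TRUE)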
In my experience, ff handles the kind of dataset you are working with (21 million rows and 15 columns) just fine - in fact your setup is rather small for ff, unless your columns contain a lot of character data that will be converted to factors (meaning all your factor levels should be able to fit in your RAM). Packages ETLUtils, ff and ffbase allow you to get your data into R as ff objects and do some basic statistics on them. Depending on what you will do with your data and on your hardware, you might have to consider sampling when you build models. I prefer having my data in R as ff objects, building models on a sample, and scoring over the full data using the chunking tools from ff or from package ffbase, as in the sketch below.
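A rough illustration of that sample-then-score workflow, assuming the data already sits in an ffdf called dat with a binary outcome y and predictors x1 and x2 (those names and the model are made up for the example):

    library(ff)
    library(ffbase)

    ## Fit on a random sample that fits comfortably in 2 GB of RAM ...
    idx <- sample(nrow(dat), size = 100000)
    fit <- glm(y ~ x1 + x2, data = dat[idx, ], family = binomial())

    ## ... then score the full ffdf chunk by chunk.
    dat$score <- ff(vmode = "double", length = nrow(dat))
    for (i in chunk(dat)) {
      dat$score[i] <- predict(fit, newdata = dat[i, ], type = "response")
    }

If a sample is not good enough, ffbase also offers bigglm.ffdf to fit a glm over the full ffdf in chunks.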
The drawback is that you have to get used to the fact that your data are ffdf objects, and that may take some time - especially if you are new to R.