Sunday, March 15, 2009

Progress with my paper for the May graduate conference in social economy.

I have added columns next to the total, male, and female population count columns. Each new column subtracts a given age band from the younger age band before it, so a negative value means that the age band is larger than the younger band before it. For instance, if a country has 300,000 persons aged 5 to 9 and only 200,000 children aged 0 to 4 in, say, 1996, then the new column shows -100,000. I then used conditional formatting to highlight the negative values in the new columns. This gives me a way of visually finding suspected baby booms. I still want R code that will scan these new columns and produce a data set containing only the population tables where negative values occur. I have also been offered the use of documentation software to document my use of R code in Office 2007 format.
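The difference column described above could be reproduced in R along these lines. This is only a sketch: the data frame `tab`, its column names, and the figures for the two older age bands are made up for illustration, and it assumes the rows run from the youngest age band to the oldest.

```r
# Hypothetical single population table: one row per age band, youngest first.
tab <- data.frame(
  age_band = c("0-4", "5-9", "10-14", "15-19"),
  total    = c(200000, 300000, 280000, 290000),
  stringsAsFactors = FALSE
)

# Younger band minus the band itself. diff() returns x[i] - x[i-1],
# so it is negated here. The youngest band has no predecessor, hence NA.
tab$diff_total <- c(NA, -diff(tab$total))

# TRUE where a band is larger than the younger band before it.
tab$suspect <- !is.na(tab$diff_total) & tab$diff_total < 0

print(tab)
```

The `suspect` column plays the role of the conditional formatting in the spreadsheet. The same two lines could be repeated for the male and female columns.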

I have some idea of the indexes I will need for R loops or array subscripts. They follow the way the tables recur in the large data set. Starting with the World population tables and then going through the countries from A to Z, the tables repeat for each year from 1996 to 2010, then for 2015 and roughly every five years after that, projected to 2050. So the tables are arranged in two dimensions: country name and year. I will limit my searches to 1996 to 2005 at first, but if I choose a country as having booms in that decade, I will come back to its projections for the future.
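One possible way to walk the two dimensions (country and year) without explicit loop counters is sketched below. It assumes the data have already been reshaped into a long layout with one row per country, year, and age band, which is not how the original tables are stored. The country names, column names, and counts are all invented for the example.

```r
# Hypothetical long-format data: one row per country, year and age band,
# with age bands ordered youngest first inside each table.
pop <- data.frame(
  country  = rep(c("Aland", "Zedland"), each = 6),
  year     = rep(rep(c(1996, 2015), each = 3), times = 2),
  age_band = rep(c("0-4", "5-9", "10-14"), times = 4),
  total    = c(200, 300, 310,   250, 240, 230,
               500, 480, 470,   400, 450, 440),
  stringsAsFactors = FALSE
)

# Limit the search to the first decade of interest.
recent <- pop[pop$year >= 1996 & pop$year <= 2005, ]

# One small table per country/year combination that actually occurs.
tables <- split(recent, list(recent$country, recent$year), drop = TRUE)

# A table is flagged if any band is larger than the younger band before it,
# i.e. if "younger minus current" is negative anywhere.
has_negative <- function(tab) any(-diff(tab$total) < 0)

flagged  <- Filter(has_negative, tables)
suspects <- do.call(rbind, flagged)

# Which country/year tables were kept.
print(unique(suspects[, c("country", "year")]))
```

Widening the year filter later, for example to include the projections for a country already chosen, only means changing the line that builds `recent`.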

I also need to get the paths right to open data files using R on Windows. Once this path problem is solved, I can move ahead with trying indexes to parse the large data set into a smaller one containing only the population tables that indicate a suspected baby boom.
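A common cause of path trouble in R on Windows is the backslash, which is an escape character inside R strings. The sketch below shows the two usual spellings and a way to check a path before reading it. The `C:/Users/me/...` locations and the file name are hypothetical, so the runnable part uses a temporary file instead of my real data.

```r
# Inside R strings, backslashes must be doubled; forward slashes also work
# on Windows. Hypothetical locations, shown only as strings (not opened here):
p1 <- "C:/Users/me/Documents/popdata.csv"
p2 <- "C:\\Users\\me\\Documents\\popdata.csv"

# Runnable demonstration using a temporary file instead of a real data file.
demo_path <- file.path(tempdir(), "popdata_demo.csv")
write.csv(data.frame(age_band = c("0-4", "5-9"),
                     total    = c(200000, 300000)),
          demo_path, row.names = FALSE)

# Check that the path resolves before trying to read it.
stopifnot(file.exists(demo_path))

demo <- read.csv(demo_path)
print(demo)
```

In an interactive session, `file.choose()` opens a file dialog and returns the full path of the selected file, which can be a quick way to see how R expects a given path to be written.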
