Sunday, May 10, 2009

Finally success with creating a clean population data file

I was finally able to get a good clean data set for the world's population data using a different approach. In the original data set there was always a blank row at the end of each yearly table and at the beginning. Between these blank rows was a row with only the table title in column one. This title was a combination of the country name then a slash then the year covered by the table. My easy process was to label each row with the year as another variable. I did this by physically typing the year in the new column beside the title row then filling down in Excel to the blank row at the end of the table. Once I had the first country in this case actually the world tables with the new variable labeling each row in the cooresponding table with the tables year 1996 to 2050, I then copied the year variable to the next country Afghanistan. Then I highlighted both of the year variables for the World and Afghanistan and copied the years to the third and fourth country. Then repeated for the fitth, sixth, seventh and eighth and so on always doubling. This did not take long and it worked because the tables were all repeated with exactly the same number of rows per year table and per country.

The next step was done with brute force copy and paste so took all night but actually I did others tasks on the computer. It really took about as many copy and drag fills as there are countries. I highlighted the table title cell, the column one cell in the title row and then copied out only the country name and pasted this in a second new variable column and then I drag filled this down from 1996 to 2050 so each row concerning one country now had a new variable named Country that contained the name of the country. I then added column headers for variables names, Age_band, Both_genders, Male, Female, Year, Country. I then saved tihs excel file as a CSV file. Then I imported this to SAS. The using a data step where statement I deleted all the blank rows and title rows. So now I have this basic data file.

No comments: