Models of socio-economic development help planners design appropriate policies. Such models are ideally built from empirical data, and we take up the problem of working towards a district development model for India using two waves of census data. India has almost six hundred districts that vary widely in both social and economic development, and hence presents a unique natural experiment for understanding how social and economic factors interact. We present observations from a longitudinal analysis of census data from the years 2001 and 2011.

Each census variable has multiple parameters, and for each parameter the census reports the number of households in a district. We group these mutually exclusive parameters into three broad types: rudimentary, intermediate, and advanced. For example, firewood is considered a rudimentary type of fuel for cooking, kerosene and cow dung are grouped together as intermediate types, and PNG, LPG and biogas are grouped together as advanced types. We then run k-means clustering on the districts based on the percentage of households in each district that use each type of fuel (rudimentary, intermediate, and advanced). This allows us to label each district as a level-1, level-2 or level-3 district: level-1 districts predominantly use rudimentary types of fuel for cooking, level-2 districts primarily use intermediate types, and level-3 districts predominantly use advanced types. We follow the same method for all six indicators, viz. Main Source of Light, Main Source of Water, Asset Ownership, Fuel for Cooking, Bathroom Facility and Condition of Household.
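The clustering step described above can be sketched as follows. This is a minimal illustration, not the code used in the paper: the district names and household counts are made up, the k-means is a plain Lloyd's iteration with a deterministic farthest-point initialisation, and the rule that maps clusters to levels (ranking centroids by a weighted score) is an assumption, since the text only says that each level is dominated by one type.

```python
import math

def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means with a deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing centroid.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    assignment = None
    for _ in range(iters):
        new_assignment = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no district changed cluster
        assignment = new_assignment
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return assignment, centroids

def assign_levels(household_counts):
    """Map each district to level 1, 2 or 3 for one indicator.

    household_counts: {district: (rudimentary, intermediate, advanced)} counts.
    Assumes every district has at least one household.
    """
    names = list(household_counts)
    shares = []
    for name in names:
        counts = household_counts[name]
        total = sum(counts)
        shares.append(tuple(100.0 * c / total for c in counts))
    assignment, centroids = kmeans(shares, k=3)
    # Assumption: rank clusters by a weighted score of their centroid
    # (intermediate share + 2 * advanced share), lowest score = level 1.
    order = sorted(range(3), key=lambda j: centroids[j][1] + 2 * centroids[j][2])
    level_of_cluster = {cluster: level for level, cluster in enumerate(order, start=1)}
    return {name: level_of_cluster[a] for name, a in zip(names, assignment)}

# Hypothetical household counts for "Fuel for Cooking" in six made-up districts.
example_counts = {
    "district_a": (900, 80, 20),
    "district_b": (850, 100, 50),
    "district_c": (200, 700, 100),
    "district_d": (150, 750, 100),
    "district_e": (50, 100, 850),
    "district_f": (100, 100, 800),
}
levels = assign_levels(example_counts)
```

Running the same function once per indicator would give each district six level labels, one for each indicator.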

Assigning levels to districts based on socio-economic parameters makes the data easy to interpret and makes it straightforward to compare districts with one another. Using simple probabilistic analysis on these levels, we identify broad patterns, which we organize into 6 hypotheses. These hypotheses not only highlight key trends in socio-economic development between the two censuses, but also illustrate a useful method for reducing complex multi-dimensional data to simple units for comparison.

You can refer to our research paper here:

Towards Building a District Development Model for India Using Census Data -
D. Goswami, S.B. Tripathi, S. Jain, S. Pathak, and A. Seth.
ICTD 2019. Supplementary material.

News Articles

This section shows, on a daily basis, news articles that report something interesting happening in a district. A district can be categorized into one of 9 subclasses (Fast Agri, Slow Agri, Average Non Agri, etc.), and a centroid for each subclass is obtained using model training followed by hierarchical clustering. The details can be found here. The centroids are used to predict the subclass of a new article based on its similarity to one subclass centroid and its dissimilarity to the other subclass centroids. This is measured quantitatively through a relevance score, which is simply a ratio of cosine similarities. The subclass with the highest relevance score is taken as the subclass of the article. If the subclass of the article differs from the subclass of the district to which the article belongs, the article is interesting because it deviates from the general subclass of the district. To keep the newsfeed fresh (with new articles), a recency factor is also incorporated into the relevance score; it decreases relevance as the inverse square of the number of days since the article was published. This combined score is called the relevance-recency score (hover over the title to get more information). Finally, for fair exposure of all districts on the newsfeed, we apply a fairness criterion (over a long period, all districts should have equal exposure on the newsfeed) and a diversity criterion (on a single day, there is an upper bound on the number of articles that can belong to one district). Click on a title to go to the main page of the article and read the complete news.
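The scoring described above can be sketched as follows. The exact formulas are not given here, so several choices are assumptions made for illustration: the relevance of a subclass is taken as the cosine similarity to its centroid divided by the mean cosine similarity to the other centroids, vectors are assumed to be non-negative (for example TF-IDF-like), articles less than one day old are treated as one day old, and the function names and toy vectors are hypothetical. Only the per-day diversity cap is shown; the long-run fairness criterion is not.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def relevance_scores(article_vec, centroids):
    """Assumed form of the ratio: similarity to one centroid over mean similarity to the rest."""
    sims = {name: cosine(article_vec, c) for name, c in centroids.items()}
    scores = {}
    for name, sim in sims.items():
        others = [s for other, s in sims.items() if other != name]
        mean_other = sum(others) / len(others)
        if mean_other > 0:
            scores[name] = sim / mean_other
        else:
            scores[name] = float("inf") if sim > 0 else 0.0
    return scores

def predict_subclass(article_vec, centroids):
    """Return the subclass with the highest relevance score, and that score."""
    scores = relevance_scores(article_vec, centroids)
    best = max(scores, key=scores.get)
    return best, scores[best]

def relevance_recency(relevance, days_since_published):
    """Relevance decays as the inverse square of the article's age in days.

    Assumption: ages below one day are clamped to one to avoid dividing by zero.
    """
    return relevance / max(days_since_published, 1) ** 2

def daily_feed(articles, per_district_cap):
    """Diversity criterion: keep at most per_district_cap articles per district.

    articles: list of (title, district, relevance_recency_score) tuples.
    """
    chosen, used = [], {}
    for title, district, _score in sorted(articles, key=lambda a: a[2], reverse=True):
        if used.get(district, 0) < per_district_cap:
            chosen.append(title)
            used[district] = used.get(district, 0) + 1
    return chosen

# Toy 3-dimensional centroids for three of the subclasses (made-up values).
toy_centroids = {
    "Fast Agri": (1.0, 0.0, 0.0),
    "Slow Agri": (0.0, 1.0, 0.0),
    "Average Non Agri": (0.0, 0.0, 1.0),
}
subclass, score = predict_subclass((0.9, 0.1, 0.1), toy_centroids)
```

An article would then be flagged as interesting when `subclass` differs from the subclass of the district it belongs to.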

Satellite Based Prediction

We also use images from the Landsat 7 satellite to train a district-level machine learning model that predicts the level of development. Once trained, this model was used to predict development labels for the year 2019. A neural network-based change classifier was built, which takes the labels for the years 2001 and 2011 as training data. Then, for each district, we define a metric named the Aggregate Development Index (ADI), which is the sum of the predicted labels of all six district indicators. This index was used as an indicator of relative development across districts.
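As a small illustration of the Aggregate Development Index, the sketch below sums the six indicator labels for one district. It assumes each label is the level number 1, 2 or 3 described earlier, which would put the index between 6 and 18; the function name and the sample labels are hypothetical.

```python
# The six indicators named in the census-based analysis.
INDICATORS = (
    "Main Source of Light",
    "Main Source of Water",
    "Asset Ownership",
    "Fuel for Cooking",
    "Bathroom Facility",
    "Condition of Household",
)

def aggregate_development_index(labels):
    """Sum the predicted level (assumed to be 1, 2 or 3) over all six indicators."""
    missing = [name for name in INDICATORS if name not in labels]
    if missing:
        raise ValueError(f"missing indicator labels: {missing}")
    return sum(labels[name] for name in INDICATORS)

# Made-up predicted labels for one district.
sample_labels = {
    "Main Source of Light": 3,
    "Main Source of Water": 2,
    "Asset Ownership": 2,
    "Fuel for Cooking": 1,
    "Bathroom Facility": 1,
    "Condition of Household": 2,
}
sample_adi = aggregate_development_index(sample_labels)
```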

The three maps below depict the ADI of districts for the years 2001, 2011 and 2019. The maps are color-coded to show the relative magnitude of the ADI: green indicates a very progressive district, and red indicates a less developed district. From 2001 to 2011, states in the eastern part of India (such as Orissa, Bihar, Jharkhand, and West Bengal), large parts of central India (Uttar Pradesh and Madhya Pradesh), and the Northeast districts showed very little change. These have indeed been the poorest states of the country. In contrast, many districts in states such as Gujarat, Maharashtra, and Rajasthan improved substantially. We are currently correlating these observations with potential explanatory factors, such as the degree of industrialization in these districts, which could have led to more rapid growth than in non-industrialized and predominantly agricultural districts.

During 2011-2019, growth was more widespread in the poorest states, especially in West Bengal, Orissa, Uttar Pradesh, and Madhya Pradesh. States like Jharkhand and Bihar, however, have not progressed substantially. This seems to tally with our general observation that the parts which developed during this period received more development attention.

[Maps: ADI of districts for 2001, 2011 and 2019]