Andrew Lowman’s DCS204 Blog

Synthesis #6

During the final week of instruction for our class, my grasp of archival historical information, programming, and visualization methods grew considerably. For this week’s literature, we analyzed three readings: one by Nathan Yau, one by Tahu Kukutai and John Taylor, and one by Catherine D’Ignazio and Lauren Klein. The reading that struck me the most was Kukutai and Taylor’s “Data Sovereignty for Indigenous Peoples.” Indigenous data sovereignty is the right of a nation to govern the collection, ownership, and application of its own data. Yet Indigenous communities have struggled to exercise this right, in part because they tend to be poorer and politically weaker than their neighbors. I learned that data sovereignty matters to these communities because it supports research and policy advocacy that protect their rights, while also promoting the interests of Indigenous nations in relation to their data. As a result, I believe that communities native to the land deserve to control how their own information is collected and used. We also worked on various coding methods this week, such as customizing histograms, scatterplots, and heatmaps. This added knowledge was important for our final projects, where the visualizations need to give our audience a clear interpretation of the data.

Catherine D’Ignazio and Lauren Klein’s reading, “What Gets Counted,” may be the most important piece of literature for explaining the significance of my project. My project analyzes whether gender plays a role in the donation process by comparing the gender of donors with the amounts they donated. Because it focuses on the wealth and wage gap that limited women, D’Ignazio and Klein’s reading relates closely to my own work. In their article, they discuss the matrix of domination, which describes how race, gender, and class intersect to enhance opportunities for some people and constrain opportunities for others. As a consequence, what is not counted becomes “invisible.” In the case of my project, our data only includes the names of the donors and their donation amounts. Anyone who analyzes this data without considering the wage disparities of the time period and the limits placed on women in education, labor, and social life may misinterpret the dataset. Some may have said that women were less willing to donate to the college; however, my research and data findings suggest that the constraints on women and affordability concerns played a significant role in donation amounts. Therefore, Catherine D’Ignazio and Lauren Klein’s reading provides background on how datasets can “dominate, discipline, and exclude” important information in ways that distort our understanding of history. Through my research, I plan to correct those misconceptions and make clear the disparities of wealth between men and women during this time period.

Sasha Costanza-Chock’s “Design Justice: Towards an Intersectional Feminist Framework for Design Theory and Practice” is the second of our readings that relates well to my project. In this work, Costanza-Chock discusses an innovative social movement known as Design Justice, which aims to ensure a more equitable distribution of design’s benefits and burdens. The author explains that the movement is centered on people who are normally marginalized by design and proposes creative practices to address the challenges these communities face. Marginalization is not only a matter of race; it also includes the limitations women have faced throughout history. With the support of this literature and our data, my project will address experiences throughout the donation process and also reveal the challenges women faced in acquiring wealth and income. Therefore, Costanza-Chock’s reading will provide a useful framework for my research and the interpretation of my data.

The final piece of literature that my project will be based upon is Katie Rawson and Trevor Munoz’s “Against Cleaning.” In their article, Rawson and Munoz argue that cleaning data implies that the data is “messy,” which suggests there should be an underlying order. When data is considered messy, we remove certain aspects of a dataset in order to make it look “clean.” As an economics major, I have learned that removing variables can contribute to omitted variable bias, which in turn diminishes the validity of our results. In our data, there may be several outliers in donation amounts. Some may feel inclined to drop these values because they do not follow the general pattern of the data; however, I feel that it is important to include them because these large amounts, which I believe were mostly given by men, will only support the wide gap in income and wealth between men and women. Furthermore, we have discussed several times in class that the names in our dataset are very difficult to read, and it is often hard to tell whether they are male or female. If the focus were not on gender, this would not be an issue, but we cannot ignore these challenges because differentiating between male and female donors will be crucial to matching amounts to gender. Ultimately, Rawson and Munoz’s article reminds me that I must analyze and think before I sort any of my data, because the results could be altered.


Synthesis #5

My colleague’s response provided useful feedback for shaping the final questions of my model. It raised several valid concerns, such as the lack of data on the wage gap between men and women during these early time periods. I agree with my colleague: the data readily available to us through the Maine State Seminary file limits the calculation of wage gaps to only a few avenues of research. My colleague also raised an important point that “a woman who is married may donate in her name with her husband’s money.” This would be a clear problem in my empirical analysis, because my results would carry unobserved bias from the inability to determine the true donor. Based on my colleague’s comments, I plan to move forward with an abbreviated version of my previous research question. Although working with the data for men and women may be tedious for now, it would be interesting to see the discrepancy between the number of women who donated to the school and the number of men. Then I will analyze the amounts they donated. Although my analysis will be skewed by the possibility that donors gave under different names, I believe I will need to accept this as a limitation of the available data. I believe my research question addresses an important issue, in that we should be curious about the demographic differences between towns, focusing solely on gender. As I discussed in my previous synthesis and as the literature we have read shows, gender differences play an important role in our history. For the following empirical analysis, I will be using our CSV file from code work 5 in order to give a sample of what my process would look like for all the Maine State Seminary data.

The following code reads in the CSV file of Maine State Seminary Data.

mss_donors<-read.csv("Maine_State_Seminary_Donors_1854_1857.csv")

The subsequent code converts the values in each text column to lower case so that names and locations match consistently throughout the rest of the process.

mss_donors$Month<- tolower(mss_donors$Month)
mss_donors$First<- tolower(mss_donors$First)
mss_donors$Middle<- tolower(mss_donors$Middle)
mss_donors$Last<- tolower(mss_donors$Last)
mss_donors$Name<-tolower(mss_donors$Name)
mss_donors$Other<-tolower(mss_donors$Other)
mss_donors$Location<-tolower(mss_donors$Location)
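The seven assignments above repeat one pattern, so they could be collapsed into a single step. The following is only a minimal sketch of that idea: the small `demo` data frame and its values are invented for illustration and stand in for `mss_donors`, and only three of the text columns are shown.

```r
# Hypothetical stand-in for mss_donors (values invented for illustration)
demo <- data.frame(
  First    = c("Mary", "JOHN"),
  Last     = c("Smith", "Doe"),
  Location = c("Lewiston", "PORTLAND"),
  Amount   = c(5, 10),
  stringsAsFactors = FALSE
)

# Columns to lower-case; the numeric Amount column is left untouched
text_cols <- c("First", "Last", "Location")

# lapply() applies tolower() to each listed column and the results are assigned back
demo[text_cols] <- lapply(demo[text_cols], tolower)

print(demo)
```

Listing the columns once in `text_cols` makes it harder to forget a column or mistype a name when the same cleaning step is applied to the full dataset.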

Next, I will analyze my research question through a chi-square test. The chi-square test used here is a test of independence: it checks whether the observed counts differ from the counts we would expect if the two variables were unrelated. In plain terms, is there a relationship in the data? Prior to running the test, I must add another column, filled in by a for loop, that holds a Boolean value indicating whether a person is female or male. In the following code, “true” indicates that the name is female, while “false” indicates that the name is other, in this case male.

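# identified_female is assumed to be a vector of lower-case female first names defined earlier
# Start with a logical column so the values stay TRUE/FALSE rather than text
mss_donors$is_female <- logical(nrow(mss_donors))
for (i in seq_len(nrow(mss_donors))) {
    # TRUE when this donor's first name appears in the list of identified female names
    mss_donors$is_female[i] <- mss_donors$First[i] %in% identified_female
}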
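The same flag can also be built without a loop, because `%in%` works on a whole vector at once. This is a sketch under stated assumptions: both vectors below are invented examples, not values from the seminary data.

```r
# Hypothetical examples: neither vector comes from the real data
identified_female <- c("mary", "sarah", "eliza")
first_names <- c("mary", "john", "eliza", "wm")

# %in% checks every element at once and returns a logical vector
is_female <- first_names %in% identified_female

# Count of names flagged FALSE and TRUE
table(is_female)
```

Note that an abbreviated or illegible name such as "wm" is flagged FALSE simply because it is not on the list, so "not identified as female" and "male" are folded into one category.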

Now that gender is separated by the for loop and the Boolean test, I can run a chi-square test. One issue that I may run into with my research question is how to include both genders in the test. Can I run the test separately for each gender and compare the differences, or should I focus on only one gender? Furthermore, I need to figure out how to include the amount donated by both genders in my chi-square test. These are questions I will need to consider as I continue developing my research question. The following code runs a chi-square test specifically for the female donors found in the Maine State Seminary donor list. I would then need to run this for males, and then compare the amounts donated as well.

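# Cross-tabulate the Cumberland column against the is_female flag
female_donor <- table(mss_donors$Cumberland, mss_donors$is_female)

# Chi-square test on the resulting contingency table
chisq.test(female_donor)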

Following the implementation of the chi-square test, I was able to determine that the result was not statistically significant, because the p-value of 0.7971 is greater than 0.05, the 5% significance level (equivalently, the 95% confidence level). Most statisticians treat the 5% level as the threshold for statistical significance, and this p-value does not meet it. Based on these results, I cannot reject the hypothesis that the two variables are independent, so there is no statistically significant relationship between the identified female donors and Cumberland County. Although the relationship is not significant, that does not mean that we should disregard the result. It may still matter for my project because Cumberland County is widely considered a wealthy county in Maine. The absence of a relationship between female donations and Cumberland County may therefore point to a discrepancy in income and wages, possibly because fewer women could afford to live in Cumberland County.
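To make the decision rule concrete, here is a small sketch of a chi-square test of independence on a 2 x 2 table. The counts are invented for illustration and are not the seminary data. Because the table has one column per gender, a single test already covers both groups.

```r
# Hypothetical counts, invented for illustration (not the seminary data)
counts <- matrix(c(12, 30,
                   40, 95),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(county = c("Cumberland", "Other"),
                                 gender = c("female", "male")))

# Chi-square test of independence on the 2 x 2 table
result <- chisq.test(counts)

result$expected   # counts expected if county and gender were unrelated
result$p.value    # compare against the 0.05 threshold

# TRUE would mean the hypothesis of independence is rejected at the 5% level
result$p.value < 0.05
```

A chi-square test works on counts, so it cannot take donation amounts directly. Comparing amounts between genders would need a different method, which is left open here.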

The reading that best relates to my research question is Catherine D’Ignazio and Lauren Klein’s article “The Numbers Don’t Speak for Themselves.” The authors discuss the importance of interrogating and analyzing the context and validity of data before taking it at face value and building interpretations on it. Exploring what is missing from a dataset can be a powerful way to gain insight into the omitted and unobserved observations that may alter its interpretation. My research question leaves a lot of room for interpretation, because most of the names in the Maine State Seminary files are either illegible or not recorded in the donor’s preferred form. As a result, an outside person who reads my dataset may misinterpret my findings if they are not told about these issues, which could skew my data.


Synthesis #4

The readings for this week focused on the influence of the archival records of slavery on digital humanities and on how data cleaning can be an unwarranted process because of its ability to create biases in datasets. Jessica Marie Johnson, a historian of Atlantic slavery and the Atlantic African Diaspora, describes in her article, Markup Bodies: Black [Life] Studies and Slavery [Death] Studies at the Digital Crossroads, the important role that slavery’s eighteenth- and nineteenth-century Atlantic archive plays alongside digital humanities. Through this reading, I learned that the collaboration between “black digital practice” and the history of slavery can offer an immense amount to historians. Although it is significant that the computational power of database technology allowed researchers to calculate new information concerning slavery, I was more interested in the way the author explained that slavery offers more than just data: it challenges scholars and digital humanists to produce their data with an awareness of the emotion, pain, and suffering endured by those enslaved. The Rawson and Munoz reading covered similar themes of race and gender through its analysis of the term data cleaning. As an Economics major, I treat data cleaning as a necessity when preparing a dataset to meet the requirements of regression analysis. In order to improve the fit of our models, we must include more variables to account for omitted variable bias. Given this intuition and my background in Economics, Munoz and Rawson’s article made me realize something new about how we interpret data. When economists think data is messy, we think that things have a rightful place and that things that don’t belong need to be removed. As a result, the process of “cleaning” can lead to a lack of diversity, because we remove things that are perceived as unnecessary simply because they look out of place.

Our understanding of the college’s history through our Maine State Seminary data can be improved in many ways; however, I believe an important addition to this dataset would be gender and its impact on wealth distribution. It is important to ask whether gender plays an integral role in the donation process by comparing the gender of donors with their donation amounts. I believe this would be an interesting addition to our current archives because it may provide useful information on the wage and wealth disparities between men and women that were evident during this time period. According to the Bates College website, women were present at Bates from its inception, but we do know that the institution was largely male dominated. This history may influence our donor disparity, because with only a small group of female alumni there would be fewer women with an incentive to donate to the institution. Similarly, women historically did not earn equal wages, so large donations by women, or any donations at all, may be scarce because they would have had a strong financial impact on the donor. I would assume we could find data on gender through the Census Bureau, while data on the gender pay gap may be available from sources such as the Bureau of Labor Statistics. The main challenge will be whether the way these data were measured changed over the years, or whether recorded data exists at all.


Synthesis #3

The readings for this week focused on the power of archives, how slavery has defined some of the most prestigious institutions, and finally the role of gender inclusion and related themes in literature. Marisa Fuentes’s article, “Power and Historical Figuring: Rachael Pringle Polgreen’s Troubled Archive,” describes how archives have the power to distort the truth and to silence more than they reveal. This reading was very interesting to me because a person’s life story can easily be overshadowed by a partial history if the people telling it are biased or stand to benefit from distorting the truth. As we discussed, it is important that we understand the perspectives behind many archives and historical articles. The second reading that had an impact on my perspective this week was Klein and D’Ignazio’s discussion of “What Gets Counted Counts.” Our current systems of counting, which pervade our everyday lives, have a strong way of perpetuating oppression. Those who are counted are narrowed into smaller categories, while those who are not counted become “invisible.” I find this article’s discussion very important because it is our duty as a society to challenge the binary thinking that erases many groups of people and to focus on strategies that achieve more equitable data.

Outside of the literature, I learned about data frames and 2D data structures, subsetting data frames, and the importance of scatterplots in showing correlations. To test my understanding of this material, I used our Maine State Seminary data and produced a scatterplot to look for a relationship between the day of the month and the amount of money that was donated. In the resulting scatterplot, there does not seem to be any relationship between the two variables because the data does not visually trend in any direction.

# Scatterplot of the day of the month against the amount donated
# The x variable is donors$Day and the y variable is donors$Amount
# The points show no visible trend between the two columns
plot(donors$Day, donors$Amount)

In addition to the scatterplot, I also computed the correlation, which yielded a value of 0.13910. The relationship between two variables is generally considered strong when the absolute value of the correlation is larger than 0.7. Since the correlation here is 0.13910, which is near 0, there is at most a very weak linear relationship between the variables.

# Correlation between the day of the month (x variable) and the amount donated (y variable)
# The result shows little to no correlation
cor(donors$Day, donors$Amount)
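One way to read a correlation coefficient is to square it, which gives the share of variance in one variable that is linearly explained by the other. The sketch below uses invented values rather than the seminary data, and `cor.test()` is shown only as one possible follow-up for attaching a p-value to the coefficient.

```r
# Hypothetical values, invented for illustration (not the seminary data)
day    <- c(1, 5, 9, 14, 18, 22, 27, 30)
amount <- c(5, 1, 10, 2, 25, 3, 5, 1)

r <- cor(day, amount)   # Pearson correlation coefficient, between -1 and 1
r^2                     # share of variance in amount linearly explained by day

# cor.test() adds a p-value and a confidence interval for the correlation
cor.test(day, amount)

# For the reported value of 0.13910:
0.13910^2               # roughly 0.019, i.e. about 2% of the variance
```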

Therefore, we can conclude that the scatterplot and correlation do not provide us with any further information, because we cannot find a relationship between the variables. The other attributes that I would be interested in plotting are the amount of money donated and the location of the donor. I think it would be interesting to see if there is a relationship between these two variables, because higher donation amounts may be related to wealthier towns and lower donations may come from donors in more middle-class areas. By further understanding the background of the money donated, we can then look further into wealth distribution and the jobs of certain donors. As we learned from our Fuentes reading, archives have the power to shape our interpretation of data. However, as Fuentes stresses, we must take this data with a grain of salt because it can hide information that is not readily available and distort the truth. In this dataset, we cannot see the jobs of the donors, which makes us question whether the money donated has connections to enslaved labor, based on our past work with data from this time period.
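Because location is a category rather than a number, a scatterplot is not the natural tool for this comparison. One possible approach, sketched here with invented values and hypothetical column names, is to summarize the donation amounts for each town.

```r
# Hypothetical donors, invented for illustration (not the seminary data)
demo <- data.frame(
  Location = c("lewiston", "portland", "lewiston", "auburn", "portland"),
  Amount   = c(5, 50, 10, 2, 25)
)

# Mean donation for each town; tapply() groups Amount by Location
town_means <- tapply(demo$Amount, demo$Location, mean)
print(town_means)

# Number of donations recorded for each town
table(demo$Location)
```

Looking at the counts next to the means matters, since a town with only one or two donors can show a high average that says little about the town as a whole.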


Synthesis #2

The readings for this week focused on race and gender, and also addressed the vernacular and assumptions that have been passed down and normalized throughout society. Robin Kelley, Professor in the department of African American Studies at UCLA, taught me about the idea of racial capitalism and how it is a serious and problematic practice that plagues our communities. Racial capitalism is the process by which a person derives social and economic value from the racial identity of someone else. Dr. Kelley’s speech urges society to embrace a vision of justice through an honest accounting of the perils of slavery and of how slavery served as a framework for capitalism that continues to affect marginalized people. Dr. Kelley traces the links between capitalism and racism and makes it clear that in order to disrupt racism, we must engage in a deeper understanding of capitalism and its counterpart, racial capitalism. Another reading this week that I found interesting was the article “Why Data Science Needs Feminism.” The article defines feminism as “wide-ranging projects that name and challenge sexism and other forces of oppression, as well as those which seek to create more just, equitable, and livable futures.” As the authors Klein and D’Ignazio discuss, Data Feminism “challenges power” and the unequal power structures that define our current job market and educational systems. Data Feminism embraces the fact that multiple perspectives are needed to promote a better working world. Without different personalities and diverse knowledge and backgrounds, our technology and economies will start to fall behind. With a powerful following, this movement can leave a monumental impact in the STEM field for women and for people affected by marginalization.

At the end of the week, we shifted back to our data on the Bates Cotton Invoice and used histograms to gain a deeper understanding of the information. Histograms, which are frequency distributions, show how often each value in a set of data occurs. The first histogram I created analyzed the number of days of labor that were stolen from enslaved people to pick each bale of cotton. The histogram contains 8 buckets with a width of about 0.16 days of stolen labor and appears approximately normally distributed. Using one standard deviation on either side of the mean, I was able to determine that most of the data falls between 2.56 and 3.2 days of labor stolen. To take a closer look at my data, I compared the mean and median, which were approximately equal at a value of 3 days of stolen labor. Therefore, the data can be assumed to be approximately symmetrical. The commented code and visual for the first histogram are shown below.

# Histogram of the number of days of labor stolen from enslaved people to pick each cotton bale
hist(days_labor)
# Rounded median of days stolen
round(median(days_labor))
# Maximum number of days stolen
max(days_labor)
# Minimum number of days stolen
min(days_labor)
# This equation calculates the width of each bucket in the histogram
(3.43333333333333 - 2.12) / 8
# Standard deviation of days stolen
sd(days_labor)
# The last two lines of code tell us the range that most of the data falls into
mean(days_labor) - sd(days_labor)
mean(days_labor) + sd(days_labor)
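By default, `hist()` chooses its own bucket edges and rounds them to convenient values, so the widths it draws may differ from the range divided by the number of buckets. The sketch below shows how the edges could be read back from the histogram object, and how exactly 8 equal-width buckets could be requested. The `days_demo` values are invented to span the reported range and are not the invoice data.

```r
# Hypothetical values spanning the reported range (2.12 to 3.43); not the real data
days_demo <- c(2.12, 2.5, 2.7, 2.8, 2.9, 3.0, 3.0, 3.1, 3.2, 3.3, 3.43)

# plot = FALSE returns the histogram object without drawing it
h <- hist(days_demo, plot = FALSE)

h$breaks        # bucket edges that R actually chose
diff(h$breaks)  # width of each bucket
h$counts        # number of observations in each bucket

# An exact number of equal-width buckets can be requested with explicit breaks
edges <- seq(min(days_demo), max(days_demo), length.out = 9)  # 9 edges = 8 buckets
h8 <- hist(days_demo, breaks = edges, plot = FALSE)
diff(h8$breaks)
```

Checking `h$breaks` on the real data would confirm whether the bucket width described in the text matches what the plot actually shows.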

In the second histogram, I analyzed the amount of money that was made from the stolen labor of enslaved people. The histogram contains 5 buckets with a width of about $4.0. According to our data, the mean (44.364) is marginally greater than the median (44.0237), which suggests that the data is slightly skewed to the right. Using one standard deviation on either side of the mean, I was able to determine that most of the data falls within the range of approximately $39.4 to $49.3 of money made off stolen labor. The commented code and visual for the second histogram are shown below.

# Histogram of the amount of money made from the stolen labor of enslaved people
hist(money_made)
# Rounded mean of money made off stolen labor
round(mean(money_made))
# Rounded median of money made off stolen labor
round(median(money_made))
# Maximum amount of money made off stolen labor
max(money_made)
# Minimum amount of money made off stolen labor
min(money_made)
# This equation tells us the width of each bucket in the histogram
(52.7875 - 32.595) / 5
# Standard deviation of money made
sd(money_made)
# The final two lines of code tell us the range that most of the data falls within
mean(money_made) - sd(money_made)
mean(money_made) + sd(money_made)

I gained useful knowledge from this data manipulation, as I learned that histograms allow us to count the number of observations that fall into each numerical bucket. With this information, we were able to use the data from the Bates Cotton Invoice and thoroughly analyze the mean, median, and standard deviation of the dataset. We would be doing an injustice if we did not analyze this data in the light of theory. Although these numbers are purely statistical, they also tell a story of physical labor and money stolen from human beings. This data is a clear representation of racial capitalism: social and economic value was assigned to human beings and in turn stolen from them for another person’s benefit.


Andrew Lowman’s DCS204 Blog

Post author By admin
Post date September 6, 2020

Hi all, this will be the blog for my DCS Data Cultures class. It will consist of weekly posts that analyze different cultural frameworks of Computational Humanities. I am a senior at Bates College majoring in Economics, with a minor in Mathematics and a concentration in Digital and Computational Studies.

Synthesis #1

Computational Humanities has emerged as an important area of research that investigates humanities, arts, and social science through advanced computer manipulation. With a previous background in Digital Computational Studies, I have worked with data through the intersection of gender and technology, studied the structure of data and how to write complex programs, and even examined the history, present, and possible future of computing through film and literature. However, I have never examined the different cultural frameworks in which data operates. During this course, I am curious to engage in new avenues of research and analyze interesting datasets that impact my life at Bates College.

The first week of Data Cultures was both informative and eye-opening. Our first reading examined the importance of structured reading for historical analysis. We learned that there are three stages necessary to fully understand the information conveyed by the author: the skim, the slow read, and the post-read. McDaniel explains that the reader should read through a piece of writing three times because it is almost impossible to gain a clear understanding of the author’s point of view and supporting evidence through a single read. During each of the three readings, the reader delves deeper into difficult information that may have been confusing at first and connects specific points throughout the piece. The most useful advice for my own reading technique is that a reader should first work through the writing they understand and then, once they have a deeper understanding, circle back to the information that requires a more in-depth analysis. Our second reading was a thesis written by a Bates student that discussed the controversial past and founding of Bates College. As a student, I found that the discussion influenced my current perception of Bates. Unfortunately, the past cannot be changed and history has been written. Today, it is essential that we recognize past wrongs in order to grow and to condemn these actions for present and future generations. As we learned, Bates College does not mention that it was named for and funded by a very wealthy man whose business was involved with slavery. Through marketing and student recruitment, Bates relies on a partial history to sell its campus culture. Having once been a prospective student, I remember how extensively Bates used its history to recruit students. With that being said, I am disappointed and bothered by Bates’ current portrayal of our history. Finally, our last reading of the week investigated the Rapid Response Research process.
With the ongoing pandemic, programs and processes such as RRR are extremely important in order to develop creative ways to limit in-person interaction and support the search for a cure. Although RRR may not work well for some crises, I believe the Covid-19 pandemic is a great example of a situation that requires the participation and responsibility of all citizens and experienced professionals. With well-developed communication and collaboration among group members and dedication to a better working world, Rapid Response Research can really impact our society.

Although we are only through our first week of classes, I have gained invaluable knowledge that has changed my perspective on many aspects of life. The Rapid Response Research process and its background struck me in various ways. The most interesting feature of the process is that RRR relies on goodwill and committed volunteers, with little to no funding, to reach its goals. Usually, when programs such as these have very little funding, they struggle to gain a following of trusted and productive people. With that being said, the RRR process has changed my perspective on nonprofit organizations and on programs that function without funding and develop solutions with only the common good in mind.
