Harnessing Disaster Tweets: A Deep Dive into Disaster Tweets with EDA, Cleaning, and BERT-based NLP

Balaji Dhamodharan

Abstract

Natural Language Processing (NLP) techniques are central to analyzing text data, particularly in disaster management, where timely and accurate information is vital. This paper applies NLP methods to disaster tweets. We begin with an Exploratory Data Analysis (EDA) to identify patterns and trends in the dataset. We then examine cleaning techniques for preprocessing the text, addressing the noise, misspellings, and grammatical errors common in tweets. Next, we use Bidirectional Encoder Representations from Transformers (BERT), a transformer-based language model, to extract contextual embeddings that improve the representation of disaster-related tweets. Our experiments show that BERT improves performance on classification tasks such as sentiment analysis and disaster detection compared with traditional NLP models. These findings underscore the value of advanced NLP techniques for extracting actionable insights from disaster tweets, supporting decision-making and rapid response during crises.
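
To make the cleaning step concrete, the following is a minimal sketch of a tweet normalizer. It is not the paper's pipeline: the function name `clean_tweet`, the regular expressions, and the choice of which tokens to drop (URLs, user mentions, hash signs, punctuation) are all illustrative assumptions, and it uses only the Python standard library.

```python
import html
import re

# Hypothetical patterns chosen for illustration; the paper does not specify its rules.
URL_RE = re.compile(r"https?://\S+|www\.\S+")   # web links
MENTION_RE = re.compile(r"@\w+")                # user handles such as @user
NON_TEXT_RE = re.compile(r"[^a-z0-9\s']")       # anything but letters, digits, spaces, apostrophes
SPACE_RE = re.compile(r"\s+")                   # runs of whitespace


def clean_tweet(text: str) -> str:
    """Return a lowercased tweet with common noise removed."""
    text = html.unescape(text)         # decode entities such as &amp;
    text = URL_RE.sub(" ", text)       # drop links
    text = MENTION_RE.sub(" ", text)   # drop user mentions
    text = text.replace("#", " ")      # keep the hashtag word, drop the sign
    text = text.lower()
    text = NON_TEXT_RE.sub(" ", text)  # drop punctuation and symbols
    return SPACE_RE.sub(" ", text).strip()


if __name__ == "__main__":
    sample = "Forest fire near La Ronge Sask. Canada &amp; more! http://t.co/abc @user #wildfire"
    print(clean_tweet(sample))
```

Cleaning this aggressive suits bag-of-words baselines. A BERT tokenizer handles casing and punctuation itself, so a lighter pass (for example, removing only URLs and mentions) may be preferable before computing contextual embeddings.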
