Find a Dataset and Write Pseudocode that Would Operate on the Data in a Hadoop Cluster

Research the datasets on Kaggle.com and identify one of interest. Once
you have identified a dataset, discuss the data and the goals of using it
in a business scenario. Construct MapReduce pseudocode showing how this
data may be processed using the MapReduce programming approach.

MapReduce is often used in a parallel processing environment, such as
Hadoop. Doing so allows operations to execute in parallel on each node
in the cluster, with every node working on its own portion of the data.
This approach is commonly used to process Big Data. For this assignment,
complete the following:

  • Research Kaggle.com, and identify a dataset that is suitable for MapReduce programming in a distributed environment.
  • Construct pseudocode that would operate on these data as if
    they were stored in a Hadoop cluster. This operation should be tied to a
    defined goal of the dataset. This pseudocode should have mappers and
    reducers defined.
  • Discuss how this form of processing is beneficial and can be used in a business setting.

This is the dataset I have chosen: https://www.kaggle.com/sakshigoyal7/credit-card-cu
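As a starting point for the pseudocode, here is a minimal sketch of the mapper and reducer pattern, simulated locally in plain Python rather than on a real Hadoop cluster. The goal it illustrates (churn rate per card category) and the column names `customer_id`, `card_category`, and `churned` are assumptions made up for illustration, as are the sample rows; they would need to be replaced with the actual fields of the chosen dataset.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows. The column names and values are assumptions
# for illustration only; they are not taken from the chosen dataset.
SAMPLE_CSV = """customer_id,card_category,churned
1,Blue,yes
2,Blue,no
3,Gold,no
4,Blue,yes
5,Gold,yes
"""


def mapper(row):
    """Emit (card_category, (1, churn_flag)) for one customer record."""
    churn_flag = 1 if row["churned"] == "yes" else 0
    yield row["card_category"], (1, churn_flag)


def shuffle(pairs):
    """Group values by key, standing in for Hadoop's shuffle-and-sort phase."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Sum the counts for one key and return that key's churn rate."""
    total = sum(count for count, _ in values)
    churned = sum(flag for _, flag in values)
    return key, churned / total


def run_job(csv_text):
    """Run map, shuffle, and reduce in sequence on a single machine."""
    rows = csv.DictReader(io.StringIO(csv_text))
    pairs = (pair for row in rows for pair in mapper(row))
    return dict(reducer(key, values) for key, values in shuffle(pairs).items())


result = run_job(SAMPLE_CSV)
print(result)
```

On a real cluster, each mapper would receive a separate split of the input file and the framework would handle the grouping step, so only the mapper and reducer logic would carry over. In a business setting, a result like this could help show which customer segments are most at risk of leaving.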
