Hadoop Input Files Order


I have data files arranged in folders named by date. The directory structure is:

  • /data/2011/01/01
  • /data/2011/01/02

and so on. Inside each directory there are around 50 files that need to be parsed. By giving Hadoop the input path /data/*/*/* I can parse the files. My questions are:

  1. How can I ask Hadoop to order the input? I need to parse the files date by date.
  2. While parsing the files of a particular date, I need to preload a data structure associated with that date, which is stored in the same date directory.

Thanks, Ankush

  1. You can't order the input. In the "worst case" scenario, if the cluster can run as many tasks at once as you have input files, they will all be processed at the same moment, in parallel.
  2. Perhaps you can create a custom implementation of "FileInputFormat" that reads the required config file and does what you need?
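
To make both points a bit more concrete, here is a minimal sketch in plain Java (no Hadoop classes), offered as one possible approach rather than a definitive one. It assumes the date directory is simply the parent directory of each data file, and that zero-padded yyyy/MM/dd paths sort chronologically when compared as strings. The class name `DateDirs`, its methods, and the side-file name passed in are all hypothetical. In a mapper, the path of the current file can typically be obtained from the input split (for example via `FileSplit.getPath()` when a plain `FileInputFormat` is used), and a driver could submit one job per date directory, in sorted order, if strict date-by-date processing is required.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: all names here are illustrative, not part of Hadoop. */
final class DateDirs {
    private DateDirs() {}

    /** Returns the parent directory of a file path, e.g. "/data/2011/01/02". */
    static String dateDirOf(String filePath) {
        int slash = filePath.lastIndexOf('/');
        if (slash <= 0) {
            // No slash, or the file sits directly under "/": no date directory.
            throw new IllegalArgumentException("No date directory in: " + filePath);
        }
        return filePath.substring(0, slash);
    }

    /** Path of the per-date side file, assumed to sit next to the data files. */
    static String sideFileFor(String filePath, String sideFileName) {
        return dateDirOf(filePath) + "/" + sideFileName;
    }

    /**
     * Groups files by date directory. A TreeMap keeps its keys in ascending
     * string order, which is chronological for zero-padded yyyy/MM/dd paths.
     */
    static Map<String, List<String>> groupByDate(List<String> filePaths) {
        Map<String, List<String>> byDate = new TreeMap<>();
        for (String path : filePaths) {
            byDate.computeIfAbsent(dateDirOf(path), k -> new ArrayList<>()).add(path);
        }
        return byDate;
    }
}
```

Iterating over the map returned by `groupByDate` visits the date directories oldest first, so a driver could run one job per entry. Inside a task, `sideFileFor` gives the location from which the data structure for that date can be loaded once, before any records are parsed.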
