Hadoop input files order
I have data files arranged in folders named after dates. The directory structure is:
- /data/2011/01/01
- /data/2011/01/02
and so on. Inside each directory there are around 50 files that need to be parsed. By giving Hadoop the input /data/*/*/* I can parse all the files. My questions are:
- How can I ask Hadoop to order the input? I need to parse the files date by date.
- While parsing the files of a particular date, I need to preload a data structure associated with that date, which is stored in the same date directory.
Thanks, Ankush
- You can't order the input. In the "worst case" scenario, if you have the same number of input files as you have running tasks in the cluster, they will all be processed at the same moment in parallel.
- Perhaps you can create a custom implementation of "FileInputFormat" that reads the required config file and does what you need?
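As a rough illustration of the two points above, here is a minimal sketch in plain Java (standard library only, no Hadoop classes). It assumes that every data file sits directly inside its date directory, and that the per-date data structure lives in a side file whose name you pass in; `DateDirs`, `sideFile` and `groupByDate` are hypothetical names, not part of Hadoop. Inside a real task the file path would typically come from the input split (for example `FileSplit.getPath()`), and a driver that really needs strict date-by-date processing could, as one possible workaround, walk the grouped map and submit one job per date.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only; not part of Hadoop.
final class DateDirs {
    private DateDirs() {}

    // Returns the parent directory of a file path, e.g.
    // "/data/2011/01/01/part-0" -> "/data/2011/01/01".
    static String dateDir(String filePath) {
        int slash = filePath.lastIndexOf('/');
        if (slash <= 0) {
            throw new IllegalArgumentException("No parent directory: " + filePath);
        }
        return filePath.substring(0, slash);
    }

    // Builds the path of the per-date side file that sits next to the data file.
    // The side file name is an assumption supplied by the caller.
    static String sideFile(String filePath, String sideFileName) {
        return dateDir(filePath) + "/" + sideFileName;
    }

    // Groups file paths by their date directory. A TreeMap iterates its keys in
    // lexicographic order, which matches chronological order for zero-padded
    // yyyy/mm/dd directories. Files keep their original order within a date.
    static Map<String, List<String>> groupByDate(List<String> filePaths) {
        Map<String, List<String>> byDate = new TreeMap<>();
        for (String path : filePaths) {
            byDate.computeIfAbsent(dateDir(path), k -> new ArrayList<>()).add(path);
        }
        return byDate;
    }
}
```

A task that knows the path of the file it is reading can call `DateDirs.sideFile(path, name)` once during setup to locate and preload the data for that date, which is roughly what a custom input format or record reader would do as well.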