Abstract
E-commerce platforms continue to expand, creating a need for efficient methods that extract structured product attribute values from unstructured data such as product titles and descriptions. This thesis addresses key challenges in product attribute value extraction: discovering new values, identifying implicit values, generating product attributes, and correcting and refining extracted data. Traditional methods, which rely on predefined dictionaries and extractive techniques, often fail to generalize in dynamic environments. To tackle these challenges, this thesis proposes several novel systems. OpenBrand integrates character-level representations to dynamically discover new brand values, while GAVI uses a category-aware generative model for implicit value extraction. For attribute identification, generative approaches are explored, with large language models (LLMs) showing promise in zero-shot generalization without retraining. CAVE, a post-extraction system, corrects noisy attribute values, and QPAVE refines coarse-grained product attributes into fine-grained values. These contributions are systematically evaluated, demonstrating improvements over existing methods and addressing limitations in scalability, accuracy, and generalization. Together, these systems provide a foundation for improving the accuracy and scalability of attribute value extraction in e-commerce. The work also sets the stage for future research in multilingual contexts and dynamic product environments, contributing to both academic research and industry applications.