RESUMO AUTOMÁTICO DE TEXTOS JURÍDICOS USANDO GRAFOS COM VOCABULÁRIO CONTROLADO E ALGORITMO K-MEANS COM WORDS EMBEDDING

Rogério Nogueira de Sousa; David Nadler Prata

doi:10.34060/reesmat.v11i18.304

Vol. 11 No. 18 (2019), Articles

AUTOMATIC SUMMARY OF LEGAL TEXTS USING GRAPHS WITH CONTROLLED VOCABULARY AND K-MEANS ALGORITHM WITH WORDS EMBEDDING

Published 2019-10-14

How to Cite

Sousa, R. N. de, & Prata, D. N. (2019). AUTOMATIC SUMMARY OF LEGAL TEXTS USING GRAPHS WITH CONTROLLED VOCABULARY AND K-MEANS ALGORITHM WITH WORDS EMBEDDING. ESMAT Magazine, 11(18), 65–80. https://doi.org/10.34060/reesmat.v11i18.304

Abstract

Today, about 70% of all new cases in the Brazilian Judiciary Branch are filed in electronic form. Given this reality, faster processing is paramount to keep pace with a growing demand of approximately 25 million new cases per year. In this context, facilitating the day-to-day operations of the Brazilian justice system is critical for efficient judicial provision and broad access to justice. Text summaries are a promising way to speed up identifying the subject of a document, contributing to the celerity of lawsuit procedures. The purpose of this research is to develop a hybrid methodology capable of automatically generating summaries of legal documents, using Natural Language Processing techniques and graph theory with word embeddings, in conjunction with the legal vocabulary of the Federal Supreme Court.
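As a rough illustration of the graph-based part of such a methodology, the sketch below ranks sentences by their weighted degree in a graph whose edges link sentences that share controlled-vocabulary terms. It is a minimal sketch under stated assumptions, not the authors' method: the vocabulary set, the sample sentences, and the names `tokenize` and `summarize` are hypothetical, and the K-means and word-embedding steps named in the title are not covered.

```python
import re
from itertools import combinations

# Hypothetical stand-in for a controlled legal vocabulary
# (an assumption for illustration, not the actual STF thesaurus).
CONTROLLED_VOCAB = {"appeal", "court", "contract", "damages"}

# Hypothetical sample document, already split into sentences.
SENTENCES = [
    "The court heard the appeal on the contract.",
    "The weather was pleasant that day.",
    "Damages under the contract were claimed in court.",
    "The appeal was filed late.",
]


def tokenize(sentence):
    """Lowercase a sentence and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", sentence.lower()))


def summarize(sentences, vocab=CONTROLLED_VOCAB, top_n=2):
    """Return the top_n sentences by weighted degree, in document order.

    Two sentences are linked when they share at least one vocabulary term;
    the edge weight is the number of shared terms. A sentence's score is
    the sum of the weights of its edges.
    """
    terms = [tokenize(s) & vocab for s in sentences]
    score = [0] * len(sentences)
    for i, j in combinations(range(len(sentences)), 2):
        shared = len(terms[i] & terms[j])
        score[i] += shared
        score[j] += shared
    # Highest score first; ties are broken by earlier position.
    ranked = sorted(range(len(sentences)), key=lambda i: (-score[i], i))
    return [sentences[i] for i in sorted(ranked[:top_n])]


summary = summarize(SENTENCES)
```

With the sample data, the first and third sentences are selected because they share the most vocabulary terms with the rest of the document. Restricting edges to controlled-vocabulary terms is one simple way to bias the ranking toward legally meaningful content; other weightings are equally plausible.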

The author authorizes publication of the doctrinal article in Revista ESMAT and in its electronic version, should it be approved by the Editorial Board.

The articles published and the references cited in Revista ESMAT are the sole responsibility of their authors.

The author further undertakes to identify and credit all data, images, and references. The author must also declare that the materials submitted are free of copyright, so that Revista ESMAT and its editors bear no legal liability.
